Enterprise data governance for Admins using Watson Knowledge Catalog

Last updated 4 years ago

This exercise demonstrates how to address enterprise data governance problems using Watson Knowledge Catalog on Cloud Pak for Data as a Service (CP4DaaS). We'll explain how to use governance, data quality and active policy management to help your organization protect and govern sensitive data, trace data lineage and manage data lakes. This knowledge will help users quickly discover, curate, categorize and share data assets, data sets, analytical models and their relationships with other members of your organization. The catalog serves as a single source of truth where data engineers, data stewards, data scientists and business analysts gain self-service access to data they can trust.

You will need the Admin role to create a catalog.

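This section comprises the following steps:

  • Set up Catalog and Data

  • Add collaborators and control access

  • Add categories

  • Add Business terms

  • Add rules for policies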

1. Set up Catalog and Data

First we'll create a catalog and load some data.

Create the catalog

Add Watson Knowledge Catalog the First Time

  • Once you are on IBM Cloud Pak for Data, on the top right corner click on your avatar, and then click on Profile and settings. Go to the Services tab.

If the Watson Knowledge Catalog service instance has not been added yet, click Add, choose the plan that suits you and create the service.

Open Watson Knowledge Catalog

  • Go to the upper-left (☰) hamburger menu and choose Catalogs -> View All catalogs:

  • From the Your catalogs page, click Create catalog +, fill in all the required information, and click Create:

Add data assets

  • Under the Browse Assets tab, either click here below "Now you can add assets!" or click Add to Catalog + in the top right, and choose, for example, Local files:

  • Click browse, or drag and drop your local file. If you are browsing, go to the /data/split/applicant_personal_data.csv file and double-click it or click Open. Add an optional description and click Add:

  • The newly added file will show up under the Browse Assets tab of your catalog:

Add Connection

  • You can add a connection to a remote DB, for example DB2 Warehouse in IBM Cloud, by choosing + Add to catalog -> Connection:

  • Choose your remote DB and click:

  • Enter the connection details and click Create:

  • The connection now shows up in the catalog:

2. Add collaborators and control access

  • Under the Access Control tab you can click Add Collaborator + to give other users access to your catalog:

  • You can search for a user, click on the name to select them, choose a role for that user and click Add:

  • To access data in the catalog, click on the name of the data:

  • A preview of the data will open, with metadata and the first few rows:

  • You can click the Review tab and rate the data, as well as comment on it, to provide feedback for your teammates:
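The asset preview described above (column names plus the first few rows) can be approximated locally before a file is uploaded. The following is a minimal sketch using only the Python standard library. The function name preview_csv and the sample rows are hypothetical; the real applicant_personal_data.csv may have different columns and values.

```python
import csv
import io


def preview_csv(text, n_rows=5):
    """Return (column_names, first n_rows data rows) for CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # zip stops after n_rows items without reading the rest of the file.
    rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Hypothetical sample data, for illustration only.
SAMPLE = """CustomerID,Email,Telephone,Age
1001,ann@example.com,555-0100,34
1002,bob@example.com,555-0101,51
1003,cy@example.com,555-0102,27
"""

header, rows = preview_csv(SAMPLE, n_rows=2)
print(header)
for row in rows:
    print(row)
```

To preview a file on disk, read its contents first (for example with open(path, newline="").read()) and pass the text to preview_csv.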

3. Add categories

The fundamental abstraction in Watson Knowledge Catalog is the Category. A category is analogous to a folder. You can add categories as needed.

Add category

  • Add a category for your assets: from the upper-left (☰) hamburger menu, choose Governance -> Policy Manager, then either click Create category or click Add + in the top-right menu and choose Category:

  • Give your category a name, such as Personal Data, and an optional description, and then click Create:

4. Add Business terms

  • From the upper-left (☰) hamburger menu, choose Governance -> Business Glossary, click Add Terms + and, from the drop-down, select Import or Create New:

  • Give the new Business term a name such as Contact Information and optional description, and click Save as draft.

  • You can edit the term saved as a draft by clicking the three dots. You can also add tags, an owner and a term (ex: Business Term). For now, click Publish to make this term available to users of the platform.

  • Now go back to your Credit Risk Catalog and open it in the column view ((☰) hamburger menu -> Catalogs -> choose Credit Risk Catalog). Under the Browse assets tab, click on the data set applicant_personal_data.csv to get the column/row preview. Scroll right to get to the email column and click the Column information icon (looks like an "eye"):

  • In the window that opens, click the edit icon (looks like a "pencil") next to Business terms :

  • Enter Contact Information under Business terms and the term will be searched for. Click on the Contact Information term that is found, and click Apply:

Close that window once the term has been applied. Now, do the same thing to add the Contact Information Business term to the Telephone column.

  • You will now be able to search for these terms from within the platform. For example, going back to your top level Credit Risk Catalog, in the search bar with the comment "What assets are you searching for?" enter your Contact Information term:

The applicant_personal_data.csv data set will show up, since it contains columns tagged with the Contact Information business term.
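The reason the search finds the data set can be illustrated with a small model: each asset records which business terms are assigned to its columns, and a search returns every asset with at least one matching column. This is a conceptual sketch only, not how Watson Knowledge Catalog is implemented. The Asset class, the search_by_term function, the case-insensitive matching and the second asset are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    # Maps a column name to the set of business terms assigned to it.
    column_terms: dict = field(default_factory=dict)


def search_by_term(assets, term):
    """Return the names of assets with at least one column tagged with term."""
    wanted = term.casefold()
    return [
        asset.name
        for asset in assets
        if any(
            wanted in {t.casefold() for t in terms}
            for terms in asset.column_terms.values()
        )
    ]


catalog = [
    Asset(
        "applicant_personal_data.csv",
        {
            "Email": {"Contact Information"},
            "Telephone": {"Contact Information"},
            "Age": set(),
        },
    ),
    # Hypothetical second asset with no tagged columns.
    Asset("other_asset.csv", {"Amount": set()}),
]

matches = search_by_term(catalog, "contact information")
print(matches)
```

Only applicant_personal_data.csv matches here, because it is the only asset whose columns carry the Contact Information term.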

5. Add rules for policies

We can now create rules to control how a user can access data.

  • Create a business term called CustomerID and assign it to your CustomerID column in the data set using the instructions above. Try it yourself first. If you need a reminder, the steps are repeated below; otherwise, skip to Adding a Policy and Rule.

How to create a Business term review

  • From the upper-left (☰) hamburger menu, choose Governance -> Business Glossary.

  • Click on the upper-right Add terms + button.

  • Give the new Business term the name CustomerID and optional description, and click Publish.

  • Now go back to your Credit Risk Catalog and open it in the column view ((☰) hamburger menu -> Catalogs -> choose Credit Risk Catalog). Under the Browse assets tab, click on the data set applicant_personal_data.csv to get the column/row preview. Scroll right to get to the CustomerID column and click the Column information icon (looks like an "eye").

  • In the window that opens, click the edit icon (looks like a "pencil") next to Business terms .

  • Enter CustomerID under Business terms and the term will be searched for. Click on the CustomerID term that is found, and click Apply.

Adding a Policy and Rule

  • From the upper-left (☰) hamburger menu, choose Governance -> Policy Manager, then click Add + and select Policy.

  • Under Details, give your rule a Name, set Type = Access, choose a Category (ex: the Personal Data category that you added earlier), and add a Description.

  • Next, under Rule builder, fill out Condition1 as If Business term Contains any CustomerID, and fill out the Action as then anonymize data in columns containing Product Data. Choose the tile for Substitute, which replaces each value with a non-identifiable hash. This obscures the actual CustomerID, but still allows actions like database joins to work. Click Create:

  • Now if we go back to our applicant_personal_data.csv asset in the catalog and look at the CustomerID column, it will look the same as before. But a non-admin user will see the "lock" icon and see that the CustomerID has been substituted with a hash value:
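Why substitution keeps joins working can be shown with a short sketch: if the same input always maps to the same token, two tables masked in the same way still match on the masked key. The code below uses a keyed SHA-256 hash (HMAC) from the Python standard library as a stand-in. The hashing scheme that Watson Knowledge Catalog actually uses is not described here, so the algorithm, the key, the function names and the sample tables are all assumptions.

```python
import hashlib
import hmac

# Hypothetical demo key; a real system would manage its own secret.
SECRET_KEY = b"demo-only-key"


def substitute(value, key=SECRET_KEY):
    """Deterministically replace a value with a non-identifying token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_column(rows, column):
    """Return copies of the rows with the given column substituted."""
    return [{**row, column: substitute(row[column])} for row in rows]


# Hypothetical sample tables sharing a CustomerID key.
applicants = [
    {"CustomerID": "1001", "Age": 34},
    {"CustomerID": "1002", "Age": 51},
]
loans = [
    {"CustomerID": "1002", "LoanAmount": 2500},
    {"CustomerID": "1001", "LoanAmount": 900},
]

masked_applicants = mask_column(applicants, "CustomerID")
masked_loans = mask_column(loans, "CustomerID")

# Join the two masked tables on the substituted key.
loan_by_id = {row["CustomerID"]: row["LoanAmount"] for row in masked_loans}
joined = [
    {**row, "LoanAmount": loan_by_id[row["CustomerID"]]}
    for row in masked_applicants
]
for row in joined:
    print(row)
```

Each applicant is still paired with the correct loan, even though neither masked table exposes the original CustomerID values.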

  • To add a rule to Obfuscate data, create a new data class called Age. See the instructions above if needed.

  • Back in the Credit Risk Catalog, under the applicant_personal_data.csv asset, go to the Overview tab and scroll to the Age column. Click the "down arrow" and you can see that the data has been inferred to be classified as a Code:

  • To change the classifier, click View all.

  • Start typing Age. When it comes up in the search, click Use and then Close:

  • You can build a rule to Obfuscate this Age column:

  • And now when that column is viewed by a non-admin user, it will have data that is replaced with similarly formatted data:
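The idea of replacing data with similarly formatted data can be sketched as follows: every digit is swapped for a random digit and every letter for a random letter of the same case, while punctuation and length stay unchanged. This is only an illustration of the concept. The obfuscate function and the sample values are hypothetical, and the method Watson Knowledge Catalog actually applies may differ.

```python
import random
import string


def obfuscate(value, rng):
    """Replace digits and ASCII letters with random ones of the same kind.

    All other characters (such as "-" or "@") are kept in place, so the
    result has the same length and layout as the input.
    """
    out = []
    for ch in str(value):
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


# A seeded generator makes the demo repeatable; it is not a security measure.
rng = random.Random(0)

ages = ["34", "51", "27"]  # hypothetical sample values
masked_ages = [obfuscate(age, rng) for age in ages]
masked_phone = obfuscate("555-0100", rng)

print(masked_ages)
print(masked_phone)
```

Unlike substitution, the output here is random rather than deterministic, so obfuscated values cannot be used as join keys.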

Wrap up

In this lab, we learned how to:

  • Set up Catalog and Data

  • Add collaborators and control access

  • Add categories

  • Add Business terms

  • Add rules for policies

You can use Business terms to standardize definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise.