Enterprise data governance for Admins using Watson Knowledge Catalog
This exercise demonstrates how to address common enterprise data governance problems using Watson Knowledge Catalog on the Cloud Pak for Data-as-a-Service (CP4DaaS). We'll explain how to use governance, data quality and active policy management to help your organization protect and govern sensitive data, trace data lineage and manage data lakes. With this knowledge, users can quickly discover, curate, categorize and share data assets, data sets, analytical models and their relationships with other members of your organization. The catalog serves as a single source of truth for data engineers, data stewards, data scientists and business analysts to gain self-service access to data they can trust.
You will need the Admin role to create a catalog.
This section is comprised of the following steps:
First we'll create a catalog and load some data
Once you are on IBM Cloud Pak for Data, on the top right corner click on your avatar, and then click on Profile and settings. Go to the Services tab.
If the Watson Knowledge Catalog service instance is not added, click Add, choose the right plan for you and create the service.
Go to the upper-left (☰) hamburger menu and choose Catalogs -> View All catalogs:
From the Your catalogs page, click Create catalog +, fill in all the required information, and click Create.
Under the Browse Assets tab, below "Now you can add assets!", click here or Add to Catalog + in the top right and, for example, choose Local files:
Browse to the /data/split/applicant_personal_data.csv file, or to the application_personal_data.csv file if you've just downloaded the raw file from the step above, and double-click or click Open. Add an optional description and click Add:
The newly added file will show up under the Browse Assets tab of your catalog:
Under the Access Control tab you can click Add Collaborator + to give other users access to your catalog:
You can search for a user, click on the name to select them, choose a role for that user and click Add:
To access data in the catalog, click on the name of the data:
A preview of the data will open, with metadata and the first few rows:
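For readers who want to see what such a preview amounts to outside the UI, here is a minimal local sketch in Python. It is not the Watson Knowledge Catalog API; it only reads the column names and the first few rows of CSV text. The sample rows and the `preview` helper are hypothetical, and the real applicant_personal_data.csv may have different columns and values.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data; the real file may differ.
SAMPLE_CSV = """CustomerID,Email,Telephone,Age
1001,ann@example.com,555-0101,34
1002,bob@example.com,555-0102,51
1003,cy@example.com,555-0103,27
"""


def preview(csv_text, n=2):
    """Return (column names, first n rows) for the given CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(islice(reader, n))
    return reader.fieldnames, rows


columns, rows = preview(SAMPLE_CSV)
print(columns)
for row in rows:
    print(row)
```

The column names play the role of the metadata shown in the catalog preview, and `n` controls how many rows are displayed.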
You can click the Review tab and rate the data, as well as comment on it, to provide feedback for your teammates:
The fundamental abstraction in Watson Knowledge Catalog is the Category. A category is analogous to a folder. You can add categories as needed.
Add a category for your assets by going to the upper-left (☰) hamburger menu and choosing Governance -> Policy Manager, then click Create category, or click Add + from the top right menu and choose Category:
Give your category a name, such as Personal Data, and an optional description, and then click Create:
From the upper-left (☰) hamburger menu, choose Governance -> Business Glossary, click Add Terms + and, from the drop-down, select Create New:
Give the new Business term a name such as Contact Information and an optional description, and click Save as draft.
You can edit the term saved as a draft by clicking the three dots. You can also add tags, an owner and a term (for example, Business Term).
For now, click Publish to make this term available to users of the platform.
Now go back to your Credit Risk Catalog by opening it up to the column view ((☰) hamburger menu Catalogs -> choose Credit Risk Catalog). Under the Browse assets tab, click on the data set applicant_personal_data.csv to get the column/row preview. Scroll right to get to the email column and click the Column information icon (looks like an "eye"):
In the window that opens, click the edit icon (looks like a "pencil") next to Business terms:
Enter Contact Information under Business terms and the term will be searched for. Click on the Contact Information term that is found, and click Apply:
Close that window once the term has been applied. Now, do the same thing to add the Contact Information Business term to the Telephone column.
You will now be able to search for these terms from within the platform. For example, going back to your top level Credit Risk Catalog, in the search bar with the comment "What assets are you searching for?" enter your Contact Information term:
The applicant_personal_data.csv data set will show up, since it contains columns tagged with the Contact Information business term.
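The idea behind this search can be sketched as a lookup from business terms to the assets whose columns carry them. The snippet below is a toy in-memory model, not how Watson Knowledge Catalog stores or indexes terms; the `catalog` dictionary, the `other_data.csv` asset and the `search_by_term` helper are hypothetical names used only for illustration.

```python
# Hypothetical model: asset name -> {column name: set of business terms}.
catalog = {
    "applicant_personal_data.csv": {
        "CustomerID": set(),
        "Email": {"Contact Information"},
        "Telephone": {"Contact Information"},
    },
    "other_data.csv": {  # hypothetical second asset with no tagged columns
        "Amount": set(),
    },
}


def search_by_term(catalog, term):
    """Return sorted names of assets with at least one column tagged with term."""
    wanted = term.casefold()  # compare case-insensitively
    return sorted(
        asset
        for asset, columns in catalog.items()
        if any(wanted == t.casefold() for terms in columns.values() for t in terms)
    )


print(search_by_term(catalog, "Contact Information"))
```

Because the term is attached to columns rather than to the file name, the asset is found even though "Contact Information" appears nowhere in its name.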
We can now create rules to control how a user can access data.
Create a business term called CustomerID and assign it to your CustomerID column in the data set using the instructions above. See below if you need details, but try it yourself first, and skip to Adding a rule below if you do not need a reminder.
From the upper-left (☰) hamburger menu, choose Governance -> Business Glossary.
Click on the upper-right Add terms + button.
Give the new Business term the name CustomerID and an optional description, and click Publish.
Now go back to your Credit Risk Catalog by opening it up to the column view ((☰) hamburger menu Catalogs -> choose Credit Risk Catalog). Under the Browse assets tab, click on the data set applicant_personal_data.csv to get the column/row preview. Scroll right to get to the CustomerID column and click the Column information icon (looks like an "eye").
In the window that opens, click the edit icon (looks like a "pencil") next to Business terms.
Enter CustomerID under Business terms and the term will be searched for. Click on the CustomerID term that is found, and click Apply.
From the upper-left (☰) hamburger menu, choose Governance -> Policy Manager, then click Add + and select Policy.
Under Details, give your rule a Name, set Type to Access, choose a Category (for example, the Personal Data category that you added earlier), and add a Description.
Next, under Rule builder, fill out Condition1 as If Business term Contains any CustomerID, and the Action as then anonymize data in columns containing Product Data. Choose the tile for Substitute, which will make a non-identifiable hash. This obscures the actual CustomerID, but allows actions like database joins to still work. Click Create:
Now if we go back to our applicant_personal_data.csv asset in the catalog at the CustomerID column, it will look the same as before. But a non-admin user will see the "lock" icon and see that the CustomerID has now been substituted with a hash value:
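To see why a substituted value still supports joins, consider the following minimal sketch. It assumes a salted SHA-256 hash truncated to 12 characters; the algorithm Watson Knowledge Catalog actually uses for Substitute is not stated here, so the `substitute` function, the salt, and the sample rows are all hypothetical.

```python
import hashlib


def substitute(value, salt="demo-salt"):
    """Deterministically replace a value with a short, non-identifying token.

    Assumption: salted SHA-256, truncated to 12 hex characters.
    """
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical rows from two tables that share a CustomerID column.
applicants = [{"CustomerID": 1001, "Age": 34}, {"CustomerID": 1002, "Age": 51}]
loans = [{"CustomerID": 1002, "Amount": 2500}, {"CustomerID": 1001, "Amount": 900}]

# Mask the key in both tables with the same function.
masked_applicants = [{**r, "CustomerID": substitute(r["CustomerID"])} for r in applicants]
masked_loans = [{**r, "CustomerID": substitute(r["CustomerID"])} for r in loans]

# Join the masked tables on the masked key.
age_by_key = {r["CustomerID"]: r["Age"] for r in masked_applicants}
joined = [{**r, "Age": age_by_key[r["CustomerID"]]} for r in masked_loans]
print(joined)
```

Because the same input always maps to the same token, rows that matched before masking still match afterwards, while the original CustomerID values no longer appear in the output.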
To add a rule to Obfuscate data, create a new data class called Age. See the instructions above if needed.
Back in the Credit Risk Catalog, under the applicant_personal_data.csv asset, go to the Overview tab and scroll to the Age column. Click the "down arrow" and you can see that the data has been inferred to be classified as a Code:
Change the classifier by clicking View all.
Now change the classifier by starting to type Age. When this comes up in the search, click Use and then Close:
You can build a rule to Obfuscate this Age column:
And now when that column is viewed by a non-admin user, it will have data that is replaced with similarly formatted data:
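As a rough illustration of "similarly formatted" replacement, the sketch below swaps each digit for a random digit and each ASCII letter for a random letter of the same case, leaving punctuation in place. This is only an assumption about what such obfuscation could look like; the product's actual Obfuscate method may work differently, and the `obfuscate` helper is hypothetical.

```python
import random
import string


def obfuscate(value, rng):
    """Replace digits and ASCII letters with random ones of the same kind.

    Punctuation, spaces and any other characters are kept unchanged, so the
    result has the same length and layout as the input.
    """
    out = []
    for ch in str(value):
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(0)  # seeded only to make the demo repeatable
print(obfuscate("34", rng))
print(obfuscate("555-0101", rng))
```

Unlike the deterministic substitution used for CustomerID, this replacement is random, so the same input can produce different outputs and the result should not be used as a join key. A random replacement may also occasionally coincide with the original value.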
In this lab, we learned how to:
Set up Catalog and Data
Add collaborators and control access
Add categories
Add Business terms
Add rules for policies
Download the application_personal_data.csv file from . If the download doesn't start automatically, right-click on the white space in the file and click Save-As, and then name the file application_personal_data.csv.
You can use business terms to standardize definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise.