# Enterprise data governance for Admins using Watson Knowledge Catalog

This exercise demonstrates how to address enterprise data governance problems using Watson Knowledge Catalog on Cloud Pak for Data as a Service (CP4DaaS). We'll explain how to use governance, data quality, and active policy management to help your organization protect and govern sensitive data, trace data lineage, and manage data lakes. With these capabilities, users can quickly discover, curate, categorize, and share data assets, data sets, analytical models, and their relationships with other members of your organization. The catalog serves as a single source of truth where data engineers, data stewards, data scientists, and business analysts gain self-service access to data they can trust.

You will need the *Admin* role to create a catalog.

This section consists of the following steps:

1. [Set up Catalog and Data](#1-set-up-catalog-and-data)
2. [Add collaborators and control access](#2-add-collaborators-and-control-access)
3. [Add categories](#3-add-categories)
4. [Add Business terms](#4-add-business-terms)
5. [Add rules for policies](#5-add-rules-for-policies)

## 1. Set up Catalog and Data

First, we'll create a catalog and load some data.

### Create the catalog

#### Add Watson Knowledge Catalog the First Time

* Once you are on IBM Cloud Pak for Data, click your avatar in the top-right corner, and then click `Profile and settings`. Go to the `Services` tab.

If the `Watson Knowledge Catalog` service instance has not been added yet, click `Add`, choose the plan that suits you, and create the service.

![CPDaaS WML instance add](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F9fd9f920d552e40517f0b79d600cd03bdbc3b0ca.png?generation=1611886699887709\&alt=media)

![CPDaaS WML instance name](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F16acf0843636ad1a7bb2f081fec956c142ea22c1.png?generation=1603461112222981\&alt=media)

#### Open Watson Knowledge Catalog

* Go to the upper-left (☰) hamburger menu and choose `Catalogs` -> `View All catalogs`:

![open catalog menu](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fce7a2d4e29e448aa8153b8292a2f0e71358e2e44.png?generation=1603976841480202\&alt=media)

* From the *Your catalogs* page, click `Create catalog +`, fill in all the required information, and click `Create`. The later steps in this exercise refer to the catalog as *Credit Risk Catalog*:

![create WKC catalog](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F064123e493e1859f8b183d72aea09ae139caa303.png?generation=1603976838719378\&alt=media)

### Add data assets

* Under the *Browse Assets* tab, below "Now you can add assets!" click `here` or `Add to Catalog +` in the top right and, for example, choose `Local files`:

![add local files to catalog](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F7fd9f2b6e006249ef9a08f5816de47ec25c8c1a6.png?generation=1603461106323383\&alt=media)

* Download the `applicant_personal_data.csv` file from [here](https://github.com/IBM/credit-risk-workshop-cpd/raw/workshop-DDC/data/split/applicant_personal_data.csv). If the download doesn't start automatically, right-click on the white space in the file, click `Save-As`, and name the file `applicant_personal_data.csv`.
* Browse to the `/data/split/applicant_personal_data.csv` file, or to the `applicant_personal_data.csv` file if you've just downloaded the raw file in the step above, and double-click it or click `Open`. Add an optional description and click `Add`:

![click add for local files to catalog](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F20159a04b0ebe9ebc3119e894483abc112d2f9bf.png?generation=1603976847987744\&alt=media)

* The newly added file will show up under the *Browse Assets* tab of your catalog:

![newly added data in catalog](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fbf599fcce3216ce18917897de573cb58196c1808.png?generation=1603976839437332\&alt=media)
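Before uploading, it can help to confirm that the file contains the columns the later steps rely on. The following is a minimal sketch, not part of Watson Knowledge Catalog: the helper name `missing_columns` is hypothetical, and the list of required columns is an assumption taken from the column names this exercise mentions (*CustomerID*, *email*, *Telephone*, *Age*).

```python
import csv
import io

# Columns referenced later in this exercise (an assumption based on the
# steps below; the real file may contain additional columns).
REQUIRED_COLUMNS = ["CustomerID", "email", "Telephone", "Age"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Made-up sample standing in for the downloaded file.
    sample = "CustomerID,email,Telephone,Age\n1,a@example.com,555-0100,42\n"
    print(missing_columns(sample))
```

To check the real download, read its text first, for example with `open("applicant_personal_data.csv", newline="").read()`, and pass the result to `missing_columns`.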

## 2. Add collaborators and control access

* Under the *Access Control* tab you can click `Add Collaborator +` to give other users access to your catalog:

![give users access to the catalog](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fec216bd8e9c8a2bcd4a56cf8e959bc960bc6695b.png?generation=1603976843107511\&alt=media)

* You can search for a user, click on the name to select them, choose a role for that user, and click `Add`:

![search for user and add as collaborator](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fb5090a1979ca1a93b8dab5ab1f016dec137a9da8.png?generation=1603976837377305\&alt=media)

* To access data in the catalog, click on the name of the data:

![click data name to open](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Ff22369198dc51745afa0fc7400784ce0e50783bc.png?generation=1603976844571035\&alt=media)

* A preview of the data will open, with metadata and the first few rows:

![preview of data](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F968cd08fcf1a4e4601688a12e2d780b4ba87dd4d.png?generation=1603976846715964\&alt=media)

* You can click the `Review` tab and rate the data, as well as comment on it, to provide feedback for your teammates:

![review data](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F3296b6c6a7c18a404b01547c9f079186feb80e96.png?generation=1603976850509598\&alt=media)

## 3. Add categories

The fundamental abstraction in Watson Knowledge Catalog is the Category. A category is analogous to a folder. You can add categories as needed.

### Add category

* To add a category for your assets, go to the upper-left (☰) hamburger menu and choose `Governance` -> `Policy Manager`. Then click `Create category`, or click `Add +` in the top-right menu and choose `Category`:

![organize data categories](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F340f1e90f5a7a5f85d5d659c3bd044b996092e4a.png?generation=1603976840932156\&alt=media)

* Give your category a name, such as *Personal Data*, and an optional description, and then click `Create`.

## 4. Add Business terms

You can use [Business terms](https://dataplatform.cloud.ibm.com/docs/content/wsj/governance/dmg16.html) to standardize definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise.

* From the upper-left (☰) hamburger menu, choose `Governance` -> `Business Glossary`, click `Add Terms +`, and select `Create New` from the drop-down:

![organize Data Business terms](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F1f74f481560889cf6ce778d565f74b081de4b588.png?generation=1603976846503161\&alt=media)

* Give the new Business term a name, such as *Contact Information*, and an optional description, and click `Save as draft`.

![create business term](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F3edd0a816a3bace15ad6724616fc97ea65f4121f.png?generation=1603976848400740\&alt=media)

* You can `edit` the term saved as a draft by clicking the three dots. You can also add `tags`, an `owner`, and a `term` (for example, `Business Term`). For now, click `Publish` to make this term available to users of the platform.

![create business term](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F2b0f4f3ba7d512b2cb490ea2b0d932bbbca314d6.png?generation=1603976843486706\&alt=media)

* Now go back to your *Credit Risk Catalog* by opening it up to the column view ((☰) hamburger menu `Catalogs` -> choose `Credit Risk Catalog`). Under the *Browse assets* tab, click on the data set *applicant\_personal\_data.csv* to get the column/row preview. Scroll right to get to the *email* column and click the *Column information* icon (looks like an "eye"):

![choose email column information](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F7842741ee560c4360aeef2e7a8e03a01ad3edc13.png?generation=1603461086584644\&alt=media)

* In the window that opens, click the *edit* icon (looks like a "pencil") next to *Business terms*:

![search business terms](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F0a59e6822c945d348cefb62f06531dc8ada53b6f.png?generation=1603976849481447\&alt=media)

* Enter *Contact Information* under *Business terms* to search for the term. Click the `Contact Information` term that is found, and click `Apply`.

Close that window once the term has been applied. Now, do the same thing to add the *Contact Information* Business term to the *Telephone* column.

* You will now be able to search for these terms from within the platform. For example, go back to your top-level *Credit Risk Catalog* and, in the search bar with the prompt "What assets are you searching for?", enter your *Contact Information* term:

![search using business terms](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fe8471c7e352f27fd724c5fcf5810952173d47c88.png?generation=1603461100318981\&alt=media)

The *applicant\_personal\_data.csv* data set will show up, since it contains columns tagged with the *Contact Information* business term.
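One way to picture why this search finds the data set is to treat term assignments as an index from business term to the assets and columns that carry it. The sketch below is a toy model for illustration only. It is not how Watson Knowledge Catalog stores or searches terms, and the `TermIndex` class with its methods is hypothetical.

```python
from collections import defaultdict


class TermIndex:
    """Toy model: business term -> {asset name -> set of column names}."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def assign(self, term, asset, column):
        """Record that a column of an asset is tagged with a business term."""
        self._index[term.lower()][asset].add(column)

    def search(self, term):
        """Return sorted asset names having at least one column with the term."""
        return sorted(self._index.get(term.lower(), {}))

    def columns(self, term, asset):
        """Return the sorted columns of an asset that carry the term."""
        return sorted(self._index.get(term.lower(), {}).get(asset, set()))


if __name__ == "__main__":
    index = TermIndex()
    # Mirrors the assignments made in the steps above.
    index.assign("Contact Information", "applicant_personal_data.csv", "email")
    index.assign("Contact Information", "applicant_personal_data.csv", "Telephone")
    print(index.search("Contact Information"))
```

In this model the lookup is case-insensitive, which is a design choice of the sketch rather than a statement about the product's search behavior.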

## 5. Add rules for policies

We can now create rules to control how a user can access data.

* Create a business term called *CustomerID* and assign it to the *CustomerID* column in the data set using the instructions above. Try it yourself first. If you need a reminder, the details are repeated below; otherwise, skip to *Adding a Policy and Rule*.

### Review: how to create a Business term

* From the upper-left (☰) hamburger menu, choose `Governance` -> `Business Glossary`.
* Click on the upper-right `Add terms +` button.
* Give the new Business term the name *CustomerID* and optional description, and click `Publish`.
* Now go back to your *Credit Risk Catalog* by opening it up to the column view ((☰) hamburger menu `Catalogs` -> choose `Credit Risk Catalog`). Under the *Browse assets* tab, click on the data set *applicant\_personal\_data.csv* to get the column/row preview. Scroll right to get to the *CustomerID* column and click the *Column information* icon (looks like an "eye").
* In the window that opens, click the *edit* icon (looks like a "pencil") next to *Business terms*.
* Enter *CustomerID* under *Business terms* to search for the term. Click the `CustomerID` term that is found, and click `Apply`.

### Adding a Policy and Rule

* From the upper-left (☰) hamburger menu, choose `Governance` -> `Policy Manager`, then click `Add +` and select `Policy`.
* Under *Details*, give your rule a *Name*, set *Type* to *Access*, choose a *Category* (for example, the `Personal Data` category that you added earlier), and enter a `Description`.
* Next, under *Rule builder*, fill out *Condition1* as: If *Business term* *Contains any* *CustomerID*. For the Action, select then *anonymize data* *in columns containing* *Product Data*. Choose the `Substitute` tile, which replaces each value with a non-identifiable hash. This obscures the actual CustomerID but still allows operations such as database joins to work. Click `Create`:

![define rule for masking customerID](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fbf5f4071eeca466ac2b5be504be14c60b3ea461b.png?generation=1603461102746878\&alt=media)
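The "if condition, then action" shape of the rule builder can be sketched as a small data structure. This is a simplified illustration under stated assumptions, not the product's rule engine: the `Rule` class and `action_for` function are hypothetical, only the *Contains any* condition on business terms is modeled, and the first matching rule wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    terms: frozenset  # condition: column has ANY of these business terms
    action: str       # for example "substitute" or "obfuscate"


def action_for(column_terms, rules):
    """Return the action of the first rule matching the column's terms, else None."""
    tagged = set(column_terms)
    for rule in rules:
        if rule.terms & tagged:  # non-empty intersection means "contains any"
            return rule.action
    return None


if __name__ == "__main__":
    rules = [Rule("Mask customer IDs", frozenset({"CustomerID"}), "substitute")]
    print(action_for({"CustomerID"}, rules))           # column tagged CustomerID
    print(action_for({"Contact Information"}, rules))  # no rule applies
```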

* If we now go back to our *applicant\_personal\_data.csv* asset in the catalog and look at the *CustomerID* column, it will look the same as before. A non-admin user, however, will see the "lock" icon, and the CustomerID values will be substituted with hash values:

![customerID is now masked](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F51c1dde201a96ef753456e3af50de8b56f5e6814.png?generation=1590012682485955\&alt=media)
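To see why substituting a hash still lets joins work, consider a deterministic keyed hash: the same input always maps to the same token, so equal IDs remain equal after masking. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in. The actual algorithm Watson Knowledge Catalog uses is not described in this exercise, and the `substitute` helper, the key, and the sample rows are all made up for illustration.

```python
import hashlib
import hmac


def substitute(value, key):
    """Map a value to a stable, non-reversible 16-character hex token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; keep the full digest to minimize collisions


if __name__ == "__main__":
    key = b"demo-key"  # placeholder secret, not a real credential
    # Made-up rows: two tables sharing a CustomerID column.
    applicants = {substitute(cid, key): name for cid, name in [(1001, "A"), (1002, "B")]}
    loans = {substitute(cid, key): amount for cid, amount in [(1001, 5000)]}
    # The join on the masked IDs still matches the same customer.
    joined = {token: (applicants[token], loans[token]) for token in loans if token in applicants}
    print(joined)
```

Using a keyed hash rather than a plain one is a design choice of this sketch: without the key, someone who can guess candidate IDs could hash them and compare the results.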

* To add a rule to *Obfuscate* data, create a new data class called *Age*. See the instructions above if needed.
* Back in the *Credit Risk Catalog*, under the *applicant\_personal\_data.csv* asset, go to the `Overview` tab and scroll to the *Age* column. Click the "down arrow" and you can see that the column has been automatically classified as *Code*:

![Age classified as Code](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F6d1eb100efa2f0382e8f7d8ac3ff7969a7c35da1.png?generation=1590012683756431\&alt=media)

* To change the classifier, click `View all` and start typing *Age*. When it comes up in the search, click `Use` and then `Close`:

![Change classifier and use](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F4c3dec3ad96e7ad344d5fb446c2819a6c5a90278.png?generation=1590012684203719\&alt=media)

* You can build a rule to *Obfuscate* this *Age* column:

![Age obfuscate rule ](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F828dc669db8962049c65d8b5674b3a047137f6b1.png?generation=1590012683079981\&alt=media)

* When a non-admin user now views that column, the values are replaced with similarly formatted data:

![Age column obfuscated](https://1986830131-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fd578f25b74d429844d45da647d92a51b0770d503.png?generation=1590012684299884\&alt=media)
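"Similarly formatted" means the replacement keeps the shape of the original value while discarding its content. A minimal sketch of that idea follows, assuming a simple character-class rule: digits become random digits, letters become random letters of the same case, and everything else is kept. This is an illustration only. The `obfuscate` function is hypothetical and is not the algorithm the product applies.

```python
import random
import string


def obfuscate(value, rng=None):
    """Replace each character with a random one of the same class."""
    rng = rng or random.Random()
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)


if __name__ == "__main__":
    # Made-up values; output differs on every run unless rng is seeded.
    print(obfuscate(42))
    print(obfuscate("555-0100"))
```

Unlike the substitution sketch, this replacement is not deterministic, so the same input can yield different outputs and the masked column cannot be used for joins.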

## Wrap up

In this lab, we learned how to:

* Set up Catalog and Data
* Add collaborators and control access
* Add categories
* Add Business terms
* Add rules for policies
