cloudpakfordata-telco-workshop
  • Introduction
  • Getting Started
    • Pre-work
    • Git integration (using GitLab)
  • Telco Workshop
    • Data Connection and Virtualization
    • Import Data to Project
    • Data Visualization with Data Refinery
    • Enterprise data governance for Viewers using Watson Knowledge Catalog
    • Enterprise data governance for Admins using Watson Knowledge Catalog
    • Machine Learning with Jupyter
    • Machine Learning with AutoAI
    • Deploy and Test Machine Learning Models
    • Monitoring models with OpenScale GUI (Auto setup Monitoring)
    • Monitoring models with OpenScale GUI (Manual Config)
      • OpenScale Manual Config Part 2
      • OpenScale Manual Config Part 3
      • OpenScale Manual Config Part 4
      • OpenScale Manual Config Part 5
    • Monitoring models with OpenScale (Notebook)
  • Workshop Resources
    • Instructor Guide
  • Resources
    • IBM Cloud Pak for Data Platform API
    • IBM Cloud Pak for Data APIs
    • Watson Knowledge Catalog
    • Watson Knowledge Catalog Learning Center
    • IBM Developer
    • Cloud Pak for Data 2.5
    • Cloud Pak Experiences - Free 7 day trial
    • Cloud Pak for Applications
Powered by GitBook
On this page
  • 1. Start virtualizing data
  • Join the virtualized data
  • 2. Grant access to the virtualized data
  • Conclusion

Was this helpful?

  1. Telco Workshop

Data Connection and Virtualization

PreviousGit integration (using GitLab)NextImport Data to Project

Last updated 4 years ago

Was this helpful?

This section requires a Data Engineer role to be assigned to your user account. The section is broken up into the following steps:

1. Start virtualizing data

In this section, since we now have access to the Db2 Warehouse data, we can virtualize the data to our Cloud Pak for Data project.

To launch the data virtualization tool, go the (☰) menu and click Collect and then Data Virtualization.

Click on the Data sources pulldown and choose Virtualize.

Several tables names will be displayed (many of these tables are created as sample data when a Db2 Warehouse instance is provisioned). Find the tables we will be using for this workshop: CUSTOMER, PRODUCT and BILLING (You can search using the Schema name (NULLIDRA) for the tables and they should show up). Once selected click on Add to cart and then on View Cart. :

The next panel prompts the user to select where to assign the virtualized tables. Select the My virtualized data radio button. If there is a Submit to catalog checkbox on the top right, unselect it and finally click the Virtualize button to add the virtualized tables to your data.

You'll be notified that the virtual tables have been created. Let's see the new virtualized tables from the Data Virtualization tool by clicking View my virtualized data button.

Join the virtualized data

Now we're going to join the tables we created so we have a merged set of data. It will be easier to do it here rather than in a notebook where we'd have to write code to handle three different data sets. Click on any two tables (PRODUCTS and BILLING for instance) and click the Join view button.

To join the tables we need to pick a key that is common to both data sets. Here we choose to map customerID from the first table to customerID on the second table. Do this by clicking on one and dragging it to another. When the line is drawn click on Next.

Next, you have a chance to Edit column names. We'll keep them as-is. Click Next.

In the next panel we'll give our joined data a unique name (to be consistent with SQL standards, pick an all uppercase name), I chose XXXBILLINGPRODUCTS (where XXX is my All Upper Case user ID). Under Assign to, we choose where to assign the joined view we created. Select the My virtualized data radio button. If there is a Submit to catalog checkbox on the top right, unselect it and finally click the Create view button to add the virtualized aggregate table to your data.

You'll be notified that the join view creation has succeeded! Click on the View my virutalized data button.

You will need to repeat this again until we have joined all three tables.

IMPORTANT Repeat the same steps as above, but this time choose to join the new joined view (XXXBILLINGPRODUCTS) and the last virtualized table (CUSTOMERS) to create a new joined view that has all three tables. Let's call it XXXBILLINGPRODUCTSCUSTOMERS. Switching to our project should show all three virtualized tables, and two joined tables. Do not go to the next section until this step is performed.

2. Grant access to the virtualized data

Note: This section only needs to be completed if there are non-Admin or non-Data Engineer users you are working in a group with. The instructors would have indicated that it needs to be completed to give those users access to the data you have virtualized above.

In order for other users to have access to the data that you just virtualized, you need to grant them access. Follow these steps to make your virtualized data visible to them.

Go to Data Virtualization option from the (☰) menu. Select My virtualized data in the pulldown.

Click on the virtualized data you've created, then click the 3 vertical dots to the right of one, and choose Manage access:

Click the Specific users button and click Add user +:

Select the users you wish to grant access to and click Add users:

Conclusion

In this section we learned how to make connection to databases that contain our data, how to virtualize them, and how to allow other to collaborate with us and use the virtualized data.

Remember that you can add data from different databases and servers if you need to. Moreover, you can virtualize this data from different sources together as well! The goal is to take care of bringing the data to the platform early on so all the data scientists can use it without reinventing the wheel while you keep full control of who has access to what data.

Start virtualizing data
Grant access to the virtualized data
(☰) Menu -> Collect -> Data Virtualization
Menu -> Virtualize
Choose the tables to virtualize
Add virtualized data to your project
We've got virtualized data
Choose to join two tables
Map the two customerID keys
Review the proposed joined table
Add joined data tables to your project
The data join succeeded!
Our data sets at the end of this section
Manage access to virtualized data
Grant Access to specific users
Select Users to Grant Access to