# Data Connection and Virtualization

This section will cover aspects of collecting data in Cloud Pak for Data. Specifically we will be connecting to different data sources and creating views against those data sources to create a single unified set of data assets that can be used in other modules of this workshop.

> **Note: To complete this section, an Admin or Data Engineer role needs to be assigned to your user account. The workshop instructor will assign this role as appropriate.**

## Virtualizing Data

In this section, we will gather data from several tables across data sources. We will use data virtualization to access these tables and then create joined views against those virtualized tables.

### Create Virtualized Tables

* To launch the data virtualization tool, go the (☰) navigation menu and under the *`Data`* section click on `Data virtualization`.

![(☰) Menu -> Collect -> Data Virtualization](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Ff64c2c3848dbc870d95640e33026527bb918fd6d.png?generation=1626367017265364\&alt=media)

* From the Data virtualization sub-menu at the top left of the page, click on the menu drop down list and choose *`Virtualize`*.

![Menu -> Virtualize](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Ff913405ca986fba1cab7f819445134e0b31ab301.png?generation=1626367024524571\&alt=media)

* Several tables names will be displayed across any of the data sources that are included in the data virtualization server. You will notice that on the top of the panel, we can filter the tables being displayed by selecting the database type.
* To simplify the search for tables you will use, click on the `Schemas` column header to sort the tables by Schema. Then find the tables we will be using for this workshop: `APPLICANTFINANCIALDATA`, `APPLICANTPERSONALDATA` and `LOANS`, which are under the `CP4DCREDIT` schema. Select the checkboxes next to these three tables, and then click on *`Add to cart`* followed by the *`View Cart`* button.

> *Note: You may need to page through the available tables by clicking on the right arrow at the bottom of the tables view. Or changing the number of `Items per page` to a larger value at the bottom left of the tables view.*

![Choose the tables to virtualize](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F5a9a501f338b101bddce85421e9c30ed1914d1ae.png?generation=1626367014966125\&alt=media)

* The next panel prompts you to select where to assign the virtualized tables. Select the `My virtualized data` radio button and click the *`Virtualize`* button to add the virtualized tables to your data (we left the default values, so the tables will be virtualized under your own user schema with the same table names as the original tables).

![Add virtualized data to your project](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Ff9e59fe8cb6c2a60580ae2d20044ae2e0853f091.png?generation=1626367013701426\&alt=media)

* A pop up dialog panel will indicate that the virtual tables have been created. Let's see the new virtualized tables by clicking the *`View my virtualized data`* button.

![We've got virtualized data](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F426264f9cd81e84da3d18f6bb27b5cbc9fbf275a.png?generation=1626367040182881\&alt=media)

> *Note: You may receive a notification at the top of the page that the virtual assets were published to the catalog. Feel free to dismiss the notification by clicking on the `X`*

### Create Joined Virtual Views

Now we're going to **join** the tables we previously virtualized, so we have a final merged set of data. It will be easier to do it here rather than in a notebook where we'd have to write code to handle three different data sets.

* From the 'My virtualized data' page, Click on two of the virtualized tables (`APPLICANTPERSONALDATA` and `APPLICANTFINANCIALDATA`) and click the *`Join`* button.

![Choose to join two tables](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F3bf6510778b37f7afacbc2dd40e81ee2e49de99f.png?generation=1626367025244949\&alt=media)

* To join the tables we need to pick a key that is common to both data sets. Here we choose to map `CustomerID` from the first table to `CustomerID` on the second table. Do this by clicking on one and dragging it to another. We will leave the default of having all the column names selected. When the line is drawn, click on the *`Next`* button.

![Map the two customerID keys](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F7a5cf91b875475a220bb6c0741e5f305fceb97aa.png?generation=1626367013001942\&alt=media)

* In the next panel, although we could change the names of our columns, we will accept the existing names for our columns. Click the *`Next`* button to continue.

![Review joined column names](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F162e615748cf6f3d247f084ea38440b80164b749.png?generation=1626366985813446\&alt=media)

* In the next panel we'll give our joined data view a unique name (to be consistent with SQL standards, pick an all uppercase name), choose someething like: `XXXAPPLICANTFINANCIALPERSONALDATA` (where `XXX` is your initials in *all upper case*). Also select the `My virtualized data` radio button and then click the *`Create view`* button to add the virtualized aggregate view to your data.

![Review the proposed joined table](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Ff721be03ebb41835a81cafe2141730495a995240.png?generation=1626367035926125\&alt=media)

* A pop up dialog panel will indicate that the join view creation has succeeded! Click on *`View my virutalized data`* button.

![The data join succeeded!](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F426970cfabc89c74b6b8408999b38b3956c132d3.png?generation=1626367014461547\&alt=media)

* Repeat the same steps as above, but this time choose to join the new joined view you just created (`XXXAPPLICANTFINANCIALPERSONALDATA`) and the last virtualized table (`LOANS`), to create a new joined view that has all three tables. Click the *`Join`* button.

![Join final tables](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fb80eb1ab49267c945d4f781a30084f778cb3ede0.png?generation=1626367029693041\&alt=media)

* Again join the two tables by selecting/mapping the `CustomerID` from the first table to `CustomerID` on the second table. Do this by clicking on one and dragging it to another.  We will leave the default of having all the column names selected. When the line is drawn, click on the *`Next`* button.

![Map the two customerID keys](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F37dea236efe3afca190421e8173b5083c56eba60.png?generation=1626367030952834\&alt=media)

* In the next panel, although we could change the names of our columns, we will accept the existing names for our columns. Click the *`Next`* button to continue.
* In the next panel we'll give our joined data view a unique name (to be consistent with SQL standards, pick an all uppercase name), choose someething like: `XXXAPPLICANTFINANCIALPERSONALLOANSDATA` (where `XXX` is your initials in *all upper case*). Also select the `My virtualized data` radio button and then click the *`Create view`* button to add the virtualized aggregate view to your data.
* A pop up dialog panel will indicate that the join view creation has succeeded! Click on *`View my virtualized data`* button.
* From the `My virtualized data` page you should now see all three virtualized tables and two joined tables. Do not go to the next section until you have all the tables.

![Our data sets at the end of this section](https://4168210942-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2F361504081fe4beda2e7dc239f798086bd36f603f.png?generation=1626367016969264\&alt=media)

## Conclusion

In this section we learned how to make connection to databases that contain our data, how to virtualize them, and how to use the virtualized data.

Remember that you can add data from different databases and servers if you need to. Moreover, you can virtualized these data from different sources together as well! The goal is to take care of bringing the data to the platform early on so all the data scientists can use it without reinventing the wheel while you keep full control of who has access to what data.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/credit-risk-workshop/data-connection-and-virtualization.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
