# Introduction

Welcome to our workshop! In this workshop, we'll use the Cloud Pak for Data platform to Collect Data, Organize Data, Analyze Data, and Infuse AI into our applications. The goals of this workshop are to:

* Visualize data with Data Refinery
* Create and deploy a machine learning model
* Monitor the model
* Create a Python app to use the model

## About this workshop

* [Agenda](#agenda)
* [Compatibility](#compatibility)
* [About Cloud Pak for Data](#about-cloud-pak-for-data)
* [Credits](#credits)

### About the data set

In this workshop, we will use a credit risk / lending scenario. Lenders face increasing pressure to expand lending to larger and more diverse audiences, and they respond by using different approaches to risk modeling. This means going beyond traditional credit data sources to alternative credit sources (e.g., mobile phone plan payment histories, education), which may introduce a risk of bias or other unexpected correlations.

![Use Case Diagram](https://2515596412-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M0YPhXPJogwds2wYiEl%2Fsync%2Fc5218bc381cb4fb5ce75e05bb064550bfdc312f7.png?generation=1603461098035249\&alt=media)

The credit risk model that we explore in this workshop uses a training data set that contains 20 attributes about each loan applicant. The scenario and model use synthetic data based on the [UCI German Credit dataset](<https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)>). The data is split into three CSV files, which are located in the [data](https://github.com/IBM/credit-risk-workshop-cpd/tree/3a7aea796014d9d75e0aed6dfea23d2b498395f9/data/split/README.md) directory of the GitHub repository you will download in the pre-work section.

#### [Applicant Financial Data](https://github.com/IBM/credit-risk-workshop-cpd/tree/3a7aea796014d9d75e0aed6dfea23d2b498395f9/data/split/applicant_financial_data.csv)

This file has the following attributes:

* CUSTOMERID (hex number, used as the primary key)
* CHECKINGSTATUS
* CREDITHISTORY
* EXISTINGSAVINGS
* INSTALLMENTPLANS
* EXISTINGCREDITSCOUNT

#### [Applicant Loan Data](https://github.com/IBM/credit-risk-workshop-cpd/tree/3a7aea796014d9d75e0aed6dfea23d2b498395f9/data/split/applicant_loan_data.csv)

This file has the following attributes:

* CUSTOMERID
* LOANDURATION
* LOANPURPOSE
* LOANAMOUNT
* INSTALLMENTPERCENT
* OTHERSONLOAN
* RISK

#### [Applicant Personal Data](https://github.com/IBM/credit-risk-workshop-cpd/tree/3a7aea796014d9d75e0aed6dfea23d2b498395f9/data/split/applicant_personal_data.csv)

This file has the following attributes:

* CUSTOMERID
* EMPLOYMENTDURATION
* SEX
* CURRENTRESIDENCEDURATION
* OWNSPROPERTY
* AGE
* HOUSING
* JOB
* DEPENDENTS
* TELEPHONE
* FOREIGNWORKER
* FIRSTNAME
* LASTNAME
* EMAIL
* STREETADDRESS
* CITY
* STATE
* POSTALCODE
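Because all three files share the CUSTOMERID column, they can be recombined into one row per applicant before modeling. The following is a minimal, standard-library-only sketch of that join. It assumes each file has a header row and at most one row per CUSTOMERID. The helper names (`load_rows`, `join_on_customer_id`, `split_features_and_label`) are hypothetical and are not part of the workshop code. The split between identifier/contact columns and model inputs is also an assumption: setting aside CUSTOMERID, RISK, and the seven name and address columns leaves 20 columns, which appears to match the 20 attributes mentioned above, with RISK presumably being the value to predict.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

# Assumption (not from the workshop code): these columns identify or contact the
# applicant and are treated here as non-features.
ID_AND_CONTACT_COLUMNS = {
    "CUSTOMERID", "FIRSTNAME", "LASTNAME", "EMAIL",
    "STREETADDRESS", "CITY", "STATE", "POSTALCODE",
}
# Assumption: RISK is the target label.
LABEL_COLUMN = "RISK"


def load_rows(path: str) -> List[Row]:
    """Read a CSV file with a header row into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def join_on_customer_id(*tables: Iterable[Row]) -> List[Row]:
    """Inner-join tables on CUSTOMERID; customers missing from any table are dropped."""
    # Index each table by CUSTOMERID (a later duplicate ID would overwrite an earlier one).
    indexed = [{row["CUSTOMERID"]: row for row in table} for table in tables]
    common_ids = set(indexed[0]).intersection(*indexed[1:])
    joined: List[Row] = []
    for customer_id in sorted(common_ids):
        merged: Row = {}
        for index in indexed:
            merged.update(index[customer_id])  # CUSTOMERID is identical in every table
        joined.append(merged)
    return joined


def split_features_and_label(rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Separate model inputs from the label, dropping identifier and contact columns."""
    features = [
        {k: v for k, v in row.items()
         if k not in ID_AND_CONTACT_COLUMNS and k != LABEL_COLUMN}
        for row in rows
    ]
    labels = [row[LABEL_COLUMN] for row in rows]
    return features, labels
```

For example, `join_on_customer_id(load_rows("data/split/applicant_financial_data.csv"), load_rows("data/split/applicant_loan_data.csv"), load_rows("data/split/applicant_personal_data.csv"))` would return the combined rows, assuming the paths are relative to the root of the downloaded repository. An inner join is used so that an applicant missing from any file is dropped rather than padded with empty values.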

## Agenda

| Topic | Description |
| ----- | ----------- |
| [Pre-work](https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/workshop-cpdaas-master/getting-started/pre-work) | Creating a project, downloading the data set, seeding a database |
| [Data Visualization with Data Refinery](https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/workshop-cpdaas-master/credit-risk-workshop/data-visualization-and-refinery) | Refining, visualizing, and profiling the data |
| [Machine Learning with Jupyter](https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/workshop-cpdaas-master/credit-risk-workshop/machine-learning-in-jupyter-notebook) | Building a model with Spark, deploying the model with Watson Machine Learning, testing the model with a Python Flask app |
| [Machine Learning with AutoAI](https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/workshop-cpdaas-master/credit-risk-workshop/machine-learning-autoai) | Using AutoAI to quickly generate a machine learning pipeline and model |
| [Deploy and Test Machine Learning Models](https://ibm-developer.gitbook.io/cloudpakfordata-credit-risk-workshop/workshop-cpdaas-master/credit-risk-workshop/machine-learning-deployment-scoring) | Deploying and testing machine learning models using several approaches |

## Compatibility

This workshop has been tested on the following platforms:

* **macOS**: Mojave (10.14), Catalina (10.15)
  * Google Chrome version 81
* **Microsoft**: Windows 10
  * Google Chrome, Microsoft Edge