Introduction

Analyzing Telco Customer Churn with Cloud Pak for Data on OpenShift

Welcome to our workshop! We'll be using the Cloud Pak for Data platform to collect data, organize data, analyze data, and infuse AI into our applications. The goals of this workshop are:

  • Collect and virtualize data

  • Visualize data with Data Refinery

  • Create and deploy a machine learning model

  • Monitor the model

  • Create a Python app to use the model

About this workshop

The introductory page of the workshop is broken down into the following sections:

  • About the data set

  • Agenda

  • Compatibility

  • About Cloud Pak for Data

  • Credits

About the data set

The data set used for this workshop is originally from Watson Analytics and was used on a Kaggle project. It contains information about customer churn for a telecommunications company. The data is split into three CSV files, which are located in the data directory of the GitHub repository you will download in the pre-work section.

Across the three files, the data has the following attributes (each file's group begins with Customer ID):

  • Customer ID

  • Contract (Month-to-month, one year, two year)

  • Paperless Billing (Yes, No)

  • Payment Method (Bank transfer, Credit card, Electronic check, Mailed check)

  • Monthly Charges ($)

  • Total Charges ($)

  • Churn (Yes, No)

  • Customer ID

  • Gender (Male, Female)

  • Senior Citizen (1, 0)

  • Partner (Yes, No)

  • Dependents (Yes, No)

  • Tenure (1-100)

  • Customer ID

  • Phone Service (Yes, No)

  • Multiple Lines (Yes, No, No phone service)

  • Internet Service (DSL, Fiber optic, No)

  • Online Security (Yes, No, No internet service)

  • Online Backup (Yes, No, No internet service)

  • Device Protection (Yes, No, No internet service)

  • Tech Support (Yes, No, No internet service)

  • Streaming TV (Yes, No, No internet service)

  • Streaming Movies (Yes, No, No internet service)
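
Because Customer ID appears in each group of attributes, it can serve as the key for recombining the three files into one record per customer. The following is a minimal sketch of that join using only the Python standard library. The column names (such as `customerID`), the sample rows, and the helper `join_on_customer_id` are all hypothetical and made up for illustration; check the real headers in the data directory before reusing it.

```python
import csv
import io

# Hypothetical miniature versions of the three CSV files. The column names
# and rows are invented for illustration and are NOT taken from the real data.
BILLING = """customerID,Contract,MonthlyCharges,Churn
0001,Month-to-month,70.35,Yes
0002,Two year,20.05,No
"""
CUSTOMERS = """customerID,Gender,SeniorCitizen,Tenure
0001,Female,0,2
0002,Male,1,60
"""
SERVICES = """customerID,PhoneService,InternetService
0001,Yes,Fiber optic
0002,Yes,No
"""


def join_on_customer_id(*csv_texts, key="customerID"):
    """Inner-join CSV texts on `key`, returning one merged dict per customer."""
    merged = {}
    for index, text in enumerate(csv_texts):
        # Index this file's rows by the shared key.
        rows = {row[key]: row for row in csv.DictReader(io.StringIO(text))}
        if index == 0:
            merged = rows
        else:
            # Keep only customers present in every file seen so far.
            merged = {
                cid: {**merged[cid], **rows[cid]}
                for cid in merged
                if cid in rows
            }
    return list(merged.values())


records = join_on_customer_id(BILLING, CUSTOMERS, SERVICES)
for record in records:
    print(record["customerID"], record["Contract"],
          record["InternetService"], record["Churn"])
```

In the workshop itself this join is done on the platform rather than in a script. The sketch only shows why a shared Customer ID column matters.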

Agenda

Pre-work

Creating a project, downloading the data set, seeding a database

Data Connection and Virtualization

Creating a new connection, virtualizing the data, importing the data into the project

Import Data to Project

Import the data into your project

Data Visualization with Data Refinery

Refining the data, visualizing and profiling the data

Enterprise data governance for Viewers using Watson Knowledge Catalog

Use an enterprise data catalog to search, manage, and protect data

Enterprise data governance for Admins using Watson Knowledge Catalog

Create new Categories, Business terms, Policies and Rules in Watson Knowledge Catalog

Machine Learning with Jupyter

Building a model with Spark, deploying the model with Watson Machine Learning, testing the model with a Python Flask app

Machine Learning with AutoAI

Use AutoAI to quickly generate a machine learning pipeline and model

Deploy and Test Machine Learning Models

Deploy and test machine learning models using several approaches

Monitoring models with OpenScale GUI (FastPath Monitoring)

Quickly deploy an OpenScale demo with FastPath

Monitoring models with OpenScale (Notebook)

See the OpenScale APIs in a Jupyter notebook and manually configure the monitors

Compatibility

This workshop has been tested on the following platforms:

  • macOS: Mojave (10.14), Catalina (10.15)

About Cloud Pak for Data

Cloud Pak for Data is an all-in-one platform for all your data needs. It aims to eliminate silos between Architects, Data Scientists, Developers, and Data Stewards, and it helps to streamline work by creating a pipeline for collecting, organizing, analyzing, and consuming data.

Cloud Pak for Data pipeline

A few other noteworthy mentions

Cloud Pak for Data:

  • ... is installed on Red Hat OpenShift, providing an enterprise-quality container platform

  • ... lets you choose the services that you want to run, so you run only the services that are important for your line of business

  • ... can be extended by installing services and by integrating with other applications

  • ... offers add-on services, including:

    • Watson Assistant

    • Watson OpenScale

    • RStudio

    • Data Virtualization

    • and many more

  • ... can be deployed on any major cloud provider (IBM, AWS, Azure, GCP)

  • ... provides a free 7-day trial -- Cloud Pak Experience

Cloud Pak for Data stack

Credits

This workshop was primarily written by Scott D'Angelo and Steve Martinelli. Many other IBMers have helped shape and test the workshop. Special thanks to Rick Buglio and team for the great Watson Knowledge Catalog demo.