This section will explain how to access your environment.
The lab instructors will provide everyone with a set of URLs at the beginning of the lab.
Accessing the URL will open a Skytap environment that is password protected.
Typing in the password provided by the lab instructor will give you access to 4 virtual machines. Turn the VMs on by clicking the "Play" button.
Once the VMs are turned on, click on the tile to access the virtual machine.
Use the toolbar at the top to make the window fullscreen. The "Ctrl-Alt-Del" button can be used to log into Windows environments.
NOTE: The Skytap environment will automatically turn off the VMs after two hours of inactivity. You can simply start them again to resume your work.
| VM | Username | Password |
| --- | --- | --- |
| Windows VM | Administrator | inf0Xerver |
| Linux VM | root | inf0Xerver |
| Information Server | isadmin | inf0Xerver |
Welcome to our workshop! In this workshop we'll be using IBM Information Server to design data processing jobs with DataStage and create rules to analyze data. Along the way, you will:
- Get acquainted with Information Server
- Run a few DataStage jobs
- Learn the ins and outs of data rules
- Have fun!
IBM InfoSphere Information Server is a market-leading data integration platform which includes a family of products that enable you to understand, cleanse, monitor, transform and deliver data, and to collaborate to bridge the gap between business and IT. InfoSphere Information Server provides massively parallel processing (MPP) capabilities to deliver a highly scalable and flexible integration platform that handles all data volumes, big and small.
DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.
The purpose of this lab is to familiarize ourselves with the DataStage Flow Designer and the Operations Console.
In this lab, you will learn the following:
- How to create a job in DataStage.
- How to read tabular data from a file into DataStage.
- How to perform transformations (such as filtering out records and removing duplicate values) on a table.
- How to write tabular data from DataStage into a file.
- How to run jobs.
- How to view logs for jobs.
- How to schedule jobs.
This lab consists of the following steps:
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Click on **Classic Launchpad** in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on **Advanced...** and then click **Accept the Risk and Continue**.

Click on **DataStage Flow Designer**.

Login with the credentials `isadmin`/`inf0Xerver`.

This brings up the **DataStage Flow Designer**. Click **OK**.

On the IBM DataStage Flow Designer, click on the **Projects** tab and click **+ Create**. In the modal that opens up, type in a name for the project and click **Create**.

The project takes a few minutes to be created and, once ready, it will be visible on the **Projects** tab.

Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the **Project Name** and click **OK** to switch the project.
We will be using the `raviga-products.csv` file, which contains tabular data having 2 columns:
On the DataStage Flow Designer, click on the **Table Definitions** tab and then click **+ Create**. In the modal that opens up, provide the definition of the table contained in the input file. Go to the **General** tab and provide the Data source type as `File`, the Data source name as `raviga` and the Table/File name as `ravigaproducts`. If you wish to, you can also provide an optional Short description and Long description.

Next, go to the **Columns** tab and click **+ Add**. A new entry is added in the table. Edit the entry and update the Column name to `ProductID`. Mark the column as Key. Update the SQL type to `Varchar` and the Length to `9`.

Click on **+ Add** to add another entry in the table. Edit this entry and update the Column name to `Zip`, the SQL type to `Integer`, Extended to `""` (blank) and the Length to `6`. Mark the column as Nullable. Click **OK**. In the modal that pops up, click **Save**.
Click on the **Jobs** tab and then click **+ Create**. Click **Parallel Job**.

A new tab with the name `Job_1*` opens up where you can now start designing the parallel job.
Before designing the job, take a few minutes to look at the various buttons and menus available on the screen.
The **Show/Hide Palette** button can be used to show or hide the palette on the left of the screen, which contains the connectors and stages that can be used while designing the job.

The **Save**, **Compile** and **Run** icons are used to save, compile and run the job respectively.

The **Compare** icon is used to compare this job with another job in the Project.

The **View** menu has the following options:
- View OSH code - available once the job has been successfully compiled.
- View log - available once the job has been run.
- Properties - based on what is selected in the job canvas before clicking on **Properties**, either the properties of the job or of a connector/stage/link are displayed.

The **Schedule** icon is used to set a schedule to run the job.

The **Settings** menu has a number of options, such as:
- Apply horizontal layout - arranges all the connectors and stages in the canvas horizontally, with data flowing from left to right.
- The ability to view/hide annotations, arrows, link names, and the type of the connector/stage.
- Smart palette - applies smart logic based on usage patterns to reorder the items available in the palette. If disabled, the items in the palette are displayed in alphabetical order.
- Smart stage suggestions - applies smart logic based on usage patterns to suggest the next stage that you might want to add to the job.
Drag a File connector to the canvas. The **Table Definition Asset Browser** opens up. Select `raviga\ravigaproducts` and click **Next**.

Ensure that both the columns `ProductID` and `Zip` are selected and click **Add to Job**.
The File connector should now be visible on the canvas. Drag and drop a Filter stage on the canvas. Provide the output of the File connector as the input to the Filter stage. To do this, click on the little blue dot on the right side of the File connector and drag the mouse pointer to the Filter stage.
NOTE: For another method to connect the File connector to the Filter stage, click on the File connector to select it and then drag and drop the Filter stage. The Filter stage will automatically be connected to the File connector.
Double click on the File connector to open the Connector's Properties page. Provide the File name as `/IBMdemos/raviga-products.csv`, which is where the source file is located. Further down, provide the File format as `Comma-separated value (CSV)`.

Scroll further down and update the Null value to `""`. Click **OK**.

Double click the Filter stage to open the Stage's Properties page. Provide the value for the Where Clause as `Zip IS NOT NULL`. This will filter out all the records in the table where the value of Zip is NULL. Click **OK**.
We will use the Peek stage to look at the intermediate state of the table. Drag and drop a Peek stage to the canvas and provide the output of the Filter stage as the input to the Peek stage.
Next, we will add a Remove Duplicates stage and provide the output of the Peek stage as the input to the Remove Duplicates stage.
Double click on the Remove Duplicates stage to open the Stage's Properties page. Under KEYS THAT DEFINE DUPLICATES, select `Zip`. Set the value of Duplicate to retain as `First`. This ensures that if there are any duplicate Zip values, the job will only retain the first record out of the records where this Zip value occurs. Click **OK**.

Drag and drop a File connector to the canvas. In the modal that opens up, check the **Add connector as target** checkbox and click **Add to Job**.

Provide the output of the Remove Duplicates stage as the input to the File connector. Double click the File connector to open the stage page, and provide the File name as `output.csv`. Specify `Comma-separated value (CSV)` as the File format. Click **OK**.
Click the **Save** icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click **Save**. Once the job is saved, click on the **Compile** icon to compile it. If compilation is successful, you should see a green check mark and the message `Compiled successfully` displayed on the screen. The OSH Code will also be available for viewing.

If the compilation was not successful, you should see a red error icon and the message `Compiler failed with ### errors.` where ### represents the number of errors. Clicking on **View errors** brings up a modal that lists all the errors that were found. The stage/connector/link that has the error is also highlighted in the canvas and you can hover over it to see the error.

Click the **Run** icon to run the job. In the modal that opens up, click **Run**.

Once the run completes, you should see the status of the run on the screen. The status will also indicate if there were any warnings or errors. Click on **View log** to get more details about the run.
Everything that happens as part of the job run is available in the log. This includes warnings, errors and informational messages such as the results of the Peek stage. Scroll through the log to reach the Peek results. By default, the Peek stage only logs the first 10 records of each partition (the data was split into two partitions in this case). Note that neither Peek log contains any records where the Zip value is NULL.
At the bottom of the log, click on **Open Operations Console**. Alternatively, you can open the **Classic Launchpad** in Firefox and click on **Operations Console**.

The Operations Console can be used to view logs for previous runs of the job. Switch to the **Projects** tab. In the left-hand navigation pane, expand **Lab1** (the transformation project), followed by **Jobs**. Click on **Job_1** (the job that was just saved/compiled/run). The **Job Runs** table at the bottom of the screen will be updated to reflect all the times that the job was run. Click on any job run in the list to select it and click **View Details**.
Go to the **Log Messages** tab and check the **Full Messages** checkbox to see the complete log for the job run. Clicking on an entry in the job log will display the log message details at the bottom.
DataStage Flow Designer provides the ability to run jobs as per a schedule.
Back in the DataStage Flow Designer where we were designing the job, click on the **Schedule** icon. In the modal that opens up, provide the frequency for running the job - whether you want to run it every **Day**, **Week** or **Month**. Next to it, provide the time of day when the job should run. Click **Schedule**.

Alternatively, you can go up one level to the **Jobs** tab and hover over your job's tile to see the kebab icon (⋮). Click on the icon and select **Schedule** to open up the scheduling modal where you can set up a schedule for your job.
NOTE: You can only schedule your job once it has been successfully compiled.
The output file will be saved on the server. Switch to the server VM by clicking the first icon on the **Environment VMs panel** and selecting **iis-server**. Login as the `root` user with the password `inf0Xerver`.

`cd` to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location on the server. Since we only provided `output.csv` as the file path in the File connector, the file will be available in the Transformation project's folder.
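For example, a minimal sketch assuming the default Information Server install path and a project named `Lab1` (both are assumptions; adjust them to your environment):

```sh
# The engine's default project directory is an assumption about this install
cd /opt/IBM/InformationServer/Server/Projects/Lab1
cat output.csv   # the filtered, de-duplicated records
```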
CONGRATULATIONS!! You have completed this lab!
You can use the REST protocols to incorporate IBM DataStage Flow Designer features in your custom application. You use the REST protocol to perform actions such as compiling jobs, running jobs, and getting job status.
The following REST calls are supported:
- XML
- JSON
In this lab we'll learn how to call a JSON endpoint and pipe those results to a file using DataStage.
This lab consists of the following steps:
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Launch the desktop client by going to the start menu and searching for **DataStage Designer**.

Start a new **Parallel Job** project and create a job that looks like the image below. Remember to wire the elements together. It should have:
- 1 x Hierarchical Data
- 1 x Peek
- 1 x Sequential File
Double click on the **Hierarchical Data** node and click on **Edit assembly**.
The "Assembly Editor" will launch. This allows for fine tuning the properties of certain jobs.
Click on **Palette** and add a **REST** step and a **JSON Parser** step.

From the **REST** step, in the **General** tab, specify the following URL:

From the **REST** step, in the **Response** tab, specify the content type to be `text/plain`:

From the **REST** step, in the **Mappings** tab, click on the **Map Automatically** button:
Before proceeding to the **JSON Parser** step, click on the **Libraries** button on the top left and choose the **New Library** button.

Click on **Import New Resource** and browse to the newly downloaded file.

DataStage will automatically use this file as a JSON schema.
From the **JSON Parser** step, in the **Configuration** tab, choose the **String set** option and pick the value from the drop-down menu; there is only one result.

From the **JSON Parser** step, in the **Document Root** tab, specify the document root to be the new library from the previous step, likely called `employees`.
From the **Output** step, in the **Mappings** tab, click on the **Map Automatically** button. Remove the image profile mapping and any others you do not want to use.

Click **OK** to exit out of the **Assembly Editor**.
Before running the job we need to modify the **Sequential File** stage by double clicking on its icon in the designer. Specify an output file, say `output.csv`.

Also update the columns by adding the `id`, `employee_name`, `employee_salary`, and `employee_age` fields.
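If you'd like to sanity-check the endpoint's payload before running the job, a quick hedged peek from any shell works; `<endpoint-url>` below is a placeholder for the URL you entered in the General tab:

```sh
# <endpoint-url> is a placeholder; substitute the URL from the General tab
curl -s "<endpoint-url>" | head -c 300   # eyeball the JSON and its field names
```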
Finally, compile and run the job.
The output file will be saved on the server. Switch to the server VM by clicking the first icon on the **Environment VMs panel** and selecting **iis-server**. Login as the `root` user with the password `inf0Xerver`.

Change your directory using `cd` to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location on the server. Since we only provided `output.csv` as the file path in the File connector, the file will be available in the Transformation project's folder (see the sketch below).

Finally, output your results using the `cat` command.
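As in the previous lab, a minimal sketch assuming the default install path (the project name is a placeholder):

```sh
cd /opt/IBM/InformationServer/Server/Projects/<your-project>   # assumed default path
cat output.csv   # records fetched from the REST endpoint
```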
CONGRATULATIONS!! You have completed this lab!
IBM DataStage Flow Designer allows you to read data from a mainframe. More specifically, you can specify inputs in your DataStage job to be in EBCDIC format and import COBOL copybooks as table definitions.
This lab consists of the following steps:
The example binary data should be downloaded to the server. Switch to the server and run:
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Launch the desktop client by going to the start menu and searching for **DataStage Designer**.

Start a new **Parallel Job** project and create a job that looks like the image below. Remember to wire the elements together. It should have:
- 1 x Complex Flat File
- 1 x Peek
- 1 x Sequential File

In the toolbar click on **Import** > **Table Definitions** > **COBOL File Definitions**.

Specify the downloaded copybook file and click **Import**.
You have just imported your copybook definitions!
The first step is to double click on the Complex Flat File node, go to the **File options** tab, and specify the example binary file as the input. Critically, we must specify **Fixed block** as the record type.

Go to the **Record options** tab and choose the **Binary** data format and **EBCDIC** as the character set.

Go to the **Records** tab and click the **Load** button; this will give us the option to specify a copybook.

Choose the copybook that was imported in the previous step.

Select the **>>** icon to use all fields from the copybook.

The **Records** tab should now show the various column names from the copybook.
Double clicking on the **Peek** node allows us to map output from the **Complex Flat File**. Click on the **Output** section and choose the **Columns** tab.

Enter the following new columns:
- `ID`
- `SHORT_NAME`
- `CLIENTID`
- `COMPANY_ID_NUM`
- `COMPANY_ID_STR`
- `REGISTRATION_NUM`
- `NUMBER_OF_ACCTS`
- `ACCOUNT_NUMBER`
- `ACCOUNT_TYPE_X`

Still in the **Output** section, now click on the **Mapping** tab and choose to **Auto-Match**.
Double clicking on the **Sequential File** node brings up a single option. Specify an output filename, for example `mainframe.csv`.
Compile and run the job using the usual icons from the toolbar.
After running the job you can view the output from the Designer tool by clicking on the Sequential File node and clicking the **View Data** button. Click **OK** on the next dialog.
You'll be able to see one row of data with an ID, SHORT_NAME and a few other fields.
The output file is also written to the server. Switch to the server VM by clicking the first icon on the **Environment VMs panel** and selecting **iis-server**. Login as the `root` user with the password `inf0Xerver`.

Change your directory using `cd` to the location where you had stored the file.

Finally, output your results using the `cat` command.
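Again, a hedged sketch assuming the default install path (the project name is a placeholder):

```sh
cd /opt/IBM/InformationServer/Server/Projects/<your-project>   # assumed default path
cat mainframe.csv   # the decoded EBCDIC records
```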
CONGRATULATIONS!! You have completed this lab!
DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.
The purpose of this lab is to design a DataStage job to satisfy the following problem statement:
As a data engineer, you have been asked by the Line of Business that you support to produce a new data file that contains all employees whose total compensation is more than $50,000. The file must also contain the Department Name that the employee works in, and the mean average salary of all employees in that department who earn more than $50,000. In addition, the file must be sorted in descending order, based on the mean average salary amount. Finally, the application that will consume this file expects the full name of the employee to be in one field, formatted as "first, middle initial, last".
In this lab, you will learn:
- How to create a job in DataStage.
- How to load data from Db2 Warehouse into DataStage.
- How to perform transformations such as modifying tables, joining tables, aggregating table data and sorting table data.
- How to write tabular data from DataStage into a file.
- How to run jobs.
- How to view logs for jobs.
This lab consists of the following steps:
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Click on **Classic Launchpad** in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on **Advanced...** and then click **Accept the Risk and Continue**.

Click on **DataStage Flow Designer**.

Login with the credentials `isadmin`/`inf0Xerver`.

This brings up the **DataStage Flow Designer**. Click **OK**.

On the IBM DataStage Flow Designer, click on the **Projects** tab and click **+ Create**. In the modal that opens up, type in a name for the project and click **Create**.

The project takes a few minutes to be created and, once ready, it will be visible on the **Projects** tab.

Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the **Project Name** and click **OK** to switch the project.
The input tables - **EMP** (containing employee data) and **DEPT** (containing department data) - are already loaded in Db2 Warehouse. Let's add a Db2 Warehouse instance as a **Connection** in DataStage.

Click on the **Connections** tab and then click **+ Create** to add a new connection.

Provide the following connection details and click **OK**. Click **Save** on the new modal that pops up.

A tile for the new connection will now be displayed in the **Connections** tab.
Click on the **Jobs** tab and then click **+ Create**. Click **Parallel Job**.

A new tab with the name `Job_1*` opens up where you can now start designing the parallel job.
The first step is to load the input tables **DEPT** and **EMP** into DataStage.

The **DEPT** table contains the following columns:

The **EMP** table contains the following columns:
First, drag a Connection connector to the canvas. In the modal that opens up, select the **DB2WH** connection that was created earlier and click **Next**.

On the next screen, select the **BLUADMIN** schema and click **Next**.

On the next screen, select the **DEPT** table and click **Next**.

On the next screen, click **Add to Job**.

Drag another Connection connector to the canvas and repeat the steps given above but this time, select the **EMP** table instead. Once you complete the steps, you should see the two Connection connectors on the canvas.
Use a Transformer stage to perform the following two modifications:
- Update the output of the **EMP** table by replacing any NULL `MIDINIT` values with `" "`. This is needed for a future step where we will combine the FIRSTNME, MIDINIT and LASTNAME columns to create the FULLNAME of the employee.
- Currently, the **EMP** table uses the `WORKDEPT` column to identify the department number whereas the **DEPT** table uses the `DEPTNO` column. Modify the output of the **EMP** table by changing the name of the `WORKDEPT` column to `DEPTNO`. This is needed for a future step where we will Join the two tables.
Drag and drop a Transformer stage next to the Connection connector for the **EMP** table. Provide the output of the **EMP** table Connection connector as the input to the Transformer stage. For this, click on the little blue dot on the right side of the Connection connector and drag the mouse pointer to the Transformer stage.

NOTE: For another method to connect the Connection connector to the Transformer stage, click on the Connection connector to select it, then drag and drop the Transformer stage. The Transformer stage will automatically be connected to the Connection connector.
Drag and drop a Join stage to the canvas and provide the output of the Transformer stage as the input to this Join stage.
Double click on the Transformer stage to open up the stage page. In the **Properties** tab, click **+ Add**. A new entry is added in the table below. Use this to define a "Stage variable" (a local variable that is available only within this stage) named `MIDINITIAL`, which will represent the middle initial of the employee with any NULL values replaced with the empty string. Double click on `{derivation}` to open up the Derivation Builder.

Begin building the derivation rule for `MIDINITIAL` by finding the `NullToValue` function in the table. Clicking on the entry in the table will insert it in the "Derivation" at the top. You can also use the search bar to look for the function. Replace the `%input_column_input_column%` with the `<Link>.MIDINIT` input variable, which can also be found in the table, and the `%value%` with `" "`. `<Link>` represents the identifier of the input link for the Transformer. Click **OK** to go back to the Stage page.
Update the Name of the stage variable to MIDINITIAL, the SQL type to Char, the Precision to 1 and the Scale to 0.
Now go to the **Outputs** tab and in the table find the entry for the `MIDINIT` column. Double click on the derivation value for this entry (it will open up the Derivation Builder) and change the derivation value to use the newly created `MIDINITIAL` stage variable instead. Next, look for the entry for the `WORKDEPT` column. Double click on the `WORKDEPT` value under the **Column name** column and replace the text with `DEPTNO`. Click **OK**.

Both tables now have a column called `DEPTNO`, which can be used to join them. Provide the output of the **DEPT** table Connection connector as the second input to the Join stage. Double clicking the Join stage brings up the stage page, where you can verify that `DEPTNO` is being used as the **JOIN KEY** and the **Join Type** is **Inner**.
Next, add a Transformer stage to the canvas and provide the output of the Join stage as the input to the Transformer stage.
Double click the Transformer stage to open the stage page. We will add 2 stage variables to this Transformer stage. In the **Properties** tab, click **+ Add**. A new entry is added in the table below. Use this to define a stage variable named `TOTALCOMP`, which will represent the total compensation (the sum of bonus, commission and salary) of the employee. Double click on `{derivation}` to open up the Derivation Builder.

Build the derivation rule for TOTALCOMP by finding the `BONUS`, `COMM` and `SALARY` input columns in the table. Clicking on the entries in the table will insert them in the "Derivation" at the top. Type in `+` signs between them in order to complete the derivation. You can also use the search bar to look for the fields. When inserted in the "Derivation", the column names will be prepended with the identifier of the input link. Click **OK** to go back to the Stage page.
Update the Name of the stage variable to TOTALCOMP, the SQL type to Decimal, the Precision to 9 and the Scale to 2.
Repeat the process above to add another stage variable, `FULLNAME`, which will represent the complete name of the employee. Provide the Derivation as `CompactWhiteSpace(<Link>.FIRSTNME:" ":<Link>.MIDINIT:" ":<Link>.LASTNAME)`, the Name of the stage variable as FULLNAME, the SQL type as Varchar, the Precision as 36 and the Scale as 0. Click **OK** to save the changes and return to the canvas.

NOTE: `<Link>` needs to be replaced with the identifier of the input link. CompactWhiteSpace is a function that will compact any continuous white space into a single white space. `:` is the operator used for concatenation. See the sketch below for an example.
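To make this concrete, here is an illustrative sketch of how the derivation behaves, assuming a hypothetical input link named `Link_2` and made-up sample values:

```
CompactWhiteSpace(Link_2.FIRSTNME : " " : Link_2.MIDINIT : " " : Link_2.LASTNAME)

-- With FIRSTNME = "CHRISTINE", MIDINIT = "I", LASTNAME = "HAAS" (sample values):
--   "CHRISTINE I HAAS"
-- With a NULL MIDINIT already replaced by " " via the MIDINITIAL stage variable,
-- the run of spaces is compacted to one:
--   "SEAN" : " " : " " : " " : "O'CONNELL"  ->  "SEAN O'CONNELL"
```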
Next, add a Join stage and an Aggregator stage to the canvas. Connect the Transformer stage to both these stages such that the output of the Transformer stage is provided as the input to both these stages.
Since the output links have now been added, we can provide the 2 stage variables `TOTALCOMP` and `FULLNAME` as outputs of the Transformer stage. Once that is done, these values will be available as inputs in the subsequent stages.

Double click on the Transformer stage to open the stage page. Go to the **Outputs** tab. Click on **+ Add** twice to add 2 new entries in the table. Update one entry with the Derivation value as TOTALCOMP, the Column name as TOTALCOMP, the SQL type as Decimal, the Precision as 9, the Scale as 2, and set Nullable to true. Update the second entry with the Derivation value as FULLNAME, the Column name as FULLNAME, the SQL type as Varchar, the Precision as 36 and the Scale as 0.

We also need to add a Constraint here, which will ensure that only the records with `TOTALCOMP` more than 50000 are sent in the output. Click on the empty space under **Constraint** to open up the Derivation Builder. Specify the derivation as `TOTALCOMP>50000`.

Switch to the second output link by clicking on the `Link_<number>` under Output name and repeat the above steps to add the 2 stage variables to the output and to add the constraint. Click **OK** to save the changes and return to the canvas.
Provide the output of the Aggregator stage as the input to the Join stage. Double click on the Aggregator stage to open the stage page. Select `DEPTNAME` as the grouping key.

Scroll down to the Aggregations and select the Aggregation Type as `Calculation`, the Column as `TOTALCOMP`, the Calculation Type as `Mean Value`, and click **Add Calculation +**.

Go to the **Outputs** tab and verify that you can see 2 output columns - `MEAN_TOTALCOMP` and `DEPTNAME`. Click **OK** to save the changes and return to the canvas.
Drag and drop a Sort stage on the canvas and provide the output of the Join stage as the input of the Sort stage.
Double click on the Join stage to open the stage page. On the **Properties** tab, verify that the Join Key is `DEPTNAME` and the Join Type is `Inner`.

Go to the **Outputs** tab and verify that you can see `FULLNAME` and `MEAN_TOTALCOMP` in the output column list.

Double click on the Sort stage to open up the stage page. Specify the Sorting Key as `MEAN_TOTALCOMP` and the Sort Order as `Descending`. Click **OK**.
Drag and drop a File connector to the canvas. In the modal that opens up, check the **Add connector as target** checkbox and click **Add to Job**.

Provide the output of the Sort stage as the input to the File connector. Double click the File connector to open the stage page, and provide the name of the output File as `output.csv`. Specify `Comma-separated value (CSV)` as the File format. Enable **First row is header** and provide the Null value as `""`. Click **OK**.

Click the **Save** icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click **Save**. Once the job is saved, click on the **Compile** icon to compile it. If compilation is successful, you should see a green check mark and the message `Compiled successfully` displayed on the screen.

Click the **Run** icon to run the job. In the modal that opens up, click **Run**.
The output file will be saved on the server. Switch to the server VM by clicking the first icon on the **Environment VMs panel** and selecting **iis-server**. Login as the `root` user with the password `inf0Xerver`.

`cd` to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location on the server. Since we only provided `output.csv` as the file path in the File connector, the file will be available in the Transformation project's folder.
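As in the first lab, a minimal hedged sketch assuming the default Information Server install path (the project name is a placeholder):

```sh
cd /opt/IBM/InformationServer/Server/Projects/<your-project>   # assumed default path
cat output.csv   # full names with mean total compensation, sorted descending
```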
CONGRATULATIONS!! You have completed this lab!
Information Server enables users to better understand their data. It uses pre-built and custom rules that will apply meaning and quality measurements, which are available for users of the data and interested parties.
DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.
This lab will give you hands-on experience using Information Server's DataStage Flow Designer and rule capabilities. In this lab, you will learn the following:
- How to join two data sets using DataStage Flow Designer to create a single data set
- How to create a data rule that runs on columns from each data set
- How to apply the rules to the joined data set
- How to view data that does not conform to the rules
This lab consists of 2 sections. In the first section, we will see how to associate records within 2 data sets to create a single combined data set using DataStage Flow Designer. In the second section, we will apply a data rule on the combined data set and analyze the data that does not conform to the rule.
Note: The merged data set is already available in DB2WH. If you wish to, you may skip section 1 and directly move on to section 2 where you can import this merged data set.
This lab consists of the following steps:
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Click on **Classic Launchpad** in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on **Advanced...** and then click **Accept the Risk and Continue**.

Click on **DataStage Flow Designer**.

Login with the credentials `isadmin`/`inf0Xerver`.

This brings up the **DataStage Flow Designer**. Click **OK**.

On the IBM DataStage Flow Designer, click on the **Projects** tab and click **+ Create**. In the modal that opens up, type in a name for the project and click **Create**.

The project takes a few minutes to be created and, once ready, it will be visible on the **Projects** tab.

Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the **Project Name** and click **OK** to switch the project.
The input tables - **EMP** (containing employee data) and **DEPT** (containing department data) - are already loaded in Db2 Warehouse. Let's add a Db2 Warehouse instance as a **Connection** in DataStage.

Click on the **Connections** tab and then click **+ Create** to add a new connection.

Provide the following connection details and click **OK**. Click **Save** on the new modal that pops up.

A tile for the new connection will now be displayed in the **Connections** tab.
Click on the **Jobs** tab and then click **+ Create**. Click **Parallel Job**.

A new tab with the name `Job_1*` opens up where you can now start designing the parallel job.
The first step is to load the input tables **DEPT** and **EMP** into DataStage. The `WORKDEPT` column of the **EMP** table is the same as the `DEPTNO` column of the **DEPT** table.
First, drag a Connection connector to the canvas. In the modal that opens up, select the **DB2WH** connection that was created earlier and click **Next**.

On the next screen, select the **BLUADMIN** schema and click **Next**.

On the next screen, select the **DEPT** table and click **Next**.

On the next screen, click **Add to Job**.

Drag another Connection connector to the canvas and repeat the steps given above but this time, select the **EMP** table instead. Once you complete the steps, you should see the two Connection connectors on the canvas.
The **EMP** table uses the `WORKDEPT` column to identify the department number whereas the **DEPT** table uses the `DEPTNO` column. Use a Transformer stage to modify the output of the **EMP** table by changing the name of the `WORKDEPT` column to `DEPTNO`. This is needed for a future step where we will Join the two tables.
Drag and drop a Transformer stage next to the Connection connector for the **EMP** table. Provide the output of the **EMP** table Connection connector as the input to the Transformer stage. For this, click on the little blue dot on the right side of the Connection connector and drag the mouse pointer to the Transformer stage.

NOTE: For another method to connect the Connection connector to the Transformer stage, click on the Connection connector to select it, then drag and drop the Transformer stage. The Transformer stage will automatically be connected to the Connection connector.
Drag and drop a Join stage to the canvas and provide the output of the Transformer stage as the input to this Join stage.
Double click on the Transformer stage to open up the stage page. Go to the **Outputs** tab and in the table find the entry for the `WORKDEPT` column. Double click on the `WORKDEPT` value under the **Column name** column and replace the text with `DEPTNO`. Click **OK**.
Both tables now have a column called `DEPTNO`, which can be used to join them. Provide the output of the **DEPT** table Connection connector as the second input to the Join stage. Double clicking the Join stage brings up the stage page, where you can verify that `DEPTNO` is being used as the **JOIN KEY** and the **Join Type** is **Inner**. Click **OK**.

Drag and drop a Connection connector to the canvas. In the modal that pops up, check the box for **Add selected connection as target** and click **Add to Job**. Provide the output of the Join stage as the input to this connector.
Double click on the Connection connector to open up the Properties page on the right. Verify that the URL, username and password are already populated.
Scroll down and, under the **Usage** section, provide the Table name as `<user>_DEPTEMP`, where `<user>` is your name, and update the Table action to `Replace`. Click **OK**.

Click the **Save** icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click **Save**. Once the job is saved, click on the **Compile** icon to compile it. If compilation is successful, you should see a green check mark and the message `Compiled successfully` displayed on the screen.

Click the **Run** icon to run the job. In the modal that opens up, click **Run**.

After a successful run, the results will be stored within the DB2WH connection in the **BLUADMIN** schema. Because we specified the Table action as `Replace` in the Connection connector that represents the output, each subsequent run of the job will delete all existing records in the table and replace them with the new output.

To view the results of the job, double click on the Connection connector that represents the output. This will open up the Properties page on the right. Click on **View Data** to bring up a modal that shows the contents of the `<user>_DEPTEMP` table in which the output was stored.
This marks the end of Section 1 of this lab.
Switch to the **iis-client** VM and launch **Firefox**.

Click on the **Launchpad** bookmark. When the Information Server launchpad shows up, click on the **Information Governance Catalog New** tile.

Log in with the username `isadmin` and password `inf0Xerver`.
The overview page will appear.
Click on the **Connections** tab and click **+ Create connections**.

On the next screen, provide the following details and click **Test connection** to test the connection. Once successful, click **Save connection**.

Now locate the tile for the connection that was just added, click on the kebab icon (⋮) on the tile and click **Discover**.

Click on **Browse**.

Expand **db2**. Expand **BLUADMIN** and select the `<user>_DEPTEMP` table, where `<user>` is your name. This is the table that was created in section 1 above. If you skipped section 1, then you can select the `SANDHYA_DEPTEMP` table. Click **OK**.

Under **Discovery options**, select **Analyze data quality**. This will automatically check the boxes for **Analyze columns** and **Assign terms**.

Scroll to the bottom and select the **Host** as `IIS-SERVER.IBM.DEMO` and the **Workspace** as `UGDefaultWorkspace`. Click **Discover**.
The discovery process will take some time. Once the assets are discovered, the analysis process will begin. Once that completes, you will be able to see what percentage of records were successfully analyzed.
Now let us go and have a look at the data. Go to the **Quality** tab and click on the tile for `UGDefaultWorkspace`.

The workspace overview will load. Take a few moments to browse the graphics on the page and click on the **Data sets** link to view the data in this exercise.

Before we create new rules, let's look at the data set that will be used in this lab. Click on the tile for `<user>_DEPTEMP`.

Click on the **Columns** tab to view findings from the analyzer. It found many things when the data was imported, like maximum and minimum values, distinct values, format, and uniqueness.
We're now ready to create our first data rule!
From the `UGDefaultWorkspace` workspace, click on the **Data rules** tab.

Expand **Published Rules** > **08 Validity and Completeness** > **Valid Value Combination**. Find the rule **IfFieldaIsXThenFieldbGtQty**. Click on the **...** overflow menu on the right and select **Manage in workspace**.

The rule should now be available under **All**. We will now edit the rule. If you wish to rename the rule, you will first need to **Copy** the rule and you will be provided with the option to rename the rule. Click on the **...** overflow menu of the rule and select **Edit**.

Switch to the **Rule logic** tab and update the rule to say `IF DEPTNAME = 'OPERATIONS' THEN SALARY > 36000`.
Next, switch to the **Rule testing** tab. Here you need to bind the variables in the rule logic to specific columns in the data source. Select the `salary` variable in the left table and select the `SALARY` column under Available data sources. It should be under **IIS-SERVER.IBM.DEMO** > **db2** > **BLUADMIN** > **<user>_DEPTEMP**. Click on **+ Bind**. The value `<user>_DEPTEMP.SALARY` will now be shown under Implemented bindings for the `salary` variable.

Uncheck the `salary` variable, check the `deptname` variable and bind it with the `DEPTNAME` column under Available data sources. As in the case of the `SALARY` column, it should be under **IIS-SERVER.IBM.DEMO** > **db2** > **BLUADMIN** > **<user>_DEPTEMP**. Click on **+ Bind**. The value `<user>_DEPTEMP.DEPTNAME` will now be shown under Implemented bindings for the `deptname` variable.
Scroll down and click on **Test** to test out the rule. You will see a message at the top of the screen that says the test results can be viewed once ready. When the message disappears, go to the **Rule test results** tab to view the test results.

You can see that of the 42 total rows, 39 met the rule, and 3 did not. Click on **Did not meet rule conditions** to view the 3 rows. You can see that these rows have `DEPTNAME = OPERATIONS` but have `SALARY < 36000`, and therefore they did not match the rule conditions. Click **Save** to save the rule.
When you are brought back to the **Data rules** tab, you'll notice that the new rule has an error. We need to publish the rule. To do so, navigate to the right to show the **⋯** menu. Choose the **Publish** option from the menu.

In the modal that pops up, click **Publish** to confirm the publish action.
Go back to the `UGDefaultWorkspace` workspace and click on the **Data sets** link.

Click on the `<user>_DEPTEMP` data set.

We can now apply the newly created rule by switching to the **Rules (0)** tab and clicking the **+ Add rule** button.

Choose the **IfFieldaIsXThenFieldbGtQty** rule under **All** and click **Next**.
As before, we need to bind the rule variables to specific columns in our data set. Select the `salary` variable in the left table and select the `SALARY` column under Available data sources. Click on the **+ Bind** button. Once bound, select the `deptname` variable and bind it with the `DEPTNAME` column under Available data sources. Once both variables are bound with the right columns, click **Next**.

This time, we don't need to test the rule. Simply click **Save**.

You should now see the rule in the data set view. Click on the **Analyze** button to restart the analysis with the new rule.

In the modal that pops up, click **Analyze** to confirm analysis.

The analysis will take a few minutes. You may need to refresh your browser a few times. You will see the state go from `Running` to `Successful` when analysis is complete.
Once analysis has completed successfully, go to the **Data quality** tab. You'll see that the new rule has six findings - three against DEPTNAME and three against SALARY. Click on either DEPTNAME or SALARY to view the exceptions.

Scrolling to the right, you'll see that the three entries shown have `DEPTNAME = OPERATIONS` but have `SALARY` less than `36000`, which goes against our rule.
CONGRATULATIONS!! You have completed this lab!
For more info check out the documentation.
Download the file to the desktop.
You may see an error message: `Message bundle error Can't find resource for bundle com.ibm.e2.Bundle_E2_engine_msgs, key OperatorException.operatorTransitionFailedAtRuntime`. To resolve this, go to the **iis-server** machine, change to the temp directory (`cd /tmp`), and delete the XML files (`rm -rf XML*`). Then re-compile and re-run the job. For more info check out .
To keep things simple in this lab, we're going to speak generally about mainframes and COBOL. Files from a mainframe are usually saved as binary files and sFTPed to some server where DataStage can access them. The binary files alone are not enough for DataStage to read the contents. A COBOL copybook is required to translate the data from binary to ASCII. Both files are available here: .
The copybook can be downloaded to the client machine by going to the link provided and saving the file to the desktop. The copybook we're using looks like this:
NOTE: You can use the Peek stage to check intermediate results in the job as demonstrated in .
| Column Name | Data Type |
| ----------- | --------- |
| ProductID | varchar(9) |
| Zip | Integer(6) |
| Column Name | Data Type | Nullable |
| ----------- | --------- | -------- |
| DEPTNO | VARCHAR(3) | N |
| DEPTNAME | VARCHAR(36) | N |
| MGRNO | CHAR(6) | Y |
| ADMRDEPT | CHAR(3) | N |
| LOCATION | CHAR(16) | Y |
| Column Name | Data Type | Nullable |
| ----------- | --------- | -------- |
| EMPNO | CHAR(6) | N |
| FIRSTNME | VARCHAR(12) | N |
| MIDINIT | CHAR(1) | Y |
| LASTNAME | VARCHAR(15) | N |
| WORKDEPT | VARCHAR(3) | Y |
| PHONENO | CHAR(4) | Y |
| HIREDATE | DATE(4) | Y |
| JOB | CHAR(8) | Y |
| EDLEVEL | SMALLINT | N |
| SEX | CHAR(1) | Y |
| BIRTHDATE | DATE(4) | Y |
| SALARY | DECIMAL(9,2) | Y |
| BONUS | DECIMAL(9,2) | Y |
| COMM | DECIMAL(9,2) | Y |
You can integrate your Git server with IBM DataStage. This integration allows you to publish jobs and related artifacts to different Git branches and load other versions of a job from Git onto the IBM DataStage Flow Designer canvas.
Using IBM DataStage Flow Designer with Git provides the following benefits:
- Helps with Continuous Integration/Continuous Delivery (CI/CD) pipeline automation. Your assets are easily available in Git and you can move them from the development (DEV) branch to quality assurance (QA) to production.
- Helps with auditing requirements. You can simply tell who changed what.
- You can map a version of a job in the XMETA repository to a version in Git.
- You can work on multiple versions of a job by creating temporary branches.
- You can easily roll back to a prior version of a job.
The following Git repositories are supported:
- Bitbucket
- GitHub
- GitLab
- Microsoft Team Foundation Server
For more info check out the Managing source control with Git documentation.
In this lab we'll learn how to integrate DataStage with BitBucket Cloud, a free-to-register BitBucket platform available at https://bitbucket.org/.
This lab consists of the following steps:
Navigate to https://bitbucket.org/, click on the "Get it free" button, and create an account.
Create a new repo; we'll call it `datastage-jobs`. Ensure the option to create a `README.md` file is selected.
Your new repo should look like this:
Generate an "App Password" by going to your "Personal Settings". Remember to give the app password the read/write permission for repositories.
Before we start the lab, let's switch to the **iis-client** VM and launch **Firefox**.

Click on **Classic Launchpad** in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on **Advanced...** and then click **Accept the Risk and Continue**.

Click on **DataStage Flow Designer**.

Login with the credentials `isadmin`/`inf0Xerver`.

This brings up the **DataStage Flow Designer**. Click **OK**.
From the IBM DataStage Flow Designer, click on the username bubble on the top right. Choose the **Setup** option. We'll be configuring the **Server** and **User** options.

For the **Server**, click on the **Git** tab and input the following:
- Git repo: Your new repo, e.g. `https://bitbucket.org/stevemart/datastage-jobs.git`
- Clone repo path: A new folder name, e.g. `bitbucket/stevemart/datastage-jobs`
- Vendor: BitBucket

For the **User** configuration, input the following:
- Email address: Your BitBucket email
- Username: Your BitBucket username
- Personal access token: Your BitBucket app password from the previous step
Load up any existing job or create a new one. Once loaded, click on the **Export** button that looks like an upload arrow.
The entire DataStage project will be saved and exported to your configured BitBucket repository. Enter a descriptive commit message.
There will be a success message if everything was exported correctly.
Navigate back to your BitBucket repository to see the exported files.
CONGRATULATIONS!! You have completed this lab!
Information Server enables users to better understand their data. It uses pre-built and custom rules that will apply meaning and quality measurements, which are available for users of the data and interested parties.
This lab will give you hands-on experience using Information Server's rule capabilities. In this lab, you will learn the following:
- How to create a data rule
- How to apply the rule to a data set
- How to view data that does not conform to the rule
This section consists of the following steps:
Launch Firefox and click on the **Launchpad** bookmark. When the Information Server launchpad shows up, click on the **Information Governance Catalog New** tile.

Log in with the username `isadmin` and password `inf0Xerver`.
The overview page will appear.
Click on the **Connections** tab and ensure the connection `JK_ORD` appears and the host is `IIS-SERVER.IBM.DEMO`. This connection is pre-established for convenience and was imported to the **JK Orders** workspace. Let's take a look at it.

To find the **JK Orders** workspace, go to the **Quality** tab and click on its tile.

The workspace overview will load. Take a few moments to browse the graphics on the page and click on the **Data sets** link to view the data in this exercise.

Before we create new rules, let's look at the two data sets that will be used in this example. Take a look at the **Sales** data set first.

Click on the **Columns** tab to view findings from the analyzer. It found many things when the data was imported, like maximum and minimum values, distinct values, format, and uniqueness.

Repeat the same for the **CUSTOMERS** data set.
We're now ready to create our first data rule!
From the **JK Orders** workspace, click on the **Data rules** tab.

Expand the **Published Rules** section, then the **01 Personal Identity** section, and then the **Age** section to reveal the **AdultInRangeNumeric** rule.

Navigate to the right to show the **⋯** menu. Choose the **Manage in workspace** option from the menu.

The rule should now appear under the **All** section. Navigate to the right to show the **⋯** menu. Choose the **Edit** option from the menu.

Now we're going to edit the rule so that it flags ages that are under 18 or over 120. To do that, edit the text boxes to the right that show `minage` and `maxage` and update them with `18` and `120` respectively. The formula at the bottom should read:
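The built-in rule supplies the exact operators, so treat the following as an illustrative sketch of the expected logic rather than the literal text:

```
age >= 18 AND age <= 120
```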
Save the rule.
When you are brought back to the **Data rules** tab, you'll notice that the new rule has an error. We need to publish the rule. To do so, navigate to the right to show the **⋯** menu. Choose the **Publish** option from the menu.
Confirm the publish action.
Go back to the **JK Orders** workspace and click on the **Data sets** link.

Click on the **SALES** data set.

We can now apply the rule by clicking the **+ Add rule** button.

Choose the **AdultInRangeNumeric** rule and click **Next**.

We now have to bind the rule to a specific column in our data set. Click on the `age` variable name on the left (this is the rule variable), then click on the `AGE` column name on the right (this is our data source). Click on the **Bind** button and, once bound, you can click **Next**.

We don't need to test the rule this time; we can just click on **Save**.

You should now see the rule in the data set view. Click on the **Analyze** button to restart the analysis with the new rule.
Confirm to start the analysis.
You may need to refresh your browser a few times for this step. It will take a few minutes. You'll see the state go from `Running` to `Successful` when done.

Once complete, go to the **Data quality** tab. You'll see that the new rule has one finding. Click on the finding to view the exception.

Scrolling to the right, you'll see that there is one entry whose age is 17, which is less than the required minimum of 18 for our rule!
CONGRATULATIONS!! You have completed this lab!
This section has hints and tips for instructors:
- Only IBMers can provision an instance; you'll need to log in with your w3ID.
- You're limited to one request at a time.
- You can schedule how long you want to keep the instance live.
- You'll be emailed a link and a password. Distribute these to the lab attendees.
- There's no cost to running these instances.
- Instances will automatically suspend after 2 hours of inactivity. Just turn them back on to continue.
Access the **iis-server** VM.
Log in with the credentials above. Run the following commands.
Change to the Information Server install path:
Edit the `DSODBConfig.cfg` file. Set the `DSODBON` value to `1`. It should be the first value at the top of the file. Save your changes and exit your editor.
If you're not familiar with `vi`, you can use `sed`:
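For example, a one-liner along these lines should work, assuming you are in the directory containing the file and the value currently reads `DSODBON=0`:

```sh
# Flip DSODBON from 0 to 1 in place, keeping a backup of the original file
sed -i.bak 's/^DSODBON=0/DSODBON=1/' DSODBConfig.cfg
```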
Restart your console:
Access the **iis-server** VM.
Log in with the credentials above. Run the following commands.
Change to the right folder:
Download the data:
NOTE: the command uses `-O` (the capital letter O, not zero).
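If you need to re-run the download by hand, it is a plain `curl` fetch; `<data-file-url>` is a placeholder for the URL provided with the lab:

```sh
curl -O "<data-file-url>"   # -O saves the file under its remote name
```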
Verify the content
Use the link provided to provision instances of the Skytap environment as needed.