arrow-left

Only this pageAll pages
gitbookPowered by GitBook
1 of 15

datastage-standalone-workshop

Loading...

Getting Started

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Resources

Loading...

Survey

Instructors

Loading...

Additional resources

hashtag
IBM Demos

  • Collection: InfoSphere Information Serverarrow-up-right

hashtag
Redbooks

hashtag
Videos

  • Tutorial: Transforming your data with IBM DataStagearrow-up-right
    IBM InfoSphere DataStage Data Flow and Job Designarrow-up-right
    InfoSphere DataStage Parallel Framework Standard Practicesarrow-up-right
    Video: Postal codes and part numbers (DataStage)arrow-up-right
    Video: Calculate employee compensation (read from CSV) (DataStage and Gov. Catalog)arrow-up-right
    Video: Banks have merged (DataStage and Gov. Catalog)arrow-up-right
    Video: Groceries with Kafka (DataStage)arrow-up-right
    Video: Find relationships between sales, employees, and customers (Information Analyzer)arrow-up-right
    Video: Clean and analyze data (Governance Catalog)arrow-up-right

    Lab: DataStage: Read and analyze data from multiple tables

    Information Server enables users to better understand their data. It uses pre-built and custom rules that will apply meaning and quality measurements, which are available for users of the data and interested parties.

    DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.

    This lab will give you hands-on experience using Information Server's datastage flow designer and rule capabilities. In this lab, you will learn the following:

    • How to join two data sets using DataStage Flow Designer to create a single data set

    • How to create a data rule that runs on columns from each data set

    • Apply the rules to the joined data set

    • View data that does not apply to the rule

    This lab is comprised of 2 sections. In the first section, we will see how to associate records within 2 data sets to create a single combined data set using DataStage Flow Designer. In the second section, we will apply a data rule on the combined data set and analyze the data that does not apply to the rule.

    Note: The merged data set is already available in DB2WH. If you wish to, you may skip section 1 and directly move on to section 2 where you can import this merged data set.

    This lab consists of the following steps:

    hashtag
    Section 1: Join the data sets

    hashtag
    Before you start

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Click on Classic Launchpad in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on Advanced... and then click Accept the Risk and Continue.

    Click on DataStage Flow Designer.

    Login with the credentials isadmin/inf0Xerver.

    This brings up the DataStage Flow Designer. Click OK.

    hashtag
    1. Create a Transformation project

    • On the IBM DataStage Flow Designer, click on the Projects tab and click + Create. In the modal that opens up, type in a name for the project and click Create.

    The project takes a few minutes to be created and once ready, it will be visible on the Projects tab.

    • Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the Project Name and click OK to switch the project.

    hashtag
    2. Add database connection

    The input tables - EMP (containing employee data) and DEPT (containing department data) - are already loaded in Db2 Warehouse. Let's add a Db2 warehouse instance as a Connection in DataStage.

    • Click on the Connections tab and then click + Create to add a new connection.

    • Provide the following connection details and click OK. Click Save on the new modal that pops up.

    A tile for the new connection will now be displayed in the Connections tab.

    hashtag
    3. Create the job

    • Click on the Jobs tab and then click + Create. Click Parallel Job.

    A new tab with the name Job_1* opens up where you can now start designing the parallel job.

    The first step is to load the input tables DEPT and EMP into DataStage. The WORKDEPT column of the EMP table is the same as the DEPTNO column of the DEPT table.

    • First, drag a Connection connector to the canvas. In the modal that opens up, select the DB2WH connection that was created earlier and click Next.

    • On the next screen, select the BLUADMIN schema and click Next.

    • On the next screen, select the DEPT table and click Next.

    • On the next screen, click Add to Job.

    • Drag another Connection connector to the canvas and repeat the steps given above but this time, select the EMP table instead. Once you complete the steps, you should see the two Connection connectors on the canvas.

    The EMP table uses the WORKDEPT column to identify the department number whereas the DEPT table uses the DEPTNO column. Use a Transformer stage to modify the output of the EMP table by changing the name of the WORKDEPT column to DEPTNO. This is needed for a future step where we will Join the two tables.

    • Drag and drop a Transformer stage next to the Connection connector for the EMP table. Provide the output of the EMP table Connection connector as the input to the Transformer stage. For this, click on the little blue dot on the right side of the Connection connector and drag the mouse pointer to the Transformer stage.

    NOTE: For another method to connect the Connection connector to the Transformation stage, click on the Connection connector to select it, then drag and drop the Transformation stage. The Transformation stage will automatically be connected to the Connection connector.

    • Drag and drop a Join stage to the canvas and provide the output of the Transformer stage as the input to this Join stage.

    • Double click on the Transformer stage to open up the stage page. Go to the Outputs tab and in the table find the entry for the WORKDEPT column. Double click on the WORKDEPT value under the Column name column and replace the text with DEPTNO. Click OK.

    • Both the tables now have a column called DEPTNO which can be used to join the tables. Provide the output of the DEPT table Connection connector as the second input to the Join stage. Double clicking the Join stage brings up the stage page where you can verify that the DEPTNO is being used as the JOIN KEY and the Join Type is Inner. Click OK.

    • Drag and drop a Connection connector to the canvas. In the modal that pops up in the screen, check the box for Add selected connection as target and click Add to Job. Provide the output of the Join stage as the input to this connector.

    • Double click on the Connection connector to open up the Properties page on the right. Verify that the URL, username and password are already populated.

    • Scroll down and under the Usage section, provide the Table name as <user>_DEPTEMP where \ is your name, and update the Table action to Replace. Click OK.

    hashtag
    4. Compile and run the job

    • Click the Save icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click Save. Once the job is saved, click on the Compile icon to compile it. If compilation is successful, you should see a green check mark and the message Compiled successfully displayed on the screen.

    • Click the Run icon to run the job. In the modal that opens up, click Run.

    hashtag
    5. View output

    After a successful run, the results will be stored within the DB2WH connection in the BLUADMIN schema. Because we specified the Table action as Replace in the Connection connector that represents the output, each subsequent run of the job will delete all existing records in the table and replace them with the new output.

    • To view the results of the job, double click on the Connection connector that represents the output. This will open up the Properties page on the right. Click on View Data to bring up a modal that shows the contents of the <user>_DEPTEMP table in which the output was stored.

    This marks the end of Section 1 of this lab.

    hashtag
    Section 2: Create and run data rule

    hashtag
    1. Import and view the data

    Switch to the iis-client VM and launch Firefox.

    • Click on the Launchpad bookmark. When the Information Server launchpad shows up click on the Information Governance Catalog New tile.

    • Log in with the username isadmin and password inf0Xerver.

    • The overview page will appear.

    • Click on the Connections tab and click + Create connections.

    • On the next screen, provide the following details and click Test connection to test the connection. Once successful, click Save connection.

    • Now locate the tile for the connection that was just added, click on the kebab icon (⋮) on the tile and click Discover.

    • Click on Browse.

    • Expand db2. Expand BLUADMIN and select the <user>_DEPTEMP table where \ is your name. This is the table that was created in section 1 above. If you skipped section 1, then you can select the SANDHYA_DEPTEMP table. Click OK.

    • Under Discovery options, select Analyze data quality. This will automatically check the boxes for Analyze columns and Assign terms.

    • Scroll to the bottom and select the Host as IIS-SERVER.IBM.DEMO and the Workspace as UGDefaultWorkspace. Click Discover.

    • The discovery process will take some time. Once the assets are discovered, the analysis process will begin. Once that completes, you will be able to see what percentage of records were successfully analyzed.

    • Now let us go and have a look at the data. Go to Quality tab and click on the tile for UGDefaultWorkspace.

    • The workspace overview will load. Take a few moments to browse the graphics on the page and click on Data sets link to view the data in this exercise.

    • Before we create new rules, let's look at the data set that will be used in this lab. Click on the tile for <user>_DEPTEMP.

    • Click on the Columns tab to view findings from the analyzer. It found many things when the data was imported, like maximum and minimum values, distinct values, format, and uniqueness.

    We're now ready to create our first data rule!

    hashtag
    2. Create a data rule

    • From the UGDefaultWorkspace workspace, click on the Data rules tab.

    • Expand Published Rules > 08 Validity and Completeness > Valid Value Combination. Find the rule for IfFieldaIsXThenFieldbGtQty. Click on the ... overflow menu on the right and select Manage in workspace.

    • The rule should now be available under All. We will now edit the rule. If you wish to rename the rule, you will first need to Copy the rule and you will be provided with the option to rename the rule. Click on the ... overflow menu of the rule and select Edit.

    • Switch to the Rule logic tab and update the rule to say IF DEPTNAME = 'OPERATIONS' THEN SALARY > 36000.

    • Next, switch to the Rule testing tab. Here you need to bind the variables in the rule logic to specific columns in the data source. Select the salary variable in the left table and select the SALARY column under Available data sources. It should be under IIS-SERVER.IBM.DEMO > db2 > BLUADMIN > <user>_DEPTEMP. Click on + Bind. The value <user>_DEPTEMP.SALARY

    • Scroll down and click on Test to test out the rule. You will see a message at the top of the screen that says the test results can be viewed once ready. When the message disappears, go to the Rule test results tab to view the test results.

    • You can see that of the 42 total rows, 39 met the rule, and 3 did not. Click on Did not meet rule conditions to view the 3 rows. You can see that these rows have DEPTNAME = OPERATIONS but have SALARY < 36000, and therefore they did not match the rule conditions. Click Save to save the rule.

    • When you are brought back to the Data rules tab, you'll notice that the new rule has an error. We need to publish the rule. To do so navigate to the right to show the ⋯ menu. Choose the Publish option from the menu.

    • In the modal that pops up, click Publish to confirm the publish action.

    hashtag
    3. Re-analyze and view results

    • Go back to the UGDefaultWorkspace workspace and click on the Data sets link.

    • Click on the <user>_DEPTEMP data set.

    • We can now apply the newly created rule by switching to the Rules (0) tab and clicking the + Add rule button.

    • Choose the IfFieldaisXThenFieldbGtQty rule under All and click Next.

    • As before, we need to bind the rule variables to specific columns in our data set. Select the salary variable in the left table and select the SALARY column under Available data sources. Click on the + Bind button. Once bound, select the deptname variable and bind it with the DEPTNAME column under Available data sources. Once both the variables are bound with the right columns, click Next.

    • This time, we don't need to test the rule. Simply click Save.

    • You should now see the rule in the data set view. Click on the Analyze button to restart the analysis with the new rule.

    • In the modal that pops up, click Analyze to confirm analysis.

    • The analysis will take a few minutes. You may need to refresh your browser a few times. You will see the state go from Running to Successful when analysis is complete.

    • Once analysis has completed successfully, go to the Data quality tab. You'll see that the new rule has six findings - three against DEPTNAME and three against SALARY. Click on either DEPTNAME or SALARY to view the exceptions.

    Scrolling to the right you'll see that the three entries shown have DEPTNAME = OPERATIONS but have SALARY less than 36000, which goes against our rule.

    CONGRATULATIONS!! You have completed this lab!

    Compile and run the job
  • View output

  • will now be shown under
    Implemented bindings
    for the
    salary
    variable.
  • Uncheck the salary variable, check the deptname variable and bind it with the DEPTNAME column under Available data sources. As in case of the SALARY column, it should be under IIS-SERVER.IBM.DEMO > db2 > BLUADMIN > <user>_DEPTEMP. Click on + Bind. The value <user>_DEPTEMP.DEPTNAME will now be shown under Implemented bindings for the salary variable.

  • Section 1: Join the data sets
    Create a Transformation project
    Add database connection
    Create the job
    Section 2: Create and run data rule
    Import and view the data
    Create a data rule
    Re-analyze and view results
    Switch to iis-client
    Classic Launchpad
    Select DFD
    Log into DFD
    DFD is up
    Create project
    Project created
    Switch project
    Create connection
    Add connection
    Created connection
    Create parallel job
    Data set definitions
    Create connection - select connection
    Create connection - select schema
    Create connection - select table
    Create connection - add to job
    Create connection - completed
    Add transformer for EMP
    Add join stage
    Transformer - updates on Output tab
    Connect DEPT to Join
    Add output connection
    Connection properties - 1 - already populated
    Connection properties - 2 - table name action
    Save compile
    Run job
    view table data
    Switch to iis-client
    1-iis-launchpad-new
    2-gc-login
    3-gc-landing
    4-connections
    5-add-connection
    6-discover
    7-browse-discovery-root
    8-select-table
    9-discovery-options
    10-discover
    11-discovery-complete
    12-open-workspace
    13-workspace-overview
    14-data-sets
    15-deptemp
    16-data-rules
    17-manage-in-workspace
    18-edit-rule
    19-update-rule-logic
    20-bind-columns.png
    21-test-the-rule
    22-save-the-rule
    23-publish-the-rule
    24-confirm-publish
    13-workspace-overview
    14-data-sets
    25-add-rule
    26-select-rule
    27-bind
    28-save
    29-analyze
    30-analyze-confirm
    31-analysis-running
    32-findings
    33-findings-view

    Introduction

    hashtag
    Information Server Workshop

    Welcome to our workshop! In this workshop we'll be using IBM Information Server to design data processing jobs with DataStage and create rules to analyze data.

    • Get acquainted with Information Server

    • Run a few DataStage jobs

    • Learn the ins and outs of data rules

    • Have fun!

    hashtag
    About Information Server

    IBM InfoSphere Information Server is a market-leading data integration platform which includes a family of products that enable you to understand, cleanse, monitor, transform and deliver data, and to collaborate to bridge the gap between business and IT. InfoSphere Information Server provides massively parallel processing (MPP) capabilities to deliver a highly scalable and flexible integration platform that handles all data volumes, big and small.

    hashtag
    Credits

    Lab: Access the environment

    This section will explain how to access your environment.

    hashtag
    Credentials

    Name: DB2WH
    Connector type: JDBC
    URL: jdbc:db2://db2w-kzwbsid.us-east.db2w.cloud.ibm.com:50001/BLUDB:sslConnection=true;
    Username: bluadmin
    Password: ****************
    Name: DB2WH
    Choose connection: Db2
    JDBC URL: jdbc:db2://db2w-kzwbsid.us-east.db2w.cloud.ibm.com:50001/BLUDB:sslConnection=true;
    Username: bluadmin
    Password: ****************

    Administrator

    inf0Xerver

    Linux VM

    root

    inf0Xerver

    Information Server

    isadmin

    inf0Xerver

    hashtag
    1. Turn on your Skytap Environment

    The lab instructors will provide everyone with a set of URLs at the beginning of the lab.

    Accessing the URL will open a Skytap environment that is password protected.

    Password prompt

    Typing in the password provided by the lab instructor will give you access to 4 virtual machines. Turn the VMs on by clicking the "Play" button.

    VMs turned off

    Once the VMs are turned on, click on the tile to access the virtual machine.

    VMs turned on

    Use the toolbar at the top to make the window fullscreen. The "Ctrl-Alt-Del" button can be used to log into Windows environments.

    Windows VM

    NOTE: The Skytap environment will automatically turn off the VMs if they are not used within two hours. You can simply start them again to resume your work.

    Windows VM

    Steve Martinelliarrow-up-right
    Sandhya Nayakarrow-up-right
    DataStage Designer

    Lab: DataStage: Read JSON from a REST call

    You can use the REST protocols to incorporate IBM DataStage Flow Designer features in your custom application. You use the REST protocol to perform actions such as compiling jobs, running jobs, and getting job status.

    The following REST calls are supported:

    • XML

    • JSON

    For more info check out the documentation.

    In this lab we'll learn how to call a JSON endpoint and pipe those results to a file using DataStage.

    This lab consists of the following steps:

    hashtag
    Before you start: Launching DataStage Flow Designer

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Launch the desktop client by going to the start menu and searching for DataStage Designer.

    hashtag
    1. Create job layout

    Start a new Parallel Job project and create a job that looks like the image below. Remember to wire the elements together. It should have:

    • 1 x Hierarchical Data

    • 1 x Peek

    • 1 X Sequential File

    Double click on the Hierarchical Data node and click on Edit assembly.

    hashtag
    2. Customize with Assembly Editor

    The "Assembly Editor" will launch. This allows for fine tuning the properties of certain jobs.

    Click on Palette and add a REST step and a JSON Parser step.

    From the REST step, in the General tab, specify the following URL:

    From the REST step, in the Response tab, specify the content type to be test/plain:

    From the REST step, in the Mappings tab, click on the Map Automatically button:

    Before proceeding to the JSON Parser step click on the Libraries button on the top left and choose the New Library button.

    Download the file to the desktop.

    Click on Import New Resource and browse to the newly downloaded file.

    DataStage will automatically use this file as a JSON schema

    From the JSON Parser step, in the Configuration tab, choose the String set option and use the drop down menu. There is only one result.

    From the JSON Parser step, in the Documentation Root tab, specify the documentation root to be the new library from the previous step, likely called employees.

    From the Output step, in the Mappings tab, click on the Map Automatically button. Remove the image profile and any others you do not want to use.

    Click OK to exit out of the Assembly Editor.

    hashtag
    3. Compile and run the job

    Before running the job we need to modify the Sequential File by double clicking on its icon on the designer. Specify an output file, say output.csv.

    Also update the columns by adding the id, employee_name, employee_salary, and employee_age fields.

    Finally, compile and run the job.

    You may see an error message: Message bundle error Can't find resource for bundle com.ibm.e2.Bundle_E2_engine_msgs, key OperatorException.operatorTransitionFailedAtRuntime. To resolve this, go to the iis-server machine, go to the temp directory cd /tmp and delete the XML files rm -rf XML*. This should resolve the issue. Re-compile and re-run the job. For more info check out .

    hashtag
    4. View output

    The output file will be saved on the server. Switch to the server VM by clicking the first icon on the Environment VMs panel and selecting iis-server. Login as the root user with the password inf0Xerver.

    Change your direcotory using cd to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location in the server. Since we had only provided output.csv as the file path in the File connector, the file will be available in the Transformation project's folder, i.e.,

    Finally, output your results using the cat command.

    CONGRATULATIONS!! You have completed this lab!

    View output
    REST APIarrow-up-right
    Create job layout
    Customize with Assembly Editor
    Compile and run the job
    employees.jsonarrow-up-right
    IBM supportarrow-up-right
    Switch to iis-client
    DataStage Designer
    REST job
    Edit Hierarchical Data
    Add Steps
    Specify URL
    Specify content type
    Specify mappings
    Create new library
    New JSON schema
    Specify configuration
    Specify documentation root
    Specify mappings
    Specify output file
    Specify columns
    Switch to server VM
    output file contents

    Lab: DataStage: Configure a Git server (BitBucket)

    You can integrate your Git server with IBM DataStage. This integration allows you to publish jobs and related artifacts to different Git branches and load other versions of a job from Git onto the IBM DataStage Flow Designer canvas.

    Using IBM DataStage Flow Designer with Git provides the following benefits:

    • Helps with Continuous Integration Continuous Delivery (CICD) pipeline automation. Your assets are easily available in Git and you can move them from the development (DEV) branch to quality assurance (QA) to production.

    • Helps with auditing requirements. You can simply tell who changed what.

    • You can map a version of a job in the XMETA repository to a version in Git.

    • You can work on multiple versions of a job by creating temporary branches.

    • You can easily roll back to a prior version of a job.

    The following Git repositories are supported:

    • Bitbucket

    • GitHub

    • GitLab

    For more info check out the documentation.

    In this lab we'll learn how to integrate DataStage with BitBucket Cloud, a free to register BitBucket platform available at .

    This lab consists of the following steps:

    hashtag
    Before you start: Registering for BitBucket

    Navigate to and click on the "Get it free" button and create an account.

    Create a new repo, we'll call it datastage-jobs. Ensure the option to create a README.md file is selected.

    Your new repo should look like this:

    Generate an "App Password" by going to your . Remmeber to give the application password the read/write permission for repositories.

    hashtag
    Before you start: Launching DataStage Flow Designer

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Click on Classic Launchpad in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on Advanced... and then click Accept the Risk and Continue.

    Click on DataStage Flow Designer.

    Login with the credentials isadmin/inf0Xerver.

    This brings up the DataStage Flow Designer. Click OK.

    hashtag
    1. Configure DataStage with BitBucket

    From the IBM DataStage Flow Designer, click on the username bubble on the top right. Choose the Setup option. We'll be configuring the Server and User options.

    For the Server click on the Git tab and input the following:

    • Git repo: Your new repo, i.e. https://bitbucket.org:stevemart/datastage-jobs.git

    • Clone repo path: A new folder name, i.e. bitbucket/stevemart/datastage-jobs

    For the User configuration input the following:

    • Email address: Your BitBucket email

    • Username: Your BitBucket username

    • Personal access token: Your BitBucket app password from the previous step

    hashtag
    2. Export DataStage assets to BitBucket

    Load up any existing job or create a new one. Once loaded, click on the Export button that looks like an upload arrow.

    The entire DataStage project will be saved and exported to your configured BitBucket. Enter a specific commit message.

    There will be a success message if everything was exported correctly.

    Navigate back to your BitBucket repository to see the exported files.

    CONGRATULATIONS!! You have completed this lab!

    http://dummy.restapiexample.com/api/v1/employees
    cd /opt/IBM/InformationServer/Server/Projects/<project-name>/
    cat output.csv
    Microsoft Team Foundation Server
    Vendor
    : BitBucket
    Managing source control with Gitarrow-up-right
    https://bitbucket.org/arrow-up-right
    Configure DataStage with BitBucket
    Export DataStage assets to BitBucket
    https://bitbucket.org/arrow-up-right
    "Personal Settings"arrow-up-right
    new repo
    generated repo
    app password
    Switch to iis-client
    Classic Launchpad
    Select DFD
    Log into DFD
    DFD is up
    Setup git credentials
    Conifugre git server
    Conifugre git user
    Export
    Commit
    Success
    Exported files

    Lab: DataStage: Read from a database (Db2 on IBM Cloud)

    DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.

    The purpose of this lab is to design a DataStage job to satisfy the following problem statement:

    As a data engineer, you have been asked by the Line of Business that you support, to produce a new data file that contains all employees whose total compensation is less than $50,000. The file must also contain the Department Name that the employee works in, and the mean average salary of all employees in that department who earn less than 50,000. In addition, the file must be sorted in descending order, based on the mean average salary amount. Finally, the application that will consume this file, expects the full name of the employee to be in one field, formatted as first, middle initial, last).

    NOTE: You can use the Peek stage to check intermediate results in the job as demonstrated in Lab 1.

    In this lab, you will learn:

    • How to create a job in DataStage.

    • How to load data from Db2 Warehouse into DataStage.

    • How to perform transformations such as modifying tables, joining tables, aggregating table data and sorting table data.

    NOTE: You can use the Peek stage to check intermediate results in the job as demonstrated in .

    This lab is comprised of the following steps:

    hashtag
    Before you start

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Click on Classic Launchpad in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on Advanced... and then click Accept the Risk and Continue.

    Click on DataStage Flow Designer.

    Login with the credentials isadmin/inf0Xerver.

    This brings up the DataStage Flow Designer. Click OK.

    hashtag
    1. Create a Transformation project

    • On the IBM DataStage Flow Designer, click on the Projects tab and click + Create. In the modal that opens up, type in a name for the project and click Create.

    The project takes a few minutes to be created and once ready, it will be visible on the Projects tab.

    • Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the Project Name and click OK to switch the project.

    hashtag
    2. Add database connection

    The input tables - EMP (containing employee data) and DEPT (containing department data) - are already loaded in Db2 Warehouse. Let's add a Db2 warehouse instance as a Connection in DataStage.

    • Click on the Connections tab and then click + Create to add a new connection.

    • Provide the following connection details and click OK. Click Save on the new modal that pops up.

    A tile for the new connection will now be displayed in the Connections tab.

    hashtag
    3. Create the job

    • Click on the Jobs tab and then click + Create. Click Parallel Job.

    A new tab with the name Job_1* opens up where you can now start designing the parallel job.

    The first step is to load the input tables DEPT and EMP into DataStage.

    The DEPT table contains the following columns:

    The EMP table contains the following columns:

    • First, drag a Connection connector to the canvas. In the modal that opens up, select the DB2WH connection that was created earlier and click Next.

    • On the next screen, select the BLUADMIN schema and click Next.

    • On the next screen, select the DEPT table and click Next.

    • On the next screen, click Add to Job.

    • Drag another Connection connector to the canvas and repeat the steps given above but this time, select the EMP table instead. Once you complete the steps, you should see the two Connection connectors on the canvas.

    Use a Transformer stage to perform the following two modifications:

    1. Update the output of the EMP table by replacing any NULL MIDINIT values with " ". This is needed for a future step where we will combine the FIRSTNME, MIDINIT and LASTNAME columns to create the FULLNAME of the employee.

    2. Currently, the EMP table uses the WORKDEPT column to identify the department number whereas the DEPT

    NOTE: For another method to connect the Connection connector to the Transformation stage, click on the Connection connector to select it, then drag and drop the Transformation stage. The Transformation stage will automatically be connected to the Connection connector.

    • Drag and drop a Join stage to the canvas and provide the output of the Transformer stage as the input to this Join stage.

    • Double click on the Transformer stage to open up the stage page. In the Properties tab, click + Add. A new entry is added in the table below. Use this to define a "Stage variable" (a local variable that is available only within this stage) named MIDINITIAL which will represent the middle initial of the employee with any NULL values replaced with the empty string. Double click on {derivation} to open up the Derivation Builder.

    • Begin building the derivation rule for MIDINITIAL by finding the NullToValue function in the table. Clicking on the entry in the table will insert it in the "Derivation" at the top. You can also use the search bar to look for the function. Replace the %input_column_input_column% with the <Link>.MIDINIT input variable which can also be found in the table and the %value% with " ". <Link> represents the identifier of the input link for the Transformer. Click OK to go back to the Stage page.

    • Update the Name of the stage variable to MIDINITIAL, the SQL type to Char, the Precision to 1 and the Scale to 0.

    • Now go to the Outputs tab and in the table find the entry for the MIDINIT column. Double click on the derivation value for this entry (it will open up the Derivation builder) and change the derivation value to use the newly created MIDINITIAL stage variable instead. Next, look for the entry for the WORKDEPT column. Double click on the WORKDEPT value under the Column name column and replace the text with DEPTNO. Click OK.

    • Both the tables now have a column called DEPTNO which can be used to join the tables. Provide the output of the DEPT table Connection connector as the second input to the Join stage. Double clicking the Join stage brings up the stage page where you can verify that the DEPTNO is being used as the JOIN KEY and the Join Type is Inner.

    • Next, add a Transformer stage to the canvas and provide the output of the Join stage as the input to the Transformer stage.

    • Double click the Transformer stage to open the stage page. We will add 2 stage variables to this Transformer stage. In the Properties tab, click + Add. A new entry is added in the table below. Use this to define a stage variable named TOTALCOMP which will represent the total compensation (the sum of bonus, commission and salary) of the employee. Double click on {derivation} to open up the Derivation Builder.

    • Build the derivation rule for TOTALCOMP by finding the BONUS, COMM and SALARY input columns in the table. Clicking on the entries in the table will insert them in the "Derivation" at the top. Type in + signs between them in order to complete out the derivation. You can also use the search bar to look for the fields. When inserted in the "Derivation", the column names will be prepended with the identifier for the input link. Click OK to go back to the Stage page.

    • Update the Name of the stage variable to TOTALCOMP, the SQL type to Decimal, the Precision to 9 and the Scale to 2.

    • Repeat the process above to add another Stage variable FULLNAME which will represent the complete name of the employee. Provide the Derivation as CompactWhiteSpace(<Link>.FIRSTNME:" ":<Link>.MIDINIT:" ":<Link>.LASTNAME), the Name of the stage variable as FULLNAME, the SQL type as Varchar and the Precision as 36 and the Scale as 0. Click OK to save the changes and return to the canvas.

    NOTE: <Link> needs to be replaced with the identifier of the input link. CompactWhiteSpace is a function that will compact any continuous white spaces into a single white space. : is the operator used for concatenation.

    • Next, add a Join stage and an Aggregator stage to the canvas. Connect the Transformer stage to both these stages such that the output of the Transformer stage is provided as the input to both these stages.

    Since the output links have now been added, we can provide the 2 stage variables TOTALCOMP and FULLNAME as outputs of the Transformer stage and once that is done, these values will be available as inputs in the subsequent stages.

    • Double click on the Transformer stage to open the stage page. Go to the Outputs tab. Click on +Add twice to add 2 new entries in the table. Update one entry with the Derivation value as TOTALCOMP, the Column name as TOTALCOMP, the SQL type as Decimal, the Precision as 9, the Scale as 2 and set Nullable to true. Update the second entry with the Derivation value as FULLNAME, the Column name as FULLNAME, the SQL type as Varchar, the Precision as

    • Provide the output of the Aggregator stage as the input to the Join stage. Double click on the Aggregator stage to open the stage page. Select DEPTNAME as the grouping key.

    • Scroll down to the Aggregations and select the Aggregation Type as Calculation, Column as TOTALCOMP, Calculation Type as Mean Value and click Add Calculation +.

    • Go to the Outputs tab and verify that you can see 2 output columns - MEAN_TOTALCOMP and DEPTNAME. Click OK to save the changes and return to the canvas.

    • Drag and drop a Sort stage on the canvas and provide the output of the Join stage as the input of the Sort stage.

    • Double click on the Join stage to open the stage page. On the Properties tab, verify that the Join Key is DEPTNAME and the Join Type is Inner.

    • Go to the Outputs tab and verify that you can see FULLNAME and MEAN_TOTALCOMP in the output column list.

    • Double click on the Sort stage to open up the stage page. Specify the Sorting Key as MEAN_TOTALCOMP and the Sort Order as Descending. Click OK.

    • Drag and drop a File connector to the canvas. In the modal that opens up, check the Add connector as target checkbox and click Add to Job.

    • Provide the output of the Sort stage as the input to the File connector. Double click the File connector to open the stage page, and provide the name of the output File as output.csv. Specify Comma-separated value (CSV) as the File format. Enable First row is header and provide Null value as "". Click OK.

    hashtag
    4. Compile and run the job

    • Click the Save icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click Save. Once the job is saved, click on the Compile icon to compile it. If compilation is successful, you should see a green check mark and the message Compiled successfully displayed on the screen.

    • Click the Run icon to run the job. In the modal that opens up, click Run.

    hashtag
    5. View output

    • The output file will be saved on the server. Switch to the server VM by clicking the first icon on the Environment VMs panel and selecting iis-server. Login as the root user with the password inf0Xerver.

    • CD to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location in the server. Since we had only provided output.csv as the file path in the File connector, the file will be available in the Transformation project's folder, i.e.,

    CONGRATULATIONS!! You have completed this lab!

    How to write tabular data from DataStage into a file.
  • How to run jobs.

  • How to view logs for jobs.

  • Compile and run the job
  • View output

  • N

    LOCATION

    CHAR(16)

    Y

    N

    WORKDEPT

    VARCHAR(3)

    Y

    PHONENO

    CHAR(4)

    Y

    HIREDATE

    DATE(4)

    Y

    JOB

    CHAR(8)

    Y

    EDLEVEL

    SMALLINT

    N

    SEX

    CHAR(1)

    Y

    BIRTHDATE

    DATE(4)

    Y

    SALARY

    DECIMAL(9,2)

    Y

    BONUS

    DECIMAL(9,2)

    Y

    COMM

    DECIMAL(9,2)

    Y

    table uses the
    DEPTNO
    column. Modify the output of the
    EMP
    table by changing the name of the
    WORKDEPT
    column to
    DEPTNO
    . This is needed for a future step where we will
    Join
    the two tables.
  • Drag and drop a Transformer stage next to the Connection connector for the EMP table. Provide the output of the EMP table Connection connector as the input to the Transformer stage. For this, click on the little blue dot on the right side of the Connection connector and drag the mouse pointer to the Transformer stage.

  • 36
    and the
    Scale
    as
    0
    .
  • We also need to add a Constraint here, which will ensure that only the records with TOTALCOMP more than 50000 are sent in the output. Click on the empty space under Constraint to open up the Derivation Builder. Specify the derivation as TOTALCOMP>50000.

  • Switch to the second output link by clicking on the Link_<number> under Output name and repeat the above steps to add the 2 stage variables to the output and to add the constraint. Click OK to save the changes and return to the canvas.

  • Column Name

    Data Type

    Nullable

    DEPTNO

    VARCHAR(3)

    N

    DEPTNAME

    VARCHAR(36)

    N

    MGRNO

    CHAR(6)

    Y

    ADMRDEPT

    Column Name

    Data Type

    Nullable

    EMPNO

    CHAR(6)

    N

    FIRSTNME

    VARCHAR(12)

    N

    MIDINIT

    CHAR(1)

    Y

    LASTNAME

    Lab 1
    Create a Transformation project
    Add database connection
    Create the job
    Switch to iis-client
    Classic Launchpad
    Select DFD
    Log into DFD
    DFD is up
    Create project
    Project created
    Switch project
    Create connection
    Add connection
    Created connection
    Create parallel job
    Create connection - select connection
    Create connection - select schema
    Create connection - select table
    Create connection - add to job
    Create connection - completed
    Add transformer for EMP
    Add join stage
    Transformer - 1.1 - add MIDINITIAL
    Transformer - 1.2 - add MIDINITIAL derivation
    Transformer - 1.3 - complete MIDINITIAL
    Transformer - 1.4 - updates on Output tab
    Connect DEPT to Join
    Add transformer stage
    Transformer - 2.1 - add TOTALCOMP
    Transformer - 2 - build TOTALCOMP derivation
    Transformer - 3 - complete TOTALCOMP
    Transformer - 4 - complete FULLNAME
    Add join and aggregator
    Transformer - add output columns and constraint
    Aggregator - 1 - add grouping key
    Aggregator - 2 - add calculation
    Aggregator - 3 - confirm output columns
    Add sort stage
    Confirm join stage
    Confirm join stage outputs
    Update sort stage
    Add file connector
    Update file connector
    Save compile
    Run job
    switch to server VM
    CD to project folder

    CHAR(3)

    VARCHAR(15)

    Name: DB2WH
    Connector type: JDBC
    URL: jdbc:db2://db2w-kzwbsid.us-east.db2w.cloud.ibm.com:50001/BLUDB:sslConnection=true;
    Username: bluadmin
    Password: ****************
    /opt/IBM/InformationServer/Server/Projects/<project-name>/

    About this workshop

    This section has hints and tips for instructors:

    hashtag
    Provisioning environments

    Use https://bluedemos.com/show/2543arrow-up-right to provision instances of the Skytap environment as needed.

    • Only IBMers can provision an instance, you'll need to log in with your w3ID.

    • You're limited to one request at a time.

    • You can schedule how long you want to keep the instance live.

    • You'll be emailed a link and a password. Distribute these to the lab attendees.

    • There's no cost to running these instances.

    • Instances will automatically suspend after 2 hours of inactivity. Just turn them back on to continue.

    hashtag
    Enable the Operations Console

    Access the iis-server VM.

    Log in with the credentials above. Run the following commands.

    Change to the Information Server install path:

    Edit the DSODBConfig.cfg file. set the DSODBON value to 1. It should be the first value at the top of the file. Save your changes and exit your editor.

    If you're not familiar with vi you can use sed

    Restart your console:

    hashtag
    Copy the data over

    Access the iis-server VM.

    Log in with the credentials above. Run the following commands.

    Change to the right folder:

    Download the data:

    NOTE: the command uses -O (not zero)

    Verify the content

    Lab: IGC: Create custom rules for data analysis

    Information Server enables users to better understand their data. It uses pre-built and custom rules that will apply meaning and quality measurements, which are available for users of the data and interested parties.

    This lab, will give you hands-on experience using Information Server's rule capabilities. In this lab, you will learn the following:

    • How to create a data rule

    • Apply the rule to a data set

    • View data that does not apply to the rule

    This section is comprised of the following steps:

    hashtag
    1. Import and view the data

    Launch Firefox and click on the Launchpad bookmark. When the Information Server launchpad shows up click on the Information Governance Catalog New tile.

    Log in with the username isadmin and password inf0Xerver.

    The overview page will appear.

    Click on the Connections tab and ensure the connection JK_ORD appears and the host is IIS-SERVER.IBM.DEMO. This connection is pre-established for convenience and was imported to the JK Orders workspace, let's take a look at it.

    To find the JK Orders workspace, go to Quality tab and click on the right tile.

    The workspace overview will load. Take a few moments to browse the graphics on the page and click on the Data sets link to view the data in this exercise.

    Before we create new rules let's look at the two data sets that will be used in this example. Take a look at the Sales data set first.

    Click on the Columns tab to view findings from the analyzer. It found many things when the data was imported, like maximum and minimum values, distinct values, format, and uniqueness.

    Repeat the same for the CUSTOMERS data set.

    We're now ready to create our first data rule!

    hashtag
    2. Create a data rule

    From the JK Orders workspace click on the Data rules tab.

    Expand the Published Rules section, then the 01 Personal Identity section, and then the Age section to reveal the AdultInRangeNumeric rule.

    Navigate to the right to show the ⋯ menu. Choose the Manage in workspace option from the menu.

    The rule should now appear under the All section. Navigate to the right to show the ⋯ menu. Choose the Edit option from the menu.

    Now we're going to edit the rule so that it flags ages that are under 18 or over 120. To do that, edit the text boxes to the right that show minage and maxage and update them with 18 and 120 respectively. The formula at the bottom should read:

    Save the rule.

    When you are brought back to the Data rules tab you'll notice that the new rule has an error. We need to publish the rule. To do so navigate to the right to show the ⋯ menu. Choose the Publish option from the menu.

    Confirm the publish action.

    hashtag
    3. Re-analyze and view results

    Go back to the JK Orders workspace and click on the Data sets links

    Click on the SALES data set.

    We can now apply the rule by clicking the + Add rule button.

    Choose the AdultInRangeNumeric rule and click Next.

    We now have to bind the rule to a specific column in our data set. Click on the age variable name on the left, this is the rule. And click on the AGE column name on the right, this is our data source. Click on the Bind button and once bound you can click Next.

    We don't need to test the rule this time, we can just click on Save.

    You should now see the rule in the data set view. Click on the Analyze button to restart the analysis with the new rule.

    Confirm to start the analysis.

    You may need to refresh your browser a few times for this step. It will take a few minutes. You'll see the state go to Running to Successful when done.

    Once complete, go to the Data quality tab. You'll see that the new rule has one finding. Click on the findings to view the exception.

    Scrolling to the right you'll see that there is one entry whos age is 17, this is less than the required 18 for our rule!

    CONGRATULATIONS!! You have completed this lab!

    $ cd /opt/IBM/InformationServer/Server/DSODB
    DSODBON=1
    sed -i 's/DSODBON=0/DSODBON=1/g' DSODBConfig.cfg
    $ cd /opt/IBM/InformationServer/Server/DSODB/bin
    $ ./DSAppWatcher.sh -start
    $ cd /IBMdemos
    $ wget http://ibm.biz/datastage-standalone-data -O raviga-products.csv
    $ cat raviga-products.csv
    RAV8XLIGA,90033
    ...
    RAV9XLIGA,90033
    1. Import and view the data
    2. Create a data rule
    3. Re-analyze and view results
    1-iis-launchpad-new
    2-gc-login
    3-gc-landing
    4-connections
    5-workspaces
    6-workspace-overview
    7-data-sets
    8-sales
    9-customers
    10-data-rules
    11-manage-rule
    12-edit-rule
    13-rule-logic
    14-publish
    15-publish-confirm
    16-data-sets
    17-sales
    18-rules
    19-select-rule
    20-bind
    21-test
    22-analyze
    23-analyze-confirm
    24-running
    25-findings
    26-findings-view

    Lab: DataStage: Import COBOL copybook and read EBCIDIC data

    IBM DataStage Flow Designer allows you to read data from a mainframe. More specifically, you can specify inputs in your DataStage job to be in EBCIDIC format and to import COBOL copybooks as table definitions.

    To keep things simple in this lab we're going to speak generally about mainframes and COBOL. Files from a mainframe are usually saved as binary files and sFTPed to some server where DataStage can access them. The binary files alone are not enough for DataStage to read the contents. A COBOL playbook is required to translate the data from binary to ascii. Both files are available here: https://github.com/IBM/datastage-standalone-workshop/tree/master/data/mainframearrow-up-right.

    This lab consists of the following steps:

    1. Create job layout

    hashtag
    About the data

    The example binary data should be downloaded to the server. Switch to the server and run:

    The copybook can be downloaded to the client machine by going to the and saving the file to the desktop. The copybook we're using looks like this:

    hashtag
    Before you start: Launching DataStage Flow Designer

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Launch the desktop client by going to the start menu and searching for DataStage Designer.

    hashtag
    1. Create job layout

    Start a new Parallel Job project and create a job that looks like the image below. Remember to wire the elements together. It should have:

    • 1 x Complex Flat File

    • 1 x Peek

    • 1 X Sequential File

    hashtag
    2. Add COBOL copybook as a table definition

    In the toolbar click on Import > Table Definitions > COBOL File Definitions.

    Specify the downloaded copybook file and click Import.

    You have just imported your copybook definitions!

    hashtag
    3. Customize the job

    The first step is to double click on the Complex Flat File node, go to the File options tab, and specify the example binary file as the input. Critically, we must specify Fixed block as the record type.

    Go to the Record options tab and choose the Binary data format and EBCDIC as the character set.

    Go to the Records tab and click the Load button, this will give us the option to specify a copybook.

    Choose to use on the copybook that was imported in the previous step.

    Select the >> icon to use all fields from the copybook.

    The Records tab should now show the various column names from the copybook.

    Double clicking on the Peek node allows us to map output from the Complex Flat File. Click on the Output section and choose the Columns tab.

    Enter the following new columns:

    • ID

    • SHORT_NAME

    • CLIENTID

    Still in the Output section now click on the Mapping tab and choose to Auto-Match.

    Double clicking on the Sequential File node brings up a single option. To specify a filename, choose mainframe.csv for example.

    hashtag
    4. Compile, run, view output

    Compile and run the job using the usual icons from the toolbar.

    After running the job you can view the output from the Designer tool by clicking on the Sequential File node and clicking the View Data button. Click OK on the next dialog.

    You'll be able to see one row of data with an ID, SHORT_NAME and a few other fields.

    The output file is also written to the server. Switch to the server VM by clicking the first icon on the Environment VMs panel and selecting iis-server. Login as the root user with the password inf0Xerver.

    Change your direcotory using cd to the location where you had stored the file.

    Finally, output your results using the cat command.

    CONGRATULATIONS!! You have completed this lab!

    Lab: DataStage: Read and write to file (CSV)

    DataStage Flow Designer enables users to create, edit, load, and run DataStage jobs which can be used to perform integration of data from various sources in order to glean meaningful and valuable information.

    The purpose of this lab is to familiarize ourselves with the DataStage Flow Designer and the Operations Console.

    In this lab, you will learn the following:

    • How to create a job in DataStage.

    • How to read tabular data from a file into DataStage.

    • How to perform transformations (such as filtering out records and removing duplicate values) on a table.

    • How to write tabular data from DataStage into a file.

    • How to run jobs.

    • How to view logs for jobs.

    • How to schedule jobs.

    This lab is comprised of the following steps:

    hashtag
    Before you start

    Before we start the lab, let's switch to the iis-client VM and launch Firefox.

    Click on Classic Launchpad in the Bookmarks tab. The first time you try this out, you might see a certificate error. To get past it, click on Advanced... and then click Accept the Risk and Continue.

    Click on DataStage Flow Designer.

    Login with the credentials isadmin/inf0Xerver.

    This brings up the DataStage Flow Designer. Click OK.

    hashtag
    1. Create a Transformation project

    • On the IBM DataStage Flow Designer, click on the Projects tab and click + Create. In the modal that opens up, type in a name for the project and click Create.

    The project takes a few minutes to be created and once ready, it will be visible on the Projects tab.

    • Click on the tile for your newly created project. In the modal that opens up, verify that the name of your project is provided as the Project Name and click OK to switch the project.

    hashtag
    2. Create the table definition

    We will be using the raviga-products.csv file which contains tabular data having 2 columns:

    • On the DataStage Flow Designer, click on the Table Definitions tab and then click + Create. In the modal that opens up, provide the definition of the table contained in the input file. Go to the General tab and provide the Data source type as File, the Data source name as raviga and the Table/File name as ravigaproducts. If you wish to, you can also provide optional Short description and Long description.

    • Next, go to the Columns tab and click + Add. A new entry is added in the table. Edit the entry and update the Column name to ProductID. Mark the column as Key. Update the SQL type to Varchar and the Length to 9.

    hashtag
    3. Create the job

    • Click on the Jobs tab and then click + Create. Click Parallel Job.

    A new tab with the name Job_1* opens up where you can now start designing the parallel job.

    Before designing the job, take a few minutes to look at the various buttons and menus available on the screen.

    1. The Show/Hide Palette button can be used to show or hide the palette on the left of the screen which contains the connectors and stages that can be used while designing the job.

    2. The Save, Compile and Run icons are used to save, compile and run the job respectively.

    • Drag a File connector to the canvas. The Table Definition Asset Browser opens up. Select raviga\ravigaproducts and click Next.

    • Ensure that both the columns ProductID and Zip are selected and click Add to Job.

    • The File connector should now be visible on the canvas. Drag and drop a Filter stage on the canvas. Provide the output of the File connector as the input to the Filter stage. To do this, click on the little blue dot on the right side of the File connector and drag the mouse pointer to the Filter stage.

    NOTE: For another method to connect the File connector to the Filter stage, click on the File connector to select it and then drag and drop the Filter stage. The Filter stage will automatically be connected to the File connector.

    • Double click on the File connector to open the Connector's Properties page. Provide the File name as /IBMdemos/raviga-products.csv which is where the source file is located. Further down, provide the File format as Comma-separated value (CSV).

    • Scroll further down and update the Null value to "". Click OK.

    • Double click the Filter stage to open the Stage's Properties page. Provide the value for Where Clause as Zip IS NOT NULL. This will filter out all the records in the table where the value of Zip is NULL. Click OK.

    • We will use the Peek stage to look at the intermediate state of the table. Drag and drop a Peek stage to the canvas and provide the output of the Filter stage as the input to the Peek stage.

    • Next, we will add a Remove Duplicates stage and provide the output of the Peek stage as the input to the Remove Duplicates stage.

    • Double click on the Remove Duplicates stage to open the Stage's Properties page. Under KEYS THAT DEFINE DUPLICATES, select Zip. Set the value of Duplicate to retain as First. This will ensure that if there are any duplicate Zip values, the job will only retain the first record out of the records where this Zip value occurs. Click OK.

    • Drag and drop a File connector to the canvas. In the modal that opens up, check the Add connector as target checkbox and click Add to Job.

    • Provide the output of the Remove Duplicates stage as the input to the File connector. Double click the File connector to open the stage page, and provide the File name as output.csv. Specify Comma-separated value (CSV) as the File format. Click OK.

    hashtag
    4. Compile and run the job

    • Click the Save icon to save the job. If you wish to, you can provide a different name for the job in the modal that pops up. Click Save. Once the job is saved, click on the Compile icon to compile it. If compilation is successful, you should see a green check mark and the message Compiled successfully displayed on the screen. The OSH Code will also be available for viewing.

    If the compilation was not successful, you should see a red error icon and the message Compiler failed with ### errors. where ### represents the number of errors. Clicking on View errors brings up a modal that lists all the errors that were found. The stage/connector/link that has the error is also highlighted in the canvas and you can hover over it to see the error.

    • Click the Run icon to run the job. In the modal that opens up, click Run.

    • Once the run completes, you should see the status of the run on the screen. The status will also indicate if there were any warnings or errors. Click on View log to get more details about the run.

    • Everything that happens as part of the job run is available in the log. This includes warnings, errors and informational messages such as the results of the Peek stage. Scroll through the log to reach the Peek results. By default, the Peek stage only logs the first 10 records of the partition (the data was split into two partitions in this case). You can see how in both Peek logs there are no records where Zip value is NULL.

    hashtag
    5. Use Operations Console to view previous logs

    • At the bottom of the log, click on Open Operations Console. Alternatively, you can open the Classic Launchpad in Firefox and click on Operations Console.

    • The Operations console can be used to view logs for previous runs of the job. Switch to the Projects tab. In the left-hand navigation pane, expand Lab1 (the transformation project), followed by Jobs. Click on Job_1 (the job that was just saved/compiled/run). The Job Runs table at the bottom of the screen will be updated to reflect all the times that the job was run. Click on any job run in the list to select it and click View Details.

    • Go to the Log Messages tab and check the Full Messages checkmark to see the complete log for the job run. Clicking on an entry in the job log will display the log message details at the bottom.

    hashtag
    6. Scheduling the job

    DataStage Flow Designer provides the ability to run jobs as per a schedule.

    • Back in the DataStage Flow Designer where we were designing the job, click on the Schedule icon. In the modal that opens up, provide the frequency for running the job - whether you want to run it every Day, Week or Month. Next to it, provide the time of the day when the job should run. Click Schedule.

    Alternatively you can go up one level to the Jobs tab, and hover over your job's tile to see the kebab icon (⋮). Click on the icon and select Schedule to open up the scheduling modal where you can set up a schedule for your job.

    NOTE: You can only schedule your job once it has been successfully compiled.

    hashtag
    7. View output

    • The output file will be saved on the server. Switch to the server VM by clicking the first icon on the Environment VMs panel and selecting iis-server. Login as the root user with the password inf0Xerver.

    • CD to the location where you had stored the file. If you provided a path starting at "/", then it will be stored at that location in the server. Since we had only provided output.csv as the file path in the File connector, the file will be available in the Transformation project's folder, i.e.,

    CONGRATULATIONS!! You have completed this lab!

    age >= 18 and age < 120
    COMPANY_ID_NUM
  • COMPANY_ID_STR

  • REGISTRATION_NUM

  • NUMBER_OF_ACCTS

  • ACCOUNT_NUMBER

  • ACCOUNT_TYPE_X

  • Add COBOL copybook as a table definition
    Customize the job
    Compile, run, view output
    GitHub repoarrow-up-right
    Switch to iis-client
    DataStage Designer
    mainframe job
    COBOL definitions
    Import copybook
    file options
    record options
    load copybook
    find copybook
    specify fields
    add records
    add colums
    auto-match
    file name
    output file contents
    shows the data
    Switch to server VM
    in a terminal
    Compile and run the job
  • Use Operations Console to view previous logs

  • Scheduling the job

  • View output

  • Click on + Add to add another entry in the table. Edit this entry and update the Column name to Zip, SQL type to Integer, Extended to "" (blank) and Length to 6. Mark the column as Nullable. Click OK. In the modal that pops up, click Save.

    The Compare icon is used to compare this job with another job in the Project.

  • The View menu has the following options:

    1. View OSH code - which is available once the job has been successfully compiled.

    2. View log - which is available once the job has been run.

    3. Properties - based on what is selected in the job canvas before clicking on Properties, either the properties of the job or of a connector/stage/link are displayed.

  • The Schedule icon is used to set a schedule to run the job.

  • The Settings menu has a number of options such as

    1. Apply horizontal layout - which arranges all the connectors and stages in the canvas in a horizontal manner with data flowing from left to right.

    2. The ability to view/hide annotations, arrows, link names, and the type of the connector/stage.

    3. Smart palette - which applies smart logic based on usage patterns to reorder the items available in the palette. If disabled, the items in the palette are displayed in an alphabetical order.

    4. Smart stage suggestions - applies smart logic based on usage patterns to suggest the next stage that you might want to add to the job.

  • Column Name

    Data Type

    ProductID

    varchar(9)

    Zip

    Integer(6)

    Create a Transformation project
    Create the table definition
    Create the job
    Switch to iis-client
    Classic Launchpad
    Select DFD
    Log into DFD
    DFD is up
    Create project
    Project created
    Switch project
    Create table definition
    Complete table definition
    Create parallel job
    Job options
    Add file connector
    Add file connector 2
    Add filter stage
    Update file connector
    Update file connector 2
    Update filter properties
    Add peek stage
    Add remove duplicates stage
    Update remove duplicates properties
    Add file connector
    Update file connector
    Save compile
    Save compile error
    Run job
    View log
    View log 2
    Open Operations Console
    View job runs
    View job run log
    Schedule job
    Schedule job 2
    Switch to server VM
    CD to project folder
    cd /opt/IBM/InformationServer/Server/Projects/dstage1
    wget https://raw.githubusercontent.com/IBM/datastage-standalone-workshop/master/data/mainframe/example.bin
    01  RECORD.
        05  ID                        PIC S9(4)  COMP.
        05  COMPANY.
            10  SHORT-NAME            PIC X(10).
            10  COMPANY-ID-NUM        PIC 9(5) COMP-3.
            10  COMPANY-ID-STR
        05  METADATA.
            10  CLIENTID              PIC X(15).
            10  REGISTRATION-NUM      PIC X(10).
            10  NUMBER-OF-ACCTS       PIC 9(03) COMP-3.
    cd /opt/IBM/InformationServer/Server/Projects/<project-name>/
    cat mainframe.csv
    /opt/IBM/InformationServer/Server/Projects/<project-name>/