Data integration tutorial: Replicate data
Last updated: Nov 27, 2024

Take this tutorial to set up data replication between a source and target data store with the Data integration use case of the data fabric trial. Your goal is to use Data Replication to integrate the credit score information from the provider's Db2 on Cloud data source by setting up a near real-time, continuous replication feed with efficient change capture from the source database into Golden Bank's Event Streams instance. Event Streams is a high-throughput message bus built with Apache Kafka. It is optimized for event ingestion into IBM Cloud and event stream distribution between your services and applications. For more information about Event Streams, see the Learn more section.

Quick start: If you did not already create the sample project for this tutorial, access the Data integration sample project in the Resource hub.

The story for the tutorial is that Golden Bank needs to adhere to a new regulation that prevents it from lending to underqualified loan applicants. As a data engineer at Golden Bank, you need to provide access to the most up-to-date credit scores of loan applicants. These credit scores are sourced from a Db2 on Cloud database owned by an external provider and continuously delivered into Golden Bank's Event Streams hub. The data in the Event Streams hub is used by the application to look up credit scores for mortgage applicants and determine loan approval for qualified applicants.

The following animated image provides a quick preview of what you’ll accomplish by the end of the tutorial. Click the image to view a larger image.

Animated image

Preview the tutorial

In this tutorial, you will complete these tasks:

  • Task 1: Set up Event Streams
  • Task 2: View the credit score data
  • Task 3: Create a connection to your Event Streams instance
  • Task 4: Associate the Data Replication service with your project
  • Task 5: Set up data replication
  • Task 6: Run data replication
  • Task 7: Verify data replication

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.





Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, and then as you scroll through the tutorial, the video moves to picture-in-picture mode. Close the video table of contents for the best picture-in-picture experience. In picture-in-picture mode, you can follow the video as you complete the tasks in this tutorial. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.



Set up the prerequisites

Sign up for Cloud Pak for Data as a Service

You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Data integration use case.

  • If you have an existing Cloud Pak for Data as a Service account, then you can get started with this tutorial. If you have a Lite plan account, only one user per account can run this tutorial.
  • If you don't have a Cloud Pak for Data as a Service account yet, then sign up for a data fabric trial.

Watch the following video to learn about data fabric in Cloud Pak for Data.

This video provides a visual method to learn the concepts and tasks in this documentation.

Verify the necessary provisioned services

To preview this task, watch the video beginning at 01:29.

Important: Data Replication is available in the Dallas region only. If necessary, switch to the Dallas region before continuing.

Follow these steps to verify or provision the necessary services:

  1. In Cloud Pak for Data, verify that you are in the Dallas region. If not, click the region drop-down, and then select Dallas.
    Change region

  2. From the Navigation menu, choose Services > Service instances.

  3. Use the Product drop-down list to check whether a Data Replication service instance already exists.

  4. If you need to create a Data Replication service instance, click Add service.

    1. Select Data Replication.

    2. Select the Lite plan.

    3. Click Create.

  5. Wait while the Data Replication service is provisioned, which might take a few minutes to complete.

  6. Repeat these steps to verify or provision the following additional services:

    • watsonx.ai Studio
    • Cloud Object Storage
    • Event Streams - You might be prompted to log in to your IBM Cloud account.

Check your progress

The following image shows the provisioned service instances. You are now ready to create the sample project.

Provisioned services

Create the sample project

To preview this task, watch the video beginning at 02:19.

If you already have the sample project for this tutorial, then skip to Task 1. Otherwise, follow these steps:

  1. Access the Data integration tutorial sample project in the Resource hub.

  2. Click Create project.

  3. If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.

  4. Click Create.

  5. Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.

  6. Click the Assets tab to see the connections, connected data asset, and the notebook.

Note: You might see a guided tour showing the tutorials that are included with this use case. The links in the guided tour will open these tutorial instructions.

Check your progress

The following image shows the Assets tab in the sample project. You are now ready to start the tutorial.

Sample project




Task 1: Set up Event Streams

To preview this task, watch the video beginning at 03:05.

As part of the Prerequisites, you provisioned a new Event Streams instance. Now, you need to set up that service instance. Follow these steps to:

  • Create a topic to store the data replicated from the source data in Db2 on Cloud. The topic is the core of Event Streams flows. Data passes through a topic from producing applications to consuming applications.

  • Copy sample code that contains the bootstrap server information necessary to set up data replication.

  • Create credentials that you will use to create a connection to the service in the project.

  1. Return to the IBM Cloud console Resources list.

  2. Expand the Integration section.

  3. Click the service instance name for your Event Streams instance to view the instance details.

  4. First, to create the topic, click the Topics page.

    1. Click Create topic.

    2. For the Topic name, type golden-bank-mortgage.

    3. Click Next.

    4. In the Partitions section, accept the default value, and click Next.

    5. In the Message retention section, accept the default value, and click Create topic.

    6. Open a text editor, and paste the topic name golden-bank-mortgage into a new text file to use later.

  5. Next, back on the Topics page, click Connect to this service to retrieve the connection information.

    1. Copy the value in the Bootstrap server field. The bootstrap server is required when creating a connection to the Event Streams instance in your project.

    2. Paste the bootstrap server value into the same text file to use later.

    3. Click the Sample code tab.

    4. Copy the value in the Sample configuration properties field. You will use some properties from this snippet to connect securely to the service.

    5. Paste the sample code into the same text file to use later.

    6. Click the X to close the Connect to this service panel.

  6. Lastly, to create the credentials, click the Service credentials page.

    1. Click New credential.

    2. Accept the default name, or change it if you would prefer.

    3. For the Role, accept the default value of Manager.

    4. Expand the Advanced options section.

    5. In the Select Service ID field, select Auto Generate.

    6. Click Add.

    7. Next to the new credentials, click the Copy to clipboard icon.

    8. Paste the credentials into the same text file to use later.

Your text file should contain all of the following information:

TOPIC NAME: golden-bank-mortgage

BOOTSTRAP SERVER FIELD
broker-5-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-1-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-2-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-0-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-3-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093,broker-4-7w81scvsqh485hbz.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093

SAMPLE CODE
bootstrap.servers=broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="<APIKEY>";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=HTTPS
CREDENTIALS
{
  "api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "bootstrap_endpoints": "broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
  "iam_apikey_description": "Auto-generated for key crn:v1:bluemix:public:messagehub:us-south:a/a53b11fc95fcca4d96484d0de5f3bc3c:6b5a2cb2-74ef-432d-817f-f053873e7ed2:resource-key:96372942-5d26-4c59-8ca4-41ab6766ba91",
  "iam_apikey_name": "Service credentials-1",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/a53b11fc95fcca4d96484d0de5f3bc3c::serviceid:ServiceId-4773bed1-f423-43ea-adff-469389dca54c",
  "instance_id": "6b5a2cb2-74ef-432d-817f-f053873e7ed2",
  "kafka_admin_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
  "kafka_brokers_sasl": [
    "broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
    "broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
    "broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
    "broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
    "broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093",
    "broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093"
  ],
  "kafka_http_url": "https://pqny71x0b9vh7nwh.svc11.us-south.eventstreams.cloud.ibm.com",
  "password": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "user": "token"

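If you want to check the values in your text file before continuing, you can try them from any Python environment. The following is a minimal sketch, not part of the tutorial assets; it assumes that the kafka-python package is installed and that you replace the placeholder bootstrap server and API key with the values from your own text file.

# Optional check of the Task 1 values with kafka-python (pip install kafka-python).
# Replace the placeholder values with the values from your text file.
from kafka import KafkaConsumer

bootstrap_servers = "broker-0-xxxx.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093"  # BOOTSTRAP SERVER FIELD value
api_key = "<APIKEY>"  # the apikey (password) value from your service credentials

# Connect with the same settings shown in the sample configuration properties:
# SASL_SSL security, PLAIN mechanism, username "token", password set to the API key.
consumer = KafkaConsumer(
    bootstrap_servers=bootstrap_servers.split(","),
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="token",
    sasl_plain_password=api_key,
)

# List the topics visible to these credentials; golden-bank-mortgage should appear.
print(consumer.topics())
consumer.close()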
Check your progress

The following image shows the Topics page for your Event Streams instance in IBM Cloud. You are now ready to create a connection to the Event Streams instance in your project.

Topics page




Task 2: View the credit score data

To preview this task, watch the video beginning at 05:06.

The sample project includes a connection to the Db2 on Cloud instance where the source data is stored. Follow these steps to view the connection asset and the credit score data:

  1. Return to your Cloud Pak for Data as a Service browser tab. You will see the Data integration project. If you don't see the project, then follow these steps:

    1. From the Navigation menu, choose Projects > View all projects.

    2. Click the Data integration project to open it.

  2. On the Assets tab, click All assets.

  3. Locate the Data Fabric Trial - Db2 on Cloud - Source connection asset.

  4. Locate the CREDIT_SCORE connected data asset.

  5. Click the CREDIT_SCORE asset to see a preview. This data asset maps to the CREDIT_SCORE table in the BANKING schema in the provider's Db2 on Cloud instance. It includes information about the mortgage applicants such as ID, name, address, and credit score. You want to set up data replication for this data asset.

  6. Click the Data integration project name in the navigation trail to return to the project.
    Navigation trail

Check your progress

The following image shows the credit score data asset in the sample project. You are now ready to create a connection to the Event Streams service in this project.

Credit score data asset




Task 3: Create a connection to your Event Streams instance

To preview this task, watch the video beginning at 05:34.

To set up replication, you also need a connection to the Event Streams instance that you provisioned as part of the Prerequisites. Use the information that you gathered in Task 1. Follow these steps to create the connection asset:

  1. On the Assets tab, click New asset > Connect to a data source.

  2. Select the Apache Kafka connector, and then click Next.

  3. For the Name, type Event Streams.

  4. In the Connection details section, complete the following fields:

    • Kafka server host name: Paste the bootstrap server value from the text file you created in Task 1.
    • Secure connection: Select SASL_SSL.
    • User principal name: Paste the user value from the service credentials in your text file. This value is usually token.
    • Password: Paste the password value from the service credentials in your text file.
  5. Click Test connection.

  6. When the test is successful, click Create. If the test is not successful, verify the information you copied and pasted from your text file, and try again. If prompted to confirm creating the connection without setting location and sovereignty, click Create again.

  7. Click All assets to see the new connection.

Check your progress

The following image shows the Assets tab in the sample project with the new Event Streams connection asset. You are now ready to associate the Data Replication service with this project.

Event Streams connection asset




Task 4: Associate the Data Replication service with your project

To preview this task, watch the video beginning at 06:32.

To use the Data Replication service in your project, you need to associate your service instance with the project. Follow these steps to associate the Data Replication service with the Data integration project:

  1. In the Data integration project, click the Manage tab.

  2. Click the Services and integrations page.

  3. Click Associate service.

  4. Check the box next to your Data Replication service instance.

  5. Click Associate.

  6. Click Cancel to return to the Services and integrations page.

Check your progress

The following image shows the Services and Integrations page with the Data Replication service listed. You are now ready to set up data replication.

Associate service with project




Task 5: Set up data replication

To preview this task, watch the video beginning at 06:53.

Now you can create a Data Replication asset to start continuous data replication between the Db2 on Cloud source and the Event Streams target. Follow these steps to set up data replication:

  1. Click the Assets tab in the project.

  2. Click New asset > Replicate data.

  3. For the Name, type CreditScoreReplication.

  4. Click Source options.

  5. On the Source options page, select Data Fabric Trial - Db2 on Cloud - Source from the list of connections.

  6. Click Select data.

  7. On the Select data page, select the BANKING schema > CREDIT_SCORE table.

  8. Click Target options.

  9. On the Target options page, select Event Streams from the list of connections.

  10. In the Default topic field, paste the topic name created in Task 1, golden-bank-mortgage.

  11. Accept the default value for the rest of the fields, and click Review.

  12. Review the summary, and click Create.

Check your progress

The following image shows the CreditScoreReplication screen with replication stopped. You are now ready to run data replication.

Data replication asset




Task 6: Run data replication

To preview this task, watch the video beginning at 07:54.

After creating the Data Replication asset, you can run data replication and view information about the replication status. Follow these steps to run data replication:

  1. On the CreditScoreReplication screen, click the Run icon to start the replication process.

    If this is your first time running a Data Replication asset, you might be prompted to provide an API key. Data Replication assets use your personal IBM Cloud API key to run replication operations securely without disruption. If you want to use a specific API key, then click the Settings icon.

    • If you have an existing API key, click Use existing API key, paste the API key, and click Save.
    • If you don't have an existing API key, click Generate new API key, and then click Generate. Save the API key for future use, and then click Close.
  2. In the Event logs section, click the Refresh icon to see any new messages.

  3. After a few minutes, the message Completed initial synchronization for table "BANKING"."CREDIT_SCORE" displays in the Event logs section.

From this point forward, any changes to the BANKING.CREDIT_SCORE table in the Db2 on Cloud instance will be detected automatically and replicated to the target.
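For illustration only, the following sketch shows the kind of source change that replication captures. It assumes a hypothetical Db2 on Cloud database that you own with a table shaped like BANKING.CREDIT_SCORE (the tutorial's source database belongs to the provider, so you do not update it yourself), and it uses the ibm_db driver with placeholder connection values and column names.

# Hypothetical example: an update to a credit score in a Db2 table like BANKING.CREDIT_SCORE.
# The connection string, credentials, and column names are placeholders, not tutorial values.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=<db2-hostname>;PORT=50001;PROTOCOL=TCPIP;"
    "UID=<username>;PWD=<password>;SECURITY=SSL;",
    "",
    "",
)

# Data Replication detects this change in the source table and publishes it as a new
# message on the golden-bank-mortgage topic in Event Streams.
sql = "UPDATE BANKING.CREDIT_SCORE SET CREDIT_SCORE = 710 WHERE ID = 100001"
ibm_db.exec_immediate(conn, sql)
ibm_db.close(conn)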

Check your progress

The following image shows the CreditScoreReplication screen with replication running and messages in the Event log. You are now ready to monitor replication by watching the status of the replication asset and its events and metrics, and to verify that the data is being replicated.

Data replication asset running




Task 7: Verify data replication

To preview this task, watch the video beginning at 09:03.

You can use Python code to verify that the credit score data was replicated into Golden Bank's Event Streams hub. The sample project includes a Jupyter notebook containing the sample Python code. Follow these steps to edit and run the code in the notebook (a minimal sketch of comparable code appears after the steps):

  1. Click the Data integration project name in the navigation trail to return to the project.
    Navigation trail

  2. Click the Assets tab.

  3. Click All assets.

  4. Click the Overflow menu at the end of the row for the Monitor data replication notebook, and choose Edit.

  5. Run the first code cell to install the kafka-python library.

  6. Edit the second cell using the information you saved to a text file from Task 1.

    • topic: Paste the topic name. This value is golden-bank-mortgage.

    • bootstrap_servers: Paste the bootstrap server value from your text file, which should look similar to this value:

      broker-5-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,
      broker-0-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,
      broker-2-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,
      broker-1-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,
      broker-3-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093,
      broker-4-pqny71x0b9vh7nwh.kafka.svc11.us-south.eventstreams.cloud.ibm.com:9093
      
    • sasl_plain_username: Paste the user value from the service credentials in the text file. This value is usually token.

    • security_protocol: Paste the security.protocol value from the text file. This value is usually SASL_SSL.

    • sasl_mechanism: Paste the sasl.mechanism value from the text file. This value is usually PLAIN.

    • sasl_plain_password: Paste the password value from the service credentials in the text file.

  7. After completing all of the values, run the code in the second cell to provide the connection information for your Event Streams instance.

  8. Run the code in the third cell to consume records from your Event Streams topic.

  9. Run the code in the fourth cell to print the messages captured into your consumer object.

  10. Review the output showing the content of the messages delivered by replication into your Event Streams topic. Compare that to the CREDIT_SCORE data asset you viewed in Task 2.

  11. Click File > Save to save the Jupyter notebook with your stored credentials.
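The notebook in the sample project is the definitive version of this code. If you want to run a similar check outside the notebook, the following is a minimal sketch of the same pattern with kafka-python, using placeholder values in place of the ones from your text file.

# Minimal sketch of the notebook's pattern: consume replicated records from the topic.
# Replace the placeholder values with the ones you saved in Task 1.
from kafka import KafkaConsumer

topic = "golden-bank-mortgage"
bootstrap_servers = "broker-0-xxxx.kafka.svc04.us-south.eventstreams.cloud.ibm.com:9093"
sasl_plain_username = "token"
sasl_plain_password = "<APIKEY>"

# Read the topic from the beginning, and stop after 10 seconds with no new messages
# so that the loop does not run indefinitely.
consumer = KafkaConsumer(
    topic,
    bootstrap_servers=bootstrap_servers.split(","),
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username=sasl_plain_username,
    sasl_plain_password=sasl_plain_password,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,
)

# Each message value contains a replicated CREDIT_SCORE record.
for message in consumer:
    print(message.value)
consumer.close()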

Check your progress

The following image shows the Monitor data replication notebook after running the code successfully.

Jupyter notebook to monitor data replication



As a data engineer at Golden Bank, you set up continuous access to the most up-to-date credit scores of loan applicants by configuring data replication between the CREDIT_SCORE table in the Db2 on Cloud source database and a topic in Event Streams. If there are changes to an applicant's credit score, then Golden Bank's mortgage approvers will have near real-time access to those changes.

Cleanup (Optional)

If you would like to retake the tutorials in the Data integration use case, delete the following artifacts.

  • Data Replication and Event Streams service instances:
    1. From the Navigation menu, choose Services > Service instances.
    2. Click the Action menu next to the service name, and choose Delete.
  • Data integration sample project: Delete a project.

Next steps

Learn more

Parent topic: Use case tutorials
