0 / 0
Tutorial: Get started with Data Replication
Tutorial: Get started with Data Replication

Tutorial: Get started with Data Replication

This getting started tutorial shows you how to set up a sample Data Replication to move data between an Amazon RDS for PostgreSQL database and Db2 Warehouse. It takes approximately 2 hours to complete.

Before you begin

You need an Amazon RDS for PostgreSQL database to complete this tutorial. If you do not have an Amazon RDS for PostgreSQL database, create one by completing the following steps.

Create and connect to a PostgreSQL database

Follow steps 1-4 provided in the AWS tutorial to create a database named myDatabase with master username masterUsername, download SQL Workbench for the client, and connect to the database.

To create a new AWS account, you need to provide a credit card number. Select a support plan that is appropriate for your needs. Support plan Basic support - free can be used for this tutorial. When your AWS account is ready, return to Step 1.

If you click the link in Step 1 and it directs you to the Amazon Web Services sign-in page, review the AWS Customer Agreement, Privacy Notice, and Cookie Notice. If you agree with the terms, then sign in as Root with your existing AWS account or click Create a new AWS account.

Configure the PostgreSQL database for replication

To enable PostgreSQL to act as a Data Replication source, modify the database parameters to enable replication.

  1. Create a DB parameter group. Ensure that you have an existing parameter group named default.postgresversion. Create a new parameter group, selecting parameter group family postgresversion, where the version matches the existing parameter group version.

  2. Associate a DB parameter group with your database instance.

  3. Modify parameters in the DB parameter group

    1. Set rds.logical_replication to 1. This enables logical replication.

    2. Set max_replication_slots and max_wal_sender to the number of concurrent Data Replication assets that you want to run. The default values of 10 are sufficient for this tutorial.

    3. Save the changes.
  4. Apply the changes. Open the RDS console, in the navigation pane select Databases, select your database instance, and from the Actions menu select Reboot.

Configure security to allow connections from IBM Cloud

The database that was created in the earlier steps is publicly accessible but does not allow connections from IBM Cloud. In this step, you will grant access for IBM Cloud to connect to the database.

  1. Open the RDS console. In the navigation pane select Databases, select your database instance, and click the Connectivity & security link.

  2. Under heading VPC security groups, click the link for the active security group.

  3. Scroll down and click Edit inbound rules.

  4. Click Add rule. Select Type PostgreSQL, Source Anywhere.

  5. Click Save rules. This will also modify the Outbound rules accordingly.

Install the PostgreSQL database client of your choice

Now, create a PostgreSQL schema and a table to replicate. To do this, you need a PostgreSQL database client. If you have a preferred database client already installed, you can skip this step.

Some recommendations:

  • SQL Workbench - The AWS tutorial section Download a SQL Client walks you through installing the SQL Workbench database client and PostgreSQL JDBC driver.
  • dbVisualizer - Another popular client is dbVisualizer.
  • dBeaver - An alternative client is dBeaver.

Create sample PostgreSQL tables

Next, create tables to replicate. The following steps show you how to create a schema with two tables with referential integrity. If you have existing tables that you want to use, you can skip this section.

These steps assume the use of SQL Workbench.

  1. In the Statement panel, copy and paste the following SQL statements.

     CREATE SCHEMA sales;
     CREATE TABLE sales.rewards_members (member_id integer not null,
       email_address varchar(128) not null,
       total_points integer not null,
       phone_number varchar(30),
       PRIMARY KEY (MEMBER_ID));
    
     CREATE TABLE IF NOT EXISTS sales.transactions (transaction_id SERIAL PRIMARY KEY,
         member_id integer REFERENCES sales.rewards_members (member_id),
         points_awarded integer default 0,
         points_redeemed integer default 0,
         created_at TIMESTAMP DEFAULT Now());    
    
     INSERT INTO sales.rewards_members (member_id,
       email_address,
       total_points,
       phone_number)
     VALUES (1,
       '[email protected]',
        100, 
        12345678);
    
     INSERT INTO sales.transactions(points_awarded, points_redeemed, member_id) 
     VALUES (83, 25, 1);
    
     COMMIT;
    
  2. Run the SQL statements. Using SQL workbench, click the play button.

  3. Next, type in the following query, highlight it and run it. Using SQL Workbench, click the green play button again (highlighting runs the current instruction only):

     SELECT * FROM sales.transactions;
    

    The output will be the row you inserted.

  4. Check that your PostgreSQL user ID has the proper privileges for replication. Using SQL Workbench, type in this query, highlight it, and run it by clicking the green play button.

     SELECT setting FROM pg_settings WHERE NAME='rds.logical_replication';
    
  5. If your PostgreSQL database instance has the proper privileges, the setting column will have value on. If the value is not on, return to the steps in Configure the PostgreSQL database for replication.

You are now ready to create a Data Replication service instance, associate the service, create and use connections, and create Data Replication assets.

Start using Data Replication

Log in to IBM Cloud

Log in to IBM Cloud. If you do not have an IBMid, please Create an account and then log in.

Create a Data Replication service instance

Before setting up a Data Replication asset, you need a Data Replication service.

  1. In the search bar at the top of the page, search and select Data Replication.

  2. The Lite plan is selected for you. If you are prompted to select a region, select Global.

  3. Enter a service name and click Create.

For returning users: If you have recently deleted a Lite instance and then receive a 400 - Only one free environment is allowed per resource group error message when creating a new environment in a new Lite instance, you need to finish deleting the original Lite instance. See ibmcloud resource reclamations and follow the reclamation-delete instructions.

Your service is set up.

Activate Cloud Pak for Data as a Service

If you have signed in to Cloud Pak for Data as a Service before, clicking the Launch tool button will take you to the Cloud Pak for Data as a Service landing page.

If you have not signed in to Cloud Pak for Data as a Service before, clicking the Launch tool button will prompt you to select an account. Select your individual IBM id (rather than a company account). This activates Cloud Pak for Data as a Service on your account. Cloud Pak for Data as a Service has the building block microservices that the Data Replication service will use later.

Complete the following steps:

  1. If prompted to pick your region for services and data, select Dallas.
  2. From the Cloud Pak for Data navigation menu, choose Services > Services catalog.
  3. Select Cloud Object Storage.
  4. Select the free tier. Many services, including Data Replication, will use a small amount of cloud object storage.
  5. Click Continue. Cloud Pak for Data as a Service will deploy, then take you to the Cloud Pak for Data as a Service landing page.

If you experience difficulties in this step, there are additional videos and details in Signing up for Cloud Pak for Data as a Service.

Create a new project

  1. From the Cloud Pak for Data as a Service homepage, click New project.

  2. Click Create an empty project.

  3. Enter RewardsProgram as the project name.

  4. Click Create.

  5. Click the Assets tab. In a later step, you will add connection assets and a Data Replication asset.

Now that you have a project created, configure the project to use the Data Replication service you created earlier by associating the Data Replication service with the project. The project will then know which Data Replication service to use.

The following step shows you how to associate the Data Replication service to your project.

Associate Data Replication

  1. In the RewardsProgram project, click the Manage tab.

  2. Under Services and integrations, click Associate service > New service.

  3. Scroll down or use the search bar to select Data Replication.

  4. Click Create, then close the dialog.

The project is now configured to enable Data Replication.

Create connections

After creating the project, you will see your project's dashboard. To define a Data Replication asset, you need to create connections to the database source you want to replicate from and the target database that the replicated data will go.

The following steps will show you how to create a source connection and a target connection.

Create a source Connection

  1. Click the Assets tab.

  2. Click New asset and select Connection.

  3. Select Amazon RDS for PostgreSQL connection. Enter the following information:

    • Name: Enter source-aws-postgres.

    • Database: Enter myDatabase as the database name, as was determined when you created a PostgreSQL database.

    • Hostname: To find the hostname, sign in to the AWS Console, navigate to Databases, select your database instance, and copy the Endpoint value listed on the Connectivity & security tab.

    • Port: In this tutorial, it is the database default, 5432.

      Under Credentials, complete the following:

    • Username: In this tutorial, it is masterUsername.

    • Password: In this tutorial, it is the value you configured when creating the database service.

    • Checkmark Port is SSL-enabled.

    • Uncheck Validate the SSL certificate.

  4. Click Test connection. If this is not successful, check that the credentials and database were entered correctly and test again.

  5. Click Create.

Create a target Connection

  1. Ask your Db2 Warehouse administrator to add a user for you. Request that it be of type Add user, not of type Add IBMid user, with user privilege User (Administrator is not required). Request the Db2 Warehouse hostname, port number and database name.

  2. Click New asset and select Connection.

  3. Select the Db2 Warehouse connection. Name it target-db2-warehouse.
    Database: The default database name is BLUDB. Replace BLUDB with the database name provided by your Db2 administrator.

    Hostname: [as provided by your Db2 administrator]

    Port: The default port is 50001.

    Username: [as provided by your Db2 administrator]

    Password: [as provided by your Db2 administrator]

    No API keys are needed

  4. Checkmark Port is SSL-enabled.

  5. Click Test connection. If this is not successful, check that the credentials and database were entered correctly and test again.

  6. Click Create.

Your source and target connections are now ready. Next, create a Data Replication asset and configure it.

Create a Data Replication asset

  1. Click New asset.

  2. Select the Data Replication asset.

  3. Provide a name and optionally a description. Use postgres-to-db2 for the name.

  4. The Copy business goal is already selected for you.

  5. Click Connections at the bottom to move to next step. Notice that the Source connection is set to source-aws-postgres automatically, and the Target connection is set to target-db2-warehouse.

  6. Click Select data to move to the next step. After all the schemas from the source connection are loaded, select the sales schema. You can either select the schema or select specific tables.

  7. Click Target options. If the target schema is set to Default, Data Replication will create a schema with the same name as the source schema. You have the option to specify the target schema.

  8. Click Review to review your selections.

  9. Click Create.

The replication is now defined and ready to run. The following steps will show you how to start using Data Replication.

Start replication

  1. From the RewardsProgram project, click the Assets tab.

  2. Click the Data Replication asset.

  3. Click the run icon.

  4. The first time you click run on a Data Replication asset, an API key configuration dialog will appear. This API key updates information in the project.

    Creating an API key ensures that the user has the right privileges to run replication and that unauthorized users cannot stop, start, or edit data replication assets.

    If the authorized user is removed from the account, the API key is no longer valid and the replication stops. Other users with access can open the same Data Replication asset and restart replication.

    1. Click Generate an API key.
    2. A Name and Description are provided for you. Edit these if you wish.
    3. Click Generate and run.
    4. To save your API key, click the Copy icon, and paste into the password manager software of your choice.
    5. Click Close.

The replication status will change to starting and you can see progress in the event log. The status will update to running once the replication has completed initialization and started replicating data.

The replication will continue to run until you stop it. Any changes to the selected source schema tables will be replicated in the target database tables.

Monitor replication

You can view the status of the Data Replication asset on the details page. There, you can view a summary of the source and target connections, schema mappings, job details, event logs and metrics.

Clicking View all replications takes you to the Replication dashboard. Click on the asset to return to its details page.

You can add a connected data asset to your project to view tables from the target database. This will let you verify that the data is being replicated. The following steps show you how to create a connected data asset to view the target database tables.

Verify the target is receiving data updates from the source

  1. In the project RewardsProgram, click the Assets tab.

  2. Click New asset and select Connected data.

  3. Click Select source.

  4. Select target-db2-warehouse to be the source.

  5. Filter on your schema SALES*.

  6. Select table TRANSACTIONS.

  7. Click Select.

  8. Name your connected data asset target-db2-transactions, to match the table name.

  9. Click Create.

  10. On the project Assets tab, click target-db2-transactions to open it. It will show a preview with the first 1000 rows.

  11. Observe that the CREATED_AT column shows the timestamp of when the row was created on the source PostgreSQL database.

  12. Click Refresh to see the data update on the target Db2 Warehouse database.

Insert data into Amazon RDS for PostgreSQL and watch it replicate to Db2

Now let's replicate data.

  1. Go back to the PostgreSQL database client you used in Step 5.
  2. In the Statement panel, copy and paste the following SQL statements.

     INSERT INTO sales.transactions(points_awarded, points_redeemed, member_id) 
     VALUES (10, 70, 1);
    
     COMMIT;
    
  3. If you are using SQL Workbench, run the query by clicking the play button at the upper-left.

  4. Click Refresh on target-db2-transactions to see the data update on the target Db2 Warehouse database.

Congratulations! You have completed the basic tasks for the beta version of Data Replication and your replication is now running.

Pausing or stopping your replication

You have the option to pause or stop your replication.

Pausing your replication If you would like to temporarily stop your replication, click the pause button after clicking on your project's Data Replication asset. Pausing your replication acts as a bookmark, allowing you to resume the replication at the stage it was last left.

Stopping your replication If you choose to completely stop your Data Replication, it will permanently delete all running Kubernetes clusters (microservices). Stopping then starting your replication is the same as setting up a brand new data replication asset. Click the stop button after clicking on your project's data relication asset.

Parent topic: Data Replication (beta)