Copying data from source to target

You’ll primarily use Data Refinery to read data from a source location, refine that data, and then load the analytics-ready data to a target location. But you can also use Data Refinery to simply and securely copy data from a source to a target.

To copy data:

  1. From within a project, add the data that you want to copy. This creates a data asset in the project.

  2. Create a connection for the target, if it doesn’t already exist. Be sure to use credentials that have Write permission.

    Important: Not all connection types can be targets. Review restrictions in Connection types.

  3. From the project’s Assets page, select Refine from the data asset’s menu. Alternatively, click the data asset to see a preview and then click the Refine link.

  4. Click the Run Data Refinery flow icon in the Data Refinery toolbar.

  5. In the DATA REFINERY FLOW OUTPUT panel, click the Edit Output icon and save the Data Refinery flow output (target data set) to a connection.

    1. Optionally, edit the target name.

    2. Click Change Location and then Connections.

    3. Select the connection details, then click Save Location.

  6. If the target for the Data Refinery flow output is an existing relational database table or view, or a connected relational data asset, use the IMPACT TO EXISTING DATA SET drop-down to select what happens when the data set already exists in the target location:

    • Overwrite - Overwrite the rows in the existing data set with those in the Data Refinery flow output
    • Recreate - Delete the rows in the existing data set and replace them with the rows in the Data Refinery flow output
    • Insert - Append all rows of the Data Refinery flow output to the existing data set
    • Update - Update rows in the existing data set with the Data Refinery flow output; don’t insert any new rows
    • Upsert - Update rows in the existing data set and append the rest of the Data Refinery flow output to it

    For the Update and Upsert options, you’ll need to select the columns in the output data set to compare to columns in the existing data set. The output and target data sets must have the same number of columns, and the columns must have the same names and data types in both data sets.

    If you select a file in a connection as the target for your Data Refinery flow output, you can select one of the following formats for that file:

    • Avro
    • CSV
    • JSON
    • Parquet

  7. In the Edit output panel, click the checkmark to save the changes.

  8. Click Save and Run flow.

  9. Optionally, click View Flow to monitor the run’s status on the Data Refinery flow details page. When the Data Refinery flow completes, you can also view the target data set from that page.
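
The five IMPACT TO EXISTING DATA SET options in step 6 can be easier to compare as row-level operations. The following Python sketch is illustrative only: it is not Data Refinery code, and the function name apply_output, the mode strings, and the key_columns parameter are hypothetical. It assumes rows are dictionaries and that Update and Upsert match rows on the compare columns you select. Overwrite and Recreate both leave only the output rows here; the sketch does not model how the two options might differ inside the target database.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def _schema(rows: List[Row]) -> Dict[str, type]:
    # Column name -> Python type, taken from the first row (empty data set -> {}).
    return {name: type(value) for name, value in rows[0].items()} if rows else {}


def apply_output(existing: List[Row], output: List[Row], mode: str,
                 key_columns: Sequence[str] = ()) -> List[Row]:
    """Return the rows the target would hold after the flow output is applied."""
    if mode in ("overwrite", "recreate"):
        # Only the flow output remains in the target.
        return [dict(r) for r in output]
    if mode == "insert":
        # Append every output row, without matching against existing rows.
        return [dict(r) for r in existing] + [dict(r) for r in output]
    if mode in ("update", "upsert"):
        if not key_columns:
            raise ValueError("update and upsert need at least one compare column")
        # Both data sets must share column names and data types.
        if existing and output and _schema(existing) != _schema(output):
            raise ValueError("output and target columns do not match")

        def key(row: Row) -> tuple:
            return tuple(row[c] for c in key_columns)

        incoming = {key(r): r for r in output}
        # Replace each existing row that has a matching output row.
        result = [dict(incoming.get(key(r), r)) for r in existing]
        if mode == "upsert":
            # Upsert also appends output rows that matched nothing.
            existing_keys = {key(r) for r in existing}
            result += [dict(r) for r in output if key(r) not in existing_keys]
        return result
    raise ValueError(f"unknown mode: {mode}")


existing_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
output_rows = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]

updated = apply_output(existing_rows, output_rows, "update", ["id"])
upserted = apply_output(existing_rows, output_rows, "upsert", ["id"])
```

With this sample data, updated keeps two rows and changes only the row with id 2, while upserted also gains the row with id 3.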

Watch this video to see how to copy data to a target.

Figure 1. Copy data to a target
This video shows you how to copy data from a source to a target.