Copying data from source to target

You’ll primarily use Data Refinery to read data from a source location, refine that data, and then load the analytics-ready data to a target location. But you can also use Data Refinery to securely copy data from a source to a target.

These instructions are for copying data from a data asset in a project to a target data source defined by a connection. You can also copy data in the other direction, from a source that is defined by a connection to a data asset in the project, or between two connections. You can also copy data from one data asset in a project to another data asset in the same project.

To copy data:

  1. From within a project, add the data that you want to copy. This creates a data asset in the project.

  2. Create a connection for the target, if it doesn’t already exist. Be sure to use credentials that have Write permission.

    Important: Not all connection types can be targets. Review restrictions in Connection types.

  3. From the project’s Assets page, select Refine from the data asset’s menu. Alternatively, click the data asset to see a preview and then click the Refine link.

  4. In the Details side panel, click Edit.

  5. In the DATA REFINERY FLOW OUTPUT panel, click the Edit icon.

  6. Click Change Location.

  7. Click Connections, and then drill down to the desired location.

  8. Click Save Location.

  9. If you select an existing relational database table or view, or a connected relational data asset, as the target for the Data Refinery flow output, use the IMPACT TO EXISTING DATA SET drop-down to select what to do if the data set already exists in the target location:

    • Overwrite - Overwrite the rows in the existing data set with those in the Data Refinery flow output
    • Recreate - Delete the rows in the existing data set and replace them with the rows in the Data Refinery flow output
    • Insert - Append all rows of the Data Refinery flow output to the existing data set
    • Update - Update rows in the existing data set with the Data Refinery flow output; don’t insert any new rows
    • Upsert - Update rows in the existing data set and append the rest of the Data Refinery flow output to it

    For the Update and Upsert options, you’ll need to select the columns in the output data set to compare to columns in the existing data set. The output and target data sets must have the same number of columns, and the columns must have the same names and data types in both data sets.

    If you select a file in a connection as the target for your Data Refinery flow output, you can select one of the following formats for that file:

    • Avro
    • CSV
    • JSON
    • Parquet

  10. Optional: Change the target data set name.

  11. In the Edit output panel, click the Save checkmark.

  12. Click Done.

  13. To run the Data Refinery flow, create a job for it. On the Data Refinery flow toolbar, click either Save and create a job or Save and view jobs.
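
The five IMPACT TO EXISTING DATA SET options in step 9 can be easier to compare side by side. The following is a minimal sketch that models them on in-memory rows. It is not Data Refinery code or its API: the names apply_output, check_compatible, and key_columns are hypothetical, and the compatibility check only compares the first row of each data set. Because the descriptions in step 9 say that Overwrite and Recreate both leave the target holding only the output rows, this simple model treats them the same.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def check_compatible(output: List[Row], existing: List[Row]) -> None:
    """Raise ValueError unless both data sets have the same column names and types.

    Toy check (assumption): only the first row of each data set is inspected.
    """
    if not output or not existing:
        return  # nothing to compare in this simplified model
    out_schema = {name: type(value) for name, value in output[0].items()}
    tgt_schema = {name: type(value) for name, value in existing[0].items()}
    if out_schema != tgt_schema:
        raise ValueError(f"schema mismatch: {out_schema} vs {tgt_schema}")


def apply_output(
    existing: List[Row],
    output: List[Row],
    mode: str,
    key_columns: Sequence[str] = (),
) -> List[Row]:
    """Return the rows the target would hold after writing `output` in `mode`.

    `key_columns` stands in for the columns you select to compare for
    Update and Upsert. Neither input list is modified.
    """
    check_compatible(output, existing)

    if mode in ("overwrite", "recreate"):
        # Both end with the target holding only the output rows.
        return [dict(row) for row in output]

    if mode == "insert":
        # Append every output row to the existing rows.
        return [dict(row) for row in existing] + [dict(row) for row in output]

    if mode in ("update", "upsert"):
        if not key_columns:
            raise ValueError("update and upsert need at least one key column")

        def key(row: Row) -> tuple:
            return tuple(row[column] for column in key_columns)

        result = [dict(row) for row in existing]
        position = {key(row): i for i, row in enumerate(result)}
        for row in output:
            k = key(row)
            if k in position:
                # Matching row: replace it with the output row.
                result[position[k]] = dict(row)
            elif mode == "upsert":
                # No match: only Upsert appends the new row.
                position[k] = len(result)
                result.append(dict(row))
        return result

    raise ValueError(f"unknown mode: {mode}")
```

For example, with existing rows for ids 1 and 2 and output rows for ids 2 and 3, calling apply_output with mode "update" and key_columns ["id"] changes only the row for id 2, while mode "upsert" also appends the row for id 3.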

Watch this video to see how to copy data to a target.

Figure 1. Copy data to a target
This video shows you how to copy data from a source to a target.