Specifying the format of your data in Data Refinery

When your data is read into Data Refinery, it should look like a well-formatted spreadsheet. If it doesn’t display in tabular form or doesn’t look as you’d expect, go to the Data tab, scroll down to the SOURCE FILE information at the bottom of the page, and click the Specify data format icon. Adjusting the default data format specification can help Data Refinery read your data correctly.

To specify the format of your data:

  1. Indicate whether the first row of your data contains column headers.

    If your data doesn’t contain column headers, Data Refinery will add them so that you can use them in cleansing and shaping operations.

  2. Select the appropriate character encoding for your data source. For example, CSV files are often UTF-8 encoded.

  3. Identify the character that separates each field or column value from the next value. For example, CSV files are often comma-delimited.

  4. Identify the character that encloses string values. For example, CSV files typically enclose strings in double quotation marks.

  5. Identify the character that’s used to escape other characters. For example, the backslash ( \ ) is commonly used as an escape character. Escaping marks a character that would otherwise have a special meaning (such as a double quotation mark) as a literal part of a string value.

  6. Click Apply to apply the format specification to your data and return to Data Refinery.
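
To see how these settings interact, it can help to reproduce them outside Data Refinery. The following is a minimal sketch that uses Python’s standard csv module, not Data Refinery itself. The sample data, the read_table helper, and the generated COLUMN1, COLUMN2, ... header names are all hypothetical; Data Refinery may name generated headers differently.

```python
import csv
import io

# Hypothetical file content: semicolon-delimited, strings in double
# quotation marks, backslash used to escape embedded quotation marks.
raw_bytes = (
    'id;name;comment\n'
    '1;"Ana";"said \\"hi\\""\n'
    '2;"Bo";"uses; semicolons"\n'
).encode("utf-8")


def read_table(raw, *, has_header, encoding, delimiter, quotechar, escapechar):
    """Parse delimited bytes using an explicit format specification.

    Returns a (header, rows) pair. This helper is illustrative only.
    """
    text = raw.decode(encoding)            # step 2: character encoding
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,               # step 3: field separator
        quotechar=quotechar,               # step 4: string enclosure
        escapechar=escapechar,             # step 5: escape character
        doublequote=False,                 # quotes are escaped, not doubled
    )
    rows = list(reader)
    if not rows:
        return [], []
    if has_header:                         # step 1: first row holds headers
        return rows[0], rows[1:]
    # No header row: generate placeholder names (assumed naming scheme).
    header = [f"COLUMN{i + 1}" for i in range(len(rows[0]))]
    return header, rows


header, rows = read_table(
    raw_bytes,
    has_header=True,
    encoding="utf-8",
    delimiter=";",
    quotechar='"',
    escapechar="\\",
)
print(header)
for row in rows:
    print(row)
```

With the delimiter set to a semicolon, the value "uses; semicolons" stays in one column because it is enclosed in quotation marks, and the escaped quotation marks around hi are kept as part of the string. Parsing the same bytes with a comma delimiter would place each line in a single column, which is the kind of non-tabular display that changing the format specification is meant to correct.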

Important: You can specify the data format only before you start to refine your data. After you apply the first operation to a data set, the Specify data format icon is disabled. If you need to specify the format after you’ve applied one or more operations, you must first undo or delete those operations. The Specify data format icon is re-enabled only when the data set is restored to its original state.