Specifying the format of your data in Data Refinery
You can specify format options for CSV or delimited files.
When your data is read into Data Refinery, it should look like a well-formatted spreadsheet. If it doesn’t display in tabular form or doesn’t look as you’d expect, go to the Data tab, scroll down to the SOURCE FILE information at the bottom of the page, and click the “Specify data format” icon.
Tip: Applying the format options might be an iterative process. Inspect your data after you apply an option. For example, changing the field delimiter to a semicolon might not work if the required delimiter is a pipe symbol.
Also, when you open a file in Data Refinery, the Convert column type operation is automatically applied as the first step if any column’s data type is inferred as a non-string data type. If a step shows an error, click an earlier step that has no error to put Data Refinery into snapshot view, and then change the format options.
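To see why a wrong field delimiter produces data that doesn’t look tabular, the following minimal sketch uses Python’s standard csv module, not Data Refinery itself. The sample text and the helper name column_counts are hypothetical and exist only for illustration. It parses the same text with several candidate delimiters and reports how many fields each row yields.

```python
import csv
import io

# Hypothetical sample: pipe-delimited text, as in the tip above.
sample = "id|name|city\n1|Ana|Lisbon\n2|Ben|Oslo\n"

def column_counts(text, delimiter):
    """Return the set of per-row field counts when parsing with `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {len(row) for row in reader}

# A wrong delimiter collapses every row into a single field;
# the right one yields the same field count (greater than 1) on every row.
for candidate in [";", ",", "|"]:
    print(repr(candidate), column_counts(sample, candidate))
```

A delimiter that gives one consistent field count greater than 1 across all rows is a reasonable candidate, which mirrors the try-then-inspect loop described in the tip.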
Use the following options to ensure that Data Refinery can correctly read your data:
- Indicate whether the first row of your data contains column headers.
  - If your data doesn’t contain column headers, Data Refinery adds them so that you can use them in cleansing and shaping operations.
- Specify the character encoding for the data source (UTF-8 or SJIS).
- Field delimiter
  - Identify the character that separates each field or column value from the next value.
- Quote character
  - Identify the character that encloses the field values. For example, CSV files typically enclose strings in double quotation marks.
  - None means no quote character.
- Escape character
  - Identify the character that’s used to escape other characters. For example, backslashes ( \ ) are commonly used as escape characters. Escaping marks a character (such as a double quotation mark) as part of a string value instead of as a quote character or delimiter.
  - None means no escape character.
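The options above correspond to settings that most delimited-file parsers expose. As a rough illustration only, the following sketch uses Python’s standard csv module to show how each option changes the way the same bytes are read. It is not how Data Refinery is implemented. The function read_table, its parameters, the sample data, and the generated header names (COLUMN1, COLUMN2) are all assumptions made for this example.

```python
import csv
import io

# Hypothetical raw file contents: UTF-8, semicolon-delimited, strings in
# double quotation marks, backslash escaping an embedded quotation mark.
raw = 'name;note\n"Ana";"says \\"hi\\"; then leaves"\n'.encode("utf-8")

def read_table(data, encoding="utf-8", has_header=True,
               delimiter=",", quotechar='"', escapechar=None):
    """Parse delimited bytes into (header, rows) using the given format options."""
    text = data.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
        doublequote=False,
        # A quotechar of None plays the role of "None means no quote character".
        quoting=csv.QUOTE_NONE if quotechar is None else csv.QUOTE_MINIMAL,
    )
    rows = list(reader)
    if has_header:
        return rows[0], rows[1:]
    # No header row: generate placeholder names (hypothetical naming scheme).
    header = [f"COLUMN{i + 1}" for i in range(len(rows[0]))]
    return header, rows

header, rows = read_table(raw, delimiter=";", escapechar="\\")
print(header)
print(rows)
```

With the semicolon delimiter, double quotation mark, and backslash escape character set, the second field keeps both its embedded quotation marks and its embedded semicolon as part of one string value.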
Click Apply to apply the format specification to your data and return to Data Refinery.