Updates for Data Refinery July 12
Data Refinery introduces many enhancements on July 12, 2019. Mandatory migration steps for scheduled Data Refinery flows and important FAQs are listed here.
Moving scheduled Data Refinery flows to jobs
Effective July 12, 2019, there is a new way to run a Data Refinery flow. Instead of scheduling a Data Refinery flow directly, you run it as a job through the new Jobs user interface. From the project’s new Jobs tab, you can view all the jobs in a project. Click New job to create a job. Alternatively, create a job in Data Refinery. For more information about jobs, see Jobs in a project.
Important: You must manually move your currently scheduled Data Refinery flows to the new Jobs interface before August 12, 2019.
Re-create your scheduled Data Refinery flows as jobs
1. View the details of your currently scheduled Data Refinery flow. On the Project Assets page, select a Data Refinery flow, and then click View schedule (deprecated) from the ACTIONS menu.
2. Under the Runs section, go to the Schedule tab. Make note of the schedule details.
3. Go back to the Project Assets page, select the Data Refinery flow, and then click Create job from the ACTIONS menu.
4. Enter the details for the job, including the job name, runtime, and schedule.
5. Click Create.
6. Delete the currently scheduled Data Refinery flow:
   - On the Project Assets page, select the Data Refinery flow, and then click View schedule (deprecated) from the ACTIONS menu.
   - Under the Runs section, go to the Schedule tab. Click the Delete icon to delete the schedule.
Q: How do I know which of my Data Refinery flows have scheduled runs?
A: There is no consolidated view of scheduled Data Refinery flows. You must open each Data Refinery flow individually, view its schedule details, and create a job for it. (The new Jobs interface gives you a view of all jobs in one place.)
Q: What happens if I do not move the scheduled Data Refinery flow run to a job before August 12, 2019?
A: The user interface for scheduled Data Refinery flow runs will be removed on August 12, 2019. After that date, you will not have access to the details for scheduled Data Refinery flow runs, and scheduled runs will no longer occur. Only the jobs in the Jobs interface will be available.
Q: What happened to the “Add Schedule” and “Edit Schedule” links in Data Refinery?
A: The “Add Schedule” and “Edit Schedule” links have been removed. You can use the Jobs selections on the Data Refinery toolbar to save and create a job or save and view jobs.
Q: Where did the Run icon in Data Refinery go?
A: The Run icon has been removed. To run a Data Refinery flow, you save and create a job, which you can run immediately or schedule for later.
Q: In Data Refinery, when I click Edit from the Details tab, a page opens with DATA REFINERY FLOW DETAILS, where I can select a runtime environment. But I can also select the runtime in the Create a job user interface. What’s the difference?
A: The runtime environment that you select in Data Refinery is the default runtime environment for that Data Refinery flow. You can change the runtime environment when you create the job. If you do not make any changes, the default runtime environment is Default Data Refinery XS. For information about runtime environments, see Data Refinery environments.
Q: Where did the “None - Use Data Refinery Default” runtime environment go?
A: The “None - Use Data Refinery Default” runtime environment is discontinued. It is replaced by Default Data Refinery XS, which is HIPAA ready.
Q: How do I know which runtime environment to select when I create a job?
A: The Default Data Refinery XS runtime environment is best for Data Refinery flows that operate on small data sets. If you have a Data Refinery flow with a large data set, use a Spark runtime environment. For information about runtime environment selections, see Data Refinery environments.
Q: I used to be able to change the source data asset on the Summary page. The icon is missing. Where did it go?
A: Change source has moved to the Steps interface in Data Refinery. Click the edit icon next to Data Source to choose a different source data asset.
As before, for best results, the new data set should have a schema that is compatible with the original data set (for example, the same column names, number of columns, and data types). If the new data set has a different schema, operations that don’t work with the new schema will show errors. You can edit or delete those operations, or change the source to one with a more compatible schema.
Q: What happened to the Refine option under the ACTIONS menu for a Data Refinery flow on the project’s Assets page?
A: The Refine option has been removed. On the project’s Assets page, click the Data Refinery flow name to go directly to Data Refinery.
Q: I used to be able to run a Data Refinery flow from the project’s Assets page by selecting the Data Refinery flow, then selecting the Run option from the ACTIONS menu. Now what do I do?
A: You now run a Data Refinery flow as a job. On the project’s Assets page, select the Data Refinery flow. On the ACTIONS menu, select Create job. Enter the details for the job, and then click Create and Run. If a job already exists for that Data Refinery flow, click View job or View jobs instead.
Q: Why do I see an active environment runtime for Data Refinery when I open the project’s Environments tab?
A: One active environment runtime is started for each project-user combination when you open Data Refinery.
Important: Active runtimes for Data Refinery consume capacity unit hours (CUH). You can stop the runtime by clicking Stop from the ACTIONS menu. The runtime also stops automatically after one hour of inactivity.