To optimize the running of flows, you can set up a cache on any nonterminal node. When you set up a cache on a node, the cache is filled with the data that passes through the node the next time you run the data flow. From then on, the data is read from the cache (which is stored temporarily) rather than from the data source.
Nodes with caching enabled are displayed with a special circle-backslash icon. When the data is cached at the node, the icon changes to a check mark.
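The fill-then-reuse behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's implementation: the `CachedNode` class, its `run` method, and the `source_reads` counter are hypothetical names invented for this example.

```python
from typing import Callable, List, Optional


class CachedNode:
    """Hypothetical stand-in for a nonterminal node with caching enabled."""

    def __init__(self, read_source: Callable[[], List[dict]]) -> None:
        self._read_source = read_source
        self._cache: Optional[List[dict]] = None  # empty until the first run
        self.source_reads = 0  # counts how often the data source is actually read

    @property
    def icon(self) -> str:
        # Mirrors the two icon states described in the text.
        return "check mark" if self._cache is not None else "circle-backslash"

    def run(self) -> List[dict]:
        if self._cache is None:
            # First run after enabling: read the source and fill the cache.
            self.source_reads += 1
            self._cache = list(self._read_source())
        # Later runs: serve the data from the cache.
        return self._cache


node = CachedNode(lambda: [{"id": 1}, {"id": 2}])
icon_before = node.icon
node.run()
node.run()
icon_after = node.icon
```

In this sketch the source is read only once even though the flow runs twice, which is the saving that a cache is meant to provide.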
To enable a cache
Hover over the node in your flow, then click the overflow menu and select the option to enable the cache.
You can turn off the cache at any time by disabling it.
Caching nodes in a database
For flows that run in a database, you can cache data mid-flow to a temporary table in the database rather than to the file system. When combined with SQL optimization, this may result in significant performance gains. For example, the output from a flow that merges multiple tables to create a data mining view may be cached and reused as needed. Performance can improve further when SQL is generated automatically for all downstream nodes, because those nodes can then read directly from the cached table in the database.
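As a rough illustration of caching a merged view in a temporary table, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The table names (`customers`, `orders`, `merge_cache`) and the sample rows are assumptions made up for this example, and the SQL that the product generates may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Mid-flow cache: materialize the merged view once as a temporary table.
conn.execute("""
    CREATE TEMP TABLE merge_cache AS
    SELECT c.id, c.region, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""")

# A downstream step, expressed as SQL, reads the cached table
# instead of repeating the join.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM merge_cache GROUP BY region ORDER BY region"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM merge_cache").fetchone()[0]
```

Any number of downstream queries can reuse `merge_cache` for the lifetime of the connection, so the join cost is paid only once.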
To take advantage of database caching, both SQL optimization and database caching must be enabled.
With database caching enabled, you can cache data at any nonterminal node, and the cache is created automatically in the database the next time the flow is run. If either database caching or SQL optimization is not enabled, the cache is written to the file system instead.
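The rule for where a cache ends up can be written as a small decision function. The function name `cache_target` and its boolean parameters are hypothetical; they only restate the condition above and do not correspond to any actual product setting names.

```python
def cache_target(database_caching: bool, sql_optimization: bool) -> str:
    """Return where a node cache would be written under the stated rule."""
    # Both options must be enabled for the cache to live in the database.
    if database_caching and sql_optimization:
        return "database"
    # Otherwise the cache falls back to the file system.
    return "file system"


both_enabled = cache_target(database_caching=True, sql_optimization=True)
only_caching = cache_target(database_caching=True, sql_optimization=False)
```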
To flush a cache
A circle-backslash icon next to a node indicates that its cache is empty. When the cache is full, the icon becomes a check mark. If you want to replace the contents of the cache, you must first flush the cache and then rerun the data flow to refill it.
Hover over the node in your flow, then click the overflow menu and select the option to flush the cache.
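The following sketch shows why a flush is needed before a cache can pick up changed source data. It is a toy model under stated assumptions: the `source` list, the `cache` dictionary, and the `run_flow` and `flush` functions are all hypothetical names, not part of the product.

```python
source = [1, 2, 3]  # stands in for the data source
cache = {}          # stands in for the node cache


def run_flow():
    # Fill the cache on the first run; reuse it on later runs.
    if "node" not in cache:
        cache["node"] = list(source)
    return cache["node"]


def flush():
    # Empty the cache so the next run reads the source again.
    cache.pop("node", None)


first = list(run_flow())
source.append(4)            # the source changes after the cache was filled
stale = list(run_flow())    # still served from the old cache contents
flush()
refreshed = list(run_flow())  # refilled from the updated source
```

Until `flush()` is called, the flow keeps returning the data captured when the cache was first filled.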