
Storing and persisting metrics

Last updated: Jul 10, 2025

DataStage® records various information about job runs, which you can view in the job run details and run metrics screens. Metrics can also be stored for a longer period by setting up a metrics repository in a database.

Recorded information about job runs includes:
  • job run information, such as name and start/stop time,
  • stage and link information, such as the rows read/written for each stage and link,
  • other information such as the values of parameters for a run.

With the ds-metrics service, you can set up metrics storage at the project level for all your DataStage flows. While metrics are enabled by default for your flows, you must manually enable persisting metrics at the project level. For more information about viewing job runs and metrics, see Viewing job run details and run metrics.

Prerequisite

IBM DataStage supports sending DataStage metrics data to a separate metrics database. PostgreSQL is the designated database for storing metrics; it offers a lightweight solution that is not intended as an operational data store and enables users to run their own queries for insights into job performance. Depending on your preferences, the PostgreSQL database can be hosted within the same Cloud Pak for Data environment, run on a virtual machine, or managed within a PostgreSQL service.

Creating a connection

To enable repository metrics:
1. Open your project and click the Manage tab.
2. In the left side panel, click DataStage, then click the Repository tab.
3. Select the Enable persisting metrics option.
4. Provide your credentials to make a connection: specify a connection type, configure properties and security details, and test the connection to verify that it works.
5. Click Save at the top of the page.

To activate repository metrics at the project level, ensure that you create a flow after you save the connection. When repository metrics are enabled for a specific job, they are not available for any other job by default. To enable metrics for additional jobs, you must recompile and rerun each job individually. To verify whether repository metrics are active, review the job logs after execution and confirm that the parameter DSMetricsRepoEnabled is set to True.

You must use a clean database, or one that was previously initialized by ds-metrics. To ensure that a database is clean and to clear any previous ds-metrics data, run the following commands.
drop schema if exists ds_metrics cascade;
drop table if exists public.databasechangelog;
drop table if exists public.databasechangeloglock;
To use a database with ds-metrics, the database user must have both the permission to create schemas in the database and the permission to create tables in the public schema. To check for these permissions, log in to the database and run the following query as the user who requires them.
select has_database_privilege(current_database(), 'create'), has_schema_privilege('public', 'create');
If you have both the create-schema and create-table permissions, the query returns true, true.
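
If the user is missing either permission, a database administrator can grant them. The following statements are a minimal sketch; the database name dsmetrics and the user name ds_metrics_user are placeholders that you must replace with your own values.
-- Placeholder names: replace dsmetrics and ds_metrics_user with your database and user.
grant create on database dsmetrics to ds_metrics_user;
grant create on schema public to ds_metrics_user;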

Metrics retention

The Metrics retention field controls how frequently, and whether, ds-metrics removes old data from the database. If you enable the Max limit option, you can choose between the Days and Runs settings. For example, if you set 30 days, ds-metrics deletes all information that is related to a job and to all of its runs when the job is more than 30 days old. If you set 100 runs, ds-metrics keeps the 100 most recent runs and deletes the older runs. The Metrics retention field applies only to metrics data for the project in which the settings are activated. If the database is shared with another project, the retention settings do not affect the data from that project.
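
To see how many runs are currently retained for each job, you can query the metrics database directly. The following query is a sketch that uses the ds_metrics.job_run table described later in this topic.
-- Sketch: count retained runs per job and show the oldest retained run.
select job_name, count(*) as retained_runs, min(start_time) as oldest_run
from ds_metrics.job_run
group by job_name
order by retained_runs desc;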

After you save the connection settings, ds-metrics clears its connection cache. When you run a job in the project, ds-metrics connects to the database for the first time. After the connection is made, ds-metrics initializes the ds_metrics schema (or migrates it if the schema in the database is an older version) and then populates it.

Database structure

In the metrics database, most tables are in the ds_metrics schema. public.databasechangelog and public.databasechangeloglock are additional internal tables in the public schema that are used to track the database schema version.
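
To confirm that these internal change-tracking tables exist, you can run a query like the following sketch against the standard PostgreSQL information_schema views.
-- Sketch: confirm that the schema-tracking tables exist in the public schema.
select table_name
from information_schema.tables
where table_schema = 'public'
and table_name in ('databasechangelog', 'databasechangeloglock');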

Table schemas

Table 1. Table schemas, descriptions, and parameters
disk_metrics Stores disk size metrics over time.
  • pod_id
  • timestamp
  • mount_point
  • space_remaining
pod_disk Stores information about disks mounted to the pods.
  • pod_id
  • mount_point
  • size
pod_metrics Stores CPU and memory metrics over time.
  • pod_id
  • timestamp
  • core_usage
  • memory_usage
job_run_link_metrics Stores metrics for links in a job run while a job is running.
  • run_id
  • link_name
  • source
  • dest
  • state
  • last_update_time
  • start_time
  • stop_time
  • rows_read
  • rows_written
  • total_rows
  • duration
job_run_log Stores non-info log lines from a job run when a job run ends.
  • run_id
  • event_index
  • log_timestamp
  • log_level
  • message_id
  • message_text
job_run_parameter Stores parameters for a job run when a job run starts.
  • run_id
  • name
  • type
  • value
job_run_stage_metrics Stores metrics for stages in a job run while a job is running.
  • run_id
  • stage_name
  • stage_type
  • state
  • last_update_time
  • start_time
  • stop_time
  • duration
  • rows_read
  • rows_written
  • stage_seconds_cpu
  • total_memory
  • num_partitions
  • partition_row_counts
instance Stores information about the px-runtime instances.
  • instance_id
  • create_time
  • last_update_time
  • description
  • runtime_version
  • jobs_queued
  • jobs_running
  • jobs_completed
  • jobs_failed
pod Stores information about the px-runtime conductor and compute pods.
  • pod_id
  • instance_id
  • head_pod_id
  • create_time
  • core_limit
  • memory_limit
job_run Stores information about the job run while a job is running.
  • run_id
  • job_name
  • last_update_time
  • start_time
  • stop_time
  • duration
  • run_status
  • user_status
  • queue_name
  • partition
  • config_file
  • instance_id
  • controller_id
  • conductor_pid
  • create_time
  • scheduled
job Stores information about the job when a job run starts.
  • job_id
  • flow_id
  • project_id
  • job_name
  • flow_name
  • project_name
  • type
  • create_time
version Stores the version of ds-metrics that last initialized the database, unless the database was set up manually.
  • version
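
Because the tables are plain PostgreSQL tables, you can run your own queries for insights into job performance, as noted in the Prerequisite section. The following query is a sketch that lists the ten longest job runs together with their status, using columns from the ds_metrics.job_run table described above.
-- Sketch: list the ten longest job runs recorded in the metrics database.
select job_name, run_status, start_time, duration
from ds_metrics.job_run
order by duration desc
limit 10;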