Storing and persisting metrics
DataStage® records various information about job runs, that can be viewed for runs in the job run details and run metrics screens. Metrics can also be stored for a longer period by setting up a metrics repository in a database.
- job run information, such as name and start/stop time,
- stage and link information, such as the rows read/written for each stage and link,
- other information such as the values of parameters for a run.
With the ds-metrics
service, you can set up metrics storage at the project level
for all your DataStage flows. While metrics are enabled
by default for your flows, you must manually enable persisting metrics at a project level. For more
information about viewing job run and metrics, see Viewing job run details and
run metrics.
Prerequisite
IBM DataStage supports sending the DataStage metrics data to a separate metrics database. PostgreSQL is the designated database for storing metrics, offering a lightweight solution that is not intended as an operational data store. It enables users to run their own queries for insights into job performance. Depending on client preferences, their PostgreSQL database can be hosted within the same Cloud Pak for Data environment, operated on a virtual machine, or managed within a PostgreSQL service.
Creating a connection
Under the Manage tab of your Cloud Pak for Data project, go to DataStage > Metrics Repository and enable persisting metrics. Specify a connection type, configure properties and security details, and test the connection to verify that it works.
ds-metrics
. To ensure that a database is clean and to clear any previous
ds-metrics
data, run the following
commands.drop schema if exists ds_metrics cascade;
drop table if exists public.databasechangelog;
drop table if exists public.databasechangeloglock;
To use a database with
ds-metrics
, the database user must have both the permission to create schemas in
the database, and the permission to create tables in the public
schema. To check
for permissions to create tables and schemas, log in to the database and run the following query as
the same user who requires the
permissions.select has_database_privilege(current_database(), 'create'), has_schema_privilege('public', 'create');
if
you have create-schema and create-table permissions, the query returns true,
true
.Metrics retention
The Metrics retention field controls how frequently, and if at all,
ds-metrics
removes old data from the database. If you enable the Max
limit option, you can choose between the Days or
Runs settings. For example, if you set 30 days, metrics deletes all
information that is related to the job and to all its runs when a job is over 30 days old. If you
set 100 runs, metrics keeps the 100 most recent runs, and deletes the older runs. The
Metrics retention field applies only to metrics data related to the project
in which the settings are activated. If the database is shared with another project, the retention
settings do not impact the data from that project.
After you save the connection settings, it triggers ds-metrics
to clear its
connection cache. When you run a job in the project, ds-metrics
connects to the
database for the first time. After a connection is made, it populates the
ds-metrics
schema after it is initialized, or migrates it if the schema in the
database is an older version.
Database structure
In the metrics database, most tables are in the ds-metrics
schema.
public.databasechangelog
and public.databasechangeloglock
are
extra internal tables in the public schema that are used to track the database schema versions.
Table schemas
For tables, ds_metrics
schema is used as the main schema.
job
- Stores information about the job when a job run starts.
job_run
- Stores information about the job run while a job is running.
job_run_parameter
- Stores parameters for a job run when a job run starts.
job_run_log
- Stores non-info log lines from a job run when a job run ends.
job_run_stage_metrics
- Stores metrics for stages in a job run while a job is running.
job_run_link_metrics
- Stores metrics for links in a job run while a job is running.
version
- Stores the version of
ds-metrics
that last initialized the database, unless the database was set up manually.
instance
- Stores information about the px-runtime instances.
pod
- Stores information about the px-runtime conductor and compute pods.
pod_disk
- Stores information about disks mounted to the pods.
pod_metrics
- Stores CPU and memory metrics over time.
disk_metrics
- Stores disk size metrics over time.