Storing and persisting metrics
DataStage® records various information about job runs, that can be viewed for runs in the job run details and run metrics screens. Metrics can also be stored for a longer period by setting up a metrics repository in a database.
- job run information, such as name and start/stop time,
- stage and link information, such as the rows read/written for each stage and link,
- other information such as the values of parameters for a run.
With the ds-metrics
service, you can set up metrics storage at the project level
for all your DataStage flows. While metrics are enabled
by default for your flows, you must manually enable persisting metrics at a project level. For more
information about viewing job run and metrics, see Viewing job run details and
run metrics.
Prerequisite
IBM DataStage supports sending the DataStage metrics data to a separate metrics database. PostgreSQL is the designated database for storing metrics, offering a lightweight solution that is not intended as an operational data store. It enables users to run their own queries for insights into job performance. Depending on client preferences, their PostgreSQL database can be hosted within the same Cloud Pak for Data environment, operated on a virtual machine, or managed within a PostgreSQL service.
Creating a connection
To enable repository metrics you need to open your project, and click the Manage tab. On the left side panel, click DataStage. After that, click the Repository tab. Click on Enable persisting metrics option. Next you need to provide your credentials to make a connection. Specify a connection type, configure properties and security details, and test the connection to verify if it works. Click Save button at the top of the page.
To activate repository metrics at the project level, ensure that you create a flow after saving
the connection. Once repository metrics are enabled for a specific job, they will not be available
for any other job by default. To enable metrics for additional jobs, you must recompile and re-run
each job individually. To verify whether repository metrics are active, review the job logs after
execution. The parameter DSMetricsRepoEnabled
must be set to
True
ds-metrics
. To ensure that a database is clean and to clear any previous
ds-metrics
data, run the following
commands.drop schema if exists ds_metrics cascade;
drop table if exists public.databasechangelog;
drop table if exists public.databasechangeloglock;
To use a database with
ds-metrics
, the database user must have both the permission to create schemas in
the database, and the permission to create tables in the public
schema. To check
for permissions to create tables and schemas, log in to the database and run the following query as
the same user who requires the
permissions.select has_database_privilege(current_database(), 'create'), has_schema_privilege('public', 'create');
if
you have create-schema and create-table permissions, the query returns true,
true
.Metrics retention
The Metrics retention field controls how frequently, and if at all,
ds-metrics
removes old data from the database. If you enable the Max
limit option, you can choose between the Days or
Runs settings. For example, if you set 30 days, metrics deletes all
information that is related to the job and to all its runs when a job is over 30 days old. If you
set 100 runs, metrics keeps the 100 most recent runs, and deletes the older runs. The
Metrics retention field applies only to metrics data related to the project
in which the settings are activated. If the database is shared with another project, the retention
settings do not impact the data from that project.
After you save the connection settings, it triggers ds-metrics
to clear its
connection cache. When you run a job in the project, ds-metrics
connects to the
database for the first time. After a connection is made, it populates the
ds-metrics
schema after it is initialized, or migrates it if the schema in the
database is an older version.
Database structure
In the metrics database, most tables are in the ds-metrics
schema.
public.databasechangelog
and public.databasechangeloglock
are
extra internal tables in the public schema that are used to track the database schema versions.
Table schemas
Table schema | Description | Parameters |
---|---|---|
disk_metrics | Stores disk size metrics over time. |
|
pod_disk | Stores information about disks mounted to the pods. |
|
pod_metrics | Stores CPU and memory metrics over time. |
|
job_run_link_metrics | Stores metrics for links in a job run while a job is running. |
|
job_run_log | Stores non-info log lines from a job run when a job run ends. |
|
job_run_parameter | Stores parameters for a job run when a job run starts. |
|
job_run_stage_metrics | Stores metrics for stages in a job run while a job is running. |
|
instance | Stores information about the px-runtime instances. |
|
pod | Stores information about the px-runtime conductor and compute pods. |
|
jub_run | Stores information about the job run while a job is running. |
|
job | Stores information about the job when a job run starts. |
|
version | Stores the version of ds-metrics that last initialized the database, unless
the database was set up manually. |
|