Managing a remote engine for DataStage Anywhere

Last updated: Dec 09, 2024

DataStage® Anywhere supports maintenance, updates, and other data considerations with remote runtime engines.

Maintenance

With DataStage-aaS Anywhere, the control plane remains hosted on and managed by IBM Cloud.

You are responsible for managing your data plane through the remote engine. Automated scripts are available to update a remote engine. To update, download the container image to your internal registries and deploy from those registries. See https://github.com/IBM/DataStage/blob/main/RemoteEngine/docker/README.md for simple controls, including creating, running, cleaning, and upgrading a remote engine.

Scaling

You can add or remove remote engines to scale deployments throughout the month. There is no deployment limit, but you are charged for the maximum number of VPCs deployed each month, whether or not they are used.

Disaster recovery

Deploy additional remote engines to support disaster recovery.

Data observability

You can put an observability solution in place within your container management platform. Databand is integrated with DataStage Anywhere and can monitor DataStage pipelines.

Storage

The DataStage operator mounts default storage to the remote engine's Kubernetes pods. To add storage with persistent volumes, see https://www.ibm.com/docs/en/cloud-paks/cp-data/5.0.x?topic=administering-setting-up-nfs-mount.

Enabling an alternative Cloud Object Storage location for remote engine logs

By default, job run logs for the remote engine are pushed to the default bucket in IBM Cloud Object Storage (COS). You can enable an alternative COS location for storing the job run logs.

To disable pushing job run logs to the default IBM COS bucket for a Kubernetes deployment, use the following command:

kubectl -n <namespace> set env deployment/<instance-name>-ibm-datastage-px-runtime DISABLE_REMOTE_LOG_PUSH=true
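To confirm that the variable was applied, one option is to list the deployment's environment with the standard `kubectl set env --list` flag and check for the expected entry. The following is a minimal sketch, not part of the product tooling: the helper name `log_push_disabled` is hypothetical, and the demonstration input is illustrative rather than real cluster output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads KEY=value lines on stdin and succeeds only if
# the exact line DISABLE_REMOTE_LOG_PUSH=true is present.
log_push_disabled() {
  grep -qx 'DISABLE_REMOTE_LOG_PUSH=true'
}

# Intended use against a live cluster (same placeholders as the command above):
#   kubectl -n <namespace> set env \
#     deployment/<instance-name>-ibm-datastage-px-runtime --list | log_push_disabled

# Self-contained demonstration with illustrative input, not real cluster output.
if printf '%s\n' 'DISABLE_REMOTE_LOG_PUSH=true' | log_push_disabled; then
  echo "default log push is disabled"
else
  echo "default log push is still enabled"
fi
```

Because the match is on the whole line, a value other than `true` or a stray space around `=` is reported as not disabled.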

To enable pushing logs to the alternative COS location for a Kubernetes deployment, use the following command, which creates a secret containing the new COS location:

kubectl -n <namespace> create secret generic datastage-log-cos-location \
--from-literal=CUSTOM_S3_BUCKET_NAME=<bucket-name> \
--from-literal=CUSTOM_S3_REGION=<region> \
--from-literal=CUSTOM_S3_ENDPOINT=<endpoint> \
--from-literal=CUSTOM_S3_ACCESS_KEY=<access-key> \
--from-literal=CUSTOM_S3_SECRET_KEY=<secret-key>

This command triggers a pod restart. If you use the disabling command first, you must restart the pod manually.
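For the manual restart, one common approach is the standard `kubectl rollout restart` command followed by `kubectl rollout status` to wait for the new pod. The sketch below assumes this generic Kubernetes mechanism is acceptable for the runtime deployment, which the product documentation does not state explicitly. The default namespace and instance name are placeholders, and the `run` helper and `DRY_RUN` switch are hypothetical conveniences that print each command instead of executing it.

```shell
#!/usr/bin/env bash

# Placeholder values (assumptions); replace with your namespace and instance name.
NAMESPACE="${NAMESPACE:-my-namespace}"
INSTANCE_NAME="${INSTANCE_NAME:-my-instance}"
DEPLOYMENT="deployment/${INSTANCE_NAME}-ibm-datastage-px-runtime"

# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Restart the runtime pod, then wait until the rollout completes.
run kubectl -n "$NAMESPACE" rollout restart "$DEPLOYMENT"
run kubectl -n "$NAMESPACE" rollout status "$DEPLOYMENT" --timeout=300s
```

Run the script with `DRY_RUN=0` to execute the commands against the cluster that your current kubectl context points to.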

Setting proxy information

Container deployments support proxy configuration. To set proxy information for a remote engine for DataStage Anywhere, set the following environment variable in the container:

REMOTE_HTTPS_PROXY=http://username:password@host:port
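As an illustration, the value can be assembled from its parts and exported before the container is started. This is a minimal sketch: the `PROXY_*` variable names, their default values, and the `build_proxy_url` helper are hypothetical, and only the `REMOTE_HTTPS_PROXY` name and the `http://username:password@host:port` shape come from the format above.

```shell
#!/usr/bin/env bash

# Hypothetical inputs (assumptions); replace with your proxy details.
PROXY_USER="${PROXY_USER:-username}"
PROXY_PASSWORD="${PROXY_PASSWORD:-password}"
PROXY_HOST="${PROXY_HOST:-proxy.example.com}"
PROXY_PORT="${PROXY_PORT:-3128}"

# Hypothetical helper: prints http://user:password@host:port,
# or fails if the port is empty or not numeric.
build_proxy_url() {
  local user="$1" password="$2" host="$3" port="$4"
  case "$port" in
    ''|*[!0-9]*)
      echo "port must be numeric: '$port'" >&2
      return 1
      ;;
  esac
  printf 'http://%s:%s@%s:%s\n' "$user" "$password" "$host" "$port"
}

if REMOTE_HTTPS_PROXY="$(build_proxy_url "$PROXY_USER" "$PROXY_PASSWORD" "$PROXY_HOST" "$PROXY_PORT")"; then
  export REMOTE_HTTPS_PROXY
  # Avoid echoing the full value, because it contains the password.
  echo "REMOTE_HTTPS_PROXY is set for ${PROXY_HOST}:${PROXY_PORT}"
fi
```

How the exported value reaches the container depends on how you start the remote engine. With Docker or Podman run directly, an exported variable can be passed through with `--env REMOTE_HTTPS_PROXY`.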

Proxy support is not currently available for Kubernetes deployments.
