Compute resource options for the notebook editor in projects

When you run a notebook in the notebook editor in a project, you choose an environment template, which defines the compute resources for the runtime environment. The environment template specifies the type, size, and power of the hardware configuration, plus the software configuration. For notebooks, environment templates include a supported language of Python, R, or Scala.

Types of environments

You can use these types of environments for running notebooks:

Most environment types for notebooks have default environment templates so you can get started quickly. Otherwise, you can create custom environment templates.

Environment types for notebooks
Environment type Default templates Custom templates
Anaconda CPU
Spark clusters
GPU

Runtime releases

Starting with the addition of Python 3.9 to Cloud Pak for Data as a Service, each default notebook environment template is associated with a runtime release. Its name is prefixed with Runtime, followed by the release year and release version.

A runtime release specifies a list of key data science libraries and a language version. All environments of a runtime release are built based on the library versions defined in the release, thus ensuring the consistent use of data science libraries across all data science applications.

The first runtime release in 2022 is available for Python 3.9 only. The runtime release prefix is Runtime 22.1.

While Runtime 22.1 is supported, IBM will update the library versions to address security requirements. Note that these updates will not change the <Major>.<Minor> versions of the libraries, but only the <Patch> versions. This ensures that your notebook assets will continue to run.

For example: Runtime 22.1 supports TensorFlow 2.7 in its initial release. During regular updates to Cloud Pak for Data as a Service, TensorFlow might be updated to version 2.7.1 or 2.7.2, but not to version 2.8.
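
The update rule can be illustrated with a short check. This is a minimal sketch and not IBM tooling: the function names are hypothetical, and it assumes plain <Major>.<Minor> or <Major>.<Minor>.<Patch> version strings.

```python
def parse_version(version):
    """Split '<Major>.<Minor>[.<Patch>]' into integers; a missing patch counts as 0."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def is_allowed_runtime_update(current, candidate):
    """Return True if major and minor are unchanged and the patch does not decrease."""
    cur = parse_version(current)
    new = parse_version(candidate)
    return cur[:2] == new[:2] and new[2] >= cur[2]


print(is_allowed_runtime_update("2.7", "2.7.1"))  # True
print(is_allowed_runtime_update("2.7", "2.8"))    # False
```

With the TensorFlow example above, 2.7.1 and 2.7.2 pass the check and 2.8 does not.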

Libraries in Runtime 22.1

Runtime 22.1 includes the following popular data science library packages:

Table 2. Packages and their versions in Runtime 22.1
Library Version
Dali 1.9
Horovod 0.23
Keras 2.7
Lale 0.6
LightGBM 3.3
NumPy 1.20
ONNX 1.10
ONNX Runtime 1.10
OpenCV 4.5
pandas 1.3
PyArrow 5.0
PyTorch 1.10
scikit-learn 1.0
SciPy 1.7
SnapML 1.8
TensorBoard 2.7
TensorFlow 2.7
XGBoost 1.5

Runtime 22.1 includes a large set of other useful libraries in addition to the libraries listed in the table. To see the full list, select the Runtime 22.1 on Python 3.9 environment template on the Environments page of a project, and view the software configuration details.

Default CPU environment templates

You can select any of the following default CPU environment templates for notebooks. These default environment templates are listed on the project’s Environments page.

DO Indicates that the environment template includes the CPLEX and the DOcplex libraries to model and solve decision optimization problems that exceed the complexity that is supported by the Community Edition of the libraries in the other default Python environments. See Decision Optimization notebooks.

NLP Indicates that the environment template includes the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library. This default environment should be large enough to run the pre-trained models.

~ Indicates that the environment template requires the Watson Studio Standard or Enterprise plan. See Offering plans.

Default CPU environment templates for notebooks
Name Hardware configuration CUH rate per hour
Runtime 22.1 on Python 3.9 XXS 1 vCPU and 2 GB RAM 0.5
Runtime 22.1 on Python 3.9 XS 2 vCPU and 8 GB RAM 2
Runtime 22.1 on Python 3.9 S 4 vCPU and 16 GB RAM 4
DO + NLP Runtime 22.1 on Python 3.9 2 vCPU and 8 GB RAM 6
Default R 3.6 S 4 vCPU and 16 GB RAM 4
Default R 3.6 M ~ 16 vCPU and 64 GB RAM 8

You should stop all active CPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See CPU idle timeout.
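
To see why idle runtimes matter, the rates in the table can be turned into a rough estimate. This is a minimal sketch that assumes consumption is simply the hourly CUH rate multiplied by the hours that the runtime stays active. The dictionary and function names are illustrative and not part of any IBM API.

```python
# CUH rates per hour, copied from the default CPU environment templates table.
CPU_CUH_RATES = {
    "Runtime 22.1 on Python 3.9 XXS": 0.5,
    "Runtime 22.1 on Python 3.9 XS": 2,
    "Runtime 22.1 on Python 3.9 S": 4,
}


def estimate_cuh(template_name, active_hours):
    """Estimate CUH consumed as the hourly rate times the hours the runtime is active."""
    return CPU_CUH_RATES[template_name] * active_hours


# A runtime left running for 12 hours on the S template:
print(estimate_cuh("Runtime 22.1 on Python 3.9 S", 12))  # 48
```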

Notebooks and CPU environments

When you open a notebook in edit mode in a CPU runtime environment, exactly one interactive session connects to a Jupyter kernel for the notebook language and the environment runtime that you select. The runtime is started per user and not per notebook. This means that if you open a second notebook with the same environment template in the same project, a second kernel is started in the same runtime. Runtime resources are shared by the Jupyter kernels that you start in the runtime. Runtime resources are also shared if the CPU environment has a GPU.

If you want to avoid sharing runtimes but want to use the same environment template for multiple notebooks in a project, create custom environment templates with the same specifications and associate each notebook with its own template.

If necessary, you can restart or reconnect to the kernel. When you restart a kernel, the kernel is stopped and then started in the same session again, but all execution results are lost. When you reconnect to a kernel after losing a connection, the notebook is connected to the same kernel session, and all previous execution results that were saved are available.

Default Spark environment templates

You can select any of the following default Spark environment templates for notebooks. These default environment templates are listed on the project’s Environments page.

* Indicates that the environment includes libraries from Runtime 22.1.

~ Indicates that the environment is deprecated. Consider switching to a newer version as soon as you can.

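Default Spark environment templates for notebooks
Name                                  Hardware configuration                                                 CUH rate per hour
Default Spark 3.3 & Python 3.9 *      2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM     1
Default Spark 3.3 & R 3.6             2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM     1
Default Spark 3.2 & Python 3.9 * ~    2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM     1
Default Spark 3.2 & R 3.6 ~           2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM     1
Default Spark 3.2 & Scala 2.12 ~      2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM     1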

You should stop all active Spark runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See Spark idle timeout.

Large Spark environments

Standard and Enterprise plan users can create custom environment templates for larger Spark environments.

These custom templates can have up to 35 executors, and you can choose from the following options for both driver and executor:

Hardware configuration
1 vCPU and 4 GB RAM
2 vCPU and 8 GB RAM
3 vCPU and 12 GB RAM

The CUH rate per hour increases by 0.5 for every vCPU that is added. For example, 1x Driver: 3 vCPU with 12 GB of RAM and 4x Executors: 2 vCPU with 8 GB of RAM amounts to (3 + (4 * 2)) = 11 vCPUs and 5.5 CUH.
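
The same arithmetic can be written as a small helper. This is a sketch under the assumption that the rate is exactly 0.5 CUH per vCPU, as in the worked example above. The function name is hypothetical, and the result may not match the rates listed for the default templates.

```python
CUH_PER_VCPU = 0.5  # rate stated above: 0.5 CUH per hour for every vCPU


def spark_cuh_rate(driver_vcpu, executor_vcpu, num_executors):
    """Return (total vCPUs, CUH rate per hour) for a custom Spark environment."""
    total_vcpu = driver_vcpu + num_executors * executor_vcpu
    return total_vcpu, total_vcpu * CUH_PER_VCPU


total, rate = spark_cuh_rate(driver_vcpu=3, executor_vcpu=2, num_executors=4)
print(total, rate)  # 11 5.5
```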

Notebooks and Spark environments

You can select the same Spark environment template for more than one notebook. Every notebook associated with that environment has its own dedicated Spark cluster and no resources are shared.

When you start a Spark environment, extra resources are needed for the Jupyter Enterprise Gateway, Spark Master, and the Spark worker daemons. These extra resources amount to 1 vCPU and 2 GB of RAM for the driver and 1 GB RAM for each executor. You need to take these extra resources into account when selecting the hardware size of a Spark environment. For example: if you create a notebook and select Default Spark 3.3 & Python 3.9, the Spark cluster consumes 3 vCPU and 12 GB RAM but, as 1 vCPU and 4 GB RAM are required for the extra resources, the resources remaining for the notebook are 2 vCPU and 8 GB RAM.
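
The overhead calculation in this example can be sketched as follows. The function name is hypothetical, and the overhead figures are the ones stated above (1 vCPU and 2 GB RAM for the driver, 1 GB RAM for each executor).

```python
def spark_resources_for_notebook(driver, executor, num_executors):
    """driver and executor are (vCPU, GB RAM) tuples.

    Returns the cluster totals and the resources that remain for the notebook.
    """
    total_vcpu = driver[0] + num_executors * executor[0]
    total_ram = driver[1] + num_executors * executor[1]
    # Overhead as described above: 1 vCPU and 2 GB RAM for the driver,
    # plus 1 GB RAM for each executor.
    overhead_vcpu = 1
    overhead_ram = 2 + 1 * num_executors
    usable = (total_vcpu - overhead_vcpu, total_ram - overhead_ram)
    return (total_vcpu, total_ram), usable


# Default Spark 3.3 & Python 3.9: 2 executors and 1 driver, each 1 vCPU and 4 GB RAM.
total, usable = spark_resources_for_notebook(driver=(1, 4), executor=(1, 4), num_executors=2)
print(total)   # (3, 12)
print(usable)  # (2, 8)
```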

File system on a Spark cluster

If you want to share files across executors and the driver or kernel of a Spark cluster, you can use the shared file system at /home/spark/shared.

If you want to use your own custom libraries, you can store them under /home/spark/shared/user-libs/. Subdirectories under /home/spark/shared/user-libs/ are pre-configured to be made available to Python, R, and Scala or Java runtimes.

The following table lists the pre-configured subdirectories where you can add your custom libraries.

Table 1. Pre-configured subdirectories for custom libraries
Directory Type of library
/home/spark/shared/user-libs/python3/ Python 3 libraries
/home/spark/shared/user-libs/R/ R packages
/home/spark/shared/user-libs/spark2/ Java or Scala JAR files

To share libraries across a Spark driver and executors:

  1. Download your custom libraries or JAR files to the appropriate pre-configured directory.
  2. Restart the kernel from the notebook menu by clicking Kernel > Restart Kernel. This loads your custom libraries or JAR files in Spark.

Note that these libraries are not persisted. When you stop the environment runtime and restart it again later, you need to load the libraries again.
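
Step 1 can be done from a notebook cell. The following is a minimal sketch, assuming the library is a single file that is already available in the runtime. The helper name and the file name in the usage note are hypothetical. Only the target directory comes from the table above.

```python
import shutil
from pathlib import Path

# Shared directory for Python 3 libraries, from the table above.
USER_LIBS_PYTHON3 = Path("/home/spark/shared/user-libs/python3")


def install_shared_library(source_file, target_dir=USER_LIBS_PYTHON3):
    """Copy one library file into the shared directory and return its new path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source_file, target_dir))
```

For example, install_shared_library("my_helpers.py") copies the file into the shared Python 3 directory. You still need to restart the kernel afterward, as described in step 2.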

Default GPU environment templates

You can select the following default GPU environment template for notebooks. This default environment template is listed on the project’s Environments page.

~ Indicates that the environment template requires the Watson Studio Standard or Enterprise plan. See Offering plans.

* Indicates that the environment template includes the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library.

Default GPU environment templates for notebooks
Name Hardware configuration CUH rate per hour
GPU Runtime 22.1 on Python 3.9 ~ * 4 vCPU + 24 GB + 0.5 NVIDIA TESLA K80 (1 GPU) 6

You should stop all active GPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See GPU idle timeout.

Notebooks and GPU environments

GPU environments for notebooks are available only in the Dallas IBM Cloud service region.

You can select the same Python and GPU environment template for more than one notebook in a project. In this case, every notebook kernel runs in the same runtime instance and the resources are shared. To avoid sharing runtime resources, create multiple custom environment templates with the same specifications and associate each notebook with its own template.

Default hardware specifications for scoring models with Watson Machine Learning

When you invoke the Watson Machine Learning API within a notebook, you consume compute resources from the Watson Machine Learning service as well as the compute resources for the notebook kernel.

You can select any of the following hardware specifications when you connect to Watson Machine Learning and create a deployment.

Hardware specifications available when invoking the Watson Machine Learning service in a notebook
Capacity size Hardware configuration CUH rate per hour
Extra small 1x4 = 1 vCPU and 4 GB RAM 0.5
Small 2x8 = 2 vCPU and 8 GB RAM 1
Medium 4x16 = 4 vCPU and 16 GB RAM 2
Large 8x32 = 8 vCPU and 32 GB RAM 4

Data files in notebook environments

If you are working with large data sets, you should store the data sets in smaller chunks in the IBM Cloud Object Storage associated with your project and process the data in chunks in the notebook. Alternatively, you should run the notebook in a Spark environment.
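
Processing data in chunks can look like the following sketch. It uses only the Python standard library and an in-memory stand-in for a data file. The function names are illustrative, and reading the file from IBM Cloud Object Storage is not shown.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows so only one chunk is in memory at a time."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def total_by_chunks(csv_file, column, chunk_size=1000):
    """Sum a numeric column by reading the CSV file chunk by chunk."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# Stand-in for a data file; in practice this would be a file object opened from storage.
sample = io.StringIO("id,amount\n1,10.5\n2,4.5\n3,5.0\n")
print(total_by_chunks(sample, "amount", chunk_size=2))  # 20.0
```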

Be aware that the file system of each runtime is non-persistent and cannot be shared across environments. To persist files in Watson Studio, you should use IBM Cloud Object Storage. The easiest way to use IBM Cloud Object Storage in notebooks in projects is to leverage the project-lib package for Python or the project-lib package for R.

Compute usage by service

Notebook runtimes consume compute resources as CUH from one of these services in projects:

  • Watson Studio: while running default or custom environments. You can monitor the Watson Studio CUH consumption in the project on the Environments page.
  • IBM Analytics Engine: while running a notebook in a Spark environment provided by the service.

Notebooks can also consume CUH from the Watson Machine Learning service when the notebook invokes Watson Machine Learning to score a model. You can monitor the total monthly amount of CUH consumption for the Watson Machine Learning service on the Environments page.

Track CUH consumption for Watson Machine Learning in a notebook

To calculate capacity unit hours consumed by a notebook, run this code in the notebook:

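# client is assumed to be an authenticated Watson Machine Learning API client
details = client.service_instance.get_details()
# Divide the current capacity units by 3600*1000 to get capacity unit hours
CUH = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(CUH)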

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.

Runtime scope

Environment runtimes are always scoped to an environment template and a user within a project. If different users in a project work with the same environment, each user will get a separate runtime.

If you choose to run a version of a notebook as a scheduled job, each scheduled job always starts in a dedicated runtime. The runtime is stopped when the job finishes.

Changing the environment of a notebook

You can switch environments for different reasons. For example, you can:

  • Select an environment with more processing power or more RAM
  • Change from using an environment without Spark to a Spark environment

You can only change the environment of a notebook if the notebook is unlocked. You can change the environment:

  • From the notebook opened in edit mode:

    1. Save your notebook changes.
    2. Click the Notebook Info icon from the notebook toolbar and then click Environment. A short description of the environment is displayed.
    3. Select another environment with the compute power and memory capacity that you need from the list under Environments.
    4. Select Change environment.
      This stops the active runtime and starts the newly selected environment.
  • From the Assets page of your project:

    1. Select the notebook in the Notebooks section, click Actions > Change Environment and select another environment. The kernel must be stopped before you can change the environment. This new runtime environment will be instantiated the next time the notebook is opened for editing.
  • In the notebook job by editing the job template. See Editing job settings.
