Compute resource options for the notebook editor in projects

When you run a notebook in the notebook editor in a project, you choose an environment template, which defines the compute resources for the runtime environment. The environment template specifies the type, size, and power of the hardware configuration, plus the software configuration. For notebooks, environment templates include a supported language of Python, R, or Scala.

Types of environments

You can use these types of environments for running notebooks:

Most environment types for notebooks have default environment templates so you can get started quickly. Otherwise, you can create custom environment templates.

Environment types for notebooks
Environment type Default templates Custom templates
Anaconda CPU
Spark clusters
GPU

Runtime releases

The default environments for notebooks belong to a runtime release. Their names are prefixed with Runtime, followed by the release year and release version.

A runtime release specifies a list of key data science libraries and a language version, for example Python 3.10. All environments of a runtime release are built based on the library versions defined in the release, thus ensuring the consistent use of data science libraries across all data science applications.

The first runtime release in 2022 is available for Python 3.9 only and is prefixed by Runtime 22.1. The second release in 2022 is available for Python 3.10 and R 4.2 and is prefixed by Runtime 22.2.

While a runtime release is supported, IBM will update the library versions to address security requirements. Note that these updates will not change the <Major>.<Minor> versions of the libraries, but only the <Patch> versions. This ensures that your notebook assets will continue to run.

For example: A runtime release supports TensorFlow 2.9. In Cloud Pak for Data 4.6, the runtime release will contain TensorFlow 2.9.0. Although TensorFlow might be updated to version 2.9.1 or 2.9.2 in later Cloud Pak for Data 4.6.x releases, it will not be updated to version 2.10.
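The patch-only update policy can be checked in a notebook. The following is a minimal sketch, assuming version strings of the form <Major>.<Minor>.<Patch>. The function name matches_release is hypothetical and is not part of any IBM library.

```python
def matches_release(installed: str, release: str) -> bool:
    """Return True if `installed` has the same <Major>.<Minor> as `release`."""
    installed_parts = installed.split(".")
    release_parts = release.split(".")
    # Compare only the first two components; the patch level may differ.
    return installed_parts[:2] == release_parts[:2]


# TensorFlow 2.9 in a runtime release may move to 2.9.1 or 2.9.2, but not to 2.10.
print(matches_release("2.9.0", "2.9"))   # True
print(matches_release("2.9.2", "2.9"))   # True
print(matches_release("2.10.0", "2.9"))  # False
```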

Libraries in the 22.x Runtime releases

The 22.x Runtime releases include the following popular data science library packages for Python and R.

Runtime releases for Python 3.10 and 3.9 listing libraries and their versions:

Table 1. Packages and their versions in the 22.x Runtime releases for Python
Library Runtime 22.2 on Python 3.10 Runtime 22.1 on Python 3.9
Dali 1.15 1.9
Horovod 0.25 0.23
Keras 2.9 2.7
Lale 0.6 0.6
LightGBM 3.3 3.3
NumPy 1.23 1.20
ONNX 1.12 1.10
ONNX Runtime 1.12 1.10
OpenCV 4.6 4.5
pandas 1.4 1.3
PyArrow 8.0 5.0
PyTorch 1.12 1.10
scikit-learn 1.1 1.0
SciPy 1.8 1.7
SnapML 1.8 1.8
TensorBoard 2.9 2.7
TensorFlow 2.9 2.7
XGBoost 1.6 1.5

Runtime release 22.2 for R 4.2 listing libraries and their versions:

Table 2. Packages and their versions in the 22.2 Runtime release for R
Library Runtime 22.2 on R 4.2
arrow 8.0
car 3.0
caret 6.0
catools 1.18
forecast 8.16
ggplot2 3.3
glmnet 4.1
hmisc 4.7
keras 2.9
lme4 1.1
mvtnorm 1.1
pandoc 2.12
psych 2.2
python 3.10
randomforest 4.7
reticulate 1.25
sandwich 3.0
scikit-learn 1.1
spatial 7.3
tensorflow 2.9
tidyr 1.2
xgboost 1.6

The 22.x Runtime releases for Python and R include a large set of other useful libraries in addition to the libraries listed in the table. To see the full list, select the Runtime 22.2 on Python 3.10 or the Runtime 22.2 on R 4.2 environment template under Templates on the Environments page on the Manage tab of your project.
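As a complement to the list on the Environments page, you can also query the installed versions from inside a running Python notebook. The following is a minimal sketch that uses the standard importlib.metadata module. The helper name installed_versions and the package names in the example are illustrative assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names are distribution names, which can differ from the display
# names in the table (for example, "scikit-learn", not "sklearn").
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```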

CPU environment templates

You can select any of the following default CPU environment templates for notebooks. The default environment templates are listed under Templates on the Environments page on the Manage tab of your project.

DO Indicates that the environment template includes the CPLEX and DOcplex libraries to model and solve decision optimization problems that exceed the complexity that is supported by the Community Edition of the libraries in the other default Python environments. See Decision Optimization notebooks.

NLP Indicates that the environment template includes the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library. This default environment should be large enough to run the pre-trained models.

~ Indicates that the environment template requires the Watson Studio Enterprise plan. See Offering plans.

* Indicates that the environment template is deprecated.

Default CPU environment templates for notebooks
Name Hardware configuration CUH rate per hour
Runtime 22.2 on Python 3.10 XXS 1 vCPU and 2 GB RAM 0.5
Runtime 22.2 on Python 3.10 XS 2 vCPU and 8 GB RAM 2
Runtime 22.2 on Python 3.10 S 4 vCPU and 16 GB RAM 4
Runtime 22.1 on Python 3.9 XXS 1 vCPU and 2 GB RAM 0.5
Runtime 22.1 on Python 3.9 XS 2 vCPU and 8 GB RAM 2
Runtime 22.1 on Python 3.9 S 4 vCPU and 16 GB RAM 4
DO + NLP Runtime 22.2 on Python 3.10 2 vCPU and 8 GB RAM 6
DO + NLP Runtime 22.1 on Python 3.9 2 vCPU and 8 GB RAM 6
Runtime 22.2 on R 4.2 S 4 vCPU and 16 GB RAM 4
Default R 3.6 S * 4 vCPU and 16 GB RAM 4
Default R 3.6 M ~ * 16 vCPU and 64 GB RAM 8

You should stop all active CPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See CPU idle timeout.
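To see why stopping idle runtimes matters, you can estimate the consumption of a runtime from the rates in the table of default CPU environment templates. The following is a rough sketch that assumes consumption is simply the hourly rate multiplied by the hours the runtime stays active. The dictionary and function names are hypothetical, and only some of the templates are included.

```python
# CUH rates per hour, copied from the table of default CPU environment templates.
CPU_CUH_RATES = {
    "Runtime 22.2 on Python 3.10 XXS": 0.5,
    "Runtime 22.2 on Python 3.10 XS": 2,
    "Runtime 22.2 on Python 3.10 S": 4,
    "DO + NLP Runtime 22.2 on Python 3.10": 6,
    "Runtime 22.2 on R 4.2 S": 4,
}


def estimate_cuh(template: str, hours: float) -> float:
    """Estimate CUH consumed by one runtime that stays active for `hours`."""
    return CPU_CUH_RATES[template] * hours


# A runtime left running for 8 hours on the XS template:
print(estimate_cuh("Runtime 22.2 on Python 3.10 XS", 8))  # 16
```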

Notebooks and CPU environments

When you open a notebook in edit mode in a CPU runtime environment, exactly one interactive session connects to a Jupyter kernel for the notebook language and the environment runtime that you select. The runtime is started per user and not per notebook. This means that if you open a second notebook with the same environment template in the same project, a second kernel is started in the same runtime. Runtime resources are shared by the Jupyter kernels that you start in the runtime. Runtime resources are also shared if the runtime includes a GPU.

If you want to avoid sharing runtimes but want to use the same environment template for multiple notebooks in a project, you should create custom environment templates with the same specifications and associate each notebook with its own template.

If necessary, you can restart or reconnect to the kernel. When you restart a kernel, the kernel is stopped and then started again in the same session, but all execution results are lost. When you reconnect to a kernel after losing a connection, the notebook is connected to the same kernel session, and all previous execution results that were saved are available.

Spark environment templates

You can select any of the following default Spark environment templates for notebooks. The default environment templates are listed under Templates on the Environments page on the Manage tab of your project.

* Indicates that the environment includes libraries from Runtime 22.1.

~ Indicates that the environment template is deprecated. Consider switching to a newer version as soon as you can.

Default Spark environment templates for notebooks
Name Hardware configuration CUH rate per hour
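Default Spark 3.3 & Python 3.9 * 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM 1
Default Spark 3.3 & R 3.6 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM 1
Default Spark 3.2 & Python 3.9 * ~ 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM 1
Default Spark 3.2 & R 3.6 ~ 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM 1
Default Spark 3.2 & Scala 2.12 ~ 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM 1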

You should stop all active Spark runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See Spark idle timeout.

Large Spark environments

Standard and Enterprise plan users can create custom environment templates for larger Spark environments.

Standard and Enterprise plan users can have up to 35 executors and can choose from the following options for both driver and executor:

Hardware configuration
1 vCPU and 4 GB RAM
2 vCPU and 8 GB RAM
3 vCPU and 12 GB RAM

The CUH rate per hour increases by 0.5 for every vCPU that is added. For example, 1 driver with 3 vCPU and 12 GB RAM plus 4 executors with 2 vCPU and 8 GB RAM each amounts to 3 + (4 * 2) = 11 vCPUs, or 5.5 CUH per hour.
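The arithmetic in this example can be written as a small helper. This is a sketch that follows the worked example in the text (0.5 CUH per vCPU across the driver and all executors). The function name spark_cuh_rate is hypothetical.

```python
def spark_cuh_rate(driver_vcpu: int, executors: int, executor_vcpu: int) -> float:
    """CUH rate per hour at 0.5 CUH for every vCPU in the cluster."""
    total_vcpu = driver_vcpu + executors * executor_vcpu
    return 0.5 * total_vcpu


# 1 driver with 3 vCPU and 4 executors with 2 vCPU each: 11 vCPUs in total.
print(spark_cuh_rate(driver_vcpu=3, executors=4, executor_vcpu=2))  # 5.5
```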

Notebooks and Spark environments

You can select the same Spark environment template for more than one notebook. Every notebook associated with that environment has its own dedicated Spark cluster and no resources are shared.

When you start a Spark environment, extra resources are needed for the Jupyter Enterprise Gateway, Spark Master, and the Spark worker daemons. These extra resources amount to 1 vCPU and 2 GB of RAM for the driver and 1 GB RAM for each executor. You need to take these extra resources into account when selecting the hardware size of a Spark environment. For example: if you create a notebook and select Default Spark 3.3 & Python 3.9, the Spark cluster consumes 3 vCPU and 12 GB RAM but, as 1 vCPU and 4 GB RAM are required for the extra resources, the resources remaining for the notebook are 2 vCPU and 8 GB RAM.
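The same calculation can be sketched in code to size a custom Spark environment. The overhead figures are taken from the paragraph above. The function name usable_spark_resources and the tuple layout are assumptions made for illustration.

```python
def usable_spark_resources(driver, executor, executors):
    """Return (vCPU, GB RAM) left for the notebook after Spark overhead.

    `driver` and `executor` are (vCPU, GB RAM) tuples.
    """
    total_vcpu = driver[0] + executors * executor[0]
    total_ram = driver[1] + executors * executor[1]
    # Overhead from the text: 1 vCPU and 2 GB RAM for the driver,
    # plus 1 GB RAM for each executor.
    overhead_vcpu = 1
    overhead_ram = 2 + 1 * executors
    return total_vcpu - overhead_vcpu, total_ram - overhead_ram


# Default Spark 3.3 & Python 3.9: driver 1 vCPU / 4 GB, 2 executors 1 vCPU / 4 GB each.
print(usable_spark_resources((1, 4), (1, 4), 2))  # (2, 8)
```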

File system on a Spark cluster

If you want to share files across executors and the driver or kernel of a Spark cluster, you can use the shared file system at /home/spark/shared.

If you want to use your own custom libraries, you can store them under /home/spark/shared/user-libs/. The subdirectories under /home/spark/shared/user-libs/ are pre-configured to be made available to Python, R, and Scala or Java runtimes.

The following table lists the pre-configured subdirectories where you can add your custom libraries.

Table 3. Pre-configured subdirectories for custom libraries
Directory Type of library
/home/spark/shared/user-libs/python3/ Python 3 libraries
/home/spark/shared/user-libs/R/ R packages
/home/spark/shared/user-libs/spark2/ Java or Scala JAR files

To share libraries across a Spark driver and executors:

  1. Download your custom libraries or JAR files to the appropriate pre-configured directory.
  2. Restart the kernel from the notebook menu by clicking Kernel > Restart Kernel. This loads your custom libraries or JAR files in Spark.

Note that these libraries are not persisted. When you stop the environment runtime and restart it again later, you need to load the libraries again.
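Because the libraries must be loaded again after each runtime restart, it can help to script the copy step. The following is a minimal sketch that copies a file into the pre-configured subdirectory for its type. The helper name add_custom_library, the kind keys, and the example file name are hypothetical. The directory paths come from the table above.

```python
import shutil
from pathlib import Path

# Pre-configured subdirectories, as listed in the table above.
USER_LIBS = Path("/home/spark/shared/user-libs")
SUBDIRS = {"python": "python3", "r": "R", "jar": "spark2"}


def add_custom_library(source, kind, base=USER_LIBS):
    """Copy a library file into the shared subdirectory for `kind`."""
    target_dir = Path(base) / SUBDIRS[kind]
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source, target_dir))


# Example (hypothetical file name); restart the kernel afterwards:
# add_custom_library("my_helpers.py", "python")
```

The base parameter exists only so that the helper can be tried outside a Spark runtime. Inside a Spark environment, the default path applies.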

GPU environment templates

You can select any of the following GPU environment templates for notebooks. The environment templates are listed under Templates on the Environments page on the Manage tab of your project.

The GPU environment template names indicate the accelerator power. The GPU environment templates include the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library.

~ Indicates that the environment template requires the Watson Studio Professional plan. See Offering plans.

* Indicates that the environment template is restricted and can be used only in notebooks that already use this template. Switch to a GPU V100 template instead.

Default GPU environment templates for notebooks
Name Hardware configuration CUH rate per hour
GPU V100 Runtime 22.2 on Python 3.10 ~ 40 vCPU + 172 GB + 1 NVIDIA TESLA V100 (1 GPU) 68
GPU V100 Runtime 22.1 on Python 3.9 ~ 40 vCPU + 172 GB + 1 NVIDIA TESLA V100 (1 GPU) 68
GPU K80 Runtime 22.1 on Python 3.9 ~ * 4 vCPU + 24 GB + 0.5 NVIDIA TESLA K80 (1 GPU) 6

You should stop all active GPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See GPU idle timeout.

Notebooks and GPU environments

GPU environments for notebooks are available only in the Dallas IBM Cloud service region.

You can select the same Python and GPU environment template for more than one notebook in a project. In this case, every notebook kernel runs in the same runtime instance and the resources are shared. To avoid sharing runtime resources, create multiple custom environment templates with the same specifications and associate each notebook with its own template.

Default hardware specifications for scoring models with Watson Machine Learning

When you invoke the Watson Machine Learning API within a notebook, you consume compute resources from the Watson Machine Learning service as well as the compute resources for the notebook kernel.

You can select any of the following hardware specifications when you connect to Watson Machine Learning and create a deployment.

Hardware specifications available when invoking the Watson Machine Learning service in a notebook
Capacity size Hardware configuration CUH rate per hour
Extra small 1x4 = 1 vCPU and 4 GB RAM 0.5
Small 2x8 = 2 vCPU and 8 GB RAM 1
Medium 4x16 = 4 vCPU and 16 GB RAM 2
Large 8x32 = 8 vCPU and 32 GB RAM 4

Data files in notebook environments

If you are working with large data sets, you should store the data sets in smaller chunks in the IBM Cloud Object Storage associated with your project and process the data in chunks in the notebook. Alternatively, you should run the notebook in a Spark environment.
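Chunked processing can look like the following sketch, which uses only the Python standard library and assumes the data is a CSV file that is already available on the local file system of the runtime. The function names, the file name, and the column name are hypothetical. If pandas is available, pandas.read_csv with the chunksize parameter serves a similar purpose.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size):
    """Yield lists of at most `chunk_size` rows (as dicts) from a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def column_total(path, column, chunk_size=10_000):
    """Sum a numeric column while holding only one chunk in memory."""
    total = 0.0
    for chunk in iter_chunks(path, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# Example (hypothetical file and column names):
# print(column_total("sales_part_001.csv", "amount"))
```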

Be aware that the file system of each runtime is non-persistent and cannot be shared across environments. To persist files in Watson Studio, you should use IBM Cloud Object Storage. The easiest way to use IBM Cloud Object Storage in notebooks in projects is to leverage the project-lib package for Python or the project-lib package for R.

Compute usage by service

Notebook runtimes consume compute resources as CUH from Watson Studio while running default or custom environments. You can monitor the Watson Studio CUH consumption on the Resource usage page on the Manage tab of the project.

Notebooks can also consume CUH from the Watson Machine Learning service when the notebook invokes Watson Machine Learning to score a model. You can monitor the total monthly amount of CUH consumption for the Watson Machine Learning service on the Resource usage page on the Manage tab of the project.

Track CUH consumption for Watson Machine Learning in a notebook

To calculate capacity unit hours consumed by a notebook, run this code in the notebook:

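# client is assumed to be an authenticated Watson Machine Learning API client
details = client.service_instance.get_details()
# Convert the current capacity units to capacity unit hours (CUH)
cuh = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(cuh)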

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns 5.49 CUH
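The conversion can be tried without a service connection. The following sketch wraps it in a hypothetical helper, capacity_units_to_cuh, and applies it to a trimmed-down stand-in for the details dictionary. Only the keys used in the code above are assumed.

```python
def capacity_units_to_cuh(details: dict) -> float:
    """Convert the `current` capacity units in a details dict to CUH."""
    current = details["entity"]["usage"]["capacity_units"]["current"]
    # Same conversion as above: divide by 3600 * 1000.
    return current / (3600 * 1000)


# A trimmed-down stand-in for the dict returned by get_details().
sample = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}
print(round(capacity_units_to_cuh(sample), 2))  # 5.49
```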

For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.

Runtime scope

Environment runtimes are always scoped to an environment template and a user within a project. If different users in a project work with the same environment, each user will get a separate runtime.

If you choose to run a version of a notebook as a scheduled job, each scheduled job always starts in a dedicated runtime. The runtime is stopped when the job finishes.

Changing the environment of a notebook

You can switch environments for different reasons. For example, you can:

  • Select an environment with more processing power or more RAM
  • Change from using an environment without Spark to a Spark environment

You can only change the environment of a notebook if the notebook is unlocked. You can change the environment:

  • From the notebook opened in edit mode:

    1. Save your notebook changes.
    2. Click the Notebook Info icon from the notebook toolbar and then click Environment.
    3. Select another template with the compute power and memory capacity that you need from the list.
    4. Select Change environment.
      This stops the active runtime and starts the newly selected environment.
  • From the Assets page of your project:

    1. Select the notebook in the Notebooks section, click Actions > Change Environment and select another environment. The kernel must be stopped before you can change the environment. This new runtime environment will be instantiated the next time the notebook is opened for editing.
  • In the notebook job by editing the job template. See Editing job settings.
