When you run a notebook in the notebook editor in a project, you choose an environment template, which defines the compute resources for the runtime environment. The environment template specifies the type, size, and power of the hardware configuration, plus the software configuration. For notebooks, each environment template includes a supported language, either Python or R.
- Types of environments
- Runtime releases
- CPU environment templates
- Spark environment templates
- GPU environment templates
- Default hardware specifications for scoring models with Watson Machine Learning
- Data files in notebook environments
- Compute usage by service
- Runtime scope
- Changing environments
Types of environments
You can use these types of environments for running notebooks:
- Anaconda CPU environments for standard workloads.
- Spark environments for parallel processing that is provided by the platform or by other services.
- GPU environments for compute-intensive machine learning models.
Most environment types for notebooks have default environment templates so you can get started quickly. Otherwise, you can create custom environment templates.
Environment type | Default templates | Custom templates |
---|---|---|
Anaconda CPU | ✓ | ✓ |
Spark clusters | ✓ | ✓ |
GPU | ✓ | ✓ |
Runtime releases
The default environments for notebooks are associated with a runtime release and are prefixed with Runtime followed by the release year and release version.
A runtime release specifies a list of key data science libraries and a language version, for example Python 3.10. All environments of a runtime release are built based on the library versions defined in the release, thus ensuring the consistent use of data science libraries across all data science applications.
The Runtime 22.2 release is available for Python 3.10 and R 4.2.
While a runtime release is supported, IBM will update the library versions to address security requirements. Note that these updates will not change the <Major>.<Minor> versions of the libraries, but only the <Patch> versions. This ensures that your notebook assets will continue to run.
Libraries in the 22.x Runtime releases
The 22.x Runtime releases include the following popular data science library packages for Python and R.
Runtime release 22.2 for Python 3.10, listing libraries and their versions:
Library | Runtime 22.2 on Python 3.10 |
---|---|
Dali | 1.15 |
Horovod | 0.25 |
Keras | 2.9 |
Lale | 0.6 |
LightGBM | 3.3 |
NumPy | 1.23 |
ONNX | 1.12 |
ONNX Runtime | 1.12 |
OpenCV | 4.6 |
pandas | 1.4 |
PyArrow | 8.0 |
PyTorch | 1.12 |
scikit-learn | 1.1 |
SciPy | 1.8 |
SnapML | 1.8 |
TensorBoard | 2.9 |
TensorFlow | 2.9 |
XGBoost | 1.6 |
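The versions in the table above can be compared with what is installed in a running notebook. The following is a minimal sketch, assuming that the packages are installed under the distribution names shown in `EXPECTED`; the helper names `major_minor` and `check_release` are hypothetical and are not part of any IBM library. Only the `<Major>.<Minor>` part is compared, because patch versions can change while a release is supported.

```python
from importlib import metadata

# Expected <Major>.<Minor> versions, copied from the Runtime 22.2 on Python 3.10 table.
# The distribution names used as keys are an assumption.
EXPECTED = {"numpy": "1.23", "pandas": "1.4", "scikit-learn": "1.1", "scipy": "1.8"}


def major_minor(version: str) -> str:
    """Return the '<Major>.<Minor>' prefix of a version string such as '1.23.5'."""
    return ".".join(version.split(".")[:2])


def check_release(expected: dict) -> dict:
    """Map each package name to True (match), False (mismatch), or None (not installed)."""
    results = {}
    for name, wanted in expected.items():
        try:
            results[name] = major_minor(metadata.version(name)) == wanted
        except metadata.PackageNotFoundError:
            results[name] = None
    return results


print(check_release(EXPECTED))
```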
Runtime release 22.2 for R 4.2, listing libraries and their versions:
Library | Runtime 22.2 on R 4.2 |
---|---|
arrow | 8.0 |
car | 3.0 |
caret | 6.0 |
catools | 1.18 |
forecast | 8.16 |
ggplot2 | 3.3 |
glmnet | 4.1 |
hmisc | 4.7 |
keras | 2.9 |
lme4 | 1.1 |
mvtnorm | 1.1 |
pandoc | 2.12 |
psych | 2.2 |
python | 3.10 |
randomforest | 4.7 |
reticulate | 1.25 |
sandwich | 3.0 |
scikit-learn | 1.1 |
spatial | 7.3 |
tensorflow | 2.9 |
tidyr | 1.2 |
xgboost | 1.6 |
The 22.x Runtime releases for Python and R include a large set of other useful libraries in addition to the libraries listed in the tables. To see the full list, select the Runtime 22.2 on Python 3.10 or the Runtime 22.2 on R 4.2 environment template under Templates on the Environments page on the Manage tab of your project.
CPU environment templates
You can select any of the following default CPU environment templates for notebooks. The default environment templates are listed under Templates on the Environments page on the Manage tab of your project.
DO Indicates that the environment template includes the CPLEX and DOcplex libraries to model and solve decision optimization problems that exceed the complexity that is supported by the Community Edition of the libraries in the other default Python environments. See Decision Optimization notebooks.
NLP Indicates that the environment template includes the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library. This default environment should be large enough to run the pre-trained models.
~ Indicates that the environment template requires the Watson Studio Professional plan. See Offering plans.
# Indicates that the environment template is in constricted mode and can't be selected to create a notebook.
Name | Hardware configuration | CUH rate per hour |
---|---|---|
Runtime 22.2 on Python 3.10 XXS | 1 vCPU and 4 GB RAM | 0.5 |
Runtime 22.2 on Python 3.10 XS | 2 vCPU and 8 GB RAM | 1 |
Runtime 22.2 on Python 3.10 S | 4 vCPU and 16 GB RAM | 2 |
DO + NLP Runtime 22.2 on Python 3.10 | 2 vCPU and 8 GB RAM | 6 |
Runtime 22.2 on R 4.2 S | 4 vCPU and 16 GB RAM | 2 |
Default R 3.6 S # | 4 vCPU and 16 GB RAM | 2 |
Default R 3.6 M ~ # | 16 vCPU and 64 GB RAM | 8 |
You should stop all active CPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See CPU idle timeout.
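To get a feel for why stopping idle runtimes matters, the rates in the table can be turned into a rough estimate. This is a minimal sketch that assumes consumption is simply the CUH rate multiplied by the hours the runtime is active; the function name `estimate_cuh` is hypothetical, and actual metering granularity may differ.

```python
# CUH rates per hour, copied from the CPU environment template table above.
CUH_RATE_PER_HOUR = {
    "Runtime 22.2 on Python 3.10 XXS": 0.5,
    "Runtime 22.2 on Python 3.10 XS": 1,
    "Runtime 22.2 on Python 3.10 S": 2,
    "DO + NLP Runtime 22.2 on Python 3.10": 6,
    "Runtime 22.2 on R 4.2 S": 2,
}


def estimate_cuh(template: str, hours_active: float) -> float:
    """Estimate CUH as rate x active hours (assumed linear; not an official formula)."""
    return CUH_RATE_PER_HOUR[template] * hours_active


# A runtime on the S template left running for 3.5 hours:
print(estimate_cuh("Runtime 22.2 on Python 3.10 S", 3.5))  # 7.0
```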
Notebooks and CPU environments
When you open a notebook in edit mode in a CPU runtime environment, exactly one interactive session connects to a Jupyter kernel for the notebook language and the environment runtime that you select. The runtime is started per single user and not per notebook. This means that if you open a second notebook with the same environment template in the same project, a second kernel is started in the same runtime. Runtime resources are shared by the Jupyter kernels that you start in the runtime. Runtime resources are also shared if the runtime includes a GPU.
If you want to avoid sharing runtimes but want to use the same environment template for multiple notebooks in a project, you should create custom environment templates with the same specifications and associate each notebook with its own template.
If necessary, you can restart or reconnect to the kernel. When you restart a kernel, the kernel is stopped and then started in the same session again, but all execution results are lost. When you reconnect to a kernel after losing a connection, the notebook is connected to the same kernel session, and all previous execution results that were saved are available.
Spark environment templates
You can select any of the following default Spark environment templates for notebooks. The default environment templates are listed under Templates on the Environments page on the Manage tab of your project.
# Indicates that the environment template is in constricted mode and can't be selected to create a notebook.
Name | Hardware configuration | CUH rate per hour |
---|---|---|
Default Spark 3.3 & R 3.6 # | 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM | 1 |
Default Spark 3.3 & R 4.2 | 2 Executors each: 1 vCPU and 4 GB RAM; Driver: 1 vCPU and 4 GB RAM | 1 |
You should stop all active Spark runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See Spark idle timeout.
Large Spark environments
If you have the Watson Studio Professional plan, you can create custom environment templates for larger Spark environments.
Professional plan users can have up to 35 executors and can choose from the following options for both driver and executor:
Hardware configuration |
---|
1 vCPU and 4 GB RAM |
2 vCPU and 8 GB RAM |
3 vCPU and 12 GB RAM |
The CUH rate per hour increases by 0.5 for every vCPU that is added. For example, 1x Driver: 3 vCPU with 12 GB of RAM and 4x Executors: 2 vCPU with 8 GB of RAM amounts to (3 + (4 * 2)) = 11 vCPUs and 5.5 CUH.
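The arithmetic in the worked example can be sketched as a small helper. This assumes a flat 0.5 CUH per vCPU, as the example implies; the function name `spark_cuh_rate` is hypothetical, and the default templates keep the rates listed in their own table.

```python
CUH_PER_VCPU = 0.5  # rate implied by the worked example above


def spark_cuh_rate(driver_vcpu: int, executor_count: int, executor_vcpu: int):
    """Return (total vCPUs, CUH rate per hour) for a custom Spark configuration."""
    total_vcpu = driver_vcpu + executor_count * executor_vcpu
    return total_vcpu, total_vcpu * CUH_PER_VCPU


# 1x driver with 3 vCPU and 4x executors with 2 vCPU each:
print(spark_cuh_rate(3, 4, 2))  # (11, 5.5)
```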
Notebooks and Spark environments
You can select the same Spark environment template for more than one notebook. Every notebook associated with that environment has its own dedicated Spark cluster and no resources are shared.
When you start a Spark environment, extra resources are needed for the Jupyter Enterprise Gateway, Spark Master, and the Spark worker daemons. These extra resources amount to 1 vCPU and 2 GB of RAM for the driver and 1 GB RAM for each executor.
You need to take these extra resources into account when selecting the hardware size of a Spark environment. For example: if you create a notebook and select Default Spark 3.3 & Python 3.9, the Spark cluster consumes 3 vCPU and 12 GB RAM but, as 1 vCPU and 4 GB RAM are required for the extra resources, the resources remaining for the notebook are 2 vCPU and 8 GB RAM.
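The same calculation can be written out for other cluster sizes. This is a minimal sketch that assumes the driver and every executor have the same size and that the overhead is exactly the figures stated above (1 vCPU and 2 GB RAM for the driver, plus 1 GB RAM per executor); the function name `remaining_for_notebook` is hypothetical.

```python
def remaining_for_notebook(executors: int, vcpu_each: int = 1, ram_gb_each: int = 4):
    """Return (vCPU, GB RAM) left for the notebook after the Spark overhead is subtracted."""
    # Cluster totals: one driver plus the executors, all of the same size (assumption).
    total_vcpu = (1 + executors) * vcpu_each
    total_ram_gb = (1 + executors) * ram_gb_each
    # Overhead stated in the text: 1 vCPU and 2 GB for the driver, 1 GB per executor.
    overhead_vcpu = 1
    overhead_ram_gb = 2 + executors
    return total_vcpu - overhead_vcpu, total_ram_gb - overhead_ram_gb


# Default template: 2 executors and 1 driver, each with 1 vCPU and 4 GB RAM.
print(remaining_for_notebook(2))  # (2, 8)
```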
File system on a Spark cluster
If you want to share files across executors and the driver or kernel of a Spark cluster, you can use the shared file system at /home/spark/shared.
If you want to use your own custom libraries, you can store them under /home/spark/shared/user-libs/. There are four subdirectories under /home/spark/shared/user-libs/ that are pre-configured to be made available to Python, R, or Java runtimes.
The following table lists the pre-configured subdirectories where you can add your custom libraries.
Directory | Type of library |
---|---|
/home/spark/shared/user-libs/python3/ | Python 3 libraries |
/home/spark/shared/user-libs/R/ | R packages |
/home/spark/shared/user-libs/spark2/ | Java JAR files |
To share libraries across a Spark driver and executors:
- Download your custom libraries or JAR files to the appropriate pre-configured directory.
- Restart the kernel from the notebook menu by clicking Kernel > Restart Kernel. This loads your custom libraries or JAR files in Spark.
Note that these libraries are not persisted. When you stop the environment runtime and restart it again later, you need to load the libraries again.
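Because the libraries must be loaded again after each runtime restart, it can help to keep the copy step in a notebook cell. The following is a minimal sketch using only the Python standard library; the helper name `install_custom_library` is hypothetical, and the example file name in the comment is a placeholder. It covers only the first step; you still need to restart the kernel afterward.

```python
import shutil
from pathlib import Path

# Pre-configured directory for Python 3 libraries, from the table above.
SHARED_PYTHON_LIBS = Path("/home/spark/shared/user-libs/python3")


def install_custom_library(source, target_dir=SHARED_PYTHON_LIBS) -> Path:
    """Copy a module file or package directory into the shared library directory."""
    source = Path(source)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / source.name
    if source.is_dir():
        # Package directory: copy the whole tree, overwriting existing files.
        shutil.copytree(source, destination, dirs_exist_ok=True)
    else:
        # Single module file or archive.
        shutil.copy2(source, destination)
    return destination


# Example call in a notebook cell (my_helpers.py is a placeholder file name):
# install_custom_library("my_helpers.py")
```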
GPU environment templates
You can select any of the following GPU environment templates for notebooks. The environment templates are listed under Templates on the Environments page on the Manage tab of your project.
The GPU environment template names indicate the accelerator power. The GPU environment templates include the Watson Natural Language Processing library with pre-trained models for language processing tasks that you can run on unstructured data. See Using the Watson Natural Language Processing library.
~ Indicates that the environment template requires the Watson Studio Professional plan. See Offering plans.
Name | Hardware configuration | CUH rate per hour |
---|---|---|
GPU V100 Runtime 22.2 on Python 3.10 ~ | 40 vCPU + 172 GB RAM + 1 NVIDIA TESLA V100 (1 GPU) | 68 |
GPU 2xV100 Runtime 22.2 on Python 3.10 ~ | 80 vCPU + 344 GB RAM + 2 NVIDIA TESLA V100 (2 GPU) | 136 |
You should stop all active GPU runtimes when you no longer need them to prevent consuming extra capacity unit hours (CUHs). See GPU idle timeout.
Notebooks and GPU environments
GPU environments for notebooks are available only in the Dallas IBM Cloud service region.
You can select the same Python and GPU environment template for more than one notebook in a project. In this case, every notebook kernel runs in the same runtime instance and the resources are shared. To avoid sharing runtime resources, create multiple custom environment templates with the same specifications and associate each notebook with its own template.
Default hardware specifications for scoring models with Watson Machine Learning
When you invoke the Watson Machine Learning API within a notebook, you consume compute resources from the Watson Machine Learning service as well as the compute resources for the notebook kernel.
You can select any of the following hardware specifications when you connect to Watson Machine Learning and create a deployment.
Capacity size | Hardware configuration | CUH rate per hour |
---|---|---|
Extra small | 1x4 = 1 vCPU and 4 GB RAM | 0.5 |
Small | 2x8 = 2 vCPU and 8 GB RAM | 1 |
Medium | 4x16 = 4 vCPU and 16 GB RAM | 2 |
Large | 8x32 = 8 vCPU and 32 GB RAM | 4 |
Data files in notebook environments
If you are working with large data sets, you should store the data sets in smaller chunks in the IBM Cloud Object Storage associated with your project and process the data in chunks in the notebook. Alternatively, you should run the notebook in a Spark environment.
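Processing data in chunks keeps only part of a data set in memory at a time. The following is a minimal sketch using only the Python standard library and an in-memory sample in place of a real file from storage; the helper names `iter_chunks` and `column_sum` are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def column_sum(csv_file, column, chunk_size=1000):
    """Sum a numeric CSV column while holding only one chunk of rows in memory."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# In-memory sample standing in for a large file:
sample = io.StringIO("id,value\n1,2.5\n2,3.5\n3,4.0\n")
print(column_sum(sample, "value", chunk_size=2))  # 10.0
```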
Be aware that the file system of each runtime is non-persistent and cannot be shared across environments. To persist files in Watson Studio, you should use IBM Cloud Object Storage. The easiest way to use IBM Cloud Object Storage in notebooks in projects is to leverage the project-lib package for Python or the project-lib package for R.
Compute usage by service
The notebook runtimes consume compute resources as CUH from Watson Studio while running default or custom environments. You can monitor the Watson Studio CUH consumption in the project on the Resource usage page on the Manage tab of the project.
Notebooks can also consume CUH from the Watson Machine Learning service when the notebook invokes Watson Machine Learning to score a model. You can monitor the total monthly amount of CUH consumption for the Watson Machine Learning service on the Resource usage page on the Manage tab of the project.
Track CUH consumption for Watson Machine Learning in a notebook
To calculate capacity unit hours consumed by a notebook, run this code in the notebook:
# client is assumed to be an already authenticated Watson Machine Learning API client
CP = client.service_instance.get_details()
CUH = CP["entity"]["usage"]["capacity_units"]["current"]/(3600*1000)
print(CUH)
For example:
'capacity_units': {'current': 19773430}
19773430/(3600*1000)
returns 5.49 CUH
For details, see the Service Instances section of the IBM Watson Machine Learning API documentation.
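The conversion can also be tried without a service connection. The following is a minimal sketch that applies the same division to a hand-built dictionary shaped like the example above; the function name `capacity_unit_hours` and the `sample_details` dictionary are hypothetical, and a real response contains more fields.

```python
def capacity_unit_hours(details: dict) -> float:
    """Convert the raw capacity_units value to CUH using the divisor from the text."""
    current = details["entity"]["usage"]["capacity_units"]["current"]
    return current / (3600 * 1000)


# Hand-built stand-in for the service instance details:
sample_details = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}
print(round(capacity_unit_hours(sample_details), 2))  # 5.49
```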
Runtime scope
Environment runtimes are always scoped to an environment template and a user within a project. If different users in a project work with the same environment, each user will get a separate runtime.
If you choose to run a version of a notebook as a scheduled job, each scheduled job always starts in a dedicated runtime. The runtime is stopped when the job finishes.
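The scoping rule for interactive sessions can be pictured as a lookup keyed by project, environment template, and user. This is an illustrative model only, not how the service is implemented; the names `RuntimeKey` and `group_notebooks_by_runtime` are hypothetical. It reflects the CPU and GPU behavior described earlier; Spark environments give each notebook its own cluster, and scheduled jobs always get a dedicated runtime.

```python
from typing import NamedTuple


class RuntimeKey(NamedTuple):
    project: str
    template: str
    user: str


def group_notebooks_by_runtime(sessions):
    """Group (project, template, user, notebook) tuples by the runtime they would share."""
    runtimes = {}
    for project, template, user, notebook in sessions:
        runtimes.setdefault(RuntimeKey(project, template, user), []).append(notebook)
    return runtimes


sessions = [
    ("demo", "Runtime 22.2 on Python 3.10 S", "alice", "nb1"),
    ("demo", "Runtime 22.2 on Python 3.10 S", "alice", "nb2"),  # shares alice's runtime
    ("demo", "Runtime 22.2 on Python 3.10 S", "bob", "nb1"),    # separate runtime for bob
]
print(len(group_notebooks_by_runtime(sessions)))  # 2
```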
Changing the environment of a notebook
You can switch environments for different reasons. For example, you can:
- Select an environment with more processing power or more RAM
- Change from using an environment without Spark to a Spark environment
You can change the environment of a notebook only if the notebook is unlocked. You can change the environment:
- From the notebook opened in edit mode:
  - Save your notebook changes.
  - Click the Notebook Info icon from the notebook toolbar and then click Environment.
  - Select another template with the compute power and memory capacity that you need from the list.
  - Select Change environment. This stops the active runtime and starts the newly selected environment.
- From the Assets page of your project:
  - Select the notebook in the Notebooks section, click Actions > Change Environment, and select another environment. The kernel must be stopped before you can change the environment. The new runtime environment is instantiated the next time the notebook is opened for editing.
- In the notebook job, by editing the job template. See Editing job settings.