Notebook environments

An environment definition specifies the hardware and software configuration that you can use to run tools like notebooks in IBM Watson Studio. With environments, you can tailor the hardware and software configuration to the needs of your data science tool.

Watson Studio offers a selection of default Anaconda-based environment definitions with different hardware and software configurations, which you can select for running tools such as Jupyter notebooks. You can use the defaults as they are, or you can create your own environments based on the provided defaults and customize them to fit your needs.

An environment definition references a runtime service. For example, when you start a notebook that is associated with an environment, the runtime service creates a runtime instance based on the environment configuration you specified.

The memory size for each environment is the size given to the runtime instance, which means that some of that memory is also used by system resources like the Jupyter server. Therefore, the specified memory is not entirely available for your use.

The deciding factors when selecting an Anaconda-based environment are the volume of data you want to analyze and whether you use Spark APIs. If you don't need Spark, you can use the default Anaconda-based environments. If you are working with large data sets and need to run distributed workloads across a cluster, you must associate the notebook with a Spark service.

Free environments

You can use the following environment for free, either as it is or as the basis for your own environments for Python and R. The provided default environment for this size uses Python 3.5. If you want to work with Python 2.7 or R, create an environment based on the free default and select Python 2 or R as the software configuration to use.

  • Default Python 3.5 Free

    Software configuration: Anaconda 5.0; Hardware configuration: 1 Core / 4 GB RAM

Important:

  • You can create and customize any number of these small runtime environments, but only one free environment can be active at any one time.
  • You can't schedule notebooks that run in a free environment. You must use a charged environment to schedule a notebook.

Environments that consume capacity units

When you run a notebook in any environment other than the free default, it consumes capacity unit hours (CUHs). The CUHs consumed are the period of time that a runtime is active, multiplied by the size of its hardware configuration in capacity units (CUs).

For example, if your environment is size S (with CU=2) and you run it for 3 hours, you are billed for 6 CUHs. If your environment is size XS (CU=1), it consumes only 1 CUH per hour.
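
The billing arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula described above (active hours multiplied by capacity units). The function name capacity_unit_hours is hypothetical and is not part of any Watson Studio API.

```python
def capacity_unit_hours(capacity_units: float, active_hours: float) -> float:
    """Return the CUHs consumed: active hours multiplied by the hardware size in CUs.

    Hypothetical helper for illustration only; not a Watson Studio API.
    """
    if capacity_units < 0 or active_hours < 0:
        raise ValueError("capacity_units and active_hours must be non-negative")
    return capacity_units * active_hours


# Size S (CU=2) running for 3 hours, as in the example above.
s_usage = capacity_unit_hours(capacity_units=2, active_hours=3)
print(s_usage)  # 6
```

Because the runtime is billed for as long as it is active, not only while cells are executing, the active_hours value covers idle time too.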

You are charged based on your Watson Studio service plan. For up-to-date information, see the Watson Studio pricing plans.

The default environments include the languages Python 3.5 and R. If you want to work with Python 2.7, create an environment based on one of the defaults and select Python 2 as the software configuration to use.

Watson Studio offers the following default environments that consume capacity units:

  • Default Python 3.5 XS

    Software configuration: Anaconda 5.0; Hardware configuration: 2 Cores / 8 GB RAM

  • Default Python 3.5 S

    Software configuration: Anaconda 5.0; Hardware configuration: 4 Cores / 16 GB RAM

  • Default R 3.4 XS

    Software configuration: R-3.4 with r-essentials; Hardware configuration: 2 Cores / 8 GB RAM

  • Default R 3.4 S

    Software configuration: R-3.4 with r-essentials; Hardware configuration: 4 Cores / 16 GB RAM

Spark environments

If your notebook includes Spark APIs, you must associate the notebook with a Spark service. See The Jupyter and Spark notebook environment.

In Watson Studio, you can use:

  • Spark services offered through IBM Cloud.

    With IBM Analytics Engine, you are offered Hortonworks Data Platform on IBM Cloud. You get one VM per cluster node and your own local HDFS. You get Spark and the entire Hadoop ecosystem. You are given shell access and can also create notebooks. See Add associated services.

  • Spark environments offered under Watson Studio.

    A Spark environment offers Spark kernels as a service (SparkR, PySpark, and Scala) and is based on Armada/Kubernetes. The underlying Armada is shared across multiple users. However, each kernel gets a dedicated Spark cluster and Spark executors. You can change the Spark configuration and specify the size of the executors and the number of executors per kernel. A Spark environment is more serverless in nature. See Spark environments.

File system in environments

The file system of each runtime has approximately 2 GB of free space for installing packages from conda or pip, or for temporary files. The file system is non-persistent and cannot be shared across environments. To persist files in Watson Studio, you should use IBM Cloud Object Storage, which is integrated into projects.

The easiest way to use IBM Cloud Object Storage in notebooks in projects is to use the project-lib package.

Runtime scope

Environment runtimes are always scoped to an environment definition and a user.

This means that if you associate each of your notebooks with its own environment, each notebook gets its own runtime. However, if you open a notebook with an environment that you also selected for another notebook, and that other notebook has an active runtime, both notebooks are active in the same runtime. In this case, both notebooks share the compute and data resources available in that runtime.

If you want to avoid sharing runtimes but want to use the same environment specifications for multiple notebooks, you should create multiple custom environment definitions with the same specifications and associate each notebook with its own definition.

If different users in a project work with the same environment, each user will get a separate runtime.
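
The scoping rule can be illustrated with a toy model in Python. This is a sketch under the assumption that a runtime is identified by the pair (environment definition, user), as described above. The RuntimeRegistry class, the user names, and the notebook names are hypothetical and do not correspond to any Watson Studio API.

```python
from typing import Dict, List, Tuple

# Assumption for this sketch: a runtime is identified by (environment definition, user).
RuntimeKey = Tuple[str, str]


class RuntimeRegistry:
    """Toy model that tracks which notebooks share which runtime."""

    def __init__(self) -> None:
        self._notebooks_by_runtime: Dict[RuntimeKey, List[str]] = {}

    def open_notebook(self, user: str, environment: str, notebook: str) -> RuntimeKey:
        """Attach the notebook to the runtime for this environment and user."""
        key = (environment, user)
        notebooks = self._notebooks_by_runtime.setdefault(key, [])
        if notebook not in notebooks:
            notebooks.append(notebook)
        return key

    def runtime_count(self) -> int:
        """Return the number of distinct active runtimes."""
        return len(self._notebooks_by_runtime)


registry = RuntimeRegistry()
first = registry.open_notebook("alice", "Default Python 3.5 XS", "nb1")
second = registry.open_notebook("alice", "Default Python 3.5 XS", "nb2")
other_user = registry.open_notebook("bob", "Default Python 3.5 XS", "nb3")
own_env = registry.open_notebook("alice", "My XS copy", "nb4")

print(first == second)           # True: same user and environment share a runtime
print(first == other_user)       # False: a different user gets a separate runtime
print(registry.runtime_count())  # 3
```

In this model, the custom definition "My XS copy" gives nb4 its own runtime even though its specifications could be identical to the default.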

Stop active runtimes

You should stop all active runtimes after your notebooks have stopped running to prevent consuming extra capacity unit hours (CUHs).

Project users with the Admin role can stop all runtimes in the project. Users added to the project with the Editor role can stop the runtimes they started, but can't stop other project users’ runtimes. Users added to the project with the Viewer role can't see the runtimes in the project.

You can stop runtimes that are billed against your user account from the Watson Admin Console. The Admin Console lists all active runtimes across all projects for your account. You can also stop runtimes for a specific project from the Environments page of that project.

All environment runtimes are shut down automatically if they have been idle for longer than one hour.

If you have a Lite or Standard v1 service plan, your environment runtimes are shut down automatically after 12 hours of continuous use.
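
The two automatic shutdown rules above can be expressed as a small Python check. This is a sketch, not how Watson Studio implements the rules. The function name, the constant names, and the use of plan names as plain strings are assumptions made for illustration, as is treating "after 12 hours" as "12 hours or more".

```python
# Plans with the 12-hour limit, taken from the text above.
LIMITED_PLANS = {"Lite", "Standard v1"}
IDLE_LIMIT_HOURS = 1.0
CONTINUOUS_LIMIT_HOURS = 12.0


def is_shut_down_automatically(idle_hours: float, continuous_hours: float, plan: str) -> bool:
    """Return True if either automatic shutdown rule applies (hypothetical helper)."""
    # Rule 1: any runtime that has been idle for longer than one hour.
    if idle_hours > IDLE_LIMIT_HOURS:
        return True
    # Rule 2: Lite and Standard v1 plans only, after 12 hours of continuous use.
    # Assumption: the limit is reached at exactly 12 hours.
    if plan in LIMITED_PLANS and continuous_hours >= CONTINUOUS_LIMIT_HOURS:
        return True
    return False


print(is_shut_down_automatically(1.5, 2.0, "some other plan"))   # True
print(is_shut_down_automatically(0.2, 12.0, "Lite"))             # True
print(is_shut_down_automatically(0.2, 13.0, "some other plan"))  # False
```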

Runtimes for scheduled notebooks are automatically shut down after the scheduled job has completed, unless the runtime is also shared by another notebook. For example, if you schedule a notebook to run once a day for 2 months, the runtime instance is activated every day for the duration of the scheduled job and deactivated again after the job has finished.

Next steps