To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.
You can perform most tasks that are related to notebooks and scripts with the Editor or Admin role in an analytics project.
Before you start working with notebooks and scripts, consider the following questions, because most tasks must be completed in a particular order:
Which programming language do you want to work in?
What will your notebooks be doing?
What libraries do you want to work with?
How can you use the notebook or script in IBM watsonx?
To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.
You need to create a project before you can start working in notebooks.
Projects
You can create an empty project, a project from a file, or a project from a URL. In this project:
You can use the Jupyter Notebook editor and RStudio.
Notebooks are assets in the project.
Notebook collaboration is based on locking by user at the project level.
R scripts and Shiny apps are not assets in the project.
There is no collaboration on R scripts or Shiny apps.
Picking a programming language
You can choose to work in the following languages:
Notebooks
Python and R
Scripts
R scripts and R Shiny apps
Selecting a tool
In IBM watsonx, you can work with notebooks and scripts in the following tools:
Jupyter Notebook editor
In the Jupyter Notebook editor, you can create Python or R notebooks. Notebooks are assets in a project. Collaboration is only at the project level. A notebook is locked by the user who opens it and can be unlocked only by that user or a project admin.
RStudio
In RStudio, you can create R scripts and Shiny apps. R scripts are not assets in a project, which means that there is no collaboration at the project level.
Checking the library packages
When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:
The Watson Natural Language Processing library in Python environments
Libraries to help you access project assets
Libraries for time series or geo-spatial analysis in Spark environments
For a list of the library packages and the versions included in an environment template, select the template on the Templates page from the Manage tab on the project's Environments page.
If libraries are missing in a template, you can add them:
Through the notebook or script
You can use familiar package install commands for your environment. For example, in Python notebooks, you can use mamba, conda or pip.
By creating a custom environment template
When you create a custom template, you can create a software customization and add the libraries that you want to include. For details, see Customizing environment templates.
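As a rough illustration of the first option, the following Python sketch checks which modules are missing from the current environment and builds the matching pip command without running it. The helper names and the package name `hypothetical_pkg_xyz` are made up for this example, and the sketch assumes that the import name and the package name are the same, which is not true for every library.

```python
import importlib.util
import sys


def missing_packages(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def pip_install_command(package_names):
    """Build a pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", *package_names]


# "json" ships with Python; "hypothetical_pkg_xyz" is a made-up name for illustration.
needed = ["json", "hypothetical_pkg_xyz"]
to_install = missing_packages(needed)
if to_install:
    # The command is only printed here, not executed.
    print("Would run:", " ".join(pip_install_command(to_install)))
```

In a notebook cell, you would typically run the install directly, for example with the `%pip install` magic. Packages that are installed this way are likely to be available only in the current runtime, so a custom environment template may be the better choice for libraries that you need every time.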
Choosing a runtime environment
The compute environment that you choose for your notebook depends on the amount of data that you want to process and the complexity of the data analysis.
watsonx.ai Studio offers many default environment templates with different hardware sizes and software configurations to help you quickly get started, without having to create your own templates. These included templates are listed on the Templates page from the Manage tab on the project's Environments page. For more information about the included environments, see Environments.
If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.
Important: Make sure that the environment has enough memory to store the data that you load to the notebook. This often means that the environment must have significantly more memory than the total size of the data loaded to the notebook because some data frameworks, like pandas, can hold multiple copies of the data in memory.
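The memory check can be sketched as simple arithmetic. In the following Python example, the function names and the default of three in-memory copies are assumptions chosen for illustration, not figures from IBM or pandas. The real overhead depends on your framework and on the operations that you run.

```python
def estimated_peak_bytes(data_bytes, copies=3):
    """Estimate peak memory if the framework holds several copies of the data.

    The default of 3 copies is an illustrative assumption, not a measured value.
    """
    return data_bytes * copies


def fits_in_environment(data_bytes, env_memory_gb, copies=3):
    """Return True if the estimated peak usage fits in the environment's memory."""
    env_bytes = env_memory_gb * 1024**3
    return estimated_peak_bytes(data_bytes, copies) <= env_bytes


two_gb = 2 * 1024**3
print(fits_in_environment(two_gb, env_memory_gb=4))   # 6 GB estimated peak, 4 GB available
print(fits_in_environment(two_gb, env_memory_gb=16))  # 6 GB estimated peak, 16 GB available
```

Under these assumptions, a 2 GB data set does not fit in a 4 GB environment but does fit in a 16 GB one.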
Working with data
To work with data in a notebook:
Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
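For the last case, where you write the loading code yourself, a minimal sketch in plain Python might look like the following. It is not the code that watsonx.ai Studio generates. The `load_rows` helper and the column names are hypothetical, and an in-memory sample stands in for a real file.

```python
import csv
import io


def load_rows(text_stream):
    """Read CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(text_stream))


# In-memory sample standing in for a real file; the column names are made up.
sample = io.StringIO("id,value\n1,3.5\n2,4.0\n")
rows = load_rows(sample)
print(len(rows), "rows loaded; first value:", rows[0]["value"])
```

To read from a file instead, pass an open file object, for example `with open(path, newline="") as f: rows = load_rows(f)`. All values arrive as strings, so convert numeric columns before you analyze them.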
Managing the notebooks and scripts lifecycle
After you create and test a notebook in your tool, you can:
Share a read-only copy outside of watsonx.ai Studio so that people who aren't collaborators in your projects can see and use it. See Sharing notebooks with a URL.
To ensure that a notebook can be run as a job or in a pipeline:
Ensure that no cells require interactive input by a user.
Ensure that the notebook logs enough detail that you can follow its progress and diagnose any failures from the log alone.
Use environment variables in the code to access configurations if a notebook or script requires them, for example, the input data file or the number of training runs.
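The three points above can be sketched together in Python. The variable names `INPUT_DATA_FILE` and `TRAINING_RUNS`, their defaults, and the logger name are assumptions made for this example. Use whatever names your job or pipeline configuration defines.

```python
import logging
import os

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notebook_job")  # hypothetical logger name


def read_config(env=os.environ):
    """Read settings from environment variables instead of prompting the user."""
    return {
        # Hypothetical variable names and defaults, chosen for illustration.
        "input_file": env.get("INPUT_DATA_FILE", "data.csv"),
        "training_runs": int(env.get("TRAINING_RUNS", "1")),
    }


def run(config):
    """Log progress for each run so that the job log shows where it stopped."""
    log.info("Starting job with input file %s", config["input_file"])
    completed = 0
    for i in range(1, config["training_runs"] + 1):
        log.info("Training run %d of %d", i, config["training_runs"])
        completed += 1  # placeholder for the actual training step
    log.info("Finished %d training run(s)", completed)
    return completed


run(read_config())
```

Because the code never calls `input()`, it can run unattended, and the defaults let the same notebook run interactively when the variables are not set.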