A project is how you organize your resources to work with data. You can create projects with the Watson Studio and Watson Knowledge Catalog services.
Your project can include these types of resources:
- Collaborators are the people who you work with in your project.
- Data assets are what you work with. Data assets often consist of raw data that you refine.
- Operational assets are what you create with tools to run code on data.
- Environments are how you configure compute resources for running operational assets.
- Jobs are how you manage and schedule the running of operational assets.
- Project documentation and notifications are how you stay informed about what’s happening in the project.
- Project storage is where project information and files are stored.
- Integrations are how you incorporate external tools.
- Services are how you add tools or processing power to your project.
- Catalogs are how you share assets between projects.
You can customize projects to suit your goals. You can change the contents of your project and almost all of its properties at any time. However, you must make these choices when you create the project because you can’t change them later:
- Whether to restrict eligible collaborators to members of your IBM Cloud account or your company’s employees.
- Whether to enable catalog access by restricting collaborator eligibility.
- The instance of IBM Cloud Object Storage to use for project storage.
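These creation-time choices can be modeled as an immutable settings object. Catalog access is not an independent switch. It follows from restricting collaborator eligibility. The sketch below is a hypothetical illustration only. `ProjectCreationSettings` and its fields are invented names, not part of any Watson Studio API.

```python
# Hypothetical model of the settings fixed at project creation.
# Not a Watson Studio API; names are illustrative only.
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ProjectCreationSettings:
    restrict_collaborators: bool   # the "Restrict who can be a collaborator" checkbox
    object_storage_instance: str   # hypothetical ID of the IBM Cloud Object Storage instance

    @property
    def catalog_access(self) -> bool:
        # Catalog access is derived from restricting collaborator eligibility.
        return self.restrict_collaborators


settings = ProjectCreationSettings(restrict_collaborators=True,
                                   object_storage_instance="my-cos-instance")
print(settings.catalog_access)  # True
try:
    settings.restrict_collaborators = False  # attempting a post-creation change
except FrozenInstanceError:
    print("Creation-time settings can't be changed")
```

The frozen dataclass mirrors the rule that these values are set once. Deriving `catalog_access` from `restrict_collaborators` keeps the two values from disagreeing.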
Collaboration in projects
As the project creator, you can add other collaborators and assign them roles that control which actions they can take. You automatically have the Admin role in the project. If you give other collaborators the Admin role, they can add collaborators too. See Adding collaborators and Project collaborator roles.
When you create a project, you can control who is eligible to be added as collaborators:
- Select the Restrict who can be a collaborator checkbox to limit eligible collaborators to people inside your organization. With this option, you can add only members of your IBM Cloud account or, if your company has SAML federation set up in IBM Cloud, employees of your company. This option also enables access to catalog assets from the project. If you have Watson Knowledge Catalog, this option is selected by default.
- Clear the checkbox to allow anyone to be added as a collaborator.
This setting is permanent. You can’t change it after you create the project.
Collaboration on assets
All collaborators work with the same copy of each asset. To prevent conflicting changes, only one collaborator can edit an asset at a time. While a collaborator is editing an asset in a tool, that asset is locked. Other collaborators can view a locked asset but cannot edit it. See Managing assets.
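The locking rule above amounts to a single-editor, many-viewer lock on each asset. The following is a minimal conceptual sketch under that assumption. `AssetLock`, `LockedError`, and their methods are hypothetical names. This is not how Watson Studio implements locking internally.

```python
# Hypothetical single-editor / many-viewer lock for a shared project asset.
# Conceptual model only; not the Watson Studio implementation or API.
from typing import Optional


class LockedError(Exception):
    """Raised when a collaborator tries to edit an asset someone else has locked."""


class AssetLock:
    def __init__(self, asset_name: str) -> None:
        self.asset_name = asset_name
        self.editor: Optional[str] = None  # collaborator currently holding the lock

    def open_for_edit(self, collaborator: str) -> None:
        # Acquire the lock if it is free or already held by this collaborator.
        if self.editor is not None and self.editor != collaborator:
            raise LockedError(f"{self.asset_name} is being edited by {self.editor}")
        self.editor = collaborator

    def close_editor(self, collaborator: str) -> None:
        # Only the current editor releases the lock.
        if self.editor == collaborator:
            self.editor = None

    def can_view(self, collaborator: str) -> bool:
        # Viewing is always allowed, whether or not the asset is locked.
        return True

    def can_edit(self, collaborator: str) -> bool:
        return self.editor is None or self.editor == collaborator


lock = AssetLock("sales-notebook")
lock.open_for_edit("alice")
print(lock.can_edit("bob"))   # False: alice holds the lock
print(lock.can_view("bob"))   # True: viewing a locked asset is allowed
lock.close_editor("alice")
print(lock.can_edit("bob"))   # True: the lock was released
```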
Data assets
You can add these types of data assets to projects:
- Data assets from local files, catalogs, or the Gallery
- Connections to cloud, on-premises, and streaming data sources
- Connected data assets from an existing connection asset that provide read-only access to a table or file in an external data source
- Imported data assets from an existing connection asset that provide read-only access to a table or a file in an external data source
- Folder assets to view the files within a folder in a file system
See Adding data. For some formats of relational or tabular data, you can preview and profile the data when you open the asset.
Operational assets
Operational assets are what you create in tools to prepare data, analyze data, or build models. Most types of operational assets have a specific tool for creating and editing that type of asset. Notebooks have a choice of editors.
With Watson Studio, you can create these types of operational assets without additional services:
- Data Refinery flows to refine data with the Data Refinery tool.
- Jupyter notebooks to analyze data or build models in the Jupyter notebook editor.
- SPSS Modeler flows to automate the flow of data through a model with SPSS algorithms in the SPSS Modeler.
- Decision Optimization models to solve scenarios in the Decision Optimization model builder.
- R Shiny apps to develop interactive web applications.
These operational assets require additional services. You can provision each service when you create the first asset that needs it:
- Dashboards to visualize data without code in the Dashboard editor. Requires the Cognos Dashboards service.
- AutoAI experiments to build a model without coding in the AutoAI tool. Requires the Watson Machine Learning service.
- Deep learning experiments to train deep learning models in the Experiment builder. Requires the Watson Machine Learning service.
- DataStage flows to create data transformation jobs with the DataStage tool. Requires the Watson Knowledge Catalog service.
- Metadata imports to import asset metadata into a project or a catalog. Requires the Watson Knowledge Catalog service.
- Master data configuration to match record data and create master data entities. Requires the Master Data Management service.
- Natural Language Classification models to classify text in the Natural Language Classifier modeler. Requires the Natural Language Classifier service.
- Visual Recognition models to categorize images or identify objects in images in the Visual Recognition modeler. Requires the Visual Recognition service.
If you have the Watson Knowledge Catalog service without Watson Studio, you can create Data Refinery flows, DataStage flows, and Metadata import assets.
Environments
Environments control your compute resources. An environment definition specifies the hardware and software resources used to instantiate the environment runtimes that run your operational assets in tools.
Some types of operational assets have an automatically selected environment definition. For other types, you can choose among multiple environments when you create the asset and when you run it. Watson Studio includes a set of default environment definitions that vary by coding language, tool, and compute engine type. You can also create custom environment definitions or add services that provide environment definitions. For example, you can associate the IBM Analytics Engine service with your project to provide extra compute power.
The compute resources that you consume in a project are tracked. Depending on your offering plan, you might have a limit on your monthly compute resource usage or pay extra after you reach a threshold.
Jobs
A job is a single run of an operational asset with a specified environment runtime. You can schedule one-time or recurring jobs, and you can monitor, edit, stop, or cancel jobs. See Jobs.
Project storage
Each project has a dedicated, secure storage bucket that contains:
- Data assets that you upload to the project as files.
- Data assets from files that you copy from a catalog.
- Files that you save to the project with a tool.
- Files for operational assets, such as notebooks.
- The project readme file and internal project files.
When you create a project, you must select an instance of IBM Cloud Object Storage or create a new instance. You cannot change the IBM Cloud Object Storage instance after you create the project. See Object storage.
When you delete a project, its storage bucket is also deleted.
Associated services
You can associate more services with a project to add tools, compute environments, or other functionality. See Adding associated services.
Integrations with external tools
Integrations provide a method to interact with tools that are external to the project.
Project documentation and notifications
When you create a project, you can add a short description to document the purpose or goal of the project. You can edit the description later on the project’s Settings page.
The Overview page of a project contains a readme file where you can document the status or results of the project. The readme file uses standard Markdown formatting. Collaborators with the Admin or Editor role can edit the readme file.
All collaborators in a project are notified when a collaborator changes an asset or adds a comment to a notebook.
Catalogs
A catalog is a central repository for assets where you can easily find and share data and other assets. Before you can access a catalog, a catalog administrator must add you as a catalog collaborator. A catalog has the same types of roles as a project. With any catalog role, you can copy assets from the catalog into a project to use them. With the Editor or Admin role in the catalog, you can create assets in a project and then publish them to the catalog.
If you want to access catalogs in a project, you must select the Restrict who can be a collaborator option when you create the project. This setting keeps the company data in your catalog secure. You can’t enable catalog integration in a project after creation. See Collaborator eligibility.