Accounts and setup
- How do I get the IAM Editor role so I can create service instances?
- Which regions can I provision IBM Watson apps and other services in?
- Why can’t I see all my projects and catalogs across regions?
- Which web browsers can I use with IBM Watson apps?
- How do I upgrade?
- How do I find my account administrator?
- What is the difference between using Spark in a Spark environment and Spark in IBM Analytics Engine?
- Is environment runtime sharing and billing the same for Spark environments as for Anaconda environments?
- How do I load very large files to my project?
- Why is the machine learning job I submitted still in pending state?
IBM Cloud Object Storage
- What is saved in IBM Cloud Object Storage for projects and catalogs?
- Do I need to upgrade IBM Cloud Object Storage when I upgrade Watson apps?
IBM Watson Knowledge Catalog
- What is Watson Knowledge Catalog?
- What is the difference between a catalog and a project?
- What data sources and asset types are supported?
- How do I add very large files to my catalog?
- Do I need to move my data into Watson Knowledge Catalog?
- What is the maximum number of assets I can have in Watson Knowledge Catalog?
- Does Watson Knowledge Catalog provide data policy services?
- Does Watson Knowledge Catalog provide classification services?
- Does Watson Knowledge Catalog have data wrangling capabilities?
- Does Watson Knowledge Catalog use Apache Atlas for its metadata repository?
- When adding assets to a project from a catalog, why don’t I see all my projects listed in the dropdown for target project?
- When I create policies, which catalogs will they apply to?
- Do data policies affect data in external data sources?
- Why can’t I add policies?
- Can I install libraries or packages to use in my notebooks?
- Can I call functions defined in one notebook from another notebook?
- Can I add arbitrary notebook extensions?
- How do I access the data from a CSV file in a notebook?
- How do I access the data from a compressed file in a notebook?
- What do I do when my notebook won’t run because I’ve reached my kernel limit?
Security and reliability
- How secure are IBM Watson apps?
- Are my data and notebook protected from sharing outside of my collaborators?
- Do I need to back up my notebooks?
Sharing and collaboration
- What are the implications of sharing a notebook?
- How can I share my work outside of RStudio in Watson Studio?
- How do I share my SPSS Modeler flow with another project?
IBM Watson Machine Learning
Accounts and setup
How can I get the IAM Editor role so I can provision service instances?
If you try to provision an instance of a service, for example, the Visual Recognition service, you might get this error message:
You do not have the required permission to create an instance. You must be assigned the IAM Editor role or Operator role or higher. Contact the account owner to update your access.
To get the IAM Editor role:
- Find your IBM Cloud account owner or administrator.
- Ask to be assigned the IAM Editor role for the resource group.
Which regions can I provision IBM Watson apps in?
Currently, you can provision IBM Watson apps in these IBM Cloud regions:
- Dallas (US South): dataplatform.cloud.ibm.com
- London (United Kingdom): eu-gb.dataplatform.cloud.ibm.com
- Frankfurt (Germany): eu-de.dataplatform.cloud.ibm.com
- Tokyo (AP-North): jp-tok.dataplatform.cloud.ibm.com
You can provision other services to use with Watson Studio in any region. See Add and manage services.
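As a quick reference, the region list above can be captured in a small lookup table. This is only an illustrative sketch: the hostnames are copied from the list, while the short region labels, the function name, and the use of the https scheme are assumptions made for this example.

```python
# Hostnames copied from the region list above. The dictionary keys are
# hypothetical short labels chosen for this example.
WATSON_HOSTS = {
    "dallas": "dataplatform.cloud.ibm.com",
    "london": "eu-gb.dataplatform.cloud.ibm.com",
    "frankfurt": "eu-de.dataplatform.cloud.ibm.com",
    "tokyo": "jp-tok.dataplatform.cloud.ibm.com",
}


def watson_url(region: str) -> str:
    """Return the URL for a region label (https is assumed), or raise if unknown."""
    key = region.strip().lower()
    if key not in WATSON_HOSTS:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(WATSON_HOSTS)}"
        )
    return f"https://{WATSON_HOSTS[key]}"


print(watson_url("Frankfurt"))  # prints https://eu-de.dataplatform.cloud.ibm.com
```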
Why can’t I see all my projects and catalogs across regions?
For some offering plans, you can provision Watson Studio and Watson Knowledge Catalog apps in multiple IBM Cloud service regions. However, your projects, catalogs, and data are specific to the region in which they were saved and can be accessed only from your apps in that region. You must switch your region to see the projects, catalogs, and data from that region.
Which web browsers can I use with the Watson Studio and Watson Knowledge Catalog apps?
You can use the latest versions of these web browsers:
- Firefox ESR
- IE 11 (latest service pack)
- Microsoft Edge
- Android Chrome Mobile
- iOS Safari Mobile
Tip for Firefox on Mac users: Horizontal scrolling within an app can be interpreted by your Mac as an attempt to swipe between pages. If this behavior is undesired or if you experience browser crashes after the app prompts you to stay on the page, consider disabling the Swipe between pages gesture in Launchpad > System Preferences > Trackpad > More Gestures.
How do I upgrade?
When you’re ready to upgrade your Watson Studio or Watson Knowledge Catalog app, or any of the services that you can use with Watson Studio, you can upgrade in place without losing any of your work or data. You must be the owner or administrator of the IBM Cloud account for a service to upgrade it.
- Upgrade Watson Studio or Watson Knowledge Catalog
- Upgrade an older Watson Studio plan
- Upgrade an older Watson Knowledge Catalog plan
- Upgrade a service that you use in Watson Studio
How do I find my IBM Cloud account owner?
If you have an enterprise account or work in an IBM Cloud account that you don’t own, you might need to ask an account owner to give you the Watson Knowledge Catalog app Admin role or the IBM Cloud account administrator role.
To find your IBM Cloud account owner:
- From Watson Studio or IBM Cloud, choose Manage > Account > Users.
- From the avatar menu, make sure you’re in the right account, or switch accounts, if necessary.
- On the Users page, find the user name with the word owner next to it.
What is the difference between using Spark in a Spark environment and Spark in IBM Analytics Engine?
Spark environments are provided under Watson Studio. A Spark environment offers Spark kernels as a service (SparkR, PySpark, and Scala) and is based on Armada/Kubernetes. The underlying Armada is shared across multiple users. However, each kernel gets a dedicated Spark cluster and Spark executors. You can change the Spark configurations and specify the size of the executors and the number of executors per kernel. A Spark environment is more serverless in nature.
In contrast, IBM Analytics Engine offers Hortonworks Data Platform on IBM Cloud. You get one VM per cluster node and your own local HDFS. You get Spark and the entire Hadoop ecosystem. You are given shell access and can also create notebooks.
Is environment runtime sharing and billing the same for Spark environments as for Anaconda environments?
No. Compute, data resources, and billing can’t be shared in a Spark environment.
Whereas you can open a notebook with an Anaconda environment, stop the kernel of the notebook, then start a second notebook with the same environment and share the runtime without stopping it, you cannot do this with a Spark environment.
Spark environment runtimes can’t be shared. Every notebook kernel has its own dedicated Spark cluster. If you create two notebooks that use the same environment definition, two runtimes are started, each with its own kernel, which means that two clusters are created, each with its own set of Spark executors.
How do I load very large files to my project?
You can’t load data files larger than 5 GB to your project from the Watson Studio app. If your files are larger, you must use the Cloud Object Storage API and load the data in multiple parts. See the curl commands for working with Cloud Object Storage directly on IBM Cloud.
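To illustrate what loading the data "in multiple parts" involves, here is a minimal sketch that only plans the byte ranges of the parts. It does not call the Cloud Object Storage API. The 100 MiB part size and the function names are assumptions for this example, and the 5 GB limit from the text is treated here as 5 GiB.

```python
from typing import Iterator, Tuple

MAX_SINGLE_UPLOAD = 5 * 1024**3    # 5 GB app limit from the text (taken as GiB)
DEFAULT_PART_SIZE = 100 * 1024**2  # assumed part size of 100 MiB, not from the source


def needs_multipart(file_size: int) -> bool:
    """Return True when a file is too large to load through the app."""
    return file_size > MAX_SINGLE_UPLOAD


def plan_parts(
    file_size: int, part_size: int = DEFAULT_PART_SIZE
) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, start_offset, length) tuples that cover the file."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    part_number = 1
    offset = 0
    while offset < file_size:
        # The final part may be shorter than part_size.
        length = min(part_size, file_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1


# Example: a 250 MiB file split into 100 MiB parts gives three parts.
for number, start, length in plan_parts(250 * 1024**2):
    print(number, start, length)
```

Each planned range would then be read from the file and sent as one part of the upload. Check the Cloud Object Storage documentation for the actual part-size limits before choosing a value.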
Why is the machine learning job I submitted still in pending state?
The machine learning job you submitted is still in the “Pending” state because it is waiting for enough resources to start running. This can happen if resources are currently in high demand or if you submitted a large number of concurrent requests and your newer requests are waiting for the ones submitted earlier to complete execution.
IBM Cloud Object Storage
What is saved in IBM Cloud Object Storage for projects and catalogs?
When you create a project or catalog, you specify an IBM Cloud Object Storage instance and create a bucket that is dedicated to that project or catalog. These types of objects are stored in the IBM Cloud Object Storage bucket for the project or catalog:
- Files for data assets that you uploaded into the project or catalog.
- Files associated with analytic assets, such as notebooks, dashboards, and models.
- Metadata about assets, such as the asset description, tags, and comments or reviews.
Do I need to upgrade IBM Cloud Object Storage when I upgrade Watson apps?
You must upgrade your IBM Cloud Object Storage instance only when you run out of storage space. Watson apps can use either IBM Cloud Object Storage plan and you can upgrade any Watson app or your IBM Cloud Object Storage service independently.
Watson Knowledge Catalog
What is Watson Knowledge Catalog?
Watson Knowledge Catalog is a cloud-based enterprise metadata repository that lets you catalog your knowledge and analytics assets, including structured and unstructured data wherever they reside, so that they can be easily accessed and used to fuel data science and AI. For selected source types, Watson Knowledge Catalog can automatically discover and register data assets at the provided connection. As assets are added to the catalog, they are automatically indexed and classified, making it easy for users such as data engineers, data scientists, data stewards, and business analysts to find, understand, share, and use the assets. AI-powered search and recommendations guide users to the most relevant assets in the catalog based on understanding of relationships between assets, how those assets are used, and social connections between users.
Watson Knowledge Catalog also provides an intelligent and robust governance framework that lets you define and enforce data and access policies to ensure that the right data go to the right people.
Through Watson Knowledge Catalog’s business glossary, users can create a common business vocabulary and associate its terms with your assets, policies, and rules, providing the bridge between the business domain and your technical assets.
What is the difference between a catalog and a project?
A catalog is where you share assets across the enterprise. A project is where you work with assets within smaller teams. An enterprise catalog can have thousands of assets shared with hundreds of users. Projects are designed for a team of collaborators to work with a small number of assets for a specific goal, such as developing an artificial intelligence model or data preparation, using the Watson Studio app.
What data sources and asset types are supported?
Watson Knowledge Catalog supports over 30 connectors to cloud or on premises data source types. See Connection types.
Watson Knowledge Catalog also supports other asset types, such as structured data, unstructured data, models, and notebooks. See About assets.
How do I load very large files to my catalog?
You can’t load data files larger than 5 GB to your catalog from the Watson Knowledge Catalog app. To add a file that is larger than 5 GB to a catalog, upload the file to IBM Cloud Object Storage and then add it as a connected data asset.
Do I need to move my data into Watson Knowledge Catalog?
No, you can keep all your data in their existing repositories or you can upload local files to the IBM Cloud Object Storage associated with the catalog. The choice is yours.
Watson Knowledge Catalog stores and manages only the metadata of your assets.
What is the maximum number of assets I can have in Watson Knowledge Catalog?
The number of assets you can have in Watson Knowledge Catalog depends on your plan:
- Lite plan: 50 assets
- Standard plan: 500 assets
- Professional plan: unlimited assets
See Offering plans.
Does Watson Knowledge Catalog provide data policy services?
Watson Knowledge Catalog includes an automated policy enforcement engine that determines outcomes based on the defined policies and the action being taken. You can set up your data policies within the system and restrict access to data based on those policies.
Does Watson Knowledge Catalog provide classification services?
For governed catalogs that are created with data policies enforced, Watson Knowledge Catalog automatically classifies the columns in your relational data assets when they are added to the catalog. Over 160 attribute classifiers for columns are provided, including names, emails, postal addresses, credit card numbers, driver’s licenses, government identification numbers, date of birth, demographic information, DUNS number, and more. For ungoverned catalogs that do not enforce data policies, a user can choose to classify, or profile, a relational data asset, but assets are not automatically classified. Catalogs also profile unstructured data assets and extract metadata from content, such as categories, concepts, sentiment, and emotion. See Profile assets.
Does Watson Knowledge Catalog have data wrangling capabilities?
Yes, data preparation capabilities are available in Data Refinery, which is part of Watson Knowledge Catalog. Data Refinery provides a rich set of capabilities that not only allow you to discover, cleanse, and transform your data with built-in operations, but it also comes with powerful profiling and visualization tools such as charts and graphs to help you interact with and understand your data. Data access and transform policies defined in Watson Knowledge Catalog are also enforced in Data Refinery to ensure that sensitive data that originated from governed catalogs remain protected.
Can I set up access groups for people in different lines of business and roles?
You can set up access groups through your IBM Cloud account in the Identity and Access Management (IAM) area. After you set up the access groups, on the Access Control page of a catalog, you can add the access group so that all members of the access group can access the catalog with the same permissions. See Add access groups.
Does Watson Knowledge Catalog use Apache Atlas for its metadata repository?
Watson Knowledge Catalog uses its own local store for metadata. Watson Knowledge Catalog runs on a cloud native persistence store that can meet the platform needs for performance, up-time, and scalability.
When adding assets to a project from a catalog, why don’t I see all my projects listed in the dropdown for target project?
When you add assets from a catalog to a project, or publish assets from a project to a catalog, these criteria must be met:
- You must be a member of the same IBM Watson account in IBM Cloud as the catalog owner, or, if your company set up SAML federation on IBM Cloud, you must be in the same company as the catalog owner.
- If you want to add catalog assets to the project, you must choose to restrict who can be a collaborator in the project. If you just want to publish assets to a catalog, you don’t need to restrict the project.
- You must choose IBM Cloud Object Storage when you create a project. You must be the owner of the IBM Cloud Object Storage instance or the IBM Cloud Object Storage instance must be configured to allow project creation.
In the catalog screen, the dropdown for the target project lists only the projects that satisfy all these criteria.
When I create policies, which catalogs will they apply to?
Policies are scoped to the IBM Cloud account and will be enforced on assets in all catalogs with data policies enforced that belong to the same IBM Cloud account as the policies.
Do data policies affect data in external data sources?
No, Watson Knowledge Catalog is a data catalog for searching for data. Data policies affect only how data appears within the catalog. Data policies do not affect users who access external data sources directly.
Why can’t I add policies?
Only users with the Watson Knowledge Catalog app Admin role can add policies, business terms, and catalogs, and view the data dashboard. If you do not have access to the UI control to add policies, then you are assigned the default Watson Knowledge Catalog app Viewer role, which limits you to viewing existing policies and business terms. Ask your IBM Watson administrator to give you the Admin role for the Watson Knowledge Catalog app.
Can I install libraries or packages to use in my notebooks?
You can install Python and Scala libraries and R packages through a notebook, and those libraries and packages will be available to all your notebooks that use the same Apache Spark service. For instructions, see Import custom or third-party libraries. If you get an error about missing operating system dependencies when you install a library or package, notify IBM by clicking the chat icon. To see the preinstalled libraries and packages and the libraries and packages that you installed, from within a notebook, run the appropriate command:
- Python: !pip list
- R: installed.packages()
- Scala: Click the Notebook info icon and then click Environment.
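For Python, the same information that !pip list prints can also be collected programmatically. The following is a minimal sketch that uses only the standard library (Python 3.8 or later). The function name is hypothetical, and the packages that it reports depend entirely on the environment in which it runs.

```python
from importlib.metadata import distributions


def installed_packages() -> dict:
    """Map each installed distribution name to its version string."""
    packages = {}
    for dist in distributions():
        meta = dist.metadata
        # Skip distributions whose metadata is missing or incomplete.
        name = meta["Name"] if meta is not None else None
        if name:
            packages[name] = dist.version
    return packages


# Print the packages in alphabetical order, one per line.
for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
    print(f"{name}=={version}")
```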
If you used custom or third-party libraries with the preinstalled Python libraries when creating notebooks in your Analytics for Apache Spark instance, you might experience incompatibilities between the currently available preinstalled notebook libraries and your custom or third-party libraries when you open and execute existing notebooks. To resolve library inconsistencies and continue using the Python notebooks that you created using older preinstalled library versions and your custom or third-party libraries, you must remove your custom or third-party libraries. Then rerun the custom and third-party library installation commands.
Can I call functions defined in one notebook from another notebook?
No, there is no way to call one notebook from another notebook in Watson Studio. However, you can put your common code into a library outside of Watson Studio and then install it.
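The idea of moving common code into a library can be sketched as follows. This example writes a tiny module to a temporary directory and imports it, which stands in for building and installing a real package. The module name shared_helpers and the normalize function are hypothetical, and the actual packaging and installation steps are not shown.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the shared code that several notebooks would otherwise duplicate.
MODULE_SOURCE = '''
def normalize(values):
    """Scale a list of numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]
'''


def install_demo_module(name: str = "shared_helpers"):
    """Write the module to a temporary directory and import it by name."""
    target_dir = Path(tempfile.mkdtemp())
    (target_dir / f"{name}.py").write_text(MODULE_SOURCE)
    # Adding the directory to sys.path stands in for a real install step.
    sys.path.insert(0, str(target_dir))
    importlib.invalidate_caches()
    return importlib.import_module(name)


helpers = install_demo_module()
print(helpers.normalize([1, 1, 2]))  # prints [0.25, 0.25, 0.5]
```

After the library is installed, each notebook imports the shared functions instead of calling another notebook.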
Can I add arbitrary notebook extensions?
No, you can’t extend your notebook capabilities by adding arbitrary extensions as a customization because all notebook extensions must be preinstalled. The only notebook extension that is preinstalled is the Esri ArcGIS extension, which you can select when you create a runtime environment definition and select the Python 3.5 software configuration. This selection enables widgetsnbextension for ipywidgets.
How do I access the data from a CSV file in a notebook?
After you load a CSV file into object storage, choose one of the options to create a DataFrame or other data structure from the Insert to code menu under the file name. For instructions, see Load and access data.
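The Insert to code menu generates its own code, which is not reproduced here. As a rough illustration of the end result, this sketch turns the raw bytes of a CSV file into a list of row dictionaries by using only the standard library. The function name and the sample data are invented for this example.

```python
import csv
import io


def rows_from_csv_bytes(raw: bytes, encoding: str = "utf-8") -> list:
    """Parse CSV bytes into a list of dictionaries keyed by the header row."""
    text = io.StringIO(raw.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Invented sample content that stands in for a file read from object storage.
sample = b"name,score\na,1\nb,2\n"
rows = rows_from_csv_bytes(sample)
print(rows[0]["name"], rows[0]["score"])  # prints a 1
```

Values are returned as strings, so convert numeric columns explicitly before you compute with them.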
How do I access the data from a compressed file in a notebook?
After you load the compressed file to object storage, get the file credentials by using the Insert to code menu under the file name. Then use this function to save the file from object storage in GPFS. The credentials argument is the dictionary that was inserted to code in your notebook.
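The following sketch shows only the decompression step and is not the function that the text refers to. It assumes that the compressed file's bytes were already retrieved from object storage, and it writes the contents to a local directory rather than to GPFS. The function name and the output file name for gzip data are hypothetical.

```python
import gzip
import io
import tempfile
import zipfile
from pathlib import Path


def save_decompressed(raw: bytes, target_dir: Path) -> list:
    """Write the decompressed contents of zip or gzip bytes into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    buffer = io.BytesIO(raw)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            archive.extractall(target_dir)
            return [target_dir / name for name in archive.namelist()]
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        out_path = target_dir / "data"  # hypothetical output file name
        out_path.write_bytes(gzip.decompress(raw))
        return [out_path]
    raise ValueError("Unsupported compression format")


# Example with invented data: compress a small CSV, then restore it.
demo_dir = Path(tempfile.mkdtemp())
paths = save_decompressed(gzip.compress(b"a,b\n1,2\n"), demo_dir)
print(paths[0].read_text())
```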
What do I do when my notebook won’t run because I’ve reached my kernel limit?
When you open a notebook in edit mode, a kernel is started automatically and counts towards the kernel limit. You can have only 10 kernels running on your Apache Spark instance at a time. When you reach the kernel limit, you have to stop a running kernel before you can open another notebook in edit mode.
To stop a kernel, click Stop kernel on the Actions list for the notebook in your project. See Create notebooks: overview.
Security and reliability
How secure are IBM Watson apps?
IBM Watson apps are very secure and resilient. See Security of IBM Watson apps.
Are my data and notebook protected from sharing outside of my collaborators?
The data that is loaded into your Spark service and notebooks is secure. Only the collaborators in your project can access your data or notebooks. Each Watson Studio account acts as a separate tenant of the Spark and Object Storage services. Tenants cannot access other tenants’ data.
If you want to share your notebook with the public, then hide your data service credentials in your notebook. For the Python, R, and Scala languages, enter the following syntax:
Be sure to save your notebook immediately after you enter the syntax to hide cells with sensitive data. Only then should you share your work.
Do I need to back up my notebooks?
No. Your notebooks are stored in IBM Cloud Object Storage, which provides resiliency in case of an outage.
Sharing and collaboration
What are the implications of sharing a notebook?
When you share a notebook, the permalink never changes. Any person with the link can view your notebook. You can unshare the notebook by clearing the check box to share it. Updates are not automatically shared. When you update your notebook, you can sync the shared notebook by reselecting the check box to share it.
How can I share my work outside of RStudio in Watson Studio?
One way of sharing your work outside of RStudio in Watson Studio is connecting it to a shared GitHub repository that you and your collaborators can work from. Read this blog post for more information.
However, the best method to share your work with the members of a project in Watson Studio is to use notebooks in the project using the R kernel.
RStudio is a great environment to work in for prototyping and working individually on R projects, but it is not yet integrated with Watson Studio projects.
How do I share my SPSS Modeler flow with another project?
By design, modeler flows can be used only in the project where the flow is created or imported. If you need to use a modeler flow in a different project, you must download the flow from the current project (source project) to your local environment and then import the flow to another project (target project).
IBM Watson Machine Learning
What is an API Key?
API keys allow you to authenticate when you use the CLI or APIs, and the same key can be used across multiple services. API keys are confidential because they are used to grant access. Treat every API key as you would a password, because anyone with your API key can impersonate your service.
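Because an API key works like a password, a common practice is to read it from the environment instead of writing it into a notebook or script. The sketch below follows that practice and builds, but does not send, a token request. The environment variable name IBM_CLOUD_API_KEY and the function names are hypothetical. The token endpoint and grant type reflect IBM Cloud IAM as commonly documented and are not stated in this FAQ, so verify them against the current IBM Cloud documentation.

```python
import os
import urllib.parse
import urllib.request

# Assumed IBM Cloud IAM token endpoint; not taken from this FAQ.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def api_key_from_env(var_name: str = "IBM_CLOUD_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    api_key = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return api_key


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build a POST request that exchanges an API key for an access token."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    body = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("ascii")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result of build_token_request(api_key_from_env()) to urllib.request.urlopen. Avoid printing or logging the key or the request body.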
Can I provide feedback?
Yes, we encourage feedback as we continue to develop this exciting array of services. Click the chat icon, type a comment, and press Return.