Loading and accessing data in a notebook

You can integrate data into notebooks by loading the data into a data structure or container, for example, a pandas.DataFrame, numpy.array, Spark RDD, or Spark DataFrame. If you created a notebook from one of the sample notebooks, the instructions in that notebook will guide you through loading data. To load data into your own notebooks, you can choose one of these options:

Load data from local files

To access data from a local file, you can load the file from within a notebook, or first load the file into your project. From your notebook, you can then use the Insert to code function to add automatically generated code that accesses the data.

The Insert to code function generates data-loading code only for CSV and JSON files. For all other file types, it generates generic code: a StreamingBody object for Python and a textConnection object for R. If your data is in a spreadsheet, for example, you can convert the file to CSV before you load it into your notebook and use the Insert to code function. Alternatively, you can load the file as is, in which case the Insert to code function loads the data as bytes into an object that you need to process yourself in the notebook.
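For Python, that generic code leaves you with the raw file contents. As a minimal sketch, assuming the contents are exposed through a file-like object with a read() method (a botocore StreamingBody behaves this way), the following shows how CSV and JSON data might be parsed with the standard library while other file types stay as bytes. The helper load_body is hypothetical and is not part of the generated code; io.BytesIO stands in for a StreamingBody so the example runs offline.

```python
import csv
import io
import json


def load_body(body, filename):
    """Hypothetical helper: parse a file-like object that has a read() method.

    CSV becomes a list of dicts and JSON becomes Python objects; any other
    file type is returned as raw bytes for you to process yourself.
    """
    raw = body.read()  # bytes, like the result of StreamingBody.read()
    name = filename.lower()
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    if name.endswith(".json"):
        return json.loads(raw.decode("utf-8"))
    return raw  # for example, a spreadsheet: bytes you must decode yourself


# io.BytesIO stands in for a StreamingBody so the sketch runs offline.
rows = load_body(io.BytesIO(b"name,score\nada,90\nlin,85\n"), "scores.csv")
print(rows)
```

Note that csv.DictReader returns every value as a string, so numeric columns still need converting before you compute with them.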

To add a file from your local system to your notebook:

  1. Click the Find and Add Data icon, and then browse to a data file or drag it into your notebook sidebar.
  2. Click in an empty code cell in your notebook, click the Insert to code link below the file name, and choose how to load the data.

    Code is generated and added to your notebook for you. The generated code imports any required packages, accesses the data file with the object storage credentials, and loads the data into a DataFrame or RDD.

To manually add file credentials and write code for the file access method and the DataFrame yourself:

  1. Add the file to your object storage by clicking the Find and Add Data icon, and then browsing to the data file or dragging it into your notebook sidebar.
  2. Click in an empty code cell in your notebook and then click the Insert to code > Insert Credentials function from the Files notebook sidebar.
  3. Pass your credentials to the data access method that is appropriate for your notebook language. For example, see this code in a blog for Python and in a notebook for R.
  4. Reference the data access method in the appropriate read method for your language to load the data into a DataFrame or other data structure.
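The manual steps above can be sketched in Python as follows. Everything here is hypothetical: the credential keys and the access method read_object are placeholders, and an in-memory dictionary plays the role of object storage so the sketch runs offline. In a real notebook, the access method would call your object storage client with the credentials that Insert Credentials added to the cell.

```python
import csv
import io

# Step 2 (hypothetical): the credentials cell. Real key names and values come
# from Insert to code > Insert Credentials; these are placeholders.
credentials = {
    "bucket": "my-project-bucket",
    "file": "scores.csv",
    "api_key": "<your-api-key>",
}

# Stand-in for object storage so the sketch runs without a network connection.
FAKE_STORAGE = {
    ("my-project-bucket", "scores.csv"): b"name,score\nada,90\nlin,85\n",
}


def read_object(creds):
    """Step 3 (hypothetical access method): return the file's raw bytes."""
    if not creds.get("api_key"):
        raise ValueError("missing api_key in credentials")
    return FAKE_STORAGE[(creds["bucket"], creds["file"])]


# Step 4: reference the access method in a read method to build a data structure.
text = read_object(credentials).decode("utf-8")
data = list(csv.DictReader(io.StringIO(text)))
print(data)
```

Keeping the credentials in one dictionary and the access logic in one function means that only read_object changes when you swap the stand-in for a real storage client.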

Load data sets from the Community

The data sets on the Community contain open data.

Watch this short video to see how to work with public data sets in the Community.

Figure 1. Load and analyze public data
This video shows how to load and analyze public data.

To add a data set from the Community to your notebook, first copy the data set into a project:

  1. Find the card for the data set that you want to add.
  2. Click the Add to Project icon from the action bar, select the project, and click Add. Clicking View project takes you to the project Overview page. The data asset is added to the list of data assets on the project’s Assets page.
  3. Open your notebook in edit mode, and then click the Data icon to see your data set.
  4. To start working with the data in your data set, click Insert to code under the file name and choose how to load the data into your notebook.

Load data from data source connections

You must create a connection to an IBM data service or an external data source before you can add data from that data source to your notebook. See Adding connections to projects.

To load data from an existing data source connection into a data structure in your notebook:

  1. Open the notebook in edit mode.
  2. Click in an empty code cell, click Find and Add Data, and then click the Connections tab to see your connections.
  3. Click Insert to code under the connection name.
  4. If necessary, enter your personal credentials for locked data connections, which are marked with a key icon. This is a one-time step that permanently unlocks the connection for you. After you unlock the connection, the key icon is no longer displayed. See Adding connections to projects.

For IBM Db2 Warehouse on Cloud (previously named IBM dashDB) and Compose for PostgreSQL:

  1. Choose how to load the data to your notebook.
  2. Select the schema and choose a table. Code is generated and added to your notebook for you.
  3. Run the cell.

For all other connections:

  1. Run the cell to load your credentials.
  2. Open the database connection that references your credentials.
  3. Load the data into a DataFrame or other data structure.
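These three steps follow the usual Python DB-API pattern. The sketch below uses the standard-library sqlite3 module with an in-memory database purely as a stand-in, because the driver for your actual data source, and the credential fields it expects, depend on the connection type. The credentials dictionary and the sales table are hypothetical.

```python
import sqlite3

# Step 1 (hypothetical): credentials loaded by the generated cell.
# sqlite3 needs only a database path; real drivers typically expect
# a host, port, user name, password, and so on.
credentials = {"database": ":memory:"}

# Step 2: open a database connection that references the credentials.
conn = sqlite3.connect(credentials["database"])

# Create sample data so this offline sketch has something to query.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5)],
)

# Step 3: load the query result into a data structure (a list of dicts).
cursor = conn.execute("SELECT region, amount FROM sales ORDER BY region")
columns = [desc[0] for desc in cursor.description]
records = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()
print(records)
```

If pandas is available, pandas.read_sql(query, conn) accepts a sqlite3 connection and returns a DataFrame directly, which replaces the manual column handling in step 3.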

Use an API function or operating system command to access the data

You can use API functions or operating system commands in your notebook to access data, for example, the wget command to download data over HTTP, HTTPS, or FTP. When you use these types of API functions and commands, you must include code that sets the project access token. See Manually add the project access token.
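In a Python notebook, you can run the command itself with a ! prefix (for example, !wget followed by a URL) or use the standard library instead. The sketch below is a minimal Python equivalent of a wget download; the download helper and file names are hypothetical, and the sketch does not set the project access token, which you still need to add separately. Because urllib also accepts file:// URLs, the demo lines use one so the example runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, dest):
    """Hypothetical helper: save the resource at url (http, https, ftp, or file) to dest."""
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk in chunks
    return Path(dest)


# Offline demo: "download" a local file through a file:// URL.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("id,value\n1,42\n")
saved = download(source.as_uri(), workdir / "copy.csv")
print(saved.read_text())
```

Streaming with shutil.copyfileobj avoids holding a large file in memory, which matters for bigger public data sets.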