Load and access data in a notebook
You can integrate data into notebooks by loading the data into a data structure or container, for example, a pandas.DataFrame, numpy.array, Spark RDD, or Spark DataFrame. If you created a notebook from one of the sample notebooks, the instructions in that notebook will guide you through loading data. To load data into your own notebooks, you can choose one of these options:
- Add a file from your local system to your object storage
- Use a free data set from the community
- Load data from a data source connection
- Use the project-lib library to interact with project assets
- Write your own code with Python functions to work with data and IBM Cloud Object Storage in notebooks
- Use an API function or operating system command to access the data
Load data from local files
To access data from a local file, you can load the file from within a notebook, or first load the file into your project. From your notebook, you add automatically generated code to access the data by using the Insert to code function.
The Insert to code function supports CSV and JSON files only. For all other file types, generic code is created: for Python, a StreamingBody object, and for R, a textConnection object. If your data is in a spreadsheet, for example, you can convert the file into a CSV file before loading it into your notebook and using the Insert to code function. Alternatively, you can load the file as is, in which case the Insert to code function loads the data as bytes into an object that you need to process yourself in the notebook.
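When the data arrives as raw bytes, you have to decode and parse it yourself. The following is a minimal sketch of that step, assuming the bytes hold UTF-8 encoded CSV text. The helper name `rows_from_bytes` and the sample data are hypothetical and only illustrate the idea; they are not what the Insert to code function generates.

```python
import csv
import io
from typing import Dict, List


def rows_from_bytes(raw: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Decode raw CSV bytes and return one dict per data row."""
    # Wrap the bytes in a file-like object and decode them as text.
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    # DictReader uses the first line as the column names.
    return list(csv.DictReader(text_stream))


# Hypothetical stand-in for the bytes you would get from the generic code,
# for example by calling .read() on a StreamingBody object.
sample_bytes = b"city,population\nOslo,700000\nBergen,285000\n"
rows = rows_from_bytes(sample_bytes)
print(rows[0]["city"], rows[0]["population"])
```

If pandas is available in your notebook, passing `io.BytesIO(raw)` to `pandas.read_csv` is a common alternative that yields a DataFrame directly.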
To add a file from your local system to your notebook:
- Click the Find and Add Data icon, and then browse to a data file or drag it into your notebook sidebar.
- Click in an empty code cell in your notebook, click the Insert to code link below the file, and choose how to load the data.
Code is generated and added to your notebook for you. The generated code imports any required packages, accesses the data file with object storage credentials, and loads the data into a DataFrame or RDD.
To manually add file credentials and write code for the file access method and the DataFrame yourself:
- Add the file to your object storage by clicking the Find and Add Data icon, and then browsing to the data file or dragging it into your notebook sidebar.
- Click in an empty code cell in your notebook and then click the Insert to code > Insert Credentials function from the Files notebook sidebar.
- Insert your credentials into the appropriate method for your notebook language to access the data in your notebook.
- Reference the data access method in the appropriate read method for your language to load the data into a DataFrame or other data structure.
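The manual steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the exact code that the tool produces: the key names in `credentials` are assumed to resemble what Insert Credentials inserts, the placeholder values must be replaced with your own, and the helper names `make_cos_client` and `read_csv_rows` are hypothetical. The client is built with the `ibm_boto3` SDK, which is assumed to be installed in your notebook environment.

```python
import csv
import io

# Placeholder credentials. The key names are assumptions; use the dictionary
# that Insert Credentials adds to your own notebook.
credentials = {
    "IBM_API_KEY_ID": "<your-api-key>",
    "IBM_AUTH_ENDPOINT": "<your-auth-endpoint>",
    "ENDPOINT": "<your-object-storage-endpoint>",
    "BUCKET": "<your-bucket>",
    "FILE": "<your-file.csv>",
}


def make_cos_client(creds):
    """Build an object storage client from the credentials dictionary."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    import ibm_boto3
    from ibm_botocore.client import Config

    return ibm_boto3.client(
        service_name="s3",
        ibm_api_key_id=creds["IBM_API_KEY_ID"],
        ibm_auth_endpoint=creds["IBM_AUTH_ENDPOINT"],
        config=Config(signature_version="oauth"),
        endpoint_url=creds["ENDPOINT"],
    )


def read_csv_rows(client, creds):
    """Fetch the file named in the credentials and parse it as CSV."""
    # get_object returns a dict whose "Body" entry is a readable stream.
    body = client.get_object(Bucket=creds["BUCKET"], Key=creds["FILE"])["Body"]
    text = body.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

In a notebook you would call `read_csv_rows(make_cos_client(credentials), credentials)` after filling in real values. Keeping the data access method (`make_cos_client`) separate from the read method (`read_csv_rows`) mirrors the last two steps of the list.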
Load data sets from the community
The data sets on the Community contain open data.
Load data from a data source connection
To load data from an existing data source connection into a data structure in your notebook:
- Open the notebook in edit mode.
- Click in an empty code cell, click Find and Add Data, and then click the Connections tab to see your connections.
- Click Insert to code under the connection name.
For IBM Db2 Warehouse on Cloud (previously named IBM dashDB):
- Choose how to load the data to your notebook.
- Then select the schema and choose a table.
- Code is generated and added to your notebook for you.
- Run the cell.
For all other connections:
- Run the cell to load your credentials.
- Open a database connection that references your credentials.
- Create a DataFrame or other data structure. You can use a connector code snippet to open a connection and create a DataFrame. See Python connectors.
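The three steps for other connections follow the usual Python database pattern: load credentials, open a connection, then read rows into a data structure. The sketch below illustrates that pattern only. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for your actual connector, and the `credentials` dictionary, the `sales` table, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical credentials. A real connection typically supplies a host,
# user name, password, and database name instead.
credentials = {"database": ":memory:"}


def open_connection(creds):
    """Open a connection that references the credentials.

    sqlite3 stands in for the connector of your actual data source.
    """
    return sqlite3.connect(creds["database"])


def query_to_records(conn, sql):
    """Run a query and return the rows as dicts keyed by column name."""
    cursor = conn.cursor()
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = open_connection(credentials)
# Create a small example table so that the sketch is self-contained.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 120), ("south", 80)])
records = query_to_records(conn, "SELECT region, amount FROM sales ORDER BY region")
print(records)
conn.close()
```

If pandas is available, `pandas.read_sql_query(sql, conn)` is a common way to get a DataFrame from an open connection instead of a list of dicts.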
Use an API function or operating system command to access the data
You can use API functions or operating system commands in your notebook to access data, for example, the Wget command to access data by using the HTTP, HTTPS, or FTP protocols. When you use these types of API functions and commands, you need to include code that sets the project access token. See Manually add the project access token.
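As a Python counterpart to a Wget call, the following sketch downloads a resource with the standard-library `urllib.request` module, which handles HTTP, HTTPS, and FTP URLs. The helper name `download` is hypothetical, and the sketch does not cover setting the project access token.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the resource at url into a local file and return its path."""
    dest = Path(destination)
    # Copy the response in chunks so that large files are not held in memory.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download("https://example.com/data.csv", "data.csv")` would save the file next to your notebook; the URL here is a placeholder, not a real data set.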