Data load support

The Insert to code function is available for project data assets in Jupyter notebooks when you click the Find and Add Data icon and select an asset in the notebook sidebar. The asset can be data from a file or a data source connection.

Click in an empty code cell in your notebook, then click the Insert to code link below an asset name, and select one of these options:

  • Insert the data source access credentials. This capability is available for all data assets that are added to a project. With the credentials, you can write your own code to access the asset and load the data into data structures of your choice in your notebook.
  • Generate code that is added to the notebook cell. The inserted code serves as a quick start to working with a data set or connection (see the sketch after this list). For production systems, carefully review the inserted code and decide whether you need to write your own code that better meets your needs.

    When you run the code cell, the data is accessed and loaded into the data structure you selected.

    Note: If the file type or database connection that you are using doesn’t appear in the following lists, you can choose to generate generic code. For Python this is a StreamingBody object and for R a textConnection object. Generic code cannot be generated for Scala.
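As an illustration, when a CSV file asset is stored in IBM Cloud Object Storage, the inserted Python code typically retrieves the file as a stream and reads it into a pandas DataFrame. The following is a minimal sketch of that pattern, not the exact generated code; the API key, authentication endpoint, service endpoint, bucket, and file names are hypothetical placeholders.

```python
# Sketch of the kind of code that Insert to code generates for a CSV file
# asset in IBM Cloud Object Storage. All credentials and names below are
# hypothetical placeholders, not values produced by the feature.
import pandas as pd
import ibm_boto3
from botocore.client import Config

cos_client = ibm_boto3.client(
    service_name="s3",
    ibm_api_key_id="YOUR_API_KEY",
    ibm_auth_endpoint="https://iam.cloud.ibm.com/identity/token",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.us.cloud-object-storage.appdomain.cloud",
)

# Retrieve the file as a StreamingBody (the generic object mentioned in the
# note above) and read it into a pandas DataFrame.
body = cos_client.get_object(Bucket="your-project-bucket", Key="data.csv")["Body"]
df = pd.read_csv(body)
df.head()
```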

The following tables show which data sources (file types and database connections) support generating code that loads data into a given data structure in a notebook. The Insert to code options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute.

Supported file types

| Data source | Notebook coding language | Compute engine type | Available support to load data |
|---|---|---|---|
| CSV files | Python 3.6 | Anaconda Python distribution | Load data into pandasDataFrame |
| | | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame |
| | | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame |
| | R 3.6 | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSessionDataFrame |
| | | With Hadoop | Load data into R data frame and sparkSessionDataFrame |
| | Scala 2.11 | With Spark | Load data into sparkSessionDataFrame |
| | | With Hadoop | Load data into sparkSessionDataFrame |
| Python scripts | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | | With Hadoop | Load data into pandasStreamingBody |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rRawObject |
| | | With Hadoop | Load data into rRawObject |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | No data load support |
| JSON files | Python 3.6 | Anaconda Python distribution | Load data into pandasDataFrame |
| | | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame |
| | | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame |
| | R 3.6 | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame, rRawObject, and sparkSessionDataFrame |
| | | With Hadoop | Load data into R data frame, rRawObject, and sparkSessionDataFrame |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | Load data into sparkSessionDataFrame |
| .xlsx and .xls files | Python 3.6 | Anaconda Python distribution | Load data into pandasDataFrame |
| | | With Spark | Load data into pandasDataFrame |
| | | With Hadoop | Load data into pandasDataFrame |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | No data load support |
| | | With Hadoop | No data load support |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | No data load support |
| Octet-stream file types | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rDataObject |
| | Scala 2.11 | With Spark | No data load support |
| PDF file type | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | | With Hadoop | Load data into pandasStreamingBody |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rDataObject |
| | | With Hadoop | Load data into rRawData |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | No data load support |
| ZIP file type | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rDataObject |
| | Scala 2.11 | With Spark | No data load support |
| JPEG and PNG image files | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | | With Hadoop | Load data into pandasStreamingBody |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rDataObject |
| | | With Hadoop | Load data into rDataObject |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | No data load support |
| Binary files | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Spark | Load data into pandasStreamingBody |
| | | With Hadoop | No data load support |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Spark | Load data into rRawObject |
| | | With Hadoop | Load data into rDataObject |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | Load data into sparkSessionDataFrame |
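Where a row lists sparkSessionDataFrame support, the inserted code reads the file with the notebook's SparkSession. The following is a rough sketch of that pattern for a CSV file, not the exact generated code; the object-storage URL is a hypothetical placeholder, and the generated code configures the actual storage credentials for you.

```python
# Sketch of loading a CSV file into a Spark DataFrame (sparkSessionDataFrame)
# using the notebook's SparkSession. The file URL is a hypothetical placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark_df = (
    spark.read
    .option("header", "true")       # treat the first row as column names
    .option("inferSchema", "true")  # infer column types from the data
    .csv("cos://your-project-bucket.service/data.csv")
)
spark_df.show(5)
```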

 

Supported database connections

| Data source | Notebook coding language | Compute engine type | Available support to load data |
|---|---|---|---|
| Db2 Warehouse on Cloud, IBM Db2 on Cloud, IBM Db2 Database | Python 3.6 | Anaconda Python distribution | Load data into ibmdbpyIda and ibmdbpyPandas |
| | | With Spark | Load data into ibmdbpyIda, ibmdbpyPandas, and sparkSessionDataFrame |
| | | With Hadoop | Load data into ibmdbpyIda, ibmdbpyPandas, and sparkSessionDataFrame |
| | R 3.6 | Anaconda R distribution | Load data into ibmdbrIda and ibmdbrDataFrame |
| | | With Spark | Load data into ibmdbrIda, ibmdbrDataFrame, and sparkSessionDataFrame |
| | | With Hadoop | Load data into ibmdbrIda, ibmdbrDataFrame, and sparkSessionDataFrame |
| | Scala 2.11 | With Spark | Load data into sparkSessionDataFrame |
| | | With Hadoop | Load data into sparkSessionDataFrame |
| Amazon Simple Storage Service (S3), Amazon Simple Storage Service (S3) with an IAM access policy | Python 3.6 | Anaconda Python distribution | Load data into pandasStreamingBody |
| | | With Hadoop | Load data into pandasStreamingBody and sparkSessionSetup |
| | R 3.6 | Anaconda R distribution | Load data into rRawObject |
| | | With Hadoop | Load data into rRawObject and sparkSessionSetup |
| | Scala 2.11 | With Spark | No data load support |
| | | With Hadoop | No data load support |
| IBM Databases for PostgreSQL, Microsoft SQL Server | Python 3.6 | Anaconda Python distribution | Load data into pandasDataFrame |
| | | With Spark | Load data into pandasDataFrame |
| | R 3.6 | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSessionDataFrame |
| | Scala 2.11 | With Spark | Load data into sparkSessionDataFrame |
| Cognos Analytics | Python 3.6 | Anaconda Python distribution | Load data into pandasDataFrame. In the generated code, edit the path parameter in the last line of code and remove the comment tagging. |
| | | With Spark | No data load support |
| | R 3.6 | Anaconda R distribution | Load data into R data frame. In the generated code, edit the path parameter in the last line and remove the comment tagging. |
| | | With Spark | No data load support |
| | Scala 2.11 | With Spark | No data load support |
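For the Db2 connections, the ibmdbpyIda and ibmdbpyPandas options correspond to working with the ibmdbpy library, which represents a database table as an IdaDataFrame (operations are pushed down to the database) and can convert results to a pandas DataFrame. The following is a minimal sketch of that pattern, assuming the usual ibmdbpy workflow, not the exact generated code; the JDBC URL, credentials, and table name are hypothetical placeholders that the generated code fills in from the connection asset.

```python
# Sketch of the ibmdbpyIda / ibmdbpyPandas pattern for a Db2 connection.
# The JDBC URL, credentials, and table name are hypothetical placeholders.
from ibmdbpy import IdaDataBase, IdaDataFrame

idadb = IdaDataBase(
    dsn="jdbc:db2://db2.example.com:50000/BLUDB:user=username;password=password;"
)

# ibmdbpyIda: wrap a table as an IdaDataFrame without pulling all rows into memory.
ida_df = IdaDataFrame(idadb, "SCHEMA.TABLE_NAME")

# ibmdbpyPandas: materialize the result as a pandas DataFrame for local analysis.
pandas_df = ida_df.as_dataframe()
pandas_df.head()
```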