Data load support
The Insert to code function is available for project data assets in Jupyter notebooks when you click the Find and Add Data icon and select an asset in the notebook sidebar. The asset can be data from a file or a data source connection.
By clicking in an empty code cell in your notebook and then clicking the Insert to code link below an asset name, you can choose to:
- Insert the data source access credentials. This capability is available for all data assets that are added to a project. With the credentials, you can write your own code to access the asset and load the data into data structures of your choice in your notebook (see the sketch after this list).
- Generate code that is added to the notebook cell. The inserted code serves as a quick start for working with a data set or connection. For production systems, carefully review the inserted code to decide whether you should write your own code that better meets your needs.
When you run the code cell, the data is accessed and loaded into the data structure you selected.
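For example, if the asset is a file in IBM Cloud Object Storage, the inserted credentials can be used with the ibm_boto3 client along the following lines. This is only a sketch: the credential key names, endpoint, bucket, and file name shown here are placeholders, and the actual keys and values depend on what Insert to code inserts for your asset.

```python
import io

import ibm_boto3
import pandas as pd
from ibm_botocore.client import Config

# Placeholder credentials; the real dictionary is inserted by
# "Insert the data source access credentials" and its keys can differ.
credentials = {
    "IBM_API_KEY_ID": "***",
    "ENDPOINT": "https://s3.us.cloud-object-storage.appdomain.cloud",
    "BUCKET": "my-project-bucket",
    "FILE": "sales.csv",
}

# Create an S3-compatible client for IBM Cloud Object Storage.
cos_client = ibm_boto3.client(
    service_name="s3",
    ibm_api_key_id=credentials["IBM_API_KEY_ID"],
    ibm_auth_endpoint="https://iam.cloud.ibm.com/identity/token",
    config=Config(signature_version="oauth"),
    endpoint_url=credentials["ENDPOINT"],
)

# Download the object and load it into a data structure of your choice,
# here a pandas DataFrame.
body = cos_client.get_object(Bucket=credentials["BUCKET"], Key=credentials["FILE"])["Body"]
df = pd.read_csv(io.BytesIO(body.read()))
```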
Note: If the file type or database connection that you are using doesn't appear in the following tables, you can choose to generate generic code. For Python this is a StreamingBody object and for R a textConnection object. Generic code cannot be generated for Scala.
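The Python StreamingBody object exposes a read() method that returns the raw bytes of the file. The following is a minimal sketch of turning such an object into a pandas DataFrame; the StreamingBody is constructed from in-memory bytes here purely for illustration, whereas in a notebook the generated generic code creates it for you.

```python
import io

import pandas as pd
from botocore.response import StreamingBody

# For illustration only: build a StreamingBody from in-memory bytes.
# In a notebook, the generated generic code returns this object for you.
raw = b"id\tname\n1\tAda\n2\tGrace\n"
body = StreamingBody(io.BytesIO(raw), content_length=len(raw))

# read() returns the raw bytes; parse them with whatever library fits
# the real file format, here a tab-separated text file.
df = pd.read_csv(io.BytesIO(body.read()), sep="\t")
print(df)
```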
The following tables show which data source connections (file types and database connections) support generating code that loads data into a given data structure in a notebook. The Insert to code options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute.
Supported file types
Data source | Notebook coding language | Compute engine type | Available support to load data |
---|---|---|---|
CSV files |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame |
 |  | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame |
 | R | Anaconda R distribution | Load data into R data frame |
 |  | With Spark | Load data into R data frame and sparkSessionDataFrame |
 |  | With Hadoop | Load data into R data frame and sparkSessionDataFrame |
 | Scala | With Spark | Load data into sparkSessionDataFrame |
 |  | With Hadoop | Load data into sparkSessionDataFrame |
Python Script |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 |  | With Hadoop | Load data into pandasStreamingBody |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rRawObject |
 |  | With Hadoop | Load data into rRawObject |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | No data load support |
JSON files |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame |
 |  | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame |
 | R | Anaconda R distribution | Load data into R data frame |
 |  | With Spark | Load data into R data frame, rRawObject and sparkSessionDataFrame |
 |  | With Hadoop | Load data into R data frame, rRawObject and sparkSessionDataFrame |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | Load data into sparkSessionDataFrame |
.xlsx and .xls files |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame |
 |  | With Hadoop | Load data into pandasDataFrame |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | No data load support |
 |  | With Hadoop | No data load support |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | No data load support |
Octet-stream file types |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rDataObject |
 | Scala | With Spark | No data load support |
PDF file type |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 |  | With Hadoop | Load data into pandasStreamingBody |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rDataObject |
 |  | With Hadoop | Load data into rRawData |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | No data load support |
ZIP file type |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rDataObject |
 | Scala | With Spark | No data load support |
JPEG, PNG image files |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 |  | With Hadoop | Load data into pandasStreamingBody |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rDataObject |
 |  | With Hadoop | Load data into rDataObject |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | No data load support |
Binary files |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Spark | Load data into pandasStreamingBody |
 |  | With Hadoop | No data load support |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Spark | Load data into rRawObject |
 |  | With Hadoop | Load data into rDataObject |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | Load data into sparkSessionDataFrame |
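To illustrate the difference between the two CSV options in the table above, the same file can be loaded into a pandas DataFrame or into a Spark DataFrame through the notebook's Spark session. This is only a sketch with a placeholder local path; the generated code also handles authenticating to the project's data source.

```python
import pandas as pd
from pyspark.sql import SparkSession

csv_path = "sales.csv"  # placeholder path; the generated code reads from your data asset

# pandasDataFrame: load the file into an in-memory pandas DataFrame.
pandas_df = pd.read_csv(csv_path)
print(pandas_df.head())

# sparkSessionDataFrame: load the same file into a distributed Spark DataFrame.
spark = SparkSession.builder.getOrCreate()
spark_df = spark.read.csv(csv_path, header=True, inferSchema=True)
spark_df.show(5)
```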
Supported database connections
Data source | Notebook coding language | Compute engine type | Available support to load data |
---|---|---|---|
Db2 Warehouse on Cloud, IBM Db2 on Cloud, IBM Db2 Database |  |  |  |
 | Python | Anaconda Python distribution | Load data into ibmdbpyIda and ibmdbpyPandas |
 |  | With Spark | Load data into ibmdbpyIda, ibmdbpyPandas and sparkSessionDataFrame |
 |  | With Hadoop | Load data into ibmdbpyIda, ibmdbpyPandas and sparkSessionDataFrame |
 | R | Anaconda R distribution | Load data into ibmdbrIda and ibmdbrDataframe |
 |  | With Spark | Load data into ibmdbrIda, ibmdbrDataFrame and sparkSessionDataFrame |
 |  | With Hadoop | Load data into ibmdbrIda, ibmdbrDataFrame and sparkSessionDataFrame |
 | Scala | With Spark | Load data into sparkSessionDataFrame |
 |  | With Hadoop | Load data into sparkSessionDataFrame |
Amazon Simple Storage Service (S3), Amazon Simple Storage Service (S3) with an IAM access policy |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasStreamingBody |
 |  | With Hadoop | Load data into pandasStreamingBody and sparkSessionSetup |
 | R | Anaconda R distribution | Load data into rRawObject |
 |  | With Hadoop | Load data into rRawObject and sparkSessionSetup |
 | Scala | With Spark | No data load support |
 |  | With Hadoop | No data load support |
IBM Databases for PostgreSQL, Microsoft SQL Server |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame |
 | R | Anaconda R distribution | Load data into R data frame |
 |  | With Spark | Load data into R data frame and sparkSessionDataFrame |
 | Scala | With Spark | Load data into sparkSessionDataFrame |
Cognos Analytics |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame. In the generated code, edit the path parameter in the last line of code and remove the comment tagging. To read data, see Reading data from a data source. To search data, see Searching for data objects. To write data, see Writing data to a data source. |
 |  | With Spark | No data load support |
 | R | Anaconda R distribution | Load data into R data frame. In the generated code, edit the path parameter in the last line of code and remove the comment tagging. To read data, see Reading data from a data source. To search data, see Searching for data objects. To write data, see Writing data to a data source. |
 |  | With Spark | No data load support |
 | Scala | With Spark | No data load support |
Microsoft Azure Cosmos DB |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame |
 | R | Anaconda R distribution | No data load support |
 |  | With Spark | No data load support |
 | Scala | With Spark | No data load support |
Amazon RDS for MySQL |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame |
 | R | Anaconda R distribution | Load data into R data frame and sparkSessionDataFrame |
 |  | With Spark | No data load support |
 | Scala | With Spark | Load data into sparkSessionDataFrame |
HTTP, Apache Cassandra, Amazon RDS for PostgreSQL |  |  |  |
 | Python | Anaconda Python distribution | Load data into pandasDataFrame |
 |  | With Spark | Load data into pandasDataFrame |
 | R | Anaconda R distribution | Load data into R data frame |
 |  | With Spark | Load data into R data frame |
 | Scala | With Spark | Load data into sparkSessionDataFrame |
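For the relational sources above that load into a pandas DataFrame, the generated code typically amounts to running a SQL query over a connection to the database. The following is a minimal sketch using SQLAlchemy against a hypothetical PostgreSQL connection; the host, credentials, and table name are placeholders, and this is not the exact code that Insert to code produces.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details; the generated code builds the connection
# from the credentials saved with the connection asset.
engine = create_engine("postgresql+psycopg2://user:password@db.example.com:5432/salesdb")

# Run a query and load the result set into a pandas DataFrame.
df = pd.read_sql("SELECT * FROM public.orders LIMIT 1000", con=engine)
print(df.head())
```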