Python connectors for loading and saving data

Most of these connector code snippets are valid for Python 2.7 on Apache Spark 2.1. You cannot use the code samples in notebooks that run in Anaconda-based environment runtimes, that is, environment runtimes that do not use Spark.
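
If you are not sure which runtime your notebook uses, a quick check such as the following confirms the Python and Spark versions. This is a minimal sketch that assumes the preconfigured SparkContext sc that Spark-based notebooks provide:

import sys

print(sys.version)   # expect a Python 2.7.x version string
print(sc.version)    # expect 2.1.x; this raises a NameError in runtimes without Spark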

You can use connector code to load data from and save data to the data sources that are listed in the sections of this topic.

Before you can use this code in a notebook, you must have already created a connection to the data source with the Services > Connections menu.

Amazon Redshift

To load data into a DataFrame named RedshiftDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

RedshiftloadOptions = {
                     Connectors.Redshift.HOST              : '***********',
                     Connectors.Redshift.PORT              : '***********',
                     Connectors.Redshift.DATABASE          : '***********',
                     Connectors.Redshift.USERNAME          : '***********',
                     Connectors.Redshift.PASSWORD          : '***********',
                     Connectors.Redshift.SOURCE_TABLE_NAME : '***********'}

RedshiftDF = sqlContext.read.format("com.ibm.spark.discover").options(**RedshiftloadOptions).load()
RedshiftDF.printSchema()
RedshiftDF.show()
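
The loaded DataFrame is a standard Spark DataFrame, so after this (or any other) load snippet you can, for example, register it as a temporary view and query it with Spark SQL. This is a minimal sketch; the view name redshift_data is arbitrary:

RedshiftDF.createOrReplaceTempView("redshift_data")
sqlContext.sql("SELECT COUNT(*) AS row_count FROM redshift_data").show()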

To save a DataFrame named NewRedshiftDF in your notebook back to Amazon Redshift, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Redshiftsaveoption = {
                     Connectors.Redshift.HOST              : '***********',
                     Connectors.Redshift.PORT              : '***********',
                     Connectors.Redshift.DATABASE          : '***********',
                     Connectors.Redshift.USERNAME          : '***********',
                     Connectors.Redshift.PASSWORD          : '***********',
                     Connectors.Redshift.TARGET_TABLE_NAME : '**********.*****',
                     Connectors.Redshift.TARGET_TABLE_ACTION : 'merge'}

NewRedshiftDF = RedshiftDF.write.format("com.ibm.spark.discover").options(**Redshiftsaveoption).save()

Amazon S3

To load data into a DataFrame named S3DF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

S3loadoptions = {
                  Connectors.AmazonS3.ACCESS_KEY          : '***********',
                  Connectors.AmazonS3.SECRET_KEY          : '***********',
                  Connectors.AmazonS3.SOURCE_BUCKET       : '***********',
                  Connectors.AmazonS3.SOURCE_FILE_NAME    : '***********.csv',
                  Connectors.AmazonS3.SOURCE_INFER_SCHEMA : '1',
                  Connectors.AmazonS3.SOURCE_FILE_FORMAT  : 'csv'}

S3DF = sqlContext.read.format('com.ibm.spark.discover').options(**S3loadoptions).load()
S3DF.printSchema()
S3DF.show(5)

To save a DataFrame named NewS3DF in your notebook back to Amazon S3, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

S3saveoptions = { Connectors.AmazonS3.ACCESS_KEY        : '***********',
                  Connectors.AmazonS3.SECRET_KEY        : '***********',
                  Connectors.AmazonS3.TARGET_BUCKET     : '***********',
                  Connectors.AmazonS3.TARGET_FILE_NAME  : '***********.csv',
                  Connectors.AmazonS3.TARGET_WRITE_MODE : 'write'}

NewS3DF = S3DF.write.format("com.ibm.spark.discover").options(**S3saveoptions).save()
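
You do not have to save the entire DataFrame; any DataFrame derived from it can be written with the same options. For example, this sketch writes only the rows that match a filter, where status is a hypothetical column name:

filteredDF = S3DF.filter(S3DF["status"] == "active")   # 'status' is a hypothetical column
filteredDF.write.format("com.ibm.spark.discover").options(**S3saveoptions).save()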

Apache Hive

To load data into a DataFrame named HiveDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

HiveloadOptions = { Connectors.Hive.HOST                        : '***********',
                      Connectors.Hive.PORT                      : '****',
                      Connectors.Hive.DATABASE                  : '***********',
                      Connectors.Hive.USERNAME                  : '***********',
                      Connectors.Hive.PASSWORD                  : '***********',
                      Connectors.Hive.SOURCE_TABLE_NAME         : '***********'}

HiveDF = sqlContext.read.format("com.ibm.spark.discover").options(**HiveloadOptions).load()
HiveDF.printSchema()
HiveDF.show()
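
For small result sets, you can pull the loaded data to the notebook driver as a pandas DataFrame for local analysis or plotting. This is a sketch; limit the number of rows first so that the driver is not overloaded:

hive_pd = HiveDF.limit(1000).toPandas()   # bring at most 1000 rows to the driver
hive_pd.head()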

Compose for PostgreSQL

To load data into a DataFrame named PostgreSQLComposeDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

PostgreSQLComposeloadOptions = {
                     Connectors.PostgreSQLCompose.HOST              : '***********',
                     Connectors.PostgreSQLCompose.PORT              : '***********',
                     Connectors.PostgreSQLCompose.DATABASE          : '***********',
                     Connectors.PostgreSQLCompose.USERNAME          : '***********',
                     Connectors.PostgreSQLCompose.PASSWORD          : '***********',
                     Connectors.PostgreSQLCompose.SOURCE_TABLE_NAME : '***********'}

PostgreSQLComposeDF = sqlContext.read.format("com.ibm.spark.discover").options(**PostgreSQLComposeloadOptions).load()
PostgreSQLComposeDF.printSchema()
PostgreSQLComposeDF.show()

To save a DataFrame named NewPostgreSQLComposeDF in your notebook back to Compose for PostgreSQL, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

PostgreSQLComposesaveoption = {
                     Connectors.PostgreSQLCompose.HOST              : '***********',
                     Connectors.PostgreSQLCompose.PORT              : '***********',
                     Connectors.PostgreSQLCompose.DATABASE          : '***********',
                     Connectors.PostgreSQLCompose.USERNAME          : '***********',
                     Connectors.PostgreSQLCompose.PASSWORD          : '***********',
                     Connectors.PostgreSQLCompose.TARGET_TABLE_NAME   : 'TABLE2',
                     Connectors.PostgreSQLCompose.TARGET_WRITE_MODE   : 'insert',
                     Connectors.PostgreSQLCompose.TARGET_TABLE_ACTION : 'append'}

NewPostgreSQLComposeDF = PostgreSQLComposeDF.write.format("com.ibm.spark.discover").options(**PostgreSQLComposesaveoption).save()

Greenplum Database

To load data into a DataFrame named GreenplumDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

GreenplumloadOptions = {
                     Connectors.Greenplum.HOST              : '***********',
                     Connectors.Greenplum.PORT              : '***********',
                     Connectors.Greenplum.DATABASE          : '***********',
                     Connectors.Greenplum.USERNAME          : '***********',
                     Connectors.Greenplum.PASSWORD          : '***********',
                     Connectors.Greenplum.SOURCE_TABLE_NAME : '***********'}

GreenplumDF = sqlContext.read.format("com.ibm.spark.discover").options(**GreenplumloadOptions).load()
GreenplumDF.printSchema()
GreenplumDF.show()

To save a DataFrame named NewGreenplumDF in your notebook back to Greenplum, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Greenplumsaveoption = {
                     Connectors.Greenplum.HOST              : '***********',
                     Connectors.Greenplum.PORT              : '***********',
                     Connectors.Greenplum.DATABASE          : '***********',
                     Connectors.Greenplum.USERNAME          : '***********',
                     Connectors.Greenplum.PASSWORD          : '***********',
                     Connectors.Greenplum.TARGET_TABLE_NAME : '***********',
                     Connectors.Greenplum.TARGET_TABLE_ACTION : 'merge'}

NewGreenplumDF = GreenplumDF.write.format("com.ibm.spark.discover").options(**Greenplumsaveoption).save()

Hortonworks for Apache Hadoop

To load data into a DataFrame named HortonDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

HortonloadOptions = {
    Connectors.HdfsHortonWorks.URL : '***************',
    Connectors.HdfsHortonWorks.USERNAME : '***********',
    Connectors.HdfsHortonWorks.PASSWORD : '***********',
    Connectors.HdfsHortonWorks.SOURCE_FILE_NAME : '***********.csv',
    Connectors.HdfsHortonWorks.SOURCE_FILE_FORMAT : 'csv',
    Connectors.HdfsHortonWorks.SOURCE_INFER_SCHEMA : '1'
  }

HortonDF = sqlContext.read.format("com.ibm.spark.discover").options(**HortonloadOptions).load()
HortonDF.printSchema()
HortonDF.show()

To save a DataFrame named NewHortonDF in your notebook back to Hortonworks, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

HortonsaveOptions = {
    Connectors.HdfsHortonWorks.URL : '*************',
    Connectors.HdfsHortonWorks.USERNAME : '***********',
    Connectors.HdfsHortonWorks.PASSWORD : '***********',
    Connectors.HdfsHortonWorks.TARGET_FILE_NAME : 'token2.csv',
    Connectors.HdfsHortonWorks.TARGET_WRITE_MODE : 'write'
}

NewHortonDF = HortonDF.write.format("com.ibm.spark.discover").options(**HortonsaveOptions).save()

IBM BigInsights for Apache Hadoop

To load data into a DataFrame named hdfsDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

HDFSloadOptions = {
    Connectors.HdfsBigInsights.URL : '****************************************************',
    Connectors.HdfsBigInsights.USERNAME : '***********',
    Connectors.HdfsBigInsights.PASSWORD : '***********',
    Connectors.HdfsBigInsights.SOURCE_FILE_NAME : '*****.csv',
    Connectors.HdfsBigInsights.SOURCE_FILE_FORMAT : 'csv',
    Connectors.HdfsBigInsights.SOURCE_INFER_SCHEMA : '1',
    Connectors.HdfsBigInsights.SSL_CERTIFICATE : '***************************************'
  }

hdfsDF = sqlContext.read.format("com.ibm.spark.discover").options(**HDFSloadOptions).load()
hdfsDF.printSchema()
hdfsDF.show()

If you have the BigInsights basic or free service plan, remove the line for the SSL_CERTIFICATE and the preceding comma. If you have the BigInsights Enterprise service plan, you must include a value for the SSL certificate.

To obtain the SSL certificate information:

  1. If you are using a Firefox browser, delete the cache and history.
  2. From your BigInsights cluster details page in IBM Cloud, click LAUNCH to start the Ambari console.
  3. Click the lock icon in your browser's address bar.
  4. Click More Information and then View Certificate on the Security page.
  5. In the Certificate Viewer, select the Details tab and then click Export.
  6. Open the certificate in Notepad or another text editor.
  7. Add \n at the end of each line:
    -----BEGIN CERTIFICATE-----\n
    <certificate text>\n
    ...
    <certificate text>\n
    -----END CERTIFICATE-----
    
  8. Copy the certificate text, including the first and last lines, and paste it as a single string for the value of the SSL_CERTIFICATE field, for example: Connectors.HdfsBigInsights.SSL_CERTIFICATE : '-----BEGIN CERTIFICATE----- \n certificate text\n certificate text\n -----END CERTIFICATE-----'. A small Python sketch that automates this step is shown after this list.
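
If you prefer to build the single-line certificate string programmatically instead of editing the file by hand, a minimal sketch such as the following produces the value for the SSL_CERTIFICATE field. The file name cert.pem is a hypothetical name for the exported certificate:

# 'cert.pem' is a hypothetical name for the certificate file that you exported in step 5
with open('cert.pem') as cert_file:
    cert_lines = [line.strip() for line in cert_file if line.strip()]

# Join the lines with newline characters to get the single-string format shown in step 8
ssl_certificate = '\n'.join(cert_lines)

You can then pass ssl_certificate as the value of Connectors.HdfsBigInsights.SSL_CERTIFICATE in the load or save options.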

To save a DataFrame named NewhdfsDF in your notebook back to BigInsights, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

HDFSsaveOptions = {
    Connectors.HdfsBigInsights.URL : '*****************************************************',
    Connectors.HdfsBigInsights.USERNAME : '***********',
    Connectors.HdfsBigInsights.PASSWORD : '***********',
    Connectors.HdfsBigInsights.TARGET_FILE_NAME : '******.csv',
    Connectors.HdfsBigInsights.SSL_CERTIFICATE : '***************************************',
    Connectors.HdfsBigInsights.TARGET_WRITE_MODE : 'write'
}

NewhdfsDF = hdfsDF.write.format("com.ibm.spark.discover").options(**HDFSsaveOptions).save()

If you have the BigInsights basic or free service plan, remove the SSL_CERTIFICATE line. If you have the BigInsights Enterprise service plan, you must include a value for the SSL certificate.

IBM Cloud Object Storage

To load data into a DataFrame named S3DF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

S3loadoptions = {
      Connectors.BluemixCloudObjectStorage.URL                          : "************************",
      Connectors.BluemixCloudObjectStorage.IAM_URL                      : "*************************",
      Connectors.BluemixCloudObjectStorage.RESOURCE_INSTANCE_ID         : "*******************************",
      Connectors.BluemixCloudObjectStorage.API_KEY                      : "***********",
      Connectors.BluemixCloudObjectStorage.REGION                       : "*********",
      Connectors.BluemixCloudObjectStorage.SOURCE_BUCKET                : "********************",
      Connectors.BluemixCloudObjectStorage.SOURCE_FILE_NAME             : "*******************.csv",
      Connectors.BluemixCloudObjectStorage.SOURCE_INFER_SCHEMA          : "1",
      Connectors.BluemixCloudObjectStorage.SOURCE_FILE_FORMAT           : "csv",
      Connectors.BluemixCloudObjectStorage.SOURCE_FIRST_LINE_HEADER     : "true",
      Connectors.BluemixCloudObjectStorage.SOURCE_INVALID_DATA_HANDLING : "column"}

S3DF = sqlContext.read.format('com.ibm.spark.discover').options(**S3loadoptions).load()
S3DF.printSchema()
S3DF.count()

To save a DataFrame named NewS3DF in your notebook back to IBM Cloud Object Storage, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

S3saveoptions = {
      Connectors.BluemixCloudObjectStorage.URL                      : "********************",
      Connectors.BluemixCloudObjectStorage.IAM_URL                  : "*********************",
      Connectors.BluemixCloudObjectStorage.RESOURCE_INSTANCE_ID     : "******************************************",
      Connectors.BluemixCloudObjectStorage.API_KEY                  : "*******************************",
      Connectors.BluemixCloudObjectStorage.TARGET_BUCKET            : "******************",
      Connectors.BluemixCloudObjectStorage.TARGET_FILE_NAME         : "**************.csv",
      Connectors.BluemixCloudObjectStorage.TARGET_WRITE_MODE        : "write",
      Connectors.BluemixCloudObjectStorage.TARGET_FILE_FORMAT       : "csv",
      Connectors.BluemixCloudObjectStorage.TARGET_FIRST_LINE_HEADER : "true"}

NewS3DF = S3DF.write.format('com.ibm.spark.discover').options(**S3saveoptions).save()

IBM Cloudant

You can use this connector with Spark 2.1 or later.

Watch a video or download a notebook on how to use this connector.

To load data into a DataFrame named cloudantdata, copy this code into a code cell in your notebook and replace the asterisks with your information:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

cloudantdata = spark.read.format("org.apache.bahir.cloudant")\
       .option("cloudant.host","***********")\
       .option("cloudant.username", "*************")\
       .option("cloudant.password","************")\
       .load("cloudantdata")
cloudantdata.printSchema()
cloudantdata.show()

To select the properties column on the cloudantdata DataFrame and save it to a Cloudant database named Newcloudantdata, copy this code into a code cell in your notebook and replace the asterisks with your information:

cloudantdata.select("properties").write.format("org.apache.bahir.cloudant")\
         .option("cloudant.host","***********")\
         .option("cloudant.username", "**************")\
         .option("cloudant.password","****************")\
         .option("createDBOnSave", "true")\
         .save("Newcloudantdata")

IBM Db2 Warehouse on Cloud (previously named IBM dashDB)

To load data into a DataFrame named dashdbDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

dashDBloadOptions = { Connectors.DASHDB.HOST              : '***********',
                      Connectors.DASHDB.DATABASE          : '***********',
                      Connectors.DASHDB.USERNAME          : '***********',
                      Connectors.DASHDB.PASSWORD          : '***********',
                      Connectors.DASHDB.SOURCE_TABLE_NAME : '***********.****'}

dashdbDF = sqlContext.read.format("com.ibm.spark.discover").options(**dashDBloadOptions).load()
dashdbDF.printSchema()
dashdbDF.show()

To save a DataFrame named NewdashdbDF in your notebook back to the database, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

dashdbsaveoption = {
                     Connectors.DASHDB.HOST              : '***********',
                     Connectors.DASHDB.DATABASE          : '***********',
                     Connectors.DASHDB.USERNAME          : '***********',
                     Connectors.DASHDB.PASSWORD          : '***********',
                     Connectors.DASHDB.TARGET_TABLE_NAME : '***********.****',
                     Connectors.DASHDB.TARGET_WRITE_MODE : 'merge' }

NewdashdbDF = dashdbDF.write.format("com.ibm.spark.discover").options(**dashdbsaveoption).save()

IBM Db2

To load data into a DataFrame named DB2DF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

DB2loadOptions = {
                     Connectors.DB2.HOST              : '***********',
                     Connectors.DB2.PORT              : '***********',
                     Connectors.DB2.DATABASE          : '*********',
                     Connectors.DB2.USERNAME          : '***********',
                     Connectors.DB2.PASSWORD          : '***********',
                     Connectors.DB2.SOURCE_TABLE_NAME : '***********'}

DB2DF = sqlContext.read.format("com.ibm.spark.discover").options(**DB2loadOptions).load()
DB2DF.printSchema()
DB2DF.show()

To save a DataFrame named NewDB2DF in your notebook back to the database, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

DB2saveoption = {
                     Connectors.DB2.HOST              : '***********',
                     Connectors.DB2.PORT              : '***********',
                     Connectors.DB2.DATABASE          : '***********',
                     Connectors.DB2.USERNAME          : '***********',
                     Connectors.DB2.PASSWORD          : '***********',
                     Connectors.DB2.TARGET_TABLE_NAME : '***********.********',
                     Connectors.DB2.TARGET_TABLE_ACTION : 'merge',
                     Connectors.DB2.TARGET_WRITE_MODE : 'insert'}

NewDB2DF = DB2DF.write.format("com.ibm.spark.discover").options(**DB2saveoption).save()

IBM Db2 Hosted (previously named IBM DB2 on Cloud)

To load data into a DataFrame named DB2CLOUDDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

DB2CLOUDloadOptions = {
                     Connectors.DB2CLOUD.HOST              : '***********',
                     Connectors.DB2CLOUD.PORT              : '***********',
                     Connectors.DB2CLOUD.DATABASE          : '***********',
                     Connectors.DB2CLOUD.USERNAME          : '***********',
                     Connectors.DB2CLOUD.PASSWORD          : '***********',
                     Connectors.DB2CLOUD.SOURCE_TABLE_NAME : '***********'}

DB2CLOUDDF = sqlContext.read.format("com.ibm.spark.discover").options(**DB2CLOUDloadOptions).load()
DB2CLOUDDF.printSchema()
DB2CLOUDDF.show()

To save a DataFrame named NewDB2CLOUDDF in your notebook back to the database, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

DB2CLOUDsaveoption = {
                     Connectors.DB2CLOUD.HOST              : '***********',
                     Connectors.DB2CLOUD.PORT              : '***********',
                     Connectors.DB2CLOUD.DATABASE          : '***********',
                     Connectors.DB2CLOUD.USERNAME          : '***********',
                     Connectors.DB2CLOUD.PASSWORD          : '***********',
                     Connectors.DB2CLOUD.TARGET_TABLE_NAME : '***********.********',
                     Connectors.DB2CLOUD.TARGET_TABLE_ACTION : 'merge',
                     Connectors.DB2CLOUD.TARGET_WRITE_MODE : 'insert'}

NewDB2CLOUDDF = DB2CLOUDDF.write.format("com.ibm.spark.discover").options(**DB2CLOUDsaveoption).save()

IBM DB2 z/OS

To load data into a DataFrame named DB2ZOSDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

DB2ZOSloadOptions = {
                     Connectors.DB2ZOS.HOST              : '***********',
                     Connectors.DB2ZOS.PORT              : '***********',
                     Connectors.DB2ZOS.DATABASE          : '***********',
                     Connectors.DB2ZOS.USERNAME          : '***********',
                     Connectors.DB2ZOS.PASSWORD          : '***********',
                     Connectors.DB2ZOS.SOURCE_TABLE_NAME : '***********'}

DB2ZOSDF = sqlContext.read.format("com.ibm.spark.discover").options(**DB2ZOSloadOptions).load()
DB2ZOSDF.printSchema()
DB2ZOSDF.show()

To save a DataFrame named NewDB2ZOSDF in your notebook back to DB2 z/OS, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

DB2ZOSsaveoption = {
                     Connectors.DB2ZOS.HOST              : '***********',
                     Connectors.DB2ZOS.PORT              : '***********',
                     Connectors.DB2ZOS.DATABASE          : '***********',
                     Connectors.DB2ZOS.USERNAME          : '***********',
                     Connectors.DB2ZOS.PASSWORD          : '***********',
                     Connectors.DB2ZOS.TARGET_TABLE_NAME : '***********.********',
                     Connectors.DB2ZOS.TARGET_TABLE_ACTION : 'merge',
                     Connectors.DB2ZOS.TARGET_WRITE_MODE : 'insert'}

NewDB2ZOSDF = DB2ZOSDF.write.format("com.ibm.spark.discover").options(**DB2ZOSsaveoption).save()

IBM Informix

To load data into a DataFrame named InformixDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

InformixloadOptions = {
                     Connectors.Informix.HOST              : '***********',
                     Connectors.Informix.PORT              : '***********',
                     Connectors.Informix.SERVER            : '***********',
                     Connectors.Informix.DATABASE          : '***********',
                     Connectors.Informix.USERNAME          : '***********',
                     Connectors.Informix.PASSWORD          : '***********',
                     Connectors.Informix.SOURCE_TABLE_NAME : '***********'}

InformixDF = sqlContext.read.format("com.ibm.spark.discover").options(**InformixloadOptions).load()
InformixDF.printSchema()
InformixDF.show()

To save a DataFrame named NewInformixDF in your notebook back to Informix, copy this code into a code cell in your notebook and replace the asterisks with your connection information:

from ingest.Connectors import Connectors

Informixsaveoption = {
                     Connectors.Informix.HOST              : '***********',
                     Connectors.Informix.PORT              : '***********',
                     Connectors.Informix.SERVER            : '***********',
                     Connectors.Informix.DATABASE          : '***********',
                     Connectors.Informix.USERNAME          : '***********',
                     Connectors.Informix.PASSWORD          : '***********',
                     Connectors.Informix.TARGET_TABLE_NAME : '***********',
                     Connectors.Informix.TARGET_TABLE_ACTION : 'merge'}

NewInformixDF = InformixDF.write.format("com.ibm.spark.discover").options(**Informixsaveoption).save()

IBM Netezza Data Warehouse

To load data into a DataFrame named NetezzaDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

NetezzaloadOptions = {
                     Connectors.Netezza.HOST              : '***********',
                     Connectors.Netezza.PORT              : '***********',
                     Connectors.Netezza.DATABASE          : '***********',
                     Connectors.Netezza.USERNAME          : '***********',
                     Connectors.Netezza.PASSWORD          : '***********',
                     Connectors.Netezza.SOURCE_TABLE_NAME : '***********'}

NetezzaDF = sqlContext.read.format("com.ibm.spark.discover").options(**NetezzaloadOptions).load()
NetezzaDF.printSchema()
NetezzaDF.show()

To save a DataFrame named NewNetezzaDF in your notebook back to Netezza, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Netezzasaveoption = {
                     Connectors.Netezza.HOST              : '***********',
                     Connectors.Netezza.PORT              : '***********',
                     Connectors.Netezza.DATABASE          : '***********',
                     Connectors.Netezza.USERNAME          : '***********',
                     Connectors.Netezza.PASSWORD          : '***********',
                     Connectors.Netezza.TARGET_TABLE_NAME : '***********',
                     Connectors.Netezza.TARGET_TABLE_ACTION : 'merge',
                     Connectors.Netezza.TARGET_WRITE_MODE : 'insert'}

NewNetezzaDF = NetezzaDF.write.format("com.ibm.spark.discover").options(**Netezzasaveoption).save()

IBM Object Storage (Swift)

To load data into a DataFrame named objectstoreDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

objectstoreloadOptions = {
        Connectors.BluemixObjectStorage.AUTH_URL            : '***********',
        Connectors.BluemixObjectStorage.USERID              : '***********',
        Connectors.BluemixObjectStorage.PASSWORD            : '***********',
        Connectors.BluemixObjectStorage.PROJECTID           : '***********',
        Connectors.BluemixObjectStorage.REGION              : '***********',
        Connectors.BluemixObjectStorage.SOURCE_CONTAINER    : '***********',
        Connectors.BluemixObjectStorage.SOURCE_FILE_NAME    : '***********.csv',
        Connectors.BluemixObjectStorage.SOURCE_INFER_SCHEMA : '1'}

objectstoreDF = sqlContext.read.format("com.ibm.spark.discover").options(**objectstoreloadOptions).load()
objectstoreDF.printSchema()
objectstoreDF.show(5)

To save a DataFrame named NewobjectstoreDF in your notebook back to Object Storage (Swift), copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

objectstoresaveOptions = {
        Connectors.BluemixObjectStorage.AUTH_URL          : '***********',
        Connectors.BluemixObjectStorage.USERID            : '***********',
        Connectors.BluemixObjectStorage.PASSWORD          : '***********',
        Connectors.BluemixObjectStorage.PROJECTID         : '***********',
        Connectors.BluemixObjectStorage.REGION            : '***********',
        Connectors.BluemixObjectStorage.TARGET_CONTAINER  : '***********',
        Connectors.BluemixObjectStorage.TARGET_FILE_NAME  : '***********.csv',
        Connectors.BluemixObjectStorage.TARGET_WRITE_MODE : 'write'}

NewobjectstoreDF = objectstoreDF.write.format("com.ibm.spark.discover").options(**objectstoresaveOptions).save()

IBM SQLDB

To load data into a DataFrame named SQLDBDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

SQLDBloadOptions = {
                     Connectors.SQLDB.HOST              : '***********',
                     Connectors.SQLDB.PORT              : '***********',
                     Connectors.SQLDB.DATABASE          : '***********',
                     Connectors.SQLDB.USERNAME          : '***********',
                     Connectors.SQLDB.PASSWORD          : '***********',
                     Connectors.SQLDB.SOURCE_TABLE_NAME : '***********'}

SQLDBDF = sqlContext.read.format("com.ibm.spark.discover").options(**SQLDBloadOptions).load()
SQLDBDF.printSchema()
SQLDBDF.show()

To save a DataFrame named NewSQLDBDF in your notebook back to SQLDB, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

SQLDBsaveoption = {
                     Connectors.SQLDB.HOST              : '***********',
                     Connectors.SQLDB.PORT              : '***********',
                     Connectors.SQLDB.DATABASE          : '***********',
                     Connectors.SQLDB.USERNAME          : '***********',
                     Connectors.SQLDB.PASSWORD          : '***********',
                     Connectors.SQLDB.TARGET_TABLE_NAME : '***********',
                     Connectors.SQLDB.TARGET_TABLE_ACTION : 'append'}

NewSQLDBDF = SQLDBDF.write.format("com.ibm.spark.discover").options(**SQLDBsaveoption).save()

IBM Watson Analytics

To load data into a DataFrame named WatsonAnalyticsDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

WatsonAnalyticsloadOptions = {
                     Connectors.WatsonAnalytics.CLIENT_ID        : '***********',
                     Connectors.WatsonAnalytics.SECRET_ID        : '***********',
                     Connectors.WatsonAnalytics.CUSTOM_URL       : '***********',
                     Connectors.WatsonAnalytics.USERNAME         : '***********',
                     Connectors.WatsonAnalytics.PASSWORD         : '***********',
                     Connectors.WatsonAnalytics.SOURCE_FILE_NAME : '***********'}

WatsonAnalyticsDF = sqlContext.read.format("com.ibm.spark.discover").options(**WatsonAnalyticsloadOptions).load()
WatsonAnalyticsDF.printSchema()
WatsonAnalyticsDF.show()

To save a DataFrame named NewWatsonAnalyticsDF in your notebook back to Watson Analytics, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

WatsonAnalyticssaveoption = {
                     Connectors.WatsonAnalytics.CLIENT_ID           : '***********',
                     Connectors.WatsonAnalytics.SECRET_ID           : '***********',
                     Connectors.WatsonAnalytics.CUSTOM_URL          : '***********',
                     Connectors.WatsonAnalytics.USERNAME            : '***********',
                     Connectors.WatsonAnalytics.PASSWORD            : '***********',
                     Connectors.WatsonAnalytics.TARGET_FILE_NAME    : '********',
                     Connectors.WatsonAnalytics.TARGET_WA_META_DATA : '***********',
                     Connectors.WatsonAnalytics.TARGET_WRITE_MODE   : '*****'}

NewWatsonAnalyticsDF = WatsonAnalyticsDF.write.format("com.ibm.spark.discover").options(**WatsonAnalyticssaveoption).save()

Microsoft SQL Server

To load data into a DataFrame named SqlServerDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

SqlServerloadOptions = {
                     Connectors.SqlServer.HOST              : '***********',
                     Connectors.SqlServer.PORT              : '***********',
                     Connectors.SqlServer.DATABASE          : '***********',
                     Connectors.SqlServer.USERNAME          : '***********',
                     Connectors.SqlServer.PASSWORD          : '***********',
                     Connectors.SqlServer.SOURCE_TABLE_NAME : '***********'}

SqlServerDF = sqlContext.read.format("com.ibm.spark.discover").options(**SqlServerloadOptions).load()
SqlServerDF.printSchema()
SqlServerDF.show()

To save a DataFrame named NewSqlServerDF in your notebook back to SQL Server, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

SqlServersaveoption = {
                     Connectors.SqlServer.HOST              : '***********',
                     Connectors.SqlServer.PORT              : '***********',
                     Connectors.SqlServer.DATABASE          : '***********',
                     Connectors.SqlServer.USERNAME          : '***********',
                     Connectors.SqlServer.PASSWORD          : '***********',
                     Connectors.SqlServer.TARGET_TABLE_NAME : '***********',
                     Connectors.SqlServer.TARGET_TABLE_ACTION : 'merge'}

NewSqlServerDF = SqlServerDF.write.format("com.ibm.spark.discover").options(**SqlServersaveoption).save()

MySQL

To load data into a DataFrame named MySQLDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

MySQLloadOptions = {
                     Connectors.MySQL.HOST              : '***********',
                     Connectors.MySQL.PORT              : '***********',
                     Connectors.MySQL.DATABASE          : '**********',
                     Connectors.MySQL.USERNAME          : '***********',
                     Connectors.MySQL.PASSWORD          : '***********',
                     Connectors.MySQL.SOURCE_TABLE_NAME : '***********'}

MySQLDF = sqlContext.read.format("com.ibm.spark.discover").options(**MySQLloadOptions).load()
MySQLDF.printSchema()
MySQLDF.show()

To save a DataFrame named NewMySQLDF in your notebook back to MySQL, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

MySQLsaveoption = {
                     Connectors.MySQL.HOST              : '***********',
                     Connectors.MySQL.PORT              : '***********',
                     Connectors.MySQL.DATABASE          : '***********',
                     Connectors.MySQL.USERNAME          : '***********',
                     Connectors.MySQL.PASSWORD          : '***********',
                     Connectors.MySQL.TARGET_TABLE_NAME : '***********',
                     Connectors.MySQL.TARGET_TABLE_ACTION : 'merge'}

NewMySQLDF = MySQLDF.write.format("com.ibm.spark.discover").options(**MySQLsaveoption).save()

Oracle

To load data into a DataFrame named OracleDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

OracleloadOptions = {
                     Connectors.Oracle.HOST              : '***********',
                     Connectors.Oracle.PORT              : '***********',
                     Connectors.Oracle.SID               : '***********',
                     Connectors.Oracle.SERVICE_NAME      : '***********',
                     Connectors.Oracle.USERNAME          : '***********',
                     Connectors.Oracle.PASSWORD          : '***********',
                     Connectors.Oracle.SOURCE_TABLE_NAME : '***********'}

OracleDF = sqlContext.read.format("com.ibm.spark.discover").options(**OracleloadOptions).load()
OracleDF.printSchema()
OracleDF.show()

To save a DataFrame named NewOracleDF in your notebook back to Oracle, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Oraclesaveoption = {
                     Connectors.Oracle.HOST              : '***********',
                     Connectors.Oracle.PORT              : '***********',
                     Connectors.Oracle.SID               : '***********',
                     Connectors.Oracle.SERVICE_NAME      : '***********',
                     Connectors.Oracle.USERNAME          : '***********',
                     Connectors.Oracle.PASSWORD          : '***********',
                     Connectors.Oracle.TARGET_TABLE_NAME : '***********',
                     Connectors.Oracle.TARGET_TABLE_ACTION : 'merge'}

NewOracleDF = OracleDF.write.format("com.ibm.spark.discover").options(**Oraclesaveoption).save()

PostgreSQL

To load data into a DataFrame named PostgreSQLDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

PostgreSQLloadOptions = {
                     Connectors.PostgreSQL.HOST              : '***********',
                     Connectors.PostgreSQL.PORT              : '***********',
                     Connectors.PostgreSQL.DATABASE          : '***********',
                     Connectors.PostgreSQL.USERNAME          : '***********',
                     Connectors.PostgreSQL.PASSWORD          : '***********',
                     Connectors.PostgreSQL.SOURCE_TABLE_NAME : '***********'}

PostgreSQLDF = sqlContext.read.format("com.ibm.spark.discover").options(**PostgreSQLloadOptions).load()
PostgreSQLDF.printSchema()
PostgreSQLDF.show()

To save a DataFrame named NewPostgreSQLDF in your notebook back to PostgreSQL, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

PostgreSQLsaveoption = {
                     Connectors.PostgreSQL.HOST              : '***********',
                     Connectors.PostgreSQL.PORT              : '***********',
                     Connectors.PostgreSQL.DATABASE          : '***********',
                     Connectors.PostgreSQL.USERNAME          : '***********',
                     Connectors.PostgreSQL.PASSWORD          : '***********',
                     Connectors.PostgreSQL.TARGET_TABLE_NAME : '***********',
                     Connectors.PostgreSQL.TARGET_TABLE_ACTION : 'merge'}

NewPostgreSQLDF = PostgreSQLDF.write.format("com.ibm.spark.discover").options(**PostgreSQLsaveoption).save()

Salesforce

To load data into a DataFrame named SalesforceDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

SalesforceloadOptions = {
                     Connectors.Salesforce.USERNAME          : '***********',
                     Connectors.Salesforce.PASSWORD          : '***********',
                     Connectors.Salesforce.SOURCE_TABLE_NAME : '***********'}

SalesforceDF = sqlContext.read.format("com.ibm.spark.discover").options(**SalesforceloadOptions).load()
SalesforceDF.printSchema()
SalesforceDF.show()

To save a DataFrame named NewSalesforceDF in your notebook back to Salesforce, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Salesforcesaveoption = {
                     Connectors.Salesforce.USERNAME          : '***********',
                     Connectors.Salesforce.PASSWORD          : '***********',
                     Connectors.Salesforce.TARGET_TABLE_NAME : '***********',
                     Connectors.Salesforce.TARGET_TABLE_ACTION : 'append'}

NewSalesforceDF = SalesforceDF.write.format("com.ibm.spark.discover").options(**Salesforcesaveoption).save()

SAP IQ (formerly Sybase IQ)

To load data into a DataFrame named SybaseIQDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

SybaseIQloadOptions = {
                     Connectors.SybaseIQ.HOST              : '***********',
                     Connectors.SybaseIQ.PORT              : '***********',
                     Connectors.SybaseIQ.DATABASE          : '***********',
                     Connectors.SybaseIQ.USERNAME          : '***********',
                     Connectors.SybaseIQ.PASSWORD          : '***********',
                     Connectors.SybaseIQ.SOURCE_TABLE_NAME : '***********'}

SybaseIQDF = sqlContext.read.format("com.ibm.spark.discover").options(**SybaseIQloadOptions).load()
SybaseIQDF.printSchema()
SybaseIQDF.show()

To save a DataFrame named NewSybaseIQDF in your notebook back to SAP IQ, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

SybaseIQsaveoption = {
                     Connectors.SybaseIQ.HOST              : '***********',
                     Connectors.SybaseIQ.PORT              : '***********',
                     Connectors.SybaseIQ.DATABASE          : '***********',
                     Connectors.SybaseIQ.USERNAME          : '***********',
                     Connectors.SybaseIQ.PASSWORD          : '***********',
                     Connectors.SybaseIQ.TARGET_TABLE_NAME : '***********',
                     Connectors.SybaseIQ.TARGET_TABLE_ACTION : 'append'}

NewSybaseIQDF = SybaseIQDF.write.format("com.ibm.spark.discover").options(**SybaseIQsaveoption).save()

SAP Sybase

To load data into a DataFrame named SybaseDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

SybaseloadOptions = {
                     Connectors.Sybase.HOST              : '***********',
                     Connectors.Sybase.PORT              : '***********',
                     Connectors.Sybase.DATABASE          : '***********',
                     Connectors.Sybase.USERNAME          : '***********',
                     Connectors.Sybase.PASSWORD          : '***********',
                     Connectors.Sybase.SOURCE_TABLE_NAME : '***********'}

SybaseDF = sqlContext.read.format("com.ibm.spark.discover").options(**SybaseloadOptions).load()
SybaseDF.printSchema()
SybaseDF.show()

To save a DataFrame named NewSybaseDF in your notebook back to Sybase, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Sybasesaveoption = {
                     Connectors.Sybase.HOST              : '***********',
                     Connectors.Sybase.PORT              : '***********',
                     Connectors.Sybase.DATABASE          : '***********',
                     Connectors.Sybase.USERNAME          : '***********',
                     Connectors.Sybase.PASSWORD          : '***********',
                     Connectors.Sybase.TARGET_TABLE_NAME : '***********',
                     Connectors.Sybase.TARGET_TABLE_ACTION : 'append'}

NewSybaseDF = SybaseDF.write.format("com.ibm.spark.discover").options(**Sybasesaveoption).save()

SoftLayer Object Storage

To load data into a DataFrame named softlyobjDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

softlayerobjloadoptions = {
    Connectors.SoftLayerObjectStorage.ACCESS_KEY          : '***********',
    Connectors.SoftLayerObjectStorage.SECRET_KEY          : '***********',
    Connectors.SoftLayerObjectStorage.URL                 : '***********',
    Connectors.SoftLayerObjectStorage.SOURCE_CONTAINER    : '***********',
    Connectors.SoftLayerObjectStorage.SOURCE_FILE_NAME    : '****.avro',
    Connectors.SoftLayerObjectStorage.SOURCE_FILE_FORMAT  : 'avro',
    Connectors.SoftLayerObjectStorage.SOURCE_INFER_SCHEMA : '1'  }

softlyobjDF =  sqlContext.read.format("com.ibm.spark.discover").options(**softlayerobjloadoptions).load()
softlyobjDF.printSchema()
softlyobjDF.show()

To save a DataFrame named NewsoftlyobjDF in your notebook back to SoftLayer Object Storage, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

softlayerobjsaveoptions = {
    Connectors.SoftLayerObjectStorage.ACCESS_KEY         : '***********',
    Connectors.SoftLayerObjectStorage.SECRET_KEY         : '***********',
    Connectors.SoftLayerObjectStorage.URL                : '***********',
    Connectors.SoftLayerObjectStorage.TARGET_CONTAINER   : '***********',
    Connectors.SoftLayerObjectStorage.TARGET_FILE_NAME   : '*******.avro',
    Connectors.SoftLayerObjectStorage.TARGET_FILE_FORMAT : 'avro',
    Connectors.SoftLayerObjectStorage.TARGET_WRITE_MODE  : 'write'}

NewsoftlyobjDF = softlyobjDF.write.format("com.ibm.spark.discover").options(**softlayerobjsaveoptions).save()

Teradata

To load data into a DataFrame named TeradataDF, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

TeradataloadOptions = {
                     Connectors.Teradata.HOST              : '***********',
                     Connectors.Teradata.PORT              : '***********',
                     Connectors.Teradata.DATABASE          : '***********',
                     Connectors.Teradata.USERNAME          : '***********',
                     Connectors.Teradata.PASSWORD          : '***********',
                     Connectors.Teradata.SOURCE_TABLE_NAME : '***********'}

TeradataDF = sqlContext.read.format("com.ibm.spark.discover").options(**TeradataloadOptions).load()
TeradataDF.printSchema()
TeradataDF.show()

To save a DataFrame named NewTeradataDF in your notebook back to Teradata, copy this code into a code cell in your notebook and replace the asterisks with your information:

from ingest.Connectors import Connectors

Teradatasaveoption = {
                     Connectors.Teradata.HOST              : '***********',
                     Connectors.Teradata.PORT              : '***********',
                     Connectors.Teradata.DATABASE          : '***********',
                     Connectors.Teradata.USERNAME          : '***********',
                     Connectors.Teradata.PASSWORD          : '***********',
                     Connectors.Teradata.TARGET_TABLE_NAME : '***********',
                     Connectors.Teradata.TARGET_TABLE_ACTION : 'append'}

NewTeradataDF = TeradataDF.write.format("com.ibm.spark.discover").options(**Teradatasaveoption).save()