In Cloud Pak for Data as a Service, you can use access points to read data that is stored in AWS S3 buckets from a notebook.
Run the notebook in a Cloud Pak for Data as a Service environment, and create an internet-enabled access point to connect to the S3 bucket.
Connecting to AWS S3 data through an internet-enabled access point
You can access data in an AWS S3 bucket through an internet-enabled access point in any AWS region.
To access S3 data through an internet-enabled access point:
- Create an access point for your S3 bucket. See Creating access points. Set the network origin to Internet.
- After the access point is created, make a note of the Amazon resource name (ARN) for the access point. You will need to enter the ARN in your notebook. Example ARN: arn:aws:s3:us-east-1:675068711478:accesspoint/cust-data-bucket-internet-ap
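Before you enter the ARN in your notebook, it can help to confirm that it really is an access point ARN and that its region and account ID are the ones you expect. The following is a minimal sketch that uses only the Python standard library. The `AccessPointArn` tuple and the `parse_access_point_arn()` function are hypothetical helpers, not part of boto3 or Cloud Pak for Data, and the sketch assumes the `arn:partition:s3:region:account-id:accesspoint/name` layout shown in the example ARN.

```python
from typing import NamedTuple


class AccessPointArn(NamedTuple):
    """Fields of an S3 access point ARN (hypothetical helper type)."""
    partition: str
    region: str
    account_id: str
    name: str


def parse_access_point_arn(arn: str) -> AccessPointArn:
    """Split an S3 access point ARN into its parts; raise ValueError if it is malformed."""
    # An ARN has six colon-separated fields; the last one holds "accesspoint/<name>".
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    _, partition, _, region, account_id, resource = parts
    resource_type, _, name = resource.partition("/")
    if resource_type != "accesspoint" or not name:
        raise ValueError(f"not an access point ARN: {arn!r}")
    # Access point ARNs carry a region and a 12-digit AWS account ID.
    if not region or not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"missing region or 12-digit account ID: {arn!r}")
    return AccessPointArn(partition, region, account_id, name)


example = "arn:aws:s3:us-east-1:675068711478:accesspoint/cust-data-bucket-internet-ap"
print(parse_access_point_arn(example))
```

A plain bucket ARN such as `arn:aws:s3:::my-bucket` is rejected, because its resource part does not start with `accesspoint/`.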
Accessing AWS S3 data from your notebook
The following sample code shows how to read AWS S3 data into a pandas DataFrame in your notebook by using an access point:
import boto3
import pandas as pd
# use an access key and a secret that has access to the bucket
access_key="..."
secret="..."
s3_client = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret)
# the Amazon resource name (ARN) of the access point
arn = "..."
# the file you want to retrieve
fileName="customers.csv"
response = s3_client.get_object(Bucket=arn, Key=fileName)
s3FileStream = response["Body"]
# for other file types, change the line below to use the appropriate pandas read_*() method, for example read_json() or read_excel()
customerDF = pd.read_csv(s3FileStream)
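The comment in the sample notes that other file types need a different pandas `read_*()` method. One way to choose it automatically is by file extension, as in the following sketch. The `READERS` mapping and the `reader_for()` and `load_object()` functions are hypothetical helpers; the four formats in the mapping are an assumption, and reading Excel or Parquet files also requires the corresponding optional pandas dependencies (such as openpyxl or pyarrow) to be installed in your environment.

```python
import io
from pathlib import PurePosixPath

# Hypothetical mapping: file extension -> name of the pandas reader to use.
# Extend it for other formats your data uses.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".parquet": "read_parquet",
    ".xlsx": "read_excel",
}


def reader_for(key: str) -> str:
    """Return the pandas reader name for an S3 object key, based on its extension."""
    suffix = PurePosixPath(key).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader configured for {key!r}") from None


def load_object(s3_client, arn: str, key: str):
    """Download an object through an access point and load it into a DataFrame."""
    import pandas as pd  # imported here so reader_for() works without pandas

    # boto3 accepts the access point ARN in place of a bucket name.
    response = s3_client.get_object(Bucket=arn, Key=key)
    # Buffer the body in memory: the streaming body is not seekable, and
    # readers such as read_excel() and read_parquet() may need to seek.
    data = io.BytesIO(response["Body"].read())
    return getattr(pd, reader_for(key))(data)
```

With the `s3_client` and `arn` from the sample above, `customerDF = load_object(s3_client, arn, "customers.csv")` gives the same result as the original code. Buffering the whole object keeps the sketch simple, but it holds the full file in memory, so it is best suited to files that fit comfortably in the notebook environment.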