Accessing data in AWS through access points from a notebook
In Cloud Pak for Data as a Service, you can access data stored in AWS S3 buckets through access points from a notebook.
Run the notebook in an environment in Cloud Pak for Data as a Service. Create an internet-enabled access point to connect to the S3 bucket.
Connecting to AWS S3 data through an internet-enabled access point
You can access data in an AWS S3 bucket through an internet-enabled access point in any AWS region.
To access S3 data through an internet-enabled access point:
- Create an access point for your S3 bucket. See Creating access points. Set the network origin to Internet.
- After the access point is created, make a note of the Amazon resource name (ARN) for the access point. You will need to enter the ARN in your notebook. Example:
  arn:aws:s3:us-east-1:675068711478:accesspoint/cust-data-bucket-internet-ap
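Because the ARN is copied by hand into the notebook, it can help to check its shape before using it. The following is a minimal sketch, not part of the product, that assumes the standard access point ARN layout `arn:<partition>:s3:<region>:<account-id>:accesspoint/<name>`. The names `AccessPointArn` and `parse_access_point_arn` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class AccessPointArn(NamedTuple):
    """Hypothetical container for the parts of an S3 access point ARN."""
    partition: str
    region: str
    account_id: str
    name: str


def parse_access_point_arn(arn: str) -> AccessPointArn:
    """Split an S3 access point ARN into its parts; raise ValueError if malformed."""
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"Not an S3 ARN: {arn!r}")
    _, partition, _, region, account_id, resource = parts
    # The resource part is expected to look like "accesspoint/<name>".
    resource_type, _, name = resource.partition("/")
    if resource_type != "accesspoint" or not name:
        raise ValueError(f"Not an access point ARN: {arn!r}")
    if not region or not account_id.isdigit():
        raise ValueError(f"Missing region or account ID in ARN: {arn!r}")
    return AccessPointArn(partition, region, account_id, name)


# Example using the sample ARN from the step above.
ap = parse_access_point_arn(
    "arn:aws:s3:us-east-1:675068711478:accesspoint/cust-data-bucket-internet-ap"
)
print(ap.region, ap.name)
```

A plain bucket ARN such as `arn:aws:s3:::cust-data-bucket` is rejected by this check, which catches the common mistake of pasting the bucket ARN instead of the access point ARN.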
Accessing AWS S3 data from your notebook
The following sample code snippet shows you how to access AWS data from your notebook by using an access point:
import boto3
import pandas as pd
# Use an access key ID and a secret access key that have access to the bucket
access_key="..."
secret="..."
s3_client = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret)
# The Amazon resource name (ARN) of the access point, passed in place of a bucket name
arn = "..."
# the file you want to retrieve
fileName="customers.csv"
response = s3_client.get_object(Bucket=arn, Key=fileName)
s3FileStream = response["Body"]
# For other file types, change the line below to use the appropriate pandas read_*() function, such as read_json()
customerDF = pd.read_csv(s3FileStream)
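If you read several file types through the same access point, one option is to pick the pandas reader from the file extension. The sketch below is illustrative only: `READERS`, `reader_name_for`, and `load_dataframe` are hypothetical names, the extension mapping is an assumption you would adjust to your data, and `read_parquet` and `read_excel` need optional pandas dependencies installed in your environment. The object body is copied into an in-memory buffer because some readers require a seekable file object.

```python
import io
import os

# Assumed mapping from file extension to the name of a pandas reader function.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".parquet": "read_parquet",
    ".xlsx": "read_excel",
}


def reader_name_for(key: str) -> str:
    """Return the pandas reader name for an object key, based on its extension."""
    ext = os.path.splitext(key)[1].lower()
    if ext not in READERS:
        raise ValueError(f"No pandas reader configured for extension {ext!r}")
    return READERS[ext]


def load_dataframe(s3_client, access_point_arn: str, key: str, **reader_kwargs):
    """Fetch an object through the access point and load it into a DataFrame."""
    import pandas as pd  # imported here so reader_name_for works without pandas

    reader = getattr(pd, reader_name_for(key))
    response = s3_client.get_object(Bucket=access_point_arn, Key=key)
    # Buffer the streamed body so readers that need seek() can use it.
    buffer = io.BytesIO(response["Body"].read())
    return reader(buffer, **reader_kwargs)
```

With the client and ARN from the sample above, a call would look like `load_dataframe(s3_client, arn, "customers.csv")`. Buffering the whole object in memory is a simplification that suits small to medium files; large objects may call for a different approach.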