Google BigQuery connection

Last updated: Apr 05, 2025

To access your data in Google BigQuery, create a connection asset for it.

Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.

Create a connection to Google BigQuery

To create the connection asset, choose an authentication method. You can authenticate with or without workload identity federation.

Without workload identity federation

  • Account key (full JSON snippet): The contents of the Google service account key JSON file
  • Client ID, Client secret, Access token, and Refresh token

With workload identity federation
You use an external identity provider (IdP) for authentication. Workload identity federation uses Identity and Access Management (IAM) instead of service account keys, which provides increased security and centralized management. You can use workload identity federation authentication with an access token or with a token URL.

You can configure a Google BigQuery connection for workload identity federation with any identity provider that complies with the OpenID Connect (OIDC) specification and that satisfies the Google Cloud requirements that are described in Prepare your external IdP. The requirements include:

  • The identity provider must support OpenID Connect 1.0.
  • The identity provider's OIDC metadata and JWKS endpoints must be publicly accessible over the internet. Google Cloud uses these endpoints to download your identity provider's key set and uses that key set to validate tokens.
  • The identity provider is configured so that your workload can obtain ID tokens that meet these criteria:
    • Tokens are signed with the RS256 or ES256 algorithm.
    • Tokens contain an aud claim.
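The two token criteria can be checked by inspecting a token's header and payload segments. The following sketch builds an unsigned sample ID token and verifies the alg and aud fields; the issuer, subject, and audience values are hypothetical, and real validation also requires signature verification against the IdP's JWKS keys:

```python
import base64
import json

def b64url(data: dict) -> str:
    """Base64url-encode a JSON object without padding, as JWTs do."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment back into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# A sample (unsigned) ID token with the fields Google Cloud checks.
header = {"alg": "RS256", "typ": "JWT"}
payload = {
    "iss": "https://idp.example.com",   # hypothetical issuer
    "sub": "my-workload",               # hypothetical subject
    "aud": "//iam.googleapis.com/projects/123/locations/global/"
           "workloadIdentityPools/my-pool/providers/my-provider",
}
token = f"{b64url(header)}.{b64url(payload)}.signature-placeholder"

# Check the two criteria from the list above.
hdr, claims = (decode_segment(s) for s in token.split(".")[:2])
assert hdr["alg"] in ("RS256", "ES256")
assert "aud" in claims
```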

For examples of the workload identity federation configuration steps for Amazon Web Services (AWS) and Microsoft Azure, see the configuration examples for those identity providers.

Workload Identity Federation with access token connection details

  • Access token: An access token from the identity provider to connect to BigQuery.

  • Security Token Service audience: The security token service audience that contains the project ID, pool ID, and provider ID. Use this format:

    //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
    

    For more information, see Authenticate a workload by using the REST API.

  • Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.

  • Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.

  • Token format: The format of the returned token: Text (the response body is the token) or JSON (the token is in a field of the JSON response, named by the Token field name).

  • Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.

  • Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
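The Token format and Token field name settings describe how the token is extracted from the identity provider's response. A minimal sketch of that behavior, with an illustrative function name and sample values (not the product's implementation):

```python
import json

def extract_token(body: str, token_format: str, field_name: str = "") -> str:
    """Pull the token out of an identity provider response body.

    Token format "Text": the response body is the token itself.
    Token format "JSON": the token is in the field named by field_name.
    """
    if token_format == "Text":
        return body.strip()
    if token_format == "JSON":
        return json.loads(body)[field_name]
    raise ValueError(f"unsupported token format: {token_format}")

# Text response: the whole body is the token.
assert extract_token("sample-token", "Text") == "sample-token"

# JSON response: the token sits under the configured field name.
body = '{"access_token": "sample-token", "expires_in": 3600}'
assert extract_token(body, "JSON", field_name="access_token") == "sample-token"
```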

Workload Identity Federation with token URL connection details

  • Security Token Service audience: The security token service audience that contains the project ID, pool ID, and provider ID. Use this format:

    //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
    

    For more information, see Authenticate a workload using the REST API.

  • Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.

  • Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.

  • Token URL: The URL to retrieve a token.

  • HTTP method: HTTP method to use for the token URL request: GET, POST, or PUT.

  • Request body (for POST or PUT methods): The body of the HTTP request to retrieve a token.

  • HTTP headers: HTTP headers for the token URL request in JSON or as a JSON body. Use format: "Key1"="Value1","Key2"="Value2".

  • Token format: The format of the returned token: Text (the response body is the token) or JSON (the token is in a field of the JSON response, named by the Token field name).

  • Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.

  • Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
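The HTTP headers field uses the "Key1"="Value1","Key2"="Value2" format shown above. A simple way to read that format, assuming no commas inside values (a simplification; the function name is illustrative):

```python
def parse_header_pairs(spec: str) -> dict[str, str]:
    """Parse a "Key1"="Value1","Key2"="Value2" header string into a dict.

    Assumes keys and values contain no embedded commas.
    """
    headers = {}
    for pair in spec.split(","):
        key, value = pair.split("=", 1)
        headers[key.strip().strip('"')] = value.strip().strip('"')
    return headers

spec = '"Content-Type"="application/json","Accept"="application/json"'
assert parse_header_pairs(spec) == {
    "Content-Type": "application/json",
    "Accept": "application/json",
}
```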

Server proxy (optional)

Select Server proxy to access the Google BigQuery data source through an HTTPS proxy server. Depending on its setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection.

  • Proxy host: The hostname or IP address of the HTTPS proxy server. For example, proxy.example.com or 192.0.2.0.
  • Proxy port: The port number to connect to the HTTPS proxy server. For example, 8080 or 8443.
  • Proxy username and Proxy password.
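The platform combines these proxy settings itself; purely as an illustration of how the pieces fit together into a proxy URL, a hypothetical helper:

```python
from urllib.parse import quote

def proxy_url(host: str, port: int, user: str = "", password: str = "") -> str:
    """Assemble an HTTPS proxy URL from host, port, and optional credentials.

    Illustrative only; credentials are percent-encoded for safety.
    """
    auth = f"{quote(user)}:{quote(password)}@" if user else ""
    return f"https://{auth}{host}:{port}"

assert proxy_url("proxy.example.com", 8080) == "https://proxy.example.com:8080"
assert proxy_url("192.0.2.0", 8443, "alice", "s3cret") == "https://alice:s3cret@192.0.2.0:8443"
```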

Other properties

Project ID (optional) The ID of the Google project.

Output JSON string format: JSON string format for output values that are complex data types (for example, nested or repeated).

  • Pretty: Values are formatted before sending them to output. Use this option to visually read a few rows.
  • Raw: (Default) No formatting. Use this option for the best performance.
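The difference between the two options corresponds to compact versus indented JSON serialization, sketched here with Python's json module (the sample record is hypothetical):

```python
import json

# A complex (nested and repeated) value, as BigQuery might return.
nested = {"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}]}}

# Raw (default): compact output, best performance.
raw = json.dumps(nested, separators=(",", ":"))

# Pretty: indented output for visually reading a few rows.
pretty = json.dumps(nested, indent=2)

assert raw == '{"order":{"id":42,"items":[{"sku":"A1","qty":2}]}}'
assert "\n" in pretty
assert json.loads(pretty) == nested  # same data, different presentation
```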

Metadata discovery: This setting determines whether comments on columns (remarks) and aliases for schema objects such as tables or views (synonyms) are retrieved when assets are added by using this connection.

Permissions

The connection to Google BigQuery requires the following BigQuery permissions:

  • bigquery.job.create
  • bigquery.tables.get
  • bigquery.tables.getData

Use one of three ways to gain these permissions:

  • Use the predefined BigQuery Cloud IAM role bigquery.admin, which includes these permissions.
  • Use a combination of two roles, one from each column in the following table.
  • Create a custom role. See Create and manage custom roles.

First role            Second role
bigquery.dataEditor   bigquery.jobUser
bigquery.dataOwner    bigquery.user
bigquery.dataViewer

For more information about permissions and roles in Google BigQuery, see Predefined roles and permissions.

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Google BigQuery setup

Quickstart by using the Cloud Console

Configuring lineage metadata import for Google BigQuery

When you create a metadata import for the Google BigQuery connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.

To import lineage metadata for Google BigQuery, complete these steps:

  1. Create a data source definition. Select Google BigQuery as the data source type.
  2. Create a connection to the data source in a project.
  3. Create a metadata import. Learn more about options that are specific to Google BigQuery data source:
    • When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact datasets and projects that you want to be analyzed. See Include and exclude lists.
    • Optionally, you can provide external input in the form of a .zip file. You add this file in the Add inputs from file field. The file must have a supported structure. See External inputs.

Include and exclude lists

You can include or exclude assets up to the dataset level. Provide projects and datasets in the format project/dataset. Each part is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions that are specified in the lists. Example values:

  • myProject/: all datasets in myProject
  • myProject2/.*: all datasets in myProject2
  • myProject3/myDataset1: myDataset1 from myProject3
  • myProject4/myDataset[1-5]: any dataset in myProject4 with a name that starts with myDataset and ends with a digit from 1 through 5
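The matching can be sketched with Python's re module. This assumes each part is matched against the full project or dataset name and that an empty dataset part means all datasets, consistent with the examples above (the function itself is illustrative, not the product's code):

```python
import re

def matches(pattern: str, asset: str) -> bool:
    """True if a project/dataset asset path matches an include/exclude entry.

    Each part of the pattern is evaluated as a regular expression;
    an empty dataset part matches all datasets (assumption).
    """
    proj_pat, _, ds_pat = pattern.partition("/")
    proj, _, ds = asset.partition("/")
    return bool(re.fullmatch(proj_pat, proj)) and bool(re.fullmatch(ds_pat or ".*", ds))

assert matches("myProject/", "myProject/sales")            # all datasets
assert matches("myProject4/myDataset[1-5]", "myProject4/myDataset3")
assert not matches("myProject4/myDataset[1-5]", "myProject4/myDataset6")
assert not matches("myProject3/myDataset1", "otherProject/myDataset1")
```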

External inputs

If you use external Google BigQuery SQL or job scripts, you can add them in a .zip file as an external input. You can organize the structure of a .zip file as subfolders that represent projects and datasets. After the scripts are scanned, they are added under respective projects and datasets in the selected catalog or project. The .zip file can have the following structure:

<project_id>
   <dataset_name>
      <script_name.sql>
<project_id>
   <script_name.sql>
jobs
   <job_name.json>
<script_name.sql>
replace.csv
connectionsConfiguration.prm
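A layout like the one above can be assembled with Python's zipfile module. The project, dataset, script, and job names here are hypothetical:

```python
import io
import zipfile

# Build an external-inputs archive in memory with the structure above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("my-project/sales_ds/load_sales.sql", "SELECT 1;")  # project/dataset/script
    zf.writestr("jobs/nightly_load.json", "{}")                     # job definition
    zf.writestr("top_level_script.sql", "SELECT 1;")                # script without project/dataset
    zf.writestr("replace.csv", "")                                  # placeholder replacements
    zf.writestr("connectionsConfiguration.prm", "")                 # federated-query connections

with zipfile.ZipFile(buf) as zf:
    names = set(zf.namelist())
assert "my-project/sales_ds/load_sales.sql" in names
assert "replace.csv" in names
```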

The replace.csv file contains placeholder replacements for the scripts that are added in the .zip file. For more information about the format, see Placeholder replacements.

The connectionsConfiguration.prm file contains the database connection resource definitions that are used in federated queries. The file can have the following structure:

[{Shortcut_Name}]
Type={connection_type}
Connection_String={connection_string}
Server_Name={server_name}
Database_Name={database_name}
Schema_Name={schema_name}
User_Name={user_name}
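Files in this shape can be read as bracketed sections of Key=Value lines. The parser below is a sketch under that assumption (one Key=Value per line; the section name and values are hypothetical), not a specification of the format:

```python
def parse_prm(text: str) -> dict[str, dict[str, str]]:
    """Parse connectionsConfiguration.prm-style text into nested dicts,
    keyed by the [Shortcut_Name] section headers."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections

sample = """[MYDB]
Type=POSTGRESQL
Server_Name=db.example.com
Database_Name=sales
Schema_Name=public
User_Name=reporting
"""
assert parse_prm(sample)["MYDB"]["Server_Name"] == "db.example.com"
```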

Learn more

Parent topic: Supported connections