Google BigQuery
  • 04 Jul 2024


Google BigQuery (GBQ) is one of many delivery destinations that Bobsled supports. When delivering data from cloud object storage, Bobsled turns the selected folders into tables in BigQuery. To share the data, Bobsled uses Google Analytics Hub data exchanges: each Google principal granted access to the Bobsled share is authorized to access the shared datasets in the data exchange. Data consumers can then subscribe to the data exchange's listing and query the data from within their own projects.
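When a consumer subscribes to an Analytics Hub listing, a linked dataset is created in a project the consumer chooses, and the shared tables are then queried like any other table in that project. The minimal sketch below only builds such a query string. The names `consumer-project`, `bobsled_share`, and `orders` are hypothetical placeholders, and the helper functions are illustrative; they are not part of Bobsled or Google Cloud.

```python
import re

# Accept only simple identifiers (letters, digits, underscores, hyphens) so the
# helper never splices arbitrary text into SQL. BigQuery's real naming rules
# are broader; this check is deliberately conservative.
_IDENT = re.compile(r"[A-Za-z0-9_-]+")


def linked_table_ref(project: str, linked_dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified reference to a table in a linked dataset."""
    for part in (project, linked_dataset, table):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"`{project}.{linked_dataset}.{table}`"


def preview_query(project: str, linked_dataset: str, table: str, limit: int = 10) -> str:
    """Build a small preview query against a shared table (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {linked_table_ref(project, linked_dataset, table)} LIMIT {limit}"


if __name__ == "__main__":
    print(preview_query("consumer-project", "bobsled_share", "orders"))
```

The resulting string could then be run with the `google-cloud-bigquery` client, for example `bigquery.Client(project="consumer-project").query(sql).result()`.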


Bobsled-managed Google BigQuery

To learn how to configure a Google BigQuery destination in Bobsled, see the Bobsled-managed GBQ setup guide.

Authorization

Bobsled requires the consumer's Google principal(s) in order to grant access to the Analytics Hub data exchange. To learn more about the Google BigQuery sharing identifier used within Bobsled, see Account Access Identifiers in Google Cloud.
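Before sharing principals with Bobsled, it can help to normalize and de-duplicate them. The sketch below assumes principals are written in Google Cloud IAM's `type:identifier` form (for example `user:alice@example.com`); that is an assumption for illustration, and the identifier format Bobsled actually expects is the one described in Account Access Identifiers in Google Cloud. The function names are hypothetical.

```python
# ASSUMPTION: principals use IAM-style "type:identifier" strings. This is an
# illustrative check, not Bobsled's validation logic.
ALLOWED_TYPES = ("user", "group", "serviceAccount", "domain")


def normalize_principal(raw: str) -> str:
    """Validate one principal and return it in a normalized form."""
    kind, sep, ident = raw.strip().partition(":")
    if not sep or kind not in ALLOWED_TYPES or not ident:
        raise ValueError(f"not a recognized principal: {raw!r}")
    if kind == "domain":
        # Domain principals name a bare domain, not an email address.
        if "@" in ident:
            raise ValueError(f"domain principals take a bare domain: {raw!r}")
    elif ident.count("@") != 1:
        raise ValueError(f"expected an email address: {raw!r}")
    # Lower-case the identifier so duplicates that differ only in case compare equal.
    return f"{kind}:{ident.lower()}"


def normalize_principals(raws):
    """Normalize a list of principals, dropping duplicates but keeping first-seen order."""
    result = []
    for raw in raws:
        principal = normalize_principal(raw)
        if principal not in result:
            result.append(principal)
    return result
```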


Advanced destination table settings

Bobsled supports various advanced settings to further control how tables are delivered in BigQuery.

Clustering

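Bobsled supports BigQuery clustering, which optimizes tables for expected query patterns: BigQuery sorts the table's data by the clustering columns, so queries that filter on those columns can skip blocks that cannot contain matching rows.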

  • To set up clustering, open the advanced settings using the icon on the right side of the table configuration screen.

  • Each table can have one cluster key configuration.

  • The order of the selected keys matters, because BigQuery sorts data by the first clustering column, then by the second, and so on. Bobsled respects the order in which you select the keys.
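Why key order matters can be seen in a small toy model: sort rows by the clustering keys, cut them into fixed-size blocks, and count how many blocks a filter must read when blocks whose min/max range excludes the value are skipped. This is a simplified illustration of block pruning under those assumptions, not BigQuery's actual storage layout; the `region`/`day` columns and block size are hypothetical.

```python
from itertools import product


def sample_rows():
    """32 hypothetical rows: every combination of four regions and eight days."""
    regions = ["eu", "us", "apac", "latam"]
    return [{"region": r, "day": d} for r, d in product(regions, range(1, 9))]


def make_blocks(rows, cluster_keys, block_size):
    """Sort rows by the clustering keys in the given order, then cut fixed-size blocks."""
    ordered = sorted(rows, key=lambda row: tuple(row[k] for k in cluster_keys))
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def blocks_scanned(blocks, column, value):
    """Count blocks whose min/max range for `column` could contain `value`.

    Blocks whose range excludes the value are skipped (pruned).
    """
    scanned = 0
    for block in blocks:
        values = [row[column] for row in block]
        if min(values) <= value <= max(values):
            scanned += 1
    return scanned


if __name__ == "__main__":
    rows = sample_rows()
    for keys in (["region", "day"], ["day", "region"]):
        blocks = make_blocks(rows, keys, block_size=4)
        print(keys,
              "region='eu':", blocks_scanned(blocks, "region", "eu"),
              "day=3:", blocks_scanned(blocks, "day", 3),
              "of", len(blocks), "blocks")
```

With keys `["region", "day"]`, a filter on `region` reads 2 of 8 blocks and a filter on `day` reads 4. Reversing the order to `["day", "region"]` lets a `day` filter read a single block, but a `region` filter then has to read all 8, so the first key should be the column consumers filter on most.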

Clustering can also be set through the API, using the tableSettings property on a share.
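The exact shape of the tableSettings payload is not described here, so rather than guess at the Bobsled API, the sketch below shows what an ordered cluster key means in plain BigQuery DDL. It assumes BigQuery's limit of four clustering columns; the table ID and column names are hypothetical, and the helper is illustrative only.

```python
MAX_CLUSTER_COLUMNS = 4  # BigQuery allows at most four clustering columns


def create_table_ddl(table_id, columns, cluster_by):
    """Build a BigQuery CREATE TABLE statement with an ordered CLUSTER BY clause.

    columns: list of (name, bigquery_type) pairs, in table order.
    cluster_by: clustering column names; their order is kept exactly as given.
    """
    if not 1 <= len(cluster_by) <= MAX_CLUSTER_COLUMNS:
        raise ValueError("BigQuery supports between 1 and 4 clustering columns")
    if len(set(cluster_by)) != len(cluster_by):
        raise ValueError("clustering columns must be distinct")
    names = {name for name, _ in columns}
    missing = [c for c in cluster_by if c not in names]
    if missing:
        raise ValueError(f"clustering columns not in schema: {missing}")
    col_sql = ", ".join(f"{name} {typ}" for name, typ in columns)
    return f"CREATE TABLE `{table_id}` ({col_sql}) CLUSTER BY {', '.join(cluster_by)}"


if __name__ == "__main__":
    print(create_table_ddl(
        "proj.ds.events",
        [("event_date", "DATE"), ("region", "STRING"), ("user_id", "INT64")],
        ["region", "event_date"],
    ))
```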

TIP:
If you are interested in using clustering to deliver optimized tables to your consumers but need assistance with the setup, please reach out to your account team.

Datatype override

Bobsled can override a column's data type from your source schema with a different data type in the destination table. This is primarily used for geospatial data, such as BigQuery's GEOGRAPHY type, which is available in BigQuery but cannot be specified in Parquet.

TIP:
If you wish to leverage data type overriding, please reach out to your account team.
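Conceptually, an override replaces the type a column would otherwise receive in the destination. The sketch below applies a per-column override map to a source schema and builds the matching SELECT expressions. Everything here is an assumption for illustration: the simplified source type labels, the default mapping, the override format, and the premise that a GEOGRAPHY column is fed from a string column containing WKT (converted with BigQuery's `ST_GEOGFROMTEXT`). It is not how Bobsled implements overrides.

```python
# Simplified, hypothetical source type labels mapped to BigQuery types.
# Covers only a few common cases for illustration.
DEFAULT_TYPE_MAP = {
    "boolean": "BOOL",
    "int32": "INT64",
    "int64": "INT64",
    "double": "FLOAT64",
    "string": "STRING",
}


def destination_schema(source_schema, overrides):
    """Return [(column, bigquery_type)] with per-column overrides applied.

    source_schema: list of (column, source_type) pairs.
    overrides: hypothetical {column: bigquery_type} override configuration.
    """
    known = {name for name, _ in source_schema}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise ValueError(f"overrides reference unknown columns: {unknown}")
    result = []
    for name, source_type in source_schema:
        if name in overrides:
            result.append((name, overrides[name]))
        else:
            result.append((name, DEFAULT_TYPE_MAP[source_type]))
    return result


def select_expressions(dest_schema):
    """SELECT-list expressions for loading into the destination schema.

    ASSUMPTION: a GEOGRAPHY column is populated from WKT text; other columns
    are selected unchanged.
    """
    exprs = []
    for name, bq_type in dest_schema:
        if bq_type == "GEOGRAPHY":
            exprs.append(f"ST_GEOGFROMTEXT({name}) AS {name}")
        else:
            exprs.append(name)
    return exprs
```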

Schema migration support

When new columns are added to tables or files, Bobsled handles the schema migration by adding the new columns to the existing tables without disrupting deliveries.

  • New columns are added to the existing table, and rows that have no data for them (such as rows loaded before the column existed) are set to null.

    • Because the table schema evolves with the data, loads continue when the schema changes instead of failing.

  • When columns are missing from new files, the values for those columns are set to null.
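The two null-filling rules above can be modeled with an in-memory table. In the sketch below (a hypothetical helper, not Bobsled's implementation), new columns are appended to the schema, earlier rows get `None` for them, and rows from a file that lacks a column get `None` for that column. In BigQuery itself, appending a nullable column corresponds to `ALTER TABLE ... ADD COLUMN`.

```python
def merge_batch(table_columns, table_rows, batch_rows):
    """Load a batch of dict rows into an in-memory table, evolving its schema.

    - Columns seen in the batch but not in the table are appended; existing
      rows get None for them.
    - Columns in the table but missing from a batch row are set to None.
    Returns the updated (columns, rows); the inputs are not modified.
    """
    columns = list(table_columns)
    for row in batch_rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    # dict.get returns None for any column a row does not have.
    rows = [{c: r.get(c) for c in columns} for r in table_rows]
    rows += [{c: r.get(c) for c in columns} for r in batch_rows]
    return columns, rows


if __name__ == "__main__":
    cols, rows = merge_batch([], [], [{"id": 1, "name": "a"}])
    cols, rows = merge_batch(cols, rows, [{"id": 2, "name": "b", "email": "b@example.com"}])
    cols, rows = merge_batch(cols, rows, [{"id": 3, "email": "c@example.com"}])
    print(cols)
    for row in rows:
        print(row)
```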


Consuming a data transfer

Once you’ve configured your destination in a share, granted access to a consumer, and transferred data, learn how to consume a data transfer in Google BigQuery.

