Create and manage a data transfer
  • 11 Jul 2024

Bobsled offers multiple ways to ensure your data is correctly and efficiently replicated across different types of destinations. When sharing data with Bobsled, whether from a file storage source or a cloud data warehouse source, it’s useful to know which replication patterns are available so your data reaches your consumers as intended.

This article describes how to create a data transfer within a share, and how to monitor and manage it.


Prerequisites

  • A Share must be created.

  • To create a data transfer in a share, you must have at least one Data source preconfigured in Bobsled.

  • If you’re sharing data to a Bobsled-managed destination, all you need to do is pick where you want your data to be shared. Sharing to an externally managed destination may require more details before you can start creating a transfer. Check our supported Destinations for more information.

NOTE:
What you can see and do will differ based on your role and permissions.


Data transfer setup instructions

Creating a data transfer in Bobsled comprises three main steps:

  1. Choose data to share: Choose paths from a File Storage source, or objects from a Cloud Data Warehouse source

  2. Configure loading/replication patterns: Configure how Bobsled should load or replicate your data (not applicable to File Storage sources with File Storage destinations)

  3. Review data transfer: Set sync intervals and schedule

These steps vary depending on your source and destination combination, and the setup flow adapts to each combination to give you maximum control and flexibility.


File Storage sources to Cloud Data Warehouse destinations

When selecting data from your source, Bobsled will take each unique selection and map it to an individual table.

  • By default, if you select 3 paths, 3 tables will be loaded to the destination.

  • Bobsled generates the default name of the table based on your source selection. E.g., S3://source_path/data_package_1/data_2020/ will translate into one table, named data_2020.
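To make the default naming rule concrete, here is a minimal Python sketch. It is a hypothetical illustration, not Bobsled’s implementation: it assumes the default table name is simply the last non-empty segment of the selected path, and `default_table_name` is a made-up helper name. Bobsled may also normalize names to fit the destination.

```python
def default_table_name(source_path: str) -> str:
    """Hypothetical helper: derive a default table name from a selected path.

    Assumes the name is the last non-empty path segment, e.g.
    "S3://source_path/data_package_1/data_2020/" -> "data_2020".
    """
    # Drop a URI scheme such as "S3://" if present.
    without_scheme = source_path.split("://", 1)[-1]
    segments = [part for part in without_scheme.split("/") if part]
    if not segments:
        raise ValueError(f"cannot derive a table name from {source_path!r}")
    return segments[-1]


# Each selected path maps to its own table: 3 paths -> 3 tables.
selections = [
    "S3://source_path/data_package_1/data_2018/",
    "S3://source_path/data_package_1/data_2019/",
    "S3://source_path/data_package_1/data_2020/",
]
tables = {path: default_table_name(path) for path in selections}
```

Here `tables` holds one entry per selected path, mirroring the one-selection-one-table behavior described above.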

Step 1: Choose data

Select the source paths you want to share with a consumer.  

  1. Click the Create transfer button

  2. Choose the source paths you want to share

  3. Once you have at least one path selected, click Continue

NOTE:

You can drill down on your paths for more granular selection, depending on your folder hierarchy. Bobsled seeks to replicate the source tables as a well-structured folder with files.

TIP:

If you experience a delay between your source data and what Bobsled has indexed, click the refresh icon in this view.

Step 2: Configure tables for destination

Bobsled will map each source selection to a table in a given destination.

  1. Bobsled will infer the file format and schema of your File Storage source selection

    • If schema inference fails, you can choose another file format or click the Try again button to re-run the inference

TIP:

If the issue persists, you may have conflicting formats, or the source folder may be empty. Please reach out to your account team so they can assist you.

  2. Choose a loading pattern. By default, Bobsled sets the Append only pattern. Select the dropdown and choose the pattern best suited to your needs. Learn more about each loading pattern in our Transferring files to cloud data warehouse destinations guide.

TIP:

If the schema inference process hasn’t finished, you won’t be able to finish configuring alternative loading patterns.

  3. Optionally change the name of the destination table.

NOTE:

If you give the same destination table name to more than one path selection within a Share, those selections are merged into a single table in the destination. For the merge to succeed, the selections must share the same loading pattern and schema. The Bobsled application will alert you if these conditions are not met.

TIP:

Bobsled extracts the name based on your source selection, and will assist you with the correct naming format, providing feedback if some characters or formats aren’t accepted by the destination.

  4. Optionally click View schema, then click Advanced settings to set clustering keys for the destination table

    • You can view the schema Bobsled has inferred, translated into Bobsled data types, and optionally override it

    • You can set clustering keys for the destination table. Note that each Cloud Data Warehouse destination applies clustering keys differently

  5. Once done, and if no errors are shown, click Continue
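The merge rule from the note on destination table names (same name, same loading pattern, same schema) can be expressed as a validation check. This is a hypothetical sketch, not Bobsled’s code: `PathSelection`, `find_merge_conflicts`, and representing a schema as a tuple of (column, type) pairs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSelection:
    """Hypothetical model of one source path selection in a share."""
    path: str
    table_name: str
    loading_pattern: str
    schema: tuple  # assumed shape: ((column_name, data_type), ...)


def find_merge_conflicts(selections):
    """Return destination table names whose merged selections disagree.

    Selections sharing a table name are merged, which requires identical
    loading patterns and schemas across the group.
    """
    by_table = defaultdict(list)
    for sel in selections:
        by_table[sel.table_name].append(sel)

    conflicts = []
    for name, group in by_table.items():
        patterns = {sel.loading_pattern for sel in group}
        schemas = {sel.schema for sel in group}
        if len(patterns) > 1 or len(schemas) > 1:
            conflicts.append(name)
    return sorted(conflicts)
```

For example, two selections both named `events` with the Append only pattern and the same schema produce no conflicts; adding a third `events` selection with a different pattern flags `events`.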

Step 3: Review

Review the data transfer configuration.

  1. Review your selection, schema and data loading preferences

  2. Optionally change the transfer interval and set when the transfer starts syncing. Learn more in the Sync preferences and transfer scheduling guide

  3. Click Save transfer


File Storage sources to File Storage destinations

When you select data from your source, Bobsled by default mirrors the contents of your File Storage source into the File Storage destination.

NOTE:

Bobsled doesn’t support loading configurations for File Storage sources to File Storage destinations. Learn more in Transferring files to file storage destinations.
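Mirroring implies that each object under a selected source path lands at the same relative location in the destination. The sketch below illustrates that key mapping under that assumption; `mirror_key` and the example prefixes are hypothetical and are not Bobsled settings or APIs.

```python
def mirror_key(source_key: str, source_prefix: str, destination_prefix: str) -> str:
    """Hypothetical helper: map a source object key to its mirrored destination key.

    Assumes mirroring keeps each object's path relative to the selected
    source prefix unchanged in the destination.
    """
    # Normalize to a trailing slash so "data_package_1" does not match "data_package_10".
    prefix = source_prefix.rstrip("/") + "/"
    if not source_key.startswith(prefix):
        raise ValueError(f"{source_key!r} is not under {prefix!r}")
    relative = source_key[len(prefix):]
    dest = destination_prefix.rstrip("/")
    return f"{dest}/{relative}" if dest else relative


print(mirror_key(
    "data_package_1/data_2020/part-0001.parquet",
    source_prefix="data_package_1",
    destination_prefix="shared/data_package_1/",
))
# prints shared/data_package_1/data_2020/part-0001.parquet
```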

Step 1: Choose data

Select the source paths you want to share with a consumer.

  1. Click the Create transfer button

  2. Choose the source paths you want to share.

  3. Once you have at least one path selected, click Continue.

NOTE:

You can drill down on your paths for more granular selection, depending on your folder hierarchy. Bobsled seeks to replicate the source tables as a well-structured folder with files.

TIP:

If you experience a delay between your source data and what Bobsled has indexed, click the refresh icon in this view.

Step 2: Review

Review the data transfer configuration.

  1. Review your selection, schema and data loading preferences

  2. Optionally change the transfer interval and set when the transfer starts syncing. Learn more in the Sync preferences and transfer scheduling guide

  3. Click Save transfer


Cloud Data Warehouse sources to Cloud Data Warehouse destinations

When selecting data from your source, Bobsled will take each unique object selection and map it into an individual table.

  • Bobsled generates the default name of the table based on your source selection. E.g., SNOWFLAKE_DATABASE.SNOWFLAKE_SCHEMA.TABLE_2020 will translate into one table, named table_2020.
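Judging from the example above, the default table name is the last part of the fully qualified object name, lowercased. The helper below is a hypothetical sketch of that rule: the lowercasing is inferred from this single example, `default_table_name_from_object` is a made-up name, and quoted identifiers containing dots are not handled.

```python
def default_table_name_from_object(qualified_name: str) -> str:
    """Hypothetical helper: DATABASE.SCHEMA.OBJECT -> lowercased object name.

    Assumes plain, unquoted identifiers; quoted names containing dots are
    not handled.
    """
    parts = qualified_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DATABASE.SCHEMA.OBJECT, got {qualified_name!r}")
    return parts[-1].lower()


print(default_table_name_from_object("SNOWFLAKE_DATABASE.SNOWFLAKE_SCHEMA.TABLE_2020"))
# prints table_2020
```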

Step 1: Choose data

Select the source objects you want to share with a consumer.

  1. Click the Create transfer button

  2. Choose the source objects you want to share.

  3. Once you have at least one object selected, click Continue.

Step 2: Configure tables for destination

Bobsled maps your source objects one-to-one to tables in the destination.

  1. Choose a loading pattern. By default, Bobsled sets the Full-table replication pattern. Select the dropdown and choose the pattern best suited to your needs. Learn more about each loading pattern in our Transferring tables to cloud data warehouse destinations guide.

  2. Optionally change the name of the destination table.

NOTE:

Bobsled extracts the name based on your source selection, and will assist you with the correct naming format, providing feedback if some characters or formats aren’t accepted by the destination.

  3. Optionally click View schema, set a back-fill, and click Advanced settings to set clustering keys for the destination table

    • You can view the schema Bobsled has inferred, translated into Bobsled data types, and optionally override it

    • The first load is effectively a back-fill, so this option is disabled for the first sync of your transfer. Learn more

    • You can set clustering keys for the destination table. Note that each Cloud Data Warehouse destination applies clustering keys differently

  4. Once done, and if no errors are shown, click Continue

Step 3: Review

Review the data transfer configuration.

  1. Review your selection, schema and data loading preferences

  2. Optionally change the transfer interval and set when the transfer starts syncing. Learn more in the Sync preferences and transfer scheduling guide

  3. Click Save transfer


Cloud Data Warehouse sources to File Storage destinations

When selecting data from your source, Bobsled will take each unique object selection and map it into an individual folder.

  • Bobsled generates the default name of the folder based on your source selection. E.g., SNOWFLAKE_DATABASE.SNOWFLAKE_SCHEMA.TABLE_2020 will translate into one folder, named .../table_2020/
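The folder naming in the example above can be sketched the same way. Because the example elides the leading part of the path (`.../`), the helper takes it as a caller-supplied `destination_prefix`; that parameter, the helper name, and the lowercasing (inferred from the example) are assumptions, and the real folder layout depends on your Data delivery preferences.

```python
import posixpath


def default_folder_path(destination_prefix: str, qualified_name: str) -> str:
    """Hypothetical helper: DATABASE.SCHEMA.OBJECT -> "<prefix>/<object>/".

    `destination_prefix` stands in for the part of the path the docs elide.
    """
    parts = qualified_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected DATABASE.SCHEMA.OBJECT, got {qualified_name!r}")
    # posixpath.join handles a prefix with or without a trailing slash.
    return posixpath.join(destination_prefix, parts[-1].lower()) + "/"


print(default_folder_path("my_bucket/my_share", "SNOWFLAKE_DATABASE.SNOWFLAKE_SCHEMA.TABLE_2020"))
# prints my_bucket/my_share/table_2020/
```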

Step 1: Choose data

Select the source objects you want to share with a consumer.

  1. Click the Create transfer button

  2. Choose the source objects you want to share.

  3. Once you have at least one object selected, click Continue.

Step 2: Configure tables for destination

Bobsled maps your source objects one-to-one to folders in the destination.

  1. Choose a loading pattern. By default, Bobsled sets the Full-table replication pattern. Select the dropdown and choose the pattern best suited to your needs. Learn more about each loading pattern in our Transferring tables to file storage destinations guide.

  2. Choose a format. By default, Bobsled selects Parquet (Snappy). Learn more about which formats are supported.

  3. Optionally change the name of the destination folder.

NOTE:

Bobsled extracts the name based on your source selection, and will assist you with the correct naming format, providing feedback if some characters or formats aren’t accepted by the destination.

TIP:

To change your default file format, click Environment in the sidebar, scroll down to Data delivery preferences, and click the edit (pencil) icon next to File format default.

  4. Optionally set File format and Data delivery preferences, view the schema, and set a back-fill.

    • You can configure the file format in which Bobsled writes to the destination.

    • You can configure how Bobsled writes to the destination bucket. Learn more about the folder structure.

    • You can view the schema Bobsled has inferred.

    • The first load is effectively a back-fill, so this option is disabled for the first sync of your transfer. Learn more.

  5. Once done, and if no errors are shown, click Continue

Step 3: Review

Review the data transfer configuration.

  1. Review your selection, schema and data loading preferences

  2. Optionally change the transfer interval and set when the transfer starts syncing. Learn more in the Sync preferences and transfer scheduling guide

  3. Click Save transfer


Manage a data transfer

Once a data transfer has been set up, the following information is available at a glance:

  • Transfer ID: Unique identifier for the data transfer.

  • Details about:

    • Total number of entities being transferred

    • Last edit date and the author of the edit

  • Transfer status: Detailed information about the status of the data transfer.

  • Access data button: Shows instructions on how to access the data. See the Destinations guides for how to consume a data transfer.

NOTE:

The Access data button is only available after the first sync.

  • More (ellipsis) button:

    • Edit: Make changes to the data transfer. The data transfer must be paused before you can edit it.

    • View transfer configuration: Opens a modal with the data transfer configuration, as outlined in the review steps above.

Bobsled also offers the following details and controls for the transfer:

  • Transfer interval: Displays the configured transfer interval.

  • Pause transfer: Bobsled will not run any new scheduled syncs.

    • If the transfer is paused, the Resume transfer button is shown instead.

  • Sync now: Bobsled triggers an ad hoc data transfer without affecting the transfer interval.

NOTE:

You cannot force a Sync now if there is data currently being transferred.


Pause a data transfer

  1. Click Pause transfer. This suspends the transfer sync interval, so no new syncs run automatically.

  2. To resume the transfer sync interval, click Resume transfer.

NOTE:

Clicking Sync now on a paused data transfer will trigger a sync. This will not resume the transfer interval schedule.
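The controls described above follow a few rules: pausing stops scheduled syncs, Sync now is rejected while data is being transferred, Sync now on a paused transfer does not resume the schedule, and editing requires a paused transfer. The class below is a hypothetical model of those rules for illustration only; it is not a Bobsled API, and the names `TransferControls`, `sync_now`, and `finish_sync` are made up.

```python
from enum import Enum


class TransferState(Enum):
    ACTIVE = "active"  # syncs run on the transfer interval
    PAUSED = "paused"  # no scheduled syncs


class TransferControls:
    """Hypothetical model of the documented transfer controls (not a Bobsled API)."""

    def __init__(self) -> None:
        self.state = TransferState.ACTIVE
        self.sync_in_progress = False

    def pause(self) -> None:
        self.state = TransferState.PAUSED

    def resume(self) -> None:
        self.state = TransferState.ACTIVE

    def sync_now(self) -> None:
        # Sync now cannot be forced while data is already being transferred.
        if self.sync_in_progress:
            raise RuntimeError("a sync is already in progress")
        # Starts an ad hoc sync; the state is left unchanged, so a paused
        # transfer stays paused.
        self.sync_in_progress = True

    def finish_sync(self) -> None:
        self.sync_in_progress = False

    def edit(self) -> bool:
        # Editing is only allowed on a paused transfer.
        if self.state is not TransferState.PAUSED:
            raise RuntimeError("pause the transfer before editing")
        return True
```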


Troubleshooting a data transfer

When a data transfer fails, Bobsled relays details about the error or warning. Hover over the error or warning message for more details and how to recover.

NOTE:

Bobsled shows the Failed status regardless of whether the data transfer is paused or actively scheduled.


Edit a data transfer

To edit a data transfer, you must first pause it.

  1. In the Share detail page, click Pause transfer

  2. Click the more (ellipsis) button, and then click Edit.

  3. Make the desired changes, follow the wizard, and click Resume transfer.


Restrictions

  • The maximum transfer size is 400,000 files.

  • The maximum number of tables transferred in one automation is 800.
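If you plan large transfers, you may want to check a planned selection against these limits before setting it up. The function below is a hypothetical pre-flight check using the two documented limits; Bobsled enforces the limits itself and does not expose this function. It assumes the limits are inclusive, so exactly 400,000 files or 800 tables is still allowed.

```python
# Limits taken from the Restrictions list above.
MAX_FILES_PER_TRANSFER = 400_000
MAX_TABLES_PER_AUTOMATION = 800


def check_transfer_limits(file_count: int, table_count: int) -> list:
    """Hypothetical pre-flight check; returns a list of problems (empty if OK).

    Assumes the limits are inclusive.
    """
    problems = []
    if file_count > MAX_FILES_PER_TRANSFER:
        problems.append(
            f"{file_count:,} files exceeds the {MAX_FILES_PER_TRANSFER:,}-file limit"
        )
    if table_count > MAX_TABLES_PER_AUTOMATION:
        problems.append(
            f"{table_count} tables exceeds the {MAX_TABLES_PER_AUTOMATION}-table limit"
        )
    return problems
```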

