What is data sharing?
- Updated on 17 Jul 2024
Data sharing is an essential component of how organizations consume, distribute, and monetize data. It is the process by which teams enable others, inside and outside their organization, to access valuable analytical data.
Traditional forms of data sharing involve intensive replication and integration of data. The source of the data (the provider) typically replicates a dataset and shares the duplicate file through a secure intermediary such as an SFTP server. In the past decade, providers have also built bulk APIs that allow consumers to request specific data programmatically, but many of the same challenges remain. The consumer is responsible for extracting the data from the delivery location, loading it into their workspace, and then transforming it into an analytics-ready format.
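The extract-load-transform burden this places on the consumer can be sketched as follows. The bulk-export format and the order schema here are illustrative assumptions, not any particular provider's API:

```python
import csv
import io

# Simulated provider side: a bulk export delivered as a replicated CSV file.
# In practice this would arrive via an SFTP download or a bulk API response.
BULK_EXPORT = """order_id,region,amount
1001,EMEA,250.00
1002,APAC,120.50
1003,EMEA,75.25
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse the delivered file into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records: list[dict]) -> dict[str, float]:
    """Transform: reshape records into an analytics-ready aggregate
    (here, revenue by region)."""
    totals: dict[str, float] = {}
    for row in records:
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals

records = extract(BULK_EXPORT)
revenue_by_region = transform(records)
print(revenue_by_region)  # {'EMEA': 325.25, 'APAC': 120.5}
```

Every consumer of the same dataset repeats this pipeline independently, which is the "integration burden" the table below refers to.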
Modern data sharing builds on new sharing technologies developed by major platforms such as AWS, Snowflake, Databricks, and others. In one version, in-place sharing, a provider uses a sharing protocol (often developed by a platform) to grant a consumer, identified by a public identifier, access to a data product; the consumer then uses their own compute to transform the data as they see fit. The data appears in a ready-to-query format within the consumer's main analytics platform (e.g. a data lake like AWS S3 or a data warehouse like Snowflake).
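The grant/consume flow of in-place sharing can be illustrated with a toy model. The `SharePlatform` class and its methods below are hypothetical stand-ins for a platform's sharing protocol, not a real API; the point is that one stored copy is exposed by identifier rather than replicated:

```python
# Hypothetical stand-in for a platform's sharing layer: the provider publishes
# a data product once, then grants access by a consumer's public account
# identifier; the consumer queries in place, and no copy leaves the platform.

class SharePlatform:
    def __init__(self):
        self._products = {}   # product name -> rows (the single stored copy)
        self._grants = set()  # (product name, consumer account id) pairs

    # Provider side
    def publish(self, product, rows):
        self._products[product] = rows

    def grant(self, product, account_id):
        """Grant read access via the consumer's public identifier."""
        self._grants.add((product, account_id))

    # Consumer side
    def query(self, product, account_id, where=lambda row: True):
        """Read the shared product in place; the consumer's own compute
        filters and transforms it."""
        if (product, account_id) not in self._grants:
            raise PermissionError(f"{account_id} has no access to {product}")
        return [row for row in self._products[product] if where(row)]

platform = SharePlatform()
platform.publish("sales", [{"region": "EMEA", "amount": 250.0},
                           {"region": "APAC", "amount": 120.5}])
platform.grant("sales", "consumer-acct-123")
emea = platform.query("sales", "consumer-acct-123",
                      where=lambda r: r["region"] == "EMEA")
print(emea)  # ready-to-query rows, with no extract/load step for the consumer
```

The consumer skips the extract and load steps entirely, which is why the integration burden in the comparison below shifts from the consumer to the platform.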
| Topics | Traditional Sharing | Modern Sharing |
| --- | --- | --- |
| Core Technologies | | |
| Supporting Technologies | | |
| Key Difference | Data is replicated and delivered to a consumer outside of their native analytical environment. | Access to the data is granted to the consumer in their native analytical environment. |
| Typical Advantages | Decades of development and product maturity | Limited integration burden on consumers |
| Typical Disadvantages | Substantial "integration" burden on consumer | Limited by platform and often region |