What is Data Sharing?
- Updated on 25 Apr 2023
Data sharing is an essential component of the way organizations consume, distribute, and monetize data. It is the process by which teams enable others, both inside and outside of their organization, to access valuable analytical data.
Traditional forms of data sharing involved intensive replication and integration of data. The sources of data (providers) typically replicated a dataset and shared the duplicate file through a secure intermediary such as an SFTP server. In the past decade, providers have also built bulk APIs that allow consumers to request specific data programmatically, but many of the same challenges remain. The consumer is responsible for extracting the data from the delivery location, loading it into their workspace, and then transforming it into an analytics-ready format.
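To make the consumer's burden concrete, here is a minimal sketch of that extract-load-transform workflow. The file contents, column names, and `fetch_shared_file` stub are hypothetical; in practice the extract step would download the replicated file from the provider's SFTP server (for example with a library such as paramiko) before any transformation can begin.

```python
import csv
import io

def fetch_shared_file() -> str:
    # Hypothetical stand-in for the extract step (e.g. an SFTP download
    # of the provider's replicated file). Returns raw CSV text.
    return "region,amount\neast,100\nwest,250\neast,50\n"

def transform(raw_csv: str) -> dict:
    # The transform step -- the consumer's responsibility under
    # traditional sharing: reshape the raw file into an
    # analytics-ready aggregate (here, revenue per region).
    totals: dict = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])
    return totals

totals = transform(fetch_shared_file())
print(totals)  # {'east': 150, 'west': 250}
```

Every consumer of the file repeats this pipeline independently, which is the replication overhead modern sharing aims to remove.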
Modern data sharing builds on new sharing technologies developed by the major platforms such as AWS, Snowflake, Databricks, and others. With one version, in-place sharing, a provider uses a sharing protocol (often developed by a platform) to grant a consumer access to a data product via a public identifier; the consumer then uses their own compute to transform the data as they see fit. The data appears in a ready-to-query format within the consumer's main analytics platform (e.g., a data lake like AWS S3 or a data warehouse like Snowflake).
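The contrast with the traditional flow can be sketched as a simple access model. The `DataProduct` class, identifier string, and grant mechanism below are illustrative assumptions, not any platform's actual API: the key point is that the provider keeps a single copy and grants access under a public identifier, rather than shipping duplicates to each consumer.

```python
class DataProduct:
    """Hypothetical model of in-place sharing: one copy of the data,
    published under a public identifier, with access controlled by grants."""

    def __init__(self, identifier: str, rows: list):
        self.identifier = identifier
        self._rows = rows        # single copy, owned by the provider
        self._grants = set()

    def grant(self, consumer: str) -> None:
        # The sharing protocol's grant step: no data is moved or copied.
        self._grants.add(consumer)

    def query(self, consumer: str) -> list:
        # The consumer reads the provider's data in place, using their
        # own compute for any downstream transformation.
        if consumer not in self._grants:
            raise PermissionError(f"{consumer} has no grant on {self.identifier}")
        return self._rows

product = DataProduct("acme.sales.v1", [{"region": "east", "amount": 100}])
product.grant("analytics-team")
print(product.query("analytics-team"))  # [{'region': 'east', 'amount': 100}]
```

Revoking a grant immediately cuts off access, because the consumer never held their own copy of the data.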
| Element | Traditional Sharing | Modern Sharing |
| --- | --- | --- |
| Core Technologies | | |
| Supporting Technologies | | |
| Key Difference | Data is replicated and delivered to a consumer outside of their native analytical environment. | Access to the data is granted to the consumer in their native analytical environment. |
| Typical Advantages | | |
| Typical Disadvantages | | |