by Ramana Kumar Gunti
Data Sharing in Snowflake
In this article, we will talk about Snowflake data sharing which enables account-to-account sharing of data through Snowflake database tables, secure views, and secure UDFs. So let's get started.
Snowflake data sharing is a powerful yet simple feature to share the data from one account and to use the shared data from another account. The data producer can provide access to his live data within minutes without copying or moving the data to any number of data consumers. The data consumer can query the shared data from data producer without any performance bottle necks thanks to snowflakes multi-cluster shared data architecture.
Data sharing is only supported between Snowflake accounts. In few business scenarios where the data can be shared to non-Snowflake customer by using reader accounts feature. Reader accounts provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
A reader account can only consume data from the provider account that created it.
Traditional Data Share WorkflowIn Traditional Data Sharing, provider has to do following.
- Extract the data from Source System
- Encrypt the Data
- Share the data to respective data consumers using FTP/API/Cloud Storage
In contrast, the consumers have to decrypt the data, load it into their system before start using data. This will make the systems out of sync as the process of extracting and loading the data between systems takes time. Moreover, it will even make the process delayed when the process encounters failures while extracting and loading the data.
Legacy Data Share Challenges
- Handling increased data size
- Decrypting sensitive data
- Changing file formats and schema
- Sharing data in real time
- Cleaning data
Snowflake Data Share
Snowflake data sharing works due to its multi-cluster shared data architecture, any snowflake customer (data consumer) who wants to share data with someone else can give other organizations live access to their data over a secure connection. Rather than physically transferring the data to the consumer, the data remains in the provider's account and is visible and accessible to the consumer through SQL.
The data producer will pay only for snowflake storage and compute resource they use. The data consumer will pay for the compute resources used to query the shared data. When the object is shared with the data consumer the objects remains inside the producers account so no storage cost applies to the data consumers for the shared data unless they copy the data into a table.
Snowflake Data Share Advantages:
How data sharing works?
Snowflake allows granular control of access to the objects within the shared database through grants.
- Data Provider creates a share container with the objects (databases, schemas, tables, or views) to be shared
- Data Provider then grants a Data Consumer account access to the share
- Data Consumer creates new database that maps to the shared object(s)
- Data Consumer then grants access privileges to a role in their account
- Data Consumer starts querying, using the privileged role and their virtual warehouse
How to create share and grant access?
General data share limitations
- Data Sharing is only supported between accounts in the same Snowflake region
Snowflake allows to share the data between different regions when the data replicated in consumers region
- Secure data share is not allowed with different regions or cloud platforms when one or more external table exists as part of data share
- VPS (Virtual Private Snowflake) does not support secure data sharing across regions
- Shared databases are read-only. Consumers can view/query data, but cannot insert or update data, or create any objects in the database
- The following actions are not supported:
- Creating a clone of a shared database or any schemas/tables in the database.
- Time Travel for a shared database or any schemas/tables in the database.
- Editing the comments for a shared database
- Shared databases and all the objects in the database cannot be re-shared with other accounts
Ramana Kumar Gunti is a Manager at KPI Partners and has 15 Years of experience in Big Data echo systems, Modelling and Cloud Analytics. He has extensively worked on Big Data, Cloud data warehouse and cloud analytics projects using Hadoop, Snowflake, python and Airflow. He is very enthusiastic to learn and explore the new tools and technologies and be updated with all the features of the existing products.