Ultimate 2023 Guide to Snowflake Interview Mastery: 100+ Essential Questions & Answers for Both Novices and Experienced Professionals(Part:-1)
Snowflake has emerged as a leading cloud-based platform for storing, processing and analyzing large volumes of structured and semi-structured data. As a result, many companies are seeking to hire experienced Snowflake professionals to help manage and optimize their data operations.
If you are preparing for a Snowflake interview, it is essential to be familiar with the key concepts and features of the platform. In this post, we will cover the top 100 Snowflake interview questions and answers to help you prepare for your next interview.
1. What’s unique about snowflake architecture? How snowflake architecture is different from traditional databases (RDBMS like SQL Server, Oracle, My SQL)?
Snowflake is a cloud-based data warehousing platform that enables organizations to store, process, and analyze large volumes of structured and semi-structured data in real time. It differs from traditional data warehousing solutions in that it is built on a scalable, elastic, and multi-cluster architecture that allows it to handle massive data sets with ease using Cloud Computing technologies.
Snowflake has a three-layer architecture
Storage layer => that stores data in hybrid columnar format and saved it in the cloud
Query Processing => Virtual Warehouse to run queries/for compute resources (muscle)
Cloud Services => brain of snowflake architecture coordinates between storage and query processing layers
2. Why go with Snowflake instead of other Cloud Data warehouses like AWS Redshift, Azure Synapse, and Google Big Query?
- Snowflake is available on all cloud platforms like AWS, Azure and GCP (Multi-cloud). Other cloud Data warehouses were restricted to that one cloud example: Redshift will be available in only AWS, and Big Query will be available in GCP.
- Unique features like Data Market place, Data Exchange, Zero Copy Clone, Time travel, Data sharing, Streams for CDC
- Separate Storage and compute costs, Zero Administration/Maintenance (partitioning, data backups, index maintenance, and performance tuning will be taken care of by Snowflake itself).
- No software or hardware is required to be installed or upgraded as Snowflake is SaaS (Software as a service) whereas Synapse and BigQuery are PaaS (Platform as a service)
3. What is Virtual Warehouse?
To perform any operations, we require resources. In Snowflake, these resources are called virtual warehouses.
A virtual warehouse is a cluster of compute nodes/machines
A cluster is a group of nodes/machines connected together
Compute Resources is a combination of CPU + Memory+ Temporary storage
4. Different sizes of warehouses?
XS(1) will have one compute node/machines
S(2) will have 2 compute nodes, M(4), L(8), XL(16), 2XL(32), 3XL(64), 4XL(128) — generally available
5XL(256) and 6 XL(512) are in preview state
5. When we create a warehouse from UI, What is the default warehouse size?
XL(16) is the default warehouse size when created from UI
6. When we create a warehouse in the worksheet, what is the default warehouse size?
XS(1) is the default warehouse size when created in a worksheet
7. What is the default Auto suspend time in a Warehouse?
10 minutes is the default auto-suspend time in a warehouse
8. What is Auto Resume in Virtual Warehouse?
If the warehouse is in a suspended state and if we set Auto Resume as true then as soon as a query is run using this warehouse it will automatically get resumed
9. Can we increase or decrease the warehouse size at any time?
Yes, on the fly (instantly) it will get resized
10. What will happen to already running queries if we resize the warehouse?
Already running queries will still use the old warehouse size but subsequent queries in the queue will run with the resized warehouse
11. Can we start or stop the warehouse at any time?
Yes, suspend will stop it and resume will start it again
12. What is the difference between scale-up and scale-out in the warehouse?
Increasing the warehouse size is scale up and it’s to increase the performance of complex queries and large volumes of data (when remote disk spillage happens)
Adding clusters to a warehouse is scale out and it is to handle concurrency (parallel queries) issues
13. How to set a warehouse to auto-scale mode?
While creating a warehouse If you set max_cluster_count (10)> min_cluster_count(1) then the Virtual warehouse will be in auto-scale mode and it will start with the minimum number of clusters and scale out to maximum clusters based on workloads on demand
14. What is maximize mode?
When the max_cluster_count and min_cluster_count are given as the same number, then that warehouse is said to be running a maximized mode.
Example:-
Maximized Mode=> max_cluster_count(5)= min_cluster_count (5)
Snowflake starts all the clusters so that maximum resources are available while the warehouse is running.
15. What are horizontal scaling and vertical scaling?
Scale-up is named Horizontal scaling i.e, increasing the computing capacity
Scale-out is named as Vertical scaling i.e, increasing the number of clusters in a warehouse
16. Can we automate scale-up?
No, we have to manually resize(increase/decrease) the warehouse
Scaling out is an automated process taken care by Snowflake based on your workload
17. What is a scaling policy?
A scaling policy is used to determine when to start or shut down a warehouse.
Standard (default policy) — performance (starts immediately when a query is queued)
Economy — conserves credits (cluster starts only if it estimates there is enough query load to keep the cluster busy for at least 6 minutes)
18. What is time travel in Snowflake?
Time travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period based on the retention period. If we drop a table, with time travel we can restore it if it is within the retention period
19. What’s the maximum retention period for permanent tables?
For the Standard edition, the maximum retention period is 1 day
For the Enterprise edition and above, the maximum retention period is 90 days
20. What’s the maximum retention period for temporary tables and transient tables?
The maximum Retention period for temporary tables and transient tables is 1 day
21. What’s the default retention period in Snowflake?
One day
22. What is a failsafe?
Fail-safe is a data recovery service intended only for use when all other recovery options have been attempted and offers protection of data in case of disaster.
23. Can we configure a failsafe period?
Its a Non-configurable 7 days period and can be defined only for permanent
24. Can we disable Fail-safe in Snowflake?
Fail-safe cannot be disabled on a table
25. Can we interact/read/access data from in failsafe?
No user interaction and recoverable only by Snowflake, We need to approach the Snowflake team to get data from failsafe
26. What are the differences between Time Travel and Fail-safe?
Time travel: Helps the user to access the historical data at any point within the retention period and restore the previous state of the table.
Fail-safe: internally used by Snowflake to restore the data during times of hardware failure/disaster, user can’t access/read data from Fail safe
After the Time travel period is over, the next 7 days of data will be stored in Snowflake
27. Will time travel contribute to additional storage costs?
Yes, since it has to maintain snapshots/historical data
28. Will failsafe contribute to additional storage costs?
Yes, since it has to maintain historical data for 7 days
29. Which scenario did you use time travel?
Restore data that got accidentally deleted using UNDROP.
Backing up of data from key points in the past i.e. data a week back, month back. Query data in the past that has since been updated or deleted.
30. What are the different ways of getting/accessing/querying data from Time travel?
It can be done in 3 ways
- By timestamp: check data at a specific point in time
SELECT * FROM my_table AT(timestamp => ‘Mon, 01 May 2021 08:00:00 -0700’::timestamp_tz);
SELECT * FROM my_table BEFORE(timestamp => ‘Mon, 01 May 2021 08:00:00 -0700’::timestamp_tz);
- By offset →In this example, we select the historical data from a table as of 15 minutes ago:
SELECT * FROM my_table AT(offset => -60*15);
SELECT * FROM my_table BEFORE(offset => -60*15);
- By query statement ID→ We can see the Query Statement ID in the history tab
SELECT * FROM my_table BEFORE(STATEMENT => ‘8e5d0ca9–005e-44e6-b858-a8f5b37c5726’);
SELECT * FROM my_table AT(STATEMENT => ‘8e5d0ca9–005e-44e6-b858-a8f5b37c5726’);
Accessing data with AT => Data at that point
Accessing data with BEFORE => Data before that point
Thank you for reading our guide on Snowflake interview questions. If you found this helpful, Don’t miss Part-2 with more advanced Snowflake interview questions. This article was first published on sqlskool.com. Keep an eye out for more updates, and best of luck!