Architecture: Snowflake

A data engineer kicks off a heavy quarterly report at 9 a.m. It scans two years of order history and takes four minutes to run. At the same moment, three hundred analysts open their morning dashboards, a nightly load job is still flushing yesterday's events into the warehouse, and a data scientist is training a churn model against the same tables. On a traditional warehouse, all of these workloads fight over the same fixed pool of CPU and disk. Somebody's query gets slow. Usually it's everybody's.

On Snowflake, none of them get in each other's way. The report runs on its own compute cluster, the dashboards on another, the load job on a third, the model training on a fourth. They all read the exact same data, and not one of them slows down the others. That property of independent, non-competing workloads over a single copy of data is the thing Snowflake was built to deliver, and the architecture is shaped entirely around it.

The big picture: three layers that scale on their own

Traditional databases come in two shapes. Shared-disk systems put all data in one central store that every compute node can reach, which keeps a single source of truth but turns the storage tier into a bottleneck. Shared-nothing systems give each node its own slice of the data, which scales compute well but couples storage to compute permanently: you cannot add query power without also moving data around.

Snowflake takes a hybrid position. There is one central data repository like shared-disk, and there are many independent compute clusters that process queries in parallel like shared-nothing, but storage and compute are fully decoupled. The platform is organized into three layers, and the central design decision is that each layer scales independently of the others.

Everything below is a detailed tour of these three layers, the data structures underneath them, how a real query travels through the stack, and the newer layer Snowflake has built on top: Snowpark, Iceberg tables, and Cortex AI.

The storage layer

When you load data into Snowflake, you never see a file. You write SQL against tables, and underneath, Snowflake takes ownership of how that data is physically stored. It reorganizes incoming rows into an internal columnar format, compresses them, and writes the result into the object storage of whichever cloud you are running on: S3 on AWS, Blob Storage on Azure, or Cloud Storage on GCP.

You do not manage this storage. You do not pick file sizes, choose compression codecs, or run a vacuum job. Snowflake handles organization, compression, and the metadata index behind the scenes, and bills you separately for the storage you consume. Because storage lives in cloud object stores rather than on the compute nodes, it is effectively unlimited and it persists whether or not any compute is running.

Micro-partitions: the unit that makes it fast

The single most important structure in Snowflake is the micro-partition. As data lands in a table, Snowflake automatically divides it into contiguous units of storage, each holding roughly 50 to 500 MB of uncompressed data, stored compressed in a proprietary columnar format. A large table is not a handful of partitions you defined; it is potentially millions of these tiny units, created and managed without any input from you.
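A toy Python model makes the splitting concrete. It is a sketch under loud assumptions, not Snowflake's format: real micro-partitions are sized by bytes and stored compressed, while here a hypothetical `max_rows` budget stands in for the 50 to 500 MB range. `split_into_micro_partitions` is an illustrative name, not a Snowflake API.

```python
from typing import Any

Row = dict[str, Any]

def split_into_micro_partitions(rows: list[Row], max_rows: int) -> list[dict[str, list[Any]]]:
    """Chop rows, in load order, into contiguous chunks and store each chunk column by column.

    max_rows is a hypothetical stand-in for Snowflake's byte-based sizing.
    """
    partitions = []
    for start in range(0, len(rows), max_rows):
        chunk = rows[start:start + max_rows]
        # Pivot the chunk: one list per column instead of one dict per row,
        # so a query that needs only "amount" can read just that list.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        partitions.append(columns)
    return partitions

orders = [{"order_id": i, "region": "EU" if i % 2 else "US", "amount": i * 10} for i in range(10)]
parts = split_into_micro_partitions(orders, max_rows=4)
print(len(parts))           # 3 partitions: 4 + 4 + 2 rows
print(parts[0]["amount"])   # [0, 10, 20, 30]
```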

Two properties of micro-partitions matter enormously. First, they are columnar: data is stored column by column inside each unit, so an analytical query that selects three columns out of fifty reads only those three. Second, they are immutable. A micro-partition is never edited in place. An update writes a new micro-partition and marks the old one as superseded. That immutability is exactly what makes time travel and zero-copy cloning possible later on.
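Immutability is easy to model: if a partition can never change, an update has to produce a new partition plus a new list of which partitions make up the table. The sketch below assumes a single-column table, and the names `MicroPartition` and `apply_update` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: assigning to a field raises an error
class MicroPartition:
    pid: int
    amounts: tuple[int, ...]  # one column, stored as an immutable tuple

def apply_update(table: list[MicroPartition], pid: int,
                 new_amounts: tuple[int, ...], next_pid: int) -> list[MicroPartition]:
    """Return a new table version: the target partition is superseded, never edited."""
    return [MicroPartition(next_pid, new_amounts) if p.pid == pid else p for p in table]

v1 = [MicroPartition(1, (10, 20)), MicroPartition(2, (30, 40))]
v2 = apply_update(v1, pid=2, new_amounts=(30, 99), next_pid=3)

print([p.pid for p in v1])  # [1, 2]: the old version is still intact
print([p.pid for p in v2])  # [1, 3]: partition 2 was superseded by partition 3
print(v1[0] is v2[0])       # True: unchanged partitions are shared, not copied
```

Because the old version is still a complete, valid list of partitions, keeping it around is all that time travel needs.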

For every micro-partition, Snowflake records metadata: the range of values for each column, the number of distinct values, counts, and null counts. This metadata lives in the cloud services layer, not in the data files, and it is the engine behind partition pruning. When a query filters on order_date, the optimizer reads the min/max range for that column on each micro-partition and skips every partition that cannot possibly contain matching rows. On a table with millions of micro-partitions, a well-filtered query might read a few dozen. The data that gets skipped is never fetched from storage at all.
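Pruning itself is an interval-overlap test run against metadata alone. This sketch tracks one column's min/max per partition; the class and function names are hypothetical, and the real optimizer handles far more predicate shapes than a single date range.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PartitionStats:
    """Per-partition metadata kept outside the data files (one column shown)."""
    pid: int
    min_order_date: date
    max_order_date: date

def prune(stats: list[PartitionStats], lo: date, hi: date) -> list[int]:
    """Keep only partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return [s.pid for s in stats if s.max_order_date >= lo and s.min_order_date <= hi]

stats = [
    PartitionStats(1, date(2024, 1, 1), date(2024, 1, 31)),
    PartitionStats(2, date(2024, 2, 1), date(2024, 2, 29)),
    PartitionStats(3, date(2024, 3, 1), date(2024, 3, 31)),
]
# WHERE order_date BETWEEN '2024-02-10' AND '2024-02-20'
print(prune(stats, date(2024, 2, 10), date(2024, 2, 20)))  # [2]
```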

Pruning is also why some queries return almost instantly. A COUNT(*) or a MAX(column) can often be answered directly from micro-partition metadata without scanning any data, because the answer is already recorded in the counts and ranges the cloud services layer holds.
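The metadata shortcut can be sketched the same way. Assuming each partition's stats record a row count and a per-column maximum (hypothetical field names below), an unfiltered COUNT(*) or MAX is plain arithmetic over the stats. A WHERE clause that cuts through the middle of a partition would force an actual scan of it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnStats:
    row_count: int   # rows in this micro-partition
    max_amount: int  # max(amount) within this micro-partition

def count_star(stats: list[ColumnStats]) -> int:
    # SELECT COUNT(*) FROM t: sum of stored row counts, no data read
    return sum(s.row_count for s in stats)

def max_amount(stats: list[ColumnStats]) -> int:
    # SELECT MAX(amount) FROM t: max of stored per-partition maxima
    return max(s.max_amount for s in stats)

stats = [ColumnStats(1000, 250), ColumnStats(800, 990), ColumnStats(1200, 410)]
print(count_star(stats))  # 3000
print(max_amount(stats))  # 990
```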

Clustering

Pruning only helps when the values you filter on are physically grouped together. If orders for a single day are scattered across thousands of micro-partitions, the min/max ranges overlap and almost nothing can be skipped. Data loaded in date order naturally clusters by date, so time-range queries prune beautifully. For very large tables where the natural load order does not match the query pattern, you can define a clustering key, and Snowflake will reorganize micro-partitions in the background to keep related values together. Clustering is a maintenance cost paid in compute, so it is worth it only on large tables with predictable filter columns.
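The effect of physical ordering on pruning can be measured with the same min/max logic. In this sketch, the round-robin "scattered" load is a deliberately extreme assumption chosen for illustration; real data usually lands somewhere between the two cases.

```python
def partition_ranges(days: list[int], rows_per_partition: int) -> list[tuple[int, int]]:
    """Cut the column into load-order chunks and record each chunk's (min, max)."""
    chunks = (days[i:i + rows_per_partition] for i in range(0, len(days), rows_per_partition))
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_scanned(ranges: list[tuple[int, int]], day: int) -> int:
    """Count partitions whose range could contain the filtered day."""
    return sum(1 for lo, hi in ranges if lo <= day <= hi)

n_days, rows_per_day = 100, 100
# Scattered load: rows arrive round-robin across days, so every chunk spans all days.
scattered = [i % n_days for i in range(n_days * rows_per_day)]
# Clustered: the same rows, physically grouped by day.
clustered = sorted(scattered)

print(partitions_scanned(partition_ranges(scattered, 100), day=42))  # 100
print(partitions_scanned(partition_ranges(clustered, 100), day=42))  # 1
```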

The compute layer: virtual warehouses

A virtual warehouse is a cluster of compute nodes that executes queries. It is the only part of Snowflake that does actual query work: scanning micro-partitions, joining, aggregating, sorting. Warehouses come in t-shirt sizes from X-Small upward. Each size step doubles the compute resources and the credits billed per hour; a query that parallelizes well can run roughly twice as fast, though that speedup is not guaranteed.

The defining property is that warehouses are independent and ephemeral. You can start, resize, suspend, resume, or drop one, and none of these actions touch the stored data or interfere with any other warehouse. Billing is measured in Snowflake credits consumed per second of run time, with a 60-second minimum each time a warehouse starts or resumes, so a suspended warehouse costs nothing for compute. This is what lets you give each workload its own warehouse: the morning reports get a Large, the dashboards get a Small, and they never compete because they are literally different clusters reading the same storage.
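Because billing is the whole reason to suspend aggressively, a quick cost model helps. This sketch assumes standard warehouses (X-Small at 1 credit per hour, doubling per size step) and per-second billing with a 60-second minimum per start or resume. It ignores cloud services charges and leaves out the dollar price of a credit, which depends on edition and region.

```python
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large", "2X-Large", "3X-Large", "4X-Large"]

# Standard warehouses: X-Small consumes 1 credit per hour and each size step doubles it.
CREDITS_PER_HOUR = {size: 2 ** step for step, size in enumerate(SIZES)}

def credits_for_run(size: str, seconds_running: int) -> float:
    """Credits for one start-to-suspend interval: billed per second, with a 60-second minimum."""
    billable_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billable_seconds / 3600

print(CREDITS_PER_HOUR["Large"])                                        # 8
print(credits_for_run("Large", 240))                                    # the four-minute report on a Large
print(credits_for_run("X-Small", 5) == credits_for_run("X-Small", 60))  # True: the minimum applies
```

Four minutes on a Large costs exactly what eight minutes on a Medium would, which is why scaling up only pays off when the query actually gets faster.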

Scaling up versus scaling out

There are two different ways to add power, and they solve different problems. Scaling up means resizing a warehouse to a larger size so a single complex query runs faster, because it has more nodes to parallelize across. Scaling out means a multi-cluster warehouse that automatically adds identical clusters when concurrency rises, so a thousand simultaneous users do not queue behind each other. Scaling up fixes slow queries; scaling out fixes crowded ones. A multi-cluster warehouse can spin clusters up and back down on demand, and you pay only for clusters that are actually running.
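Scaling out can be sketched as a function from concurrent queries to cluster count. The per-cluster capacity of 8 is an assumption that mirrors the default of Snowflake's MAX_CONCURRENCY_LEVEL parameter, and the bounds stand in for a warehouse's MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT settings. Snowflake's real scaling policies also weigh queueing over time, so treat this as the shape of the idea, not the algorithm.

```python
import math

def clusters_needed(concurrent_queries: int, per_cluster: int = 8,
                    min_clusters: int = 1, max_clusters: int = 10) -> int:
    """Scale-out sketch: enough identical clusters to absorb the concurrency, within bounds."""
    wanted = math.ceil(concurrent_queries / per_cluster)
    return max(min_clusters, min(wanted, max_clusters))

for load in (3, 40, 300):
    print(load, clusters_needed(load))
# 3 1
# 40 5
# 300 10   (capped at max_clusters; the remaining queries wait in a queue)
```

Notice what the function never does: adding clusters does not make any single query faster. That is scaling up's job.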

Caching

Warehouses keep a local disk cache of the micro-partitions they have recently read. A query that reuses data its warehouse already pulled can skip the trip back to object storage and run noticeably faster, which is why a warehouse warms up after its first few queries. Separately, the cloud services layer holds a result cache: if an identical query runs again within 24 hours and the underlying data has not changed, Snowflake returns the stored result in milliseconds without using a warehouse at all. There is also the metadata cache that answers count and min/max queries straight from micro-partition statistics. That makes three caches, each cutting a different kind of repeated work: the warehouse cache saves storage reads, the result cache saves whole queries, and the metadata cache saves the scan altogether.
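The result cache's core rule, "same query, same data, same answer", fits in a few lines. This is a toy keyed on query text plus a hypothetical table version number that every write bumps. Snowflake's actual reuse conditions are stricter (for example, queries containing functions such as CURRENT_TIMESTAMP are not reused) and entries expire.

```python
from typing import Callable

class ResultCache:
    """Toy result cache keyed on (query text, table version).

    Any write bumps the table version, so a stale result can never be returned.
    """

    def __init__(self) -> None:
        self._store: dict[tuple[str, int], object] = {}
        self.warehouse_runs = 0  # how many times real compute was needed

    def run(self, sql: str, table_version: int, execute: Callable[[], object]) -> object:
        key = (sql, table_version)
        if key not in self._store:
            self.warehouse_runs += 1  # cache miss: a warehouse does the work
            self._store[key] = execute()
        return self._store[key]

cache = ResultCache()
q = "SELECT region, SUM(amount) FROM orders GROUP BY region"
cache.run(q, table_version=7, execute=lambda: {"EU": 120, "US": 80})
cache.run(q, table_version=7, execute=lambda: {"EU": 120, "US": 80})  # hit: no compute
cache.run(q, table_version=8, execute=lambda: {"EU": 125, "US": 80})  # data changed: miss
print(cache.warehouse_runs)  # 2
```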

The cloud services layer

The cloud services layer is the brain that coordinates everything else. It is a collection of always-on services, running on compute that Snowflake manages for you, that handle every part of a request that is not raw query execution. When you sign in, submit SQL, or share data, you are talking to this layer first.

Its responsibilities are wide. Authentication and access control decide who you are and what you can see. Query optimization parses your SQL, builds a plan, and decides which micro-partitions to prune and which warehouse work to dispatch. Metadata management holds the catalog of tables, the micro-partition statistics, and the clustering information that pruning depends on. Transaction management guarantees that concurrent reads and writes stay consistent, so an analyst never sees a half-finished load. Infrastructure management provisions and tears down warehouse nodes behind the scenes.

Because this layer owns the metadata rather than the data, several signature Snowflake features are really just clever metadata operations. Zero-copy cloning creates a full copy of a table, schema, or database instantly, because it copies only metadata pointers to existing immutable micro-partitions; new storage is consumed only when the clone diverges. Time travel lets you query a table as it existed minutes or days ago, because the old micro-partitions still exist and the metadata remembers which ones were current then. Secure data sharing grants another account read access to your micro-partitions without copying anything, because sharing is a grant in the services layer, not a data transfer.
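All of these reduce to bookkeeping over immutable partitions. The sketch below is a hypothetical `Catalog` that stores partition data once and keeps, per table, a timestamped list of partition IDs: cloning appends a metadata entry, and time travel picks an older entry. Retention limits, which Snowflake does enforce, are left out.

```python
class Catalog:
    """Hypothetical metadata store: data is written once, tables are just lists of partition IDs."""

    def __init__(self) -> None:
        self.storage: dict[int, tuple] = {}                     # pid -> immutable partition data
        self.versions: dict[str, list[tuple[int, tuple]]] = {}  # table -> [(timestamp, pids)], oldest first
        self._next_pid = 1

    def store(self, values) -> int:
        """Write a new immutable partition and return its ID."""
        pid = self._next_pid
        self._next_pid += 1
        self.storage[pid] = tuple(values)
        return pid

    def commit(self, table: str, ts: int, pids) -> None:
        self.versions.setdefault(table, []).append((ts, tuple(pids)))

    def current(self, table: str) -> tuple:
        return self.versions[table][-1][1]

    def at(self, table: str, ts: int) -> tuple:
        """Time travel: the newest version committed at or before ts."""
        return [pids for t, pids in self.versions[table] if t <= ts][-1]

    def clone(self, src: str, dst: str, ts: int) -> None:
        """Zero-copy clone: one new metadata entry pointing at the same partitions."""
        self.commit(dst, ts, self.current(src))

cat = Catalog()
a, b = cat.store([1, 2]), cat.store([3, 4])
cat.commit("orders", ts=100, pids=[a, b])
cat.clone("orders", "orders_dev", ts=110)
after_clone = len(cat.storage)           # 2: cloning wrote no data
c = cat.store([3, 99])                   # the clone diverges: partition b is rewritten
cat.commit("orders_dev", ts=120, pids=[a, c])

print(after_clone, len(cat.storage))     # 2 3
print(cat.current("orders"))             # (1, 2): original untouched
print(cat.current("orders_dev"))         # (1, 3): still shares partition 1
print(cat.at("orders_dev", ts=115))      # (1, 2): time travel to before the change
```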

The request lifecycle: following one query end to end

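Putting the layers together, here is what happens when an analyst runs a filtered aggregate over a billion-row table. The request first reaches the cloud services layer, which authenticates the user, checks access, and looks for an identical cached result. On a miss, it parses and optimizes the SQL and uses micro-partition metadata to prune every partition the filter rules out. Only then is the plan dispatched to a virtual warehouse, which reads the surviving micro-partitions from its local cache or from object storage, computes the aggregate, and returns the result through the services layer.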

The order matters. Optimization and pruning happen in the services layer before the warehouse does any work, so the warehouse is handed a plan that already knows it needs only a fraction of the table. The expensive part, reading data, happens last and is the most aggressively minimized. A query over a billion rows that filters down to one week of data may physically scan only the handful of micro-partitions covering that week.
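The ordering can be written down as one function. Every name here is hypothetical and the cache key is simplified (a real one would also depend on the data version and on who is asking), but the control flow follows the sequence the text describes: access checks and the cache lookup first, pruning on metadata next, data reads last.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    lo: int                        # min(order_day), held in metadata
    hi: int                        # max(order_day), held in metadata
    rows: list[tuple[int, float]]  # (order_day, amount); only the warehouse reads these

@dataclass
class Trace:
    steps: list[str] = field(default_factory=list)
    partitions_read: int = 0

def run_query(user, allowed_users, table, day_lo, day_hi, result_cache, trace):
    """SELECT SUM(amount) WHERE order_day BETWEEN day_lo AND day_hi, layer by layer."""
    # Cloud services: authenticate and check access.
    if user not in allowed_users:
        raise PermissionError(user)
    trace.steps.append("auth")
    # Cloud services: answer from the result cache if this exact query ran before.
    key = ("sum_amount", day_lo, day_hi)
    if key in result_cache:
        trace.steps.append("result cache hit")
        return result_cache[key]
    # Cloud services: prune with min/max metadata before any data is touched.
    survivors = [p for p in table if p.hi >= day_lo and p.lo <= day_hi]
    trace.steps.append(f"pruned to {len(survivors)} of {len(table)}")
    # Warehouse: read only the surviving partitions and aggregate.
    total = 0.0
    for p in survivors:
        trace.partitions_read += 1
        total += sum(amount for day, amount in p.rows if day_lo <= day <= day_hi)
    trace.steps.append("warehouse scan")
    result_cache[key] = total
    return total

# A year of orders, one partition per week, one order worth 1.0 per day.
table = [Partition(7 * w, 7 * w + 6, [(7 * w + d, 1.0) for d in range(7)]) for w in range(52)]
cache: dict = {}

first = Trace()
print(run_query("analyst", {"analyst"}, table, 70, 76, cache, first))  # 7.0
print(first.steps, first.partitions_read)
second = Trace()
run_query("analyst", {"analyst"}, table, 70, 76, cache, second)
print(second.steps)  # ['auth', 'result cache hit']
```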

Failure modes and fault tolerance

The decoupled design also shapes how Snowflake survives failure. Because storage lives in cloud object storage, durability is inherited from services like S3, which replicate every object across multiple availability zones. Data does not live on warehouse nodes, so losing a node loses no data. If a node in a running warehouse fails mid-query, the work it was doing is rescheduled onto healthy nodes, and because the data sits in shared storage, any node can pick up any micro-partition.
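A toy scheduler shows why shared storage makes recovery simple. Snowflake does not publish its internal scheduling, so the round-robin assignment and the function names below are assumptions; the point is only that orphaned work can go to any surviving node, because no partition lives on the node that failed.

```python
def schedule(partition_ids: list[int], nodes: list[str]) -> dict[str, list[int]]:
    """Assign micro-partition IDs to nodes round-robin (an assumed policy)."""
    plan: dict[str, list[int]] = {n: [] for n in nodes}
    for i, pid in enumerate(partition_ids):
        plan[nodes[i % len(nodes)]].append(pid)
    return plan

def reschedule_after_failure(plan: dict[str, list[int]], failed: str,
                             done: set[int]) -> dict[str, list[int]]:
    """Hand a failed node's unfinished partitions to the surviving nodes.

    This works only because no partition lives on a node: any survivor can
    fetch any partition from shared object storage.
    """
    survivors = [n for n in plan if n != failed]
    orphaned = [pid for pid in plan[failed] if pid not in done]
    new_plan = {n: list(plan[n]) for n in survivors}
    for i, pid in enumerate(orphaned):
        new_plan[survivors[i % len(survivors)]].append(pid)
    return new_plan

plan = schedule(list(range(8)), ["n1", "n2", "n3", "n4"])
# n1: [0, 4], n2: [1, 5], n3: [2, 6], n4: [3, 7]
new_plan = reschedule_after_failure(plan, failed="n2", done={1})
print(new_plan)  # {'n1': [0, 4, 5], 'n3': [2, 6], 'n4': [3, 7]}
```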

The cloud services layer runs as a redundant, multi-instance service across availability zones, so the loss of one instance does not take down authentication or query planning. For larger disasters, Snowgrid is Snowflake's cross-region and cross-cloud layer that replicates databases, pipelines, and account objects to another region, so an entire region outage can fail over to a replica elsewhere. The trade-off worth understanding is that the cloud services layer is a shared dependency: it is highly available by design, but it is the one part of the stack you do not run yourself, so its availability is Snowflake's responsibility rather than something you tune.

The layer on top: Snowpark, Iceberg, and Cortex

For its first several years Snowflake was a SQL data warehouse. The three-layer core has not changed, but a programmability and AI layer now sits on top of it, and the architecture absorbed these additions by extending the compute and storage layers rather than redesigning them.

Snowpark

Snowpark is a set of libraries that let you write data transformations in Python, Java, or Scala instead of SQL, and have them execute inside Snowflake's compute layer. The code runs on the same virtual warehouses, next to the data, so large datasets never leave the platform to be processed elsewhere. For workloads that need full container control, Snowpark Container Services runs your own Docker images on Snowflake-managed compute, which is how custom models and non-SQL services get deployed inside the same security perimeter.

Iceberg tables

Snowflake's native micro-partition format is proprietary and closed. Apache Iceberg is an open table format, and Snowflake's Iceberg tables let you keep data in Iceberg's open Parquet-plus-metadata layout in your own cloud storage while still querying it with Snowflake's engine. The motivation is openness: organizations that need data to be readable by other engines, such as Spark, or that face regulatory requirements to avoid vendor lock-in, can store the data once in an open format and let multiple tools read it. You give up some of the optimizations of the native format in exchange for portability, and the same compute and services layers operate over both kinds of table.

Cortex AI

Cortex AI is Snowflake's AI layer, and its architectural significance is where the inference runs: directly inside Snowflake, next to the data, invoked from SQL. A function like AI_COMPLETE runs large language model inference as part of a query, with no external API call and no data leaving the platform. Around it sits a family of capabilities: Cortex Search for hybrid retrieval over your own documents, Cortex Agents for multi-step autonomous tasks over governed data, and Cortex Guard for filtering unsafe or harmful content out of model outputs. The recurring theme is compute-to-data: rather than shipping governed data out to an AI service, Snowflake brings the model to the data so the same security and governance perimeter still applies.

Alongside these sits the **Dynamic Table**, a declarative pipeline primitive: you define the target query, and Snowflake keeps its result refreshed within a target lag you set, incrementally where the query allows it. That is how feature engineering and continuous transformations are expressed without hand-written orchestration.
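Incremental refresh is the key idea behind a Dynamic Table: instead of recomputing the defining query, fold in only what changed. This sketch handles inserts into a SUM ... GROUP BY and nothing else; updates, deletes, and queries that cannot be maintained incrementally need more machinery, and in those cases Snowflake can fall back to a full refresh. The function names are hypothetical.

```python
from collections import defaultdict

def full_refresh(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Recompute SUM(amount) GROUP BY region from scratch."""
    totals: defaultdict[str, int] = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

def incremental_refresh(current: dict[str, int], new_rows: list[tuple[str, int]]) -> dict[str, int]:
    """Fold only the rows inserted since the last refresh into the stored result."""
    updated = dict(current)
    for region, amount in new_rows:
        updated[region] = updated.get(region, 0) + amount
    return updated

base = [("EU", 100), ("US", 50)]
delta = [("EU", 20), ("APAC", 5)]  # inserts since the last refresh

state = full_refresh(base)
state = incremental_refresh(state, delta)
print(state)                                # {'EU': 120, 'US': 50, 'APAC': 5}
print(state == full_refresh(base + delta))  # True: same answer, far less work
```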

The full picture

Reading the diagram top to bottom traces a single idea. Data flows in through ingestion, the services layer governs and optimizes every request, independent warehouses do the compute, one shared copy of storage holds everything, and an AI layer runs inference next to the data. Nothing in the stack forces one workload to wait on another.

Summary

Snowflake's architecture is the answer to one question: how do you let unlimited, unrelated workloads run over a single copy of data without any of them slowing each other down? The answer is to split the system into three layers that scale independently. Storage holds one columnar copy of the data as immutable, metadata-rich micro-partitions. Compute is many ephemeral virtual warehouses, each its own isolated cluster, billed by the second. Cloud services is the always-on brain that authenticates, optimizes, and, crucially, holds the metadata that turns a query over a billion rows into a read of a few micro-partitions.

Almost every feature people associate with Snowflake follows from those three decisions. Per-workload warehouses come from decoupling compute from storage. Instant cloning and time travel come from immutable micro-partitions plus metadata in the services layer. Fast queries come from pruning before any data is read. And the newer Snowpark, Iceberg, and Cortex layers extend the same core rather than replacing it, running code and AI inference on the existing compute next to the existing data. Understand the three layers and the micro-partition, and the rest of the platform stops looking like a list of features and starts looking like consequences of one good idea.
