Architecture: DB2 for z/OS


A payment transaction hits the mainframe. It needs to check an account balance, verify a credit limit, write an audit record, and update three tables atomically. All of this happens under a single unit of work, coordinated by a database that has been running at that bank since the 1980s. If any part fails, every change rolls back. If two transactions try to update the same row simultaneously, one waits. If the system crashes mid-transaction, the database recovers to a consistent state on restart.

That is DB2 for z/OS. It is the relational database engine of the mainframe world: the system that handles structured SQL workloads the way CICS handles transactional program execution. The two are deeply integrated. Most production CICS applications access their relational data through DB2, and DB2's recovery model is designed to participate in CICS's two-phase commit protocol.

This post assumes you have read Architecture: Mainframe and Architecture: VSAM. It covers how DB2 is structured internally, how it stores data, how it executes SQL, how it manages locking, and how it achieves high availability through data sharing on Parallel Sysplex.


The Big Picture

DB2 for z/OS (formally IBM Db2 for z/OS, currently at version 13) is not a single process. It runs as a set of cooperating z/OS address spaces, each responsible for a distinct function. Together they form a DB2 subsystem, identified by a four-character subsystem ID (SSID) such as `DSN1` or `DB1P`. A z/OS LPAR can run multiple DB2 subsystems, though most production environments run one per LPAR.

DB2 stores its data in VSAM LDS (Linear Data Set) datasets, organized into tablespaces. From the file system's perspective, a DB2 table is a set of VSAM datasets. From DB2's perspective, those datasets are a byte-addressable storage medium onto which DB2 imposes its own page structure, row layout, and index organization. Nothing between DB2 and the disk knows about tables or rows: that abstraction exists entirely within DB2.

Figure 1: DB2 for z/OS — multiple address spaces, buffer pools, and storage layers.

The Address Spaces

DB2 runs across four primary address spaces, each with a distinct role.

SSAS: System Services Address Space (MSTR)

The SSAS, known by its started task name `MSTR`, is the control center of the DB2 subsystem. It manages the DB2 log, handles checkpoint processing, and coordinates subsystem startup and shutdown. It accepts connections from non-CICS callers such as TSO users and batch jobs. It also manages the execution of DB2 utilities: REORG, COPY, RUNSTATS, RECOVER, and LOAD all execute under MSTR's supervision. In a data sharing environment, MSTR is responsible for the SCA (Shared Communications Area) structure in the Coupling Facility, which carries group-wide status information.

DBAS: Database Services Address Space (DBM1)

The DBAS, known as `DBM1`, is the core SQL engine. It processes all SQL statements, manages the buffer pools, and performs the physical I/O to VSAM datasets. When a COBOL program issues a DB2 SQL call, control flows through a thin attachment layer into DBM1, which parses the request, locates the data via the buffer pool or disk I/O, applies locks through the IRLM, and returns the result. DBM1 is the address space that consumes the most CPU in a typical DB2 workload. In a data sharing environment, DBM1 is responsible for the Group Buffer Pools in the Coupling Facility.

IRLM: Internal Resource Lock Manager

The IRLM runs as its own address space and manages all DB2 locking. It maintains the in-memory lock table and communicates with the IRLM instances on other members in a data sharing group through the Coupling Facility's LOCK1 structure. The IRLM enforces both L-locks (logical locks: row, page, table, tablespace level) and P-locks (physical locks: cross-member ownership of buffer pool pages). Deadlock detection runs in the IRLM; when a deadlock is detected, it rolls back one of the participants.

DDF: Distributed Data Facility

The DDF handles all remote connections to DB2. It implements the DRDA (Distributed Relational Database Architecture) protocol over TCP/IP, accepting connections from JDBC drivers, ODBC drivers, and DB2 clients on other platforms. A Java application running on Linux can connect to DB2 for z/OS through DDF exactly as it would connect to any JDBC data source. DDF translates the DRDA protocol into internal DB2 calls and routes them into DBM1. For REST-based access, z/OS Connect sits in front of DDF and wraps SQL calls in a REST API layer.


Storage: Databases, Tablespaces, and Pages

DB2 organizes data in a strict hierarchy: database, tablespace, table, and row.

A database in DB2 is a namespace and administrative container. It groups related tablespaces and indexes. Unlike in most other relational database systems, a DB2 database is not the unit of connection: applications connect to the DB2 subsystem and then reference objects by their fully qualified names.
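
A minimal DDL sketch of that container role; the database, storage group, and table names below are hypothetical:

```sql
-- Hedged sketch: DBPAY, SGPAY01, and PAYPROD.ACCOUNT are invented names.
-- The database groups tablespaces and supplies defaults; it is not a
-- connection target.
CREATE DATABASE DBPAY
  STOGROUP SGPAY01     -- storage group: the DASD volumes to allocate on
  BUFFERPOOL BP0;      -- default buffer pool for tablespaces in this database

-- Any connection to the subsystem references objects by qualified name.
SELECT * FROM PAYPROD.ACCOUNT;
```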

A tablespace is the physical storage container. It maps to one or more VSAM LDS datasets on DASD. A tablespace contains one or more tables, stored as pages of a fixed size. Page sizes are 4 KB, 8 KB, 16 KB, or 32 KB; the choice is made at tablespace creation time and cannot be changed without recreating the tablespace. Each page size has its own buffer pool.

The dominant tablespace type in modern DB2 is the Universal Table Space (UTS), which comes in two forms (both sketched in DDL after this list):

  • Partition-by-range (PBR): the table is partitioned by a defined range of key values. Each partition maps to a separate VSAM dataset. A table with 50 partitions has 50 physical datasets. Partitioning enables parallel utilities, partition-level maintenance, and efficient range queries.
  • Partition-by-growth (PBG): DB2 automatically creates new partitions as the table grows. Useful when the data volume is hard to predict at design time.
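
A hedged DDL sketch of both forms; all names, partition counts, and key ranges are hypothetical:

```sql
-- PBR: four partitions, each stored in its own VSAM LDS dataset.
CREATE TABLESPACE TSACCT IN DBPAY
  NUMPARTS 4           -- partition count is fixed at creation
  SEGSIZE 32           -- segmented page organization (what makes it UTS)
  BUFFERPOOL BP0
  LOCKSIZE PAGE;

-- The key ranges live on the table definition.
CREATE TABLE PAYPROD.ACCOUNT (
  ACCT_ID      BIGINT         NOT NULL,
  ACCT_BALANCE DECIMAL(15, 2) NOT NULL,
  CREDIT_LIMIT DECIMAL(15, 2) NOT NULL
) IN DBPAY.TSACCT
  PARTITION BY RANGE (ACCT_ID)
   (PARTITION 1 ENDING AT (2500000),
    PARTITION 2 ENDING AT (5000000),
    PARTITION 3 ENDING AT (7500000),
    PARTITION 4 ENDING AT (MAXVALUE));

-- PBG: DB2 adds partitions as the table grows, up to the stated ceiling.
CREATE TABLESPACE TSAUDIT IN DBPAY
  MAXPARTITIONS 256    -- MAXPARTITIONS (not NUMPARTS) makes this PBG
  BUFFERPOOL BP8K0;
```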

Within a tablespace, each page holds a fixed header, one or more rows, and a slot directory at the end of the page that records the offset of each row. Rows are never split across pages. If a row is too large for the page size, the insert fails. This is a common design mistake when migrating from other databases to DB2: a row defined wider than the page can hold forces a move to a larger page size, or a redesign that pushes oversized columns into LOB tablespaces.

Indexes

DB2 indexes are B-tree structures stored in their own datasets, separate from the tablespace data. Each index has its own buffer pool assignment. An index leaf page contains key values and RIDs (Record IDs), which are pointers to the exact page and slot in the tablespace where the corresponding row lives. A multi-column index stores a concatenated key. DB2 supports unique and non-unique indexes, clustering indexes (which influence the physical row order in the tablespace), and partitioned indexes.
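
A hedged sketch of a clustering index; the names are hypothetical:

```sql
-- CLUSTER asks DB2 to keep tablespace rows in roughly this key order;
-- a separate buffer pool keeps hot index pages apart from data pages.
CREATE UNIQUE INDEX PAYPROD.XACCT01
  ON PAYPROD.ACCOUNT (ACCT_ID)
  CLUSTER
  BUFFERPOOL BP1;
```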


The Catalog and Directory

DB2 maintains two internal metadata databases that it uses continuously during operation.

The Catalog (`DSNDB06`) is the user-visible system catalog. It contains entries for every object in the DB2 subsystem: tables, columns, indexes, tablespaces, views, stored procedures, triggers, and authorizations. Applications query the catalog using SQL against tables like `SYSIBM.SYSTABLES` and `SYSIBM.SYSCOLUMNS`. DBAs use catalog queries to understand the database schema, find missing indexes, or audit security grants. The catalog is itself a set of DB2 tables, stored in their own tablespaces.
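
Two typical catalog queries; SYSIBM.SYSTABLES and SYSIBM.SYSCOLUMNS are the real catalog tables, while the names in the WHERE clauses are hypothetical:

```sql
-- Tables in one database, with the row count from the last RUNSTATS.
SELECT CREATOR, NAME, TSNAME, CARDF
  FROM SYSIBM.SYSTABLES
 WHERE DBNAME = 'DBPAY'
   AND TYPE   = 'T';          -- 'T' = base table

-- Column layout of one table, in definition order.
SELECT NAME, COLTYPE, LENGTH, NULLS
  FROM SYSIBM.SYSCOLUMNS
 WHERE TBCREATOR = 'PAYPROD'
   AND TBNAME    = 'ACCOUNT'
 ORDER BY COLNO;
```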

The Directory (`DSNDB01`) is DB2's internal metadata store. It is not accessible via SQL. It contains the compiled forms of all SQL statements (packages and plans), the database descriptor (DBD) for every database, and skeleton cursor tables that DB2 uses to execute SQL quickly. When DB2 restarts after a failure, it reads the Directory to reconstruct its internal state before opening user databases.

The BSDS (Bootstrap Data Set) is a VSAM KSDS that records the location and names of all DB2 active and archive log datasets. DB2 reads the BSDS on every startup. Without a valid BSDS, DB2 cannot start. For this reason the BSDS is maintained as a dual-copy dataset: DB2 writes every BSDS update to both copies simultaneously.


SQL Execution: Bind, Plan, and Package

DB2 does not compile SQL at runtime in the same way other database systems do. Static SQL in COBOL or PL/I programs goes through a two-step process: precompilation and bind.

During precompilation, the DB2 precompiler extracts all `EXEC SQL` statements from the source program and replaces them with host language calls. The extracted SQL is stored in a DBRM (Database Request Module). During the bind step, DB2 processes the DBRM: it parses the SQL, validates it against the catalog (checking that tables and columns exist and that the user has authority), runs the query optimizer to determine the best access path, and stores the compiled access plan in the Directory as a package.

At runtime, when the program issues the SQL call, DB2 retrieves the pre-compiled package from the Directory and executes it directly. There is no parsing or optimization at runtime. This is a significant performance advantage for CICS transactions that execute the same SQL thousands of times per second: the optimization cost is paid once at bind time, not on every execution.
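
The result of a bind is visible in the catalog. A hedged query against SYSIBM.SYSPACKAGE (the collection name is hypothetical):

```sql
-- Packages in one collection: the isolation level chosen at bind time,
-- and whether the package is still valid and operative after DDL changes.
SELECT COLLID, NAME, ISOLATION, VALID, OPERATIVE, BINDTIME
  FROM SYSIBM.SYSPACKAGE
 WHERE COLLID = 'PAYCOLL'
 ORDER BY BINDTIME DESC;
```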

Dynamic SQL (used in ad-hoc queries, stored procedures, and modern Java applications) is parsed and optimized at runtime. DB2 caches dynamic SQL in an in-memory structure called the EDM pool (Environment Descriptor Manager) to avoid re-optimizing the same statement repeatedly. The EDM pool also caches package skeletons loaded from the Directory, reducing Directory I/O during steady-state operation.

Figure 2: Bind-time optimization and runtime execution path through DB2’s buffer pools.

Buffer Pools

The buffer pool is DB2's in-memory page cache. Every page read from or written to a VSAM LDS dataset passes through a buffer pool first. DB2 maintains separate pools for each page size: `BP0` for 4 KB pages, `BP8K0` for 8 KB, `BP16K0` for 16 KB, and `BP32K` for 32 KB. Each tablespace and index is assigned to a specific buffer pool at creation time.

Buffer pool sizing is the single most impactful DB2 performance tuning lever. A page that is in the buffer pool costs a memory access; a page that is not costs a physical disk read plus a memory access. For a high-throughput CICS environment, the most frequently accessed tablespace pages should be permanently resident in the buffer pool. Modern z/OS LPARs with hundreds of gigabytes of memory can hold entire tablespaces in memory.
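
Buffer pool assignment is an object attribute and can be changed with ALTER; a hedged sketch that gives a hot index its own 4 KB pool (names hypothetical):

```sql
-- Move the clustering index to a dedicated pool so its pages are not
-- evicted by table scans churning through BP0.
ALTER INDEX PAYPROD.XACCT01 BUFFERPOOL BP2;
```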

DB2 uses a least recently used (LRU) page replacement algorithm by default. Pages that have not been referenced recently are eligible for replacement. For sequential scans (batch jobs reading an entire table), DB2 uses a most recently used (MRU) algorithm instead, to prevent sequential reads from evicting frequently accessed OLTP pages from the pool.

Updated pages in the buffer pool are dirty pages: they have been modified in memory but not yet written to disk. DB2 writes dirty pages to disk asynchronously in the background via a process called deferred write. At checkpoints, DB2 forces dirty pages to disk to bound the amount of log that would need to be replayed on recovery. The checkpoint interval is a key tuning parameter: shorter intervals reduce recovery time but increase checkpoint I/O overhead.


Locking

DB2's locking model is hierarchical. At the top of the hierarchy is the tablespace lock. Below it are table locks. Below those are page locks or row locks. DB2 always acquires an intent lock at the tablespace level before acquiring a lower-level lock. This allows the IRLM to detect conflicts between one transaction holding a row lock and another requesting a tablespace lock, without scanning every row lock in the system.

The lock granularity is set on the tablespace: `LOCKSIZE PAGE` (the default for most workloads), `LOCKSIZE ROW` (higher concurrency, more overhead), or `LOCKSIZE TABLESPACE` (maximum throughput, minimum concurrency). For most CICS OLTP workloads, page-level locking is the right balance. For heavily contended tables, row-level locking reduces lock wait time at the cost of more IRLM overhead.
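
Lock granularity is a tablespace attribute and can be altered after creation; a hedged sketch (tablespace name hypothetical):

```sql
-- Trade extra IRLM overhead for concurrency on a contended tablespace.
ALTER TABLESPACE DBPAY.TSACCT LOCKSIZE ROW;
```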

DB2 supports four isolation levels, set at bind time (and overridable per statement, as sketched after this list):

  • RR (Repeatable Read): holds all locks until commit. A query run twice within the same unit of work returns the same result. Maximum isolation, lowest concurrency.
  • RS (Read Stability): holds locks on rows returned, but not on rows scanned and rejected. Prevents updates to rows you have read, but allows new rows to appear.
  • CS (Cursor Stability): holds a lock only on the current row being processed. The default for most CICS programs. Provides reasonable concurrency with protection against reading uncommitted data.
  • UR (Uncommitted Read): no read locks at all. Can read uncommitted data from other transactions. Used for reporting queries where absolute consistency is not required.
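
The bind-time choice can be overridden on an individual statement with a WITH clause; a hedged sketch using the hypothetical table from earlier:

```sql
-- Reporting query: tolerate uncommitted data rather than take read locks.
SELECT COUNT(*), SUM(ACCT_BALANCE)
  FROM PAYPROD.ACCOUNT
  WITH UR;
```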

Lock escalation occurs when a transaction acquires too many page or row locks on a single tablespace. DB2 automatically promotes the individual locks to a single tablespace lock, releasing the lower-level locks. This reduces IRLM memory pressure but increases contention with other transactions. Frequent lock escalation is a signal that the `LOCKMAX` threshold should be reviewed or that the query is accessing far more data than intended.
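
The escalation threshold is the LOCKMAX tablespace attribute; a hedged sketch with hypothetical values:

```sql
-- Escalate to a tablespace lock after 5,000 page locks in one unit of work.
ALTER TABLESPACE DBPAY.TSACCT LOCKMAX 5000;
-- LOCKMAX 0 disables escalation; LOCKMAX SYSTEM takes the subsystem-wide
-- default from the NUMLKTS parameter.
```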


Logging and Recovery

DB2 writes a continuous log of all database changes. The log is a set of sequential VSAM datasets: active log datasets that are written in a round-robin fashion during normal operation, and archive log datasets that hold older log records offloaded from the active logs. The BSDS records which active and archive log datasets exist and their sequence numbers.

The log is the foundation of DB2 recovery. Every update to a page is logged before it is written to disk (write-ahead logging). If DB2 fails mid-transaction, the log contains the before-image of every changed page, and DB2 uses it to back out uncommitted changes during restart. If a tablespace is damaged, it can be recovered by restoring the most recent image copy (a DB2-level backup taken by the COPY utility) and replaying the log forward from the copy point.
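
Image copies are themselves recorded in the catalog, in SYSIBM.SYSCOPY, which is where the RECOVER utility finds its starting point. A hedged query (object names hypothetical):

```sql
-- Most recent full image copies of one tablespace; ICTYPE 'F' = full copy.
SELECT DBNAME, TSNAME, ICTYPE, TIMESTAMP, DSNAME
  FROM SYSIBM.SYSCOPY
 WHERE DBNAME = 'DBPAY'
   AND TSNAME = 'TSACCT'
   AND ICTYPE = 'F'
 ORDER BY TIMESTAMP DESC;
```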

DB2 participates in two-phase commit with CICS through the z/OS RRS (Resource Recovery Services) component. When a CICS transaction does DB2 work, CICS registers itself as a coordinator with RRS. On commit, CICS instructs RRS to drive DB2's prepare and commit phases. If DB2 prepares but the system fails before commit, DB2 holds the transaction in an in-doubt state. On recovery, RRS resolves the in-doubt transaction to committed or backed out based on the CICS log. This ensures that CICS and DB2 data are always consistent with each other.


Data Sharing

A single DB2 subsystem runs on one z/OS LPAR and is limited to that LPAR's CPU and memory. To scale beyond a single LPAR, DB2 uses data sharing: a group of DB2 members, each running on a separate LPAR in a Parallel Sysplex, that all access the same shared tablespace data on DASD simultaneously.

A Data Sharing Group (DSG) has one group name and up to 32 members. All members share the same DB2 catalog and directory. Each member has its own local buffer pools and processes SQL independently. The Coupling Facility provides three structures that make data sharing coherent:

  • LOCK1 (Lock Structure): a global lock table maintained in the CF by the IRLM. When a member acquires a lock on a row, the LOCK1 structure ensures that no other member can acquire a conflicting lock on the same row. LOCK1 requests are serviced in under 10 microseconds on current hardware.
  • GBP (Group Buffer Pool): a shared page cache in the CF. When a member updates a page, it registers the update in the GBP so other members know the page has been changed. Before a member reads a page from its local buffer pool, it checks whether another member has a more recent version in the GBP. This cross-member cache coherency protocol is what allows all members to see a consistent view of the data.
  • SCA (Shared Communications Area): a small list structure in the CF used for group-wide status information. Members use the SCA to communicate events like member joins, leaves, and restarts.

Figure 3: DB2 Data Sharing Group — two members coordinating through the Coupling Facility.

Each member in the DSG has its own log stream. In a data sharing environment, DB2 maintains read access to all members' logs. This is critical for recovery: if Member A fails, Member B can read Member A's log to resolve in-doubt transactions and recover uncommitted changes. The shared catalog and directory mean that DDL changes (creating a table, adding a column) are immediately visible to all members.


Request Lifecycle: A SQL Read from CICS

A CICS COBOL program issues an SQL SELECT to find a customer record. Here is the complete path through DB2.

The SQL call crosses from the CICS address space into DBM1 via the RRS attachment. DBM1 retrieves the pre-compiled package from the EDM pool (or loads it from the Directory if it is not cached). The package specifies the access path: read a specific index to find the row RID, then fetch the page from the tablespace.

DBM1 requests an S (share) lock on the target page from the IRLM. The IRLM checks the local lock table and the LOCK1 CF structure (in a data sharing environment) to confirm no conflicting lock is held. The lock is granted.

DBM1 checks the buffer pool for the required page. If the page is present and the GBP directory indicates no other member has a more recent version, DBM1 returns the row from the local buffer pool. No disk I/O occurs. If the page is not in the buffer pool, or the GBP indicates a newer version exists on another member, DBM1 issues a physical read from the VSAM LDS dataset (or fetches the updated page from the GBP), loads it into the buffer pool, and returns the row.

The row data is returned to the COBOL program in its host variable. At COMMIT, the IRLM releases the page lock. RRS notifies CICS that the DB2 unit of work is complete. Total elapsed time for a buffered read: typically 0.5 to 2 milliseconds.
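
The kind of singleton SELECT this lifecycle describes, written as the embedded SQL a COBOL program would issue; the table, columns, and host variables are hypothetical:

```sql
-- :WS-... are COBOL host variables; WITH CS matches the usual CICS bind option.
SELECT ACCT_BALANCE, CREDIT_LIMIT
  INTO :WS-BALANCE, :WS-LIMIT
  FROM PAYPROD.ACCOUNT
 WHERE ACCT_ID = :WS-ACCT-ID
  WITH CS;
```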


Failure Modes

The most operationally common DB2 failure is lock timeout. A transaction waiting for a lock held by another transaction will eventually time out and receive a -911 or -913 SQL return code. In CICS programs, this typically surfaces as a task abend. The root cause is almost always a long-running transaction holding a lock longer than expected, often a batch job that was not designed with OLTP concurrency in mind.

Deadlock occurs when two transactions each hold a lock the other needs. The IRLM detects the cycle and rolls back the transaction that has done less work. DB2 returns a -911 with reason code 00C90088. Deadlocks indicate a design problem: either the lock granularity is too coarse, or two programs are accessing the same set of rows in different orders.

A tablespace failure (corrupt page, I/O error on the VSAM dataset) requires point-in-time recovery. The DBA restores the most recent image copy using the RECOVER utility and DB2 replays the log forward. Recovery time depends on the age of the last image copy and the volume of log to replay. Regular COPY utility runs and MODIFY RECOVERY housekeeping are production operational requirements.

In a data sharing environment, a member failure leaves retained locks protecting the data that member was updating. The failed member is restarted, often in restart-light mode on another LPAR, so that DB2 can read its log, resolve its in-doubt transactions, and release the retained locks. The speed of this recovery depends on how active the failed member was and how far back in its log DB2 must read.


Summary

DB2 for z/OS is a multi-address-space relational database built for the demands of high-volume OLTP on a Parallel Sysplex. Its storage model, tablespaces built on VSAM LDS with separate buffer pools per page size, gives DBAs precise control over memory utilization and I/O. Its bind-and-package model separates SQL optimization from runtime execution, delivering consistent performance at CICS transaction volumes. Its hierarchical locking model balances concurrency and data integrity across thousands of concurrent transactions.

Data sharing extends the architecture horizontally across the sysplex. The Coupling Facility's LOCK1, GBP, and SCA structures provide the global lock management, page coherency, and status sharing that allow multiple DB2 members to treat a shared disk as a single coherent database. This is what allows a bank to scale its transaction processing across multiple mainframe LPARs while maintaining the ACID guarantees that financial workloads require.

The next post in this series covers JES and batch processing: how large-scale overnight jobs are submitted, scheduled, and executed on z/OS.

Part of the Mainframe Decoded series — IBM Z and z/OS, clearly explained for engineers.


