Architecture: VSAM

Every time a CICS transaction reads an account record, there is a file somewhere holding that record. It is not a relational table. It is not a flat file. It is a VSAM dataset: a structured, indexed file that CICS reads in under 10 milliseconds, keyed precisely on the account number, without scanning from the beginning.

VSAM is the file system that mainframe application programs live on. Most CICS file definitions point to VSAM datasets. A large share of the batch programs that read customer records, process transactions, or produce reports are reading VSAM. DB2 uses VSAM linear datasets for its tablespace storage. The core of the z/OS catalog is itself a VSAM dataset. Understanding VSAM is not optional if you want to understand how mainframe applications store and retrieve data.

This post assumes you have read Architecture: Mainframe. It covers the four VSAM dataset types, the internal storage structure that makes them fast, the catalog system that locates them, and how CICS and batch programs use them.

The Big Picture

VSAM (Virtual Storage Access Method) is both a file format and an access method. As a file format, it defines how records are physically laid out on disk storage (DASD, Direct Access Storage Device). As an access method, it is the software layer that manages reading, writing, indexing, and buffering those records. A program never reads a VSAM file directly from disk: it calls VSAM, VSAM manages the buffers and I/O, and the program receives the logical record it asked for.

VSAM was introduced in the early 1970s and has evolved continuously. The current implementation is DFSMS VSAM (Data Facility Storage Management Subsystem), integrated into z/OS. The core concepts and the CI/CA storage model have remained stable for roughly 50 years; of the four dataset types, the linear data set was a later addition.

The Four Dataset Types

VSAM has four distinct dataset organizations. Each one is suited to a different access pattern. Choosing the wrong type for a use case is a design error that cannot easily be corrected after the dataset is populated.

KSDS: Key Sequenced Data Set

The KSDS is the most commonly used VSAM type and the one that underpins virtually every CICS application. Records are kept in key sequence: within each CI (the unit of storage described below) records are stored in ascending order of a defined key field, and the index links the CIs together in key order. Every record has a unique primary key at a fixed offset in the record. A KSDS has two components: a data component holding the actual records, and an index component holding the B-tree-like index structure that maps key values to the CIs in the data component that contain them.

A KSDS supports three access modes:

Keyed direct: provide a key value, VSAM traverses the index and returns the matching record. This is the most common mode in CICS transactions.
Keyed sequential: start at a given key and read records in key order. Used in batch programs that process a range of records.
Skip sequential: read sequentially, but jump forward to a new key position without reading the records in between. Used in batch programs that process a sparse subset of records in key order.
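The three access modes can be illustrated with a sorted list of keys standing in for the index. This is a hypothetical in-memory model: the class `KsdsSketch` and its methods are invented for illustration, and it shows only the logical behaviour of each mode, not how VSAM stores, indexes, or buffers anything.

```python
import bisect


class KsdsSketch:
    """Toy model of a KSDS's logical key order; not VSAM's on-disk format."""

    def __init__(self, records):
        # records: mapping of unique primary key -> record data
        self._keys = sorted(records)
        self._data = dict(records)

    def read_direct(self, key):
        """Keyed direct: the record with exactly this key, or None if absent."""
        return self._data.get(key)

    def read_sequential(self, start_key):
        """Keyed sequential: (key, record) pairs from the first key >= start_key."""
        i = bisect.bisect_left(self._keys, start_key)
        for k in self._keys[i:]:
            yield k, self._data[k]

    def read_skip_sequential(self, wanted_keys):
        """Skip sequential: visit ascending wanted_keys, jumping past the gaps."""
        pos = 0
        for key in wanted_keys:
            # Search only from the current position onward: never move backwards
            pos = bisect.bisect_left(self._keys, key, pos)
            if pos < len(self._keys) and self._keys[pos] == key:
                yield key, self._data[key]


ksds = KsdsSketch({"0000000003": "C", "0000000001": "A", "0000000002": "B", "0000000007": "G"})
print(ksds.read_direct("0000000002"))                               # B
print(list(ksds.read_sequential("0000000002")))
print(list(ksds.read_skip_sequential(["0000000001", "0000000007"])))
```

In a CICS program, a direct read of a missing key would raise the `NOTFND` condition rather than return `None`; the sketch simplifies that away.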

Records in a KSDS can be inserted at any key position, updated in place, or deleted. When a record is inserted between existing records, VSAM preserves key order internally: it uses free space reserved in each CI and, when that runs out, its CI and CA split mechanisms, both discussed below.

ESDS: Entry Sequenced Data Set

An ESDS stores records in the order they were written. There is no key field and no index. Each record receives an RBA (Relative Byte Address): a byte offset from the beginning of the dataset that uniquely identifies its position. Once written, a record's RBA is permanent.

An ESDS supports sequential access (read from beginning to end) and direct access by RBA. It does not support deletion. A record can be rewritten in place if the replacement is the same length, but new records can only be appended at the end. This makes ESDS a natural fit for logs and audit trails: application audit files, the SMF recording datasets (SYS1.MANx), and similar append-only workloads.
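The ESDS rules (append-only, permanent RBAs, same-length rewrites, no delete) fit in a few lines. This is a minimal sketch: the class `EsdsSketch` is hypothetical, and it assumes records are laid end to end with no CI boundaries or control fields, so its RBAs are simpler than those in a real ESDS.

```python
class EsdsSketch:
    """Toy ESDS: append-only records addressed by RBA (byte offset from the start).

    Assumption: no CIs or control fields, so each RBA is just the running total
    of the lengths of the records written before it.
    """

    def __init__(self):
        self._records = {}   # RBA -> record bytes
        self._next_rba = 0   # where the next appended record will start

    def append(self, record: bytes) -> int:
        rba = self._next_rba
        self._records[rba] = record
        self._next_rba += len(record)
        return rba           # permanent address of this record

    def read_by_rba(self, rba: int) -> bytes:
        return self._records[rba]

    def rewrite(self, rba: int, record: bytes) -> None:
        # In-place update is allowed only if the length does not change
        if len(record) != len(self._records[rba]):
            raise ValueError("ESDS rewrite must keep the record length unchanged")
        self._records[rba] = record

    def read_sequential(self):
        # Entry order is the same as RBA order
        for rba in sorted(self._records):
            yield rba, self._records[rba]

    # Deliberately no delete(): an ESDS does not support erasing records.


log = EsdsSketch()
first = log.append(b"LOGON USER1 ")
second = log.append(b"LOGOFF USER1")
print(first, second)   # 0 12
```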

ESDS datasets can have alternate indexes built on them, giving them key-based access even though the primary organization is sequential. The VVDS (VSAM Volume Data Set) used by the z/OS Integrated Catalog Facility is itself an ESDS.

RRDS: Relative Record Data Set

An RRDS stores records in fixed-size numbered slots. Each slot has an RRN (Relative Record Number) starting at 1. Records are accessed directly by their RRN. Slots can be empty (no record present) or occupied. Records can be inserted into any empty slot and deleted from any occupied slot.

In the classic fixed-length RRDS, all records must be the same length (z/OS also offers a variable-length RRDS variant). This constraint makes it unsuitable for most general-purpose business data, but it is efficient when the application can derive the record number from the data itself: a day-of-year table, a fixed-size lookup table, or a slot-based queue.
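The day-of-year case shows why an RRDS needs no index: the slot number is computed from the data. This is a hypothetical sketch; `RrdsSketch`, `day_of_year_rrn`, and the 20-byte record length are invented for illustration, and only the fixed-length RRDS rules described above are modelled.

```python
import datetime


class RrdsSketch:
    """Toy fixed-length RRDS: numbered slots from RRN 1, each empty (None) or occupied."""

    def __init__(self, slots, record_length):
        self.record_length = record_length
        self._slots = [None] * slots          # list index 0 holds RRN 1

    def _index(self, rrn):
        if not 1 <= rrn <= len(self._slots):
            raise IndexError(f"RRN {rrn} is outside the dataset")
        return rrn - 1

    def write(self, rrn, record):
        if len(record) != self.record_length:
            raise ValueError("every record in a fixed-length RRDS has the same length")
        i = self._index(rrn)
        if self._slots[i] is not None:
            raise KeyError(f"slot {rrn} is already occupied")
        self._slots[i] = record

    def read(self, rrn):
        return self._slots[self._index(rrn)]  # None means the slot is empty

    def delete(self, rrn):
        self._slots[self._index(rrn)] = None


def day_of_year_rrn(day):
    """Derive the RRN from the data itself: 1 Jan -> 1, 31 Dec -> 365 or 366."""
    return day.timetuple().tm_yday


calendar = RrdsSketch(slots=366, record_length=20)   # hypothetical 20-byte records
rrn = day_of_year_rrn(datetime.date(2024, 2, 1))
calendar.write(rrn, b"HOLIDAY=N".ljust(20))
print(rrn, calendar.read(rrn))
```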

LDS: Linear Data Set

An LDS contains no record structure at all. It is a byte-addressable stream of fixed-size pages (4 KB in the common case), accessed through system services such as z/OS data-in-virtual (DIV) rather than through record-level VSAM requests. There are no RDFs or CIDFs (the control fields described in the next section) in an LDS. DB2 uses LDS for its tablespace storage: DB2 imposes its own page and row structure on top of the raw byte stream. Application programs do not typically use LDS directly.

Internal Storage: Control Intervals and Control Areas

VSAM does not read individual records from disk. It works in units called Control Intervals (CIs). A CI is a contiguous block of disk storage whose size is fixed for each dataset component, anywhere from 512 bytes to 32 KB (4 KB is a common choice), and it is the unit VSAM uses for I/O. When a program requests a record, VSAM reads the entire CI containing that record into a memory buffer. If a second request hits a record in the same CI that is already buffered, no disk I/O occurs. This buffering model is central to VSAM's performance.
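The effect of CI-level buffering can be counted with a small simulation. This is a hypothetical sketch: `CiBufferPool` is invented for illustration, it assumes fixed-length records packed contiguously with no free space or control fields, and it uses plain LRU replacement, which is not how VSAM or CICS buffer management is actually specified.

```python
from collections import OrderedDict


class CiBufferPool:
    """Toy LRU buffer pool that reads whole CIs and counts the disk reads needed."""

    def __init__(self, ci_size, record_length, buffers):
        self.records_per_ci = ci_size // record_length
        self.buffers = buffers
        self._pool = OrderedDict()   # CI number -> placeholder for the CI contents
        self.disk_reads = 0

    def get_record(self, record_number):
        """Return the CI number holding record_number (0-based), reading it if needed."""
        ci = record_number // self.records_per_ci
        if ci in self._pool:
            self._pool.move_to_end(ci)           # buffer hit: no I/O
        else:
            self.disk_reads += 1                 # buffer miss: read the whole CI
            self._pool[ci] = object()
            if len(self._pool) > self.buffers:
                self._pool.popitem(last=False)   # evict the least recently used CI
        return ci


pool = CiBufferPool(ci_size=4096, record_length=200, buffers=2)
for r in range(40):          # records 0-39 live in CIs 0 and 1 (20 records per CI)
    pool.get_record(r)
print(pool.disk_reads)       # 2
pool.get_record(5)           # CI 0 is still buffered
print(pool.disk_reads)       # 2
```

Forty record requests cost only two physical reads here, which is the point of reading a whole CI at a time.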

Inside a CI, records are packed from the left. At the right end of the CI sit two types of control fields: RDFs (Record Definition Fields, 3 bytes each) that describe record lengths, and a single CIDF (Control Interval Definition Field, 4 bytes) that records the offset and length of the free space in the CI. There is usually one RDF per record, but a run of consecutive records of the same length can share a pair of RDFs (one giving the length, the other the count). The free space sits between the records on the left and the control fields on the right.
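The layout can be made concrete by building the bytes of one CI. This is a simplified sketch, not VSAM's exact format: `build_ci` and `read_free_space` are hypothetical helpers, every record gets its own RDF (no paired RDFs), and the RDF control byte is left as zero instead of carrying VSAM's real flag bits. Fields are packed big-endian, as on IBM Z.

```python
import struct

CIDF_LEN = 4   # 2-byte free-space offset + 2-byte free-space length
RDF_LEN = 3    # 1-byte control field + 2-byte record length


def build_ci(records, ci_size=4096):
    """Lay out records in one CI: data from the left, RDFs and CIDF at the right."""
    data = b"".join(records)
    control_len = RDF_LEN * len(records) + CIDF_LEN
    free_len = ci_size - len(data) - control_len
    if free_len < 0:
        raise ValueError("records do not fit in this CI: a CI split would be needed")

    # RDFs run right to left: the first record's RDF sits just left of the CIDF
    rdfs = b"".join(struct.pack(">BH", 0, len(r)) for r in reversed(records))
    cidf = struct.pack(">HH", len(data), free_len)   # free space starts after the data

    ci = data + b"\x00" * free_len + rdfs + cidf
    assert len(ci) == ci_size
    return ci


def read_free_space(ci):
    """Decode the CIDF: (offset of free space, length of free space)."""
    return struct.unpack(">HH", ci[-CIDF_LEN:])


ci = build_ci([b"A" * 100, b"B" * 50])
print(len(ci), read_free_space(ci))   # 4096 (150, 3936)
```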

Multiple CIs grouped together form a Control Area (CA). A CA is typically one cylinder of disk space, the natural unit of disk allocation on DASD. A VSAM dataset is composed of one or more CAs. Space is always allocated and extended in CA-sized chunks.

The KSDS Index Structure

The KSDS index is a B-tree-like structure in two parts: the sequence set and the index set.

The sequence set sits at the bottom of the index. There is one sequence set entry per CI in the data component, and the entries for all the CIs in one CA are grouped into a single sequence set record. Each entry records the highest key present in that CI and the CI's physical location on disk. To find a record by key, VSAM looks for the first CI, in key order, whose highest key is greater than or equal to the requested key, then reads that CI.

The index set sits above the sequence set. For large datasets with many sequence set entries, searching the sequence set sequentially would be slow. The index set provides a higher-level index over the sequence set, with one entry per sequence set record, that is, one per CA. For very large datasets, the index set grows additional levels, so the number of index levels, and with it the cost of a key lookup, grows only logarithmically (O(log n)) with dataset size.

When VSAM reads a record directly by key, it traverses from the top of the index set down to the sequence set, identifies the correct CI, reads that CI into a buffer, and locates the record within it. For a well-tuned VSAM file with adequate buffer space, the index levels are often cached in memory, reducing the lookup to a single physical disk read for the data CI itself.
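The two-step descent can be sketched with Python's `bisect` module standing in for the search at each index level. This is a hypothetical model: `find_ci` and the sample keys are invented, it has a single index-set level, and real VSAM index records store compressed keys and disk addresses rather than plain lists.

```python
import bisect


def find_ci(index_set, sequence_set, key):
    """Locate the data CI that must hold `key`, if the key exists at all.

    index_set:    ascending highest key of each CA (one index-set level only)
    sequence_set: per CA, the ascending highest key of each CI in that CA
    Returns (ca_number, ci_number_within_ca), or None if key exceeds every high key.
    """
    ca = bisect.bisect_left(index_set, key)          # first CA whose high key >= key
    if ca == len(index_set):
        return None
    ci = bisect.bisect_left(sequence_set[ca], key)   # first CI whose high key >= key
    return ca, ci


# Hypothetical data component: two CAs, each with two CIs of sorted keys
data_cas = [
    [["A01", "A05"], ["B02", "B07"]],
    [["C01", "C09"], ["D03", "D04", "D08"]],
]
sequence_set = [[ci[-1] for ci in ca] for ca in data_cas]   # [["A05", "B07"], ["C09", "D08"]]
index_set = [highs[-1] for highs in sequence_set]           # ["B07", "D08"]

ca, ci = find_ci(index_set, sequence_set, "D04")
print(ca, ci, "D04" in data_cas[ca][ci])   # 1 1 True
```

Only the final step touches a data CI; if the index levels are already buffered, that is the single physical read the paragraph above describes.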

Free Space and CI/CA Splits

When a KSDS is defined with IDCAMS, the administrator specifies two FREESPACE percentages: for example, FREESPACE(20 10) leaves 20% of each CI empty and leaves 10% of the CIs in each CA completely empty when records are loaded. This reserved space is the cushion that allows new records to be inserted in key order without immediately disrupting the physical layout.
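The effect of FREESPACE(20 10) on a load can be worked through numerically. This is a simplified model under stated assumptions: a 4 KB data CI, 200-byte fixed-length records sharing one pair of RDFs per CI, and a one-cylinder CA on a 3390 holding 15 tracks of 12 CIs each. The function `load_layout` is hypothetical, and VSAM's exact rounding may differ.

```python
# Assumptions for this sketch (not taken from any real dataset definition)
CI_SIZE = 4096
CIDF_LEN = 4
RDF_PAIR_LEN = 6          # one pair of 3-byte RDFs for a run of equal-length records
RECORD_LENGTH = 200
CIS_PER_CA = 15 * 12      # 15 tracks per cylinder x 12 4 KB CIs per track


def load_layout(ci_free_pct, ca_free_pct):
    """Return (records per CI, CIs loaded per CA, records per CA) at initial load."""
    reserved_bytes = CI_SIZE * ci_free_pct // 100            # bytes left free in each CI
    usable = CI_SIZE - CIDF_LEN - RDF_PAIR_LEN - reserved_bytes
    records_per_ci = usable // RECORD_LENGTH
    empty_cis = CIS_PER_CA * ca_free_pct // 100              # whole CIs left empty per CA
    loaded_cis = CIS_PER_CA - empty_cis
    return records_per_ci, loaded_cis, records_per_ci * loaded_cis


print(load_layout(20, 10))   # (16, 162, 2592)
print(load_layout(0, 0))     # (20, 180, 3600)
```

Under these assumptions the free space costs about 28% of the initial capacity per CA, which is the price paid for absorbing inserts without splits.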

When a new record is inserted into a CI that still has enough free space, VSAM shifts the records with higher keys to the right to make room and writes the new record in its correct key position. No I/O beyond the single CI is required.

When a CI is full and a new record must be inserted into it, a CI split occurs. VSAM moves approximately half the records from the full CI to a free CI within the same CA, updates the sequence set to include the new CI, and writes the new record into the appropriate half. A CI split requires several I/O operations and temporarily degrades performance. It also fragments the dataset over time: the CIs are no longer physically in key order, so sequential reads become less efficient.
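A CI split inside one CA can be simulated to show how physical and logical order diverge. This is a hypothetical sketch: `CaSketch` and `CaFullError` are invented, capacity is counted in records rather than bytes, and the split always moves the upper half of the full CI to the first empty CI; real VSAM works in bytes and can choose a different split point.

```python
import bisect


class CaFullError(Exception):
    """No free CI left in this CA: a real KSDS would now perform a CA split."""


class CaSketch:
    """Toy control area holding up to `capacity` keys per CI (records, not bytes)."""

    def __init__(self, num_cis, capacity):
        if capacity < 2:
            raise ValueError("a CI must hold at least two records to be split")
        self.capacity = capacity
        self.cis = [[] for _ in range(num_cis)]   # physical CIs, each a sorted key list
        self.order = [0]                          # CIs in logical key order (sequence set)
        self.free = list(range(1, num_cis))       # physical CIs not yet in use
        self.ci_splits = 0

    def _target(self, key):
        # First CI in key order whose highest key >= key, else the last CI in use
        for pos, ci in enumerate(self.order):
            if self.cis[ci] and self.cis[ci][-1] >= key:
                return pos
        return len(self.order) - 1

    def insert(self, key):
        pos = self._target(key)
        ci = self.order[pos]
        if len(self.cis[ci]) >= self.capacity:          # CI is full: split it
            if not self.free:
                raise CaFullError(key)
            new_ci = self.free.pop(0)
            half = len(self.cis[ci]) // 2
            self.cis[new_ci] = self.cis[ci][half:]      # upper half moves to a free CI
            self.cis[ci] = self.cis[ci][:half]
            self.order.insert(pos + 1, new_ci)          # sequence set keeps key order
            self.ci_splits += 1
            if key > self.cis[ci][-1]:
                ci = new_ci
        bisect.insort(self.cis[ci], key)                # room available: shift and insert

    def keys_in_order(self):
        return [k for ci in self.order for k in self.cis[ci]]


ca = CaSketch(num_cis=3, capacity=4)
for key in [10, 20, 30, 40, 25]:        # the fifth insert hits a full CI
    ca.insert(key)
print(ca.ci_splits, ca.order, ca.cis)   # 1 [0, 1] [[10, 20], [25, 30, 40], []]
```

Once every CI in the sketch is in use, the next split raises `CaFullError`, the point at which a real KSDS would perform the CA split described next.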

When a CA has no free CIs available and a CI split is needed, a CA split occurs. VSAM obtains a new, empty CA (taking a secondary extent if the current allocation is exhausted), moves about half the CIs from the full CA into it, and updates the index. A CA split is considerably more expensive than a CI split and may require extending the dataset on disk. Frequent CA splits indicate that the FREESPACE percentages, particularly the CA percentage, are too low for the insert workload.

A heavily split dataset performs poorly on sequential reads because records that should be physically adjacent are scattered across multiple CAs. The remedy is a reorganization: unload the data with `IDCAMS REPRO`, delete and redefine the cluster with appropriate sizes, and reload the records with `REPRO`. This restores physical key ordering and re-establishes the defined free space. VSAM reorganization is a routine operational task on production mainframes.

The ICF Catalog

A VSAM dataset cannot be accessed without being registered in a catalog. Unlike non-VSAM datasets, which JCL can locate by volume serial and dataset name alone, VSAM datasets are always opened through their catalog entry. Every VSAM dataset must have one.

The catalog system is called the ICF (Integrated Catalog Facility). Each z/OS system has one Master Catalog and any number of User Catalogs. The Master Catalog is chosen at IPL (system startup) and is itself a KSDS. User Catalogs take responsibility for subsets of the dataset namespace: alias entries in the Master Catalog direct each high-level qualifier (such as `PROD`) to the User Catalog that holds its datasets. A production system might have one User Catalog per application or per business domain, simplifying backup and recovery.

On every DASD volume that contains VSAM datasets, there is a VVDS (VSAM Volume Data Set), which is itself an ESDS. The VVDS records the VSAM datasets present on that volume. The ICF catalog points to the VVDS, and the VVDS records the physical location of each dataset's components. When a program opens a VSAM dataset, z/OS locates the catalog entry, follows it to the VVDS, and from there to the actual data and index components on disk.
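The lookup chain from dataset name to disk extents can be traced with plain dictionaries. Everything in this sketch is hypothetical: the catalog names, volume serial, and extents are invented, aliases are assumed to be single-level high-level qualifiers, and real catalog records carry far more information.

```python
# Hypothetical catalog contents, for illustration only
MASTER_CATALOG_ALIASES = {"PROD": "CATALOG.PROD.USERCAT"}    # HLQ -> user catalog
USER_CATALOGS = {
    "CATALOG.PROD.USERCAT": {"PROD.ACCOUNTS.KSDS.DATA": "VOL001"},   # component -> volser
}
VVDS_BY_VOLUME = {
    "VOL001": {"PROD.ACCOUNTS.KSDS.DATA": [(1200, 1209)]},   # component -> cylinder extents
}


def locate(component_name):
    """Follow master catalog alias -> user catalog -> VVDS on the volume."""
    hlq = component_name.split(".")[0]
    user_catalog = MASTER_CATALOG_ALIASES[hlq]             # 1. alias in the master catalog
    volser = USER_CATALOGS[user_catalog][component_name]   # 2. catalog entry gives the volume
    extents = VVDS_BY_VOLUME[volser][component_name]       # 3. VVDS on that volume gives extents
    return user_catalog, volser, extents


print(locate("PROD.ACCOUNTS.KSDS.DATA"))
```

A damaged entry at any of the three steps makes the data unreachable even though the extents on disk are untouched.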

VSAM datasets are defined and managed using the IDCAMS utility (Access Method Services). The key IDCAMS commands are:

DEFINE CLUSTER: creates a new VSAM dataset, specifying type (KSDS, ESDS, RRDS, LDS), record size, key field, free space, and allocation.
REPRO: copies records from one dataset to another. Used for backup, reorganization, and migration.
LISTCAT: displays catalog information about a dataset, including statistics on CI and CA splits.
DELETE: removes a dataset and its catalog entry.
ALTER: modifies certain dataset attributes, such as share options or buffer sizes.

```jcl
//DEFKSDS  JOB ...
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER                     -
    (NAME(PROD.ACCOUNTS.KSDS)        -
     INDEXED                         -
     RECORDSIZE(200 200)             -
     KEYS(10 0)                      -
     FREESPACE(20 10)                -
     CYLINDERS(10 5)                 -
     SHAREOPTIONS(2 3))              -
  DATA                               -
    (NAME(PROD.ACCOUNTS.KSDS.DATA))  -
  INDEX                              -
    (NAME(PROD.ACCOUNTS.KSDS.INDEX))
/*
```

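This defines a KSDS (`INDEXED`) with 200-byte records (average and maximum both 200, so effectively fixed-length), a 10-byte key starting at offset 0 (the first byte of the record), 20% CI and 10% CA free space, and an initial allocation of 10 cylinders with 5-cylinder secondary extents. SHAREOPTIONS(2 3) lets any number of users in the system read the dataset while one opens it for update, and allows full sharing across systems with integrity left to the users. The data and index components are named explicitly, which is good practice for operational clarity.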
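To see what `KEYS(10 0)` and `RECORDSIZE(200 200)` mean for an individual record, here is a small sketch. The function `primary_key` and the record layout (account number followed by surname) are hypothetical, and ASCII is used for readability where real mainframe data would normally be EBCDIC.

```python
KEY_LENGTH, KEY_OFFSET = 10, 0   # KEYS(10 0): 10-byte key at offset 0
MAX_RECORD = 200                 # maximum value from RECORDSIZE(200 200)


def primary_key(record: bytes) -> bytes:
    """Return the bytes that VSAM would use as this record's primary key."""
    if len(record) > MAX_RECORD:
        raise ValueError("record is longer than the maximum RECORDSIZE")
    if len(record) < KEY_OFFSET + KEY_LENGTH:
        raise ValueError("record is too short to contain the full key")
    return record[KEY_OFFSET:KEY_OFFSET + KEY_LENGTH]


# Hypothetical layout: account number, then surname, padded to 200 bytes
record = (b"0000012345" + b"JONES").ljust(MAX_RECORD)
print(primary_key(record))   # b'0000012345'
```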

VSAM and CICS

CICS accesses VSAM datasets through its own File Control component. A CICS file definition in the CSD maps a logical file name (up to 8 characters, e.g. `ACCOUNTS`) to a physical VSAM cluster name (`PROD.ACCOUNTS.KSDS`). The program uses only the logical name; CICS resolves it to the physical dataset at runtime.

Once CICS opens a VSAM file (at region startup or on first reference), it normally keeps it open until the region shuts down or an operator closes it; it does not open and close files per transaction. This persistent open allows CICS to maintain shared VSAM buffer pools (local shared resources, or LSR, pools) across all transactions, dramatically reducing the number of physical disk reads. A frequently accessed account file may keep its most popular CIs permanently resident in the buffer pool.

CICS also integrates VSAM updates into its transaction logging. For a file defined as recoverable, CICS writes a before-image of the record to its system log before applying the update. If the transaction abends or the region fails, CICS uses the log to back out the change and restore the record to its pre-transaction state. Because the logged images are whole records, CICS VSAM recovery works at the record level, not the CI level.
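The before-image idea can be sketched in a few lines. This is a hypothetical model: `RecoverableFileSketch` is invented, a Python list stands in for the CICS system log, and record locking (which in CICS stops other tasks from updating a changed record until syncpoint) is not modelled.

```python
class RecoverableFileSketch:
    """Toy record-level backout for a file defined as recoverable."""

    def __init__(self, records):
        self.records = dict(records)
        self.log = []   # (task_id, key, before_image) in the order written

    def update(self, task_id, key, new_record):
        # Log the before-image first (None = record did not exist), then update
        self.log.append((task_id, key, self.records.get(key)))
        self.records[key] = new_record

    def _forget(self, task_id):
        self.log = [entry for entry in self.log if entry[0] != task_id]

    def syncpoint(self, task_id):
        # Changes are committed: this task's before-images are no longer needed
        self._forget(task_id)

    def backout(self, task_id):
        # Undo this task's changes newest-first, restoring each before-image
        for tid, key, before in reversed(self.log):
            if tid != task_id:
                continue
            if before is None:
                del self.records[key]        # record was added by this task
            else:
                self.records[key] = before
        self._forget(task_id)


f = RecoverableFileSketch({"ACCT0001": "BAL=100"})
f.update("T1", "ACCT0001", "BAL=40")
f.update("T1", "ACCT0002", "BAL=60")   # a new record
f.backout("T1")                          # for example, the task abended
print(f.records)                         # {'ACCT0001': 'BAL=100'}
```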

VSAM Record Level Sharing (RLS) is a later addition that allows multiple CICS regions (or other programs) to access the same VSAM dataset concurrently with record-level locking, coordinated through the Coupling Facility by the SMSVSAM server. RLS eliminates the need for a File-Owning Region in many modern CICSPlex deployments, simplifying the architecture while maintaining data integrity across regions.

Alternate Indexes

A KSDS has one primary key. But real applications often need to access the same records by different fields: find an account by account number (primary key) or by customer surname (alternate key). VSAM supports this through alternate indexes (AIX).

An alternate index is itself a KSDS cluster. Its records contain the alternate key value and a pointer to the corresponding base cluster record: the record's primary key when the base is a KSDS, or its RBA when the base is an ESDS. A path is defined that connects the AIX to the base cluster; programs access the base cluster via the path using the alternate key as if it were the primary key.

Alternate keys do not have to be unique. A non-unique AIX entry holds a list of all primary keys that share that alternate key value. Searching for all accounts with surname "Smith" returns a list of primary keys, each of which VSAM then retrieves from the base cluster. Building an AIX on a large dataset (with IDCAMS BLDINDEX) is an expensive batch operation, and the AIX must then be kept in sync with the base cluster: an AIX defined with the UPGRADE attribute is updated automatically by VSAM whenever the base cluster is changed through VSAM, at the cost of extra I/O on every insert, update, and delete.
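A non-unique alternate index and path read can be modelled with a dictionary of lists. This is a hypothetical sketch: `build_aix` is only a rough analogue of what BLDINDEX produces, `read_via_path` stands in for reading through a path, the sample records are invented, and this sketch keeps each pointer list in primary-key order.

```python
from collections import defaultdict


def build_aix(base, alt_key_of):
    """Non-unique AIX: alternate key -> list of primary keys (in primary-key order here)."""
    aix = defaultdict(list)
    for pk in sorted(base):
        aix[alt_key_of(base[pk])].append(pk)
    return dict(aix)


def read_via_path(base, aix, alt_key):
    """Path access: look up the alternate key, then fetch each base record by primary key."""
    return [base[pk] for pk in aix.get(alt_key, [])]


base = {
    "0000000003": {"surname": "SMITH", "balance": 30},
    "0000000001": {"surname": "SMITH", "balance": 10},
    "0000000002": {"surname": "JONES", "balance": 20},
}
aix = build_aix(base, lambda rec: rec["surname"])
print(aix["SMITH"])                                    # ['0000000001', '0000000003']
print([r["balance"] for r in read_via_path(base, aix, "SMITH")])   # [10, 30]
```

Each alternate-key read costs the AIX lookup plus one base-cluster read per matching primary key, which is why heavy AIX use adds I/O.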

Failure Modes

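The most common VSAM operational problem is dataset full: all primary and secondary allocations are exhausted and no further inserts are possible. CICS starts returning the `NOSPACE` condition to applications. The fix is to give the dataset more room, for example by adding candidate volumes with `IDCAMS ALTER ... ADDVOLUMES` or by redefining it with a larger allocation and reloading it; if the dataset is badly fragmented, that reorganization also recovers space lost to splits.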

Catalog corruption is a more serious failure. If the ICF catalog that points to a production dataset is damaged, the dataset becomes inaccessible even though the data itself is intact on disk. Catalog backup and recovery procedures are a critical part of mainframe operations.

Index corruption in a KSDS causes key lookups to fail or return incorrect records. The usual remedy is to unload the records with `REPRO`, redefine the cluster with `IDCAMS DEFINE`, and reload it, which builds a fresh index without losing the data; this requires taking the dataset offline. Early detection is better: the IDCAMS `EXAMINE` command, with its `INDEXTEST` and `DATATEST` options, checks the structural integrity of a KSDS and can be run routinely by operators.

Excessive CI and CA splits degrade performance gradually rather than causing hard failures. The LISTCAT command reports split statistics; a high split count relative to the total record count is a signal that reorganization is due.
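A site might turn those LISTCAT statistics into a simple reorganization check. This is a hypothetical sketch: `reorg_due` is invented, its arguments correspond to the REC-TOTAL, SPLITS-CI and SPLITS-CA values that LISTCAT reports, and the thresholds are illustrative site policy, not IBM guidance.

```python
def reorg_due(rec_total, splits_ci, splits_ca, ci_split_ratio=0.05, max_ca_splits=0):
    """Flag a KSDS for reorganization from LISTCAT-style split statistics.

    Thresholds are hypothetical site policy: more than 5% CI splits per record,
    or any CA split at all, triggers a reorganization.
    """
    if rec_total == 0:
        return False
    return splits_ci / rec_total > ci_split_ratio or splits_ca > max_ca_splits


print(reorg_due(rec_total=100_000, splits_ci=200, splits_ca=0))    # False
print(reorg_due(rec_total=100_000, splits_ci=8_000, splits_ca=0))  # True
```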

Summary

VSAM is the storage foundation of the mainframe application world. Its four dataset types cover the full range of business storage patterns: keyed random access for CICS transaction files (KSDS), append-only logging (ESDS), slot-based lookup tables (RRDS), and raw byte storage for DB2 (LDS).

The CI and CA storage model gives VSAM its performance characteristics. Reading a CI that is already buffered is fast; splitting CIs is expensive. FREESPACE configuration and periodic reorganization are the operational levers that keep a VSAM dataset performing well as it grows and changes.

The ICF catalog makes datasets discoverable and manageable across the system. IDCAMS is the tool that defines, populates, monitors, and maintains them. CICS wraps VSAM access in file definitions, buffer pools, and transaction-aware recovery, making individual CICS programs unaware of the physical storage details beneath them.

The next post in this series covers DB2 on z/OS, the relational database that complements VSAM for structured query workloads.

Part of the Mainframe Decoded series — IBM Z and z/OS, clearly explained for engineers.