A bank teller presses Enter. Within 30 milliseconds, the customer's account balance is on the screen. Somewhere in a data center, a COBOL program ran, read a VSAM file, formatted a response, and terminated. The same thing happened simultaneously for several thousand other tellers at other terminals in other branches. None of them waited for each other. None of them interfered with each other's data.
That is CICS. It has been doing exactly this since 1969.
CICS (Customer Information Control System) is IBM's transaction processing middleware for z/OS. It sits between the operating system and the application programs, providing the infrastructure that makes high-volume, concurrent, short-lived transactions possible. If z/OS is the engine room, CICS is the production floor: the place where actual work gets done, thousands of times per second, in strict order, with full data integrity.
This post assumes you have read Architecture: Mainframe. It covers how CICS is structured, how it manages work internally, how production deployments are organized across multiple regions, and how a transaction moves through the system from the first keystroke to the final response.
The Big Picture
CICS runs as a z/OS started task. From z/OS's perspective, a CICS region is an address space: a large, isolated region of virtual memory with its own execution environment. Inside that address space, CICS runs its own internal scheduler, its own memory manager, its own I/O subsystem, and its own security layer. It is, in effect, an operating system within the operating system.
The key design principle is multitasking within a single address space. Unlike z/OS, which isolates every unit of work into its own address space, CICS multiplexes thousands of concurrent transactions inside one. This is deliberate. The overhead of creating and destroying a full address space for every bank transaction would be prohibitive. CICS amortizes that overhead across an entire workload, using its own lightweight task model to achieve the density that OLTP requires.
Transactions and Tasks
In CICS, a transaction is a named unit of work identified by a four-character code. When a user presses Enter at a terminal, the first four characters of the input are the transaction code. CICS looks up that code in its resource definitions, finds the associated program, and dispatches it. The transaction code `ACCT` might map to a COBOL program called `ACCTINQ`. `PYMT` might map to `PAYPROC`.
A task is a specific running instance of a transaction. If 200 tellers all trigger `ACCT` at the same moment, CICS creates 200 tasks. Each task runs the same program (`ACCTINQ`) but with its own private storage, its own state, and its own data. The program itself is shared and read-only; only the working storage is per-task.
This distinction matters. A program in CICS must be reentrant: it cannot modify itself or rely on static state. All mutable state must live in the task's working storage or in a CICS-managed storage area. A non-reentrant program in a high-volume CICS region is a serious bug waiting to corrupt another task's data.
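The transaction/task split can be sketched in a few lines. This is an illustrative model in Python, not a CICS API — the transaction table, program names, and `Task` shape are all hypothetical — but the invariant it demonstrates is the real one: one shared, read-only program, one private working storage per task.

```python
from dataclasses import dataclass, field

# Hypothetical transaction-code table: four-character code -> program name.
TRANSACTIONS = {"ACCT": "ACCTINQ", "PYMT": "PAYPROC"}

@dataclass
class Task:
    """One running instance of a transaction: shared program, private state."""
    task_number: int
    program: str                                          # shared, read-only
    working_storage: dict = field(default_factory=dict)   # private per task

def attach_task(tranid: str, task_number: int) -> Task:
    """Look up the transaction code and create a task, as CICS does on attach."""
    program = TRANSACTIONS[tranid]      # an unknown code would be rejected
    return Task(task_number, program)

# 200 tellers hitting ACCT at once: 200 tasks, one shared program,
# 200 independent working-storage areas.
tasks = [attach_task("ACCT", n) for n in range(200)]
tasks[0].working_storage["WS-ACCT-NUM"] = "12345678"
assert tasks[1].working_storage == {}   # no state bleeds between tasks
```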
The Dispatcher
The Dispatcher is CICS's internal scheduler. It manages a run queue of all tasks and determines which task gets CPU time. Tasks cycle between four states: attached (just created), dispatchable (ready to run), suspended (waiting for a resource), and terminating. The dispatcher gives CPU time to dispatchable tasks in priority order and suspends tasks that are waiting for I/O, a lock, or a storage allocation.
This internal scheduling sits on top of z/OS's own dispatcher. CICS maps its tasks onto a small pool of z/OS TCBs (Task Control Blocks). All CICS tasks share a handful of TCBs rather than having one each. This is the source of CICS's concurrency efficiency: hundreds of tasks appear to run simultaneously while using a fraction of the OS-level threads a conventional application would require.
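A toy model of that scheduling loop, in Python under assumed names (the real dispatcher's interfaces are internal to CICS): tasks move between the four states, and CPU time always goes to the highest-priority dispatchable task.

```python
import heapq
from enum import Enum

class State(Enum):
    ATTACHED = "attached"
    DISPATCHABLE = "dispatchable"
    SUSPENDED = "suspended"
    TERMINATING = "terminating"

class Task:
    def __init__(self, task_number, priority):
        self.task_number = task_number
        self.priority = priority
        self.state = State.ATTACHED

class Dispatcher:
    """Toy model: many tasks, dispatched in priority order."""
    def __init__(self):
        self.run_queue = []     # heap of (-priority, task_number, task)

    def attach(self, task):
        task.state = State.DISPATCHABLE
        heapq.heappush(self.run_queue, (-task.priority, task.task_number, task))

    def dispatch(self):
        """Give 'CPU' to the highest-priority dispatchable task, if any."""
        while self.run_queue:
            _, _, task = heapq.heappop(self.run_queue)
            if task.state is State.DISPATCHABLE:
                return task
        return None

    def suspend(self, task):    # e.g. the task issued I/O and must wait
        task.state = State.SUSPENDED

    def resume(self, task):     # I/O complete: back onto the run queue
        self.attach(task)
```

The part this sketch omits is the mapping onto z/OS TCBs: in real CICS the dispatched task runs on one of a small shared pool of OS threads, which is where the density comes from.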
Resource Definitions and the CSD
Before CICS can do anything with a program, a file, a terminal, or a database connection, that resource must be defined to the CICS region. Every resource has a definition stored in a VSAM file called the CSD (CICS System Definition file). The CSD is the catalog of everything the region knows about.
Definitions are created and managed online with the CEDA transaction (CICS's resource definition online facility, RDO) or in batch with the offline utility `DFHCSDUP`. A program definition tells CICS where to find the load module and how to handle it. A transaction definition maps a four-character code to a program and sets attributes like priority, security key, and whether the transaction can be dynamically routed to another region.
At startup, CICS reads the CSD and installs the defined resources into its in-memory tables. Resources can also be installed and updated while the region is running, without a restart. This is one of the operationally important features of CICS: a new version of a program can be deployed into a running production region without stopping it.
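A sketch of what the batch-utility input might look like. The group and list names (`BANKAPP`, `STARTLST`) are hypothetical, and real definitions carry many more attributes; treat this as an illustration of the shape of CSD definitions, not a production deck.

```
DEFINE TRANSACTION(ACCT) GROUP(BANKAPP)
       PROGRAM(ACCTINQ)
DEFINE PROGRAM(ACCTINQ) GROUP(BANKAPP)
       LANGUAGE(COBOL)
ADD GROUP(BANKAPP) LIST(STARTLST)
```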
Storage Management: DSAs
CICS manages its own memory internally, dividing the address space into named regions called DSAs (Dynamic Storage Areas). Each DSA serves a different purpose and has different access protection rules.
The key DSAs are:
- CDSA (CICS Dynamic Storage Area): Storage owned and managed by CICS itself, below the 16 MB line. Used for CICS internal control blocks. Applications cannot access this directly.
- UDSA (User Dynamic Storage Area): Storage for user application programs below the 16 MB line. When a COBOL program issues `EXEC CICS GETMAIN`, it typically gets storage from the UDSA.
- ECDSA / EUDSA: The 31-bit equivalents of CDSA and UDSA. Most modern CICS workloads use these, as 31-bit addressing covers 2 GB rather than 16 MB.
- GCDSA / GUDSA: 64-bit storage areas, used for large data objects and modern Java workloads running in CICS.
Storage protection is enforced between CICS-key and user-key storage. A bug in a user application program cannot corrupt CICS internals because the hardware storage key prevents user-key code from writing to CICS-key storage. This is the same mechanism z/OS uses to protect the kernel from application bugs, applied again within the CICS address space itself.
DSA sizing is a critical tuning parameter. If the UDSA runs out of storage, tasks start waiting and eventually the region stalls. Monitoring DSA utilization is part of day-to-day CICS operations. The CICS storage management post in this series covers this in full.
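The exhaustion dynamic is easy to model. A toy Python allocator, with hypothetical names, standing in for a user DSA: requests that fit are satisfied, requests that do not leave the task waiting until storage is freed — which is the short-on-storage stall just described.

```python
class DSA:
    """Toy model of a user DSA: fixed limit, tasks queue when it is exhausted."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.in_use = 0
        self.waiters = []                      # tasks suspended on storage

    def getmain(self, task, nbytes):
        """EXEC CICS GETMAIN analogue: allocate, or suspend the requester."""
        if self.in_use + nbytes <= self.limit:
            self.in_use += nbytes
            return True                        # storage obtained
        self.waiters.append((task, nbytes))    # short on storage: task waits
        return False

    def freemain(self, nbytes):
        """EXEC CICS FREEMAIN analogue: release storage, retry the waiters."""
        self.in_use -= nbytes
        ready, still = [], []
        for task, want in self.waiters:
            if self.in_use + want <= self.limit:
                self.in_use += want
                ready.append(task)             # task can now be resumed
            else:
                still.append((task, want))
        self.waiters = still
        return ready
```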
Region Types: TOR, AOR, FOR
A single CICS region can theoretically do everything: own terminals, run programs, and own files. In small or development environments, that is fine. In production, it is not. High-volume shops split the workload across multiple specialized regions, each running in its own address space, connected to each other via MRO (Multi-Region Operation).
The standard region types are:
TOR: Terminal-Owning Region
The TOR is the front door. It owns all the terminal definitions and is the first region a user connects to. When a transaction code arrives, the TOR looks up whether that transaction runs locally or needs to be routed to another region. If routing is required, the TOR passes the request to an AOR and acts as a relay for the conversation. The TOR itself does not run application programs.
Separating terminal ownership from application execution means the TOR can route load across many AORs without any one of them becoming a bottleneck. If one AOR is busy, the TOR routes to another.
AOR: Application-Owning Region
The AOR is where programs run. It contains the program definitions, executes the business logic, and manages task lifecycles. In a large deployment there may be many AORs, each running the same set of programs. The TOR distributes transactions across them for load balancing. Each AOR runs independently; a failure in one AOR does not affect the others or the TOR.
FOR: File-Owning Region
The FOR owns VSAM file definitions and acts as a single point of access for shared files. When an AOR needs to read or update a VSAM file owned by a FOR, it issues a function-shipping request: CICS transparently sends the file operation to the FOR, executes it via a mirror transaction, and returns the result. From the application program's perspective, it issued a normal `EXEC CICS READ` and got data back. The fact that a cross-region call happened is invisible to the program.
The FOR model predates VSAM Record Level Sharing (RLS), which allows multiple regions to access the same VSAM file directly without a FOR. Many modern CICS deployments have retired their FORs in favour of VSAM/RLS, but the FOR pattern remains common in older or more conservative installations.
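Function shipping amounts to a transparent proxy. A minimal Python sketch, with invented class and method names (`mirror_read` stands in for the mirror transaction): the application-side `read` looks identical whether the file is local or owned by another region.

```python
class FileOwningRegion:
    """Toy FOR: owns the file and services shipped requests via a 'mirror'."""
    def __init__(self, files):
        self.files = files                 # filename -> {key: record}

    def mirror_read(self, filename, key):
        return self.files[filename].get(key)

class ApplicationRegion:
    """Toy AOR: local files served directly, remote files function-shipped."""
    def __init__(self, local_files, remote_owner):
        self.local = local_files
        self.remote_owner = remote_owner   # stands in for the MRO link

    def read(self, filename, key):
        """The program just 'reads'; the routing is invisible to it."""
        if filename in self.local:
            return self.local[filename].get(key)
        return self.remote_owner.mirror_read(filename, key)   # shipped

for_region = FileOwningRegion({"ACCOUNTS": {"12345678": {"balance": 1042}}})
aor = ApplicationRegion({}, for_region)
record = aor.read("ACCOUNTS", "12345678")  # looks local, ran in the FOR
```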
CICSPlex and Region Management
A CICSPlex is a collection of CICS regions managed as a single administrative unit. A large bank might have a CICSPlex containing one TOR per LPAR, a dozen AORs spread across a Parallel Sysplex, and one or two FORs. The CICSPlex SM (System Manager) provides a single point of control for monitoring, resource management, and workload routing across all regions in the plex.
CICSPlex SM enables dynamic routing: when a transaction arrives at the TOR, CICSPlex SM can evaluate which AOR in the plex has the most capacity and route there. This is CICS's equivalent of a load balancer, but operating at the transaction level with full awareness of each region's current state.
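The routing decision itself reduces to "pick the healthiest, least-loaded target". A sketch in Python — the field names and metric are invented, and CICSPlex SM weighs far more state (health, abend rates, link costs) than a single load ratio:

```python
def route(aors):
    """Toy dynamic-routing decision: pick the AOR with the most headroom."""
    healthy = [a for a in aors if a["responding"]]
    if not healthy:
        raise RuntimeError("no AOR available")   # caller must retry or fail
    # Stand-in metric: fraction of the region's task slots in use.
    return min(healthy, key=lambda a: a["active_tasks"] / a["max_tasks"])

aors = [
    {"name": "AOR1", "active_tasks": 180, "max_tasks": 200, "responding": True},
    {"name": "AOR2", "active_tasks": 40,  "max_tasks": 200, "responding": True},
    {"name": "AOR3", "active_tasks": 10,  "max_tasks": 200, "responding": False},
]
target = route(aors)   # AOR2: lightest load among responding regions
```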
In a Parallel Sysplex, a CICSPlex can span multiple physical machines. A transaction entering the TOR on System 1 can be dynamically routed to an AOR on System 2 if that system has more capacity. The Coupling Facility shared cache allows shared resources (temporary storage queues, transient data) to be visible across all regions in the plex.
Transaction Lifecycle: From Keystroke to Response
Here is what happens when a teller triggers an account inquiry transaction.
The teller types `ACCT 12345678` and presses Enter. The terminal sends the input to the TOR over a VTAM or TCP/IP session. The TOR's front-end processing extracts the transaction code `ACCT` and looks it up in its resource definitions. The definition says `ACCT` is a routable transaction; the TOR invokes the dynamic routing program to select an AOR.
The routing program picks AOR 2, which currently has the lightest load. The TOR ships the transaction request to AOR 2 via an MRO link. AOR 2 receives the request, creates a new task, and attaches it to the dispatcher's run queue.
When the dispatcher gives the task CPU time, it loads the program `ACCTINQ` (if it is not already in memory) and begins execution. The COBOL program issues `EXEC CICS READ FILE('ACCOUNTS') INTO(WS-ACCOUNT) RIDFLD(WS-ACCT-NUM)`. CICS intercepts this at the exec interface boundary, validates the request, and issues the physical I/O to the VSAM file. While the I/O is in flight, the task is suspended and the dispatcher runs other tasks.
When the I/O completes, the task is resumed. The program formats the account data into a response screen using BMS (Basic Mapping Support), issues `EXEC CICS SEND MAP`, and terminates with `EXEC CICS RETURN`. CICS frees the task's storage, writes a performance record to SMF, and the response travels back through the MRO link to the TOR and out to the terminal.
Total elapsed time: typically 10 to 40 milliseconds. The task existed for that window and was then completely gone.
Recovery and Transactional Integrity
CICS enforces ACID properties for transactions. A unit of work in CICS either commits all its changes or rolls them all back. There is no partial success.
CICS maintains a system log: a continuous record of all data changes made by in-flight transactions, written through the z/OS system logger to a log stream (held in a coupling facility structure in a sysplex, or on DASD). Before any change is committed, the before-image of the data is written to the log. If the transaction abends, CICS uses the log to back out the changes and restore the data to its state before the transaction began.
At region startup after an abnormal shutdown, CICS performs an emergency restart. It reads the system log, identifies all transactions that were in-flight at the time of failure, backs out their uncommitted changes, and restores data integrity. The time this takes depends on how many in-flight transactions were active at the moment of failure and how much log data must be processed.
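The before-image mechanism can be sketched compactly. A toy Python model (names invented, a dict standing in for the recoverable data set): every update logs what the data looked like before, backout replays those images in reverse, and emergency restart is just backout applied to everything still in flight.

```python
class LoggedStore:
    """Toy model of before-image logging and backout for units of work."""
    def __init__(self, data):
        self.data = data
        self.log = []                  # (uow_id, key, before_image)
        self.in_flight = set()

    def update(self, uow, key, value):
        """Log the before-image, then apply the change."""
        self.log.append((uow, key, self.data.get(key)))
        self.in_flight.add(uow)
        self.data[key] = value

    def commit(self, uow):
        self.in_flight.discard(uow)    # changes are now permanent

    def backout(self, uow):
        """Replay before-images in reverse to undo an abended unit of work."""
        for logged_uow, key, before in reversed(self.log):
            if logged_uow == uow:
                self.data[key] = before
        self.in_flight.discard(uow)

    def emergency_restart(self):
        """After a crash: back out everything that was still in flight."""
        for uow in list(self.in_flight):
            self.backout(uow)
```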
For transactions that span multiple resource managers (CICS + DB2, or CICS + MQ), CICS acts as the coordinator in a two-phase commit protocol. In phase one, all participating resource managers confirm they are prepared to commit. In phase two, CICS drives the final commit across all of them. If any participant cannot commit, the entire unit of work is backed out across all resources. This is what makes cross-resource transactions safe.
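The protocol in miniature, as a Python sketch under assumed interfaces (`prepare`/`commit`/`backout` are stand-ins for the real resource manager calls):

```python
class Participant:
    """Stand-in for a resource manager such as DB2 or MQ."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.outcome = name, can_commit, None
    def prepare(self, uow):   # phase-one vote
        return self.can_commit
    def commit(self, uow):
        self.outcome = "committed"
    def backout(self, uow):
        self.outcome = "backed out"

def two_phase_commit(uow, participants):
    """Toy coordinator: unanimous yes commits everywhere, else back out everywhere."""
    # Phase 1: every resource manager votes on whether it can commit.
    if all(p.prepare(uow) for p in participants):
        # Phase 2: unanimous yes -> final commit across all participants.
        for p in participants:
            p.commit(uow)
        return "committed"
    # Any 'no' vote -> back out everywhere; no partial success.
    for p in participants:
        p.backout(uow)
    return "backed out"
```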
Failure Modes
The most common CICS failure at the task level is an abend (abnormal end). A task abends when it encounters a condition it cannot handle: a program check, a storage violation, an unhandled CICS condition code, or a resource that is unavailable. Each abend has a four-character code: `ASRA` for a program check, `AICA` for a runaway task that exceeded its CPU limit, `AEY9` when a required resource manager, such as the DB2 attachment, is not available. These codes are the primary diagnostic tool when investigating transaction failures.
A task abend does not bring down the region. CICS catches the abend, backs out any uncommitted changes for that task, frees its storage, and continues running. The region keeps processing other transactions throughout.
A region-level failure is more serious. If the CICS address space itself abends or is cancelled by the operator, z/OS's Automatic Restart Manager can restart it automatically. On restart, CICS performs its emergency restart procedure. In a CICSPlex, the TOR detects that an AOR is no longer responding and stops routing to it. Transactions that were routed there and did not complete will need to be resubmitted by the caller.
Storage exhaustion in a DSA is a slow-building failure. As tasks accumulate waiting for storage to become available, response times grow and eventually the region stops accepting new work. The fix is either to increase the DSA limit or to investigate which transactions are consuming abnormal amounts of storage and not freeing it.
The Full Architecture
Summary
CICS is a transaction processing engine with over 50 years of production refinement. Its architecture is built around a single insight: short-lived, concurrent, high-volume transactions are best served by a system that amortizes infrastructure overhead across the entire workload rather than paying it per transaction. Running thousands of tasks inside a single address space, with a lightweight internal dispatcher and a shared memory model protected by storage keys, is how CICS achieves the throughput numbers that still make it the processing backbone of global banking and insurance.
The region model, TOR-AOR-FOR, separates concerns in a way that allows independent scaling and failure isolation. CICSPlex and dynamic routing bring that model to the sysplex level. ACID recovery, XA two-phase commit, and emergency restart ensure that data integrity survives both task-level and region-level failures.
The next post in this series is Explained: COBOL, which covers the language that almost every CICS application program is written in.
Part of the Mainframe Decoded series — IBM Z and z/OS, clearly explained for engineers.