Explained: Taints and Tolerations in Kubernetes


"Not every workload belongs on every node. Taints and tolerations are how Kubernetes enforces that."

The Problem They Solve

Imagine your cluster has a mix of nodes: some with expensive GPUs, some reserved for critical production workloads, and some designated for a specific team. By default, Kubernetes' scheduler is happy to place any pod on any node — it just looks for available resources.

That's a problem. You don't want a batch analytics job accidentally landing on your GPU node, or a dev experiment crowding out a production service.

Taints and tolerations give you a way to say:

  • Node (taint): "I repel most pods. Only special ones are welcome."
  • Pod (toleration): "I can handle that repulsion. Schedule me anyway."

Think of it like a bouncer (taint) at a club door, and a VIP pass (toleration) that gets you in.


Fig 1 — A tainted node blocks regular pods but admits pods that carry a matching toleration.

What is a Taint?

A taint is applied to a node. It marks the node so that the scheduler will avoid placing pods there — unless a pod explicitly tolerates it.

Anatomy of a taint

key=value:effect
Part     Description
key      A label-like identifier (e.g. gpu, team, environment)
value    Optional value (e.g. true, nvidia)
effect   What happens to pods that don't tolerate this taint

The three effects

  • NoSchedule: hard block. New pods without a matching toleration are not scheduled; pods already running on the node stay.
  • PreferNoSchedule: soft hint. The scheduler tries to avoid the node, but falls back to it if no other node fits.
  • NoExecute: the most aggressive. Blocks new pods and evicts already-running pods that don't tolerate the taint.

Fig 2 — The three taint effects, from soft to aggressive.

Applying a taint to a node

kubectl taint nodes <node-name> gpu=true:NoSchedule

To remove a taint, append a - at the end:

kubectl taint nodes <node-name> gpu=true:NoSchedule-
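To confirm what is actually set, you can inspect the node object. A quick sketch (substitute your own node name; output depends on your cluster):

```shell
# Print the node's taints array via jsonpath (empty output = no taints)
kubectl get node <node-name> -o jsonpath='{.spec.taints}'

# Or the human-readable view
kubectl describe node <node-name> | grep -A3 Taints
```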

What is a Toleration?

A toleration is defined in a pod's spec. It tells the scheduler: "I acknowledge this taint and I'm okay being placed on that node."

Anatomy of a toleration

tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
Field               Description
key                 Must match the taint's key
operator            Equal (matches key + value) or Exists (matches any value for the key)
value               Must match the taint's value (only with Equal)
effect              Must match the taint's effect (omit to match all effects)
tolerationSeconds   For NoExecute — how long a pod can stay before eviction
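tolerationSeconds is easiest to see against a NoExecute taint. A minimal sketch (the maintenance=true key is an assumption for illustration):

```yaml
tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 300   # tolerate the taint for 5 minutes, then get evicted
```

Without tolerationSeconds, a matching NoExecute toleration lets the pod stay indefinitely.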

Using Exists as an operator

tolerations:
  - key: "gpu"
    operator: "Exists"
    effect: "NoSchedule"

This tolerates any taint with the key gpu, regardless of value. Useful for broad matching.

Tolerating everything (wildcard)

tolerations:
  - operator: "Exists"

This pod tolerates all taints on all nodes. Use sparingly — typically for system-level DaemonSets that must run everywhere.
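As an illustration, a node-agent DaemonSet (hypothetical name fluentd-logger) might carry the wildcard toleration so it lands on every node, tainted or not:

```yaml
# Hypothetical logging DaemonSet; the wildcard toleration lets it
# schedule onto all nodes, including tainted ones.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logger
spec:
  selector:
    matchLabels:
      app: fluentd-logger
  template:
    metadata:
      labels:
        app: fluentd-logger
    spec:
      tolerations:
        - operator: "Exists"   # tolerate every taint, every effect
      containers:
        - name: fluentd
          image: fluentd:latest
```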

A Full Example

Scenario: Dedicate a node to GPU workloads

Step 1 — Taint the GPU node:

kubectl taint nodes gpu-node-1 hardware=gpu:NoSchedule

Step 2 — Add a toleration to your GPU-hungry pod:

apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  tolerations:
    - key: "hardware"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: my-ml-image:latest

Fig 3 — All three fields (key, value, effect) must match for a toleration to neutralise a taint.

⚠️ Important: A toleration is permission, not a guarantee. The pod can go to the tainted node, but won't be forced there. For that, combine it with Node Affinity or a nodeSelector.
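To both permit and pin the pod, pair the toleration with a nodeSelector. A sketch, assuming the node also carries the label hardware=gpu (labels and taints are separate; the label would be added with kubectl label):

```yaml
spec:
  nodeSelector:
    hardware: gpu            # pull: only nodes labeled hardware=gpu qualify
  tolerations:
    - key: "hardware"        # push: neutralise the hardware=gpu:NoSchedule taint
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
```

The taint keeps everyone else off the node; the nodeSelector keeps this pod off every other node.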

Taints & Tolerations vs. Node Affinity

These are often confused. Here's how they differ:


Fig 4 — Taints repel from the node side; Node Affinity attracts from the pod side. Combined, they give precise placement control.

                Taints & Tolerations                   Node Affinity
Applied to      Nodes (taint) + Pods (toleration)      Pods only
Purpose         Repel pods from nodes                  Attract pods to specific nodes
Direction       Node pushes pods away                  Pod pulls toward a node
Use together?   ✅ Yes — often combined for precise placement

OpenShift-Specific Considerations

OpenShift (built on Kubernetes) fully supports taints and tolerations, with a few additions:


Fig 5 — In OpenShift, MachineSets automatically apply taints to every node in the pool — no manual kubectl taint per node.

  • MachineSets: Define taints in the MachineSet spec so every new node in the pool is automatically tainted.
  • Infrastructure nodes: OpenShift commonly uses taints on infra nodes (routers, registry, monitoring) to isolate platform workloads:
    node-role.kubernetes.io/infra:NoSchedule
  • Operators & DaemonSets: Many OpenShift operators deploy DaemonSets with broad tolerations so they can run on every node, including tainted ones.
  • Cluster Autoscaler: Taints on MachineSets affect which node pool gets selected during scale-out, so taint design matters for autoscaling behaviour.
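For reference, a MachineSet can declare the taint in its machine template so every node it creates is born tainted. A minimal sketch (names are illustrative):

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: gpu-machineset          # hypothetical pool name
spec:
  template:
    spec:
      taints:
        - key: hardware
          value: gpu
          effect: NoSchedule    # applied to every node this pool creates
```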

Common Real-World Use Cases

Use case                               Taint key                         Effect
Dedicate nodes to a team               team=payments                     NoSchedule
Reserve GPU nodes                      hardware=gpu                      NoSchedule
Mark a node for maintenance            maintenance=true                  NoExecute
Soft-isolate spot/preemptible nodes    cloud.google.com/gke-spot=true    PreferNoSchedule
Isolate infra workloads (OpenShift)    node-role.kubernetes.io/infra     NoSchedule

Key Takeaways

  1. Taints go on nodes — they repel pods that can't tolerate them.
  2. Tolerations go on pods — they grant permission to be scheduled on tainted nodes.
  3. Three effects: NoSchedule, PreferNoSchedule, NoExecute — each progressively stronger.
  4. Tolerations are permissive, not prescriptive — combine with Node Affinity to fully control placement.
  5. In OpenShift, use MachineSets to automatically apply taints to entire pools of nodes.
  6. NoExecute is the only effect that evicts already-running pods — use it carefully.

Next steps: explore Node Affinity, Pod Affinity/Anti-Affinity, and Priority Classes for a complete workload scheduling strategy.


