Exam code AIF-C01. Foundational level. 65 questions total: 50 scored, 15 unscored (you cannot tell which is which). Passing score is 700 / 1000. No penalty for guessing. Duration: 90 minutes. Cost: $100.
Question types include multiple choice, multiple response (select two or more correct answers; the question states how many), ordering, and matching. The newer ordering and matching types are unique to this exam at foundational level — read them carefully before answering.
Domain 1: Fundamentals of AI and ML (20%)
Core concepts
Artificial Intelligence (AI) is the broad discipline of making machines behave intelligently. Machine Learning (ML) is a subset where machines learn from data rather than being explicitly programmed. Deep Learning is a subset of ML that uses multi-layered neural networks and excels at unstructured data (images, audio, text).
Know the hierarchy: AI > ML > Deep Learning > Generative AI. The exam tests whether you can place a technique in the right category.
ML problem types
Supervised learning trains on labelled data (input + correct output). Use it for classification (spam / not spam) and regression (predicting a number). Unsupervised learning finds patterns in unlabelled data: clustering (group similar customers), dimensionality reduction. Reinforcement learning trains an agent through trial and error with rewards; used in robotics, games, and recommendation tuning.
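To make the contrast concrete, here is a minimal scikit-learn sketch (illustration only; the exam itself contains no code): the same dataset handled first as supervised classification, then as unsupervised clustering.

```python
# Minimal sketch: one dataset, two learning paradigms (scikit-learn).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: learn from labelled examples (features plus known class).
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))   # predicted class labels

# Unsupervised: find structure in the same features with labels withheld.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print(km.labels_[:3])       # cluster assignments; no ground truth used
```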
The ML pipeline
Data collection → Data preparation (cleaning, labelling) → Feature engineering → Model training → Evaluation → Deployment → Monitoring. The exam tests you on which AWS service belongs at which stage, not on how to implement each stage.
Key metrics — know what each measures
| Metric | What it measures | When to care |
|---|---|---|
| Accuracy | % of total predictions correct | Balanced classes |
| Precision | Of predicted positives, how many are real? | False positives are costly (spam filter) |
| Recall | Of real positives, how many did we catch? | False negatives are costly (cancer detection) |
| F1 | Harmonic mean of precision & recall | Imbalanced classes |
| RMSE | Root mean square error for regression | Regression models |
| BLEU | Quality of machine translation output | NLP translation tasks |
| ROUGE | Overlap between generated and reference text | Summarisation |
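A quick worked example tying the classification metrics together, with made-up counts from a screening scenario:

```python
# Worked example: metrics from a confusion matrix (made-up counts).
tp, fp, fn, tn = 80, 10, 20, 890   # e.g. a screening model over 1,000 cases

accuracy  = (tp + tn) / (tp + fp + fn + tn)                 # 0.970
precision = tp / (tp + fp)                                  # 0.889: of predicted positives, how many were real
recall    = tp / (tp + fn)                                  # 0.800: of real positives, how many were caught
f1        = 2 * precision * recall / (precision + recall)   # 0.842

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note how 97% accuracy hides a recall of only 0.80; imbalanced-class questions hinge on exactly this.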
Overfitting vs. underfitting
Overfitting: the model memorises training data and performs poorly on new data (high variance). Fix by adding more data, reducing model complexity, or using regularisation. Underfitting: the model is too simple to capture the pattern (high bias). Fix by increasing model complexity or training longer.
Training concepts
An epoch is one full pass through the training dataset. More epochs can improve accuracy but risk overfitting. Hyperparameters are settings you choose before training (learning rate, number of layers, batch size); they are not learned from data. Hyperparameter tuning searches for the best combination — Amazon SageMaker Automatic Model Tuning handles this. Tuning strategies: grid search, random search, Bayesian optimisation.
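A sketch of what automatic tuning looks like with the SageMaker Python SDK; the image URI, role, metric name, ranges, and S3 paths below are placeholders, not exam material:

```python
# Sketch of SageMaker Automatic Model Tuning via the SageMaker Python SDK.
from sagemaker.estimator import Estimator
from sagemaker.tuner import (ContinuousParameter, HyperparameterTuner,
                             IntegerParameter)

estimator = Estimator(
    image_uri="<training-image-uri>",         # placeholder
    role="<execution-role-arn>",              # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",   # metric the tuner optimises
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.001, 0.1),
        "num_layers": IntegerParameter(2, 8),
    },
    strategy="Bayesian",                      # "Random" and "Grid" are also supported
    max_jobs=20,                              # total training jobs to run
    max_parallel_jobs=4,                      # concurrency
)
tuner.fit({"train": "s3://<bucket>/train", "validation": "s3://<bucket>/validation"})
```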
Transfer learning vs. fine-tuning
Transfer learning takes a model trained on a large general task and adapts it to a new, related task, reusing its learned representations. Fine-tuning continues training on new data to adjust the model's weights. Domain adaptation is fine-tuning with limited domain-specific data; the key distinction from general transfer learning is that the source and target domains are similar but the target dataset is small.
AWS services for this domain
- Amazon SageMaker AI: end-to-end ML platform — build, train, deploy. The primary service for any ML workflow question.
- SageMaker Ground Truth: data labelling with human annotators.
- SageMaker Ground Truth Plus: managed labelling with expert workforce; use for high-accuracy annotation requirements (the exam uses this in scenarios involving protective eyewear, medical images, etc.).
- SageMaker Automatic Model Tuning: hyperparameter optimisation.
- SageMaker Clarify: detects bias in data and models; provides explainability reports.
- Amazon Rekognition: image and video analysis (object detection, face recognition, content moderation).
- Amazon Comprehend: NLP — sentiment, entity recognition, key phrase extraction.
- Amazon Transcribe: speech to text.
- Amazon Polly: text to speech.
- Amazon Translate: machine translation.
- Amazon Forecast: time-series forecasting. The answer for demand-forecasting questions; prefer it over general-purpose SageMaker when the scenario is time-series.
- Amazon Personalize: real-time personalisation and recommendations.
- Amazon Textract: extract text and structured data from documents (PDFs, forms, tables).
- Amazon Kendra: intelligent enterprise search powered by ML.
Domain 2: Fundamentals of Generative AI (24%)
What generative AI is
Generative AI produces new content (text, images, code, audio) by learning patterns from training data. It differs from discriminative AI, which classifies or predicts from existing data. The underlying technology for most modern generative AI is the transformer architecture, introduced in 2017.
Large Language Models (LLMs)
An LLM is a deep learning model trained on vast text corpora to predict the next token in a sequence. It learns grammar, facts, reasoning patterns, and world knowledge from that prediction task. Key characteristics: large parameter count (billions), emergent capabilities that appear only at scale, and sensitivity to how you phrase inputs.
Tokens are the units an LLM processes — roughly 4 characters or 0.75 words in English. The context window is the maximum number of tokens the model can consider at once; it limits how much prior conversation or document content the model can "see".
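A back-of-envelope check using that rule of thumb (illustrative; real tokenisers vary by model):

```python
# Back-of-envelope token arithmetic using the ~4-characters-per-token
# rule of thumb (real tokenisers vary by model).
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

document = "x" * 40_000             # a ~40,000-character document
context_window = 8_192              # illustrative context window size
needed = estimate_tokens(document)  # ~10,000 tokens
print(needed, needed <= context_window)  # 10000 False: it will not fit in one call
```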
Foundation models
A foundation model (FM) is a large model trained on broad data that can be adapted to many downstream tasks. The term captures the idea that it serves as a foundation others build on. Foundation models are trained once (expensively) and then customised cheaply. This is the architectural pattern behind Bedrock.
Prompt engineering
Zero-shot prompting: ask the model to do something with no examples. Few-shot prompting: provide 2–5 examples of input/output in the prompt before asking your question. Chain-of-thought prompting: instruct the model to reason step by step before giving a final answer; improves accuracy on complex reasoning tasks.
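What few-shot prompting looks like in practice, with a made-up sentiment task:

```python
# Illustrative few-shot prompt: two worked examples precede the real input.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "Arrived quickly and works perfectly."
Sentiment: Positive

Review: "Broke after two days, very disappointed."
Sentiment: Negative

Review: "The battery life exceeded my expectations."
Sentiment:"""
# Zero-shot would keep only the instruction and the final review;
# chain-of-thought would add "Think step by step before answering."
```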
System prompts set the model's persona and constraints before any user message. Prompt injection is an attack where adversarial content in user input or retrieved documents tries to override the system prompt's instructions.
Hallucination
Hallucination is when a model produces confident-sounding output that is factually incorrect or not grounded in the provided data. It is not a bug that can be "fixed" entirely; it is an inherent risk of probabilistic generation. Mitigation strategies include Retrieval-Augmented Generation (RAG), grounding responses in retrieved facts, and human review.
Generative model types
- GANs (Generative Adversarial Networks): two networks (generator and discriminator) compete; generator makes fake samples, discriminator tries to spot them. Good for images.
- VAEs (Variational Autoencoders): encode data into a compressed latent space, then decode to generate new samples.
- Diffusion models: learn to reverse a noise process; produce high-quality images (Stable Diffusion, DALL-E).
- Transformers: the architecture behind LLMs; attention mechanism lets the model relate tokens across long sequences.
BERT vs. GPT-style models
BERT (Bidirectional Encoder Representations from Transformers) reads text in both directions; excellent at understanding tasks (classification, NER, Q&A). GPT-style models are autoregressive (predict left to right); excellent at generation tasks. The exam may test which style suits a task.
AWS services for this domain
- Amazon Bedrock: managed service to access foundation models from multiple providers (Anthropic Claude, Meta Llama, Mistral, Amazon Titan, Cohere, Stability AI) via API — no infrastructure to manage.
- Amazon Q: generative AI assistant. Amazon Q Business is for enterprise (connects to company data); Amazon Q Developer is for coding assistance.
- AWS Trainium: custom chip optimised for training large models cheaply.
- AWS Inferentia: custom chip optimised for low-cost, high-throughput inference.
Domain 3: Applications of Foundation Models (28%)
This is the highest-weighted domain. Focus here.
RAG — Retrieval-Augmented Generation
RAG grounds an LLM's responses in retrieved external documents rather than relying solely on its training data. The pattern: a user query is converted to an embedding, used to search a vector store for relevant chunks, those chunks are injected into the prompt as context, and the model generates a response. RAG reduces hallucination, keeps responses current without retraining, and allows the model to cite sources.
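The pattern as a self-contained toy in Python. The word-count "embedding" and brute-force search are deliberately crude stand-ins for an embedding model and a vector store; in a real system, Bedrock Knowledge Bases manages this whole pipeline, and the grounded prompt built here would go to the model:

```python
# Self-contained toy RAG pipeline (crude stand-ins, not a real AWS API).
from collections import Counter

DOCS = [
    "Refunds are processed within 5 business days.",
    "Premium support is available 24/7 by phone.",
    "Shipping to Europe takes 7 to 10 days.",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())     # toy embedding: bag of words

def similarity(a: Counter, b: Counter) -> int:
    return sum((a & b).values())             # shared-word count as a crude score

def retrieve(query: str, top_k: int = 2) -> list[str]:
    qv = embed(query)
    return sorted(DOCS, key=lambda d: similarity(qv, embed(d)), reverse=True)[:top_k]

def build_grounded_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))     # inject retrieved chunks as context
    return ("Answer using only the context below, and cite it.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

print(build_grounded_prompt("How long do refunds take?"))
```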
Fine-tuning vs. RAG: which to use
| | Fine-tuning | RAG |
|---|---|---|
| Use when | Changing model style/behaviour/tone; adapting to a domain's vocabulary | Grounding responses in up-to-date or proprietary documents |
| Data needed | Labelled examples of desired input/output | A document corpus; no retraining |
| Cost | Higher (retraining or continued training) | Lower (inference + retrieval only) |
| Keeps data fresh | No — snapshot of training data | Yes — update the vector store |
Instruction tuning and RLHF
Instruction-based fine-tuning: training a model with explicit task instructions rather than raw examples; produces a model that follows natural language commands reliably. RLHF (Reinforcement Learning from Human Feedback): humans rank model outputs; a reward model learns human preferences and guides further training. This is how models are aligned to be helpful and safe.
Agents
An AI agent extends an LLM by giving it tools: web search, code execution, API calls, database queries. The model reasons about which tool to call, calls it, incorporates the result, and continues reasoning. Amazon Bedrock Agents is the AWS managed service for building these workflows. Agents can break a multi-step task into sub-tasks automatically.
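A toy version of that loop; fake_llm is a scripted stand-in for a real model call, and in practice Bedrock Agents manages this reason/act/observe cycle for you:

```python
# Toy agent loop. fake_llm is a scripted stand-in for a real model call.
def calculator(expr: str) -> str:
    return str(eval(expr))   # toy tool only; never eval untrusted input

TOOLS = {"calculator": calculator}

# Scripted "model decisions": first request a tool, then give a final answer.
SCRIPT = [("calculator", "17 * 23"), (None, "17 * 23 = 391")]

def fake_llm(step: int) -> tuple:
    return SCRIPT[step]      # a real agent calls an LLM here

def run_agent(task: str) -> str:
    transcript = task
    for step in range(len(SCRIPT)):
        tool, payload = fake_llm(step)
        if tool is None:
            return payload                       # final answer: stop the loop
        result = TOOLS[tool](payload)            # invoke the requested tool
        transcript += f"\n[{tool}({payload}) -> {result}]"  # observe the result
    return transcript

print(run_agent("What is 17 * 23?"))   # -> 17 * 23 = 391
```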
Model evaluation
Beyond BLEU/ROUGE, the exam covers human evaluation and model cards. A model card documents a model's intended uses, performance across subgroups, known limitations, and training data. It supports transparency and is the answer when a question asks about communicating model behaviour to stakeholders.
Partial dependence plots (PDPs): show the marginal effect of one or two features on a model's prediction, holding all others constant. They are the right answer for explainability questions in an ML context.
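A minimal PDP example with scikit-learn; the diabetes dataset and its "bmi"/"bp" features are chosen purely for illustration:

```python
# Minimal partial dependence plot with scikit-learn.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor().fit(X, y)

# Marginal effect of each feature on the prediction, averaging out the rest.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"])
plt.show()
```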
Benchmarks: standardised test sets used to compare models. Common ones include MMLU (general knowledge), HumanEval (code), and TruthfulQA (factual accuracy).
Latent space
The latent space is the compressed internal representation a model learns. Points close together in latent space represent semantically similar inputs. Vector embeddings live in latent space; similarity search operates there.
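Similarity search boils down to comparing vectors, most often by cosine similarity. A toy example (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

# Toy 3-dimensional vectors standing in for embeddings.
def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.1, 0.3])
kitten = np.array([0.85, 0.15, 0.35])
car    = np.array([0.1, 0.9, 0.2])

print(cosine_sim(cat, kitten))  # high (~0.99): close together in latent space
print(cosine_sim(cat, car))     # low (~0.27): far apart
```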
AWS services for this domain
- Amazon Bedrock Knowledge Bases: managed RAG pipeline — handles document ingestion, chunking, embedding, and vector storage. Connects to OpenSearch Serverless or other vector stores.
- Amazon Bedrock Agents: orchestrates multi-step agentic workflows with tool use.
- Amazon Bedrock Guardrails: filters harmful content, enforces topic restrictions, redacts PII from model inputs and outputs.
- Amazon Bedrock Model Evaluation: automated and human evaluation of FM outputs.
- Amazon OpenSearch Service: used as a vector store for similarity search in RAG architectures.
- SageMaker JumpStart: pre-built ML solutions and foundation models you can deploy with one click — good for getting started fast without building from scratch.
- Amazon Q in QuickSight: natural language querying of BI dashboards.
Domain 4: Guidelines for Responsible AI (14%)
Core dimensions
Fairness: the model should not discriminate based on protected attributes (race, gender, age). Bias can enter at the data collection stage (under-representation), labelling stage, or be amplified by the model. Amazon SageMaker Clarify detects pre-training data bias and post-training model bias.
Explainability means being able to describe how the model arrived at a specific decision. Transparency means disclosing the model's design, training data, and limitations at a system level. They are related but distinct: a system can be transparent about its architecture without being able to explain any single prediction.
Robustness: the model behaves reliably under distribution shift (new data that differs from training data) and adversarial inputs. Privacy: training data and inference requests may contain personal data that must be protected.
Human oversight: keeping humans in the loop for high-stakes decisions. Human-in-the-loop (HITL) means a human reviews and approves model outputs before they take effect. Human-on-the-loop means a human monitors and can intervene but does not approve every output.
Types of bias
- Data bias: training data under-represents or misrepresents a group.
- Algorithmic bias: the model amplifies patterns in biased data.
- Confirmation bias: labellers label data in ways that confirm existing assumptions.
- Reporting bias: training corpus over-represents content that people write about and under-represents common but unremarkable facts.
AWS services for this domain
- Amazon SageMaker Clarify: bias detection in data and models; SHAP-based feature attribution for explainability.
- Amazon SageMaker Model Cards: document model purpose, performance, and limitations.
- Amazon SageMaker Model Monitor: detects data drift and model quality degradation in production.
- Amazon Augmented AI (A2I): human review workflows for low-confidence ML predictions.
- Amazon Bedrock Guardrails: content filtering and topic restrictions for generative AI — relevant in responsible AI and security domains.
Domain 5: Security, Compliance, and Governance for AI Solutions (14%)
Shared responsibility in AI
The standard AWS shared responsibility model applies, but AI adds layers. AWS is responsible for the security of the underlying infrastructure and the foundation model itself (when using Bedrock). You are responsible for your data, your prompts, your fine-tuning data, and how you configure access to AI services.
Data security for AI
Training data and inference inputs may contain PII or sensitive business data. Key controls: encrypt data at rest (S3 SSE, KMS) and in transit (TLS). Use VPC endpoints to keep Bedrock API calls off the public internet. Restrict access with IAM roles — use least privilege; grant only the specific Bedrock model ARNs a role needs to invoke.
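For illustration, a least-privilege policy scoped to a single model might look like the dict below; the region and model ID are example values (assumptions, not recommendations), and foundation-model ARNs carry no account ID field:

```python
import json

# Illustrative least-privilege policy: allow invoking exactly one model.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel"],
        # Example region and model ID; substitute your own.
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    }],
}
print(json.dumps(least_privilege_policy, indent=2))
```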
Amazon Bedrock Guardrails can automatically detect and redact PII from prompts and responses, reducing the risk of sensitive data leaking into model outputs or logs.
Governance
AWS Config: tracks configuration changes to AWS resources; use it to ensure AI infrastructure remains compliant with policies. AWS CloudTrail: logs API calls, including Bedrock InvokeModel calls; this is the audit trail for who invoked which model and when. CloudTrail does not record prompt or response contents; enable Amazon Bedrock model invocation logging if you need those.
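A hedged boto3 sketch for pulling recent InvokeModel activity from CloudTrail event history; whether model-invocation calls appear in LookupEvents depends on how they are logged in your account, so verify against the current Bedrock CloudTrail documentation:

```python
import boto3

# Sketch: query recent InvokeModel events from CloudTrail event history.
ct = boto3.client("cloudtrail")
resp = ct.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "InvokeModel"}],
    MaxResults=10,
)
for event in resp["Events"]:
    # Who invoked a model and when. Prompt contents are not recorded here;
    # Bedrock model invocation logging captures full inputs and outputs.
    print(event.get("Username"), event["EventTime"])
```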
AWS Artifact: the portal for AWS compliance documentation — SOC reports, ISO certifications, PCI DSS reports. Use this when the question asks about obtaining compliance evidence or attestations. It is not an active monitoring service; it is a document repository.
Model governance involves tracking which model version was used for a decision, what training data it was built on, and what evaluation results it achieved. Model cards (SageMaker) are the mechanism. ML lineage tracking in SageMaker records the full chain from data to model to endpoint.
Threat landscape specific to AI
- Prompt injection: malicious instructions embedded in user input that override system prompt instructions.
- Data poisoning: adversarial manipulation of training data to cause the model to behave incorrectly in targeted ways.
- Model inversion: an attacker queries the model repeatedly to reconstruct training data or extract sensitive information.
- Model theft: querying a model extensively to train a clone.
- Hallucination as a risk: in regulated industries, a model confidently producing false output is a compliance and liability risk, not just a UX problem.
AWS services for this domain
- AWS IAM: control who can invoke which models and which Bedrock features.
- AWS KMS: manage encryption keys for training data, model artefacts, and logs.
- Amazon Macie: automatically discovers and protects sensitive data (PII) in S3 — relevant when training data is in S3.
- AWS CloudTrail: audit log for all AI service API calls.
- AWS Config: compliance monitoring for resource configurations.
- AWS Artifact: compliance reports and agreements (SOC, ISO, PCI DSS).
- Amazon Bedrock Guardrails: content filtering, PII redaction, topic blocking at the model-invocation layer.
- SageMaker Role Manager: simplifies creating least-privilege IAM roles for ML workflows.
Quick Reference: Commonly Confused Pairs
These distinctions appear regularly in exam questions. Know them precisely.
| Pair | Distinction |
|---|---|
| Transparency vs. Explainability | Transparency = disclosing how the system was built (system-level). Explainability = describing why a specific prediction was made (decision-level). |
| Fine-tuning vs. RAG | Fine-tuning changes the model's weights; RAG changes the model's context at inference time. RAG is cheaper and keeps data fresh. |
| Transfer learning vs. domain adaptation | Transfer learning broadly reuses a pretrained model for a new task. Domain adaptation is a specific variant using limited target-domain data to shift the model's knowledge. |
| Precision vs. Recall | Precision: when you say positive, are you right? Recall: of all real positives, how many did you find? Trade-off — improving one usually hurts the other. |
| SageMaker Clarify vs. Model Monitor | Clarify: bias and explainability at training/evaluation time. Model Monitor: data drift and quality degradation in production over time. |
| Amazon Kendra vs. Amazon Q Business | Kendra: ML-powered enterprise search returning relevant documents. Q Business: generative AI assistant that synthesises an answer from connected data sources. |
| AWS Trainium vs. AWS Inferentia | Trainium: custom chip for training. Inferentia: custom chip for inference. If the question involves deploying at scale cheaply, Inferentia. |
| CloudTrail vs. AWS Config | CloudTrail: who did what, when (API activity log). Config: what state are resources in, and has that changed (configuration compliance). |
| BLEU vs. ROUGE | BLEU: translation quality (precision-based). ROUGE: summarisation quality (recall-based). |
| Hallucination vs. bias | Hallucination: model invents content not grounded in fact. Bias: model systematically favours or disfavours certain groups or outcomes. |
Exam-Day Notes
The exam is breadth-first, not depth-first. It tests whether you know which service to use and why, not how to implement it. When you see a scenario, identify the use case, then match it to the most specific AWS service. Prefer specific over general (Forecast > SageMaker for time-series; Q Business > Kendra for generative Q&A).
For multiple-response questions, the number of correct answers is stated in the question. Eliminate obvious wrong answers first. All correct answers must be selected for credit.
Do not second-guess answers you are confident about. Flag and skip questions where you are unsure; there is no penalty for guessing, so answer every question before you finish.
The unscored 15 questions can be on any topic — if you see something genuinely unfamiliar, it may simply be an unscored research question. Do not panic; move on.
Author: Phee Jay. Last reviewed April 2026. Based on the AIF-C01 exam guide.