Curriculum: Generative AI for Enterprise Deployment
ENTERPRISE GENERATIVE AI DEPLOYMENT CURRICULUM
A Rigorous & Practical Program
24-Week Comprehensive Training
Dual-Track: Technical Engineers & Technical Managers
1 Program Overview
This 24-week program provides comprehensive training in enterprise LLM deployment, structured
to build genuine competence rather than surface-level exposure. The expanded timeline
addresses the realistic complexity of modern AI systems while maintaining practical, hands-on
focus throughout.
1.1 Program Structure
The curriculum is organized into six phases, each building on the previous:
• Phase I (Weeks 0-2): Foundations & Calibration
• Phase II (Weeks 3-6): LLM Systems Engineering
• Phase III (Weeks 7-10): Prompt Engineering & Model Selection
• Phase IV (Weeks 11-15): Retrieval, Customization & Alignment
• Phase V (Weeks 16-20): Enterprise Deployment Architecture
• Phase VI (Weeks 21-24): Governance, Risk & Capstone
1.2 Dual-Track Design
Each week includes parallel tracks for different roles:
• Technical Lab: Hands-on implementation exercises for engineers
• Manager Module: Architecture decisions, cost modeling, compliance, and team structure for technical leaders
1.3 Prerequisites
Participants should have:
• Proficiency in Python (comfortable with classes, decorators, async patterns)
• Basic understanding of machine learning concepts (gradient descent, loss functions, overfitting)
• Familiarity with REST APIs and microservice architectures
• Experience with cloud platforms (AWS, GCP, or Azure)
• Basic SQL and data manipulation skills
2 Phase I: Foundations & Calibration
Weeks 0-2
2.1 Week 0: Diagnostic & Cohort Calibration
Before the program begins, participants complete a diagnostic assessment to calibrate cohort
composition and identify areas requiring additional support.
2.1.1 Diagnostic Assessment Components
• Python proficiency test: Async patterns, decorators, context managers, type hints
• ML fundamentals quiz: Gradient descent mechanics, regularization, cross-validation, bias-variance tradeoff
• Systems design exercise: Design a simple API with caching, rate limiting, and error handling
• LLM familiarity survey: Prior experience with ChatGPT, Claude, open-source models, API usage
• Role and goals questionnaire: Current responsibilities, deployment objectives, organizational constraints
2.1.2 Remediation Resources
Based on diagnostic results, participants receive targeted preparation materials:
• Python async/await tutorial and exercises
• ML fundamentals refresher (Andrew Ng's Coursera modules or fast.ai Practical ML)
• PyTorch basics: tensors, autograd, simple neural network training
• Docker and Kubernetes fundamentals for those lacking container experience
2.2 Week 1: The LLM Landscape & Mental Models
Establish accurate mental models for what LLMs are and are not, how they differ from traditional
software, and where they fail.
2.2.1 Topics Covered
• What LLMs actually do: Next-token prediction, pattern completion vs. reasoning, the distinction between generating plausible text and computing correct answers
• Capability taxonomy: What LLMs reliably do well (fluent generation, translation, summarization, code completion), what they do unreliably (math, logic, factual recall), what they cannot do (real-time data, guaranteed correctness)
• Failure modes overview: Hallucination, confabulation, instruction-following failures, context window limitations, sensitivity to prompt phrasing
• The API landscape: OpenAI, Anthropic, Google, Cohere, open-weight models (Llama, Mistral, Qwen), when to use each
• Cost structures: Token-based pricing, input vs. output tokens, cost estimation for realistic workloads (see the cost sketch after this list)
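A minimal sketch of the cost arithmetic behind token-based pricing follows. The per-1K-token rates and the model names are illustrative placeholders, not any vendor's actual prices; the point is the shape of the calculation, not the numbers.

```python
# Rough cost estimator for token-based pricing.
# NOTE: the rates below are illustrative placeholders; check the provider's
# current price sheet before budgeting a real workload.

PRICES_PER_1K = {            # (input, output) USD per 1K tokens -- hypothetical
    "large-model": (0.0100, 0.0300),
    "small-model": (0.0005, 0.0015),
}

def estimate_monthly_cost(model: str, requests_per_day: int,
                          input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend for a steady workload."""
    in_rate, out_rate = PRICES_PER_1K[model]
    per_request = (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
    return per_request * requests_per_day * 30

if __name__ == "__main__":
    # Example: 10,000 requests/day, 1,500 input tokens and 300 output tokens each.
    cost = estimate_monthly_cost("large-model", 10_000, 1_500, 300)
    print(f"Estimated monthly cost: ${cost:,.2f}")
```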
2.2.2 Technical Lab
Compare responses from 3+ models (GPT-4, Claude, Llama-3) on identical prompts across: factual questions with verifiable answers, reasoning tasks, creative generation, and code generation. Document failure patterns and model-specific behaviors.
2.2.3 Manager Module
Setting realistic expectations with stakeholders. How to communicate what LLMs can and
cannot do. Red flags in vendor claims. Building an initial cost model for a proposed use case.
2.3 Week 2: Transformer Architecture Fundamentals
Build intuition for how transformers work without requiring a full deep learning course. Focus
on the concepts that matter for deployment decisions.
2.3.1 Topics Covered
• Attention mechanism intuition: What attention means, how tokens relate to each other, why context windows exist
• The embedding layer: Tokens as vectors, semantic similarity, why embeddings matter for retrieval
• Positional encoding: How models understand token order, rotary embeddings (RoPE), implications for context length
• Feed-forward networks and layers: What happens between attention layers, why depth matters
• Autoregressive generation: Why generation is sequential, why output length affects latency, the temperature and top-p parameters
• Model sizes and parameter counts: What "7B" or "70B" means, rough relationship between size, capability, and cost
2.3.2 Technical Lab
Visualize attention patterns using BertViz or similar tools. Observe how attention changes with
different prompts. Experiment with temperature and sampling parameters to understand their
effects on output diversity and quality.
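A self-contained numpy sketch of what temperature and top-p (nucleus) sampling do to a next-token distribution; the five-token vocabulary and logits are made up purely for illustration.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Temperature-scaled softmax followed by nucleus (top-p) truncation."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]

    kept_probs = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept_probs)

# Toy example: 5-token vocabulary with made-up logits.
logits = [2.0, 1.5, 0.3, -1.0, -2.0]
for t in (0.2, 1.0, 1.5):
    samples = [sample_next_token(logits, temperature=t, top_p=0.9) for _ in range(1000)]
    print(f"temperature={t}: distinct tokens sampled = {len(set(samples))}")
```

Lower temperature concentrates probability mass on the top token; higher temperature flattens the distribution, and top-p trims the low-probability tail regardless of temperature.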
2.3.3 Manager Module
How to evaluate model architecture claims from vendors. Understanding the tradeoffs between
model size, speed, and capability. Making informed decisions about context window requirements
for your use cases.
3 Phase II: LLM Systems Engineering
Weeks 3-6
3.1 Week 3: Tokenization & Context Management
Understand how text becomes tokens, why tokenization choices affect cost and performance, and how to manage context windows effectively.
3.1.1 Topics Covered
• Tokenization algorithms: BPE (Byte-Pair Encoding), SentencePiece, WordPiece, Unigram LM. How each works, why different models use different tokenizers
• Token efficiency: Why some text tokenizes more efficiently than others, implications for non-English languages, code, and structured data
• Context window anatomy: System prompt, conversation history, user input, model output. How to allocate tokens across these components
• Context management strategies: Sliding windows, summarization, hierarchical context, when to truncate vs. summarize
• Cost implications: How tokenization affects billing, optimizing prompts for cost without sacrificing quality
3.1.2 Technical Lab
Build a tokenizer visualization tool that shows token boundaries, counts, and costs for different models. Compare tokenization efficiency across: English prose, code, JSON, non-Latin scripts, and domain-specific terminology. Implement a context manager that tracks token budgets and handles overflow gracefully.
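A minimal sketch of the token-budget bookkeeping this lab asks for, assuming the tiktoken library with the "cl100k_base" encoding; models with other tokenizers need their own encoder, and the limits shown are illustrative.

```python
# Minimal context-budget helper, assuming the tiktoken library is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def fit_history(system_prompt: str, history: list[str], user_input: str,
                context_limit: int = 8192, reserve_for_output: int = 1024) -> list[str]:
    """Drop the oldest history turns until everything fits the token budget."""
    budget = context_limit - reserve_for_output
    used = count_tokens(system_prompt) + count_tokens(user_input)
    kept: list[str] = []
    # Walk history from newest to oldest, keeping as much as fits.
    for turn in reversed(history):
        t = count_tokens(turn)
        if used + t > budget:
            break
        kept.insert(0, turn)
        used += t
    return kept
```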
3.1.3 Manager Module
Budgeting token costs for production workloads. Understanding how context window size affects
use case feasibility. Evaluating whether a use case requires long-context models (and the cost
implications).
3.2 Week 4: Inference Infrastructure
Understand how LLM inference works at the infrastructure level and the factors that determine
latency, throughput, and cost.
3.2.1 Topics Covered
• GPU memory and computation: Why LLMs require GPUs, VRAM requirements for different model sizes, the memory bandwidth bottleneck
• KV-cache mechanics: What the key-value cache stores, why it grows with context length, memory implications for long conversations
• Batching strategies: Static vs. dynamic batching, continuous batching, how batching affects latency vs. throughput tradeoffs
• Quantization: INT8, INT4, FP8, NF4, AWQ, GPTQ. What precision means, quality vs. speed tradeoffs, when quantization is appropriate
• Latency components: Time to first token (TTFT), inter-token latency (ITL), total generation time. What affects each component
• Serving frameworks: vLLM, TGI (Text Generation Inference), TensorRT-LLM, llama.cpp. When to use each
3.2.2 Technical Lab
Deploy a model using vLLM. Measure TTFT and ITL under different load conditions. Experiment with batch sizes and observe throughput changes. Compare quantized vs. full-precision model performance. Profile GPU memory usage as context length increases.
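A back-of-the-envelope memory estimate helps explain what the lab measurements will show. The sketch below assumes an illustrative 7B-class configuration (32 layers, 8 KV heads, head dimension 128), not the exact spec of any particular model.

```python
# Rough GPU memory estimate for serving a decoder-only model.
# Architecture numbers below are illustrative, not an exact spec.

def weight_memory_gb(n_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weights only (e.g., 2 bytes/param for FP16/BF16, ~0.5 for INT4)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x tokens x batch x bytes."""
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return elems * bytes_per_elem / 1e9

if __name__ == "__main__":
    # Illustrative 7B-class configuration: 32 layers, 8 KV heads, head_dim 128.
    print(f"weights (FP16): {weight_memory_gb(7):.1f} GB")
    print(f"KV cache, 8k context, batch 16: {kv_cache_gb(32, 8, 128, 8192, 16):.1f} GB")
```

The calculation makes concrete why the KV cache, not just the weights, dominates memory as context length and batch size grow.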
3.2.3 Manager Module
Capacity planning for LLM workloads. Understanding SLAs: what latency guarantees are
achievable. Cost-performance tradeoffs for different hardware configurations. When to use
managed APIs vs. self-hosted infrastructure.
3.3 Week 5: Structured Outputs & Reliability
LLMs generate unstructured text by default. Production systems require structured, parseable
outputs. This week covers techniques for reliable structured generation.
3.3.1 Topics Covered
• JSON mode and schema enforcement: Native JSON modes (OpenAI, Anthropic), constrained decoding, grammar-based generation
• Parsing strategies: Regex extraction, partial JSON parsing, streaming parsers, graceful degradation when parsing fails
• Retry and fallback patterns: Exponential backoff, reformatting prompts on failure, fallback to simpler output formats
• Validation layers: Pydantic models, JSON Schema validation, custom validators for domain-specific constraints
• Streaming structured outputs: Incremental parsing, handling partial results, UI considerations for streaming
• Error budgets: Acceptable failure rates, monitoring parse failures, alerting thresholds
3.3.2 Technical Lab
Build a robust JSON extraction pipeline with: schema validation, streaming support, retry logic,
and fallback parsing. Test against adversarial inputs that typically break naive implementations.
Implement monitoring for parse failure rates.
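A sketch of the defensive extraction layer this lab builds: strict Pydantic validation first, a code-fence and bracket-matching fallback second, then a reformatting retry. The `Invoice` schema and `call_model` wrapper are hypothetical stand-ins for whatever schema and client the lab uses.

```python
import json
import re
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):          # hypothetical target schema
    vendor: str
    total: float
    currency: str

def extract_json_block(text: str) -> str:
    """Pull the first {...} block out of prose or a fenced code block."""
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", text, re.DOTALL)
    if fenced:
        return fenced.group(1)
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    return brace.group(0) if brace else text

def parse_invoice(call_model, prompt: str, max_retries: int = 2) -> Invoice | None:
    for _ in range(max_retries + 1):
        raw = call_model(prompt)                      # hypothetical model call
        try:
            return Invoice.model_validate_json(extract_json_block(raw))
        except (ValidationError, json.JSONDecodeError):
            # Reformatting retry: restate the schema and ask for JSON only.
            prompt = (
                "Return ONLY valid JSON matching this schema: "
                f"{json.dumps(Invoice.model_json_schema())}\n\n{prompt}"
            )
    return None  # caller decides the fallback (e.g., route to human review)
```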
3.3.3 Manager Module
Defining reliability requirements for LLM-powered features. Setting appropriate SLAs for structured output success rates. Understanding when best-effort parsing is acceptable vs. when guaranteed structure is required.
3.4 Week 6: Function Calling & Tool Use
Enable LLMs to take actions in the world by calling functions and APIs. This is foundational
for agentic systems covered later.
3.4.1 Topics Covered
• Function calling APIs: OpenAI function calling, Anthropic tool use, open-model approaches. Syntax and capabilities of each
• Tool definition best practices: Clear descriptions, well-defined parameters, examples in documentation, handling optional parameters
• Execution safety: Sandboxing tool execution, permission models, preventing unintended actions, audit logging
• Multi-tool orchestration: When the model needs multiple tools, parallel vs. sequential execution, handling tool dependencies
• Error handling: What happens when tools fail, propagating errors to the model, retry strategies
• Human-in-the-loop patterns: Confirmation prompts for high-stakes actions, approval workflows, escalation paths
3.4.2 Technical Lab
Build a multi-tool agent that can: query a database, call external APIs, perform calculations,
and generate reports. Implement permission controls and audit logging. Test with adversarial
prompts that attempt to misuse tools.
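A minimal sketch of the permission and audit scaffolding the lab calls for: a tool registry, an approval gate for high-stakes tools, and an audit log. Tool names, stub data, and the approval flag are illustrative.

```python
import json
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("tool-audit")

TOOLS: dict[str, dict] = {}

def register_tool(name: str, description: str, requires_approval: bool = False):
    def wrap(fn: Callable):
        TOOLS[name] = {"fn": fn, "description": description,
                       "requires_approval": requires_approval}
        return fn
    return wrap

@register_tool("lookup_order", "Fetch an order record by ID")
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}   # stub data

@register_tool("issue_refund", "Refund an order", requires_approval=True)
def issue_refund(order_id: str, amount: float) -> dict:
    return {"order_id": order_id, "refunded": amount}    # stub action

def dispatch(tool_name: str, arguments: dict, approved: bool = False) -> str:
    """Execute a model-requested tool call, enforcing the approval gate."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return json.dumps({"error": f"unknown tool {tool_name!r}"})
    if tool["requires_approval"] and not approved:
        audit.info("blocked %s(%s): approval required", tool_name, arguments)
        return json.dumps({"error": "human approval required"})
    audit.info("executing %s(%s)", tool_name, arguments)
    return json.dumps(tool["fn"](**arguments))
```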
3.4.3 Manager Module
Designing safe tool-calling systems for enterprise environments. Defining which actions require
human approval. Compliance implications of LLM-initiated actions. Audit and accountability
requirements.
4 Phase III: Prompt Engineering & Model Selection
Weeks 7-10
4.1 Week 7: Prompt Engineering Fundamentals
Prompt engineering is the primary lever for controlling LLM behavior. This week covers systematic approaches to prompt design.
4.1.1 Topics Covered
• Prompt anatomy: System prompts, user prompts, assistant prefills. How each component influences model behavior
• Instruction design: Clarity, specificity, avoiding ambiguity. Positive vs. negative instructions. Ordering effects
• Few-shot prompting: Example selection, example ordering, format consistency, when few-shot helps vs. hurts
• Chain-of-thought (CoT): Eliciting reasoning steps, when CoT improves accuracy, when it doesn't help, zero-shot CoT
• Output formatting: Specifying format in prompts, XML tags, markdown structure, controlling verbosity
• Prompt sensitivity: How small changes affect outputs, strategies for robust prompts, testing for sensitivity
4.1.2 Technical Lab
Develop prompts for a complex extraction task (e.g., extracting structured data from legal
documents). Iterate through multiple prompt versions, measuring accuracy on a held-out test
set. Document which techniques improved performance and which didn't.
4.1.3 Manager Module
Prompt engineering as an ongoing operational concern, not a one-time setup. Version control
and testing for prompts. Building prompt libraries and sharing across teams.
4.2 Week 8: Advanced Prompt Techniques
Beyond the basics: techniques for complex tasks, multi-step reasoning, and pushing model
capabilities.
4.2.1 Topics Covered
• Self-consistency and majority voting: Sampling multiple responses, aggregating answers, when this improves reliability
• Decomposition strategies: Breaking complex tasks into subtasks, prompt chaining, managing state across prompts
• Reflection and self-critique: Asking models to verify their own outputs, iterative refinement, limitations of self-correction
• Role prompting: Persona assignment, expertise simulation, when role prompting helps vs. introduces bias
• Adversarial prompting: Jailbreaks, prompt injection, understanding attack vectors to build defenses
• Prompt compression: Reducing token usage while preserving effectiveness, distillation of long prompts
4.2.2 Technical Lab
Implement a multi-step reasoning pipeline for a complex analytical task (e.g., financial report analysis with numerical reasoning). Compare single-prompt vs. chained-prompt approaches. Implement self-consistency voting and measure accuracy improvements.
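The voting step itself is small. A sketch under the assumption that `sample_answer` is a hypothetical wrapper that samples one completion at a nonzero temperature and extracts its final answer:

```python
from collections import Counter

def self_consistent_answer(sample_answer, question: str, n: int = 9) -> tuple[str, float]:
    """Return (majority answer, agreement ratio) across n sampled completions."""
    answers = [sample_answer(question) for _ in range(n)]   # e.g., temperature=0.7
    (winner, count), = Counter(answers).most_common(1)
    return winner, count / n

# Usage sketch: treat low agreement as a signal to escalate or re-prompt.
# answer, agreement = self_consistent_answer(sample_answer, "What is 17 * 24?")
# if agreement < 0.6:
#     route_to_fallback()   # hypothetical escalation path
```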
4.2.3 Manager Module
Cost implications of advanced prompting techniques (self-consistency requires N× API calls).
Latency tradeoffs for multi-step pipelines. When the complexity is worth it vs. when simpler approaches suffice.
4.3 Week 9: Model Selection & Vendor Evaluation
Choosing the right model for a use case is critical. This week provides frameworks for systematic
model evaluation and vendor comparison.
4.3.1 Topics Covered
• The model landscape: GPT-4/4o, Claude 3.5/Opus, Gemini Pro/Ultra, Llama 3, Mistral, Qwen, Command R+. Capabilities, pricing, and positioning of each
• Benchmark interpretation: MMLU, HumanEval, GSM8K, HellaSwag. What benchmarks measure, their limitations, why benchmark performance doesn't always predict production performance
• Task-specific evaluation: Building evaluation sets for your use case, avoiding contamination, statistical significance of comparisons
• Cost-capability tradeoffs: When smaller/cheaper models suffice, tiered model strategies, routing between models
• Vendor risk assessment: API stability, pricing changes, deprecation policies, data handling practices, geographic availability
• Open vs. closed models: Control, customization, and compliance tradeoffs. When open-weight models are appropriate
4.3.2 Technical Lab
Design and execute a model comparison for a specific use case. Build a test set with ground truth labels. Evaluate 3+ models on accuracy, latency, and cost. Present findings with statistical confidence intervals.
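One way to report the required confidence intervals is a bootstrap over the per-item results; the sketch below uses made-up results for a 200-item test set.

```python
import numpy as np

def bootstrap_accuracy_ci(correct: list[bool], n_boot: int = 10_000,
                          alpha: float = 0.05, seed: int = 0):
    """Confidence interval on accuracy via bootstrap resampling of test items."""
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(correct, dtype=float)
    boot = rng.choice(outcomes, size=(n_boot, len(outcomes)), replace=True).mean(axis=1)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return outcomes.mean(), (lo, hi)

# Example with made-up per-item results for one model.
results = [True] * 162 + [False] * 38
acc, (lo, hi) = bootstrap_accuracy_ci(results)
print(f"accuracy = {acc:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Overlapping intervals between two models are a hint that the test set is too small to declare a winner.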
4.3.3 Manager Module
Building a model selection framework for your organization. Vendor negotiation and contract
considerations. Multi-vendor strategies to reduce lock-in. Procurement processes for AI services.
4.4 Week 10: Evaluation & Testing Frameworks
Rigorous evaluation is essential for production LLM systems. This week covers building comprehensive testing infrastructure.
4.4.1 Topics Covered
• Evaluation metrics: Exact match, BLEU, ROUGE, BERTScore, semantic similarity. When each metric is appropriate and what each actually measures
• LLM-as-judge: Using LLMs to evaluate LLM outputs, designing rubrics, calibration, known biases (verbosity, position)
• Human evaluation: When human eval is necessary, designing annotation tasks, inter-rater reliability, cost-effective sampling strategies
• Regression testing: Detecting quality degradation over time, building regression suites, automated alerts
• A/B testing for LLMs: Statistical considerations, handling high variance in LLM outputs, minimum sample sizes
• Red teaming: Systematic adversarial testing, coverage strategies, tracking and triaging findings
4.4.2 Technical Lab
Build a comprehensive evaluation harness including: automated metrics, LLM-as-judge scoring,
regression tracking, and alerting. Integrate with CI/CD to run evaluations on prompt or model
changes.
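A sketch of the CI-facing regression gate: run the golden set after any prompt or model change and block the change if accuracy drops past a tolerance. The eval file path, the baseline, the tolerance, and the `predict` stand-in are all placeholders for a team's own values.

```python
import json
import sys

def exact_match(prediction: str, reference: str) -> bool:
    return prediction.strip().lower() == reference.strip().lower()

def run_regression(predict, eval_path: str, baseline: float = 0.85,
                   tolerance: float = 0.02) -> bool:
    """`predict` maps an input string to the system's output; eval file is JSONL."""
    with open(eval_path) as f:
        cases = [json.loads(line) for line in f]
    score = sum(exact_match(predict(c["input"]), c["expected"]) for c in cases) / len(cases)
    passed = score >= baseline - tolerance
    print(f"accuracy={score:.3f} baseline={baseline:.3f} -> {'PASS' if passed else 'FAIL'}")
    return passed

if __name__ == "__main__":
    # Wire into CI: exit nonzero on regression so the pipeline blocks the change.
    identity = lambda x: x          # stand-in for the real system under test
    sys.exit(0 if run_regression(identity, "evals/golden_set.jsonl") else 1)
```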
4.4.3 Manager Module
Operationalizing continuous evaluation. Building evaluation review boards. Defining quality gates for deployment. Communicating evaluation results to non-technical stakeholders.
5 Phase IV: Retrieval, Customization & Alignment
Weeks 11-15
5.1 Week 11: Failure Modes & Defensive Design
Before building retrieval and customization systems, understand how they fail. This week's
placement ensures defensive thinking is applied to all subsequent work.
5.1.1 Topics Covered
• Hallucination taxonomy: Factual errors, entity confusion, temporal confusion, fabricated citations, confident incorrectness. Causes of each type
• Retrieval failures: Missing relevant documents, retrieving irrelevant documents, retrieval contamination, embedding space limitations
• Context poisoning: When retrieved context leads to worse outputs, contradictory sources, outdated information
• Prompt injection and jailbreaks: Attack vectors, real-world examples, why these are hard to fully prevent
• Semantic drift: How model behavior changes over time, API updates, silent regressions
• Defensive design patterns: Graceful degradation, uncertainty communication, human escalation triggers, output validation
5.1.2 Technical Lab
Build a failure detection system: classify model outputs for hallucination indicators, implement
uncertainty scoring, create alerting for anomalous outputs. Red-team your own system and
document attack vectors.
5.1.3 Manager Module
Incident response planning for LLM failures. Defining severity levels. Communication templates for stakeholders. Post-mortem processes. Building organizational awareness of failure modes.
5.2 Week 12: RAG Foundations
Retrieval-Augmented Generation (RAG) is the most common pattern for grounding LLMs in
organizational knowledge. This week covers the fundamentals.
5.2.1 Topics Covered
• Embedding models: OpenAI embeddings, Cohere Embed, open-source alternatives (E5, BGE, GTE). Dimensionality, performance, and cost tradeoffs
• Similarity metrics: Cosine similarity, dot product, Euclidean distance. When each is appropriate, normalization considerations
• Chunking strategies: Fixed-size chunks, semantic chunking, sentence-based, paragraph-based, document structure-aware chunking
• Vector databases: Pinecone, Weaviate, Qdrant, Milvus, pgvector. Architecture, indexing algorithms (HNSW, IVF), managed vs. self-hosted
• Basic RAG pipeline: Query embedding → similarity search → context injection → generation. End-to-end implementation
• RAG evaluation: Retrieval quality (recall@k, MRR), end-to-end accuracy, attribution correctness
5.2.2 Technical Lab
Build a complete RAG pipeline: document ingestion, chunking, embedding, vector storage, retrieval, and generation. Benchmark retrieval quality with different chunking strategies and embedding models on a labeled dataset.
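A bare-bones, in-memory sketch of the retrieval and context-injection steps: embed the chunks, embed the query, rank by cosine similarity, and assemble a grounded prompt. The `embed` function is a hypothetical stand-in for any embedding model; a vector database replaces the brute-force search at scale.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k most similar document chunks."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(scores)[::-1][:k]

def build_rag_prompt(embed, question: str, chunks: list[str], k: int = 3) -> str:
    doc_vecs = np.vstack([embed(c) for c in chunks])   # embed() is hypothetical
    top = cosine_top_k(embed(question), doc_vecs, k)
    context = "\n\n".join(chunks[i] for i in top)
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
```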
5.2.3 Manager Module
RAG infrastructure decisions: vector database selection, managed vs. self-hosted, cost modeling.
Data ingestion pipelines and update strategies. When RAG is the right solution vs. alternatives.
5.3 Week 13: Advanced RAG & Hybrid Search
Production RAG systems require more than basic semantic search. This week covers techniques that significantly improve retrieval quality.
5.3.1 Topics Covered
• Hybrid search: Combining vector search with keyword search (BM25), fusion strategies, when hybrid outperforms pure semantic search
• Query transformation: Query expansion, hypothetical document embeddings (HyDE), query decomposition for complex questions
• Reranking: Cross-encoder rerankers (Cohere Rerank, BGE Reranker), ColBERT, when reranking provides lift
• Hierarchical retrieval: Document summaries → full documents, multi-stage retrieval, parent-child chunk relationships
• Structured data retrieval: Text-to-SQL, knowledge graphs, combining structured and unstructured retrieval
• Metadata filtering: Using document metadata to scope searches, access control integration, temporal filtering
5.3.2 Technical Lab
Extend the Week 12 RAG pipeline with: hybrid search, reranking, and query expansion. Measure
precision/recall improvements at each stage. Implement a text-to-SQL component for structured
data queries.
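A common fusion strategy for the hybrid-search step is Reciprocal Rank Fusion (RRF), which combines rankings without normalizing their raw scores; the document IDs below are made up, and k=60 is a commonly used default.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 10):
    """Fuse several best-first ranked lists: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in fused[:top_n]]

# Usage sketch with made-up BM25 and vector result lists:
bm25_hits = ["doc7", "doc2", "doc9", "doc4"]
vector_hits = ["doc2", "doc7", "doc1", "doc3"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits], top_n=3))
```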
5.3.3 Manager Module
Justifying infrastructure complexity: when advanced RAG techniques are worth the added cost
and maintenance. Performance vs. cost tradeoffs for reranking. Build vs. buy decisions for
search infrastructure.
5.4 Week 14: Fine-Tuning Fundamentals
Fine-tuning is often overused. This week covers when fine-tuning is appropriate and how to do it correctly when it is.
5.4.1 Topics Covered
• When to fine-tune: Style/format adaptation, specialized domains, performance-critical latency reduction. When RAG or prompting suffices instead
• When NOT to fine-tune: Adding factual knowledge (use RAG), quick experiments, rapidly changing information, limited data
• Fine-tuning methods: Full fine-tuning, LoRA, QLoRA, DoRA. Parameter efficiency, memory requirements, quality tradeoffs
• Data preparation: Instruction format, quality over quantity, data cleaning, deduplication, balance across task types
• Training dynamics: Learning rates, batch sizes, epochs, early stopping, catastrophic forgetting, overfitting detection
• Evaluation during training: Held-out test sets, monitoring for capability regression, comparing to base model
5.4.2 Technical Lab
Fine-tune a small model (7B parameters) using QLoRA for a domain-specific task. Prepare a high-quality training dataset. Compare fine-tuned model performance against base model + few-shot prompting + RAG approaches.
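A sketch of the core QLoRA setup, assuming the transformers/PEFT/bitsandbytes stack listed in Appendix C. The model name, adapter rank, and target modules are illustrative starting points rather than recommended settings, and the exact APIs may shift between library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"         # any 7B-class causal LM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                           # 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,      # small trainable adapters
    target_modules=["q_proj", "v_proj"],         # attention projections only
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()               # typically well under 1% of weights
```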
5.4.3 Manager Module
ROI analysis for fine-tuning projects. GPU compute budgeting. Maintenance burden of fine-tuned models. When to use vendor fine-tuning APIs vs. self-managed training.
5.5 Week 15: Safety, Alignment & Guardrails
Deploying LLMs in production requires systematic approaches to safety and alignment with
organizational values.
5.5.1 Topics Covered
• Alignment techniques: RLHF (Reinforcement Learning from Human Feedback), RLAIF, Constitutional AI, DPO (Direct Preference Optimization)
• Content moderation: Input filtering, output filtering, content classifiers, toxicity detection, PII detection
• Guardrail frameworks: NeMo Guardrails, Guardrails AI, custom rule engines. Architecture patterns for guardrail integration
• Jailbreak mitigation: Known attack patterns, defense strategies, why perfect defense is impossible, monitoring for novel attacks
• Topic and scope control: Keeping models on-topic, handling out-of-scope requests gracefully, defining boundaries
• Bias and fairness: Testing for demographic biases, mitigation strategies, documentation requirements
5.5.2 Technical Lab
Build a safety pipeline including: input classifier, topic guardrails, output moderation, and PII scrubbing. Red-team the system with jailbreak attempts. Implement monitoring and alerting for safety violations.
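A minimal sketch of the pre-generation guardrail layer: block clearly disallowed or out-of-scope requests before they reach the model. The blocked patterns and allowed topics are placeholders; production systems layer an ML classifier on top of keyword rules like these.

```python
import re
from dataclasses import dataclass

BLOCKED_PATTERNS = [r"\bcredit card number\b", r"\bexploit\b", r"\bjailbreak\b"]
ALLOWED_TOPICS = {"billing", "orders", "shipping", "returns"}

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

def check_input(user_message: str, detected_topic: str) -> GuardrailResult:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, user_message, re.IGNORECASE):
            return GuardrailResult(False, f"matched blocked pattern {pattern!r}")
    if detected_topic not in ALLOWED_TOPICS:
        return GuardrailResult(False, f"topic {detected_topic!r} is out of scope")
    return GuardrailResult(True)

# Usage sketch: refuse gracefully instead of forwarding to the model.
result = check_input("How do I jailbreak the assistant?", detected_topic="other")
print(result)
```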
5.5.3 Manager Module
Drafting an AI Safety Review Process for enterprise deployment. Defining acceptable use policies. Establishing review boards. Incident response for safety violations. Communicating safety posture to stakeholders.
6 Phase V: Enterprise Deployment Architecture
Weeks 16-20
6.1 Week 16: Production Infrastructure
Moving from prototype to production requires robust infrastructure. This week covers the
systems engineering required for reliable LLM deployments.
6.1.1 Topics Covered
• Orchestration platforms: Kubernetes for LLM workloads, Ray Serve, Modal, Anyscale. Scheduling and resource management for GPU workloads
• Inference optimization: vLLM, TensorRT-LLM, speculative decoding, continuous batching. Maximizing throughput while meeting latency SLAs
• Scaling patterns: Horizontal scaling, tensor parallelism, pipeline parallelism. Auto-scaling based on load
• Load balancing: Request routing for LLM services, session affinity for stateful conversations, handling long-running requests
• Caching strategies: Semantic caching, embedding caches, response caches. Cache invalidation for dynamic content
• Cost optimization: Spot instances for batch workloads, reserved capacity planning, model routing to optimize cost/quality
6.1.2 Technical Lab
Deploy vLLM on Kubernetes with auto-scaling. Implement load testing with realistic traffic patterns. Measure and optimize for latency percentiles (p50, p95, p99). Implement semantic caching and measure cache hit rates.
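A sketch of the semantic-cache idea: reuse a prior response when a new query embeds close enough to an earlier one. The `embed` function is a hypothetical embedding call, and the 0.92 threshold is a placeholder that needs tuning per workload.

```python
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def _similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def get(self, query: str) -> str | None:
        q = self.embed(query)
        for key, value in zip(self.keys, self.values):
            if self._similarity(q, key) >= self.threshold:
                return value                       # cache hit
        return None

    def put(self, query: str, response: str) -> None:
        self.keys.append(self.embed(query))
        self.values.append(response)

# Usage sketch:
# cached = cache.get(user_query)
# if cached is None:
#     cached = call_model(user_query)   # hypothetical model call
#     cache.put(user_query, cached)
```

The linear scan is fine for a sketch; a production cache would store keys in the same vector index used for RAG.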
6.1.3 Manager Module
TCO modeling: on-premises vs. cloud vs. hybrid. Reserved vs. on-demand pricing strategies. Capacity planning for growth. SLA definition and monitoring.
6.2 Week 17: Monitoring, Observability & Debugging
LLM systems require specialized monitoring beyond traditional application observability. This
week covers the unique challenges of LLM operations.
6.2.1 Topics Covered
• LLM-specific metrics: Token throughput, latency distributions, cost per request, cache hit rates, error rates by type
• Quality monitoring: Tracking output quality over time, drift detection, automated quality scoring, regression alerts
• Tracing and debugging: End-to-end request tracing, prompt version tracking, debugging multi-step chains, root cause analysis
• Logging best practices: What to log (prompts, responses, latencies), PII handling in logs, log retention policies
• Observability platforms: LangSmith, Weights & Biases, Arize, custom solutions. What each provides, integration patterns
• Alerting strategies: Defining alert thresholds for LLM systems, avoiding alert fatigue, on-call considerations
6.2.2 Technical Lab
Build a complete observability stack for an LLM application: metrics collection, tracing, logging, quality monitoring, and alerting. Create dashboards for operational visibility. Implement automated quality regression detection.
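A sketch of the LLM-specific aggregation the dashboards need: latency percentiles, cost per request, and a parse-failure alert threshold. The record field names and the alert thresholds are illustrative.

```python
import numpy as np

def summarize_requests(records: list[dict]) -> dict:
    """Each record: {"latency_ms": float, "cost_usd": float, "parse_ok": bool}."""
    latencies = np.array([r["latency_ms"] for r in records])
    summary = {
        "p50_ms": float(np.percentile(latencies, 50)),
        "p95_ms": float(np.percentile(latencies, 95)),
        "p99_ms": float(np.percentile(latencies, 99)),
        "avg_cost_usd": float(np.mean([r["cost_usd"] for r in records])),
        "parse_failure_rate": 1 - sum(r["parse_ok"] for r in records) / len(records),
    }
    # Placeholder alert rule: flag elevated parse failures or slow p95.
    summary["alert"] = summary["parse_failure_rate"] > 0.02 or summary["p95_ms"] > 4000
    return summary

# Example with synthetic telemetry:
fake = [{"latency_ms": 800 + 40 * i, "cost_usd": 0.004, "parse_ok": i % 25 != 0}
        for i in range(100)]
print(summarize_requests(fake))
```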
6.2.3 Manager Module
Defining SLOs for LLM systems. Incident management processes. On-call structures for AI systems. Building operational runbooks. Communicating system health to stakeholders.
6.3 Week 18: Data Governance, Security & Compliance
Enterprise LLM deployment requires careful attention to data handling, security, and regulatory
compliance.
6.3.1 Topics Covered
• Data residency: Where data is processed and stored, geographic requirements, cross-border data transfer implications
• PII handling: Detection, masking, anonymization, pseudonymization. Handling PII in prompts, responses, and logs
• Access control: Authentication, authorization, row-level security for RAG, audit trails
• Regulatory frameworks: GDPR, CCPA, HIPAA, SOC2, PCI-DSS. What each requires for AI systems
• AI-specific regulations: EU AI Act, emerging US regulations, industry-specific requirements (finance, healthcare)
• Vendor security assessment: Evaluating API provider security, data handling agreements, right to audit
6.3.2 Technical Lab
Build a compliance-aware gateway: PII detection and masking, audit logging, access control integration, data residency enforcement. Implement prompt and response filtering for sensitive content.
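A sketch of the masking step, using regexes for obvious PII patterns. These patterns are illustrative and catch only clear-cut cases; production gateways typically layer NER-based detection on top.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> tuple[str, dict[str, int]]:
    """Replace detected PII with typed placeholders and count what was found."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts

masked, found = mask_pii("Contact jane.doe@example.com or 555-867-5309 about SSN 123-45-6789.")
print(masked)   # Contact [EMAIL] or [PHONE] about SSN [SSN].
print(found)
```

Running the same scrubber over prompts, responses, and log lines keeps the audit trail useful without retaining raw identifiers.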
6.3.3 Manager Module
Designing compliance checkpoints for GenAI workflows. Working with legal and compliance teams. Documentation requirements. Preparing for audits. Vendor risk management for AI providers.
6.4 Week 19: Agentic Systems
Agentic systems (LLMs that plan, use tools, and take autonomous actions) are increasingly important but require careful design. This week provides a deep dive rather than surface coverage.
6.4.1 Topics Covered
• Agent architectures: ReAct (Reasoning + Acting), Plan-and-Execute, Tree of Thoughts. When each pattern is appropriate
• Agent frameworks: LangGraph, AutoGen, CrewAI, custom implementations. Framework selection criteria, avoiding over-abstraction
• State management: Tracking agent state across steps, persistence, recovery from failures, conversation memory
• Multi-agent systems: Agent coordination, task delegation, conflict resolution, communication protocols
• Guarded autonomy: Setting boundaries on agent actions, approval workflows, rollback mechanisms, kill switches
• Agent failure modes: Infinite loops, goal drift, unintended actions, cascading failures. Detection and mitigation
6.4.2 Technical Lab
Build a multi-agent system for a realistic enterprise workflow (e.g., research assistant with planner, researcher, and writer agents). Implement state persistence, approval gates, and failure recovery. Test with scenarios designed to trigger failure modes.
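A sketch of the guarded agent loop underneath these requirements: bounded iterations, an approval gate for high-stakes tools, and explicit termination. `plan_next_step`, `execute_tool`, and `approve` are hypothetical callables supplied by the lab's own framework choice.

```python
# Guarded agent loop sketch: step budget + approval gate + explicit stop.
MAX_STEPS = 8
HIGH_STAKES_TOOLS = {"send_email", "issue_refund"}   # illustrative tool names

def run_agent(plan_next_step, execute_tool, approve, task: str) -> str:
    scratchpad: list[str] = [f"Task: {task}"]
    for step in range(MAX_STEPS):
        # plan_next_step returns {"tool": ..., "args": ...} or {"final": ...}
        action = plan_next_step(scratchpad)
        if "final" in action:
            return action["final"]
        if action["tool"] in HIGH_STAKES_TOOLS and not approve(action):
            scratchpad.append(f"Step {step}: {action['tool']} denied by reviewer")
            continue
        observation = execute_tool(action["tool"], action["args"])
        scratchpad.append(f"Step {step}: {action['tool']} -> {observation}")
    return "Stopped: step budget exhausted without a final answer."   # kill switch
```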
6.4.3 Manager Module
What tasks should never be delegated to autonomous AI. Liability considerations for agent
actions. Approval workflows and human oversight requirements. Building organizational trust
in agentic systems incrementally.
6.5 Week 20: Enterprise Integrations
Connecting LLM systems to enterprise infrastructure: CRMs, ERPs, data warehouses, and
collaboration tools.
6.5.1 Topics Covered
• Integration patterns: REST APIs, GraphQL, message queues (Kafka, Pub/Sub), webhooks. Choosing the right pattern for each integration
• Common enterprise systems: Salesforce, ServiceNow, SAP, Workday, Jira, Confluence. Integration approaches for each
• Data synchronization: Keeping RAG indices in sync with source systems, change data capture, incremental updates
• Authentication and authorization: OAuth flows, service accounts, API key management, propagating user identity
• Error handling and resilience: Handling downstream system failures, circuit breakers, fallback strategies
• Event-driven architectures: Triggering LLM workflows from business events, async processing, result delivery
6.5.2 Technical Lab
Build a full-stack LLM microservice that: ingests documents from a simulated enterprise system,
maintains a RAG index, responds to queries via REST API, and pushes results to a message
queue. Implement proper error handling and monitoring.
6.5.3 Manager Module
Building AI Centers of Excellence. Integration roadmap planning. Change management for AI-augmented workflows. Stakeholder communication across IT, business, and compliance teams.
7 Phase VI: Governance, Risk & Capstone
Weeks 21-24
7.1 Week 21: AI Governance Frameworks
Sustainable AI deployment requires organizational governance structures. This week covers
building governance frameworks that enable responsible AI use.
7.1.1 Topics Covered
• Documentation standards: Model cards, system cards, datasheets for datasets. What to document and why
• Review processes: AI review boards, sign-off workflows, risk scoring matrices, exception processes
• Risk classification: Categorizing AI use cases by risk level, tailoring controls to risk, high-risk vs. low-risk workflows
• Policy development: Acceptable use policies, data usage policies, model training policies, deployment policies
• Vendor governance: Third-party AI risk management, contract requirements, ongoing vendor monitoring
• External stakeholders: Communicating AI governance to regulators, auditors, investors, customers
7.1.2 Technical Lab
Create a complete governance package for an AI system: model card, system card, risk assessment, and deployment checklist. Build a governance dashboard that tracks AI deployments across the organization.
7.1.3 Manager Module
Creating governance frameworks that satisfy regulators, auditors, and boards. Balancing governance overhead with deployment velocity. Building governance that scales with AI adoption.
7.2 Week 22: AI Products & Organizations
Building sustainable AI capability requires appropriate organizational structures, team compositions, and product thinking.
7.2.1 Topics Covered
• AI team structures: Centralized vs. embedded teams, MLOps/LLMOps roles, prompt engineers, AI product managers. Skills and responsibilities for each
• Build vs. buy decisions: When to use vendor solutions, when to build custom, hybrid approaches. Decision frameworks
• AI product development: User research for AI features, iterative development, handling uncertainty in AI capabilities
• Success metrics: Defining and measuring AI product success, leading vs. lagging indicators, avoiding vanity metrics
• Change management: Introducing AI to existing workflows, training end users, managing resistance, celebrating wins
• Multi-year planning: AI capability roadmaps, technology evolution, skill development, budget planning
7.2.2 Technical Lab
Develop a comprehensive AI product proposal: problem definition, solution architecture, success metrics, risk assessment, resource requirements, and rollout plan. Present to peer groups for critique.
7.2.3 Manager Module
Executive-level planning for multi-year AI adoption. Building the business case for AI investment. Hiring and developing AI talent. Creating a culture of responsible AI innovation.
7.3 Weeks 23-24: Capstone Project
The capstone integrates all program content into a complete, enterprise-ready GenAI system.
Teams work on realistic scenarios with full technical and governance requirements.
7.3.1 Capstone Requirements
Teams of 3-5 participants deliver an end-to-end solution including:
Technical Components:
• Advanced RAG with hybrid search and reranking
• Structured output parsing with validation
• Function calling / tool use
• Safety classifiers and guardrails
• API gateway with authentication
• Comprehensive logging, monitoring, and alerting
• Evaluation harness with regression testing
Governance Documentation:
• System card with risk assessment
• Compliance checklist
• Deployment runbook
• Incident response plan
Business Case:
• Cost-benefit analysis
• Success metrics and measurement plan
• Rollout strategy
7.3.2 Capstone Presentation
Teams present to a panel simulating a CTO/board review:
• Technical deep dive: Architecture decisions, implementation challenges, performance results
• Live demonstration: Working system handling realistic scenarios, including edge cases
• Business justification: ROI analysis, risk mitigation, alignment with organizational goals
• Q&A: Defending decisions, handling objections, demonstrating depth of understanding
8 Appendix A: What Makes This Curriculum Rigorous
This curriculum is designed to produce genuine competence, not just exposure. Key design
principles:
• Realistic pacing: 24 weeks allows topics to be covered at the depth required for production deployment, rather than rushing through material
• Prerequisite calibration: Week 0 diagnostic ensures participants have foundational skills and provides remediation resources for those who need them
• Failure modes first: Week 11 covers failure modes before building RAG and fine-tuning systems, ensuring defensive design thinking is applied throughout
• Prompt engineering as a discipline: Dedicated weeks (7-8) treat prompt engineering as a systematic practice, not an afterthought
• Model selection framework: Week 9 provides structured approaches to vendor evaluation rather than assuming model choice is given
• Fine-tuning skepticism: Week 14 emphasizes when not to fine-tune, countering the common tendency to over-apply this technique
• Agentic systems depth: Week 19 provides a full week on agents rather than surface coverage, reflecting the complexity and risk of autonomous AI systems
• Cost awareness throughout: Token costs, infrastructure costs, and ROI analysis are integrated into technical discussions from Week 1
• Governance integrated: Compliance and governance appear throughout, not just in a final governance week
• Dual-track fidelity: Manager modules provide substantive content on architecture, cost, and organizational decisions, not watered-down versions of technical material
9 Appendix B: Suggested Reading Resources
9.1 Foundational Papers
• Vaswani et al., "Attention Is All You Need" (2017): Transformer architecture
• Brown et al., "Language Models are Few-Shot Learners" (2020): GPT-3 and in-context learning
• Wei et al., "Chain-of-Thought Prompting" (2022): Reasoning elicitation
• Lewis et al., "Retrieval-Augmented Generation" (2020): RAG foundations
• Hu et al., "LoRA: Low-Rank Adaptation" (2021): Parameter-efficient fine-tuning
9.2 Technical Resources
• Anthropic's Claude documentation and cookbook
• OpenAI's API documentation and best practices guides
• LangChain and LlamaIndex documentation
• vLLM documentation for inference optimization
• Hugging Face Transformers library and model hub
9.3 Enterprise Governance
• NIST AI Risk Management Framework
• EU AI Act text and implementation guidance
• Model Cards for Model Reporting (Mitchell et al., 2019)
• Datasheets for Datasets (Gebru et al., 2021)
10 Appendix C: Lab Environment Requirements
10.1 Compute Resources
• Cloud GPU access: Minimum 1× A100 40GB for fine-tuning labs; 1× A10G or T4 sufficient for inference labs
• Local development: 16GB+ RAM, modern CPU, Docker support
• API credits: $500-$1,000 per participant for commercial API usage across the program
10.2 Software Stack
• Python 3.10+
• PyTorch 2.0+
• Transformers, PEFT, BitsAndBytes for fine-tuning
• LangChain or LlamaIndex for RAG
• vLLM for inference
• Docker, Kubernetes (Minikube or cloud-managed)
• PostgreSQL with pgvector, or managed vector database
10.3 Accounts Required
• OpenAI API account
• Anthropic API account
• Cloud provider account (AWS, GCP, or Azure)
• Hugging Face account
• GitHub account for version control
This curriculum was developed with input from enterprise AI practitioners, reflecting lessons learned from production deployments across industries.