ENTERPRISE GENERATIVE AI
DEPLOYMENT CURRICULUM
A Rigorous & Practical Program
24-Week Comprehensive Training
Dual-Track: Technical Engineers & Technical Managers
Contents
1 Program Overview
  1.1 Program Structure
  1.2 Dual-Track Design
  1.3 Prerequisites
2 Phase I: Foundations & Calibration
  2.1 Week 0: Diagnostic & Cohort Calibration
    2.1.1 Diagnostic Assessment Components
    2.1.2 Remediation Resources
  2.2 Week 1: The LLM Landscape & Mental Models
    2.2.1 Topics Covered
    2.2.2 Technical Lab
    2.2.3 Manager Module
  2.3 Week 2: Transformer Architecture Fundamentals
    2.3.1 Topics Covered
    2.3.2 Technical Lab
    2.3.3 Manager Module
3 Phase II: LLM Systems Engineering
  3.1 Week 3: Tokenization & Context Management
    3.1.1 Topics Covered
    3.1.2 Technical Lab
    3.1.3 Manager Module
  3.2 Week 4: Inference Infrastructure
    3.2.1 Topics Covered
    3.2.2 Technical Lab
    3.2.3 Manager Module
  3.3 Week 5: Structured Outputs & Reliability
    3.3.1 Topics Covered
    3.3.2 Technical Lab
    3.3.3 Manager Module
  3.4 Week 6: Function Calling & Tool Use
    3.4.1 Topics Covered
    3.4.2 Technical Lab
    3.4.3 Manager Module
4 Phase III: Prompt Engineering & Model Selection
  4.1 Week 7: Prompt Engineering Fundamentals
    4.1.1 Topics Covered
    4.1.2 Technical Lab
    4.1.3 Manager Module
  4.2 Week 8: Advanced Prompt Techniques
    4.2.1 Topics Covered
    4.2.2 Technical Lab
    4.2.3 Manager Module
  4.3 Week 9: Model Selection & Vendor Evaluation
    4.3.1 Topics Covered
    4.3.2 Technical Lab
    4.3.3 Manager Module
  4.4 Week 10: Evaluation & Testing Frameworks
    4.4.1 Topics Covered
    4.4.2 Technical Lab
    4.4.3 Manager Module
5 Phase IV: Retrieval, Customization & Alignment
  5.1 Week 11: Failure Modes & Defensive Design
    5.1.1 Topics Covered
    5.1.2 Technical Lab
    5.1.3 Manager Module
  5.2 Week 12: RAG Foundations
    5.2.1 Topics Covered
    5.2.2 Technical Lab
    5.2.3 Manager Module
  5.3 Week 13: Advanced RAG & Hybrid Search
    5.3.1 Topics Covered
    5.3.2 Technical Lab
    5.3.3 Manager Module
  5.4 Week 14: Fine-Tuning Fundamentals
    5.4.1 Topics Covered
    5.4.2 Technical Lab
    5.4.3 Manager Module
  5.5 Week 15: Safety, Alignment & Guardrails
    5.5.1 Topics Covered
    5.5.2 Technical Lab
    5.5.3 Manager Module
6 Phase V: Enterprise Deployment & Architecture
  6.1 Week 16: Production Infrastructure
    6.1.1 Topics Covered
    6.1.2 Technical Lab
    6.1.3 Manager Module
  6.2 Week 17: Monitoring, Observability & Debugging
    6.2.1 Topics Covered
    6.2.2 Technical Lab
    6.2.3 Manager Module
  6.3 Week 18: Data Governance, Security & Compliance
    6.3.1 Topics Covered
    6.3.2 Technical Lab
    6.3.3 Manager Module
  6.4 Week 19: Agentic Systems
    6.4.1 Topics Covered
    6.4.2 Technical Lab
    6.4.3 Manager Module
  6.5 Week 20: Enterprise Integrations
    6.5.1 Topics Covered
    6.5.2 Technical Lab
    6.5.3 Manager Module
7 Phase VI: Governance, Risk & Capstone
  7.1 Week 21: AI Governance Frameworks
    7.1.1 Topics Covered
    7.1.2 Technical Lab
    7.1.3 Manager Module
  7.2 Week 22: AI Products & Organizations
    7.2.1 Topics Covered
    7.2.2 Technical Lab
    7.2.3 Manager Module
  7.3 Weeks 23–24: Capstone Project
    7.3.1 Capstone Requirements
    7.3.2 Capstone Presentation
8 Appendix A: What Makes This Curriculum Rigorous
9 Appendix B: Suggested Reading & Resources
  9.1 Foundational Papers
  9.2 Technical Resources
  9.3 Enterprise & Governance
10 Appendix C: Lab Environment Requirements
  10.1 Compute Resources
  10.2 Software Stack
  10.3 Accounts Required
1 Program Overview
This 24-week program provides comprehensive training in enterprise LLM deployment, structured to build genuine competence rather than surface-level exposure. The expanded timeline addresses the realistic complexity of modern AI systems while maintaining a practical, hands-on focus throughout.
1.1 Program Structure
The curriculum is organized into six phases, each building on the previous:
• Phase I (Weeks 0–2): Foundations & Calibration
• Phase II (Weeks 3–6): LLM Systems Engineering
• Phase III (Weeks 7–10): Prompt Engineering & Model Selection
• Phase IV (Weeks 11–15): Retrieval, Customization & Alignment
• Phase V (Weeks 16–20): Enterprise Deployment & Architecture
• Phase VI (Weeks 21–24): Governance, Risk & Capstone
1.2 Dual-Track Design
Each week includes parallel tracks for different roles:
• Technical Lab: Hands-on implementation exercises for engineers
• Manager Module: Architecture decisions, cost modeling, compliance, and team structure for technical leaders
1.3 Prerequisites
Participants should have:
• Proficiency in Python (comfortable with classes, decorators, async patterns)
• Basic understanding of machine learning concepts (gradient descent, loss functions, overfitting)
• Familiarity with REST APIs and microservice architectures
• Experience with cloud platforms (AWS, GCP, or Azure)
• Basic SQL and data manipulation skills
2 Phase I: Foundations & Calibration
Weeks 0–2
2.1 Week 0: Diagnostic & Cohort Calibration
Before the program begins, participants complete a diagnostic assessment to calibrate cohort
composition and identify areas requiring additional support.
2.1.1 Diagnostic Assessment Components
• Python proficiency test: Async patterns, decorators, context managers, type hints
• ML fundamentals quiz: Gradient descent mechanics, regularization, cross-validation, bias-variance tradeoff
• Systems design exercise: Design a simple API with caching, rate limiting, and error handling
• LLM familiarity survey: Prior experience with ChatGPT, Claude, open-source models, API usage
• Role and goals questionnaire: Current responsibilities, deployment objectives, organizational constraints
2.1.2 Remediation Resources
Based on diagnostic results, participants receive targeted preparation materials:
• Python async/await tutorial and exercises
• ML fundamentals refresher (Andrew Ng's Coursera modules or fast.ai Practical ML)
• PyTorch basics: tensors, autograd, simple neural network training
• Docker and Kubernetes fundamentals for those lacking container experience
2.2 Week 1: The LLM Landscape & Mental Models
Establish accurate mental models for what LLMs are and are not, how they differ from traditional software, and where they fail.
2.2.1 Topics Covered
• What LLMs actually do: Next-token prediction, pattern completion vs. reasoning, the distinction between generating plausible text and computing correct answers
• Capability taxonomy: What LLMs reliably do well (fluent generation, translation, summarization, code completion), what they do unreliably (math, logic, factual recall), what they cannot do (real-time data, guaranteed correctness)
• Failure modes overview: Hallucination, confabulation, instruction-following failures, context window limitations, sensitivity to prompt phrasing
• The API landscape: OpenAI, Anthropic, Google, Cohere, open-weight models (Llama, Mistral, Qwen), when to use each
• Cost structures: Token-based pricing, input vs. output tokens, cost estimation for realistic workloads
2.2.2 Technical Lab
Compare responses from 3+ models (GPT-4, Claude, Llama-3) on identical prompts across: factual questions with verifiable answers, reasoning tasks, creative generation, and code generation. Document failure patterns and model-specific behaviors.
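As a starting point for the lab, a minimal sketch of sending the same prompt to two hosted models, assuming the official openai and anthropic Python SDKs with API keys in the environment; the model names and prompts are illustrative placeholders:

# Send identical prompts to two providers and print the answers side by side.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY

PROMPTS = [
    "In what year was the Treaty of Westphalia signed?",   # verifiable fact
    "A bat and a ball cost $1.10 and the bat costs $1 more than the ball. What does the ball cost?",  # reasoning
]

def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
    resp = openai_client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def ask_anthropic(prompt: str, model: str = "claude-3-5-sonnet-latest") -> str:
    resp = anthropic_client.messages.create(
        model=model, max_tokens=512, messages=[{"role": "user", "content": prompt}]
    )
    return resp.content[0].text

for prompt in PROMPTS:
    print("PROMPT:", prompt)
    print("  OpenAI   :", ask_openai(prompt)[:200])
    print("  Anthropic:", ask_anthropic(prompt)[:200])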
2.2.3 Manager Module
Setting realistic expectations with stakeholders. How to communicate what LLMs can and cannot do. Red flags in vendor claims. Building an initial cost model for a proposed use case.
2.3 Week 2: Transformer Architecture Fundamentals
Build intuition for how transformers work without requiring a full deep learning course. Focus
on the concepts that matter for deployment decisions.
2.3.1 Topics Covered
• Attention mechanism intuition: What attention means, how tokens relate to each other, why context windows exist
• The embedding layer: Tokens as vectors, semantic similarity, why embeddings matter for retrieval
• Positional encoding: How models understand token order, rotary embeddings (RoPE), implications for context length
• Feed-forward networks and layers: What happens between attention layers, why depth matters
• Autoregressive generation: Why generation is sequential, why output length affects latency, the temperature and top-p parameters
• Model sizes and parameter counts: What "7B" or "70B" means, rough relationship between size, capability, and cost
2.3.2 Technical Lab
Visualize attention patterns using BertViz or similar tools. Observe how attention changes with different prompts. Experiment with temperature and sampling parameters to understand their effects on output diversity and quality.
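To make the sampling parameters concrete, a small self-contained sketch (pure NumPy, with made-up logits) of how temperature reshapes a next-token distribution and how top-p keeps only the smallest set of tokens covering the requested probability mass:

import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    z = logits / temperature          # temperature < 1 sharpens, > 1 flattens the distribution
    z = z - z.max()                   # numerical stability
    p = np.exp(z)
    return p / p.sum()

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    order = np.argsort(probs)[::-1]                     # most likely tokens first
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]        # smallest set covering top_p mass
    return kept / kept.sum()

logits = np.array([4.0, 3.5, 2.0, 0.5, -1.0])           # hypothetical scores for 5 tokens
for t in (0.2, 0.7, 1.5):
    print(f"temperature={t}:", np.round(top_p_filter(softmax_with_temperature(logits, t), 0.9), 3))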
2.3.3 Manager Module
How to evaluate model architecture claims from vendors. Understanding the tradeoffs between model size, speed, and capability. Making informed decisions about context window requirements for your use cases.
3 Phase II: LLM Systems Engineering
Weeks 3–6
3.1 Week 3: Tokenization & Context Management
Understand how text becomes tokens, why tokenization choices affect cost and performance, and how to manage context windows effectively.
3.1.1 Topics Covered
• Tokenization algorithms: BPE (Byte-Pair Encoding), SentencePiece, WordPiece, Unigram LM. How each works, why different models use different tokenizers
• Token efficiency: Why some text tokenizes more efficiently than others, implications for non-English languages, code, and structured data
• Context window anatomy: System prompt, conversation history, user input, model output. How to allocate tokens across these components
• Context management strategies: Sliding windows, summarization, hierarchical context, when to truncate vs. summarize
• Cost implications: How tokenization affects billing, optimizing prompts for cost without sacrificing quality
3.1.2 Technical Lab
Build a tokenizer visualization tool that shows token boundaries, counts, and costs for different models. Compare tokenization efficiency across: English prose, code, JSON, non-Latin scripts, and domain-specific terminology. Implement a context manager that tracks token budgets and handles overflow gracefully.
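A minimal sketch of the token-counting part of this lab using tiktoken; the cl100k_base encoding is used by several recent OpenAI models, and other vendors ship different tokenizers, so the counts are illustrative rather than universal:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "english prose": "The quarterly report shows steady growth in enterprise adoption.",
    "python code":   "def add(a: int, b: int) -> int:\n    return a + b",
    "json":          '{"customer_id": 1042, "status": "active", "region": "eu-west-1"}',
    "non-latin":     "本四半期の売上は前年比で着実に増加した。",
}

for label, text in samples.items():
    n_tokens = len(enc.encode(text))
    print(f"{label:14s} chars={len(text):3d} tokens={n_tokens:3d} chars/token={len(text) / n_tokens:.2f}")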
3.1.3 Manager Module
Budgeting token costs for production workloads. Understanding how context window size affects use case feasibility. Evaluating whether a use case requires long-context models (and the cost implications).
3.2 Week 4: Inference Infrastructure
Understand how LLM inference works at the infrastructure level and the factors that determine
latency, throughput, and cost.
3.2.1 Topics Covered
• GPU memory and computation: Why LLMs require GPUs, VRAM requirements for different model sizes, the memory bandwidth bottleneck
• KV-cache mechanics: What the key-value cache stores, why it grows with context length, memory implications for long conversations
• Batching strategies: Static vs. dynamic batching, continuous batching, how batching affects latency vs. throughput tradeoffs
• Quantization: INT8, INT4, FP8, NF4, AWQ, GPTQ. What precision means, quality vs. speed tradeoffs, when quantization is appropriate
• Latency components: Time to first token (TTFT), inter-token latency (ITL), total generation time. What affects each component
• Serving frameworks: vLLM, TGI (Text Generation Inference), TensorRT-LLM, llama.cpp. When to use each
3.2.2 Technical Lab
Deploy a model using vLLM. Measure TTFT and ITL under different load conditions. Experiment with batch sizes and observe throughput changes. Compare quantized vs. full-precision model performance. Profile GPU memory usage as context length increases.
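A minimal sketch of measuring TTFT and mean ITL against an OpenAI-compatible endpoint such as a local vLLM server; the base_url, model name, and prompt are assumptions for illustration:

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local vLLM server

def measure_latency(prompt: str, model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    start = time.perf_counter()
    first_token_at = None
    last_token_at = None
    n_chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices or not chunk.choices[0].delta.content:
            continue                                   # skip empty/housekeeping chunks
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now                       # time to first token
        last_token_at = now
        n_chunks += 1
    ttft = first_token_at - start
    itl = (last_token_at - first_token_at) / max(n_chunks - 1, 1)   # mean inter-token latency
    return ttft, itl

ttft, itl = measure_latency("Explain continuous batching in two sentences.")
print(f"TTFT = {ttft * 1000:.0f} ms, mean ITL = {itl * 1000:.1f} ms")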
3.2.3 Manager Module
Capacity planning for LLM workloads. Understanding SLAs: what latency guarantees are achievable. Cost-performance tradeoffs for different hardware configurations. When to use managed APIs vs. self-hosted infrastructure.
3.3 Week 5: Structured Outputs & Reliability
LLMs generate unstructured text by default. Production systems require structured, parseable
outputs. This week covers techniques for reliable structured generation.
3.3.1 Topics Covered
• JSON mode and schema enforcement: Native JSON modes (OpenAI, Anthropic), constrained decoding, grammar-based generation
• Parsing strategies: Regex extraction, partial JSON parsing, streaming parsers, graceful degradation when parsing fails
• Retry and fallback patterns: Exponential backoff, reformatting prompts on failure, fallback to simpler output formats
• Validation layers: Pydantic models, JSON Schema validation, custom validators for domain-specific constraints
• Streaming structured outputs: Incremental parsing, handling partial results, UI considerations for streaming
• Error budgets: Acceptable failure rates, monitoring parse failures, alerting thresholds
3.3.2 Technical Lab
Build a robust JSON extraction pipeline with: schema validation, streaming support, retry logic,
and fallback parsing. Test against adversarial inputs that typically break naive implementations.
Implement monitoring for parse failure rates.
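A minimal sketch of the validation-plus-retry core of such a pipeline, assuming Pydantic v2; call_model is a placeholder for whichever chat completion API the lab uses:

import time
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total_usd: float
    due_date: str            # ISO 8601; a custom validator could enforce the format

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your chat completion API")

def extract_invoice(document: str, max_attempts: int = 3) -> Invoice:
    prompt = (
        "Return ONLY a JSON object with keys vendor, total_usd, due_date "
        f"for this document:\n{document}"
    )
    for attempt in range(max_attempts):
        raw = call_model(prompt)
        try:
            return Invoice.model_validate_json(raw)      # parses and validates in one step
        except ValidationError as err:
            # Feed the error back so the model can repair its own output, then back off.
            prompt += f"\nYour previous answer was invalid ({err}). Return valid JSON only."
            time.sleep(2 ** attempt)
    raise RuntimeError("extraction failed after retries")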
3.3.3 Manager Module
Defining reliability requirements for LLM-powered features. Setting appropriate SLAs for structured output success rates. Understanding when best-effort parsing is acceptable vs. when guaranteed structure is required.
3.4 Week 6: Function Calling & Tool Use
Enable LLMs to take actions in the world by calling functions and APIs. This is foundational
for agentic systems covered later.
3.4.1 Topics Covered
• Function calling APIs: OpenAI function calling, Anthropic tool use, open-model approaches. Syntax and capabilities of each
• Tool definition best practices: Clear descriptions, well-defined parameters, examples in documentation, handling optional parameters
• Execution safety: Sandboxing tool execution, permission models, preventing unintended actions, audit logging
• Multi-tool orchestration: When the model needs multiple tools, parallel vs. sequential execution, handling tool dependencies
• Error handling: What happens when tools fail, propagating errors to the model, retry strategies
• Human-in-the-loop patterns: Confirmation prompts for high-stakes actions, approval workflows, escalation paths
3.4.2 Technical Lab
Build a multi-tool agent that can: query a database, call external APIs, perform calculations,
and generate reports. Implement permission controls and audit logging. Test with adversarial
prompts that attempt to misuse tools.
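A minimal sketch of one tool definition in the OpenAI function-calling format, plus a dispatcher with a permission allowlist and audit logging; the tool, model name, and permission model are illustrative:

import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up an order by numeric ID and return its status and total.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "integer"}},
            "required": ["order_id"],
        },
    },
}]
ALLOWED_TOOLS = {"lookup_order"}                      # permission allowlist

def lookup_order(order_id: int) -> dict:
    return {"order_id": order_id, "status": "shipped", "total": 129.90}   # stubbed backend

def run(user_message: str) -> None:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        tools=TOOLS,
    )
    for call in resp.choices[0].message.tool_calls or []:
        args = json.loads(call.function.arguments)
        print("AUDIT:", call.function.name, args)      # log every attempted call
        if call.function.name in ALLOWED_TOOLS:
            print("RESULT:", lookup_order(**args))
        else:
            print("BLOCKED: tool not permitted")

run("What's the status of order 1042?")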
3.4.3 Manager Module
Designing safe tool-calling systems for enterprise environments. Defining which actions require human approval. Compliance implications of LLM-initiated actions. Audit and accountability requirements.
4 Phase III: Prompt Engineering & Model Selection
Weeks 7–10
4.1 Week 7: Prompt Engineering Fundamentals
Prompt engineering is the primary lever for controlling LLM behavior. This week covers systematic approaches to prompt design.
4.1.1 Topics Covered
• Prompt anatomy: System prompts, user prompts, assistant prefills. How each component influences model behavior
• Instruction design: Clarity, specificity, avoiding ambiguity. Positive vs. negative instructions. Ordering effects
• Few-shot prompting: Example selection, example ordering, format consistency, when few-shot helps vs. hurts
• Chain-of-thought (CoT): Eliciting reasoning steps, when CoT improves accuracy, when it doesn't help, zero-shot CoT
• Output formatting: Specifying format in prompts, XML tags, markdown structure, controlling verbosity
• Prompt sensitivity: How small changes affect outputs, strategies for robust prompts, testing for sensitivity
4.1.2 Technical Lab
Develop prompts for a complex extraction task (e.g., extracting structured data from legal
documents). Iterate through multiple prompt versions, measuring accuracy on a held-out test
set. Document which techniques improved performance and which didn't.
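A minimal sketch of assembling a few-shot prompt as a message list, with a system instruction, consistently formatted examples, and the new input; the task and examples are invented for illustration:

SYSTEM = (
    "You extract the parties and the effective date from contract excerpts. "
    "Respond with exactly two lines: 'Parties: ...' and 'Effective date: ...' (ISO 8601)."
)

FEW_SHOT = [
    ("This agreement between Acme Corp and Blue River LLC takes effect on 1 March 2024.",
     "Parties: Acme Corp; Blue River LLC\nEffective date: 2024-03-01"),
    ("Northwind Ltd and Contoso GmbH agree, effective July 15, 2023, to the terms below.",
     "Parties: Northwind Ltd; Contoso GmbH\nEffective date: 2023-07-15"),
]

def build_messages(new_excerpt: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for excerpt, answer in FEW_SHOT:                    # every example uses the same format
        messages.append({"role": "user", "content": excerpt})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": new_excerpt})
    return messages

print(build_messages("Fabrikam Inc and Woodgrove Bank sign this contract, effective 2025-01-10."))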
4.1.3 Manager Module
Prompt engineering as an ongoing operational concern, not a one-time setup. Version control
and testing for prompts. Building prompt libraries and sharing across teams.
4.2 Week 8: Advanced Prompt Techniques
Beyond the basics: techniques for complex tasks, multi-step reasoning, and pushing model
capabilities.
4.2.1 Topics Covered
• Self-consistency and majority voting: Sampling multiple responses, aggregating answers, when this improves reliability
• Decomposition strategies: Breaking complex tasks into subtasks, prompt chaining, managing state across prompts
• Reflection and self-critique: Asking models to verify their own outputs, iterative refinement, limitations of self-correction
• Role prompting: Persona assignment, expertise simulation, when role prompting helps vs. introduces bias
• Adversarial prompting: Jailbreaks, prompt injection, understanding attack vectors to build defenses
• Prompt compression: Reducing token usage while preserving effectiveness, distillation of long prompts
4.2.2 Technical Lab
Implement a multi-step reasoning pipeline for a complex analytical task (e.g., financial report analysis with numerical reasoning). Compare single-prompt vs. chained-prompt approaches. Implement self-consistency voting and measure accuracy improvements.
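A minimal sketch of the self-consistency step, with sample_answer standing in for a chain-of-thought model call sampled at a nonzero temperature:

from collections import Counter

def sample_answer(question: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("call your model with chain-of-thought prompting here")

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    # N samples means N times the API cost and latency of a single call.
    answers = [sample_answer(question).strip().lower() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    print(f"majority vote: {votes}/{n_samples} for {winner!r}")
    return winner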
4.2.3 Manager Module
Cost implications of advanced prompting techniques (self-consistency requires N× API calls). Latency tradeoffs for multi-step pipelines. When the complexity is worth it vs. when simpler approaches suffice.
4.3 Week 9: Model Selection & Vendor Evaluation
Choosing the right model for a use case is critical. This week provides frameworks for systematic
model evaluation and vendor comparison.
4.3.1 Topics Covered
• The model landscape: GPT-4/4o, Claude 3.5/Opus, Gemini Pro/Ultra, Llama 3, Mistral, Qwen, Command R+. Capabilities, pricing, and positioning of each
• Benchmark interpretation: MMLU, HumanEval, GSM8K, HellaSwag. What benchmarks measure, their limitations, why benchmark performance doesn't always predict production performance
• Task-specific evaluation: Building evaluation sets for your use case, avoiding contamination, statistical significance of comparisons
• Cost-capability tradeoffs: When smaller/cheaper models suffice, tiered model strategies, routing between models
• Vendor risk assessment: API stability, pricing changes, deprecation policies, data handling practices, geographic availability
• Open vs. closed models: Control, customization, and compliance tradeoffs. When open-weight models are appropriate
4.3.2 Technical Lab
Design and execute a model comparison for a specific use case. Build a test set with ground truth labels. Evaluate 3+ models on accuracy, latency, and cost. Present findings with statistical confidence intervals.
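A minimal sketch of a bootstrap confidence interval for the accuracy difference between two models scored on the same test items; the per-example 0/1 scores shown are hypothetical:

import numpy as np

def bootstrap_diff_ci(scores_a, scores_b, n_boot: int = 10_000, alpha: float = 0.05, seed: int = 0):
    rng = np.random.default_rng(seed)
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample test items with replacement
        diffs.append(scores_a[idx].mean() - scores_b[idx].mean())
    return np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])

low, high = bootstrap_diff_ci([1, 1, 0, 1, 1, 0, 1, 1], [1, 0, 0, 1, 0, 0, 1, 1])
print(f"95% CI for accuracy(A) - accuracy(B): [{low:.2f}, {high:.2f}]")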
4.3.3 Manager Module
Building a model selection framework for your organization. Vendor negotiation and contract
considerations. Multi-vendor strategies to reduce lock-in. Procurement processes for AI services.
4.4 Week 10: Evaluation & Testing Frameworks
Rigorous evaluation is essential for production LLM systems. This week covers building comprehensive testing infrastructure.
4.4.1 Topics Covered
• Evaluation metrics: Exact match, BLEU, ROUGE, BERTScore, semantic similarity. When each metric is appropriate and what each actually measures
• LLM-as-judge: Using LLMs to evaluate LLM outputs, designing rubrics, calibration, known biases (verbosity, position)
• Human evaluation: When human eval is necessary, designing annotation tasks, inter-rater reliability, cost-effective sampling strategies
• Regression testing: Detecting quality degradation over time, building regression suites, automated alerts
• A/B testing for LLMs: Statistical considerations, handling high variance in LLM outputs, minimum sample sizes
• Red teaming: Systematic adversarial testing, coverage strategies, tracking and triaging findings
4.4.2 Technical Lab
Build a comprehensive evaluation harness including: automated metrics, LLM-as-judge scoring,
regression tracking, and alerting. Integrate with CI/CD to run evaluations on prompt or model
changes.
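A minimal sketch of the regression-gate part of the harness, written so it can run as a plain script or under pytest in CI; the eval file format, threshold, and answer function are assumptions:

import json

THRESHOLD = 0.85                       # quality gate; pick a value agreed with stakeholders

def answer(question: str) -> str:
    raise NotImplementedError("call your LLM pipeline here")

def run_regression(eval_path: str = "eval_set.jsonl") -> float:
    total = correct = 0
    with open(eval_path) as f:
        for line in f:
            case = json.loads(line)    # one JSON object per line: {"question": ..., "expected": ...}
            total += 1
            correct += int(answer(case["question"]).strip() == case["expected"])
    return correct / total

def test_no_regression():
    accuracy = run_regression()
    assert accuracy >= THRESHOLD, f"accuracy {accuracy:.2f} fell below the {THRESHOLD} gate"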
4.4.3 Manager Module
Operationalizing continuous evaluation. Building evaluation review boards. Defining quality gates for deployment. Communicating evaluation results to non-technical stakeholders.
5 Phase IV: Retrieval, Customization & Alignment
Weeks 11–15
5.1 Week 11: Failure Modes & Defensive Design
Before building retrieval and customization systems, understand how they fail. This week's
placement ensures defensive thinking is applied to all subsequent work.
5.1.1 Topics Covered
• Hallucination taxonomy: Factual errors, entity confusion, temporal confusion, fabricated citations, confident incorrectness. Causes of each type
• Retrieval failures: Missing relevant documents, retrieving irrelevant documents, retrieval contamination, embedding space limitations
• Context poisoning: When retrieved context leads to worse outputs, contradictory sources, outdated information
• Prompt injection and jailbreaks: Attack vectors, real-world examples, why these are hard to fully prevent
• Semantic drift: How model behavior changes over time, API updates, silent regressions
• Defensive design patterns: Graceful degradation, uncertainty communication, human escalation triggers, output validation
5.1.2 Technical Lab
Build a failure detection system: classify model outputs for hallucination indicators, implement
uncertainty scoring, create alerting for anomalous outputs. Red-team your own system and
document attack vectors.
5.1.3 Manager Module
Incident response planning for LLM failures. Defining severity levels. Communication templates for stakeholders. Post-mortem processes. Building organizational awareness of failure modes.
5.2 Week 12: RAG Foundations
Retrieval-Augmented Generation (RAG) is the most common pattern for grounding LLMs in
organizational knowledge. This week covers the fundamentals.
5.2.1 Topics Covered
• Embedding models: OpenAI embeddings, Cohere Embed, open-source alternatives (E5, BGE, GTE). Dimensionality, performance, and cost tradeoffs
• Similarity metrics: Cosine similarity, dot product, Euclidean distance. When each is appropriate, normalization considerations
• Chunking strategies: Fixed-size chunks, semantic chunking, sentence-based, paragraph-based, document structure-aware chunking
• Vector databases: Pinecone, Weaviate, Qdrant, Milvus, pgvector. Architecture, indexing algorithms (HNSW, IVF), managed vs. self-hosted
• Basic RAG pipeline: Query embedding → similarity search → context injection → generation. End-to-end implementation
• RAG evaluation: Retrieval quality (recall@k, MRR), end-to-end accuracy, attribution correctness
5.2.2 Technical Lab
Build a complete RAG pipeline: document ingestion, chunking, embedding, vector storage, retrieval, and generation. Benchmark retrieval quality with different chunking strategies and embedding models on a labeled dataset.
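A minimal sketch of the end-to-end loop (chunk, embed, cosine-similarity search, context injection), using OpenAI embeddings and chat for illustration; any embedding model and vector store can be substituted:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)    # unit vectors for cosine sim

def chunk(document: str, size: int = 500) -> list[str]:
    return [document[i:i + size] for i in range(0, len(document), size)]   # naive fixed-size chunks

documents = ["Replace with your corpus."]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)                                            # in-memory "vector store"

def retrieve(query: str, k: int = 3) -> list[str]:
    scores = index @ embed([query])[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = "\n---\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    resp = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content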
5.2.3 Manager Module
RAG infrastructure decisions: vector database selection, managed vs. self-hosted, cost modeling.
Data ingestion pipelines and update strategies. When RAG is the right solution vs. alternatives.
5.3 Week 13: Advanced RAG & Hybrid Search
Production RAG systems require more than basic semantic search. This week covers techniques that significantly improve retrieval quality.
5.3.1 Topics Covered
• Hybrid search: Combining vector search with keyword search (BM25), fusion strategies, when hybrid outperforms pure semantic search
• Query transformation: Query expansion, hypothetical document embeddings (HyDE), query decomposition for complex questions
• Reranking: Cross-encoder rerankers (Cohere Rerank, BGE Reranker), ColBERT, when reranking provides lift
• Hierarchical retrieval: Document summaries → full documents, multi-stage retrieval, parent-child chunk relationships
• Structured data retrieval: Text-to-SQL, knowledge graphs, combining structured and unstructured retrieval
• Metadata filtering: Using document metadata to scope searches, access control integration, temporal filtering
5.3.2 Technical Lab
Extend the Week 12 RAG pipeline with: hybrid search, reranking, and query expansion. Measure
precision/recall improvements at each stage. Implement a text-to-SQL component for structured
data queries.
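A minimal sketch of reciprocal rank fusion (RRF), a common way to merge a keyword (BM25) ranking and a vector-search ranking without having to reconcile their score scales; the document IDs are hypothetical:

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc IDs ordered best-first; k dampens the influence of top ranks.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc7", "doc2", "doc9", "doc4"]     # keyword results
vector_hits = ["doc2", "doc5", "doc7", "doc1"]     # semantic results
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))   # doc2 and doc7 rise to the top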
5.3.3 Manager Module
Justifying infrastructure complexity: when advanced RAG techniques are worth the added cost and maintenance. Performance vs. cost tradeoffs for reranking. Build vs. buy decisions for search infrastructure.
5.4 Week 14: Fine-Tuning Fundamentals
Fine-tuning is often overused. This week covers when fine-tuning is appropriate and how to do it correctly when it is.
5.4.1 Topics Covered
• When to fine-tune: Style/format adaptation, specialized domains, performance-critical latency reduction. When RAG or prompting suffices instead
• When NOT to fine-tune: Adding factual knowledge (use RAG), quick experiments, rapidly changing information, limited data
• Fine-tuning methods: Full fine-tuning, LoRA, QLoRA, DoRA. Parameter efficiency, memory requirements, quality tradeoffs
• Data preparation: Instruction format, quality over quantity, data cleaning, deduplication, balance across task types
• Training dynamics: Learning rates, batch sizes, epochs, early stopping, catastrophic forgetting, overfitting detection
• Evaluation during training: Held-out test sets, monitoring for capability regression, comparing to base model
5.4.2 Technical Lab
Fine-tune a small model (7B parameters) using QLoRA for a domain-specific task. Prepare a high-quality training dataset. Compare fine-tuned model performance against base model + few-shot prompting + RAG approaches.
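A minimal sketch of the QLoRA setup with Hugging Face transformers and peft: load the base model in 4-bit NF4 and attach low-rank adapters. The model name, rank, and target modules are illustrative choices, and the actual training loop (e.g., with trl's SFTTrainer) follows from here:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Meta-Llama-3-8B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit weights so a 7-8B model fits on one GPU
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb_config, device_map="auto")

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()            # only the small adapter matrices are trainable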
5.4.3 Manager Module
ROI analysis for fine-tuning projects. GPU compute budgeting. Maintenance burden of fine-tuned models. When to use vendor fine-tuning APIs vs. self-managed training.
5.5 Week 15: Safety, Alignment & Guardrails
Deploying LLMs in production requires systematic approaches to safety and alignment with
organizational values.
5.5.1 Topics Covered
• Alignment techniques: RLHF (Reinforcement Learning from Human Feedback), RLAIF, Constitutional AI, DPO (Direct Preference Optimization)
• Content moderation: Input filtering, output filtering, content classifiers, toxicity detection, PII detection
• Guardrail frameworks: NeMo Guardrails, Guardrails AI, custom rule engines. Architecture patterns for guardrail integration
• Jailbreak mitigation: Known attack patterns, defense strategies, why perfect defense is impossible, monitoring for novel attacks
• Topic and scope control: Keeping models on-topic, handling out-of-scope requests gracefully, defining boundaries
• Bias and fairness: Testing for demographic biases, mitigation strategies, documentation requirements
5.5.2 Technical Lab
Build a safety pipeline including: input classifier, topic guardrails, output moderation, and PII scrubbing. Red-team the system with jailbreak attempts. Implement monitoring and alerting for safety violations.
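A minimal sketch of the PII-scrubbing stage as a regex filter applied to prompts and responses; the patterns catch only obvious cases and are illustrative, so production systems usually layer an ML-based detector on top:

import re

PII_PATTERNS = {
    "EMAIL":  re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE":  re.compile(r"\b(?:\+?\d{1,3}[ -]?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}\b"),
}

def scrub_pii(text: str) -> tuple[str, list[str]]:
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)                  # record what was found for monitoring
            text = pattern.sub(f"[{label}]", text)  # replace the match with a placeholder tag
    return text, findings

clean, found = scrub_pii("Contact jane.doe@example.com or 555-123-4567 about SSN 123-45-6789.")
print(found)   # ['EMAIL', 'US_SSN', 'PHONE']
print(clean)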
5.5.3 Manager Module
Drafting an AI Safety Review Process for enterprise deployment. Defining acceptable use policies. Establishing review boards. Incident response for safety violations. Communicating safety posture to stakeholders.
6 Phase V: Enterprise Deployment & Architecture
Weeks 16–20
6.1 Week 16: Production Infrastructure
Moving from prototype to production requires robust infrastructure. This week covers the
systems engineering required for reliable LLM deployments.
6.1.1 Topics Covered
• Orchestration platforms: Kubernetes for LLM workloads, Ray Serve, Modal, Anyscale. Scheduling and resource management for GPU workloads
• Inference optimization: vLLM, TensorRT-LLM, speculative decoding, continuous batching. Maximizing throughput while meeting latency SLAs
• Scaling patterns: Horizontal scaling, tensor parallelism, pipeline parallelism. Auto-scaling based on load
• Load balancing: Request routing for LLM services, session affinity for stateful conversations, handling long-running requests
• Caching strategies: Semantic caching, embedding caches, response caches. Cache invalidation for dynamic content
• Cost optimization: Spot instances for batch workloads, reserved capacity planning, model routing to optimize cost/quality
6.1.2 Technical Lab
Deploy vLLM on Kubernetes with auto-scaling. Implement load testing with realistic traffic patterns. Measure and optimize for latency percentiles (p50, p95, p99). Implement semantic caching and measure cache hit rates.
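A minimal sketch of the semantic-caching idea: before calling the model, look for a previously answered query whose embedding is close enough to the new one. The embed and call_model callables and the similarity threshold are assumptions for illustration:

import numpy as np

SIMILARITY_THRESHOLD = 0.92      # tune on real traffic; too low serves wrong cached answers

class SemanticCache:
    def __init__(self):
        self.embeddings: list[np.ndarray] = []
        self.responses: list[str] = []

    def lookup(self, query_vec: np.ndarray) -> str | None:
        if not self.embeddings:
            return None
        sims = np.array(self.embeddings) @ query_vec       # cosine similarity on unit vectors
        best = int(np.argmax(sims))
        return self.responses[best] if sims[best] >= SIMILARITY_THRESHOLD else None

    def store(self, query_vec: np.ndarray, response: str) -> None:
        self.embeddings.append(query_vec)
        self.responses.append(response)

def answer_with_cache(query: str, cache: SemanticCache, embed, call_model) -> str:
    vec = embed(query)
    cached = cache.lookup(vec)
    if cached is not None:
        return cached                                      # cache hit: no model call, no cost
    response = call_model(query)
    cache.store(vec, response)
    return response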
6.1.3 Manager Module
TCO modeling: on-premises vs. cloud vs. hybrid. Reserved vs. on-demand pricing strategies. Capacity planning for growth. SLA definition and monitoring.
6.2 Week 17: Monitoring, Observability & Debugging
LLM systems require specialized monitoring beyond traditional application observability. This
week covers the unique challenges of LLM operations.
6.2.1 Topics Covered
• LLM-specific metrics: Token throughput, latency distributions, cost per request, cache hit rates, error rates by type
• Quality monitoring: Tracking output quality over time, drift detection, automated quality scoring, regression alerts
• Tracing and debugging: End-to-end request tracing, prompt version tracking, debugging multi-step chains, root cause analysis
• Logging best practices: What to log (prompts, responses, latencies), PII handling in logs, log retention policies
• Observability platforms: LangSmith, Weights & Biases, Arize, custom solutions. What each provides, integration patterns
• Alerting strategies: Defining alert thresholds for LLM systems, avoiding alert fatigue, on-call considerations
6.2.2 Technical Lab
Build a complete observability stack for an LLM application: metrics collection, tracing, logging, quality monitoring, and alerting. Create dashboards for operational visibility. Implement automated quality regression detection.
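A minimal sketch of the metrics-collection piece: record per-request latency and cost, then report the percentiles that matter for LLM SLOs. In production these numbers would be exported to a metrics backend rather than kept in memory:

import time
import numpy as np

class RequestMetrics:
    def __init__(self):
        self.latencies_ms: list[float] = []
        self.costs_usd: list[float] = []

    def record(self, fn, *args, cost_usd: float = 0.0, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)                       # wrap any LLM call
        self.latencies_ms.append((time.perf_counter() - start) * 1000)
        self.costs_usd.append(cost_usd)
        return result

    def report(self) -> dict:
        lat = np.array(self.latencies_ms)
        return {
            "requests": len(lat),
            "p50_ms": float(np.percentile(lat, 50)),
            "p95_ms": float(np.percentile(lat, 95)),
            "p99_ms": float(np.percentile(lat, 99)),
            "total_cost_usd": float(sum(self.costs_usd)),
        }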
6.2.3 Manager Module
Defining SLOs for LLM systems. Incident management processes. On-call structures for AI systems. Building operational runbooks. Communicating system health to stakeholders.
6.3 Week 18: Data Governance, Security & Compliance
Enterprise LLM deployment requires careful attention to data handling, security, and regulatory
compliance.
6.3.1 Topics Covered
• Data residency: Where data is processed and stored, geographic requirements, cross-border data transfer implications
• PII handling: Detection, masking, anonymization, pseudonymization. Handling PII in prompts, responses, and logs
• Access control: Authentication, authorization, row-level security for RAG, audit trails
• Regulatory frameworks: GDPR, CCPA, HIPAA, SOC2, PCI-DSS. What each requires for AI systems
• AI-specific regulations: EU AI Act, emerging US regulations, industry-specific requirements (finance, healthcare)
• Vendor security assessment: Evaluating API provider security, data handling agreements, right to audit
6.3.2 Technical Lab
Build a compliance-aware gateway: PII detection and masking, audit logging, access control integration, data residency enforcement. Implement prompt and response filtering for sensitive content.
6.3.3 Manager Module
Designing compliance checkpoints for GenAI workflows. Working with legal and compliance teams. Documentation requirements. Preparing for audits. Vendor risk management for AI providers.
6.4 Week 19: Agentic Systems
Agentic systems (LLMs that plan, use tools, and take autonomous actions) are increasingly important but require careful design. This week provides a deep dive rather than surface coverage.
6.4.1 Topics Covered
• Agent architectures: ReAct (Reasoning + Acting), Plan-and-Execute, Tree of Thoughts. When each pattern is appropriate
• Agent frameworks: LangGraph, AutoGen, CrewAI, custom implementations. Framework selection criteria, avoiding over-abstraction
• State management: Tracking agent state across steps, persistence, recovery from failures, conversation memory
• Multi-agent systems: Agent coordination, task delegation, conflict resolution, communication protocols
• Guarded autonomy: Setting boundaries on agent actions, approval workflows, rollback mechanisms, kill switches
• Agent failure modes: Infinite loops, goal drift, unintended actions, cascading failures. Detection and mitigation
6.4.2 Technical Lab
Build a multi-agent system for a realistic enterprise workflow (e.g., research assistant with planner, researcher, and writer agents). Implement state persistence, approval gates, and failure recovery. Test with scenarios designed to trigger failure modes.
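A minimal sketch of a guarded agent loop with the controls this lab calls for: a hard step limit as a kill switch, human approval for high-risk tools, and tool errors surfaced back to the model. plan_next_step stands in for a model call that returns either a tool invocation or a final answer:

MAX_STEPS = 8
HIGH_RISK_TOOLS = {"send_email", "update_record"}          # illustrative tool names

def plan_next_step(goal: str, history: list) -> dict:
    raise NotImplementedError("ask the model for the next action here")

def run_agent(goal: str, tools: dict) -> str:
    history: list = []
    for _ in range(MAX_STEPS):                             # kill switch: bounded iterations
        action = plan_next_step(goal, history)
        if action.get("final_answer"):
            return action["final_answer"]
        name, args = action["tool"], action["args"]
        if name in HIGH_RISK_TOOLS:
            if input(f"Approve {name}({args})? [y/N] ").lower() != "y":
                history.append((name, args, "DENIED by human reviewer"))
                continue                                   # the model must plan around the denial
        try:
            result = tools[name](**args)
        except Exception as err:                           # surface tool failures to the model
            result = f"TOOL ERROR: {err}"
        history.append((name, args, result))
    return "Stopped: step limit reached without a final answer."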
6.4.3 Manager Module
What tasks should never be delegated to autonomous AI. Liability considerations for agent actions. Approval workflows and human oversight requirements. Building organizational trust in agentic systems incrementally.
6.5 Week 20: Enterprise Integrations
Connecting LLM systems to enterprise infrastructure: CRMs, ERPs, data warehouses, and
collaboration tools.
6.5.1 Topics Covered
• Integration patterns: REST APIs, GraphQL, message queues (Kafka, Pub/Sub), webhooks. Choosing the right pattern for each integration
• Common enterprise systems: Salesforce, ServiceNow, SAP, Workday, Jira, Confluence. Integration approaches for each
• Data synchronization: Keeping RAG indices in sync with source systems, change data capture, incremental updates
• Authentication and authorization: OAuth flows, service accounts, API key management, propagating user identity
• Error handling and resilience: Handling downstream system failures, circuit breakers, fallback strategies
• Event-driven architectures: Triggering LLM workflows from business events, async processing, result delivery
6.5.2 Technical Lab
Build a full-stack LLM microservice that: ingests documents from a simulated enterprise system,
maintains a RAG index, responds to queries via REST API, and pushes results to a message
queue. Implement proper error handling and monitoring.
6.5.3 Manager Module
Building AI Centers of Excellence. Integration roadmap planning. Change management for AI-augmented workflows. Stakeholder communication across IT, business, and compliance teams.
7 Phase VI: Governance, Risk & Capstone
Weeks 21–24
7.1 Week 21: AI Governance Frameworks
Sustainable AI deployment requires organizational governance structures. This week covers
building governance frameworks that enable responsible AI use.
7.1.1 Topics Covered
• Documentation standards: Model cards, system cards, datasheets for datasets. What to document and why
• Review processes: AI review boards, sign-off workflows, risk scoring matrices, exception processes
• Risk classification: Categorizing AI use cases by risk level, tailoring controls to risk, high-risk vs. low-risk workflows
• Policy development: Acceptable use policies, data usage policies, model training policies, deployment policies
• Vendor governance: Third-party AI risk management, contract requirements, ongoing vendor monitoring
• External stakeholders: Communicating AI governance to regulators, auditors, investors, customers
7.1.2 Technical Lab
Create a complete governance package for an AI system: model card, system card, risk assessment, and deployment checklist. Build a governance dashboard that tracks AI deployments across the organization.
7.1.3 Manager Module
Creating governance frameworks that satisfy regulators, auditors, and boards. Balancing gover-
nance overhead with deployment velocity. Building governance that scales with AI adoption.
7.2 Week 22: AI Products  Organizations
Building sustainable AI capability requires appropriate organizational structures, team compo-
sitions, and product thinking.
7.2.1 Topics Covered
• AI team structures: Centralized vs. embedded teams, MLOps/LLMOps roles, prompt engineers, AI product managers. Skills and responsibilities for each
• Build vs. buy decisions: When to use vendor solutions, when to build custom, hybrid approaches. Decision frameworks
• AI product development: User research for AI features, iterative development, handling uncertainty in AI capabilities
• Success metrics: Defining and measuring AI product success, leading vs. lagging indicators, avoiding vanity metrics
• Change management: Introducing AI to existing workflows, training end users, managing resistance, celebrating wins
• Multi-year planning: AI capability roadmaps, technology evolution, skill development, budget planning
7.2.2 Technical Lab
Develop a comprehensive AI product proposal: problem definition, solution architecture, success metrics, risk assessment, resource requirements, and rollout plan. Present to peer groups for critique.
7.2.3 Manager Module
Executive-level planning for multi-year AI adoption. Building the business case for AI invest-
ment. Hiring and developing AI talent. Creating a culture of responsible AI innovation.
7.3 Weeks 23–24: Capstone Project
The capstone integrates all program content into a complete, enterprise-ready GenAI system.
Teams work on realistic scenarios with full technical and governance requirements.
7.3.1 Capstone Requirements
Teams of 3–5 participants deliver an end-to-end solution including:
Technical Components:
• Advanced RAG with hybrid search and reranking
• Structured output parsing with validation
• Function calling / tool use
• Safety classifiers and guardrails
• API gateway with authentication
• Comprehensive logging, monitoring, and alerting
• Evaluation harness with regression testing
Governance Documentation:
• System card with risk assessment
• Compliance checklist
• Deployment runbook
• Incident response plan
Business Case:
• Cost-benefit analysis
• Success metrics and measurement plan
• Rollout strategy
7.3.2 Capstone Presentation
Teams present to a panel simulating a CTO/board review:
• Technical deep dive: Architecture decisions, implementation challenges, performance results
• Live demonstration: Working system handling realistic scenarios, including edge cases
• Business justification: ROI analysis, risk mitigation, alignment with organizational goals
• Q&A: Defending decisions, handling objections, demonstrating depth of understanding
8 Appendix A: What Makes This Curriculum Rigorous
This curriculum is designed to produce genuine competence, not just exposure. Key design
principles:
• Realistic pacing: 24 weeks allows topics to be covered at the depth required for production deployment, rather than rushing through material
• Prerequisite calibration: Week 0 diagnostic ensures participants have foundational skills and provides remediation resources for those who need them
• Failure modes first: Week 11 covers failure modes before building RAG and fine-tuning systems, ensuring defensive design thinking is applied throughout
• Prompt engineering as a discipline: Dedicated weeks (7–8) treat prompt engineering as a systematic practice, not an afterthought
• Model selection framework: Week 9 provides structured approaches to vendor evaluation rather than assuming model choice is given
• Fine-tuning skepticism: Week 14 emphasizes when not to fine-tune, countering the common tendency to over-apply this technique
• Agentic systems depth: Week 19 provides a full week on agents rather than surface coverage, reflecting the complexity and risk of autonomous AI systems
• Cost awareness throughout: Token costs, infrastructure costs, and ROI analysis are integrated into technical discussions from Week 1
• Governance integrated: Compliance and governance appear throughout, not just in a final governance week
• Dual-track fidelity: Manager modules provide substantive content on architecture, cost, and organizational decisions, not watered-down versions of technical material
9 Appendix B: Suggested Reading & Resources
9.1 Foundational Papers
• Vaswani et al., Attention Is All You Need (2017): Transformer architecture
• Brown et al., Language Models are Few-Shot Learners (2020): GPT-3 and in-context learning
• Wei et al., Chain-of-Thought Prompting (2022): Reasoning elicitation
• Lewis et al., Retrieval-Augmented Generation (2020): RAG foundations
• Hu et al., LoRA: Low-Rank Adaptation (2021): Parameter-efficient fine-tuning
9.2 Technical Resources
• Anthropic's Claude documentation and cookbook
• OpenAI's API documentation and best practices guides
• LangChain and LlamaIndex documentation
• vLLM documentation for inference optimization
• Hugging Face Transformers library and model hub
9.3 Enterprise & Governance
• NIST AI Risk Management Framework
• EU AI Act text and implementation guidance
• Model Cards for Model Reporting (Mitchell et al., 2019)
• Datasheets for Datasets (Gebru et al., 2021)
10 Appendix C: Lab Environment Requirements
10.1 Compute Resources
• Cloud GPU access: Minimum 1× A100 40GB for fine-tuning labs; 1× A10G or T4 sufficient for inference labs
• Local development: 16GB+ RAM, modern CPU, Docker support
• API credits: $500–1,000 per participant for commercial API usage across the program
10.2 Software Stack
• Python 3.10+
• PyTorch 2.0+
• Transformers, PEFT, BitsAndBytes for fine-tuning
• LangChain or LlamaIndex for RAG
• vLLM for inference
• Docker, Kubernetes (Minikube or cloud-managed)
• PostgreSQL with pgvector, or managed vector database
10.3 Accounts Required
• OpenAI API account
• Anthropic API account
• Cloud provider account (AWS, GCP, or Azure)
• Hugging Face account
• GitHub account for version control
This curriculum was developed with input from enterprise AI practitioners, reflecting lessons learned from production deployments across industries.
Page 26

Curriculum: Generative AI for Enterprise Deployment

  • 1.
    ENTERPRISE GENERATIVE AI DEPLOYMENTCURRICULUM A Rigorous & Practical Program 24-Week Comprehensive Training Dual-Track: Technical Engineers & Technical Managers
  • 2.
    Contents 1 Program Overview3 1.1 Program Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Dual-Track Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2 Phase I: Foundations & Calibration 4 2.1 Week 0: Diagnostic & Cohort Calibration . . . . . . . . . . . . . . . . . . . . . . 4 2.1.1 Diagnostic Assessment Components . . . . . . . . . . . . . . . . . . . . . 4 2.1.2 Remediation Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Week 1: The LLM Landscape & Mental Models . . . . . . . . . . . . . . . . . . . 4 2.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Week 2: Transformer Architecture Fundamentals . . . . . . . . . . . . . . . . . . 5 2.3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 Phase II: LLM Systems Engineering 6 3.1 Week 3: Tokenization & Context Management . . . . . . . . . . . . . . . . . . . . 6 3.1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2 Week 4: Inference Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Week 5: Structured Outputs & Reliability . . . . . . . . . . . . . . . . . . . . . . 7 3.3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4 Week 6: Function Calling & Tool Use . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Phase III: Prompt Engineering & Model Selection 9 4.1 Week 7: Prompt Engineering Fundamentals . . . . . . . . . . . . . . . . . . . . . 9 4.1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2 Week 8: Advanced Prompt Techniques . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
10 4.3 Week 9: Model Selection & Vendor Evaluation . . . . . . . . . . . . . . . . . . . 10 4.3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.3.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.3.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.4 Week 10: Evaluation & Testing Frameworks . . . . . . . . . . . . . . . . . . . . . 11 2
  • 3.
    Enterprise Generative AIDeployment Curriculum 4.4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.4.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.4.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5 Phase IV: Retrieval, Customization & Alignment 12 5.1 Week 11: Failure Modes & Defensive Design . . . . . . . . . . . . . . . . . . . . . 12 5.1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.1.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.1.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.2 Week 12: RAG Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3 Week 13: Advanced RAG & Hybrid Search . . . . . . . . . . . . . . . . . . . . . 13 5.3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.3.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.4 Week 14: Fine-Tuning Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.4.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.4.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.5 Week 15: Safety, Alignment & Guardrails . . . . . . . . . . . . . . . . . . . . . . 14 5.5.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.5.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.5.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6 Phase V: Enterprise Deployment & Architecture 16 6.1 Week 16: Production Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.1.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.1.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.2 Week 17: Monitoring, Observability & Debugging . . . . . . . . . . . . . . . . . . 16 6.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.3 Week 18: Data Governance, Security & Compliance . . . . . . . . . . . . . . . . . 17 6.3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.3.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.3.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.4 Week 19: Agentic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.4.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . 18 6.4.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.5 Week 20: Enterprise Integrations . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.5.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.5.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.5.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7 Phase VI: Governance, Risk & Capstone 20 7.1 Week 21: AI Governance Frameworks . . . . . . . . . . . . . . . . . . . . . . . . 20 7.1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.1.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Page 3
7.1.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
7.2 Week 22: AI Products & Organizations . . . . . . . . . . . . . . . . . . . . . . . 20
7.2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
7.2.2 Technical Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.2.3 Manager Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.3 Weeks 23–24: Capstone Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.3.1 Capstone Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.3.2 Capstone Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
8 Appendix A: What Makes This Curriculum Rigorous 23
9 Appendix B: Suggested Reading Resources 23
9.1 Foundational Papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
9.2 Technical Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
9.3 Enterprise Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
10 Appendix C: Lab Environment Requirements 24
10.1 Compute Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
10.2 Software Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
10.3 Accounts Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1 Program Overview

This 24-week program provides comprehensive training in enterprise LLM deployment, structured to build genuine competence rather than surface-level exposure. The expanded timeline addresses the realistic complexity of modern AI systems while maintaining a practical, hands-on focus throughout.

1.1 Program Structure

The curriculum is organized into six phases, each building on the previous:

• Phase I (Weeks 0–2): Foundations & Calibration
• Phase II (Weeks 3–6): LLM Systems Engineering
• Phase III (Weeks 7–10): Prompt Engineering & Model Selection
• Phase IV (Weeks 11–15): Retrieval, Customization & Alignment
• Phase V (Weeks 16–20): Enterprise Deployment & Architecture
• Phase VI (Weeks 21–24): Governance, Risk & Capstone

1.2 Dual-Track Design

Each week includes parallel tracks for different roles:

• Technical Lab: Hands-on implementation exercises for engineers
• Manager Module: Architecture decisions, cost modeling, compliance, and team structure for technical leaders

1.3 Prerequisites

Participants should have:

• Proficiency in Python (comfortable with classes, decorators, async patterns)
• Basic understanding of machine learning concepts (gradient descent, loss functions, overfitting)
• Familiarity with REST APIs and microservice architectures
• Experience with cloud platforms (AWS, GCP, or Azure)
• Basic SQL and data manipulation skills
2 Phase I: Foundations & Calibration
Weeks 0–2

2.1 Week 0: Diagnostic & Cohort Calibration

Before the program begins, participants complete a diagnostic assessment to calibrate cohort composition and identify areas requiring additional support.

2.1.1 Diagnostic Assessment Components

• Python proficiency test: Async patterns, decorators, context managers, type hints
• ML fundamentals quiz: Gradient descent mechanics, regularization, cross-validation, bias-variance tradeoff
• Systems design exercise: Design a simple API with caching, rate limiting, and error handling
• LLM familiarity survey: Prior experience with ChatGPT, Claude, open-source models, API usage
• Role and goals questionnaire: Current responsibilities, deployment objectives, organizational constraints

2.1.2 Remediation Resources

Based on diagnostic results, participants receive targeted preparation materials:

• Python async/await tutorial and exercises
• ML fundamentals refresher (Andrew Ng's Coursera modules or fast.ai Practical ML)
• PyTorch basics: tensors, autograd, simple neural network training
• Docker and Kubernetes fundamentals for those lacking container experience

2.2 Week 1: The LLM Landscape & Mental Models

Establish accurate mental models for what LLMs are and are not, how they differ from traditional software, and where they fail.

2.2.1 Topics Covered

• What LLMs actually do: Next-token prediction, pattern completion vs. reasoning, the distinction between generating plausible text and computing correct answers
• Capability taxonomy: What LLMs reliably do well (fluent generation, translation, summarization, code completion), what they do unreliably (math, logic, factual recall), what they cannot do (real-time data, guaranteed correctness)
• Failure modes overview: Hallucination, confabulation, instruction-following failures, context window limitations, sensitivity to prompt phrasing
• The API landscape: OpenAI, Anthropic, Google, Cohere, open-weight models (Llama, Mistral, Qwen), when to use each
• Cost structures: Token-based pricing, input vs. output tokens, cost estimation for realistic workloads (a back-of-the-envelope sketch follows this list)
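To make the cost-structure discussion concrete, here is a minimal cost estimator in Python. The per-million-token prices and the example workload are placeholders chosen for illustration, not any vendor's current pricing; substitute your own contracted rates and measured token counts.

    # Back-of-the-envelope monthly cost model for a token-priced API.
    # Prices below are placeholders, not current vendor pricing.
    PRICE_PER_M_INPUT = 3.00    # USD per 1M input tokens (illustrative)
    PRICE_PER_M_OUTPUT = 15.00  # USD per 1M output tokens (illustrative)

    def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int) -> float:
        """Estimate monthly spend for a workload with fixed per-request token counts."""
        per_request = (input_tokens / 1_000_000) * PRICE_PER_M_INPUT \
                    + (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT
        return per_request * requests_per_day * 30

    # Example: a support-ticket summarizer handling 5,000 requests/day,
    # ~1,200 input tokens and ~300 output tokens per request.
    print(f"Estimated monthly spend: ${monthly_cost(5_000, 1_200, 300):,.2f}")

Even a rough model like this makes the input/output price asymmetry visible and gives managers a starting point for the Week 1 cost-modeling exercise.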
2.2.2 Technical Lab

Compare responses from 3+ models (GPT-4, Claude, Llama-3) on identical prompts across: factual questions with verifiable answers, reasoning tasks, creative generation, and code generation. Document failure patterns and model-specific behaviors.

2.2.3 Manager Module

Setting realistic expectations with stakeholders. How to communicate what LLMs can and cannot do. Red flags in vendor claims. Building an initial cost model for a proposed use case.

2.3 Week 2: Transformer Architecture Fundamentals

Build intuition for how transformers work without requiring a full deep learning course. Focus on the concepts that matter for deployment decisions.

2.3.1 Topics Covered

• Attention mechanism intuition: What attention means, how tokens relate to each other, why context windows exist
• The embedding layer: Tokens as vectors, semantic similarity, why embeddings matter for retrieval
• Positional encoding: How models understand token order, rotary embeddings (RoPE), implications for context length
• Feed-forward networks and layers: What happens between attention layers, why depth matters
• Autoregressive generation: Why generation is sequential, why output length affects latency, the temperature and top-p parameters
• Model sizes and parameter counts: What 7B or 70B means, rough relationship between size, capability, and cost

2.3.2 Technical Lab

Visualize attention patterns using BertViz or similar tools. Observe how attention changes with different prompts. Experiment with temperature and sampling parameters to understand their effects on output diversity and quality. (A toy sampling sketch follows below.)

2.3.3 Manager Module

How to evaluate model architecture claims from vendors. Understanding the tradeoffs between model size, speed, and capability. Making informed decisions about context window requirements for your use cases.
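The temperature and top-p behavior explored in the lab can be demonstrated without a GPU. The following toy Python sketch applies temperature scaling and nucleus (top-p) truncation to a hand-written five-token distribution; the logits are invented for illustration, but the mechanics mirror how samplers reshape a real model's next-token distribution.

    import math
    import random
    from collections import Counter

    # Hand-written logits for five candidate next tokens (purely illustrative).
    LOGITS = {"the": 4.0, "a": 3.2, "quantum": 1.0, "purple": 0.5, "spleen": 0.1}

    def sample(temperature: float, top_p: float = 1.0) -> str:
        # Temperature rescales logits before the softmax: low T sharpens, high T flattens.
        weights = {tok: math.exp(logit / temperature) for tok, logit in LOGITS.items()}
        total = sum(weights.values())
        probs = sorted(((tok, w / total) for tok, w in weights.items()), key=lambda x: -x[1])
        # Nucleus (top-p) sampling: keep the smallest prefix whose probability mass reaches top_p.
        kept, mass = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        tokens, kept_probs = zip(*kept)
        return random.choices(tokens, weights=kept_probs, k=1)[0]

    for t in (0.2, 1.0, 2.0):
        counts = Counter(sample(t, top_p=0.9) for _ in range(1000))
        print(f"temperature={t}: {counts.most_common()}")

Running it shows low temperature collapsing onto the most likely token and high temperature spreading probability onto unlikely ones, which is the intuition behind the lab's diversity-versus-quality observations.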
3 Phase II: LLM Systems Engineering
Weeks 3–6

3.1 Week 3: Tokenization & Context Management

Understand how text becomes tokens, why tokenization choices affect cost and performance, and how to manage context windows effectively.

3.1.1 Topics Covered

• Tokenization algorithms: BPE (Byte-Pair Encoding), SentencePiece, WordPiece, Unigram LM. How each works, why different models use different tokenizers
• Token efficiency: Why some text tokenizes more efficiently than others, implications for non-English languages, code, and structured data
• Context window anatomy: System prompt, conversation history, user input, model output. How to allocate tokens across these components
• Context management strategies: Sliding windows, summarization, hierarchical context, when to truncate vs. summarize
• Cost implications: How tokenization affects billing, optimizing prompts for cost without sacrificing quality

3.1.2 Technical Lab

Build a tokenizer visualization tool that shows token boundaries, counts, and costs for different models. Compare tokenization efficiency across: English prose, code, JSON, non-Latin scripts, and domain-specific terminology. Implement a context manager that tracks token budgets and handles overflow gracefully.

3.1.3 Manager Module

Budgeting token costs for production workloads. Understanding how context window size affects use case feasibility. Evaluating whether a use case requires long-context models (and the cost implications).

3.2 Week 4: Inference Infrastructure

Understand how LLM inference works at the infrastructure level and the factors that determine latency, throughput, and cost.

3.2.1 Topics Covered

• GPU memory and computation: Why LLMs require GPUs, VRAM requirements for different model sizes, the memory bandwidth bottleneck
• KV-cache mechanics: What the key-value cache stores, why it grows with context length, memory implications for long conversations
• Batching strategies: Static vs. dynamic batching, continuous batching, how batching affects latency vs. throughput tradeoffs
• Quantization: INT8, INT4, FP8, NF4, AWQ, GPTQ. What precision means, quality vs. speed tradeoffs, when quantization is appropriate
• Latency components: Time to first token (TTFT), inter-token latency (ITL), total generation time. What affects each component
• Serving frameworks: vLLM, TGI (Text Generation Inference), TensorRT-LLM, llama.cpp. When to use each

3.2.2 Technical Lab

Deploy a model using vLLM. Measure TTFT and ITL under different load conditions. Experiment with batch sizes and observe throughput changes. Compare quantized vs. full-precision model performance. Profile GPU memory usage as context length increases.

3.2.3 Manager Module

Capacity planning for LLM workloads. Understanding SLAs: what latency guarantees are achievable. Cost-performance tradeoffs for different hardware configurations. When to use managed APIs vs. self-hosted infrastructure.

3.3 Week 5: Structured Outputs & Reliability

LLMs generate unstructured text by default. Production systems require structured, parseable outputs. This week covers techniques for reliable structured generation.

3.3.1 Topics Covered

• JSON mode and schema enforcement: Native JSON modes (OpenAI, Anthropic), constrained decoding, grammar-based generation
• Parsing strategies: Regex extraction, partial JSON parsing, streaming parsers, graceful degradation when parsing fails
• Retry and fallback patterns: Exponential backoff, reformatting prompts on failure, fallback to simpler output formats
• Validation layers: Pydantic models, JSON Schema validation, custom validators for domain-specific constraints
• Streaming structured outputs: Incremental parsing, handling partial results, UI considerations for streaming
• Error budgets: Acceptable failure rates, monitoring parse failures, alerting thresholds

3.3.2 Technical Lab

Build a robust JSON extraction pipeline with: schema validation, streaming support, retry logic, and fallback parsing. Test against adversarial inputs that typically break naive implementations. Implement monitoring for parse failure rates. (A minimal validate-and-retry sketch follows below.)

3.3.3 Manager Module

Defining reliability requirements for LLM-powered features. Setting appropriate SLAs for structured output success rates. Understanding when best-effort parsing is acceptable vs. when guaranteed structure is required.
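As a starting point for the Week 5 lab, the sketch below shows the validate-and-retry pattern with a Pydantic schema. The call_model function is a stub standing in for whichever provider client you use, and the Invoice schema is a hypothetical example; the retry step simply feeds the validation error back to the model and asks for corrected JSON.

    import json
    from pydantic import BaseModel, ValidationError

    class Invoice(BaseModel):          # hypothetical target schema for the extraction task
        vendor: str
        total: float
        currency: str

    def call_model(prompt: str) -> str:
        """Stub standing in for a provider API call; replace with your client of choice."""
        return '{"vendor": "Acme Corp", "total": 1249.50, "currency": "USD"}'

    def extract_invoice(document: str, max_attempts: int = 3) -> Invoice:
        base = "Return ONLY a JSON object with keys vendor, total, currency.\n\n" + document
        last_error = ""
        for _ in range(max_attempts):
            prompt = base if not last_error else (
                base + "\n\nYour previous answer failed validation: " + last_error
                + "\nReturn corrected JSON only."
            )
            raw = call_model(prompt)
            try:
                # Tolerate prose around the JSON by keeping the outermost {...} span.
                payload = raw[raw.index("{"): raw.rindex("}") + 1]
                return Invoice.model_validate(json.loads(payload))
            except (ValueError, ValidationError) as exc:
                last_error = str(exc)   # fed back to the model on the next attempt
        raise RuntimeError(f"Extraction failed after {max_attempts} attempts: {last_error}")

    print(extract_invoice("Invoice from Acme Corp, total due $1,249.50."))

The lab extends this skeleton with streaming support, fallback parsing, and parse-failure metrics; the skeleton's job is only to show where schema validation and retry sit in the flow.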
3.4 Week 6: Function Calling & Tool Use

Enable LLMs to take actions in the world by calling functions and APIs. This is foundational for agentic systems covered later.

3.4.1 Topics Covered

• Function calling APIs: OpenAI function calling, Anthropic tool use, open-model approaches. Syntax and capabilities of each
• Tool definition best practices: Clear descriptions, well-defined parameters, examples in documentation, handling optional parameters
• Execution safety: Sandboxing tool execution, permission models, preventing unintended actions, audit logging
• Multi-tool orchestration: When the model needs multiple tools, parallel vs. sequential execution, handling tool dependencies
• Error handling: What happens when tools fail, propagating errors to the model, retry strategies
• Human-in-the-loop patterns: Confirmation prompts for high-stakes actions, approval workflows, escalation paths

3.4.2 Technical Lab

Build a multi-tool agent that can: query a database, call external APIs, perform calculations, and generate reports. Implement permission controls and audit logging. Test with adversarial prompts that attempt to misuse tools. (A small dispatch-and-audit sketch follows below.)

3.4.3 Manager Module

Designing safe tool-calling systems for enterprise environments. Defining which actions require human approval. Compliance implications of LLM-initiated actions. Audit and accountability requirements.
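The following minimal sketch illustrates the permission-and-audit side of tool use, independent of any particular vendor's function-calling API. The tool-call format, tool names, and audit destination are illustrative assumptions; a real system would receive the call from the model's structured tool-use response and write the audit record to durable storage rather than stdout.

    import json
    from typing import Callable

    def get_order_status(order_id: str) -> dict:
        """Read-only lookup against a simulated order system."""
        return {"order_id": order_id, "status": "shipped"}

    def issue_refund(order_id: str, amount: float) -> dict:
        """High-stakes action: gated behind human approval."""
        return {"order_id": order_id, "refunded": amount}

    # Registry: tool name -> (callable, requires_human_approval)
    TOOLS: dict[str, tuple[Callable, bool]] = {
        "get_order_status": (get_order_status, False),
        "issue_refund": (issue_refund, True),
    }

    def dispatch(tool_call: dict, approved: bool = False) -> dict:
        """Execute a model-proposed tool call with a permission check and an audit record."""
        name, args = tool_call["name"], tool_call["arguments"]
        fn, needs_approval = TOOLS[name]
        if needs_approval and not approved:
            return {"error": f"'{name}' requires human approval before execution"}
        result = fn(**args)
        print("AUDIT:", json.dumps({"tool": name, "args": args, "result": result}))
        return result

    # Tool calls as a model might propose them (format is illustrative):
    print(dispatch({"name": "get_order_status", "arguments": {"order_id": "A-1001"}}))
    print(dispatch({"name": "issue_refund", "arguments": {"order_id": "A-1001", "amount": 25.0}}))

Keeping the approval flag and audit log outside the model's control is the design point: the model proposes, the dispatch layer decides and records.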
4 Phase III: Prompt Engineering & Model Selection
Weeks 7–10

4.1 Week 7: Prompt Engineering Fundamentals

Prompt engineering is the primary lever for controlling LLM behavior. This week covers systematic approaches to prompt design.

4.1.1 Topics Covered

• Prompt anatomy: System prompts, user prompts, assistant prefills. How each component influences model behavior
• Instruction design: Clarity, specificity, avoiding ambiguity. Positive vs. negative instructions. Ordering effects
• Few-shot prompting: Example selection, example ordering, format consistency, when few-shot helps vs. hurts
• Chain-of-thought (CoT): Eliciting reasoning steps, when CoT improves accuracy, when it doesn't help, zero-shot CoT
• Output formatting: Specifying format in prompts, XML tags, markdown structure, controlling verbosity
• Prompt sensitivity: How small changes affect outputs, strategies for robust prompts, testing for sensitivity

4.1.2 Technical Lab

Develop prompts for a complex extraction task (e.g., extracting structured data from legal documents). Iterate through multiple prompt versions, measuring accuracy on a held-out test set. Document which techniques improved performance and which didn't.

4.1.3 Manager Module

Prompt engineering as an ongoing operational concern, not a one-time setup. Version control and testing for prompts. Building prompt libraries and sharing across teams.

4.2 Week 8: Advanced Prompt Techniques

Beyond the basics: techniques for complex tasks, multi-step reasoning, and pushing model capabilities.

4.2.1 Topics Covered

• Self-consistency and majority voting: Sampling multiple responses, aggregating answers, when this improves reliability
• Decomposition strategies: Breaking complex tasks into subtasks, prompt chaining, managing state across prompts
• Reflection and self-critique: Asking models to verify their own outputs, iterative refinement, limitations of self-correction
• Role prompting: Persona assignment, expertise simulation, when role prompting helps vs. introduces bias
• Adversarial prompting: Jailbreaks, prompt injection, understanding attack vectors to build defenses
• Prompt compression: Reducing token usage while preserving effectiveness, distillation of long prompts

4.2.2 Technical Lab

Implement a multi-step reasoning pipeline for a complex analytical task (e.g., financial report analysis with numerical reasoning). Compare single-prompt vs. chained-prompt approaches. Implement self-consistency voting and measure accuracy improvements. (A small majority-voting sketch appears after the Week 9 material below.)

4.2.3 Manager Module

Cost implications of advanced prompting techniques (self-consistency requires N× the API calls). Latency tradeoffs for multi-step pipelines. When the complexity is worth it vs. when simpler approaches suffice.

4.3 Week 9: Model Selection & Vendor Evaluation

Choosing the right model for a use case is critical. This week provides frameworks for systematic model evaluation and vendor comparison.

4.3.1 Topics Covered

• The model landscape: GPT-4/4o, Claude 3.5/Opus, Gemini Pro/Ultra, Llama 3, Mistral, Qwen, Command R+. Capabilities, pricing, and positioning of each
• Benchmark interpretation: MMLU, HumanEval, GSM8K, HellaSwag. What benchmarks measure, their limitations, why benchmark performance doesn't always predict production performance
• Task-specific evaluation: Building evaluation sets for your use case, avoiding contamination, statistical significance of comparisons
• Cost-capability tradeoffs: When smaller/cheaper models suffice, tiered model strategies, routing between models
• Vendor risk assessment: API stability, pricing changes, deprecation policies, data handling practices, geographic availability
• Open vs. closed models: Control, customization, and compliance tradeoffs. When open-weight models are appropriate

4.3.2 Technical Lab

Design and execute a model comparison for a specific use case. Build a test set with ground truth labels. Evaluate 3+ models on accuracy, latency, and cost. Present findings with statistical confidence intervals.

4.3.3 Manager Module

Building a model selection framework for your organization. Vendor negotiation and contract considerations. Multi-vendor strategies to reduce lock-in. Procurement processes for AI services.
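A minimal sketch of the self-consistency pattern from the Week 8 lab. The ask_model function is a stub that returns noisy answers; in practice you would sample the model at a non-zero temperature and extract the final answer from each reasoning chain before voting. Note the cost implication flagged in the Manager Module: N samples cost roughly N times a single call.

    import random
    from collections import Counter

    def ask_model(question: str) -> str:
        """Stub for one sampled answer; a real call would run at temperature > 0
        and extract the final answer from the model's reasoning chain."""
        return random.choice(["42", "42", "42", "41"])   # noisy but usually right

    def self_consistent_answer(question: str, n_samples: int = 9) -> tuple[str, float]:
        """Sample N answers and return the majority vote with its agreement rate."""
        answers = [ask_model(question) for _ in range(n_samples)]
        best, count = Counter(answers).most_common(1)[0]
        return best, count / n_samples   # agreement doubles as a rough confidence signal

    answer, agreement = self_consistent_answer("What is 6 * 7?")
    print(f"answer={answer}, agreement={agreement:.0%}")

The agreement rate is often as useful as the answer itself: low agreement is a cheap trigger for escalation or a fallback to a stronger model.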
4.4 Week 10: Evaluation & Testing Frameworks

Rigorous evaluation is essential for production LLM systems. This week covers building comprehensive testing infrastructure.

4.4.1 Topics Covered

• Evaluation metrics: Exact match, BLEU, ROUGE, BERTScore, semantic similarity. When each metric is appropriate and what each actually measures
• LLM-as-judge: Using LLMs to evaluate LLM outputs, designing rubrics, calibration, known biases (verbosity, position)
• Human evaluation: When human eval is necessary, designing annotation tasks, inter-rater reliability, cost-effective sampling strategies
• Regression testing: Detecting quality degradation over time, building regression suites, automated alerts
• A/B testing for LLMs: Statistical considerations, handling high variance in LLM outputs, minimum sample sizes
• Red teaming: Systematic adversarial testing, coverage strategies, tracking and triaging findings

4.4.2 Technical Lab

Build a comprehensive evaluation harness including: automated metrics, LLM-as-judge scoring, regression tracking, and alerting. Integrate with CI/CD to run evaluations on prompt or model changes. (A minimal judge-scoring sketch follows below.)

4.4.3 Manager Module

Operationalizing continuous evaluation. Building evaluation review boards. Defining quality gates for deployment. Communicating evaluation results to non-technical stakeholders.
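A minimal LLM-as-judge sketch, assuming a simple 1–5 rubric and a stubbed judge call. The rubric wording, the SCORE line format, and the parsing regex are illustrative choices, not a standard; the important ideas are that the rubric pins down exactly what is being scored and that unparseable judge outputs are tracked rather than silently dropped.

    import re
    import statistics

    JUDGE_RUBRIC = """Rate the RESPONSE for factual accuracy against the REFERENCE on a 1-5 scale.
    Reply with a single line: SCORE: <1-5> followed by one sentence of justification.

    REFERENCE: {reference}
    RESPONSE: {response}"""

    def call_judge(prompt: str) -> str:
        """Stub for the judge model; replace with a real API call."""
        return "SCORE: 4  The response matches the reference apart from minor wording."

    def judge_score(reference: str, response: str) -> int | None:
        raw = call_judge(JUDGE_RUBRIC.format(reference=reference, response=response))
        match = re.search(r"SCORE:\s*([1-5])", raw)
        return int(match.group(1)) if match else None   # None = parse failure, tracked separately

    cases = [("Paris is the capital of France.", "The capital of France is Paris.")]
    scores = [s for ref, resp in cases if (s := judge_score(ref, resp)) is not None]
    print(f"mean judge score: {statistics.mean(scores):.2f} over {len(scores)} scored cases")

In the full harness this function runs over the regression suite on every prompt or model change, and the score distribution (not just the mean) feeds the quality gates discussed in the Manager Module.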
5 Phase IV: Retrieval, Customization & Alignment
Weeks 11–15

5.1 Week 11: Failure Modes & Defensive Design

Before building retrieval and customization systems, understand how they fail. This week's placement ensures defensive thinking is applied to all subsequent work.

5.1.1 Topics Covered

• Hallucination taxonomy: Factual errors, entity confusion, temporal confusion, fabricated citations, confident incorrectness. Causes of each type
• Retrieval failures: Missing relevant documents, retrieving irrelevant documents, retrieval contamination, embedding space limitations
• Context poisoning: When retrieved context leads to worse outputs, contradictory sources, outdated information
• Prompt injection and jailbreaks: Attack vectors, real-world examples, why these are hard to fully prevent
• Semantic drift: How model behavior changes over time, API updates, silent regressions
• Defensive design patterns: Graceful degradation, uncertainty communication, human escalation triggers, output validation

5.1.2 Technical Lab

Build a failure detection system: classify model outputs for hallucination indicators, implement uncertainty scoring, create alerting for anomalous outputs. Red-team your own system and document attack vectors.

5.1.3 Manager Module

Incident response planning for LLM failures. Defining severity levels. Communication templates for stakeholders. Post-mortem processes. Building organizational awareness of failure modes.

5.2 Week 12: RAG Foundations

Retrieval-Augmented Generation (RAG) is the most common pattern for grounding LLMs in organizational knowledge. This week covers the fundamentals.

5.2.1 Topics Covered

• Embedding models: OpenAI embeddings, Cohere Embed, open-source alternatives (E5, BGE, GTE). Dimensionality, performance, and cost tradeoffs
• Similarity metrics: Cosine similarity, dot product, Euclidean distance. When each is appropriate, normalization considerations
• Chunking strategies: Fixed-size chunks, semantic chunking, sentence-based, paragraph-based, document structure-aware chunking
• Vector databases: Pinecone, Weaviate, Qdrant, Milvus, pgvector. Architecture, indexing algorithms (HNSW, IVF), managed vs. self-hosted
• Basic RAG pipeline: Query embedding → similarity search → context injection → generation. End-to-end implementation
• RAG evaluation: Retrieval quality (recall@k, MRR), end-to-end accuracy, attribution correctness

5.2.2 Technical Lab

Build a complete RAG pipeline: document ingestion, chunking, embedding, vector storage, retrieval, and generation. Benchmark retrieval quality with different chunking strategies and embedding models on a labeled dataset.

5.2.3 Manager Module

RAG infrastructure decisions: vector database selection, managed vs. self-hosted, cost modeling. Data ingestion pipelines and update strategies. When RAG is the right solution vs. alternatives.

5.3 Week 13: Advanced RAG & Hybrid Search

Production RAG systems require more than basic semantic search. This week covers techniques that significantly improve retrieval quality.

5.3.1 Topics Covered

• Hybrid search: Combining vector search with keyword search (BM25), fusion strategies, when hybrid outperforms pure semantic search
• Query transformation: Query expansion, hypothetical document embeddings (HyDE), query decomposition for complex questions
• Reranking: Cross-encoder rerankers (Cohere Rerank, BGE Reranker), ColBERT, when reranking provides lift
• Hierarchical retrieval: Document summaries → full documents, multi-stage retrieval, parent-child chunk relationships
• Structured data retrieval: Text-to-SQL, knowledge graphs, combining structured and unstructured retrieval
• Metadata filtering: Using document metadata to scope searches, access control integration, temporal filtering

5.3.2 Technical Lab

Extend the Week 12 RAG pipeline with: hybrid search, reranking, and query expansion. Measure precision/recall improvements at each stage. Implement a text-to-SQL component for structured data queries. (A rank-fusion sketch follows below.)

5.3.3 Manager Module

Justifying infrastructure complexity: when advanced RAG techniques are worth the added cost and maintenance. Performance vs. cost tradeoffs for reranking. Build vs. buy decisions for search infrastructure.
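One common fusion strategy for the hybrid search described above is reciprocal rank fusion (RRF), sketched below. The two ranked lists are invented for illustration and would come from a BM25 index and a vector index in practice; k = 60 is the damping constant commonly used with RRF.

    def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
        """Fuse several ranked lists of document IDs into one ranking."""
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores.items(), key=lambda item: -item[1])

    # Ranked document IDs for one query (illustrative): keyword (BM25) hits and vector hits.
    bm25_hits = ["doc7", "doc2", "doc9", "doc4"]
    vector_hits = ["doc2", "doc5", "doc7", "doc1"]

    for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits])[:3]:
        print(doc_id, round(score, 4))

Because RRF only uses ranks, it sidesteps the problem of BM25 and cosine scores living on incompatible scales, which is why it is a common default before investing in a learned reranker.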
5.4 Week 14: Fine-Tuning Fundamentals

Fine-tuning is often overused. This week covers when fine-tuning is appropriate and how to do it correctly when it is.

5.4.1 Topics Covered

• When to fine-tune: Style/format adaptation, specialized domains, performance-critical latency reduction. When RAG or prompting suffices instead
• When NOT to fine-tune: Adding factual knowledge (use RAG), quick experiments, rapidly changing information, limited data
• Fine-tuning methods: Full fine-tuning, LoRA, QLoRA, DoRA. Parameter efficiency, memory requirements, quality tradeoffs
• Data preparation: Instruction format, quality over quantity, data cleaning, deduplication, balance across task types
• Training dynamics: Learning rates, batch sizes, epochs, early stopping, catastrophic forgetting, overfitting detection
• Evaluation during training: Held-out test sets, monitoring for capability regression, comparing to the base model

5.4.2 Technical Lab

Fine-tune a small model (7B parameters) using QLoRA for a domain-specific task. Prepare a high-quality training dataset. Compare fine-tuned model performance against base model + few-shot prompting + RAG approaches.

5.4.3 Manager Module

ROI analysis for fine-tuning projects. GPU compute budgeting. Maintenance burden of fine-tuned models. When to use vendor fine-tuning APIs vs. self-managed training.

5.5 Week 15: Safety, Alignment & Guardrails

Deploying LLMs in production requires systematic approaches to safety and alignment with organizational values.

5.5.1 Topics Covered

• Alignment techniques: RLHF (Reinforcement Learning from Human Feedback), RLAIF, Constitutional AI, DPO (Direct Preference Optimization)
• Content moderation: Input filtering, output filtering, content classifiers, toxicity detection, PII detection
• Guardrail frameworks: NeMo Guardrails, Guardrails AI, custom rule engines. Architecture patterns for guardrail integration
• Jailbreak mitigation: Known attack patterns, defense strategies, why perfect defense is impossible, monitoring for novel attacks
• Topic and scope control: Keeping models on-topic, handling out-of-scope requests gracefully, defining boundaries
• Bias and fairness: Testing for demographic biases, mitigation strategies, documentation requirements

5.5.2 Technical Lab

Build a safety pipeline including: input classifier, topic guardrails, output moderation, and PII scrubbing. Red-team the system with jailbreak attempts. Implement monitoring and alerting for safety violations. (A minimal PII-scrubbing sketch follows below.)

5.5.3 Manager Module

Drafting an AI Safety Review Process for enterprise deployment. Defining acceptable use policies. Establishing review boards. Incident response for safety violations. Communicating safety posture to stakeholders.
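A minimal sketch of the PII-scrubbing stage of the safety pipeline, using regular expressions only. The patterns catch obvious emails, phone numbers, and US SSNs and are intentionally simplistic; production systems typically layer an ML-based detector (for names, addresses, and context-dependent identifiers) on top of pattern matching and apply the same scrubbing to logs.

    import re

    # Regex-only PII scrubber: deliberately simple, covering emails, phone numbers, and US SSNs.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def scrub(text: str) -> tuple[str, list[str]]:
        """Replace PII spans with typed placeholders; return the scrubbed text and categories found."""
        found = []
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                found.append(label)
                text = pattern.sub(f"[{label}]", text)
        return text, found

    clean, hits = scrub("Contact Jane at jane.doe@example.com or +1 (555) 123-4567.")
    print(clean)   # Contact Jane at [EMAIL] or [PHONE].
    print(hits)    # ['EMAIL', 'PHONE']

Returning the categories found (not just the cleaned text) matters: it feeds the monitoring and alerting half of the lab, where a spike in a particular PII type is itself a signal worth investigating.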
6 Phase V: Enterprise Deployment & Architecture
Weeks 16–20

6.1 Week 16: Production Infrastructure

Moving from prototype to production requires robust infrastructure. This week covers the systems engineering required for reliable LLM deployments.

6.1.1 Topics Covered

• Orchestration platforms: Kubernetes for LLM workloads, Ray Serve, Modal, Anyscale. Scheduling and resource management for GPU workloads
• Inference optimization: vLLM, TensorRT-LLM, speculative decoding, continuous batching. Maximizing throughput while meeting latency SLAs
• Scaling patterns: Horizontal scaling, tensor parallelism, pipeline parallelism. Auto-scaling based on load
• Load balancing: Request routing for LLM services, session affinity for stateful conversations, handling long-running requests
• Caching strategies: Semantic caching, embedding caches, response caches. Cache invalidation for dynamic content
• Cost optimization: Spot instances for batch workloads, reserved capacity planning, model routing to optimize cost/quality

6.1.2 Technical Lab

Deploy vLLM on Kubernetes with auto-scaling. Implement load testing with realistic traffic patterns. Measure and optimize for latency percentiles (p50, p95, p99). Implement semantic caching and measure cache hit rates.

6.1.3 Manager Module

TCO modeling: on-premises vs. cloud vs. hybrid. Reserved vs. on-demand pricing strategies. Capacity planning for growth. SLA definition and monitoring.

6.2 Week 17: Monitoring, Observability & Debugging

LLM systems require specialized monitoring beyond traditional application observability. This week covers the unique challenges of LLM operations.

6.2.1 Topics Covered

• LLM-specific metrics: Token throughput, latency distributions, cost per request, cache hit rates, error rates by type
• Quality monitoring: Tracking output quality over time, drift detection, automated quality scoring, regression alerts
• Tracing and debugging: End-to-end request tracing, prompt version tracking, debugging multi-step chains, root cause analysis
• Logging best practices: What to log (prompts, responses, latencies), PII handling in logs, log retention policies
• Observability platforms: LangSmith, Weights & Biases, Arize, custom solutions. What each provides, integration patterns
• Alerting strategies: Defining alert thresholds for LLM systems, avoiding alert fatigue, on-call considerations

6.2.2 Technical Lab

Build a complete observability stack for an LLM application: metrics collection, tracing, logging, quality monitoring, and alerting. Create dashboards for operational visibility. Implement automated quality regression detection. (A small metrics-tracking sketch appears after the Week 18 material below.)

6.2.3 Manager Module

Defining SLOs for LLM systems. Incident management processes. On-call structures for AI systems. Building operational runbooks. Communicating system health to stakeholders.

6.3 Week 18: Data Governance, Security & Compliance

Enterprise LLM deployment requires careful attention to data handling, security, and regulatory compliance.

6.3.1 Topics Covered

• Data residency: Where data is processed and stored, geographic requirements, cross-border data transfer implications
• PII handling: Detection, masking, anonymization, pseudonymization. Handling PII in prompts, responses, and logs
• Access control: Authentication, authorization, row-level security for RAG, audit trails
• Regulatory frameworks: GDPR, CCPA, HIPAA, SOC 2, PCI-DSS. What each requires for AI systems
• AI-specific regulations: EU AI Act, emerging US regulations, industry-specific requirements (finance, healthcare)
• Vendor security assessment: Evaluating API provider security, data handling agreements, right to audit

6.3.2 Technical Lab

Build a compliance-aware gateway: PII detection and masking, audit logging, access control integration, data residency enforcement. Implement prompt and response filtering for sensitive content.

6.3.3 Manager Module

Designing compliance checkpoints for GenAI workflows. Working with legal and compliance teams. Documentation requirements. Preparing for audits. Vendor risk management for AI providers.
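In the spirit of the Week 17 lab, here is a small in-process metrics tracker that computes latency percentiles and average cost per request. The simulated traffic and per-million-token prices are placeholders; a production service would export these measurements to its monitoring backend rather than keep them in memory.

    import random
    import statistics

    class RequestMetrics:
        """In-process metrics for an LLM endpoint; a real service would export
        these to Prometheus, CloudWatch, or a similar backend."""

        def __init__(self) -> None:
            self.latencies_ms: list[float] = []
            self.costs_usd: list[float] = []

        def record(self, latency_ms: float, input_tokens: int, output_tokens: int,
                   in_price_per_m: float = 3.0, out_price_per_m: float = 15.0) -> None:
            # Prices are placeholders; substitute your contracted rates.
            self.latencies_ms.append(latency_ms)
            self.costs_usd.append(input_tokens / 1e6 * in_price_per_m
                                  + output_tokens / 1e6 * out_price_per_m)

        def summary(self) -> dict:
            cuts = statistics.quantiles(self.latencies_ms, n=100)   # 99 percentile cut points
            return {
                "requests": len(self.latencies_ms),
                "p50_ms": round(cuts[49], 1),
                "p95_ms": round(cuts[94], 1),
                "p99_ms": round(cuts[98], 1),
                "avg_cost_usd": round(statistics.mean(self.costs_usd), 5),
            }

    metrics = RequestMetrics()
    for _ in range(500):   # simulated traffic with log-normal latencies
        metrics.record(latency_ms=random.lognormvariate(6.5, 0.4),
                       input_tokens=900, output_tokens=250)
    print(metrics.summary())

Reporting percentiles rather than averages is the point: LLM latency distributions are long-tailed, and SLOs defined in the Manager Module are typically written against p95 or p99.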
6.4 Week 19: Agentic Systems

Agentic systems (LLMs that plan, use tools, and take autonomous actions) are increasingly important but require careful design. This week provides a deep dive rather than surface coverage.

6.4.1 Topics Covered

• Agent architectures: ReAct (Reasoning + Acting), Plan-and-Execute, Tree of Thoughts. When each pattern is appropriate
• Agent frameworks: LangGraph, AutoGen, CrewAI, custom implementations. Framework selection criteria, avoiding over-abstraction
• State management: Tracking agent state across steps, persistence, recovery from failures, conversation memory
• Multi-agent systems: Agent coordination, task delegation, conflict resolution, communication protocols
• Guarded autonomy: Setting boundaries on agent actions, approval workflows, rollback mechanisms, kill switches
• Agent failure modes: Infinite loops, goal drift, unintended actions, cascading failures. Detection and mitigation

6.4.2 Technical Lab

Build a multi-agent system for a realistic enterprise workflow (e.g., a research assistant with planner, researcher, and writer agents). Implement state persistence, approval gates, and failure recovery. Test with scenarios designed to trigger failure modes. (A single-agent ReAct loop is sketched after the Week 20 material below.)

6.4.3 Manager Module

What tasks should never be delegated to autonomous AI. Liability considerations for agent actions. Approval workflows and human oversight requirements. Building organizational trust in agentic systems incrementally.

6.5 Week 20: Enterprise Integrations

Connecting LLM systems to enterprise infrastructure: CRMs, ERPs, data warehouses, and collaboration tools.

6.5.1 Topics Covered

• Integration patterns: REST APIs, GraphQL, message queues (Kafka, Pub/Sub), webhooks. Choosing the right pattern for each integration
• Common enterprise systems: Salesforce, ServiceNow, SAP, Workday, Jira, Confluence. Integration approaches for each
• Data synchronization: Keeping RAG indices in sync with source systems, change data capture, incremental updates
• Authentication and authorization: OAuth flows, service accounts, API key management, propagating user identity
• Error handling and resilience: Handling downstream system failures, circuit breakers, fallback strategies
• Event-driven architectures: Triggering LLM workflows from business events, async processing, result delivery

6.5.2 Technical Lab

Build a full-stack LLM microservice that: ingests documents from a simulated enterprise system, maintains a RAG index, responds to queries via REST API, and pushes results to a message queue. Implement proper error handling and monitoring.

6.5.3 Manager Module

Building AI Centers of Excellence. Integration roadmap planning. Change management for AI-augmented workflows. Stakeholder communication across IT, business, and compliance teams.
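As promised in the Week 19 lab, here is a single-agent ReAct-style loop reduced to its skeleton. The plan function is a stub standing in for the LLM planning step (a real agent would prompt the model with the question plus the scratchpad and parse its thought/action), the lookup_order tool is backed by a fake in-memory table, and the hard step limit is the simplest guard against the infinite-loop failure mode discussed above.

    import json

    def lookup_order(order_id: str) -> str:
        """Narrow, read-only tool backed by a fake in-memory table."""
        fake_db = {"A-1001": "shipped", "A-1002": "processing"}
        return fake_db.get(order_id, "not found")

    TOOLS = {"lookup_order": lookup_order}

    def plan(question: str, scratchpad: list[str]) -> dict:
        """Stub for the LLM planning step; a real agent would prompt the model
        with the question plus the scratchpad and parse its thought/action."""
        if not scratchpad:
            return {"thought": "I need the order status.", "action": "lookup_order", "input": "A-1001"}
        return {"thought": "I have what I need.", "final_answer": f"Order A-1001 is {scratchpad[-1]}."}

    def run_agent(question: str, max_steps: int = 5) -> str:
        scratchpad: list[str] = []
        for _ in range(max_steps):          # hard step budget guards against infinite loops
            step = plan(question, scratchpad)
            if "final_answer" in step:
                return step["final_answer"]
            observation = TOOLS[step["action"]](step["input"])
            scratchpad.append(observation)
            print(json.dumps(step | {"observation": observation}))
        return "stopped: step budget exhausted"

    print(run_agent("What is the status of order A-1001?"))

The multi-agent version built in the lab layers state persistence and approval gates on top of this same think/act/observe cycle; the skeleton is only meant to make the cycle itself visible.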
7 Phase VI: Governance, Risk & Capstone
Weeks 21–24

7.1 Week 21: AI Governance Frameworks

Sustainable AI deployment requires organizational governance structures. This week covers building governance frameworks that enable responsible AI use.

7.1.1 Topics Covered

• Documentation standards: Model cards, system cards, datasheets for datasets. What to document and why
• Review processes: AI review boards, sign-off workflows, risk scoring matrices, exception processes
• Risk classification: Categorizing AI use cases by risk level, tailoring controls to risk, high-risk vs. low-risk workflows
• Policy development: Acceptable use policies, data usage policies, model training policies, deployment policies
• Vendor governance: Third-party AI risk management, contract requirements, ongoing vendor monitoring
• External stakeholders: Communicating AI governance to regulators, auditors, investors, customers

7.1.2 Technical Lab

Create a complete governance package for an AI system: model card, system card, risk assessment, and deployment checklist. Build a governance dashboard that tracks AI deployments across the organization.

7.1.3 Manager Module

Creating governance frameworks that satisfy regulators, auditors, and boards. Balancing governance overhead with deployment velocity. Building governance that scales with AI adoption.

7.2 Week 22: AI Products & Organizations

Building sustainable AI capability requires appropriate organizational structures, team compositions, and product thinking.

7.2.1 Topics Covered

• AI team structures: Centralized vs. embedded teams, MLOps/LLMOps roles, prompt engineers, AI product managers. Skills and responsibilities for each
• Build vs. buy decisions: When to use vendor solutions, when to build custom, hybrid approaches. Decision frameworks
• AI product development: User research for AI features, iterative development, handling uncertainty in AI capabilities
• Success metrics: Defining and measuring AI product success, leading vs. lagging indicators, avoiding vanity metrics
• Change management: Introducing AI to existing workflows, training end users, managing resistance, celebrating wins
• Multi-year planning: AI capability roadmaps, technology evolution, skill development, budget planning

7.2.2 Technical Lab

Develop a comprehensive AI product proposal: problem definition, solution architecture, success metrics, risk assessment, resource requirements, and rollout plan. Present to peer groups for critique.

7.2.3 Manager Module

Executive-level planning for multi-year AI adoption. Building the business case for AI investment. Hiring and developing AI talent. Creating a culture of responsible AI innovation.

7.3 Weeks 23–24: Capstone Project

The capstone integrates all program content into a complete, enterprise-ready GenAI system. Teams work on realistic scenarios with full technical and governance requirements.

7.3.1 Capstone Requirements

Teams of 3–5 participants deliver an end-to-end solution including:

Technical Components:
• Advanced RAG with hybrid search and reranking
• Structured output parsing with validation
• Function calling / tool use
• Safety classifiers and guardrails
• API gateway with authentication
• Comprehensive logging, monitoring, and alerting
• Evaluation harness with regression testing

Governance Documentation:
• System card with risk assessment
• Compliance checklist
• Deployment runbook
• Incident response plan
Business Case:
• Cost-benefit analysis
• Success metrics and measurement plan
• Rollout strategy

7.3.2 Capstone Presentation

Teams present to a panel simulating a CTO/board review:

• Technical deep dive: Architecture decisions, implementation challenges, performance results
• Live demonstration: Working system handling realistic scenarios, including edge cases
• Business justification: ROI analysis, risk mitigation, alignment with organizational goals
• Q&A: Defending decisions, handling objections, demonstrating depth of understanding
8 Appendix A: What Makes This Curriculum Rigorous

This curriculum is designed to produce genuine competence, not just exposure. Key design principles:

• Realistic pacing: 24 weeks allows topics to be covered at the depth required for production deployment, rather than rushing through material
• Prerequisite calibration: The Week 0 diagnostic ensures participants have foundational skills and provides remediation resources for those who need them
• Failure modes first: Week 11 covers failure modes before building RAG and fine-tuning systems, ensuring defensive design thinking is applied throughout
• Prompt engineering as a discipline: Dedicated weeks (7–8) treat prompt engineering as a systematic practice, not an afterthought
• Model selection framework: Week 9 provides structured approaches to vendor evaluation rather than assuming the model choice is a given
• Fine-tuning skepticism: Week 14 emphasizes when not to fine-tune, countering the common tendency to over-apply this technique
• Agentic systems depth: Week 19 provides a full week on agents rather than surface coverage, reflecting the complexity and risk of autonomous AI systems
• Cost awareness throughout: Token costs, infrastructure costs, and ROI analysis are integrated into technical discussions from Week 1
• Governance integrated: Compliance and governance appear throughout, not just in a final governance week
• Dual-track fidelity: Manager modules provide substantive content on architecture, cost, and organizational decisions, not watered-down versions of the technical material

9 Appendix B: Suggested Reading Resources

9.1 Foundational Papers

• Vaswani et al., "Attention Is All You Need" (2017): Transformer architecture
• Brown et al., "Language Models are Few-Shot Learners" (2020): GPT-3 and in-context learning
• Wei et al., "Chain-of-Thought Prompting" (2022): Reasoning elicitation
• Lewis et al., "Retrieval-Augmented Generation" (2020): RAG foundations
• Hu et al., "LoRA: Low-Rank Adaptation" (2021): Parameter-efficient fine-tuning

9.2 Technical Resources

• Anthropic's Claude documentation and cookbook
• OpenAI's API documentation and best practices guides
• LangChain and LlamaIndex documentation
• vLLM documentation for inference optimization
• Hugging Face Transformers library and model hub
9.3 Enterprise Governance

• NIST AI Risk Management Framework
• EU AI Act text and implementation guidance
• "Model Cards for Model Reporting" (Mitchell et al., 2019)
• "Datasheets for Datasets" (Gebru et al., 2021)

10 Appendix C: Lab Environment Requirements

10.1 Compute Resources

• Cloud GPU access: Minimum 1× A100 40GB for fine-tuning labs; 1× A10G or T4 sufficient for inference labs
• Local development: 16GB+ RAM, modern CPU, Docker support
• API credits: $500–$1,000 per participant for commercial API usage across the program

10.2 Software Stack

• Python 3.10+
• PyTorch 2.0+
• Transformers, PEFT, BitsAndBytes for fine-tuning
• LangChain or LlamaIndex for RAG
• vLLM for inference
• Docker, Kubernetes (Minikube or cloud-managed)
• PostgreSQL with pgvector, or a managed vector database

10.3 Accounts Required

• OpenAI API account
• Anthropic API account
• Cloud provider account (AWS, GCP, or Azure)
• Hugging Face account
• GitHub account for version control

This curriculum was developed with input from enterprise AI practitioners, reflecting lessons learned from production deployments across industries.