Complete Exam Preparation Guide
Complete Learning Path for Certification Success
This study guide provides a structured learning path from fundamentals to exam readiness for the Microsoft Certified: DevOps Engineer Expert certification. Designed for novices, it teaches all concepts progressively while focusing exclusively on exam-relevant content. Extensive diagrams and visual aids are integrated throughout to enhance understanding and retention.
Study Sections (in order):
Total Time: 6-10 weeks at 2-3 hours daily (the 10-week plan below can be compressed if you study more each day)
Week 1-2: Fundamentals & Domain 1 (sections 01-02)
Week 3: Domain 2 (section 03)
Week 4-6: Domain 3 (section 04)
Week 7: Domains 4-5 (sections 05-06)
Week 8: Integration & Cross-domain scenarios (section 07)
Week 9: Practice & Review
Week 10: Final Prep (sections 08-09)
Use checkboxes to track completion:
Exam Information:
Domain Weight Distribution:
Prerequisites:
Practice Test Bundles:
Cheat Sheets: Quick reference for final review
Begin with Fundamentals to build your foundation. Take your time with each chapter, ensuring you understand concepts before moving forward. This guide is designed to be comprehensive and self-sufficient - you should not need external resources to pass the exam.
Good luck on your DevOps Engineer Expert certification journey!
This certification assumes you understand:
If you're missing any: Consider reviewing Azure Fundamentals (AZ-900) materials or taking introductory courses in Git and Agile methodologies before proceeding.
What it is: DevOps is a cultural and technical movement that combines software development (Dev) and IT operations (Ops), integrating development, quality assurance, and operations into a unified culture and set of processes for delivering software efficiently and reliably.
Why it matters: Traditional software development had separate development and operations teams working in isolation, leading to slow releases, communication gaps, and deployment failures. DevOps breaks down these silos to enable faster, more reliable software delivery.
Real-world analogy: Think of DevOps like a relay race where the baton (your code) is passed seamlessly between runners (teams). In traditional development, runners would stop, hand over documentation about the baton, and the next runner would have to figure out how to carry it. In DevOps, everyone trains together, knows the process, and the handoff is smooth and automatic.
Key points:
💡 Tip: DevOps isn't a tool or a single role - it's a philosophy. You can't "install DevOps," but you can adopt DevOps practices and culture.
What it is: The DevOps lifecycle represents the continuous flow of activities from planning to monitoring in software delivery. Unlike traditional waterfall development with discrete phases, DevOps creates a continuous loop of improvement.
Why it exists: Software delivery is not a one-time event but a continuous process. Applications need updates, bug fixes, new features, and security patches throughout their lifetime. The DevOps lifecycle provides a framework for managing this continuous delivery.
Real-world analogy: The DevOps lifecycle is like a circular assembly line where each completed product immediately informs improvements to the next iteration. Feedback from customers (monitoring) directly influences what gets built next (planning), creating a continuous improvement loop.
How it works (Detailed step-by-step):
Plan: Teams define what to build, prioritize features, and create work items
Develop: Developers write code following best practices and standards
Build: Code is compiled, packaged, and prepared for deployment
Test: Automated tests validate functionality, performance, and security
Release: Builds that pass testing are approved and prepared for deployment to target environments
Deploy: Application is installed and configured in target environments
Operate: Application runs in production, serving real users
Monitor: Telemetry and logs track application health and user behavior
📊 DevOps Lifecycle Diagram:
graph TB
Plan[1. Plan<br/>Define features & priorities] --> Develop[2. Develop<br/>Write code & tests]
Develop --> Build[3. Build<br/>Compile & package]
Build --> Test[4. Test<br/>Validate quality]
Test --> Release[5. Release<br/>Approve for deployment]
Release --> Deploy[6. Deploy<br/>Install to environment]
Deploy --> Operate[7. Operate<br/>Run in production]
Operate --> Monitor[8. Monitor<br/>Track performance]
Monitor -.Feedback.-> Plan
style Plan fill:#e3f2fd
style Develop fill:#f3e5f5
style Build fill:#fff3e0
style Test fill:#e8f5e9
style Release fill:#fce4ec
style Deploy fill:#e0f2f1
style Operate fill:#e1f5fe
style Monitor fill:#f9fbe7
See: diagrams/01_fundamentals_devops_lifecycle.mmd
Diagram Explanation:
The DevOps lifecycle diagram illustrates the eight continuous phases that form the foundation of modern software delivery. Starting with Plan (blue), teams use tools like Azure Boards or GitHub Projects to define user stories and prioritize work based on business value and customer feedback. The Develop phase (purple) represents developers writing code in their chosen IDE, creating unit tests, and committing changes to version control systems like Git.
The Build phase (orange) takes source code and transforms it into deployable artifacts through compilation, dependency resolution, and packaging - Azure Pipelines or GitHub Actions automate this process triggered by code commits. Test (green) represents the critical quality gates where automated tests (unit, integration, security) run against builds to catch defects early before they reach production.
Once tests pass, the Release phase (pink) manages approvals and gates, determining which builds are ready for deployment to various environments. Deploy (teal) executes the actual installation and configuration of applications to target environments using Infrastructure as Code (IaC) tools like ARM templates or Terraform. The Operate phase (light blue) represents the running application serving real users and generating business value.
Finally, Monitor (yellow-green) continuously collects telemetry, logs, and metrics about application performance, user behavior, and system health using tools like Azure Monitor and Application Insights. The critical feedback arrow from Monitor back to Plan represents the continuous improvement loop - insights from production inform what features to build next, what issues to fix, and how to optimize performance. This circular flow means DevOps never stops; each iteration builds on lessons learned from the previous deployment.
What it is: Continuous Integration is the practice of automatically building and testing code every time a team member commits changes to version control. Every code commit to the main branch triggers an automated build process that compiles the code, runs tests, and validates quality.
Why it exists: Before CI, developers would work in isolation for days or weeks, then try to merge their changes together. This led to "integration hell" - massive merge conflicts, broken builds, and bugs that were hard to trace. CI solves this by integrating code frequently (multiple times per day), catching conflicts and issues immediately when they're easier to fix.
Real-world analogy: CI is like checking your bank account balance daily versus once a month. Daily checks let you catch errors immediately (a duplicate charge today), while monthly checks mean discovering problems weeks later when you can't remember the transactions (which code change broke the build?).
How it works (Detailed step-by-step):
Developer commits code: A developer finishes a feature or bug fix and pushes changes to a shared Git repository (GitHub, Azure Repos)
Trigger fires: The commit triggers a webhook or polling mechanism that notifies the CI system
Build process starts: CI system checks out the code and begins building
Automated tests run: The build includes running the test suite (unit tests, integration tests, linting)
Results reported: The CI system reports success or failure to the team
Artifacts published (if successful): Compiled code is packaged and stored for deployment
📊 CI Process Flow Diagram:
sequenceDiagram
participant Dev as Developer
participant Git as Git Repository
participant CI as CI System<br/>(Azure Pipelines/GitHub Actions)
participant Tests as Test Suite
participant Artifact as Artifact Registry
Dev->>Git: 1. Push code commit
Git->>CI: 2. Trigger webhook
CI->>Git: 3. Clone repository
CI->>CI: 4. Install dependencies
CI->>CI: 5. Compile/Build code
CI->>Tests: 6. Run automated tests
Tests-->>CI: 7. Test results
alt Tests Pass
CI->>Artifact: 8a. Publish build artifact
CI->>Dev: 9a. ✅ Success notification
else Tests Fail
CI->>Dev: 8b. ❌ Failure notification
Dev->>Dev: 9b. Fix issues
end
See: diagrams/01_fundamentals_ci_flow.mmd
Diagram Explanation:
This sequence diagram shows the automated CI workflow from code commit to artifact publication. When a Developer pushes code to the Git Repository (step 1), Git immediately sends a webhook notification to the CI System like Azure Pipelines or GitHub Actions (step 2). The CI system responds by cloning the latest code from the repository (step 3), then installs all necessary dependencies like npm packages or NuGet libraries (step 4).
Next, the CI system compiles or builds the code (step 5) - for compiled languages this means creating binaries, for interpreted languages it might mean bundling and minification. The build is then passed to the Test Suite (step 6) where automated tests execute. The test results (step 7) determine the next steps: if tests pass (green path), the CI system publishes the build artifact to a registry like Azure Artifacts, Docker Hub, or npm (step 8a) and sends a success notification to the developer (step 9a). If tests fail (red path), the developer receives a failure notification immediately (step 8b) and can fix the issues before they affect others (step 9b). This entire process typically completes in minutes, providing rapid feedback to developers.
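To make these steps concrete, here is a minimal GitHub Actions workflow sketch for a Node.js project. The file path, job name, `npm` script names, and the `dist/` output folder are illustrative assumptions rather than required values:

```yaml
# .github/workflows/ci.yml - hedged sketch of the CI flow described above
name: CI

on:
  push:
    branches: [main]      # steps 1-2: a commit to main triggers the workflow
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest            # a clean, hosted build agent
    steps:
      - uses: actions/checkout@v4     # step 3: clone the repository
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci                   # step 4: install dependencies
      - run: npm run build            # step 5: compile/bundle the code
      - run: npm test                 # step 6: run the automated test suite
      - uses: actions/upload-artifact@v4   # step 8a: publish the build output
        if: success()
        with:
          name: web-app-build
          path: dist/                 # assumed build output folder
```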
Detailed Example 1: Web Application CI Scenario
Imagine you're developing an e-commerce web application using React for frontend and Node.js for backend. A developer named Sarah completes a new feature that adds a shopping cart widget to the product page. She commits her code changes to the main branch in GitHub at 10:00 AM. Within seconds, GitHub sends a webhook to Azure Pipelines, which has been configured to trigger on any commit to main.
Azure Pipelines spins up a build agent (a clean virtual machine) and clones the repository. It runs npm install to download all dependencies listed in package.json - this includes React, Express, testing libraries, and dozens of other packages. Next, it runs npm run build which compiles the React code using Webpack, minifies JavaScript and CSS, and creates optimized bundles for production.
The build then executes npm test, running Jest unit tests that verify Sarah's shopping cart logic handles edge cases (empty cart, max quantity limits, price calculations). It also runs Cypress end-to-end tests that simulate a user adding items to cart in a real browser. All 247 tests pass in 3 minutes. Azure Pipelines then runs npm run lint to check code style (ESLint) - all checks pass. Finally, it creates a Docker image of the application, tags it with the commit SHA, and pushes it to Azure Container Registry. Sarah receives a Slack notification at 10:04 AM: "✅ Build #2847 succeeded - your changes are ready for deployment." The entire process took 4 minutes from commit to deployable artifact.
Detailed Example 2: Microservices CI with Multiple Languages
Consider a financial services company with a microservices architecture: payment service (Java), user service (Python), and notification service (C#). Each service has its own Git repository and CI pipeline, but they need to work together. When a developer commits to the payment service repository, Azure Pipelines triggers a build specific to Java: it runs mvn clean install to compile Java code with Maven, execute JUnit tests, perform static code analysis with SonarQube, and scan for vulnerabilities with OWASP Dependency-Check.
For the Python service, the CI pipeline runs pip install -r requirements.txt, executes pytest for unit tests, runs pylint for code quality, and uses safety to check for insecure dependencies. The C# service uses dotnet build, dotnet test with xUnit, and runs security scanning with Microsoft Security Code Analysis. Each pipeline is tailored to its language but follows the same principles: build, test, scan, publish. When all three services pass their individual CI pipelines, an integration test pipeline triggers that deploys all three services to a test environment and runs end-to-end API tests to verify they communicate correctly. This multi-language, multi-service CI approach ensures each component is validated individually and as part of the whole system.
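As an illustration, the Python user service's pipeline could be sketched in Azure Pipelines YAML roughly as below; the Python version, source folder, and the exact `safety` invocation are assumptions:

```yaml
# azure-pipelines.yml for the Python user service - illustrative sketch
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.11'                 # assumed Python version
  - script: pip install -r requirements.txt
    displayName: Install dependencies
  - script: pytest
    displayName: Run unit tests
  - script: pylint src/                   # assumed source folder
    displayName: Check code quality
  - script: |
      pip install safety
      safety check -r requirements.txt
    displayName: Scan dependencies for known vulnerabilities
```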
Detailed Example 3: CI Catching a Critical Bug
A developer named Mike is working on a database migration feature for a SaaS application. He writes code to add a new column to the users table and updates the data access layer. He runs tests locally on his machine - everything passes. Confident, he commits the code at 2:00 PM. The CI pipeline triggers and starts building. During the automated test phase, integration tests that run against a real PostgreSQL database (via Docker container) expose a problem his clean local database never hit: the CI database already has a column with that name, and the migration doesn't check for it before adding. The test "user_migration_adds_column_correctly" fails.
The CI system immediately sends Mike an email and updates the GitHub pull request with a red X. The build log shows: "ERROR: column 'user_preferences' of relation 'users' already exists." Mike realizes his migration doesn't check if the column exists before adding it. He adds IF NOT EXISTS to the SQL, commits again at 2:15 PM. The CI pipeline reruns - this time all tests pass, including additional tests that run migrations twice to verify idempotency. Without CI, this bug would have been discovered only when deploying to staging (maybe days later), potentially causing database corruption and requiring manual rollback. CI caught it in 15 minutes, before any environment was affected.
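The kind of integration-test job that caught this bug can be sketched in GitHub Actions using a PostgreSQL service container, so tests run against a real, disposable database in every build; the test script name, database name, and credentials below are placeholders:

```yaml
# Hedged sketch: run integration tests against a disposable PostgreSQL 14 container
name: Integration tests

on: [push]

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:14              # pin the same major version used in production
        env:
          POSTGRES_PASSWORD: postgres   # placeholder credentials for the throwaway DB
          POSTGRES_DB: app_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd "pg_isready"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run test:integration   # assumed script that applies migrations and runs tests
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/app_test
```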
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
What it is: Continuous Delivery is the practice of automatically building, testing, and preparing code changes for release to production. CD extends CI by ensuring that every successful build is automatically deployed to staging/testing environments and is always ready to be deployed to production at the click of a button. It's about keeping your software in a deployable state at all times.
Why it exists: Traditional software releases were risky, manual, and infrequent (quarterly or yearly). Teams spent weeks preparing for releases, writing deployment documents, and coordinating downtime windows. Continuous Delivery eliminates this friction by automating the entire release process, making deployment a low-risk, routine event that can happen anytime.
Real-world analogy: Think of CD like having pre-packed bags ready for a trip. Without CD, you pack frantically before each trip (deployment), often forgetting things. With CD, your bags are always packed and ready - you just grab them and go. The packing process (testing and preparation) happens automatically after each shopping trip (code commit).
How it works (Detailed step-by-step):
CI completes successfully: Continuous Integration has built the code and run all automated tests
Deployment to testing environment: Artifacts are automatically deployed to a staging/QA environment
Automated acceptance tests: Additional tests run in the staging environment
Manual approval gates (optional): Stakeholders review and approve for production
Production-ready state: Application is ready to deploy to production anytime
Key Difference - Continuous Delivery vs Continuous Deployment: In Continuous Delivery, every passing build is kept production-ready, but the final deployment to production requires a manual trigger or approval; in Continuous Deployment, every build that passes all automated gates deploys to production automatically with no manual step.
📊 Continuous Delivery Pipeline Diagram:
graph LR
A[Code Commit] --> B[CI Build & Test]
B -->|Success| C[Deploy to DEV]
C --> D[Automated Tests DEV]
D -->|Pass| E[Deploy to QA/Staging]
E --> F[Integration Tests]
F --> G[Performance Tests]
G --> H{Manual Approval Gate}
H -->|Approved| I[Ready for Production]
H -->|Rejected| J[Back to Development]
I -.Manual Trigger.-> K[Deploy to Production]
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#e8f5e9
style D fill:#fff3e0
style E fill:#f1f8e9
style F fill:#fce4ec
style G fill:#f3e5f5
style H fill:#ffebee
style I fill:#e0f2f1
style K fill:#c8e6c9
See: diagrams/01_fundamentals_cd_pipeline.mmd
Diagram Explanation:
This flowchart illustrates a complete Continuous Delivery pipeline from code commit to production-ready state. The journey begins when a developer commits code (A, blue), triggering the CI Build & Test phase (B, purple) where code is compiled and unit tests run. Upon success, the green path activates: the build automatically deploys to the DEV environment (C, light green) for developer validation.
Automated tests run in DEV (D, orange) to verify basic functionality. When those pass, the pipeline progresses to deploying to QA/Staging environment (E, yellow-green), which mirrors production infrastructure. Here, comprehensive testing occurs: Integration Tests (F, pink) validate that all services work together, and Performance Tests (G, purple) ensure the application meets speed and scalability requirements under load.
After all automated validations pass, the pipeline reaches a Manual Approval Gate (H, red) where designated approvers (product managers, tech leads, or change boards) review test results and business readiness. If approved, the build enters "Ready for Production" state (I, teal) - it's fully tested and can be deployed to production anytime via manual trigger. If rejected, feedback loops back to Development (J). The final production deployment (K, green) happens when someone clicks "Deploy" - this could be immediately after approval or scheduled for a specific time. Notice the dashed line to production indicates this step is manual (the key difference from Continuous Deployment where it would be automatic).
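A hedged sketch of such a pipeline in Azure Pipelines multi-stage YAML is shown below. Stage names, environment names, and build commands are illustrative; the manual approval gate itself is configured as an approval check on the environment in the Azure DevOps portal, not in the YAML:

```yaml
# Illustrative multi-stage pipeline: CI build, then automated environment deployments
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

stages:
  - stage: Build
    jobs:
      - job: BuildAndTest
        steps:
          - script: npm ci && npm run build && npm test
            displayName: Build and run unit tests
          - publish: dist/                    # assumed build output folder
            artifact: webapp

  - stage: DeployDev
    dependsOn: Build
    jobs:
      - deployment: DeployToDev
        environment: dev                      # illustrative environment name
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: webapp
                - script: echo "Deploy the artifact to DEV and run smoke tests here"

  - stage: DeployQA
    dependsOn: DeployDev
    jobs:
      - deployment: DeployToQA
        environment: qa                       # approvals/checks attach to this environment
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: webapp
                - script: echo "Deploy to QA/Staging and run integration and performance tests here"
```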
Detailed Example 1: E-commerce CD Pipeline
Consider an online retail company that processes millions of orders daily. They've implemented CD for their order processing service. When a developer commits a bug fix to improve order confirmation emails, the CI pipeline builds and tests the code (2 minutes). The CD pipeline automatically deploys the new version to the DEV environment where developers can manually verify the email formatting looks correct (5 minutes).
Next, it auto-deploys to the QA environment where automated Selenium tests verify the complete order flow: add to cart, checkout, payment, and confirmation email. These tests include checking that emails contain order numbers, item details, and delivery estimates (10 minutes). Then performance tests simulate 10,000 concurrent orders to ensure the fix doesn't impact throughput (15 minutes).
All tests pass, and the pipeline notifies the product manager via Microsoft Teams. She reviews the test results dashboard showing 100% test pass rate, 0.3% CPU increase (acceptable), and previews the new email template. She clicks "Approve for Production" in Azure DevOps. The system marks this build as "Production Ready" and tags it in the container registry.
That evening during off-peak hours (2 AM), the deployment engineer clicks "Deploy to Production" in Azure DevOps. The CD pipeline executes: it pulls the approved container image, performs a blue-green deployment (standing up new instances alongside old ones), runs smoke tests against the new version, and gradually shifts traffic from old to new. The entire production deployment takes 20 minutes with zero downtime. By morning, all order confirmations use the improved template, and customers notice the clearer messaging.
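The blue-green style production deployment described above can be approximated on Azure App Service with deployment slots. Below is a hedged sketch of the deploy, smoke-test, and swap steps; the service connection, app name, resource group, slot name, and health URL are placeholders, and exact task inputs may vary by task version:

```yaml
# Illustrative: deploy to a staging slot, smoke test it, then swap into production
steps:
  - task: AzureWebApp@1
    displayName: Deploy new version to the staging slot
    inputs:
      azureSubscription: 'my-azure-service-connection'   # placeholder service connection
      appName: 'orders-web'                              # placeholder app name
      deployToSlotOrASE: true
      resourceGroupName: 'rg-orders'
      slotName: 'staging'
      package: '$(Pipeline.Workspace)/webapp/**/*.zip'

  - script: curl --fail https://orders-web-staging.azurewebsites.net/health
    displayName: Smoke test the staging slot              # placeholder health endpoint

  - task: AzureCLI@2
    displayName: Swap staging into production
    inputs:
      azureSubscription: 'my-azure-service-connection'
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az webapp deployment slot swap \
          --resource-group rg-orders \
          --name orders-web \
          --slot staging \
          --target-slot production
```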
Detailed Example 2: Banking Application with Compliance Gates
A financial institution has strict regulatory requirements for their mobile banking app. Their CD pipeline includes compliance validation steps. When developers complete a new feature for balance transfers, the standard CI/CD flow begins: build, unit tests, deploy to DEV. But this pipeline includes additional compliance gates.
After DEV deployment, the pipeline triggers automated security scans: SAST (static analysis) with SonarQube checks for SQL injection vulnerabilities, DAST (dynamic analysis) with OWASP ZAP tests the running application for XSS attacks, and dependency scanning verifies no packages have known CVEs. The pipeline also runs accessibility tests (WCAG 2.1 compliance for screen readers) and data encryption validation (ensuring PII is encrypted at rest and in transit).
Only if all security scans pass does the build proceed to the QA environment. Here, automated test scripts validated by the compliance team run scenarios like: transaction limits, audit log generation, and session timeout enforcement. The pipeline generates a compliance report documenting all tests and their results.
The report goes to three approval groups in sequence: (1) QA Lead reviews test coverage, (2) Security Officer reviews vulnerability scans, (3) Compliance Officer verifies regulatory requirements. Each approver has 24 hours to review; if no response, the request auto-escalates to their manager. Once all three approve, the build reaches "Production Ready" status.
Production deployment happens only during approved change windows (Tuesday/Thursday nights, 10 PM - 2 AM). The deployment includes an automatic rollback trigger: if error rates exceed 0.1% or response times increase by 20%, the system automatically reverts to the previous version and pages the on-call engineer. This heavily gated CD pipeline ensures that the balance transfer feature meets all regulatory requirements before reaching customers' mobile devices.
⭐ Must Know (Critical Facts):
What it is: Infrastructure as Code is the practice of managing and provisioning infrastructure (servers, networks, databases) through machine-readable definition files rather than manual configuration or interactive configuration tools. Instead of clicking through Azure Portal to create resources, you write declarative code that describes what infrastructure you want, and tools automatically create it.
Why it exists: Manual infrastructure setup is error-prone, inconsistent, and doesn't scale. Two engineers manually creating "identical" environments will inevitably create slight differences (different OS patch levels, configuration settings, installed software). These differences cause the dreaded "works in dev, breaks in production" scenarios. IaC solves this by codifying infrastructure, making it repeatable, version-controlled, and testable.
Real-world analogy: IaC is like having a detailed recipe for cooking versus following verbal instructions. Verbal instructions ("add some salt, cook until it looks done") produce inconsistent results. A recipe with exact measurements (2 tsp salt, 20 minutes at 350°F) produces the same dish every time. Your infrastructure definition files are the exact recipe that produces identical environments consistently.
How IaC works: You write template files (ARM templates, Bicep, Terraform HCL) that declare desired infrastructure state. Tools read these templates and make API calls to cloud providers to create/update resources to match the declaration.
IaC Tools for Azure:
⭐ Must Know: IaC templates should be stored in version control (Git) alongside application code - this enables infrastructure versioning, code reviews, and rollback capabilities.
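As a sketch of how version-controlled IaC gets applied from a pipeline, the step below deploys a Bicep file stored in the repository using the Azure CLI; the service connection, resource group, file path, and parameter are placeholders:

```yaml
# Illustrative: apply the infrastructure definition that is checked into the repo
steps:
  - task: AzureCLI@2
    displayName: Deploy infrastructure from Bicep
    inputs:
      azureSubscription: 'my-azure-service-connection'   # placeholder service connection
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az deployment group create \
          --resource-group rg-myapp-dev \
          --template-file infra/main.bicep \
          --parameters environment=dev
```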
What it is: Version control systems track changes to code over time, allowing multiple developers to collaborate, review history, and revert changes. Git is the most popular distributed version control system used in DevOps.
Why it matters for DevOps: DevOps requires collaboration and automation. Git provides the foundation for both - teams collaborate through branches and pull requests, while CI/CD pipelines trigger from Git events (commits, merges, tags).
Key Git Concepts:
What is Agile: Iterative approach to software development with short cycles (sprints), frequent feedback, and adaptability to change. Common frameworks: Scrum, Kanban.
How DevOps Extends Agile: Agile focuses on development team collaboration; DevOps extends this to include operations, creating end-to-end ownership from code to production.
DevOps Cultural Principles:
Azure DevOps Services:
GitHub Services:
When to Choose:
| Term | Definition | Example |
|---|---|---|
| Artifact | Build output (compiled code, packages, containers) | Docker image, NuGet package, WAR file |
| Pipeline | Automated sequence of stages for build/deploy | CI pipeline, Release pipeline |
| Agent/Runner | Machine that executes pipeline tasks | Microsoft-hosted agent, Self-hosted runner |
| Stage | Logical phase in pipeline (Build, Test, Deploy) | Build stage runs compilation and unit tests |
| Job | Collection of steps executed on single agent | Build job with 5 steps |
| Task/Step | Individual action in pipeline | npm install task, Docker build step |
| Trigger | Event that starts pipeline | Commit to main, Pull request, Schedule |
| Gate | Approval or validation before proceeding | Manual approval, Security scan gate |
| Environment | Deployment target (Dev, QA, Prod) | Production environment with 10 VMs |
| Service Connection | Authenticated connection to external service | Azure subscription connection |
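Several of these terms map directly onto pipeline YAML. The hedged sketch below labels where a trigger, stage, job, agent, steps, and artifact appear; all names are illustrative:

```yaml
# Terminology in context (illustrative names only)
trigger:                           # Trigger: a commit to main starts the pipeline
  branches:
    include: [main]

stages:
  - stage: Build                   # Stage: logical phase of the pipeline
    jobs:
      - job: Compile               # Job: collection of steps run on a single agent
        pool:
          vmImage: ubuntu-latest   # Agent: Microsoft-hosted machine that executes the job
        steps:                     # Tasks/Steps: individual actions
          - script: npm ci
            displayName: Install dependencies
          - script: npm run build
            displayName: Compile and package
          - publish: dist/         # Artifact: the build output published for later stages
            artifact: webapp
```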
📊 DevOps Ecosystem Overview:
graph TB
subgraph "Planning & Collaboration"
A[Azure Boards/<br/>GitHub Issues]
end
subgraph "Source Control"
B[Azure Repos/<br/>GitHub]
end
subgraph "CI/CD Automation"
C[Azure Pipelines/<br/>GitHub Actions]
end
subgraph "Testing"
D[Test Plans/<br/>Test Frameworks]
end
subgraph "Package Management"
E[Azure Artifacts/<br/>GitHub Packages]
end
subgraph "Infrastructure"
F[ARM/Bicep/<br/>Terraform]
end
subgraph "Deployment Targets"
G[Azure App Service<br/>Kubernetes<br/>VMs]
end
subgraph "Monitoring"
H[Azure Monitor/<br/>App Insights]
end
A -->|Work Items| B
B -->|Code Commits| C
C -->|Run Tests| D
C -->|Publish| E
C -->|Provision| F
F -->|Deploy To| G
C -->|Deploy| G
G -->|Telemetry| H
H -->|Feedback| A
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#fff3e0
style D fill:#e8f5e9
style E fill:#fce4ec
style F fill:#e0f2f1
style G fill:#e1f5fe
style H fill:#f9fbe7
See: diagrams/01_fundamentals_ecosystem.mmd
Diagram Explanation: This ecosystem diagram shows how DevOps tools and practices interconnect. Planning & Collaboration (blue) tools like Azure Boards or GitHub Issues track work items (features, bugs, tasks). When work begins, developers create branches in Source Control (purple) using Azure Repos or GitHub. Code commits trigger CI/CD Automation (orange) via Azure Pipelines or GitHub Actions, which orchestrates the entire delivery process.
Pipelines execute Testing (green) using various frameworks (JUnit, pytest, Selenium) to validate quality. Successful builds publish artifacts to Package Management (pink) systems like Azure Artifacts or GitHub Packages for versioning and distribution. Pipelines also execute Infrastructure (teal) provisioning using ARM templates, Bicep, or Terraform to create or update cloud resources.
Applications deploy to Deployment Targets (light blue) like Azure App Service for web apps, Kubernetes for containers, or VMs for traditional applications. Running applications send telemetry to Monitoring (yellow-green) systems like Azure Monitor and Application Insights, collecting metrics, logs, and traces. Crucially, monitoring insights feed back into Planning (feedback arrow), creating a continuous improvement loop where production data informs what to build next. This circular flow represents the never-ending DevOps lifecycle.
Test yourself before moving on:
Key Concepts:
Tools:
Cultural Pillars:
📝 Practice: Before proceeding to Domain 1, ensure you can explain CI/CD to someone unfamiliar with DevOps using simple analogies. If you can't, review the CI and CD sections again.
Next Chapter: 02_domain1_processes_communications - Design and Implement Processes and Communications (Work tracking, metrics, collaboration)
What you'll learn:
Time to complete: 8-10 hours
Prerequisites: Chapter 0 (Fundamentals) - Understanding of DevOps lifecycle and version control
Exam Weight: 10-15% (midpoint of 12.5% used for planning)
This domain focuses on the planning and collaboration aspects of DevOps, ensuring teams can track work effectively, measure progress with meaningful metrics, and communicate efficiently.
The problem: Without proper work tracking and traceability, teams lose visibility into what's being worked on, why changes are made, and how work progresses from idea to production. This leads to missed requirements, duplicated effort, and inability to understand the impact of code changes.
The solution: Implement end-to-end traceability systems that connect planning (work items) to execution (code commits, builds, deployments) and results (monitoring data). Azure Boards and GitHub provide comprehensive work tracking with deep integration into the DevOps lifecycle.
Why it's tested: The AZ-400 exam emphasizes the ability to design workflow systems that provide visibility, enable collaboration, and support data-driven decision making. Understanding how to configure work tracking and establish traceability is fundamental to effective DevOps implementation.
What it is: Azure Boards is a work tracking system that uses customizable work items to plan, track, and discuss work across teams. It supports Agile methodologies (Scrum, Kanban) and provides visualization through boards, backlogs, and dashboards.
Why it exists: Traditional project management tools (spreadsheets, email threads) don't integrate with development workflows. Azure Boards solves this by embedding work tracking directly into the DevOps toolchain, linking planning to code, builds, and deployments for complete traceability.
Real-world analogy: Azure Boards is like a digital task board in a restaurant kitchen where each ticket (work item) represents an order. The ticket moves from "New Order" to "Cooking" to "Quality Check" to "Ready to Serve." Anyone in the kitchen can see all orders, their status, and who's working on what - and each ticket links to the recipe (code), ingredients used (commits), and customer feedback (monitoring).
How it works (Detailed step-by-step):
Create work items: Teams create work items representing features, user stories, tasks, bugs, or issues
Organize in backlogs: Work items are prioritized and organized in product and sprint backlogs
Visualize on Kanban board: Work items appear as cards on configurable board columns
Link to code: Developers reference work items in commits, pull requests, and branches
Track progress: Automated state transitions and burndown charts show sprint/iteration progress
📊 Azure Boards Workflow Diagram:
graph LR
A[Product Owner<br/>Creates Work Item] --> B[Backlog<br/>Prioritization]
B --> C[Sprint Planning<br/>Assign to Sprint]
C --> D[Developer<br/>Creates Branch]
D --> E[Code + Commits<br/>Link to Work Item]
E --> F[Pull Request<br/>Code Review]
F --> G[Merge to Main<br/>Auto-update Work Item]
G --> H[CI/CD Pipeline<br/>Build & Deploy]
H --> I[Work Item State<br/>Closed/Resolved]
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#fff3e0
style D fill:#e8f5e9
style E fill:#fce4ec
style F fill:#f1f8e9
style G fill:#e0f2f1
style H fill:#e1f5fe
style I fill:#c8e6c9
See: diagrams/02_domain1_azure_boards_workflow.mmd
Diagram Explanation:
This workflow diagram illustrates the complete lifecycle of work tracking in Azure Boards from inception to completion. The Product Owner (blue) creates work items representing features or user stories based on business requirements or user feedback. These items enter the Backlog (purple) where they're prioritized by business value, dependencies, and team capacity.
During Sprint Planning (orange), the team selects high-priority items from the backlog and assigns them to the current sprint, estimating effort in story points. A Developer (green) picks a work item and creates a feature branch (e.g., "feature/add-shopping-cart-AB123") from the main branch. As they write code and make Commits (pink), they reference the work item ID using "AB#123" syntax - this creates automatic links from commits to work items visible in both Git history and the work item's Development section.
When code is complete, the developer creates a Pull Request (light green) for code review, again mentioning "AB#123" in the PR description to maintain traceability. After approval and Merge to Main (teal), Azure Boards can automatically transition the work item state (e.g., from "Active" to "Resolved") based on keywords like "Fixes AB#123" in the merge commit. The merge triggers the CI/CD Pipeline (light blue) which builds, tests, and deploys the code. Finally, when deployment succeeds and validation passes, the Work Item State (green) updates to "Closed," completing the traceability loop. Every step is tracked, creating a complete audit trail from requirement to production.
What it is: GitHub Projects is a native project management tool built directly into GitHub repositories and organizations that provides kanban boards, tables, and roadmaps for tracking work.
Why it exists: Teams need lightweight, code-centric project management without switching between separate tools. GitHub Projects integrates work tracking directly where code lives, reducing context switching and improving developer productivity.
Real-world analogy: Like a digital whiteboard next to your desk where you can move sticky notes representing tasks, but this whiteboard automatically updates when code changes happen and is visible to your entire distributed team.
How it works (Detailed step-by-step):
Create a project: From repository or organization settings, create a new Project and choose view type (Board, Table, or Roadmap)
Add issues and PRs: Drag issues/PRs from repositories into the project or create new draft issues directly in the project
Customize fields: Add custom fields like Priority, Sprint, Team, Story Points, or custom statuses
Automate workflows: Set up built-in automations for common actions (e.g., "Auto-archive items when closed")
Track progress: Use insights, charts, and filters to monitor velocity, burndown, and completion rates
📊 GitHub Projects Architecture Diagram:
graph TB
subgraph "Organization Level"
ORG[Organization Project<br/>Cross-Repo View]
end
subgraph "Repository A"
ISSUE1[Issue #45<br/>Add Login Feature]
PR1[PR #46<br/>Implement OAuth]
end
subgraph "Repository B"
ISSUE2[Issue #12<br/>Fix API Bug]
PR2[PR #13<br/>Update Endpoint]
end
subgraph "Project Board"
TODO[📋 To Do]
PROGRESS[🔄 In Progress]
REVIEW[👀 In Review]
DONE[✅ Done]
end
ORG --> |aggregates| TODO
ORG --> |aggregates| PROGRESS
ORG --> |aggregates| REVIEW
ORG --> |aggregates| DONE
ISSUE1 --> TODO
PR1 --> PROGRESS
ISSUE2 --> REVIEW
PR2 --> DONE
AUTO[Automation:<br/>PR created → In Progress<br/>PR merged → Done]
AUTO -.triggers.-> PROGRESS
AUTO -.triggers.-> DONE
style ORG fill:#e3f2fd
style TODO fill:#fff3e0
style PROGRESS fill:#e1f5fe
style REVIEW fill:#f3e5f5
style DONE fill:#c8e6c9
style AUTO fill:#ffebee
See: diagrams/02_domain1_github_projects_architecture.mmd
Diagram Explanation:
This architecture diagram shows how GitHub Projects creates a unified view across multiple repositories at the organization level. The Organization Project (blue) acts as an aggregation layer that pulls issues and pull requests from multiple repositories into a single project board.
In Repository A, developers create Issue #45 for a new login feature and PR #46 to implement OAuth authentication. In Repository B, there's Issue #12 for an API bug and PR #13 to fix it. All these items are automatically or manually added to the organization-level project.
The Project Board has four columns representing workflow states: To Do (orange - work not started), In Progress (light blue - active development), In Review (purple - code review stage), and Done (green - completed work). Items move across columns as work progresses. Issue #45 sits in To Do waiting for someone to pick it up. PR #46 is In Progress as someone actively codes. Issue #12 is In Review as the fix undergoes code review. PR #13 is in Done because it merged successfully.
The Automation box (red) shows automated workflows that trigger state changes. When a developer creates a PR, automation moves the linked issue to "In Progress." When a PR merges, automation moves it to "Done" and can auto-close linked issues. This reduces manual board maintenance and ensures the project always reflects current work state. All changes bidirectionally sync - updating an item's status in the project updates the actual issue/PR, and vice versa.
Detailed Example 1: Sprint Planning with GitHub Projects
Your team starts a 2-week sprint. The product owner has prioritized 15 issues in the backlog labeled "sprint-12." You create a new GitHub Project called "Sprint 12" with custom fields: Priority (High/Medium/Low), Story Points (1-13), and Iteration (Sprint 12). Using automation rules, you configure "Auto-add items with label:sprint-12" which automatically populates the project with all 15 issues.
During sprint planning, the team reviews each issue in Table view, assigns story points based on complexity, sets priorities, and assigns developers. The team's capacity is 50 story points, and the auto-calculated sum shows 48 points - perfect fit. You switch to Board view for daily standups where developers move cards as they work.
When developer Sarah picks up "Add shopping cart," she creates a branch and a draft PR. The automation "When PR created → Move to In Progress" triggers, automatically moving the issue card. After code review and merge, "When PR merged → Move to Done" automation triggers, and the issue closes automatically via "Closes #45" in the PR description. By sprint end, the Insights tab shows burndown chart with 46/48 points completed, velocity chart showing improvement from last sprint, and distribution chart showing balanced workload across team members. The project becomes historical record of the sprint.
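The "auto-add items with label sprint-12" rule used above is a built-in project automation, but it can also be reproduced with a workflow based on GitHub's actions/add-to-project action. In this hedged sketch, the project URL, secret name, and version tag are placeholders you would adjust:

```yaml
# .github/workflows/add-to-sprint-project.yml - illustrative sketch
name: Add sprint issues to project

on:
  issues:
    types: [opened, labeled]

jobs:
  add-to-project:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/add-to-project@v1      # placeholder version tag
        with:
          project-url: https://github.com/orgs/my-org/projects/12   # placeholder project
          github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}           # placeholder PAT secret
          labeled: sprint-12                 # only add issues carrying this label
```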
Detailed Example 2: Cross-Repository Dependency Tracking
Your microservices architecture has 8 repositories, and you're implementing a feature that touches 4 of them: API Gateway (repo A), Auth Service (repo B), User Service (repo C), and Database Migrations (repo D). You create an organization-level project called "SSO Implementation" to track all related work across repositories.
In each repository, developers create issues: API-123 in repo A, AUTH-45 in repo B, USER-67 in repo C, DB-89 in repo D. You add all issues to the project and create a custom "Dependency" field to track blockers. USER-67 depends on AUTH-45 (can't update user profiles until auth is ready), and API-123 depends on all others (gateway integrates everything).
You set up View filters: "Group by: Repository" shows work per service, "Group by: Assignee" shows work per developer, "Group by: Status" shows overall progress. As AUTH-45 completes, you update its status to Done, and USER-67's assignee gets notified via GitHub notifications that their blocker is cleared. The Roadmap view (timeline) shows all 4 issues with their target completion dates, making dependencies visual. When all issues move to Done, you know the cross-repo feature is complete. This single project replaces what would otherwise require tracking in separate tools or spreadsheets, keeping all information in the context of the code.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
The problem: Without measurable metrics, teams can't identify bottlenecks, track improvement, or make data-driven decisions about their development process.
The solution: DevOps dashboards aggregate key metrics (cycle time, lead time, velocity, deployment frequency) to provide visibility into team performance and process health.
Why it's tested: The AZ-400 exam emphasizes metric-driven continuous improvement - this domain (10-15% of the exam) includes designing appropriate metrics for DevOps activities.
What it is: Two critical flow metrics that measure how fast work moves through your development pipeline, but they measure different parts of the process.
Why it exists: Teams need to distinguish between total delivery time (lead time - customer perspective) and actual work time (cycle time - team efficiency). Understanding both helps identify where delays occur and whether problems are in planning (long lead time) or execution (long cycle time).
Real-world analogy: Ordering a pizza: Lead time is from when you place the order to when it arrives at your door (total customer wait). Cycle time is from when the kitchen starts preparing your pizza to when it comes out of the oven (actual work time). If lead time is 60 minutes but cycle time is 15 minutes, most delay is in the queue, not preparation.
How it works (Detailed step-by-step):
Work item created: Timer for Lead Time starts immediately when issue/user story is created in backlog
Work item moves to Active/In Progress: Timer for Cycle Time starts when team begins active work
Work item completed: Both timers stop when work item reaches "Done/Closed" state
Reactivation handling: If work item reopens, cycle time aggregates active periods
📊 Lead Time vs Cycle Time Diagram:
graph LR
A[📝 Work Item<br/>Created] -->|Waiting in Backlog<br/>9 days| B[🚀 Work Started<br/>Active/In Progress]
B -->|Development<br/>5 days| C[✅ Work Item<br/>Completed]
A -.Lead Time: 14 days.-> C
B -.Cycle Time: 5 days.-> C
D[📊 Metrics] --> E[Lead Time:<br/>Customer perspective<br/>Total delivery time]
D --> F[Cycle Time:<br/>Team efficiency<br/>Actual work time]
style A fill:#fff3e0
style B fill:#e1f5fe
style C fill:#c8e6c9
style D fill:#f3e5f5
style E fill:#ffebee
style F fill:#e8f5e9
See: diagrams/02_domain1_lead_cycle_time_comparison.mmd
Diagram Explanation:
This diagram illustrates the critical difference between Lead Time and Cycle Time metrics in DevOps workflows. The timeline starts when a Work Item is Created (orange) - this could be a user story, bug, or feature request. At this moment, the Lead Time clock starts ticking because from the customer's perspective, they're waiting for this functionality.
The work item sits in the backlog for 9 days - prioritization meetings happen, dependencies clear, team capacity becomes available. During this Waiting in Backlog period, lead time continues accumulating, but cycle time hasn't started yet because no active development is occurring. This waiting period often reveals process inefficiencies: oversized backlogs, unclear priorities, or capacity constraints.
When Work Started (blue) - a developer picks up the item and moves it to "Active" or "In Progress" - the Cycle Time clock starts. Now the team is actively coding, testing, and reviewing. This development phase takes 5 days, during which both lead time and cycle time increase together. The cycle time measures pure team efficiency: how fast can developers deliver once they start working?
Finally, the Work Item Completed (green) when the code merges and deploys. Both timers stop. The Lead Time = 14 days (9 waiting + 5 working) represents what the customer experienced - two weeks from request to delivery. The Cycle Time = 5 days represents team efficiency - when focused, the team delivers in a week.
The Metrics section (purple) summarizes: Lead Time (red) is the customer perspective showing total delivery time, while Cycle Time (green) is team efficiency showing actual work time. If lead time is much higher than cycle time (like 14 vs 5), the problem isn't team speed - it's backlogs, prioritization, or wait time. If cycle time is high, the team's execution needs improvement through automation, better practices, or removing impediments.
Detailed Example 1: E-commerce Feature Delivery
Your team receives a request for a new feature: "Add wishlist functionality." On January 1, Product Owner creates User Story #456 in Azure Boards - Lead Time starts at Day 0. The story sits in the "New" state while the PO writes acceptance criteria, talks to stakeholders, and prioritizes against other work. During sprint planning on January 10, the team estimates it at 8 story points and adds to current sprint, moving it to "Approved" state - lead time is now at 9 days, but cycle time hasn't started because no development occurred yet.
On January 12, developer Maria picks up the story, creates branch feature/wishlist-456, and moves the work item to "Active" - Cycle Time starts at Day 0. She spends 3 days implementing the frontend wishlist UI, 1 day on backend API, and 1 day writing tests. On January 17, she creates a PR - cycle time is 5 days. Code review takes 1 day, PR merges on January 18, and automated deployment to production completes. Work item moves to "Closed" - Cycle Time = 6 days (Jan 12-18), Lead Time = 17 days (Jan 1-18).
Analysis: The 11-day gap (17 lead - 6 cycle) represents wait time before development started. To improve customer satisfaction (lead time), the team needs to reduce backlog size or prioritize faster, not necessarily work faster (cycle time already good at 6 days).
Detailed Example 2: Bug Fix with Reactivation
A critical bug report "#789 - Payment fails for international cards" is created on March 1 in "New" state - Lead Time starts. On March 2, it's triaged as P0 (highest priority) and moved to "Active" - Cycle Time starts. Developer fixes the validation logic in 2 hours and deploys on March 2 - Cycle Time = 1 day, Lead Time = 1 day. Both timers stop when bug moves to "Closed."
On March 5, the bug is reported again - it only fixed US cards, not all international cards. The bug reopens to "Active" state - Cycle Time restarts (lead time continues from original creation). Developer spends March 5-6 implementing comprehensive international card support and deploys. Bug closes again on March 6 - Second cycle period = 2 days.
Final Metrics: Lead Time = 5 days (March 1-6 total), Total Cycle Time = 1 + 2 = 3 days (sum of both active periods). This shows the bug required 5 days to truly deliver from customer perspective, with 3 days of actual work spread across two attempts. The reactivation reveals incomplete initial fix - a process improvement opportunity for better testing before closure.
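In formula form, the two metrics in these examples reduce to (using the state-transition timestamps described earlier):

```latex
\text{Lead Time} = t_{\text{closed}} - t_{\text{created}}, \qquad
\text{Cycle Time} = \sum_{i}\bigl(t_{\text{done},i} - t_{\text{active},i}\bigr)
```

where the sum runs over each period the item spends in an active state, so the reactivated bug accumulates 1 + 2 = 3 days of cycle time inside a 5-day lead time.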
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
What it is: A stacked area chart that visualizes the distribution of work items across different workflow states over time, showing work in progress, throughput, and bottlenecks at a glance.
Why it exists: Teams need to visualize flow health and identify process problems quickly. A CFD shows not just point-in-time status, but trends: is work piling up in code review? Is "To Do" growing faster than "Done"? Are we delivering consistently?
Real-world analogy: Like watching a multi-lane highway from above with traffic cameras. Each lane (New, Active, Review, Done) is a colored band. Wide bands = lots of cars (work items) in that lane. If one lane keeps getting wider while others stay stable, there's a traffic jam (bottleneck) that needs fixing.
How it works (Detailed step-by-step):
Horizontal axis = Time: Shows date range (typically 30, 60, or 90 days rolling window)
Vertical axis = Work Item Count: Total number of items across all states at each point in time
Colored bands = Workflow states: Each color represents items in a specific state (bottom to top: Done, Review, Active, New)
Band transitions reveal flow: Smooth parallel bands = healthy flow; diverging/converging bands = bottlenecks or capacity changes
Arrival rate vs Departure rate: the slope of the topmost line (new work entering the system) versus the slope of the Done band's upper boundary (work completing) indicates whether inflow and outflow are balanced
📊 Cumulative Flow Diagram Example:
graph TD
subgraph "CFD Visualization (30 Days)"
A[Day 1] --> B[Day 15] --> C[Day 30]
D["✅ Done<br/>(Growing Steadily)"]
E["👀 Review<br/>(Stable)"]
F["⚙️ Active<br/>(Bottleneck - Growing)"]
G["📋 To Do<br/>(Stable)"]
D -.Band 1: Green, Bottom.-> D
E -.Band 2: Purple.-> E
F -.Band 3: Blue - WIDENING.-> F
G -.Band 4: Orange, Top.-> G
end
H{Analysis} --> I[Bottleneck in Active<br/>Too many items in development]
H --> J[Review capacity adequate<br/>Band stable]
H --> K[Done rate steady<br/>Consistent throughput]
L[Actions] --> M[Add pair programming<br/>Reduce WIP limits]
L --> N[Break down large stories<br/>Improve flow]
style D fill:#c8e6c9
style E fill:#f3e5f5
style F fill:#e1f5fe
style G fill:#fff3e0
style H fill:#ffebee
style I fill:#ffe0b2
style J fill:#e0f2f1
style K fill:#e8eaf6
See: diagrams/02_domain1_cfd_example.mmd
Diagram Explanation:
The Cumulative Flow Diagram shows work distribution over a 30-day period from Day 1 to Day 30 with four workflow states stacked vertically. At the bottom, the Done band (green) shows steadily growing completion - items continuously move to Done, indicating healthy delivery. The slope of this band represents throughput rate.
Above Done, the Review band (purple) remains relatively stable in width throughout the period. This stable band indicates that code review capacity matches the flow - items don't pile up waiting for review. The team has adequate reviewers, or review is efficiently automated, preventing this from becoming a bottleneck.
The Active band (blue) is the problem area - notice it's WIDENING from Day 1 to Day 30. This expanding band shows items accumulating in active development. On Day 1, maybe 10 items were in Active; by Day 30, it's grown to 25 items. This is a bottleneck: work enters Active faster than it exits to Review. Possible causes: stories too large, developers context-switching, insufficient pair programming, or too high WIP limits.
At the top, the To Do band (orange) remains stable, indicating backlog is controlled. New work enters at roughly the same rate as work moves to Active, preventing backlog explosion. If this band were growing, it would signal prioritization problems or excessive commitments.
The Analysis section (red) identifies: (1) Bottleneck in Active due to widening band - too many concurrent items slow everything, (2) Review capacity is adequate since that band is stable, (3) Done rate is steady showing consistent team throughput despite the Active bottleneck.
Actions to improve: (1) Implement pair programming to increase Active capacity and knowledge sharing, (2) Reduce WIP limits to prevent too many concurrent items in Active - maybe limit to 1 item per developer instead of 2-3, (3) Break down large stories that sit in Active for weeks into smaller deliverable chunks that flow faster.
Detailed Example 1: Identifying Code Review Bottleneck
Your team's CFD for the past 60 days shows a concerning pattern. The "In Review" band starts narrow (5 items) in Week 1 but progressively widens to 25 items by Week 8. Meanwhile, the "Done" band's slope (delivery rate) flattens from 10 items/week to 4 items/week. The team lead examines the data: 25 PRs waiting for review, but only 3 team members designated as reviewers.
Root Cause: Reviewer capacity bottleneck. Only 3 of 10 developers review code, creating a queue. Solution: The team implements "reviewer rotation" - every developer reviews at least 2 PRs per week, distributing the load. They also add automated code quality gates (linting, security scanning) to catch issues before human review. After 3 weeks, the CFD shows "In Review" band narrowing back to 6-8 items, and "Done" slope returning to 9-10 items/week. The visual CFD made the invisible bottleneck obvious and measurable.
Detailed Example 2: Detecting Unsustainable Work Input Rate
An e-commerce team's CFD reveals troubling divergence. The top edge (New + Active + Review) rises at a steep slope of +15 items/week, while the bottom edge (Done) rises at only +8 items/week. Over 8 weeks, this 7-item/week gap accumulates to 56 extra items in the system. Total WIP grows from 40 items (Week 1) to 96 items (Week 8). Lead time increases from 12 days to 35 days because items wait longer in each state.
Root Cause: Product Owner adding work faster than team capacity. Analysis: Arrival rate (15/week) exceeds departure rate (8/week) by 87%. This is mathematically unsustainable - the backlog will grow infinitely. Solution: Product Owner implements strict WIP limit of 50 total items. When backlog reaches 50, no new items added until items complete. This forces prioritization: only truly important work enters. Within 4 weeks, CFD shows parallel top and bottom edges (balanced arrival/departure), total WIP stabilizes at 45 items, and lead time drops back to 14 days. The CFD's diverging bands visually proved the system was overloaded.
Detailed Example 3: Seasonal Capacity Variation
A mobile app team's CFD shows unusual pattern: every 3-4 weeks, the "Active" band suddenly narrows and the "Done" band's slope steepens sharply for 3-5 days, then returns to normal. Investigating, the team discovers this correlates with their biweekly "hackathon days" where developers focus solely on finishing in-progress work without starting new items. During these days, WIP drops from 30 to 18, and completion rate jumps from 3/day to 8/day.
Insight: Multitasking and frequent context switching during normal weeks significantly reduce throughput. When developers focus (hackathon days), they're 2.5x more productive. Solution: Team adopts WIP limits permanently - max 1 item per developer - mimicking hackathon focus daily. New CFD shows consistently narrow "Active" band and steeper "Done" slope. Average lead time drops from 18 to 9 days. The CFD's pattern revealed that their normal "busy" state was actually less productive than their focused "hackathon" state.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
The problem: Tribal knowledge trapped in developers' heads, outdated documentation in separate tools, and poor communication between distributed teams slow down onboarding and decision-making.
The solution: Integrated documentation tools (wikis, Markdown, Mermaid diagrams) and communication integrations (webhooks, Teams) keep knowledge accessible where code lives.
Why it's tested: 10-15% of AZ-400 exam covers process documentation and team collaboration - critical for DevOps culture and efficiency.
What it is: Built-in wiki systems in both Azure DevOps and GitHub that use Markdown formatting to create, version, and maintain project documentation directly alongside code repositories.
Why it exists: Teams need documentation to live close to code with the same version control, branching, and review processes. Separate wikis (Confluence, SharePoint) become outdated because updating them is a separate workflow. Integrated wikis get updated in the same PR that changes code.
Real-world analogy: Like having your car's owner manual stored in the glove compartment instead of on a shelf at home. When you need to check tire pressure or change oil, the instructions are right there in the car, always the correct version for your specific model year.
How it works (Detailed step-by-step):
Wiki creation: Azure DevOps creates a wiki from a repository folder (typically /docs) or as a separate wiki; GitHub uses the repository root or /docs folder. Example: a /docs/architecture folder in the repo appears as an "Architecture" page in the wiki
Markdown formatting: Write docs in Markdown with headers, lists, code blocks, tables, links, and images. Example: # Header becomes <h1>, **bold** becomes bold, ## Section and - bullet structure content, ```python fences get syntax highlighting, [link](url) creates links
Wiki structure: Organize pages hierarchically with a table of contents auto-generated from headers or folder structure. Example: /docs/getting-started, /docs/api/authentication, /docs/deployment/azure
Versioning and branches: Wiki content versions with code; different branches can have different wiki versions. Example: the release/1.0 branch shows v1.0 docs; the main branch shows the latest docs
Collaborative editing: Wiki edits go through pull request review like code changes
📊 Wiki Documentation Workflow Diagram:
sequenceDiagram
participant Dev as Developer
participant Branch as Feature Branch
participant Docs as Wiki/Docs Folder
participant PR as Pull Request
participant Review as Reviewer
participant Main as Main Branch
participant Wiki as Published Wiki
Dev->>Branch: Create feature branch
Dev->>Branch: Implement code changes
Dev->>Docs: Update relevant .md docs
Note over Docs: /docs/api/new-endpoint<br/>/docs/deployment/config
Dev->>PR: Create Pull Request
PR->>Review: Request review (code + docs)
Review->>Review: Review code correctness
Review->>Review: Review docs accuracy
alt Docs need updates
Review->>Dev: Request doc changes
Dev->>Branch: Update documentation
Dev->>PR: Push updated docs
end
Review->>PR: Approve PR
PR->>Main: Merge to main branch
Main->>Wiki: Auto-publish updated wiki
Note over Wiki: Wiki now reflects<br/>latest code + docs
style Dev fill:#e3f2fd
style Branch fill:#f3e5f5
style Docs fill:#fff3e0
style PR fill:#e1f5fe
style Review fill:#f3e5f5
style Main fill:#c8e6c9
style Wiki fill:#e8f5e9
See: diagrams/02_domain1_wiki_documentation_workflow.mmd
Diagram Explanation:
This sequence diagram shows the integrated workflow for maintaining documentation alongside code changes. A Developer (blue) starts by creating a Feature Branch (purple) to implement a new API endpoint. As they write code, they recognize the need to document the new endpoint and configuration changes.
The developer updates relevant Markdown files in the /docs folder (orange): they create /docs/api/new-endpoint explaining the new API with request/response examples, and update /docs/deployment/config to document the new configuration parameters required. These documentation changes are committed to the same feature branch as the code - keeping code and docs in sync.
When the developer creates a Pull Request (light blue), both code and documentation are included. The Reviewer (purple) performs a comprehensive review: they check that the code works correctly AND that the documentation accurately describes the new functionality. This dual review ensures docs don't lag behind code changes.
If documentation needs updates (alt flow), the reviewer requests changes: "Add error handling examples to API docs" or "Clarify the config parameter defaults." The developer updates documentation in the branch and pushes to the PR. This review loop continues until both code and docs meet quality standards.
After approval, the PR merges to Main Branch (green), and the Published Wiki (light green) auto-updates. Now when teammates or users access the wiki, they see documentation that exactly matches the current codebase. If someone checks out the feature branch before merge, they see docs for that branch's code version.
The key insight: Documentation updates flow through the same quality gates (branching, PR, review, merge) as code changes. This prevents the common problem where code gets reviewed rigorously but docs are added as an afterthought and become outdated. By treating docs as code, teams maintain accuracy and reduce knowledge silos.
Detailed Example 1: API Documentation with Code Generation
Your team builds a REST API in Azure. You set up the wiki in the /docs folder of your repository. When a developer adds a new endpoint POST /api/orders, they update the code, the OpenAPI spec, the generated Markdown reference, and the usage examples under /docs/api/orders - all in the same branch.
The PR reviewer checks: (1) Code quality, (2) OpenAPI spec accuracy, (3) Generated docs completeness, (4) Example code works correctly. After merge, the wiki automatically displays the new API endpoint documentation. Three months later, when the endpoint changes, the developer updates code, OpenAPI spec, regenerates Markdown, updates examples, and creates PR - docs stay in sync through the same workflow.
Benefit: Documentation isn't a separate task done later; it's part of definition of done for every feature. Teams following this pattern have 90%+ accurate docs because updating docs is as natural as updating code.
Detailed Example 2: Architectural Decision Records (ADRs) in Wiki
An enterprise team adopts the practice of documenting major architectural decisions as ADRs in the wiki. When an architect proposes using a microservices pattern instead of a monolith, they:
Create ADR file: /docs/adrs/003-microservices-architecture (numbered sequentially)
Follow ADR template:
Create PR for ADR: Team reviews the architectural decision like they review code
Discussion in PR comments: Team debates trade-offs, suggests alternatives, asks questions
Approval and merge: When consensus reached, ADR merges and becomes official architectural guideline
Six months later, a new developer joins and wonders "Why microservices?" They browse /docs/adrs/ in wiki and find ADR-003 explaining the exact reasoning, context, and trade-offs. When a different team proposes serverless, they reference ADR-003 showing it was already considered and why microservices was chosen instead. The wiki becomes institutional memory that survives team turnover.
Detailed Example 3: Release Notes Auto-Generation from Git History
Your team wants release notes generated from commits. You set up automation:
Conventional commits: Developers write commits following convention: feat:, fix:, docs:, refactor:
feat: Add shopping cart persistence to database
fix: Resolve payment gateway timeout error
docs: Update API authentication examples
Git history parsing: CI pipeline runs a script that reads commits between the last release tag and the current commit
Categorize changes: Script groups commits by type (Features, Bug Fixes, Documentation, etc.)
Generate Markdown: Script creates /docs/releases/v2.5.0:
# Release v2.5.0 (2024-10-15)
## Features
- Add shopping cart persistence to database (#142)
- Implement guest checkout flow (#155)
## Bug Fixes
- Resolve payment gateway timeout error (#148)
- Fix mobile UI rendering on iOS (#151)
## Documentation
- Update API authentication examples (#143)
Commit to release branch: Automation commits generated release notes to release branch
Wiki displays release notes: /docs/releases/ folder appears in wiki with all historical releases
Benefit: Release notes are always complete and accurate because they're generated from actual commits, not manually written (and forgotten) after the fact. Product managers, support teams, and customers can see exactly what changed in each release by browsing the wiki. The automation ensures no release ships without documentation.
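A minimal shell sketch of this automation, assuming conventional commit prefixes (feat:, fix:, docs:) and at least one existing release tag; the script name and output path are illustrative:

```bash
#!/usr/bin/env bash
# generate-notes.sh <new-version> - group conventional commits since the last tag into release notes
set -euo pipefail

VERSION="${1:?usage: generate-notes.sh <new-version>}"
LAST_TAG=$(git describe --tags --abbrev=0)      # most recent release tag, e.g. v2.4.0
mkdir -p docs/releases

{
  echo "# Release ${VERSION} ($(date +%F))"
  for section in "Features:feat" "Bug Fixes:fix" "Documentation:docs"; do
    title="${section%%:*}"; prefix="${section##*:}"
    printf '\n## %s\n' "${title}"
    # Keep only commit subjects of this type and strip the "type: " prefix
    git log "${LAST_TAG}..HEAD" --pretty=format:'%s' \
      | grep "^${prefix}: " | sed "s/^${prefix}: /- /" || true
  done
} > "docs/releases/${VERSION}.md"
```

In the pipeline, a follow-up step would commit the generated file to the release branch, as described in the steps above.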
⭐ Must Know (Critical Facts):
Wikis can be backed by a /docs folder or a dedicated wiki repository with full Git features
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
Keep docs in the /docs folder so documentation updates happen in the same PR as code changes
🔗 Connections to Other Topics:
✅ Work Tracking and Traceability:
✅ DevOps Metrics and Dashboards:
✅ Documentation and Collaboration:
Traceability is bidirectional: Work items link to commits/PRs, commits reference work items via AB#{ID}, creating complete audit trail from requirement to deployment
Lead Time measures customer impact, Cycle Time measures team efficiency: Large gap between them indicates process problems (backlog, prioritization), not execution problems
Cumulative Flow Diagrams reveal bottlenecks visually: Widening bands show accumulating work in specific states; parallel top/bottom edges indicate sustainable flow
Documentation must version with code: Repository-based wikis ensure docs stay synchronized with code through same branching, PR review, and merge workflows
Metrics drive continuous improvement: Track cycle time, lead time, velocity, and deployment frequency to identify trends and validate process changes
Test yourself before moving on:
Try these from your practice test bundles:
If you scored below 70%:
Pattern 1: Metric Selection
Pattern 2: Tool Integration
Pattern 3: Documentation Strategy
Key Work Tracking Concepts:
Key Metrics Formulas:
CFD Interpretation:
Documentation Best Practices:
Decision Points:
In Chapter 3: Design and Implement a Source Control Strategy, you'll learn:
These source control concepts build on the work tracking and metrics you've learned - every commit will link to work items, every branch will follow team standards, and every merge will update your flow metrics.
Scenario: Your team of 10 developers is transitioning from a chaotic branching model to GitHub Flow. You need to implement branch policies to ensure code quality and prevent direct commits to main.
Step-by-Step Implementation:
Configure Branch Protection Rules (GitHub):
Protect the main branch and require the build, test, and security-scan status checks to pass before merging
Create CODEOWNERS File (optional but recommended):
# .github/CODEOWNERS
# Global owners (review all changes)
* @team-leads
# Frontend code
/src/frontend/** @frontend-team
# Backend code
/src/backend/** @backend-team
# Infrastructure code
/infrastructure/** @devops-team
# Security-sensitive files
/src/auth/** @security-team
# .github/workflows/pr-checks.yml
name: PR Checks
on:
pull_request:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install dependencies
run: npm ci
- name: Build
run: npm run build
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
- name: Upload coverage
uses: codecov/codecov-action@v3
security-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
severity: 'CRITICAL,HIGH'
Developer workflow with these protections in place:
git checkout -b feature/add-login
git commit -m "Add login functionality"
git push origin feature/add-login
Why This Works:
📊 GitHub Flow with Branch Protection Diagram:
graph LR
A[Developer: Create Feature Branch] --> B[Developer: Make Changes]
B --> C[Developer: Push Branch]
C --> D[GitHub: Create Pull Request]
D --> E[GitHub Actions: Run Checks]
E --> F{All Checks Pass?}
F -->|No| G[Developer: Fix Issues]
G --> B
F -->|Yes| H[Reviewers: Review Code]
H --> I{2 Approvals?}
I -->|No| J[Reviewers: Request Changes]
J --> B
I -->|Yes| K[Developer: Merge PR]
K --> L[GitHub: Delete Branch]
L --> M[Main Branch Updated]
style A fill:#e1f5fe
style M fill:#c8e6c9
style F fill:#fff3e0
style I fill:#fff3e0
See: diagrams/02_domain1_github_flow_branch_protection.mmd
Scenario: Your organization uses Azure Boards for work tracking and GitHub for source control. You need to link commits, pull requests, and builds to work items for full traceability.
Step-by-Step Implementation:
Install Azure Boards App in GitHub:
Connect Azure Boards to GitHub:
Link Commits to Work Items:
git commit -m "Add login feature AB#123"AB#{work-item-id} or Fixes AB#{work-item-id}Link Pull Requests to Work Items:
Example: add Fixes AB#123 to the PR description to link the PR and transition the work item on completion
Configure Auto-Linking Rules:
View Traceability:
Benefits:
📊 Azure Boards and GitHub Integration Diagram:
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant AB as Azure Boards
participant AP as Azure Pipelines
Dev->>AB: 1. Create Work Item (AB#123)
AB->>Dev: 2. Work Item Created
Dev->>GH: 3. Create Branch (feature/AB#123)
Dev->>GH: 4. Commit with "AB#123"
GH->>AB: 5. Link Commit to Work Item
AB->>AB: 6. Update Development Section
Dev->>GH: 7. Create PR with "Fixes AB#123"
GH->>AB: 8. Link PR to Work Item
GH->>AP: 9. Trigger Build
AP->>AB: 10. Link Build to Work Item
Dev->>GH: 11. Merge PR
GH->>AB: 12. Auto-Transition Work Item to Resolved
AP->>AB: 13. Link Deployment to Work Item
Note over Dev,AB: Full traceability:<br/>Work Item → Code → Build → Deployment
See: diagrams/02_domain1_azure_boards_github_integration.mmd
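The same linking flow from the command line, as a hedged sketch; the branch name, work item ID, and PR text are illustrative, and gh is the GitHub CLI:

```bash
# Branch and commit message reference the Azure Boards work item via AB#123
git checkout -b feature/AB-123-login
git commit -am "Add login form validation AB#123"
git push origin feature/AB-123-login

# A PR body containing "Fixes AB#123" links the PR and transitions the work item when it completes
gh pr create --title "Add login AB#123" --body "Fixes AB#123"
```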
Scenario: Your DevOps team needs a dashboard to monitor pipeline health, deployment frequency, and lead time. The dashboard should be visible to the entire team and update in real-time.
Step-by-Step Implementation:
Create Azure DevOps Dashboard:
Add Widgets:
Widget 1: Build Success Rate:
Widget 2: Deployment Frequency:
Widget 3: Lead Time:
Widget 4: Cycle Time:
Widget 5: Cumulative Flow Diagram:
Widget 6: Test Results Trend:
Configure Auto-Refresh:
Share Dashboard:
Dashboard Layout Example:
+------------------+------------------+------------------+
| Build Success | Deployment | Lead Time |
| Rate (30 days) | Frequency | (90 days) |
| | (per week) | |
+------------------+------------------+------------------+
| Cycle Time | Test Results | Active Bugs |
| (90 days) | Trend (30 days) | (by priority) |
+------------------+------------------+------------------+
| Cumulative Flow Diagram (60 days) |
| Shows: Work in progress, bottlenecks, flow |
+-------------------------------------------------------+
Key Metrics to Track:
📊 DevOps Metrics Dashboard Diagram:
graph TB
subgraph "DevOps Metrics Dashboard"
subgraph "Row 1: Velocity Metrics"
M1[Build Success Rate<br/>Target: >90%<br/>Current: 94%]
M2[Deployment Frequency<br/>Target: Daily<br/>Current: 3x/week]
M3[Lead Time<br/>Target: <7 days<br/>Current: 5.2 days]
end
subgraph "Row 2: Quality Metrics"
M4[Cycle Time<br/>Target: <3 days<br/>Current: 2.8 days]
M5[Test Pass Rate<br/>Target: >95%<br/>Current: 97%]
M6[Active Bugs<br/>Critical: 2<br/>High: 5<br/>Medium: 12]
end
subgraph "Row 3: Flow Visualization"
M7[Cumulative Flow Diagram<br/>Shows work in progress<br/>Identifies bottlenecks]
end
end
style M1 fill:#c8e6c9
style M2 fill:#fff3e0
style M3 fill:#c8e6c9
style M4 fill:#c8e6c9
style M5 fill:#c8e6c9
style M6 fill:#ffebee
See: diagrams/02_domain1_devops_metrics_dashboard.mmd
What you'll learn:
Time to complete: 6-8 hours
Prerequisites: Chapter 2 (Fundamentals and DevOps Principles)
The problem: Without a defined branching strategy, teams create chaos - conflicting changes, broken builds, unclear release process, difficulty tracking what's in production.
The solution: Structured branching strategies provide clear rules for when to branch, how to merge, and how to release, enabling team collaboration at scale.
Why it's tested: 12.5% of AZ-400 exam focuses on source control strategy - branch management is foundation of DevOps collaboration.
What it is: A branching strategy where all developers commit directly to a single main branch (trunk) or use very short-lived feature branches (< 24 hours) that merge quickly to main.
Why it exists: Long-lived feature branches create merge conflicts, delay integration, and hide problems. Trunk-based development forces continuous integration - developers integrate code daily, conflicts are small and manageable, feedback is immediate.
Real-world analogy: Like a highway with one main lane where everyone drives. Instead of building separate roads that later need to connect (merge conflicts), everyone stays on the main highway. If you need to make a quick stop, you pull into a rest area briefly (short branch) then merge back immediately.
How it works (Detailed step-by-step):
Single main branch: Team maintains one "trunk" (usually main or master) that is always deployable
Small, frequent commits: Developers commit working code to trunk multiple times per day
Short-lived feature branches (optional): If using branches, they live less than 24 hours and merge quickly
Feature flags for incomplete work: Use feature toggles to hide incomplete features in production while code is in trunk
Example: if (featureFlag.enabled("newCheckout")) { /* new code */ } else { /* old code */ }
Automated quality gates: Comprehensive CI pipeline runs on every commit - tests, linting, security scans (a command-line sketch of the daily cycle follows this list)
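A command-line sketch of one short-lived-branch cycle under these rules; the branch name and test command are illustrative, and gh is the GitHub CLI:

```bash
# Morning: start from the latest trunk
git checkout main && git pull origin main
git checkout -b feature/quick-fix

# A few hours of work: small commits, incomplete behavior hidden behind a feature flag
git commit -am "Add new checkout path behind newCheckout flag"

# Same day: re-sync with trunk, run tests locally, open a PR, merge once CI and review pass
git pull origin main --rebase
npm test
git push origin feature/quick-fix
gh pr create --fill          # --fill reuses the commit message as the PR title/body
```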
📊 Trunk-Based Development Flow Diagram:
graph TD
A[Developer Workstation] -->|1. Pull latest trunk| B[Local Main Branch]
B -->|2. Create short branch<br/>feature/quick-fix| C[Feature Branch<br/><24 hours]
C -->|3. Multiple commits<br/>2-3 hours work| C
C -->|4. Push branch| D[Remote Repository]
D -->|5. Create PR| E[Pull Request]
E -->|6. Automated CI| F{CI Pipeline}
F -->|Tests pass| G[Code Review]
F -->|Tests fail| H[Fix or Revert]
H -->|Fix commits| C
G -->|Approved| I[Merge to Trunk]
I -->|7. Deploy| J[Production<br/>via feature flags]
K[Feature Flags] -.Control visibility.-> J
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#fff3e0
style D fill:#e1f5fe
style E fill:#f3e5f5
style F fill:#ffe0b2
style G fill:#f3e5f5
style I fill:#c8e6c9
style J fill:#e8f5e9
style K fill:#ffebee
See: diagrams/03_domain2_trunk_based_development.mmd
Diagram Explanation:
This diagram illustrates the trunk-based development workflow for a single feature. The process starts at the Developer Workstation (blue) where a developer pulls the latest code from the Local Main Branch (purple) - this is the trunk, always up-to-date with remote main.
The developer creates a Short-Lived Feature Branch (orange) named feature/quick-fix with the discipline that it must merge within 24 hours. Over 2-3 hours, they make multiple commits to this branch - each commit represents incremental progress on the fix. This is shorter than traditional feature branches that might live for days or weeks.
After pushing the branch to the Remote Repository (light blue), the developer creates a Pull Request (purple) for code review. The CI Pipeline (orange) immediately triggers, running all automated tests, linting, and security scans. If tests fail, the developer must Fix or Revert - either push additional commits to fix the issue or abandon the branch entirely. No broken code enters trunk.
If tests pass, the PR enters Code Review (purple) where teammates review the changes. After approval, the code Merges to Trunk (green), and the automated deployment pipeline pushes to Production (light green). However, if the feature isn't complete, Feature Flags (red) control its visibility - the code deploys but remains hidden behind a toggle until ready.
The key principles: (1) Branches live <24 hours, (2) Trunk is always deployable, (3) CI blocks broken code, (4) Feature flags decouple deployment from release. This enables continuous integration while maintaining production stability.
Detailed Example 1: E-commerce Checkout Refactor
Your team needs to refactor the checkout flow for better performance. Traditional branching would create a long-lived feature/checkout-refactor branch, work for 2 weeks, then merge - causing massive conflicts. With trunk-based development, you:
Day 1: Create feature flag checkout_v2_enabled = false. Commit to trunk. Deploy to production (flag is off, users see old checkout).
Day 2: Create short branch, implement new payment validation, commit behind flag if (checkout_v2_enabled), create PR, merge same day. Code in production but not active.
Day 3-5: Repeat - each day, small branch for cart calculation logic, order submission, confirmation page. Each merges to trunk daily. All code deployed but hidden.
Day 6: All refactoring complete. In production, flip checkout_v2_enabled = true for 10% of users (canary). Monitor metrics.
Day 7: No issues detected. Flip to 100%. Refactor complete without a single merge conflict because changes integrated daily.
Contrast: Traditional feature branch would have 2 weeks of code divergence, 100+ file conflicts on merge, 2-3 days resolving conflicts, high risk of breaking production. Trunk-based had zero conflicts, continuous validation, and controlled rollout.
Detailed Example 2: Hotfix for Production Bug
Production bug discovered: payment processor returns error for amounts >$1000. With trunk-based development:
The developer pulls the latest trunk, creates a hotfix/payment-limit branch, and fixes the validation logic in 30 minutes; the fix passes CI and review, merges to trunk, and deploys automatically.
Why this was fast: (1) Trunk was current - no time wasted syncing branches, (2) CI was fast - optimized for trunk-based workflow, (3) No complex merge process - direct to trunk, (4) Automated deployment - no manual release process.
Contrast: In GitFlow with long-lived develop/feature branches, this same hotfix requires: merge to develop, test in staging, create release branch, merge to main, then deploy - adding hours or days of delay.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
The problem: Without enforced quality gates, developers can push broken code directly to important branches, bypassing review, skipping tests, causing production incidents.
The solution: Branch policies enforce automated and manual checks before code merges - required reviewers, passing builds, resolved comments, and more.
Why it's tested: AZ-400 exam heavily tests branch protection configuration - knowing what policies enforce which quality gates is critical.
What it is: Configurable rules in Azure DevOps (branch policies) and GitHub (branch protection rules) that enforce quality standards before allowing merges to protected branches like main or release/*.
Why it exists: Teams need programmatic enforcement of standards - relying on developer discipline alone fails at scale. Branch policies make quality gates automatic and consistent.
Real-world analogy: Like airport security checkpoints. You can't board a plane (merge to main) without passing through security (branch policies) - ID check (code review), metal detector (automated tests), baggage scan (security scans). No exceptions, automated enforcement.
How it works (Detailed step-by-step):
Protect critical branches: Configure policies on main, release/*, or any important branches
Require pull requests: Policy enforces that all merges happen via PR, never direct push
Example: git push origin main fails with "branch protected"; you must push to a feature branch and create a PR
Require minimum reviewers: Policy demands X approvals before PR can merge (typically 1-2)
Require build validation: Policy requires CI build to pass before merge
Require linked work items: Policy enforces PR must link to work item (user story/bug)
Comment resolution: Policy requires all PR comments resolved before merge
📊 Branch Policy Enforcement Diagram:
stateDiagram-v2
[*] --> FeatureBranch: Developer creates branch
FeatureBranch --> PRCreated: Push + Create PR
PRCreated --> BuildRunning: Automatic CI trigger
BuildRunning --> BuildFailed: Tests fail
BuildRunning --> BuildPassed: Tests pass
BuildFailed --> FeatureBranch: Fix code, push again
BuildPassed --> CodeReview: Request reviewers
CodeReview --> ChangesRequested: Reviewer requests changes
CodeReview --> Approved: Reviewers approve
ChangesRequested --> FeatureBranch: Update code
Approved --> CommentCheck: Check comment resolution
CommentCheck --> CommentsUnresolved: Comments pending
CommentCheck --> AllResolved: All resolved
CommentsUnresolved --> CodeReview: Resolve comments
AllResolved --> WorkItemCheck: Check work item link
WorkItemCheck --> NoWorkItem: No AB# link
WorkItemCheck --> WorkItemLinked: AB# present
NoWorkItem --> PRCreated: Add work item link
WorkItemLinked --> MergeReady: All policies passed ✓
MergeReady --> Merged: Merge to main
Merged --> [*]: Branch deleted
See: diagrams/03_domain2_branch_policy_enforcement.mmd
Diagram Explanation:
This state diagram shows the complete journey of a pull request through branch policy enforcement gates. A developer starts by creating a Feature Branch and makes code changes. After pushing changes, they Create PR, which immediately triggers the Build Running state where automated CI executes.
If the Build Fails, the PR cannot proceed - it returns to Feature Branch state where the developer fixes code and pushes again, restarting the cycle. Only when Build Passes can the PR move to Code Review state.
During Code Review, reviewers examine the changes. If they find issues, the state becomes Changes Requested, sending the PR back to Feature Branch for updates. When reviewers Approve (meeting the minimum reviewer count), the PR advances to Comment Check.
At Comment Check, the system verifies all review comments are resolved. If Comments Unresolved, the PR returns to Code Review to resolve them. When All Resolved, it proceeds to Work Item Check.
Work Item Check validates that the PR links to a work item via AB# syntax. If No Work Item, the developer must add the link, returning to PR Created state. With Work Item Linked, all policies are satisfied - the PR reaches Merge Ready state where the merge button becomes active.
Finally, the PR Merges to main, the feature branch is deleted, and the workflow completes. Every gate must pass - skip one, and merge is blocked. This ensures consistent quality enforcement without relying on developer memory or discipline.
Detailed Example 1: Implementing Branch Policies for Main Branch
Your organization wants to protect main branch. You configure Azure DevOps branch policies:
Require pull request reviews: Minimum 2 reviewers, reset approvals on new commits
Build validation: Require "PR-CI" pipeline to pass
Check for linked work items: Require PR associates with work item
Check for comment resolution: All comments must be resolved or marked "Won't fix"
Result: Developer creates PR for main. PR shows 4 status checks:
Developer adds AB#456 link → Work Items: ✓. Build completes → Build: ✓. Two teammates approve → Reviewers: 2/2 ✓. No comments → Comments: ✓. All green, merge button enabled.
Detailed Example 2: Configuring GitHub Branch Protection
Your team uses GitHub. You protect the main branch with these rules:
Branch name pattern: main, with required reviews, required status checks, and conversation resolution enabled
Result: Developer tries git push origin main → Rejected: "Cannot push to protected branch". Creates PR instead. PR requires: (1) 2 approvals, (2) CI Build ✓, (3) CodeQL ✓, (4) Dependencies ✓, (5) Comments resolved. Only when all are satisfied can the merge occur.
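The same protections can be applied programmatically. A hedged sketch using the GitHub CLI against the branch-protection REST endpoint; the owner/repo and status-check names are illustrative:

```bash
# Protect main: 2 approvals + required status checks, enforced for admins too
cat > protection.json <<'EOF'
{
  "required_status_checks": { "strict": true, "contexts": ["build", "test", "security-scan"] },
  "enforce_admins": true,
  "required_pull_request_reviews": { "required_approving_review_count": 2 },
  "restrictions": null
}
EOF
gh api -X PUT repos/contoso/webapp/branches/main/protection --input protection.json
```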
⭐ Must Know (Critical Facts):
Protected branches block direct git push; all changes go through a PR
When choosing which branch policies to implement:
📊 Branch Policy Decision Tree:
graph TD
A[Start: Analyze Branch Requirements] --> B{Critical branch?}
B -->|Yes - main/master/release| C[Enable ALL core policies]
B -->|No - feature/topic branch| D{Team size > 5?}
C --> E[Required Reviewers: 2+]
C --> F[Build Validation: Required]
C --> G[Work Item Linking: Required]
C --> H[Comment Resolution: Required]
D -->|Yes| I[Require 1 reviewer minimum]
D -->|No| J[Optional policies only]
I --> K[Build validation recommended]
J --> L[Work item linking optional]
style C fill:#c8e6c9
style E fill:#fff3e0
style F fill:#fff3e0
style G fill:#fff3e0
style H fill:#fff3e0
See: diagrams/03_domain2_branch_policy_decision.mmd
Decision Logic Explained:
For critical branches (main, master, release), always enable the complete policy suite to ensure code quality and traceability. This includes minimum 2 reviewers (prevents single person approving their own questionable code), build validation (catches breaking changes before merge), work item linking (maintains audit trail), and comment resolution (ensures feedback is addressed). For team branches with 5+ members, require at least 1 reviewer and strongly recommend build validation to catch integration issues early. For small teams or personal feature branches, policies can be optional to avoid slowing down exploratory work, but work item linking helps track feature development.
🎯 Exam Focus: Questions often test understanding of when to require vs. recommend policies
The problem: Teams struggle with merge conflicts, integration issues, and release coordination when using Git without clear workflow patterns.
The solution: Adopt proven branching strategies (trunk-based, GitFlow, feature branch) that match team size, release cadence, and risk tolerance.
Why it's tested: DevOps engineers must design workflows that balance speed with stability (15% of Domain 2 questions).
What it is: A source control workflow where developers collaborate on code in a single branch (trunk/main) with very short-lived feature branches (hours to 1-2 days maximum) that merge frequently.
Why it exists: Traditional long-lived feature branches cause massive merge conflicts and integration headaches. Trunk-based development emerged from Google, Facebook, and other tech giants to enable continuous integration and rapid deployment. The core principle: integrate often to avoid integration hell.
Real-world analogy: Like a highway where all cars (developers) stay in the main lanes and only briefly exit for quick stops (feature work), then immediately merge back. Contrast with GitFlow which is like having separate roads for each type of vehicle that rarely intersect.
How it works (Detailed step-by-step):
git checkout -b feature/add-login-button main (morning)
git commit -m "Add login button UI"
git pull origin main --rebase (keeps history clean)
npm test (catches problems before PR)
git branch -d feature/add-login-button
📊 Trunk-Based Development Diagram:
sequenceDiagram
participant Dev as Developer
participant FB as Feature Branch
participant Main as Main Branch
participant CI as CI/CD Pipeline
participant Prod as Production
Note over Dev,Main: Morning: Start Work
Dev->>Main: Pull latest
Dev->>FB: Create short-lived branch
Note over Dev,FB: 2-4 hours: Development
Dev->>FB: Make small changes
Dev->>FB: Commit frequently
Note over FB,Main: Same Day: Integration
Dev->>Main: Pull latest (rebase)
Dev->>FB: Merge main changes
Dev->>CI: Create PR
CI->>FB: Run automated tests
CI-->>Dev: Tests pass ✓
Note over Dev,Prod: Same Day: Deployment
Dev->>Main: Merge PR (approved)
FB->>Main: Delete feature branch
CI->>Prod: Deploy (or feature flag)
style Main fill:#c8e6c9
style FB fill:#fff3e0
style CI fill:#e1f5fe
style Prod fill:#f3e5f5
See: diagrams/03_domain2_trunk_based_sequence.mmd
Diagram Explanation (detailed):
This sequence diagram shows a complete trunk-based development cycle from start to finish. In the morning, the developer pulls the latest code from the main branch to ensure they're working with current code, then creates a very short-lived feature branch. Over the next 2-4 hours (not days!), they make focused changes and commit frequently to avoid losing work. The same day, before creating a PR, they pull main again and rebase their changes on top (this prevents merge conflicts by integrating latest changes before the PR). The PR triggers CI pipeline which runs all automated tests. Since the changes are small and frequently integrated, tests typically pass quickly. Once approved, the code merges to main and the feature branch is immediately deleted. The main branch remains deployable at all times - either deploy immediately or use feature flags to hide incomplete features. This rapid cycle (hours, not days) prevents integration problems and enables continuous deployment.
Detailed Example 1: E-commerce Team Using Trunk-Based Development
Your e-commerce platform team of 15 developers needs to deploy multiple times daily during Black Friday preparation. Here's how trunk-based development works: Monday 9 AM, Sarah pulls main and creates feature/cart-discount-badge. By 11 AM, she's added the discount badge UI component, written unit tests, and committed 4 times. She pulls main again (3 other developers merged since 9 AM), rebases her branch, runs tests locally - all pass. She creates PR at 11:30 AM. CI pipeline runs: unit tests ✓, integration tests ✓, security scan ✓. Mike reviews at 12 PM, approves with minor comment about CSS naming. Sarah fixes, pushes update, CI re-runs, Mike re-approves. Merge completes at 12:15 PM. The discount badge code is now in main, but wrapped in feature flag discount_badge_enabled=false so it's hidden from users. At 2 PM, product team enables flag for 5% of users to test. At 4 PM, enabled for 100%. The badge is live. Total time: feature branch lived 3 hours. No merge conflicts because changes were small and frequently integrated.
Detailed Example 2: Trunk-Based with Feature Flags for Long Features
Your team needs to build a complete checkout redesign that will take 2 weeks. Old approach: long-lived feature/checkout-redesign branch → massive merge conflicts. Trunk-based approach: Day 1, add feature flag new_checkout_enabled=false to main. Days 1-10, developers create small branches that merge same day: feature/checkout-step1-ui (4 hours), feature/checkout-validation-logic (6 hours), feature/checkout-payment-integration (1 day, split into 3 PRs). Each PR adds code to main wrapped in if (feature_flag.new_checkout_enabled) checks. Old checkout still works because flag is false. By Day 10, entire new checkout is in main but hidden. Day 11-12, QA tests by enabling flag in staging. Day 13, enable for 10% users. Day 14, enable for 100%, remove old checkout code. Result: no merge conflicts (integrated daily), reduced risk (gradual rollout), faster feedback (QA started Day 11, not Day 14).
Detailed Example 3: Handling Hotfixes in Trunk-Based Development
Production bug discovered Friday 3 PM: payment processing fails for Safari users. Trunk-based hotfix flow: (1) Developer pulls main, creates hotfix/safari-payment-fix, (2) Fixes bug in 30 minutes, adds test that reproduces issue, (3) Creates PR with [HOTFIX] label, (4) Automated tests run + required reviewer notified, (5) Reviewer approves in 10 minutes (small, obvious fix), (6) Merge to main at 4 PM, (7) CI auto-deploys to production (main is always deployable), (8) Fix live by 4:15 PM. Total time: 1 hour 15 minutes from discovery to production. If using GitFlow: would need to merge to develop, then merge to master, then create release branch, then deploy → 3-4 hours minimum.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
What it is: A structured branching model with dedicated branch types (main/master, develop, feature, release, hotfix) designed for projects with scheduled release cycles and the need to support multiple production versions simultaneously.
Why it exists: Created by Vincent Driessen in 2010 to solve the problem of coordinating parallel development, managing scheduled releases, and supporting production hotfixes without disrupting ongoing development. Before GitFlow, teams struggled with "when do we stop adding features and start stabilizing for release?" GitFlow provides clear answers through its branching structure.
Real-world analogy: Like a manufacturing assembly line with different stations - features are built in parallel (feature branches), assembled on the main line (develop), sent to quality control for final checks (release branch), shipped to customers (master/main), and if a defect is found, a recall process fixes it (hotfix branch). Each station has a specific purpose and clear handoff points.
How it works (Detailed step-by-step):
Two long-lived branches: main (production code only) and develop (integration branch for next release)
Feature branches: developer creates feature/user-authentication from develop, works for days/weeks, merges back to develop when complete
Release branch: when enough features accumulate in develop, create release/v2.0 from develop for final testing and bug fixes
Stabilization: bug fixes land on release/v2.0 and are also merged back to develop to keep it updated
Release: when release/v2.0 is stable, merge to main, tag as v2.0, deploy to production
Hotfix: for a critical production bug, create hotfix/payment-bug from main, fix, merge to both main and develop
Ongoing work: development continues on develop for the next release
📊 GitFlow Architecture Diagram:
graph TB
subgraph "Long-Lived Branches"
M[main/master<br/>Production Code]
D[develop<br/>Next Release]
end
subgraph "Short-Lived Branches"
F1[feature/login]
F2[feature/dashboard]
R[release/v2.0]
H[hotfix/bug-123]
end
D -->|Create| F1
D -->|Create| F2
F1 -->|Merge when complete| D
F2 -->|Merge when complete| D
D -->|Create when ready| R
R -->|Bug fixes| R
R -->|Merge when stable| M
R -->|Merge fixes back| D
M -->|Critical bug| H
H -->|Merge fix| M
H -->|Merge fix| D
M -->|Tag| TAG[v2.0 Tag]
style M fill:#c8e6c9
style D fill:#e1f5fe
style F1 fill:#fff3e0
style F2 fill:#fff3e0
style R fill:#f3e5f5
style H fill:#ffebee
See: diagrams/03_domain2_gitflow_architecture.mmd
Diagram Explanation (detailed):
GitFlow maintains two permanent branches: main (green) contains only production-ready code, and develop (blue) serves as the integration branch for the next release. Feature branches (orange) like feature/login and feature/dashboard are created from develop and can live for days or weeks while developers work on complete features. When a feature is done, it merges back to develop. When enough features accumulate in develop and it's time for a release, a release/v2.0 branch (purple) is created from develop. This release branch is where final testing, documentation, and minor bug fixes occur - no new features allowed. Once the release branch is stable, it merges to both main (becoming production) and back to develop (ensuring bug fixes aren't lost). The main branch is tagged with version number for traceability. If a critical production bug is discovered, a hotfix branch (red) is created from main, the fix is applied, and then merged to both main (immediate production fix) and develop (prevent bug in next release). This structure allows parallel work: new features can continue in develop while a release is being stabilized.
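The corresponding Git commands, as a minimal sketch of the release and hotfix paths; version numbers and branch names are illustrative:

```bash
# Cut a release branch from develop and stabilize it
git checkout -b release/2.0 develop
# ...bug fixes land on release/2.0 during stabilization...
git checkout main    && git merge --no-ff release/2.0 && git tag v2.0.0
git checkout develop && git merge --no-ff release/2.0      # carry the fixes forward
git branch -d release/2.0

# Hotfix: branch from main, merge back to BOTH long-lived branches
git checkout -b hotfix/payment-bug main
# ...fix and test...
git checkout main    && git merge --no-ff hotfix/payment-bug && git tag v2.0.1
git checkout develop && git merge --no-ff hotfix/payment-bug
git branch -d hotfix/payment-bug
```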
Detailed Example 1: Software Company with Quarterly Releases
Your SaaS company releases new versions quarterly. Current state: v1.5 in production, v1.6 in development. January: Developers create feature branches from develop: feature/export-pdf, feature/dark-mode, feature/api-v2. Over 6 weeks, these features are completed and merged to develop. Mid-February: Product decides v1.6 has enough features, time to release. QA creates release/1.6 from develop. Meanwhile, developers continue creating feature branches from develop for v1.7. QA finds 5 bugs in release/1.6 branch - fixes are committed to release/1.6 and also merged back to develop. March 1: release/1.6 is stable, merged to main, tagged v1.6, deployed to production. March 15: Customer reports critical data loss bug. Developer creates hotfix/1.6.1-data-loss from main, fixes it, merges to both main (becomes v1.6.1 in production) and develop (prevents bug in v1.7). Development continues on develop for v1.7 release in June. Result: Structured release process with clear separation between "next release" and "current production."
Detailed Example 2: GitFlow for Multi-Version Support
Your enterprise software supports 3 versions: v3.0 (current), v2.5 (legacy support), v1.0 (critical fixes only). GitFlow adaptation: Maintain main-v3, main-v2, main-v1 branches (one per supported version), plus develop for next release (v3.1). Customer on v2.5 reports security bug. Flow: (1) Create hotfix/v2.5-security from main-v2, (2) Fix bug, test, (3) Merge to main-v2, deploy to v2.5 customers, (4) Cherry-pick fix to main-v3 (current version needs fix too), (5) Merge fix to develop (v3.1 needs it). For new features: All features go to develop, when ready create release/3.1, stabilize, merge to new main-v3 (becomes current), old main-v3 becomes main-v3-archived. This allows supporting multiple versions while developing new features.
Detailed Example 3: GitFlow Release Branch Workflow Detail
Release day approaches for v2.0. State: 50 features merged to develop over 3 months. Actions: (1) March 1, 9 AM: Release manager creates release/2.0 from develop, (2) CI pipeline deploys release/2.0 to staging environment, (3) QA tests for 2 weeks, logs 12 bugs in Azure Boards, (4) Developers fix bugs by creating small branches from release/2.0: bugfix/login-crash, bugfix/export-timeout, each merges back to release/2.0 AND develop, (5) March 14: All bugs fixed, QA approves, (6) March 15: Merge release/2.0 to main, tag as v2.0.0, deploy to production, (7) March 16: Monitor production, no issues, (8) March 17: Delete release/2.0 branch (no longer needed), (9) Development continues on develop for v2.1, already has 15 new features merged during the 2-week release stabilization. Clean separation of release stabilization from ongoing development.
⭐ Must Know (Critical Facts):
Two long-lived branches: main (production), develop (next release) - if you only have one long-lived branch, it's not GitFlow
Once release/X is created, no new features - only bug fixes allowed
Every release on main is tagged (v1.0, v2.0) to identify what's deployed when
When to use (Comprehensive):
Limitations & Constraints:
Forgetting to merge hotfix fixes to develop or release fixes to develop causes bugs to reappear
Unreleased work accumulates in develop; code in develop that's not in main contradicts "main always deployable"
💡 Tips for Understanding:
Think of it like a rail system: develop is the staging area, release is the final inspection track, main is the departure platform
Hotfixes skip develop and go straight to main, but must still merge to develop after
Without tags on main, you can't identify what version is deployed when
⚠️ Common Mistakes & Misconceptions:
Creating feature branches from main instead of develop
Not merging hotfixes back to develop means bug fixes are lost in the next release
Release and hotfix branches must merge to both main and develop before deletion
Misconception: "a hotfix from develop is fine" - develop might have untested features; a hotfix must be from the stable main branch
Hotfixes branch from main to ensure only production-tested code is included
Apply branch policies to develop, release, and hotfix branches too
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
| Feature | Trunk-Based Development | GitFlow | Feature Branch Workflow |
|---|---|---|---|
| Use case | Continuous deployment, rapid iteration | Scheduled releases, formal QA | Simple projects, small teams |
| Main branch | Always deployable, directly to prod | Production-ready code only | Integration branch |
| Feature branch lifespan | Hours to 1-2 days | Days to weeks | Days to weeks |
| Release mechanism | Deploy main anytime, use feature flags | Dedicated release branches | Tag main or create release branch |
| Hotfix process | Fix in main, deploy (1 step) | Hotfix branch → main + develop (3 steps) | Fix in main, tag |
| Pros | • Fast deployment<br/>• No merge conflicts<br/>• Simple structure | • Clear release process<br/>• Multiple version support<br/>• Formal QA stage | • Easy to learn<br/>• Flexible<br/>• Good for small teams |
| Cons | • Requires feature flags<br/>• Needs strong CI/CD<br/>• High discipline needed | • Complex merges<br/>• Slower hotfixes<br/>• Not for continuous deployment | • Can cause merge conflicts<br/>• No formal release process<br/>• Scales poorly |
| 🎯 Exam tip | Look for: "continuous deployment", "multiple deploys/day", "fast iteration" | Look for: "quarterly releases", "support multiple versions", "formal approval" | Look for: "small team", "simple process", "getting started" |
Scenario 1: Choosing Strategy for E-commerce Platform
📊 Solution Architecture:
graph LR
A[Developer] -->|Create branch| B[feature/fix-cart]
B -->|4 hours work| C[PR to main]
C -->|CI tests pass| D[Merge to main]
D -->|Auto-deploy| E[Production]
E -->|Feature flag OFF| F[Hidden from users]
F -->|Black Friday| G[Enable flag]
G -->|Gradual rollout| H[100% users]
style D fill:#c8e6c9
style E fill:#e1f5fe
style G fill:#fff3e0
See: diagrams/03_domain2_scenario_ecommerce.mmd
Scenario 2: Enterprise SaaS with Compliance Requirements
Support each production version from its own branch (main-v2, main-v1)
Scenario 3: Startup Rapid Prototyping
The problem: Code reviews are often inconsistent, delayed, or superficial, leading to bugs slipping through and knowledge silos forming in teams.
The solution: Implement structured pull request workflows with clear guidelines, automated checks, and effective review practices.
Why it's tested: Code review is the primary quality gate in modern development (20% of Domain 2 questions test PR workflows).
What it is: A systematic approach to creating pull requests that are easy to review, understand, and approve quickly while maintaining high code quality standards.
Why it exists: Large, complex PRs (500+ line changes) take hours to review and often get rubber-stamped without thorough inspection. Small, well-structured PRs get reviewed in 10-15 minutes with better quality outcomes. The problem: developers create massive PRs; the solution: enforce size limits and clear structure.
Real-world analogy: Like proofreading documents - reviewing a 2-page memo takes 5 minutes and catches most errors, while reviewing a 100-page report takes hours and errors slip through due to reviewer fatigue.
How it works (Detailed step-by-step):
📊 Effective PR Workflow:
sequenceDiagram
participant Dev as Developer
participant PR as Pull Request
participant CI as CI Pipeline
participant Rev1 as Reviewer 1
participant Rev2 as Reviewer 2
participant Main as Main Branch
Dev->>Dev: Self-review code
Dev->>PR: Create PR (< 250 lines)
PR->>CI: Trigger automated checks
PR->>Rev1: Notify reviewer 1
PR->>Rev2: Notify reviewer 2
CI->>PR: Build ✓, Tests ✓, Lint ✓
Rev1->>PR: Review code, add comments
Rev2->>PR: Review code, add comments
Dev->>PR: Address feedback
Dev->>PR: Resolve conversations
PR->>CI: Re-run checks
CI->>PR: All checks pass ✓
Rev1->>PR: Approve
Rev2->>PR: Approve
PR->>Main: Merge (squash/rebase)
PR->>PR: Delete feature branch
style CI fill:#e1f5fe
style Main fill:#c8e6c9
style PR fill:#fff3e0
See: diagrams/03_domain2_pr_workflow.mmd
Diagram Explanation (detailed):
The effective PR workflow begins with developer self-review - before creating the PR, the developer reviews their own code to catch obvious issues (typos, console.log statements, unused imports). This saves reviewer time. When creating the PR, the developer ensures it's under 250 lines (large PRs get poor reviews). The PR automatically triggers CI pipeline for automated checks (build, tests, linting, security scans) and notifies assigned reviewers. Both automated (CI) and human (reviewers) validation happen in parallel. Reviewers examine code and add comments/questions. Developer addresses feedback by making changes and explicitly resolving conversations (not ignoring them). After changes, CI re-runs to ensure fixes didn't break anything. When all conversations are resolved and checks pass, reviewers approve. Only then can the PR merge to main using squash or rebase strategy to keep history clean. Finally, the feature branch is automatically deleted to prevent clutter.
Detailed Example 1: Small PR vs Large PR Review Quality
Scenario A (Small PR): Developer creates PR with 150 lines changed - adds new API endpoint. Description: "Add GET /api/users/:id endpoint. Returns user by ID. Related to AB#789." Reviewer clicks PR, sees concise changes in 3 files (route, controller, test). Reviews in 12 minutes, spots issue: "Missing error handling for invalid user ID." Developer fixes in 5 minutes, reviewer re-approves. Total time: 20 minutes, caught 1 bug.
Scenario B (Large PR): Developer creates PR with 1,200 lines changed - refactors entire API layer. Description: "API refactoring." Reviewer clicks PR, sees 45 files changed, overwhelmed. Skims for 30 minutes, approves with "LGTM" comment despite not fully understanding changes. Merges. Production deploys. 3 bugs discovered in production because reviewer missed: (1) broken error handling, (2) race condition, (3) memory leak. Total time: 30 minutes review + 4 hours fixing production bugs. Lesson: Small PRs get better reviews.
Detailed Example 2: PR Description Best Practices
Bad PR description: "Fixed stuff. Updated code. See changes." - Reviewer has no context, must read entire codebase to understand.
Good PR description template:
## What changed
- Added retry logic to payment API client
- Increased timeout from 5s to 30s
- Added exponential backoff (max 3 retries)
## Why
Payment API occasionally returns 503 under load. Current implementation fails immediately.
Customer transactions lost. Business impact: $50K/month failed orders.
## How to test
1. Run: npm test -- payment.test.js
2. Manual test: Simulate API timeout (see test/README)
3. Verify: Logs show retry attempts
## Breaking changes
None - backward compatible
## Related work item
Fixes AB#1234
Result: Reviewer understands context immediately, knows what to focus on, can test changes. Review is faster and more effective.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
The problem: Teams struggle with advanced Git scenarios - resolving conflicts, recovering lost work, cleaning up history, managing large repositories.
The solution: Master Git's powerful features (rebase, cherry-pick, reflog, bisect) to handle complex situations efficiently.
Why it's tested: DevOps engineers must troubleshoot Git issues and guide teams (15% of Domain 2 questions).
What it is: An alternative to merge that rewrites commit history by replaying commits from one branch onto another, creating a linear history instead of merge commits.
Why it exists: git merge creates merge commits that clutter history with "Merged feature/X into main" messages. For frequently-integrated branches, history becomes a tangled web. Rebase solves this by making history linear and readable.
Real-world analogy: Merge is like combining two separate document timelines with a note "Combined documents here." Rebase is like rewriting the second document as if it was always part of the first document's timeline - cleaner, but changes history.
How it works (Detailed step-by-step):
Example: git checkout feature/login then git rebase main replays the feature branch's commits on top of main
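A minimal command sketch of both flavors; branch names are illustrative, and --force-with-lease is needed because rebase rewrites history that may already be on the remote:

```bash
# Replay the feature branch's commits on top of the latest main (linear history, no merge commit)
git checkout feature/login
git fetch origin
git rebase origin/main            # resolve any conflicts one replayed commit at a time
git push --force-with-lease       # update the remote branch after rewriting its history

# Interactive rebase: squash, reword, or reorder the last 3 commits before opening a PR
git rebase -i HEAD~3
```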
⭐ Must Know (Critical Facts):
Interactive rebase (git rebase -i) - lets you squash, reword, reorder commits; clean up before PR
Test yourself before moving on:
Try these from your practice test bundles:
If you scored below 70%:
[One-page summary of chapter - copy to your notes]
Key Concepts:
Decision Points:
Clean up local commit history with interactive rebase (git rebase -i)
Commands:
git pull origin main --rebase before creating a PR
Hotfix from main, then merge to main AND develop
git rebase -i HEAD~3 (squash last 3 commits)
What you'll learn:
Time to complete: 12-16 hours (largest domain, most exam weight)
Prerequisites: Chapters 1-2 (Fundamentals, Source Control)
The problem: Teams struggle with manual builds, inconsistent deployments, and lack of automation, leading to slow delivery and production bugs.
The solution: Azure Pipelines automates build, test, and deployment processes with YAML-based configuration-as-code.
Why it's tested: Azure Pipelines is the core of DevOps automation (30% of Domain 3 questions, 15% of entire exam).
What it is: Azure Pipelines uses YAML (YAML Ain't Markup Language) files to define CI/CD pipelines as code, with a hierarchical structure of stages, jobs, and steps that execute automation tasks.
Why it exists: Before YAML, pipelines were configured through UI (Classic pipelines), which had problems: not version-controlled, hard to replicate, no code review for pipeline changes, prone to drift. YAML pipelines solve this by treating pipeline configuration as code - versioned, reviewed, reusable, consistent.
Real-world analogy: YAML pipeline is like a recipe book checked into source control. Classic pipeline is like verbal instructions passed between cooks - inconsistent and forgotten. With YAML, every team member has the exact same recipe, can suggest improvements via PR, and changes are tracked in Git history.
How it works (Detailed step-by-step):
Hierarchy: trigger → stages → jobs → steps (each level contains the next)
📊 YAML Pipeline Structure Diagram:
graph TD
A[azure-pipelines.yml] --> B[Trigger: push to main]
A --> C[Variables: Build config]
A --> D[Stages]
D --> E[Stage: Build]
D --> F[Stage: Test]
D --> G[Stage: Deploy]
E --> H[Job: BuildJob]
H --> I[Step: Install dependencies]
H --> J[Step: Compile code]
H --> K[Step: Publish artifact]
F --> L[Job: UnitTests]
F --> M[Job: IntegrationTests]
G --> N[Job: DeployToDev]
G --> O[Job: DeployToStaging]
style A fill:#e1f5fe
style E fill:#c8e6c9
style F fill:#fff3e0
style G fill:#f3e5f5
I -.Sequential.-> J
J -.Sequential.-> K
L -.Parallel.-> M
N -.Dependent.-> O
See: diagrams/04_domain3_yaml_pipeline_structure.mmd
Diagram Explanation (detailed):
The YAML pipeline starts with a single file (azure-pipelines.yml, blue) containing all configuration. At the top level, you define triggers (when pipeline runs), variables (configuration values), and stages (logical divisions). The pipeline flows through three stages sequentially: Build (green), Test (orange), Deploy (purple). Within the Build stage, a single job (BuildJob) contains three steps that run sequentially on the same agent: install dependencies → compile code → publish artifact (the arrows show sequential execution). The Test stage has two jobs (UnitTests and IntegrationTests) that run in parallel on separate agents to speed up testing. The Deploy stage has two jobs where DeployToStaging depends on DeployToDev completing successfully (dependent execution). This hierarchical structure (trigger → stages → jobs → steps) provides flexibility: parallel where possible (jobs within stage), sequential where necessary (steps within job, stages with dependencies).
Detailed Example 1: Simple Node.js CI Pipeline
Your Node.js app needs automated testing on every commit to main. Here's the YAML:
# azure-pipelines.yml
trigger:
branches:
include:
- main
paths:
include:
- src/*
- tests/*
pool:
vmImage: 'ubuntu-latest'
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: NodeTool@0
inputs:
versionSpec: '18.x'
displayName: 'Install Node.js'
- script: npm ci
displayName: 'Install dependencies'
- script: npm run build
displayName: 'Build application'
- task: PublishBuildArtifacts@1
inputs:
PathtoPublish: 'dist'
ArtifactName: 'webapp'
displayName: 'Publish artifact'
- stage: Test
dependsOn: Build
jobs:
- job: UnitTest
steps:
- script: npm ci
displayName: 'Install dependencies'
- script: npm test -- --coverage
displayName: 'Run unit tests'
- task: PublishTestResults@2
inputs:
testResultsFormat: 'JUnit'
testResultsFiles: '**/junit.xml'
condition: always()
- task: PublishCodeCoverageResults@1
inputs:
codeCoverageTool: 'Cobertura'
summaryFileLocation: '**/coverage/cobertura-coverage.xml'
Breakdown: Pipeline triggers on push to main, but only if src/ or tests/ changed (path filter saves agent time). Runs on Microsoft-hosted Ubuntu agent (vmImage). Build stage installs Node 18, runs npm ci (faster than npm install in CI), builds app, publishes dist folder as artifact named 'webapp'. Test stage depends on Build (runs after), reinstalls dependencies (fresh job, clean environment), runs tests with coverage, publishes results (visible in Azure DevOps Tests tab), publishes coverage report (condition: always() means publish even if tests fail). Result: Every commit to main triggers → build → test → artifact + results available in 5-10 minutes.
Detailed Example 2: GitHub Actions Workflow with Matrix Strategy
You're building a Node.js library that needs to support multiple Node versions (14, 16, 18) and run on both Linux and Windows. Matrix strategy runs the same job with different variable combinations in parallel.
Workflow file (.github/workflows/test.yml):
name: Test Suite
on:
pull_request:
branches: [main]
push:
branches: [main]
jobs:
test:
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
node: [14, 16, 18]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- name: Setup Node ${{ matrix.node }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node }}
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
- name: Upload coverage (Ubuntu + Node 18 only)
if: matrix.os == 'ubuntu-latest' && matrix.node == 18
uses: codecov/codecov-action@v3
What happens: GitHub creates 6 parallel jobs (2 OS × 3 Node versions = 6 combinations). Each job checks out code, installs the specific Node version from matrix, runs tests. Coverage only uploads once (from Ubuntu+Node 18 to avoid duplicates). All 6 jobs must pass for PR to be mergeable. If Node 16 on Windows fails but others pass, you know it's a Windows+16 specific issue. Result: Comprehensive compatibility testing across environments, completed in parallel (same time as 1 job, not 6× longer).
Detailed Example 3: Self-Hosted Runner with Specific Capabilities
Your company has specialized build requirements: access to private NuGet feed (internal network), requires specific SDK versions not available on Microsoft-hosted agents, needs access to on-premises database for integration tests. Solution: Set up self-hosted agent/runner.
Azure DevOps agent setup:
# Download and configure agent on your build server
./config.sh --url https://dev.azure.com/yourorg --auth pat --token YOUR_PAT
./run.sh
# Add to agent pool "OnPremises"
# Configure capabilities in Azure DevOps: custom.sdk=specialized, custom.network=internal
Pipeline configuration:
pool:
name: OnPremises
demands:
- custom.sdk -equals specialized
- custom.network -equals internal
steps:
- script: dotnet restore --source http://internal-nuget.company.local/feed
displayName: 'Restore from internal feed'
- script: dotnet build
displayName: 'Build with specialized SDK'
- script: dotnet test --settings integration.runsettings
displayName: 'Run integration tests'
env:
DB_CONNECTION: $(OnPremDbConnection)
Breakdown: Self-hosted agent runs on your infrastructure (Windows Server in your datacenter), has network access to internal resources, pre-configured with specialized SDK. Pipeline requests agents from "OnPremises" pool with specific capabilities (demands). Agent evaluates: "Do I have custom.sdk=specialized? Yes. Do I have custom.network=internal? Yes. I can run this job." Restore pulls packages from internal feed (http://internal-nuget.company.local), build uses the specialized SDK installed on agent, tests connect to on-premises database using secure variable. Result: Build succeeds where Microsoft-hosted agents would fail (no access to internal network). Trade-off: You maintain the agent infrastructure (updates, security patches, scaling).
⭐ Must Know (Pipeline Design Critical Facts):
When to use (Pipeline Design Decisions):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
- Pipeline doesn't trigger on push: check that the trigger block targets the right branch (trigger: branches: include: [main]) and that path filters aren't excluding the changed files
- Job stays queued / no agent picks it up: make sure the self-hosted agent is online and running (./run.sh), verify capabilities match pipeline demands, ensure pipeline has permission to access agent pool

The problem: Applications depend on external libraries (packages). Without centralized management, developers pull packages from the public internet (security risk), version conflicts arise (dependency hell), and there is no control over what enters the codebase (compliance nightmare). Teams waste time troubleshooting "works on my machine" issues caused by different package versions.
The solution: Implement package management strategy using Azure Artifacts or GitHub Packages. Create feeds (package repositories) for different purposes (development, production, upstream caching). Define versioning standards (SemVer for releases, CalVer for time-based). Control package lifecycle (publishing, promotion, retention). Result: Consistent dependencies across all environments, security scanning of packages, faster builds with upstream caching.
Why it's tested: Package management is fundamental to modern DevOps (20% of Domain 3). Exam tests: Choosing between Azure Artifacts and GitHub Packages, designing feed structures, implementing versioning strategies, configuring upstream sources, managing package retention.
What it is: A package feed is a repository that stores packages (NuGet, npm, Maven, Python). Views are filtered subsets of a feed that show only packages meeting certain criteria (e.g., "Release" view shows only non-prerelease packages, "Latest" view shows only latest versions).
Why it exists: Organizations need to separate package maturity levels (development packages shouldn't mix with production-approved packages), control what developers can consume (some packages may have vulnerabilities), improve performance (upstream sources cache external packages locally). Feed views solve this by creating logical partitions without duplicating storage.
Real-world analogy: Think of a feed like a warehouse with different sections. The warehouse holds all inventory (all package versions), but you create "sections" (views) for different customers: "Retail Section" (stable products only), "Wholesale Section" (bulk items), "Clearance Section" (old versions). Same warehouse, different access points.
How it works (Detailed step-by-step):
1. Create a feed (e.g., MyCompanyPackages) in Azure Artifacts; it comes with the @Local, @Prerelease, and @Release views
2. Each view has its own endpoint, e.g. https://pkgs.dev.azure.com/myorg/_packaging/MyCompanyPackages@Release/nuget/v3/index.json
3. A production pipeline runs dotnet restore, connects to the feed, sees only packages in the @Release view (beta packages hidden), and downloads approved packages only (a short restore sketch follows the diagram explanation below)

📊 Package Feed Architecture Diagram:
graph TB
subgraph "Azure Artifacts Feed: MyCompanyPackages"
LOCAL["@Local View<br/>(All Packages)"]
PRERELEASE["@Prerelease View<br/>(Beta/Alpha packages)"]
RELEASE["@Release View<br/>(Stable only)"]
LOCAL --> PRERELEASE
LOCAL --> RELEASE
end
subgraph "Upstream Sources"
NUGET["nuget.org<br/>(Public NuGet)"]
NPM["npmjs.com<br/>(Public npm)"]
end
subgraph "Consumers"
DEV["Dev Pipelines"] --> LOCAL
TEST["Test Pipelines"] --> PRERELEASE
PROD["Prod Pipelines"] --> RELEASE
end
CI["CI Pipeline"] -->|Publish| LOCAL
LOCAL -->|Cache| NUGET
LOCAL -->|Cache| NPM
style LOCAL fill:#e3f2fd
style PRERELEASE fill:#fff3e0
style RELEASE fill:#c8e6c9
style PROD fill:#f3e5f5
See: diagrams/04_domain3_package_feed_architecture.mmd
Diagram Explanation (comprehensive breakdown):
This diagram illustrates a complete Azure Artifacts feed architecture with views and upstream sources. At the center is the MyCompanyPackages feed containing three views. The @Local view (blue) is the entry point where ALL packages land when published by CI Pipeline - it contains every version including prereleases, betas, and stable releases. From @Local, packages can be visible in two filtered views: @Prerelease view (orange) automatically shows packages with version suffixes (-beta, -alpha, -rc) for testing environments, and @Release view (green) shows only stable packages without suffixes for production use. On the left, Upstream Sources (nuget.org and npmjs.com) are configured as package origins - when a pipeline requests a package not in the feed, Azure Artifacts fetches it from upstream and caches it in @Local view, so subsequent requests are instant (no internet call). On the right, Consumers show different pipeline types connecting to appropriate views: Dev Pipelines use @Local (can access all packages including experiments), Test Pipelines use @Prerelease (validate beta packages before release), Prod Pipelines use @Release (only approved stable packages). Flow: Developer commits code → CI Pipeline builds and publishes "MyLib 1.2.0-beta" → Package enters @Local and @Prerelease views → Test pipeline tests beta → If tests pass, developer promotes package to version "1.2.0" (removes suffix) → Package now visible in @Release view → Production pipeline can consume it. Upstream caching means if pipeline requests "Newtonsoft.Json 13.0.1" (external package), feed checks @Local, doesn't find it, fetches from nuget.org, caches in @Local, returns to pipeline. Next pipeline requesting same package gets it from cache instantly. Result: Security (all packages flow through your feed, can be scanned), Performance (upstream caching eliminates internet calls), Control (views ensure environments get appropriate package maturity levels).
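To make the view concept concrete, here is a minimal consumer-side sketch (feed and view names reused from the example above; the project glob is an assumption) showing a production pipeline restoring only from the @Release view:

steps:
  - task: NuGetAuthenticate@1
    displayName: 'Authenticate to Azure Artifacts'

  # 'feed@view' syntax selects the @Release view, so only promoted, stable packages resolve.
  # Versions that exist only in @Local or @Prerelease are invisible to this pipeline.
  - task: DotNetCoreCLI@2
    displayName: 'Restore from the @Release view'
    inputs:
      command: 'restore'
      projects: '**/*.csproj'
      feedsToUse: 'select'
      vstsFeed: 'MyCompanyPackages@Release'
      includeNuGetOrg: false

Swapping only the view in vstsFeed (for example MyCompanyPackages@Prerelease for a test pipeline) is all that changes between environments.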
Detailed Example 1: Publishing npm Package to GitHub Packages
You're building a shared React component library used across multiple projects in your organization. You want to publish it to GitHub Packages so other teams can consume it. GitHub Packages is free for public repos, tightly integrated with GitHub repositories.
Package.json configuration:
{
"name": "@myorg/component-library",
"version": "2.1.0",
"repository": {
"type": "git",
"url": "https://github.com/myorg/component-library.git"
},
"publishConfig": {
"registry": "https://npm.pkg.github.com/@myorg"
}
}
GitHub Actions workflow (.github/workflows/publish.yml):
name: Publish Package
on:
release:
types: [published]
jobs:
publish:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '18'
registry-url: 'https://npm.pkg.github.com'
scope: '@myorg'
- run: npm ci
- run: npm run build
- run: npm run test
- run: npm publish
env:
NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Consumer configuration (in other repos, .npmrc file):
@myorg:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${NPM_TOKEN}
Breakdown: Package name must be scoped (@myorg/component-library) for GitHub Packages. publishConfig tells npm to publish to the GitHub registry, not public npmjs.com. Workflow triggers on GitHub Release creation (when you tag v2.1.0 and create a release). Job needs permissions: packages:write to push the package, contents:read to checkout code. The setup-node action configures npm to authenticate with GitHub Packages using the built-in GITHUB_TOKEN (no manual secret needed). npm ci installs deps (clean install), npm run build compiles TypeScript to JavaScript, npm run test validates the package works, npm publish pushes the package to GitHub Packages at https://npm.pkg.github.com/@myorg/component-library. Other teams consuming this package create a .npmrc file telling npm where to find @myorg packages (GitHub, not npmjs), authenticate with a personal access token (NPM_TOKEN secret in their repo), run npm install @myorg/component-library@2.1.0, and the package downloads from GitHub Packages. Result: Internal package stays within the GitHub ecosystem, automatically linked to the source code repository, usable for private repos within GitHub's plan-based free storage/transfer allowance (Azure Artifacts similarly includes its first 2 GiB free, then charges), version tied to Git tags.
Detailed Example 2: Azure Artifacts with Upstream Sources
Your company builds .NET applications. You want all teams to restore NuGet packages through Azure Artifacts (for security scanning and caching), but don't want to manually copy every public package. Solution: Configure upstream sources.
Azure Artifacts feed setup (via Azure DevOps UI):
1. Create a feed named CompanyNuGet (Artifacts → Create Feed)
2. Enable the nuget.org upstream source on the feed
3. Create a view named Approved (referenced as CompanyNuGet@Approved) for packages that have passed security review
Pipeline configuration (azure-pipelines.yml):
steps:
- task: NuGetAuthenticate@1
displayName: 'Authenticate with Azure Artifacts'
- task: DotNetCoreCLI@2
displayName: 'Restore packages'
inputs:
command: 'restore'
projects: '**/*.csproj'
feedsToUse: 'select'
vstsFeed: 'CompanyNuGet@Approved'
includeNuGetOrg: false
- task: DotNetCoreCLI@2
displayName: 'Build solution'
inputs:
command: 'build'
projects: '**/*.sln'
NuGet.config (in repository):
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<clear />
<add key="CompanyNuGet" value="https://pkgs.dev.azure.com/myorg/_packaging/CompanyNuGet@Approved/nuget/v3/index.json" />
</packageSources>
</configuration>
What happens on first package request:
1. Pipeline runs dotnet restore for a project requiring "Newtonsoft.Json 13.0.3"
2. Feed checks the @Local view: package not found
3. Feed queries the upstream source (nuget.org), downloads the package, and caches it in @Local
4. Package is returned to the pipeline

What happens on subsequent requests (same package):
1. Pipeline runs dotnet restore for the same package
2. Feed finds the package already cached in @Local and returns it instantly - no call to nuget.org

Security benefit: InfoSec team configures Azure Defender for DevOps to scan all packages in the CompanyNuGet feed. If Newtonsoft.Json 13.0.3 has a vulnerability, an alert triggers and the package can be blocked from the @Approved view. All projects are automatically prevented from using the vulnerable package. Result: Upstream caching improves build speed (cached packages are instant), security scanning protects the codebase (all packages flow through scanning), compliance audit (track exactly which packages entered the organization).
Detailed Example 3: Semantic Versioning (SemVer) Strategy
Your team maintains internal libraries consumed by 50+ microservices. You need versioning strategy that communicates breaking changes clearly so consumers know when updates are safe vs risky. Solution: Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.
Versioning rules implementation:
- MAJOR: increment when you make a breaking change (consumers must change their code)
- MINOR: increment when you add functionality in a backward-compatible way
- PATCH: increment for backward-compatible bug fixes
- Prerelease suffixes (e.g., 2.0.0-beta.1) mark versions that are not yet stable
Example scenario - Library evolution:
v1.0.0 (Initial release):
- UserService.GetUser(id) returns User object
v1.1.0 (Added feature, MINOR bump):
- Added UserService.GetUserByEmail(email) method
- GetUser(id) still works exactly as before
- Consumers can update from 1.0.0 → 1.1.0 safely
v1.1.1 (Bug fix, PATCH bump):
- Fixed null reference bug in GetUser
- No API changes
- Consumers should update 1.1.0 → 1.1.1 (bug fix)
v2.0.0 (Breaking change, MAJOR bump):
- Changed GetUser(id) return type from User to Task<User> (async)
- Consumers must update code: await GetUser(id)
- Update 1.1.1 → 2.0.0 requires code changes
Pipeline implementation (automatically bump version):
variables:
majorVersion: 2
minorVersion: 3
patchVersion: $[counter(variables['minorVersion'], 0)]
packageVersion: $(majorVersion).$(minorVersion).$(patchVersion)
steps:
- script: dotnet pack -p:PackageVersion=$(packageVersion)
displayName: 'Create package with version $(packageVersion)'
- task: NuGetCommand@2
inputs:
command: 'push'
packagesToPush: '**/*.nupkg'
nuGetFeedType: 'internal'
publishVstsFeed: 'MyFeed'
Consumer package.json dependency configurations:
{
"dependencies": {
"@mycompany/lib-stable": "1.1.1",
"@mycompany/lib-minor-updates": "^1.1.0",
"@mycompany/lib-patch-only": "~1.1.0",
"@mycompany/lib-bleeding-edge": "*"
}
}
Version range meanings:
"1.1.1" (exact): Only version 1.1.1, no automatic updates (maximum stability, miss bug fixes)"^1.1.0" (caret): 1.1.0 to <2.0.0 (accept minor and patch updates, no breaking changes)"~1.1.0" (tilde): 1.1.0 to <1.2.0 (accept patch updates only, very conservative)"*" (wildcard): Any version (dangerous, could get breaking changes)Result: SemVer provides contract between library maintainer and consumers. MAJOR bump signals "read changelog, expect code changes", MINOR bump signals "safe to update, new features available", PATCH bump signals "bug fixes, update recommended". Automated counter in pipeline ensures each build gets unique version (2.3.0, 2.3.1, 2.3.2...). Consumers use version ranges to control update aggressiveness (^ for active development, ~ for production stability).
⭐ Must Know (Package Management Critical Facts):
When to use (Package Management Decisions):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
The problem: Code defects discovered in production are 100X more expensive to fix than defects found during development. Without automated testing in pipelines, manual testing creates bottleneck (QA team overwhelmed), inconsistency (tests skipped under pressure), late detection (bugs found weeks after coding). Teams ship broken code, customers experience failures, reputation damaged.
The solution: Implement comprehensive testing strategy in CI/CD pipelines. Run tests automatically on every commit (shift-left testing). Create quality gates (pipelines fail if tests fail, broken code never merges). Layer tests by type (unit tests for logic, integration tests for APIs, load tests for performance). Measure code coverage (ensure tests actually exercise code). Result: Defects caught in minutes not weeks, consistent quality enforcement, faster development cycles (confident deploys).
Why it's tested: Testing strategy is critical DevOps practice (15% of Domain 3). Exam tests: Designing quality gates, implementing test pyramid (unit/integration/e2e), configuring test tasks and agents, analyzing code coverage, managing flaky tests.
What it is: Quality gates are automated checks in CI pipelines that must pass before code can merge (e.g., "80% code coverage required", "0 critical bugs", "all tests pass"). Release gates are approval checkpoints in CD pipelines before deploying to environments (e.g., "security scan passed", "manual approval from manager", "incident count is low").
Why it exists: Prevents quality degradation by enforcing standards automatically. Without gates, developers can merge failing code (pressure to ship fast), deploy to production during incidents (should wait for stability), skip security scans (convenience over safety). Gates act as guardrails: pipeline stops if quality standards not met, human judgment required for critical deploys, compliance requirements enforced programmatically.
Real-world analogy: Like TSA security at airport. Quality gates are the metal detector and X-ray (automated checks, everyone must pass, no exceptions). Release gates are the customs officer (human review at specific checkpoints, judgment call on suspicious items). You can't board (deploy) until you pass both.
How it works (Detailed step-by-step):
1. Pipeline runs the tests and publishes results with PublishTestResults@2
2. Configure a branch policy requiring the test run to pass (e.g., 80% pass rate)
3. Set a code coverage threshold in the SonarQube quality gate (a minimal task sketch follows the diagram explanation below)

📊 Quality Gates and Release Gates Flow Diagram:
graph TD
subgraph "CI Pipeline - Quality Gates"
CODE[Code Commit] --> BUILD[Build Code]
BUILD --> UNIT[Run Unit Tests]
UNIT --> INT[Run Integration Tests]
INT --> COV[Code Coverage Analysis]
COV --> QG{Quality Gates<br/>Pass?}
QG -->|Yes| MERGE[Allow Merge]
QG -->|No| BLOCK[Block Merge]
end
subgraph "CD Pipeline - Release Gates"
MERGE --> DEPLOY_START[Start Release]
DEPLOY_START --> SEC{Security Scan<br/>Passed?}
SEC -->|Yes| INC{Incident Count<br/>Low?}
SEC -->|No| REJECT1[Deployment Blocked]
INC -->|Yes| APPROVAL{Manual<br/>Approval?}
INC -->|No| REJECT2[Deployment Blocked]
APPROVAL -->|Approved| DEPLOY[Deploy to Staging]
APPROVAL -->|Rejected| REJECT3[Deployment Blocked]
end
style QG fill:#fff3e0
style SEC fill:#fff3e0
style INC fill:#fff3e0
style APPROVAL fill:#fff3e0
style MERGE fill:#c8e6c9
style DEPLOY fill:#c8e6c9
style BLOCK fill:#ffcdd2
style REJECT1 fill:#ffcdd2
style REJECT2 fill:#ffcdd2
style REJECT3 fill:#ffcdd2
See: diagrams/04_domain3_quality_release_gates.mmd
Diagram Explanation: This diagram shows the two-phase gating system in modern DevOps pipelines. The top section illustrates Quality Gates in the CI Pipeline (Continuous Integration), which act as automated checks preventing bad code from merging. Flow starts with Code Commit, which triggers Build Code step (compilation). If build succeeds, pipeline runs Unit Tests (500+ fast tests checking individual functions), then Integration Tests (50-100 medium-speed tests validating API interactions). Next, Code Coverage Analysis measures what percentage of code is exercised by tests. All results feed into Quality Gates decision point (orange diamond): System evaluates "Did ≥80% of tests pass?", "Is code coverage ≥60%?", "Are there critical bugs?". If ALL conditions met, Quality Gate passes (green arrow) → Allow Merge (code can enter main branch). If ANY condition fails, Quality Gate fails (red arrow) → Block Merge (PR cannot merge, developer must fix). This prevents broken code from entering codebase. The bottom section shows Release Gates in CD Pipeline (Continuous Deployment), which are approval checkpoints before environment deployment. After successful merge, Start Release initiates deployment pipeline. First release gate: Security Scan check (orange diamond) - "Did vulnerability scan complete in last 24 hours? Were critical CVEs found?". If scan failed or has critical vulnerabilities → Deployment Blocked (red). If passed → continue to next gate: Incident Count check - "Are there <5 active incidents in production?". If incident count high (system unstable) → Deployment Blocked (wise to wait for stability). If low → continue to Manual Approval gate: Security team or manager must explicitly approve deploy (human judgment for production changes). Rejected → Deployment Blocked. Approved → Deploy to Staging (green box, deployment proceeds). Result: Quality gates enforce technical standards automatically (tests, coverage), Release gates enforce operational safety (security, stability, human oversight). Together they prevent both code defects (quality) and risky deploys (operational risk).
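As a minimal sketch of the enforcement step mentioned above (the results file pattern is an assumption; the inputs are standard PublishTestResults@2 options), publishing test results can itself fail the run when any test fails - which is exactly what a build-validation branch policy then reacts to:

steps:
  - task: PublishTestResults@2
    displayName: 'Publish test results (quality gate input)'
    inputs:
      testResultsFormat: 'JUnit'              # or VSTest/NUnit/XUnit, depending on your test runner
      testResultsFiles: '**/test-results/*.xml'
      failTaskOnFailedTests: true             # any failed test fails the pipeline, so the PR stays blocked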
Detailed Example 1: Branch Policy Quality Gate
You want to ensure all code merged to main branch meets quality standards: all tests pass, code coverage ≥60%, security scan clean. Manual review too slow (20 PRs/day). Solution: Configure branch policies as quality gates.
Azure DevOps branch policy configuration (via UI → Repos → Branches → main → Branch Policies):
# Build validation
Require build validation:
Build pipeline: CI-Pipeline
Trigger: Automatic (when PR updated)
Policy requirement: Required (blocks PR if build fails)
Build expiration: Immediately (re-run on every commit)
Display name: "CI Validation"
# Status checks
Require status checks to pass:
- SonarQube Quality Gate: Required
- Security Scan: Required
- Code Coverage ≥60%: Required
# Code reviewers
Require minimum number of reviewers: 2
Reset votes when source branch updated: Yes
Allow requestors to approve changes: No
Pipeline with quality checks (azure-pipelines.yml):
trigger:
branches:
include: [main]
pr:
branches:
include: [main]
pool:
vmImage: 'ubuntu-latest'
steps:
- task: DotNetCoreCLI@2
displayName: 'Restore packages'
inputs:
command: 'restore'
- task: DotNetCoreCLI@2
displayName: 'Build solution'
inputs:
command: 'build'
- task: DotNetCoreCLI@2
displayName: 'Run unit tests'
inputs:
command: 'test'
arguments: '--collect:"XPlat Code Coverage"'
publishTestResults: true
- task: PublishCodeCoverageResults@1
displayName: 'Publish code coverage'
inputs:
codeCoverageTool: 'Cobertura'
summaryFileLocation: '**/coverage.cobertura.xml'
failIfCoverageEmpty: true
# Note: a SonarQubePrepare@5 step must run before the build for the analysis below to produce results
- task: SonarQubeAnalyze@5
displayName: 'Run SonarQube analysis'
- task: SonarQubePublish@5
displayName: 'Publish Quality Gate result'
inputs:
pollingTimeoutSec: '300'
Developer workflow:
1. Push changes to a feature branch and open a PR targeting main
2. Build validation runs the pipeline automatically; status checks (SonarQube quality gate, security scan, coverage) post back to the PR
3. If anything fails, the Complete button stays disabled; the developer fixes the issue and pushes again (votes reset)
4. Once all checks are green and two reviewers approve, the PR can be merged
Result: Quality standards enforced automatically (no human oversight needed for technical checks), broken code physically cannot merge (disabled button), developers get immediate feedback (5 minutes after commit, not 2 days later from QA), consistent quality (same standards for all PRs, no exceptions).
What it is: Test pyramid is a testing strategy that balances test coverage with execution speed by organizing tests into three layers: Unit tests at the base (70% of tests, fast, cheap), Integration tests in the middle (20% of tests, medium speed), End-to-End tests at the top (10% of tests, slow, expensive). More tests at bottom (fast), fewer at top (slow).
Why it exists: Running only E2E tests is too slow (1 hour to get feedback, developers context-switched to other work, expensive infrastructure). Running only unit tests misses integration bugs (database connection failures, API contract mismatches). Pyramid balances coverage (all types tested) with speed (most tests are fast unit tests giving quick feedback).
Real-world analogy: Building inspection process. Unit tests are like checking individual bricks (is each brick solid? cracked?). Integration tests are like checking walls (do bricks bond together? mortar correct?). E2E tests are like checking whole building (does roof not leak when it rains? do doors open?). You check thousands of bricks (fast, cheap), hundreds of walls (medium effort), final building once (slow, expensive). Same pyramid shape.
How it works (Detailed step-by-step):
1. Unit test layer (base): test individual functions in isolation - e.g., a calculateTax(income) function tested with income=50000 (expect tax=7500), income=0 (expect tax=0), income=-1000 (expect exception). Tests run in milliseconds (no database, no network, mocked dependencies); the pipeline runs 3500 unit tests in 2 minutes
2. Integration test layer (middle): validate how components work together (API + database + message queue), seconds per test, run on PR validation
3. End-to-end layer (top): drive complete user workflows through the UI, minutes per test, run nightly or pre-deployment

📊 Test Pyramid Strategy Diagram:
graph TB
subgraph "Test Pyramid - DevOps Strategy"
E2E["End-to-End Tests<br/>10-15% of tests<br/>Slow, Expensive<br/>Full user workflows"]
INT["Integration Tests<br/>20-30% of tests<br/>Medium speed<br/>API/Service interactions"]
UNIT["Unit Tests<br/>50-70% of tests<br/>Fast, Cheap<br/>Individual functions/methods"]
E2E --> INT
INT --> UNIT
end
subgraph "Pipeline Execution"
UNIT --> FAST["Run in parallel<br/>1-5 minutes<br/>Every commit"]
INT --> MEDIUM["Run selectively<br/>5-15 minutes<br/>PR validation"]
E2E --> SLOW["Run nightly<br/>30-60 minutes<br/>Scheduled/Pre-deploy"]
end
style UNIT fill:#c8e6c9
style INT fill:#fff3e0
style E2E fill:#ffcdd2
style FAST fill:#e8f5e9
style MEDIUM fill:#fff9c4
style SLOW fill:#ffebee
See: diagrams/04_domain3_test_pyramid.mmd
Diagram Explanation: The test pyramid (top section) visualizes the ideal distribution of test types in a DevOps pipeline, shaped like a pyramid to represent both quantity and execution characteristics. At the base (widest part, green) are Unit Tests comprising 50-70% of the test suite - these are fast (milliseconds each) and cheap (no infrastructure needed) because they test individual functions/methods in isolation with mocked dependencies. Example: Testing a calculateDiscount(price, percentage) function with various inputs (price=100, percentage=10 → expect 90). Moving up the pyramid, Integration Tests (middle layer, orange) make up 20-30% of tests - medium speed (seconds each) and moderate cost (need database, message queues) because they validate how components interact. Example: Testing that when API receives POST /orders, it correctly writes to database AND sends message to queue AND returns 201 status. At the top (smallest section, red) are End-to-End Tests comprising only 10-15% of suite - slow (minutes each) and expensive (require full environment: browser, multiple services, database, external APIs) because they test complete user workflows. Example: Selenium test that opens browser, logs in, adds item to cart, checks out, verifies order confirmation email. The pyramid shape ensures most tests are fast (quick feedback loop) while still having coverage of integration and user scenarios. The bottom section (Pipeline Execution) maps each test layer to execution strategy. Unit Tests (green) run in parallel across multiple agents, complete in 1-5 minutes, trigger on every commit (continuous feedback). Integration Tests (orange) run selectively (only on PR validation or when integration code changes), complete in 5-15 minutes, provide medium-latency feedback. End-to-End Tests (red) run nightly on schedule or pre-deployment only (not every commit - too slow), complete in 30-60 minutes, validate system health before releases. Result: Developers get feedback in 2 minutes from unit tests (90% of bugs caught here), 10 minutes from integration tests (API contract issues), full validation overnight (E2E catches UI/workflow bugs). Anti-pattern (inverted pyramid): Having 70% E2E tests and 10% unit tests → 2 hour feedback loop, flaky tests, expensive infrastructure, developers wait hours for results. Correct pyramid: Most tests fast and stable (unit), fewer tests slower but broader (integration), fewest tests slowest but comprehensive (E2E).
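A minimal sketch of how that execution mapping translates into pipeline triggers (the npm script names and the 02:00 UTC cron are assumptions):

# Unit + integration tests on every commit; E2E only on the nightly scheduled run
trigger:
  branches:
    include: [main]

schedules:
  - cron: '0 2 * * *'
    displayName: Nightly E2E run
    branches:
      include: [main]
    always: true

stages:
  - stage: FastTests
    jobs:
      - job: UnitAndIntegration
        steps:
          - script: npm ci && npm run test:unit && npm run test:integration
            displayName: 'Unit + integration tests (minutes)'

  - stage: E2E
    condition: eq(variables['Build.Reason'], 'Schedule')   # skipped on normal pushes
    jobs:
      - job: EndToEnd
        steps:
          - script: npm ci && npm run test:e2e
            displayName: 'Full E2E suite (30-60 minutes)'

On a normal push only FastTests runs; when the scheduled trigger fires, Build.Reason equals 'Schedule' and the E2E stage runs as well.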
Detailed Example 2: Code Coverage Analysis with Thresholds
Your team ships critical financial software. Management requires proof that code is adequately tested before production deployment. Solution: Implement code coverage analysis with enforced thresholds.
Pipeline configuration with coverage (azure-pipelines.yml):
steps:
- task: DotNetCoreCLI@2
displayName: 'Run tests with coverage'
inputs:
command: 'test'
projects: '**/*Tests.csproj'
arguments: '--configuration Release --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)'
- task: PublishCodeCoverageResults@1
displayName: 'Publish coverage results'
inputs:
codeCoverageTool: 'Cobertura'
summaryFileLocation: '$(Agent.TempDirectory)/**/coverage.cobertura.xml'
failIfCoverageEmpty: true
- task: BuildQualityChecks@8
displayName: 'Check coverage threshold'
inputs:
checkCoverage: true
coverageFailOption: 'build'
coverageType: 'lines'
coverageThreshold: '75'
What coverage measures (example):
public class PaymentProcessor {
public decimal CalculateFee(decimal amount, string customerType) {
if (amount < 0) {
throw new ArgumentException("Amount cannot be negative"); // Line 3
}
decimal baseFee = amount * 0.029m; // Line 6
if (customerType == "Premium") { // Line 8
baseFee = baseFee * 0.5m; // Line 9 - 50% discount for premium
} else if (customerType == "Enterprise") { // Line 10
baseFee = 0; // Line 11 - no fees for enterprise
}
return baseFee; // Line 14
}
}
// Test coverage scenario 1 (poor coverage - 57%)
[Test]
public void CalculateFee_StandardCustomer_ReturnsFee() {
var processor = new PaymentProcessor();
var fee = processor.CalculateFee(100, "Standard");
Assert.AreEqual(2.90m, fee);
}
// Lines executed: 6, 8 (false branch), 10 (false branch), 14 - the negative-amount check is evaluated but the throw on line 3 never runs
// Lines NOT executed: 3, 9, 11 (exception, Premium, and Enterprise paths never tested)
// Coverage: 4 of 7 lines = 57% ❌ Below 75% threshold
// Test coverage scenario 2 (good coverage - 100%)
[Test]
public void CalculateFee_NegativeAmount_ThrowsException() {
var processor = new PaymentProcessor();
Assert.Throws<ArgumentException>(() => processor.CalculateFee(-10, "Standard"));
}
// Covers line 3 ✓
[Test]
public void CalculateFee_StandardCustomer_ReturnsFee() {
var processor = new PaymentProcessor();
var fee = processor.CalculateFee(100, "Standard");
Assert.AreEqual(2.90m, fee);
}
// Covers lines 6, 8 (false), 10 (false), 14 ✓
[Test]
public void CalculateFee_PremiumCustomer_Returns50PercentDiscount() {
var processor = new PaymentProcessor();
var fee = processor.CalculateFee(100, "Premium");
Assert.AreEqual(1.45m, fee);
}
// Covers lines 6, 8 (true), 9, 14 ✓
[Test]
public void CalculateFee_EnterpriseCustomer_ReturnsZeroFee() {
var processor = new PaymentProcessor();
var fee = processor.CalculateFee(100, "Enterprise");
Assert.AreEqual(0m, fee);
}
// Covers lines 6, 8 (false), 10 (true), 11, 14 ✓
// All lines executed at least once = 100% coverage ✅
Pipeline execution result:
- Scenario 1: the single test produces 57% line coverage → BuildQualityChecks fails the build (below the 75% threshold), PR blocked
- Scenario 2: the four tests produce 100% coverage → build passes, coverage report published to the Azure DevOps Code Coverage tab
Result: Code coverage ensures tests actually exercise code paths (not just dummy tests that run but don't check anything). Threshold enforcement prevents coverage regression (can't merge PR that drops coverage from 80% to 70%). Visual reports identify untested code (red highlighting in Azure DevOps shows which exact lines have no tests). Management has audit trail (code is 75%+ tested, compliance requirement met).
⭐ Must Know (Testing Strategy Critical Facts):
When to use (Testing Strategy Decisions):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
The problem: Traditional deployments cause downtime (application offline while deploying new version), high risk (if deployment fails, entire system down), no rollback plan (manually revert changes, takes hours), customer impact (users see errors during deployment). Friday night deploys become emergency events (team on call, stressful, error-prone).
The solution: Implement advanced deployment strategies that enable zero-downtime deployments. Blue-green: Run two identical environments, switch traffic instantly, rollback in seconds. Canary: Deploy to small subset of users first, monitor metrics, gradually increase if healthy, rollback if issues. Feature flags: Deploy code to production but keep features disabled, enable for specific users, instant rollback by toggling flag. Ring deployment: Progressive rollout starting with internal users, then early adopters, finally everyone. Result: Deployments happen during business hours (low risk), instant rollbacks (flip switch), gradual validation (catch issues early), happy customers (no downtime).
Why it's tested: Deployment strategies are core DevOps skill (20% of Domain 3). Exam tests: Blue-green vs canary differences, slot swap configuration, feature flag implementation with Azure App Configuration, ring deployment design, minimizing downtime techniques.
What it is: Blue-green deployment maintains two identical production environments: "Blue" (current version serving users) and "Green" (new version being prepared). Deploy new version to Green environment, run smoke tests, when ready switch router/load balancer from Blue → Green instantly. If issues found, switch back Blue ← Green instantly. Old Blue environment remains running as fallback.
Why it exists: Eliminates deployment downtime (users never see "under maintenance" page), enables instant rollback (flip switch back to Blue if Green has problems), reduces deployment risk (new version fully tested in production-like environment before users see it), allows validation before cutover (smoke tests, performance tests on Green before switching traffic).
Real-world analogy: Theater with two stages. Stage Blue has actors performing current play (audience watching). Stage Green has different actors rehearsing new play (no audience yet). When new play is ready, rotate theater (audience now sees stage Green), old play on stage Blue ready if need to rotate back. Audience never sees empty stage (no downtime).
How it works (Detailed step-by-step):
1. Blue (current version) serves 100% of production traffic
2. Deploy the new version to Green; it receives no user traffic yet
3. Run smoke tests against Green using production configuration
4. Switch the load balancer/router from Blue to Green (instant cutover, no downtime)
5. Monitor Green; if metrics degrade, switch back to Blue, otherwise scale down Blue
📊 Blue-Green Deployment Flow Diagram:
graph TD
subgraph "Blue-Green Deployment Flow"
START[Start Deployment] --> DEPLOY_GREEN[Deploy v1.6 to Green]
DEPLOY_GREEN --> SMOKE[Run Smoke Tests on Green]
SMOKE --> HEALTH{Green<br/>Healthy?}
HEALTH -->|No| ROLLBACK1[Keep Blue Active]
HEALTH -->|Yes| SWITCH[Switch Load Balancer<br/>Blue → Green]
SWITCH --> MONITOR[Monitor Green for 30 min]
MONITOR --> CHECK{Error Rate<br/>Normal?}
CHECK -->|No| ROLLBACK2[Immediate Rollback<br/>Green → Blue]
CHECK -->|Yes| SUCCESS[Deployment Complete<br/>Scale Down Blue]
end
subgraph "Environment State"
BLUE[Blue Environment<br/>v1.5 - Active]
GREEN[Green Environment<br/>v1.6 - Standby]
LB[Load Balancer]
LB -->|100% Traffic| BLUE
LB -.->|After Switch| GREEN
end
style HEALTH fill:#fff3e0
style CHECK fill:#fff3e0
style SUCCESS fill:#c8e6c9
style ROLLBACK1 fill:#ffcdd2
style ROLLBACK2 fill:#ffcdd2
style GREEN fill:#e8f5e9
style BLUE fill:#e3f2fd
See: diagrams/04_domain3_blue_green_deployment.mmd
Diagram Explanation: The blue-green deployment process flows through distinct validation stages before cutover. Starting at top-left, Start Deployment initiates the process by triggering pipeline to Deploy v1.6 to Green environment (new version) while Blue environment v1.5 continues serving production traffic. Once Green deployment completes, pipeline executes Run Smoke Tests on Green to validate basic functionality (health endpoints, critical paths, database connectivity). Results feed into Green Healthy? decision (orange diamond). If health checks fail (database migration issue, missing configuration, service won't start) → Keep Blue Active (red box, deployment aborted, Green torn down or fixed, users unaffected because Blue still serving). If healthy → Switch Load Balancer (Blue → Green) executes traffic cutover - load balancer configuration updated to route requests to Green, DNS records updated, traffic shifts from Blue to Green over ~5-10 minutes as DNS caches expire. After switch, Monitor Green for 30 min observes error rates, latency, CPU/memory, comparing to baseline. After monitoring window, Error Rate Normal? decision evaluates metrics. If error spike detected (5% error rate vs 1% baseline, latency 2x normal, memory leak) → Immediate Rollback (Green → Blue) - load balancer flips back instantly (<30 seconds), users return to stable Blue v1.5, Green stays up for debugging. If metrics normal → Deployment Complete, Scale Down Blue (green box success state) - Blue environment scaled to minimum or terminated (save costs), Green becomes new production, next deployment will flip roles (Green becomes old, new Blue provisioned). Bottom section shows Environment State: Blue environment (light blue box) running v1.5, Green environment (light green box) running v1.6, both connected to Load Balancer. Initially LB sends 100% traffic to Blue (solid arrow), after successful switch LB sends 100% to Green (dashed arrow shows new path). Key advantage: Blue remains running during monitoring period (instant rollback capability), users never experience downtime (switch is instant at LB level), full validation happens before traffic shift (smoke tests on Green with real production data/config).
Detailed Example 1: Azure App Service Deployment Slots (Blue-Green)
Azure App Service natively supports blue-green deployments through deployment slots. You have production slot (Blue) and staging slot (Green), can swap them instantly. This is Azure's built-in implementation of blue-green pattern.
Azure App Service setup:
# Create App Service with staging slot
az appservice plan create --name myAppPlan --resource-group myRG --sku S1
az webapp create --name myWebApp --resource-group myRG --plan myAppPlan
az webapp deployment slot create --name myWebApp --resource-group myRG --slot staging
Azure Pipeline for slot deployment (azure-pipelines.yml):
stages:
- stage: DeployToStaging
jobs:
- deployment: DeployStaging
environment: 'staging'
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: 'MyAzureConnection'
appName: 'myWebApp'
deployToSlotOrASE: true
resourceGroupName: 'myRG'
slotName: 'staging'
package: '$(Pipeline.Workspace)/drop/**/*.zip'
- task: AzureAppServiceManage@0
displayName: 'Start staging slot'
inputs:
azureSubscription: 'MyAzureConnection'
action: 'Start Azure App Service'
webAppName: 'myWebApp'
specifySlotOrASE: true
resourceGroupName: 'myRG'
slot: 'staging'
- stage: ValidateStaging
dependsOn: DeployToStaging
jobs:
- job: SmokeTests
steps:
- task: PowerShell@2
displayName: 'Run smoke tests against staging'
inputs:
targetType: 'inline'
script: |
$response = Invoke-WebRequest -Uri 'https://mywebapp-staging.azurewebsites.net/health' -UseBasicParsing
if ($response.StatusCode -ne 200) {
Write-Error "Health check failed"
exit 1
}
$loginResponse = Invoke-WebRequest -Uri 'https://mywebapp-staging.azurewebsites.net/api/login' `
-Method POST `
-Body '{"username":"test","password":"test123"}' `
-ContentType 'application/json'
if ($loginResponse.StatusCode -ne 200) {
Write-Error "Login test failed"
exit 1
}
Write-Host "Smoke tests passed"
- stage: SwapToProduction
dependsOn: ValidateStaging
jobs:
- deployment: SwapSlots
environment: 'production'
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
displayName: 'Swap staging to production'
inputs:
azureSubscription: 'MyAzureConnection'
action: 'Swap Slots'
webAppName: 'myWebApp'
resourceGroupName: 'myRG'
sourceSlot: 'staging'
targetSlot: 'production'
- task: PowerShell@2
displayName: 'Monitor production for 5 minutes'
inputs:
targetType: 'inline'
script: |
$endTime = (Get-Date).AddMinutes(5)
while ((Get-Date) -lt $endTime) {
$response = Invoke-WebRequest -Uri 'https://mywebapp.azurewebsites.net/health' -UseBasicParsing
if ($response.StatusCode -ne 200) {
Write-Error "Production health check failed - Initiating rollback"
exit 1
}
Write-Host "Health check passed - $(Get-Date)"
Start-Sleep -Seconds 30
}
Write-Host "Monitoring complete - Deployment successful"
- stage: Rollback
dependsOn: SwapToProduction
condition: failed()
jobs:
- deployment: RollbackSwap
environment: 'production-rollback'
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
displayName: 'Rollback - Swap production back to staging'
inputs:
azureSubscription: 'MyAzureConnection'
action: 'Swap Slots'
webAppName: 'myWebApp'
resourceGroupName: 'myRG'
sourceSlot: 'production'
targetSlot: 'staging'
What happens during slot swap:
- Azure warms up the staging slot with production settings applied (workers restart and receive warm-up requests) before any traffic moves
- The swap itself is a routing change at the front end: production traffic is pointed at the instances that were in staging, so users see no downtime or cold start
- Slot-specific ("deployment slot setting") values stay with their slot; everything else travels with the app
- The previous production build now sits in the staging slot, ready for an instant rollback swap
Result: Zero-downtime deployment (swap is instant), instant rollback capability (another swap), staging slot validates code with production configuration before going live, all built into Azure App Service (no custom load balancer config needed).
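If a problem only surfaces after the pipeline has finished, the same rollback is a single CLI call. A minimal sketch reusing the app, slot, resource group, and service connection names from the example above:

steps:
  - task: AzureCLI@2
    displayName: 'Emergency rollback - swap slots back'
    inputs:
      azureSubscription: 'MyAzureConnection'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        # Swapping again routes users back to the previous build,
        # which is still warm in the other slot.
        az webapp deployment slot swap \
          --resource-group myRG \
          --name myWebApp \
          --slot staging \
          --target-slot production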
What it is: Canary deployment gradually rolls out new version to small percentage of users first (5% canary), monitors metrics (error rate, latency, CPU), if healthy increases percentage (25%, 50%, 100%), if unhealthy stops rollout and rollback. Named after "canary in coal mine" (early warning system).
Why it exists: Blue-green switches 100% traffic at once (big bang, high risk if bug only appears at scale). Canary reduces blast radius (only 5% of users affected by bugs), enables real-world validation (actual user traffic, not synthetic tests), provides early warning (metrics anomaly in canary traffic stops rollout before impacting everyone), allows A/B comparison (canary metrics vs production baseline).
Real-world analogy: Restaurant testing new menu item. Don't serve new dish to entire restaurant (what if everyone hates it?). Instead: Offer new dish to one table (5%), watch their reaction, if they love it offer to five tables (25%), if still positive offer to everyone (100%). If first table complains, remove dish, only one table impacted.
How it works (Detailed step-by-step):
1. Deploy the new version (canary) alongside the current production version
2. Route a small slice of traffic (e.g., 5%) to the canary; the rest stays on the stable version
3. Monitor canary error rate, latency, and resource usage against the production baseline for 10-15 minutes
4. If healthy, increase the canary share progressively (25% → 50% → 100%); if unhealthy, set the canary weight back to 0% and investigate
Rollback scenario:
📊 Canary Deployment Sequence:
sequenceDiagram
participant Pipeline
participant Canary as Canary (5%)
participant Production as Production (95%)
participant Monitor as Azure Monitor
Pipeline->>Canary: Deploy v2.0 to canary
Pipeline->>Monitor: Start monitoring canary metrics
Note over Canary: 5% of traffic → v2.0
Note over Production: 95% of traffic → v1.9
Monitor-->>Pipeline: Canary metrics healthy (10 min)
Pipeline->>Canary: Increase to 25% traffic
Monitor-->>Pipeline: Error rate spike detected!
Pipeline->>Canary: Rollback - Stop canary
Pipeline->>Production: 100% traffic back to v1.9
Note over Pipeline: Investigate issue, fix, redeploy
See: diagrams/04_domain3_canary_deployment.mmd
Diagram Explanation: Canary deployment uses progressive traffic shifting with continuous monitoring to detect issues early. The sequence begins with Pipeline deploying v2.0 to Canary environment (5% traffic destination) while Production environment (95% traffic) continues serving v1.9. Pipeline immediately starts monitoring canary metrics via Azure Monitor to establish baseline comparison. First monitoring phase (10 min): Small percentage of real user traffic flows to Canary (5%), allowing real-world validation with limited blast radius. Monitor continuously compares canary metrics (error rate, latency, throughput) against production baseline. If metrics are healthy (error rate within tolerance, latency normal, no anomalies) → Monitor returns "Canary metrics healthy" to Pipeline → Pipeline executes traffic increase to 25% (second phase), continues monitoring. However, the diagram shows failure scenario: During monitoring, Monitor detects "Error rate spike!" in canary (e.g., 8% errors in canary vs 1% in production, significant deviation). Monitor immediately alerts Pipeline. Pipeline responds with two actions: (1) Rollback - Stop canary: Traffic weight for canary set to 0%, (2) 100% traffic back to v1.9: All users return to stable production version. Final note indicates "Investigate issue, fix, redeploy" - dev team examines canary logs, identifies root cause (database timeout, memory leak, API integration failure), fixes bug, redeploys as new canary attempt. Key advantage over blue-green: Only 5% of users affected by bug (95% never saw issue), early detection prevented full rollout (canary acted as early warning system), automatic rollback triggered by metrics (no manual intervention), gradual increase allows validation at each stage (5% → 25% → 50% → 100%, stop at any stage if issues arise). This pattern reduces risk of large-scale failures by validating with real traffic in controlled increments.
Detailed Example 2: Kubernetes Canary with Istio
You're deploying microservice to Kubernetes cluster with 100 pods. Want canary deployment with traffic splitting. Solution: Use Istio service mesh for intelligent traffic routing.
Kubernetes deployment manifest:
# Production deployment (v1.9)
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service-v1
spec:
replicas: 95
selector:
matchLabels:
app: api-service
version: v1
template:
metadata:
labels:
app: api-service
version: v1
spec:
containers:
- name: api
image: myregistry.azurecr.io/api-service:1.9
---
# Canary deployment (v2.0)
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service-v2
spec:
replicas: 5
selector:
matchLabels:
app: api-service
version: v2
template:
metadata:
labels:
app: api-service
version: v2
spec:
containers:
- name: api
image: myregistry.azurecr.io/api-service:2.0
Istio VirtualService for traffic splitting (assumes a matching DestinationRule that defines subsets v1 and v2 from the version label):
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: api-service
spec:
hosts:
- api-service
http:
- match:
- headers:
canary-user:
exact: "true"
route:
- destination:
host: api-service
subset: v2
weight: 100
- route:
- destination:
host: api-service
subset: v1
weight: 95
- destination:
host: api-service
subset: v2
weight: 5
Azure Pipeline for progressive canary:
stages:
- stage: DeployCanary5Percent
jobs:
- job: Deploy
steps:
- task: Kubernetes@1
inputs:
command: 'apply'
arguments: '-f k8s/api-service-v2-deployment.yaml'
- task: Kubernetes@1
inputs:
command: 'apply'
arguments: '-f k8s/istio-virtualservice-5percent.yaml'
- task: PowerShell@2
displayName: 'Monitor canary metrics'
inputs:
script: |
$query = @"
requests
| where timestamp > ago(10m)
| where customDimensions.version == "v2"
| summarize
ErrorRate = countif(success == false) * 100.0 / count(),
AvgDuration = avg(duration),
RequestCount = count()
"@
$metrics = Invoke-AzOperationalInsightsQuery -WorkspaceId $(WorkspaceId) -Query $query
$errorRate = $metrics.Results.ErrorRate
$avgDuration = $metrics.Results.AvgDuration
Write-Host "Canary metrics: ErrorRate=$errorRate%, AvgDuration=$avgDuration ms"
if ($errorRate -gt 2) {
Write-Error "Canary error rate too high: $errorRate%"
exit 1
}
if ($avgDuration -gt 300) {
Write-Error "Canary latency too high: $avgDuration ms"
exit 1
}
- stage: IncreaseCanary25Percent
dependsOn: DeployCanary5Percent
condition: succeeded()
jobs:
- job: IncreaseTraffic
steps:
- task: Kubernetes@1
inputs:
command: 'apply'
arguments: '-f k8s/istio-virtualservice-25percent.yaml'
- task: PowerShell@2
displayName: 'Monitor 25% canary'
inputs:
script: |
# Same monitoring logic, 15 minute window
Start-Sleep -Seconds 900
- stage: FullRollout
dependsOn: IncreaseCanary25Percent
condition: succeeded()
jobs:
- job: CompleteDeployment
steps:
- task: Kubernetes@1
inputs:
command: 'scale'
arguments: 'deployment/api-service-v2 --replicas=100'
- task: Kubernetes@1
inputs:
command: 'scale'
arguments: 'deployment/api-service-v1 --replicas=0'
- task: Kubernetes@1
inputs:
command: 'apply'
arguments: '-f k8s/istio-virtualservice-100percent.yaml'
- stage: Rollback
dependsOn: DeployCanary5Percent
condition: failed()
jobs:
- job: RollbackCanary
steps:
- task: Kubernetes@1
inputs:
command: 'delete'
arguments: 'deployment/api-service-v2'
- task: Kubernetes@1
inputs:
command: 'apply'
arguments: '-f k8s/istio-virtualservice-0percent.yaml'
What happens:
Result: Gradual validation with real production traffic, automatic rollback on metric anomaly, limited blast radius (5% → 25% → 100% progression), Istio provides fine-grained traffic control (can route by headers, paths, user segments), Azure Monitor integration for automated decision-making.
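Canary releases are not limited to Kubernetes: App Service deployment slots support percentage-based routing too. A minimal sketch reusing the app, slot, and service connection names from the earlier blue-green example (the 10% split is illustrative):

steps:
  # Send 10% of production traffic to the 'staging' slot running the new build
  - task: AzureCLI@2
    displayName: 'Route 10% of traffic to the canary slot'
    inputs:
      azureSubscription: 'MyAzureConnection'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az webapp traffic-routing set \
          --resource-group myRG \
          --name myWebApp \
          --distribution staging=10

  # After monitoring looks healthy: clear the rule and finish with a normal slot swap
  - task: AzureCLI@2
    displayName: 'Clear canary routing'
    inputs:
      azureSubscription: 'MyAzureConnection'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az webapp traffic-routing clear --resource-group myRG --name myWebApp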
What it is: Feature flags (feature toggles) allow deploying code to production with new features disabled, then enable features for specific users/environments by toggling configuration, no code deployment needed. Implemented using Azure App Configuration Feature Manager, flags stored centrally, evaluated at runtime.
Why it exists: Traditional deployment couples code deployment with feature release (deploy new code = users see new feature immediately, risky). Feature flags decouple deployment from release (deploy code anytime, enable feature when ready, instant rollback by disabling flag, no code changes). Enable progressive rollout (enable for 5% users, then 25%, then 100%), A/B testing (enable for group A, disabled for group B, compare metrics), emergency killswitch (production bug? disable feature instantly, no deployment needed).
Real-world analogy: Light switch for new features. You install the light bulb (deploy code) but switch is OFF (feature disabled). When ready, flip switch ON (enable flag) - light turns on (users see feature). If light flickers (bug), flip switch OFF instantly - light off (feature disabled), no need to uninstall bulb (no code deployment). Can have dimmer switch (enable for 25% brightness = 25% of users).
How it works (Detailed step-by-step):
1. Wrap the new code path in a flag check, e.g. if (await featureManager.IsEnabledAsync("NewCheckout")) { ShowNewCheckout(); } else { ShowOldCheckout(); }
2. Deploy the code to production with the flag OFF (0% rollout) - all users still see the old checkout
3. Enable the flag for targeted users or percentages when ready; disable it instantly if problems appear

Detailed Example: Feature Flag with Targeted Rollout
You're deploying new recommendation engine. Want to enable for beta testers first, then premium customers, then everyone. Use Azure App Configuration with custom filters.
Azure App Configuration setup (via Azure CLI):
# Create App Configuration
az appconfig create --name myAppConfig --resource-group myRG --location eastus
# Create the feature flag (--name is the App Configuration store, --feature is the flag)
az appconfig feature set --name myAppConfig \
--feature RecommendationEngine \
--label Production \
--description "New AI-powered recommendation engine" \
--yes
# Enable the flag; the targeting filter below decides who actually sees it
az appconfig feature enable --name myAppConfig \
--feature RecommendationEngine \
--label Production \
--yes
# Configure the targeting filter
az appconfig feature filter add --name myAppConfig \
--feature RecommendationEngine \
--label Production \
--filter-name Microsoft.Targeting \
--filter-parameters Audience='{"Users":["beta-tester@company.com"],"Groups":["BetaTesters"],"DefaultRolloutPercentage":0}'
ASP.NET Core application code:
// Startup.cs - Configure feature management
public void ConfigureServices(IServiceCollection services)
{
services.AddAzureAppConfiguration();
services.AddHttpContextAccessor(); // needed so the targeting context accessor below can read the current user
services.AddFeatureManagement()
.AddFeatureFilter<TargetingFilter>();
services.AddSingleton<ITargetingContextAccessor, UserTargetingContextAccessor>();
}
public void Configure(IApplicationBuilder app)
{
app.UseAzureAppConfiguration();
}
// UserTargetingContextAccessor.cs - Define targeting context
public class UserTargetingContextAccessor : ITargetingContextAccessor
{
private readonly IHttpContextAccessor _httpContextAccessor;
public UserTargetingContextAccessor(IHttpContextAccessor httpContextAccessor)
{
_httpContextAccessor = httpContextAccessor; // injected via AddHttpContextAccessor()
}
public ValueTask<TargetingContext> GetContextAsync()
{
var httpContext = _httpContextAccessor.HttpContext;
var userId = httpContext.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
var groups = httpContext.User.FindAll(ClaimTypes.Role).Select(c => c.Value);
return new ValueTask<TargetingContext>(new TargetingContext
{
UserId = userId,
Groups = groups.ToList()
});
}
}
// RecommendationsController.cs - Use feature flag
public class RecommendationsController : Controller
{
private readonly IFeatureManager _featureManager;
[HttpGet]
public async Task<IActionResult> GetRecommendations()
{
if (await _featureManager.IsEnabledAsync("RecommendationEngine"))
{
// New AI recommendations
var recommendations = await _aiService.GetRecommendations(User.Id);
return Json(new { source = "ai", items = recommendations });
}
else
{
// Old rule-based recommendations
var recommendations = await _ruleService.GetRecommendations(User.Id);
return Json(new { source = "rules", items = recommendations });
}
}
}
Rollout progression (via App Configuration portal):
Day 1: Enable for "BetaTesters" group (20 internal users)
- Targeting filter: Users=[], Groups=["BetaTesters"], DefaultRolloutPercentage=0
- Result: Only users in BetaTesters AD group see new recommendations
Day 3: Add specific premium customers by email
- Targeting filter: Users=["premium@acme.com", "vip@contoso.com"], Groups=["BetaTesters"], DefaultRolloutPercentage=0
- Result: Beta testers + 2 premium customers see new recommendations
Day 5: Rollout to 10% of premium tier
- Targeting filter: Users=[...], Groups=["BetaTesters", "PremiumCustomers"], DefaultRolloutPercentage=10
- Result: All beta testers + specific users + 10% of premium customers
Day 7: Rollout to 50% of all users
- Targeting filter: Users=[...], Groups=[...], DefaultRolloutPercentage=50
- Result: Specific users + specific groups + 50% random rollout
Day 10: Full rollout
- Targeting filter: Users=[...], Groups=[...], DefaultRolloutPercentage=100
- Result: Everyone sees new AI recommendations
What happens during rollout:
Result: Zero-downtime feature rollout, instant rollback capability (toggle flag OFF), targeted rollout by user attributes (email, groups, percentage), no code deployment needed for rollout changes, built-in A/B testing capability, App Configuration provides UI for business users to manage flags (no DevOps needed for rollout adjustments).
⭐ Must Know (Deployment Strategy Critical Facts):
When to use (Deployment Strategy Decisions):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
Troubleshooting Common Issues:
- Feature flag changes not taking effect: verify UseAzureAppConfiguration() is called in Startup, check the refresh interval (default 30 sec), test with a configuration sentinel key (forces refresh), check the App Configuration connection in Azure

This comprehensive chapter covered the entire Build and Release Pipelines domain (50-55% of the AZ-400 exam), including:
✅ Section 1: Pipeline Design and Implementation - GitHub Actions vs Azure Pipelines, agents (Microsoft-hosted vs self-hosted), YAML syntax, triggers, stages/jobs/steps hierarchy, templates, matrix builds, pipeline optimization
✅ Section 2: Package Management Strategy - Azure Artifacts vs GitHub Packages, feed views (@Local, @Prerelease, @Release), upstream sources, semantic versioning (SemVer), package promotion workflows, retention policies
✅ Section 3: Testing Strategy - Quality gates (branch policies, code coverage), release gates (approvals, monitoring), test pyramid (unit 70%, integration 20%, E2E 10%), shift-left testing, flaky test management
✅ Section 4: Deployment Strategies - Blue-green deployments (Azure App Service slots), canary releases (progressive rollout 5%→100%), feature flags (Azure App Configuration), ring deployment, zero-downtime techniques, rollback strategies
Test yourself before moving on:
Pipeline Design:
Package Management:
Testing:
Deployment:
Try these from your practice test bundles:
If you scored below 75%:
[One-page summary of chapter - copy to your notes]
Pipeline Essentials:
- trigger: (CI), pr: (PR validation), schedules: (nightly builds)

Package Management:
Testing Gates:
Deployment Patterns:
Key Azure Tasks:
- AzureWebApp@1 - Deploy to App Service
- AzureAppServiceManage@0 - Swap slots, start/stop
- PublishTestResults@2 - Publish test results (enables quality gates)
- PublishCodeCoverageResults@1 - Publish coverage (visualize in Azure DevOps)
- NuGetAuthenticate@1 - Authenticate to Azure Artifacts

Next Chapter: Domain 4 - Security and Compliance Plan
You should know: Authentication methods (service principals, managed identity), secrets management (Azure Key Vault), security scanning (Defender for Cloud, GitHub Advanced Security)
This comprehensive chapter covered the largest domain of the AZ-400 exam (50-55%), focusing on:
✅ Package Management Strategy
✅ Testing Strategy for Pipelines
✅ Pipeline Design and Implementation
✅ Deployment Strategies
✅ Infrastructure as Code
✅ Pipeline Maintenance
Test yourself before moving on:
Package Management:
Testing:
Pipeline Design:
Deployments:
Infrastructure as Code:
Pipeline Maintenance:
Try these from your practice test bundles:
If you scored below 75%:
Package Management:
Testing:
Pipeline Structure:
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: TaskName@Version
Deployment Patterns:
IaC Tools:
Key Tasks:
- AzureWebApp@1 - Deploy to App Service
- AzureAppServiceManage@0 - Swap slots
- PublishTestResults@2 - Publish test results
- PublishCodeCoverageResults@1 - Publish coverage
- NuGetAuthenticate@1 - Authenticate to Azure Artifacts

Decision Points:
Next Chapter: 05_domain4_security_compliance - Develop a Security and Compliance Plan (Authentication, secrets management, security scanning)
What you'll learn:
Time to complete: 6-8 hours
Prerequisites: Chapters 0-3 (Fundamentals, Processes, Source Control, Pipelines)
Why this domain matters: Security is no longer an afterthought - it's integrated throughout the DevOps lifecycle (DevSecOps). This domain tests your ability to secure pipelines, protect sensitive data, and automate security scanning to catch vulnerabilities early.
The problem: Pipelines need to access Azure resources, GitHub repositories, and external services. Using personal credentials is insecure (credentials shared, no audit trail, no rotation). Manual permission management doesn't scale.
The solution: Use identity-based authentication (Service Principals, Managed Identity) and role-based access control (RBAC) to grant least-privilege access. Automate permission management through groups and teams.
Why it's tested: Authentication and authorization are fundamental to secure DevOps. The exam tests your ability to choose the right authentication method for different scenarios and implement proper access control.
What they are: Both are Azure AD identities used by applications and services to authenticate to Azure resources without using user credentials.
Why they exist: Applications and pipelines need to access Azure resources (deploy to App Service, read from Key Vault, write to Storage). Using user credentials is problematic:
- Passwords expire and rotate, breaking automation at unpredictable times
- MFA requirements block non-interactive sign-in
- Shared user accounts leave no clear audit trail, and access lingers when people leave the team
Service Principals and Managed Identities solve this by providing application-specific identities with their own permissions.
Real-world analogy: Think of a hotel key card system. Instead of giving every employee the master key (user credentials), each employee gets a key card (Service Principal/Managed Identity) that only opens the doors they need access to. If an employee leaves, you deactivate their card without changing all the locks.
How Service Principals work (Detailed step-by-step):
How Managed Identity works (Detailed step-by-step):
📊 Service Principal vs Managed Identity Architecture:
graph TB
subgraph "Service Principal Flow"
SP1[Pipeline] -->|1. Auth with App ID + Secret| AAD1[Azure AD]
AAD1 -->|2. Access Token| SP1
SP1 -->|3. API Call + Token| AZ1[Azure Resource]
AZ1 -->|4. Validate Token & RBAC| AAD1
AZ1 -->|5. Allow/Deny| SP1
end
subgraph "Managed Identity Flow"
MI1[Azure VM/App Service] -->|1. Request Token| IMDS[Instance Metadata Service]
IMDS -->|2. Validate Resource| AAD2[Azure AD]
AAD2 -->|3. Access Token| IMDS
IMDS -->|4. Return Token| MI1
MI1 -->|5. API Call + Token| AZ2[Azure Resource]
AZ2 -->|6. Validate Token & RBAC| AAD2
AZ2 -->|7. Allow/Deny| MI1
end
style SP1 fill:#fff3e0
style MI1 fill:#c8e6c9
style AAD1 fill:#e1f5fe
style AAD2 fill:#e1f5fe
style IMDS fill:#f3e5f5
See: diagrams/05_domain4_sp_vs_mi_flow.mmd
Diagram Explanation (detailed):
The diagram shows two authentication flows side by side to highlight the key difference.
Service Principal Flow (orange): The pipeline must store and provide credentials (Application ID + Secret) to Azure AD. This creates a security risk - if the secret is leaked, anyone can impersonate the Service Principal. The secret must be rotated periodically (every 90 days recommended), requiring updates to all pipelines using it. The flow is: (1) Pipeline authenticates with stored credentials, (2) Azure AD validates and issues token, (3) Pipeline calls Azure resource with token, (4) Resource validates token and checks RBAC, (5) Operation allowed or denied.
Managed Identity Flow (green): No credentials to store or manage. The Azure resource (VM, App Service, or self-hosted agent) has Managed Identity enabled, which means Azure automatically manages its identity. The flow is: (1) Application requests token from IMDS (a special endpoint only accessible from within the Azure resource), (2) IMDS validates the request is coming from a resource with Managed Identity, (3) Azure AD issues token, (4) Token returned to application, (5) Application calls Azure resource with token, (6) Resource validates token and RBAC, (7) Operation allowed or denied. The key advantage: no secrets to leak, rotate, or manage.
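A minimal sketch of what that flow looks like from a self-hosted agent running on an Azure VM with Managed Identity enabled (the resource-group listing is just an illustrative call):

steps:
  - script: |
      # 'az login --identity' fetches the token from the local Instance Metadata Service
      # (169.254.169.254); no secret is stored in the pipeline or a variable group.
      az login --identity
      az group list --query "[].name" -o tsv
    displayName: 'Authenticate with the VM Managed Identity'

This only works on compute that actually has a Managed Identity; Microsoft-hosted and GitHub-hosted agents cannot reach the IMDS endpoint, which is why those scenarios fall back to Service Principals.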
Detailed Example 1: Service Principal for GitHub Actions
Scenario: You have a GitHub Actions workflow that needs to deploy a web application to Azure App Service. GitHub Actions runs on GitHub-hosted runners (not in Azure), so Managed Identity is not available.
Solution: Create a Service Principal and store its credentials in GitHub Secrets.
Step-by-step:
1. Create a Service Principal: az ad sp create-for-rbac --name "github-actions-sp" --role Contributor --scopes /subscriptions/{subscription-id}/resourceGroups/{rg-name}
2. The command outputs appId, password, and tenant
3. Store these values in GitHub Secrets as AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID (or combine them into a single AZURE_CREDENTIALS JSON secret)
4. Authenticate in the workflow with the azure/login@v1 action:
- uses: azure/login@v1
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}
Why this approach: GitHub Actions runners are outside Azure, so they can't use Managed Identity. Service Principal is the only option. The secret is stored in GitHub Secrets (encrypted at rest), and only the workflow can access it.
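The setup in this example can also be scripted end to end. Below is a minimal sketch, assuming the GitHub CLI (gh) and jq are installed and authenticated; the repository name myorg/myrepo, the environment variables, and the secret names are placeholders.

# Create the Service Principal and capture its credentials (appId, password, tenant)
az ad sp create-for-rbac --name "github-actions-sp" --role Contributor \
  --scopes "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG_NAME" > sp.json

# Push each credential into GitHub Secrets so the workflow can use azure/login
gh secret set AZURE_CLIENT_ID     --repo myorg/myrepo --body "$(jq -r '.appId'    sp.json)"
gh secret set AZURE_TENANT_ID     --repo myorg/myrepo --body "$(jq -r '.tenant'   sp.json)"
gh secret set AZURE_CLIENT_SECRET --repo myorg/myrepo --body "$(jq -r '.password' sp.json)"

# Never leave raw credentials on disk
rm sp.json

Scripting the secret upload avoids copy-pasting the password into a browser and makes rotation repeatable.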
Detailed Example 2: Managed Identity for Self-Hosted Azure DevOps Agent
Scenario: You have a self-hosted Azure DevOps agent running on an Azure VM. The agent needs to deploy to Azure resources and read secrets from Key Vault.
Solution: Enable System-Assigned Managed Identity on the VM and grant it appropriate RBAC roles.
Step-by-step:
1. Enable System-Assigned Managed Identity on the agent VM (for example: az vm identity assign --name {vm-name} --resource-group {rg-name})
2. Grant the identity an RBAC role: az role assignment create --assignee {vm-principal-id} --role Contributor --scope /subscriptions/{sub-id}/resourceGroups/{rg-name}
3. In the pipeline, use the AzureCLI@2 task with a service connection configured for Managed Identity authentication (no secret is stored in the connection):
- task: AzureCLI@2
  inputs:
    azureSubscription: 'ManagedIdentityConnection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az webapp deploy --resource-group myRG --name myApp --src-path app.zip
Why this approach: The agent runs in Azure, so Managed Identity is available. No secrets to manage, rotate, or leak. If the VM is compromised, you can disable the Managed Identity instantly without updating any pipelines.
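To make steps 1 and 2 concrete, here is a minimal sketch of wiring up the agent VM's identity from the CLI; the VM name agent-vm, the resource group names, and $SUB_ID are placeholders.

# Turn on the system-assigned identity for the agent VM
az vm identity assign --name agent-vm --resource-group agents-rg

# Look up the identity's principal ID
PRINCIPAL_ID=$(az vm show --name agent-vm --resource-group agents-rg \
  --query identity.principalId -o tsv)

# Grant the identity only the scope it needs (least privilege)
az role assignment create --assignee "$PRINCIPAL_ID" --role Contributor \
  --scope "/subscriptions/$SUB_ID/resourceGroups/app-rg"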
Detailed Example 3: User-Assigned Managed Identity for Multiple Resources
Scenario: You have 10 Azure VMs running self-hosted agents, and they all need the same permissions (deploy to App Service, read from Key Vault). You don't want to configure each VM individually.
Solution: Create a User-Assigned Managed Identity and assign it to all VMs.
Step-by-step:
1. Create the identity: az identity create --name "devops-agents-identity" --resource-group "identities-rg"
2. Grant it the required RBAC roles once: az role assignment create --assignee {identity-principal-id} --role Contributor --scope /subscriptions/{sub-id}
3. Assign the identity to each VM: az vm identity assign --name {vm-name} --resource-group {rg-name} --identities /subscriptions/{sub-id}/resourceGroups/identities-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/devops-agents-identity
Why this approach: User-Assigned Managed Identity is reusable across multiple resources. You manage permissions in one place. If you need to revoke access, you can delete the identity or remove role assignments, affecting all VMs instantly.
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
What they are: GitHub provides three main authentication methods for automation: GitHub Apps, GITHUB_TOKEN (automatic), and Personal Access Tokens (PATs).
Why they exist: GitHub Actions workflows and external tools need to interact with GitHub repositories (clone code, create issues, trigger workflows, publish packages). Using personal user credentials is insecure and doesn't scale. These authentication methods provide secure, scoped access for automation.
Real-world analogy: Think of a building with different types of access cards. A GitHub App is like a master access card for a specific application (can access multiple buildings/repos with fine-grained permissions). GITHUB_TOKEN is like a temporary visitor badge (automatically issued, expires after the visit/workflow run, limited scope). A PAT is like a personal employee badge (tied to your account, you control the scope, but if lost, it can be misused).
How GitHub Apps work (Detailed step-by-step):
How GITHUB_TOKEN works (Detailed step-by-step):
1. GitHub automatically generates the token when a workflow run starts
2. Default permissions can be tightened or broadened with the permissions: block
3. The token is available as ${{ secrets.GITHUB_TOKEN }} in workflow steps and expires when the run ends
How Personal Access Tokens (PATs) work (Detailed step-by-step):
📊 GitHub Authentication Methods Comparison:
graph TB
subgraph "GitHub App"
GA1[Workflow] -->|1. Request Token| GA2[GitHub App]
GA2 -->|2. Auth with Private Key| GH1[GitHub API]
GH1 -->|3. Installation Token 1hr| GA2
GA2 -->|4. Return Token| GA1
GA1 -->|5. API Call + Token| GH1
GH1 -->|6. Validate & Authorize| GA1
end
subgraph "GITHUB_TOKEN Automatic"
GT1[Workflow Starts] -->|1. Auto-Generate| GH2[GitHub]
GH2 -->|2. GITHUB_TOKEN| GT2[Workflow Steps]
GT2 -->|3. API Call + Token| GH2
GH2 -->|4. Validate Permissions| GT2
GT2 -->|5. Workflow Ends| GH2
GH2 -->|6. Token Expires| GT1
end
subgraph "Personal Access Token"
PAT1[User] -->|1. Create PAT| GH3[GitHub Settings]
GH3 -->|2. Generate Token| PAT1
PAT1 -->|3. Store in Secrets| PAT2[GitHub Secrets]
PAT3[Workflow] -->|4. Read Secret| PAT2
PAT3 -->|5. API Call + PAT| GH4[GitHub API]
GH4 -->|6. Validate & Authorize| PAT3
end
style GA2 fill:#c8e6c9
style GT2 fill:#e1f5fe
style PAT2 fill:#fff3e0
See: diagrams/05_domain4_github_auth_methods.mmd
Diagram Explanation (detailed):
The diagram shows three authentication flows for GitHub automation, each with different security and lifecycle characteristics.
GitHub App Flow (green): Most secure and recommended for production. The workflow requests a token from the GitHub App, which authenticates using a private key (stored securely, never exposed). GitHub validates the app and issues a short-lived installation access token (1 hour expiration). The workflow uses this token for API calls. Key advantages: (1) Tokens are short-lived (1 hour), (2) Actions are attributed to the app, not a user, (3) Fine-grained permissions per repository, (4) Survives user account changes (not tied to a person).
GITHUB_TOKEN Flow (blue): Simplest and automatic. GitHub automatically generates a token when the workflow starts. The token is scoped to the repository and has default permissions (customizable via permissions: block). The token is available as ${{ secrets.GITHUB_TOKEN }} in all steps. When the workflow completes, the token expires immediately. Key advantages: (1) Zero setup required, (2) Automatic and free, (3) Scoped to repository, (4) Expires automatically. Limitations: (1) Can't trigger other workflows (prevents recursive triggers), (2) Can't access other repositories (unless explicitly granted), (3) Limited to workflow duration.
Personal Access Token Flow (orange): User-centric authentication. The user creates a PAT in GitHub settings, selects scopes, and sets expiration (max 1 year for Classic PATs). The PAT is stored in GitHub Secrets (encrypted at rest). The workflow reads the secret and uses it for API calls. Key advantages: (1) Can access multiple repositories, (2) Can trigger other workflows, (3) Works for cross-repo scenarios. Limitations: (1) Tied to user account (if user leaves, PAT stops working), (2) Requires manual rotation before expiration, (3) Broader permissions than needed (especially Classic PATs), (4) Actions attributed to user (audit trail confusion).
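When auditing existing automation, it helps to check what a classic PAT can actually do. A minimal sketch, assuming the token is available in the $PAT environment variable; the X-OAuth-Scopes response header lists the scopes granted to a classic token (fine-grained PATs do not expose this header).

# Inspect the scopes carried by a classic PAT
curl -sI -H "Authorization: token $PAT" https://api.github.com/user \
  | grep -i '^x-oauth-scopes'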
Detailed Example 1: Using GITHUB_TOKEN for Package Publishing
Scenario: You have a GitHub Actions workflow that builds a NuGet package and publishes it to GitHub Packages. You want to use the simplest authentication method.
Solution: Use the automatic GITHUB_TOKEN with appropriate permissions.
Step-by-step:
name: Publish Package
on:
push:
branches: [main]
permissions:
packages: write # Grant write access to GitHub Packages
contents: read # Grant read access to repository contents
jobs:
publish:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup .NET
uses: actions/setup-dotnet@v3
with:
dotnet-version: '8.0.x'
- name: Build and Pack
run: dotnet pack -c Release
- name: Publish to GitHub Packages
run: dotnet nuget push **/*.nupkg --source https://nuget.pkg.github.com/${{ github.repository_owner }}/index.json --api-key ${{ secrets.GITHUB_TOKEN }}
The workflow grants packages: write and contents: read permissions and uses ${{ secrets.GITHUB_TOKEN }} to authenticate to GitHub Packages.
Why this approach: GITHUB_TOKEN is automatic, requires no setup, and is scoped to the repository. For same-repo operations (build and publish), it's the simplest and most secure option.
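GITHUB_TOKEN also works with the GitHub CLI inside a workflow step, provided the token is passed to the step's environment (for example env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}) and the workflow grants the needed permission. A minimal sketch with a placeholder issue number:

# Assumes the step declares: env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# and the workflow grants: permissions: issues: write
gh issue comment 42 --body "Package published by run $GITHUB_RUN_NUMBER"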
Detailed Example 2: Using GitHub App for Cross-Repo Workflow Triggers
Scenario: Your services live in separate repositories. When code is pushed to the shared-library repository, you want to trigger workflows in 5 dependent service repositories. GITHUB_TOKEN can't trigger workflows in other repos.
Solution: Create a GitHub App with workflow permissions and use it to trigger workflows.
Step-by-step:
1. Create a GitHub App with permission to dispatch workflows and install it on the dependent service repositories
2. Store the app's private key as a repository secret (APP_PRIVATE_KEY)
3. Store the App ID as a repository secret (APP_ID)
4. In the shared-library repository, create the trigger workflow:
name: Trigger Dependent Workflows
on:
push:
branches: [main]
jobs:
trigger:
runs-on: ubuntu-latest
steps:
- name: Generate GitHub App Token
id: generate_token
uses: actions/create-github-app-token@v1
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
repositories: service1,service2,service3,service4,service5
- name: Trigger Service Workflows
run: |
for repo in service1 service2 service3 service4 service5; do
curl -X POST \
-H "Authorization: Bearer ${{ steps.generate_token.outputs.token }}" \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/${{ github.repository_owner }}/$repo/actions/workflows/build.yml/dispatches \
-d '{"ref":"main"}'
done
Why this approach: GitHub Apps can trigger workflows in other repositories (GITHUB_TOKEN cannot). The app is not tied to a user account, so it survives personnel changes. Tokens are short-lived (1 hour), reducing security risk.
Detailed Example 3: Using Fine-Grained PAT for Azure DevOps Integration
Scenario: You have an Azure DevOps pipeline that needs to clone a private GitHub repository, create issues, and update pull request statuses. Azure DevOps is outside GitHub, so GITHUB_TOKEN is not available.
Solution: Create a Fine-grained Personal Access Token with specific repository access and minimal scopes.
Step-by-step:
1. In GitHub, create a Fine-grained PAT scoped to the single repository with only the permissions the pipeline needs (contents: read, issues: write, pull requests: write)
2. Store the PAT in Azure Key Vault: az keyvault secret set --vault-name myVault --name github-pat --value {token}
3. Link a variable group to the Key Vault and reference it in the pipeline:
trigger:
- main
pool:
vmImage: 'ubuntu-latest'
variables:
- group: github-secrets # Variable group linked to Key Vault
steps:
- script: |
git clone https://$(github-pat)@github.com/myorg/myrepo.git
displayName: 'Clone GitHub Repo'
- script: |
curl -X POST \
-H "Authorization: token $(github-pat)" \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/myorg/myrepo/issues \
-d '{"title":"Build completed","body":"Azure Pipeline build #$(Build.BuildId) completed successfully"}'
displayName: 'Create GitHub Issue'
Why this approach: Azure DevOps is outside GitHub, so GITHUB_TOKEN is not available. Fine-grained PAT provides minimal necessary permissions (better than Classic PAT with broad scopes). Storing in Key Vault adds security layer (not in pipeline variables).
⭐ Must Know (Critical Facts):
- Common Classic PAT scopes: repo (clone), admin:repo_hook (webhooks), workflow (trigger workflows)
- GITHUB_TOKEN carries only default repository permissions (unless the permissions: block grants more) and can't be used outside GitHub Actions
When to use (Comprehensive):
Limitations & Constraints:
- GITHUB_TOKEN: scoped to the current repository (unless permissions: grants broader access), expires with the workflow run
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
- Forgetting that a Classic PAT needs the workflow scope to trigger workflows in other repos
🔗 Connections to Other Topics:
What they are: Azure DevOps uses a hierarchical permission model with security groups at organization, project, and resource levels. Permissions control who can view, create, modify, or delete resources.
Why they exist: Teams need different levels of access. Developers need to create branches and run pipelines, but shouldn't delete projects. Administrators need full control. Stakeholders need read-only access. Security groups provide role-based access control (RBAC) to enforce least-privilege access.
Real-world analogy: Think of a hospital with different access levels. Doctors (Contributors) can access patient records and write prescriptions. Nurses (Readers) can view records but not prescribe. Hospital administrators (Project Administrators) can manage departments and staff. Visitors (Stakeholders) can only access public areas. Each role has specific permissions appropriate to their responsibilities.
How Azure DevOps permissions work (Detailed step-by-step):
⭐ Must Know (Critical Facts):
When to use (Comprehensive):
💡 Tips for Understanding:
The problem: Pipelines need access to sensitive information (database passwords, API keys, certificates, connection strings). Storing secrets in code or pipeline variables is insecure (visible in logs, accessible to anyone with repo access, no rotation, no audit trail).
The solution: Use Azure Key Vault to centrally store and manage secrets, keys, and certificates. Access secrets in pipelines using Managed Identity or Service Principal. Implement secret rotation, access policies, and audit logging.
Why it's tested: Secret management is critical to DevSecOps. The exam tests your ability to securely store secrets, access them in pipelines without exposure, and implement secret rotation and compliance.
What it is: Azure Key Vault is a cloud service for securely storing and accessing secrets (passwords, API keys), keys (encryption keys), and certificates (SSL/TLS certificates).
Why it exists: Applications need secrets to connect to databases, APIs, and services. Storing secrets in code, configuration files, or pipeline variables creates security risks:
Azure Key Vault solves these problems by providing centralized, secure secret storage with access control, audit logging, and rotation capabilities.
Real-world analogy: Think of a bank safe deposit box. Instead of keeping valuables (secrets) in your desk drawer (code/variables) where anyone can find them, you store them in a secure vault. Only authorized people with the right key (Managed Identity/Service Principal) can access the box. The bank (Azure) keeps a log of every access. If you need to change the lock (rotate secret), you do it once in the vault, not in every location.
How Azure Key Vault works (Detailed step-by-step):
1. Store a secret: az keyvault secret set --vault-name myVault --name dbPassword --value "P@ssw0rd123"
2. Retrieve a secret: az keyvault secret show --vault-name myVault --name dbPassword --query value -o tsv
📊 Azure Key Vault Integration with Pipelines:
sequenceDiagram
participant Pipeline
participant MI as Managed Identity
participant AAD as Azure AD
participant KV as Key Vault
participant App as Application
Pipeline->>MI: 1. Request Token
MI->>AAD: 2. Authenticate (no secret)
AAD->>MI: 3. Access Token
MI->>Pipeline: 4. Return Token
Pipeline->>KV: 5. Get Secret + Token
KV->>AAD: 6. Validate Token
AAD->>KV: 7. Token Valid
KV->>KV: 8. Check Access Policy
KV->>Pipeline: 9. Return Secret Value
Pipeline->>App: 10. Deploy with Secret
KV->>KV: 11. Log Access (audit)
Note over Pipeline,KV: Secret never stored in pipeline<br/>No secret in logs<br/>Centralized rotation<br/>Full audit trail
See: diagrams/05_domain4_keyvault_pipeline_flow.mmd
Diagram Explanation (detailed):
This sequence diagram shows the secure flow of accessing secrets from Azure Key Vault in a pipeline using Managed Identity.
Authentication Phase (Steps 1-4): The pipeline running on an Azure resource (VM, App Service) requests a token from the Managed Identity service. The Managed Identity authenticates to Azure AD without any stored credentials (Azure manages this automatically). Azure AD validates the resource has Managed Identity enabled and issues an access token. The token is returned to the pipeline. This entire phase happens without any secrets being stored or exposed.
Secret Retrieval Phase (Steps 5-9): The pipeline makes a request to Key Vault to retrieve a specific secret, including the access token. Key Vault validates the token with Azure AD to ensure it's legitimate and not expired. Azure AD confirms the token is valid. Key Vault then checks its access policies to verify the Managed Identity has permission to read the requested secret. If authorized, Key Vault returns the secret value to the pipeline. The secret is transmitted over HTTPS and never logged.
Usage and Audit Phase (Steps 10-11): The pipeline uses the secret to deploy the application (e.g., connection string for database, API key for external service). The secret is passed securely to the application without being exposed in logs or pipeline variables. Key Vault logs the access event, recording which identity accessed which secret at what time. This creates a full audit trail for compliance.
Key Security Benefits: (1) No secrets stored in pipeline code or variables, (2) Secrets never appear in logs or error messages, (3) Centralized secret rotation (update once in Key Vault, all pipelines get new value), (4) Full audit trail (who accessed what when), (5) Access control (only authorized identities can read secrets), (6) Managed Identity eliminates need to store Service Principal secrets.
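The audit trail described above only exists if diagnostic logging is turned on for the vault. A minimal sketch, assuming the Log Analytics workspace resource ID is in $WORKSPACE_ID and that the vault and setting names are placeholders:

# Send Key Vault AuditEvent logs to Log Analytics so every secret access is recorded
VAULT_ID=$(az keyvault show --name myVault --query id -o tsv)

az monitor diagnostic-settings create \
  --name kv-audit \
  --resource "$VAULT_ID" \
  --workspace "$WORKSPACE_ID" \
  --logs '[{"category":"AuditEvent","enabled":true}]'

Once events land in the workspace, they are typically queryable with KQL (for example via the AzureDiagnostics table) to answer "who read which secret, and when."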
Detailed Example 1: Using Key Vault in Azure Pipeline with Managed Identity
Scenario: You have an Azure Pipeline running on a self-hosted agent (Azure VM with Managed Identity). The pipeline needs to deploy a web app with a database connection string stored in Key Vault.
Solution: Configure Key Vault access policy for the VM's Managed Identity and retrieve the secret in the pipeline.
Step-by-step:
1. Grant the VM's Managed Identity read access to secrets: az keyvault set-policy --name myVault --object-id {vm-principal-id} --secret-permissions get list
2. Store the connection string: az keyvault secret set --vault-name myVault --name dbConnectionString --value "Server=myserver;Database=mydb;User=admin;Password=P@ssw0rd"
3. Retrieve and use the secret in the pipeline:
trigger:
- main
pool:
name: 'SelfHostedPool' # Pool with VMs that have Managed Identity
variables:
keyVaultName: 'myVault'
steps:
- task: AzureCLI@2
displayName: 'Get Secret from Key Vault'
inputs:
azureSubscription: 'ManagedIdentityConnection' # Service connection using Managed Identity
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
# Retrieve secret from Key Vault
DB_CONNECTION_STRING=$(az keyvault secret show --vault-name $(keyVaultName) --name dbConnectionString --query value -o tsv)
# Set as pipeline variable (marked as secret)
echo "##vso[task.setvariable variable=DbConnectionString;issecret=true]$DB_CONNECTION_STRING"
- task: AzureWebApp@1
displayName: 'Deploy Web App'
inputs:
azureSubscription: 'ManagedIdentityConnection'
appName: 'myWebApp'
package: '$(System.DefaultWorkingDirectory)/**/*.zip'
appSettings: '-ConnectionStrings:DefaultConnection "$(DbConnectionString)"'
4. The retrieved value is set as a pipeline variable with issecret=true (masked in logs)
Why this approach: Managed Identity eliminates the need to store Service Principal credentials. The secret is retrieved at runtime (always current if rotated in Key Vault). The secret is masked in logs (issecret=true). Full audit trail in Key Vault.
Detailed Example 2: Using Key Vault Task in Azure Pipeline
Scenario: You have an Azure Pipeline using Microsoft-hosted agents (no Managed Identity available). The pipeline needs multiple secrets from Key Vault.
Solution: Use Azure Key Vault task to download secrets as pipeline variables.
Step-by-step:
1. Create a Service Principal for the pipeline: az ad sp create-for-rbac --name "pipeline-sp" --role Reader --scopes /subscriptions/{sub-id}
2. Grant it access to the vault's secrets: az keyvault set-policy --name myVault --spn {app-id} --secret-permissions get list
3. Create an Azure service connection that uses the Service Principal, then download the secrets with the AzureKeyVault@2 task:
trigger:
- main
pool:
vmImage: 'ubuntu-latest' # Microsoft-hosted agent
steps:
- task: AzureKeyVault@2
displayName: 'Download Secrets from Key Vault'
inputs:
azureSubscription: 'AzureServiceConnection' # Service connection with Service Principal
KeyVaultName: 'myVault'
SecretsFilter: 'dbPassword,apiKey,certificatePassword' # Comma-separated list of secrets
RunAsPreJob: false # Download secrets during job execution
- script: |
echo "Connecting to database..."
# Secrets are available as pipeline variables
# $(dbPassword), $(apiKey), $(certificatePassword)
# They are automatically masked in logs
mysql -h myserver -u admin -p$(dbPassword) -e "SELECT 1"
displayName: 'Use Secrets'
4. Downloaded secrets become pipeline variables referenced as $(secretName)
Why this approach: Works with Microsoft-hosted agents (no Managed Identity available). Multiple secrets are downloaded in one task. Secrets are automatically masked in logs. No need to manually retrieve each secret.
Detailed Example 3: Secret Rotation with Key Vault
Scenario: Your database password is stored in Key Vault and used by 10 different pipelines. The password must be rotated every 90 days for compliance.
Solution: Rotate secret in Key Vault; all pipelines automatically use new value on next run.
Step-by-step:
1. Current state: the Key Vault secret dbPassword = "OldP@ssw0rd123" is referenced by all 10 pipelines
2. Change the password in the database: ALTER USER admin WITH PASSWORD 'NewP@ssw0rd456'
3. Update the secret in Key Vault: az keyvault secret set --vault-name myVault --name dbPassword --value "NewP@ssw0rd456"
4. On their next run, every pipeline retrieves the new value automatically - no pipeline edits required
Why this approach: Centralized rotation (update once, affects all pipelines). No pipeline changes required. Old versions retained for rollback. Full audit trail of rotation events.
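Key Vault keeps prior versions of a secret, which is what makes the rollback claim above work. A minimal sketch of inspecting the version history after a rotation, reusing the placeholder names from this example:

# Rotate the secret (creates a new version; old versions are retained)
az keyvault secret set --vault-name myVault --name dbPassword --value "NewP@ssw0rd456"

# List all versions to confirm the rotation and to find an older version for rollback
az keyvault secret list-versions --vault-name myVault --name dbPassword -o table

# Show the current (latest) value that pipelines will now receive
az keyvault secret show --vault-name myVault --name dbPassword --query value -o tsv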
⭐ Must Know (Critical Facts):
- Use issecret=true when setting pipeline variables to mask them in logs
When to use (Comprehensive):
Limitations & Constraints:
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
🔗 Connections to Other Topics:
The problem: Security vulnerabilities in code, dependencies, containers, and infrastructure configurations are discovered late in the development cycle (or in production), making them expensive and time-consuming to fix. Manual security reviews don't scale and miss issues.
The solution: Automate security scanning in CI/CD pipelines to detect vulnerabilities early ("shift-left security"). Use tools like Microsoft Defender for Cloud DevOps Security, GitHub Advanced Security, CodeQL, and Dependabot to scan code, secrets, dependencies, containers, and IaC templates.
Why it's tested: DevSecOps is a core principle of modern DevOps. The exam tests your ability to integrate security scanning into pipelines, configure scanning tools, and prioritize remediation based on findings.
What it is: A cloud-native application protection platform (CNAPP) that provides unified visibility, posture management, and threat protection for DevOps environments (Azure DevOps, GitHub, GitLab). It scans code, secrets, dependencies, IaC templates, and container images for vulnerabilities and misconfigurations.
Why it exists: Security teams need visibility into security posture across multi-pipeline environments. Developers need actionable findings integrated into their workflows (pull request annotations). Organizations need to prioritize remediation based on code-to-cloud context (which vulnerabilities affect production resources?).
How Defender for Cloud DevOps Security works (Detailed step-by-step):
⭐ Must Know (Critical Facts):
What it is: A suite of security features for GitHub repositories, including CodeQL (code scanning), secret scanning, and Dependabot (dependency scanning). Available for GitHub Enterprise Cloud and GitHub Enterprise Server.
Why it exists: Developers need security feedback integrated into their workflow (GitHub UI, pull requests). Organizations need to enforce security policies (block PRs with critical vulnerabilities). GitHub Advanced Security provides native security scanning within GitHub.
How GitHub Advanced Security works (Detailed step-by-step):
- Code scanning with CodeQL is configured via a workflow file such as .github/workflows/codeql.yml; secret scanning and Dependabot are enabled in the repository's security settings
⭐ Must Know (Critical Facts):
What it is: Automated scanning of container images for vulnerabilities in OS packages, application dependencies, and configuration issues. Integrated into CI/CD pipelines to prevent vulnerable images from reaching production.
Why it exists: Container images often contain vulnerable OS packages (outdated Ubuntu, Alpine) and application dependencies (vulnerable npm packages). Scanning images before deployment prevents known vulnerabilities from reaching production.
How container scanning works (Detailed step-by-step):
1. Build the image: docker build -t myapp:latest .
2. Scan the built image with a scanner such as Trivy or Microsoft Defender for Containers, and fail the pipeline on HIGH/CRITICAL findings before the image is pushed to a registry
⭐ Must Know (Critical Facts):
- Update base images (e.g., FROM ubuntu:22.04 → FROM ubuntu:24.04) to get security patches
When to use (Comprehensive):
💡 Tips for Understanding:
⚠️ Common Mistakes & Misconceptions:
This chapter covered the security and compliance domain (10-15% of exam), focusing on:
✅ Authentication and Authorization
✅ Managing Sensitive Information
✅ Automating Security Scanning
Test yourself before moving on:
Authentication:
Secrets Management:
Security Scanning:
Try these from your practice test bundles:
If you scored below 75%:
Authentication:
Key Vault:
issecret=true when setting pipeline variablesSecurity Scanning:
Decision Points:
Next Chapter: 06_domain5_instrumentation - Implement an Instrumentation Strategy (Monitoring, telemetry, alerts, log analysis)
What you'll learn:
Time to complete: 4-6 hours
Prerequisites: Chapters 0-4 (Fundamentals through Security)
Why this domain matters: "You can't improve what you don't measure." Instrumentation provides visibility into application performance, infrastructure health, and pipeline reliability. This domain tests your ability to configure monitoring, collect telemetry, analyze data, and set up alerts to detect and respond to issues quickly.
The problem: Without monitoring, you're blind to application performance issues, infrastructure failures, and pipeline problems. Issues are discovered by users (bad customer experience) or during incidents (reactive firefighting). No data means no ability to optimize or improve.
The solution: Implement comprehensive monitoring across applications, infrastructure, and pipelines. Collect telemetry (metrics, logs, traces) from all components. Visualize data in dashboards. Set up alerts to detect issues proactively.
Why it's tested: Monitoring is essential to DevOps feedback loops. The exam tests your ability to configure monitoring tools, collect relevant telemetry, and use data to improve systems.
What they are: Azure Monitor is a comprehensive monitoring solution for Azure resources, applications, and infrastructure. Application Insights is a feature of Azure Monitor focused on application performance monitoring (APM) with distributed tracing, dependency tracking, and exception monitoring.
Why they exist: Applications and infrastructure generate vast amounts of telemetry data (metrics, logs, traces). Without a centralized monitoring solution, this data is scattered across systems, making it impossible to correlate events, identify root causes, or detect patterns. Azure Monitor provides unified collection, storage, analysis, and alerting for all telemetry.
Real-world analogy: Think of Azure Monitor as a hospital's patient monitoring system. Just as doctors monitor vital signs (heart rate, blood pressure, temperature) from a central dashboard and receive alerts when values are abnormal, Azure Monitor collects telemetry (CPU, memory, request rate, error rate) from all systems and alerts you when thresholds are exceeded.
How Azure Monitor works (Detailed step-by-step):
⭐ Must Know (Critical Facts):
Detailed Example 1: Instrumenting ASP.NET Core Application with Application Insights
Scenario: You have an ASP.NET Core web application deployed to Azure App Service. You need to monitor request performance, track dependencies (database, external APIs), and detect exceptions.
Solution: Add Application Insights SDK to the application and configure telemetry collection.
Step-by-step:
1. Create the Application Insights resource (workspace-based): az monitor app-insights component create --app myApp --location eastus --resource-group myRG --workspace /subscriptions/{sub-id}/resourceGroups/myRG/providers/Microsoft.OperationalInsights/workspaces/myWorkspace
2. Add the SDK to the project: dotnet add package Microsoft.ApplicationInsights.AspNetCore
3. Register the telemetry service in Program.cs:
var builder = WebApplication.CreateBuilder(args);
// Add Application Insights telemetry
builder.Services.AddApplicationInsightsTelemetry(options =>
{
options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
});
var app = builder.Build();
app.Run();
4. Set the connection string as an App Service setting: az webapp config appsettings set --name myApp --resource-group myRG --settings ApplicationInsights__ConnectionString="{connection-string}"
Why this approach: The Application Insights SDK automatically instruments common scenarios (HTTP requests, database calls, exceptions) with zero code changes. Distributed tracing works across microservices. Telemetry is correlated (you can see all dependencies for a single request).
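Once the app has been running for a while, you can sanity-check that telemetry is flowing without opening the portal. A minimal sketch, assuming the Azure CLI application-insights extension is installed and reusing the myApp / myRG placeholders from above:

# Query recent failed requests straight from the CLI (KQL passed as a string)
az monitor app-insights query \
  --app myApp \
  --resource-group myRG \
  --analytics-query "requests | where timestamp > ago(1h) | where success == false | take 10"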
Detailed Example 2: Configuring Alerts for Pipeline Failures
Scenario: You have Azure Pipelines running critical deployments. You need to be notified immediately when a pipeline fails so you can investigate and fix issues quickly.
Solution: Configure Azure Monitor alerts for pipeline failures using Azure DevOps audit logs.
Step-by-step:
AzureDevOpsAuditing
| where OperationName == "Pipelines.PipelineCompleted"
| where Data has "failed"
| project TimeGenerated, ProjectName, PipelineName, Result = tostring(Data.result), BuildNumber = tostring(Data.buildNumber)
Why this approach: Proactive notification of pipeline failures (don't wait for users to report). Audit logs provide detailed context (which pipeline, which project, when). Can extend query to filter by specific pipelines or projects.
⭐ Must Know (Critical Facts):
- Core KQL operators: where (filter), project (select columns), summarize (aggregate), join (combine tables), render (visualize)
The problem: Collecting telemetry is not enough - you need to analyze it to identify trends, detect anomalies, and troubleshoot issues. Raw telemetry data is overwhelming (millions of events per day). Manual analysis doesn't scale.
The solution: Use query languages (KQL), visualization tools (dashboards, workbooks), and AI-powered analytics (Smart Detection) to extract insights from telemetry. Focus on key performance indicators (KPIs) and actionable metrics.
Why it's tested: The exam tests your ability to query logs, analyze metrics, and use telemetry to troubleshoot issues and optimize performance.
What it is: A query language for analyzing large volumes of log and telemetry data in Azure Monitor, Application Insights, and Azure Data Explorer. Optimized for fast queries over time-series data.
Why it exists: SQL is designed for relational databases (tables with fixed schemas). Log data is semi-structured (JSON, key-value pairs) and time-series (events over time). KQL is optimized for these data types with operators for filtering, aggregating, and visualizing time-series data.
Real-world analogy: Think of KQL as a specialized tool for analyzing security camera footage. While you could use a general video player (SQL), KQL is like a security system with fast-forward, rewind, motion detection, and timeline scrubbing - optimized for finding specific events in large volumes of footage.
Basic KQL Query Structure:
TableName
| where Condition
| project Column1, Column2
| summarize AggregateFunction by GroupByColumn
| order by Column desc
| take 10
Common KQL Queries for DevOps:
requests
| where timestamp > ago(24h)
| where success == false
| project timestamp, name, url, resultCode, duration
| order by timestamp desc
requests
| where timestamp > ago(1h)
| summarize AvgDuration = avg(duration), RequestCount = count() by operation_Name
| order by AvgDuration desc
dependencies
| where timestamp > ago(1h)
| where duration > 1000 // milliseconds
| where type == "SQL"
| project timestamp, name, duration, success
| order by duration desc
exceptions
| where timestamp > ago(24h)
| summarize ExceptionCount = count() by type, outerMessage
| order by ExceptionCount desc
AzureDevOpsAuditing
| where OperationName == "Pipelines.PipelineCompleted"
| extend Duration = todouble(Data.duration)
| summarize AvgDuration = avg(Duration), MaxDuration = max(Duration) by bin(TimeGenerated, 1h), PipelineName = tostring(Data.pipelineName)
| render timechart
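These queries can also be run outside the portal. The following is a minimal sketch of running KQL against a Log Analytics workspace from the CLI; it assumes the log-analytics CLI extension is installed, the workspace GUID (customer ID) is in $WORKSPACE_ID, and Azure DevOps audit streaming is already sending events to that workspace.

# Run a KQL query against a Log Analytics workspace from the command line
az monitor log-analytics query \
  --workspace "$WORKSPACE_ID" \
  --analytics-query "AzureDevOpsAuditing | where TimeGenerated > ago(24h) | summarize count() by OperationName | order by count_ desc" \
  -o table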
⭐ Must Know (Critical Facts):
Azure Monitor Components:
KQL Basics:
- where: Filter rows
- project: Select columns
- summarize: Aggregate data
- order by: Sort results
- take: Limit results
- ago(): Relative time
- bin(): Time buckets
Common Metrics:
Next Chapter: 07_integration - Integration & Cross-Domain Scenarios
This chapter integrates concepts from all domains to solve complex, real-world scenarios that span multiple areas of DevOps. The AZ-400 exam tests your ability to apply knowledge across domains, not just recall facts from individual topics.
What you'll learn:
Time to complete: 6-8 hours
Prerequisites: All previous chapters (0-5)
You're designing a CI/CD pipeline for a microservices application with the following requirements:
This scenario integrates concepts from all 5 domains:
Step-by-Step Implementation:
# Branch protection rules in GitHub
- Require pull request reviews (2 approvers)
- Require status checks (build, tests, security scan)
- Require branches to be up to date
- Require signed commits
# azure-pipelines.yml
trigger:
branches:
include:
- main
variables:
- group: production-secrets # Linked to Key Vault
stages:
- stage: Build
jobs:
- job: BuildAndTest
pool:
vmImage: 'ubuntu-latest'
steps:
- task: UseDotNet@2
inputs:
version: '8.0.x'
- task: DotNetCoreCLI@2
displayName: 'Restore dependencies'
inputs:
command: 'restore'
- task: DotNetCoreCLI@2
displayName: 'Build'
inputs:
command: 'build'
arguments: '--configuration Release'
- task: DotNetCoreCLI@2
displayName: 'Run tests'
inputs:
command: 'test'
arguments: '--configuration Release --collect:"XPlat Code Coverage"'
- task: PublishCodeCoverageResults@1
inputs:
codeCoverageTool: 'Cobertura'
summaryFileLocation: '$(Agent.TempDirectory)/**/*coverage.cobertura.xml'
- task: DotNetCoreCLI@2
displayName: 'Publish'
inputs:
command: 'publish'
publishWebProjects: true
arguments: '--configuration Release --output $(Build.ArtifactStagingDirectory)'
- task: PublishBuildArtifacts@1
inputs:
PathtoPublish: '$(Build.ArtifactStagingDirectory)'
ArtifactName: 'drop'
- stage: SecurityScan
dependsOn: Build
jobs:
- job: ScanCode
pool:
vmImage: 'ubuntu-latest'
steps:
- task: UseDotNet@2
inputs:
version: '8.0.x'
# Scan dependencies for vulnerabilities
- script: |
dotnet list package --vulnerable --include-transitive
displayName: 'Scan dependencies'
# Container scanning (if using containers)
- task: Docker@2
displayName: 'Build container image'
inputs:
command: 'build'
Dockerfile: '**/Dockerfile'
tags: '$(Build.BuildId)'
- script: |
# Install Trivy
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo apt-key add -
echo "deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy
# Scan image
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:$(Build.BuildId)
displayName: 'Scan container for vulnerabilities'
- stage: DeployStaging
dependsOn: SecurityScan
jobs:
- deployment: DeployToStaging
environment: 'staging'
pool:
vmImage: 'ubuntu-latest'
strategy:
runOnce:
deploy:
steps:
- task: AzureKeyVault@2
displayName: 'Get secrets from Key Vault'
inputs:
azureSubscription: 'ManagedIdentityConnection'
KeyVaultName: 'myapp-keyvault'
SecretsFilter: 'DbConnectionString,ApiKey'
- task: AzureWebApp@1
displayName: 'Deploy to staging slot'
inputs:
azureSubscription: 'ManagedIdentityConnection'
appName: 'myapp'
package: '$(Pipeline.Workspace)/drop/**/*.zip'
deployToSlotOrASE: true
resourceGroupName: 'myapp-rg'
slotName: 'staging'
appSettings: '-ConnectionStrings:DefaultConnection "$(DbConnectionString)" -ApiKey "$(ApiKey)"'
- stage: DeployProduction
dependsOn: DeployStaging
jobs:
- deployment: DeployToProduction
environment: 'production'
pool:
vmImage: 'ubuntu-latest'
strategy:
runOnce:
deploy:
steps:
- task: AzureKeyVault@2
displayName: 'Get secrets from Key Vault'
inputs:
azureSubscription: 'ManagedIdentityConnection'
KeyVaultName: 'myapp-keyvault'
SecretsFilter: 'DbConnectionString,ApiKey'
# Blue-green deployment using slots
- task: AzureWebApp@1
displayName: 'Deploy to blue slot'
inputs:
azureSubscription: 'ManagedIdentityConnection'
appName: 'myapp'
package: '$(Pipeline.Workspace)/drop/**/*.zip'
deployToSlotOrASE: true
resourceGroupName: 'myapp-rg'
slotName: 'blue'
appSettings: '-ConnectionStrings:DefaultConnection "$(DbConnectionString)" -ApiKey "$(ApiKey)"'
# Warm up blue slot
- script: |
curl -f https://myapp-blue.azurewebsites.net/health || exit 1
displayName: 'Health check blue slot'
# Swap blue to production
- task: AzureAppServiceManage@0
displayName: 'Swap blue to production'
inputs:
azureSubscription: 'ManagedIdentityConnection'
action: 'Swap Slots'
webAppName: 'myapp'
resourceGroupName: 'myapp-rg'
sourceSlot: 'blue'
targetSlot: 'production'
Why this architecture?:
Trade-offs:
You have 10 microservices in separate GitHub repositories. When a shared library is updated, all dependent services must be rebuilt and tested. You need to:
Approach: Use GitHub Apps for cross-repo workflow triggers, Azure Boards for work tracking, feature flags for deployment control, and Application Insights for distributed tracing.
Key Components:
How to recognize:
How to answer:
Example: "You need to scan code for vulnerabilities in a GitHub repository. Which tool should you use?"
How to recognize:
How to answer:
Example: "How do you implement blue-green deployment in Azure App Service?"
How to recognize:
How to answer:
Example: "What is the best practice for storing database passwords in pipelines?"
This chapter integrated concepts from all domains to solve complex, real-world scenarios. Key takeaways:
Next Chapter: 08_study_strategies - Study Strategies & Test-Taking Techniques
Pass 1: Understanding (Weeks 1-6)
Pass 2: Application (Weeks 7-8)
Pass 3: Reinforcement (Weeks 9-10)
Teach Someone: Explain concepts out loud to a friend, colleague, or rubber duck. If you can't explain it simply, you don't understand it well enough.
Draw Diagrams: Visualize architectures, workflows, and decision trees. Drawing forces you to understand relationships between components.
Write Scenarios: Create your own exam questions based on real-world scenarios you've encountered. This helps you think like the exam writers.
Compare Options: Use comparison tables to understand differences between similar tools (GitHub Actions vs Azure Pipelines, Service Principal vs Managed Identity, etc.).
Hands-On Practice: Set up a free Azure account and GitHub account. Build actual pipelines, configure Key Vault, set up monitoring. Hands-on experience solidifies learning.
Mnemonics for Common Lists:
Visual Patterns:
Strategy:
Step 1: Read the scenario (30 seconds)
Step 2: Identify what's being tested (15 seconds)
Step 3: Eliminate wrong answers (30 seconds)
Step 4: Choose best answer (30 seconds)
When stuck:
Common traps:
⚠️ Never: Spend more than 3 minutes on one question initially. Flag it and return later.
7 days before:
5 days before:
3 days before:
1 day before:
Do:
Don't:
Morning Routine:
Brain Dump Strategy:
When exam starts, immediately write down on scratch paper (or whiteboard for online exam):
During Exam:
You've got this! Good luck on your AZ-400 exam!
Next Chapter: 09_final_checklist - Final Week Preparation Checklist
Go through this comprehensive checklist and mark items you're confident about:
Domain 1: Design and Implement Processes and Communications (10-15%)
Domain 2: Design and Implement a Source Control Strategy (10-15%)
Domain 3: Design and Implement Build and Release Pipelines (50-55%)
Domain 4: Develop a Security and Compliance Plan (10-15%)
Domain 5: Implement an Instrumentation Strategy (5-10%)
If you checked fewer than 80% in any domain: Review those specific chapters today.
If scored below 60%: Review fundamentals and domain chapters for weak areas.
If scored below 70%: Focus on largest domain (Build and Release Pipelines - 50-55%).
If scored below 75%: Consider rescheduling exam to allow more study time.
Morning (1 hour):
Afternoon (1 hour):
Evening (30 minutes):
Don't: Try to learn new topics, take full practice test, study late into the night, cram.
Breakfast:
Light Review (30 minutes max):
Logistics:
When exam starts, immediately write down on scratch paper:
YAML Pipeline Structure:
trigger:
branches:
include: [main]
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: TaskName@Version
Common Azure DevOps Tasks:
KQL Operators:
Authentication Methods:
Deployment Patterns:
Key Vault Access:
Time Management:
Question Analysis:
Tips:
Good luck on your AZ-400: Designing and Implementing Microsoft DevOps Solutions exam!
Next File: 99_appendices - Quick Reference Tables, Glossary, Additional Resources
| Method | Use Case | Pros | Cons | Lifespan |
|---|---|---|---|---|
| Managed Identity | Azure resources (VMs, App Service) | No secrets to manage, automatic, secure | Only works in Azure | Managed by Azure |
| Service Principal | GitHub Actions, on-premises, external tools | Works anywhere, flexible | Requires secret storage and rotation | Secret: max 2 years |
| GITHUB_TOKEN | Same-repo GitHub Actions | Automatic, no setup, free | Can't trigger other workflows, repo-scoped | Workflow duration |
| GitHub App | Cross-repo workflows, production automation | Short-lived tokens (1hr), not tied to user | Requires setup (create app, install) | Token: 1 hour |
| PAT (Classic) | Legacy integrations | Simple, works everywhere | Broad scopes, tied to user, requires rotation | Max 1 year |
| PAT (Fine-grained) | External integrations, temporary access | Repo-specific, granular permissions | Still tied to user, requires rotation | Custom (up to 1 year) |
| Pattern | Downtime | Rollback Speed | Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Blue-Green | Zero | Instant (swap back) | 2X (two full environments) | Low | Critical apps, instant rollback needed |
| Canary | Zero | Fast (route traffic back) | 1.1-1.2X (small canary environment) | Medium | Gradual rollout with monitoring |
| Ring | Zero | Medium (depends on ring size) | 1X (same environment) | Medium | Phased rollout by user group |
| Rolling | Partial (some instances down) | Slow (redeploy previous version) | 1X (same environment) | Low | Non-critical apps, cost-sensitive |
| Feature Flags | Zero | Instant (toggle flag) | 1X + flag service cost | High | Decouple deployment from release |
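As a quick companion to the Blue-Green row above, the instant rollback it promises is just a slot swap in App Service. A minimal sketch with placeholder app and resource group names:

# Swap the blue slot into production (and the current production content into blue)
az webapp deployment slot swap \
  --name myapp \
  --resource-group myapp-rg \
  --slot blue \
  --target-slot production

# Rolling back is the same command run again - the previous version still lives in the other slot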
| Feature | Azure Artifacts | GitHub Packages | npm Registry | NuGet Gallery |
|---|---|---|---|---|
| Pricing | 2GB free, $2/GB after | Unlimited public, $0.50/GB private | Free | Free |
| Package Types | NuGet, npm, Maven, Python, Universal | NuGet, npm, Maven, Docker, RubyGems | npm only | NuGet only |
| Feed Views | Yes (@Local, @Prerelease, @Release) | No | No | No |
| Upstream Sources | Yes (cache public packages) | No | N/A | N/A |
| Integration | Azure DevOps native | GitHub native | Universal | Universal |
| Best For | Enterprise, multi-package-type, feed promotion | GitHub-native workflows, open-source | Public npm packages | Public NuGet packages |
| Tool | Language | Cloud Support | State Management | Learning Curve | Best For |
|---|---|---|---|---|---|
| ARM Templates | JSON | Azure only | Azure manages | Steep (verbose JSON) | Complex Azure scenarios, full control |
| Bicep | Bicep (DSL) | Azure only | Azure manages | Low (simpler than ARM) | Azure-native IaC, recommended for new projects |
| Terraform | HCL | Multi-cloud | State file (local/remote) | Medium | Multi-cloud, large ecosystem, mature tooling |
| Pulumi | TypeScript/Python/C#/Go | Multi-cloud | State file (Pulumi service) | Medium | Developers prefer real languages over DSL |
| Azure CLI | Bash/PowerShell | Azure only | None (imperative) | Low | Quick scripts, automation, not IaC |
| Task | Purpose | Common Inputs | Example |
|---|---|---|---|
| AzureWebApp@1 | Deploy to App Service | azureSubscription, appName, package | Deploy web app |
| AzureKeyVault@2 | Get secrets from Key Vault | azureSubscription, KeyVaultName, SecretsFilter | Retrieve secrets |
| PublishTestResults@2 | Publish test results | testResultsFormat, testResultsFiles | Enable quality gates |
| PublishCodeCoverageResults@1 | Publish code coverage | codeCoverageTool, summaryFileLocation | Visualize coverage |
| AzureAppServiceManage@0 | Manage App Service | azureSubscription, action, webAppName, sourceSlot, targetSlot | Swap deployment slots |
| NuGetAuthenticate@1 | Authenticate to Azure Artifacts | No inputs (uses service connection) | Access private feeds |
| Docker@2 | Build/push Docker images | command, Dockerfile, tags, containerRegistry | Container workflows |
| AzureCLI@2 | Run Azure CLI commands | azureSubscription, scriptType, scriptLocation | Custom Azure operations |
| Operator | Purpose | Example | Result |
|---|---|---|---|
| where | Filter rows | where timestamp > ago(1h) | Rows from last hour |
| project | Select columns | project timestamp, name, duration | Only specified columns |
| summarize | Aggregate data | summarize count() by operation_Name | Count per operation |
| order by | Sort results | order by duration desc | Sorted by duration (descending) |
| take | Limit results | take 10 | First 10 rows |
| ago() | Relative time | ago(24h) | 24 hours ago |
| bin() | Time buckets | bin(timestamp, 1h) | Hourly buckets |
| extend | Add calculated column | extend DurationSec = duration / 1000 | New column |
| join | Combine tables | requests \| join dependencies on operation_Id | Merged data |
| render | Visualize | render timechart | Time-series chart |
Pattern: "You need to [accomplish task]. Which tool should you use?"
Approach:
Examples:
Pattern: "How do you implement [feature/configuration]?"
Approach:
Examples:
Pattern: "What is the best practice for [scenario]?"
Approach:
Examples:
Agentless Scanning: Security scanning that doesn't require pipeline changes; scanner accesses repositories via API (e.g., Defender for Cloud DevOps Security).
Application Insights: Azure Monitor feature for application performance monitoring (APM) with distributed tracing, dependency tracking, and exception monitoring.
Azure Artifacts: Package management service in Azure DevOps supporting NuGet, npm, Maven, Python, and Universal packages.
Azure Boards: Work tracking service in Azure DevOps with support for Agile, Scrum, and Kanban methodologies.
Azure Key Vault: Cloud service for securely storing and accessing secrets, keys, and certificates.
Azure Monitor: Comprehensive monitoring solution for Azure resources, applications, and infrastructure.
Azure Pipelines: CI/CD service in Azure DevOps supporting YAML and classic pipelines.
Bicep: Domain-specific language (DSL) for deploying Azure resources; simpler alternative to ARM templates.
Blue-Green Deployment: Deployment pattern with two identical environments (blue and green); traffic is switched instantly between them for zero-downtime deployments.
Branch Policy: Protection rule on a branch requiring conditions before merge (e.g., required reviewers, passing builds).
Canary Deployment: Deployment pattern where new version is gradually rolled out to a small subset of users (canary) before full deployment.
CI (Continuous Integration): Practice of automatically building and testing code on every commit to detect issues early.
CD (Continuous Delivery): Practice of automatically deploying code to staging/pre-production after successful build and tests; production deployment is manual.
Continuous Deployment: Extension of CD where code is automatically deployed to production after passing all tests (no manual approval).
CodeQL: Semantic code analysis engine that understands code structure to find security vulnerabilities.
Cumulative Flow Diagram (CFD): Visualization showing work items in different states over time; used to identify bottlenecks.
Cycle Time: Time from when work starts (moved to "In Progress") to when it's completed; measures team efficiency.
Defender for Cloud DevOps Security: Cloud-native application protection platform (CNAPP) providing unified visibility and security scanning across Azure DevOps, GitHub, and GitLab.
Dependabot: GitHub feature that automatically scans dependencies for vulnerabilities and creates pull requests to update them.
Deployment Slot: Separate environment in Azure App Service for staging deployments before swapping to production.
Distributed Tracing: Tracking requests across multiple services (microservices) to identify bottlenecks and failures.
Feature Flag: Configuration that enables/disables features at runtime without redeploying code; decouples deployment from release.
Feed View: Azure Artifacts feature for promoting packages through stages (@Local, @Prerelease, @Release).
GitHub Actions: CI/CD platform integrated into GitHub; uses YAML workflows.
GitHub Advanced Security: Suite of security features for GitHub Enterprise including CodeQL, secret scanning, and Dependabot.
GitHub App: Application that integrates with GitHub using short-lived tokens (1 hour); not tied to user account.
GITHUB_TOKEN: Automatically generated token for GitHub Actions workflows; scoped to repository, expires with workflow.
GitFlow: Branching strategy with structured workflow (main, develop, feature, release, hotfix branches) for scheduled releases.
IaC (Infrastructure as Code): Practice of managing infrastructure through code (ARM templates, Bicep, Terraform) rather than manual processes.
KQL (Kusto Query Language): Query language for analyzing log and telemetry data in Azure Monitor and Log Analytics.
Lead Time: Time from when work is created (work item opened) to when it's completed; measures end-to-end delivery time.
Log Analytics Workspace: Centralized log storage in Azure Monitor; supports KQL queries for analysis.
Managed Identity: Azure AD identity automatically managed by Azure for Azure resources; no secrets to manage.
Multi-Stage Pipeline: YAML pipeline with multiple stages (e.g., Build, Test, Deploy) that can run sequentially or in parallel.
PAT (Personal Access Token): User-generated token for authenticating to Azure DevOps or GitHub; tied to user account.
Pull Request (PR): Proposed code change that must be reviewed and approved before merging to target branch.
Ring Deployment: Deployment pattern where new version is rolled out in phases to different user groups (rings): internal → early adopters → general availability.
SemVer (Semantic Versioning): Versioning scheme (MAJOR.MINOR.PATCH) where version numbers convey meaning about changes.
Service Connection: Azure DevOps configuration for authenticating to external services (Azure, GitHub, Docker registries).
Service Principal: Azure AD identity for applications and services; requires storing and rotating secrets.
Shift-Left Security: Practice of integrating security early in development (scanning code on commit/PR) rather than late (production).
Trunk-Based Development: Branching strategy where developers work on short-lived feature branches (hours/days) that merge frequently to main branch.
Upstream Source: Azure Artifacts feature for caching public packages (npm, NuGet) to improve reliability and speed.
YAML: Human-readable data serialization language used for pipeline definitions in Azure Pipelines and GitHub Actions.
After AZ-400, consider:
AZ-400 certification demonstrates expertise in:
Typical roles: DevOps Engineer, Site Reliability Engineer (SRE), Cloud Engineer, Platform Engineer, Release Manager
DevOps is constantly evolving. Stay current by:
Congratulations on completing this comprehensive study guide!
You now have all the knowledge needed to pass the AZ-400 exam and excel as a DevOps Engineer. Remember to:
Good luck on your certification journey!