
Amazon Web Services Professional

Design AWS at scale: multi-account Organizations & landing zones, security, DR, FinOps and migration (Solutions Architect Professional scope).

20 lessons 60 quiz questions

📚 Lessons & quizzes

Each lesson ends with its own short quiz. Answer them as you go — score 90% across all lessons to earn your certificate.

1 Multi-account strategy with AWS Organizations

At professional scale, the unit of isolation in AWS is the account, not the VPC. AWS Organizations lets you centrally manage many accounts under a single management account (formerly “master”), grouped into a hierarchy of Organizational Units (OUs). Accounts give you the hardest blast-radius boundary AWS offers: a runaway workload, a compromised credential or a billing surprise is contained to one account.

A common pattern separates workloads by environment (prod, staging, dev) and by function (security, log archive, shared networking, sandbox). Organizations underpins consolidated billing, Service Control Policies, AWS RAM resource sharing, and delegated administration of services such as GuardDuty and Config. The management account itself should hold almost no workloads — it is privileged and is the one account you most want to keep clean.

The guiding principle: many small, purpose-scoped accounts with strong guardrails beat one giant account carved up only by IAM. IAM mistakes are easy; account boundaries are hard to cross by accident.

2 AWS Control Tower and landing zones

A landing zone is a well-architected, multi-account baseline you stand up before workloads arrive: account structure, identity, logging, networking and guardrails, ready to go. AWS Control Tower is the managed service that builds and governs one for you, orchestrating Organizations, IAM Identity Center, Service Catalog, Config and CloudTrail behind a single console.

Control Tower provisions a baseline with a Security OU containing a log archive account (central, immutable CloudTrail and Config logs) and an audit account (cross-account security read access), plus Account Factory for vending new accounts that inherit the baseline automatically. Its controls (guardrails) come in three flavours: preventive (SCPs that block actions), detective (Config rules that flag drift) and proactive (CloudFormation hooks that stop non-compliant resources before deployment).

Teams that need more flexibility than Control Tower offers can build a custom landing zone with Terraform or the older AWS Landing Zone solution, but Control Tower is the default starting point for most.

3 Service Control Policies and guardrails

Service Control Policies (SCPs) are organization-level guardrails attached to the root, an OU, or an account. An SCP defines the maximum set of permissions available within its scope — it never grants anything. An action is only allowed if it is permitted by both the applicable SCPs and the principal’s IAM policy. Crucially, SCPs do not apply to the management account, which is why you keep that account empty.

SCP evaluation follows IAM logic: an explicit Deny always wins, and if you use allow-list style SCPs, an action must be explicitly permitted at every level of the hierarchy from root down to the account. Typical guardrails: deny leaving the organization, deny disabling CloudTrail or GuardDuty, restrict usable Regions, and forbid root-user actions.

SCPs are coarse, preventive and org-wide; they complement — not replace — per-resource IAM policies, permission boundaries and resource policies. Think of them as the outer fence around what is even possible in an account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [ "iam:*", "organizations:*", "route53:*", "cloudfront:*" ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [ "eu-west-1", "eu-north-1" ]
        }
      }
    }
  ]
}
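
The evaluation rules in this lesson can be sketched in a few lines of Python. This is a deliberately simplified model, not the real policy engine: it looks only at action patterns and ignores resources, conditions and NotAction. The `Policy` class, the `is_allowed` function and the example hierarchy are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Policy:
    """Toy policy: action patterns only (no resources, conditions or NotAction)."""
    allow: list = field(default_factory=list)
    deny: list = field(default_factory=list)


def _matches(action, patterns):
    # IAM action names are case-insensitive; "*" and "?" act as wildcards.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_allowed(action, scp_levels, iam_policy):
    """scp_levels: one list of SCPs per level, ordered root -> OU(s) -> account."""
    every_policy = [p for level in scp_levels for p in level] + [iam_policy]
    # 1. An explicit Deny anywhere always wins.
    if any(_matches(action, p.deny) for p in every_policy):
        return False
    # 2. Allow-list style: every level of the hierarchy must permit the action.
    if not all(any(_matches(action, p.allow) for p in level) for level in scp_levels):
        return False
    # 3. SCPs never grant anything: the principal's IAM policy must allow it too.
    return _matches(action, iam_policy.allow)


# Hypothetical hierarchy: the root allows everything, the OU denies one action,
# and the account-level SCP only allows S3 and EC2.
root = [Policy(allow=["*"])]
ou = [Policy(allow=["*"], deny=["cloudtrail:StopLogging"])]
account = [Policy(allow=["s3:*", "ec2:*"])]
iam = Policy(allow=["s3:GetObject", "cloudtrail:StopLogging", "rds:*"])

for action in ["s3:GetObject", "cloudtrail:StopLogging",
               "rds:CreateDBInstance", "ec2:RunInstances"]:
    print(action, is_allowed(action, [root, ou, account], iam))
```

Only `s3:GetObject` passes in this example: the CloudTrail action hits an explicit Deny, the RDS action is not permitted by the account-level SCP, and the EC2 action is within the SCPs but is not granted by the IAM policy.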

4 Centralised identity with IAM Identity Center

Individual IAM users sprawled across dozens of accounts are an audit nightmare. IAM Identity Center (formerly AWS SSO) gives workforce users one sign-on to every account and to SaaS apps, with short-lived credentials and no long-lived access keys. You connect it to an external identity provider (Okta, Entra ID, Ping) using SAML 2.0 for sign-in and SCIM for automatic user and group provisioning, or you use its built-in directory.

Access is modelled as permission sets (reusable bundles of IAM policies) assigned to groups of users for specific accounts. Behind the scenes Identity Center provisions IAM roles in each target account and brokers AssumeRole via federation, so users get temporary STS credentials. This is the modern replacement for per-account IAM users and for chained cross-account roles managed by hand.

The parallel mechanism for customer-facing identity is Amazon Cognito; for workforce identity at organization scale, Identity Center plus an external IdP is the professional default. Federation means your existing corporate directory remains the single source of truth for joiners, movers and leavers.

5 Governance, compliance and security services

Professional architectures must prove they are governed. Several AWS services work together for this:

  • AWS Config records the configuration of every resource over time and evaluates it against rules, giving you continuous compliance and change history.
  • AWS CloudTrail records API activity (management events by default, data events when you enable them): the audit log of who did what, when and from where, ideally aggregated to the org’s log archive account.
  • AWS Security Hub aggregates findings across accounts and Regions and scores them against standards such as CIS, AWS Foundational Security Best Practices and PCI DSS.
  • Amazon GuardDuty is the managed threat-detection service analysing CloudTrail, VPC Flow Logs and DNS logs for malicious behaviour.
  • AWS Audit Manager automates evidence collection mapped to frameworks like SOC 2, ISO 27001 and PCI DSS.

The compliance frameworks themselves — SOC 2, ISO 27001, PCI DSS — describe controls; AWS provides its own attestations through AWS Artifact, but the shared responsibility model means workloads you build on top must still implement and evidence their own controls.

6 Network topology at scale: Transit Gateway

Meshing dozens of VPCs with peering connections does not scale — peering is non-transitive, so N VPCs need N(N−1)/2 links. AWS Transit Gateway (TGW) solves this as a regional hub: every VPC, VPN and Direct Connect gateway attaches once to the TGW, and routing between them is handled centrally in a hub-and-spoke model.
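
The scaling argument is easy to check with arithmetic. The sketch below compares the number of peering connections in a full mesh with the number of Transit Gateway attachments for the same VPC count; the function names are illustrative only.

```python
def full_mesh_peering_links(n_vpcs: int) -> int:
    """Peering is non-transitive, so a full mesh needs one link per VPC pair."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int) -> int:
    """With a hub-and-spoke Transit Gateway, each VPC attaches exactly once."""
    return n_vpcs


for n in (5, 10, 50):
    print(n, "VPCs:", full_mesh_peering_links(n), "peering links vs",
          tgw_attachments(n), "TGW attachments")
```

At 50 VPCs the mesh needs 1,225 peering connections, while the hub needs 50 attachments.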

TGW supports multiple route tables and attachment associations, letting you segment traffic — for example, keeping prod and dev isolated while both reach shared services. TGW peering connects gateways across Regions for a global backbone, and TGW integrates with Network Manager for visibility.
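
One way to reason about this segmentation is as two lookups: each attachment is associated with one route table, and each route table holds routes to the attachments propagated into it. The following is a toy model under those assumptions, with hypothetical attachment and route table names; it does not call any AWS API.

```python
# Which TGW route table each attachment uses for its outbound lookups.
ASSOCIATION = {
    "prod-vpc": "rt-prod",
    "dev-vpc": "rt-dev",
    "shared-vpc": "rt-shared",
}

# Which attachments each route table has routes to (via propagation).
PROPAGATIONS = {
    "rt-prod": {"shared-vpc"},
    "rt-dev": {"shared-vpc"},
    "rt-shared": {"prod-vpc", "dev-vpc"},
}


def has_route(src: str, dst: str) -> bool:
    """True if the route table associated with src contains a route to dst."""
    return dst in PROPAGATIONS[ASSOCIATION[src]]


def can_communicate(a: str, b: str) -> bool:
    """Two-way traffic needs a route in both directions."""
    return has_route(a, b) and has_route(b, a)


print(can_communicate("prod-vpc", "shared-vpc"))
print(can_communicate("dev-vpc", "shared-vpc"))
print(can_communicate("prod-vpc", "dev-vpc"))
```

Because neither `rt-prod` nor `rt-dev` carries a route to the other environment, prod and dev stay isolated while both still reach shared services.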

For a single highly-shared VPC, AWS RAM lets you share subnets; but for connecting many VPCs and on-premises networks, Transit Gateway is the professional answer. Combine it with a centralised inspection VPC (running firewalls or AWS Network Firewall) so all east-west and egress traffic passes through one controlled chokepoint.

7 Shared VPC, AWS RAM and PrivateLink

Three distinct mechanisms share connectivity across accounts, and professionals must not confuse them:

  • AWS Resource Access Manager (RAM) shares resources (subnets, TGWs, Route 53 Resolver rules, prefix lists) with other accounts in your organization. With shared VPC, a central networking account owns the VPC and subnets, and workload accounts launch resources into those shared subnets, centralising IP management and routing.
  • VPC peering connects two VPCs at the routing layer; good for a few, poor for many.
  • AWS PrivateLink exposes a specific service privately via an interface endpoint (an ENI in your VPC) without exposing the whole network. It is unidirectional, has no transitive routing, and avoids IP-range overlap problems entirely.

The rule of thumb: share the network with RAM/shared VPC or TGW; expose a single service across accounts or to customers with PrivateLink. PrivateLink keeps traffic on the AWS backbone and is ideal for SaaS providers and for consuming partner services without internet exposure.

8 Centralised DNS and egress

At scale you centralise both name resolution and internet egress. For DNS, Route 53 Resolver endpoints let on-premises and cloud resolve each other’s names: inbound endpoints let on-prem query AWS private zones; outbound endpoints with resolver rules forward queries to on-prem DNS. Private hosted zones can be associated with many VPCs, and RAM-shared resolver rules let one networking account define DNS policy for the whole org.
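
When a query matches more than one resolver rule, the rule with the most specific domain name is the one that applies. A minimal sketch of that matching, assuming rules are stored as a plain dictionary keyed by lowercase domain name (the domains and target labels below are hypothetical):

```python
def pick_rule(query_name, rules):
    """Return (domain, target) for the most specific matching rule, or None."""
    labels = query_name.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter parent domains.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return candidate, rules[candidate]
    return None


# Hypothetical rules shared from the central networking account.
RULES = {
    "corp.example.com": "forward to on-prem DNS",
    "aws.corp.example.com": "resolve in AWS",
}

print(pick_rule("db.corp.example.com", RULES))
print(pick_rule("api.aws.corp.example.com", RULES))
print(pick_rule("example.org", RULES))
```

A name with no matching rule returns `None` here; in a real deployment such queries fall through to the default resolution path.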

For egress, rather than a NAT gateway in every VPC, you route all outbound traffic through a central egress VPC attached to the Transit Gateway, where NAT gateways and outbound filtering (AWS Network Firewall, proxy fleets, or Route 53 DNS Firewall) live. This concentrates cost, logging and policy in one place and gives you a single chokepoint to inspect and control what your workloads talk to.

Centralising DNS and egress is a recurring SAP-C02 theme: it reduces duplicated infrastructure, enforces consistent security policy, and simplifies hybrid connectivity.

9 Disaster recovery and business continuity at scale

DR strategy is driven by two numbers: RPO (Recovery Point Objective — how much data loss is tolerable) and RTO (Recovery Time Objective — how long recovery may take). AWS frames four standard patterns, from cheapest/slowest to costliest/fastest:

  • Backup & Restore — restore from backups into a recovery Region. Highest RPO/RTO, lowest cost.
  • Pilot Light — a minimal core (e.g. replicated databases) always running; scale it up on disaster.
  • Warm Standby — a scaled-down but fully functional copy always running; scale up on failover.
  • Multi-site active/active — full capacity in multiple Regions serving live traffic; near-zero RPO/RTO, highest cost.

Lower RPO/RTO costs more, so match the pattern to the business value of the workload. AWS Elastic Disaster Recovery (DRS) continuously replicates servers for low-cost pilot-light/warm-standby DR. Always test failover regularly — an untested DR plan is a hypothesis, not a capability.
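
Matching a pattern to a workload can be framed as picking the cheapest option whose tested RPO and RTO meet the targets. The sketch below assumes you supply the achievable figures yourself from failover tests; the numbers shown are hypothetical placeholders, not AWS guarantees.

```python
from typing import NamedTuple, Optional, Sequence


class Option(NamedTuple):
    name: str
    rpo_minutes: float  # worst-case data loss observed in tests
    rto_minutes: float  # worst-case recovery time observed in tests


def cheapest_option(options: Sequence[Option],
                    rpo_target: float,
                    rto_target: float) -> Optional[Option]:
    """options must be ordered cheapest first; return the first that meets both targets."""
    for option in options:
        if option.rpo_minutes <= rpo_target and option.rto_minutes <= rto_target:
            return option
    return None


# Hypothetical measurements for one workload, ordered from cheapest to costliest.
MEASURED = [
    Option("Backup & Restore", 24 * 60, 24 * 60),
    Option("Pilot Light", 15, 120),
    Option("Warm Standby", 5, 30),
    Option("Multi-site active/active", 1, 1),
]

print(cheapest_option(MEASURED, rpo_target=60, rto_target=240))
print(cheapest_option(MEASURED, rpo_target=10, rto_target=60))
```

If nothing meets the targets the function returns `None`, which is a signal to revisit either the targets or the architecture.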

10 FinOps and cost governance

FinOps brings financial accountability to cloud spend. Organizations enables consolidated billing: one bill across all accounts, with volume discounts and Reserved Instance / Savings Plan benefits pooled and shared automatically. To attribute cost, you use cost allocation tags and Cost Categories (rules that group accounts and tags into business dimensions like team or product).

AWS Budgets sets thresholds with alerts (and optional automated actions); Cost Explorer visualises and forecasts spend; Cost and Usage Reports (CUR) give line-item detail for deep analysis. For commitment discounts, Compute Savings Plans apply flexibly across instance families, sizes and Regions (and also cover Fargate and Lambda), while EC2 Instance Savings Plans give a deeper discount in exchange for committing to one instance family in one Region. Reserved Instances are more rigid, but zonal RIs also reserve capacity in a specific Availability Zone. The strategy is to commit to your stable baseline usage and leave spiky, uncertain demand on-demand or Spot.
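
The "commit to the baseline" strategy can be illustrated with a small calculation. This is a sketch under simplifying assumptions: usage is expressed in abstract units per hour (real Savings Plans are purchased as a dollar-per-hour commitment), and the sample usage series is made up.

```python
def baseline(hourly_usage):
    """The level of usage that is present in every hour of the sample."""
    return min(hourly_usage)


def split_usage(hourly_usage, commitment):
    """Return (covered, on_demand, unused) unit-hours for a given hourly commitment."""
    covered = sum(min(u, commitment) for u in hourly_usage)
    on_demand = sum(max(u - commitment, 0) for u in hourly_usage)
    unused = sum(max(commitment - u, 0) for u in hourly_usage)
    return covered, on_demand, unused


# Hypothetical hourly usage with one spike.
usage = [10, 10, 12, 30, 11, 10]

commit = baseline(usage)
print("commit at", commit, "->", split_usage(usage, commit))
print("commit at 30 ->", split_usage(usage, 30))
```

Committing at the baseline leaves no commitment unused and pushes only the spike to on-demand or Spot; committing at the peak covers everything but pays for idle commitment in most hours.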

Good FinOps is continuous: tag everything, give teams visibility into their own spend, right-size regularly, and treat cost as a first-class architectural constraint — the Cost Optimization pillar of Well-Architected.

11 Migration strategies and the 7 Rs

Large migrations are planned around the 7 Rs, each a disposition for an application:

  • Rehost (“lift and shift”) — move as-is, often with AWS Application Migration Service (MGN).
  • Replatform (“lift, tinker and shift”) — minor optimisations, e.g. moving a database to RDS.
  • Repurchase — switch to a different product, often SaaS.
  • Refactor / Re-architect — redesign for cloud-native, e.g. to serverless or microservices.
  • Retire — decommission what is no longer needed.
  • Retain — keep on-premises for now (or revisit later).
  • Relocate — move infrastructure such as VMware workloads without conversion.

AWS Migration Hub tracks portfolio progress; Application Migration Service (MGN) does block-level rehosting; AWS Database Migration Service (DMS) migrates databases, with the Schema Conversion Tool (SCT) for heterogeneous engine changes. Start with discovery and a business case, group apps into migration waves, and pick the cheapest R that meets the goal — refactoring everything is rarely worth it.

12 Data lakes and analytics at scale

A data lake centralises structured and unstructured data, usually on Amazon S3, decoupling storage from the many engines that query it. AWS Lake Formation layers fine-grained, centrally-managed access control (table-, column- and row-level) and a shared AWS Glue Data Catalog over that S3 storage, so permissions are defined once and enforced across services.

Around the lake: AWS Glue provides serverless ETL and crawlers; Amazon Athena runs serverless SQL directly on S3; Amazon Redshift (and Redshift Spectrum) provides the data warehouse; Amazon EMR runs Spark/Hadoop; Amazon Kinesis and MSK handle streaming ingestion; and Amazon QuickSight visualises results. Cross-account sharing of catalog and data, governed by Lake Formation, lets a producer account publish curated datasets that consumer accounts query without copying — the foundation of a data mesh.

The professional design goal is one governed catalog, S3 as the durable source of truth, and the right purpose-built engine for each access pattern rather than forcing everything through a single database.

13 Multi-Region active-active design

True active-active across Regions means every Region serves live traffic simultaneously — the hardest distributed-systems problem in cloud architecture, because state must be consistent or reconciled across Regions. Traffic is steered by Route 53 (latency-based or geolocation routing with health checks) or AWS Global Accelerator (anycast IPs over the AWS backbone with faster failover).

For data, you choose replication carefully: DynamoDB Global Tables give multi-Region, multi-active read/write with last-writer-wins reconciliation; Aurora Global Database offers a primary Region with fast cross-Region read replicas and managed failover (effectively active-passive for writes); S3 uses Cross-Region Replication. The key trade-off is the CAP-theorem tension: synchronous consistency across Regions costs latency, so most designs accept eventual consistency or route writes to a home Region.
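
Last-writer-wins reconciliation is simple to state and worth seeing once, because it silently discards the older of two concurrent writes. The following is a conceptual sketch only, assuming each item carries a write timestamp; it is not how any AWS service implements replication internally.

```python
def lww_merge(replica_a, replica_b):
    """Merge two replicas of the form {key: (timestamp, value)}.

    For each key the write with the later timestamp wins; on a tie,
    replica_a's version is kept.
    """
    merged = dict(replica_a)
    for key, (timestamp, value) in replica_b.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Two Regions accepted a write to the same item at nearly the same time.
region_1 = {"cart-42": (100, ["book"])}
region_2 = {"cart-42": (105, ["laptop"]), "user-7": (90, "active")}

print(lww_merge(region_1, region_2))
```

After the merge, `cart-42` holds only the later write, so the earlier update is lost. Designs that cannot tolerate this typically route writes for a given item to a single home Region.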

Active-active maximises availability and serves users close to them, but it multiplies cost and complexity. Reserve it for workloads whose availability genuinely justifies running two or more full stacks and solving the data-consistency problem.

14 Zero Trust networking on AWS

Zero Trust abandons the idea of a trusted internal network: every request is authenticated and authorised on its own merits, regardless of where it originates. “Never trust, always verify.” On AWS this means identity-centric controls and strong segmentation rather than relying on a network perimeter.

Building blocks include IAM and IAM Identity Center with least-privilege, short-lived credentials; AWS Verified Access, which grants application access based on identity and device posture (via trust providers) without a VPN; VPC Lattice for application-layer service-to-service connectivity with auth policies; security groups referencing each other rather than CIDR ranges; and PrivateLink to expose single services privately. SigV4 request signing authenticates every API call.

The shift is from “inside the VPC therefore trusted” to “prove who you are and that you are allowed, for this specific resource, every time.” Network controls still matter for defence in depth, but identity becomes the primary control plane.

15 Secrets and key management at scale

Two services anchor cryptographic and secret management. AWS KMS manages encryption keys: customer managed keys have key policies controlling who can use them, support optional automatic rotation (yearly by default), and enable envelope encryption across S3, EBS, RDS and more. Multi-Region keys let you encrypt in one Region and decrypt in another, which is essential for multi-Region DR and replication. For the strictest custody, CloudHSM offers dedicated, single-tenant FIPS 140-2 Level 3 hardware.

AWS Secrets Manager stores credentials, API keys and database passwords with built-in automatic rotation via Lambda and fine-grained IAM access; Parameter Store (in Systems Manager) is a lighter-weight option for configuration and simple secrets. Cross-account access is granted through key policies and resource policies, and Secrets Manager can replicate secrets to other Regions.

The professional rules: no secrets in code or environment files, rotate automatically, scope key and secret access with least privilege, and use multi-Region keys when your data crosses Regions. KMS key policies — not just IAM — are the authoritative access control for a key.

16 SRE and operational excellence

The Operational Excellence pillar of Well-Architected is about running and improving workloads to deliver business value. Borrowing from SRE, you define SLIs (service level indicators — measured signals like latency or error rate), SLOs (objectives — the target, e.g. 99.9% success), and an error budget (the allowable shortfall that lets you balance reliability against velocity).
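
An error budget is just arithmetic on the SLO. The sketch below shows a time-based and a request-based view; the function names and the 30-day window are illustrative choices, and the code assumes an SLO below 100%.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    return (1 - slo_percent / 100) * window_days * 24 * 60


def budget_remaining(total_requests, failed_requests, slo_percent):
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = total_requests * (1 - slo_percent / 100)
    return 1 - failed_requests / allowed_failures


print(round(error_budget_minutes(99.9), 1), "minutes of budget per 30 days")
print(round(budget_remaining(1_000_000, 250, 99.9), 2), "of the budget remaining")
```

A 99.9% SLO over 30 days leaves roughly 43 minutes of budget. When the remaining fraction approaches zero, the usual response is to slow feature releases and spend the time on reliability work.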

On AWS this is realised with CloudWatch (metrics, logs, alarms, dashboards), X-Ray (distributed tracing), CloudWatch Synthetics (canaries), and Systems Manager for operational tasks and automation runbooks. You codify operations as infrastructure as code (CloudFormation, CDK, Terraform), automate responses to events with EventBridge and Lambda, and run game days and chaos experiments (e.g. AWS Fault Injection Service) to prove resilience before reality tests it.

The cultural core is blameless postmortems: when something breaks, you improve the system and the process, not punish people. Small, frequent, reversible changes plus strong observability and automation make operations a competitive advantage rather than a firefight.

17 Well-Architected reviews and Trusted Advisor

The AWS Well-Architected Framework organises good design around six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. A Well-Architected Review uses the Well-Architected Tool to walk a workload through the pillars’ questions, surfacing high- and medium-risk issues and tracking remediation over time — a structured way to find architectural debt before it bites.

AWS Trusted Advisor complements this with automated, real-time checks across cost optimisation, performance, security, fault tolerance, service limits and operational excellence — for example, flagging idle resources, open security groups, or quotas nearing their ceiling. Business and Enterprise Support unlock the full check set.

The professional posture is to run reviews regularly (not once at launch), feed Trusted Advisor and Security Hub findings into a continuous improvement backlog, and use the framework’s questions as a shared language between architects, security and finance. Well-Architected is a practice, not a certificate.

18 Service quotas and limits

Every AWS service enforces quotas (limits) — on resources per Region, API request rates, and more. Most are soft (raisable via a request) but some are hard (fixed). Ignoring them is a classic cause of failed scale-ups and failovers: you discover a limit at the worst possible moment, when traffic spikes or you try to launch a DR Region.

Service Quotas is the central console and API to view current quotas, request increases, and — importantly — create CloudWatch alarms when utilisation approaches a quota. At organization scale you can use quota request templates to apply increases automatically to newly vended accounts, so a fresh DR account is not stuck at default limits. Trusted Advisor also surfaces limits nearing their ceiling.
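
The alarm logic amounts to comparing utilisation against a threshold. A minimal sketch, assuming usage and quota values have already been collected into dictionaries; the quota names and the 80% threshold are hypothetical.

```python
def quotas_near_limit(usage, quotas, threshold=0.8):
    """Return the names of quotas whose utilisation is at or above the threshold."""
    return sorted(
        name for name, used in usage.items()
        if used / quotas[name] >= threshold
    )


# Hypothetical figures for one Region of one account.
usage = {"vpcs": 4, "elastic-ips": 2, "running-instances": 90}
quotas = {"vpcs": 5, "elastic-ips": 5, "running-instances": 100}

print(quotas_near_limit(usage, quotas))
```

Running the same check against every Region you might fail over to catches the case where the DR Region is still at default limits.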

Professional practice: treat quotas as part of capacity planning, raise them proactively in every Region you might fail over to, and alarm on approaching limits. Quotas are also a guardrail — a low quota on an expensive resource can prevent a runaway cost or a compromised credential spinning up thousands of instances.

19 Building a Cloud Center of Excellence

A Cloud Center of Excellence (CCoE) is the cross-functional team that drives an organization’s cloud adoption: setting standards, building reusable guardrails and platforms, and enabling product teams to move fast safely. It typically blends architecture, security, operations, networking and finance, and owns the landing zone, the paved-road templates, and the governance model.

The CCoE’s job is enablement, not gatekeeping: rather than approving every change manually, it encodes policy as guardrails (SCPs, Config rules, Control Tower controls, service catalogs of vetted patterns) so teams get a self-service “paved road” that is compliant by default. This is the structural counterpart to the AWS Cloud Adoption Framework (CAF), which organises capabilities across business, people, governance, platform, security and operations perspectives.

A healthy CCoE evolves from a centralised build team toward a platform team model: it ships an internal developer platform and self-service automation, measures adoption and friction, and continuously improves. The anti-pattern is a CCoE that becomes a bottleneck ticket queue — that recreates the slowness teams moved to cloud to escape.

20 The SAP-C02 exam domains

The AWS Certified Solutions Architect – Professional (SAP-C02) exam validates advanced design across the full breadth of AWS. Its content is organised into four domains, each weighted:

  • Domain 1 — Design Solutions for Organizational Complexity (~26%). Cross-account networking, multi-account strategy, centralised identity and governance.
  • Domain 2 — Design for New Solutions (~29%). Greenfield architectures meeting reliability, performance, security and cost requirements — the largest domain.
  • Domain 3 — Continuous Improvement for Existing Solutions (~25%). Operational excellence, reliability, performance, security and cost improvements to running workloads.
  • Domain 4 — Accelerate Workload Migration and Modernization (~20%). The 7 Rs, migration tooling and modernisation strategies.

The exam is scenario-heavy: long questions where several options “work” but only one best fits the stated constraints (cost, time, least operational overhead, least change). Read for the qualifier — “most cost-effective”, “least operational overhead”, “minimum downtime” — because that, not raw feasibility, decides the right answer. Mastering trade-offs across the pillars is what the Professional level tests.
