
Google Cloud Platform Professional

Design Google Cloud at scale: resource hierarchy and landing zones, org policy, security, DR, FinOps and migration (Professional Cloud Architect scope).

22 lessons, 66 quiz questions


1 The resource hierarchy at scale

Google Cloud organises every resource into a strict hierarchy: the Organization node sits at the top, then Folders, then Projects, then the resources (VMs, buckets, datasets) themselves.

The hierarchy is not just cosmetic — it is the backbone of policy. IAM policies and Organization Policy constraints set on a parent node are inherited by every descendant. A policy granted at the Organization level applies to all folders and projects beneath it.

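The Organization resource is created automatically when a Google Workspace or Cloud Identity account starts using Google Cloud (for example, when a user in that domain creates the first project or billing account); it is the root of trust.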
Folders model departments, teams, or environments and can nest several levels deep.
Projects are the fundamental unit of billing, quota, and API enablement — every resource lives in exactly one project.

A common pattern is folders for environment (prod, non-prod, shared) or for business unit, with projects grouped beneath. Designing this tree early is critical because IAM inheritance is additive and hard to unwind later.

# Inspect the hierarchy
gcloud organizations list
gcloud resource-manager folders list --organization=123456789012
gcloud projects list --filter='parent.id=2222222222'

# Create a folder and a project beneath it
gcloud resource-manager folders create \
  --display-name='prod' --organization=123456789012
gcloud projects create my-app-prod --folder=3333333333
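The additive behaviour of IAM inheritance can be sketched as a union over ancestors. This is a minimal Python model, not the IAM API: the node IDs reuse the placeholder IDs above, while the groups and role grants are hypothetical. Real IAM also evaluates conditions and deny policies, which this sketch ignores.

```python
"""Minimal model of additive IAM inheritance down the resource hierarchy (hypothetical data)."""

# Each node maps to its parent; the Organization root has no parent.
PARENT = {
    "organizations/123456789012": None,
    "folders/3333333333": "organizations/123456789012",  # the 'prod' folder
    "projects/my-app-prod": "folders/3333333333",
}

# Allow-policy bindings set directly on each node: role -> set of members.
BINDINGS = {
    "organizations/123456789012": {"roles/viewer": {"group:auditors@example.com"}},
    "folders/3333333333": {"roles/compute.admin": {"group:prod-ops@example.com"}},
    "projects/my-app-prod": {"roles/viewer": {"group:app-team@example.com"}},
}


def ancestors(node: str) -> list[str]:
    """Return the node followed by every ancestor up to the Organization."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def effective_bindings(node: str) -> dict[str, set[str]]:
    """Union the bindings of the node and all its ancestors: inheritance only ever adds."""
    result: dict[str, set[str]] = {}
    for n in ancestors(node):
        for role, members in BINDINGS.get(n, {}).items():
            result.setdefault(role, set()).update(members)
    return result


if __name__ == "__main__":
    for role, members in sorted(effective_bindings("projects/my-app-prod").items()):
        print(role, sorted(members))
```

The project ends up with the folder's `roles/compute.admin` grant; nothing set on the project itself can subtract it, which is why a poorly planned tree is hard to unwind.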

2 Landing zones and the Cloud Foundation blueprint

A landing zone is the pre-built, opinionated foundation you stand up before any workload arrives: the organization structure, IAM, networking, logging, and guardrails that every future project inherits.

Google publishes two Terraform codifications of its best practice: Cloud Foundation Fabric and the enterprise foundations blueprint. Both build the foundation in sequential stages, broadly:

A bootstrap stage that creates the seed project, Terraform state bucket, and the automation service accounts.
An organization stage that sets org policies, IAM, and centralised logging/monitoring sinks.
A networking stage with Shared VPC, DNS, and connectivity.
Project-factory tooling so application teams self-serve compliant projects.

The goal is repeatability and compliance by default: workloads land in an environment that is already secure, observable, and cost-attributed, rather than every team re-inventing the basics.
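A project factory is essentially validation plus templating. The sketch below assumes hypothetical conventions (an environment-to-folder map and three required labels) plus Google Cloud's project ID format of 6 to 30 lowercase letters, digits or hyphens, starting with a letter and not ending with a hyphen. In practice this logic usually lives in Terraform variables and validation rules rather than Python.

```python
import re

# Hypothetical conventions a platform team might encode in its project factory.
ENV_FOLDERS = {"prod": "folders/3333333333", "nonprod": "folders/4444444444"}
REQUIRED_LABELS = {"team", "cost-centre", "env"}
# 6-30 chars: a leading letter, then letters/digits/hyphens, no trailing hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def build_project_spec(project_id: str, labels: dict[str, str]) -> dict:
    """Validate a self-service request and return the spec the factory would provision."""
    if not PROJECT_ID_RE.match(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    missing = REQUIRED_LABELS - labels.keys()
    if missing:
        raise ValueError(f"missing required labels: {sorted(missing)}")
    env = labels["env"]
    if env not in ENV_FOLDERS:
        raise ValueError(f"unknown environment: {env!r}")
    # Placement under the environment folder means the project inherits its guardrails.
    return {"projectId": project_id, "parent": ENV_FOLDERS[env], "labels": dict(labels)}
```

Rejecting a request early, with a clear reason, is what lets application teams self-serve without a ticket queue.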

3 Organization Policy Service and guardrails

IAM answers "who can do what". The Organization Policy Service answers a different question: "what is allowed to exist at all". It applies constraints to the resource hierarchy that restrict configuration regardless of a principal’s IAM grants.

Constraints come in two flavours:

List constraints allow or deny specific values, for example constraints/compute.vmExternalIpAccess to deny external IPs on VM instances, or constraints/gcp.resourceLocations to pin resources to EU regions.
Boolean constraints simply turn a behaviour on or off, for example constraints/iam.disableServiceAccountKeyCreation.

Policies are inherited down the tree and can merge with or override the parent. Because they cannot be bypassed by a project owner, org policies are the primary preventative guardrail — they stop misconfiguration before it happens, complementing the detective controls in Security Command Center.

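# Deny external IPs on every VM in a folder. deny-external-ip.yaml:
#   name: folders/3333333333/policies/compute.vmExternalIpAccess
#   spec: {rules: [{denyAll: true}]}
gcloud org-policies set-policy deny-external-ip.yaml

# Restrict resource creation to EU locations. eu-only-policy.yaml:
#   name: folders/3333333333/policies/gcp.resourceLocations
#   spec: {rules: [{values: {allowedValues: ['in:eu-locations']}}]}
gcloud org-policies set-policy eu-only-policy.yaml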
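How a list constraint resolves down the tree can be sketched as a fold from the Organization to the resource. This is a deliberately simplified model under stated assumptions: a node that inherits merges its allowed and denied values with its parent's, a node that does not inherit replaces the parent's policy outright, and a denied value always wins. It uses plain region names, whereas the real gcp.resourceLocations constraint also accepts value groups such as in:eu-locations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ListPolicy:
    """Simplified list-constraint policy set on one node of the hierarchy."""
    allowed: set[str] = field(default_factory=set)  # empty means "no allow-list"
    denied: set[str] = field(default_factory=set)
    deny_all: bool = False
    inherit_from_parent: bool = True


def effective(chain: list[Optional[ListPolicy]]) -> ListPolicy:
    """Fold policies from the Organization (first) down to the resource (last)."""
    eff = ListPolicy()
    for p in chain:
        if p is None:                  # no policy on this node: the parent's applies unchanged
            continue
        if p.inherit_from_parent:      # merge with whatever the ancestors set
            eff = ListPolicy(eff.allowed | p.allowed, eff.denied | p.denied,
                             eff.deny_all or p.deny_all)
        else:                          # override: discard the ancestors' policy
            eff = ListPolicy(set(p.allowed), set(p.denied), p.deny_all)
    return eff


def is_allowed(value: str, eff: ListPolicy) -> bool:
    """Denials win; otherwise the value must be on the allow-list, if one exists."""
    if eff.deny_all or value in eff.denied:
        return False
    return not eff.allowed or value in eff.allowed
```

For example, an Organization allowing `europe-west1` and `europe-west4`, with a folder that inherits and denies `europe-west4`, leaves only `europe-west1` usable in projects below it.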

4 Centralised identity: Cloud Identity, federation and SSO

At enterprise scale, identity should live in one authoritative place. Cloud Identity (or Google Workspace) provides the directory of users and groups that Google Cloud IAM trusts. Without it there is no Organization node.

Most enterprises already run an external identity provider — Microsoft Entra ID (Azure AD), Okta, or Active Directory. Rather than duplicate accounts, you federate:

SSO via SAML/OIDC delegates authentication to the external IdP, so users sign in with their corporate credentials.
Directory provisioning (Google Cloud Directory Sync or SCIM) keeps users and groups synchronised from on-prem AD or Entra ID.
Granting IAM roles to groups rather than individuals keeps access manageable and auditable.

For workloads, Workload Identity Federation lets external systems (CI runners, other clouds) impersonate service accounts using short-lived tokens — eliminating long-lived service account keys.
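With Workload Identity Federation, the external system first swaps its own token (for example, a CI runner's OIDC JWT) for a federated Google token at the Security Token Service. The sketch below only builds that request: the project number, pool and provider IDs are hypothetical, and the field names follow the RFC 8693 token-exchange form used by Google's auth libraries. In practice you would let the client libraries do this from a credential configuration file rather than hand-rolling it.

```python
def sts_exchange_body(project_number: str, pool_id: str, provider_id: str,
                      external_jwt: str) -> dict[str, str]:
    """Form fields for swapping an external OIDC JWT for a federated Google token.

    The request would be POSTed (form-encoded) to https://sts.googleapis.com/v1/token;
    this helper only builds it and makes no network call.
    """
    # The audience identifies the workload identity pool provider that trusts the issuer.
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": external_jwt,
    }
```

If the pool is set up for impersonation, the federated token is then exchanged for a short-lived service account access token through the IAM Credentials API; no service account key exists at any point.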

5 Security and compliance: Security Command Center

Security Command Center (SCC) is Google Cloud’s centralised security and risk platform. It aggregates findings across the whole organization so security teams have one place to see exposure.

Its core services include:

Security Health Analytics — continuous misconfiguration scanning (public buckets, open firewall rules, missing encryption).
Event Threat Detection — log-based detection of malware, crypto-mining, brute-force and anomalous IAM grants.
Web Security Scanner and, in the Premium/Enterprise tiers, attack path simulation and posture management.

SCC maps findings to compliance frameworks (CIS, PCI DSS, NIST, ISO) so you can report on posture. It is a detective control — it surfaces what already exists — and pairs with the preventative org policies and the responsive automation you build on top of its findings.
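Responsive automation on top of SCC usually starts with triage. A minimal sketch, assuming findings arrive as dictionaries shaped like the SCC Finding resource (category, severity, state, resourceName), for example from a Pub/Sub notification; the helper names and severity cutoff are hypothetical choices.

```python
from collections import Counter

SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def triage(findings: list[dict], min_severity: str = "HIGH") -> list[dict]:
    """Return ACTIVE findings at or above min_severity, most severe first."""
    cutoff = SEVERITY_RANK[min_severity]
    urgent = [
        f for f in findings
        if f.get("state") == "ACTIVE"
        # Unknown or missing severities rank below LOW and are dropped.
        and SEVERITY_RANK.get(f.get("severity"), len(SEVERITY_RANK)) <= cutoff
    ]
    return sorted(urgent, key=lambda f: SEVERITY_RANK[f["severity"]])


def by_category(findings: list[dict]) -> Counter:
    """Count findings per category, e.g. to spot a systemic misconfiguration."""
    return Counter(f["category"] for f in findings)
```

Inactive findings are filtered out so that remediated issues do not page anyone twice.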

6 VPC Service Controls, Assured Workloads and frameworks

IAM controls identity; VPC Service Controls (VPC-SC) control context. They draw a service perimeter around managed services like BigQuery and Cloud Storage so data cannot be exfiltrated even by a legitimately authenticated identity.

A perimeter blocks API access that crosses its boundary unless explicitly allowed by an ingress/egress rule or an access level (based on IP, device, or identity). This stops a stolen credential from copying a dataset to an attacker-owned project.
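The perimeter logic can be sketched as a check on which side of the boundary each end of a request sits. This is a hypothetical, heavily simplified model: the Perimeter class, its fields and the project names are illustrative, the access level is reduced to an IP allow-list, and real ingress/egress rules match on identities, sources, services and methods together.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Perimeter:
    """Toy service perimeter (hypothetical model, not the VPC-SC API)."""
    projects: set[str]
    access_level_cidrs: list[str] = field(default_factory=list)  # e.g. corporate egress ranges
    ingress_identities: set[str] = field(default_factory=set)
    egress_identities: set[str] = field(default_factory=set)

    def _ip_ok(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in ipaddress.ip_network(c) for c in self.access_level_cidrs)

    def allows(self, identity: str, source_ip: str, from_project: str, to_project: str) -> bool:
        src_in, dst_in = from_project in self.projects, to_project in self.projects
        if src_in == dst_in:
            return True  # both inside (or both outside): the boundary is not crossed
        if dst_in:       # ingress: crossing into the perimeter
            return identity in self.ingress_identities or self._ip_ok(source_ip)
        return identity in self.egress_identities  # egress: leaving the perimeter
```

For example, `Perimeter(projects={"prj-data"}).allows("user:alice@example.com", "198.51.100.7", "prj-data", "attacker-prj")` returns `False`: valid credentials alone cannot carry data out.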

Assured Workloads goes further for regulated estates: it enforces data residency, personnel controls, and the technical settings required by compliance regimes such as FedRAMP, CJIS, IL4/IL5, and EU sovereignty, automatically provisioning a folder whose resources are constrained to meet the chosen framework.

7 Network topology at scale: Shared VPC

Giving every project its own isolated VPC quickly becomes unmanageable. Shared VPC solves this by letting one host project own the network, while many service projects attach to it and deploy resources into its subnets.

This centralises network administration:

A dedicated network team manages subnets, firewall rules, and routes in the host project.
Application teams in service projects create VMs and load balancers but cannot alter the network fabric.
Resources across service projects share internal IP space and private routing without VPC peering between every pair.

IAM separation is the key benefit: the Compute Network Admin and Compute Network User roles cleanly split who designs the network from who consumes it, a textbook separation of duties for a platform team.

# Designate a host project and attach a service project
gcloud compute shared-vpc enable HOST_PROJECT
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT \
  --host-project HOST_PROJECT
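Because every service project draws from the host project's subnets, the network team has to carve non-overlapping ranges up front. A small sketch using Python's standard ipaddress module; the supernet, regions and prefix length are hypothetical. It also accounts for Google Cloud reserving four addresses in every primary IPv4 subnet range.

```python
import ipaddress

RESERVED_PER_SUBNET = 4  # Google Cloud reserves 4 addresses in each primary IPv4 range


def plan_subnets(supernet: str, regions: list[str], prefix: int) -> dict[str, str]:
    """Assign each region the next free /prefix block from the supernet, without overlap."""
    blocks = ipaddress.ip_network(supernet).subnets(new_prefix=prefix)
    plan = {}
    for region in regions:
        block = next(blocks, None)
        if block is None:
            raise ValueError(f"{supernet} has no room for another /{prefix}")
        plan[region] = str(block)
    return plan


def usable_hosts(prefix: int) -> int:
    """Usable IPv4 addresses in a primary subnet range of the given prefix length."""
    return 2 ** (32 - prefix) - RESERVED_PER_SUBNET
```

`plan_subnets("10.10.0.0/16", ["europe-west1", "us-central1"], 20)` yields `10.10.0.0/20` and `10.10.16.0/20`, each with 4,092 usable addresses.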

8 Hub-and-spoke and Network Connectivity Center

Large estates rarely use a single flat network. The hub-and-spoke topology places shared services (egress, inspection, DNS, hybrid links) in a central hub VPC, with workload spoke VPCs connecting to it.

Connecting the spokes is the design challenge. Options include:

VPC Network Peering — non-transitive, so spokes cannot reach each other through the hub.
Network Connectivity Center (NCC) — a hub resource that makes spokes (VPCs, VPNs, Interconnects, routers) transitively reachable through a managed control plane.

NCC is the modern answer for a true any-to-any hub: it lets on-prem sites and multiple cloud VPCs exchange routes through one logical hub, replacing brittle meshes of peerings and giving a single place to manage hybrid and inter-VPC connectivity.
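The difference between the two options comes down to reachability. A toy model with hypothetical network names: peering only connects direct pairs, spokes on one NCC hub (simplified here to VPC spokes that all exchange routes) reach one another, and a full peering mesh grows quadratically.

```python
def peering_reachable(peerings: set[tuple[str, str]], vpc: str) -> set[str]:
    """VPC Network Peering is non-transitive: only directly peered networks are reachable."""
    return {b for a, b in peerings if a == vpc} | {a for a, b in peerings if b == vpc}


def ncc_reachable(hub_spokes: set[str], vpc: str) -> set[str]:
    """Simplified NCC hub: every attached spoke exchanges routes with every other spoke."""
    return hub_spokes - {vpc} if vpc in hub_spokes else set()


def mesh_peerings(n_vpcs: int) -> int:
    """Peerings needed for any-to-any reachability without a hub: n(n-1)/2."""
    return n_vpcs * (n_vpcs - 1) // 2
```

With peerings hub to spoke-a and hub to spoke-b, spoke-a reaches only the hub; ten VPCs would need 45 peerings to reach each other directly.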

9 Centralised egress and Cloud NAT

For security and cost control, enterprises funnel outbound internet traffic through a small number of centralised egress points rather than letting every VM have its own public IP.

Cloud NAT is a regional, managed service that gives private instances outbound connectivity without external IPs. Because it is software-defined, there is no NAT gateway VM to scale or patch.

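A common pattern centralises egress in the hub VPC (or the Shared VPC host project). Because a Cloud NAT gateway only translates traffic that originates in its own VPC network, spokes usually send their default route to firewall appliances in the hub (for example, behind an internal passthrough Network Load Balancer), which then egress through the hub's Cloud NAT, so that: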

Outbound traffic leaves from a known, allow-listed set of NAT IP addresses.
A next-generation firewall or Secure Web Proxy can inspect and filter egress in one place.
VMs stay private (no external IPs), shrinking the attack surface and satisfying org policy.

This centralisation simplifies third-party allow-listing and gives the security team one chokepoint for monitoring data leaving the cloud.
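Sizing the allow-listed NAT IP pool is simple arithmetic. A sketch assuming static port allocation and the documented Cloud NAT figures of 64,512 usable source ports per NAT IP (per protocol) and a default minimum of 64 ports per VM; verify both against current documentation before relying on them.

```python
import math

PORTS_PER_NAT_IP = 64_512  # usable source ports per NAT IP per protocol (assumed documented value)


def nat_ips_needed(vm_count: int, min_ports_per_vm: int = 64) -> int:
    """NAT IPs needed when every VM reserves min_ports_per_vm ports (static allocation)."""
    vms_per_ip = PORTS_PER_NAT_IP // min_ports_per_vm
    return math.ceil(vm_count / vms_per_ip)
```

At the defaults, one NAT IP covers 1,008 VMs; raising the minimum to 1,024 ports per VM (for workloads with many concurrent connections) drops that to 63, so 2,000 such VMs need 32 addresses on a partner's allow-list.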

10 Centralised DNS architecture

DNS is a shared service that benefits enormously from central design. Cloud DNS provides authoritative, globally available resolution with both public and private zones.

For a multi-project, hybrid estate the key building blocks are:

Private zones bound to a VPC for internal names, typically hosted in the Shared VPC host project so all service projects resolve them.
DNS peering so a hub VPC can resolve names defined in spoke VPCs and vice versa.
Outbound forwarding zones so cloud workloads can resolve on-prem names, and an inbound server policy so on-prem resolvers can query cloud names, both across Interconnect/VPN.

Centralising DNS in the network hub gives one authoritative resolution path, avoids inconsistent split-horizon answers, and lets the platform team govern naming across the whole organization.
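Resolution in a centralised design boils down to "the most specific matching zone wins". A simplified sketch of that routing decision; the zone names and on-prem resolver addresses are hypothetical, and the real Cloud DNS resolution order also involves server policies and internal VM names.

```python
# Hypothetical zones as a hub VPC might see them; the value says who answers.
ZONES = {
    "gcp.example.com.": "private zone (hub project)",
    "corp.example.com.": "forward to on-prem resolvers 10.0.0.53, 10.0.1.53",
    "db.corp.example.com.": "private zone (hub project)",
}


def resolve_route(name: str) -> str:
    """Pick the most specific (longest) matching zone; fall back to public resolution."""
    fqdn = name if name.endswith(".") else name + "."
    # Match whole labels only, so 'notgcp.example.com.' does not match 'gcp.example.com.'.
    matches = [z for z in ZONES if fqdn == z or fqdn.endswith("." + z)]
    if not matches:
        return "public internet resolution"
    return ZONES[max(matches, key=len)]
```

Note how `db.corp.example.com` stays in the cloud even though its parent domain forwards on-prem: the more specific zone takes precedence.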

11 Disaster recovery and business continuity strategy

A DR strategy starts with two business-defined targets: RTO (Recovery Time Objective — how long you can be down) and RPO (Recovery Point Objective — how much data you can afford to lose). Architecture follows from those numbers.

Common patterns, from cheapest/slowest to most expensive/fastest:

Backup and restore: data is backed up off-site and you rebuild on failure. Lowest cost, highest RTO.
Cold standby: infrastructure is defined as code and spun up only when needed, then restored from backed-up or replicated data.
Warm standby: a scaled-down but working copy runs continuously and scales up on failover. Pilot light is a leaner variant that keeps only the core, typically the replicated data tier, running.
Hot standby / active-active: full capacity runs in a second region, giving near-zero RTO and RPO at the highest cost.

The architect’s job is to match the pattern to each workload’s RTO/RPO and budget, and to test failover regularly — an untested DR plan is an assumption, not a guarantee.
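Matching a pattern to a workload can be expressed as "the cheapest pattern whose achievable RTO and RPO both meet the targets". The capability figures below are hypothetical placeholders; real numbers depend on data volume, replication method and how much of failover is automated.

```python
# (pattern, achievable RTO in minutes, achievable RPO in minutes, relative cost)
# Hypothetical figures for illustration only.
PATTERNS = [
    ("backup and restore", 24 * 60, 24 * 60, 1),
    ("cold standby", 8 * 60, 60, 2),
    ("warm standby", 60, 5, 4),
    ("hot standby / active-active", 1, 0, 8),
]


def cheapest_pattern(rto_min: float, rpo_min: float) -> str:
    """Return the lowest-cost pattern whose RTO and RPO both meet the targets."""
    viable = [p for p in PATTERNS if p[1] <= rto_min and p[2] <= rpo_min]
    if not viable:
        raise ValueError("no pattern meets these targets")
    return min(viable, key=lambda p: p[3])[0]
```

For a workload with a 2-hour RTO and a 15-minute RPO, the sketch picks warm standby; relaxing the RPO to an hour and the RTO to 12 hours makes cold standby sufficient.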

12 FinOps and billing governance

FinOps brings financial accountability to cloud spend. The foundation is the billing account, which pays for one or more projects and is the boundary for invoicing and committed-use discounts.

Governance building blocks:

Budgets and alerts notify owners (or trigger Pub/Sub automation) as spend approaches thresholds — they alert, they do not cap.
Labels on resources enable cost allocation and showback/chargeback by team, environment, or cost centre.
BigQuery billing export streams detailed usage and cost data into a dataset for analysis in Looker Studio and SQL — the single source of truth for FinOps reporting.

On the optimisation side, committed-use discounts (CUDs) trade a 1- or 3-year spend or resource commitment for steep savings, while sustained-use discounts apply automatically to eligible Compute Engine usage. A mature FinOps practice continuously matches commitments to the steady-state baseline and uses on-demand only for the variable peak.

# Create a budget with alert thresholds
gcloud billing budgets create \
  --billing-account=0X0X0X-0X0X0X-0X0X0X \
  --display-name='prod-monthly' \
  --budget-amount=50000 \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9 \
  --threshold-rule=percent=1.0
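Showback from the BigQuery billing export is mostly a group-by over labels. A sketch that works on rows shaped like export records (a cost value and a labels list of key/value pairs) and also mirrors the 50,000 budget and 50/90/100% thresholds configured above; the team label key is a hypothetical convention, and in production the grouping would be a SQL query over the export dataset.

```python
from collections import defaultdict


def showback(rows: list[dict], label_key: str = "team") -> dict[str, float]:
    """Total cost per value of label_key; rows without the label go to '(unlabelled)'."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        labels = {item["key"]: item["value"] for item in row.get("labels", [])}
        totals[labels.get(label_key, "(unlabelled)")] += row["cost"]
    return {team: round(cost, 2) for team, cost in totals.items()}


def crossed_thresholds(spend: float, budget: float = 50_000,
                       thresholds: tuple[float, ...] = (0.5, 0.9, 1.0)) -> list[float]:
    """Threshold fractions reached so far; budgets only alert, they never cap spend."""
    return [t for t in thresholds if spend >= t * budget]
```

A growing "(unlabelled)" bucket is itself a governance signal: it measures how much spend cannot be attributed to anyone.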

13 Migration strategy: Migrate to VMs and the 7 Rs

Enterprise migrations are categorised by the 7 Rs, each a different disposition for a workload:

Rehost ("lift and shift") — move VMs as-is, fastest path.
Replatform — minor optimisation, e.g. move a self-managed DB to Cloud SQL.
Refactor / Re-architect — redesign for cloud-native (containers, serverless).
Repurchase — switch to a SaaS equivalent.
Retire — decommission what is no longer needed.
Retain — leave it where it is, for now.
Relocate — move at the hypervisor level (e.g. VMware Engine).

Migrate to Virtual Machines is Google’s tool for rehosting: it replicates running on-prem or other-cloud VMs into Compute Engine with minimal downtime and supports test-clone migrations before the final cutover. A typical programme runs in four phases: assess (discovery and TCO), plan, migrate in waves, and optimise.
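Wave planning is often driven by dependencies: a workload moves only after everything it depends on. One common heuristic, sketched below with hypothetical workload names, layers workloads topologically and reports cycles, which usually signal components that must move together.

```python
def plan_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group workloads into waves so each moves after all of its dependencies."""
    remaining = {app: set(deps) for app, deps in depends_on.items()}
    waves = []
    while remaining:
        ready = sorted(app for app, deps in remaining.items() if not deps)
        if not ready:
            # Every remaining workload waits on another: a cycle (or an unknown dependency).
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for app in ready:
            del remaining[app]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves
```

`plan_waves({"db": set(), "api": {"db"}, "web": {"api"}, "batch": {"db"}})` returns `[["db"], ["api", "batch"], ["web"]]`.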

14 The data platform at scale

Google Cloud’s analytics stack lets an architect pick the right engine for each job rather than forcing everything through one system.

BigQuery: serverless, petabyte-scale data warehouse. Storage and compute are decoupled; you query with SQL and pay either per byte scanned (on-demand) or for reserved slot capacity (editions pricing). It is the centre of gravity for most analytics.
Dataflow: fully managed Apache Beam for unified batch and streaming pipelines; ideal for ETL and real-time event processing.
Dataproc: managed Spark and Hadoop, the right fit when you are migrating existing Spark/Hadoop jobs or need that ecosystem.

A modern pattern lands raw data in Cloud Storage or BigQuery, transforms with Dataflow or Dataproc, and serves analytics from BigQuery — with Dataplex adding governance and a data mesh across the lake and warehouse.
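On-demand BigQuery cost scales with bytes scanned, which is why partitioning and clustering matter. A minimal estimator; the per-TiB rate is a parameter you must look up for your region, and the rate in the note below is an assumption, not a quote. The bytes figure can come from a dry run, for example `bigquery.QueryJobConfig(dry_run=True)` in the Python client, which reports `total_bytes_processed`.

```python
TIB = 2 ** 40  # on-demand pricing is quoted per tebibyte


def on_demand_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Estimated on-demand cost: bytes scanned converted to TiB, times the per-TiB rate."""
    return bytes_scanned / TIB * usd_per_tib
```

At an assumed rate of US$6.25 per TiB, a query scanning 4 TiB costs about US$25, while the same query pruned to a single 0.5 TiB partition costs about US$3.13.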

15 Multi-region active-active architecture

The highest-availability designs run a workload active-active across two or more regions so that no single region is a point of failure and users are served from the nearest one.

Key ingredients on Google Cloud:

Global external Application Load Balancer with a single anycast IP routes users to the closest healthy backend and fails traffic away from an unhealthy region automatically.
Stateless compute (managed instance groups or GKE) deployed in each region behind that load balancer.
A multi-region data layer — Spanner (globally consistent), or multi-region Cloud Storage and BigQuery — handles the hard part: replicating state without a single writable region.

The genuine challenge is data consistency. Truly active-active writes demand a database built for it (Spanner’s synchronous replication), otherwise you accept eventual consistency or designate a primary write region. The compute tier is the easy part; the state tier defines the design.
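The load balancer's "nearest healthy backend" behaviour can be approximated as a latency-ordered choice filtered by health. This toy model uses hypothetical user locations, regions and latencies; the real global external Application Load Balancer also weighs backend capacity and balancing mode.

```python
from typing import Optional


def pick_backend(user_location: str,
                 latency_ms: dict[str, dict[str, float]],
                 healthy: dict[str, bool]) -> Optional[str]:
    """Choose the lowest-latency healthy region for a user; None if every region is down."""
    candidates = {
        region: ms
        for region, ms in latency_ms[user_location].items()
        if healthy.get(region, False)  # regions with no health signal are treated as down
    }
    return min(candidates, key=candidates.get) if candidates else None
```

When the nearest region fails its health checks, the same call simply returns the next-closest region, which is the automatic failover the compute tier gets for free; the data tier still has to accept writes there.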

16 Zero Trust with BeyondCorp

Traditional security trusted anything inside the corporate network. Zero Trust assumes the network is hostile and verifies every request based on identity and device context. Google’s implementation is BeyondCorp Enterprise (since renamed Chrome Enterprise Premium).

Its core is Identity-Aware Proxy (IAP), which sits in front of applications and admits a request only after checking:

Who the user is (authenticated identity and IAM grant on the resource).
Context — device posture, location, and IP, evaluated through Access Context Manager access levels.

The result is access to internal apps, and SSH/RDP to VMs, without a VPN: there is no implicit network trust and little room for lateral movement. Combined with VPC Service Controls and context-aware access, BeyondCorp moves the security boundary from the network perimeter to identity and device context, evaluated on every request.
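IAP's admission rule is a conjunction: an IAM grant and a satisfied access level. A simplified sketch with a hypothetical access level (managed device or corporate IP range); roles/iap.httpsResourceAccessor is the real IAM role for web apps behind IAP, but the membership set and CIDR here are illustrative.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    user: str
    device_managed: bool
    source_ip: str


def iap_admit(ctx: Context, iap_users: set[str], corp_cidrs: list[str]) -> bool:
    """Admit only if the identity holds the IAP grant AND the access level is met."""
    has_grant = ctx.user in iap_users  # e.g. holders of roles/iap.httpsResourceAccessor
    addr = ipaddress.ip_address(ctx.source_ip)
    in_corp = any(addr in ipaddress.ip_network(c) for c in corp_cidrs)
    meets_access_level = ctx.device_managed or in_corp  # hypothetical access-level definition
    return has_grant and meets_access_level
```

Neither condition is sufficient alone: a valid user on an unmanaged laptop outside the corporate range is refused, and so is an unauthorised user on a managed device.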

17 Secrets and key management at scale

Sensitive material splits into two concerns: secrets (API keys, passwords, certificates) and encryption keys.

Secret Manager stores secrets with versioning, IAM-controlled access, audit logging, and rotation schedules (which publish Pub/Sub notifications telling you when to rotate). Applications fetch secrets at runtime rather than baking them into images or config, and Workload Identity grants access without a static credential.

For encryption, Google encrypts all data at rest by default using Google-managed keys. For more control, Cloud KMS manages keys:

CMEK (customer-managed encryption keys) — you own the key in KMS and can disable or rotate it, gating access to the data.
CMEK with HSM or Cloud EKM (external key manager) for the highest assurance, keeping key material in a hardware module or outside Google entirely.

At scale, a central security team owns a KMS key hierarchy and Secret Manager policy, while application teams consume keys and secrets through IAM grants — never long-lived files on disk.
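Fetching a secret at runtime with the google-cloud-secret-manager Python client looks roughly like this. It assumes the library is installed and that the workload's ambient credentials (for example, Workload Identity on GKE) hold roles/secretmanager.secretAccessor on the secret; the project and secret IDs are placeholders.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Resource name of a secret version, e.g. projects/p/secrets/s/versions/latest."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def fetch_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Read a secret at runtime using ambient credentials: no key file on disk."""
    # Imported lazily so the helper above works without the client library installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

Pinning a numbered version instead of `latest` makes rollouts reproducible; `latest` picks up rotations automatically.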

18 SRE practices and the operations suite

Site Reliability Engineering (SRE) is Google’s discipline for running services. Reliability is defined quantitatively, not by gut feel:

An SLI (Service Level Indicator) measures something users care about — e.g. request latency or success rate.
An SLO (Service Level Objective) is the target for an SLI — e.g. 99.9% of requests succeed.
The error budget is 100% minus the SLO: the allowable amount of unreliability. When it is exhausted, teams freeze risky launches and prioritise reliability.

The Google Cloud Observability suite (formerly Operations / Stackdriver) supplies the telemetry: Cloud Monitoring (metrics, dashboards, SLO monitoring, alerting), Cloud Logging (centralised logs and sinks), Cloud Trace, and Cloud Profiler. SRE also limits operational pain by capping toil — repetitive manual work — and automating it away.
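Error budgets are plain arithmetic. A sketch of both the time-based view and the request-based view for a given SLO; the 30-day window and the traffic figures in the note are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed unavailability in the window: (1 - SLO) x window length, in minutes."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the request-based error budget still unspent (negative = overspent)."""
    allowed_bad = (1 - slo) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad
```

A 99.9% SLO over 30 days allows about 43.2 minutes of unavailability; with 999,500 good requests out of 1,000,000, about half of the budget remains, and a value at or below zero is the signal to freeze risky launches.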

19 Architecture Framework reviews

The Google Cloud Architecture Framework is the canonical set of best practices an architect reviews a design against. It is organised into pillars:

Operational excellence — running, monitoring and improving workloads.
Security, privacy and compliance — protecting data and meeting regulation.
Reliability — designing for availability and recovery (SLOs, DR).
Cost optimisation — maximising business value per dollar (FinOps).
Performance optimisation — right-sizing and scaling efficiently.

A well-architected review walks a design through each pillar, surfacing trade-offs explicitly — for instance, a multi-region active-active design improves reliability but raises cost. The framework does not dictate one answer; it forces the architect to make trade-offs consciously and document why, aligned to the business priorities.

20 Quota management at scale

Every Google Cloud API enforces quotas — limits on resource usage. At enterprise scale, quota is both a safety mechanism and an operational risk, so architects manage it proactively.

Two categories matter:

Rate quotas limit requests per unit time (e.g. API calls per minute) to protect shared services.
Allocation quotas limit how much of a resource you can hold (e.g. CPUs per region, in-use external IPs).

Most quotas apply per project, and many are further scoped per region (some are zonal or global), which is one reason large estates spread workloads across many projects. The platform team monitors usage against limits (Cloud Monitoring exposes quota metrics), requests increases ahead of growth and launches, and treats hitting a quota as a preventable incident. For very large deployments, capacity is reserved and quota increases are negotiated with Google in advance: a quota surprise during a launch is an avoidable outage.
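Proactive quota management reduces to watching usage-to-limit ratios and projecting growth. A sketch with hypothetical figures keyed by Compute Engine regional metric names such as CPUS and IN_USE_ADDRESSES; in practice the inputs would come from the quota metrics in Cloud Monitoring.

```python
import math


def quota_alerts(usage: dict[str, tuple[float, float]], warn_at: float = 0.8) -> list[str]:
    """Metrics whose usage/limit ratio is at or above warn_at, most constrained first."""
    ratios = {name: used / limit for name, (used, limit) in usage.items() if limit > 0}
    flagged = [name for name, ratio in ratios.items() if ratio >= warn_at]
    return sorted(flagged, key=ratios.get, reverse=True)


def days_to_exhaustion(used: float, limit: float, growth_per_day: float) -> float:
    """Days until usage reaches the limit at constant daily growth; inf if not growing."""
    if growth_per_day <= 0:
        return math.inf
    return max(limit - used, 0) / growth_per_day
```

If 900 of 1,000 regional CPUs are in use and usage grows by 25 a day, the limit is four days away, which is less than a typical increase request may take to be approved.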

21 Building a cloud platform team

Technology alone does not deliver a well-run cloud; the operating model does. Mature organizations stand up a cloud platform (or Cloud Center of Excellence) team that owns the foundation as a product.

This team applies platform engineering: it builds paved-road, self-service capabilities so application teams move fast within guardrails, rather than the platform team becoming a ticket-driven bottleneck.

It owns the landing zone, org policies, Shared VPC, IAM model, and the project factory.
It provides golden paths — vetted Terraform modules, CI/CD templates, and reference architectures.
It encodes governance as code so compliance is automatic, not manual gate-keeping.

The cultural shift is from central control to enablement with guardrails: the platform team is judged by how quickly and safely application teams can ship, not by how many requests it personally fulfils.

22 The Professional Cloud Architect exam and case studies

The Professional Cloud Architect (PCA) certification validates designing, planning, managing and operating Google Cloud solutions aligned to business goals. The exam is scenario-heavy and weighs business judgement as much as technical recall.

Its domains broadly cover:

Designing and planning a cloud solution architecture.
Managing and provisioning the infrastructure.
Designing for security and compliance.
Analysing and optimising technical and business processes.
Managing implementations and ensuring solution and operations reliability.

A distinctive feature is the official case studies (such as EHR Healthcare, Helicopter Racing League, Mountkirk Games, TerramEarth). Each describes a fictional company’s existing setup, business and technical requirements, and an executive statement. The skill being tested is reading the requirements carefully and choosing the option that best satisfies the business intent — not merely the most technically impressive one.
