Production Deployment Guide
This page is the canonical operator runbook for bringing a SimNet challenge site online in production. It walks through environment prerequisites, the secret material required at build time, the build artifacts, three end-to-end deployment paths (dev preview, Docker image, and Cloudflare Pages), how to confirm a fresh deployment is healthy, and how to roll back when something goes wrong.
The behaviors documented here are normative — they are mirrored by the production-deployment-guide capability spec at openspec/specs/production-deployment-guide/spec.md. Wave 4 (launch-demo-site-wave-4) will add deploy automation that cross-checks against the same spec.
This guide intentionally defers to existing capability specs for anything that is already normatively defined elsewhere. Read it alongside:
- build-runtime-compatibility: supported Node / pnpm / Rust / wasm-pack and the canonical build chain (human-readable mirror: docs/dev/build-runtime-compatibility.md)
- flag-secrets-management: how SIMNET_MASTER_KEY derives every challenge flag at build time
- build-time-key-injection: how the master key flows into the build pipeline and per-challenge salts
- quality-assurance-gates: the smoke contract a healthy site must satisfy
Environment Prerequisites
A production deployer needs the same toolchain the CI workflow uses. The version range and install steps are not redefined here — the build-runtime-compatibility capability is the single source of truth.
Prerequisite categories required for a production build:
- Node.js: see build-runtime-compatibility for the supported major range and the .nvmrc / engines.node enforcement details. Both supported majors must be able to complete the build chain end-to-end.
- pnpm: the project's package manager; the canonical command chain is invoked through pnpm scripts (see build-runtime-compatibility).
- Rust toolchain: required to compile the SimNet engine to WebAssembly.
- wasm-pack: must be available on PATH as a system binary (no npm wrapper); build-runtime-compatibility documents the rustup / installer-script paths.
- Docker (optional): only required when you deploy via the Docker image path below. Skip if you serve the static build from a generic static host.
Expected success signal for the prerequisite check, run from a clean clone before anything else:
pnpm qa:build-prereqs
A zero exit code with no WARNING lines on stderr confirms Node, pnpm, and wasm-pack are in good shape. If wasm-pack is missing, the script exits non-zero before any other check and prints install URLs; see build-runtime-compatibility for the recovery steps.
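To illustrate the kind of probe a prerequisite check performs, the sketch below reports which required tools are missing from PATH. It is not the project's qa:build-prereqs script; the helper name missing_tools and the tool list in the example are assumptions made for this illustration.

```shell
# Hypothetical helper, not part of the repository: print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (assumed tool list; cargo stands in for the Rust toolchain):
#   missing="$(missing_tools node pnpm cargo wasm-pack)"
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
```

An empty result means every named tool resolved; it says nothing about versions, which remain the business of build-runtime-compatibility.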
If the supported Node range or wasm-pack requirement changes in build-runtime-compatibility, that capability spec is updated; this page intentionally does not pin version numbers so it never drifts independently.
Secrets & Environment Variables
Production builds derive every challenge flag from a single master secret at build time. The semantics of that derivation belong to flag-secrets-management and build-time-key-injection; this section enumerates only the operator surface — which variables to set, what shape they take, and how to inject them — and defers to those two capabilities for anything normative.
| Variable | Required? | Purpose | Source |
|---|---|---|---|
| SIMNET_MASTER_KEY | Required for any production build | HMAC root from which every per-challenge flag and salt is derived (see flag-secrets-management and build-time-key-injection). | 32-character hex value, generated via openssl rand -hex 16. Must be distinct from any CI key. |
| TEACHER_MODE | Optional Docker build arg | When true, the image is built with the teacher edition assets enabled. Default is false (student build). | Build argument only; not consumed at runtime. |
| NODE_ENV | Set by build tooling | pnpm docs:build runs in production mode internally; do not override unless you are intentionally testing the dev fallback path. | Set by the build scripts; do not export manually in production. |
Operator rules — these enforce the contracts already normatively defined in flag-secrets-management and build-time-key-injection:
- Generate SIMNET_MASTER_KEY once per environment. Use openssl rand -hex 16. Store the value in your secret manager (not in a committed file). Rotation is described in flag-secrets-management.
- Never reuse the CI master key for production. flag-secrets-management requires SIMNET_MASTER_KEY_CI (the CI secret used by the site-smoke job) to be distinct from any production key.
- Never commit docs/challenges/flags.secret.yaml. It is git-ignored. The build pipeline generates it from SIMNET_MASTER_KEY (see flag-secrets-management, scenario "Missing secret file is generated during build"). The .example template in the repo is the only committed artifact.
- Inject at build time, not runtime. The WASM runtime never sees the master key: build-time-key-injection requires the key to be consumed by the build pipeline and reduced to per-challenge salts before the runtime ships. Do not embed the key into a container's runtime environment.
- Fail closed if unset. pnpm docs:build aborts non-zero when SIMNET_MASTER_KEY is unset in production (build-time-key-injection, scenario "Build aborts early on missing production key"). If your deploy pipeline silently swallows that failure, fix the pipeline; do not paper over it with the dev-only fallback.
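The first two rules can be enforced before the build starts. The following is a minimal sketch of such a guard, assuming the key is lowercase hex as emitted by openssl rand -hex 16; the function names are hypothetical and nothing here is taken from the project's own scripts.

```shell
# Hypothetical pre-build guard, not part of the repository.
# Succeeds only when $1 is exactly 32 lowercase hex characters, the shape
# produced by `openssl rand -hex 16`.
is_valid_master_key() {
  [ "${#1}" -eq 32 ] || return 1
  case $1 in
    *[!0-9a-f]*) return 1 ;;
  esac
}

# Succeeds only when the production key ($1) is well-formed and differs
# from the CI key ($2).
check_production_key() {
  is_valid_master_key "$1" || {
    echo "SIMNET_MASTER_KEY must be 32 hex characters" >&2
    return 1
  }
  [ "$1" != "$2" ] || {
    echo "production key must differ from the CI key" >&2
    return 1
  }
}

# Example:
#   check_production_key "${SIMNET_MASTER_KEY:-}" "${SIMNET_MASTER_KEY_CI:-}" || exit 1
```

A guard like this only catches shape and reuse mistakes early; the build's own fail-closed behavior remains the normative check.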
Expected success signal for secret provisioning:
SIMNET_MASTER_KEY="$(openssl rand -hex 16)" pnpm flags:ensure
A zero exit code and a freshly populated docs/challenges/flags.secret.yaml (with entries matching FLAG{[0-9a-f]{64}}) confirm the secret is plumbed correctly through the build pipeline. If the script aborts with the openssl rand -hex 16 hint, the variable was not set in the environment that invoked the build.
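One way to confirm the "freshly populated" part of that signal is to count the flag-shaped values in the generated file. This is a rough sketch: only the FLAG{<64 hex>} pattern comes from this guide, the helper name count_flags is hypothetical, and the YAML layout of the file is deliberately not assumed. It relies on grep -o, which GNU and BSD grep both provide.

```shell
# Hypothetical check, not part of the repository: print how many values in
# the given file match the FLAG{<64 lowercase hex>} shape.
count_flags() {
  grep -Eo 'FLAG\{[0-9a-f]{64}\}' "$1" | wc -l | tr -d ' '
}

# Example:
#   [ "$(count_flags docs/challenges/flags.secret.yaml)" -gt 0 ] \
#     || echo "no derived flags found" >&2
```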
Build Artifacts
Production artifacts are produced by the canonical pnpm command chain established in build-runtime-compatibility. This section does not introduce new build commands — it lists the chain in execution order, the artifact each step produces, and the success signal that confirms the step finished cleanly.
The chain assumes a clean checkout: no node_modules/, no simnet-engine/target/, no .vitepress/cache/, and no .vitepress/dist/. Run from the repository root with SIMNET_MASTER_KEY exported.
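The clean-checkout assumption can be verified mechanically. The sketch below lists which of the four directories named above already exist under a checkout root; the helper name stale_build_dirs is an assumption for this example and is not a script shipped by the project.

```shell
# Hypothetical helper, not part of the repository: print each leftover build
# directory found under the given checkout root (default: current directory).
stale_build_dirs() {
  root=${1:-.}
  for dir in node_modules simnet-engine/target .vitepress/cache .vitepress/dist; do
    if [ -e "$root/$dir" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Example:
#   stale="$(stale_build_dirs .)"
#   [ -z "$stale" ] || { echo "not a clean checkout: $stale" >&2; exit 1; }
```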
| # | Command | Produces | Success signal |
|---|---|---|---|
| 1 | pnpm install --frozen-lockfile | node_modules/ populated against pnpm-lock.yaml | Done and zero exit code; no engines-mismatch error (see build-runtime-compatibility, scenario "Install succeeds on Node 22 LTS"). |
| 2 | pnpm test | (verification only) | vitest reports at least the baseline pass count established by build-runtime-compatibility. Zero exit code. |
| 3 | pnpm validate:challenges | (verification only) | Schema validator reports OK and exits zero. |
| 4 | pnpm wasm:build | WASM artifacts under docs/public/wasm/ (including simnet_wasm.js / simnet_wasm_bg.wasm) | wasm-pack finishes with [INFO]: ✨ Done and zero exit code. |
| 5 | pnpm docs:build | Static site under .vitepress/dist/ | vitepress build exits zero with no "vitepress data not properly injected in app" errors, and scripts/postprocess-vitepress-html.mjs runs to completion. |
| 6 | pnpm build | End-to-end production bundle (re-runs flags:ensure + wasm:build:release + docs:build) | Zero exit code; git status --porcelain is clean except for predictable untracked outputs under .vitepress/dist/ and docs/public/wasm/. |
After the chain completes, the static site at .vitepress/dist/ is the only artifact a downstream static host needs. The Docker image path described below produces the same dist/ and serves it with nginx.
The set of commands above is the canonical chain from build-runtime-compatibility. Do not introduce alternative build commands for production; if a new step is genuinely required, propose a change to build-runtime-compatibility first so this guide can pick it up by reference rather than diverging.
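For operators who script the chain, the important property is that a failing step stops everything after it. The following is a minimal sketch of a runner with that behavior; run_chain is a hypothetical name and the runner adds no build commands of its own, it only sequences whatever commands it is given.

```shell
# Hypothetical runner, not part of the repository: execute each argument as a
# shell command, in order, and stop at the first non-zero exit.
run_chain() {
  step=0
  for cmd in "$@"; do
    step=$((step + 1))
    printf '[%d] %s\n' "$step" "$cmd"
    if ! sh -c "$cmd"; then
      printf 'step %d failed: %s\n' "$step" "$cmd" >&2
      return 1
    fi
  done
}

# Example, using the commands from the table in this section:
#   run_chain \
#     "pnpm install --frozen-lockfile" \
#     "pnpm test" \
#     "pnpm validate:challenges" \
#     "pnpm wasm:build" \
#     "pnpm docs:build" \
#     "pnpm build"
```

Printing the step number alongside the command makes it easy to match a failure back to the row of the table that produced it.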
Deployment Modes
Three end-to-end deployment paths are supported and exercised:
- Dev preview — fastest way to inspect a production build on the deployer's workstation before promoting it.
- Docker image — the reference path for delivering a self-contained image to a host runner; supports a student build and a teacher build.
- Cloudflare Pages — the canonical path for the public demo of this template, using Cloudflare Pages git integration with a vendored build script.
Other generic static hosts (S3 + CloudFront, Netlify, GitHub Pages, etc.) are supported by uploading .vitepress/dist/ to whatever the host expects; provider-specific automation beyond Cloudflare Pages is out of scope for this guide.
Dev preview
Use this path to verify a production build locally before promoting it to a real host.
# 1. Export the production master key (do not reuse the CI value)
export SIMNET_MASTER_KEY="$(openssl rand -hex 16)"
# 2. Run the canonical build chain through pnpm build
pnpm install --frozen-lockfile
pnpm build
# 3. Serve the built site
pnpm docs:preview
Expected port: 4173 (VitePress preview default; if your environment already binds 4173, VitePress logs the actual port to stdout).
Expected URL: http://localhost:4173/ (root locale) and http://localhost:4173/zh-TW/ (Traditional Chinese locale). Both should load without a VitePress 404 page or module-import error.
Success signal: the dev preview output prints a vitepress v<version> banner followed by a ➜ Local: line with the URL above; opening the root URL renders the home page and clicking any challenge route loads its terminal UI.
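Because the preview may bind a port other than 4173, scripts that probe it are better off reading the URL from the log than hard-coding it. The sketch below pulls the first URL that follows a "Local:" label from text on stdin. It assumes plain output without ANSI color codes (for example, output redirected to a file), and extract_local_url is a hypothetical name.

```shell
# Hypothetical helper, not part of the repository: read preview output on
# stdin and print the first URL that follows a "Local:" label.
extract_local_url() {
  sed -n 's/.*Local:[[:space:]]*\(http[^[:space:]]*\).*/\1/p' | head -n 1
}

# Example (preview.log is an assumed capture of the preview's stdout):
#   url="$(extract_local_url < preview.log)"
#   curl -sfI "$url" >/dev/null && curl -sfI "${url}zh-TW/" >/dev/null
```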
Docker image
The repository ships a multi-stage Dockerfile that compiles WASM, builds the static site, and serves it with nginx. Two configurations are supported:
- Student build: standard challenges only.
- Teacher build: same challenges plus teacher handbook and grading rubric (TEACHER_MODE=true).
# Student build
docker build \
--build-arg SIMNET_MASTER_KEY="$(openssl rand -hex 16)" \
-t simnet:student .
docker run -p 8080:80 simnet:student
# Teacher build
docker build \
--build-arg TEACHER_MODE=true \
--build-arg SIMNET_MASTER_KEY="$(openssl rand -hex 16)" \
-t simnet:teacher .
docker run -p 8080:80 simnet:teacher
Expected port mapping: host 8080 → container 80 (nginx listens on 80 per Dockerfile).
Expected URL: http://localhost:8080/ (root locale) and http://localhost:8080/zh-TW/.
Success signal: docker run keeps the container in the foreground with nginx access logs; curl -sfI http://localhost:8080/ returns HTTP/1.1 200 OK and the home page renders in a browser. Run the health-check section below before declaring the deployment live.
Treat SIMNET_MASTER_KEY as a build secret: passing it via --build-arg from your secret manager is acceptable. Do not bake the master key into the image as an environment variable; build-time-key-injection requires the key to be consumed at build time only.
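The "not baked in as an environment variable" rule can be spot-checked on a built image. The sketch below reads an image's environment list on stdin and fails if SIMNET_MASTER_KEY appears in it. The helper name image_env_is_clean is hypothetical; the docker inspect invocation in the example uses the standard --format option with the image's .Config.Env field.

```shell
# Hypothetical check, not part of the repository: read an image's environment
# list on stdin and succeed only if SIMNET_MASTER_KEY is absent from it.
image_env_is_clean() {
  if grep -q 'SIMNET_MASTER_KEY='; then
    echo "SIMNET_MASTER_KEY is baked into the image environment" >&2
    return 1
  fi
}

# Example:
#   docker inspect --format '{{json .Config.Env}}' simnet:student | image_env_is_clean
```

This covers only the runtime environment recorded in the image configuration, which is the case the rule above names.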
Cloudflare Pages
This is the canonical path for the public demo of this template (currently served at https://netsim-demo.browserlab.online/). It uses Cloudflare Pages' git integration so a push to the production branch triggers an automatic build via a vendored build script — no separate CI step uploads artifacts.
Project setup
- Create a Cloudflare Pages project bound to the git repository. CF dashboard → Workers & Pages → Create application → Pages → Connect to Git → select the repository.
- Production branch: main with automatic deployments enabled.
- Build configuration:
  - Framework preset: None.
  - Build command: ./scripts/cf-pages-build.sh
  - Build output directory: .vitepress/dist
  - Build system version: 3.
- Environment variables and secrets (provision under the Cloudflare Pages project's environment variables section; the dashboard layout has changed over time, so locate the section for production-scope and preview-scope variables):
  - Production scope → SIMNET_MASTER_KEY: a fresh 32-character hex value from openssl rand -hex 16, distinct from any CI key. See flag-secrets-management for rotation semantics.
  - Preview scope → SIMNET_MASTER_KEY: a separate 32-character hex value, also from openssl rand -hex 16, distinct from the production-scope value. Required by flag-secrets-management ("Cloudflare Pages preview environment uses isolated master key").
- No automatic fork PR preview deployments: confirm the Cloudflare Pages project does not auto-deploy preview builds for pull requests from forked repositories. Required by flag-secrets-management ("Cloudflare Pages does not auto-deploy forked PR previews"); fork-controlled commits must not reach the Cloudflare Pages build environment that holds preview env vars. Achieve this via whichever mechanism Cloudflare Pages currently exposes (default behavior in current versions, restricting Branch deployments to a known-good allowlist, or gating builds behind Deploy Hooks). Verify by opening a fork PR (or inspecting the Deployments list after one was opened). No preview deployment SHALL appear for the fork PR's head commit.
- No Cloudflare Web Analytics beacon: confirm the production deployment does not serve static.cloudflareinsights.com/beacon.min.js. Required by production-deployment-guide ("Cloudflare Pages production deployment serves no Analytics beacon"); the project does not need analytics, and not serving the beacon avoids the Content Security Policy violation it would otherwise produce. Achieve this via whichever mechanism Cloudflare currently exposes (Pages Web Analytics is opt-in in current versions; if a previously-enabled toggle still injects a beacon, disable it; if zone-level Browser Insights injects a beacon, disable it at the zone level). Verify by opening the production demo URL in a browser. The DevTools Network panel SHALL show no request to cloudflareinsights.com.
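The beacon check in the last item can be partly scripted by scanning the served HTML for the beacon script URL. This is a sketch under the assumption that an injected beacon shows up as a script reference in the HTML response; it complements the DevTools check rather than replacing it, and html_is_beacon_free is a hypothetical name.

```shell
# Hypothetical check, not part of the repository: read an HTML document on
# stdin and succeed only if it does not reference the Cloudflare Web
# Analytics beacon script.
html_is_beacon_free() {
  if grep -q 'static\.cloudflareinsights\.com/beacon\.min\.js'; then
    echo "Cloudflare Web Analytics beacon found in HTML" >&2
    return 1
  fi
}

# Example (substitute the deployed origin for the placeholder):
#   curl -sf "https://<your-domain>/" | html_is_beacon_free
```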
Vendored build script
The ./scripts/cf-pages-build.sh value above is normative — the CF dashboard build command must not inline the toolchain bootstrap. Inlining bypasses PR review for build environment changes and is treated as drift from production-deployment-guide's "Cloudflare Pages build environment toolchain script" requirement.
The script installs the Rust toolchain via the rustup minimal profile, installs wasm-pack via its prebuilt binary installer, runs pnpm install --frozen-lockfile, then runs pnpm build. set -euo pipefail makes any installer failure short-circuit the build before later steps run.
Custom domain (optional)
Cloudflare Pages assigns every project a *.pages.dev URL. Binding a custom domain is optional — *.pages.dev URLs are usable as the canonical demo URL for downstream forks that do not have their own domain.
To bind a custom domain via the CF dashboard: project → Custom domains → Set up a custom domain → enter the domain → follow the dashboard's DNS instructions. Cloudflare provisions a TLS certificate automatically once DNS resolves. Typical DNS records the dashboard will ask you to publish (consult the dashboard for the exact values it issues for your domain):
- Subdomain binding: CNAME <subdomain> <project>.pages.dev.
- Apex / root binding: an A / AAAA record pair pointing at Cloudflare's anycast IPs, or a CNAME flattening setup when the apex zone is hosted on Cloudflare DNS.
This guide intentionally does not mandate a specific top-level domain or naming pattern. Downstream forks may bind their own domain or stay on *.pages.dev.
Expected URL
- With a custom domain: https://<your-domain>/ (this repo's demo: https://netsim-demo.browserlab.online/).
- Without a custom domain: the *.pages.dev URL listed in the CF project's Deployments tab.
Run the smoke verification in the "Health Check & Smoke Verification" section below against the resolved URL before declaring the deployment live.
Fallback path
If the Cloudflare Pages build environment changes shape so the vendored script cannot complete the canonical command chain — for example, if the rustup installer script URL changes shape, or the wasm-pack prebuilt installer URL becomes unavailable — fall back to running the build in GitHub Actions and pushing artifacts to Cloudflare with wrangler pages deploy. That fallback path is not implemented as part of this guide; it is recorded as a known recovery option in production-deployment-guide's "Cloudflare Pages build environment toolchain script" requirement.
Health Check & Smoke Verification
A deployed instance is considered healthy when it satisfies the same smoke contract that CI already enforces. The contract belongs to quality-assurance-gates — this section enumerates the observable signals an operator checks after deploy and points back to that capability for the normative behavior.
Observable signals to confirm on a freshly deployed instance (each maps to a smoke verification already enforced by quality-assurance-gates):
- Root document loads. curl -sfI <site-root>/ returns 200 OK with a text/html content-type. Maps to the canonical home route check in quality-assurance-gates, scenario "Production-like smoke loads canonical routes".
- Locale routing works. curl -sfI <site-root>/zh-TW/ also returns 200 OK. The dual-locale smoke check in quality-assurance-gates covers this; build-runtime-compatibility references the same routes in its canonical command set (pnpm qa:site-smoke:dev / :preview).
- WASM asset is reachable. curl -sfI <site-root>/wasm/simnet_wasm_bg.wasm (or the equivalent path emitted by pnpm wasm:build) returns 200 OK with application/wasm. Missing WASM is treated as a smoke failure in quality-assurance-gates.
- Canonical challenge route renders. curl -sfI <site-root>/challenges/01-ethernet-basics/ returns 200 OK and the page boots without a VitePress 404. quality-assurance-gates enumerates this as the canonical challenge-route smoke target.
- Terminal UI mounts. When the root is opened in a browser, the terminal pane mounts without console errors. The terminal-contract regression gate in quality-assurance-gates covers the public terminal contract.
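The curl-based signals above all reduce to "status 200 plus an expected content type". The sketch below checks exactly that against raw response headers read from stdin, so it can sit behind curl -sfI. It is a manual spot-check aid, not a redefinition of the smoke contract; headers_ok is a hypothetical name, and the sketch assumes a single header block (no redirects followed).

```shell
# Hypothetical helper, not part of the repository: read one block of raw
# response headers on stdin and succeed only when the status is 200 and the
# Content-Type header starts with the expected media type ($1).
headers_ok() {
  expected_type=$1
  tr -d '\r' | awk -v want="$expected_type" '
    NR == 1 { ok_status = ($2 == "200") }
    tolower($1) == "content-type:" {
      if (index(tolower($2), tolower(want)) == 1) ok_type = 1
    }
    END { exit !(ok_status && ok_type) }
  '
}

# Example (SITE_ROOT is an assumed variable holding the deployed origin):
#   curl -sfI "$SITE_ROOT/" | headers_ok text/html
#   curl -sfI "$SITE_ROOT/zh-TW/" | headers_ok text/html
#   curl -sfI "$SITE_ROOT/wasm/simnet_wasm_bg.wasm" | headers_ok application/wasm
```

The status is read from the second field of the first line, which works for both "HTTP/1.1 200 OK" and "HTTP/2 200" status lines.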
For an automated end-to-end check against the deployed origin, the same script CI uses can be run with SIMNET_BASE_URL pointed at the deployed origin:
SIMNET_BASE_URL=http://localhost:8080 \
SIMNET_MASTER_KEY="$DEPLOY_SIMNET_MASTER_KEY" \
pnpm qa:site-smoke:preview
Success signal: zero exit code and no failed route reported by the smoke script. A non-zero exit triggers the rollback procedure below.
Do not redefine the smoke contract here. If you find yourself wanting to add a new check that is not in quality-assurance-gates, propose a change to that capability so CI and this guide stay aligned; otherwise the deployed-instance check and the CI check will drift.
Rollback Procedure
This section covers production-specific rollback — what to do when a freshly deployed instance fails the health check above. It does not duplicate the repository-level rollback runbook; see ROLLBACK.md at the repository root for git-level recovery (restoring deleted files, reverting to the archive/pre-sanitize-2026-05-19 branch, stage-by-stage reset).
Production rollback sequence (run only when the health check fails):
- Stop or unroute the failing instance.
  - Docker path: docker stop <container-id> (or docker rm -f if the container is wedged).
  - Generic static host: switch the host's traffic source back to the previous artifact bundle (provider-specific; consult your host's documentation).
  - Success signal: curl -sfI <site-root>/ no longer returns 200 from the failing instance (either connection refused or routed to a previous-known-good instance).
- Restore the previous-known-good artifact.
  - Docker path: docker run -p 8080:80 simnet:student@<previous-image-digest> (pin to the image digest of the last instance that passed the health check; do not re-tag mutable tags).
  - Static host path: re-upload the prior .vitepress/dist/ bundle (you keep one previous bundle in your artifact store; if you do not, this is the moment to start).
  - Success signal: curl -sfI <site-root>/ returns 200 from the restored instance.
- Re-run the health check on the restored instance.
  - Use the same observable signals enumerated in the previous section. The smoke script can be re-pointed by setting SIMNET_BASE_URL to the restored origin.
  - Success signal: all five observable signals pass; pnpm qa:site-smoke:preview against the restored origin exits zero.
- Capture diagnostics from the failed instance before discarding it.
  - Save docker logs <failing-container-id> (or the equivalent host logs) for the postmortem.
  - Record the failing image digest or static-bundle hash so the next change can be diffed against it.
- Open a follow-up change. Use the Spectra workflow (/spectra-propose <name>) to record the failure mode and the fix; this keeps the rollback trail auditable rather than only living in shell history.
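Pinning to "the last instance that passed the health check" presupposes that someone wrote the digest down. The sketch below shows one possible way to keep that record: an append-only ledger with one "<digest> <pass|fail>" line per deployment. The ledger file, its format, and both function names are assumptions made for this example; the project does not prescribe any of them.

```shell
# Hypothetical deployment ledger, not part of the repository.
# Append one "<digest> <pass|fail>" line to the ledger file ($1).
record_deploy() {
  printf '%s %s\n' "$2" "$3" >> "$1"
}

# Print the most recent digest whose health check passed; fail if none did.
last_good_digest() {
  awk '$2 == "pass" { d = $1 } END { if (d == "") exit 1; print d }' "$1"
}

# Example (deploys.log is an assumed ledger path):
#   record_deploy deploys.log "sha256:<digest-of-new-image>" fail
#   docker run -p 8080:80 "simnet:student@$(last_good_digest deploys.log)"
```

Recording failed deployments alongside passing ones also preserves the failing digest for the diagnostics step.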
For repository-level recovery (restoring deleted files, resetting the working tree, or stage-by-stage git reset) — including the destructive git reset --hard archive/pre-sanitize-2026-05-19 path and the per-Phase reset path — refer to ROLLBACK.md at the repository root. That file remains the source of truth for git-side rollback; the steps above only cover the production instance layer.