Upgrade
This document explains how to move an existing deployment from the previous release (app build 10.5.2603) to this release (app build 10.6.2605) without destroying any live infrastructure.
Conventions used in this document
The actual on-disk folder names in your environment may vary (e.g.
mocm-10.5.2603/, mocm-2026.01/, releases/2026-05/, or anything you
choose). To stay release-name-agnostic, this document uses two placeholders:
| Placeholder | Meaning |
|---|---|
| <previous-release>/ | The release directory currently running in production (app build 10.5.2603, holds your live OpenTofu state). |
| <current-release>/ | The release directory of this package (app build 10.6.2605) — the new version you are upgrading to. |
Replace both placeholders with the real paths on your machine before running any command.
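One way to avoid retyping the paths in every command is to keep them in two shell variables and validate them once. The sketch below assumes only that each release directory contains a `terraform/aws` subdirectory, as the commands in this document do; the names `check_release_dirs`, `PREV_RELEASE`, and `CUR_RELEASE` are hypothetical and not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that both release directories exist, contain
# terraform/aws, and are not the same directory.
check_release_dirs() {
  local prev="$1" cur="$2" dir
  for dir in "$prev" "$cur"; do
    if [ ! -d "$dir/terraform/aws" ]; then
      echo "missing: $dir/terraform/aws" >&2
      return 1
    fi
  done
  if [ "$prev" -ef "$cur" ]; then
    echo "previous and current release point at the same directory" >&2
    return 1
  fi
}

# Example:
#   PREV_RELEASE=/srv/releases/previous
#   CUR_RELEASE=/srv/releases/current
#   check_release_dirs "$PREV_RELEASE" "$CUR_RELEASE" && echo "paths look sane"
```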
What's new in this release (10.5.2603 → 10.6.2605)
| Layer | Change |
|---|---|
| Application images | Bumped from 10.5.2603 to 10.6.2605 (see images/images.txt). |
| Terraform | New example terraform/aws/terraform.tfvars.cost-optimized.example. |
| Terraform | New operational guides under terraform/aws/docs/ (deployment, cost, DR, EKS upgrade, observability, WAF, …). |
| Helm | mocm/values.yaml reset to template form; placeholders must be filled in again. |
Review the full diff before you start:
```bash
diff -r <previous-release>/terraform/aws <current-release>/terraform/aws
diff <previous-release>/mocm/values.yaml <current-release>/mocm/values.yaml
```

Pre-upgrade checklist
- Snapshot everything.
  - MongoDB EBS snapshots (DLM should already be running; verify).
  - `aws s3 cp` the contents of all 7 buckets to a safe location, or rely on versioning if enabled.
  - Take an RDS/ElastiCache final snapshot if applicable.
- Back up the local OpenTofu state.

  ```bash
  cp <previous-release>/terraform/aws/terraform.tfstate \
     <previous-release>/terraform/aws/terraform.tfstate.pre-upgrade
  ```
- Check tool versions match the table in `README.md`, in particular `tofu >= 1.10.4`.
- Read `tofu plan` output carefully before any `apply`. If you ever see `destroy` or `replace` on resources you intend to keep, STOP and investigate.
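The tool-version item in the checklist can be checked mechanically. This is a minimal sketch, assuming `sort -V` (version sort, as in GNU coreutils) is available and that `tofu version` prints a first line of the form `OpenTofu v1.10.4`; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 when version $1 is greater than or equal to version $2.
# Sorts the two versions and checks that the smaller one is $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the first line of `tofu version` looks like "OpenTofu v1.10.4".
installed_tofu_version() {
  tofu version | head -n 1 | sed -E 's/^OpenTofu v//'
}

# Example:
#   version_ge "$(installed_tofu_version)" 1.10.4 || echo "tofu is too old" >&2
```

The comparison itself does not depend on OpenTofu, so the same function can be reused for the other tools listed in README.md.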
Option A — Keep using the local backend (simplest, POC/single-operator)
Use this if you originally ran tofu apply inside
<previous-release>/terraform/aws/ without configuring a remote backend.
```bash
# 1. Copy customer-specific files into the new release
SRC=<previous-release>/terraform/aws
DST=<current-release>/terraform/aws

cp "$SRC"/terraform.tfstate "$DST"/
cp "$SRC"/terraform.tfstate.backup "$DST"/ 2>/dev/null || true
cp "$SRC"/terraform.tfvars "$DST"/
cp "$SRC"/backend.tf "$DST"/ 2>/dev/null || true
cp -r "$SRC"/.terraform.lock.hcl "$DST"/ 2>/dev/null || true
```

2. Open <current-release>/terraform/aws/terraform.tfvars and append the
Valkey block below (new in 10.6.2605). Skip if your file already
contains valkey_*.
```hcl
#############################
# Valkey Configuration (single node, no replica)
#############################
valkey_username = "fusion"
valkey_engine_version = "8.0"
valkey_node_type = "cache.t3.micro"
valkey_port = 6379
valkey_family = "valkey8"
valkey_num_cache_clusters = 1
valkey_automatic_failover_enabled = false
# Password is auto-generated by Terraform and stored in AWS Secrets Manager.
# Retrieve: aws secretsmanager get-secret-value \
#             --secret-id <name_prefix>/valkey/fusion \
#             --region <aws_region>
```

3. Activate the ElastiCache (Redis) file for parallel run. The previous
release's ElastiCache resource is shipped as 09-elasticache.tf.2603 so
OpenTofu ignores it by default. Rename it so the old Redis cluster keeps
running alongside the new Valkey replication group during data migration:
```bash
cd <current-release>/terraform/aws
mv 09-elasticache.tf.2603 09-elasticache.tf
```

After this rename, both aws_elasticache_cluster.elasticache (old Redis)
and aws_elasticache_replication_group.valkey (new) are part of the plan:
the old Redis cluster as a no-op, since it already exists in state, and the
new Valkey replication group as + create. Leave this file active until
applications are fully cut over to Valkey (see Stage 4).
```bash
# 4. Re-initialise providers (versions may have changed)
cd <current-release>/terraform/aws
tofu init -upgrade

# 5. Review the plan
tofu plan -out tfplan
# Any destroy/replace on resources you intend to keep → STOP, investigate.

# 6. Apply
tofu apply tfplan
```

After the apply succeeds, **archive the old release directory** (don't delete
it) so you can roll back state if needed:

```bash
tar czf previous-release.tar.gz <previous-release>/
```

Option B — Migrate to a remote backend (recommended for production)
Do this once, then every future release just points at the same bucket.
B.1 First-time migration (still inside <previous-release>)
```bash
cd <previous-release>/terraform/aws

# Create the S3 bucket + DynamoDB lock table out-of-band, then:
cp backend.tf.example backend.tf
# Edit backend.tf: set bucket, key, region, dynamodb_table

tofu init -migrate-state   # pushes local state to S3, answer "yes"
```

After this step, terraform.tfstate in the local directory becomes irrelevant
(OpenTofu reads/writes S3). You may delete it, but archiving is safer:
```bash
mv terraform.tfstate terraform.tfstate.migrated-to-s3
```

B.2 Apply this release
```bash
# 1. Copy customer-specific files into the new release
SRC=<previous-release>/terraform/aws
DST=<current-release>/terraform/aws
cp "$SRC"/backend.tf "$DST"/
cp "$SRC"/terraform.tfvars "$DST"/

# 2. Activate ElastiCache (Redis) so it keeps running in parallel with the new
#    Valkey replication group. Rename back to .2603 once apps are cut over;
#    see Stage 4 (Decommission ElastiCache) at the bottom of this document.
mv "$DST"/09-elasticache.tf.2603 "$DST"/09-elasticache.tf
```

3. Append the Valkey block to terraform.tfvars in the new release. It is new in 10.6.2605; skip this step if your file already contains valkey_*. The block is identical to Option A step 2:

```hcl
#############################
# Valkey Configuration (single node, no replica)
#############################
valkey_username = "fusion"
valkey_engine_version = "8.0"
valkey_node_type = "cache.t3.micro"
valkey_port = 6379
valkey_family = "valkey8"
valkey_num_cache_clusters = 1
valkey_automatic_failover_enabled = false
# Password is auto-generated by Terraform and stored in AWS Secrets Manager.
# Retrieve: aws secretsmanager get-secret-value \
#             --secret-id <name_prefix>/valkey/fusion \
#             --region <aws_region>
```

```bash
# 4. Init, plan, apply
cd "$DST"
tofu init               # connects to the same S3 backend
tofu plan -out tfplan
tofu apply tfplan
```

No state file copying is required; both release directories point to the same remote state.
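The "skip if your file already contains valkey_*" rule applies to both Option A and Option B and is easy to apply twice by accident. A minimal sketch of an idempotent append follows. It assumes you have saved the Valkey block into a separate file first; the file name `valkey.tfvars.snippet` and the function `append_valkey_block` are hypothetical and not shipped with the release.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a prepared snippet to a tfvars file only when
# the file has no line starting with "valkey_" yet.
append_valkey_block() {
  local tfvars="$1" snippet="$2"
  if grep -qE '^[[:space:]]*valkey_' "$tfvars"; then
    echo "valkey_* already present in $tfvars, nothing appended"
    return 0
  fi
  # Emit a newline first so the block never fuses with the last existing line.
  { echo; cat "$snippet"; } >> "$tfvars"
  echo "appended $snippet to $tfvars"
}

# Example:
#   append_valkey_block <current-release>/terraform/aws/terraform.tfvars valkey.tfvars.snippet
```

Running the function a second time leaves the file unchanged, so it is safe to keep in a wrapper script that is re-run after a failed upgrade attempt.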
Rollback
From Option A (local backend)
```bash
cd <current-release>/terraform/aws
cp terraform.tfstate terraform.tfstate.failed
cp <previous-release>/terraform/aws/terraform.tfstate.pre-upgrade \
   <previous-release>/terraform/aws/terraform.tfstate
cd <previous-release>/terraform/aws
tofu init -upgrade
tofu apply   # re-converges to the previous baseline
```

From Option B (remote backend)
S3 bucket versioning (enabled in backend.tf.example) lets you restore the
previous state object:
```bash
aws s3api list-object-versions --bucket <state-bucket> --prefix <state-key>
aws s3api get-object --bucket <state-bucket> --key <state-key> \
    --version-id <pre-upgrade-version-id> terraform.tfstate.rollback
# Then push it back with `tofu state push terraform.tfstate.rollback`
```

Helm / Helmfile upgrade (application layer)
The chart upgrade procedure itself (lint → helmfile diff → helmfile sync,
release ordering, rollback) is already documented in
mocm/README.md → Upgrade. Follow that document for
the actual commands.
What is specific to a cross-release upgrade and easy to get wrong:
- Do NOT copy `<previous-release>/mocm/values.yaml` over `<current-release>/mocm/values.yaml`. The new `values.yaml` ships with placeholders (`< REPLACE_VALUE_* >`) and may contain new keys that did not exist in the previous release. A blind copy will silently drop those new keys. Correct workflow (merge, don't overwrite):
  ```bash
  # Keep the new file as the base, copy your real values into it
  cp <current-release>/mocm/values.yaml <current-release>/mocm/values.yaml.new
  # Open both side-by-side and port your credentials / host / productKey /
  # storage / componentReplicas from the previous release into the new file.
  diff -u <current-release>/mocm/values.yaml.new \
          <previous-release>/mocm/values.yaml | less
  ```
- Bump the image tag in `global.image.tag`:
  ```yaml
  global:
    image:
      registry: "<id>.dkr.ecr.<region>.amazonaws.com"
      tag: "10.6.2605"   # was the previous build, e.g. "10.5.2603"
      pullPolicy: IfNotPresent
  ```

  Make sure the new images are already pushed to ECR
  (`cd <current-release>/images && ./loadimage.sh`) before running
  `helmfile sync`, otherwise pods will go into ImagePullBackOff.
- **Run a diff first.** The 3-release ordering
  (mocm-bootstrap-1 → mocm-bootstrap-2 → mocm-service) is enforced by
  Helmfile, but you should still preview the diff:
  ```bash
  cd <current-release>/mocm
  helmfile -f helmfile.yaml -n fusion diff
  ```

  Then apply with the standard command from mocm/README.md:
  ```bash
  helmfile -f helmfile.yaml -n fusion sync 2>&1 | tee helmfile-upgrade.log
  ```
- **Helm rollback is per-release, not per-package.** If `mocm-service` fails to upgrade, roll back only that release:
  ```bash
  helm rollback mocm-service -n fusion
  ```

  You do not need to roll back the Terraform layer just because the Helm layer failed.
- Chart-level changes are tracked in `mocm/CHANGELOG.md`. Read the entry for the version shipped with this release before upgrading; any `### Changed` or `### Removed` item there may require values-file edits beyond the steps above.
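Because the new values.yaml ships in template form, a forgotten placeholder is a likely failure mode of the merge described above. The sketch below scans a values file for leftover `< REPLACE_VALUE_* >` markers before you run `helmfile diff`. The function name `find_unfilled_placeholders` is hypothetical, and the pattern assumes placeholders are spelled exactly as quoted in this document.

```bash
#!/usr/bin/env bash
# Print every line (with its line number) that still contains a template
# placeholder of the form "< REPLACE_VALUE_... >".
# Returns 1 if any placeholder is found, 0 if the file is clean.
find_unfilled_placeholders() {
  local values_file="$1"
  if grep -nE '<[[:space:]]*REPLACE_VALUE_[^>]*>' "$values_file"; then
    return 1
  fi
  return 0
}

# Example:
#   find_unfilled_placeholders <current-release>/mocm/values.yaml \
#     || echo "fill in the lines listed above before running helmfile" >&2
```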
Stage 4 — Decommission ElastiCache (Redis) after Valkey cutover
After Option A or Option B plus the Helm upgrade above, both caches are running, so applications can be migrated without downtime:
| Resource | State after the upgrade apply |
|---|---|
| aws_elasticache_cluster.elasticache (Redis) | Still serving live traffic |
| aws_elasticache_replication_group.valkey | Created, idle, ready to use |
Run Stage 4 only after every workload that used the old Redis is pointing
at the Valkey endpoint (stored in Secrets Manager as
<name_prefix>/valkey/<valkey_username>) and you have observed Valkey
serving traffic for at least one full business cycle.
4.1 Verify nothing still depends on ElastiCache (Redis)
```bash
# Connections to the old cache cluster should be 0 over the observation window.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ElastiCache \
  --metric-name CurrConnections \
  --dimensions Name=CacheClusterId,Value=<name_prefix>-cache \
  --statistics Maximum --period 300 \
  --start-time "$(date -u -d '24 hours ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)"
```

Also grep the live Helm values for any hostnames still pointing at the old endpoint:
```bash
grep -RnE 'elasticache|redis\.amazonaws\.com' <current-release>/mocm/
```

If anything matches, finish the application-layer cutover before continuing.
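Reading 288 five-minute datapoints by eye is error-prone, so the CloudWatch check in 4.1 can be reduced to a pass/fail result. This sketch assumes you add the AWS CLI global options `--query 'Datapoints[].Maximum' --output text` to the command above, so that it prints only the numeric maxima; the function name `all_datapoints_zero` is hypothetical. An empty result is treated as a failure on purpose, since "no data" is not the same as "no connections".

```bash
#!/usr/bin/env bash
# Read whitespace-separated values on stdin. Succeed only if there is at
# least one value and every value is a plain number equal to zero.
all_datapoints_zero() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        n++
        # Reject non-numeric tokens (e.g. "None") as well as non-zero values.
        if ($i !~ /^[0-9]+(\.[0-9]+)?$/ || $i + 0 != 0) bad++
      }
    }
    END {
      if (n > 0 && bad == 0) exit 0
      exit 1
    }
  '
}

# Example (same arguments as the get-metric-statistics command above):
#   aws cloudwatch get-metric-statistics ... \
#     --query 'Datapoints[].Maximum' --output text | all_datapoints_zero \
#     && echo "no connections observed"
```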
4.2 (Optional) Take a final snapshot
If you want a last-chance restore point before destroying the cluster:
```bash
aws elasticache create-snapshot \
  --cache-cluster-id <name_prefix>-cache \
  --snapshot-name <name_prefix>-cache-final
```

4.3 Rename the file back so OpenTofu plans a destroy
```bash
cd <current-release>/terraform/aws
mv 09-elasticache.tf 09-elasticache.tf.2603
```

Renaming back to .2603 deactivates the resource block and its
co-located elasticache_* variables in one step (they live in the same
file — see the header comment in 09-elasticache.tf.2603).
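The same file is renamed in one direction during the upgrade and in the other during Stage 4, and a plain `mv` fails if the rename has already been done. A small idempotent wrapper, sketched under the assumption that only the two file names used in this document exist, makes the intended state explicit; `set_elasticache_active` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put 09-elasticache.tf into the requested state.
#   on  -> file is named 09-elasticache.tf       (resource managed by OpenTofu)
#   off -> file is named 09-elasticache.tf.2603  (resource ignored by OpenTofu)
# Does nothing if the file is already in the requested state.
set_elasticache_active() {
  local dir="$1" want="$2"
  local on="$dir/09-elasticache.tf" off="$dir/09-elasticache.tf.2603"
  case "$want" in
    on)  [ -f "$on" ]  || mv "$off" "$on" ;;
    off) [ -f "$off" ] || mv "$on" "$off" ;;
    *)   echo "usage: set_elasticache_active <dir> on|off" >&2; return 2 ;;
  esac
}

# Example for step 4.3:
#   set_elasticache_active <current-release>/terraform/aws off
```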
4.4 Plan and apply the teardown
```bash
tofu plan -out tfplan
# Expect destroys for:
#   aws_elasticache_cluster.elasticache
#   aws_elasticache_parameter_group.elasticache
#   aws_elasticache_subnet_group.elasticache
# No destroy or replace on any *valkey* resource. If you see one, STOP.

tofu apply tfplan
```

The matching security group aws_security_group.elasticache is defined in
02-sg.tf and is not touched by renaming 09-elasticache.tf. If
nothing else references it after teardown, remove it in a follow-up commit.
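The "expect these destroys and no others" rule from 4.4 can be turned into a guard that runs between `tofu plan` and `tofu apply`. The sketch below reads the human-readable plan (for example from `tofu show -no-color tfplan`) and reports any destroy or replace outside an allow-list. It assumes the plan text marks resources with header lines such as `# aws_foo.bar will be destroyed` and `# aws_foo.bar must be replaced`; verify that against the output of your OpenTofu version before relying on it. The function name `unexpected_destroys` is hypothetical.

```bash
#!/usr/bin/env bash
# Read plan text on stdin. Print every resource address marked for destroy or
# replace that does NOT match the allow-list regex in $1, and return 1 if any
# such address exists. An empty $1 means "nothing may be destroyed".
unexpected_destroys() {
  local allow_regex="$1" addrs
  # Extract the address from header lines ending in "destroyed"/"replaced".
  addrs="$(sed -nE 's/^[[:space:]]*# ([^ ]+) .*(destroyed|replaced).*$/\1/p')"
  if [ -n "$allow_regex" ]; then
    addrs="$(printf '%s\n' "$addrs" | grep -vE -- "$allow_regex" || true)"
  fi
  if [ -n "$addrs" ]; then
    printf '%s\n' "$addrs"
    return 1
  fi
}

# Example for Stage 4 (only the three ElastiCache resources may go away):
#   tofu show -no-color tfplan | unexpected_destroys \
#     '^aws_elasticache_(cluster|parameter_group|subnet_group)\.elasticache$' \
#     && tofu apply tfplan
# Example for the upgrade apply (no destroy or replace is acceptable):
#   tofu show -no-color tfplan | unexpected_destroys '' && tofu apply tfplan
```

A text match like this is a convenience check, not a substitute for reading the plan; it will miss anything the assumed header format does not cover.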