TL;DR

Installing ArgoCD is just the beginning. This guide walks you through the complete journey - from initial setup to managing multi-cluster deployments, automated resource limits, HPA configuration, and scaling to dozens of applications across multiple environments. What starts as a simple kubectl apply evolves into a production-grade GitOps control tower.


The Moment Before Everything Changes

There’s a point in every Kubernetes journey where deployments stop being “inconvenient” and start becoming dangerous.

Multiple microservices. Multiple environments. More than one cluster. A few manual kubectl apply commands here, a hotfix there, and suddenly no one can answer a simple question with confidence:

What exactly is running in production right now?

That’s usually when teams don’t go looking for a new tool. They go looking for control.

This article begins at that moment.


Act I: The First Steps

Installing ArgoCD - The Foundation

ArgoCD becomes your GitOps control tower. But before it can manage dozens of applications across multiple clusters, you need to install it. And like all good journeys, it starts with a single command.

Creating the Foundation

First, we need a home for ArgoCD. A dedicated namespace keeps things organized and isolated:

kubectl create namespace argocd

One line. One namespace. The foundation is laid.

The Installation Moment

Now comes the magic. ArgoCD provides a single manifest that bootstraps everything:

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

What just happened? In seconds, you deployed:

  • The ArgoCD API server
  • The repository server
  • The application controller
  • Redis for caching
  • The web UI

All running in your Kubernetes cluster, ready to transform how you deploy applications.

The Secret Handshake

ArgoCD generates an initial admin password during installation. It’s stored as a Kubernetes secret, base64-encoded (because that’s what Kubernetes does):

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

Save this password. You’ll need it in a moment.

Pro tip: The original script has a typo (base64 –d with an en-dash instead of a hyphen). It’s the little things that teach you to always test your scripts.


Act II: The CLI - Your Command-Line Companion

Why the CLI Matters

The ArgoCD web UI is beautiful. It’s visual, intuitive, and perfect for exploring. But when you’re managing dozens of applications across multiple clusters, automation is king. That’s where the CLI shines.

Getting the ArgoCD CLI

The installation is straightforward:

# Download the latest version
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64

# Install it with proper permissions
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd

# Clean up
rm argocd-linux-amd64

The -m 555 permission means: readable and executable by everyone, writable by no one. Security by design.

The First Login

Before you can use the CLI, you need to authenticate:

argocd login localhost:8080

Wait, localhost? By default, the ArgoCD server isn’t exposed outside the cluster. You have a few options:

  1. Port-forward: kubectl port-forward svc/argocd-server -n argocd 8080:443
  2. Set up an Ingress (recommended for production)
  3. Use a LoadBalancer or NodePort service

When prompted:

  • Username: admin
  • Password: The one you retrieved from the secret earlier

Once authenticated, the CLI stores your session. No more password prompts for every command.
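
If you script the login (for CI jobs or a bootstrap pipeline), the same command works non-interactively. A minimal sketch, using a placeholder hostname and assuming the admin password is exported as ARGOCD_PASSWORD:

# Non-interactive login; --grpc-web helps when ArgoCD sits behind a proxy that doesn't speak HTTP/2
argocd login argocd.example.com --username admin --password "$ARGOCD_PASSWORD" --grpc-web

# Rotate the generated admin password once you're in
argocd account update-password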


Act III: Building the Ecosystem

Adding Kubernetes Clusters

Here’s where ArgoCD’s power becomes apparent. You can manage applications across multiple Kubernetes clusters from a single ArgoCD instance.

argocd cluster add <cluster_name>

What happens behind the scenes?

  1. ArgoCD reads your ~/.kube/config
  2. It creates a ServiceAccount in the target cluster
  3. It stores the credentials securely
  4. Now you can deploy to that cluster from ArgoCD

The multi-cluster moment: Imagine managing dev, staging, and production clusters from one control tower. That’s the power you just unlocked.
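
To confirm the registration worked, ask ArgoCD what it can see:

# List every cluster ArgoCD can deploy to, including the in-cluster default
argocd cluster list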

Connecting to Git Repositories

ArgoCD needs access to your Git repositories to pull manifests. For private repositories, you’ll need credentials:

argocd repo add https://gitlab.com/source/repo/path.git \
  --name <repo_name> \
  --username <username> \
  --password <password>

Security note: Instead of passwords, consider using:

  • SSH keys for Git authentication
  • Personal access tokens with limited scope
  • GitHub Apps or GitLab Deploy Tokens

The command format stays similar, but your security posture improves dramatically.
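
As a sketch, registering the same repository over SSH instead of a password might look like this (the key path is a placeholder):

argocd repo add git@gitlab.com:source/repo/path.git \
  --ssh-private-key-path ~/.ssh/argocd_deploy_key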

Organizing with Projects

ArgoCD Projects are like folders for your applications. They provide:

  • Logical grouping (by team, by environment, by product)
  • RBAC boundaries (who can deploy what, where)
  • Source and destination restrictions (prevent accidental production deployments)
argocd proj create login-srv \
  -d https://destinationcluster.com,namespace \
  -s https://gitlab.com/source/repo/path.git

What this says:

  • Create a project named login-srv
  • Allow deployments to the specified destination cluster and namespace
  • Allow sourcing manifests from the specified Git repository

Think of projects as security gates. They prevent mistakes like deploying dev code to production.
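
Projects can also be managed declaratively instead of through the CLI. A minimal AppProject manifest equivalent to the command above, using the same placeholder values:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: login-srv
  namespace: argocd
spec:
  description: Login service deployments
  sourceRepos:
    - https://gitlab.com/source/repo/path.git
  destinations:
    - server: https://destinationcluster.com
      namespace: namespace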


Act IV: The Application Dance

Creating Your First Application

Applications are where the rubber meets the road. This is where you tell ArgoCD: “Deploy this code, from this Git repo, to that cluster.”

argocd app create login-srv \
  --repo https://gitlab.com/source/repo/path.git \
  --path <path> \
  --revision main \
  --dest-server https://destinationcluster.com \
  --dest-namespace <namespace> \
  --sync-policy automated \
  --self-heal \
  --project <project_name>

Let’s break down the flags:

--repo: Where your Kubernetes manifests live (Git repository)

--path: The directory within the repo containing manifests

--revision: The Git branch, tag, or commit to track

--dest-server: The Kubernetes cluster to deploy to

--dest-namespace: The namespace within that cluster

--sync-policy automated: Automatically sync when Git changes

--self-heal: If someone does a manual kubectl apply, ArgoCD reverts it to match Git

--project: The ArgoCD project for RBAC and organization

The Automated vs Manual Decision

Notice the --sync-policy automated flag? This is where philosophy meets practice.

Automated sync means:

  • Git commit → ArgoCD automatically deploys
  • Perfect for dev/staging environments
  • GitOps at its purest: Git is the single source of truth

Manual sync means:

  • Git commit → ArgoCD detects changes but waits
  • You review the diff in the UI
  • You click “Sync” when ready
  • Perfect for production environments where you want that human checkpoint

Our approach? Automated for dev, manual for production. Best of both worlds.
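
If you manage Applications declaratively, the same decision is just a syncPolicy block. A sketch of the automated variant (prune is optional and not implied by the CLI flag above):

spec:
  syncPolicy:
    automated:
      selfHeal: true   # revert manual kubectl changes back to Git
      prune: true      # also delete resources that were removed from Git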

Updating Applications

Applications evolve. Repositories change, namespaces shift, configurations update. Instead of deleting and recreating, you can update in place:

argocd app set test2 \
  --repo https://gitlab.com/new/source/repo/path.git \
  --path <new_path> \
  --revision main \
  --dest-server https://newdestinationcluster.com \
  --dest-namespace <new_namespace> \
  --sync-policy automated \
  --self-heal \
  --project <new_project_name>

ArgoCD reconfigures the application without losing history or state. Smooth.

Manual Sync - When You Need Control

Sometimes automated isn’t right. For production deployments, you want that manual approval step:

argocd app sync <application_name>

One command. Deployment happens. Confidence intact.
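
In a pipeline, pair the sync with a wait so the job only passes once the rollout is actually healthy:

argocd app sync <application_name>
argocd app wait <application_name> --health --timeout 300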


Act V: The Scaling Challenge

The Sudden Realization

Everything’s working beautifully. You have ArgoCD managing your applications. Git commits trigger deployments. Life is good.

Then you realize: you have 50+ microservices across 3 environments. That’s 150+ deployments. And each one needs:

  • Resource limits (CPU and memory)
  • Resource requests (for proper scheduling)
  • Horizontal Pod Autoscaling (HPA) rules

Doing this manually? That’s weeks of tedious YAML editing. There has to be a better way.

The CSV-Driven Solution

What if you could define all your resource configurations in a CSV file and let a script handle the rest?

repo_path,branch,deploy_name,mem_request,cpu_request,mem_limit,cpu_limit
/path/to/repo1,dev,frontend-app,256,100,512,200
/path/to/repo2,staging,backend-api,512,200,1024,400
/path/to/repo3,prod,worker-service,1024,500,2048,1000

One CSV file. All your configurations in one place. Version controlled. Easy to review.

The Resource Limits Script

This script reads the CSV and automatically patches Kustomization files:

#!/bin/bash

CSV_FILE="manifest-repo-details.csv"

while IFS=',' read -r repo_path branch deploy_name mem_request cpu_request mem_limit cpu_limit
do
  # Skip the header line
  if [[ "$repo_path" == "repo_path" ]]; then
    continue
  fi

  # Export variables for templating
  export REPO_PATH="$repo_path"
  export BRANCH="$branch"
  export DEPLOY_NAME="$deploy_name"
  export CPU_REQUEST="$cpu_request"
  export CPU_LIMIT="$cpu_limit"
  export MEM_REQUEST="$mem_request"
  export MEM_LIMIT="$mem_limit"

  # Generate the kustomization.yaml with resource patches
  cat > /tmp/kustomization.yaml <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - kube-config.yaml

patches:
  - target:
      kind: Deployment
      name: "$DEPLOY_NAME"
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/resources
        value:
          requests:
            memory: "${MEM_REQUEST}Mi"
            cpu: "${CPU_REQUEST}m"
          limits:
            memory: "${MEM_LIMIT}Mi"
            cpu: "${CPU_LIMIT}m"
EOF

  # Navigate to the repository
  cd "$REPO_PATH" || { echo "Repository path not found"; exit 1; }

  # Checkout the right branch and pull latest
  git checkout "$BRANCH"
  git pull --all

  # Copy the generated kustomization file
  cp /tmp/kustomization.yaml .

  # Commit and push
  git add .
  git commit -m "Added kustomization.yaml for $REPO_PATH"
  git push
  cd -
done < "$CSV_FILE"

What just happened?

  1. Read CSV file line by line
  2. For each application, generate a Kustomization patch
  3. Clone/navigate to the Git repository
  4. Commit the kustomization file
  5. Push to Git
  6. ArgoCD detects the change and syncs

One script. All applications updated. Time saved: countless hours.
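
After ArgoCD syncs, a quick spot-check confirms the limits actually landed on a deployment:

kubectl -n <namespace> get deploy <deploy_name> \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'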

The HPA Automation

Horizontal Pod Autoscaling is critical for handling traffic spikes. But configuring HPA for 50+ applications? Another CSV-driven script to the rescue:

#!/bin/bash

CSV_FILE="manifest-hpa-details.csv"

while IFS=',' read -r namespace app repo_path branch deploy_name min_replicas max_replicas target_cpu_utilization
do
  # Skip header
  if [[ "$repo_path" == "repo_path" ]]; then
    continue
  fi

  # Export variables
  export NAMESPACE="$namespace"
  export REPO_PATH="$repo_path"
  export BRANCH="$branch"
  export DEPLOY_NAME="$deploy_name"
  export MIN_REPLICAS="$min_replicas"
  export MAX_REPLICAS="$max_replicas"
  export TARGET_CPU_UTILIZATION="$target_cpu_utilization"

  # Generate HPA configuration
  cat > /tmp/hpa.yaml <<EOF
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: "${DEPLOY_NAME}-${BRANCH}"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "${DEPLOY_NAME}"
  minReplicas: ${MIN_REPLICAS}
  maxReplicas: ${MAX_REPLICAS}
  targetCPUUtilizationPercentage: ${TARGET_CPU_UTILIZATION}
EOF

  # Navigate to repo, checkout branch
  cd "$REPO_PATH" || { echo "Repository path not found"; exit 1; }
  git checkout "$BRANCH"
  git pull --all

  # Copy HPA file
  cp /tmp/hpa.yaml .

  # Commit and push
  git add .
  git commit -m "Added HPA configuration for $REPO_PATH"
  git push
  cd -
done < "$CSV_FILE"

Now your applications automatically scale based on CPU utilization. Traffic spike? More pods spin up. Traffic drops? Pods scale down. Cost optimized. Performance maintained.
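
The script emits autoscaling/v1, which only supports a CPU target. If you also want memory-based scaling, a hand-written autoscaling/v2 equivalent (illustrative values, not generated by the script) looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-api-staging
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80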


Act VI: The Multi-Project Orchestra

The GitOps Repository Structure - Foundation of Everything

Before we dive into automation scripts, let’s talk about the most critical decision you’ll make: how to structure your GitOps repository.

This isn’t just about organizing files. Your repository structure IS your deployment architecture. Get it right, and everything else falls into place. Get it wrong, and you’ll fight it forever.

The Branching Strategy

ArgoCD tracks Git branches to determine what gets deployed where. The pattern is elegant:

One Repository, Multiple Branches, Different Environments

Git Repository: project-deployment-manifests.git
├── dev branch      → Development clusters
├── staging branch  → Staging clusters
├── uat branch      → UAT/Testing clusters
├── prod branch     → Production clusters
└── main branch     → Production (alternative naming)

The workflow:

  1. Developer commits manifest changes to dev branch
  2. ArgoCD auto-syncs to dev cluster
  3. After testing, merge dev → staging
  4. ArgoCD auto-syncs to staging cluster
  5. After validation, merge staging → prod
  6. ArgoCD syncs to production (auto or manual, your choice)

Why this works:

  • Git is the approval mechanism: Merging between branches = promotion between environments
  • Easy rollback: git revert on the branch = instant rollback
  • Clear audit trail: Git history shows exactly when and who promoted to production
  • Branch protection: Protect prod branch, require reviews for merges
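
Promotion itself is then an ordinary Git operation. Promoting dev to staging, for example, is just:

git checkout staging
git merge dev
git push origin staging

In practice you would usually do this through a reviewed merge request rather than a direct push, especially for the prod branch.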

The Directory Structure Convention

Inside each branch, organize manifests by organizational structure:

manifests/
├── <organization-name>/
│   ├── client-projects/          # Client-specific projects
│   │   ├── <client-name>/
│   │   │   ├── <project-name>/
│   │   │   │   ├── services/     # Backend microservices
│   │   │   │   │   ├── auth-service/
│   │   │   │   │   │   ├── deployment.yaml
│   │   │   │   │   │   ├── service.yaml
│   │   │   │   │   │   └── kustomization.yaml
│   │   │   │   │   ├── api-gateway/
│   │   │   │   │   └── notification-service/
│   │   │   │   ├── ui/           # Frontend applications
│   │   │   │   │   ├── web-dashboard/
│   │   │   │   │   └── mobile-app/
│   │   │   │   └── <env>-common-manifests/  # Shared configs per env
│   │   │   │       ├── configmap.yaml
│   │   │   │       ├── secrets.yaml (sealed)
│   │   │   │       └── network-policy.yaml
│   │   │   └── another-project/
│   ├── internal-tools/           # Internal products
│   │   ├── monitoring/
│   │   ├── ci-cd/
│   │   └── admin-portal/
│   └── data-platforms/           # Data/Analytics platforms
│       ├── data-pipeline/
│       ├── analytics-ui/
│       └── ml-services/

Real-world example from the scripts:

manifests/acme-org/
├── client-projects/
│   ├── client-alpha/
│   │   └── project-phoenix/
│   │       ├── services/
│   │       │   └── phoenix-backend/
│   │       ├── ui/
│   │       │   └── phoenix-web-v2/
│   │       └── phoenix-dev-common-manifests/
│   └── client-beta/
│       ├── services/
│       │   ├── beta-service/
│       │   ├── beta-auth/
│       │   ├── beta-admin/
│       │   └── beta-api-gateway/
│       ├── dealer-portal/
│       │   └── services/
│       │       ├── internal-service/
│       │       └── api-gateway/
│       ├── ui/web/
│       │   └── beta-frontend/
│       └── beta-dev-common-manifests/
├── compliance-tools/
│   └── compliance-app/
│       ├── io/
│       │   └── compliance-monolith-srv/
│       ├── ui/
│       │   ├── dashboard/
│       │   └── compliance-admin/
│       └── compliance-v1-common-manifests/
└── analytics-platform/
    ├── analytics-ui/
    ├── analytics-ml/
    ├── microservices/
    │   ├── core/
    │   └── connector/
    └── analytics-dev-common-manifests/

Notice the patterns:

  • Top-level: Organization name (acme-org)
  • Second-level: Project category (client-projects, compliance-tools, analytics-platform)
  • Third-level: Specific client or product name
  • Fourth-level: Service type (services, ui, io)
  • Bottom-level: Individual microservice with its manifests

The Path Convention in ArgoCD Applications

When creating ArgoCD applications, the --path flag points to these directories:

# For a microservice in a client project
--path manifests/acme-org/client-projects/client-beta/services/beta-service

# For a UI application
--path manifests/acme-org/client-projects/client-beta/ui/web/beta-frontend

# For common manifests
--path manifests/acme-org/client-projects/client-beta/beta-dev-common-manifests

# For internal tools
--path manifests/acme-org/compliance-tools/compliance-app/ui/dashboard

# For data platform services
--path manifests/acme-org/analytics-platform/microservices/core

Why this depth?

  • Clarity: No ambiguity about what each path contains
  • Scalability: Add new clients/projects without restructuring
  • Multi-tenancy: Different teams can work independently
  • RBAC friendly: Restrict ArgoCD project access by path patterns

Environment-Specific Variations

The same path exists in each branch, but contents differ:

In dev branch:

# manifests/.../beta-app-service/deployment.yaml
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: beta-app-service
        image: registry.gitlab.com/company/beta-app-service:dev-latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"

In prod branch (same path):

# manifests/.../beta-app-service/deployment.yaml
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: beta-app-service
        image: registry.gitlab.com/company/beta-app-service:v1.2.3
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"

Same path, different config per environment = GitOps magic

The Common Manifests Pattern Explained

Every environment has a -common-manifests directory:

beta-app-dev-common-manifests/
├── configmap.yaml          # Environment variables
├── sealed-secrets.yaml     # Encrypted secrets
├── network-policy.yaml     # Network isolation rules
├── resource-quota.yaml     # Namespace limits
└── kustomization.yaml      # Kustomize orchestration

Why separate common manifests?

  1. Deploy order control: Deploy common manifests BEFORE applications
  2. Shared configuration: One ConfigMap used by multiple services
  3. Environment isolation: Dev secrets ≠ Prod secrets
  4. Single source of truth: Update one ConfigMap, all services get it

How to use:

# First, create the common manifests application
argocd app create do-beta-app-dev-apps.common-manifests.dev \
  --repo https://gitlab.com/company/manifests.git \
  --path manifests/acme-org/client-projects/beta-app/beta-app-dev-common-manifests \
  --revision dev \
  --dest-server https://cluster.com \
  --dest-namespace beta-app-dev-apps \
  --sync-policy automated \
  --project do-beta-app-dev-apps

# THEN create individual service applications that depend on it
argocd app create do-beta-app-dev-apps.beta-app-service.dev \
  --repo https://gitlab.com/company/manifests.git \
  --path manifests/acme-org/client-projects/beta-app/services/beta-app-service \
  --revision dev \
  --dest-server https://cluster.com \
  --dest-namespace beta-app-dev-apps \
  --sync-policy automated \
  --project do-beta-app-dev-apps

Order matters! Common manifests first, applications second.
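
Note that ArgoCD doesn't sequence separate Applications for you; creating and syncing the common-manifests app first, as above, is the simple answer. If you later wrap these apps in an app-of-apps, sync-wave annotations can encode the ordering explicitly, for example:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # common manifests sync before wave 0 applications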

Multi-Organization Support

For companies managing multiple organizations or clients:

manifests/
├── org-alpha/              # Client Alpha's projects
│   ├── project-x/
│   └── project-y/
├── org-beta/               # Client Beta's projects
│   ├── project-z/
│   └── project-w/
└── internal/               # Internal company projects
    ├── hr-system/
    └── finance-app/

ArgoCD Projects map to organizations:

# Create separate ArgoCD projects per organization
argocd proj create org-alpha-projects \
  -d https://cluster.com,org-alpha-* \
  -s https://gitlab.com/manifests.git

argocd proj create org-beta-projects \
  -d https://cluster.com,org-beta-* \
  -s https://gitlab.com/manifests.git

RBAC benefits:

  • Alpha team can only deploy to org-alpha-* namespaces
  • Beta team can only deploy to org-beta-* namespaces
  • Platform team can deploy to all

The Revision Strategy: Branches vs Tags

You have choices for the --revision flag:

Branch-based (recommended for most cases):

--revision dev      # Tracks dev branch, auto-updates
--revision staging  # Tracks staging branch, auto-updates
--revision prod     # Tracks prod branch, auto-updates

Tag-based (recommended for production with strict change control):

--revision v1.2.3        # Pinned to specific release
--revision release-2024-01-15

Commit SHA-based (for debugging/rollback):

--revision a1b2c3d4      # Pinned to exact commit

Our recommendation:

  • Dev/Staging: Use branches for automatic updates
  • Production: Use branches with manual sync policy, OR use tags for maximum control
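
Switching an existing application from branch tracking to a pinned tag is a one-liner, which makes the "tags for production" option easy to adopt later:

argocd app set <application_name> --revision v1.2.3
argocd app sync <application_name>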

Repository Structure Best Practices

DO:

  • ✅ Use consistent naming across all paths
  • ✅ Keep manifests close to application structure
  • ✅ Use Kustomize for environment-specific overrides
  • ✅ Version control everything (even secrets via sealed-secrets)
  • ✅ Document your structure in the repository README

DON’T:

  • ❌ Mix application code and manifests in the same repo (separate concerns)
  • ❌ Store secrets unencrypted (use sealed-secrets, SOPS, or Vault)
  • ❌ Use deeply nested paths (3-5 levels max)
  • ❌ Create one-off directory structures (consistency > special cases)
  • ❌ Skip the common-manifests pattern (you’ll regret it at scale)

Validating Your Structure

Before deploying, validate your manifest structure:

# Check all manifests in a path are valid YAML
find manifests/acme-org/client-projects/beta-app -name "*.yaml" -exec yamllint {} \;

# Validate Kubernetes manifests
find manifests/acme-org/client-projects/beta-app -name "*.yaml" -exec kubectl apply --dry-run=client -f {} \;

# Test Kustomize builds
cd manifests/acme-org/client-projects/beta-app/services/beta-app-service
kustomize build .

The Template That Changes Everything

When you’re managing multiple projects across multiple environments (dev, staging, production), you need a systematic approach. Enter the template-based deployment script:

#!/bin/bash
## CSV Format: Sl,PROJ,NAMESPACE,APP,PATH,BRANCH,REPO,CLUSTER

echo "Updating Kubernetes manifests in $(kubectl config current-context)"

cat $1 | while read -r record; do
  # Skip the CSV header row
  if [[ "$record" == Sl,* ]]; then
    continue
  fi

  export PROJ=$(echo $record | awk -F ',' '{print $2}')
  export NAMESPACE=$(echo $record | awk -F ',' '{print $3}')
  export APP=$(echo $record | awk -F ',' '{print $4}')
  # APP_PATH, not PATH - overwriting PATH would break command lookup for the rest of the script
  export APP_PATH=$(echo $record | awk -F ',' '{print $5}')
  export BRANCH=$(echo $record | awk -F ',' '{print $6}')
  export REPO=$(echo $record | awk -F ',' '{print $7}')
  export CLUSTER=$(echo $record | awk -F ',' '{print $8}' | tr -d '\r')

  # Create namespace
  kubectl create ns $NAMESPACE

  # Create ArgoCD project
  argocd proj create $PROJ -d $CLUSTER,$NAMESPACE -s $REPO

  # Create ArgoCD application
  argocd app create $APP \
    --repo $REPO \
    --path $APP_PATH \
    --revision $BRANCH \
    --dest-server $CLUSTER \
    --dest-namespace $NAMESPACE \
    --sync-policy automated \
    --self-heal \
    --project $PROJ
done

The power of this approach:

  1. Define once: Create a CSV with all your applications
  2. Deploy anywhere: Switch Kubernetes context, run the script
  3. Consistency: Every application follows the same pattern
  4. Auditability: The CSV file is version controlled
  5. Scalability: Adding a new application is just a new CSV line

Real-World Example: Multi-Environment Deployment

Let’s look at a real pattern from the project - deploying an application across dev, staging, and production:

Dev Environment:

argocd proj create do-analytics-dev-apps \
  -d https://cluster-dev.k8s.ondigitalocean.com,analytics-dev-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create do-analytics-dev-apps.analyticsui.dev \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analyticsui \
  --revision dev \
  --dest-server https://cluster-dev.k8s.ondigitalocean.com \
  --dest-namespace analytics-dev-apps \
  --sync-policy manual \
  --project do-analytics-dev-apps

Staging Environment:

argocd proj create do-analytics-staging-apps \
  -d https://cluster-staging.k8s.ondigitalocean.com,analytics-staging-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create do-analytics-staging-apps.analyticsui.staging \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analyticsui \
  --revision staging \
  --dest-server https://cluster-staging.k8s.ondigitalocean.com \
  --dest-namespace analytics-staging-apps \
  --sync-policy automated \
  --self-heal \
  --project do-analytics-staging-apps

Production Environment:

argocd proj create do-analytics-prod-apps \
  -d https://cluster-prod.k8s.ondigitalocean.com,analytics-prod-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create do-analytics-prod-apps.analyticsui.prod \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analyticsui \
  --revision prod \
  --dest-server https://cluster-prod.k8s.ondigitalocean.com \
  --dest-namespace analytics-prod-apps \
  --sync-policy automated \
  --self-heal \
  --project do-analytics-prod-apps

The pattern emerges:

  • Same repository, different branches (dev/staging/prod)
  • Separate projects per environment
  • Different sync policies (manual for dev testing, automated for staging/prod)
  • Clear naming convention: <cluster>-<project>-<env>-apps.<app>.<env>

Act VII: The Scripts Toolkit

Throughout this journey, we’ve created a powerful toolkit. Let’s catalog what we have:

Installation Scripts

  • 01_argocd_installation.sh: Bootstrap ArgoCD in the cluster
  • 02_argocd_cli_installation.sh: Install the CLI tool
  • 03_argocd_cli_login.sh: Authenticate with ArgoCD

Configuration Scripts

  • 04_argocd_cli_cluster_addition.sh: Register new Kubernetes clusters
  • 05_argocd_cli_repo_addition.sh: Connect Git repositories
  • 06_argocd_cli_proj_addition.sh: Create ArgoCD projects

Application Management Scripts

  • 07_argocd_cli_app_addition.sh: Create new applications
  • 08_argocd_cli_app_updation.sh: Update application configurations
  • 09_argocd_cli_app_sync.sh: Manually trigger synchronization

Automation Scripts

  • 10_resource_limits_patch_kustomization.sh: Bulk apply resource limits via CSV
  • 11_hpa.sh: Bulk configure HPA rules via CSV

Project-Specific Scripts

  • project_commands/e2e/: End-to-end deployment scripts for E2E environments
  • project_commands/do/: DigitalOcean cluster deployment scripts
  • project_commands/e2e/template-argo-create.sh: Template for new project deployments

Each script is a tool. Together, they form a comprehensive GitOps automation framework.


The Key Learnings

1. Start Simple, Scale Smart

Don’t try to automate everything on day one. Start with:

  1. Install ArgoCD
  2. Deploy one application manually
  3. Understand the workflow
  4. Then automate

2. Git is the Single Source of Truth

Once you adopt ArgoCD, resist the urge to do manual kubectl apply commands. If it’s not in Git, it shouldn’t be in the cluster. This discipline is what makes GitOps powerful.

3. Projects Prevent Disasters

Use ArgoCD Projects to create guardrails:

  • Developers can only deploy to dev namespaces
  • Production deployments require specific approvals
  • Cross-environment accidents become impossible

4. Sync Policies Matter

Automated sync works when:

  • You trust your CI/CD pipeline
  • Rollbacks are easy
  • The environment is non-critical

Manual sync is better when:

  • Human review is required
  • Changes have high impact
  • Compliance requires approval workflows

5. Automation Scales, Manual Doesn’t

Managing 5 applications manually? Doable. Managing 50? Painful. Managing 500? Impossible.

CSV-driven automation isn’t overkill - it’s survival.


Troubleshooting Tales

Problem: “Application is OutOfSync”

Symptom: ArgoCD dashboard shows red, application is out of sync

Common causes:

  1. Someone did a manual kubectl apply (drift detected)
  2. Git repository was updated but auto-sync is disabled
  3. Kustomize build failed due to invalid YAML

Solution:

# Check what's different
argocd app diff <app-name>

# If drift, resync from Git
argocd app sync <app-name>

# If you want to keep manual changes (not recommended)
argocd app sync <app-name> --prune=false

Problem: “Authentication Failed”

Symptom: CLI commands fail with authentication errors

Solution:

# Re-login
argocd login <argocd-server-url>

# Or use admin password directly
argocd login <argocd-server-url> --username admin --password <password>

Problem: “Cluster Not Found”

Symptom: Application creation fails, can’t find destination cluster

Solution:

# List registered clusters
argocd cluster list

# Add the missing cluster
argocd cluster add <cluster-context-name>

Problem: “Sync Failed with Kustomize Error”

Symptom: Application won’t sync, Kustomize build errors

Solution:

# Manually test Kustomize build
cd <repo-path>
kustomize build .

# Common issues:
# - Missing resources in kustomization.yaml
# - Invalid patches
# - Circular dependencies

Production Considerations

High Availability

For production ArgoCD deployments:

helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set redis-ha.enabled=true \
  --set controller.replicas=1 \
  --set server.replicas=2 \
  --set repoServer.replicas=2

Run multiple replicas of the API server and repo server, and enable Redis HA so the shared cache isn’t a single point of failure.
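
Alternatively, the ArgoCD project publishes a dedicated HA manifest that installs the same components with HA defaults baked in:

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml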

Backup and Disaster Recovery

ArgoCD configuration is stored in Kubernetes. Back up:

# Export all applications
argocd app list -o yaml > argocd-apps-backup.yaml

# Export all projects
argocd proj list -o yaml > argocd-projects-backup.yaml

# Backup the entire namespace
kubectl get all,cm,secret -n argocd -o yaml > argocd-full-backup.yaml
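
Recent ArgoCD versions also ship a purpose-built export command that captures applications, projects, repositories, and settings in one file (restorable later with argocd admin import):

argocd admin export -n argocd > argocd-backup.yaml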

Monitoring and Alerting

Key metrics to track:

  • Application sync status (how many are out of sync?)
  • Sync failures (what’s breaking?)
  • API server health (is ArgoCD itself healthy?)
  • Repository connection status (can ArgoCD reach Git?)

Integrate with Prometheus and Grafana for visibility.
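
ArgoCD exposes Prometheus metrics out of the box: the application controller on the argocd-metrics service (port 8082), the API server on argocd-server-metrics (8083), and the repo server on 8084. If you run the Prometheus Operator, a minimal ServiceMonitor sketch for the controller looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics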

Access Control

Use ArgoCD’s RBAC to limit permissions:

# argocd-rbac-cm ConfigMap
policy.csv: |
  p, role:developers, applications, sync, default/*, allow
  p, role:developers, applications, get, default/*, allow
  p, role:ops, applications, *, */*, allow
  g, engineering-team, role:developers
  g, ops-team, role:ops

Developers can sync applications, but only ops can delete them.


Act VIII: The Migration Chronicles - Moving Clusters Like a Pro

The Migration Wake-Up Call

Remember that scenario from the beginning? The one where you’re asked to migrate Kubernetes clusters, possibly multiple times, and the panic sets in?

That’s not a hypothetical. That’s real life.

When cloud costs spike, when performance isn’t meeting expectations, or when compliance requires moving to a different provider - cluster migrations become inevitable. Without ArgoCD, this is a nightmare. With ArgoCD? It’s actually manageable.

Let’s walk through the real-world patterns that make cluster migration smooth.

The Multi-Cloud Reality: E2E, DigitalOcean, and AWS EKS

In real production environments, you rarely stick to one cloud provider. Cost optimization, client requirements, compliance, and redundancy often mean you’re managing a multi-cloud Kubernetes estate.

Our real deployment spans three infrastructure types:

E2E/Testing Cluster: On-premises or dedicated testing infrastructure

# E2E cluster endpoint
CLUSTER_E2E="https://116.204.172.18:6443"

DigitalOcean Kubernetes Clusters: For cost-effective development and staging

# Dev cluster
CLUSTER_DO_DEV="https://222968ad-5fde-40ed-9856-6b44757f7f45.k8s.ondigitalocean.com"

# Staging cluster (migrated to production-grade cluster)
CLUSTER_DO_STAGING="https://b4fd5ff3-7743-4d58-aa2a-044ec115c2d1.k8s.ondigitalocean.com"

# Production cluster
CLUSTER_DO_PROD="https://67860677-ba01-4c64-aea4-981fde9b5fc6.k8s.ondigitalocean.com"

AWS EKS Clusters: For enterprise-grade production workloads

# UAT cluster (ap-south-1 region)
CLUSTER_AWS_UAT="https://B436343EB36A2980766273BAA9BD9F82.gr7.ap-south-1.eks.amazonaws.com"

# Staging cluster (ap-south-1 region)
CLUSTER_AWS_STAGING="https://8EDDBF9E1F2D02882A9E7D338AE92018.gr7.ap-south-1.eks.amazonaws.com"

# Production cluster 1 (ap-south-1 region)
CLUSTER_AWS_PROD_1="https://3CCD9E5C2236A7E09F13EC4EB6A6B183.gr7.ap-south-1.eks.amazonaws.com"

# Production cluster 2 (ap-south-1 region)
CLUSTER_AWS_PROD_2="https://42726EBBABB07D7E9C146B5D8F73DDBA.gr7.ap-south-1.eks.amazonaws.com"

The multi-cloud pattern: Test in E2E → Validate in DO → Deploy to AWS EKS for production scale.

Why Multi-Cloud? The Business Reality

You might wonder: “Why not just pick one cloud provider and stick with it?”

Here’s the reality we faced:

Cost Optimization: DigitalOcean offered better pricing for dev/staging workloads. AWS EKS provided enterprise features for production.

Client Requirements: Some clients require AWS for compliance reasons. Others are cost-sensitive and prefer DigitalOcean.

Risk Mitigation: If one cloud provider has an outage, critical services can fail over to another.

Feature Availability: AWS EKS offers advanced features (IAM integration, managed node groups, Fargate) that aren’t available everywhere.

Geographic Distribution: Different cloud providers have different regional availability.

ArgoCD makes managing this complexity not just possible, but actually manageable.

The Three-Prefix Naming Convention

Notice how the naming convention evolved for multi-cloud:

Format: <cloud-prefix>-<project>-<environment>-apps.<service>.<environment>

E2E Cluster Applications:

e2e-myapp-dev-apps.backend.dev
e2e-myapp-dev-apps.frontend.dev

DigitalOcean Cluster Applications:

do-myapp-staging-apps.backend.staging
do-myapp-prod-apps.backend.prod

AWS EKS Cluster Applications:

aws-myapp-uat-apps.backend.uat
aws-myapp-prod-apps.backend.prod

Why this matters even more in multi-cloud:

  1. Instant cloud identification: argocd app list | grep "aws-" shows all AWS deployments
  2. No cross-cloud collisions: Same app name can exist in DO and AWS simultaneously
  3. Clear billing attribution: Know which cloud is running what
  4. Disaster recovery: Quickly identify which apps need to failover to which cloud

AWS EKS-Specific Considerations

AWS EKS has unique characteristics that affect your ArgoCD deployments:

EKS Cluster Endpoint Format

EKS cluster endpoints follow a specific pattern:

https://<CLUSTER_ID>.gr7.<REGION>.eks.amazonaws.com

Examples from our deployment:

  • https://B436343EB36A2980766273BAA9BD9F82.gr7.ap-south-1.eks.amazonaws.com
  • Region: ap-south-1 (Mumbai)
  • The long alphanumeric ID is unique to each EKS cluster

Adding EKS Clusters to ArgoCD

The process is similar to other clusters, but with AWS-specific authentication:

# Add EKS cluster to your kubeconfig first
aws eks update-kubeconfig \
  --region ap-south-1 \
  --name my-production-cluster

# Verify the cluster is accessible
kubectl get nodes

# Add to ArgoCD
argocd cluster add arn:aws:eks:ap-south-1:123456789012:cluster/my-production-cluster

Important: The cluster name in your kubeconfig will be an ARN format:

arn:aws:eks:ap-south-1:123456789012:cluster/my-production-cluster

But ArgoCD stores it by the API server endpoint.

IAM and RBAC Considerations

EKS uses AWS IAM for cluster authentication. When ArgoCD connects:

  1. ArgoCD creates a ServiceAccount in the EKS cluster
  2. That ServiceAccount needs proper Kubernetes RBAC permissions
  3. The IAM role/user running ArgoCD needs eks:DescribeCluster permission
  4. For cross-account EKS clusters, you need to set up IAM role assumption

Best practice: Create a dedicated IAM role for ArgoCD with minimal permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}

EKS Storage Classes

EKS comes with AWS EBS-backed storage classes:

# Check available storage classes
kubectl get storageclass

# Common EKS storage classes:
# - gp2 (General Purpose SSD)
# - gp3 (Newer, more cost-effective)
# - io1 (Provisioned IOPS)

When migrating from DO to EKS, your PersistentVolumeClaims may need updating:

# DigitalOcean
storageClassName: do-block-storage

# AWS EKS
storageClassName: gp3

Pro tip: Use Kustomize overlays to handle cloud-specific storage classes automatically.
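
A sketch of such an overlay, assuming a PVC named data-pvc in the base manifests:

# overlays/aws/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: PersistentVolumeClaim
      name: data-pvc
    patch: |-
      - op: replace
        path: /spec/storageClassName
        value: gp3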

Real-World Multi-Cloud Deployment Example

Here’s how we deploy the same application across all three environments:

E2E (Testing):

argocd proj create e2e-myapp-dev-apps \
  -d https://116.204.172.18:6443,myapp-dev-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create e2e-myapp-dev-apps.backend.dev \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/myapp/backend \
  --revision dev \
  --dest-server https://116.204.172.18:6443 \
  --dest-namespace myapp-dev-apps \
  --sync-policy automated \
  --self-heal \
  --project e2e-myapp-dev-apps

DigitalOcean (Staging):

argocd proj create do-myapp-staging-apps \
  -d https://b4fd5ff3-7743-4d58-aa2a-044ec115c2d1.k8s.ondigitalocean.com,myapp-staging-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create do-myapp-staging-apps.backend.staging \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/myapp/backend \
  --revision staging \
  --dest-server https://b4fd5ff3-7743-4d58-aa2a-044ec115c2d1.k8s.ondigitalocean.com \
  --dest-namespace myapp-staging-apps \
  --sync-policy automated \
  --self-heal \
  --project do-myapp-staging-apps

AWS EKS (Production):

argocd proj create aws-myapp-prod-apps \
  -d https://3CCD9E5C2236A7E09F13EC4EB6A6B183.gr7.ap-south-1.eks.amazonaws.com,myapp-prod-apps \
  -s https://gitlab.com/company/gitops/manifests.git

argocd app create aws-myapp-prod-apps.backend.prod \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/myapp/backend \
  --revision prod \
  --dest-server https://3CCD9E5C2236A7E09F13EC4EB6A6B183.gr7.ap-south-1.eks.amazonaws.com \
  --dest-namespace myapp-prod-apps \
  --sync-policy automated \
  --self-heal \
  --project aws-myapp-prod-apps

Notice:

  • Same Git repository across all clouds
  • Same path to manifests
  • Different Git branches (dev/staging/prod) for environment-specific configs
  • Different cluster endpoints and project prefixes
  • Consistent naming pattern makes everything predictable

The Multi-Cloud Migration Pattern

When you need to move from one cloud to another (say, from DO to AWS EKS), the pattern is:

Step 1: Deploy in Parallel

# Existing DO deployment (keep running)
do-myapp-prod-apps.backend.prod → DigitalOcean

# New EKS deployment (deploy alongside)
aws-myapp-prod-apps.backend.prod → AWS EKS

Step 2: Validate AWS EKS Deployment

  • Test all functionality
  • Verify database connections (might need security group updates)
  • Check AWS-specific integrations (IAM roles, S3 access, RDS connections)
  • Load test to ensure EKS node groups can handle traffic

Step 3: DNS Cutover

# Update DNS from DO LoadBalancer to AWS ELB/ALB
# Or use weighted routing for gradual cutover

Step 4: Monitor

  • CloudWatch for EKS metrics
  • Application logs in CloudWatch Logs or your logging solution
  • Cost monitoring (EKS + node groups + EBS volumes)

Step 5: Document and Comment

# In do/myapp.sh - comment out what migrated to AWS
#Migrated to AWS EKS prod cluster (ap-south-1)
# argocd app create do-myapp-prod-apps.backend.prod \
# --repo https://gitlab.com/company/gitops/manifests.git \
# --path manifests/myapp/backend \
# --revision prod \
# --dest-server https://do-cluster.com \
# --dest-namespace myapp-prod-apps \
# --sync-policy automated \
# --self-heal \
# --project do-myapp-prod-apps

Cloud-Specific Gotchas

AWS EKS Gotcha 1: IAM Authentication Token Expiry

Problem: EKS uses AWS IAM for authentication. Tokens expire, causing ArgoCD to lose cluster access.

Symptom: ArgoCD shows cluster as “Unknown” or sync fails with authentication errors.

Solution: Ensure ArgoCD has valid AWS credentials that refresh automatically. Use IRSA (IAM Roles for Service Accounts) if ArgoCD runs in EKS:

# ArgoCD ServiceAccount with IRSA annotation
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argocd-application-controller
  namespace: argocd
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/argocd-cluster-manager

AWS EKS Gotcha 2: VPC and Security Groups

Problem: EKS clusters are VPC-isolated. Applications can’t reach databases or services in other VPCs.

Solution:

  • VPC Peering between EKS VPC and database VPC
  • AWS PrivateLink for managed services
  • Security group rules allowing EKS node group to reach RDS, ElastiCache, etc.

Validate connectivity before migration:

# Deploy a test pod in EKS
kubectl run test-connectivity --image=curlimages/curl -it --rm -- sh

# Inside pod, test connectivity
curl http://internal-database.vpc.local:5432

AWS EKS Gotcha 3: LoadBalancer Annotations

Problem: DigitalOcean and EKS use different LoadBalancer implementations.

DigitalOcean LoadBalancer:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: "http"

AWS EKS LoadBalancer (NLB):

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

Solution: Use Kustomize overlays to apply cloud-specific annotations:

# kustomization.yaml for AWS
patches:
  - target:
      kind: Service
      name: myapp-backend
    patch: |-
      - op: add
        path: /metadata/annotations/service.beta.kubernetes.io~1aws-load-balancer-type
        value: nlb

AWS EKS Gotcha 4: Node Group Scaling

Problem: EKS node groups have different scaling characteristics than DO node pools.

Solution: Configure Cluster Autoscaler for EKS:

# Cluster Autoscaler deployment for EKS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        command:
          - ./cluster-autoscaler
          - --cloud-provider=aws
          - --namespace=kube-system
          - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster-name

Multi-Cloud Cost Optimization Tips

Tip 1: Right-Cloud for Right-Workload

  • Dev/Test: Use DigitalOcean (cheaper, simpler)
  • Production with AWS services: Use EKS (tight integration with RDS, S3, etc.)
  • Batch jobs: Use Fargate on EKS (pay only when running)

Tip 2: Cross-Cloud Resource Sharing

  • Keep ArgoCD in one cluster, manage all others
  • Centralized logging/monitoring (avoid per-cloud solutions)
  • Shared Git repository (one source of truth)

Tip 3: Reserved Capacity

  • DigitalOcean doesn’t have reserved pricing - pay-as-you-go
  • AWS EKS: Use Savings Plans for nodes, Reserved Instances for steady-state workloads
  • Compare costs quarterly and rebalance

Tip 4: Data Transfer Costs

  • Keep databases in same cloud as applications (cross-cloud data transfer is expensive)
  • Use CloudFront/CDN for serving static assets
  • Minimize cross-region traffic

The Multi-Cloud Directory Structure

Update your project structure to reflect multi-cloud reality:

argocd/
├── scripts/
│   ├── 01_argocd_installation.sh
│   ├── 02_argocd_cli_installation.sh
│   └── ...
├── project_commands/
│   ├── e2e/          # E2E cluster deployments
│   │   ├── myapp.sh
│   │   ├── project-a.sh
│   │   └── project-b.sh
│   ├── do/           # DigitalOcean cluster deployments
│   │   ├── myapp.sh
│   │   ├── project-a.sh
│   │   └── project-b.sh
│   └── aws/          # AWS EKS cluster deployments
│       ├── myapp.sh
│       ├── project-a.sh
│       └── project-b.sh
└── cloud-specific/
    ├── aws/
    │   ├── iam-policies/
    │   ├── security-groups.yaml
    │   └── load-balancer-configs/
    └── do/
        └── load-balancer-configs/

The pattern:

  • Separate directories per cloud provider
  • Same application structure across all clouds
  • Cloud-specific configurations in dedicated folders
  • Easy to compare: diff project_commands/do/myapp.sh project_commands/aws/myapp.sh

When to Use Which Cloud?

After managing multi-cloud for months, here’s our decision matrix:

Workload type → recommended cloud (reason):

  • Development environments → DigitalOcean (cost-effective, simple)
  • Staging with low traffic → DigitalOcean (good balance of features/cost)
  • Production (AWS-heavy stack) → AWS EKS (native integration with RDS, S3, IAM)
  • Production (cloud-agnostic) → DigitalOcean (simpler, cheaper)
  • Compliance-required workloads → AWS EKS (better compliance certifications)
  • Microservices with high scaling → AWS EKS (better autoscaling, Fargate option)
  • Simple stateless apps → DigitalOcean (don’t overpay for features you won’t use)

The beauty of ArgoCD? The decision can change, and your deployment process stays the same.

Migration Strategy: The Parallel Deployment Approach

Here’s the secret to zero-downtime migrations: Don’t migrate. Replicate, verify, then switch.

Step 1: Deploy to New Cluster in Parallel

Keep the old cluster running. Deploy the same applications to the new cluster using ArgoCD:

# Original deployment (old cluster)
argocd app create e2e-analytics-staging-apps.analyticsui.staging \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analyticsui \
  --revision staging \
  --dest-server https://116.204.172.18:6443 \
  --dest-namespace analytics-staging-apps \
  --sync-policy automated \
  --self-heal \
  --project e2e-analytics-staging-apps

# New deployment (new cluster)
argocd app create do-analytics-staging-apps.analyticsui.staging \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analyticsui \
  --revision staging \
  --dest-server https://b4fd5ff3-7743-4d58-aa2a-044ec115c2d1.k8s.ondigitalocean.com \
  --dest-namespace analytics-staging-apps \
  --sync-policy automated \
  --self-heal \
  --project do-analytics-staging-apps

Notice:

  • Same repository
  • Same path
  • Same revision
  • Different destination cluster
  • Different project name (prefixed with cluster identifier: e2e- vs do-)

Step 2: Verify Everything Works

Run your test suite against the new cluster. Check:

  • Application health in ArgoCD dashboard
  • Database connections
  • External API integrations
  • Monitoring and logging
  • SSL certificates
  • Ingress rules

Step 3: DNS Cutover

Update your DNS to point to the new cluster. Traffic shifts. Old cluster sits idle but ready.

Step 4: Monitor and Validate

Watch metrics, logs, error rates. Everything looking good?

Step 5: Document the Migration

Here’s a trick that saved us countless hours: Comment, don’t delete.

In the old cluster’s deployment script:

#Migrated to smtc-prod cluster
# analytics-staging

# argocd proj create e2e-analytics-staging-apps -d https://116.204.172.18:6443,analytics-staging-apps -s https://gitlab.com/company/gitops/manifests.git

# argocd app create e2e-analytics-staging-apps.analyticsui.staging \
# --repo https://gitlab.com/company/gitops/manifests.git \
# --path manifests/analytics-platform/analyticsui \
# --revision staging \
# --dest-server https://116.204.172.18:6443 \
# --dest-namespace analytics-staging-apps \
# --sync-policy automated \
# --self-heal \
# --project e2e-analytics-staging-apps

Why comment instead of delete?

  1. Historical record: You know what was deployed and where
  2. Quick rollback: Uncomment and redeploy if needed
  3. Documentation: New team members see the migration history
  4. Audit trail: Compliance teams love this

The Naming Convention That Saves Lives

Notice the naming pattern in the scripts?

Format: <cluster-prefix>-<project>-<environment>-apps.<service>.<environment>

Examples:

  • e2e-analytics-dev-apps.analyticsui.dev - E2E cluster, analytics project, dev environment
  • do-analytics-staging-apps.analyticsui.staging - DigitalOcean cluster, analytics project, staging
  • do-analytics-prod-apps.analyticsui.prod - DigitalOcean cluster, analytics project, production

Why this matters:

  1. No naming collisions: You can have the same app in multiple clusters
  2. Clear ownership: You know which cluster at a glance
  3. Easy filtering: argocd app list | grep "do-" shows only DO clusters
  4. Scripting friendly: Parse application names programmatically

Tips and Tricks from the Trenches

Trick 1: The Service-Specific Comment Pattern

Some services run on dedicated infrastructure. Document it:

# Below is deployed in analytics team's server
# argocd app create do-analytics-dev-apps.analyticsml.dev \
# --repo https://gitlab.com/company/gitops/manifests.git \
# --path manifests/analytics-platform/analyticsml \
# --revision dev \
# --dest-server https://cluster-url.com \
# --dest-namespace analytics-dev-apps \
# --sync-policy manual \
# --project do-analytics-dev-apps

Why?

  • Prevents accidental “fixes” from well-meaning teammates
  • Documents architectural decisions
  • Explains why something is commented out

Trick 2: Environment-Specific Sync Policies

Notice the pattern across environments:

Dev: Manual sync

--sync-policy manual

Staging: Automated sync

--sync-policy automated \
--self-heal

Production: Automated sync (after testing in staging proves it’s safe)

--sync-policy automated \
--self-heal

The philosophy:

  • Dev: Developers test manually, sync when ready
  • Staging: Auto-sync to catch integration issues early
  • Prod: Auto-sync because staging already validated it

Trick 3: The Migration Readiness Checklist

Before migrating any cluster, verify:

Infrastructure:

  • New cluster provisioned and accessible
  • Node sizes match or exceed old cluster
  • Storage classes configured
  • Network policies compatible
  • LoadBalancer/Ingress controller deployed

ArgoCD:

  • New cluster added to ArgoCD: argocd cluster add <context>
  • Git repositories connected
  • Projects created with correct permissions
  • RBAC configured

Applications:

  • ConfigMaps verified (check for cluster-specific values)
  • Secrets transferred (database credentials, API keys, certificates)
  • Persistent volumes migrated (if applicable)
  • Ingress DNS updated
  • SSL certificates provisioned

Validation:

  • Health checks passing
  • Logs flowing to monitoring system
  • Metrics being collected
  • Alerts configured
  • Smoke tests passed

Trick 4: The Progressive Migration Strategy

Don’t migrate everything at once. Migrate in waves:

Wave 1: Non-critical, stateless services

  • Read-only APIs
  • Documentation sites
  • Internal tools

Wave 2: Stateless services with dependencies

  • Microservices that call other services
  • Worker queues
  • Caching layers

Wave 3: Stateful services

  • Databases (with proper backup/restore)
  • File storage services
  • Session stores

Wave 4: Critical production workloads

  • Customer-facing APIs
  • Payment processing
  • Authentication services

Each wave proves the migration strategy before risking more critical services.

Trick 5: The Common Manifests Pattern

Look closely at the deployment scripts - there’s always a “common manifests” application:

argocd app create do-analytics-dev-apps.analytics-dev-common-manifests.dev \
  --repo https://gitlab.com/company/gitops/manifests.git \
  --path manifests/analytics-platform/analytics-dev-common-manifests \
  --revision dev \
  --dest-server https://cluster.com \
  --dest-namespace analytics-dev-apps \
  --sync-policy manual \
  --project do-analytics-dev-apps

What goes in common manifests?

  • ConfigMaps used by multiple services
  • Shared secrets
  • Network policies
  • Resource quotas
  • Service meshes configurations
  • Monitoring agent configs

Pro tip: Deploy common manifests FIRST, then deploy applications. Otherwise, applications fail with “ConfigMap not found” errors.

Trick 6: The Multi-Cluster Deployment Script Template

The CSV-driven template approach is migration gold:

#!/bin/bash
# template-argo-create.sh
## CSV Format: Sl,PROJ,NAMESPACE,APP,PATH,BRANCH,REPO,CLUSTER

cat $1 | while read -r record; do
  # Skip the CSV header row
  if [[ "$record" == Sl,* ]]; then
    continue
  fi

  export PROJ=$(echo $record | awk -F ',' '{print $2}')
  export NAMESPACE=$(echo $record | awk -F ',' '{print $3}')
  export APP=$(echo $record | awk -F ',' '{print $4}')
  # APP_PATH, not PATH - overwriting PATH would break command lookup for the rest of the script
  export APP_PATH=$(echo $record | awk -F ',' '{print $5}')
  export BRANCH=$(echo $record | awk -F ',' '{print $6}')
  export REPO=$(echo $record | awk -F ',' '{print $7}')
  export CLUSTER=$(echo $record | awk -F ',' '{print $8}' | tr -d '\r')

  kubectl create ns $NAMESPACE
  argocd proj create $PROJ -d $CLUSTER,$NAMESPACE -s $REPO

  argocd app create $APP \
    --repo $REPO \
    --path $APP_PATH \
    --revision $BRANCH \
    --dest-server $CLUSTER \
    --dest-namespace $NAMESPACE \
    --sync-policy automated \
    --self-heal \
    --project $PROJ
done

CSV file (template-argo-create.csv):

Sl,PROJ,NAMESPACE,APP,PATH,BRANCH,REPO,CLUSTER
1,e2e-phoenix-dev-apps,phoenix-dev-apps,e2e-phoenix-dev-apps.phoenix-srv-dev,manifests/acme-org/client-projects/phoenix/services/phoenix-srv,dev,https://gitlab.com/company/manifests.git,https://116.204.172.18:6443

Migration magic: To migrate to a new cluster, just update the CLUSTER column in the CSV and rerun the script!
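
In practice that’s a single search-and-replace on the CSV before rerunning (the cluster URLs below are placeholders):

sed -i 's|https://old-cluster.example.com|https://new-cluster.example.com|g' do-prod-cluster-apps.csv
./template-argo-create.sh do-prod-cluster-apps.csv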

Even better: Maintain separate CSV files:

  • e2e-cluster-apps.csv - E2E cluster applications
  • do-dev-cluster-apps.csv - DO dev cluster applications
  • do-staging-cluster-apps.csv - DO staging cluster applications
  • do-prod-cluster-apps.csv - DO production cluster applications

Now migrations are literally:

./template-argo-create.sh do-prod-cluster-apps.csv

One command. Entire environment deployed to new cluster.

The Directory Structure That Scales

Notice the project structure:

argocd/
├── scripts/
│   ├── 01_argocd_installation.sh
│   ├── 02_argocd_cli_installation.sh
│   ├── 03_argocd_cli_login.sh
│   ├── 04_argocd_cli_cluster_addition.sh
│   ├── 05_argocd_cli_repo_addition.sh
│   ├── 06_argocd_cli_proj_addition.sh
│   ├── 07_argocd_cli_app_addition.sh
│   ├── 08_argocd_cli_app_updation.sh
│   ├── 09_argocd_cli_app_sync.sh
│   ├── 10_resource_limits_patch_kustomization.sh
│   └── 11_hpa.sh
├── project_commands/
│   ├── e2e/          # E2E cluster deployments
│   │   ├── analytics.sh
│   │   ├── auth.sh
│   │   ├── compliance-app.sh
│   │   └── template-argo-create.sh
│   └── do/           # DigitalOcean cluster deployments
│       ├── analytics.sh
│       ├── auth.sh
│       ├── compliance-app.sh
│       └── beta-app.sh

The pattern:

  • Generic scripts in scripts/
  • Cluster-specific deployments in project_commands/<cluster-type>/
  • Each project gets its own script with all environments (dev, staging, prod)

Migration workflow:

  1. Test deployment in E2E: ./project_commands/e2e/analytics.sh
  2. Migrate to DO staging: ./project_commands/do/analytics.sh (commented sections show what migrated)
  3. Promote to DO prod: Already in the same script, just different cluster endpoint

Real-World Migration Gotchas

Gotcha 1: The LoadBalancer IP Change

Problem: Old cluster has LoadBalancer with IP 1.2.3.4. New cluster assigns 5.6.7.8.

Solution: Update DNS before testing, or use ExternalDNS to automate this.

Gotcha 2: The Persistent Volume Data

Problem: StatefulSets have data in PersistentVolumes. Can’t just redeploy.

Solution:

  1. Backup data using Velero or custom backup scripts
  2. Deploy application in new cluster
  3. Restore data to new PVs
  4. Validate data integrity
  5. Switch traffic
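
A sketch of that flow with Velero (the backup name is a placeholder; the namespace follows the convention used in this article):

# On the old cluster: back up the namespace, including PV data
velero backup create analytics-staging-backup --include-namespaces analytics-staging-apps

# On the new cluster (with Velero pointed at the same object storage): restore it
velero restore create --from-backup analytics-staging-backup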

Gotcha 3: The Secret Drift

Problem: Secrets in old cluster were manually updated. New cluster uses stale secrets from Git.

Solution: This is why GitOps is non-negotiable. If it’s not in Git, it doesn’t exist. Before migration, audit all secrets and commit them to Git (using sealed-secrets, SOPS, or Vault).

Gotcha 4: The Namespace Collision

Problem: Multiple applications using the same namespace name in different clusters causes ArgoCD confusion.

Solution: Namespace naming convention: <project>-<environment>-apps

  • analytics-dev-apps
  • analytics-staging-apps
  • analytics-prod-apps

Clear, consistent, collision-free.

Gotcha 5: The “It Works on My Machine” Cluster

Problem: Application works in E2E but fails in production cluster.

Solution: Cluster parity checks:

# Compare Kubernetes versions
kubectl version --short

# Compare storage classes
kubectl get storageclass

# Compare node resources
kubectl top nodes

# Compare installed CRDs
kubectl get crd

Differences here = migration surprises later.

The Post-Migration Cleanup

After successful migration, don’t just leave the old cluster running forever:

  • Week 1: Monitor the new cluster closely; keep the old cluster as a hot standby
  • Week 2: If stable, stop the old cluster but keep its resources (for quick rollback)
  • Week 3: If still stable, document the migration and archive the old cluster configs
  • Month 2: Decommission the old cluster, celebrate migration success

Document everything:

# Create a migration log
cat > MIGRATION-LOG.md << EOF
# Cluster Migration: E2E to DigitalOcean Staging

## Date: 2024-10-15

## Applications Migrated:
- analyticsui.staging
- analyticsapp.staging
- staging-common-manifests

## Old Cluster: https://116.204.172.18:6443
## New Cluster: https://b4fd5ff3-7743-4d58-aa2a-044ec115c2d1.k8s.ondigitalocean.com

## Issues Encountered:
1. LoadBalancer IP changed - Updated DNS
2. PVC had to be migrated manually - Used Velero

## Rollback Procedure:
1. Uncomment old cluster configs in project_commands/e2e/analytics.sh
2. Run: ./project_commands/e2e/analytics.sh
3. Update DNS back to old cluster IP

## Validation Checklist:
- [x] All pods healthy
- [x] Database connections working
- [x] SSL certificates valid
- [x] Monitoring dashboards updated
- [x] On-call team notified

## Sign-off:
- DevOps Lead: ✓
- Platform Team: ✓
- Security Team: ✓
EOF

This document becomes invaluable for:

  • Future migrations (you’ll remember what worked)
  • Incident response (rollback procedure is documented)
  • Knowledge transfer (new team members learn from it)
  • Compliance (audit trail exists)

What’s Next?

You’ve installed ArgoCD. You’ve configured applications. You’ve automated resource management. You’ve even learned how to migrate clusters like a pro. But the journey doesn’t end here.

Progressive Delivery

Explore ArgoCD Rollouts for:

  • Blue/green deployments
  • Canary releases with automated analysis
  • Traffic splitting with Istio/Linkerd

ApplicationSets

Manage hundreds of applications with generators:

  • Git generator (create apps from repository structure)
  • Cluster generator (deploy to all clusters automatically)
  • Matrix generator (combine multiple generators)
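
A minimal sketch of a list-generator ApplicationSet that stamps out one Application per environment (values are illustrative, following the naming used earlier):

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: analyticsui
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: dev
          - env: staging
  template:
    metadata:
      name: 'analyticsui-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://gitlab.com/company/gitops/manifests.git
        targetRevision: '{{env}}'
        path: manifests/analytics-platform/analyticsui
      destination:
        server: https://kubernetes.default.svc
        namespace: 'analytics-{{env}}-apps'
      syncPolicy:
        automated:
          selfHeal: true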

Notifications

Get alerted when deployments fail:

  • Slack notifications
  • Email alerts
  • Webhook integrations

GitOps Beyond Kubernetes

Use ArgoCD with:

  • Terraform (GitOps for infrastructure)
  • Crossplane (Kubernetes-native infrastructure)
  • Helm charts with values overrides

Final Thoughts

ArgoCD isn’t just a deployment tool. It’s a philosophy shift.

Before ArgoCD:

  • Deployments were tribal knowledge
  • “What’s running in production?” was a hard question
  • Rollbacks meant panic and manual intervention
  • Drift between environments was inevitable

After ArgoCD:

  • Git is the single source of truth
  • The UI shows exactly what’s deployed, everywhere
  • Rollbacks are a git revert and a sync
  • Drift is immediately visible and auto-corrected

Is it more complex than kubectl apply -f? Initially, yes.

Is it worth it? Ask yourself three years from now when you’re managing 100+ microservices across 5 clusters and deployments “just work.”

That’s when you’ll know the journey was worth it.


Resources and References

Tools Used in This Guide

  • Kubernetes 1.24+
  • ArgoCD stable release
  • GitLab (or GitHub/Bitbucket)
  • Kustomize for manifest patching


Kudos to the Mavericks at the DevOps Den. Proud of you all.

Built with determination, automated with scripts, deployed with confidence - powered by GitOps and ArgoCD.

More from me on www.uk4.in.