The Secret Life of Kubernetes: A Tale of Two Clusters and the Quest for Secure Secrets
The Problem We All Face (But Pretend We Don’t)
Picture this: It’s 10 PM in the DevOps Den. Laptop screens glow under table lamps, and half the team is connected via Teams from home. We’re staring at Kubernetes clusters and YAML files scattered across multiple screens. We’ve been using ConfigMaps and Kubernetes Secrets for months now, but the workflow is painful—manually collecting credentials from developers, base64-encoding them, hardcoding values into YAML files, applying them to clusters. Rinse, repeat. Every new application, every credential rotation, every new environment means more manual work.
The truth? ConfigMaps and Secrets work, but they don’t scale. Secrets scattered across clusters, no centralized management, no rotation strategy, no audit trail, and definitely no peace of mind when security audit emails land in our inbox. There had to be a better way—something simpler, more scalable, more secure.
This is the story of how our team built a production-grade secret management system using HashiCorp Vault’s sidecar injection pattern. What started as a learning exercise with dev mode evolved into a full-blown production HA Vault setup with multi-cluster authentication, cross-cluster secret injection, and enough YAML files to make even the most seasoned DevOps engineer’s eye twitch.
Grab your coffee (or your beverage of choice - no judgment here), because this is going to be a ride through two DigitalOcean Kubernetes clusters, production Vault deployments, late nights in the DevOps Den, and the moment we realized that unsealing Vault is actually kind of nerve-wracking.
Act I: The Evolution from Dev to Production
The Humble Beginning: Dev Mode
Every great journey starts with a simple step. Before tackling production, we started with dev mode to understand the basics.
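The install we started with looked roughly like this; the release name and namespace are our choice, and `server.dev.enabled` is the Helm chart's standard dev-mode switch:

```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

# Dev mode: single node, in-memory storage, auto-unsealed
helm install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set "server.dev.enabled=true"
```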
Dev mode is beautiful in its simplicity:
- Pre-unsealed (no unseal key juggling)
- In-memory storage (ephemeral - restart and poof, it’s gone)
- Single node (no HA complexity)
- Root token available immediately
It’s perfect for learning, terrible for production, and absolutely not something you want to run when real secrets are at stake.
We spent a day testing secret injection, authentication, and policies. Impressive, but clearly just a warm-up.
The “Oh Wait, This Needs to Be Real” Moment
Two days later, the urgency hit. “We need this in production. Like, yesterday.”
Time to do it right. Production means HA, persistence, proper unsealing. No shortcuts.
Option 1: Production Single Node
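A sketch of the values for a standalone install (the storage size is illustrative):

```yaml
# values-standalone.yaml (sketch)
server:
  standalone:
    enabled: true
  dataStorage:
    enabled: true
    size: 10Gi   # persistent volume so secrets survive restarts
```

```bash
helm install vault hashicorp/vault -n vault -f values-standalone.yaml
```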
This gives you a proper Vault installation with persistent storage, but it’s still a single point of failure. Better, but not quite there.
Option 2: Production HA with Raft (The One I Chose… Eventually)
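Roughly the values we ended up with; replica count matches the two-pod setup described below, storage size is illustrative:

```yaml
# values-ha-raft.yaml (sketch)
server:
  ha:
    enabled: true
    replicas: 2        # two Vault pods
    raft:
      enabled: true    # Integrated Storage (Raft)
  dataStorage:
    enabled: true
    size: 10Gi
```

```bash
helm install vault hashicorp/vault -n vault -f values-ha-raft.yaml
```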
Now we’re talking. Two Vault pods running in HA mode using the Raft consensus protocol. If one goes down, the other keeps serving secrets. This is what production looks like.
But wait. Let me tell you about the Raft rabbit hole we fell into.
The Great Raft Documentation Hunt (Or: Where Is Everything?)
Here’s the thing about Raft in Vault: it exists, it’s well-documented somewhere, but finding that documentation when you need it is like finding a specific grain of sand on a beach.
We spent hours Googling, reading docs, trying commands. The information exists but is scattered across multiple sources.
The official HashiCorp docs have pages on:
- Integrated Storage (Raft) configuration
- Raft internals
- Kubernetes deployment guide
- HA with Raft examples
But here’s what they don’t prominently tell you (or at least, not in a way that jumps out when you’re frantically Googling at midnight):
1. vault-1 won’t automatically join vault-0. You need to manually run vault operator raft join on each follower node. The Helm chart creates the StatefulSet, but the clustering? That’s on you.
2. The internal service name matters. It’s vault-0.vault-internal:8200, not vault-0:8200, not vault:8200, not vault-0.vault.svc.cluster.local:8200. Get it wrong, and you’ll see cryptic “failed to join raft cluster” errors.
3. Each node needs to be unsealed individually. Seal one? Sealed. Seal two? Also sealed. Pod restarts? Everything’s sealed. Node failure? You’re unsealing again. Hope you saved those keys!
4. The order matters. Initialize vault-0, unseal it, then join vault-1, then unseal vault-1. Do it out of order and you’ll get errors that make you question your life choices.
GitHub is full of issues about this exact pain point. We spent hours piecing together information from:
- Three different HashiCorp tutorials (each covering one piece of the puzzle)
- GitHub issues (where the real truth lives)
- Random blog posts from people who’ve been there
- The Helm chart’s values.yaml comments (surprisingly informative, actually)
The information exists, but it’s scattered like puzzle pieces across the internet. No single “Raft in Kubernetes: Here’s Everything You Need to Know” guide existed.
The moment it clicked: Late evening in the DevOps Den, someone looked up from their laptop. “Wait… Raft is a consensus algorithm, not magic clustering?”
Exactly. The Helm chart creates the infrastructure, but you create the cluster by explicitly telling each node to join. The mental model finally made sense.
So yes, we now have two Vault pods running in HA mode using Raft. But getting here required archaeological-level documentation excavation.
The Initialization Ritual
When you install Vault in production mode, it starts sealed. Think of it as a safe that needs to be unlocked before you can use it. Initialization is a one-time operation that generates your unseal keys and root token; lose them, and you’re locked out for good.
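Something like this, matching the single-key setup described below:

```bash
# Initialize vault-0 once; this prints the unseal key(s) and root token
kubectl exec -n vault vault-0 -- vault operator init \
  -key-shares=1 -key-threshold=1 -format=json > keys.json
```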
CRITICAL WARNING: In a real production environment, you’d use 5 key shares with a threshold of 3, and distribute them to different trusted individuals. You’d also enable auto-unseal using a cloud KMS. But for this learning journey, we’re keeping it simple with one key.
That keys.json file? Treat it like the nuclear launch codes. Seriously.
Unsealing the Vault (Literally)
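The sequence that finally worked for us, sketched out (note the vault-0.vault-internal address from the Raft saga above):

```bash
VAULT_UNSEAL_KEY=$(jq -r '.unseal_keys_b64[0]' keys.json)

# Unseal the first node
kubectl exec -n vault vault-0 -- vault operator unseal "$VAULT_UNSEAL_KEY"

# Join the second node to the Raft cluster, then unseal it too
kubectl exec -n vault vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
kubectl exec -n vault vault-1 -- vault operator unseal "$VAULT_UNSEAL_KEY"

# Sanity check
kubectl exec -n vault vault-0 -- vault status
```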
And just like that, you have a two-node HA Vault cluster. Both nodes are part of the Raft cluster, one is the leader, and they’re ready to serve secrets.
Making Vault Accessible
I set up two ways to access Vault:
1. NodePort Service (For Direct IP Access)
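A sketch of the NodePort service; the selector labels assume the Helm chart’s defaults:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vault-nodeport
  namespace: vault
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: vault
    component: server
  ports:
    - name: http
      port: 8200
      targetPort: 8200
      nodePort: 32000
```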
Now I can access Vault at any node’s IP on port 32000. In my case: http://64.227.181.21:32000
2. Ingress (For Fancy DNS Names)
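Roughly what the ingress looks like, assuming an nginx ingress controller and a pre-existing TLS setup:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vault2-ingress
  namespace: vault
spec:
  ingressClassName: nginx
  rules:
    - host: vault2.skill-mine.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vault
                port:
                  number: 8200
```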
Much better: https://vault2.skill-mine.com
Quick test to make sure it’s working:
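Any of the standard health endpoints will do; for example:

```bash
curl -s http://64.227.181.21:32000/v1/sys/health | jq
# or, via the ingress
curl -s https://vault2.skill-mine.com/v1/sys/health | jq
```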
If you see "sealed": false, you’re in business.
Act II: Setting Up the Secret Architecture
Enabling Authentication
Jump into vault-0 and let’s configure things:
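A sketch of the bootstrap commands; the smtc mount path matches the hierarchy described next:

```bash
kubectl exec -n vault -it vault-0 -- /bin/sh

vault login                       # paste the root token from keys.json
vault auth enable kubernetes      # Kubernetes auth method
vault secrets enable -path=smtc kv-v2   # KV-v2 engine at the smtc mount
```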
The Secret Hierarchy
Instead of just dumping secrets anywhere, I designed a hierarchical structure:
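For example (the key names and values here are placeholders):

```bash
vault kv put smtc/project1/subproject1/env01 \
  db_user="appuser" \
  db_password="change-me"
```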
The path structure smtc/project1/subproject1/env01 means:
- smtc - The root path for all project secrets
- project1 - Specific project
- subproject1 - Subproject or microservice
- env01 - Environment (dev, staging, prod, etc.)
This scales beautifully as your organization grows.
Configuring Kubernetes Authentication
This is where we teach Vault to trust our Kubernetes cluster:
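Run from inside the vault-0 pod, so it can reuse its own service account token and CA cert; roughly:

```bash
vault write auth/kubernetes/config \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
  token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```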
Creating Policies (The Gatekeeper)
Policies are how Vault controls access. Think of them as fine-grained permissions:
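Here’s the policy, using the path from the hierarchy above:

```bash
vault policy write smtc-policy - <<EOF
path "smtc/data/project1/subproject1/env01" {
  capabilities = ["read"]
}
EOF
```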
Notice the /data/ in the path? That’s a KV-v2 quirk. The actual API path has /data/ inserted between the mount path and your secret path.
Creating Vault Roles
Roles bind Kubernetes service accounts to Vault policies:
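The role, matching the description just below (the TTL is our choice):

```bash
vault write auth/kubernetes/role/vault-smtc-role \
  bound_service_account_names=vault \
  bound_service_account_namespaces=vault \
  policies=smtc-policy \
  ttl=24h
```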
What this says: “If a pod uses the service account named vault in the namespace vault, allow it to authenticate and apply the smtc-policy.”
We’ll also create a role for another namespace that we’ll use later:
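Same shape, different namespace; the role and namespace names here are illustrative:

```bash
vault write auth/kubernetes/role/vault-smtc-role-app01 \
  bound_service_account_names=vault \
  bound_service_account_namespaces=app01 \
  policies=smtc-policy \
  ttl=24h
```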
Act III: The Magic of Sidecar Injection
Deploying a Test Application
Let’s deploy a simple nginx application:
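A minimal sketch of the deployment; the only detail that matters for Vault is the serviceAccountName:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: vault
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      serviceAccountName: vault
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```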
Nothing special - just a plain nginx pod using the vault service account.
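Apply it and make sure the pod comes up (the file name follows the repo listing at the end):

```bash
kubectl apply -f 1-deployment-nginx.yaml
kubectl get pods -n vault
```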
The First Patch: Basic Injection
Now let’s inject secrets with a simple annotation patch:
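The patch looks roughly like this; the role and secret path come from earlier, and the file name database-config.txt is our choice:

```yaml
# 3a-smtc-patch-inject-secrets.yaml (sketch)
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/role: 'vault-smtc-role'
        vault.hashicorp.com/agent-inject-secret-database-config.txt: 'smtc/data/project1/subproject1/env01'
```

Then apply it to the running deployment:

```bash
kubectl -n vault patch deployment nginx --patch "$(cat 3a-smtc-patch-inject-secrets.yaml)"
```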
What happens?
- The Vault Agent Injector (a mutating webhook) intercepts the pod creation
- It injects an init container that authenticates to Vault and fetches secrets
- It injects a sidecar container that keeps the secrets up to date
- It mounts the secrets at /vault/secrets/database-config.txt
The secrets are written in raw format. Let’s check:
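Something like:

```bash
POD=$(kubectl get pod -n vault -l app=nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n vault "$POD" -c nginx -- cat /vault/secrets/database-config.txt
```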
You’ll see something like:
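(The values below are placeholders; the format is the agent’s default Go-map dump of the KV-v2 response.)

```
data: map[db_password:change-me db_user:appuser]
metadata: map[created_time:2024-05-20T10:15:00Z deletion_time: destroyed:false version:1]
```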
That’s… not ideal for most applications.
The Second Patch: Template Magic
Let’s make it actually useful with templates:
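A sketch of the template patch; the connection-string format, host, and database name are illustrative:

```yaml
# 3b-smtc-patch-inject-secrets-as-template01pg.yaml (sketch)
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/role: 'vault-smtc-role'
        vault.hashicorp.com/agent-inject-secret-database-config.txt: 'smtc/data/project1/subproject1/env01'
        vault.hashicorp.com/agent-inject-template-database-config.txt: |
          {{- with secret "smtc/data/project1/subproject1/env01" -}}
          postgresql://{{ .Data.data.db_user }}:{{ .Data.data.db_password }}@postgres.vault.svc:5432/appdb
          {{- end -}}
```

Apply it the same way as before:

```bash
kubectl -n vault patch deployment nginx --patch "$(cat 3b-smtc-patch-inject-secrets-as-template01pg.yaml)"
```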
Now check the secret:
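Same command as before:

```bash
kubectl exec -n vault "$POD" -c nginx -- cat /vault/secrets/database-config.txt
```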
Output:
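With the placeholder credentials from earlier, something like:

```
postgresql://appuser:change-me@postgres.vault.svc:5432/appdb
```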
Beautiful! A ready-to-use PostgreSQL connection string. Your application just reads a file and connects - it never needs to know about Vault.
Act IV: The Multi-Cluster Plot Twist
Here’s where things get spicy. I have two DigitalOcean Kubernetes clusters:
- Cluster 1 (do-blr1-testk8s-1): The Vault server cluster
- Cluster 2 (do-blr1-testk8s-2): The client cluster that needs secrets
The challenge: How do pods in Cluster 2 authenticate to Vault in Cluster 1 and fetch secrets?
On the Vault Server (Cluster 1)
First, we need to create a separate authentication backend for the remote cluster:
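The remote01-cluster path is the one referenced in the annotations later:

```bash
vault auth enable -path=remote01-cluster kubernetes
```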
Why a separate path? Because each Kubernetes cluster has its own CA certificate and API server. We need to configure them separately.
On the Client Cluster (Cluster 2)
Switch your kubectl context to the second cluster and install the Vault Agent Injector:
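A sketch of the install; setting global.externalVaultAddr makes the chart deploy only the injector and point it at the external Vault (here, the NodePort address from Cluster 1):

```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set "global.externalVaultAddr=http://64.227.181.21:32000"
```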
This installs only the Vault Agent Injector webhook - not the Vault server itself. The injector will configure pods to connect to the external Vault.
Creating the Service Endpoint
We need to tell Kubernetes about the external Vault server. There are two approaches:
Approach 1: Service with Endpoints (For IP Addresses)
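A selectorless Service plus hand-written Endpoints pointing at the NodePort from Cluster 1; the service name is our choice:

```yaml
# 4-external-vault-svc-endpoint.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: external-vault
  namespace: vault
spec:
  ports:
    - protocol: TCP
      port: 8200
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-vault
  namespace: vault
subsets:
  - addresses:
      - ip: 64.227.181.21
    ports:
      - port: 32000
```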
Approach 2: ExternalName Service (For DNS Names)
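The ExternalName variant is just a DNS alias to the ingress hostname:

```yaml
# 4a-external-vault2-svc-externalname.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: external-vault-dns
  namespace: vault
spec:
  type: ExternalName
  externalName: vault2.skill-mine.com
```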
I used both in my testing. ExternalName is cleaner if you have DNS set up.
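Apply whichever you prefer (file names follow the repo listing at the end):

```bash
kubectl apply -f 4-external-vault-svc-endpoint.yaml
kubectl apply -f 4a-external-vault2-svc-externalname.yaml
kubectl get svc -n vault
```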
The Service Account Token Dance
In Kubernetes 1.24+, service account tokens are no longer automatically created as secrets. We need to explicitly request them:
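A sketch of the token secret; it assumes the vault service account already exists (the Helm chart creates it), and the secret name is our choice:

```yaml
# 5-vault-secret.yaml (sketch)
apiVersion: v1
kind: Secret
metadata:
  name: vault-sa-token
  namespace: vault
  annotations:
    kubernetes.io/service-account.name: vault
type: kubernetes.io/service-account-token
```

```bash
kubectl apply -f 5-vault-secret.yaml
```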
Extracting Authentication Credentials
Now we extract the credentials that Vault needs to validate tokens from this cluster:
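Roughly the same recipe the official external-Vault tutorial uses; the secret name matches the sketch above:

```bash
# The long-lived reviewer JWT from the token secret we just created
TOKEN_REVIEW_JWT=$(kubectl get secret vault-sa-token -n vault \
  -o go-template='{{ .data.token }}' | base64 --decode)

# The cluster's CA certificate and API server address, from the kubeconfig
KUBE_CA_CERT=$(kubectl config view --raw --minify --flatten \
  -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 --decode)
KUBE_HOST=$(kubectl config view --raw --minify --flatten \
  -o jsonpath='{.clusters[].cluster.server}')
```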
Generating the Vault Configuration Command
Create a script with the configuration command:
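One way to do it, using the variables captured above (the script name matches the next step):

```bash
# Render the vault write command with Cluster 2's credentials baked in
cat > 7a-vault-auth <<EOF
vault write auth/remote01-cluster/config token_reviewer_jwt="$TOKEN_REVIEW_JWT" kubernetes_host="$KUBE_HOST" kubernetes_ca_cert="$KUBE_CA_CERT"
EOF
```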
IMPORTANT: Copy the contents of 7a-vault-auth and run it inside the Vault server in Cluster 1.
Back on the Vault Server (Cluster 1)
Switch back to your Cluster 1 context and configure the remote cluster authentication:
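In other words, run the rendered config command from 7a-vault-auth, then create a role on the new auth path; sketched out with placeholders:

```bash
# From 7a-vault-auth (values come from Cluster 2)
vault write auth/remote01-cluster/config \
  token_reviewer_jwt="<TOKEN_REVIEW_JWT from Cluster 2>" \
  kubernetes_host="<Cluster 2 API server URL>" \
  kubernetes_ca_cert="<Cluster 2 CA certificate>"

# Role on the remote auth path, reusing the existing policy
vault write auth/remote01-cluster/role/vault-smtc-role \
  bound_service_account_names=vault \
  bound_service_account_namespaces=vault \
  policies=smtc-policy \
  ttl=24h
```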
Notice we’re using the same smtc-policy - no need to duplicate policies.
Back on the Client Cluster (Cluster 2): Deploy and Test
Now for the moment of truth:
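A sketch of the remote-cluster deployment; the name and labels are illustrative, the annotations are the ones called out just below:

```yaml
# 6-deployment-nginx-remote01.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-remote01
  namespace: vault
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-remote01
  template:
    metadata:
      labels:
        app: nginx-remote01
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/auth-path: 'auth/remote01-cluster'
        vault.hashicorp.com/role: 'vault-smtc-role'
        vault.hashicorp.com/agent-inject-secret-database-config.txt: 'smtc/data/project1/subproject1/env01'
    spec:
      serviceAccountName: vault
      containers:
        - name: nginx
          image: nginx:latest
```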
Key points:
- vault.hashicorp.com/auth-path: 'auth/remote01-cluster' - This tells the Vault Agent to authenticate using our custom auth path
- vault.hashicorp.com/role: 'vault-smtc-role' - The role we created earlier
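Apply it and watch the injector do its thing:

```bash
kubectl apply -f 6-deployment-nginx-remote01.yaml
kubectl get pods -n vault -w
```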
Checking if It Works
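A couple of quick checks (the label matches the sketch above):

```bash
POD=$(kubectl get pod -n vault -l app=nginx-remote01 -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n vault "$POD" -c vault-agent-init
kubectl exec -n vault "$POD" -c nginx -- cat /vault/secrets/database-config.txt
```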
If you see “authentication successful” in the logs, congratulations! You’ve just set up cross-cluster Vault authentication.
Scaling to Multiple Namespaces
Want to use this in multiple namespaces? Just create more roles:
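Same pattern as before; the role and namespace names here are illustrative:

```bash
vault write auth/remote01-cluster/role/vault-smtc-role-team01 \
  bound_service_account_names=vault \
  bound_service_account_namespaces=team01 \
  policies=smtc-policy \
  ttl=24h
```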
Then deploy your application to that namespace, and it just works.
Act V: Manual Testing and Debugging
Testing Authentication Manually
Sometimes you need to understand exactly what’s happening. Let’s manually authenticate from inside a pod:
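Something like this, assuming curl is available in the container; the Vault address and auth path are the ones from our cross-cluster setup:

```bash
kubectl exec -n vault -it <pod> -c nginx -- sh

# Inside the pod:
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -s --request POST \
  --data "{\"jwt\": \"$JWT\", \"role\": \"vault-smtc-role\"}" \
  http://64.227.181.21:32000/v1/auth/remote01-cluster/login
```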
This returns a Vault token in the response. You can use that token to manually fetch secrets:
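Take the client_token from the auth block of the login response and use it as the Vault token:

```bash
VAULT_TOKEN="<client_token from the login response>"
curl -s --header "X-Vault-Token: $VAULT_TOKEN" \
  http://64.227.181.21:32000/v1/smtc/data/project1/subproject1/env01
```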
Creating Manual Tokens (For Troubleshooting)
Sometimes you need a token for debugging:
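For example, a short-lived token scoped to the same policy:

```bash
vault token create -policy=smtc-policy -ttl=1h
```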
The Architecture: Putting It All Together
Let me paint the complete picture:
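A simplified sketch of the moving pieces (not an exhaustive diagram):

```
Cluster 1 (do-blr1-testk8s-1)              Cluster 2 (do-blr1-testk8s-2)
+---------------------------------+        +-----------------------------------+
| vault-0 <--- Raft ---> vault-1  |        | vault-agent-injector (webhook)    |
| (HA, Integrated Storage)        |        |                                   |
| Vault Agent Injector            |  auth  | app pod                           |
| NodePort :32000                 |<-------|  - vault-agent-init (login+fetch) |
| Ingress: vault2.skill-mine.com  | secrets|  - vault-agent (sidecar, renews)  |
+---------------------------------+        |  - nginx (reads /vault/secrets/)  |
                                           +-----------------------------------+
```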
Key Learnings and “Aha!” Moments
1. Dev Mode is a Trap (A Comfortable One)
Dev mode is so easy that you’ll be tempted to use it everywhere. Don’t. I learned this the hard way when my dev Vault restarted and all my test secrets vanished into the ether. Dev mode uses in-memory storage - no persistence.
2. Unsealing is Serious Business
In production, you’ll have 5 unseal keys split among trusted people. If Vault restarts, you need 3 of those 5 keys to unseal it. This isn’t paranoia - it’s security. But for convenience in non-critical environments, use cloud auto-unseal with AWS KMS, Azure Key Vault, or GCP KMS.
3. The /data/ Path Gotcha
With KV-v2, the path in your policy must include /data/:
- Secret path: smtc/project1/subproject1/env01
- Policy path: smtc/data/project1/subproject1/env01
- API path: v1/smtc/data/project1/subproject1/env01
I spent an embarrassing amount of time troubleshooting “permission denied” errors before I figured this out.
4. Templates Are Your Best Friend
Don’t just inject raw JSON. Use templates to format secrets exactly as your application expects:
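The same template annotation from Act III, as an example (host and database name are illustrative):

```yaml
vault.hashicorp.com/agent-inject-template-database-config.txt: |
  {{- with secret "smtc/data/project1/subproject1/env01" -}}
  postgresql://{{ .Data.data.db_user }}:{{ .Data.data.db_password }}@postgres.vault.svc:5432/appdb
  {{- end -}}
```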
5. Multi-Cluster Auth Paths
Each Kubernetes cluster needs its own auth path because:
- Different CA certificates
- Different API servers
- Different service account token issuers
Don’t try to share auth paths between clusters. It won’t work and you’ll waste hours debugging.
6. Service Account Tokens Changed in K8s 1.24+
If you’re wondering why service accounts don’t automatically get token secrets anymore, it’s because Kubernetes 1.24 stopped auto-generating secret-based tokens in favor of short-lived, bound tokens projected into pods. Now you need to explicitly create token secrets:
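The same shape as the secret from Act IV (the secret name is our choice):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: vault-sa-token
  annotations:
    kubernetes.io/service-account.name: vault
type: kubernetes.io/service-account-token
```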
7. NodePort + Ingress = Best of Both Worlds
I used both:
- NodePort for direct access during troubleshooting
- Ingress for clean DNS names in production
Having both options saved me during debugging sessions.
8. Logging is Your Friend
When things go wrong (and they will), check these logs in order:
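Roughly this order, assuming the default injector deployment name:

```bash
# 1. The injector webhook - did it mutate the pod at all?
kubectl logs -n vault deployment/vault-agent-injector

# 2. The init container - did authentication and the first fetch work?
kubectl logs <pod> -c vault-agent-init

# 3. The sidecar - is it renewing the token and re-rendering templates?
kubectl logs <pod> -c vault-agent

# 4. Vault itself - what did the server think of all this?
kubectl logs -n vault vault-0
```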
9. Test Authentication Manually
Don’t trust the sidecar to work magically. Test the authentication flow manually first:
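A quick one-liner version of the manual login from Act V (adjust the Vault address and auth path for your cluster):

```bash
kubectl exec -it <pod> -c nginx -- sh -c '
  curl -s --request POST \
    --data "{\"jwt\": \"$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\", \"role\": \"vault-smtc-role\"}" \
    http://64.227.181.21:32000/v1/auth/kubernetes/login'
# From the second cluster, use /v1/auth/remote01-cluster/login instead
```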
If this fails, the sidecar will fail too.
10. Policies Are Finicky
A policy that doesn’t work:
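Something like this, missing the /data/ segment:

```hcl
path "smtc/project1/subproject1/env01" {
  capabilities = ["read"]
}
```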
A policy that works:
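The same policy with /data/ added:

```hcl
path "smtc/data/project1/subproject1/env01" {
  capabilities = ["read"]
}
```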
Notice the data in the path? That’s KV-v2’s way of saying “actual secret data” vs “metadata.”
Production Considerations (Or: Don’t Get Paged at 3 AM)
High Availability
My setup with 2 Raft nodes is the minimum for HA. For production:
- Use 3 or 5 nodes (odd numbers for Raft quorum)
- Spread nodes across availability zones
- Monitor Raft cluster health
- Have runbooks for node failures
Auto-Unseal
Manual unsealing doesn’t scale. Use cloud KMS:
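For example, an AWS KMS seal stanza in the Vault server config (region and key alias are placeholders):

```hcl
seal "awskms" {
  region     = "ap-south-1"
  kms_key_id = "alias/vault-unseal-key"
}
```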
Backup Strategy
Vault data is precious. For Raft storage:
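A sketch of a snapshot run (assumes you’re authenticated to Vault inside the pod):

```bash
# Take a Raft snapshot on the active node
kubectl exec -n vault vault-0 -- vault operator raft snapshot save /tmp/vault-backup.snap

# Copy it off the pod, then ship it off the cluster
kubectl cp vault/vault-0:/tmp/vault-backup.snap ./vault-backup-$(date +%F).snap
```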
Automate this. Schedule it. Test restores regularly.
Monitoring and Alerting
Key metrics to watch:
- Seal status (is Vault unsealed?)
- Raft cluster health (are all nodes active?)
- Authentication failures (someone trying something fishy?)
- Token expiration rates (are applications renewing properly?)
- Secret access patterns (unusual access patterns?)
Network Policies
Lock down network access:
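A sketch of the NetworkPolicy; the pod selector assumes the Helm chart’s default labels:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vault-allow-labelled-namespaces
  namespace: vault
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: vault
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              vault-access: "true"
      ports:
        - protocol: TCP
          port: 8200
```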
Only pods in namespaces with the vault-access label can reach Vault.
Audit Logging
Enable audit logging to track everything:
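For example, the file audit device (the log path is illustrative and needs a writable volume):

```bash
vault audit enable file file_path=/vault/audit/vault-audit.log
```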
Ship these logs to your centralized logging system (ELK, Splunk, etc.).
Secret Rotation
Secrets should rotate. Period. Implement a rotation strategy:
- Database credentials: Vault can generate dynamic credentials
- API keys: Rotate quarterly (or more frequently)
- Certificates: Use Vault’s PKI engine with automatic rotation
Resource Limits
The Vault Agent sidecar uses resources. Plan accordingly:
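The sidecar size can be tuned per pod with injector annotations; these values are the chart defaults:

```yaml
vault.hashicorp.com/agent-requests-cpu: "250m"
vault.hashicorp.com/agent-requests-mem: "64Mi"
vault.hashicorp.com/agent-limits-cpu: "500m"
vault.hashicorp.com/agent-limits-mem: "128Mi"
```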
For 100 pods, that’s 6.4 GB RAM just for Vault sidecars.
Troubleshooting Guide (Because You’ll Need It)
Problem: “Permission Denied” When Accessing Secrets
Symptoms: Vault Agent logs show permission denied
Solution:
1. Check your policy includes /data/ in the path:

```
path "smtc/data/project1/subproject1/env01" {
  capabilities = ["read"]
}
```

2. Verify the role has the correct policy:

```
vault read auth/kubernetes/role/your-role
```

3. Check if the service account matches:

```
kubectl get pod <pod> -o yaml | grep serviceAccountName
```
Problem: “Authentication Failed”
Symptoms: Vault Agent init container fails with authentication error
Solution:
1. Verify Kubernetes auth is configured:

```
vault read auth/kubernetes/config
```

2. Check the role exists:

```
vault list auth/kubernetes/role
```

3. Test authentication manually (see earlier section)
Problem: Pods Stuck in Init
Symptoms: Pods stuck with vault-agent-init container running
Solution:
1. Check logs:

```
kubectl logs <pod> -c vault-agent-init
```

2. Verify Vault is accessible from the pod:

```
kubectl exec <pod> -c vault-agent-init -- curl http://vault:8200/v1/sys/health
```

3. Check the external Vault address:

```
helm get values <vault-release> -n vault
```
Problem: Secrets Not Updating
Symptoms: Secrets in /vault/secrets/ are stale
Solution:
1. Check vault-agent sidecar logs:

```
kubectl logs <pod> -c vault-agent
```

2. Verify the template includes the update annotation:

```
vault.hashicorp.com/agent-inject-status: "update"
```

3. Check token TTL - might need renewal:

```
vault read auth/kubernetes/role/your-role
```
Problem: “Vault is Sealed”
Symptoms: All Vault operations fail with “vault is sealed”
Solution:
1. Check seal status:

```
kubectl exec vault-0 -- vault status
```

2. Unseal Vault:

```
kubectl exec vault-0 -- vault operator unseal $VAULT_UNSEAL_KEY
```

3. If you have multiple nodes, unseal each:

```
kubectl exec vault-1 -- vault operator unseal $VAULT_UNSEAL_KEY
```
Problem: Cross-Cluster Authentication Fails
Symptoms: Remote cluster pods can’t authenticate to Vault
Solution:
1. Verify the auth path annotation is correct:

```
vault.hashicorp.com/auth-path: 'auth/remote01-cluster'
```

2. Check the remote auth backend exists:

```
vault auth list
```

3. Verify the remote cluster config:

```
vault read auth/remote01-cluster/config
```

4. Test with manual authentication (see earlier section)
What We’d Do Differently Next Time
1. Start with Production Setup
We spent too much time in dev mode. Starting over, we’d go straight to the HA production setup from day one. The unseal dance isn’t that complicated, and it teaches you the real workflow.
2. Use Auto-Unseal Immediately
Cloud KMS auto-unseal should be the default, not an afterthought. It’s easier, more secure, and saves you from the “where did we put the unseal keys?” panic—a lesson learned the hard way during a pod restart drill.
3. Document as You Go
We had to reverse-engineer our own setup multiple times. Write runbooks as you go. Your future self (and your team) will thank you.
4. Set Up Monitoring First
Don’t deploy to production without monitoring. Set up alerts for:
- Vault seal status
- Authentication failures
- Raft cluster health
- High token expiration rates
5. Test Disaster Recovery Early
Take snapshots, test restores, document the process. Don’t wait until you actually need to restore from backup.
6. Namespace Organization
I used the vault namespace for everything. In hindsight, I should have split it up:
- vault-system - For Vault infrastructure
- vault-injector - For the webhook
- Application namespaces - For actual workloads
7. Use Terraform for Vault Configuration
I did everything manually (vault write, vault policy write, etc.). For production, use Terraform:
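A sketch using the Terraform Vault provider, mirroring the policy and role created by hand earlier:

```hcl
resource "vault_policy" "smtc" {
  name   = "smtc-policy"
  policy = <<EOT
path "smtc/data/project1/subproject1/env01" {
  capabilities = ["read"]
}
EOT
}

resource "vault_kubernetes_auth_backend_role" "smtc" {
  backend                          = "kubernetes"
  role_name                        = "vault-smtc-role"
  bound_service_account_names      = ["vault"]
  bound_service_account_namespaces = ["vault"]
  token_policies                   = ["smtc-policy"]
  token_ttl                        = 86400
}
```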
This makes configuration reproducible and version-controlled.
The Unexpected Benefits
1. Zero Code Changes
The most beautiful part? Applications don’t need any Vault-specific code. They just read files from /vault/secrets/. Want to migrate from ConfigMaps to Vault? Just change the annotations. The app code stays the same.
2. Audit Trail for Free
Every secret access is logged in Vault’s audit log. Who accessed what secret, when, from which pod. Security team loves this.
3. Secret Versioning
KV-v2 stores versions of secrets. Accidentally rotated a password and broke everything? Roll back:
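For example, restoring an older version of the secret from our hierarchy:

```bash
# See which versions exist
vault kv metadata get smtc/project1/subproject1/env01

# Restore version 1's data as a new version
vault kv rollback -version=1 smtc/project1/subproject1/env01
```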
4. Dynamic Secrets
While I used static secrets in this journey, Vault can generate dynamic database credentials, AWS IAM credentials, and more. These expire automatically - no manual rotation needed.
5. Multi-Cloud Secrets
One Vault instance can serve multiple Kubernetes clusters across different cloud providers. Unified secret management across your entire infrastructure.
The Journey Ends (But the Road Continues)
What started as “I need to manage secrets in Kubernetes” turned into a deep dive spanning:
- Production HA Vault with Raft
- Multiple authentication backends
- Cross-cluster secret injection
- Template-based secret formatting
- Manual unsealing procedures
- Multi-cluster architecture
I’ve gone from hardcoding secrets (we’ve all been there) to a production-grade secret management system that:
- Scales across multiple clusters
- Provides audit logs
- Supports secret versioning
- Requires no application code changes
- Follows security best practices
Was it complex? Yes. Was it worth it? Absolutely.
The next time someone asks me “how do you manage secrets in Kubernetes?”, I can confidently answer: “Let me tell you about HashiCorp Vault…”
What’s Next?
This setup is just the foundation. Here’s what you can explore next:
Dynamic Database Credentials
Instead of static passwords, have Vault generate temporary credentials:
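A sketch of the database secrets engine for PostgreSQL; connection details, role names, and credentials are placeholders:

```bash
vault secrets enable database

vault write database/config/appdb \
  plugin_name=postgresql-database-plugin \
  allowed_roles="app-role" \
  connection_url="postgresql://{{username}}:{{password}}@postgres.vault.svc:5432/appdb" \
  username="vault-admin" \
  password="change-me"

vault write database/roles/app-role \
  db_name=appdb \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
  default_ttl="1h" \
  max_ttl="24h"
```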
Every pod gets its own database credentials that expire automatically.
PKI and Certificate Management
Use Vault as an internal Certificate Authority:
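A sketch of a basic internal CA; the domain and TTLs are placeholders:

```bash
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki

# Generate an internal root CA
vault write pki/root/generate/internal \
  common_name="internal.example.com" ttl=87600h

# A role applications can use to request certs
vault write pki/roles/internal-certs \
  allowed_domains="internal.example.com" \
  allow_subdomains=true \
  max_ttl="72h"
```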
Applications can request certificates on-demand, and Vault handles rotation.
Encryption as a Service
Use Vault’s Transit engine for application-level encryption:
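A sketch of the Transit workflow; the key name and sample data are placeholders:

```bash
vault secrets enable transit
vault write -f transit/keys/app-key

# Encrypt: send base64-encoded plaintext, get ciphertext back
vault write transit/encrypt/app-key \
  plaintext=$(echo -n "my secret data" | base64)
```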
Your app sends plaintext to Vault, gets ciphertext back. No encryption keys to manage.
Multi-Region Vault
For global deployments, set up Vault replication:
- Performance replication (read replicas in multiple regions)
- Disaster recovery replication (failover clusters)
GitOps Integration
Combine Vault with ArgoCD or Flux for true GitOps:
- Store secret paths in Git (not the secrets themselves)
- Let the Vault Agent fetch actual values
- Change secrets without Git commits
Resources That Saved Us
These resources were invaluable during implementation:
- DevOps Cube - Vault in Kubernetes
- DevOps Cube - Vault Agent Injector Tutorial
- HashiCorp - Kubernetes Raft Deployment Guide
- HashiCorp - External Vault Tutorial
- Medium - Securely Inject Secrets with Vault Agent
- Medium - Introduction to Vault for Secret Management
Final Thoughts
Secret management isn’t glamorous. It doesn’t make for impressive demos. No one will ooh and ahh over your Vault setup at a conference.
But it’s critical. Every production outage caused by leaked credentials, every security breach from hardcoded passwords, every compliance audit failure - they all point back to poor secret management.
The sidecar injection pattern is elegant. Your application code stays clean. The Vault Agent handles all the complexity. Secrets are fetched securely, rotated automatically, and audited completely.
Is it more complex than hardcoding credentials or using ConfigMaps? Yes.
Is it worth it? Ask yourself this: What’s the cost of a security breach?
Your 3 AM self will thank you when:
- Credentials leak and you can rotate them in seconds
- Audit asks “who accessed production database passwords” and you have logs
- A pod is compromised but can only access its specific secrets
- Secrets rotate automatically and applications keep working
That’s when you’ll know the journey was worth it.
Kudos to the Mavericks at the DevOps Den. Proud of you all.
Built with curiosity, debugged with persistence, secured with Vault—fueled by late-night coffee in the DevOps Den.
Repository
All the configuration files, deployment manifests, patches, and scripts from this adventure are available in this repository:
- 1-deployment-*.yaml - Application deployments
- 2-services-nodeport.yaml - NodePort service for Vault
- 2-vault2-ingress.yaml - Ingress configuration
- 3a-smtc-patch-inject-secrets.yaml - Basic injection patch
- 3b-smtc-patch-inject-secrets-as-template01pg.yaml - Template-based injection
- 4-external-vault-svc-endpoint.yaml - External Vault service endpoint
- 4a-external-vault2-svc-externalname.yaml - ExternalName service
- 5-vault-secret.yaml - Service account token secret
- 6-deployment-nginx-remote01*.yaml - Remote cluster deployments
- README20240520.md - My working notes (with all the wrong turns included)
Feel free to use these as templates for your own Vault journey. Just remember: change the passwords. Seriously. Please.