Running Redis in production Kubernetes environments requires careful planning to ensure high availability, data persistence, and optimal performance. While a single Redis instance might work for development, production workloads demand a Redis Cluster that can handle failures gracefully, scale horizontally, and maintain data consistency—similar to how we set up high-availability PostgreSQL with operators.

In this comprehensive guide, we’ll walk through setting up a production-ready Redis Cluster on Kubernetes with high availability, covering everything from basic concepts to advanced configurations that you can deploy in your own cluster.

Prerequisites

Before we begin, ensure you have:

  • A running Kubernetes cluster (version 1.20 or later)
  • kubectl configured to communicate with your cluster
  • Basic understanding of Kubernetes concepts (Pods, Services, StatefulSets)
  • At least 6GB of available memory across your cluster nodes
  • Storage provisioner for PersistentVolumes (e.g., local-path, AWS EBS, GCP PD)

Understanding Redis Cluster Architecture

Why Redis Cluster?

Redis Cluster provides several advantages over standalone Redis, much like how MySQL Operator deployments enhance database reliability in Kubernetes:

  • Automatic Sharding: Distributes data across multiple nodes for scalability.
  • High Availability: Replicates data across nodes with automatic failover.
  • Scalability: Easily add or remove nodes to adjust capacity.
  • No Single Point of Failure: Decentralized architecture ensures resilience.

Redis Cluster Topology

A minimal production Redis Cluster requires:

  • 3 master nodes: Minimum for cluster quorum and automatic failover
  • 3 replica nodes: One replica per master for redundancy
  • Total: 6 nodes: The standard configuration for production

Each master handles approximately 5461 hash slots (16384 / 3), and each replica continuously synchronizes data from its master.
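Once the cluster is initialized (Step 5 below), you can see exactly which slot a given key maps to; the slot is computed as CRC16(key) mod 16384:

# Ask any node which hash slot the key "user:1000" belongs to
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster keyslot user:1000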

Setting Up Redis Cluster

Step 1: Create Namespace

First, create a dedicated namespace for Redis:

kubectl create namespace redis-cluster

Step 2: Create ConfigMap for Redis Configuration

Create a ConfigMap with Redis cluster configuration:

# redis-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    port 6379
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec
    dir /data

    # Memory management
    maxmemory 512mb
    maxmemory-policy allkeys-lru

    # Security
    protected-mode no

    # Performance tuning
    tcp-backlog 511
    timeout 0
    tcp-keepalive 300

    # Persistence
    save 900 1
    save 300 10
    save 60 10000

    # Logging
    loglevel notice
    logfile ""

Apply the ConfigMap:

kubectl apply -f redis-configmap.yaml

Step 3: Create Headless Service

A headless service enables direct pod-to-pod communication required for Redis Cluster:

# redis-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  - port: 16379
    targetPort: 16379
    name: gossip
  selector:
    app: redis-cluster
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-external
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  selector:
    app: redis-cluster

Apply the services:

kubectl apply -f redis-service.yaml

Step 4: Deploy Redis StatefulSet

StatefulSets are ideal for Redis Cluster because they provide stable network identities and ordered deployment—concepts we explored in our Istio service mesh setup guide. They offer:

  • Stable network identities (predictable DNS names)
  • Ordered deployment and scaling
  • Persistent storage that follows pods
# redis-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis-cluster
              topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:7.2-alpine
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command:
        - redis-server
        - /conf/redis.conf
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 1
          failureThreshold: 3
        volumeMounts:
        - name: conf
          mountPath: /conf
        - name: data
          mountPath: /data
      volumes:
      - name: conf
        configMap:
          name: redis-cluster-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

Apply the StatefulSet:

kubectl apply -f redis-statefulset.yaml
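To keep the cluster quorate during voluntary disruptions such as node drains, it is also worth adding a PodDisruptionBudget. A minimal sketch (the maxUnavailable value of 1 is an assumption you may tune):

# redis-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-cluster-pdb
  namespace: redis-cluster
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: redis-cluster

Apply it with kubectl apply -f redis-pdb.yaml.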

Wait for all pods to be ready:

kubectl get pods -n redis-cluster -w

You should see output like:

NAME              READY   STATUS    RESTARTS   AGE
redis-cluster-0   1/1     Running   0          2m
redis-cluster-1   1/1     Running   0          2m
redis-cluster-2   1/1     Running   0          1m
redis-cluster-3   1/1     Running   0          1m
redis-cluster-4   1/1     Running   0          1m
redis-cluster-5   1/1     Running   0          30s

Step 5: Initialize Redis Cluster

Once all pods are running, initialize the cluster:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster create \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-replicas 1

When prompted, type yes to accept the configuration.

Expected output:

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica redis-cluster-4:6379 to redis-cluster-0:6379
Adding replica redis-cluster-5:6379 to redis-cluster-1:6379
Adding replica redis-cluster-3:6379 to redis-cluster-2:6379
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node redis-cluster-0:6379)
M: [node-id] redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Step 6: Verify Cluster Status

Check cluster information:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster info

Output should show:

cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3

List all cluster nodes:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster nodes
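To confirm that other workloads can reach the cluster through the service, you can also run a quick check from a throwaway pod (the pod name redis-test is just an example):

# Start a temporary Redis pod and ping the cluster through the ClusterIP service
kubectl run redis-test --rm -it --image=redis:7.2-alpine -n redis-cluster -- redis-cli -c -h redis-cluster-external ping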

Testing High Availability

Test 1: Basic Read/Write Operations

# Write data to the cluster
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET mykey "Hello Redis Cluster"

# Read data from any node
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli -c GET mykey

The -c flag enables cluster mode, allowing automatic redirection to the correct node.
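Multi-key commands such as MSET only work when all keys hash to the same slot; hash tags make this deterministic because only the substring inside the braces is hashed. For example:

# Both keys hash on "user:1000", so they land in the same slot and can be set together
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c MSET "{user:1000}:name" "Alice" "{user:1000}:email" "alice@example.com"

Quoting the keys keeps the braces intact in the shell.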

Test 2: Verify Data Distribution

# Create multiple keys
for i in {1..100}; do
  kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET key$i value$i
done

# Check key distribution
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster check \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

Test 3: Simulate Node Failure

Delete a master pod to test automatic failover:

# Confirm that redis-cluster-0 is currently a master
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli role

# Delete the master pod
kubectl delete pod redis-cluster-0 -n redis-cluster

Watch the cluster recover:

kubectl get pods -n redis-cluster -w

Verify the cluster promoted a replica to master:

kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes

You should see that one of the replicas has been promoted to master, and when redis-cluster-0 comes back online, it becomes a replica.
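Failover can also be triggered deliberately, which is useful for planned maintenance. Running CLUSTER FAILOVER on a pod that is currently a replica promotes it in a coordinated handover (pick whichever pod the cluster nodes output shows as a replica; redis-cluster-3 is only an example):

# Promote a replica to master without waiting for failure detection
kubectl exec -it redis-cluster-3 -n redis-cluster -- redis-cli cluster failover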

Test 4: Data Persistence

# Write test data
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET persistent-key "test-value"

# Delete all pods
kubectl delete pods -n redis-cluster --all

# Wait for pods to restart
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=120s

# Verify data persisted
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c GET persistent-key

Monitoring Redis Cluster

Using Redis CLI

Monitor cluster health:

# Real-time monitoring
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster check \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Memory usage
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli INFO memory

# Cluster statistics
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli INFO stats

Deploy Redis Exporter for Prometheus

Create a monitoring deployment to expose Redis metrics—similar to how we monitor microservices with Traefik:

# redis-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
  namespace: redis-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        ports:
        - containerPort: 9121
        env:
        - name: REDIS_ADDR
          value: "redis://redis-cluster-external:6379"
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter
  namespace: redis-cluster
  labels:
    app: redis-exporter
spec:
  ports:
  - port: 9121
    targetPort: 9121
    name: metrics
  selector:
    app: redis-exporter

Apply the exporter:

kubectl apply -f redis-exporter.yaml
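If you run the Prometheus Operator, a ServiceMonitor can pick up these metrics automatically. A sketch, assuming the Operator's CRDs are installed and your Prometheus selects ServiceMonitors by a release: prometheus label (adjust to your setup):

# redis-exporter-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: redis-cluster
  labels:
    release: prometheus  # assumption: must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: redis-exporter
  endpoints:
  - port: metrics
    interval: 30s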

Scaling the Cluster

Adding New Nodes

To add a new master-replica pair and handle increased load—similar to scaling WordPress on Kubernetes:

# Scale the StatefulSet
kubectl scale statefulset redis-cluster --replicas=8 -n redis-cluster

# Wait for new pods
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=120s

# Add new master to cluster
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster add-node \
redis-cluster-6.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Add new replica
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster add-node \
redis-cluster-7.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-slave

# Rebalance hash slots
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster rebalance \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-use-empty-masters
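Scaling down works in reverse: move all hash slots off the departing master, remove it and its replica from the cluster, then shrink the StatefulSet. A rough sketch (the node IDs are placeholders you would read from cluster nodes):

# Interactively move all slots off the departing master
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster reshard \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Remove the now-empty master and its replica by node ID
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster del-node \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 <master-node-id>
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster del-node \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 <replica-node-id>

# Finally shrink the StatefulSet back to six pods
kubectl scale statefulset redis-cluster --replicas=6 -n redis-cluster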

Security Best Practices

Enable Authentication

Update the ConfigMap to add password protection:

# Add to redis-configmap.yaml
data:
  redis.conf: |
    # ... existing config ...
    requirepass YourStrongPasswordHere
    masterauth YourStrongPasswordHere

Store the password in a Secret:

# redis-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: redis-password
  namespace: redis-cluster
type: Opaque
stringData:
  password: YourStrongPasswordHere

Update the StatefulSet to use the secret:

env:
- name: REDIS_PASSWORD
  valueFrom:
    secretKeyRef:
      name: redis-password
      key: password
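With requirepass enabled, redis-cli also needs the password, so the verification commands shown earlier gain an -a flag (or you can set the REDISCLI_AUTH environment variable instead). For example:

# Expect a warning about supplying the password on the command line
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -a YourStrongPasswordHere cluster info

You may also want to adjust the exec-based readiness probe so that it authenticates once requirepass is on.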

Network Policies

Restrict network access to Redis using Kubernetes network policies—similar to how we enforce security with Kyverno:

# redis-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-policy
  namespace: redis-cluster
spec:
  podSelector:
    matchLabels:
      app: redis-cluster
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: application-namespace
    ports:
    - protocol: TCP
      port: 6379
  - from:
    - podSelector:
        matchLabels:
          app: redis-cluster
    ports:
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 16379

Backup and Disaster Recovery

Automated Backups

Create a CronJob for regular backups:

# redis-backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
  namespace: redis-cluster
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: redis:7.2-alpine
            command:
            - /bin/sh
            - -c
            - |
              # Trigger a background RDB snapshot on the first three pods (the initial masters)
              for i in 0 1 2; do
                redis-cli -h redis-cluster-$i.redis-cluster.redis-cluster.svc.cluster.local BGSAVE
              done
              sleep 60
              # The snapshots land in each pod's /data PVC; copy or upload them to S3 or other storage from here
          restartPolicy: OnFailure
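Restoring is essentially the reverse: place a saved dump.rdb (or the appendonly files, which take precedence when appendonly yes is set) back into the pod's /data volume and restart the pod so Redis reloads it at startup. A minimal sketch, assuming you have a snapshot copy locally (the file name is hypothetical):

# Copy a saved snapshot back onto the pod's persistent volume
kubectl cp dump.rdb redis-cluster/redis-cluster-0:/data/dump.rdb

# Restart the pod; the PVC and the restored file survive the restart
kubectl delete pod redis-cluster-0 -n redis-cluster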

Performance Tuning

Optimize for Your Workload

Adjust memory policy based on use case:

# For cache-only workloads
maxmemory-policy allkeys-lru

# For mixed workloads with TTL
maxmemory-policy volatile-lru

# For database workloads
maxmemory-policy noeviction
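These policies can also be changed at runtime, though in a cluster you have to apply the change to every node, and a runtime change is lost on restart unless you also update the ConfigMap:

# Apply a policy change live on one node (repeat for each pod, or update redis.conf and restart)
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli CONFIG SET maxmemory-policy volatile-lru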

Disable Transparent Huge Pages

Redis recommends keeping Transparent Huge Pages off (or limited to madvise) on the host, since THP can cause latency spikes and extra copy-on-write memory during background saves:

# On each node
echo 'madvise' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'madvise' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
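Running these commands by hand does not survive node reboots or new nodes joining the cluster. On Kubernetes, a common pattern is a small privileged DaemonSet that applies the setting on every node at startup; the manifest below is a sketch under that assumption, not a hardened example:

# thp-settings-daemonset.yaml (sketch: privileged init container writes the THP setting on each node)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: thp-settings
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: thp-settings
  template:
    metadata:
      labels:
        app: thp-settings
    spec:
      initContainers:
      - name: set-thp
        image: busybox:1.36
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          echo madvise > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo madvise > /host-sys/kernel/mm/transparent_hugepage/defrag
        volumeMounts:
        - name: sys
          mountPath: /host-sys
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
      volumes:
      - name: sys
        hostPath:
          path: /sys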

Troubleshooting

Common Issues

Issue: Cluster formation fails

# Check pod logs
kubectl logs redis-cluster-0 -n redis-cluster

# Verify network connectivity
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli PING

# Check DNS resolution
kubectl exec -it redis-cluster-0 -n redis-cluster -- nslookup redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local

Issue: Pods stuck in CrashLoopBackOff

# Check resource constraints
kubectl describe pod redis-cluster-0 -n redis-cluster

# Verify PVC binding
kubectl get pvc -n redis-cluster

# Check configuration
kubectl exec -it redis-cluster-0 -n redis-cluster -- cat /conf/redis.conf

Issue: Data loss after restart

# Verify AOF persistence is enabled
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli CONFIG GET appendonly

# Check RDB snapshots
kubectl exec -it redis-cluster-0 -n redis-cluster -- ls -lh /data/

Conclusion

You now have a production-ready Redis Cluster running on Kubernetes with high availability, automatic failover, and data persistence. This setup can handle node failures gracefully, scale horizontally, and maintain data consistency across your cluster.

The Redis Cluster architecture we’ve implemented provides:

  • Automatic sharding across multiple masters for horizontal scalability
  • Built-in replication with automatic failover when masters fail
  • Data persistence using both RDB snapshots and AOF logs
  • Minimal disruption during node failures and rolling updates (failover typically completes within seconds)
  • Monitoring capabilities for observability and alerting

As your application grows, you can easily scale the cluster by adding more master-replica pairs and rebalancing hash slots. For advanced traffic management and routing, consider integrating with a service mesh using Istio Gateway API. Remember to regularly test your disaster recovery procedures and monitor cluster health to ensure optimal performance.

Happy Coding