Running Redis in production Kubernetes environments requires careful planning to ensure high availability, data persistence, and optimal performance. While a single Redis instance might work for development, production workloads demand a Redis Cluster that can handle failures gracefully, scale horizontally, and maintain data consistency—similar to how we set up high-availability PostgreSQL with operators.

In this comprehensive guide, we’ll walk through setting up a production-ready Redis Cluster on Kubernetes with high availability, covering everything from basic concepts to advanced configurations that you can deploy in your own cluster.

Prerequisites

Before we begin, ensure you have:

  • A running Kubernetes cluster (version 1.20 or later)
  • kubectl configured to communicate with your cluster
  • Basic understanding of Kubernetes concepts (Pods, Services, StatefulSets)
  • At least 6GB of available memory across your cluster nodes
  • Storage provisioner for PersistentVolumes (e.g., local-path, AWS EBS, GCP PD)

Understanding Redis Cluster Architecture

Why Redis Cluster?

Redis Cluster provides several advantages over standalone Redis, much like how MySQL Operator deployments enhance database reliability in Kubernetes:

  • Automatic Sharding: Distributes data across multiple nodes for scalability.
  • High Availability: Replicates data across nodes with automatic failover.
  • Scalability: Easily add or remove nodes to adjust capacity.
  • No Single Point of Failure: Decentralized architecture ensures resilience.

Redis Cluster Topology

A minimal production Redis Cluster requires:

  • 3 master nodes: Minimum for cluster quorum and automatic failover
  • 3 replica nodes: One replica per master for redundancy
  • Total: 6 nodes: The standard configuration for production

Each master handles approximately 5461 hash slots (16384 / 3), and each replica continuously synchronizes data from its master.
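Once the cluster is initialized (Step 5 below), you can see exactly which slot a given key maps to; the slot is computed as CRC16(key) mod 16384:

# Ask any node which hash slot the key "user:1000" belongs to
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster keyslot user:1000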

Setting Up Redis Cluster

Step 1: Create Namespace

First, create a dedicated namespace for Redis:

kubectl create namespace redis-cluster

Step 2: Create ConfigMap for Redis Configuration

Create a ConfigMap with Redis cluster configuration:

# redis-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    port 6379
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec
    dir /data

    # Memory management
    maxmemory 512mb
    maxmemory-policy allkeys-lru

    # Security
    protected-mode no

    # Performance tuning
    tcp-backlog 511
    timeout 0
    tcp-keepalive 300

    # Persistence
    save 900 1
    save 300 10
    save 60 10000

    # Logging
    loglevel notice
    logfile ""

Apply the ConfigMap:

kubectl apply -f redis-configmap.yaml

Step 3: Create Headless Service

A headless service enables direct pod-to-pod communication required for Redis Cluster:

# redis-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  - port: 16379
    targetPort: 16379
    name: gossip
  selector:
    app: redis-cluster
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-external
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  selector:
    app: redis-cluster

Apply the services:

kubectl apply -f redis-service.yaml

Step 4: Deploy Redis StatefulSet

StatefulSets are ideal for Redis Cluster because they provide stable network identities and ordered deployment—concepts we explored in our Istio service mesh setup guide. They offer:

  • Stable network identities (predictable DNS names)
  • Ordered deployment and scaling
  • Persistent storage that follows pods
# redis-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis-cluster
              topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:7.2-alpine
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command:
        - redis-server
        - /conf/redis.conf
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 1
          failureThreshold: 3
        volumeMounts:
        - name: conf
          mountPath: /conf
        - name: data
          mountPath: /data
      volumes:
      - name: conf
        configMap:
          name: redis-cluster-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

Apply the StatefulSet:

kubectl apply -f redis-statefulset.yaml
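To keep the cluster quorate during voluntary disruptions such as node drains, it is also worth adding a PodDisruptionBudget. A minimal sketch (the maxUnavailable value of 1 is an assumption you may tune):

# redis-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-cluster-pdb
  namespace: redis-cluster
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: redis-cluster

Apply it with kubectl apply -f redis-pdb.yaml.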

Wait for all pods to be ready:

kubectl get pods -n redis-cluster -w

You should see output like:

NAME              READY   STATUS    RESTARTS   AGE
redis-cluster-0   1/1     Running   0          2m
redis-cluster-1   1/1     Running   0          2m
redis-cluster-2   1/1     Running   0          1m
redis-cluster-3   1/1     Running   0          1m
redis-cluster-4   1/1     Running   0          1m
redis-cluster-5   1/1     Running   0          30s

Step 5: Initialize Redis Cluster

Once all pods are running, initialize the cluster:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster create \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-replicas 1

When prompted, type yes to accept the configuration.

Expected output:

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica redis-cluster-4:6379 to redis-cluster-0:6379
Adding replica redis-cluster-5:6379 to redis-cluster-1:6379
Adding replica redis-cluster-3:6379 to redis-cluster-2:6379
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node redis-cluster-0:6379)
M: [node-id] redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Step 6: Verify Cluster Status

Check cluster information:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster info

Output should show:

cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3

List all cluster nodes:

kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster nodes
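To confirm that other workloads can reach the cluster through the service, you can also run a quick check from a throwaway pod (the pod name redis-test is just an example):

# Start a temporary Redis pod and ping the cluster through the ClusterIP service
kubectl run redis-test --rm -it --image=redis:7.2-alpine -n redis-cluster -- redis-cli -c -h redis-cluster-external ping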

Testing High Availability

Test 1: Basic Read/Write Operations

# Write data to the cluster
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET mykey "Hello Redis Cluster"

# Read data from any node
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli -c GET mykey

The -c flag enables cluster mode, allowing automatic redirection to the correct node.
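Multi-key commands such as MSET only work when all keys hash to the same slot; hash tags make this deterministic because only the substring inside the braces is hashed. For example:

# Both keys hash on "user:1000", so they land in the same slot and can be set together
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c MSET "{user:1000}:name" "Alice" "{user:1000}:email" "alice@example.com"

Quoting the keys keeps the braces intact in the shell.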

Test 2: Verify Data Distribution

# Create multiple keys
for i in {1..100}; do
  kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET key$i value$i
done

# Check key distribution
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster check \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

Test 3: Simulate Node Failure

Delete a master pod to test automatic failover:

# Confirm that redis-cluster-0 is currently a master
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli role

# Delete the master pod
kubectl delete pod redis-cluster-0 -n redis-cluster

Watch the cluster recover:

kubectl get pods -n redis-cluster -w

Verify the cluster promoted a replica to master:

kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes

You should see that one of the replicas has been promoted to master, and when redis-cluster-0 comes back online, it becomes a replica.
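Failover can also be triggered deliberately, which is useful for planned maintenance. Running CLUSTER FAILOVER on a pod that is currently a replica promotes it in a coordinated handover (pick whichever pod the cluster nodes output shows as a replica; redis-cluster-3 is only an example):

# Promote a replica to master without waiting for failure detection
kubectl exec -it redis-cluster-3 -n redis-cluster -- redis-cli cluster failover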

Test 4: Data Persistence

# Write test data
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c SET persistent-key "test-value"

# Delete all pods
kubectl delete pods -n redis-cluster --all

# Wait for pods to restart
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=120s

# Verify data persisted
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c GET persistent-key

Monitoring Redis Cluster

Using Redis CLI

Monitor cluster health:

# Real-time monitoring
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster check \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Memory usage
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli INFO memory

# Cluster statistics
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli INFO stats

Deploy Redis Exporter for Prometheus

Create a monitoring deployment to expose Redis metrics—similar to how we monitor microservices with Traefik:

# redis-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
  namespace: redis-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        ports:
        - containerPort: 9121
        env:
        - name: REDIS_ADDR
          value: "redis://redis-cluster-external:6379"
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter
  namespace: redis-cluster
  labels:
    app: redis-exporter
spec:
  ports:
  - port: 9121
    targetPort: 9121
    name: metrics
  selector:
    app: redis-exporter

Apply the exporter:

kubectl apply -f redis-exporter.yaml
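If you run the Prometheus Operator, a ServiceMonitor can pick up these metrics automatically. A sketch, assuming the Operator's CRDs are installed and your Prometheus selects ServiceMonitors by a release: prometheus label (adjust to your setup):

# redis-exporter-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: redis-cluster
  labels:
    release: prometheus  # assumption: must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: redis-exporter
  endpoints:
  - port: metrics
    interval: 30s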

Scaling the Cluster

Adding New Nodes

To add a new master-replica pair and handle increased load—similar to scaling WordPress on Kubernetes:

# Scale the StatefulSet
kubectl scale statefulset redis-cluster --replicas=8 -n redis-cluster

# Wait for new pods
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=120s

# Add new master to cluster
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster add-node \
redis-cluster-6.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Add new replica
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster add-node \
redis-cluster-7.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-slave

# Rebalance hash slots
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster rebalance \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-use-empty-masters
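Scaling down works in reverse: move all hash slots off the departing master, remove it and its replica from the cluster, then shrink the StatefulSet. A rough sketch (the node IDs are placeholders you would read from cluster nodes):

# Interactively move all slots off the departing master
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster reshard \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379

# Remove the now-empty master and its replica by node ID
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster del-node \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 <master-node-id>
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli --cluster del-node \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 <replica-node-id>

# Finally shrink the StatefulSet back to six pods
kubectl scale statefulset redis-cluster --replicas=6 -n redis-cluster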

Security Best Practices

Enable Authentication

Update the ConfigMap to add password protection:

# Add to redis-configmap.yaml
data:
  redis.conf: |
    # ... existing config ...
    requirepass YourStrongPasswordHere
    masterauth YourStrongPasswordHere

Store the password in a Secret:

# redis-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: redis-password
  namespace: redis-cluster
type: Opaque
stringData:
  password: YourStrongPasswordHere

Update the StatefulSet to use the secret:

env:
- name: REDIS_PASSWORD
  valueFrom:
    secretKeyRef:
      name: redis-password
      key: password
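With requirepass enabled, redis-cli also needs the password, so the verification commands shown earlier gain an -a flag (or you can set the REDISCLI_AUTH environment variable instead). For example:

# Expect a warning about supplying the password on the command line
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -a YourStrongPasswordHere cluster info

You may also want to adjust the exec-based readiness probe so that it authenticates once requirepass is on.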

Network Policies

Restrict network access to Redis using Kubernetes network policies—similar to how we enforce security with Kyverno:

# redis-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-policy
  namespace: redis-cluster
spec:
  podSelector:
    matchLabels:
      app: redis-cluster
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: application-namespace
    ports:
    - protocol: TCP
      port: 6379
  - from:
    - podSelector:
        matchLabels:
          app: redis-cluster
    ports:
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 16379

Backup and Disaster Recovery

Automated Backups

Create a CronJob for regular backups:

# redis-backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
  namespace: redis-cluster
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: redis:7.2-alpine
            command:
            - /bin/sh
            - -c
            - |
              # Trigger a background RDB snapshot on the first three pods (the initial masters)
              for i in 0 1 2; do
                redis-cli -h redis-cluster-$i.redis-cluster.redis-cluster.svc.cluster.local BGSAVE
              done
              sleep 60
              # The snapshots land in each pod's /data PVC; copy or upload them to S3 or other storage from here
          restartPolicy: OnFailure
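Restoring is essentially the reverse: place a saved dump.rdb (or the appendonly files, which take precedence when appendonly yes is set) back into the pod's /data volume and restart the pod so Redis reloads it at startup. A minimal sketch, assuming you have a snapshot copy locally (the file name is hypothetical):

# Copy a saved snapshot back onto the pod's persistent volume
kubectl cp dump.rdb redis-cluster/redis-cluster-0:/data/dump.rdb

# Restart the pod; the PVC and the restored file survive the restart
kubectl delete pod redis-cluster-0 -n redis-cluster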

Performance Tuning

Optimize for Your Workload

Adjust memory policy based on use case:

# For cache-only workloads
maxmemory-policy allkeys-lru

# For mixed workloads with TTL
maxmemory-policy volatile-lru

# For database workloads
maxmemory-policy noeviction
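These policies can also be changed at runtime, though in a cluster you have to apply the change to every node, and a runtime change is lost on restart unless you also update the ConfigMap:

# Apply a policy change live on one node (repeat for each pod, or update redis.conf and restart)
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli CONFIG SET maxmemory-policy volatile-lru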

Disable Transparent Huge Pages

Redis recommends keeping Transparent Huge Pages off (or limited to madvise) on the host, since THP can cause latency spikes and extra copy-on-write memory during background saves:

# On each node
echo 'madvise' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'madvise' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
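Running these commands by hand does not survive node reboots or new nodes joining the cluster. On Kubernetes, a common pattern is a small privileged DaemonSet that applies the setting on every node at startup; the manifest below is a sketch under that assumption, not a hardened example:

# thp-settings-daemonset.yaml (sketch: privileged init container writes the THP setting on each node)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: thp-settings
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: thp-settings
  template:
    metadata:
      labels:
        app: thp-settings
    spec:
      initContainers:
      - name: set-thp
        image: busybox:1.36
        securityContext:
          privileged: true
        command:
        - sh
        - -c
        - |
          echo madvise > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo madvise > /host-sys/kernel/mm/transparent_hugepage/defrag
        volumeMounts:
        - name: sys
          mountPath: /host-sys
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
      volumes:
      - name: sys
        hostPath:
          path: /sys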

Troubleshooting

Common Issues

Issue: Cluster formation fails

# Check pod logs
kubectl logs redis-cluster-0 -n redis-cluster

# Verify network connectivity
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli PING

# Check DNS resolution
kubectl exec -it redis-cluster-0 -n redis-cluster -- nslookup redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local

Issue: Pods stuck in CrashLoopBackOff

# Check resource constraints
kubectl describe pod redis-cluster-0 -n redis-cluster

# Verify PVC binding
kubectl get pvc -n redis-cluster

# Check configuration
kubectl exec -it redis-cluster-0 -n redis-cluster -- cat /conf/redis.conf

Issue: Data loss after restart

# Verify AOF persistence is enabled
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli CONFIG GET appendonly

# Check RDB snapshots
kubectl exec -it redis-cluster-0 -n redis-cluster -- ls -lh /data/

Conclusion

You now have a production-ready Redis Cluster running on Kubernetes with high availability, automatic failover, and data persistence. This setup can handle node failures gracefully, scale horizontally, and maintain data consistency across your cluster.

The Redis Cluster architecture we’ve implemented provides:

  • Automatic sharding across multiple masters for horizontal scalability
  • Built-in replication with automatic failover when masters fail
  • Data persistence using both RDB snapshots and AOF logs
  • Minimal disruption during node failures and rolling updates (failover typically completes within seconds)
  • Monitoring capabilities for observability and alerting

As your application grows, you can easily scale the cluster by adding more master-replica pairs and rebalancing hash slots. For advanced traffic management and routing, consider integrating with a service mesh using Istio Gateway API. Remember to regularly test your disaster recovery procedures and monitor cluster health to ensure optimal performance.

Happy Coding