I run my Kubernetes cluster with OpenEBS cStor as the primary provisioning engine. I use it for two reasons: first, it enables dynamic volume provisioning, and second, it can replicate Persistent Volumes to other nodes in the cluster. That brings a bit of resiliency to the cluster, as we don't have to worry too much when something really bad happens to a node.
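For context, the replica count is wired into the StorageClass that provisions the volumes. A minimal sketch of such a StorageClass (the StorageClass name below is made up, and the pool cluster name has to match an existing CStorPoolCluster) looks roughly like this:

# A rough sketch of a cStor CSI StorageClass with 3-way replication.
# The StorageClass name is hypothetical; cstorPoolCluster must match
# an existing CStorPoolCluster (mine is openebs-cstor-disk-pool).
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi-disk
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: openebs-cstor-disk-pool
  replicaCount: "3"
EOF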
Some time ago I had to upgrade the hardware on one of the nodes. To do that, I removed the node from the cluster, replaced the disk, and then added the node back. You can read about it in the following post.
Removing worker and control-plane nodes from the Kubernetes cluster
However, after re-adding the node's pool to the cStor disk pool cluster, Pods got stuck waiting for their volumes.
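A quick way to see that kind of hang (the Pod name below is a placeholder, not one from my cluster) is to check the Pod events and the state of the claims:

# Placeholder name; substitute the Pod that is stuck in ContainerCreating.
kubectl describe pod <stuck-pod>        # look for FailedMount / timeout events at the bottom
kubectl get pvc                         # the claims themselves may still show as Bound
kubectl get events --sort-by=.metadata.creationTimestamp | tail -n 20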
Down the rabbit hole
So what is the issue with the cStor pool clusters?
❯ kubectl get cstorpoolclusters.cstor.openebs.io
NAME                      HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES   AGE
openebs-cstor-disk-pool   3                  3                      3                  35d
Printing cstorpoolclusters has shown that we desire 3 instances, and that all 3 of them are provisioned and healthy. So far so good. But printing cstorpoolinstances has shown something different.
❯ kubectl get cstorpoolinstances.cstor.openebs.io
NAME                           HOSTNAME   FREE     CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs-cstor-disk-pool-9k98   vm0302     89500M   90132M     false      4                     1                 ONLINE   17h
openebs-cstor-disk-pool-d5rg   vm0102     87300M   90070M     false      4                     1                 ONLINE   35d
openebs-cstor-disk-pool-rvl2   vm0202     87300M   90070M     false      4                     1                 ONLINE   34d
Only one healthy replica in each pool instance, and that's pretty odd! Listing cstorvolumes has shown that we indeed have an issue with the volumes.
❯ kubectl get cstorvolumes.cstor.openebs.io
NAME                                       CAPACITY   STATUS    AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c   4Gi        Healthy   34d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f   8Gi        Offline   34d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb   4Gi        Offline   34d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b   4Gi        Offline   34d
It turned out that the cStor volumes were stuck in an Offline state.
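Describing one of the Offline volumes (I'm reusing a PVC name from the listing above; the exact output is not shown here) gives a hint about the replica quorum:

# Look at the conditions and replica details of one Offline volume:
kubectl describe cstorvolume pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f

The replica-level view makes the picture clearer: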
❯ kubectl get cstorvolumereplicas.cstor.openebs.io
NAME                                                                     ALLOCATED   USED    STATUS               AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-d5rg   671M        1.66G   Healthy              33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-rvl2   671M        1.66G   Healthy              33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-9k98   671M        1.66G   Healthy              33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-d5rg   209M        741M    Healthy              33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-rvl2   209M        241M    Degraded             33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-9k98   209M        1M      NewReplicaDegraded   33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-d5rg   874M        2.37G   Healthy              33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-rvl2   874M        1.37G   Degraded             33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-9k98   874M        1M      NewReplicaDegraded   33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-d5rg   917M        2.53G   Healthy              33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-rvl2   917M        1.13G   Degraded             33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-9k98   917M        1M      NewReplicaDegraded   33d
Pods were stuck because they needed their volumes, and the volumes were not ready...
cStor Target Pods
When a cStor volume is provisioned, a new cStor target Pod is created that is responsible for exposing the iSCSI LUN. The target Pod receives data from the workloads and passes it on to the respective cStor volume replicas (on the cStor pools). It also handles synchronous replication and quorum management of its replicas. And those target Pods required a restart.
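The target Pods are easy to find by name, since they follow the <pv-name>-target-<hash> naming convention:

# List the cStor target Pods (simple name-based filter):
kubectl get pods | grep target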
❯ kubectl delete pod pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-target-5b7646677c2w4vq
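I restarted them one by one, but the same thing could be scripted, something along these lines (just a sketch, review before running):

# Delete every cStor target Pod; their Deployments recreate them:
for p in $(kubectl get pods -o name | grep target); do
  kubectl delete "$p"
done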
Deleting each target Pod forced replicas to rebuild.
❯ kubectl get cstorvolumereplicas.cstor.openebs.io
NAME                                                                     ALLOCATED   USED    STATUS                     AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-d5rg   671M        1.66G   Healthy                    33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-rvl2   671M        1.66G   Healthy                    33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-9k98   671M        1.66G   Healthy                    33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-d5rg   209M        741M    Healthy                    33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-rvl2   209M        241M    Rebuilding                 33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-9k98   209M        1M      ReconstructingNewReplica   33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-d5rg   874M        2.37G   Healthy                    33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-rvl2   874M        1.37G   Rebuilding                 33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-9k98   874M        322M    ReconstructingNewReplica   33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-d5rg   917M        2.53G   Healthy                    33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-rvl2   917M        1.53G   Rebuilding                 33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-9k98   917M        571M    ReconstructingNewReplica   33d
Once the reconstruction had finished, all replicas became healthy again.
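As a final sanity check (not part of the original session), the volumes, the pool instances, and the previously stuck Pods can be listed once more:

# Everything should be back to normal:
kubectl get cstorvolumes.cstor.openebs.io                    # all volumes should report Healthy
kubectl get cstorpoolinstances.cstor.openebs.io              # HEALTHYREPLICAS should equal PROVISIONEDREPLICAS
kubectl get pods -A --field-selector=status.phase=Pending    # ideally returns nothing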
More useful OpenEBS cStor troubleshooting guides can be found at the following link.
Troubleshooting OpenEBS - cStor | OpenEBS Docs