OpenEBS takes volume provisioning to the next level thanks to its dynamic nature and its replication capabilities. From time to time, however, maintenance work on the cluster may require taking a node out. This has to be done with great care so that no data is corrupted or lost.
Reducing replicas gracefully
I use OpenEBS cStor as the default storage engine. The CStorPoolCluster has been configured with three CStorPoolInstances, one on each worker node. Now it's time to temporarily reduce the number of those instances to two. The following command lists them.
❯ kubectl get cstorpoolinstances.cstor.openebs.io
NAME                           HOSTNAME   FREE     CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs-cstor-disk-pool-d5rg   vm0102     87400M   90100M     false      4                     4                 ONLINE   34d
openebs-cstor-disk-pool-pxgl   vm0302     87400M   90100M     false      4                     4                 ONLINE   34d
openebs-cstor-disk-pool-rvl2   vm0202     87400M   90100M     false      4                     4                 ONLINE   33d
Due to the removal of a node, I have to decommission the openebs-cstor-disk-pool-pxgl pool. The pool is used by multiple PVCs, so I have to edit the CStorVolumeConfig of each PVC to remove the pool from it.
❯ kubectl get cstorvolumeconfigs.cstor.openebs.io
NAMESPACE   NAME                                       CAPACITY   STATUS   AGE
openebs     pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c   4Gi        Bound    33d
openebs     pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f   8Gi        Bound    33d
openebs     pvc-863aa630-5368-4400-9c01-e965d17c5aeb   4Gi        Bound    33d
openebs     pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b   4Gi        Bound    33d
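The volume configs are named after their PersistentVolumes rather than the PVCs. If it is unclear which PVC a given config belongs to, its openebs.io/persistent-volume-claim annotation (visible in the YAML below) spells it out, or the mapping can be read off the VOLUME column of the claims themselves:
❯ kubectl get pvc --all-namespaces
The VOLUME column shows the PV name, which matches the CStorVolumeConfig name.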
Editing a volume config gives output similar to the following.
❯ kubectl edit cstorvolumeconfigs.cstor.openebs.io pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c
apiVersion: cstor.openebs.io/v1
kind: CStorVolumeConfig
metadata:
  annotations:
    openebs.io/persistent-volume-claim: datadir-mysql-innodbcluster-0
    openebs.io/volume-policy: ""
    openebs.io/volumeID: pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c
  creationTimestamp: "2023-01-14T20:37:31Z"
  finalizers:
  - cvc.openebs.io/finalizer
  generation: 6
  labels:
    cstor.openebs.io/template-hash: "251293062"
    openebs.io/cstor-pool-cluster: openebs-cstor-disk-pool
    openebs.io/pod-disruption-budget: openebs-cstor-disk-poolrb5qk
  name: pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c
  namespace: openebs
  resourceVersion: "538165"
  uid: e441ef25-4fd1-4fcf-9e96-597f79f812db
publish:
  nodeId: vm0202
spec:
  capacity:
    storage: 4Gi
  cstorVolumeRef:
    apiVersion: cstor.openebs.io/v1
    kind: CStorVolume
    name: pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c
    namespace: openebs
    resourceVersion: "537566"
    uid: 3037d679-b74f-4fdf-8279-7c31d4ff59e2
  policy:
    provision:
      replicaAffinity: false
    replica: {}
    replicaPoolInfo:
    - poolName: openebs-cstor-disk-pool-d5rg
    - poolName: openebs-cstor-disk-pool-pxgl
    - poolName: openebs-cstor-disk-pool-rvl2
    target:
      auxResources:
        limits:
          cpu: "0"
          memory: "0"
        requests:
          cpu: "0"
          memory: "0"
The crucial part here is to remove openebs-cstor-disk-pool-pxgl from the replicaPoolInfo list. This has to be done for each CStorVolumeConfig; a patch-based alternative is sketched below.
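If editing every volume config by hand feels error-prone, the same change can be expressed as a JSON patch. A minimal sketch, assuming replicaPoolInfo lives under spec.policy as in the YAML above and that openebs-cstor-disk-pool-pxgl is the second entry (index 1); the index must be adjusted per volume config:
❯ kubectl patch cstorvolumeconfigs.cstor.openebs.io pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c \
    -n openebs --type=json \
    -p '[{"op": "remove", "path": "/spec/policy/replicaPoolInfo/1"}]'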
Once every volume config has been updated, there should be no replicas left in the pool.
❯ kubectl get cstorpoolinstances.cstor.openebs.io
NAME                           HOSTNAME   FREE     CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs-cstor-disk-pool-d5rg   vm0102     87400M   90100M     false      4                     4                 ONLINE   34d
openebs-cstor-disk-pool-pxgl   vm0302     90G      90117M     false      0                     0                 ONLINE   34d
openebs-cstor-disk-pool-rvl2   vm0202     87400M   90100M     false      4                     4                 ONLINE   33d
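Before deleting the pool instance, it is worth double-checking that no CStorVolumeReplicas are still placed on it. cStor labels each replica with the name of its pool instance (the cstorpoolinstance.openebs.io/name label, visible later in this post), so a simple label selector does the job; an empty result confirms the pool no longer hosts any data.
❯ kubectl get cstorvolumereplicas.cstor.openebs.io -n openebs \
    -l cstorpoolinstance.openebs.io/name=openebs-cstor-disk-pool-pxgl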
We are ready to delete it.
❯ kubectl delete cstorpoolinstances.cstor.openebs.io openebs-cstor-disk-pool-pxgl
cstorpoolinstance.cstor.openebs.io "openebs-cstor-disk-pool-pxgl" deleted
Now there should be only two disk pools. Listing the CStorPoolClusters shows the number of healthy and desired instances: 2 and 3, respectively.
❯ kubectl get cstorpoolclusters.cstor.openebs.io
NAME                      HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES   AGE
openebs-cstor-disk-pool   2                  2                      3                  34d
It's time to update the CStorPoolCluster.
❯ kubectl edit cstorpoolclusters.cstor.openebs.io openebs-cstor-disk-pool
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  creationTimestamp: "2023-01-13T21:31:16Z"
  finalizers:
  - cstorpoolcluster.openebs.io/finalizer
  generation: 21
  name: openebs-cstor-disk-pool
  namespace: openebs
  resourceVersion: "18193885"
  uid: b9a7dfe4-5f2f-43b5-aeef-af03af3d87f1
spec:
  pools:
  - dataRaidGroups:
    - blockDevices:
      - blockDeviceName: blockdevice-a251ba13122b4b5f8c2ce9471cf4b03e
    nodeSelector:
      kubernetes.io/hostname: vm0102
    poolConfig:
      dataRaidGroupType: stripe
  - dataRaidGroups:
    - blockDevices:
      - blockDeviceName: blockdevice-19d2d2fdc1c0e274aa3ba199d8fba897
    nodeSelector:
      kubernetes.io/hostname: vm0302
    poolConfig:
      dataRaidGroupType: stripe
  - dataRaidGroups:
    - blockDevices:
      - blockDeviceName: blockdevice-2806021afad58e5ef20c5c82b78fd943
    nodeSelector:
      kubernetes.io/hostname: vm0202
    poolConfig:
      dataRaidGroupType: stripe
We have to get rid of the pool entry containing the hostname vm0302 and block device blockdevice-19d2d2fdc1c0e274aa3ba199d8fba897 from the pools list; a patch-based alternative is sketched below.
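The same edit expressed as a JSON patch, a sketch assuming the vm0302 entry is the second item (index 1) of spec.pools, as in the YAML above:
❯ kubectl patch cstorpoolclusters.cstor.openebs.io openebs-cstor-disk-pool -n openebs \
    --type=json -p '[{"op": "remove", "path": "/spec/pools/1"}]'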
❗ If the node is already lost, OpenEBS won't let us remove the hostname and device from the list. Additional steps are required; see Node is lost before disk pool removal.
The block device should now be Unclaimed.
❯ kubectl get bd
NAME                                           NODENAME   SIZE          CLAIMSTATE   STATUS   AGE
blockdevice-19d2d2fdc1c0e274aa3ba199d8fba897   vm0302     99998934528   Unclaimed    Active   34d
blockdevice-2806021afad58e5ef20c5c82b78fd943   vm0202     99998934528   Claimed      Active   34d
blockdevice-a251ba13122b4b5f8c2ce9471cf4b03e   vm0102     99998934528   Claimed      Active   34d
Now it can be removed.
❯ kubectl delete bd blockdevice-19d2d2fdc1c0e274aa3ba199d8fba897
blockdevice.openebs.io "blockdevice-19d2d2fdc1c0e274aa3ba199d8fba897" deleted
We are ready to drain the node.
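For completeness, a sketch of the drain itself, assuming vm0302 is the node being taken out; the flags are the usual ones for nodes running DaemonSets and emptyDir volumes, and the final delete is only needed when the node leaves the cluster for good.
❯ kubectl drain vm0302 --ignore-daemonsets --delete-emptydir-data
❯ kubectl delete node vm0302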
Node is lost before disk pool removal
Planned maintenance is the most desirable way of keeping things healthy. However, random unplanned events, such as disasters, do happen from time to time. In such cases there is no way to perform a graceful removal, because the node is already lost.
OpenEBS uses finalizers to perform clean-up actions once a resource is removed. However, when a finalizer cannot run its action, the deletion gets stuck and prevents the removal of the disk pool from the cluster. In such a case, each cStor volume replica related to the lost pool has to be edited and its finalizer removed manually.
❯ kubectl get cstorvolumereplicas.cstor.openebs.io
NAME                                                                     ALLOCATED   USED    STATUS    AGE
pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-pxgl   873M        4.79G   Offline   4d12h
pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-d5rg   873M        4.79G   Healthy   4d12h
pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-rvl2   873M        4.79G   Healthy   4d12h
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-pxgl   226M        758M    Offline   14d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-d5rg   227M        758M    Healthy   48d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-rvl2   227M        758M    Healthy   48d
pvc-a274c159-eeb4-4d7c-9c22-bc8df66e9ae9-openebs-cstor-disk-pool-pxgl   189M        512M    Offline   4d12h
pvc-a274c159-eeb4-4d7c-9c22-bc8df66e9ae9-openebs-cstor-disk-pool-d5rg   189M        512M    Healthy   4d12h
pvc-a274c159-eeb4-4d7c-9c22-bc8df66e9ae9-openebs-cstor-disk-pool-rvl2   189M        511M    Healthy   4d12h
Here, the following volume replicas need treatment:
pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-pxgl,
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-pxgl, and
pvc-a274c159-eeb4-4d7c-9c22-bc8df66e9ae9-openebs-cstor-disk-pool-pxgl.
When editing each resource, we have to find the finalizers list.
❯ kubectl edit cstorvolumereplicas.cstor.openebs.io pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-pxgl
apiVersion: cstor.openebs.io/v1
kind: CStorVolumeReplica
metadata:
  annotations:
    cstorpoolinstance.openebs.io/hostname: vm0302
  creationTimestamp: "2023-02-27T20:13:52Z"
  finalizers:
  - cstorvolumereplica.openebs.io/finalizer
  generation: 13086
  labels:
    cstorpoolinstance.openebs.io/name: openebs-cstor-disk-pool-pxgl
And remove the following line:
- cstorvolumereplica.openebs.io/finalizer
This will allow the CStorVolume to scale down.
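The finalizer can also be cleared without opening an editor. A sketch using a merge patch that empties the finalizers list (here it holds only that single entry); repeat for each offline replica of the lost pool:
❯ kubectl patch cstorvolumereplicas.cstor.openebs.io \
    pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-openebs-cstor-disk-pool-pxgl \
    -n openebs --type=merge -p '{"metadata":{"finalizers":[]}}'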
Now we can update the CStorPoolCluster and remove the unreachable resources.
Conclusion
Installing OpenEBS on a cluster and configuring it is an easy task. Maintenance, however, can be quite difficult and requires a lot of insight. I strongly encourage you to consult the troubleshooting guides before going down the rabbit hole.
Troubleshooting OpenEBS - cStor | OpenEBS Docs