Description
vgscan is run during node unstage to make sure datacache partitions are in a good state: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L672
However, vgscan can hang if certain devices are offline (e.g. a filesystem mounted from a loopback device backed by network storage).
It's possible vgscan should only be run if the volume being unstaged is a datacache volume, but I'm not sure. In particular, if we do something like time out the vgscan call, we may want to make sure it runs on the next unstage regardless of the volume type.
It may also be a good idea to time out the vgscan call. Since vgscan hangs in these cases, we'd have to run it in a goroutine and stop waiting on it after a timeout. These goroutines may accumulate, but the hung vgscan processes are already accumulating on the system, so this probably isn't a big deal.
A third thing to consider is whether we can limit the devices vgscan looks at, e.g. only /dev/sd* and maybe RAID devices (/dev/md*).
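One possible way to restrict the scan is to pass a device filter through LVM's `--config` option. The sketch below only builds the arguments. `vgscanFilterArgs` is a hypothetical helper, the patterns are illustrative, and whether the filter is honoured depends on the LVM version and configuration on the node, so treat this as an untested assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// vgscanFilterArgs (hypothetical helper) builds a --config argument that
// accepts devices matching acceptPatterns and rejects everything else.
func vgscanFilterArgs(acceptPatterns []string) []string {
	entries := make([]string, 0, len(acceptPatterns)+1)
	for _, p := range acceptPatterns {
		// "a|regex|" accepts devices whose path matches the regex.
		entries = append(entries, fmt.Sprintf(`"a|%s|"`, p))
	}
	// "r|.*|" rejects every device not accepted by an earlier entry.
	entries = append(entries, `"r|.*|"`)

	config := fmt.Sprintf("devices { filter = [ %s ] }", strings.Join(entries, ", "))
	return []string{"--config", config}
}
```

For example, `vgscanFilterArgs([]string{"^/dev/sd.*", "^/dev/md.*"})` yields `--config` followed by `devices { filter = [ "a|^/dev/sd.*|", "a|^/dev/md.*|", "r|.*|" ] }`.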