
Improve vgscan behavior #2209

@mattcary


vgscan is run during node unstage to make sure datacache partitions are in a good state: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L672

However, vgscan can hang if certain devices are offline (e.g., a filesystem mounted from a loopback device backed by network storage).

It's possible that vgscan should only be run if the volume being unstaged is a datacache volume. I'm not sure, though: especially if we do something like time out the vgscan call, we may want to make sure it runs on the next unstage regardless of the volume type.
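A minimal sketch of the "datacache only, but catch up on the next unstage" idea. It assumes a hypothetical `vgscanGate` that the unstage path would consult. This is illustrative and is not the driver's code. The gate always allows a scan for datacache volumes. It also allows a scan for any volume if the previous scan did not complete, for example because it timed out.

```go
package main

import "sync"

// vgscanGate is a hypothetical helper (not part of the driver) that decides
// whether an unstage should run vgscan.
type vgscanGate struct {
	mu sync.Mutex
	// pending is true when the last vgscan did not complete (e.g. it timed out),
	// so the next unstage should scan regardless of volume type.
	pending bool
}

// shouldScan reports whether vgscan should run for this unstage.
func (g *vgscanGate) shouldScan(isDataCache bool) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return isDataCache || g.pending
}

// recordResult is called after a scan attempt; an incomplete scan leaves
// a pending rescan for the next unstage.
func (g *vgscanGate) recordResult(completed bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.pending = !completed
}
```

With this shape, a non-datacache unstage skips vgscan in the common case but still repairs state left behind by an earlier timed-out scan.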

It may also be a good idea to time out the vgscan call. Since vgscan hangs in these cases, we'd have to run it in a goroutine that times out. These goroutines may accumulate, but hung vgscan processes are accumulating on the system as well, so this probably isn't a big deal.

A third thing to consider is whether we can limit the devices vgscan looks at. For example, we should only look at /dev/sd* and maybe RAID devices (/dev/md*).
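One way to do this without touching the node's lvm.conf is LVM's generic `--config` override with a `devices { filter = [...] }` setting. In that setting, an `a|regex|` entry accepts matching device paths and a trailing `r|.*|` rejects everything else. The sketch below builds those arguments. `vgscanFilterArgs` is a hypothetical helper, and the prefixes are assumed to be plain paths with no regex metacharacters or `|`.

```go
package main

import (
	"fmt"
	"strings"
)

// vgscanFilterArgs (hypothetical helper) builds vgscan arguments that limit
// LVM's scan to device paths starting with one of the given prefixes.
func vgscanFilterArgs(prefixes []string) []string {
	entries := make([]string, 0, len(prefixes)+1)
	for _, p := range prefixes {
		// Accept any path that begins with this prefix.
		entries = append(entries, fmt.Sprintf(`"a|^%s|"`, p))
	}
	// Reject every device not accepted above.
	entries = append(entries, `"r|.*|"`)
	cfg := fmt.Sprintf("devices { filter = [ %s ] }", strings.Join(entries, ", "))
	return []string{"--config", cfg}
}
```

Usage would look like `exec.Command("vgscan", vgscanFilterArgs([]string{"/dev/sd", "/dev/md"})...)`. On older LVM versions that use lvmetad, `filter` is not applied by lvmetad and `global_filter` may be needed instead, so this should be checked against the LVM version on the node image.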

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests