vSAN: A Glance Behind the Curtain

A brief overview of the services and processes that make up a vSAN Cluster.

The other day in the office I met a good old colleague of mine and we discussed vSAN processes and where in the IO chain the SPBM (Storage Policy Based Management) settings – like checksum – are applied. We decided to summarize our findings in this short, neat article that focuses on the most important processes in the vSAN stack.
Note: This post refers to vSAN 6.7.

vSAN Architecture

First, let's highlight the different services and their main functions.

CMMDS: Cluster Monitoring, Membership and Directory Services

This service makes sure that our vSAN hosts actually form a cluster. Every host runs this service, and one of the hosts is elected Master, which works as the brain of the RAIN (Redundant Array of Independent Nodes) system. A Backup node, which holds a copy of the Master's data, takes over if the Master fails. All other nodes are Agents.

The Master is responsible for the discovery and maintenance of the vSAN cluster. It builds an inventory of the vSAN cluster nodes and their resources, and it stores object metadata and policies.

Example: What happens if there is a network failure and a 16-host cluster is divided into two 8-node partitions, with both the Master and the Backup node running in partition A?

The Agent nodes in partition B detect that the Master is not reachable. An election process is started to promote an Agent node to a Master and another one to a Backup, so that a cluster can be built in partition B.
After the network problem is fixed and all nodes can talk to each other again, a versioning system is used to merge the two partitions. Most likely the old Master will be the Master for the whole cluster again and the interim Master will be demoted.
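
The partition example can be played through in a few lines of Python. This is only a toy sketch: the Node class, the elect_roles function and the host names are made up for illustration, and the tie-break rule (lowest host name wins) is an assumption, since the real CMMDS election criteria are not covered in this post.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str = "agent"  # "master", "backup" or "agent"


def elect_roles(partition):
    """Make sure a partition has one master and one backup.

    Assumption: the lowest host name wins an election. The real
    CMMDS election criteria are not modelled here.
    """
    if not partition:
        return partition
    masters = [n for n in partition if n.role == "master"]
    backups = [n for n in partition if n.role == "backup"]
    agents = sorted((n for n in partition if n.role == "agent"),
                    key=lambda n: n.name)
    if not masters:
        # Prefer an existing backup, otherwise promote an agent.
        new_master = backups.pop(0) if backups else agents.pop(0)
        new_master.role = "master"
    if not backups and agents:
        agents.pop(0).role = "backup"
    return partition


# 16 hosts; Master and Backup both end up in partition A.
nodes = [Node(f"esx{i:02d}") for i in range(1, 17)]
nodes[0].role = "master"
nodes[1].role = "backup"
partition_a, partition_b = nodes[:8], nodes[8:]

elect_roles(partition_a)  # nothing changes, roles are already filled
elect_roles(partition_b)  # two agents get promoted
roles_b = {n.name: n.role for n in partition_b}
```

With the assumed tie-break, esx09 becomes the interim Master and esx10 the interim Backup of partition B, while partition A keeps its roles untouched.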

CLOM: Cluster Level Object Manager

Just as CMMDS manages the nodes, CLOM takes care of the vSAN objects in a cluster. It is responsible for object placement and migration, and it triggers repairs in case of failures.

Basically, CLOM is involved in any object (or data) management operation, from powering on a VM (swap file creation) to Storage vMotion. CLOM is also responsible for moving objects around when you put a host into Maintenance Mode.

When CLOM is tasked with the creation of an object, it checks whether there are enough available resources (failure domains, disk groups, free space) to satisfy the policy. It does this by communicating with the other nodes through CMMDS. If everything looks good, it instructs DOM (see next section) to create the objects. After the objects are created, CLOM is also responsible for monitoring the objects' compliance status.
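
To make the resource check a bit more tangible, here is a minimal sketch of such a feasibility test. It is not how CLOM is implemented. The function can_place and the host names are hypothetical, and the sketch assumes RAID-1 mirroring, where FTT=n needs n + 1 full replicas and 2n + 1 failure domains in total. Witness components are treated as needing no space.

```python
def can_place(object_gb, ftt, free_gb_by_domain):
    """Return True if a mirrored object could be placed.

    Assumptions (not from this post): RAID-1 mirroring with
    ftt + 1 replicas and 2 * ftt + 1 failure domains; witness
    components need negligible space.
    """
    replicas = ftt + 1
    domains_needed = 2 * ftt + 1
    if len(free_gb_by_domain) < domains_needed:
        return False  # not enough failure domains for this FTT
    # Each replica needs a separate failure domain with enough free space.
    domains_with_room = [d for d, free in free_gb_by_domain.items()
                         if free >= object_gb]
    return len(domains_with_room) >= replicas


# Free capacity in GB per failure domain (here: one host per domain).
cluster = {"host-a": 500, "host-b": 120, "host-c": 40}

ok_small = can_place(100, 1, cluster)  # two hosts have room, three domains exist
ok_large = can_place(200, 1, cluster)  # only one host could hold a replica
ok_ftt2 = can_place(10, 2, cluster)    # FTT=2 would need five failure domains
```

Only the first request can be satisfied. The other two fail for different reasons: one for lack of free space, the other for lack of failure domains.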

CLOM triggers data path operations but does not actually participate in data read/write operations.

Hint: CLOM writes to its own log file: /var/run/log/clomd.log

DOM: Distributed Object Manager

After CLOM has defined how the object needs to be laid out and suitable disk groups have been identified, the object's components are created and distributed across the cluster by DOM, which talks to LSOM (see next section).

To enable object creation on other hosts, the DOMs on the cluster nodes talk to each other to guarantee component distribution. For this, DOM is split up into three processes:

  • DOM Client: Talks to the vSCSI layer.
  • DOM Owner: Manages access to the vSAN object.
  • DOM Component Mgr: Manages objects on the hosts where components exist.

This concept is best illustrated by looking at the data flow in a vSAN cluster.
Please check out the section “Data Flow” below.

LSOM: Local Log-Structured Object Manager

LSOM is the actual worker in the whole stack. It reads and writes data by talking to the PSA (Pluggable Storage Architecture) of the local host. As it has access to the cache and capacity layers, it is responsible for data caching and de-staging as well as device management, which includes reporting unhealthy devices.
There is one LSOM process per disk group on every host.
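
The caching and de-staging idea can be pictured with a toy model of a single disk group. Everything here is an assumption for illustration: the DiskGroup class, its method names and the simple "de-stage after N buffered blocks" trigger are invented, and the real LSOM de-staging logic is not described in this post.

```python
class DiskGroup:
    """Toy model: a cache tier in front of a capacity tier."""

    def __init__(self, destage_threshold=3):
        self.cache = {}      # block address -> data not yet de-staged
        self.capacity = {}   # block address -> data on the capacity tier
        # Assumed trigger: de-stage once this many blocks are buffered.
        self.destage_threshold = destage_threshold

    def write(self, addr, data):
        self.cache[addr] = data  # writes land in the cache tier first
        if len(self.cache) >= self.destage_threshold:
            self.destage()

    def destage(self):
        self.capacity.update(self.cache)  # move buffered blocks down
        self.cache.clear()

    def read(self, addr):
        # Serve from cache if the block is still there, else from capacity.
        if addr in self.cache:
            return self.cache[addr]
        return self.capacity.get(addr)


dg = DiskGroup()
dg.write(0, b"a")
dg.write(1, b"b")
buffered_before = len(dg.cache)   # two blocks waiting in the cache
dg.write(2, b"c")                 # third write triggers de-staging
buffered_after = len(dg.cache)
```

After the third write the cache is empty and all three blocks sit on the capacity tier, yet reads still return the same data.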

Data Flow

Let's close this post with a short look at the data flow in a vSAN cluster. I would like to draw your attention to the layers where the SPBM settings are enforced.

  • Software checksum is located closest to the “consumer”, in our case the VMDK.
  • One layer lower, the data is managed by DOM, which applies the appropriate protection level or, in vSAN speak, the Number of Failures to Tolerate.
    In the example above: FTT=1. (Witness not shown)
  • At the end of the chain is LSOM which takes care of Deduplication & Compression as well as Encryption. As there is one LSOM process per disk group we can understand that Deduplication & Compression is done per disk group.
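
The order of the three layers can be sketched as a tiny write pipeline. This is a simplified illustration under several assumptions: all function and class names are made up, zlib's CRC-32 and a SHA-1 fingerprint stand in for whatever vSAN uses internally, mirroring is assumed for FTT, and encryption and the witness are left out.

```python
import hashlib
import zlib


def checksum_layer(block: bytes) -> int:
    """Checksum computed closest to the consumer (CRC-32 as a stand-in)."""
    return zlib.crc32(block)


def dom_layer(block: bytes, ftt: int) -> list:
    """Mirroring sketch: FTT=n yields n + 1 replicas (witness not modelled)."""
    return [block] * (ftt + 1)


class LsomDiskGroup:
    """One disk group: deduplicates and compresses within itself only."""

    def __init__(self):
        self.store = {}  # fingerprint -> compressed block

    def write(self, block: bytes) -> str:
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in self.store:  # identical block is stored once
            self.store[fingerprint] = zlib.compress(block)
        return fingerprint

    def read(self, fingerprint: str) -> bytes:
        return zlib.decompress(self.store[fingerprint])


def write_block(block, ftt, disk_groups):
    crc = checksum_layer(block)        # 1. checksum
    replicas = dom_layer(block, ftt)   # 2. protection level
    if len(disk_groups) < len(replicas):
        raise ValueError("not enough disk groups for the requested FTT")
    # 3. each replica goes to its own disk group (dedup + compression)
    fingerprints = [dg.write(r) for dg, r in zip(disk_groups, replicas)]
    return crc, fingerprints


groups = [LsomDiskGroup(), LsomDiskGroup()]
crc, fps = write_block(b"A" * 4096, ftt=1, disk_groups=groups)
write_block(b"A" * 4096, ftt=1, disk_groups=groups)  # same data again
blocks_per_group = [len(g.store) for g in groups]
read_ok = checksum_layer(groups[0].read(fps[0])) == crc
```

Writing the same block twice leaves exactly one stored block in each disk group. The deduplication scope is the disk group, not the cluster, which matches the one-LSOM-per-disk-group picture.
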
Peter Oberacher