Graceful Restart in NSX

In this article we shall discuss how Graceful Restart (GR) is relevant to your design, and the considerations for ECMP-based designs versus HA-based designs.

I had an interesting discussion about Graceful Restart recently, prompted by some confusion between GR/NSF/NSR in traditional networking and NSX's implementation, specifically when ESGs are deployed in HA mode versus ECMP mode.

First, a quick recap on Graceful Restart, courtesy of Cisco:

When Graceful Restart is used, peer networking devices are informed, via protocol extensions prior to the event, of the SSO-capable router's ability to perform graceful restart. The peer device must have the ability to understand this messaging. When a switchover occurs, the peer will continue to forward to the switching over router as instructed by the GR process for each particular protocol, even though in most cases the peering relationship needs to be rebuilt.

Reference Link
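To make the helper behaviour concrete, here is a minimal Python sketch of a neighbour that does, or does not, honour Graceful Restart. In NSX terms the restarting router would be the ESG or DLR Control-VM, and the helper would be the physical router, the DLR or the ESG peering with it. The class and method names (`HelperPeer`, `session_down`, `session_up`, `tick`) and the 120-second restart timer are hypothetical, and the model is only loosely based on BGP GR (routes are marked stale and purged if the restart timer expires); it is not any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    prefix: str
    next_hop: str
    stale: bool = False  # True while the advertising router is restarting


@dataclass
class HelperPeer:
    """Hypothetical neighbour that may act as a Graceful Restart helper."""
    gr_enabled: bool
    restart_time: float  # seconds to wait for the restarting router
    rib: Dict[str, Route] = field(default_factory=dict)
    restart_started: Optional[float] = None

    def learn(self, prefix: str, next_hop: str) -> None:
        self.rib[prefix] = Route(prefix, next_hop)

    def session_down(self, now: float) -> None:
        if self.gr_enabled:
            # GR helper: keep the routes (and keep forwarding), but mark them stale.
            for route in self.rib.values():
                route.stale = True
            self.restart_started = now
        else:
            # No GR: the routes are withdrawn as soon as the session drops.
            self.rib.clear()

    def session_up(self, advertised: Dict[str, str]) -> None:
        # The restarted router re-advertises {prefix: next_hop}. Fresh routes
        # replace the stale copies; anything not re-advertised is purged.
        self.rib = {p: Route(p, nh) for p, nh in advertised.items()}
        self.restart_started = None

    def tick(self, now: float) -> None:
        # If the restart timer runs out, stop waiting and flush stale routes.
        if self.restart_started is not None and now - self.restart_started > self.restart_time:
            self.rib = {p: r for p, r in self.rib.items() if not r.stale}
            self.restart_started = None

    def can_forward(self, prefix: str) -> bool:
        return prefix in self.rib


if __name__ == "__main__":
    for gr in (True, False):
        peer = HelperPeer(gr_enabled=gr, restart_time=120)
        peer.learn("10.0.0.0/24", "192.0.2.1")
        peer.session_down(now=0)
        peer.tick(now=5)  # 5 s into the restart
        print(f"GR={gr}: still forwarding -> {peer.can_forward('10.0.0.0/24')}")
```

With GR enabled the neighbour keeps its (stale) route and carries on forwarding while the peering is rebuilt; without GR the route disappears the moment the session drops.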

Now consider this extract from the NSX Reference Design Guide (p. 128):

With GR, the NSX Edge can refresh adjacency with the physical router and the DLR Control VM while requesting them to continue using the old adjacencies. Without GR, these adjacencies would be brought down and renegotiated on reception of the first hello from the Edge and this would ultimately lead to a secondary traffic outage.

So here is the thing: with the DLR Control-VM and ESGs in HA mode, HA is the key element. Remember that, from a virtual appliance perspective, the Control-VM is like an ESG. When you deploy a Control-VM or an ESG in HA mode, you get an Active and a Standby appliance. If the Active fails and the Standby takes over, we want to avoid tearing down learnt routes during the failover; otherwise we will black-hole traffic.

From a DLR Control-VM perspective, we don't want routes removed from the VDR instance on each of the hosts while the Standby Control-VM is taking over. From an ESG perspective, we don't want the physical estate or the DLR to tear down routes while the Standby ESG is taking over. Keep in mind that with HA-based ESGs/Control-VMs the routing protocol timers may be left at their defaults. To further mitigate the risk of black-holing, we can introduce a floating static route with a higher administrative distance, which will only be used if all other routes are lost.
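The floating static route relies purely on administrative distance: the candidate with the lowest distance is installed, so a static route with a deliberately high distance stays in reserve until the dynamically learnt routes disappear. The Python sketch below illustrates that selection rule; the `Candidate` type, the distance values (110 for OSPF, 240 for the floating static) and the next-hop address are illustrative assumptions, not NSX defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    source: str          # e.g. "ospf", "bgp", "static"
    next_hop: str
    admin_distance: int  # lower is preferred


def best_route(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the route that would be installed: lowest administrative distance wins."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.admin_distance)


if __name__ == "__main__":
    ospf = Candidate("ospf", "192.0.2.2", 110)
    floating = Candidate("static", "192.0.2.2", 240)  # hypothetical floating static

    print(best_route([ospf, floating]).source)  # ospf: dynamic route preferred
    print(best_route([floating]).source)        # static: used once OSPF routes are lost
```

While the dynamic route exists the floating static is never used; it only takes over when nothing better remains in the table.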

How this differs with ECMP ESGs is that Graceful Restart is not required, full stop! With ECMP ESGs, if one ESG fails, traffic uses an existing learnt path via the second (or any remaining) ECMP ESG. In addition, these ESGs can support tuned hello and dead timers for the dynamic routing protocol. If an ECMP ESG fails, we want to tell the world as quickly as possible to use the existing active redundant path; with GR enabled, its neighbours would instead keep forwarding to the failed ESG while waiting for it to restart.
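The ECMP behaviour can be sketched in the same way. In the Python model below, flows are hashed across the ECMP ESGs; when one ESG dies, flows hashed to it are black-holed until the neighbour's dead (or hold) timer expires and the ESG is removed from the ECMP set, after which they re-hash onto the survivors. The ESG names, the SHA-256 flow hash and the timer values (a 40 s dead interval, the common OSPF default, versus a tuned 3 s) are assumptions for illustration, and the model assumes the last hello arrived just before the failure.

```python
import hashlib
from typing import List, Optional, Tuple

# (source IP, destination IP, source port, destination port, protocol)
Flow = Tuple[str, str, int, int, str]


def pick_next_hop(flow: Flow, next_hops: List[str]) -> str:
    """Deterministically hash a flow onto one of the ECMP next hops."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def forwarding_path(flow: Flow, esgs: List[str], failed: str,
                    now: float, dead_interval: float) -> Optional[str]:
    """ESG used by `flow` `now` seconds after `failed` dies, or None if black-holed."""
    # Until the dead timer expires the neighbour still treats the failed ESG
    # as a valid ECMP next hop.
    active = esgs if now < dead_interval else [e for e in esgs if e != failed]
    hop = pick_next_hop(flow, active)
    return None if hop == failed else hop


if __name__ == "__main__":
    esgs = ["esg-1", "esg-2", "esg-3"]
    flow: Flow = ("10.0.0.10", "198.51.100.20", 49152, 443, "tcp")
    failed = pick_next_hop(flow, esgs)  # fail the ESG this flow is using
    for dead in (40.0, 3.0):
        path = forwarding_path(flow, esgs, failed, now=5.0, dead_interval=dead)
        print(f"dead interval {dead:>4} s, 5 s after failure: {path or 'black-holed'}")
```

Even with tuned timers, flows on the failed ESG are dropped for up to the dead interval; shrinking that window is the reason for tuning the timers rather than relying on GR.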

Hopefully that clarifies a few things.

Bal Birdy
Bal is an Open Group Certified IT Architect, and VCDX #269, specializing in the network and security arena, with over 15 years' experience in enterprise level network/system technologies. His goal has always been to maintain a holistic view of the architecture allowing him to understand how various technology streams may impact the networking/infrastructure space.
Bal has a proven record of delivering on enterprise network designs, leading data center and site migrations as a result of business mergers and acquisitions, and vendor migrations e.g. Cisco to Checkpoint/Juniper. As part of this he has worked across several business sectors: Utilities, Banking, Retail and Government, and can base designs around sector specific standards e.g. PCI-DSS, DSD and ISM. He is proficient in several technology areas including Cisco, Juniper, F5, VMware, Citrix and Microsoft. These skills are supported by non-technical certifications: Prince2 Project Management Practitioner, ITILv3, TOGAF 9.1 Certified and Open Group Certified IT Architect – Open CA.
In addition to supporting the Livefire Team, Bal leads several innovation efforts within the VMware WRACE organization, including projects investigating the use of Virtual Reality/Augmented Reality, AI/ML and Interactive 360, to support customer and partner enablement.

BSc (Hons) Computer Science
VCDX-NV #269
Open Group Certified Architect
Member of the Association of Enterprise Architects
