Houston, we have a problem: NSX-T Tunnel Down Status

In this article we’ll discuss the Tunnel Down status and what it means for your NSX-T version 2.x deployment. Read on, or click on the link in your Alexa app.

Note: The below is relevant for NSX-T v2.1 deployments. An update for NSX-T v2.3 can be found at the end of this article.

So you’ve just started to get NSX-T up and running: your NSX Manager is installed, the controllers are deployed and you’ve converted your fabric nodes to transport nodes. So far so good, but you notice that although the transport node operational status is green and in a success state, the tunnel status is red and down.

Confirmation of Tunnel Status:

Panic stations …. Red Alert …. Phone A Friend.

Well no, not really. Calm down for a second and let’s take a minute to understand what this means. Firstly, the status refers to the Geneve tunnels on the host: these aren’t up, and that isn’t a bad thing. If you’re coming from an NSX-V background, remember the host registration process for your logical switch VTEP tables on the controller. When was your host added to that table? Only when you had an active VM on said logical switch. At that point the host would inform the controller: “I have an active VM on this logical switch, so please update the VTEP, MAC and ARP tables accordingly.” This tunnel status isn’t too dissimilar. Oh, and those tables still exist in NSX-T, I’m sure.

What this message is saying is that the Geneve tunnels are down because the host currently does not have a VM active on an overlay network/logical switch. If you think about it, that makes perfect sense: why bring up a tunnel, and to whom, if it’s not required? So as a next step, if you attach a VM to a logical switch and generate some traffic, you should see the status change to green.
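To make that reasoning concrete, here is a minimal sketch of the mental model in Python. It is a simplification, not how NSX-T is implemented: it assumes a tunnel is only expected between two hosts that both have an active VM on the same logical switch. The host, VM and switch names are hypothetical.

```python
from itertools import combinations


def hosts_per_switch(attachments):
    """Group hosts by the logical switch their active VMs are attached to."""
    switches = {}
    for _vm, (host, switch) in attachments.items():
        switches.setdefault(switch, set()).add(host)
    return switches


def expected_tunnels(attachments):
    """Return the host pairs we would expect a Geneve tunnel between.

    Simplifying assumption: two hosts need a tunnel only if they both
    have an active VM on the same logical switch.
    """
    pairs = set()
    for hosts in hosts_per_switch(attachments).values():
        pairs.update(combinations(sorted(hosts), 2))
    return pairs


def tunnel_status(host, attachments):
    """'up' if the host is part of at least one expected tunnel, else 'down'."""
    pairs = expected_tunnels(attachments)
    return "up" if any(host in pair for pair in pairs) else "down"


# Hypothetical inventory: VM name -> (host, logical switch)
attachments = {
    "vm1": ("host-a", "ls-web"),
    "vm2": ("host-b", "ls-web"),
    "vm3": ("host-c", "ls-db"),
}
print(tunnel_status("host-a", attachments))
print(tunnel_status("host-c", attachments))
```

In this model host-c has a VM but no peer on the same switch, so it has nobody to build a tunnel to, and a host with no VMs at all has no reason to bring one up either.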

OK, so let’s verify a few things to cement the understanding here. Firstly, let’s check on the vCenter side and the KVM side that the VMs are up and which networks they are on.

Screenshot 1: Confirms T1-Web-01a is on esxcomp-01a.corp.local and is attached to logical switch T1-Web

Screenshot 2: Confirms T1-Web-02a is on esxcomp-02a.corp.local and is attached to logical switch T1-Web

Screenshot 3: Confirms T1-Web-VM3 is on kvmcomp-02a, but I haven’t confirmed the networking side for this yet.

 

Screenshot 4: Confirms that I don’t have a web server on kvmcomp-01a, and we’ll confirm that the DB server running there isn’t attached to the web logical switch shortly.

Ok so let’s now go and have a look at the VMs attached to the web logical switch, to confirm the KVM hosted VMs are attached correctly.

So this confirms that on the T1-Web logical switch I only have three VMs attached: Web-01a, Web-02a and Web-VM3. My expectation is that when I look at the VTEP table for this logical switch, I should only see the hosts that these VMs are running on. What’s cool with NSX-T is that there’s a bunch of nice one-click buttons to get this information. If you go to the logical switch, you can download the current VTEP table information.

Below is the CSV export that I downloaded, which confirms that the three hosts registered in the central control plane are esxcomp-01a (192.168.140.151), esxcomp-02a (192.168.140.152) and kvmcomp-02a (192.168.150.152).
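If you want to check an export like this without eyeballing it, a short script can compare the IPs in the CSV against the hosts you expect. This is only a sketch: the column header `VTEP IP` is an assumption (the real export may use a different header), so pass the actual name via `ip_column`. The IP addresses are the ones from this lab.

```python
import csv
import io

# TEP addresses of the hosts expected on the T1-Web logical switch (from this lab).
EXPECTED_TEPS = {
    "192.168.140.151": "esxcomp-01a",
    "192.168.140.152": "esxcomp-02a",
    "192.168.150.152": "kvmcomp-02a",
}


def check_vtep_export(csv_text, ip_column="VTEP IP"):
    """Compare the IPs in a VTEP table export against the expected TEPs.

    `ip_column` is a hypothetical header name; adjust it to match the
    actual export. Returns (missing, unexpected) as sets of IP strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {row[ip_column].strip() for row in reader if row.get(ip_column)}
    expected = set(EXPECTED_TEPS)
    return expected - seen, seen - expected


# Illustrative export with a made-up single-column layout.
sample = "VTEP IP\n192.168.140.151\n192.168.140.152\n192.168.150.152\n"
missing, unexpected = check_vtep_export(sample)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

Two empty sets mean the table contains exactly the hosts that have a VM on the switch, which is the result we are after.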

As final confirmation that this is, in fact, the case, let’s look at the VTEP details as seen by the KVM host and NSX Manager:

Did you notice something? We had to view the VTEP information in NSX Manager, and not via vCenter. This is something new in NSX-T, as there is a shift away from the reliance on vCenter. To access the VTEP details, go to Fabric > Nodes > Transport Nodes, select your transport node and click on Monitor. Then browse to the network interfaces and click the 1 under Interface for the VMkernel interface you want details of.

For KVM running on Ubuntu, you run the below command.

The final thing I found whilst digging around the UI is a nice section for viewing the tunnel status! This is very cool, as it shows the tunnels from ESX-TN1’s perspective. These were initiated as I pinged between the web VMs, causing the tunnels to come up.
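If you pull the same tunnel data out as JSON (for example from the NSX Manager REST API) instead of reading it from the UI, a small helper can summarise it. The sketch below makes no API call and works on an already-decoded dictionary. The field names `tunnels`, `status` and `remote_ip` are assumptions about the response shape and should be checked against the API guide for your version. The sample data is illustrative only.

```python
from collections import Counter


def summarise_tunnels(payload):
    """Count tunnels by status and list the remote endpoints that are up.

    Assumes a payload shaped like:
        {"tunnels": [{"status": "UP", "remote_ip": "192.168.140.152"}, ...]}
    These field names are assumptions, not a confirmed API contract.
    """
    tunnels = payload.get("tunnels", [])
    counts = Counter(t.get("status", "UNKNOWN").upper() for t in tunnels)
    up_peers = sorted(
        t["remote_ip"]
        for t in tunnels
        if t.get("status", "").upper() == "UP" and "remote_ip" in t
    )
    return dict(counts), up_peers


# Illustrative data from the perspective of one ESXi transport node.
payload = {
    "tunnels": [
        {"status": "UP", "remote_ip": "192.168.140.152"},
        {"status": "UP", "remote_ip": "192.168.150.152"},
    ]
}
counts, up_peers = summarise_tunnels(payload)
print(counts, up_peers)
```

An empty tunnel list gives an empty summary, which is what you would expect from a host with no VMs on an overlay network.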

 

I’m still learning NSX-T, so please send feedback if I’m off the mark and need to update anything.

Thanks

[Update NSX-T v2.3]

After receiving comments that the above was resolved in NSX-T v2.3, I went about upgrading my environment to see how the changes are reflected. Firstly I can confirm that in the latest version the tunnel status is shown as up, but I think this is a little white lie. See below for my findings.

Below is a screenshot confirming the tunnel status as up. The key point to note is that at this stage I do not have any VMs attached to my logical switches. In fact, I rebuilt my environment from the ground up to see how the upgrade would impact things. So according to the UI, my tunnel status is up.

Yet if we drill into the ESX-TN1 details, you’ll see that there are no tunnels active. That is expected, but an up status with no active tunnels does seem a little counter-intuitive.

The good thing is that we’ve now verified the differences from 2.1 to 2.3, from the perspective of Geneve Tunnels, and the operational view within NSX Manager.

Bal Birdy
