Hello all,
I also built a nested lab that uses only Workstation 10. Inside Workstation I run:
6 ESXi 5.5
1 vCenter Appliance
1 Win 7 (for FTP server and Oracle DB)
1 Vyatta router
1 Openfiler (for NFS)
My story: every time I worked on the NSX lab, the next day I ended up reinstalling NSX Manager, so I kept doing the same lab over and over again. My topology is simple:
VM guest --- NSX distributed router --- NSX edge router --- vyatta router --- internet
Right after installation, everything worked perfectly. But when I started the lab again the next day, the VM guest could not ping its default gateway on the NSX distributed router, and the NSX Edge router could not ping the distributed router.
First I checked connectivity. I am using OSPF. All the routes were in the routing table, but from the distributed router I could not ping 8.8.8.8, even though there was a default route in the routing table. So routing was not the issue, and I checked something else.
Then I suspected the controller nodes. I searched Google and found this blog >> Some useful NSX Troubleshooting Tips | CormacHogan.com
I checked with the CLI, and this is what I found:
~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN
VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------
5000      N/A (headend replication)  Enabled ()     0.0.0.0 (down)         2           0                0
I did what he suggested, switching from unicast to multicast and then switching back again, but the problem was still there.
Then I was "stalking" this forum and found this thread.
Using this blog >> ESXi host Enable Agent error "Cannot complete the operation." | Spas Kaloferov's Blog
I followed the steps, and it worked!!
~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  10.0.99.1 (up)         1           1                0
Now the situation is "normal" again.
TLDR:
If you run a lab, or maybe a production environment, on VMware, and suddenly all your machines and NSX Manager go down even though everything you check looks "like it should be", the problem may not be in your interconnection. Most likely it is the connection to the controller.
The first thing to do is SSH to the ESXi host and run this command: esxcli network vswitch dvs vmware vxlan network list --vds-name=<YOUR VDS NAME>
If the output looks like this:
~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN
VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------
5000      N/A (headend replication)  Enabled ()     0.0.0.0 (down)         2           0                0
Then run this on your ESXi host >> /etc/init.d/netcpad restart
And now it should look like this:
~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=VM_NSX_VXLAN
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  10.0.99.1 (up)         1           1                0
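If you want to script the check, restart, and re-check cycle above, here is a rough Python sketch. It only reads the "Controller Connection" column from output shaped like the samples in this post. The names down_vxlans, run and check_and_restart are my own helpers, not part of any VMware tool. It is written for Python 3, and the Python bundled with an ESXi host may be older, so running it directly on the host is an assumption; you could instead pass in your own run function that sends the commands over SSH.

```python
import re
import subprocess

# Matches the "Controller Connection" column, e.g. "10.0.99.1 (up)" or "0.0.0.0 (down)".
CONNECTION = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s\((up|down)\)")


def down_vxlans(esxcli_output):
    """Return the VXLAN IDs whose controller connection is reported as down."""
    down = []
    for line in esxcli_output.splitlines():
        fields = line.split()
        match = CONNECTION.search(line)
        # Data rows start with a numeric VXLAN ID; the prompt, header and dashes do not.
        if match and fields and fields[0].isdigit() and match.group(2) == "down":
            down.append(int(fields[0]))
    return down


def run(command):
    """Hypothetical default runner: execute a command locally and return its output."""
    return subprocess.check_output(command).decode()


def check_and_restart(vds_name, run=run):
    """Restart netcpad once if any VXLAN on the VDS shows a down controller connection.

    Returns the VXLAN IDs that are still down afterwards (empty list means all up).
    """
    list_cmd = ["esxcli", "network", "vswitch", "dvs", "vmware", "vxlan",
                "network", "list", "--vds-name=" + vds_name]
    if not down_vxlans(run(list_cmd)):
        return []  # nothing is down, so leave netcpad alone
    run(["/etc/init.d/netcpad", "restart"])
    # Re-check immediately; the connection may need a moment to come back up.
    return down_vxlans(run(list_cmd))
```

Calling check_and_restart("VM_NSX_VXLAN") would list the VXLANs, restart netcpad only when something is down, and list them again. The second check runs right away, so if it still reports a VXLAN as down, wait a little and run the esxcli command by hand before assuming the restart did not help.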
Troubleshooting steps:
1. Check whether NSX Manager is up or down.
2. Check the controller status.
3. Check the installation status under host preparation: is it green, or is there a red "Resolve" on it?
4. SSH to the ESXi host and use this command to see whether the controller connection is up or down >> esxcli network vswitch dvs vmware vxlan network list --vds-name=<YOUR VDS NAME>