Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Thursday, March 9, 2017

VxLAN Deep Dive Part III: Flood and Learn



A rainy day is ideal for continuing with the VxLAN series :)
  • Part 1: Let's overlay -- Basic info about VXLAN, addressing and headers.
  • Part 2: It's all about knowledge -- Packet forwarding overview, VTEP control plane learning options
  • Part 3: Hands On #1 -- Configuration on Cisco Nexus Devices, Flood and Learn.
  • Part 4: Hands On #2 -- Configuration on Cisco Nexus Devices, EVPN.

Today we will focus on config, the most fun part of any IE track. In parts 1 & 2 we covered the fundamentals, so by now we understand how VxLAN works, how much addressing space we get, and the different options for advertising MAC/IP information to peers. We will start with Flood & Learn; I chose this one not only because it was the first one adopted, but also because it is the most *poorly* documented on the web.

Note: If you want to find VxLAN config info for NX-OS, I encourage you to look under the N9K documentation, since the latest 7K releases behave the same and under the 7K docs you will not find much :)

Recommended reading: I've also mentioned this in the prep for the DC IE track, but in case you haven't had the chance, I really recommend the Cisco Live presentations, in this case BRKDCT-2404: VxLAN Deployment Models

Let's use this topology:

First we will define each component in the network:
H1 / H2 are hosts in different VLANs, for this example VLAN 101 and VLAN 102.
L1 - L4 are Nexus 5Ks running as VXLAN L2 GWs.
L5 - L6 are Nexus 7Ks running as VXLAN L3 GWs.
S1 / S2 form the underlay L3 core between all the nodes, an L3 cloud that also supports multicast.


IGP + Multicast cloud

First things first: we need basic IGP reachability between our nodes, plus multicast reachability. For multicast we are going to use static mapping for the RP and, in order to add redundancy, we will deploy Phantom RP. Of course you can use any other method you want, but for the purpose of this example static mapping is the simplest option. Here is the snippet of config on each device:


L1 - L6
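Something along these lines goes on every node (a minimal sketch; the RP address 10.99.99.1 and the group range are assumptions for this example, adjust to your own addressing). Every L3 interface in the underlay also needs ip pim sparse-mode, which you will see in the IGP snippet further down:

  feature pim
  ! Static mapping to the Phantom RP address; the "RP" itself lives in a
  ! subnet advertised by S1/S2 but is assigned to no interface anywhere
  ip pim rp-address 10.99.99.1 group-list 239.0.0.0/8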



S1 / S2
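And on the spines, the Phantom RP trick itself: both advertise a loopback in a subnet containing the RP address 10.99.99.1, but with different prefix lengths, so the longest match makes S1 the active RP and S2 takes over if S1 disappears. This is a sketch with assumed addressing and OSPF process naming:

  S1 (longer prefix, wins the longest match while alive):
  interface loopback99
    ip address 10.99.99.2/30
    ip ospf network point-to-point   ! advertise the subnet, not a /32
    ip router ospf 1 area 0.0.0.0
    ip pim sparse-mode

  S2 (same RP subnet, shorter prefix, only wins when S1 is gone):
  interface loopback99
    ip address 10.99.99.3/29
    ip ospf network point-to-point
    ip router ospf 1 area 0.0.0.0
    ip pim sparse-mode

The RP address 10.99.99.1 falls inside both subnets but is configured on no device; that is exactly what makes it a Phantom RP.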



The IGP config is also mostly omitted here, since you can run whatever you want (static routing too, right? Yes! But that's a lot of work). In our case we simply set up OSPF in area 0 with point-to-point interfaces on each link. The only real consideration was MTU: as you may recall from the previous posts in this series, you need to raise the MTU to be able to carry a VXLAN packet inside. You can do the math yourself (remember the VXLAN header adds an additional 8 bytes, and the outer MAC/IP/UDP headers bring the total overhead to roughly 50 bytes), but let's assume that anything of 1600 or above will be fine.
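For reference, a minimal sketch of one underlay link (interface numbers and addressing are assumptions; also note that on the 5K the MTU is raised via a network-qos policy rather than per interface, so take the mtu line as illustrative):

  router ospf 1

  interface Ethernet1/1
    description Uplink to S1
    no switchport
    mtu 9216
    ip address 10.0.1.1/30
    ip ospf network point-to-point
    ip router ospf 1 area 0.0.0.0
    ip pim sparse-mode
    no shutdown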


Just a minor note on Naming conventions

Before taking any further steps we need to clarify some naming conventions that Cisco has imposed on us :)

If you recall correctly, each host in the VXLAN network has a VTEP, which establishes the tunnel and is, in the end, responsible for taking encapsulated packets in and out of our VXLAN network... Well, for Cisco it is a bit different: since they don't have any host here messing around with VXLAN, for them the VTEP is the device responsible for the encap/decap process. But if the VTEP is the device, how do I configure the interface on that device where this magic occurs? Here we come to the NVE (Network Virtual Interface), which is the logical interface where the previously mentioned encap/decap magic actually takes place.
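So whenever an NVE shows up in the configs below, this is the shape of it; a minimal flood-and-learn sketch (the loopback, VNIs and multicast groups are assumptions for this example):

  feature nv overlay

  interface nve1
    no shutdown
    source-interface loopback0
    member vni 10101 mcast-group 239.1.1.101
    member vni 10102 mcast-group 239.1.1.102

The source-interface loopback provides the VTEP IP, and each VNI gets mapped to the multicast group used for its BUM traffic.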


VXLAN L2 Gateway

Well, here the fun begins, at L2 of course. I've highlighted the concept of a VXLAN L2 GW because it is very important to understand how it works and what it does; the L2 keyword is the key to understanding it.
Let's assume you are running a legacy network with VLANs in place and you want to integrate it into a new design with VXLAN. Of course you not only want to allow inter-routing between them, but also to place some hosts in your new network in the same segment as the legacy one. Nice scenario... but how can we accomplish this?

Essentially what we need is a device capable of taking L2 frames, tagged ones let's say, since we stated that we have VLANs in place, and putting them in the same bridge domain as the VXLAN traffic. To say the same thing with different words: what we want to accomplish is to map VLAN traffic into the VXLAN VNI associated with that traffic, so both sides can flood to and interact with each other.

Here I will post the two suggested config options: one at the interface level (i.e. config applied directly on the interface that receives the tagged frames) and one at the switch level (which is what we will use on our 5Ks). Cisco refers to these as VSI (VN-Segment Service Instance) mode and VLAN mode.

VSI CLI Mode (L1/L2)

For this config let's assume that the L1/L2 leaves receive the frames from the H1/H2 ports on a trunk interface, and what we want to accomplish is the VLAN-to-VNI mapping.
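A sketch of what this looks like in the VSI-style CLI (VNI numbers and interface are assumptions; the idea is that the dot1q tag is matched and mapped to a VNI right at the service instance):

  feature nv overlay
  feature vni

  vni 10101
  vni 10102

  encapsulation profile vni VLAN-TO-VNI
    dot1q 101-102 vni 10101-10102

  interface Ethernet1/10
    no shutdown
    service instance 1 vni
      encapsulation profile VLAN-TO-VNI default
      no shutdown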




VLAN CLI Mode

This mode is much easier, because all we do is create an association between a VLAN and a vn-segment.
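On our 5Ks it boils down to this (again, the VNI numbers are the ones assumed for this example), plus the NVE membership shown earlier:

  feature nv overlay
  feature vn-segment-vlan-based

  vlan 101
    vn-segment 10101
  vlan 102
    vn-segment 10102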






VXLAN L3 Gateway

So we have covered full L2 reachability between VXLAN endpoints in the same bridge domain, but what about inter-VXLAN routing? It seems pretty obvious that we will need to add a gateway with interfaces in the relevant VXLAN segments to route among them, and just as we have SVIs for VLANs, we have BDIs (Bridge Domain Interfaces) for bridge domains. We will associate that BDI with our VNI and assign addressing to it, so it can act as the VXLAN gateway for that segment and also route outside (including to other VXLAN segments if we have them).


So if we dig into the config, we can see it is pretty straightforward, since the magic already happened: we already did the hardest part, which is the association between BD-VNI or VLAN-VNI, so the only thing left is to create the associated interface (an SVI in VLAN mode or a BDI in VSI mode) with the addressing, and that's all:
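A sketch of the SVI flavor with the tenant VRF included (VRF name and addressing are assumptions); in VSI mode the equivalent is an interface Bdi101 sitting on top of the bridge-domain that holds the VNI:

  feature interface-vlan

  vrf context TENANT-A

  interface Vlan101
    no shutdown
    vrf member TENANT-A
    ip address 10.1.101.1/24
    ip pim sparse-mode

  interface Vlan102
    no shutdown
    vrf member TENANT-A
    ip address 10.1.102.1/24
    ip pim sparse-mode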



Note #1: See, the only "new" stuff here is that we need to run PIM sparse mode. Can you guess why? Ping me!
Note #2: Yes, I'm using a VRF, since it is also best practice to run tenant traffic in a separate VRF; this is pretty common in real-life deployments (VRF isolation I mean, not flood and learn :) )

Now let's consider some design caveats, and the first thing on everyone's mind here is HA, so...

Redundancy? VXLAN L3 GW + HSRP? What about vPC?

Yes, yes, and also yes. Of course you can run HSRP on top of a BDI/SVI, but the not-so-easy part here is vPC. As you know, vPC provides MAC state sync between the peering devices, and the redundant VTEPs share an anycast VTEP IP address in the underlay; this way vPC provides L2 + L3 redundancy in hardware. To be able to do this, some changes need to be made to the loopback interface used for VTEP sourcing and to the vPC domain config. As a starting point, the first thing to recall is that a secondary IP address must be shared between the vPC peers, so that VXLAN packets can be handled by either of the peering devices. Does that remind you of something?... Yes, we also need peer-gateway under the vPC domain. Based on what we just described, peer-gateway is mandatory and it also requires an SVI configured with PIM across the peer link. A full list of requirements can be found here:

http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/vxlan/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_chapter_010.html#concept_C769B7878CE2458E98657905843DEEFA

But what you can never forget is:

  • Unique primary IP for the underlay loopback on each peer
  • The same secondary IP on both (the anycast VTEP address)
  • PIM
  • Consistent VNI-to-multicast-group mapping on both peers
  • peer-gateway and a "special SVI" (PIM enabled) // This is needed in case your leaf loses connectivity to the spines and needs to forward packets to its peer device.

Those are the key ones for me, but in the doc you will find a lot more information. Also worth mentioning: on the Nexus 5K, the VXLAN vPC configuration requires you to bind that special SVI for VXLAN traffic by issuing a special command:
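Putting the list above into CLI, this is roughly what changes on each vPC peer. Everything here is an assumed sketch (addresses and VLAN numbers are made up), and the 5K-specific binding is the vpc bind-vrf command as far as I recall, so double-check it against the guide linked above:

  interface loopback0
    ip address 10.0.0.11/32              ! unique primary (e.g. 10.0.0.12 on the peer)
    ip address 10.0.0.100/32 secondary   ! shared anycast VTEP, identical on both peers
    ip pim sparse-mode

  vpc domain 1
    peer-gateway

  ! The "special SVI" across the peer link, PIM enabled
  vlan 3000
  interface Vlan3000
    no shutdown
    ip address 10.3.0.1/30               ! e.g. 10.3.0.2/30 on the peer
    ip pim sparse-mode

  ! 5K-specific: bind the VXLAN traffic of the VRF to that VLAN over the peer link
  vpc bind-vrf default vlan 3000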




Distributed Gateway

Well, this is a huge post already... we will talk about distributed GW, anycast and all that stuff in another one, since otherwise this would take me a lifetime and I do want to explain it in detail :)

Stay in touch :)


Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

2 comments:

  1. Hello! Question on the L3 gateway - do we not need to advertise this into the IGP? Can you explain the packet walk if that is not needed?
