Living in the underlay

Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Thursday, July 6, 2017

Juniper DC Reading list

One of my colleagues just asked me about the recommended reading list for the Juniper DC track (in particular, what I used to clear the JNCIP-DC a few weeks ago). Here is a complete list of free resources you can use to prepare for the exam. If you don't have any real or lab experience with QFX/EX for VXLAN setups, and especially with VCF, I also recommend doing some labs with vQFX (you can try out EVE-NG, which is *highly recommended*).

Here it is:

Juniper Networks EVPN Implementation for Next-Generation Data Center Architectures - https://www.juniper.net/assets/us/en/local/pdf/whitepapers/2000606-en.pdf

Virtual Chassis Fabric Feature Guide - http://www.juniper.net/documentation/en_US/junos/information-products/pathway-pages/qfx-series/virtual-chassis-fabric.pdf

Comparing Layer 3 Gateway & Virtual Machine Traffic Optimization (VMTO) For EVPN/VXLAN And EVPN/MPLS - https://www.juniper.net/documentation/en_US/release-independent/solutions/information-products/pathway-pages/solutions/l3gw-vmto-evpn-vxlan-mpls.pdf

Clos IP Fabrics with QFX5100 Switches - https://www.juniper.net/assets/cn/zh/local/pdf/whitepapers/2000565-en.pdf

Virtual Chassis Fabric Best Practices Guide - http://www.juniper.net/documentation/en_US/release-independent/vcf/information-products/pathway-pages/vcf-best-practices-guide.pdf

EVPN Control Plane and VXLAN Data Plane Feature Guide for QFX Series Switches - https://www.juniper.net/documentation/en_US/junos/information-products/pathway-pages/junos-sdn/evpn-vxlan.pdf

Understanding Zero Touch Provisioning - https://www.juniper.net/documentation/en_US/junos/topics/concept/software-image-and-configuration-automatic-provisioning-understanding.html

Configuring Zero Touch Provisioning - https://www.juniper.net/documentation/en_US/junos12.3/topics/task/configuration/software-image-and-configuration-automatic-provisioning-confguring.html

Configuring Zero Touch Provisioning in Branch Networks - https://www.juniper.net/documentation/en_US/release-independent/nce/information-products/pathway-pages/nce/nce-151-zero-touch-provisioning.pdf


Another great book, which I just finished reading, is "Building Data Centers with VXLAN BGP EVPN". It is Cisco NX-OS oriented, but it provides an excellent background on how VXLAN BGP EVPN fabrics work.

HTH,


Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Sunday, May 21, 2017

TSHOOT Tips: ELAM Usage on Cisco ACI

I was using this quite a lot over the past weeks, and I think it is a good resource to share with everyone playing around with Cisco ACI. When it comes to troubleshooting and understanding packet flow inside the fabric, ELAM is a great tool.

So, what is it?

ELAM stands for Embedded Logic Analyzer Module. It is logic present in the ASICs that allows us to capture and view one or more packets matching a defined rule, out of all the packets traversing the ASIC. ELAM is not new at all: some of you may remember it from the CAT6500, and that's ok, the same logic applies on the N7K as well (for the youngest?).

And... what's new?

Essentially the concept is still the same; we just need to focus on understanding the architecture inside the ASICs on leaves and spines to fully apply it.

The Cisco ASIC data path is divided into ingress and egress pipelines, and two ELAMs are present (see figure) at the beginning of the lookup block.



As we can see in the picture, before we can use ELAM to capture a packet we must be sure that the packet is sent from the BCM ASIC to the Northstar ASIC. ELAM operates only in Northstar (on leaves; on spines it takes place in Alpine), so any packet that is locally switched in the BCM ASIC will not trigger the ELAM. This is important, since in some scenarios the packet will never reach Northstar and will never trigger an ELAM event (we can cover this in a future post about PL-to-PL traffic on the ACI fabric :) )

So, assuming our traffic will be processed by Northstar, we need to configure our ELAM instance. First of all it is good to know which kinds of rules we can configure for each pipeline; these are also referred to as "select lines", and the following are available:

Input select lines supported:
3 - outer l2 / outer l3 / outer l4
4 - inner l2 / inner l3 / inner l4
5 - outer l2 / inner l2
6 - outer l3 / inner l3
7 - outer l4 / inner l4

Output select lines supported:
0 - pktrw
5 - sideband

With this in mind we can configure our ELAM instance. First, it is always good to have an image to understand the whole process of what we need to do:


In INIT we choose the ASIC and pipeline in which the capture should take place. CONFIG refers to the configuration of the rule used to match packets. ARM is like arming a bomb :) but in this case we arm our packet capture so it triggers once the rule defined in the CONFIG step has a match. After that, READ the captured data, and RESET to start over :)

Now let's dig into the packet capture; we will refer to this topology:

ELAM Example


This image is extracted from a Cisco Live presentation on ELAM, but we will focus on LEAF4 only. Traffic traverses from VM1 to the EP on the right, going toward Northstar (at 1); this example is also useful to show how this behaves on Alpine.

We will arm the ELAM on Leaf 4 to capture a packet coming from EP1 (the one on the left side, directly connected to Leaf1). In this example we use in-select 3, which means the fields we can match on are outer L2, L3, or L4. We also use out-select 0.



This will work for a basic ELAM packet capture. As mentioned, in the CONFIG step we need to define at least one aspect of the trigger to match on. For this example we will use the source MAC of the locally attached endpoint:
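The command screenshots are missing from this copy, so here is a rough sketch of the sequence on a first-generation leaf (the MAC address and prompt strings are illustrative, not from the original post); the comments map each command to the INIT/CONFIG/ARM/READ/RESET steps above:

```
leaf4# vsh_lc
module-1# debug platform internal ns elam asic 0          ! INIT: Northstar, ASIC instance 0
module-1(ns-elam)# trigger init ingress in-select 3 out-select 0
module-1(ns-elam-insel3)# set outer l2 src_mac aaaa.bbbb.cccc   ! CONFIG: match the EP's SMAC
module-1(ns-elam-insel3)# start                           ! ARM the capture
module-1(ns-elam-insel3)# status                          ! Armed -> Triggered once a packet matches
module-1(ns-elam-insel3)# report                          ! READ the decoded headers
module-1(ns-elam-insel3)# reset                           ! RESET to start over
```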





To see the ELAM state, the status command can be used. Essentially three different states can be found:
- Triggered: a packet matching the trigger has been detected, and that packet is available for analysis.
- Armed: no packet matching the trigger has been detected yet, and ELAM is actively looking at packets for a match.
- Initialized: the ELAM is available for triggers to be configured, or to be armed with the start command. It is not currently attempting to capture a matched packet.

Once the ELAM is triggered, the packet can be viewed for analysis with the report command. The report shows the relevant header fields of the packet (note that it will not show the complete payload). Once this is done, we can restart the process with the reset command.



This is pretty much all you need for a good start with ELAM on ACI. More info is available in the N9K configuration guide, and a good resource as well is the Cisco Live session BRKACI-2102, from which I took some of the images for this post.

Hope you enjoyed it, and next time maybe I'll find some time to write that amazing post on PL-to-PL traffic on the ACI fabric.












Thursday, May 18, 2017

Stretched DC, really?... ok, for L3: BGP conditional advertisement

A long time ago (I think a couple of years back) I was reviewing a DR solution for an internal customer who had two datacenters and a DCI (dark fiber) between them. They had initially moved to a stretched design, extending VLANs from each site and using the L3 gateway on only one side at a time, since as a business requirement traffic should always leave from the primary DC. However, they expected some kind of solution to automatically switch over to the secondary DC in case of a failure in DC1.

For cases like this it's always a pleasure to read Ivan and see how he predicted the design issues I would face in the future (stretched DCI); thankfully no stateful firewalls were involved here.

The main issue was not only detecting which side is alive (which is not easy without a witness, and we don't have one at all) but also deciding which traffic should be served, and from where.

So here is a big stop. Before going on we need to make some assumptions and business decisions:

  • If the DC1 site fails but the DCI and the DC2 site are alive, traffic will enter from the DC2 side and traverse the DCI.
  • If the DCI fails, traffic will continue to be served from DC1 for the stretched VLAN subnets; this implies moving those servers to the surviving side by other means, or at least shutting them down.
  • If the DC2 site fails but the DCI and the DC1 site are alive, traffic will enter the DC1 side and traverse the DCI to reach the DC2-side servers.
  • Traffic should leave and enter from DC1 whenever possible, and the DC2 site should not be used unless strictly necessary (this was imposed by the customer).

So after reviewing a lot of options, and accepting that we can eventually fail and working around that (and the fact that we needed a stretched cluster after all), we came across a nice BGP feature called conditional advertisement.
For reference, BGP conditional advertisement allows us to advertise a given network based on the presence or absence of other routes in our BGP table. This is really useful for this scenario: we define a witness network on each side and advertise it to the other side; this should be a dummy network, like 1.1.1.0/30 for DC1 and 1.1.2.0/30 for DC2. The condition then verifies whether we are receiving that advertisement, and based on that we either withdraw our own advertisement or just let it flow.

Ok, enough reading; let's have a quick look at the configuration (on NX-OS) and the behaviour:



Here is the config for the eBGP side of DC2:
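The config screenshot is missing here, so as a rough sketch (all names, ASNs and prefixes below are made up for illustration): on NX-OS, DC2 advertises the service prefix only while DC1's witness route is absent from the BGP table, via an advertise-map / non-exist-map pair.

```
! Hypothetical prefixes: 203.0.113.0/24 = service network, 1.1.1.0/30 = DC1 witness
ip prefix-list SERVICE-NET seq 5 permit 203.0.113.0/24
ip prefix-list DC1-WITNESS seq 5 permit 1.1.1.0/30

route-map ADV-SERVICE permit 10
  match ip address prefix-list SERVICE-NET
route-map DC1-ALIVE permit 10
  match ip address prefix-list DC1-WITNESS

router bgp 65002
  neighbor 192.0.2.1
    remote-as 65000
    address-family ipv4 unicast
      ! advertise SERVICE-NET only while no route matching DC1-ALIVE exists
      advertise-map ADV-SERVICE non-exist-map DC1-ALIVE
```

While the witness 1.1.1.0/30 is present (DC1 alive), the service prefix stays withdrawn on DC2; when the witness disappears, DC2 starts advertising it.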


With that in place, in normal conditions the routes will be withdrawn:


Now, if we have a failure on the DC1 side, the conditional trigger takes place and DC2 starts advertising.





Is this all we need? Definitely not... There are still a lot of things to resolve and we don't have an optimal design (though we can discuss: if we are meeting the business requirements, is there anything else to do?). Apart from that, notice that stretching a VLAN is not a good choice. Guess why? You're extending your fault domain, which doesn't simplify anything; it also makes isolation and failure detection more complex. So let's start wondering why we made such poor decisions, and why we can't start talking about application-level resiliency instead: it would make our lives better by allowing us to use different subnets/networks at each site, handling traffic in and out more flexibly by leveraging existing methods (a long talk about BGP attributes and policy control enters here).


Some references:

Cisco (August 2010). Cisco IP Routing. http://www.cisco.com/en/US/tech/tk365/technologies_configuration_example09186a0080094309.shtml












Sunday, May 14, 2017

CCIE DC v2 - bootcamp - outline

For those attending my CCIE DC v2 bootcamp next week, here is the updated outline. I will be posting an updated diagram shortly (remember this course is not based on any rack rental, so interface numbering is up to you :) )

Introduction

Exam Considerations / Overview / Strategy


Section 1 – Cisco Data Center Layer 2/Layer 3 Technologies

1.1 – Configure VDC Resources
1.2 – Configure NXOS Multicast
1.3 – Understanding VxLAN
1.4 – Configure vPC & Deployment Options
1.5 – Configure FEX & Deployment Options
1.6 – Configure VxLAN L2/L3 GW (EVPN | F&L)
1.7 – Configure NXOS Security
1.8 – Configure & Troubleshoot Spanning Tree Protocol
1.9 – Configure & Troubleshoot OTV


Section 2 – Cisco Data Center Network Services

2.1 – ACI Service Graph
2.2 – RISE
2.3 – Unmanaged Devices in ACI
2.4 – Configure Shared L3 Services


Section 3 – Data Center Storage Networking and Compute

3.1 – Configure FCoE
3.2 – Cisco UCS Connectivity
3.3 – UCS QoS
3.4 – Service Profiles
3.5 – Configure advanced policies
3.6 – Configure Cisco UCS Authentication
3.7 – Configure Call Home Monitoring
3.8 – Troubleshoot SAN Boot
3.9 – UCS Central Basics
3.10 – UCS Central Advanced configuration & tshoot


Section 4 – Data Center Automation and Orchestration

4.1 – Introduction to scripting in Python / cobra SDK
4.2 – Python Programming with ACI Advanced
4.3 – UCS Director Basics
4.4 – UCSD Advanced Workflows Design


Section 5 – ACI

5.1 – Understanding ACI Fabric Policies
5.2 – Understanding ACI Access Policies
5.3 – ACI External L3 Connectivity in Shared Resources
5.4 – ACI L2 Bridge / L2out
5.5 – ACI VMM Integration


















Saturday, April 29, 2017

Multicast redundancy: Phantom RP

Over the past two weeks a colleague and also a student asked me about Phantom RP and how it works. It all related to a discussion we had around the VXLAN Part 2 post, about the supported multicast configurations for VXLAN in NX-OS.

First of all, and in order to avoid further confusion, let me summarize the currently supported methods for the VXLAN underlay on Cisco NX-OS/ASR devices:

Source: Cisco doc

Being clarified that, we can continue with the original purpose of this post.
So, based in our previous post we have configured our Nexus 5K / 7K underlay to run multicast in to support Flood and Learn configuration, by that time we choose Bidir PIM since is the only supported method in N5K. So let's get some background about bidir and how can we make it redundant (can we?)


BiDir PIM


PIM bidirectional mode enables a multicast group to route traffic over a single shared tree rooted at the RP, instead of using separate unidirectional source trees. Since the RP is the root (its IP address, that is :) ), it is good practice not to place it on a router but on an unused IP on the network, reachable from the PIM domain (we will see this later in the Phantom RP configuration).
Explicit join messages are used to establish group membership. Traffic from sources is unconditionally sent up the shared tree toward the RP and passed down the tree toward the receivers on each branch (note: traffic is not sent unidirectionally to the RP).

Bidir-PIM shares mechanisms with PIM-SM, like unconditional forwarding of source traffic toward the RP, but without the registering process for sources (https://tools.ietf.org/html/rfc7761#section-4.2). Forwarding can therefore take place based on (*,G) entries alone, removing the need for any source-specific state and thus expanding scaling capabilities. This image, extracted from a Cisco white paper, is good for seeing the differences in the upstream process toward the RP in SM vs. bidir:


Source: http://www.cisco.com/c/en/us/td/docs/ios/12_0s/feature/guide/fsbidir.html#wp1023176

"PIM-SM cannot forward traffic in the upstream direction of a tree, because it only accepts traffic from one Reverse Path Forwarding (RPF) interface. This interface (for the shared tree) points toward the RP, therefore allowing only downstream traffic flow. In this case, upstream traffic is first encapsulated into unicast register messages, which are passed from the designated router (DR) of the source toward the RP. In a second step, the RP joins an SPT that is rooted at the source. Therefore, in PIM-SM, traffic from sources traveling toward the RP does not flow upstream in the shared tree, but downstream along the SPT of the source until it reaches the RP. From the RP, traffic flows along the shared tree toward all receivers."


Need of redundancy? Let's do it

We mentioned that our shared tree is rooted at the RP address, so in order to give it redundancy we need a way to duplicate it or to use a virtual IP. For bidir PIM no traffic is targeted at the RP itself (no control-plane functions), so our solution is easier: instead of actually assigning the same IP in a sort of anycast, we can just advertise it through our IGP. The only foreseeable issue is that there should be only one shared tree at a given time (we don't want our RPF interface changing all the time), so to avoid that we steer the path decision with a more specific match in the RIB (by advertising the same subnet with a longer mask from one of the redundant points).
Well, that was a lot of talk; I think a config snippet is worth more than a million words:


Primary


Secondary (hmm.. if you don't see any difference here is a hint: look at the mask)
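The config images didn't survive this copy, so here is a hedged reconstruction of the usual phantom RP setup on NX-OS (all addresses and the group range are examples): the RP address 1.1.1.1 is configured everywhere but assigned to no device, and both candidates advertise a subnet containing it, the primary with the longer mask.

```
! Primary candidate: /30 -- the longer prefix wins the longest match in the RIB
interface loopback1
  ip address 1.1.1.2/30
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode

! Secondary candidate: /29 -- only used once the /30 is withdrawn
interface loopback1
  ip address 1.1.1.3/29
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode

! On every PIM router: point at the phantom address (it lives in both subnets
! but belongs to no device)
ip pim rp-address 1.1.1.1 group-list 239.0.0.0/8 bidir
```

Note that `ip ospf network point-to-point` makes OSPF advertise the actual /30 and /29 prefixes instead of a /32 host route, which is what makes the whole longest-match trick work.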




Now it's done; you can run your favourite set of verification commands to see if it is working:



You can also shut down the active interface (lo1) and see how this changes, confirming that our redundancy is working.

For CCIE / CCDE students:
- What is the convergence time of the RP in case of a failure on the primary?
- Can we achieve sub-second convergence?
- In a flood-and-learn configuration for VxLAN, what would you recommend: ASM or bidir PIM?
- If choosing ASM, how is your redundancy going to be solved?
- Why are we using "ip ospf network point-to-point"?

More on multicast ASM/SSM/Bidir comparison: http://lostintransit.se/2015/08/09/many-to-many-multicast-pim-bidir/




Monday, April 3, 2017

VCIX6-NV Demo Session! Join us

@ie-bootcamps will be running a demo session of the #VCIX-NV track; it will be free and open to everyone!



Trained by: Ariel Liguori (VCIX-NV Certified)
Date: 04-April-2017
Time: 5.00 PM GMT
Duration: 30 mins

Session Details:
Link: https://lnkd.in/fC8VwQU
Password: DEMO

FOR VCIX6-NV Bootcamp:
Website: www.cciehome.com
Email: sales@cciehome.com
Skype: cciehome
Mob: +91 7710910003
For Spanish: +54 911 6530 2520


Friday, March 31, 2017

IE-Bootcamps - Launching a new training experience


Finally, after hard work by me and the CCIE HOME team, we have created ie-bootcamps, the first expert-level training company based in Latin America that will deliver courses and bootcamps for the most challenging tracks. We plan to cover the Americas completely, and as a starting point we introduce the CCIE DC v2 Lab Bootcamp, to be held in Buenos Aires, Argentina, May 22-26th. I will be delivering the course, so you are all invited :)

More info at: http://ie-bootcamps.com/course/ccie-dc-v2-0-lab-bootcamp/
Or reach me directly

-----

Finally, after long work by the CCIE HOME team and myself, we have given birth to the first company dedicated to delivering expert-level training in Spanish: ie-bootcamps.
As a first step, we have started coordinating the CCIE DC v2.0 bootcamp, to be held in Buenos Aires, Argentina, May 22nd-26th.




Wednesday, March 22, 2017

Testing boundaries - thoughts before start

As discussed in our previous post about what is truly behind the business needs for an AFA box versus the vendor hype we face, let's assume we have just gathered our technical requirements and are facing the task of stressing one of these boxes.

For any performance test there are several conditions that must always be present and considered. These are my initial thoughts; I hope you find them useful.

  • Create "significant" traffic: don't just stress performance; use traffic patterns that are representative for you (i.e. workloads similar to the environment in which you are going to put the device under test).
  • Don't forget to measure: an important and also tedious part of testing is taking notes and writing down the results, so always plan your performance scenarios with the idea of exporting the data, so you can easily record or graph the results.
  • Test boundaries: if a device is claimed to reach X performance, test it. Let's say X is 100K IOPS @8k block size, 70/30 (r/w); you have to find a way to generate that workload in your infrastructure. There are several considerations here, e.g. using one thread with that workload is not the same as running multiple threads, which is a much more realistic approach; we will go deeper into this in a specific post about the AFA testing procedure.
  • Tune the environment: this can also be called environment set-up. Make sure your underlay infrastructure follows best practices and has no issues, so that the test you're running is not affected by any factor other than the test procedure itself.
  • Automate as much as you can: testing can be tough; imagine having to re-test after changing a few parameters, applying new versions, etc... impossible. So adopt an automated approach to set up your tests, and even to fire them and plot the results in a fancy way.
  • Understand what you're doing: testing is not about running a workload and seeing whether performance is good or not, or at least it shouldn't be. The whole purpose of a testing procedure is to understand how the device under test reacts under stress conditions as well as under normal, close-to-real ones; also to notice how it works internally and how it behaves under changes... which leads me to the next bullet.
  • Resiliency: so you have tested and all seems perfect, performance is outstanding and testing is going well... but have you tested how it behaves under unexpected and planned changes? Resiliency is key for production environments, since it not only gives you an overview of how high availability performs (which is important, above all for production environments) but also of how the system reacts to these changes (you can easily be surprised by well-known vendors going into panic mode after switchovers).
  • Plan the tests accordingly: for a PoC or a performance test you will do a lot of preparation and environment setup; this can involve physical network changes to test HA, clusters, or other functionality. You will lose a lot of time changing things repeatedly, so it is really important to order the test plan to require the minimal number of changes, and within each change to do the maximum number of tasks before the next one. This will save you a lot of time.






Sunday, March 19, 2017

All Flash Array: vendor hype vs business needs

Last month I was involved in a project to test the performance of several All Flash Array boxes (let's call them AFAs), since we are expected to start delivering a new class of services (and SLAs?) to customers.

Being in an R&D team gives you the opportunity to know the whole story: you start with the sales pitch, then the pre-sales *not-so-technical* one, and in the end you break a box in a PoC and finish with the real engineers, who explain the details of the architecture and why they don't support what you just tested (but it was in the sales pitch, right?)

What I do want to remark on here is the relation between vendor hype and business needs. An AFA can bring you outstanding performance in IOPS, latency, compression, and so on... but the point is: what do you really need? In any architecture design you are supposed to deliver a solution that meets the technical requirements plus the business requirements/needs. With AFAs this can be quite tricky, since if you don't have a clear understanding of your business needs, you're going to be pushed into an unfair or imprecise comparison.

So, what about technical requirements?

I've faced two kinds of scenarios for AFA deployment. The easier one is deploying new infrastructure aimed at fulfilling specific requirements for an application suite; this is always the best-case scenario for an architect, since you can gather technical requirements easily, including not only the actual requirements but also a forecast of upcoming demand. As you may know, it's not all pink elephants, and the other common scenario is moving a current deployment to an AFA solution.
In the latter case you have to take a huge number of considerations into account in order to plan accordingly. I can summarize the following:

  • Amount of IOPS: this is something you see a lot, and on its own it is completely wrong, because what is 1K IOPS? Which block size? Which read/write ratio?
  • Amount of IOPS based on a given IO distribution, including the block size and read/write ratio of each component. Getting these numbers can be significantly hard; most current arrays have information about all the workloads they have run, which is great for getting the IO distribution per block size plus rd/wr ratios, but it assumes you are running similar workloads all day long (i.e. a consistent distribution of a given set like 30K IOPS @4k 65/35, 15K IOPS @8k 60/40... and so on; but what about the nightly jobs? When does your performance get affected by backup jobs?)
  • Expected and maximum latency: based on application + OS/guest OS needs.
  • Expected compression ratio (for your data set!)
  • HA and expected performance in contingency.
  • Network based (NFS), IP based (iSCSI) or FC?
  • Disk replacement policy and MTBF/MTTF.


Ok, but where is the vendor Hype?

Well, the quick answer is: everywhere you get a sales/pre-sales engineer talking. But to stay on topic, I've found solutions that claim 100K IOPS @32K block size in a single box... the easy math here is to ask how many IOPS they support at 4/8K if they reach that at 32K, and also how much bandwidth they expect.


How can I test performance and avoid being seduced by pink elephants?

When we began the test process we aimed at a huge number of VMs with RDMs to the array, but later we figured out there is no need for that: with a few VMs, each with a lot of disks, you can easily set up a quick test.
For the testing procedure and considerations, IDC has written a good recap in their "All-Flash Array Performance Testing Framework", and there is also a tool made by EMC engineers to test arrays, the "AFA PoC Toolkit", which sets up a few VMs on a VMware host, connects the host to the LUNs of the array, then creates the RDMs and sets up VDBench on the VMs.
I recommend using this approach, changing the parameters in the VDB files to meet your requirements. One caveat with that toolkit is that it only runs against EMC XtremIO; I've made several changes to be able to run it against Pure Storage boxes, and I'm making further changes to be able to test against SolidFire too. In a later post we will discuss design considerations for an AFA platform and the testing tools, plus the results for each vendor (we're testing Pure Storage, EMC XtremIO, NetApp SolidFire).
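As an illustration of the kind of VDB parameter changes mentioned above, a minimal vdbench parameter file for one of the workload classes discussed earlier (8k blocks, 70/30 read/write, fully random) might look like this; the device path and thread counts are placeholders, not values from the toolkit:

```
* Storage definition: the raw device presented to the VM (placeholder path)
sd=sd1,lun=/dev/sdb,openflags=o_direct

* Workload definition: 8k transfers, 70% reads, fully random access
wd=wd1,sd=sd1,xfersize=8k,rdpct=70,seekpct=100

* Run definition: uncapped IO rate, stepping through thread counts, 5 min each
rd=rd1,wd=wd1,iorate=max,forthreads=(8,16,32),elapsed=300,interval=5
```

Stepping through several thread counts in one run is what gives you the multi-threaded, closer-to-real picture mentioned in the testing-boundaries post.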






Tuesday, March 14, 2017

Worth Reading: Verizon SDN-NFV Reference Architecture

Last week I found this reference architecture and just finished reading it. It's huge, but it's really worth it (especially for me, as I'm working on an SDN/NFV reference architecture myself).

http://innovation.verizon.com/content/dam/vic/PDF/Verizon_SDN-NFV_Reference_Architecture.pdf

Enjoy!


Thursday, March 9, 2017

VxLAN Deep Dive Part III: Flood and Learn



A rainy day, ideal for continuing with the VxLAN series :)
  • Part 1: Let's overlay -- Basic info about VXLAN, addressing and headers.
  • Part 2: It's all about knowledge -- Packet forwarding overview, VTEP control plane learning options
  • Part 3: Hands On #1 -- Configuration on Cisco Nexus Devices, Flood and Learn.
  • Part 4: Hands On #2 -- Configuration on Cisco Nexus Devices,  EVPN.

Today we will focus on config, the funnest part of any IE track. In parts 1 & 2 we covered the fundamentals: now we understand how VxLAN works, how many addresses we can get, and the different options for advertising MAC/IP information to peers. In this post we will start with flood & learn; I chose this one not only for being the first method adopted but also for being the most *poorly* documented on the web.

Note: if you want to find any VxLAN config info for NX-OS, I encourage you to look under the N9K documentation, since the latest 7K release does the same, and under 7K you will not find anything :)

Recommended reading: I've also mentioned this in the prep for the DC IE track, but if you haven't had the chance, I really recommend Cisco Live presentations; in this case BRKDCT-2404: VxLAN Deployment Models.

Let's use this topology:

First we will define each component in the network:
H1 / H2 are hosts in different VLANs, for this example VLAN 101 and 102.
L1 - L4 are Nexus 5Ks running as VXLAN L2 gateways.
L5 - L6 are Nexus 7Ks running as VXLAN L3 gateways.
S1 / S2 form the underlay L3 core between all the nodes, with the L3 cloud also supporting multicast.


IGP + Multicast cloud

First things first: we need basic IGP reachability between our nodes, plus multicast reachability. For multicast we are going to use static mapping for the RP and, for redundancy, we will deploy Phantom RP; of course you can use any other method you want, but for the purposes of this example static mapping is the simplest option. Here is the config snippet for each device.


L1 - L6



S1 / S2
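Both snippets are missing from this copy, so here is a minimal sketch of what they would contain (the RP address, group range and interface names are examples): the leaves only need PIM enabled toward the core plus the static bidir RP mapping, while the spines additionally host the phantom RP loopbacks.

```
! L1-L6 (leaves)
feature pim
ip pim rp-address 10.254.254.1 group-list 239.1.1.0/24 bidir
interface Ethernet1/1
  ip pim sparse-mode

! S1 (spine) -- same RP mapping, plus the phantom RP loopback
! (S2 would use e.g. 10.254.254.3/29 for redundancy)
feature pim
ip pim rp-address 10.254.254.1 group-list 239.1.1.0/24 bidir
interface loopback1
  ip address 10.254.254.2/30
  ip ospf network point-to-point
  ip pim sparse-mode
```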



The IGP config is also omitted here, since you can run whatever you want (static routing too, right? yes! but a lot of work); in our case we simply set up OSPF in area 0 with point-to-point interfaces on each link. The only consideration was the MTU: as you may recall from the previous posts in this series, you will need to increase the MTU to be able to carry a VXLAN packet inside. You can do the math yourself (VXLAN encapsulation adds roughly 50 bytes of overhead: the outer Ethernet, IP and UDP headers plus the 8-byte VXLAN header), but let's assume that anything at or above 1554 bytes would be ok (1600 is a common choice).


Just a minor note on Naming conventions

Before taking any further step we need to clarify some naming conventions that Cisco has imposed on us :)

If you recall correctly, each host in a VXLAN network has a VTEP which establishes the tunnel and is, in the end, responsible for taking encapsulated packets in and out of our VXLAN network... Well, for Cisco it's a bit different: since there is no host here messing around with VXLAN, for them the VTEP is the device responsible for the encap/decap process. But if the VTEP is the device, how do I configure the interface on that device where this magic occurs? Here we come to the NVE (Network Virtualization Edge) interface, which is the logical interface where the previously mentioned encap/decap magic actually takes place.


VXLAN L2 Gateway

Well, here the fun begins, at L2 of course. I've emphasized the concept of a VXLAN L2 GW because it is so important to understand how it works and what it does; the L2 keyword is key here.
Let's assume you are running a legacy network with VLANs in place, and you want to integrate it into a new design with VXLAN. Of course you not only want to allow inter-routing between them, but also to place some hosts in your new network in the same segment as the legacy one. Nice scenario... but how can we accomplish this?

Essentially what we need is a device capable of taking L2 frames (let's say tagged, since I stated that we have VLANs in place) and putting them in the same bridge domain as the VXLAN traffic. In other words, what we want is to put VLAN traffic into the VXLAN VNI associated with that traffic, so both sides can flood to and interact with each other.

Here I will post two suggested config options: one at the interface level (i.e. configured directly on the interface that receives the tagged frames) and one at the switch level (which is what we will use on our 5Ks). Cisco refers to these as VSI (VN-Segment service instance) mode and VLAN mode.

VSI CLI Mode (L1/L2)

For this config, let's assume that leafs L1/L2 receive the frames from H1/H2 on a trunk interface, and what we want to accomplish is the vlan-to-vni mapping.
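A hedged sketch of what that mapping could look like in VSI mode (the profile name and the VLAN 100 to VNI 30000 pair are example values; double-check the exact syntax for your platform and release):

```
! map dot1q 100 arriving from H1/H2 to VNI 30000
encapsulation profile vni HOST-VSI
  dot1q 100 vni 30000

interface Ethernet1/1
  no switchport
  service instance 1 vni
    encapsulation profile HOST-VSI default
    no shutdown
```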




VLAN CLI Mode

This mode is much easier, because all we do is create an association between a VLAN and a vn-segment.
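Which boils down to something like this (hedged sketch; VLAN 100 and VNI 30000 are example values):

```
feature vn-segment-vlan-based

vlan 100
  vn-segment 30000
```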






VXLAN L3 Gateway

So we've covered full L2 reachability within the same bridge domain over VXLAN, but what about inter-VXLAN routing? It seems pretty obvious that we will need a gateway with interfaces in the VXLAN segments of interest to route between them. Just as we have SVIs, we also have BDIs (Bridge Domain Interfaces): we associate a BDI with our VNI and assign addressing to it, so it can act as the VXLAN gateway for that segment and also route outside (including to other VXLAN segments, if we have them).


If we dig into the config we can see it's pretty straightforward, since the magic has already happened; that is, we already did the hardest part, the BD-to-VNI or VLAN-to-VNI association. The only thing left is to create the associated interface (SVI in VLAN mode or BDI in VSI mode) with the addressing, and that's all:
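For instance, in VLAN mode it could look like this (hedged sketch; the VRF name and addressing are made-up examples):

```
vrf context TENANT-A

interface Vlan100
  vrf member TENANT-A
  ip address 192.168.100.1/24
  ip pim sparse-mode
  no shutdown
```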



Note #1: See, the only "new" thing here is that we need to run PIM sparse mode. Can you guess why? Ping me!
Note #2: Yes, I'm using a VRF, since it's also best practice to run tenant traffic in a separate VRF; this is pretty common in real-life deployments (the VRF isolation I mean, not flood and learn :) )

Now let's consider some design caveats. The first thing on everyone's mind here is HA, so...

Redundancy? VXLAN L3 GW + HSRP? What about VPC ?

Yes, yes and also yes. Of course you can run HSRP on top of a BDI/SVI, but the not-so-easy part here is vPC. As you know, vPC provides MAC state sync between peer devices, and if you've opted for HSRP, redundant VTEPs share an anycast VTEP IP address (in the underlay). This way vPC provides L2 + L3 redundancy in hardware. To make this possible, some changes are needed on the loopback interface used for VTEP sourcing and in the vPC domain config. As a starting point, the first thing to note is that a secondary IP address must be shared between the vPC peers, so that VXLAN packets can be handled by either of the peer devices. Does that remind you of something? Yes, we also need peer-gateway under the vPC domain. Based on what we've just described, peer-gateway is mandatory and also requires an SVI configured with PIM across the peer link. The full list of requirements can be found here:

http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/vxlan/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_chapter_010.html#concept_C769B7878CE2458E98657905843DEEFA

But what you can never forget is:

  • Unique primary IP for the underlay loopback
  • The same secondary IP shared by both peers
  • PIM
  • Consistent VNI-to-multicast-group mapping
  • peer-gateway and a "special SVI" (PIM enabled) // needed in case your leaf loses connectivity to the spines and must forward packets to its peer device

Those are the key points for me, but you will find a lot more information in the doc. Also worth mentioning: on the Nexus 5K, VXLAN vPC configuration requires the use of that special SVI for VXLAN traffic by issuing a special command:
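Putting the checklist together, a hedged sketch (addresses, VLAN and domain IDs are examples; I'm deliberately not guessing the exact 5K-specific command here, check the linked guide for your platform):

```
vpc domain 10
  peer-gateway

! VTEP source loopback: unique primary, shared secondary (anycast VTEP)
interface loopback0
  ip address 10.10.10.1/32
  ip address 10.10.10.100/32 secondary
  ip pim sparse-mode

! the "special" PIM-enabled SVI carried across the vPC peer link
interface Vlan999
  ip address 10.255.255.1/30
  ip pim sparse-mode
  no shutdown
```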




Distributed Gateway

Well, this is a huge post already... we will cover distributed GW, anycast and all that in another one; otherwise this would take me a lifetime, and I do want to explain it in detail :)

Stay in touch :)


Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Tuesday, February 28, 2017

CCDE, be a chameleon



After passing the DE I want to take some time and write these words for all future aspirants, and also for those who failed or passed, to compare points of view regarding the CCDE. I was thinking about lab strategy and mindset, and I realized that being a chameleon is one of the important traits you will need in order to be successful on this path.


Disclaimer: This post is intended as a wrap-up of my experience passing the lab. This study methodology and everything suggested here is based on my background and the time I had available each week for studying; feel free to use it and adjust it to your own pace.

That being said...

Get yourself used to reading, and reading again

Assuming you have passed the written and feel like all the necessary theory is covered, I can assure you that's not true. Even if you have passed the written, you still need to read and re-read the technology and be able to understand the pros and cons of each design. A general rule I can give is that for any design you must be able to find both the pros and the cons; if you can't, you're just being shortsighted and failing to cover all the angles of a given design. For this, a study group is also key, so let's talk about that one too.


Be in a group and collaborate as much as you can

Some guys claim that studying alone is what suits them best. That can be great if you're taking an IE-level exam or any written, but for the DE it's completely the opposite. I really like to make this comparison: if you study alone, you're just a VLAN with a polarized HSRP gateway. Nerdy comparison aside, what I really mean is that you have no one to argue with! And that is not good at all. I don't expect people who argue about everything, but I do expect people who state their point of view and discuss why they see a particular caveat or advantage in a proposed design. And remember, discussions are a good thing; like other good stuff, just don't abuse them and use them carefully.


The methodology we used in our group was to check one scenario every week: we set up a call on the weekend and studied on our own during the week. You can use the scenarios in Orhan's and Martin Duggan's books, but since we ran out of scenarios, what we did was split into groups of two and modify them to create a specific scenario covering a specific design concern. I recommend that in this "split" you choose to create scenarios based on your areas of expertise (in my case it was SP and DC :) )

My study schedule prior to the lab was approximately 6 hours per day (Monday-Friday), plus 4-hour sessions on Saturday, for two months. That assumes you have already cleared the written (I cleared mine long ago) and have solid design experience (if not, I'd recommend another 2 months of study). In my case I have nearly 13 years of experience working at huge companies (just like the design scenarios, hehe) and the last 6 years at a Service Provider (which divested itself into two; a nice CsC scenario we built in real life!)

For lab strategy I can summarize my key points. Some of them you will find across a lot of posts, but these were really useful for me:


Color scheme for highlighting

I read really quickly, and that has a caveat: you can skip over things. So make sure you read every sentence, and adopt a color scheme to highlight information. I chose a really simple one: green is a good design option taken, red is a bad design or a design caveat, yellow is a constraint or requirement. I wanted to use pink for IGP info, but I realized that information is so mixed in that by the second scenario I went back to the roots and used only those three.



Be a chameleon

This is the state of mind you have to reach in the exam. You will have to be a chameleon: read all the requirements and constraints and be ready to transform yourself into the designer for ABC Company to make the best choice. But immediately afterwards, in the next scenario, you will transform into a service provider architect on a team evaluating technology X or Y. All this happens really fast, and you have plenty of info around you to support the transformation; what you need is the ability to quickly focus on the job role they have assigned you, gather the info and take your best shot.



Suboptimal routing still works right?

I really love making comparisons between real life and technology, and in this case I think of suboptimal routing as a wrong (or not so good) branch in the exam. If you have read all the Cisco Live material for the DE, you may be aware that scenarios always have several branches based on your selections, so even if you're not on the best branch you can still score points (STOP! If you're not reading the Cisco Live info for the DE, take a moment to review it; it's key to understanding the exam flow). Within the time frame of a scenario, your branch becomes your life, so being able to manage bad decisions is important. You will realize that you didn't choose the best path (trust me, it happens), but they will give you options to explain why you made that choice, and you will have to pick the not-so-good, not-so-bad option in order to continue. So don't worry, suboptimal routing still works :) and we know that next time you won't make the same mistake :)


Hope this helps all the next aspirants, both new and not so new. And for those who have passed the lab, congrats! Inputs are always welcome.





Friday, February 24, 2017

CCDE... happy ending

Well guys, you may have noticed that I was away from blogging for a while, and there was a reason behind that... The CCDE was very time-consuming: over the last two months I spent at least 4 to 6 hours a day with my awesome study group debating scenarios, design choices and the pros/cons of each. It was really hard and required ****a lot**** of reading, but it was definitely worth it!! This past Feb 22nd I passed the lab. It was my first attempt and I feel very lucky about that, but it was really the training, reading, understanding and debating with my colleagues that made me succeed. I will write a post on the strategy and training I used as well, but for the moment I just want to take some time to write down these lines and congratulate all my friends and colleagues who have taken the CCDE lab, passed or not. It was a really good experience and amazing learning!



Keep you updated guys :)
CCDE 2017::27




Friday, January 27, 2017

VXLAN Deep Dive Part II: It's all about knowledge

Before starting with the second part of this post, and in order to calm the anxiety, let me briefly describe what this series is going to cover:


  • Part 1: Let's overlay -- Basic info about VXLAN, addressing and headers.
  • Part 2: It's all about knowledge -- Packet forwarding overview, VTEP control plane learning options
  • Part 3: Hands On #1 -- Configuration on Cisco Nexus Devices, Flood and Learn.
  • Part 4: Hands On #2 -- Configuration on Cisco Nexus Devices,  EVPN.
  • Part 5: NSX Overview

So if you're interested in a topic that you think is not going to be covered, kindly ping me and I will add it.

After all that prelude, I think we can start. In Part I we covered the header added to the original frame so it can be forwarded over an L3 network, and we ended the post with an overview of packet forwarding. For later reference, here is a picture of a VXLAN packet:

Figure 1: VXLAN Packet header

In the previous post we reached the point where a host (a hypervisor or a device with a VTEP; I will use these interchangeably) receives a VXLAN packet which is not local and needs to be delivered. Thinking like any L2 forwarding plane, we need to know where to send this packet. The host does a lookup based on the DST MAC address of the original L2 frame (see picture above), but instead of a destination port (hehe, no plain L2 switching here) the lookup yields a destination VTEP address. This post covers the different methods of learning and populating this internal table, because as usual with forwarding, it's all about knowledge.


VxLAN Flood and Learn

This scenario was the first one introduced. It relies on flooding: an end host that has no entry for the destination MAC address sends out an ARP to the other devices/VTEPs in the VXLAN network. This is done by sending the request to the VXLAN multicast group for the bridge domain; remote VTEPs receive the packet and answer directly to the originating VTEP. (Here we can already spot two requirements for running this: a multicast core, and IGP or unicast reachability between VTEP addresses.)

Figure 2: VXLAN Peer Discoveries and Tenant Address Learning

I will base the explanation on this amazing pic that I just stole from Cisco's web page :)
  1. End System A (ES-A) sends out an ARP request for IP-B on its Layer 2 VXLAN network (note the Dst MAC Address).
  2. VTEP-1 receives the ARP request. Since it doesn't have a mapping for IP-B yet, it encapsulates the ARP request in an IP multicast packet and forwards it to the VXLAN multicast group for that specific segment (VNI). The encapsulated multicast packet has the IP address of VTEP-1 as the source IP address and the VXLAN multicast group address as the destination IP address.
  3. The IP multicast packet is distributed to all members of the tree. VTEP-2 and VTEP-3 receive the encapsulated multicast packet because they’ve joined that specific VXLAN multicast group; they then decapsulate the packet and forward it onto their local VXLAN network. In this process, if no prior communication has taken place with VTEP-1, they insert into their local tables the mapping between the MAC address of ES-A and the IP of VTEP-1.
  4. After the local transport of ARP, End System B (ES-B) gets the request forwarded by VTEP-2 and responds with its own MAC address (MAC‑B), and learns the IP-A-to-MAC-A mapping.
  5. VTEP-2 receives the ARP reply of ES-B, which has MAC-A as the destination MAC address. As per step 3 it knows about the MAC-A-to-VTEP-1 mapping, so it can use the unicast tunnel to forward the ARP reply back to VTEP-1. The ARP reply is encapsulated in the UDP payload of a packet sourced from VTEP-2 and destined to VTEP-1.
  6. VTEP-1 receives the encapsulated ARP reply from VTEP-2. It decapsulates and forwards the ARP reply back to ES-A, also it learns the IP address of VTEP-2 from the outer IP address header and inspects the original packet to learn MAC-B-to-VTEP-2 IP mapping.
  7. Subsequent IP packets between ES-A and B are unicast forwarded, based on the mapping information on VTEP-1 and VTEP-2, using the VXLAN tunnel between them.
  8. VTEP-1 can optionally perform proxy ARPs for subsequent ARP requests for IP-B to reduce the flooding over the transport network.

Head-end Replication

When you are working with VXLAN and reading the literature, it is common to come across the concept of head-end replication. What this essentially means is that the local VTEP carries the overhead of replicating broadcast traffic out to the other VTEPs. In the original release of VXLAN, which uses multicast as the underlying layer to reach VTEPs, this just means encapsulating the packet and sending it to the multicast group; but there is also the possibility of unicast peering (full mesh) with all the VTEPs, and in that scenario head-end replication has a significant impact.


Figure 3: Head-end replication example in unicast VTEP reachability


VxLAN MAC Distribution

Another well-known method is VXLAN MAC distribution. Head-end replication is still used to deliver broadcast and multicast frames to remote VTEPs, but... what about unknown unicast? You shouldn't have any (a wish; read further). In this scenario MAC learning is not based on data plane activity; instead we have a central control entity (Nexus 1000V VSM, NSX controller, etc.) which keeps track of all MAC addresses in the domain and pushes this information to the VTEPs in the system. Why do I say this is a wish? Basically, things are there to be broken. Just like in any mapping table (CAM, for instance), entries have an aging time associated with them. So if, as in the first scenario, VTEP-2 announces the MAC-B entry and VTEP-1 gets populated with it, all traffic will flow accordingly; and when VTEP-1 doesn't have an entry for MAC-B, it will query the controller for this info. Here two branches appear: a) the controller has an entry and replies to VTEP-1, the entry gets installed and unicast traffic flows; b) the controller doesn't have an entry for MAC-B and replies with an invalid entry, so VTEP-1 must fall back to head-end replication to learn where to send its packet (*this may vary depending on the VTEP's OS/SW implementation).
There is also another case in which VTEP-1 has a valid entry but loses connectivity to the controller, and that entry ages out (and is removed from the table); in this case the controller can't be queried and head-end replication is used again.



VxLAN BGP EVPN Control plane

Quick disclaimer: before starting, I will say that you will find a lot of literature on this approach, along with plenty of information on the configuration needed to make it work. This is the desired scenario for any real/production environment; flood and learn was shown just so you understand where we started and how we arrived at a real control plane solution (and in a standards-based way!).

The EVPN overlay specifies adaptations to the BGP MPLS-based EVPN solution so it can be applied as a network virtualization overlay with VXLAN encapsulation. Essentially this brings us great benefits (I will add more later):
  • Standardized solution: BGP plus VxLAN
  • Real Control Plane learning

For this approach, the mapping is (for those who know MPLS EVPN):
  • VTEP/network virtualization edge (NVE) is the equivalent to PE node
  • VTEPs use control plane learning/distribution via BGP for remote MAC addresses instead of data plane learning.
  • Broadcast, unknown unicast and multicast (BUM) data traffic is sent using a shared multicast tree.
  • In order to reduce the need for a full mesh between VTEPs we can rely on a BGP route reflector (RR)
  • Enhanced security by using well known Route filtering and constrained route distribution (control plane traffic for a given overlay is only distributed to the VTEPs that are in that overlay instance).
  • Host (MAC) mobility mechanism to ensure that all the VTEPs in the overlay instance know the specific VTEP associated with the MAC
MP-BGP can be used for L2 VXLAN and also for L3 VXLAN (instead of MAC address learning, think of IP-to-VTEP association; do you remember LISP?). It's not my goal to enumerate all the benefits of running a BGP EVPN control plane for VXLAN, beyond its greater scalability, well-known and proven protocols, etc. Instead I will focus on the life of a packet in this new scenario, and hopefully in the next post we can cover all the variations of it (anycast GW, asymmetric/symmetric IRB, etc.)
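Just as a teaser of the config flavor, enabling the EVPN address family on NX-OS looks roughly like this (hedged sketch; the AS number and neighbor address are example values, full configuration comes in Part 4):

```
nv overlay evpn

router bgp 65000
  neighbor 10.0.0.10 remote-as 65000
    address-family l2vpn evpn
      send-community extended
```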


Packet forwarding in L2 VxLAN Segment



In this scenario we are covering L2 VxLAN communication; Host-A and Host-B belong to the same VNI: 30000.
  • Host-A sends traffic to its local VTEP V1 (post ARP resolution), DST MAC B.
  • V1 looks up an entry for MAC B in its table.
  • V1 has an entry for MAC B through VTEP V2, so it encapsulates the packet and unicasts it to V2.
  • V2 gets the packet, decapsulates it and delivers it locally to Host-B.


End of the happy tale, right? But what about L3 traffic between VXLANs? (Note that we didn't cover this in flood and learn, since in that approach traffic has to reach a device with presence in both VXLAN segments involved, which routes between them logically.)


Packet forwarding between different L2 VxLAN VNI

In this scenario Host-A (VNI 30000) sends a packet to Host-F (VNI 30001), and the core network uses VNI 50000. The process goes roughly like this:
  • Host-A sends traffic to its DG (post ARP), which is configured on the locally attached VTEP V1.
  • V1 performs a FIB lookup based on the DST IP.
  • V1 routes the packet to VTEP V2, but the VXLAN packet uses the core VNI 50000.
  • V2 gets the packet, decapsulates it, performs a FIB lookup determining that the DST VNI is 30001, rewrites the packet and delivers it locally.


Ok, so now what about the ugly tale? As you can see, in both examples I said "this happens post ARP resolution", but how do we process ARP?

There is an ARP suppression mechanism. Essentially, the IP-MAC pairs learnt locally via ARP, as well as those learnt over BGP EVPN, are stored in a local ARP suppression cache. An ARP request sent from an end host is trapped at the source ToR, and a lookup is performed in the ARP suppression cache with the destination IP as the key. If there is a HIT, the ToR proxies the reply on behalf of the destination with the destination MAC.
If the lookup results in a MISS, when the destination is unknown or is a silent end host, the ToR re-injects the ARP request received from the requesting end host and broadcasts it within the Layer 2 VNI. This entails sending the ARP request out locally over the server-facing ports, as well as sending a VXLAN-encapsulated packet with the Layer 2 VNI over the IP core. This follows the same process we saw before; the only difference is that, upon the reply, the ToR stores the MAC binding in its ARP suppression cache for further use.









Tuesday, January 17, 2017

Let's Overlay: VXLAN Deep Dive - Part I


I was getting a lot of technical questions about VXLAN and overlays: how they work, how you configure them, and so on. I always think it's better to share all of that with everyone instead of replying to each of you separately.

Let's start with a quick definition of overlays. Overlays, as the name suggests, allow us to reach different points in a network without having to care about the layer underneath. You could say "OK, so it's all about tunnels"; well, I won't lie to you, there is a lot more behind it (address replication, control plane, ARP resolution, etc.), but you can start with that vague idea (in coming posts, I promise you will get a better picture of it).

That said, one of the most widely used and deployed overlay technologies is VXLAN, which is designed to provide the same Layer 2 network services as VLANs do, but with greater extensibility and flexibility. How do we achieve this? The key aspects to understand are:

  • VXLAN uses a 24-bit segment ID known as the VXLAN network identifier (VNID), which enables up to 16 million VXLAN segments; this allows for higher scalability and multi-tenancy.
  • VLAN-based designs use STP underneath to choose the best path; VXLAN uses MAC-in-UDP encapsulation and, as a consequence, can take full advantage of the underlying Layer 3 network: routing, equal-cost multipath (ECMP) and link aggregation protocols, using all available paths. This gives us better use of resources.

VXLAN Packet format

As we mentioned, it uses MAC-in-UDP encapsulation to provide a means of extending Layer 2 segments across the data center network. The encapsulation scheme wraps the original Layer 2 frame with a VXLAN header, and the result is then placed in a UDP-IP packet. With this MAC-in-UDP encapsulation it is easy to picture tunneling VXLAN across L3 networks; a great and easy way to see this is the following packet format:


As seen in the picture, VXLAN introduces an 8-byte VXLAN header consisting of a 24-bit VNID and a few reserved bits. The VXLAN header, together with the original Ethernet frame, goes in the UDP payload. The 24-bit VNID is used to identify L2 segments and to maintain L2 isolation between them.

Looking at the picture, it is also easier to see how tunneling can work with this frame. Tunnels are formed between devices that want to exchange VXLAN data, and for that the only thing needed is a destination IP (in the outer IP header). Once the packet arrives, we have to do the hard work that every tunnel does, encap/decap, and for that we have to introduce another player: the VTEP, or VXLAN Tunnel Endpoint.

VXLAN Tunnel Endpoint

VTEPs are essential players in how VXLAN works. As mentioned earlier, their role is similar to any tunnel endpoint's (encap/decap), but we will explain in detail how this occurs. I will use this simple scenario for the explanation:


The VTEP has presence in a local LAN segment and has a defined mapping from that segment to a VNID. The encapsulation process consists of taking the L2 frame sent by any of the end systems on the local segment, adding the VXLAN header with the corresponding VNID, adding the UDP header, and adding the outer IP header (with the destination IP of the remote VTEP we want to send the packet to; if you are now wondering how these entries get populated and how ARP is handled, you'll have to wait for Part 2 :) ). Once the packet arrives at the remote VTEP, the decap process starts by stripping off the VXLAN header and identifying, based on the VNID, the local segment into which we have to deliver the packet.

That said, the following is pretty self-explanatory (if you've followed me; if not, here you have a nice picture):



In this figure, Host A is sending a packet to Host B. Its associated VTEP, VTEP-1, has an entry for destination MAC-B in its table pointing to the IP of the remote VTEP, VTEP-2; it also knows the VNID assigned to Host-A, VNID 10. Based on that, VTEP-1 has all the info it needs to encapsulate the packet and send it to VTEP-2. Once the packet reaches VTEP-2, it gets decapsulated and, based on the VNID, associated with the LAN segment where Host-B lives, and the packet is sent out to its destination (assuming VTEP-2 and any switches in between know how to reach MAC-B :) )

I think we are good for today. The next post in this series will cover how BUM traffic is processed, control plane options and config scenarios. If you want to see anything in particular, just let me know.







Friday, January 13, 2017

About Writtens and CCIE DC written prep


I'm constantly asked about certs, and particularly about written exams. In this post I will try to summarize two things: my personal beliefs, and the study method / reading list that I followed.

So, let's start by saying a bad word... dumps. Lots (and I mean a LOT) of people ask me "which dumps did you use to clear XYZ?", "I'm waiting till the dumps get more accurate", and so on. My answer is always the same: you can do whatever you want, but think about whether the time spent reading questions is worth it. How much time do you spend reading a series of 2^20 Qs? And memorizing them? And it gets better: do you enjoy that process? If you do, just skip this post, since you're not going to need the book list; start considering whether you can memorize cards in a casino instead, maybe you'll get a bigger reward ;) If you don't... don't feel silly, there are at least two of us (and believe me, I know a lot of people who enjoy reading books instead of memorizing Qs).
Another thing I really want to point out is that I know Cisco is working really hard to defeat dumpers by releasing new Qs every day (maybe that's an exaggeration, but they are working on it, believe me).

Well, with all that introduction set up (feel free to comment whether you agree or not), here is my study methodology for writtens and my book list for DC (v1, sorry guys... I will post the books read for v2, but for the written I took v1 at the beginning of 2016).

Study methodology

  1. Read the blueprint at cisco cert page and topics included in written (https://learningnetwork.cisco.com/community/certifications/ccie_data_center/written_exam/exam-topics)
  2. Start by identifying the topics that you (think) master and those you definitely don't.
  3. Mix reading topics you know nothing about with those you do; this is key to avoid being overwhelmed by new stuff (your brain will thank you).
  4. Always take notes! For CCIE written exams and similar certs, you have to note down those concepts and configuration maximums/limits that you will never remember in real life!
  5. As a guide, make sure your notes have a section for each protocol in the blueprint; e.g. for FabricPath you should have at least this info:
"Running per Supervisor Engine, on a per-VDC basis:   FabricPath IS-IS - SPF routing protocol process that forms the core of the FabricPath control plane
   DRAP - Dynamic Resource Allocation Protocol, an extension to FabricPath IS-IS that ensures network‑wide unique and consistent Switch IDs and FTAG values
   IGMP - Provides IGMP snooping support on FabricPath edge switches for building multicast forwarding database
   U2RIB - Unicast Layer 2 RIB, containing the “best” unicast Layer 2 routing information
   M2RIB - Multicast Layer 2 RIB, containing the “best” multicast Layer 2 routing information
   L2FM - Layer 2 forwarding manager, managing the MAC address table
   MFDM - Multicast forwarding distribution manager, providing shim between platform-independent control-plane processes and platform-specific processes on I/O modules

Global components that run on each of the I/O modules, processing forwarding information from each VDC and programming it into the I/O module hardware:
   U2FIB - Unicast Layer 2 FIB, managing the hardware version of the unicast Layer 2 RIB
   M2FIB - Multicast Layer 2 FIB, managing the hardware version of the multicast Layer 2 RIB
   MTM - MAC table manager, managing the hardware version of the MAC address table"

Well, I think a book list is finally expected:

  • NX-OS and Cisco Nexus Switching: Next-Generation Data Center Architectures, 2nd Edition
  • I/O Consolidation in the Data Center
  • Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and File Systems
  • Cisco Unified Computing System (UCS) (Data Center): A Complete Reference Guide to the Cisco Data Center Virtualization Server Architecture
  • Policy Driven Data Center with ACI, The: Architecture, Concepts, and Methodology
  • Cisco Live docs. You don't use those? You're missing a GREAT resource:
    • BRKDCT-2404 VXLAN Deployment Models - A Practical Perspective
    • BRKDCT-2370 - Intermediate - End-to-End Application-Centric Infrastructure Automation with UCS Director
    • BRKDCT-2049 - Overlay Transport Virtualization
    • BRKDCT-3237 - Advanced - Versatile architecture using Nexus 7000 with a mix of F and M modules to deliver FEX, FabricPath, Multihop FCoE, MPLS and LISP all at the same time 
    • BRKDCT-3145 - Advanced - Troubleshooting Cisco Nexus 5000 / 2000 Series Switches 
    • BRKDCT-3378 - Advanced - Building simplified, automated and scalable DataCenter network with Overlays (VXLAN/FabricPath)
  • Also lots of blogs... I will write down my RSS feeds soon :) There are really smart people around us :)
