Living in the underlay

Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Sunday, May 21, 2017

TSHOOT Tips: ELAM Usage on Cisco ACI

I have been using this quite a lot over the past weeks and I think it is a good resource to share with everyone playing around with Cisco ACI. When it comes to troubleshooting and understanding packet flow inside the fabric, ELAM is a great tool.

So, what is it?

ELAM stands for Embedded Logic Analyzer Module. It is logic present in the ASICs that allows us to capture and view one or more packets matching a defined rule, out of all the packets traversing the ASIC. ELAM is not new at all: some of you may remember it from the CAT6500, and that's OK, the same logic is also there on the N7K (for the youngest?).

And... what's new?

Essentially the concept is still the same; we just need to focus on understanding the architecture inside the ASICs on leafs and spines to fully apply it.

The Cisco ASIC data path is divided into ingress and egress pipelines, where two ELAMs are present (see figure) at the beginning of the lookup block.



As we can see in the picture, before we can use ELAM to capture a packet we must be sure that the packet is sent from the BCM ASIC to the Northstar ASIC. ELAM operates only in Northstar on leafs (on spines it takes place in Alpine), so any packet that is locally switched in the BCM ASIC will not trigger the ELAM. This is important because in some scenarios the packet will never reach Northstar and therefore will never trigger an ELAM event (we can cover this in a future post about PL-to-PL traffic on the ACI fabric :) )

So, assuming that our traffic will be processed by Northstar, we need to configure our ELAM instance. First of all, it is good to know which kinds of rules we can configure depending on the pipeline. These are also referred to as "select lines", and the following are available (a short CLI sketch follows the lists):

Input Select Lines Supported:
3 - Outer L2 / Outer L3 / Outer L4
4 - Inner L2 / Inner L3 / Inner L4
5 - Outer L2 / Inner L2
6 - Outer L3 / Inner L3
7 - Outer L4 / Inner L4

Output Select Lines Supported:
0 - Pktrw
5 - Sideband
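
As a rough idea of how these select lines are chosen, here is a sketch from memory (the exact keywords and prompts vary by ASIC and software release, so treat it as an illustration to verify on your own switch rather than copy-paste syntax):

    vsh_lc                                      <- enter the line-card shell on the leaf
    debug platform internal ns elam asic 0      <- attach to the Northstar ASIC
    trigger init in-select 3 out-select 0       <- match on outer L2/L3/L4 (front-panel traffic)
    trigger init in-select 4 out-select 0       <- match on inner headers (traffic arriving from the fabric already VXLAN-encapsulated)

In practice in-select 3 is what you want for traffic arriving on a front-panel port, while the inner select lines become useful when the frame comes in from the fabric with the VXLAN encapsulation still on it.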

With this in mind we can configure our ELAM instance. First of all, it is always good to have an image to understand the whole process of what we need to do:


In INIT we choose the ASIC and pipeline in which the capture should take place; CONFIG refers to the proper configuration of the rule to match the packets; ARM is like arming a bomb :) but in this case we arm our packet capture so that it is triggered once the rule defined in the CONFIG stage has a match; after this we READ the captured data and RESET to start over :)
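
To put the stages in context, this is roughly how the lifecycle maps to the CLI on a first-generation leaf. It is a sketch from memory and the exact syntax differs between ASICs and releases, so verify it against your own box:

    vsh_lc                                      <- line-card shell
    debug platform internal ns elam asic 0      <- INIT: select the Northstar ASIC instance
    trigger reset                               <- RESET: clear any previous capture
    trigger init in-select 3 out-select 0       <- INIT: choose the pipeline / select lines
    set outer l2 src_mac aaaa.bbbb.cccc         <- CONFIG: define the match rule (placeholder MAC)
    start                                       <- ARM: wait for a matching packet
    status                                      <- goes from Armed to Triggered on a match
    report                                      <- READ: dump the captured header fields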

Now let's dig into the packet capture; we will refer to the following topology.

ELAM Example


This image is extracted from a Cisco Live presentation on ELAM, but we will focus on LEAF4 only. Traffic traverses from VM1 to the EP on the right, going toward Northstar (at 1), and this example is also useful to show how this behaves on Alpine.

We will arm the ELAM on Leaf 4 to capture a packet coming from EP1 (the one on the left side, directly connected to Leaf 1). In this example we use in-select 3, which means the fields we can match on are outer L2, L3, or L4. We also use out-select 0.



This will work for basic ELAM packet capture. As mentioned, we need to configure (in the CONFIG stage of the ELAM) at least one aspect of the trigger to match on. For this example we will use the source MAC of the locally attached endpoint:
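
Since the original screenshots are not reproduced here, this is a minimal sketch of what that CONFIG step could look like. The MAC address is just a placeholder for EP1's MAC and the syntax is assumed rather than copied from the lab:

    debug platform internal ns elam asic 0
    trigger init in-select 3 out-select 0       <- outer headers, pktrw output
    set outer l2 src_mac 0000.1111.2222         <- EP1's source MAC (placeholder)
    start                                       <- arm the capture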





In order to see the ELAM state, the status command can be used. Essentially three different statuses can be found:
- Triggered: indicates that a packet matching the trigger has been detected, and that packet is available for analysis.
- Armed: no packet matching the trigger has been detected yet, and the ELAM is actively looking at packets for a match.
- Initialized: the ELAM is available for triggers to be configured, or to be armed with the start command. It is not currently attempting to capture a matched packet.

Once the ELAM is triggered, the packet can be viewed for analysis with the report command. The report shows the relevant header fields of the packet (note that it will not show the complete payload). Once this is done we can restart the process with the reset command.



This is pretty much all you need for a good start with ELAM on ACI. More info is available in the N9K configuration guide, and another good resource is the Cisco Live session BRKACI-2102, from which I took some of the images for this post.

Hope you enjoyed it, and maybe next time I will find some time to start the promised post on PL-to-PL traffic on the ACI fabric.












Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Thursday, May 18, 2017

Stretched DC, really?... ok, for L3, BGP conditional advertisement

A long time ago (I think it was years back) I was reviewing a DR solution for an internal customer who had two datacenters and a DCI between them (dark fiber). They had initially moved to a stretched design, extending VLANs between the sites and using the L3 gateway on only one side at a time, since as a business requirement traffic should always leave from the primary DC. However, they were expecting some kind of solution that would automatically switch over to the secondary DC in case of a failure in DC1.

For these cases it's always a pleasure to read Ivan and see how he predicts the design issues I will face in the future (stretched DCI); luckily no stateful firewalls were involved here.

The main issue was not only detecting which side is alive (which is not easy without a witness, and we don't have one at all) but also deciding which traffic should be served, and from where.

So here is a big stop. Before going any further we need to make some assumptions and business decisions:

  • If the DC1 site fails but the DCI and the DC2 site are alive, traffic will enter from the DC2 side and traverse the DCI.
  • If the DCI fails, traffic will continue to be served from DC1 for the stretched VLAN subnets; this implies moving those servers to the surviving side by some other method, or at least shutting them down.
  • If the DC2 site fails but the DCI and the DC1 site are alive, traffic will enter from the DC1 side and traverse the DCI to reach the servers on the DC2 side.
  • Traffic should enter and leave via DC1 whenever possible, and the DC2 site should not be used unless strictly necessary (this was imposed by the customer).

So after reviewing a lot of options, accepting that eventually we can fail and working around that (and the fact that we need a stretched cluster after all), we came across a nice BGP feature called conditional advertisement.
Just for your reference, BGP conditional advertisement allows us to advertise a given network based on the routes present in our BGP table. This can be really useful in this scenario: we define a witness network on each side and advertise it to the other side, for example a dummy network like 1.1.1.0/30 for DC1 and 1.1.2.0/30 for DC2. The condition then checks whether we are still receiving that network and, based on that, either withdraws our advertisement or just lets it flow.

OK, so enough reading; let's have a quick look at the configuration (on NX-OS) and the behaviour:



Here is the config for the eBGP side of DC2:
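
The original screenshot is not reproduced here, so below is a minimal sketch of what such a configuration could look like on NX-OS. The AS numbers, neighbor address and prefixes are hypothetical: 1.1.1.0/30 is the DC1 witness network we expect to keep receiving, and 10.10.0.0/16 stands for the prefixes DC2 should only advertise when that witness disappears:

    feature bgp
    !
    ip prefix-list DC1-WITNESS seq 5 permit 1.1.1.0/30
    ip prefix-list DC2-PREFIXES seq 5 permit 10.10.0.0/16
    !
    route-map NON-EXIST permit 10
      match ip address prefix-list DC1-WITNESS
    route-map ADV-DC2 permit 10
      match ip address prefix-list DC2-PREFIXES
    !
    router bgp 65002
      neighbor 192.0.2.1
        remote-as 65000
        address-family ipv4 unicast
          advertise-map ADV-DC2 non-exist-map NON-EXIST

As long as 1.1.1.0/30 (advertised from DC1 over the DCI) is present in the BGP table, the routes matched by ADV-DC2 stay withdrawn toward this eBGP neighbor; once the witness disappears, they get advertised.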


Based on that, under normal conditions it behaves like this (the routes are withdrawn):


Now, if we have a failure on the DC1 side, the conditional trigger takes effect and DC2 starts advertising.
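
To verify which state the conditional advertisement is in, a couple of standard show commands are enough (the neighbor address is the same hypothetical one as above):

    show ip bgp 1.1.1.0/30                              <- is the DC1 witness still in the BGP table?
    show ip bgp neighbors 192.0.2.1 advertised-routes   <- the DC2 prefixes show up here only after the trigger fires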





Is this all we need? Definitely not... There are still a lot of things to resolve and we don't have an optimal design (we can debate this: if we are meeting the business requirements, is there anything else to do?). Apart from that, notice that stretching a VLAN is not a good choice. Guess why? You're extending your fault domain, and that doesn't simplify things; it also makes isolation and failure detection more complex. So let's start wondering why we made such poor decisions and why we can't start talking about application-level resiliency, making our life better by using different subnets/networks at each site and handling traffic in and out more flexibly by leveraging existing methods (a long talk about BGP attributes and policy control comes in here).


Some references:

Cisco. (August 2010). Cisco IP Routing. http://www.cisco.com/en/US/tech/tk365/technologies_configuration_example09186a0080094309.shtml












Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Sunday, May 14, 2017

CCIE DC v2 - bootcamp - outline

For those attending my CCIE DC v2 bootcamp next week, here is the updated outline. I will be posting the updated diagram shortly (remember this course is not based on any rack rental, so the interface numbering in the diagrams is arbitrary :) )

Introduction

Exam Considerations / Overview / Strategy


Section 1 – Cisco Data Center Layer 2/Layer 3 Technologies

1.1 – Configure VDC Resources
1.2 – Configure NXOS multicast
1.3 – Understanding VxLAN
1.4 – Configure vPC & Deployment options
1.5 – Configure FEX & Deployment options
1.6 – Configure VxLAN L2/L3 GW (EVPN | F&L)
1.7 – Configure NXOS Security
1.8 – Configure & Troubleshoot Spanning Tree Protocol
1.9 – Configure & Troubleshoot OTV


Section 2 – Cisco Data Center Network Services

2.1 – ACI Service Graph
2.2 – RISE
2.3 – Unmanaged devices in ACI
2.4 – Configure Shared L3 Services


Section 3 – Data Center Storage Networking and Compute

3.1 – Configure FCoE
3.2 – Cisco UCS Connectivity
3.3 – UCS QoS
3.4 – Service Profiles
3.5 – Configure advanced policies
3.6 – Configure Cisco UCS Authentication
3.7 – Configure Call Home Monitoring
3.8 – Troubleshoot SAN Boot
3.9 – UCS Central Basics
3.10 – UCS Central Advanced configuration & tshoot


Section 4 – Data Center Automation and Orchestration

4.1 – Introduction to scripting in Python / cobra SDK
4.2 – Python Programming with ACI Advanced
4.3 – UCS Director Basics
4.4 – UCSD Advanced Workflows Design


Section 5 – ACI

5.1 – Understanding ACI Fabric Policies
5.2 – Understanding ACI Access policies
5.3 – ACI external L3 connectivity in shared resources
5.4 – ACI L2 bridge / L2out
5.5 – ACI VMM integration


















Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "