Living in the underlay

Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Sunday, May 21, 2017

TSHOOT Tips: ELAM Usage on Cisco ACI

I have been using this quite a lot over the past few weeks, and I think it is a good resource to share with everyone playing around with Cisco ACI. When it comes to troubleshooting and understanding packet flow inside the fabric, ELAM is a great tool.

So, what is it?

ELAM stands for Embedded Logic Analyzer Module. It is logic present in the ASICs that allows us to capture and view one or more packets matching a defined rule, out of all the packets traversing the ASIC. ELAM is not new at all: some of you may remember it from the CAT6500, and the same logic is also there on the N7K (for the youngest?).

And... what's new?

Essentially the concept is still the same, and we just need to understand the architecture inside the ASICs on leafs and spines to fully apply it.

The Cisco ASIC data path is divided into ingress and egress pipelines, and two ELAMs are present (see figure) at the beginning of the lookup block.



As we can see in the picture, before we can use ELAM to capture a packet we must be sure that the packet is sent from the BCM ASIC to the Northstar ASIC. ELAM operates only in the Northstar (on leafs; on spines it takes place on Alpine), so any packet that is locally switched in the BCM ASIC will not trigger the ELAM. This is important, since in some scenarios the packet never reaches Northstar and no ELAM event is triggered (we can cover this in a future post about PL-to-PL traffic on the ACI fabric :) ).

So, assuming that our traffic will be processed by Northstar, we need to configure our ELAM instance. First of all, it is good to know which kinds of rules we can configure for each pipeline. These are referred to as "select lines", and the following are available:

Input Select Lines Supported 
3 - Outerl2-outerl3-outerl4
4 - Innerl2-innerl3-innerl4
5 - Outerl2-innerl2 
6 - Outerl3-innerl3
7 - Outerl4-innerl4 

Output Select Lines Supported 
0 - Pktrw 
5 - Sideband
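
As a quick way to reason about which in-select to use, here is a minimal Python sketch that encodes the select lines listed above as a lookup table. The dictionary layout, the group labels ("outer l2", "inner l3", ...) and the helper name pick_in_select are my own assumptions for illustration; only the numbers and names of the select lines come from the list above.

```python
# Select lines exactly as listed above; the tuple labels are illustrative names.
INPUT_SELECT_LINES = {
    3: ("outer l2", "outer l3", "outer l4"),
    4: ("inner l2", "inner l3", "inner l4"),
    5: ("outer l2", "inner l2"),
    6: ("outer l3", "inner l3"),
    7: ("outer l4", "inner l4"),
}
OUTPUT_SELECT_LINES = {0: "pktrw", 5: "sideband"}


def pick_in_select(needed):
    """Return the in-select numbers whose header groups cover every needed group."""
    wanted = set(needed)
    return [
        number
        for number, groups in sorted(INPUT_SELECT_LINES.items())
        if wanted <= set(groups)
    ]


if __name__ == "__main__":
    # Matching only on the outer L2 header: in-select 3 or 5 would both do.
    print(pick_in_select(["outer l2"]))
    # Matching on outer and inner L3 at the same time: only in-select 6.
    print(pick_in_select(["outer l3", "inner l3"]))
```

If the helper returns an empty list, no single select line covers that combination of headers and the match has to be narrowed.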

With this in mind we can configure our ELAM instance. First of all, it is always good to have an image to understand the whole process:


On INIT we choose the ASIC and pipeline in which the capture should take place. CONFIG refers to configuring the rule that matches the packets. ARM is like arming the bomb :) but in this case we arm our packet capture so that it triggers once the rule defined in the CONFIG step has a match. After this we READ the captured data and RESET to start over :)

Now let's dig into the packet capture. We will refer to this topology for the capture.

ELAM Example


This image is taken from a Cisco Live presentation on ELAM, but we will focus on LEAF4 only. Traffic will flow from VM1 to the EP on the right, going toward Northstar (at 1); the same example is also useful to show how this behaves on Alpine.

We will arm the ELAM on Leaf 4 to capture a packet coming from EP1 (the one on the left side, directly connected to Leaf1). In this example we use in-select 3, which means the fields we can match on are outer L2, L3, or L4. We also use out-select 0.

Leaf4# vsh_lc
module-1# debug platform internal ns elam asic 0
module-1(NS-elam)# trigger init ingress in-select ?
3 Outerl2-outerl3-outerl4
4 Innerl2-innerl3-innerl4
5 Outerl2-innerl2
6 Outerl3-innerl3
7 Outerl4-innerl4
module-1(NS-elam)# trigger init ingress in-select 3 out-select 0


This will work for a basic ELAM packet capture. As mentioned, we need to configure (the CONFIG step of the ELAM) one aspect of the trigger to match on. For this example we will use the SMAC of the locally attached endpoint:



module-1(NS-elam-insel3)# set ?
outer Mask and Match By Outer Packet Fields
module-1(NS-elam-insel3)# set outer ?
ipv4 IPv4 Fields
l2 All Layer 2 Fields
l4 L4 Fields
module-1(NS-elam-insel3)# set outer l2 ?
cfi
cntag_vld
cos
dst_mac
hg2
qtag_vld
snap_vld
src_mac
vlan
vntag_dvif
vntag_looped
vntag_pointer
vntag_svif
vntag_vld
module-1(NS-elam-insel3)# set outer l2 src_mac 0050.5665.34bd
!!
!!The start command will arm the ELAM and it will start looking for the first packet that matches the trigger.
!!
module-1(NS-elam-insel3)# start
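
If you find yourself typing this sequence often, the INIT, CONFIG and ARM steps shown above can be generated from a small script. This is only a sketch: the function name elam_commands and the MAC format check are hypothetical helpers of mine, and the command strings are copied verbatim from the session above (in-select 3, out-select 0, match on outer L2 source MAC). It just prints the lines to paste; it does not talk to the switch.

```python
import re

# Cisco dotted MAC notation, as used in the session above (e.g. 0050.5665.34bd).
MAC_RE = re.compile(r"[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}")


def elam_commands(src_mac, asic=0):
    """Return the CLI lines for INIT, CONFIG and ARM, in the order used above."""
    mac = src_mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"expected a MAC like 0050.5665.34bd, got {src_mac!r}")
    return [
        "vsh_lc",
        f"debug platform internal ns elam asic {asic}",    # INIT: pick the ASIC
        "trigger init ingress in-select 3 out-select 0",   # INIT: pipeline and select lines
        f"set outer l2 src_mac {mac}",                     # CONFIG: the match rule
        "start",                                           # ARM
    ]


if __name__ == "__main__":
    for line in elam_commands("0050.5665.34bd"):
        print(line)
```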


To see the ELAM state, the status command can be used. Essentially there are three possible states:
- Triggered: indicates that a packet has been detected as matching the trigger, and that packet is available for analysis. 
- Armed: no packet has been detected as matching the trigger yet, and ELAM is actively looking at packets for a match.
- Initialized: the ELAM is available for triggers to be configured, or to be armed with the start command. It is not currently attempting to capture a matched packet. 
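
To keep the three states straight, here is a minimal Python sketch of the lifecycle as a tiny state machine. The event names (start, match, reset) are labels I made up for illustration, and the assumption that reset also takes an Armed ELAM back to Initialized is mine; the post itself only describes reset after reading a triggered capture.

```python
from enum import Enum


class ElamState(Enum):
    INITIALIZED = "Initialized"
    ARMED = "Armed"
    TRIGGERED = "Triggered"


# (current state, event) -> next state
TRANSITIONS = {
    (ElamState.INITIALIZED, "start"): ElamState.ARMED,    # ARM
    (ElamState.ARMED, "match"): ElamState.TRIGGERED,      # a packet hit the rule
    (ElamState.TRIGGERED, "reset"): ElamState.INITIALIZED,
    (ElamState.ARMED, "reset"): ElamState.INITIALIZED,    # assumption, see above
}


def next_state(state, event):
    """Return the state after the event; unknown combinations leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = ElamState.INITIALIZED
    for event in ("start", "match", "reset"):
        state = next_state(state, event)
        print(event, "->", state.value)
```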

Once the ELAM is triggered, the packet can be viewed for analysis with the report command. The report shows the relevant header fields of the packet (note that it does not show the complete payload). Once this is done, we can restart the process with the reset command.


module-1(NS-elam-insel3)# report
...
module-1(NS-elam-insel3)# report | egrep ce_|ar_|drop|hg2_src
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] ce_da: FFFFFFFFFFFF
GBL_C++: [INFO] ce_sa: 0050566534BD
GBL_C++: [INFO] ce_etype: 0806
GBL_C++: [INFO] ar_sha: 0050566534BD
GBL_C++: [INFO] ar_spa: 0A108030
GBL_C++: [INFO] ar_tha: 000000000000
GBL_C++: [INFO] ar_tpa: 0A108001
GBL_C++: [INFO] ar_spare: 0000000000000000000000000000
GBL_C++: [MSG] - pktrw is complete
GBL_C++: [INFO] drop: 0
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] hg2_vid_lo: 63
GBL_C++: [INFO] vlan0: 063
GBL_C++: [INFO] adj_index: 000C
GBL_C++: [INFO] ol_encap_idx: 2FF6
GBL_C++: [INFO] ol_ttl: 08
GBL_C++: [INFO] ol_segid: 2A8001
GBL_C++: [INFO] sclass: C005
GBL_C++: [INFO] sup_redirect: 0
GBL_C++: [INFO] mcast: 0
!!
!! Once done, we need to reset the ELAM to start the process again
!!
module-1(NS-elam-insel3)# reset
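
The report prints everything in hex, which is not very friendly to read. Below is a small Python sketch that pulls the "[INFO] name: value" pairs out of report text and converts a few of them. The helper names are hypothetical, and I am assuming that ar_spa / ar_tpa are the ARP sender and target protocol addresses (ce_etype 0806 is the ARP EtherType) and that ce_sa is the source MAC; that matches the capture above but is my reading of the field names.

```python
import ipaddress
import re

# Matches lines such as "GBL_C++: [INFO] ar_spa: 0A108030"
LINE_RE = re.compile(r"\[INFO\]\s+(\w+):\s+([0-9A-Fa-f]+)\s*$")


def parse_report(text):
    """Return {field: hex string}; the first occurrence of a field wins."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            fields.setdefault(match.group(1), match.group(2))
    return fields


def hex_to_ipv4(value):
    """Convert an 8-digit hex string into dotted-decimal IPv4."""
    return str(ipaddress.IPv4Address(int(value, 16)))


def hex_to_mac(value):
    """Convert a 12-digit hex string into Cisco dotted MAC notation."""
    digits = value.lower().zfill(12)
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


SAMPLE = """\
GBL_C++: [INFO] ce_sa: 0050566534BD
GBL_C++: [INFO] ce_etype: 0806
GBL_C++: [INFO] ar_spa: 0A108030
GBL_C++: [INFO] ar_tpa: 0A108001
GBL_C++: [MSG] - pktrw is complete
"""

if __name__ == "__main__":
    fields = parse_report(SAMPLE)
    print(hex_to_mac(fields["ce_sa"]))    # 0050.5665.34bd
    print(hex_to_ipv4(fields["ar_spa"]))  # 10.16.128.48
    print(hex_to_ipv4(fields["ar_tpa"]))  # 10.16.128.1
```

The decoded source MAC is the same 0050.5665.34bd we configured as the trigger, which is a quick sanity check that the capture is the packet we were after.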

This is pretty much all you need for a good start with ELAM on ACI. More info is available in the N9K config guide, and another good resource is the Cisco Live session BRKACI-2102, from which I took some of the images for this post.

Hope you enjoyed it, and next time maybe I will find some time to start the amazing post on PL-to-PL traffic on the ACI fabric.












Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Thursday, May 18, 2017

Stretched DC, really?... ok, for L3, BGP conditional forwarding

A long time ago (I think it was years back) I was reviewing a DR solution for an internal customer who had two datacenters and a DCI between them (dark fiber). They had initially moved to a stretched design, extending VLANs between the sites and using the L3 gateway on only one side at a time, since a business requirement stated that traffic should always leave from the primary DC. However, they were expecting some kind of solution that could automatically switch over to the secondary DC in case of a failure on DC1.

For these cases it's always a pleasure to read Ivan and see how he predicts the design issues that I will face in the future (Stretched DCI). Luckily, no stateful firewalls were involved here.

The main issue was not only detecting which side is alive (which is not easy without a witness, and we didn't have one at all) but also deciding which traffic should be served, and from where.

So here is a big stop. Before going any further we need to make some assumptions and business decisions:

  • If the DC1 site fails but the DCI and the DC2 site are alive, traffic will enter from the DC2 side and traverse the DCI.
  • If the DCI fails, traffic will continue to be served from DC1 for the stretched VLAN subnets. This implies moving those servers to the surviving side by some other method, or at least shutting them down.
  • If the DC2 site fails but the DCI and the DC1 site are alive, traffic will enter from the DC1 side and traverse the DCI to reach the DC2-side servers.
  • Traffic should leave and enter through DC1 whenever possible, and the DC2 site should not be used unless strictly necessary (this was imposed by the customer).
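
The rules above boil down to a very small decision table. Here is a sketch of it in Python; the function name entry_site and the boolean inputs are mine, and the combination "DC1 down and DCI down" is not covered by the rules above, so the sketch deliberately returns None for it instead of guessing.

```python
def entry_site(dc1_up, dc2_up, dci_up):
    """Return the site where traffic should enter, or None when the rules do not say."""
    if dc1_up:
        # DC1 is preferred whenever possible, including when the DCI or DC2 is down.
        return "DC1"
    if dc2_up and dci_up:
        # DC1 failed, DC2 and the DCI are alive: enter from DC2.
        return "DC2"
    # Not covered by the business rules listed above.
    return None


if __name__ == "__main__":
    print(entry_site(dc1_up=True, dc2_up=True, dci_up=True))    # normal operation
    print(entry_site(dc1_up=False, dc2_up=True, dci_up=True))   # DC1 failure
    print(entry_site(dc1_up=True, dc2_up=True, dci_up=False))   # DCI failure
```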

So after reviewing a lot of options, and accepting that we can eventually fail and will have to work around that (and the fact that we need to do a stretched cluster after all), we came across a nice BGP feature called conditional advertisement (the "conditional forwarding" of the title).
Just for your reference, BGP conditional advertisement allows us to advertise a given network depending on whether another prefix is present in our BGP table. This is really useful for this scenario: we define a witness network on each side and advertise it to the other side. It should be a dummy network, like 1.1.1.0/30 for DC1 and 1.1.2.0/30 for DC2. The match statement checks whether we are receiving this network advertisement and, based on that, either withdraws our advertisement or lets it flow.
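
The logic is simple enough to model in a few lines. The sketch below is plain Python, not router code: the function name should_advertise and the list-of-strings "BGP table" are assumptions for illustration, and it only mimics the non-exist behaviour described above (advertise when the witness prefix is absent, withdraw when it is present).

```python
import ipaddress


def should_advertise(bgp_table, witness_prefixes):
    """Mimic a non-exist condition: advertise only if no witness prefix is in the table."""
    table = {ipaddress.ip_network(prefix) for prefix in bgp_table}
    return not any(ipaddress.ip_network(w) in table for w in witness_prefixes)


if __name__ == "__main__":
    witness = ["1.1.1.0/30"]  # the DC1 witness network, as seen from DC2

    # DC1 is healthy: the witness is received, so DC2 keeps its routes withdrawn.
    print(should_advertise(["0.0.0.0/0", "1.1.1.0/30"], witness))  # False

    # DC1 failed: the witness disappears, so DC2 starts advertising.
    print(should_advertise(["0.0.0.0/0"], witness))  # True
```

Note that the match is on the exact prefix, which is why a small dedicated dummy network works well as a witness.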

OK, enough reading. Let's have a quick look at the configuration (on NXOS) and the behaviour:



Here is the config for the eBGP side of DC2:

router bgp 65533
bgp log-neighbor-changes
neighbor 19.21.54.245 remote-as 666
neighbor 19.21.54.245 update-source Vlan30
!
neighbor 19.21.54.245 activate
neighbor 19.21.54.245 advertise-map ADV-MAP non-exist-map NON-EXIST
neighbor 19.21.54.245 next-hop-self
neighbor 19.21.54.245 soft-reconfiguration inbound
neighbor 19.21.54.245 route-map SOME_RANGE_ONLY_AT_DC2 out
!
route-map PUBLICAS-L3 permit 10
match ip address prefix-list PUBLICAS-L3
!
route-map NON-EXIST permit 10
match ip address prefix-list NON-EXIST
!
route-map ADV-MAP permit 10
match ip address prefix-list ADV-MAP
!
!# ADV-MAP: these are the routes that will be advertised when the non-exist condition is met (the witness route is absent).
ip prefix-list ADV-MAP seq 5 permit 201.212.14.128/26
ip prefix-list ADV-MAP seq 10 permit 201.212.14.0/24
ip prefix-list ADV-MAP seq 15 permit 201.212.15.0/24
!
!# NON-EXIST: the presence of these networks triggers the withdrawal of the ADV-MAP routes.
ip prefix-list NON-EXIST seq 5 permit 0.0.0.0/32
ip prefix-list NON-EXIST seq 10 permit 1.1.1.0/30
!
ip prefix-list SOME_RANGE_ONLY_AT_DC2 seq 5 permit 200.1.33.80/28
!

Based on that, under normal conditions things look like this (the routes are withdrawn):
dc1-side# sh ip bgp summary
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.120.32.240, local AS number 65533
BGP table version is 276, IPv4 Unicast config peers 3, capable peers 3
49 network entries and 84 paths using 7756 bytes of memory
BGP attribute entries [5/640], BGP AS path entries [1/10]
BGP community entries [0/0], BGP clusterlist entries [0/0]
42 received paths for inbound soft reconfiguration
41 identical, 1 modified, 0 filtered received paths using 8 bytes
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.x.143.5      4    xx  296153  296254
10.12.32.2      4 65533  296111  296115
10.33.32.21     4 65533   10212   10280      276    0    0 1w0d     5
N7K-1-BORDER_VDC# sh ip bgp neighbors 10.120.32.248 advertised-routes
Peer 10.120.32.248 routes for address family IPv4 Unicast:
BGP table version is 276, local router ID is 10.120.32.240
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath
   Network            Next Hop            Metric     LocPrf     Weight Path
*>e0.0.0.0/0          10.110.143.5                      150          0 xx 3549 i
*>l1.1.1.0/30         0.0.0.0                           100      32768 i // Trigger route injected
   Network            Next Hop            Metric     LocPrf     Weight Path
                      10.110.143.1                      150          0 666 354
                      0.0.0.0                           100      32768 i // Trigger route injected
# !!!!!!!!!!!!!!
dc2-side#sh ip bgp summary
BGP router identifier 10.120.32.74, local AS number 65533
BGP table version is 129, main routing table version 129
8 network entries using 936 bytes of memory
11 path entries using 572 bytes of memory
5/3 BGP path/bestpath attribute entries using 800 bytes of memory
2 BGP AS-PATH entries using 48 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2356 total bytes of memory
BGP activity 44/36 prefixes, 85/74 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.x.32.2       4 65533      xy   15496      129    0    0 1w3d     2
10.12.32.1      4 65533      xu   15494      129    0    0 1w3d     2
19.21.54.245    4  3549      xn   11109      129    0    0 1w0d     1
dc2-side#sh ip bgp neighbors 10.y.6.12 received-routes
BGP table version is 129, local router ID is 10.120.32.74
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i0.0.0.0          10.120.32.240                 150      0 xxx 3549 i
*>i1.1.1.0/30       10.1.2.240                    100      0 i // Trigger route received
Total number of prefixes 2
dc2-side#sh ip bgp neighbors 10.120.6.12 received-routes
BGP table version is 129, local router ID is 10.120.32.74
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i0.0.0.0          10.120.32.241                 150      0 xxx 3549 i
* i1.1.1.0/30       10.120.32.241                 100      0 i // Trigger route received
Total number of prefixes 2
###
### Important verification here, check at the end
###
dc2-side#sh ip bgp neighbors 190.216.54.245
BGP neighbor is 190.216.54.245, remote AS 3549, external link
BGP version 4, remote router ID 67.17.82.239
BGP state = Established, up for 1w0d
Last read 00:00:38, last write 00:00:56, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(new)
Four-octets ASN Capability: advertised and received
Address family IPv4 Unicast: advertised and received
Message statistics:
InQ depth is 0
OutQ depth is 0
                    Sent       Rcvd
Opens:                 2          2
Notifications:         0          0
Updates:              13      74085
Keepalives:        11093      12291
Route Refresh:         1          0
Total:             11109      86378
Default minimum time between advertisement runs is 30 seconds
For address family: IPv4 Unicast
BGP table version 129, neighbor version 129/0
Output queue size : 0
Index 2, Offset 0, Mask 0x4
2 update-group member
Inbound soft reconfiguration allowed
NEXT_HOP is always this router
Outbound path policy configured
Route map for outgoing advertisements is SOME_PUBLIC_SUBNETS_AT_DC2_SIDE
###
Condition-map NON-EXIST, Advertise-map ADV-MAP, status: Withdraw


Now, if there is a failure on the DC1 side, the condition is triggered and DC2 starts advertising the routes.



dc2-site#sh ip bgp 1.1.1.0/30
% Network not in table //Trigger route NOT received
VSS-GC-L3#sh ip bgp summary
BGP router identifier 10.120.32.74, local AS number 65533
BGP table version is 134, main routing table version 134
7 network entries using 819 bytes of memory
9 path entries using 468 bytes of memory
4/2 BGP path/bestpath attribute entries using 640 bytes of memory
2 BGP AS-PATH entries using 48 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1975 total bytes of memory
BGP activity 44/37 prefixes, 85/76 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
19.21.54.245    4   xys   86397   11126      131    0    0 1w0d     1 //Only L3 peer is alive
dc2-site#sh ip bgp neighbors 19.21.54.245
BGP neighbor is 19.21.54.245, remote AS 3549, external link
BGP version 4, remote router ID 67.17.82.239
BGP state = Established, up for 1w0d
Last read 00:00:24, last write 00:00:40, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(new)
Four-octets ASN Capability: advertised and received
Address family IPv4 Unicast: advertised and received
Message statistics:
InQ depth is 0
OutQ depth is 0
                    Sent       Rcvd
Opens:                 2          2
Notifications:         0          0
Updates:              13      74085
Keepalives:        11110      12310
Route Refresh:         1          0
Total:             11126      86397
Default minimum time between advertisement runs is 30 seconds
For address family: IPv4 Unicast
BGP table version 134, neighbor version 134/0
Output queue size : 0
Index 2, Offset 0, Mask 0x4
2 update-group member
Inbound soft reconfiguration allowed
NEXT_HOP is always this router
Outbound path policy configured
Route map for outgoing advertisements is SOME_PUBLIC_SUBNETS_ON_DC2
Condition-map NON-EXIST, Advertise-map ADV-MAP, status: Advertise //Routes in advertise-map ADV-MAP are being advertised.
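
The single line worth watching in both outputs is the Condition-map one, since its status flips between Withdraw and Advertise. If you want to keep an eye on it from a script, here is a minimal Python sketch that extracts it from saved "sh ip bgp neighbors" text. The function name conditional_status is hypothetical and how you collect the text from the device is left out on purpose; the regular expression is only based on the two status lines shown in this post.

```python
import re

STATUS_RE = re.compile(
    r"Condition-map\s+(\S+),\s+Advertise-map\s+(\S+),\s+status:\s+(\w+)"
)


def conditional_status(show_output):
    """Return the condition-map, advertise-map and status found in the text, or None."""
    match = STATUS_RE.search(show_output)
    if match is None:
        return None
    return {
        "condition_map": match.group(1),
        "advertise_map": match.group(2),
        "status": match.group(3),
    }


if __name__ == "__main__":
    normal = "Condition-map NON-EXIST, Advertise-map ADV-MAP, status: Withdraw"
    failure = "Condition-map NON-EXIST, Advertise-map ADV-MAP, status: Advertise //Routes in advertise-map"
    print(conditional_status(normal)["status"])   # Withdraw
    print(conditional_status(failure)["status"])  # Advertise
```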


Is this all we need? Definitely not... There are still a lot of things to resolve, and we don't have an optimal design (we can discuss this: if we are meeting the business requirements, is there anything else to do?). Apart from that, notice that stretching a VLAN is not a good choice. Guess why? You are extending your fault domain, and that doesn't simplify things; it also makes isolation and detection more complex. So let's start wondering why we made such poor decisions, and why we can't start talking about application-level resiliency instead. That would make our lives better by allowing us to use different subnets/networks at each site and to handle inbound and outbound traffic more flexibly by leveraging existing methods (a long talk about BGP attributes and policy control enters here).


Some references:

Cisco. (August 2010). Cisco IP Routing. http://www.cisco.com/en/US/tech/tk365/technologies_configuration_example09186a0080094309.shtml












Article By: Ariel Liguori


Sunday, May 14, 2017

CCIE DC v2 - bootcamp - outline

For those attending my CCIE DC v2 bootcamp next week, here is the updated outline. I will be posting an updated diagram shortly (remember this course is not based on any rack rental, so interface numbering depends on that :) )

Introduction

Exam Considerations / Overview / Strategy


Section 1 – Cisco Data Center Layer 2/Layer 3 Technologies

1.1 – Configure VDC Resources
1.2 – Configure NXOS multicast
1.3 – Understanding VxLAN
1.4 – Configure vPC & Deployment options
1.5 – Configure FEX & Deployment options
1.6 – Configure VxLAN L2/L3 GW (EVPN | F&L)
1.7 – Configure NXOS Security
1.8 – Configure & Troubleshoot Spanning Tree Protocol
1.9 – Configure & Troubleshoot OTV


Section 2 – Cisco Data Center Network Services

2.1 – ACI Service Graph
2.2 – RISE
2.3 – Unmanaged devices in ACI
2.4 – Configure Shared L3 Services


Section 3 – Data Center Storage Networking and Compute

3.1 – Configure FCoE
3.2 – Cisco UCS Connectivity
3.3 – UCS QoS
3.4 – Service Profiles
3.5 – Configure advanced policies
3.6 – Configure Cisco UCS Authentication
3.7 – Configure Call Home Monitoring
3.8 – Troubleshoot SAN Boot
3.9 – UCS Central Basics
3.10 – UCS Central Advanced configuration & tshoot


Section 4 – Data Center Automation and Orchestration

4.1 – Introduction to scripting in Python / cobra SDK
4.2 – Python Programming with ACI Advanced
4.3 – UCS Director Basics
4.4 – UCSD Advanced Workflows Design


Section 5 – ACI

5.1 – Understanding ACI Fabric Policies
5.2 – Understanding ACI Access policies
5.3 – ACI external L3 connectivity in shared resources
5.4 – ACI L2 bridge / L2out
5.5 – ACI VMM integration


















Article By: Ariel Liguori
