Living in the underlay

Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Friday, March 31, 2017

IE-Bootcamps - Launching a new training experience


Finally, after a lot of hard work by the CCIE HOME team and me, we have created ie-bootcamps, the first expert-level training company based in Latin America that will deliver courses and bootcamps for the most challenging tracks. We plan to cover the Americas completely, and as a starting point we introduce the CCIE DC v2 Lab Bootcamp, which will be held in Buenos Aires, Argentina, May 22nd-26th. I will be delivering the course, so you are all invited :)

More info at: http://ie-bootcamps.com/course/ccie-dc-v2-0-lab-bootcamp/
Or reach me directly

-----

Finally, after long work by the CCIE HOME team and myself, we have given birth to the first company dedicated to delivering expert-level training in Spanish: ie-bootcamps.
As a first step, we have started coordinating the CCIE DC v2.0 bootcamp, to be held in Buenos Aires, Argentina, May 22nd to 26th.




Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

Wednesday, March 22, 2017

Testing boundaries - thoughts before starting

As we discussed in our previous post about what truly lies behind the business needs for an AFA box versus the vendor hype we face, let's assume we have just obtained our technical requirements and we are facing the task of stress-testing one of these boxes.

For any performance test there are several conditions that must always be present and considered. These are my initial thoughts; I hope you find them useful.

  • Create "significant" traffic: this means not only stressing performance but using traffic patterns that are representative for you (i.e. workloads similar to the environment in which you are going to put the device under test).
  • Don't forget to measure: an important and also tedious part of testing is taking notes and writing down the results, so always plan your performance scenarios with data export in mind, so you can easily record or graph the results.
  • Test boundaries: if a device is claimed to reach X performance, test it. Let's say X is 100K IOPS @8k block size, 70/30 (r/w); you have to find a way to reach that performance in your infrastructure (generate those workloads). There are several considerations here too, e.g. running that workload on a single thread is not the same as running it on multiple threads, which is a much more realistic approach; we will dig into this in a specific post about the AFA testing procedure.
  • Tune the environment: this can also be called environment set-up. Make sure your underlay infrastructure follows best practices and has no issues, so that the test you're running is not affected by any factor other than the test procedure itself.
  • Automate as much as you can: testing can be tough; imagine having to re-test after changing a few parameters, applying new versions, etc... impossible. So take an automated approach to set up your tests, and even run them and plot the results in a fancy way.
  • Understand what you're doing: testing is not about running a workload and seeing whether performance is good or not, or at least it shouldn't be. The whole purpose of a testing procedure is to understand how the device under test reacts under stress conditions and under normal conditions similar to real ones, and also to notice how it works internally and how it behaves under changes... which leads me to the next bullet.
  • Resiliency: so you have tested and all seems perfect, performance is outstanding and testing is going well... but have you tested how the device behaves under unexpected and planned changes? Resiliency is key in production environments, since it not only gives you an overview of how high availability is handled (which is important, above all for production) but also of how the system reacts to these changes (you can easily be surprised by well-known vendors running in panic mode after switchovers).
  • Plan the tests accordingly: if you're running a PoC or a performance test you will do a lot of preparation and environment setup; this can involve changes to the physical network to test HA, clusters, or other functionality. You will lose a lot of time on repeated changes, so it is really important to order the test plan accordingly: make the minimal changes necessary, and with every change do the maximum number of tasks before the next change. This will save you a lot of time.
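The measurement and automation bullets above can be sketched with a tiny harness. This is a hedged illustration using fio as the load generator (not any specific vendor tool); the device path, job name, and parameters are placeholders you would adapt to your own environment:

```python
import json

def fio_cmd(name, device, bs="8k", read_pct=70, iodepth=32, jobs=4, runtime=300):
    """Build an fio command line for a random-I/O test with a given mix.

    read_pct=70 means a 70/30 read/write mix (fio's --rwmixread)."""
    return (
        f"fio --name={name} --filename={device} --rw=randrw "
        f"--rwmixread={read_pct} --bs={bs} --iodepth={iodepth} "
        f"--numjobs={jobs} --runtime={runtime} --time_based "
        f"--direct=1 --group_reporting --output-format=json"
    )

def total_iops(fio_json):
    """Sum read + write IOPS from fio's JSON output (--output-format=json)."""
    data = json.loads(fio_json) if isinstance(fio_json, str) else fio_json
    return sum(j["read"]["iops"] + j["write"]["iops"] for j in data["jobs"])
```

Building the command as a string means every run's parameters get logged next to its JSON results, which makes the "don't forget to measure" bullet almost free.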







Sunday, March 19, 2017

All Flash Array: vendor hype vs business needs

Last month I was involved in a project to test the performance of several All Flash Array boxes, let's call them AFAs, since we expect to start delivering a new class of services (and SLAs?) to customers.

Being in an R&D team gives you the opportunity to know the whole story: you start with the sales pitch, then the pre-sales *not-so-technical* one, and in the end you break a box in a PoC and end up with real engineers who explain the details of their architecture and why they don't support what you just tested (but it was in the sales pitch, right?)

What I do want to remark on here is the relation between vendor hype and business needs. An AFA can bring you outstanding performance in terms of IOPS, latency, compression, and so on... but the point is: what do you really need? In any architecture design you are supposed to deliver a solution that meets technical requirements plus business requirements/needs. With AFAs this can be quite tricky, since if you don't have a clear understanding of your business needs you're going to be pushed into an unfair or imprecise comparison.

So, what about technical requirements?

I've faced two kinds of scenarios for AFA deployments. The easier one is deploying new infrastructure aimed at fulfilling specific requirements for an application suite; this is always the best-case scenario for an architect, since you can easily gather technical requirements, including not only current requirements but also a forecast for upcoming demand. As you may know, it's not all pink elephants, and the other common scenario is moving a current deployment to an AFA solution.
In the latter case you have to take a huge number of considerations into account in order to plan accordingly. I can summarize the following:

  • Amount of IOPS: you see this a lot, and on its own it is completely wrong. What is 1K IOPS? At which block size? At which read/write ratio?
  • Amount of IOPS based on a given IO distribution, including the block size and read/write ratio of each component. Getting these numbers can be significantly hard. Most current arrays keep information about all the workloads they have run, which is great for getting the IO distribution per block size plus rd/wr ratios, but it assumes you are running similar workloads all day long (i.e. a consistent distribution of a given set like 30K IOPS @4k 65/35, 15K IOPS @8k 60/40... and so on. But what about the nightly jobs? What happens to your performance during backup jobs?)
  • Expected and maximum latency: based on application + OS/guest OS needs
  • Expected compression ratio (for your data set!)
  • HA and expected performance in contingency
  • Network based (NFS), IP based (iSCSI) or FC?
  • Disk replacement policy and MTBF/MTTF


Ok, but where is the vendor hype?

Well, the quick answer is: everywhere you get a sales/pre-sales engineer talking. But to stay on topic, I've found solutions that claim 100K IOPS @32K block size in a single box... the easy math here is to ask them how many IOPS they support at 4/8K if they reach that at 32K, and also how much bandwidth they expect.
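The easy math can be sketched in a couple of lines. The 100K @32K figure comes from the claim above; everything else is plain arithmetic showing why a big-block IOPS number is really a bandwidth number in disguise:

```python
def mb_per_s(iops, block_kb):
    """Throughput in MB/s implied by an IOPS figure at a given block size."""
    return iops * block_kb / 1024.0

# The claimed 100K IOPS @32K works out to 3125.0 MB/s (~3.1 GB/s) --
# a throughput figure, not a small-block IOPS figure.
claimed = mb_per_s(100_000, 32)

# The same 100K IOPS at an 8K block size is only 781.25 MB/s, so if the
# box is bandwidth-bound the small-block IOPS ceiling is the real question.
small_bs = mb_per_s(100_000, 8)

print(claimed, small_bs)
```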


How can I test performance and avoid being seduced by pink elephants?

When we began the testing process we aimed for a huge number of VMs, each with an RDM to the array, but later we figured out that there is no need to go that far: with a few VMs, each with a lot of disks, you can easily set up a quick test.
For the testing procedure and its considerations, IDC has written a good recap in their "All-Flash Array Performance Testing Framework", and there is also a tool made by EMC engineers for testing arrays called the "AFA PoC Toolkit", which sets up a few VMs on a VMware host, connects the host to the array's LUNs, then creates the RDMs and sets up VDBench on the VMs.
I recommend this approach, changing the parameters in the VDB files to meet your requirements. One caveat with that toolkit is that it only runs against EMC XtremIO; I've made several changes to be able to run it against PureStorage boxes, and I'm working on changes to test against SolidFire too. In a later post we will discuss design considerations for an AFA platform, plus the testing tools and results for each vendor (we're testing PureStorage, EMC XtremIO, NetApp SolidFire).
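For a feel of what "changing the parameters in the VDB files" means, here is a minimal hedged sketch of a VDBench parameter file for the 100K IOPS @8k 70/30 profile discussed earlier (the LUN path and thread count are placeholders, not from the toolkit):

```
* Hypothetical VDBench parameter file: 8k random I/O, 70/30 read/write
sd=sd1,lun=/dev/sdb,openflags=o_direct
wd=wd1,sd=sd1,xfersize=8k,rdpct=70,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=600,interval=5,threads=32
```

The `sd` line names the storage device, `wd` the workload shape, and `rd` the run itself; raising `threads` is how you move from the single-thread to the multi-thread case mentioned in the previous post.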







Tuesday, March 14, 2017

Worth Reading: Verizon SDN-NFV Reference Architecture

Last week I found this reference architecture and just finished reading it. It's huge, but it is really worth it (especially for me, as I'm working on an SDN/NFV reference architecture myself).

http://innovation.verizon.com/content/dam/vic/PDF/Verizon_SDN-NFV_Reference_Architecture.pdf

Enjoy!


Thursday, March 9, 2017

VxLAN Deep Dive Part III: Flood and Learn



Rainy day ideal for continuing with the series of VxLAN :)
  • Part 1: Let's overlay -- Basic info about VXLAN, addressing and headers.
  • Part 2: It's all about knowledge -- Packet forwarding overview, VTEP control plane learning options
  • Part 3: Hands On #1 -- Configuration on Cisco Nexus Devices, Flood and Learn.
  • Part 4: Hands On #2 -- Configuration on Cisco Nexus Devices,  EVPN.

Today we will focus on config, the most fun part of any IE track. In parts 1 & 2 we covered the fundamentals: now we understand how VXLAN works, how many addresses we can get, and the different options for advertising MAC/IP information to peers. In this post we will start with Flood & Learn; I chose this one not only because it was the first to be adopted but also because it is the most *poorly* documented on the web.

Note: if you want to find VXLAN config info for NX-OS, I encourage you to look under the N9K docs, since the latest 7K release behaves the same and under the 7K docs you will not find anything :)

Recommended reading: I've also mentioned this in the prep for the DC IE track, but if you haven't had the chance, I really recommend the Cisco Live presentations; in this case BRKDCT-2404: VXLAN Deployment Models.

Let's use this topology:

First we will define each component in the network:
H1 / H2 are hosts in different VLANs, for this example VLAN 101 and 102.
L1 - L4 are Nexus 5Ks running as VXLAN L2 gateways.
L5 - L6 are Nexus 7Ks running as VXLAN L3 gateways.
S1 / S2 form the underlay L3 core between all the nodes, with the L3 cloud also supporting multicast.


IGP + Multicast cloud

First things first: we need basic IGP reachability between our nodes, plus multicast reachability. For multicast we are going to use static mapping for the RP and, to get redundancy, we will deploy Phantom RP. Of course you can use any other method you want but, for the purposes of this example, static mapping is the simplest option. Here is the snippet of config on each device:


L1 - L6



S1 / S2
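The config screenshots from the original post are not reproduced here. As a rough sketch of what they would contain (all addresses and the group range below are assumptions for illustration; Phantom RP works by using an RP address that lives on no device, with the peer advertising the longest prefix covering it winning, and is typically paired with PIM BiDir):

```
! L1 - L6 (and S1/S2): static RP mapping to the phantom address
feature pim
ip pim rp-address 10.99.99.1 group-list 239.1.0.0/16 bidir

! S1: primary -- longer prefix covering the phantom RP 10.99.99.1
interface loopback99
  ip address 10.99.99.2/30
  ip pim sparse-mode

! S2: backup -- shorter prefix, used only if S1's /30 disappears
interface loopback99
  ip address 10.99.99.2/29
  ip pim sparse-mode
```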



The IGP config is also omitted here, since you can run whatever you want (static routing too, right? yes! but a lot of work); in our case we simply set up OSPF in area 0 with point-to-point interfaces on each link. The only real consideration is the MTU: as you may recall from a previous post in this series, you need to tune up the MTU to be able to carry a VXLAN packet inside. You can do the math yourself (remember VXLAN adds an 8-byte header, roughly 50 bytes of total overhead with the outer Ethernet/IP/UDP headers), but let's assume anything at or above 1554 (1600 to be safe) will be OK.


Just a minor note on Naming conventions

Before taking any further step we need to clarify some naming conventions that Cisco has imposed on us :)

If you recall correctly, each host in a VXLAN network has a VTEP, which establishes the tunnel and is, in the end, responsible for taking encapsulated packets in and out of our VXLAN network... Well, for Cisco it is a bit different: since they don't have any hosts messing around with VXLAN, for them the VTEP is the device responsible for the encap/decap process. But if the VTEP is the device, how do I configure the interface on that device where this magic occurs? Here comes the NVE (Network Virtual Interface), the logical interface where the aforementioned encap/decap magic actually takes place.


VXLAN L2 Gateway

Well, here the fun begins, at L2 of course. I've stressed the concept of the VXLAN L2 GW because it is so important to understand how it works and what it does; the L2 keyword is key to understanding this.
Let's assume you are running a legacy network with VLANs in place and you want to integrate it into a new design with VXLAN. Of course you want not just inter-routing between them, but also to place some hosts in your new network in the same segment as the legacy one. Nice scenario... but how can we accomplish this?

Essentially what we need is a device capable of taking L2 frames (let's say tagged, since I stated that we have VLANs in place) and putting them in the same bridge domain as the VXLAN traffic. In other words, what we want to accomplish is to put VLAN traffic into the VXLAN VNI associated with that traffic, and to let the two flood and interact with each other.

Here I will post two suggested config options: one at the interface level (i.e. config directly on the interface that receives the tagged frames) and one at the switch level (which is what we'll use on our 5Ks). Cisco refers to these as VSI (VN-Segment service instance) mode and VLAN mode.

VSI CLI Mode (L1/L2)

For this config, let's assume the L1/L2 leaves receive frames from the H1/H2 ports on a trunk interface, and what we want to accomplish is the VLAN-to-VNI mapping.
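The VSI-mode screenshot is missing from this copy of the post; here is a hedged sketch of the idea (interface, VLAN and VNI numbers are assumptions):

```
! Map dot1q 101 arriving on Eth1/1 into VNI 10101 (service instance mode)
encapsulation profile vni VLAN101-TO-VNI
  dot1q 101 vni 10101

interface Ethernet1/1
  no switchport
  service instance 1 vni
    encapsulation profile VLAN101-TO-VNI default
    no shutdown
```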




VLAN CLI Mode

This mode is much easier, because all we do is create an association between a VLAN and a vn-segment.
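Again the original screenshot is not reproduced; a hedged sketch of VLAN mode on the 5Ks, together with the NVE interface that sources the tunnels (VNI numbers, loopback, and multicast groups are assumptions):

```
feature vn-segment-vlan-based

! Global VLAN-to-VNI association
vlan 101
  vn-segment 10101
vlan 102
  vn-segment 10102

! Flood-and-learn NVE interface: BUM traffic rides a multicast group per VNI
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10101 mcast-group 239.1.1.101
  member vni 10102 mcast-group 239.1.1.102
```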






VXLAN L3 Gateway

So we have covered full L2 reachability between VXLAN endpoints in the same bridge domain, but what about inter-VXLAN routing? It seems pretty obvious that we need to add a gateway with interfaces in the VXLAN segments of interest to route among them. Just as we have the SVI, we also have the BDI (Bridge Domain Interface). We will associate that BDI with our VNI and assign addressing to it, so it can act as the VXLAN gateway for that segment and also route outside (including to other VXLAN segments, if we have any).


If we dig into the config we can see it is pretty straightforward, since the magic has already happened: we have already done the hardest part, the BD-to-VNI or VLAN-to-VNI association, so the only thing left is to create the associated interface (SVI in VLAN mode or BDI in VSI mode) with the addressing, and that's all:
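The screenshot is again missing here; a hedged sketch of what an L3 gateway SVI would look like in VLAN mode, matching the two notes below (VRF name and addressing are assumptions):

```
! Tenant VRF plus the gateway SVI for VNI 10101 / VLAN 101
feature interface-vlan

vrf context TENANT-A

interface Vlan101
  vrf member TENANT-A
  ip address 10.1.101.1/24
  ip pim sparse-mode
  no shutdown
```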



Note #1: see, the only "new" thing here is that we need to run PIM sparse mode. Can you guess why? Ping me!
Note #2: yes, I'm using a VRF, since it is also best practice to run tenant traffic in a separate VRF; this is pretty common in real-life deployments (not flood and learn, I mean VRF isolation :) )

Now let's consider some design caveats, and the first thing on everyone's mind here is HA, so...

Redundancy? VXLAN L3 GW + HSRP? What about vPC?

Yes, yes, and also yes. Of course you can run HSRP on top of the BDI/SVI, but the not-so-easy part here is vPC. As you know, vPC provides MAC state sync between peer devices, and vPC-redundant VTEPs share an anycast VTEP IP address (in the underlay). This way vPC provides L2 + L3 redundancy in hardware. To make this work, some changes are needed on the loopback interface used for VTEP sourcing and in the vPC domain config. As a starting point, recall that a secondary IP address must be shared between the vPC peers, so that VXLAN packets can be handled by either peer device. Does that remind you of something? Yes, we also need peer-gateway under the vPC domain. Based on what we just described, peer-gateway is mandatory, and it also requires an SVI with PIM enabled across the peer link. A full list of requirements can be found here:

http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/vxlan/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_chapter_010.html#concept_C769B7878CE2458E98657905843DEEFA

But what you can never forget is:

  • A unique primary IP for the underlay loopback on each peer
  • The same secondary IP shared by both peers
  • PIM
  • A consistent VNI-to-multicast-group mapping
  • peer-gateway and a "special SVI" (PIM enabled) // needed in case your leaf loses connectivity to the spines and has to forward packets to its peer device

Those are key for me, but in the doc you will find a lot more information. Also worth mentioning: on the Nexus 5K, the VXLAN vPC configuration requires dedicating that special SVI to VXLAN traffic via a specific command:
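The command screenshot is not reproduced in this copy. From memory, the command in question is `vpc bind-vrf`, shown below as a hedged sketch (the VLAN number is an assumption; double-check against the N5K/N6K VXLAN configuration guide):

```
! N5K/N6K: bind a dedicated VLAN (and its PIM-enabled SVI) to carry
! VXLAN traffic over the vPC peer link
vpc bind-vrf default vlan 3000
```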




Distributed Gateway

Well, this is a huge post... we will talk about distributed gateways, anycast and all that stuff in another one, since otherwise this would take me a lifetime and I do want to explain it in detail :)

Stay in touch :)

