Showing posts with label architecture. Show all posts
Friday, March 31, 2017
IE-Bootcamps - Launching a new training experience
5:46 PM
Finally, after a lot of hard work by the CCIE HOME team and me, we have created ie-bootcamps, the first expert-level training company based in Latin America, delivering courses and bootcamps for the most challenging tracks. We plan to cover the Americas completely, and as a starting point we are introducing the CCIE DC v2 Lab Bootcamp, to be held in Buenos Aires, Argentina, May 22-26. I will be delivering the course, so you are all invited :)
More info at: http://ie-bootcamps.com/course/ccie-dc-v2-0-lab-bootcamp/
Or reach me directly
-----
Finally, after a long effort by the CCIE HOME team and myself, we have given birth to the first company dedicated to delivering expert-level training in Spanish: ie-bootcamps.
As a first step, we have started coordinating the CCIE DC v2.0 bootcamp, to be held in Buenos Aires, Argentina, May 22-26.
Wednesday, March 22, 2017
Testing boundaries - thoughts before starting
10:46 AM
As discussed in our previous post about what truly lies behind the business needs for an AFA box versus the vendor hype we face, let's assume we have just gathered our technical requirements and are facing the task of stress-testing one of these boxes.
For any performance test there are several conditions that must always be present and should be considered. These are my initial thoughts; I hope you find them useful.
- Create "significant" traffic: This means not only stressing performance but using traffic patterns that are representative for you (i.e. workloads similar to the environment in which you are going to put the device under test).
- Don't forget to measure: An important and also tedious part of testing is taking notes and writing down the results, so always plan your performance scenarios with data export in mind, so you can easily record or graph the results.
- Test boundaries: If a device claims to reach X performance, test it. Let's say X is 100K IOPS @8k bs 70/30 (r/w); you have to find a way to generate that workload in your infrastructure. There are several considerations here too, e.g. using one thread with that workload is not the same as running multiple threads, which is a much more realistic approach. We will go deeper into this in a dedicated post on AFA testing procedure.
- Tune the environment: This can also be called environment set-up. Make sure your underlying infrastructure is configured with best practices and has no issues, so you can be sure the test is not being affected by any factor other than the test procedure itself.
- Automate as much as you can: Testing can be tough; imagine having to re-test after changing a few parameters, applying new versions, etc... impossible by hand. So take an automated approach to set up your tests, fire them off, and even plot the results in a fancy way.
- Understand what you're doing: Testing is not about running a workload and seeing whether performance is good or not, or at least it shouldn't be. The whole purpose of a testing procedure is to understand how the device under test reacts under stress conditions as well as under normal, close-to-real ones; also to see how it works internally and how it behaves under changes... which leads me to the next bullet.
- Resiliency: So you have tested, everything seems perfect, performance is outstanding and testing is going well... but have you tested how the device behaves under unexpected and planned changes? Resiliency is key in production environments, since it not only gives you an overview of how high availability is performed (which matters most in production) but also of how the system reacts to these changes (you can easily be surprised by well-known vendors running in panic mode after switchovers).
- Plan the tests accordingly: If you're running a PoC or a performance test you will do a lot of preparation and environment setup. This can involve changes in the physical network to test HA, clusters, or other functionality, and you will lose a lot of time if you make changes repeatedly. So it is really important to order the test plan accordingly: make the minimum changes necessary, and with every change do the maximum number of tasks before the next one. This will save you a lot of time.
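The automation bullet above can be sketched as a small wrapper that turns workload parameters into a reproducible command line for a load generator. This is a minimal sketch assuming fio as the traffic tool; the helper name and the chosen parameter set are illustrative, not from the original post:

```python
# Minimal test-automation sketch: build a reproducible fio command line
# from workload parameters, so scenarios can be scripted and re-run.
# The helper name and defaults are illustrative assumptions.

def build_fio_cmd(name, bs="8k", read_pct=70, iodepth=32, numjobs=4, runtime=60):
    """Return an fio argv list for a random mixed read/write workload."""
    return [
        "fio",
        f"--name={name}",
        "--rw=randrw",                  # random mixed read/write
        f"--rwmixread={read_pct}",      # e.g. 70 => 70/30 r/w
        f"--bs={bs}",                   # block size
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime}",
        "--time_based",
        "--output-format=json",         # machine-readable results for graphing
    ]

cmd = build_fio_cmd("afa-8k-7030", bs="8k", read_pct=70)
print(" ".join(cmd))
```

Keeping the output in JSON makes the "don't forget to measure" bullet almost free: every run can be parsed and plotted instead of copied by hand.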
Sunday, March 19, 2017
All Flash Array: vendor hype vs business needs
11:10 AM
Last month I was involved in a project to test the performance of several All Flash Array boxes (let's call them AFAs), since we expect to start delivering a new class of services (and SLAs?) to customers.
Being in an R&D team gives you the opportunity to see the whole story: you start with the sales pitch, then the not-so-technical pre-sales, and at the end you break a box in a PoC and finally meet the real engineers, who explain the details of their architecture and why they don't support what you just tested (but it was in the sales pitch, right?)
What I want to highlight here is the relation between vendor hype and business needs. An AFA can give you outstanding performance in terms of IOPS, latency, compression, and so on... but the point is: what do you really need? In any architecture design you are supposed to deliver a solution that meets technical requirements plus business requirements/needs. With AFAs this can be quite tricky, since if you don't have a clear understanding of your business needs, you are going to be pushed into an unfair or imprecise comparison.
So, what about technical requirements?
I've faced two kinds of scenarios for AFA deployments. The easier one is deploying new infrastructure aimed at fulfilling specific requirements for an application suite; this is always the best-case scenario for an architect, since you can easily gather technical requirements, covering not only current needs but also a forecast of upcoming demand. As you may know, it's not all pink elephants, and the other common scenario is moving a current deployment to an AFA solution. In the latter case you have to take a huge number of considerations into account in order to plan accordingly; I can summarize the following:
- Amount of IOPS: A bare IOPS figure is something you see a lot, and on its own it is completely wrong. What is 1K IOPS? At which block size? At which read/write ratio? What you need is the amount of IOPS based on a given IO distribution, which includes the block size and read/write ratio of each component. Getting these numbers can be significantly hard. Most current arrays keep information on all the workloads they have run, which is great for getting the IO distribution per block size plus rd/wr ratios, but it assumes you run similar workloads all day long (i.e. a consistent distribution of a given set like 30K IOPS @4k 65/35, 15K IOPS @8k 60/40, and so on; but what about the nightly jobs, when your performance gets affected by backups?)
- Expected and maximum latency: Based on application + OS/Guest OS needs
- Expected compression ratio (for your data set!)
- HA and expected performance in contingency
- Network-based (NFS), IP-based (iSCSI), or FC?
- Disk replacement policy and MTBF/MTTF
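As a minimal sketch of why a bare IOPS figure is meaningless on its own, here is the example mix from the first bullet expressed as an IO distribution and reduced to aggregate numbers (the variable names are mine, for illustration):

```python
# Express a requirement as an IO distribution, not a single IOPS number,
# using the example mix from the text: 30K IOPS @4k 65/35 plus
# 15K IOPS @8k 60/40. Variable names are illustrative assumptions.

workloads = [
    # (iops, block_size_bytes, read_fraction)
    (30_000, 4096, 0.65),
    (15_000, 8192, 0.60),
]

total_iops = sum(iops for iops, _, _ in workloads)
bandwidth_mbps = sum(iops * bs for iops, bs, _ in workloads) / 1e6  # MB/s
weighted_read = sum(iops * r for iops, _, r in workloads) / total_iops

print(f"{total_iops} IOPS, {bandwidth_mbps:.1f} MB/s, {weighted_read:.0%} reads")
# 45000 IOPS, 245.8 MB/s, 63% reads
```

Note how two mixes with the same total IOPS can imply very different bandwidth and read/write profiles, which is exactly why the block size and ratio must be part of the requirement.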
Ok, but where is the vendor hype?
Well, the quick answer is everywhere you get a sales/pre-sales engineer talking. But to stay on topic, I've found solutions that claim 100K IOPS @32K block size in a single box... The easy math here is to ask them how many IOPS they support at 4/8K if they reach that at 32K, and also how much bandwidth they expect.
How can I test performance and avoid being seduced by pink elephants?
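That "easy math" can be written out: a 100K IOPS claim at a 32K block size is really a bandwidth figure, and dividing that bandwidth by a smaller block size shows the IOPS the same pipe would imply at 8K. This is a back-of-the-envelope sketch only (an upper bound, since small-block IOPS are usually controller-limited rather than bandwidth-limited):

```python
# The "easy math" from the text: a claim of 100K IOPS at a 32K block size
# is really a bandwidth figure, so ask what it implies at smaller blocks.

claimed_iops, claimed_bs = 100_000, 32 * 1024          # 100K IOPS @ 32K
bandwidth = claimed_iops * claimed_bs                   # bytes/s
iops_at_8k = bandwidth // (8 * 1024)                    # same pipe, 8K blocks

print(f"{bandwidth / 1e9:.2f} GB/s")   # 3.28 GB/s
print(f"{iops_at_8k} IOPS @8k")        # 400000 IOPS @8k
```

If the vendor's 4K/8K numbers fall far short of that implied figure, the 32K headline is marketing, not capacity.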
When we began the test process we aimed at a huge number of VMs with RDMs to the array, but we later figured out there is no way to do that; with a few VMs with lots of disks each, you can easily set up a quick test.
For the testing procedure and considerations, IDC has written a good recap in their "All-Flash Array Performance Testing Framework", and there is also a tool made by EMC engineers to test arrays called the "AFA PoC Toolkit", which sets up a few VMs on a VMware host, connects the host to the LUNs of the array, creates RDMs, and sets up VDBench on the VMs.
I recommend using this approach, changing the parameters in the VDB files to meet your requirements. One caveat with the toolkit is that it only runs against EMC XtremIO; I've made several changes to be able to run it against PureStorage boxes, and I'm working on changes to test against SolidFire too. In a later post we will discuss design considerations for an AFA platform and the testing tools plus results for each vendor (we're testing PureStorage, EMC XtremIO, NetApp SolidFire).
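The kind of VDB-file change described above can be scripted. This is a minimal sketch assuming VDBench's `wd=` workload-definition syntax; the sample line and helper function are illustrative, not taken from the actual AFA PoC Toolkit:

```python
# Sketch of rewriting the xfersize/rdpct parameters in a VDBench workload
# definition line so the toolkit's VDB files match your own IO distribution.
# The sample line and helper name are illustrative assumptions.
import re

vdb_line = "wd=wd1,sd=*,xfersize=4k,rdpct=70,seekpct=100"

def retune(line, xfersize, rdpct):
    """Replace the transfer size and read percentage in a wd= line."""
    line = re.sub(r"xfersize=[^,]+", f"xfersize={xfersize}", line)
    line = re.sub(r"rdpct=\d+", f"rdpct={rdpct}", line)
    return line

print(retune(vdb_line, "8k", 65))
# wd=wd1,sd=*,xfersize=8k,rdpct=65,seekpct=100
```

Applied over every workload line in the toolkit's parameter files, this turns the fixed XtremIO-oriented scenarios into your own distribution without editing by hand.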
Tuesday, March 14, 2017
Worth Reading: Verizon SDN-NFV Reference Architecture
5:52 PM
Last week I found this reference architecture and just finished reading it. It's huge, but it's really worth it (especially for me, as I'm working on an SDN-NFV reference architecture myself).
http://innovation.verizon.com/content/dam/vic/PDF/Verizon_SDN-NFV_Reference_Architecture.pdf
Enjoy!
Tuesday, February 28, 2017
CCDE, be a chameleon
9:33 AM
Disclaimer: This post is intended as a wrap-up of my experience succeeding in the lab. This study methodology and everything suggested here is based on my background and the time I had available each week for studying; feel free to use it and adjust it to your own pace.
Being said that...
Get yourself used to reading and reading
Assuming that you have passed the written and feel that all the needed theory is covered, I can assure you that's not true. Even if you have passed the written, you still need to read and re-read the technology, and also be able to understand the pros and cons of each design. A general rule I can give is that for any design you must be able to find both the pros and the cons; if you can't, you're just being shortsighted and failing to cover all the points in that design. For this, a study group is also key, so let's talk about that too.
Be in a group and collaborate as much as you can
Some people claim that studying alone is the way that best suits them. That can be great if you're taking an IE-level exam or any written, but for the DE it's completely the opposite. I really like to compare it by saying that if you study alone, you're just a VLAN with a polarized HSRP gateway. Nerdy comparison aside, what I really mean is that you can't argue with anyone! And that is not good at all. I don't expect people who argue about everything, but I do expect people who state their point of view and discuss why they see a given caveat or pro in a proposed design. And remember, discussions are a good thing; like other good stuff, don't abuse them, use them carefully.
The methodology we used in our group was to review one scenario every week: we set up a call on the weekend and studied on our own during the week. You can use the scenarios in Orhan's and Martin Duggan's books, but since we ran out of scenarios, we split into pairs and modified them to create specific scenarios covering a particular design concern. I recommend that in this split you create scenarios based on your areas of expertise (in my case, SP and DC :) )
My study routine prior to the lab was approximately 6 hours per day (Monday-Friday), plus 4-hour sessions on Saturday, for two months... and this assumes you have already cleared the written (I cleared mine long ago) and have solid design experience (if not, I'd recommend another 2 months of study). In my case I have nearly 13 years of experience working at huge companies (like design scenarios, hehe), the last 6 at a Service Provider (which divested itself into two; a nice CsC scenario we built in real life!)
For lab strategy I can summarize my key points. Some of them you will find across many posts, but these were really useful for me:
Color scheme for highlighting
I read really quickly, and that has a caveat: you can skip things. So make sure you read every sentence, and use a color scheme to highlight information. I chose a really simple one: green is a good design option taken, red is a bad design or a design caveat, yellow is a constraint or requirement. I wanted to use pink for IGP info, but I realized that information is so intermixed that in the second scenario I went back to basics and used only those three.
Be a chameleon
This is the mindset you have to reach in the exam. You will have to be a chameleon: read all the requirements and constraints and be ready to transform yourself into the designer for ABC Company to make the best choice; then immediately, in the next scenario, transform yourself into a service provider architect on a team evaluating X or Y technology. All this happens really fast, and you have plenty of information around you to support the transformation. What you need is the ability to quickly focus on the job role you've been assigned, gather the info, and take your best shot.
Suboptimal routing still works, right?
I really love drawing comparisons between real life and technology, and in this case I think of suboptimal routing as a wrong (or not-so-good) branch in the exam. If you have read all the Cisco Live material on the DE, you may be aware that scenarios always have several branches based on your selections, so even if you're not on the best branch you can still earn points. (STOP! If you're not reading the Cisco Live info for the DE, take a moment to review it; it is key to understanding the exam flow.) Within the time frame of a scenario, the branch becomes your life, so being able to manage bad decisions is important. You will realize you didn't choose the best path (trust me, it happens), but they will give you options to justify why you made that choice, and you will have to pick the not-so-good, not-so-bad option to be able to continue. So don't worry, suboptimal routing still works :) and we know next time you won't make the same mistake :)
Hope this helps all the next aspirants, both new and not so new. And for those who have passed the lab, congrats! Input is always welcome.
Friday, February 24, 2017
CCDE... happy ending
10:25 AM
Well guys, you may have noticed I was away from blogging for a while, and there was a reason for that... the CCDE was very time-consuming. Over the last two months I spent at least 4 to 6 hours a day with my awesome study group debating scenarios, design choices, and the pros/cons of each of them... it was really hard, and it requires ****a lot**** of reading, but definitely worth it!! On Feb 22nd I passed the lab. It was my first attempt and I feel very lucky about that, but it was really the training, reading, understanding, and debating with my colleagues that made me succeed. I will write a post on the strategy and training I used as well, but for now I just want to take some time to write these lines and congratulate all my friends and colleagues who have taken the CCDE lab, passed or not; it was a really good experience and an amazing learning opportunity!
Keep you updated guys :)
CCDE 2017::27


