Mainly Networking, SDN, Automation, Datacenter and OpenStack as an overlay for my life

Thursday, May 18, 2017

Stretched DC, really?... ok, for L3, BGP conditional forwarding

A long time ago (I think it was years back) I was reviewing a DR solution for some internal customer who has two datacenter and a DCI between them (dark fiber). They moved initially to a stretched design extending vlans from each site and using L3 gateway on one side only at a time, since as a business requirement traffic should always leave from primary DC. However they were expecting some kind of solution to be able to automatically switchover to secondary DC in case of a failure on DC1.

For this cases it's always a pleasure to read Ivan and see how he predicts the design issues that I will face in the future (Stretched DCI), hopefully no stateful firewalls were involved here.

The main issue was not only to detect which side is alive (which is not easy without a witness, and we don't have one at all) but also how to decide which traffic should be served and from where.

So here is a big stop. After keep going with this we need to take some assumptions and business decisions:

  • If DC1 site fails but DCI and DC2 site alive, traffic will enter from DC2 side and traverse the DCI.
  • If DCI fails, traffic will continue being served from DC1 for stretched VLANs subnets, this implies move by other method those servers to the surviving side or at least shut them down.
  • If DC2 site fails but DCI and DC1 site alive, traffic will enter DC1 side and traverse DCI to reach DC2 side servers.
  • Traffic should leave and enter from DC1 whenever possible and DC2 site should not be used unless strictly necessary (this was imposed by customer)

So after reviewing lot of options, and assuming that eventually we can fail and working around that (and the fact that we need to do a stretched cluster after all) we came across a nice BGP feature which is called conditional forwarding. 
Just for your reference, BGP Conditional forwarding allows us to advertise a given network based on the information that we have in our FIB. This can be really useful for this scenario by defining an witness network from each side and advertise to each other, this should be a dummy network like 1.1.1.0/30 for DC1 and 1.1.2.0/30 for DC2 and the match statement will verify if we are getting this network advertisement and based on that will withdraw our advertisement or just let it flow.

Ok, so enough of reading and lets have a quick view on configuration (On NXOS) and behaviour:



Here is the config for the eBGP side of the DC2


Based on that normal behaviour would behave like this (routes will be withdrawn):


Now if we have a failure on DC1 side, conditional trigger will take place and start advertising from DC2.





Is this all what we need? Definitely No... There are still lot of things to resolve and we don't have an optimal design (we can discuss here, if we are meeting business requirements is there anything else to do?), but apart from that notice that stretching a VLAN is not a good choice, guess why? you're extending your fault domain and that doesn't simplify things it also make more complex the isolation and detection. so let's start wondering why we made such poor decisions and why we can't start talking about application level aware resiliency, making our life better by allowing us to use different subnets/networks at each site being able to handle traffic in/out more flexible by leveraging existing methods (long talk about BGP attributes and policy control enters here).


Some references:

Cisco. (Agosto de 2010). Cisco IP Routing. http://www.cisco.com/en/US/tech/tk365/technologies_configuration_example09186a0080094309.shtml











, , , ,

Article By: Ariel Liguori

CCIE DC #55292 / VCIX-NV / JNCIP "Network Architect mainly focused on SDN/NFV, Openstack adoption, Datacenter technologies and automations running on top of it :) "

No comments:

Post a Comment