Sunday, February 12, 2012

BGP Route Reflection

The original BGPv4 specification (RFC 1771 – A Border Gateway Protocol 4, BGP-4) does not define an intra-AS loop prevention mechanism. Routers were therefore prohibited from propagating routes received from one IBGP peer to another IBGP peer, which requires full-mesh IBGP sessions to be established between all BGP routers within an AS. Full-mesh IBGP is not scalable: the number of TCP sessions established between the BGP routers within an AS is n(n-1) / 2, where n is the number of routers. The routing update traffic exchanged between IBGP peers can also consume a significant amount of bandwidth and reduce the network resources available to applications.
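As a quick illustration of how fast the session count grows, the n(n-1) / 2 formula can be evaluated for a few AS sizes. This is only a minimal sketch; the function name full_mesh_sessions and the sample router counts are chosen here for illustration.

```python
def full_mesh_sessions(n: int) -> int:
    """Return the number of IBGP sessions in a full mesh of n routers: n(n-1)/2."""
    if n < 0:
        raise ValueError("the number of routers cannot be negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Sample AS sizes (illustrative values, not taken from a real network).
    for routers in (5, 10, 50, 100):
        sessions = full_mesh_sessions(routers)
        print(f"{routers:>3} routers need {sessions:>4} IBGP sessions")
```

The growth is quadratic: doubling the number of routers roughly quadruples the number of sessions.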

RFC 1966 (BGP Route Reflection: An alternative to full mesh IBGP) was later released to introduce route reflection as the alternative to full-mesh IBGP sessions. However, it specified that a BGP route reflector (RR) is not allowed to send an update received from a BGP route reflector client (RRC) back to that same client. As a result, the RR could not reuse the outbound BGP updates built for one RRC for the other RRCs and had to treat each RRC individually. This caused significant performance degradation when an RR served many RRCs and needed to maintain a different set of outbound BGP updates for each of them.

RFC 4456, which obsoletes both RFC 1966 and RFC 2796, relaxes the update propagation rule mentioned above: an RR can now send the same update to all RRCs, even if the original route was received from an RRC. The BGP ORIGINATOR_ID and CLUSTER_LIST attributes that were originally defined in RFC 1966 are used to detect route propagation loops. This relaxed rule allows an RR to group all the RRCs into a single update group and generate a single set of outbound BGP updates for all RRCs, resulting in significant performance improvements.
Note: The latest BGP standard as defined in RFC 4271 includes references to BGP route reflection.
Note: Cisco IOS Release 12.3(4)T and later implemented the BGP Dynamic Update Peer-Groups feature that conforms to RFC 4456.

BGP route reflection modifies the BGP split-horizon rule by allowing the BGP router configured as the RR to propagate or reflect routes learned by IBGP to other IBGP peers. BGP route reflection greatly reduces the number of IBGP sessions and the routing update traffic.

Figure: BGP Route Reflection

Route reflection is often used by ISPs when the number of internal routers and neighbor statements becomes excessive. The administrative overhead of maintaining a full-mesh IBGP network is that a neighbor statement must be added on every router each time a new peer is added. An AS can have multiple route reflectors, both for redundancy and for grouping to further reduce the number of IBGP sessions.

Route reflection does not affect the packet forwarding paths, only the routing update distribution paths. However, routing loops can occur if route reflectors are configured incorrectly, e.g. when two route reflectors are configured to treat each other as route reflector clients.

Migrating to route reflectors involves minimal configuration and does not have to be done on all routers at once: conventional routers can coexist with route reflectors within an AS, where they are simply normal IBGP peers of the route reflectors. RRCs should not peer with IBGP routers outside their associated cluster.

The AS_PATH attribute is used for loop prevention for BGP route advertisements across ASes. The ORIGINATOR_ID, CLUSTER_ID, and CLUSTER_LIST help prevent routing loops when implementing route reflection.

A cluster is the combination of an RR and its clients. IBGP peering between the clients is no longer required, as the route reflector replicates routing updates between the clients. There is no defined limit on the number of RRCs that an RR may have; it is constrained only by the processing capability of the RR.

The route reflector creates the 4-byte ORIGINATOR_ID, an optional non-transitive BGP attribute that carries the BGP Router ID of the originator of a route. When an update arrives back at the originator due to bad RR design and implementation, the originator discards it.

Usually a cluster has a single RR, in which case the cluster is identified by the Router ID of the RR. A cluster can have multiple RRs to increase redundancy and avoid a single point of failure. In that case all the RRs in the cluster must be assigned the same Cluster ID, which allows the RRs in the cluster to recognize and discard updates from other RRs in the same cluster and also reduces the number of updates that need to be stored in the BGP routing tables.

A route reflector cluster list is the sequence of Cluster IDs that a route has passed through within an AS. When an RR reflects a route, it prepends the local Cluster ID to the cluster list, creating the cluster list if the update does not contain one (RFC 4456). This lets an RR notice when an update has looped back into the same cluster due to bad design: if the local Cluster ID is found in the cluster list of an update, the update is discarded.
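The two loop checks described above, ORIGINATOR_ID and the cluster list, can be sketched in a few lines. This is a simplified model rather than a real BGP implementation: the Update class, the reflect and is_looped helpers, and all router IDs, the Cluster ID and the prefix are hypothetical names and values chosen for illustration. It follows the RFC 4456 behaviour of prepending the Cluster ID when a route is reflected.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Update:
    """A BGP update reduced to the attributes used for loop prevention."""
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect(update: Update, learned_from_router_id: str, local_cluster_id: str) -> Update:
    """Return the update as an RR would reflect it.

    ORIGINATOR_ID is created only if it is missing, and the local
    Cluster ID is prepended to the cluster list.
    """
    return Update(
        prefix=update.prefix,
        originator_id=update.originator_id or learned_from_router_id,
        cluster_list=(local_cluster_id,) + update.cluster_list,
    )


def is_looped(update: Update, local_router_id: str,
              local_cluster_id: Optional[str] = None) -> bool:
    """Return True if the receiving router should discard the update."""
    if update.originator_id == local_router_id:
        return True  # the update came back to its originator
    if local_cluster_id is not None and local_cluster_id in update.cluster_list:
        return True  # the update came back into a cluster it already passed through
    return False


if __name__ == "__main__":
    # Hypothetical values: client 4.4.4.4 sends a route to an RR in cluster 2.2.2.2.
    reflected = reflect(Update("192.168.1.0/24"),
                        learned_from_router_id="4.4.4.4",
                        local_cluster_id="2.2.2.2")
    print(reflected)
    print(is_looped(reflected, local_router_id="4.4.4.4"))  # originator check
    print(is_looped(reflected, local_router_id="3.3.3.3",
                    local_cluster_id="2.2.2.2"))            # cluster list check
    print(is_looped(reflected, local_router_id="5.5.5.5"))  # ordinary client
```

Passing local_cluster_id only for route reflectors mirrors the text: any router performs the ORIGINATOR_ID check, while the cluster list check is done by RRs.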

When implementing route reflection within an AS, the IBGP routers can be divided into multiple groups known as clusters, each having at least one RR and a few clients. Multiple RRs can exist in a cluster for redundancy. The RRs from different clusters must be fully meshed through IBGP to ensure that all routes learned through EBGP are propagated throughout the AS. An IGP is still used, just as before, to advertise local routes and EBGP next-hop addresses.

Figure: BGP Route Reflection Design

The figure above shows an example of a BGP route reflection design. RT2, RT4, RT5, and RT6 form a cluster; RT3, RT7, and RT8 form another cluster. RT2 and RT3 are route reflectors. RT1, RT2, and RT3 are fully meshed through IBGP; while routers within a cluster are not fully meshed.
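The saving in IBGP sessions for this design can be checked with a little arithmetic. The sketch below assumes one RR per cluster, exactly one IBGP session from each client to its RR, and no other IBGP sessions; the function names are made up for this example.

```python
from typing import Sequence


def full_mesh_sessions(n: int) -> int:
    """Number of IBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def route_reflection_sessions(meshed_routers: int,
                              clients_per_cluster: Sequence[int]) -> int:
    """Count IBGP sessions in a route reflection design.

    meshed_routers:      RRs and non-client routers that stay fully meshed
    clients_per_cluster: number of clients in each cluster, each client
                         having a single session to its RR (assumption)
    """
    return full_mesh_sessions(meshed_routers) + sum(clients_per_cluster)


if __name__ == "__main__":
    # RT1, RT2 and RT3 are fully meshed; RT2 serves RT4, RT5 and RT6;
    # RT3 serves RT7 and RT8.
    with_rr = route_reflection_sessions(3, [3, 2])
    without_rr = full_mesh_sessions(8)
    print(f"with route reflection: {with_rr} sessions")
    print(f"full mesh of 8 routers: {without_rr} sessions")
```

Under these assumptions the eight routers need 8 IBGP sessions instead of the 28 required by a full mesh.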

A route reflector propagates an update depending upon the type of peer that sent the update:
  • If the update is received from a non-client peer, it sends the update to all its RRCs only.
  • If the update is received from an RRC, it sends the update to all RRCs and non-client peers. [1]
  • If the update is received from an EBGP peer, it sends the update to all RRCs and non-client peers. [1]
[1] – Non-client peers include route reflectors in other clusters.
Note: RRs still send updates to their EBGP peers just as they did before route reflection was implemented.
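The three propagation rules above amount to a small lookup table. The following is a minimal sketch of that table, not a router implementation; PeerType, IBGP_TARGETS and ibgp_peers_to_update are hypothetical names, and the peer list used in the example is RT2's set of IBGP peers from the design described earlier. Updates towards EBGP peers are left out because route reflection does not change them.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class PeerType(Enum):
    CLIENT = "route reflector client"
    NON_CLIENT = "non-client IBGP peer"
    EBGP = "EBGP peer"


# IBGP peer types that receive an update, keyed by the type of peer it came from.
IBGP_TARGETS: Dict[PeerType, FrozenSet[PeerType]] = {
    PeerType.NON_CLIENT: frozenset({PeerType.CLIENT}),
    PeerType.CLIENT: frozenset({PeerType.CLIENT, PeerType.NON_CLIENT}),
    PeerType.EBGP: frozenset({PeerType.CLIENT, PeerType.NON_CLIENT}),
}


def ibgp_peers_to_update(peers: Dict[str, PeerType], sender: str) -> List[str]:
    """List the IBGP peers of an RR that receive an update learned from sender.

    The sender is deliberately not filtered out: under the relaxed rule the
    same update may go to every client, and the originator discards its own
    route through the ORIGINATOR_ID check.
    """
    targets = IBGP_TARGETS[peers[sender]]
    return sorted(name for name, kind in peers.items() if kind in targets)


if __name__ == "__main__":
    # IBGP peers of RT2: RT1 and RT3 are non-clients, RT4 to RT6 are clients.
    rt2_peers = {
        "RT1": PeerType.NON_CLIENT,
        "RT3": PeerType.NON_CLIENT,
        "RT4": PeerType.CLIENT,
        "RT5": PeerType.CLIENT,
        "RT6": PeerType.CLIENT,
    }
    print(ibgp_peers_to_update(rt2_peers, "RT1"))  # update from a non-client
    print(ibgp_peers_to_update(rt2_peers, "RT4"))  # update from a client
```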

The first consideration when migrating to route reflection is which routers should be the RRs and which should be the RRCs. The design should follow the physical topology to ensure that the logical paths and the physical packet forwarding paths are the same. A route reflection design that does not follow the physical topology, in which RRCs are associated with RRs that are not directly connected, may result in routing loops.

Figure: Bad and Good Route Reflection Designs

The left portion of the figure above demonstrates how routing loops occur when route reflection is implemented without following the physical topology. RT1 is an RRC of both RRs, RT2 and RT3. The following happens in this bad route reflection design:
  • RT4 learns the next hop to reach the destination network from its RR, RT3.
  • RT5 learns the next hop to reach the destination network from its RR, RT2.
  • RT4 forwards a packet destined for that network through RT5, based on the best IGP route to the next hop.
  • RT5 forwards a packet destined for that network through RT4, based on the best IGP route to the next hop.
  • This is a routing loop.

The following happens in the good route reflection design as shown in the right portion of the figure:
  • RT4 learns the next hop to reach the destination network from its RR, RT2.
  • RT5 learns the next hop to reach the destination network from its RR, RT3.
  • RT4 forwards a packet destined for that network through RT2, based on the best IGP route to the next hop; RT2 then forwards the packet to RT1.
  • RT5 forwards a packet destined for that network through RT3, based on the best IGP route to the next hop; RT3 then forwards the packet to RT1.
  • There is no routing loop.

When implementing multiple route reflectors for redundancy, it is really important to complement logical redundancy with physical redundancy; it does not make sense to build route reflector redundancy if the physical redundancy itself does not exist!

Migrating to route reflection is easy: configure one RR at a time, and then remove the redundant IBGP sessions between the RRCs. The configuration only needs to be implemented on the routers intended to be route reflectors. It is recommended to configure one RR per cluster.

Route reflection is not recommended for every topology; it is recommended only for ASes that have a large IBGP mesh. Route reflection introduces processing overhead on the route reflector and might introduce routing loops and routing instability if configured incorrectly. Imposing complex techniques on situations that do not really need them could hurt more than help! The route reflection function is performed only on the route reflector; all RRCs and non-clients are normal IBGP peers that have no notion that route reflection is in effect. RRCs are considered as such only because the RR treats them as RRCs.

Note that a route reflector propagates only the best path to a particular destination to its RRCs. When an RR learns multiple paths to the same destination from multiple RRCs, only the single best path selected by the best path algorithm is propagated to the other RRCs. Therefore, the number of paths available to reach a particular destination after implementing route reflection will likely be lower than in an IBGP full-mesh configuration.

The neighbor {ip-addr | peer-group-name} route-reflector-client BGP router subcommand configures a router as a BGP route reflector and configures the specified neighbor as its client. Use the bgp cluster-id {cluster-id} BGP router subcommand to assign a Cluster ID on all RRs in an RR cluster when the cluster has one or more RRs. The Cluster ID, which is at most 4 bytes long, can be specified in dotted or decimal format. The Cluster ID cannot be changed after the RRCs have been configured, unless the RR supports the Route Refresh capability that can readvertise it.
Note: Cluster ID is obsolete! Redundant route reflection setups where multiple RRs serve the same set of RRCs can lead to partial connectivity problems when multiple IBGP sessions are disrupted. The revised BGP best path selection algorithm ensures that an RR in a cluster always prefers a route from a client with a shorter CLUSTER_LIST over a reflected route, making the Cluster ID obsolete. This allows RRCs to peer with RRs or RRCs in other clusters. To increase the resilience of a network, the Cluster ID should not be used in new network designs.
The classical approach to route reflector redundancy described above groups the route reflectors in the same cluster, with all the RRs configured with the same Cluster ID, and requires all the clients to have IBGP sessions with both RRs. The modern approach is to have only one route reflector per cluster, and only the RRCs that want redundancy establish IBGP sessions with all the RRs.
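Since the Cluster ID is a 4-byte value, its dotted and decimal forms are two spellings of the same number. The following sketch converts between them with Python's standard ipaddress module. The helper names are invented for this example, and the code only illustrates the arithmetic, not how any router parses the bgp cluster-id command.

```python
import ipaddress

MAX_CLUSTER_ID = 0xFFFFFFFF  # largest value that fits in 4 bytes


def cluster_id_to_int(cluster_id: str) -> int:
    """Convert a Cluster ID written in dotted or decimal form to an integer."""
    text = cluster_id.strip()
    if text.isdecimal():
        value = int(text)
        if value > MAX_CLUSTER_ID:
            raise ValueError(f"{cluster_id!r} does not fit in 4 bytes")
        return value
    # Dotted form uses the same notation as an IPv4 address.
    return int(ipaddress.IPv4Address(text))


def cluster_id_to_dotted(value: int) -> str:
    """Convert an integer Cluster ID to its dotted form."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    # Sample values chosen for illustration.
    for sample in ("1", "10.0.0.1", "167772161"):
        number = cluster_id_to_int(sample)
        print(f"{sample:>10} -> decimal {number}, dotted {cluster_id_to_dotted(number)}")
```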

Note: The NEXT_HOP attribute remains intact when a prefix is reflected by an RR. The neighbor next-hop-self BGP router subcommand configured on a route reflector affects only the next hop of EBGP-learned routes; the next hop of reflected IBGP routes remains intact.

Previously, route reflectors could be used in conjunction with peer groups only when all the route reflector clients within a cluster were fully meshed, due to some update exchange issues. Fortunately, this full-mesh requirement has been removed: the route reflector clients of a route reflector can now be configured under a peer group and are not required to be fully meshed.

Figure: BGP Route Reflection Configuration

The BGP and IP routing tables on RT4 and RT5 after RT2 and RT3 were implemented as RRs are shown below.
Note: RT1 has been configured with the neighbor next-hop-self command for RT2 and RT3.
RT4#sh ip bgp
BGP table version is 2, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i192.168.1.0               0    100      0 65000 i
RT4#sh ip bgp
BGP routing table entry for, version 2
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Not advertised to any peer
  65000 (metric 20) from (
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator:, Cluster list:
RT5#sh ip bgp
BGP table version is 2, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i192.168.1.0               0    100      0 65000 i
