GENWiki

Premier IT Outsourcing and Support Services within the UK

User Tools

Site Tools


rfc:rfc2991

Network Working Group D. Thaler Request for Comments: 2991 Microsoft Category: Informational C. Hopps

                                                   NextHop Technologies
                                                          November 2000
    Multipath Issues in Unicast and Multicast Next-Hop Selection

Status of this Memo

 This memo provides information for the Internet community.  It does
 not specify an Internet standard of any kind.  Distribution of this
 memo is unlimited.

Copyright Notice

 Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

 Various routing protocols, including Open Shortest Path First (OSPF)
 and Intermediate System to Intermediate System (ISIS), explicitly
 allow "Equal-Cost Multipath" (ECMP) routing.  Some router
 implementations also allow equal-cost multipath usage with RIP and
 other routing protocols.  The effect of multipath routing on a
 forwarder is that the forwarder potentially has several next-hops for
 any given destination and must use some method to choose which next-
 hop should be used for a given data packet.

1. Introduction

 Various routing protocols, including OSPF and ISIS, explicitly allow
 "Equal-Cost Multipath" routing.  Some router implementations also
 allow equal-cost multipath usage with RIP and other routing
 protocols.  Using equal-cost multipath means that if multiple equal-
 cost routes to the same destination exist, they can be discovered and
 used to provide load balancing among redundant paths.
 The effect of multipath routing on a forwarder is that the forwarder
 potentially has several next-hops for any given destination and must
 use some method to choose which next-hop should be used for a given
 data packet.  This memo summarizes current practices, problems, and
 solutions.

Thaler & Hopps Informational [Page 1] RFC 2991 Multipath Issues November 2000

2. Concerns

 Several router implementations allow multipath forwarding.  This is
 sometimes done naively via round-robin, where each packet matching a
 given destination route is forwarded using the subsequent next-hop,
 in a round-robin fashion.  This does provide a form of load
 balancing, but there are several problems with approaches such as
 round-robin or random:
 Variable Path MTU
       Since each of the redundant paths may have a different MTU,
       this means that the overall path MTU can change on a packet-
       by-packet basis, negating the usefulness of path MTU discovery.
 Variable Latencies
       Since each of the redundant paths may have a different latency
       involved, having packets take separate paths can cause packets
       to always arrive out of order, increasing delivery latency and
       buffering requirements.
       Packet reordering causes TCP to believe that loss has taken
       place when packets with higher sequence numbers arrive before
       an earlier one.  When three or more packets are received before
       a "late" packet, TCP enters a mode called "fast-retransmit" [6]
       which consumes extra bandwidth (which could potentially cause
       more loss, decreasing throughput) as it attempts to
       unnecessarily retransmit the delayed packet(s).  Hence,
       reordering can be detrimental to network performance.
 Debugging
       Common debugging utilities such as ping and traceroute are much
       less reliable in the presence of multiple paths and may even
       present completely wrong results.
 In multicast routing, the problem with multiple paths is that
 multicast routing protocols prevent loops and duplicates by
 constructing a single tree to all receivers of the same group
 address.  Multicast routing protocols deployed today (DVMRP, PIM-DM,
 PIM-SM) [2] construct shortest-path trees rooted at either the
 source, or another router known as a Core or Rendezvous Point.
 Hence, the way they ensure that duplicates will not arise is that a
 given tree must use only a single next-hop towards the root of the
 tree.

Thaler & Hopps Informational [Page 2] RFC 2991 Multipath Issues November 2000

3. Requirements

 In the remainder of this document, we will use the term "flow" to
 represent the granularity at which the router keeps state (if at all)
 for classes of traffic.  The exact definition of a flow may depend on
 the actual implementation.  For example, a flow might be identified
 solely by destination address, or it might be identified by (source
 address, destination address, protocol id) triplet.  Hence "flow" is
 not necessarily synonymous with the term "microflow" as used in RFC
 2474 [7], which also includes port numbers.  Indeed, including
 transport-layer information in the next-hop selection process can
 actually be problematic.  For example, if packets are fragmented, the
 transport-layer information may not be available in every packet.
 Furthermore, having the choice of path depend on transport-layer
 fields may negate the benefit of caching information such as MTU for
 use in subsequent connections between the same endpoints.
 All of the problems outlined in the previous section arise when
 packets in the same unicast or multicast "flow" are split among
 multiple paths.  The natural solution is therefore to ensure that
 packets for the same flow always use the same path.
 Two additional features are desirable:
 Minimal disruption
       When multipath is used, meaning that multiple routes contribute
       valid next-hops, the chances are higher of routes being added
       and deleted from consideration than when only the "best" route
       is used (in which case metric changes in alternate routes have
       no effect on traffic paths).  Since a higher number of routes
       may actually be used for forwarding when multipath is in use,
       the potential for packet reordering and packet loss due to
       route flaps can be much greater than when not using multipath.
       Hence, it is desirable to minimize the number of active flows
       affected by the addition or deletion of another next-hop.
 Fast implementation
       The amount of additional computation required to forward a
       packet should be small.  For example, when doing round-robin,
       this computation might consist of incrementing (modulo the
       number of next-hops) a next-hop index.

4. Solutions

 We now provide three possible methods for improving the performance
 of multipath and then discuss their applicability to unicast and
 multicast forwarding.

Thaler & Hopps Informational [Page 3] RFC 2991 Multipath Issues November 2000

 Modulo-N Hash
       To select a next-hop from the list of N next-hops, the router
       performs a modulo-N hash over the packet header fields that
       identify a flow.  This has the advantage of being fast, at the
       expense of (N-1)/N of all flows changing paths whenever a
       next-hop is added or removed.
 Hash-Threshold
       The router first selects a key by performing a hash over the
       packet header fields that identify the flow.  The N next-hops
       have been assigned unique regions in the hash function's output
       space.  By comparing the hash value against region boundaries
       the router can determine which region the hash value belongs to
       and thus which next-hop to use.  This method has the advantage
       of only affecting flows near the region boundaries (or
       thresholds) when next-hops are added or removed.  For ECMP
       hash-threshold's lookup can be done with a simple division
       (hash_value / fixed_region_size).  When a next-hop is added or
       removed, between 1/4 and 1/2 of all flows change paths.  An
       analysis of this method can be found in [3].
 Highest Random Weight (HRW)
       The router computes a key for EACH next-hop by performing a
       hash over the packet header fields that identify the flow, as
       well as over the address of the next-hop.  The router then
       chooses the next-hop with the highest resulting key value [4].
       This has the advantage of minimizing the number of flows
       affected by a next-hop addition or deletion (only 1/N of them),
       but is approximately N times as expensive as a modulo-N hash.
 The applicability of these three alternatives depends on (at least)
 two factors: whether the forwarder maintains per-flow state, and how
 precious CPU is to a multipath forwarder.
 Some routers may maintain per-flow state for reasons other than for
 supporting multipath.  For example, routers typically keep per-flow
 state for multicast flows so that they can maintain the list of
 interfaces to which packets in the flow should be copied.
 If per-flow state is maintained in a multipath forwarder, then
 computation of the next-hop can be done by the router at state
 creation time.  This entails no additional computations at packet
 forwarding time compared with normal forwarding to a single next-hop,
 since the next-hop is precomputed.  In this case, any method can be
 used, including round-robin, random, modulo-N, hash-threshold or HRW.
 Hash functions such as modulo-N, hash-threshold and HRW are better if
 the forwarder state may be deleted for any reason during the lifetime
 of a flow since subsequent next-hop computations by the router will

Thaler & Hopps Informational [Page 4] RFC 2991 Multipath Issues November 2000

 always select the same path.  This also improves the usefulness of
 debugging utilities such as traceroute.  Finally, to maximize the
 stability of paths (and hence the usefulness of traceroute, etc.),
 the use of HRW is recommended over the other methods mentioned
 herein.
 If per-flow state is not maintained by the forwarder, then using
 multiple next-hops requires that the next-hop be calculated at packet
 arrival time.  When CPU is more precious than stability of flow
 paths, hash-threshold is recommended over the other methods mentioned
 herein.

4.1. Unicast Forwarding

 Depending on the implementation, unicast forwarding may or may not
 keep per-flow state.  We recommend that where forwarder
 implementations keep flow state, routers should use HRW at state
 creation time (and next-hop deletion time) to select the next-hop,
 and that forwarders without per-flow state use hash-threshold.

4.2. Multicast Forwarding

 Today's multicast forwarding engines use a cache of forwarding
 entries indexed by group (or group prefix) and source (or source
 prefix).  This means that today's multicast forwarder's always keep
 per-flow state, although for some multicast routing protocols, the
 "flow" may be fairly coarse (e.g., traffic from all sources to the
 same destination).  Since per-flow state is kept by the forwarder, it
 is recommended that the router always use HRW to select the next-hop.
 Routers using explicit-joining protocols such as PIM-SM [5] should
 thus use the multipath information when determining to which neighbor
 a join message should be sent.  For example, when multiple next-hops
 exist for a given Rendezvous Point (RP) toward which a (*,G) Join
 should be sent, it is recommended that HRW be used to select the
 next-hop to use for each group.

5. Applicability

 The algorithms discussed above (except round-robin) all rely on some
 form of hash function.  Equal flow distribution is achieved when the
 hash function is uniformly distributed.  Since the commonly used hash
 functions only become uniformly distributed when the number of inputs
 is relatively large, these algorithms are more applicable to routers
 used to route many flows, than in, for example, a small business
 setting.

Thaler & Hopps Informational [Page 5] RFC 2991 Multipath Issues November 2000

6. Redundant Parallel Links

 A related problem occurs when multiple parallel links are used
 between the same pair of routers.  A common solution is to bundle the
 two links together into a "super"-link when is then used for routing.
 For multicast forwarding, this results in the two links being reduced
 to a single next-hop (over the combined link) which can be used to
 prevent duplicates.  When a unicast or multicast packet is queued to
 the combined link, some method, such as those discussed earlier, is
 still required to determine the physical link on which to transmit
 the packet.  If the parallel links are identical, then most of the
 concerns discussed in this document are avoided with the combined
 link.  The exception is packet reordering, which can still occur with
 round-robin, adversely affecting TCP.

7. Security Considerations

 This document discusses issues with various methods of choosing a
 next-hop from among multiple valid next-hops.  As such, it does not
 directly impact the security of the Internet infrastructure or its
 applications.
 One issue that is worth mentioning, however, is that when next-hop
 selection is predictable, an attacker can synthesize traffic that
 will all hash the same, making it possible to launch a denial-of-
 service attack that overloads a particular path.  Since a special
 case of this is when the same (single) next-hop is always selected,
 such an attack is easiest when multipath is not being used.
 Introducing multipath routing can make such an attack more difficult;
 the more unpredictable the hash is, the harder it becomes to conduct
 a denial-of-service attack against any single link.

Thaler & Hopps Informational [Page 6] RFC 2991 Multipath Issues November 2000

8. References

 [1]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
 [2]  Maufer, T., "Deploying IP Multicast in the Enterprise",
      Prentice-Hall, 1998.
 [3]  Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC
      2992, November 2000.
 [4]  Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
      Increase Hit Rates", IEEE/ACM Transactions on Networking,
      February 1998.
 [5]  Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
      Handley, M., Jacobson, V., Liu, C., Sharma, P. and L. Wei,
      "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
      Specification", RFC 2362, June 1998.
 [6]  Allman, M., Paxson, V. and W. Stevens, "TCP Congestion Control",
      RFC 2581, April 1999.
 [7]  Nichols, K., Blake, S., Baker, F. and D. Black., "Definition of
      the Differentiated Services Field (DS Field) in the IPv4 and
      IPv6 Headers", RFC 2474, December 1998.

Thaler & Hopps Informational [Page 7] RFC 2991 Multipath Issues November 2000

9. Authors' Addresses

 Dave Thaler
 Microsoft
 One Microsoft Way
 Redmond, WA  98052
 Phone: +1 425 703 8835
 EMail: dthaler@dthaler.microsoft.com
 Christian E. Hopps
 NextHop Technologies, Inc.
 517 W. William Street
 Ann Arbor, MI 48103-4943
 U.S.A
 Phone: +1 734 936 0291
 EMail: chopps@nexthop.com

Thaler & Hopps Informational [Page 8] RFC 2991 Multipath Issues November 2000

10. Full Copyright Statement

 Copyright (C) The Internet Society (2000).  All Rights Reserved.
 This document and translations of it may be copied and furnished to
 others, and derivative works that comment on or otherwise explain it
 or assist in its implementation may be prepared, copied, published
 and distributed, in whole or in part, without restriction of any
 kind, provided that the above copyright notice and this paragraph are
 included on all such copies and derivative works.  However, this
 document itself may not be modified in any way, such as by removing
 the copyright notice or references to the Internet Society or other
 Internet organizations, except as needed for the purpose of
 developing Internet standards in which case the procedures for
 copyrights defined in the Internet Standards process must be
 followed, or as required to translate it into languages other than
 English.
 The limited permissions granted above are perpetual and will not be
 revoked by the Internet Society or its successors or assigns.
 This document and the information contained herein is provided on an
 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

 Funding for the RFC Editor function is currently provided by the
 Internet Society.

Thaler & Hopps Informational [Page 9]

/data/webs/external/dokuwiki/data/pages/rfc/rfc2991.txt · Last modified: 2000/11/02 16:53 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki