Network Working Group                                    Y. Rekhter, Ed.
Request for Comments: 4271                                    T. Li, Ed.
Obsoletes: 1771                                            S. Hares, Ed.
Category: Standards Track                                   January 2006
A Border Gateway Protocol 4 (BGP-4)
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document discusses the Border Gateway Protocol (BGP), which is
an inter-Autonomous System routing protocol.
The primary function of a BGP speaking system is to exchange network
reachability information with other BGP systems. This network
reachability information includes information on the list of
Autonomous Systems (ASes) that reachability information traverses.
This information is sufficient for constructing a graph of AS
connectivity for this reachability from which routing loops may be
pruned, and, at the AS level, some policy decisions may be enforced.
BGP-4 provides a set of mechanisms for supporting Classless Inter-
Domain Routing (CIDR). These mechanisms include support for
advertising a set of destinations as an IP prefix, and eliminating
the concept of network "class" within BGP. BGP-4 also introduces
mechanisms that allow aggregation of routes, including aggregation of
AS paths.
This document obsoletes RFC 1771.
Table of Contents
1. Introduction
1.1. Definition of Commonly Used Terms
1.2. Specification of Requirements
2. Acknowledgements
3. Summary of Operation
3.1. Routes: Advertisement and Storage
3.2. Routing Information Base
4. Message Formats
4.1. Message Header Format
4.2. OPEN Message Format
4.3. UPDATE Message Format
4.4. KEEPALIVE Message Format
4.5. NOTIFICATION Message Format
5. Path Attributes
5.1. Path Attribute Usage
5.1.1. ORIGIN
5.1.2. AS_PATH
5.1.3. NEXT_HOP
5.1.4. MULTI_EXIT_DISC
5.1.5. LOCAL_PREF
5.1.6. ATOMIC_AGGREGATE
5.1.7. AGGREGATOR
6. BGP Error Handling
6.1. Message Header Error Handling
6.2. OPEN Message Error Handling
6.3. UPDATE Message Error Handling
6.4. NOTIFICATION Message Error Handling
6.5. Hold Timer Expired Error Handling
6.6. Finite State Machine Error Handling
6.7. Cease
6.8. BGP Connection Collision Detection
7. BGP Version Negotiation
8. BGP Finite State Machine (FSM)
8.1. Events for the BGP FSM
8.1.1. Optional Events Linked to Optional Session Attributes
8.1.2. Administrative Events
8.1.3. Timer Events
8.1.4. TCP Connection-Based Events
8.1.5. BGP Message-Based Events
8.2. Description of FSM
8.2.1. FSM Definition
8.2.1.1. Terms "active" and "passive"
8.2.1.2. FSM and Collision Detection
8.2.1.3. FSM and Optional Session Attributes
8.2.1.4. FSM Event Numbers
8.2.1.5. FSM Actions that are Implementation Dependent
8.2.2. Finite State Machine
9. UPDATE Message Handling
9.1. Decision Process
9.1.1. Phase 1: Calculation of Degree of Preference
9.1.2. Phase 2: Route Selection
9.1.2.1. Route Resolvability Condition
9.1.2.2. Breaking Ties (Phase 2)
9.1.3. Phase 3: Route Dissemination
9.1.4. Overlapping Routes
9.2. Update-Send Process
9.2.1. Controlling Routing Traffic Overhead
9.2.1.1. Frequency of Route Advertisement
9.2.1.2. Frequency of Route Origination
9.2.2. Efficient Organization of Routing Information
9.2.2.1. Information Reduction
9.2.2.2. Aggregating Routing Information
9.3. Route Selection Criteria
9.4. Originating BGP routes
10. BGP Timers
Appendix A. Comparison with RFC 1771
Appendix B. Comparison with RFC 1267
Appendix C. Comparison with RFC 1163
Appendix D. Comparison with RFC 1105
Appendix E. TCP Options that May Be Used with BGP
Appendix F. Implementation Recommendations
Appendix F.1. Multiple Networks Per Message
Appendix F.2. Reducing Route Flapping
Appendix F.3. Path Attribute Ordering
Appendix F.4. AS_SET Sorting
Appendix F.5. Control Over Version Negotiation
Appendix F.6. Complex AS_PATH Aggregation
Security Considerations
IANA Considerations
Normative References
Informative References
1. Introduction
The Border Gateway Protocol (BGP) is an inter-Autonomous System
routing protocol.
The primary function of a BGP speaking system is to exchange network
reachability information with other BGP systems. This network
reachability information includes information on the list of
Autonomous Systems (ASes) that reachability information traverses.
This information is sufficient for constructing a graph of AS
connectivity for this reachability, from which routing loops may be
pruned and, at the AS level, some policy decisions may be enforced.
BGP-4 provides a set of mechanisms for supporting Classless Inter-
Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include
support for advertising a set of destinations as an IP prefix and
eliminating the concept of network "class" within BGP. BGP-4 also
introduces mechanisms that allow aggregation of routes, including
aggregation of AS paths.
Routing information exchanged via BGP supports only the destination-
based forwarding paradigm, which assumes that a router forwards a
packet based solely on the destination address carried in the IP
header of the packet. This, in turn, reflects the set of policy
decisions that can (and cannot) be enforced using BGP. BGP can
support only those policies conforming to the destination-based
forwarding paradigm.
1.1. Definition of Commonly Used Terms
This section provides definitions for terms that have a specific
meaning to the BGP protocol and that are used throughout the text.
Adj-RIB-In
The Adj-RIBs-In contains unprocessed routing information that has
been advertised to the local BGP speaker by its peers.
Adj-RIB-Out
The Adj-RIBs-Out contains the routes for advertisement to specific
peers by means of the local speaker's UPDATE messages.
Autonomous System (AS)
The classic definition of an Autonomous System is a set of routers
under a single technical administration, using an interior gateway
protocol (IGP) and common metrics to determine how to route
packets within the AS, and using an inter-AS routing protocol to
determine how to route packets to other ASes. Since this classic
definition was developed, it has become common for a single AS to
use several IGPs and, sometimes, several sets of metrics within an
AS. The use of the term Autonomous System stresses the fact that,
even when multiple IGPs and metrics are used, the administration
of an AS appears to other ASes to have a single coherent interior
routing plan, and presents a consistent picture of the
destinations that are reachable through it.
BGP Identifier
A 4-octet unsigned integer that indicates the BGP Identifier of
the sender of BGP messages. A given BGP speaker sets the value of
its BGP Identifier to an IP address assigned to that BGP speaker.
The value of the BGP Identifier is determined upon startup and is
the same for every local interface and BGP peer.
BGP speaker
A router that implements BGP.
EBGP
External BGP (BGP connection between external peers).
External peer
Peer that is in a different Autonomous System than the local
system.
Feasible route
An advertised route that is available for use by the recipient.
IBGP
Internal BGP (BGP connection between internal peers).
Internal peer
Peer that is in the same Autonomous System as the local system.
IGP
Interior Gateway Protocol - a routing protocol used to exchange
routing information among routers within a single Autonomous
System.
Loc-RIB
The Loc-RIB contains the routes that have been selected by the
local BGP speaker's Decision Process.
NLRI
Network Layer Reachability Information.
Route
A unit of information that pairs a set of destinations with the
attributes of a path to those destinations. The set of
destinations consists of the systems whose IP addresses are
contained in one IP address prefix carried in the Network Layer
Reachability Information (NLRI) field of an UPDATE message. The
path is the
information reported in the path attributes field of the same
UPDATE message.
RIB
Routing Information Base.
Unfeasible route
A previously advertised feasible route that is no longer available
for use.
1.2. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Acknowledgements
This document was originally published as [RFC1267] in October 1991,
jointly authored by Kirk Lougheed and Yakov Rekhter.
We would like to express our thanks to Guy Almes, Len Bosack, and
Jeffrey C. Honig for their contributions to the earlier version
(BGP-1) of this document.
We would like to specially acknowledge numerous contributions by
Dennis Ferguson to the earlier version of this document.
We would like to explicitly thank Bob Braden for the review of the
earlier version (BGP-2) of this document, and for his constructive
and valuable comments.
We would also like to thank Bob Hinden, Director for Routing of the
Internet Engineering Steering Group, and the team of reviewers he
assembled to review the earlier version (BGP-2) of this document.
This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia
Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted
with a strong combination of toughness, professionalism, and
courtesy.
Certain sections of the document borrowed heavily from IDRP
[IS10747], which is the OSI counterpart of BGP. For this, credit
should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and
to Charles Kunzinger, who was the IDRP editor within that group.
We would also like to thank Benjamin Abarbanel, Enke Chen, Edward
Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry
Haskin, Stephen Kent, John Krawczyk, David LeRoy, Dan Massey,
Jonathan Natale, Dan Pei, Mathew Richardson, John Scudder, John
Stewart III, Dave Thaler, Paul Traina, Russ White, Curtis Villamizar,
and Alex Zinin for their comments.
We would like to specially acknowledge Andrew Lange for his help in
preparing the final version of this document.
Finally, we would like to thank all the members of the IDR Working
Group for their ideas and the support they have given to this
document.
3. Summary of Operation
The Border Gateway Protocol (BGP) is an inter-Autonomous System
routing protocol. It is built on experience gained with EGP (as
defined in [RFC904]) and EGP usage in the NSFNET Backbone (as
described in [RFC1092] and [RFC1093]). For more BGP-related
information, see [RFC1772], [RFC1930], [RFC1997], and [RFC2858].
The primary function of a BGP speaking system is to exchange network
reachability information with other BGP systems. This network
reachability information includes information on the list of
Autonomous Systems (ASes) that reachability information traverses.
This information is sufficient for constructing a graph of AS
connectivity, from which routing loops may be pruned, and, at the AS
level, some policy decisions may be enforced.
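As an illustration of how the list of traversed ASes allows routing
loops to be pruned, the following minimal sketch discards any route
whose AS list already contains the local AS number. The function
names, the dictionary layout, and the AS numbers (taken from the
range reserved for documentation) are assumptions made for this
example only; the normative handling of the AS_PATH attribute is
given later in this document.

```python
def has_as_loop(as_path, local_as):
    """Return True if the local AS number already appears in the AS list."""
    return local_as in as_path


def prune_looped_routes(routes, local_as):
    """Keep only the routes whose AS list does not contain the local AS."""
    return [r for r in routes if not has_as_loop(r["as_path"], local_as)]


# Hypothetical routes; AS numbers 64496-64511 are reserved for documentation.
routes = [
    {"prefix": "192.0.2.0/24", "as_path": [64497, 64498]},
    {"prefix": "198.51.100.0/24", "as_path": [64497, 64496, 64499]},
]

# The second route has already traversed AS 64496, so it is pruned.
kept = prune_looped_routes(routes, local_as=64496)
```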
In the context of this document, we assume that a BGP speaker
advertises to its peers only those routes that it uses itself (in
this context, a BGP speaker is said to "use" a BGP route if it is the
most preferred BGP route and is used in forwarding). All other cases
are outside the scope of this document.
In the context of this document, the term "IP address" refers to an
IP Version 4 address [RFC791].
Routing information exchanged via BGP supports only the destination-
based forwarding paradigm, which assumes that a router forwards a
packet based solely on the destination address carried in the IP
header of the packet. This, in turn, reflects the set of policy
decisions that can (and cannot) be enforced using BGP. Note that
some policies cannot be supported by the destination-based forwarding
paradigm, and thus require techniques such as source routing (aka
explicit routing) to be enforced. Such policies cannot be enforced
using BGP either. For example, BGP does not enable one AS to send
traffic to a neighboring AS for forwarding to some destination
(reachable through but) beyond that neighboring AS, intending that
the traffic take a different route to that taken by the traffic
originating in the neighboring AS (for that same destination). On
the other hand, BGP can support any policy conforming to the
destination-based forwarding paradigm.
BGP-4 provides a new set of mechanisms for supporting Classless
Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms
include support for advertising a set of destinations as an IP prefix
and eliminating the concept of a network "class" within BGP. BGP-4
also introduces mechanisms that allow aggregation of routes,
including aggregation of AS paths.
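The idea of an IP prefix standing for a set of destinations, and of
aggregating adjacent prefixes, can be sketched with the Python
standard library "ipaddress" module. The addresses below are from
the ranges reserved for documentation and are chosen for this
example only. The sketch covers aggregation of address prefixes;
aggregation of AS paths is not modeled here.

```python
import ipaddress

# A single prefix describes a whole set of destinations, with no
# reference to a network "class".
prefix = ipaddress.ip_network("192.0.2.0/24")
inside = ipaddress.ip_address("192.0.2.77") in prefix
outside = ipaddress.ip_address("198.51.100.1") in prefix

# Two adjacent /25 prefixes can be aggregated into one /24 prefix.
parts = [
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("198.51.100.128/25"),
]
aggregate = list(ipaddress.collapse_addresses(parts))
```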
This document uses the term "Autonomous System" (AS) throughout. The
classic definition of an Autonomous System is a set of routers under
a single technical administration, using an interior gateway protocol
(IGP) and common metrics to determine how to route packets within the
AS, and using an inter-AS routing protocol to determine how to route
packets to other ASes. Since this classic definition was developed,
it has become common for a single AS to use several IGPs and,
sometimes, several sets of metrics within an AS. The use of the term
Autonomous System stresses the fact that, even when multiple IGPs and
metrics are used, the administration of an AS appears to other ASes
to have a single coherent interior routing plan and presents a
consistent picture of the destinations that are reachable through it.
BGP uses TCP [RFC793] as its transport protocol. This eliminates the
need to implement explicit update fragmentation, retransmission,
acknowledgement, and sequencing. BGP listens on TCP port 179. The
error notification mechanism used in BGP assumes that TCP supports a
"graceful" close (i.e., that all outstanding data will be delivered
before the connection is closed).
A TCP connection is formed between two systems. They exchange
messages to open and confirm the connection parameters.
The initial data flow is the portion of the BGP routing table that is
allowed by the export policy, called the Adj-RIBs-Out (see Section 3.2).
Incremental updates are sent as the routing tables change. BGP does
not require a periodic refresh of the routing table. To allow local
policy changes to have the correct effect without resetting any BGP
connections, a BGP speaker SHOULD either (a) retain the current
version of the routes advertised to it by all of its peers for the
duration of the connection, or (b) make use of the Route Refresh
extension [RFC2918].
KEEPALIVE messages may be sent periodically to ensure that the
connection is live. NOTIFICATION messages are sent in response to
errors or special conditions. If a connection encounters an error
condition, a NOTIFICATION message is sent and the connection is
closed.
A peer in a different AS is referred to as an external peer, while a
peer in the same AS is referred to as an internal peer. Internal BGP
and external BGP are commonly abbreviated as IBGP and EBGP.
If a particular AS has multiple BGP speakers and is providing transit
service for other ASes, then care must be taken to ensure a
consistent view of routing within the AS. A consistent view of the
interior routes of the AS is provided by the IGP used within the AS.
For the purpose of this document, it is assumed that a consistent
view of the routes exterior to the AS is provided by having all BGP
speakers within the AS maintain IBGP with each other.
This document specifies the base behavior of the BGP protocol. This
behavior can be, and is, modified by extension specifications. When
the protocol is extended, the new behavior is fully documented in the
extension specifications.
3.1. Routes: Advertisement and Storage
For the purpose of this protocol, a route is defined as a unit of
information that pairs a set of destinations with the attributes of a
path to those destinations. The set of destinations consists of the
systems whose IP addresses are contained in one IP address prefix that
is carried in the Network Layer Reachability Information (NLRI) field of
an UPDATE message, and the path is the information reported in the
path attributes field of the same UPDATE message.
Routes are advertised between BGP speakers in UPDATE messages.
Multiple routes that have the same path attributes can be advertised
in a single UPDATE message by including multiple prefixes in the NLRI
field of the UPDATE message.
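A minimal sketch of how a sender might group routes that share the
same path attributes, so that each group can be carried in one
UPDATE message, is shown below. The attribute names, the tuple
layout, and the helper name are assumptions for illustration. The
sketch ignores the maximum message size, which in practice limits
how many prefixes fit in one message.

```python
from collections import defaultdict


def group_by_attributes(routes):
    """Map each distinct set of path attributes to the prefixes sharing it."""
    groups = defaultdict(list)
    for prefix, attrs in routes:
        # Attribute values must be hashable so the set can serve as a key.
        key = tuple(sorted(attrs.items()))
        groups[key].append(prefix)
    return dict(groups)


# Hypothetical routes: the first two share identical path attributes.
routes = [
    ("192.0.2.0/24",
     {"origin": "IGP", "as_path": (64497,), "next_hop": "10.0.0.1"}),
    ("198.51.100.0/24",
     {"origin": "IGP", "as_path": (64497,), "next_hop": "10.0.0.1"}),
    ("203.0.113.0/24",
     {"origin": "IGP", "as_path": (64498, 64499), "next_hop": "10.0.0.1"}),
]

# Each entry corresponds to one candidate UPDATE message.
updates = group_by_attributes(routes)
```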
Routes are stored in the Routing Information Bases (RIBs): namely,
the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out, as described in
Section 3.2.
If a BGP speaker chooses to advertise a previously received route, it
MAY add to, or modify, the path attributes of the route before
advertising it to a peer.
BGP provides mechanisms by which a BGP speaker can inform its peers
that a previously advertised route is no longer available for use.
There are three methods by which a given BGP speaker can indicate
that a route has been withdrawn from service:
a) the IP prefix that expresses the destination for a previously
advertised route can be advertised in the WITHDRAWN ROUTES
field in the UPDATE message, thus marking the associated route
as being no longer available for use,
b) a replacement route with the same NLRI can be advertised, or
c) the connection between the two BGP speakers can be closed, which
implicitly removes from service all routes the pair of speakers had
advertised to each other.
Changing the attribute(s) of a route is accomplished by advertising a
replacement route. The replacement route carries new (changed)
attributes and has the same address prefix as the original route.
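The three withdrawal methods, as seen by the receiving speaker, can
be sketched as operations on the routes learned from one peer. The
class and method names are hypothetical, and path attributes are
reduced to a plain dictionary; this is an illustration under those
assumptions, not a prescribed implementation.

```python
class AdjRibIn:
    """Routes learned from a single peer, keyed by address prefix."""

    def __init__(self):
        self.routes = {}

    def apply_update(self, withdrawn=(), nlri=(), attrs=None):
        # Method (a): prefixes listed as withdrawn are removed.
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        # Method (b): a route with the same prefix replaces the old one,
        # which is also how changed attributes are conveyed.
        for prefix in nlri:
            self.routes[prefix] = attrs

    def connection_closed(self):
        # Method (c): closing the connection removes every route
        # learned from this peer.
        self.routes.clear()


rib = AdjRibIn()
rib.apply_update(nlri=["192.0.2.0/24", "198.51.100.0/24"],
                 attrs={"as_path": (64497,)})
rib.apply_update(nlri=["192.0.2.0/24"],
                 attrs={"as_path": (64498, 64497)})
after_replace = dict(rib.routes)
rib.apply_update(withdrawn=["198.51.100.0/24"])
after_withdraw = dict(rib.routes)
rib.connection_closed()
```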
3.2. Routing Information Base
The Routing Information Base (RIB) within a BGP speaker consists of
three distinct parts:
a) Adj-RIBs-In: The Adj-RIBs-In stores routing information learned
from inbound UPDATE messages that were received from other BGP
speakers. Their contents represent routes that are available
as input to the Decision Process.
b) Loc-RIB: The Loc-RIB contains the local routing information the
BGP speaker selected by applying its local policies to the
routing information contained in its Adj-RIBs-In. These are
the routes that will be used by the local BGP speaker. The
next hop for each of these routes MUST be resolvable via the
local BGP speaker's Routing Table.
c) Adj-RIBs-Out: The Adj-RIBs-Out stores information the local BGP
speaker selected for advertisement to its peers. The routing
information stored in the Adj-RIBs-Out will be carried in the
local BGP speaker's UPDATE messages and advertised to its
peers.
In summary, the Adj-RIBs-In contains unprocessed routing information
that has been advertised to the local BGP speaker by its peers; the
Loc-RIB contains the routes that have been selected by the local BGP
speaker's Decision Process; and the Adj-RIBs-Out organizes the routes
for advertisement to specific peers (by means of the local speaker's
UPDATE messages).
Although the conceptual model distinguishes between Adj-RIBs-In,
Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an
implementation must maintain three separate copies of the routing
information. The choice of implementation (for example, three copies
of the information vs. one copy with pointers) is not constrained by
the protocol.
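The "one copy with pointers" option can be sketched as follows: each
route object is stored once, and the three parts of the RIB hold
references to it. All names are hypothetical. In particular, the
select() method uses a placeholder rule (shortest AS list) that is
NOT the Decision Process specified in Section 9, and the sketch
advertises routes unmodified, whereas a speaker may add to or modify
path attributes before advertising.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple
    next_hop: str


@dataclass
class Rib:
    adj_ribs_in: dict = field(default_factory=dict)   # peer -> {prefix: Route}
    loc_rib: dict = field(default_factory=dict)       # prefix -> Route
    adj_ribs_out: dict = field(default_factory=dict)  # peer -> {prefix: Route}

    def learn(self, peer, route):
        """Store a route received from a peer in that peer's Adj-RIB-In."""
        self.adj_ribs_in.setdefault(peer, {})[route.prefix] = route

    def select(self, prefix):
        """Placeholder selection: prefer the shortest AS list."""
        candidates = [r[prefix] for r in self.adj_ribs_in.values()
                      if prefix in r]
        if candidates:
            self.loc_rib[prefix] = min(candidates,
                                       key=lambda r: len(r.as_path))

    def advertise(self, peer, prefix):
        """Reference the selected route from the peer's Adj-RIB-Out."""
        if prefix in self.loc_rib:
            self.adj_ribs_out.setdefault(peer, {})[prefix] = \
                self.loc_rib[prefix]


rib = Rib()
rib.learn("peer-a", Route("192.0.2.0/24", (64497, 64499), "10.0.0.1"))
rib.learn("peer-b", Route("192.0.2.0/24", (64498,), "10.0.0.2"))
rib.select("192.0.2.0/24")
rib.advertise("peer-c", "192.0.2.0/24")

# The same object is referenced from all three parts; nothing is copied.
selected = rib.loc_rib["192.0.2.0/24"]
```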
Routing information that the BGP speaker uses to forward packets (or
to construct the forwarding table used for packet forwarding) is
maintained in the Routing Table. The Routing Table accumulates
routes to directly connected networks, static routes, routes learned
from IGPs, and routes learned from BGP. Whether a
specific BGP route should be installed in the Routing Table, and
whether a BGP route should override a route to the same destination
installed by another source, is a local policy decision, and is not
specified in this document. In addition to actual packet forwarding,
the Routing Table is used for resolution of the next-hop addresses
specified in BGP updates (see Section 5.1.3).
4. Message Formats
This section describes message formats used by BGP.
BGP messages are sent over TCP connections. A message is processed
only after it is entirely received. The maximum message size is 4096
octets. All implementations are required to support this maximum
message size. The smallest message that may be sent consists of a
BGP header without a data portion (19 octets).
All multi-octet fields are in network byte order.
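Because a message is processed only after it is entirely received, a
receiver has to split the TCP byte stream into whole messages. The
following sketch does so using the 2-octet Length field, which
follows the 16-octet Marker in the header described in Section 4.1.
The function name and the use of ValueError are assumptions for this
example; the error handling actually required is specified in
Section 6.

```python
HEADER_LEN = 19   # smallest message: a header with no data portion
MAX_LEN = 4096    # maximum message size


def split_messages(buffer):
    """Return (complete_messages, remaining_bytes) for a received buffer."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER_LEN:
        # Length occupies octets 16-17 of the header, in network byte order.
        length = int.from_bytes(buffer[offset + 16:offset + 18], "big")
        if not HEADER_LEN <= length <= MAX_LEN:
            raise ValueError(f"bad message length {length}")
        if len(buffer) - offset < length:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[offset:offset + length])
        offset += length
    return messages, buffer[offset:]


# One complete 19-octet KEEPALIVE followed by a partial second message.
keepalive = b"\xff" * 16 + (19).to_bytes(2, "big") + b"\x04"
msgs, rest = split_messages(keepalive + keepalive[:10])
```

The caller is expected to keep the remaining bytes and prepend them
to the next data read from the connection.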
4.1. Message Header Format
Each message has a fixed-size header. There may or may not be a data
portion following the header, depending on the message type. The
layout of these fields is shown below:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                            Marker                             |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Marker:
This 16-octet field is included for compatibility; it MUST be
set to all ones.
Length:
This 2-octet unsigned integer indicates the total length of the
message in octets, including the header. Thus, it allows one
to locate the (Marker field of the) next message in the TCP
stream. The value of the Length field MUST always be at least
19 and no greater than 4096, and MAY be further constrained,
depending on the message type. "Padding" of extra data after
the message is not allowed. Therefore, the Length field MUST
have the smallest value required, given the rest of the
message.
Type:
This 1-octet unsigned integer indicates the type code of the
message. This document defines the following type codes:
1 - OPEN
2 - UPDATE
3 - NOTIFICATION
4 - KEEPALIVE
[RFC2918] defines one more type code.
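The header layout above can be expressed with the Python "struct"
module as a 16-octet Marker, a 2-octet Length, and a 1-octet Type in
network byte order. The sketch below builds and checks a header
under that layout. The helper names and the use of ValueError are
assumptions for this example, and type codes not defined in this
document are reported as "UNKNOWN" rather than rejected, since
extensions define further codes.

```python
import struct

HEADER = struct.Struct("!16sHB")  # Marker, Length, Type; 19 octets in total
MARKER = b"\xff" * 16             # Marker MUST be set to all ones
TYPE_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_header(msg_type, body_len=0):
    """Build a header for a message carrying body_len octets of data."""
    length = HEADER.size + body_len
    if length > 4096:
        raise ValueError("message exceeds 4096 octets")
    return HEADER.pack(MARKER, length, msg_type)


def parse_header(data):
    """Check the header fields and return (length, type_name)."""
    if len(data) < HEADER.size:
        raise ValueError("fewer than 19 octets available")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("Marker is not all ones")
    if not 19 <= length <= 4096:
        raise ValueError(f"bad Length {length}")
    return length, TYPE_NAMES.get(msg_type, "UNKNOWN")


keepalive = build_header(4)          # type 4 with no data portion
parsed = parse_header(keepalive)
```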
4.2. OPEN Message Format
After a TCP connection is established, the first message sent by each
side is an OPEN message. If the OPEN message is acceptable, a
KEEPALIVE message confirming the OPEN is sent back.
In addition to the fixed-size BGP header, the OPEN message contains
the following fields:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|    Version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     My Autonomous System      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Hold Time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        BGP Identifier                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt Parm Len  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                Optional Parameters (variable)                 |
|                                                               |