Enterprises that have several physical locations—branch offices, data centers, remote offices—often need robust, scalable, and resilient routing between sites. Static routes quickly become unwieldy and fragile. The Border Gateway Protocol (BGP), when configured correctly in a multi‑site context, enables dynamic, policy‑driven, failover‑capable routing between your sites.
In this article we’ll cover how to design, configure, and tune BGP across multiple sites—what settings and practices often get overlooked, how to minimize downtime risk, and how to manage routing policy for performance, security, and scalability.
What is BGP and Why Use It Between Sites
- BGP is a path‑vector protocol, designed to advertise and maintain routing information among autonomous systems (ASes) or between routers in different domains.
- In multi‑site enterprise networks, it enables dynamic route advertisement, failover, multipath redundancy, load balancing, and policy‑based routing.
- With several locations, especially ones with multiple ISPs or inter-site WAN links, BGP removes the need to maintain static routes on every router and adapts automatically to link failures or path changes.
Key Design Considerations Before You Start
| Factor | Why It Matters |
|---|---|
| ASN Plan (single vs multiple ASNs) | Determines how your sites interconnect; whether you need iBGP or eBGP links, and how you advertise your prefixes externally. |
| Peering Topology | Full mesh? Hub‑and‑spoke? Partial mesh? Affects scalability and convergence time. |
| WAN Link Redundancy | Multiple WAN paths ensure failover, but you need consistent policies to avoid routing loops or blackholing. |
| Prefix Aggregation | Reduces routing table size and avoids propagating many /32s or /24s to every peer. |
| Routing Policy (import/export) | Control which prefixes are advertised, which path is preferred, how to handle metric or local preference. |
| Security Controls | Authentication, prefix filtering, AS path filtering, avoiding route leaks. |
Step‑By‑Step: Configuring BGP Between Multiple Sites
Here’s a general procedure, with details that often go unnoticed.
1. Decide on iBGP vs eBGP Between Sites
- iBGP is used between routers that share the same ASN, which makes it a natural fit for interconnecting sites inside one AS. A router does not re-advertise routes learned from one iBGP peer to another iBGP peer, so every router needs either a full mesh of sessions or a route reflector to learn all routes.
- eBGP is used when connecting to routers in different ASNs—such as between your sites if you allocate different ASNs per site, or between your network and ISPs.
Choosing a single ASN (with iBGP) or multiple ASNs (with eBGP between sites) is a trade-off: a single ASN with route reflectors keeps policy in one administrative domain and avoids managing an ASN per site, while multiple ASNs give more isolation between sites and finer per-site policy control.
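To make the scaling side of this trade-off concrete, here is a minimal sketch that counts iBGP sessions for a full mesh versus a route-reflector design. The function names are hypothetical, and the model assumes every client peers with every reflector and the reflectors are meshed among themselves.

```python
def full_mesh_sessions(routers: int) -> int:
    """Sessions needed when every router peers with every other router."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 1) -> int:
    """Sessions when each client peers with each reflector (assumed design)."""
    if not 1 <= reflectors <= routers:
        raise ValueError("need between 1 and `routers` reflectors")
    clients = routers - reflectors
    # Client-to-reflector sessions plus a full mesh among the reflectors.
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    for n in (3, 10, 30):
        print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
```

With three routers the difference is negligible, which is why small deployments often start with a full mesh; the gap widens quickly as the router count grows.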
2. Establish Peering Links
- Use reliable, low‑latency WAN links between sites if possible.
- Use loopback addresses for BGP peering where feasible (and ensure IGP or static routes reach loopbacks).
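- Set the TTL appropriately: eBGP sessions typically default to a TTL of 1, so peering between loopbacks or across intermediate hops needs multihop configured. Also confirm that intermediate firewalls permit TCP port 179 between the peers.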
3. Configure BGP Neighbors and Basic Settings
Typical settings include:
- Router BGP local ASN
- Neighbor configuration (IP, remote ASN)
- Router-ID (choose a stable, non-changing IP, often a loopback address)
- Timers (keepalive / hold time) tuned for each link's latency and reliability; peers negotiate the lower of the two configured hold times
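The settings above can be modeled as a small data structure to check a plan before it reaches a router. This is only a sketch: the `BgpNeighbor` class and `negotiated_timers` helper are hypothetical names, the 90-second default is illustrative, and the validation reflects the BGP rule that a hold time is either 0 or at least 3 seconds, with keepalives commonly sent at one third of the hold time.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BgpNeighbor:
    peer_ip: str
    remote_asn: int
    hold_time: int = 90  # seconds; illustrative value, not a vendor default

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.peer_ip)  # raises ValueError if malformed
        if not 1 <= self.remote_asn <= 4294967294:
            raise ValueError("ASN out of range")
        if self.hold_time != 0 and self.hold_time < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")

    def session_type(self, local_asn: int) -> str:
        """Same ASN on both ends means iBGP, otherwise eBGP."""
        return "iBGP" if self.remote_asn == local_asn else "eBGP"


def negotiated_timers(local_hold: int, peer_hold: int) -> Tuple[int, int]:
    """Return (hold, keepalive): the lower hold time wins, keepalive is a third."""
    hold = min(local_hold, peer_hold)
    return hold, hold // 3


if __name__ == "__main__":
    peer = BgpNeighbor("10.0.0.2", remote_asn=65000, hold_time=30)
    print(peer.session_type(local_asn=65000), negotiated_timers(30, 90))
```

Because the lower hold time wins, tightening timers on one side of a session is enough to change the behavior of both peers, which is worth remembering when tuning only some sites.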
4. Routing Policy
- Import filters: decide which routes from other sites or peers are accepted; perhaps limit by prefix, AS path, or communities.
- Export filters: decide which local prefixes should be advertised; avoid advertising internal or unneeded subnets.
- Use Local Preference to influence outbound path selection: the route with the higher value wins, and the value is shared among all routers in the AS.
- Use AS Path Prepending when advertising to external peers to make a path look longer and so de-prefer it for inbound traffic.
- Use MED (Multi-Exit Discriminator) where supported to suggest to a neighboring AS which of several links into your network it should prefer; the lower value wins.
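How these three attributes interact can be sketched with a deliberately simplified best-path comparison. Real implementations evaluate more steps (origin, eBGP over iBGP, IGP cost, router ID) and by default compare MED only between routes from the same neighboring AS; the `Route` class and `best_path` function below are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    local_pref: int = 100
    as_path: Tuple[int, ...] = ()
    med: int = 0


def best_path(candidates: List[Route]) -> Route:
    """Pick by highest local preference, then shortest AS path, then lowest MED."""
    if not candidates:
        raise ValueError("no candidate routes")
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


if __name__ == "__main__":
    via_isp1 = Route("0.0.0.0/0", "192.0.2.1", local_pref=200, as_path=(64500, 64501, 64502))
    via_isp2 = Route("0.0.0.0/0", "198.51.100.1", local_pref=100, as_path=(64510,))
    # Local preference is checked first, so the longer path via ISP1 still wins.
    print(best_path([via_isp1, via_isp2]).next_hop)
```

The ordering is the point of the sketch: prepending and MED only matter once local preference is tied, so a local preference policy can silently override both.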
5. Redundancy and Failover
- Use redundant links between sites; ensure BGP is aware of link failures.
- Consider using multiple routers per site for resilience.
- Use route reflection or BGP confederations if many routers are involved to reduce full mesh requirements and control update traffic.
6. Ensure Routing Convergence Speed
- Tune BGP timers (keepalive, hold time) but carefully—lower values improve detection of failures but may increase churn.
- Use fast failure detection on WAN links (e.g. BFD—Bidirectional Forwarding Detection) if routers support it, so BGP sessions drop quickly when the link goes down.
- Avoid large flaps (frequent up/down) on links—look for root causes (interfaces, physical cabling, ISP issues).
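The effect of BFD on failure detection is mostly arithmetic, as the following sketch shows. It assumes a simplified model in which, without BFD, a silent failure is noticed only when the hold timer expires, and with BFD it is noticed after the transmit interval multiplied by the detect multiplier. The function name and the example values (180-second hold time, 300 ms interval, multiplier 3) are illustrative assumptions, not recommendations.

```python
from typing import Optional


def detection_time_s(hold_time_s: float,
                     bfd_interval_ms: Optional[float] = None,
                     bfd_multiplier: int = 3) -> float:
    """Worst-case time to notice a silent link failure, in seconds."""
    worst_case = float(hold_time_s)
    if bfd_interval_ms is not None:
        bfd_time = bfd_interval_ms * bfd_multiplier / 1000.0
        # BFD tears the session down as soon as it detects the failure.
        worst_case = min(worst_case, bfd_time)
    return worst_case


if __name__ == "__main__":
    print("hold timer only:", detection_time_s(180))
    print("with BFD:", detection_time_s(180, bfd_interval_ms=300, bfd_multiplier=3))
```

This is also why aggressive BGP timers are rarely needed once BFD is in place: the hold timer becomes a backstop rather than the primary detection mechanism.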
7. Handle Prefix Aggregation and Summarization
- Aggregate prefixes at site boundaries to reduce the number of prefixes each peer must carry.
- Avoid over‑aggregation that hides needed granularity or causes suboptimal routing.
- Summarize only where the topology allows it (e.g. when the address space behind the summary is contiguous).
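Whether a set of site prefixes can be aggregated is easy to check offline. The sketch below uses Python's standard `ipaddress.collapse_addresses`; the `summarize` wrapper and the example prefixes are hypothetical.

```python
import ipaddress
from typing import List


def summarize(prefixes: List[str]) -> List[str]:
    """Collapse adjacent or overlapping prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Four contiguous /24s collapse into a single /22.
    print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]))
    # A gap in the address space prevents a single summary.
    print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.3.0/24"]))
```

Note that `collapse_addresses` never produces a summary covering address space that was not in the input, so it will not over-aggregate; a hand-written summary that spans a gap would attract traffic for addresses the site does not actually host.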
8. Security Practices
- Use session authentication (TCP MD5 or, where supported, TCP-AO) on BGP sessions.
- Implement prefix lists to prevent unintended route advertisements.
- Implement AS path filters to drop obviously malformed or undesired paths.
- Where possible, validate received routes (e.g. via RPKI or equivalent) to prevent hijacks or incorrect route announcements.
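The logic behind a prefix list and a basic AS path filter can be sketched as follows. This is a model of the checks, not router configuration: `is_permitted` and `has_private_asn` are hypothetical helpers, the maximum prefix length of /24 is an assumed policy, and the private ASN ranges are the ones reserved for private use.

```python
import ipaddress
from typing import Iterable, Sequence

# ASN ranges reserved for private use (16-bit and 32-bit).
PRIVATE_ASN_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_permitted(prefix: str, allowed: Iterable[str], max_len: int = 24) -> bool:
    """Accept a prefix only if it sits inside an allowed block and is not too specific."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    for block in allowed:
        parent = ipaddress.ip_network(block)
        # subnet_of() requires both networks to be the same IP version.
        if net.version == parent.version and net.subnet_of(parent):
            return True
    return False


def has_private_asn(as_path: Sequence[int]) -> bool:
    """Flag paths carrying private ASNs, which should not arrive from the internet."""
    return any(low <= asn <= high
               for asn in as_path
               for low, high in PRIVATE_ASN_RANGES)


if __name__ == "__main__":
    site_blocks = ["10.2.0.0/16"]
    for candidate in ("10.2.5.0/24", "10.2.5.1/32", "192.168.1.0/24", "0.0.0.0/0"):
        print(candidate, is_permitted(candidate, site_blocks))
```

Between sites that deliberately use private ASNs, the AS path check would apply only on the ISP-facing sessions.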
Hidden Settings that Matter (Often Overlooked)
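- BGP route flap dampening: suppresses prefixes that flap repeatedly so that a noisy link does not destabilize the rest of the network.
- Route reflector cluster IDs and iBGP split horizon: route reflectors relax the rule that iBGP-learned routes are not passed on to other iBGP peers, so set cluster IDs deliberately (especially with redundant reflectors) to keep loop prevention working and avoid suboptimal paths.
- Next-hop self in iBGP: routes learned over eBGP keep their external next hop when passed to iBGP peers by default, so either set next-hop-self on the border router or make sure the external next hop is reachable through the IGP.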
- Route‐refresh support: ensure routers support route refresh capability for dynamic policy changes.
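- Peer session collision handling: both peers may open a TCP connection at the same time. BGP resolves such collisions on its own, but where the platform supports it, configuring one side as passive makes it explicit which side initiates and avoids duplicate connection attempts.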
- IGP metric alignment: inside each site, ensure the Interior Gateway Protocol (OSPF, EIGRP, IS‑IS etc.) metrics align so that BGP path selection behavior is predictable.
- Hardware queue depth and buffer settings: over high-latency WAN links in particular, router and switch buffers and the queuing discipline affect performance during traffic bursts.
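Of the settings above, flap dampening is the one whose behavior is hardest to predict by intuition, because the penalty decays exponentially. The sketch below models that decay. The parameter values (1000 penalty per flap, suppress at 2000, reuse at 750, 15-minute half-life) are assumptions modeled on commonly cited defaults and should be checked against your platform; the function names are hypothetical.

```python
import math

PENALTY_PER_FLAP = 1000.0   # assumed value
SUPPRESS_LIMIT = 2000.0     # assumed value
REUSE_LIMIT = 750.0         # assumed value
HALF_LIFE_MIN = 15.0        # assumed value


def decayed_penalty(penalty: float, minutes_elapsed: float,
                    half_life_min: float = HALF_LIFE_MIN) -> float:
    """Penalty halves every half-life while the prefix stays stable."""
    return penalty * 0.5 ** (minutes_elapsed / half_life_min)


def minutes_until_reuse(penalty: float, reuse_limit: float = REUSE_LIMIT,
                        half_life_min: float = HALF_LIFE_MIN) -> float:
    """Time for the penalty to decay down to the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)


if __name__ == "__main__":
    penalty = 3 * PENALTY_PER_FLAP  # three flaps in quick succession
    print("suppressed:", penalty > SUPPRESS_LIMIT)
    print("minutes until reuse:", minutes_until_reuse(penalty))
```

Under these assumed values, three quick flaps keep a prefix suppressed for about half an hour after it stabilizes, which is worth weighing against the stability gained before enabling dampening on inter-site links.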
Testing, Monitoring, and Tuning
- Monitor BGP sessions: status, peer adjacency, prefix counts, and any abnormal fluctuations.
- Use tools like traceroute, ping, or BGP‑specific diagnostics to verify that traffic follows desired paths.
- Check route propagation from each site—ensure advertised prefixes reach all intended peers.
- Observe convergence: simulate link failure and see how quickly backups or alternative paths are used.
- Gather logs and metrics on CPU/memory load on routers (especially when handling many prefixes or route reflection).
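A monitoring check along these lines can be sketched as a comparison between observed session state and a recorded baseline. How the data is collected (SNMP, streaming telemetry, CLI scraping) is outside this sketch; the `PeerStatus` class, the `find_alerts` function, the 20% tolerance, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeerStatus:
    peer: str
    state: str               # BGP FSM state, e.g. "Established" or "Active"
    prefixes_received: int


def find_alerts(statuses: List[PeerStatus], baseline: Dict[str, int],
                tolerance: float = 0.2) -> List[str]:
    """Flag sessions that are down or whose prefix count drifts from the baseline."""
    alerts = []
    for status in statuses:
        if status.state != "Established":
            alerts.append(f"{status.peer}: session is {status.state}")
            continue
        expected = baseline.get(status.peer)
        if expected and abs(status.prefixes_received - expected) > tolerance * expected:
            alerts.append(
                f"{status.peer}: {status.prefixes_received} prefixes, "
                f"expected about {expected}"
            )
    return alerts


if __name__ == "__main__":
    observed = [
        PeerStatus("10.0.0.2", "Established", 120),
        PeerStatus("10.0.0.3", "Active", 0),
        PeerStatus("10.0.0.4", "Established", 40),
    ]
    for line in find_alerts(observed, {"10.0.0.2": 118, "10.0.0.3": 60, "10.0.0.4": 100}):
        print(line)
```

A sudden drop in received prefixes on an established session is often the first visible sign of a filter change or an upstream failure, so it is worth alerting on separately from session state.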
Example Architecture: Multi‑Site BGP
Here’s a simplified example of how 3 sites could be connected:
- Sites A, B, and C each have a router.
- They all use the same ASN (e.g. 65000) and peer via an iBGP full mesh, or via a route reflector at Site A.
- Each site also has ISP uplinks (public eBGP) with external ASNs, for internet access.
- Site A is the primary data center, Site B is the DR location, and Site C is a branch. All internal site prefixes are advertised over iBGP.
- Internet-bound traffic may prefer the local ISP unless policy forces it through Site A.
Tune Local Preference so that each site's traffic prefers local paths where possible and avoids hairpinning through another site.
Common Pitfalls and How to Avoid Them
| Pitfall | Consequences | Avoidance Strategy |
|---|---|---|
| Inconsistent ASNs or wrong remote ASN settings | BGP sessions never establish, or routes are exchanged incorrectly | Document ASNs properly, test in a lab |
| Advertising too many prefixes (no summarization) | Large routing tables, memory/CPU load, longer convergence | Aggregate where possible |
| Missing route filters | Unintended or malicious routes accepted | Use prefix lists, AS‑path filters |
| Poor failure detection (no BFD, high BGP timers) | Slow failover, downtime | Enable fast detection, tune timers |
| Unbalanced links (one site overloaded) | Suboptimal performance, congestion | Monitor, adjust local preference, distribute load |
Conclusion
Configuring dynamic routing with BGP in multi‑site enterprise setups unlocks scalability, resilience, and control—but only when done thoughtfully. Paying attention to topology, routing policy, timer settings, and hidden settings like route reflectors or dampening can mean the difference between a robust network and one prone to failure or inefficiency.
Design your BGP deployment with redundancy, performance, and security in mind. Monitor closely. And don’t assume “defaults are good enough” for multi‑site scale.
