RoSE: Robust, Secure & Efficient Wide-Area Routing (2002-2008)

Considering the critical role that the Internet plays in our day-to-day lives, the current routing architecture is surprisingly fragile. Major fiber cuts due to accidents have led to widespread loss of Internet connectivity. While these accidents are rare, the majority of routing problems are due to software bugs, misconfigurations, or human error, and can be prevented. The ultimate goal of this project is to improve the fail-over time, service availability, stability, and security of the Internet routing infrastructure without compromising its scalability and manageability.

Our research approach hinges upon the observation that modeling individual network components in isolation is not sufficient. This project strives to understand the end-to-end behavior of networks (in terms of performance, reachability, or security properties) by building a model of the dynamic interactions across protocol layers (e.g., IP-routing and application-layer) and between network components (e.g., sets of routers with different policy configurations, packet filters, and firewalls). We believe that the key to supporting truly ubiquitous computing over heterogeneous networks is to find a reliable route (or routes) that delivers predictable performance.

The research plan for the RoSE project consists of three phases:

Phase 1: Modeling routing dynamics and their implications

The first phase of the project focuses on gaining a thorough understanding of the routing dynamics and failure characteristics within currently deployed, large-scale, operational wide-area IP networks. The lessons learned are useful for designing future networks. We gather packet-level measurements and routing information from a Tier-1 ISP's backbone and public peering points. These data are analyzed to address the following questions:

  • What is a typical wide-area failure model?
  • How frequently does an intra/inter-domain link fail?
  • What is the length of service disruption time due to routing failures/re-convergence?
  • How do IGP/BGP interact?
  • What causes large volumes of BGP updates, i.e., BGP storms?
  • What are the effects of BGP instability on traffic forwarding?
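
As a rough illustration of how the failure-frequency and disruption-time questions above could be approached, the following minimal sketch derives per-link failure counts, mean failure duration, and mean time between failures from a log of link state changes. The event format, the link names, and the `failure_stats` helper are all hypothetical; they are assumptions made for this example and do not describe the project's actual data or tools.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event log: (timestamp_seconds, link_id, state).
# Real routing data would need parsing and cleaning first.
EVENTS = [
    (0.0, "A-B", "down"),
    (12.0, "A-B", "up"),
    (100.0, "C-D", "down"),
    (103.5, "C-D", "up"),
    (400.0, "A-B", "down"),
    (401.0, "A-B", "up"),
]


def failure_stats(events):
    """Return {link: (failures, mean_duration, mean_time_between_failures)}.

    Only failures with a matching "up" event are counted. The time between
    failures is measured from one failure start to the next and is None
    when a link has fewer than two recorded failure starts.
    """
    down_since = {}               # link -> time the current failure began
    durations = defaultdict(list)  # link -> completed failure durations
    starts = defaultdict(list)     # link -> failure start times
    for ts, link, state in sorted(events):
        if state == "down" and link not in down_since:
            down_since[link] = ts
            starts[link].append(ts)
        elif state == "up" and link in down_since:
            durations[link].append(ts - down_since.pop(link))

    stats = {}
    for link, durs in durations.items():
        gaps = [b - a for a, b in zip(starts[link], starts[link][1:])]
        stats[link] = (len(durs), mean(durs), mean(gaps) if gaps else None)
    return stats


if __name__ == "__main__":
    for link, (count, duration, gap) in failure_stats(EVENTS).items():
        print(link, count, duration, gap)
```

Keeping the statistics per link makes it possible to separate a few frequently failing links from the rest, although that choice is itself an assumption of this sketch.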

Based on the network measurements and initial analysis, we revisit the definition of "service availability" for IP networks, which should account for instantaneous performance characteristics and routing dynamics. This is analogous to the 99.99% availability measure of the traditional telephone networks. Such a measure is crucial in determining whether a specific type of application-level performance requirement can be met, and in defining meaningful Service-Level Agreements (SLAs). We also explore how our routing failure models impact the design of routers, failure restoration schemes, and other traffic engineering practices.
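
To make the telephone-style availability figure concrete, the short sketch below converts an availability percentage into a yearly downtime budget and computes availability from observed outage durations. It covers only the classic uptime ratio; it does not capture the instantaneous performance or routing dynamics that the revisited definition is meant to include, and the function names are assumptions made for this example.

```python
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage in [0, 100]")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


def measured_availability(window_seconds: float,
                          outage_seconds: Sequence[float]) -> float:
    """Return availability (in percent) over an observation window."""
    if window_seconds <= 0:
        raise ValueError("observation window must be positive")
    downtime = sum(outage_seconds)
    return 100.0 * (1.0 - downtime / window_seconds)


if __name__ == "__main__":
    # 99.99% availability leaves roughly 52.6 minutes of downtime per year.
    print(round(downtime_minutes_per_year(99.99), 2))
    # Two outages (30 s and 60 s) in one day of observation.
    print(round(measured_availability(86400, [30, 60]), 4))
```

For example, a single 90-second re-convergence event per day already brings a link below the 99.99% level, which is one reason disruption time matters for SLAs.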

Phase 2: Multi-layer (or -entity) information sharing for better performance and stability

The second phase of this study will focus on modeling the interactions across multiple protocol layers, e.g., between overlay and underlying IP networks, to identify a set of design principles that ensure their synergistic co-existence.

We will explore the design of a Routing Introspection and Feedback System (RIFS) that will: (1) Provide active feedback from the IP-routing layer to high-level overlay networks and applications for joint optimization, (2) Detect, report and resolve routing anomalies, and (3) Verify the correctness of routing protocols, policies, and router configurations.

Phase 3: Rethinking the design of the Internet

The lessons learned in Phases 1 and 2 will form the basis from which we can revisit the fundamental properties of the current Internet and define a set of design principles for the next-generation global Internet. We have begun to explore the feasibility of creating an overlay control layer called OPCA for inter-domain policy negotiations, fault-tolerance, and traffic engineering. We will continue to investigate the potential use of overlay networks to improve the stability and security of the underlying IP networks.

People

Faculty

Graduate Students and Alumni

  • R. Keralapura, ECE (PhD, 2001)
  • J. Mai, ECE (PhD, 2008)
  • L. Yuan, ECE (PhD, 2008)
  • S. Raza, CS
  • A. Moerschell, ECE

Collaborators

  • S. Agarwal, Microsoft Research
  • S. Bhattacharyya, previously at Sprint ATL
  • C. Diot, previously at Thomson Labs
  • G. Iannaccone, previously at Intel Research Berkeley
  • A. Markopoulou, UC Irvine
  • S. Nelakuditi, Univ. South Carolina
  • A. Nucci, previously at Narus
  • N. Taft, previously at Intel Research Berkeley
  • Z. Zhang, Univ. of Minnesota

Publications

C. Chuah and R. Keralapura, "Overlay Networks: Applications, Co-existence with IP-Layer, and Transient Dynamics," Chapter 8 in Algorithms for Next Generation Networks, Springer (ISBN: 978-1-84882-764-6), pp. 159-179, 2010.

R. Keralapura, C-N. Chuah, N. Taft, and G. Iannaccone, "Race Conditions in Coexisting Overlay Networks," IEEE/ACM Transactions on Networking (TON), vol. 16, no. 1, pp. 1-14, February 2008. [pdf]

J. Mai, L. Yuan, and C-N. Chuah, "Detecting BGP Anomalies with Wavelet," IEEE/IFIP Network Operations and Management Symposium (NOMS), April 2008. [pdf]

S. Raza and C-N. Chuah, "Interface Split Routing," 26th International Symposium on Computer Performance, Modeling, Measurements, and Evaluation (Performance'07), October 2007. Also in Performance Evaluation Journal, vol. 64, issue 9-12, pp. 994-1008, October 2007. [pdf]

Z. Li, L. Yuan, P. Mohapatra, and C-N. Chuah, "On the Analysis of Overlay Failure Detection and Recovery," Computer Networks Journal, vol. 51, issue 13, pp. 3838-3843, September 2007. [pdf]

S. Teoh, S. Ranjan, A. Nucci, and C-N. Chuah, "BGP-Eye: A New Visualization Tool for Real-time Detection and Analysis of BGP Anomalies," ACM CCS Workshop on Visualization for Computer Security, November 2006. [pdf]

R. Keralapura, C-N. Chuah, and Y. Fan, "Optimal Strategy for Graceful Network Upgrade," ACM SIGCOMM Workshop on Internet Network Management, September 2006. [pdf]

R. Keralapura, C-N. Chuah, N. Taft, and G. Iannaccone, "Can co-existing overlays inadvertently step on each other?" Proc. IEEE ICNP, November 2005. [pdf]

Z. Zhong, R. Keralapura, S. Nelakuditi, Y. Yu, J. Wang, C-N. Chuah, and S. Lee, "Avoiding Transient Loops through Interface-Specific Forwarding," Proc. IFIP/IEEE IWQoS, Springer-LNCS, vol. 3552, pp. 219-232, June 2005. [pdf]
We also have a journal extension:
S. Nelakuditi, Z. Zhong, J. Wang, R. Keralapura, C-N. Chuah, "Mitigating Transient Loops Through Interface-Specific Forwarding," Elsevier Computer Networks, vol. 52, issue 3, pp. 593-609, February 2008. [pdf]

K. Zhang, S. Teoh, S. Tseng, R. Limprasittipom, K.-L. Ma, S. F. Wu, and C-N. Chuah, "Performing BGP Experiments on a Semi-Realistic Internet Testbed," IEEE International Workshop on Security in Distributed Computing Systems (SDCS), June 2005. [pdf]

Z. Li, P. Mohapatra, and C-N. Chuah, "Virtual Multi-Homing: On the Feasibility of Combining Overlay Routing with BGP Routing," IFIP Networking Conference, Springer-Verlag Lecture Notes in Computer Science (LNCS) series, vol. 3462, pp. 1348-1352, May 2005 (Poster presentation). [pdf]

Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C-N. Chuah, "Failure Inferencing based Fast Rerouting for Handling Transient Link and Node Failures," IEEE Global Internet, March 2005. [pdf]

R. Keralapura, N. Taft, C-N. Chuah, and G. Iannaccone, "Can ISPs take the heat from Overlay Networks?" ACM Workshop on Hot Topics in Networks (HotNets-III), November 2004. [pdf]

R. Keralapura, N. Taft, G. Iannaccone, and C-N. Chuah, "Can ISPs and Overlay Networks form a synergistic co-existence?" IFIP/IEEE Distributed Systems: Operations and Management (DSOM), Nov 15-17, 2004. [pdf]

A. Zeitoun, C-N. Chuah, S. Bhattacharyya, and C. Diot, "An AS-Level Study of Internet Path Delay Characteristics," IEEE Globecom, November 2004. [pdf]

R. Keralapura, C-N. Chuah, G. Iannaccone, and S. Bhattacharyya, "Service Availability: A New Approach to Characterize IP-Backbone Topologies," IEEE IWQoS, pp. 232-241, June 2004. [pdf]
We also have a journal extension:
R. Keralapura, A. Moerschell, C-N. Chuah, G. Iannaccone, and S. Bhattacharyya, "A Case for Using Service Availability to Characterize IP Backbone Topologies," Journal of Communication Networks, vol. 8, no. 2, June 2006. [pdf]

S. Agarwal, C-N. Chuah, S. Bhattacharyya, and C. Diot, "The Impact of BGP Dynamics on Intra-Domain Traffic," ACM Sigmetrics, Performance Evaluation Review Special Issue, vol. 32, no. 1, pp. 319-330, June 2004. [pdf]

S. Agarwal, C-N. Chuah, S. Bhattacharyya, and C. Diot, "The Impact of BGP Dynamics on Router CPU Utilization," Passive & Active Measurement Workshop, Springer-Verlag Lecture Notes in Computer Science (LNCS) series, vol. 3015, pp. 278-288, April 2004. [pdf]

G. Iannaccone, C-N. Chuah, S. Bhattacharyya, and C. Diot, "Feasibility of IP Restoration in a Tier-1 Backbone," IEEE Network, vol. 18, no. 2, pp. 13-19, March 2004. [pdf]

A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C-N. Chuah, and C. Diot, "Characterization of Failures in an IP Backbone," IEEE INFOCOM, March 2004. [pdf]
We also have a journal extension:
A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C-N. Chuah, and C. Diot, "Characterization of Failures in an IP Backbone," IEEE/ACM Trans. On Networking (TON), vol. 16, no. 4, pp. 749-762, August 2008. [pdf]

S. Lee, Y. Yu, S. Nelakuditi, Z-L. Zhang, and C-N. Chuah, "Proactive vs. Reactive Approaches to Failure Resilient Routing," Proc. IEEE INFOCOM, March 2004. [pdf]
We also have a journal extension:
S. Lee, Y. Yu, S. Nelakuditi, Z-L. Zhang, and C-N. Chuah, "Proactive vs. Reactive Approaches to Failure Resilient Routing," IEEE/ACM Trans. On Networking, vol. 15, no. 2, pp. 359-372, April 2007. [pdf]

G. Iannaccone, C-N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot, "Analysis of Link Failures in an IP Backbone," ACM SIGCOMM Internet Measurement Workshop, Marseille, France, pp. 237-242, November 6-8, 2002. [pdf]

C-N. Chuah and Sprint ATL IP-Group, "Analysis of Link Failures and Their Impact on Traffic," Internet Traffic and Topology Session, 17th IEEE Annual Computer Communications Workshop, October 13-16, 2002.

Technical Reports

S. Agarwal, C-N. Chuah, S. Bhattacharyya, and C. Diot, "The Impact of BGP Dynamics on Intra-Domain Traffic," Sprint ATL Research Report Nr. RR03-ATL-080677, August 2003.

C. Diot, G. Iannaccone, A. Markopoulou, C. Chuah, and S. Bhattacharyya, "Service Availability in IP Networks," Sprint ATL Research Report Nr. RR03-ATL-071888, July 2003.

A. Zeitoun, C-N. Chuah, S. Bhattacharyya, and C. Diot, "An AS-level Study on Delay Characteristics," Sprint ATL Research Report Nr. RR03-ATL-051699, May 2003.

G. Iannaccone, C-N. Chuah, S. Bhattacharyya, and C. Diot, "Feasibility of IP Restoration in a Tier-1 Backbone," Sprint ATL Research Report Nr. RR03-ATL-030666, March 2003. [pdf]

C. Chuah, S. Bhattacharyya, and C. Diot, "Measuring I-BGP Updates and Their Impact on Traffic," Sprint ATL Research Report Nr. RR02-ATL-051099, May 2002.

Funding

This material is based upon work supported by the National Science Foundation under Grant No. 0238348. This project is also partly funded by Fujitsu Laboratories of America, Inc., Sprint Advanced Technology Laboratories, and the U. C. Micro Program.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.