Review of "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric"

16 Nov 2015

Trends in multi-core processors, end-host virtualization, and commodities of scale point toward data centers with millions of virtual endpoints, and existing L2 and L3 network protocols each face some combination of limitations at that scale. This paper proposes PortLand, a scalable, fault-tolerant L2 routing and forwarding protocol for data center environments.

PortLand employs a logically centralized, replicated fabric manager that maintains soft state about the network configuration, such as topology and MAC addresses. It is a user process running on a dedicated machine, responsible for assisting with ARP resolution, fault tolerance, and multicast. It may run on a separate control network or simply be a redundantly connected host in the larger topology. It requires no administrator configuration such as the number of switches, their locations, or their identifiers. Because the fabric manager keeps only soft state, its replicas do not need to be strictly consistent.

The fabric manager also maintains the mapping between each host's IP address, its real MAC, and its pseudo MAC (PMAC). A PMAC is a virtual MAC address that encodes the host's position in the topology. For example, all endpoints in the same pod share the same prefix in their PMACs. Traditional ARP scales poorly because requests are broadcast over the entire network. PortLand addresses this with proxy ARP backed by the fabric manager, which turns the broadcast into a unicast query. The fabric manager also helps with fault-tolerant routing by maintaining a fault matrix.
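To make the PMAC and proxy ARP ideas concrete, here is a minimal Python sketch. It assumes the 48-bit PMAC layout given in the PortLand paper (pod: 16 bits, position: 8 bits, port: 8 bits, vmid: 16 bits). The names `encode_pmac`, `pod_of`, and `FabricManager` are hypothetical and chosen only for illustration; they are not the authors' implementation.

```python
from typing import Dict, Optional


def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack pod(16) | position(8) | port(8) | vmid(16) into a MAC-style string.

    The field widths follow the layout assumed in the lead-in above.
    """
    if not (0 <= pod < 1 << 16 and 0 <= position < 1 << 8
            and 0 <= port < 1 << 8 and 0 <= vmid < 1 << 16):
        raise ValueError("PMAC field out of range")
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def pod_of(pmac: str) -> int:
    """Recover the pod number from the first two bytes of a PMAC."""
    raw = bytes.fromhex(pmac.replace(":", ""))
    return int.from_bytes(raw[:2], "big")


class FabricManager:
    """Toy stand-in for the fabric manager's soft-state IP -> PMAC table."""

    def __init__(self) -> None:
        self._ip_to_pmac: Dict[str, str] = {}

    def register(self, ip: str, pmac: str) -> None:
        # In this sketch, an edge switch would call this when it first
        # sees a host; the state can be rebuilt, so it is only soft state.
        self._ip_to_pmac[ip] = pmac

    def resolve(self, ip: str) -> Optional[str]:
        # Proxy ARP: a unicast lookup replaces the network-wide broadcast.
        # None means the mapping is unknown and the caller must fall back.
        return self._ip_to_pmac.get(ip)


# Two hypothetical hosts in pod 2, attached to different edge switches.
fm = FabricManager()
host_a = encode_pmac(pod=2, position=1, port=0, vmid=1)
host_b = encode_pmac(pod=2, position=3, port=1, vmid=1)
fm.register("10.2.1.2", host_a)
fm.register("10.2.3.2", host_b)
```

Here `host_a` and `host_b` share the leading two bytes because they sit in the same pod, so a switch could forward on that prefix alone, and `fm.resolve("10.2.1.2")` returns the PMAC without any broadcast.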

Will this paper be influential in 10 years? I think so. It centralizes the control logic of an increasingly complex network and provides far more features on top of it than traditional networks do. That is a good direction given the trend toward huge data centers.