Distributed external gateway

Registered by Pedro Marques on 2014-04-24

Distributed implementation of external gateway functionality: source NAT, route aggregation, ACLs.

Blueprint information

Pedro Marques
Needs approval
Completed by
Paul Carver on 2018-02-12




At the moment, the OpenContrail vRouter implements fully distributed 1:1 NAT as well as ACLs (also known as firewall rules). It expects the data-center gateway to perform route aggregation, so that all the /32 routes of the public-facing network(s) are aggregated; this requires the data-center gateway to have a VRF for the external-facing network(s).
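To illustrate what aggregation means here, the Python sketch below collapses a set of /32 host routes into the fewest covering prefixes using the standard `ipaddress` module. The addresses are hypothetical documentation-range values standing in for public floating IPs, and `aggregate_host_routes` is an illustrative name; the sketch only shows the prefix arithmetic, not how a gateway or its VRF is configured.

```python
import ipaddress

def aggregate_host_routes(host_ips):
    """Collapse per-address /32 routes into the fewest covering prefixes."""
    host_routes = [ipaddress.ip_network(f"{ip}/32") for ip in host_ips]
    return list(ipaddress.collapse_addresses(host_routes))

# Four contiguous documentation-range addresses standing in for public floating IPs.
print(aggregate_host_routes(["203.0.113.4", "203.0.113.5", "203.0.113.6", "203.0.113.7"]))
# [IPv4Network('203.0.113.4/30')]
```

Four host routes become a single /30; non-contiguous addresses would remain as separate /32s.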

To implement distributed source NAT, the following components are required:
1. Ability to allocate a block of ports to a compute node. Compute nodes should allocate ports in blocks of N (for example, 256), must be able to request additional blocks as utilization grows, and must release empty blocks back to the pool.
2. Compute nodes need to be able to advertise themselves as the next hops for an allocated port block.
3. Up to 16 (or 32) compute nodes must advertise themselves as next hops for the aggregated route, using a VRF table label rather than an interface label; 16/32 is the limit on the number of ECMP paths. This attracts inbound traffic from the DC gateways into these nodes.
4. The forwarding decision in the VRF must take the advertised port blocks into consideration.
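Item 1 can be sketched in Python as follows, assuming a single public SNAT address, an in-memory pool of free blocks, and source ports 1024 to 65535 (the port range is an assumption, not something the blueprint specifies). `BlockPool` and `NodePortAllocator` are hypothetical names, and the pool hands out the lowest free block purely to keep the example deterministic.

```python
BLOCK_SIZE = 256                     # N; 256 is the example value from the text
FIRST_PORT, LAST_PORT = 1024, 65535  # assumption: well-known ports are not used for SNAT

class BlockPool:
    """Hypothetical pool of free port blocks for one public SNAT address.
    Block i covers ports [i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE)."""

    def __init__(self):
        first = -(-FIRST_PORT // BLOCK_SIZE)       # ceiling division
        last = (LAST_PORT + 1) // BLOCK_SIZE       # exclusive upper bound
        self.free = set(range(first, last))

    def take(self):
        if not self.free:
            raise RuntimeError("port blocks exhausted")
        index = min(self.free)                     # simple policy: lowest free block
        self.free.remove(index)
        return index

    def give_back(self, index):
        self.free.add(index)


class NodePortAllocator:
    """Per-compute-node allocator: grows by whole blocks, returns blocks once empty."""

    def __init__(self, pool):
        self.pool = pool
        self.in_use = {}                           # block index -> ports in use

    def allocate(self):
        for index, used in self.in_use.items():
            if len(used) < BLOCK_SIZE:
                start = index * BLOCK_SIZE
                port = next(p for p in range(start, start + BLOCK_SIZE) if p not in used)
                used.add(port)
                return port
        index = self.pool.take()                   # all owned blocks full: request another
        port = index * BLOCK_SIZE
        self.in_use[index] = {port}
        return port

    def release(self, port):
        index = port // BLOCK_SIZE
        used = self.in_use[index]
        used.discard(port)
        if not used:                               # empty block goes back to the pool
            del self.in_use[index]
            self.pool.give_back(index)
```

With these constants, a node's first allocation returns port 1024 from block 4; the 257th allocation forces a second block starting at port 1280, and releasing port 1280 returns that now-empty block to the pool.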

For item 2, the preferred option is to use RFC 5575 (dissemination of flow specification rules) together with https://datatracker.ietf.org/doc/draft-ietf-idr-flowspec-redirect-ip/; this allows the mechanism to interoperate with L3VPN-capable gateways that implement these specifications, bypassing the aggregating compute nodes when that option is chosen.
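The forwarding effect of such port-block advertisements (items 2 and 4) can be modelled roughly as below. This is a simplified in-memory Python model, not the RFC 5575 NLRI encoding or the redirect-to-IP wire format; `RedirectRule` and `next_hop` are hypothetical names, and the addresses are documentation examples. Because the port blocks of one address do not overlap, a first-match scan suffices and the rule-ordering rules that RFC 5575 defines do not come into play.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class RedirectRule:
    """Simplified flow-spec-style rule: match destination prefix and port range, redirect to an IP."""
    dst_prefix: ipaddress.IPv4Network
    port_lo: int
    port_hi: int                        # inclusive
    redirect_to: ipaddress.IPv4Address  # compute node that owns this port block

def next_hop(rules, dst_ip, dst_port, aggregate_next_hop):
    """Forwarding decision for an inbound packet: a matching port-block rule wins,
    otherwise the packet follows the aggregated route."""
    addr = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if addr in rule.dst_prefix and rule.port_lo <= dst_port <= rule.port_hi:
            return rule.redirect_to
    return aggregate_next_hop

# Hypothetical SNAT address 198.51.100.10 with two 256-port blocks owned by different compute nodes.
snat = ipaddress.ip_network("198.51.100.10/32")
rules = [
    RedirectRule(snat, 1024, 1279, ipaddress.ip_address("10.0.0.11")),
    RedirectRule(snat, 1280, 1535, ipaddress.ip_address("10.0.0.12")),
]
print(next_hop(rules, "198.51.100.10", 1300, ipaddress.ip_address("10.0.0.1")))  # 10.0.0.12
```

A packet to a port outside every advertised block falls back to the aggregate next hop, which corresponds to traffic being attracted to the aggregating compute nodes.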

This requires a specification that adapts RFC 5575 to the XMPP transport, similar to the l3vpn-end-system spec.

For port allocation, the simplest option is a speculative approach:
1. When a compute node needs a new block, it picks one at random from the blocks available at that time.
2. The compute node then advertises the block but does not use it immediately; the block is placed in a 30-second hold period. The route advertisement for this port block should include the current Unix epoch timestamp.
3. If a different compute node advertises the same block within the hold period, the advertisement with the lowest (timestamp, ip-address) tuple wins.
4. If the compute node loses, it repeats the process.
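The four steps above can be sketched in Python as follows. Waiting for the hold time and receiving other nodes' advertisements are simulated by a caller-supplied `hear_claims` callback; that hook, `BlockClaim`, `claim_survives`, and `allocate_block` are hypothetical names. The tie-break compares IP addresses numerically, since a plain string comparison would wrongly rank "10.0.0.12" below "10.0.0.9".

```python
import ipaddress
import random
from dataclasses import dataclass

HOLD_TIME = 30  # seconds, per step 2

@dataclass(frozen=True)
class BlockClaim:
    """Hypothetical contents of a port-block advertisement."""
    block: int
    timestamp: int  # Unix epoch seconds included in the advertisement
    node_ip: str

    def rank(self):
        # Lowest (timestamp, ip-address) wins; IPs compare numerically, not as strings.
        return (self.timestamp, ipaddress.ip_address(self.node_ip))

def claim_survives(own, heard):
    """Step 3: did our claim win against advertisements heard during the hold time?"""
    rivals = [c for c in heard if c.block == own.block]
    return min([own] + rivals, key=BlockClaim.rank) == own

def allocate_block(free_blocks, node_ip, now, hear_claims, rng=None):
    """Steps 1-4. `hear_claims(own)` simulates the advertisements received from
    other nodes during the hold time. `free_blocks` is modified in place."""
    rng = rng if rng is not None else random.Random()
    while free_blocks:
        block = rng.choice(sorted(free_blocks))      # step 1: random free block
        own = BlockClaim(block, now, node_ip)        # step 2: the advertisement we would send
        if claim_survives(own, hear_claims(own)):    # step 3: tie-break after the hold time
            return block
        free_blocks.discard(block)                   # step 4: lost, try another block
        now += HOLD_TIME                             # next attempt starts after the hold time
    raise RuntimeError("no free port blocks")
```

For example, if another node claimed block 1 one second earlier, a node starting from the free set {1, 2} ends up with block 2 regardless of which block it tries first.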

When route aggregation through compute nodes is configured, up to 32 compute nodes should be selected to perform route aggregation for the public-facing networks; aggregation should be configured as a boolean flag under the virtual-network object.

The aggregators can be chosen by marking, in the data model, a link between the vrouter and the virtual-network. This allows the schema to select a set of aggregators per network, potentially distributing the load across all compute nodes in a cluster that hosts a large number of public-facing (or transit) networks. This configuration should be performed by the schema-transformer.
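The blueprint does not say how the schema-transformer should pick the aggregator set. One possible technique, assumed here rather than taken from the source, is rendezvous (highest-random-weight) hashing, sketched below in Python; `select_aggregators` is a hypothetical name and the vrouter and network names are placeholders.

```python
import hashlib

MAX_AGGREGATORS = 32  # ECMP path limit from the text

def _weight(vrouter, network):
    """Deterministic pseudo-random weight for a (vrouter, network) pair."""
    digest = hashlib.sha256(f"{network}/{vrouter}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def select_aggregators(vrouters, network, limit=MAX_AGGREGATORS):
    """Rendezvous (highest-random-weight) hashing: keep the `limit` highest-weight vrouters."""
    ranked = sorted(vrouters, key=lambda v: _weight(v, network), reverse=True)
    return ranked[:limit]
```

Because each network ranks the vrouters independently, different networks tend to select different subsets, spreading the aggregation load across the cluster. Removing a vrouter that was not selected for a network leaves that network's selection unchanged, which limits churn in the vrouter/virtual-network links.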

