Security group rules for devices RPC call refactoring
The security_
allow all from 'default' group
This leads to:
* very big messages (I've seen >20-600MB)
* very long processing times on the neutron-server side once we have lots of instances under the same tenant/security group.
* lockups, when RPC messages time out, and then the same security_
For a more detailed insight:
* security_
neutron-
- This call receives a list of device_ids as its argument; device_ids are connected to ports.
- Neutron builds the list of security group rules and returns it
per device_id
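To make the shape of that exchange concrete, here is a minimal sketch in Python. It is not Neutron's code: the function name `rules_for_devices`, the two lookup dictionaries, and the rule fields are all hypothetical, and it assumes each device belongs to exactly one security group. It only illustrates "a list of device_ids in, a list of rules per device_id out".

```python
from typing import Dict, List

Rule = Dict[str, str]


def rules_for_devices(
    device_ids: List[str],
    device_to_group: Dict[str, str],
    group_rules: Dict[str, List[Rule]],
) -> Dict[str, List[Rule]]:
    """Return the security group rules that apply to each requested device.

    Hypothetical model: every device is in exactly one security group.
    """
    reply: Dict[str, List[Rule]] = {}
    for device_id in device_ids:
        group = device_to_group[device_id]
        # Each device gets its own copy of the rule list, so the reply
        # repeats the same rules once per device in the same group.
        reply[device_id] = [dict(rule) for rule in group_rules[group]]
    return reply
```

In this toy model two devices in the same group receive identical but separate rule lists, which is why the reply grows with the number of devices requested.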
Now let's look at the default security group, which has 4 rules:
- [IPv6, egress all]
- [IPv6, ingress from default security group]
- [IPv4, egress all]
- [IPv4, ingress from default security group]
This means two things:
- Machines can initiate traffic to anywhere.
- Machines can be reached by any machine in the same security
group.
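The four default rules and their two consequences can be modelled in a few lines of Python. This is an illustrative sketch only: the dictionary keys and the helper names `egress_allowed` and `ingress_allowed` are made up for this example and are not Neutron's data model.

```python
# The four default rules listed above. remote_group=None means "anywhere".
DEFAULT_RULES = [
    {"ethertype": "IPv6", "direction": "egress", "remote_group": None},
    {"ethertype": "IPv6", "direction": "ingress", "remote_group": "default"},
    {"ethertype": "IPv4", "direction": "egress", "remote_group": None},
    {"ethertype": "IPv4", "direction": "ingress", "remote_group": "default"},
]


def egress_allowed(ethertype, rules=DEFAULT_RULES):
    """True if some egress rule covers this ethertype."""
    return any(
        rule["direction"] == "egress" and rule["ethertype"] == ethertype
        for rule in rules
    )


def ingress_allowed(source_group, ethertype, rules=DEFAULT_RULES):
    """True if traffic from a member of source_group may reach the machine.

    source_group is None for a sender that is in no security group.
    """
    for rule in rules:
        if rule["direction"] != "ingress" or rule["ethertype"] != ethertype:
            continue
        if rule["remote_group"] is None or rule["remote_group"] == source_group:
            return True
    return False
```

With these rules, `egress_allowed("IPv4")` is true for any destination, while `ingress_allowed` is true only when the sender is itself in the 'default' group.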
So, what happens:
1) As we add instances, they join the 'default' security
group unless we explicitly choose otherwise.
2) That means, the openvswitch-agents on compute nodes, get a
devices in such updated security group (I'm partly guessing the
logic here, but this is more or less what happens).
3) That means every device on a node gets an explicit
rule for the IP of each other device in that security group's rule list.
4) Those rules are translated into iptables rules [2] (see line 97,
at port 013859e0)
(look at https:/
for this)
5)*** The RPC message size will grow in VMs_on_hypervisor * VMs_on_
1) neutron-server has a bad time (high load) to render those
AMQP
2) AMQP suffers: messages take a long time to transmit, time out, etc.
3) When one of those big replies times out, it's asked for again.. and
This problem gets worse as compute nodes get bigger (capable of hosting
more instances) or as we move to denser clouds based on Docker containers.
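A rough way to see the growth is to count the expanded rules. The Python sketch below assumes, as described in the steps above, that each device on a node gets one explicit ingress rule per other member IP of its security group. The function names and rule fields are hypothetical, and it counts a single ethertype only, so real replies would be larger.

```python
def expand_remote_group(device_ip, member_ips):
    """One explicit ingress rule per *other* member IP of the group."""
    return [
        {"direction": "ingress", "source_ip": ip}
        for ip in member_ips
        if ip != device_ip
    ]


def rules_in_reply(local_ips, member_ips):
    """Total expanded rules sent to one node hosting the devices in local_ips."""
    return sum(len(expand_remote_group(ip, member_ips)) for ip in local_ips)


# 1000 instances in the group, 50 of them on this compute node.
members = ["10.0.%d.%d" % (i // 250, i % 250 + 1) for i in range(1000)]
local = members[:50]
total = rules_in_reply(local, members)  # 50 devices * 999 other members = 49950
```

Under these assumptions the count is the product of the devices on the node and the other group members, so doubling both the node density and the group size roughly quadruples the reply for that node.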
[1] Logged, and pretty printed RPC messages: http://
[2] Resulting iptables rules, see line 97: http://
Blueprint information
- Status: Complete
- Approver: Miguel Angel Ajo
- Priority: High
- Drafter: Miguel Angel Ajo
- Direction: Approved
- Assignee: Miguel Angel Ajo
- Definition: Approved
- Series goal: Accepted for juno
- Implementation: Implemented
- Milestone target: 2014.2
- Started by: Kyle Mestery
- Completed by: Kyle Mestery
Whiteboard
20-July (mestery): Juno-3 as medium.
Gerrit topic: https:/
Addressed by: https:/
Refactor the security_
Addressed by: https:/
Refactor security group rpc call
Gerrit topic: https:/
Addressed by: https:/
Framework to start/stop neutron services for functional testing.
Addressed by: https:/
Add test to compare security_