Create new alarms for packet loss for UVEs (vRouter, Virtual Machine Interface, Virtual Network)

Registered by Pavani on 2017-05-17

1. Introduction
Currently, the vRouter, Virtual Machine Interface (VMI) and Virtual Network (VN) UVEs provide packet loss details (from Contrail 3.2). Because no Python ruleset exists for these counters, no packet loss alarm is triggered.
We propose creating new packet loss alarms in Opserver for the VMI, vRouter and VN UVEs, based on dropstats statistics. Downstream applications such as the Contrail GUI can use the Opserver API to pull packet loss statistics.

2. Problem statement
Currently, packet loss counters are captured in the UVEs listed below, but because no Python ruleset exists, no packet loss alarm is triggered. The agent structure for each UVE is:
1. VrouterStatsAgent for vRouter
2. UveVirtualNetworkAgent for Virtual Network
3. UveVMInterfaceAgent for VirtualMachineInterface

3. Proposed solution
Create new alarms to detect packet loss. This will help detect failures early, so that they can be rectified before they cause major issues.
The dropstats percentage will be calculated using the formula: drop_pkts_percent = drop_pkts / (in_pkts + out_pkts).
     If drop_pkts_percent > 1%, a notification will be raised and a response
     will be triggered.
     The notification and response still need to be defined.
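The calculation above can be sketched in Python as follows. This is a minimal illustration, not Opserver code: the function names are hypothetical, the multiplication by 100 (so that the result can be compared against 1%) is an assumption about how the formula is meant to be read, and the handling of zero traffic is also an assumption.

```python
def drop_pkts_percent(drop_pkts, in_pkts, out_pkts):
    """Return dropped packets as a percentage of total (in + out) packets."""
    total = in_pkts + out_pkts
    if total == 0:
        # Assumption: with no traffic observed, report no loss
        # instead of dividing by zero.
        return 0.0
    return 100.0 * drop_pkts / total


def packet_loss_exceeded(drop_pkts, in_pkts, out_pkts, threshold_percent=1.0):
    """True when the drop percentage is strictly above the threshold."""
    return drop_pkts_percent(drop_pkts, in_pkts, out_pkts) > threshold_percent
```

With 300 packets in and 200 out, 5 drops sit exactly at the 1% threshold and do not raise the alarm, while 6 drops do.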
Use cases: seven fields were identified in the dropstats output for which alarms need to be created, covering Virtual Machine Interface, vRouter and Physical Interface.
For vRouter:
VrouterStatsAgent.exception_packets > 200

For Virtual Machine Interface, vRouter and Physical Interface:
---Trap No IF > 50
drop_stats_1h.ds_trap_no_if > 50
This counter is incremented when the vRouter cannot find the interface on which to trap packets to the vRouter agent. It should not happen in a working system.
---IF Drop > 50
drop_stats_1h.ds_interface_drop > 50
This counter indicates packets that are dropped in the interface layer. An increase typically happens when interface settings are wrong.
---Flow No Memory > 50
drop_stats_1h.ds_flow_no_memory > 50
This counter is incremented when the flow block does not have enough memory to perform internal operations.
---ds_discard > 50 or 100
drop_stats_1h.ds_discard > 50
This counter tracks packets that hit a discard next hop. For various reasons interpreted by the agent, and during some transient conditions, a route can point to a discard next hop. Packets that hit such a route are dropped.
---Mcast Clone Fail > 30
drop_stats_1h.ds_mcast_clone_fail > 30
This counter is incremented when the vRouter is not able to replicate a packet for flooding.
---Invalid NH > 30
drop_stats_1h.ds_invalid_nh > 30
This counter tracks the number of packets that hit a next hop that was not in a usable state (usually in transient conditions), hit an unexpected next hop, or found no next hop where one was expected. Such increments happen rarely, and the counter should not increase continuously.
---Rewrite Fail > 30
This counter tracks the number of times the vRouter was not able to write next hop rewrite data to the packet.
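The per-counter thresholds listed above can be sketched as a simple lookup table in Python. This is only an illustration under stated assumptions: the names `DROP_STATS_THRESHOLDS` and `breached_counters` are hypothetical, the input is assumed to be a plain dict shaped like `drop_stats_1h`, and Rewrite Fail is left out because the text above does not name its field.

```python
# Thresholds copied from the list above; keys are drop_stats_1h field names.
DROP_STATS_THRESHOLDS = {
    "ds_trap_no_if": 50,
    "ds_interface_drop": 50,
    "ds_flow_no_memory": 50,
    "ds_discard": 50,
    "ds_mcast_clone_fail": 30,
    "ds_invalid_nh": 30,
}


def breached_counters(drop_stats_1h, thresholds=DROP_STATS_THRESHOLDS):
    """Return (field, value, limit) for every counter strictly above its limit.

    Fields missing from drop_stats_1h are treated as 0 (an assumption).
    Results are ordered by field name so the output is deterministic.
    """
    breaches = []
    for field, limit in sorted(thresholds.items()):
        value = drop_stats_1h.get(field, 0)
        if value > limit:
            breaches.append((field, value, limit))
    return breaches
```

Keeping the thresholds in one table makes it easy to adjust a limit (for example, the 50 versus 100 choice for ds_discard) without touching the comparison logic.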

3.1 Alternatives considered
 The user can run the dropstats command manually and check its output, or check the values returned by the Analytics APIs manually.
3.2 API schema changes
No schema changes.
3.3 User workflow impact
Alarms are created automatically when the Python-coded rules are satisfied.
3.4 UI changes
No UI changes. Alarms can also be created through the UI.
3.5 Notification impact
New alarms are created. Logs will be added when alarms are raised. No UVE changes.
4. Implementation
4.1 Work items

Changes in Opserver:

New plugins are added in Opserver for the packet loss alarms.
The Python plugin for each alarm is installed on the Analytics node.
Plugins are added in the folder: controller/src/opserver/plugins/
The plugin implementation is in alarm_packet_loss/; the coded rules are added in the files of the corresponding plugins.
The new plugins are added to the SConscript.
The Python-coded rule checks the dropstats values against the values in the message table of the corresponding UVE for Virtual Machine Interface, vRouter and Virtual Network.
Entry points are created in the corresponding plugins.
_RULES = 'DatabaseUsageInfo.database_usage.disk_space_used_1k > 90%'
No changes are needed in other modules, as analytics for packet loss already exist.
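As a rough illustration of what a Python-coded rule for one of these alarms might look like, the sketch below checks the vRouter exception_packets threshold from section 3. It is self-contained and does not use the real Opserver classes: `AlarmBaseStandIn`, the severity string, the `__call__(uve_key, uve_data)` shape and the returned message format are all assumptions made for the example.

```python
class AlarmBaseStandIn(object):
    """Hypothetical stand-in for the Opserver alarm base class."""

    def __init__(self, severity):
        self._severity = severity

    def severity(self):
        return self._severity


class VrouterExceptionPackets(AlarmBaseStandIn):
    """Raise when VrouterStatsAgent.exception_packets exceeds 200."""

    THRESHOLD = 200

    def __init__(self):
        # The severity value is an assumption for this sketch.
        AlarmBaseStandIn.__init__(self, severity="major")

    def __call__(self, uve_key, uve_data):
        """Return a description if the rule matches, otherwise None."""
        stats = uve_data.get("VrouterStatsAgent")
        if stats is None:
            return None
        value = stats.get("exception_packets")
        if value is None or value <= self.THRESHOLD:
            return None
        return "%s: VrouterStatsAgent.exception_packets = %d > %d" % (
            uve_key, value, self.THRESHOLD)
```

Calling the instance with a UVE key and a dict holding the UVE contents returns a message only when the counter is above the threshold; the key used by a caller (for example "vrouter-1") is likewise illustrative.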
5. Performance and scaling impact
5.1 API and control plane
Scaling and performance for API and control plane
5.2 Forwarding performance
Scaling and performance for API and forwarding
6. Upgrade
Describe upgrade impact of the feature
Schema migration/transition
7. Deprecations
No deprecations observed.
8. Dependencies
Describe dependent features or components.
9. Testing
9.1 Unit tests
9.2 Dev tests
9.3 System tests
10. Documentation Impact
11. References

Blueprint information

Gautam Divgi
Needs approval
Series goal:
Milestone target:
Started by
Pavani on 2017-05-17
