Additional info into snapshot

Registered by Alexander Dobdin

This blueprint has been superseded. See the newer blueprint "Implement diagnostic snapshot v2.0" for updated plans.

Please add output from the following commands to a snapshot:

On master node:
curl http://127.0.0.1:8000/api/v1/nodes/ | json_reformat

On every node:
lsof
ip link
ip rule show
Instead of netstat (see https://blog.timheckman.net/2011/12/22/why-you-should-replace-ifconfig/):
ss -tuanl
service --status-all
free -m
df -m
df -i

# top CPU-usage processes (numeric sort, so 9.5 does not rank above 10.0)
ps -eo pcpu,pmem,pid,rss,vsize,args | sort -k 1 -n -r | head -n 10
# top memory-usage processes
ps -eo pmem,pcpu,pid,rss,vsize,args | sort -k 1 -n -r | head -n 10

Depending on role:
for i in $(ip netns); do ip netns exec "$i" iptables-save; done
neutron agent-list
cinder-manage host list
nova-manage service list
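
The commands listed above could be gathered with a small wrapper along the following lines. This is only a minimal sketch, not how the snapshot tool actually works: the function name collect_cmd, the output directory, and the one-file-per-command layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run one command and store its stdout and stderr
# in <outdir>/<name>.txt. A missing or failing command is recorded in
# the file instead of aborting the whole collection.
collect_cmd() {
    outdir=$1
    name=$2
    shift 2
    mkdir -p "$outdir" || return 1
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "command not found: $1" > "$outdir/$name.txt"
        return 0
    fi
    # On failure, append the exit status so the reader can tell
    # an empty result from a broken command.
    "$@" > "$outdir/$name.txt" 2>&1 || echo "exit status: $?" >> "$outdir/$name.txt"
}
```

A call such as "collect_cmd /tmp/snapshot ss ss -tuanl" or "collect_cmd /tmp/snapshot df_inodes df -i" would then leave one text file per command; role-specific commands can be added the same way on the nodes that have them.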

Blueprint information

Status: Complete
Approver: Roman Alekseenkov
Priority: High
Drafter: Alexander Dobdin
Direction: Approved
Assignee: None
Definition: Superseded
Series goal: Accepted for 5.0.x
Implementation: Deferred
Milestone target: None
Completed by: Roman Alekseenkov

Whiteboard

Another source of requests: https://blueprints.launchpad.net/fuel/+spec/manage-logs-with-free-space-consideration

===

Additional commands from Andrey Kirilochkin: https://github.com/andrei4ka/CloudInfoCollector
The 'atop' command, at least, needs 10 minutes to run, so we need to add a checkbox for an extended snapshot.

[2014-03-12 Dmitry Borodaenko] One of the notable things listed in CloudInfoCollector and missing from the Fuel diagnostic snapshot today is neutron configs.

[2014-03-20 Sergey Yudin]
For now the Fuel snapshot cannot be updated on the fly, so we have to deal with an outdated version written years ago.
1) We need to be able to update the snapshot tool by sending an updated version to the customer.

The snapshot tool lacks automation, so for simple cases like one-to-one ping/tcpdump, or starting a VM and running something on it, support has to write tons of code and has to request this information in addition to the snapshot info.
2) Some predefined features, such as running something role-to-role, node-to-node, etc., should be created, and at least connectivity between nodes should be checked with ping/tcpdump by default.
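
A default connectivity check of the kind described in item 2 might look roughly like this. It is a sketch under assumptions: the function names probe_host and check_hosts are hypothetical, the node list is passed by hand, and the probe is a plain ping with Linux iputils options rather than anything the snapshot tool provides.

```shell
#!/bin/sh
# Default probe: three ICMP echo requests with a 2-second reply timeout
# (iputils ping options). Replace this function to probe differently.
probe_host() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}

# Hypothetical helper: probe every host given as an argument and print
# one "<host> OK" or "<host> FAIL" line each. Returns non-zero if any
# host failed.
check_hosts() {
    rc=0
    for host in "$@"; do
        if probe_host "$host"; then
            echo "$host OK"
        else
            echo "$host FAIL"
            rc=1
        fi
    done
    return "$rc"
}
```

Running "check_hosts node-1 node-2 node-3" on each node in turn would give a simple node-to-node reachability matrix.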

We need a dynamic view of the environment, so at least:
3) A feature for running something x times with an interval of y should be added.
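
Item 3 could be sketched as a small loop. The name repeat_cmd and the header format are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: run a command <count> times, sleeping <interval>
# seconds between runs, and print a timestamped header before each run.
repeat_cmd() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "--- run $i of $count at $(date '+%Y-%m-%d %H:%M:%S') ---"
        "$@"
        # No need to sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, "repeat_cmd 5 60 free -m" would record memory usage once a minute, five times.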

Since we are going to run some commands that take a long time to complete, we need the ability to run things in asynchronous mode, e.g. while we are collecting atop data for 10 min.
4) We need to be able to run long-running commands simultaneously.
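
In plain shell, item 4 can be approximated with background jobs and wait. This is a sketch only; the helper name start_bg and the output paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background with its
# stdout and stderr redirected to a file, so that several slow
# collectors can run at the same time. The caller uses the shell
# builtin `wait` to block until all of them have finished.
start_bg() {
    outfile=$1
    shift
    "$@" > "$outfile" 2>&1 &
}
```

A caller might issue "start_bg /tmp/snapshot/atop.txt atop 60 10" (interval and sample count are illustrative), start other collectors the same way, and finish with "wait" so the snapshot is packed only after every background job is done.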

Right now there is no command that tells us anything about OpenStack services apart from the log files.
5) We need host list/agent list/status/whatever for each service.

We need at least some basic information about cluster performance. Ideally it should be profiling information from the services, or at least from the clients: volume-list/net-list/image-list/user-list/meter-list/whatever-list with --debug/--verbose and timestamps in front of each line will be okay for a start.
6) Some basic profiling information should be added.
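
The "timestamps in front of each line" part of item 6 could be done with a filter like the one below. This is a minimal sketch; ts_prefix is a hypothetical name, and the timestamp has one-second resolution, which may be too coarse for real profiling.

```shell
#!/bin/sh
# Hypothetical filter: prefix every line read from stdin with the time
# at which it was read. The `|| [ -n "$line" ]` part keeps a final line
# that has no trailing newline.
ts_prefix() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

Client output would be piped through it, for example "nova --debug list 2>&1 | ts_prefix", so that gaps between consecutive lines show where the time is spent.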

7) We need to pay attention to namespaces and provide iptables-save / netstat / ifconfig / ip ro / everything related to the network, not only from the system but also from every namespace.
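
Item 7 amounts to running each network command once in the root namespace and once per network namespace. A possible sketch follows; the function names are hypothetical, and root privileges are assumed for ip netns exec.

```shell
#!/bin/sh
# Extract namespace names from `ip netns list` output read on stdin.
# Newer iproute2 releases append " (id: N)" to each name, so only the
# first field of every non-empty line is kept.
parse_netns_names() {
    awk 'NF { print $1 }'
}

# Hypothetical helper: run a command in the root namespace, then in
# every network namespace, with a header line before each run.
run_in_all_netns() {
    echo "=== root namespace ==="
    "$@"
    ip netns list | parse_netns_names | while IFS= read -r ns; do
        echo "=== netns $ns ==="
        # stdin is redirected so the command cannot swallow the
        # namespace list that the loop is still reading.
        ip netns exec "$ns" "$@" < /dev/null
    done
}
```

For example, "run_in_all_netns iptables-save" or "run_in_all_netns ip route" would cover both the host and every namespace in one output stream.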

8) nova-manage service list
neutron agent-list
crm resource list
crm status
rabbitmqctl status
rabbitmqctl cluster_status
rabbitmqctl list_queues

(?)
