Create an OpenStack Operations manual
As outlined on http://
Blueprint information
- Status: Complete
- Approver: Anne Gentle
- Priority: High
- Drafter: Anne Gentle
- Direction: Approved
- Assignee: OpenStack Documentation Coordinators
- Definition: Approved
- Series goal: Accepted for grizzly
- Implementation: Implemented
- Milestone target: grizzly
- Started by: Anne Gentle
- Completed by: Tom Fifield
Whiteboard
August 16th is f3
Essex Operations Guide outline:
Chapter: How to use this manual
Intended Audience
Conventions
Chapter: Operations Quick Start
Administrative starting tasks checklist
- What does a first-time operator on an OpenStack cloud need to know?
Audience: a newly hired operator who needs to get up to speed quickly.
Chapter: Operations Management Practices
Starting Up
Common Tasks on start up
Horizon Dashboard operations
Database Management
Storage Management - volumes attached to compute instances
Backups for volumes
User Management
Security Management
DevOps principles and how they work in the Ops environment
Migration and Upgrades
Failover Planning
DR/BC Planning
Automation - What to automate and how to do it
Chapter: Day-to-Day Operations
A day in the life of an operator
Watch your RabbitMQ or Qpid message broker
No feedback, just sits there
Pings, traceroutes, nslookups for networking issues
Look for pings with high latency
Disk I/O - look for 100% utilization spots
CPU
strace output
tcpdump captures
Instances stuck in "build" state
Instances can't be "ping"ed
Instances can't be accessed through a VNC console
Tier 2 support for users, who are developers for the most part
Types of "tickets" generated by OpenStack
Metrics to follow, with alarm settings
Syslog trolling
Available tools modified or used in OpenStack
Euca
nova.conf
curl
etc.
Nagios
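Example sketch for "Look for pings with high latency" in the outline above: one way an operator might flag slow replies in captured ping output. This is an illustration only, not content from the planned manual. It assumes the reply format printed by the Linux iputils ping; the function name slow_replies, the 100 ms default threshold, and the sample address are hypothetical.

```python
import re

# Matches the per-reply lines printed by the Linux (iputils) ping, e.g.
# "64 bytes from 10.0.0.5: icmp_seq=3 ttl=64 time=212 ms".
# Other ping implementations may format replies differently (assumption).
REPLY_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def slow_replies(ping_output, threshold_ms=100.0):
    """Return (sequence number, latency in ms) for replies slower than threshold_ms."""
    slow = []
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match and float(match.group(2)) > threshold_ms:
            slow.append((int(match.group(1)), float(match.group(2))))
    return slow


if __name__ == "__main__":
    # Hypothetical sample output; in practice this would come from a saved ping run.
    sample = (
        "64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.412 ms\n"
        "64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=187 ms\n"
        "64 bytes from 10.0.0.5: icmp_seq=3 ttl=64 time=0.398 ms\n"
    )
    for seq, latency in slow_replies(sample):
        print(f"icmp_seq={seq} took {latency} ms")
```

The threshold is a parameter because what counts as "high" latency depends on the network being monitored.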
Chapter: Periodic Operations
Signs to watch (and where)
Dashboard
CLI command sets (scripts)
Standard Log paths
Things that need to be done sometimes
Cleanup of databases
Rebuilding/
Log file rotation checks
Capacity planning
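Example sketch for "Log file rotation checks" in the outline above: a log file that keeps growing is often a sign that rotation is not running, so one simple check is to list oversized logs in a directory. This is an illustration only, not content from the planned manual. The function name oversized_logs, the 100 MB default limit, and the ".log" suffix convention are assumptions.

```python
import os


def oversized_logs(log_dir, max_bytes=100 * 1024 * 1024):
    """Return (path, size in bytes) for *.log files in log_dir above max_bytes, largest first."""
    found = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        # Only regular files ending in ".log" are considered (assumed naming convention).
        if name.endswith(".log") and os.path.isfile(path):
            size = os.path.getsize(path)
            if size > max_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

A call such as oversized_logs("/var/log/nova") would be a typical use, assuming the service writes its logs to that directory on the host being checked.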
Chapter: Diagnostics and Troubleshooting
If this happens, then do this
Tips, Tricks, and Traps
Making sensible scheduler choices
Finding out what each flag does
Monitoring
Sources of information and alerts
Monitoring Hardware
Monitoring the Network
Monitoring the Cloud
Monitoring Storage Volumes
Gerrit topic: https:/
Addressed by: https:/
Adding structure for Operations Manual