Create an OpenStack Operations manual

Registered by Anne Gentle

As outlined on http://etherpad.openstack.org/EssexOperationsGuide, create an Operations manual for OpenStack Compute and Storage and the services that support them. We will write it with a team of operators during the last week of February 2013, as a facilitated book sprint.

Blueprint information

Status:
Complete
Approver:
Anne Gentle
Priority:
High
Drafter:
Anne Gentle
Direction:
Approved
Assignee:
OpenStack Documentation Coordinators
Definition:
Approved
Series goal:
Accepted for grizzly
Implementation:
Implemented
Milestone target:
grizzly
Started by
Anne Gentle
Completed by
Tom Fifield

Whiteboard

August 16th is f3

Essex Operations Guide outline:
Chapter: How to use this manual
    Intended Audience
    Conventions
Chapter: Operations Quick Start
    Administrative starting tasks checklist
    - first time operator on an OpenStack cloud, what do they need to know?
        New Hire Operator who needs to get up to speed quickly.
Chapter: Operations Management Practices
    Starting Up
    Common Tasks on start up
        Horizon Dashboard operations
    Database Management
    Storage Management - volumes attached to compute instances
        Backups for volumes
    User Management
    Security Management
    Dev/Ops principles and how they work in the Ops environment
    Migration and Upgrades
    Failover Planning
    DR/BC Planning
    Automation - What to automate and how to do it
Chapter: Day-to-Day Operations
    A day in the life of an operator
    Watch your RabbitMQ or Qpid
        No feedback, just sits there
    Pings, traceroutes, nslookups for networking issues
        Look for pings with high latency
    Disk I/O - look for 100% utilization spots
    CPU
    Straces
    TCP dumps
    Instances stuck in "build" state
    Instances can't be "ping"ed
    Instances can't be accessed through a VNC console
    Tier 2 support for users, who are developers for the most part
    Types of "tickets" generated by OpenStack
    Metrics to follow, with alarm settings
    Syslog trolling
    Available tools modified or used in OpenStack
        Euca
        nova.conf
        curl
        Nagios
        etc.
Chapter: Periodic Operations
    Signs to watch (and where)
        Dashboard
        CLI command sets (scripts)
        Standard Log paths
    Things that need to be done sometimes
    Cleanup of databases
    Rebuilding/Balancing the ring
    Log file rotation checks
    Capacity planning
Chapter: Diagnostics and Troubleshooting
    If this happens, then do this
    Tips, Tricks, and Traps
        Configuration settings that should not remain defaults
        Making sensible scheduler choices
        Figuring out how to find out what flags do
    Monitoring
        Sources of information and alerts
        Monitoring Hardware
        Monitoring the Network
        Monitoring the Cloud
        Monitoring Storage Volumes

Gerrit topic: https://review.openstack.org/#q,topic:bp/openstack-operations-manual,n,z

Addressed by: https://review.openstack.org/10487
    Adding structure for Operations Manual
