Magnum Troubleshooting Guide

Registered by Ton Ngo

Users and new contributors frequently ask for help on the Magnum IRC channel when debugging various problems. When no experienced developer is available to help, the user can be left frustrated with little recourse. This can also inhibit adoption of Magnum by cloud providers. Since many of these troubleshooting sessions follow similar patterns, it would be useful to write a troubleshooting guide for common scenarios.

At the contributor meeting at the Tokyo Design Summit, we decided to put together a skeleton for a Troubleshooting Guide that contributors can fill in with content over time.

Initially the following scenarios will be considered:

- What to do when bay creation fails

- Heat client examples for debugging failed Heat stacks

- How to introspect k8s (when Heat works but k8s does not)

- How to check on a Swarm cluster (see membership information, view master/agent containers)

- Cluster networking issues (whoops, I don't have internet access!)

- TLS issues

- Debugging Barbican issues

- etcd

- Docker CLI

Blueprint information

Status: Complete
Approver: Adrian Otto
Priority: High
Drafter: Ton Ngo
Direction: Approved
Assignee: Ton Ngo
Definition: Approved
Series goal: Accepted for newton
Implementation: Implemented
Milestone target: None
Started by: Ton Ngo
Completed by: Adrian Otto

Whiteboard

12-15-2015 - suro-patz
Swarm bay creation failed on DevStack because of the default value of docker-volume-size. Hongbin helped me troubleshoot and fix it. You can either refer to the IRC log below to include this case, or, if there is a shared doc/file, let me know and I can add this case myself.
http://eavesdrop.openstack.org/irclogs/%23openstack-containers/%23openstack-containers.2015-12-15.log.html#t2015-12-15T23:17:35

Gerrit topic: https://review.openstack.org/#q,topic:bp/magnum-troubleshooting-guide,n,z

Addressed by: https://review.openstack.org/263940
    Skeleton for Troubleshooting Guide

Addressed by: https://review.openstack.org/267802
    Add initial documentation for troubleshooting gate

Addressed by: https://review.openstack.org/269443
    Add troubleshooting for network

Addressed by: https://review.openstack.org/271710
    Troubleshooting Kubernetes networking

Addressed by: https://review.openstack.org/283250
    Add Flannel troubleshooting

Addressed by: https://review.openstack.org/283258
    Add etcd troubleshooting

Addressed by: https://review.openstack.org/314788
    Add troubleshooting steps for trustee creation


Work Items

Initial Outline: DONE
[tango] Heat stacks: DONE
TLS: TODO
Barbican service: TODO
[tango] Cluster internet access: DONE
[tango] Kubernetes Networking: DONE
[tango] etcd service: INPROGRESS
[tango] flannel service: INPROGRESS
Kubernetes services: TODO
Swarm services: TODO
Mesos services: TODO
Barbican issues: TODO
Docker CLI: TODO
Request volume size: TODO
Heat software resource scripts: TODO
[thomas-maddox] Troubleshooting Gate: DONE
[tango] Debugging unit tests: INPROGRESS
[dimalg] Gate logs: TODO
Database migration: TODO
