Create system tests for Ceph recovery
Registered by Artem Panchenko
In the system tests we check Ceph health and can detect warnings about clock skew or OSD recovery:
https:/
However, the recovery process itself is not monitored by the test: we just wait for it to succeed within a predefined timeout. Instead, we should observe the number of PGs still being recovered decreasing over time once peering ends. If peering has not ended within a minute, that is a red flag: either the parameters are far from optimal or I/O is choking.
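The progress check described above could be sketched as follows. This is a minimal sketch, not the implemented test. It assumes the test periodically samples PG states into a hypothetical `(seconds_since_osd_kill, {pg_state: pg_count})` structure. It treats every PG not in exactly `active+clean` as "still recovering" and requires that count to never grow after peering ends. The names `check_recovery_progress`, `unclean_pgs`, and `is_peering` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample format: (seconds since the OSD was killed, {pg_state: pg_count})
Sample = Tuple[float, Dict[str, int]]

PEERING_TIMEOUT = 60.0  # seconds: the "one minute" threshold from the blueprint


def unclean_pgs(states: Dict[str, int]) -> int:
    """Count PGs whose state is anything other than exactly 'active+clean'."""
    return sum(n for state, n in states.items() if state != "active+clean")


def is_peering(states: Dict[str, int]) -> bool:
    """True if any PG state string has 'peering' as one of its '+' components."""
    return any("peering" in state.split("+") for state in states)


def check_recovery_progress(samples: List[Sample],
                            peering_timeout: float = PEERING_TIMEOUT) -> None:
    """Raise AssertionError if peering stalls or recovery stops making progress.

    Assumes `samples` is ordered by time and covers the whole observation window.
    """
    # Find the first sample in which no PG is peering any more.
    peering_end: Optional[float] = None
    for t, states in samples:
        if not is_peering(states):
            peering_end = t
            break
    if peering_end is None or peering_end > peering_timeout:
        raise AssertionError(
            "peering did not finish within %.0f s: parameters are far from "
            "optimal or I/O is choking" % peering_timeout)

    # After peering, the number of unclean PGs must never grow...
    after = [unclean_pgs(s) for t, s in samples if t >= peering_end]
    for prev, cur in zip(after, after[1:]):
        if cur > prev:
            raise AssertionError("unclean PG count grew from %d to %d" % (prev, cur))
    # ...and must actually shrink unless everything is already clean.
    if len(after) > 1 and after[-1] == after[0] and after[-1] != 0:
        raise AssertionError("no recovery progress: %d PGs still unclean" % after[-1])
```

Requiring a strictly non-increasing count is a deliberately strict choice for the sketch. A real test may need to tolerate small fluctuations between samples.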
'ceph health' can also report states that are not acceptable during recovery:
- creating
- down
- inconsistent
- incomplete
and states that should not persist for longer than a minute after an OSD is killed:
- unfound
- peering
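The two state lists above could be turned into a check roughly like this. This is a sketch under stated assumptions, not the implemented test. It assumes some hypothetical parser has already reduced each `ceph health` observation to a set of PG state keywords, timestamped in seconds since the OSD was killed. OSD-level "down" is expected after an OSD kill, so the parser would need to extract PG states only. `find_violations` and the constants are illustrative names.

```python
from typing import Iterable, List, Set, Tuple

# States that must never appear while the cluster recovers.
FORBIDDEN = {"creating", "down", "inconsistent", "incomplete"}
# States tolerated only during the first minute after an OSD is killed.
TRANSIENT = {"unfound", "peering"}
TRANSIENT_LIMIT = 60.0  # seconds


def find_violations(samples: Iterable[Tuple[float, Set[str]]],
                    limit: float = TRANSIENT_LIMIT) -> List[str]:
    """Return human-readable violations; an empty list means the check passed.

    Each sample is (seconds since the OSD was killed, set of PG state keywords
    extracted from 'ceph health' output by a hypothetical parser).
    """
    violations = []
    for t, states in samples:
        # Forbidden states are a failure at any point in time.
        for state in sorted(states & FORBIDDEN):
            violations.append("t=%.0fs: forbidden state '%s'" % (t, state))
        # Transient states are a failure only once the time limit has passed.
        if t > limit:
            for state in sorted(states & TRANSIENT):
                violations.append("t=%.0fs: '%s' persists past %.0fs" % (t, state, limit))
    return violations
```

Returning a list instead of failing on the first problem lets the test report every bad state seen during recovery in one assertion message.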
Blueprint information
- Status: Complete
- Approver: Nastya Urlapova
- Priority: Undefined
- Drafter: Artem Panchenko
- Direction: Needs approval
- Assignee: Artem Panchenko
- Definition: Obsolete
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by
- Completed by: Artem Panchenko