Adapt the upgrades workflow to facilitate stable/mitaka to master (newton)

Registered by Marios Andreou

EDIT:
For now we will use launchpad bugs to track the lifecycle-related work for newton (major upgrades, minor updates and mixed version support). The tag is: https://bugs.launchpad.net/tripleo/+bugs?field.tag=upgrade-bugs

Original bp text below. I will also retarget this bp from n3 now.
=========================================

The current upgrades workflow [1] has been proven to work well for upgrades to upstream liberty. There are currently two known items of work within the wider tripleo community which will impact that workflow - in particular the upgrade of the controller nodes. These are the composable services work, tracked at [2], and the move towards an evolved pacemaker architecture for controller services, tracked at [3]. There may well be others, which can be added here as they are discovered or identified. Suggestions are very welcome so we can catch issues as early as possible.

This blueprint is for tracking the work involved in adapting the current upgrades workflow and proving that it works for upgrades from stable/mitaka to master/newton. For both major items of work mentioned above, the changes are mostly restricted to the controller nodes.

For the composable services effort, the challenge will be in bringing up the cluster using the 'new' composed services templates, since the implementation there will have completely changed for many services. As just one example, the change that decomposes the base neutron-server and dhcp services removes the resource definitions and constraints from the tripleo-heat-templates overcloud_controller_pacemaker.pp manifest (which has thus far served as the single 'controller' puppet manifest) at [4]. This code is now migrated to service-specific templates in puppet-tripleo, for example at [5]. In theory (and hopefully... :/) the end result will be the same, but this needs to be tested and proven, with any inevitable issues addressed.

The migration to an advanced pacemaker architecture is expected to pose a number of difficulties for upgrades. Similarly to the composable services, the challenge will again be in bringing the cluster up to the 'new' architecture. That is, you start with a stable/mitaka overcloud running the legacy pacemaker architecture [6] and upgrade using the latest master/newton tripleo-heat-templates which define the advanced architecture [7].

Given that a key feature of the new architecture is to retain only core services as managed by pacemaker, its very nature implies some loss in centralised control of _all_ services running on the controller via pacemaker. Furthermore, the orthogonal composable controller services effort discussed above means that no assumptions can be made about exactly which services are deployed and running (and so should be brought back up after an upgrade) on a given controller. We may, for example, need to construct and store a service manifest before starting the upgrades process, to capture the services currently running on that controller. There is already some work towards this at [8].
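
To make the service manifest idea a little more concrete, here is a minimal sketch in Python. It is not the approach taken in [8]; it only illustrates one possible shape. It assumes the controller runs systemd and that a plain list of running .service units is an adequate manifest. The function names, the JSON layout and the manifest path are all hypothetical.

```python
import json
import subprocess


def parse_running_services(systemctl_output):
    """Return sorted unit names from plain `systemctl list-units` output."""
    services = set()
    for line in systemctl_output.splitlines():
        fields = line.split()
        # The first column is the unit name; skip blank or unrelated lines.
        if fields and fields[0].endswith(".service"):
            services.add(fields[0])
    return sorted(services)


def capture_service_manifest(path):
    """Record the running systemd services in a JSON file (hypothetical layout)."""
    output = subprocess.check_output(
        ["systemctl", "list-units", "--type=service", "--state=running",
         "--no-legend", "--plain"],
        universal_newlines=True,
    )
    manifest = {"running_services": parse_running_services(output)}
    with open(path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest


def services_to_restart(manifest, running_now):
    """Services recorded before the upgrade that are not running afterwards."""
    return sorted(set(manifest["running_services"]) - set(running_now))
```

The idea would be to call capture_service_manifest() on each controller before the upgrade starts, then compare against the running services afterwards with services_to_restart(). Whether a systemd-level view is sufficient is an open question; this sketch does not consult pacemaker itself.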

[1] "Upgrade documentation" https://review.openstack.org/#/c/308985/
[2] https://etherpad.openstack.org/p/tripleo-composable-services
[3] https://review.openstack.org/#/c/299628
[4] https://review.openstack.org/#/c/303386/23/puppet/manifests/overcloud_controller_pacemaker.pp
[5] https://review.openstack.org/#/c/293436/21/manifests/profile/pacemaker/neutron.pp
[6] http://acksyn.org/files/tripleo/wsgi-openstack-core.pdf via https://review.openstack.org/#/c/299628
[7] http://acksyn.org/files/tripleo/light-cib-nomongo.pdf via https://review.openstack.org/#/c/299628
[8] https://review.openstack.org/#/c/313544/

Blueprint information

Status:
Complete
Approver:
Steven Hardy
Priority:
High
Drafter:
Marios Andreou
Direction:
Approved
Assignee:
Marios Andreou
Definition:
Drafting
Series goal:
None
Implementation:
Implemented
Milestone target:
None
Started by
Marios Andreou
Completed by
Emilien Macchi

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:fixup_upgrades_scripts,n,z

Addressed by: https://review.openstack.org/321027
    Initial rework of pacemaker_common_functions for M..N upgrades

Gerrit topic: https://review.openstack.org/#q,topic:bp/overcloud-upgrades-workflow-mitaka-to-newton,n,z

Addressed by: https://review.openstack.org/323299
    Adjust UpgradeLevelNovaCompute rpc messaging pin to mitaka

Bug for upgrading the undercloud (mistral issue) filed at https://bugs.launchpad.net/tripleo/+bug/1587915

Upgrading the undercloud - tracking bugs
---------------------------------------
Hit a number of issues upgrading the undercloud. They are all already fixed in latest master, so it seems to be packaging/backport related. For tracking I have filed individual bugs and will track them here:

    -1-> https://bugs.launchpad.net/tripleo/+bug/1593736 "Could not find class ::tripleo::selinux"
    -2-> https://bugs.launchpad.net/tripleo/+bug/1593182 "failed openstack-nova-scheduler"
    -3-> https://bugs.launchpad.net/tripleo/+bug/1594890 "Error: Could not find class ::ironic::drivers::deploy"
    -4-> https://bugs.launchpad.net/tripleo/+bug/1594893 "Error: Invalid parameter pxe_bootfile_name on Class[Ironic::Drivers::Pxe"
    -5-> https://bugs.launchpad.net/tripleo/+bug/1594895 "Error: Invalid parameter dport on Tripleo::Firewall::Rule[101 mongodb_config"
    -6-> https://bugs.launchpad.net/tripleo/+bug/1594896 "Error: Invalid parameter destination on Tripleo::Firewall::Rule"
^^^ 1-6 should all be fixed by https://review.openstack.org/#/c/332889/ "Fix mitaka..newton upgrade for openstack-puppet-modules package"

new bug for undercloud hanging @ https://bugs.launchpad.net/tripleo/+bug/1596950

Gerrit topic: https://review.openstack.org/#q,topic:n-m-ha-upgrades,n,z

