Upstart stateful re-exec

Registered by James Hunt on 2012-05-02

= Summary =

Upstart is not currently able to retain state across a re-exec. Re-exec is useful in the following scenarios:

(1) The version of Upstart is upgraded.

(2) An Upstart dependency (eglibc, libnih) is upgraded.

(3) Upstart is run from the initramfs.

Without full re-exec support, upgrades are complicated significantly. An example:

https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/985755

The problem here is that upgrading from lucid to precise causes errors because the version of Upstart *running* is older than the version of Upstart *installed*. Full re-exec handling would resolve this, since it would allow the post-inst script to re-exec Upstart so that the running version matches the installed version.

= Details =

The problem is not so much the re-exec itself, which is easy to do, but that on re-exec the new instance of Upstart needs to retain the state of the old instance, which is difficult. This state-passing would be critical to having Upstart run in the initramfs, for example: without it, the main system instance of Upstart would have no knowledge of existing jobs started by Upstart in the initramfs (such as plymouthd).

= Plan =

- create a pipe.
- fork.
- child creates socket and listens on it.
- child passes details of socket back to parent via pipe
  (or could just use well-known location).
- child closes pipe.
- parent re-execs itself (closing pipe), passing a cmdline option to notify
  init to read from the socket.
- child sends meta-data on existing jobs through the socket.
- parent parses meta-data and initializes data structures based on this info.

  The plan is to use JSON for the structured representation of the meta-data.

= Perceived issues =

- Cannot restore D-Bus connections. This might not be an issue for the initramfs scenario since there shouldn't be any, but is an issue for Upstart upgrades.
- New version of init being exec'ed must understand all historical JSON syntax quirks if we ever change how we represent objects.
- Child must send its version to the re-exec'ed parent. If that parent detects that the child is newer than itself, state passing would
  be unsafe, since this indicates that Upstart is being downgraded. In such instances, the best course of action may be to:
  - generate a warning
  - log the child's state to a file
  - re-exec with no state-passing.
- adding an extra library dependency to /sbin/init is a concern.
- existing JSON libraries may be unsuitable for boot
  - would need to select a library with very clean code and
    comprehensive tests.
  - should we implement a JSON subset parser in NIH for safety?
    - time cost (code+tests) may be prohibitive?
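
The downgrade check in the list above could look roughly like the following. This is a minimal sketch only: the payload layout (a "version" field next to a "state" field), the dotted version format and the function names are assumptions made for illustration, not part of any agreed schema.

```python
import json
import sys


def parse_version(text):
    """Turn a dotted version string such as "1.5" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def accept_state(own_version, payload, log_path=None):
    """Return the state to restore, or None if it came from a newer sender."""
    if parse_version(payload["version"]) > parse_version(own_version):
        # Downgrade detected: warn, keep a copy of the state, restore nothing.
        print("warning: state written by newer version %s; ignoring it"
              % payload["version"], file=sys.stderr)
        if log_path is not None:
            with open(log_path, "w") as log:
                json.dump(payload["state"], log)
        return None
    return payload["state"]
```

Returning None here stands in for "re-exec with no state-passing": the caller would then start with empty data structures, as on a normal boot.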

= References =

- https://bugs.launchpad.net/upstart/+bug/348455
- https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-event-based-initramfs
- https://lists.ubuntu.com/archives/upstart-devel/2011-August/001707.html

Blueprint information

Status:
Started
Approver:
Steve Langasek
Priority:
High
Drafter:
James Hunt
Direction:
Approved
Assignee:
James Hunt
Definition:
Approved
Series goal:
Accepted for raring
Implementation:
Started
Milestone target:
ubuntu-13.04-beta-1
Started by
Kate Stewart on 2012-07-10

Whiteboard

json-c:
 - only dependency is libc
 - already in main and on cd (used by pulseaudio, and so far we haven't had any problems with it)
 - seems to have some test cases written (test1.c test2.c and test3.c), might need improvements
Please consider reviewing it to see whether this library is sufficient for your needs or could be improved to fulfil your requirements.
-- diwic

From the etherpad...


To serialize Upstart's state we need to do this (diagram at the link below):
http://people.ubuntu.com/~dmitrij.ledkov/upstart/
1. Sessions are serialized first (JSON, off you go).
   Events can block jobs and jobs have pointers to events, so serialization cannot be done in one pass. Hide the blocked list: just serialize the fact that an event is blocking a job. Do not serialize the event operator or the process.
2. Serialize events.
3. Serialize job classes.
4. Serialize jobs.
For D-Bus, only the serial number is needed.
ConfigSourceTypes
- create a 'serialized source'
- the pivoted Upstart should get both the new config and the serialized source
The new Upstart has to do a config reload anyway, hence you only require the highest-priority/latest state from the old Upstart.
The new Upstart has the rootfs init jobs, which may mask the initramfs config, because the rootfs edition is bigger/better.
Stateful Upstart re-exec is first and foremost for Upstart upgrades and other use cases; the initramfs comes after those.
Serialization process:
(1) dispatch all D-Bus messages (so that every incoming message has been dispatched). That guarantees that the only messages left are those waiting for their reply.
Note: we're not accepting initctl commands either, so Upstart is "paused". We're also not in the main loop, so signals are ignored.
(2) create pipe.
(3) create child process.
(4) parent marks the D-Bus file descriptors and the read end of the pipe so that they are NOT closed on exec, then closes the write end of the pipe.
(5) parent execs the new Upstart, passing a magic flag that tells it to read state from the pipe, together with the pipe FD.
(6) child closes read end of pipe.
(7) child writes state to pipe and exits. If it fails to complete the write in "some amount of time", indicating that the new parent doesn't understand serialization, it logs an error and exits.
(8) parent creates serialization ConfSource and reads from pipe.
(9) parent closes pipe.
(10) parent performs normal initialization.
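
The pipe-based part of the steps above can be sketched as follows. This is an illustration under stated assumptions, not the Upstart implementation: it is written in Python rather than C, the function name is made up, the exec in step (5) is left as a comment so that the example runs inside one process, and the write timeout from step (7) is not implemented.

```python
import json
import os


def hand_off_state(state):
    """Pass `state` from a forked child to the parent over a pipe as JSON."""
    read_fd, write_fd = os.pipe()            # step (2)
    pid = os.fork()                          # step (3)

    if pid == 0:
        # Child: holds a copy of the old instance's state.
        status = 1
        try:
            os.close(read_fd)                # step (6)
            with os.fdopen(write_fd, "w") as writer:
                json.dump(state, writer)     # step (7)
            status = 0
        finally:
            os._exit(status)                 # never return into the caller

    # Parent: keep the read end open across exec, drop the write end.
    os.set_inheritable(read_fd, True)        # step (4)
    os.close(write_fd)
    # Step (5) would exec the new binary here and tell it which FD to read,
    # e.g. via a command-line flag; the flag name is not defined in these notes.
    with os.fdopen(read_fd) as reader:       # steps (8) and (9)
        restored = json.load(reader)
    os.waitpid(pid, 0)                       # reap the child
    return restored
```

Closing the write end in the parent matters: otherwise the reader would never see end-of-file once the child has finished writing.
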
For debugging, add the ability to dump the serialization as a string over D-Bus.
Use a third-party JSON library, with tests in Upstart.
If Upstart detects that the user is downgrading Upstart and it sees syntax it doesn't understand, add a flag to the ConfSource stating that the job cannot be restarted.
When deserializing jobs, create a single ConfSource, "serialized_conf_source" or similar, that is not backed by any actual file. If a job that is already running from the initramfs is stopped and started, at that point you get the correct *new* /etc/init/job.conf from the main system.
# events can block jobs, jobs have pointers to events.
# to resolve circular loop, each time an event is seen, add to a hash table with a unique id.
# when unserializing, use lookup.
# cannot do 1 pass serialization.
# want to hide blocked - don't serialize - just serialize that an event is blocking a job.
# don't serialize event operator or process.
#
# only serialize sessions, events, job classes and jobs.
#
# for d-bus, only need serial number.
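
The id-and-lookup idea in the notes above can be sketched as follows. This is a simplified illustration: the Event and Job classes, their field names and the JSON layout are stand-ins invented for the example and do not match Upstart's real structures or schema.

```python
import json


class Event:
    def __init__(self, name):
        self.name = name
        self.blocking = []      # jobs that hold a pointer to this event


class Job:
    def __init__(self, name, blocker=None):
        self.name = name
        self.blocker = blocker  # the event this job points to, if any


def serialise(events, jobs):
    # Give every event a unique id so that jobs can refer to it by number.
    event_ids = {id(ev): n for n, ev in enumerate(events)}
    return json.dumps({
        "events": [{"id": event_ids[id(ev)], "name": ev.name} for ev in events],
        "jobs": [{"name": job.name,
                  "blocker": None if job.blocker is None
                  else event_ids[id(job.blocker)]}
                 for job in jobs],
    })


def deserialise(text):
    data = json.loads(text)
    # Pass 1: recreate the events and remember them by id.
    events_by_id = {e["id"]: Event(e["name"]) for e in data["events"]}
    # Pass 2: recreate the jobs and re-link both directions via the lookup.
    jobs = []
    for entry in data["jobs"]:
        job = Job(entry["name"])
        if entry["blocker"] is not None:
            job.blocker = events_by_id[entry["blocker"]]
            job.blocker.blocking.append(job)
        jobs.append(job)
    return list(events_by_id.values()), jobs
```

Only the fact that a job refers to an event is written out; the reverse list on the event is rebuilt from it, which is why the pointer cycle never appears in the JSON.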
* ensure tests actually force breakages and that they are detected and handled.
* do not perform a stateful re-exec until the system has been rebooted at least once since stateful re-exec was introduced.

Draft Spec is here: https://wiki.ubuntu.com/FoundationsTeam/Specs/QuantalUpstartStatefulReexec

Work Items

Work items:
[jamesodhunt] Identify suitable json library (safe, small, simple, clean): DONE
[jamesodhunt] Write proof-of-concept serialisation+deserialisation test using the library: DONE
[jamesodhunt] Add ability to serialise a meta-data header: POSTPONED
[jamesodhunt] Add ability to deserialise a meta-data header: POSTPONED
[jamesodhunt] Add ability to serialise Session objects: DONE
[jamesodhunt] Add ability to serialise Event objects: DONE
[jamesodhunt] Add ability to serialise JobClass objects: DONE
[jamesodhunt] Add ability to serialise Job objects: DONE
[jamesodhunt] Add ability to deserialise Session objects: DONE
[jamesodhunt] Add ability to deserialise Event objects: DONE
[jamesodhunt] Add ability to deserialise JobClass objects: DONE
[jamesodhunt] Add ability to deserialise Job objects: DONE
[jamesodhunt] Add ability to retain original command-line: DONE
[jamesodhunt] Add ability to read serialised JSON from an fd: DONE
[jamesodhunt] Create JSON schema and example JSON showing full serialization for each object: DONE
[jamesodhunt] Write core serialization code (create pipe, fork, etc): DONE
[jamesodhunt] Write re-exec scenario "NSU" handling: POSTPONED
[jamesodhunt] Write tests for re-exec scenario "NSU": POSTPONED
[jamesodhunt] Write re-exec scenario "SSU-E" handling: DONE
[jamesodhunt] Write tests for re-exec scenario "SSU-E": DONE
[jamesodhunt] Write re-exec scenario "SND" handling: POSTPONED
[jamesodhunt] Write tests for re-exec scenario "SND": POSTPONED
[jamesodhunt] Write re-exec scenario "SSU-G" handling: POSTPONED
[jamesodhunt] Write tests for re-exec scenario "SSU-G": POSTPONED
[jamesodhunt] Write re-exec scenario "SSD-L" handling: POSTPONED
[jamesodhunt] Write tests for re-exec scenario "SSD-L": POSTPONED
[jamesodhunt] Ensure system job running the 'pre-start' process can be serialised, deserialised and allowed to continue: DONE
[jamesodhunt] Ensure system job running the 'pre-start' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure system job running the 'pre-start' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-start' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-start' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-start' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure system job running the 'main' process can be serialised, deserialised and allowed to continue: DONE
[jamesodhunt] Ensure system job running the 'main' process can be serialised, deserialised and stopped: DONE
[jamesodhunt] Ensure system job running the 'main' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure user job running the 'main' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure user job running the 'main' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure user job running the 'main' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure system job running the 'post-start' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure system job running the 'post-start' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure system job running the 'post-start' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure user job running the 'post-start' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure user job running the 'post-start' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure system job running the 'pre-stop' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure system job running the 'pre-stop' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure system job running the 'pre-stop' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-stop' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-stop' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure user job running the 'pre-stop' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure system job running the 'post-stop' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure system job running the 'post-stop' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure system job running the 'post-stop' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure user job running the 'post-stop' process can be serialised, deserialised and allowed to continue: POSTPONED
[jamesodhunt] Ensure user job running the 'post-stop' process can be serialised, deserialised and stopped: POSTPONED
[jamesodhunt] Ensure user job running the 'post-stop' process can be serialised, deserialised and restarted: POSTPONED
[jamesodhunt] Ensure that a stopped system job can be serialised, deserialised and started: DONE
[jamesodhunt] Ensure that a running multi-instance system job can be serialised, deserialised and continued: POSTPONED
[jamesodhunt] Ensure that a running multi-instance user job can be serialised, deserialised and continued: POSTPONED
[jamesodhunt] Ensure that a blocked multi-instance system job can be serialised, deserialised and started: POSTPONED
[jamesodhunt] Ensure that a blocked multi-instance user job can be serialised, deserialised and started: POSTPONED
[jamesodhunt] Ensure that a system job blocked on an event can be serialised, deserialised and started: POSTPONED
[jamesodhunt] Ensure that a user job blocked on an event can be serialised, deserialised and started: POSTPONED
[jamesodhunt] Ensure that a system job blocked on another job can be serialised, deserialised and started: DONE
[jamesodhunt] Ensure that a user job blocked on another job can be serialised, deserialised and started: DONE
[jamesodhunt] Ensure that an event blocked on a system job can be serialised, deserialised and 'continue': DONE
[jamesodhunt] Ensure that an event blocked on a user job can be serialised, deserialised and 'continue': POSTPONED
[jamesodhunt] Ensure that a stopped system job in a chroot can be serialised, deserialized and started: POSTPONED
[jamesodhunt] Ensure that a stopped user job can be serialised, deserialised and started: POSTPONED
[jamesodhunt] Ensure that a task which dies after serialisation but before deserialisation is handled: POSTPONED
[jamesodhunt] Ensure that a respawn service which dies after serialisation but before deserialisation is handled: POSTPONED
[jamesodhunt] Ensure that a respawn service which forks once after serialisation but before deserialisation is handled: POSTPONED
[jamesodhunt] Ensure that a respawn service which forks twice after serialisation but before deserialisation is handled: POSTPONED
[jamesodhunt] D-Bus BLOCKED_EMIT_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_JOB_START_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_JOB_STOP_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_JOB_RESTART_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_INSTANCE_START_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_INSTANCE_STOP_METHOD tests: POSTPONED
[jamesodhunt] D-Bus BLOCKED_INSTANCE_RESTART_METHOD tests: POSTPONED
