Ubuntu

Overcome ptrace limitations in Upstart

Registered by James Hunt on 2012-05-02

= Summary =

Upstart currently uses ptrace(2) for job PID tracking. This generally works well, but there are a few scenarios where it does not:

(1) Cannot track jobs which themselves call ptrace(2).

Workaround: don't use such applications ;-)

(2) Cannot track applications that are being run under gdb.

This is a specific example of (1).

Workaround: run "gdb -p <pid>" in post-start.

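(3) Upstart tracks the wrong PID if "expect X" is used with a "script" stanza, since it then counts the forks made by the shell running the script rather than those of the daemon.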

Workaround: RTFM, and move the script logic into "pre-start" so that the daemon can be run via the "exec" stanza.

(4) Cannot handle jobs that fork the expected number of times but then perform further work before they are fully initialised.

    (This is not strictly an issue with ptrace - it's the "service
    readiness" problem - see
    https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-upstart-service-read).

Workaround: nothing generic - requires specialist knowledge of how the application works.

(5) Upstart does not wait for the parent process to exit before following the child.

Workaround: some form of ugly post-start polling.

(6) The "expect" stanza requires users to determine how many times a service forks. This requires one of:

- a published explanation of how the service starts;
- the user running the daemon via 'strace -f' to count the forks
  (see http://upstart.ubuntu.com/cookbook/#how-to-establish-fork-count);
- the user guessing the value and getting it wrong
  (see http://upstart.ubuntu.com/cookbook/#implications-of-misspecifying-expect).

Workarounds:
- Don't guess! :)
- Nothing generic - requires specialist knowledge of how the application works.

However, even with the magic fork number, incorrect behaviour can result if:

- the daemon is run incorrectly (with a '--foreground' / '--no-daemon' flag or similar)
- the daemon is configured not to fork (for example via /etc/default/foo).

Also, a job can end up "stuck" in a state that cannot be cleared without either a reboot or gross scripts that cycle through the entire PID space.
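The 'strace -f' route described under (6) can be partly mechanised. What follows is a minimal sketch in Python, not an existing tool. It assumes the trace was captured in strace's default format with something like `strace -f -o trace.log <daemon>`, so that every line starts with the PID of the calling process. It also assumes the daemon's final PID is already known, for example from its PID file. `parse_forks`, `forks_to_reach` and `expect_stanza` are hypothetical names.

The sketch ignores clone calls carrying `CLONE_THREAD`, since those create threads rather than processes. It maps one fork onto "expect fork" and two onto "expect daemon", which are the only fork counts the current ptrace-based stanzas can express.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Per-process prefix written by "strace -f": "1234  " when the trace goes to a
# file via "-o", or "[pid 1234] " when it goes to stderr.
_PREFIX = re.compile(r"^(?:\[pid\s+(\d+)\]\s+|(\d+)\s+)?")
# Second half of a call that strace split across two lines.
_RESUMED = re.compile(r"^<\.\.\. \w+ resumed>(.*)$")
# A completed fork-family call that returned the ID of the new task.
_FORK = re.compile(r"^(?:fork|vfork|clone3?)\((.*)\)\s+=\s+(\d+)$")
_UNFINISHED = " <unfinished ...>"


def parse_forks(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (parent_pid, child_pid) pairs, in trace order, for every
    successful fork-family call that created a process rather than a thread."""
    pending: Dict[int, str] = {}  # pid -> first half of an interrupted call
    forks: List[Tuple[int, int]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        prefix = _PREFIX.match(line)  # always matches: the prefix is optional
        pid = int(prefix.group(1) or prefix.group(2) or 0)
        body = line[prefix.end():]
        if body.endswith(_UNFINISHED):
            pending[pid] = body[: -len(_UNFINISHED)]
            continue
        resumed = _RESUMED.match(body)
        if resumed:
            # Glue the two halves back together into one complete call.
            body = pending.pop(pid, "") + resumed.group(1)
        call = _FORK.match(body)
        if call and "CLONE_THREAD" not in call.group(1):
            forks.append((pid, int(call.group(2))))
    return forks


def forks_to_reach(forks: List[Tuple[int, int]], start_pid: int,
                   final_pid: int) -> Optional[int]:
    """Count the forks on the ancestry chain from start_pid to final_pid,
    or return None if final_pid does not descend from start_pid."""
    parent = {child: par for par, child in forks}
    depth, pid = 0, final_pid
    while pid != start_pid:
        # Unrelated PID, or a loop caused by PID reuse within the trace.
        if pid not in parent or depth > len(forks):
            return None
        pid = parent[pid]
        depth += 1
    return depth


def expect_stanza(fork_count: Optional[int]) -> Optional[str]:
    """Map a fork count onto the ptrace-based stanzas; None means the count
    cannot be expressed (more than two forks, or no chain was found)."""
    return {0: "(no expect stanza)", 1: "expect fork", 2: "expect daemon"}.get(fork_count)
```

Counting only the forks on the chain that leads to the daemon's real PID avoids miscounting when the daemon later spawns workers. It does not help with the caveats listed above: a daemon started with '--foreground', for example, will honestly report zero forks.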

= Discussion =

Discuss the staged plan documented in [1] whereby we:

(A) Track exits, not forks (or should that be track exits *and* forks?)

(B) Track all PIDs a job process has been known by, not just the most recent one.
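The bookkeeping that (A) and (B) imply can be sketched in Python as follows. This is a minimal illustration, not Upstart's internals, and the class and method names are hypothetical. It assumes fork and exit notifications arrive from some tracing mechanism (ptrace or cgroups, not shown).

The sketch uses one possible policy for "following" children: the main PID is the oldest job PID that is still alive. Under that rule a child is only followed once its parent has exited, which is the behaviour (5) asks for.

```python
from typing import Dict, List, Optional


class JobPidHistory:
    """Hypothetical per-job record sketching plan (A) + (B): remember every
    PID the job process has been known by, and track exits as well as forks."""

    def __init__(self, initial_pid: int) -> None:
        self.history: List[int] = [initial_pid]  # (B) all PIDs, oldest first
        self.exited: Dict[int, bool] = {initial_pid: False}

    def belongs_to_job(self, pid: int) -> bool:
        # (B) a PID belongs to the job if it was *ever* part of it.
        return pid in self.exited

    def on_fork(self, parent: int, child: int) -> None:
        # Only children of live job processes join the job. PID reuse inside
        # a single job is ignored in this sketch.
        if self.exited.get(parent) is False and child not in self.exited:
            self.history.append(child)
            self.exited[child] = False

    def on_exit(self, pid: int) -> None:
        # (A) exits of any known PID are recorded, not just the newest one,
        # so an intermediate process dying early cannot leave the job stuck.
        if pid in self.exited:
            self.exited[pid] = True

    def main_pid(self) -> Optional[int]:
        # Follow a child only once every older PID has exited: the main PID is
        # the oldest PID still alive, or None once the whole job has exited.
        for pid in self.history:
            if not self.exited[pid]:
                return pid
        return None
```

Consider an "expect daemon"-style startup: the parent forks and exits, then its child forks and exits. With this policy, `main_pid()` moves from the initial PID to the grandchild only as each parent exits. Workers forked later by the daemon are younger than it, so they do not displace it.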

= Advantages =

- Providing Upstart with the full PID history of a job should make it possible to avoid the most important problems, namely (3) and (5).
(Of the rest, (1) is very unusual, (2) is easily resolved, (4) is being dealt with in another blueprint [2], and (6) is covered below.)

= Disadvantages =

- Using cgroups ties Upstart more closely to Linux.

= Observations =

- Upstart's internal job lookup would be by cgroup rather than by PID, although ptrace tracking would be retained for flexibility.
- Once implemented, we should also consider enhancing the 'expect' stanza to allow 'expect fork <n>' and 'expect exit <n>' for flexibility.

- To address (6), we could conceivably provide a utility (strace/ptrace/cgroups/init run as user) that runs the specified daemon application and prints out data in the following format:

<count> <pid> <name> [fork|exit]

It would then be up to the user to check the daemon's log file (or similar) to determine the actual PID of the daemon.
Given that PID, the output should make it possible to determine the correct "expect" stanza.
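The output side of such a utility might look like the Python sketch below. It rests on several assumptions. Events are taken to have been collected already, by whichever tracing mechanism is chosen, as (kind, pid, name) tuples, where a fork event carries the new child's PID. `<count>` is read as the running number of events of that kind. The user supplies the daemon's real PID from its log.

`format_events` and `suggest_expect` are hypothetical names. `expect fork <n>` is the stanza proposed above, not one Upstart currently accepts.

```python
from typing import Iterable, List, Optional, Tuple

# (kind, pid, name): kind is "fork" (pid = the newly created child) or
# "exit" (pid = the process that exited). This event shape is an assumption.
Event = Tuple[str, int, str]


def format_events(events: Iterable[Event]) -> List[str]:
    """Render events as '<count> <pid> <name> <kind>', where <count> is the
    running total of events of that kind seen so far (assumed meaning)."""
    counts = {"fork": 0, "exit": 0}
    lines = []
    for kind, pid, name in events:
        counts[kind] += 1
        lines.append(f"{counts[kind]} {pid} {name} {kind}")
    return lines


def suggest_expect(lines: Iterable[str], daemon_pid: int) -> Optional[str]:
    """Given the daemon's real PID (e.g. from its log file), return the
    proposed 'expect fork <n>' stanza matching the fork that created it."""
    for line in lines:
        count, pid, rest = line.split(" ", 2)
        _name, kind = rest.rsplit(" ", 1)  # process names may contain spaces
        if kind == "fork" and int(pid) == daemon_pid:
            return f"expect fork {count}"
    return None
```

Keeping separate counters for forks and exits means each output line maps directly onto either of the proposed 'expect fork <n>' or 'expect exit <n>' forms.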

= Thoughts =

- A partial solution to (6)? One of the things "expect" does is tell Upstart *when* it can emit the "started" event (after 'x' forks).

Introducing tracking of all PIDs and cgroups does not help in this respect: although we could potentially drop the use of ptrace, users would still need to specify "expect".

There is a possible optimisation for jobs that specify an explicit "ready on" condition ([2])
that would allow "expect" to be determined automatically, making it redundant.

Assuming cgroups support, when the "ready on" condition of a job becomes true, Upstart would read the "tasks" file of the job's cgroup:

  - If a single PID is found, that is the master (RELIABLE).
  - If multiple PIDs are found, the master is the oldest (by time or ancestry).
    This should cover 99% of daemon behaviour, but is not fully reliable.
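The selection rule above can be sketched in Python. The sketch assumes the job's cgroup PID list has already been located; its path is a parameter here, not something Upstart defines. It reads each PID's parent and start time from /proc/<pid>/stat.

One caveat is worth checking. With cgroup v1, the `tasks` file lists thread IDs, whereas `cgroup.procs` lists one ID per process, so a multi-threaded daemon would show several entries in `tasks`. `read_procs` and `pick_master` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# pid -> (ppid, start_time); start_time is in clock ticks since boot.
ProcInfo = Dict[int, Tuple[int, int]]


def read_procs(pid_list_path: str) -> ProcInfo:
    """Read a cgroup PID list (one PID per line) and look up each PID's parent
    and start time in /proc/<pid>/stat (fields 4 and 22)."""
    procs: ProcInfo = {}
    with open(pid_list_path) as pids:
        for line in pids:
            if not line.strip():
                continue
            pid = int(line)
            try:
                with open(f"/proc/{pid}/stat") as stat:
                    # Split after the last ')' so a comm containing spaces or
                    # ')' cannot shift the fields; fields[0] is field 3 (state).
                    fields = stat.read().rsplit(")", 1)[1].split()
            except OSError:
                continue  # the process exited after the list was read
            procs[pid] = (int(fields[1]), int(fields[19]))
    return procs


def pick_master(procs: ProcInfo) -> Optional[int]:
    """A lone PID is the master; otherwise prefer PIDs whose parent is outside
    the cgroup (oldest by ancestry), breaking ties by earliest start time
    (oldest by time)."""
    if not procs:
        return None
    if len(procs) == 1:
        return next(iter(procs))
    roots = [pid for pid, (ppid, _) in procs.items() if ppid not in procs]
    candidates = roots or list(procs)  # no roots only if ancestry loops
    return min(candidates, key=lambda pid: (procs[pid][1], pid))
```

Preferring ancestry over raw start time means a daemon's workers are never chosen, since they are younger and their parent is inside the cgroup. Start time only decides between processes that have been re-parented outside the cgroup, which is where the "not fully reliable" caveat applies.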

= References =

Related Bugs:

- https://bugs.launchpad.net/upstart/+bug/406397
"init: job stuck with expect fork/daemon when parent reaps child"

- https://bugs.launchpad.net/upstart/+bug/530779
"init: does not wait for parent to exit when following forks"

_____
[1] - https://lists.ubuntu.com/archives/upstart-devel/2011-August/001707.html
[2] - https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-upstart-service-readiness