Comment 5 for bug 1878973

Christian Ehrhardt  (paelzer) wrote :

Hmm, I spawned a clean focal as well, installed and started qemu-guest-agent, but it works as expected:

ubuntu@focal:~$ systemctl status qemu-guest-agent
● qemu-guest-agent.service - QEMU Guest Agent
     Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; static; vendor preset: enabled)
     Active: active (running) since Mon 2020-05-18 09:15:05 UTC; 2s ago
   Main PID: 36945 (qemu-ga)
      Tasks: 1 (limit: 533)
     Memory: 824.0K
     CGroup: /system.slice/qemu-guest-agent.service
             └─36945 /usr/sbin/qemu-ga
May 18 09:15:05 focal systemd[1]: Started QEMU Guest Agent.

The crashing process was not started with any special arguments, from your dump:
  ProcCmdline: /usr/sbin/qemu-ga

This hits an assertion in the code of the guest agent. From the crash, when loaded into gdb:

(gdb) frame 4
#4 0x0000556edcfb3451 in send_response (s=0x556ede07b940, s=0x556ede07b940, rsp=0x0) at ./qga/main.c:532
532 g_assert(rsp && s->channel);
(gdb) p rsp
$1 = (const QDict *) 0x0
(gdb) p s->channel
$2 = (GAChannel *) 0x556ede07be90

This rsp, which is 0x0, is created in process_event:
    rsp = qmp_dispatch(&ga_commands, obj, false);

It is passed unchecked to send_response, which then fails on the assert.

On dispatch it runs:
    ret = do_qmp_dispatch(cmds, request, allow_oob, &err);
    if (err) {
        rsp = qmp_error_response(err);
    } else if (ret) {
        rsp = qdict_new();
        qdict_put_obj(rsp, "return", ret);
    } else {
        /* Can only happen for commands with QCO_NO_SUCCESS_RESP */
        rsp = NULL;
    }

Since we get NULL back from here, we have likely hit the third case, and the caller forgot to check for it.

The debug info at this stage isn't enough to see why it failed.
(gdb) p *obj
$6 = {base = {type = QTYPE_QDICT, refcnt = 1}}
(gdb) p ga_commands
$4 = {tqh_first = 0x556ede077d30, tqh_circ = {tql_next = 0x556ede077d30, tql_prev = 0x556ede0785c8}}

So we have two issues here:
1. When processing a command fails, qemu-ga crashes hard.
   Instead, if rsp == NULL, the function process_event should go into its error path, I'd think.
2. Processing of the command failed.
   To go further with your case, as well as with a patch for #1, we'd need to understand
   what exactly triggers this (in order to reproduce it and submit a patch).
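
For issue 1, the kind of guard I have in mind can be sketched roughly as below. This is a stand-alone illustration only, not the real qga code: Response, CmdKind, mock_dispatch, mock_send_response and process_event_guarded are hypothetical stand-ins for QDict, the command under dispatch, qmp_dispatch, send_response and process_event. Whether a NULL rsp should be treated as an error or simply skipped is an open question, since the comment in the dispatch code suggests some commands legitimately produce no response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QDict; not the real QEMU type. */
typedef struct {
    const char *body;
} Response;

/* Hypothetical: which of the three dispatch branches a command ends up in. */
typedef enum {
    CMD_OK,              /* ret != NULL: a "return" response is built */
    CMD_ERROR,           /* err != NULL: an error response is built   */
    CMD_NO_SUCCESS_RESP  /* neither: dispatch hands back NULL         */
} CmdKind;

static Response ok_rsp  = { "{\"return\": {}}" };
static Response err_rsp = { "{\"error\": {}}" };

/* Mimics the three branches of the dispatch code quoted above. */
static Response *mock_dispatch(CmdKind kind)
{
    switch (kind) {
    case CMD_ERROR:
        return &err_rsp;
    case CMD_OK:
        return &ok_rsp;
    default:
        return NULL;
    }
}

/* Like send_response in the backtrace: asserts on a NULL response. */
static void mock_send_response(const Response *rsp)
{
    assert(rsp != NULL);
    printf("%s\n", rsp->body);
}

/*
 * Guarded caller: checks rsp before handing it on.
 * Returns 0 if a response was sent, -1 if there was nothing to send.
 */
static int process_event_guarded(CmdKind kind)
{
    Response *rsp = mock_dispatch(kind);

    if (rsp == NULL) {
        /* Assumption: report and bail out instead of asserting. */
        fprintf(stderr, "no response to send, skipping\n");
        return -1;
    }
    mock_send_response(rsp);
    return 0;
}
```

With the check in place, the third dispatch case ends in a return code rather than an abort, while the other two cases still send their response as before.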

You said focal/mate on install. Is this reproducible? If so, might it line up with one of the jobs qemu-ga handles, like resolution changes or similar? Can you start and run qemu-ga later without issues? If so, can you identify what then triggers a new crash?