snowballs don't come back after hard reset

Bug #891078 reported by Dave Pigott
This bug affects 1 person
Affects: LAVA Validation Lab
Status: Fix Released
Importance: Medium
Assigned to: Paul Larson

Bug Description

When the snowballs are rebooted with a hard reset, they seem to lose the "cu" connection and it never comes back. To restore the connection you have to stop the conmux daemon for that board and then start it again. For the time being I have taken them offline to triage the problem and figure out a fix.

Dave Pigott (dpigott) wrote :

After further investigation, it looks to me as though conmux may have changed its behaviour. Previously, we used to get the message <<<PAYLOAD LOST ... retrying in 30 secs>>> (the delay could vary) and it would eventually retry and connect. My feeling at this stage is that the retry isn't happening.

If I just run 'sg dialout "cu -l /dev/USBsnowball01 -s 115200"' after stopping the conmux daemon for that board, we just get a disconnect after power is removed. If I reconnect power and then restart the dial out, it connects properly.
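The manual test above (reconnect power, then restart the dial-out) could be scripted roughly as follows. This is only a sketch: `wait_for_path` is a hypothetical helper, and it assumes the /dev/USBsnowball01 node disappears while the board is unpowered and reappears once power is back, which this report does not confirm.

```shell
#!/bin/sh
# Hypothetical helper, not part of conmux or cu.
# wait_for_path PATH TIMEOUT_SECS
# Polls once per second until PATH exists.
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECS elapse first.
wait_for_path() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

It might be used as `wait_for_path /dev/USBsnowball01 60 && sg dialout "cu -l /dev/USBsnowball01 -s 115200"`, where the 60 second timeout is an arbitrary choice.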

Paul Larson (pwlars) wrote :

ChiThu, could you help look into this please since you have a snowball?

Changed in lava-lab:
assignee: nobody → Le Chi Thu (le-chi-thu)

Dave Pigott (dpigott) wrote :

ChiThu, if you could try to reproduce with your snowball it would tell us if it's just a local problem, or something oneiric related. This all worked until we upgraded the server last week.

Le Chi Thu (le-chi-thu) wrote : Re: [Bug 891078] Re: snowballs don't come back after hard reset

Hi

It is not a snowball issue. See my other email about fixing conmux for
11.10.

I found a solution to the problem of conmux not retrying to connect to the
serial port when cu exits (when the serial port is gone). It applies to all
boards with serial over USB, and to 11.10 only.

For Ubuntu 11.10, libio-multiplex-perl is 1.13-1.
For Ubuntu natty, libio-multiplex-perl is 1.10-1.

It is one file, /usr/share/perl5/IO/Multiplex.pm, which is used by conmux to
register callbacks. In 11.10 the callback does not occur.

I replaced Multiplex.pm with the copy (embedded in this email) from
libio-multiplex-perl version 1.10-1 and conmux works again.

BR
/Chi Thu
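To check which case a given server falls into, the installed package version can be compared against the two versions named above. A minimal sketch: `classify_multiplex_version` is a hypothetical helper, and it only knows the two versions reported in this thread; anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: classify a libio-multiplex-perl version string
# using only the two data points from this bug report.
classify_multiplex_version() {
    case "$1" in
        1.13-1) echo "reported-broken" ;;   # Ubuntu 11.10: callback does not occur
        1.10-1) echo "reported-working" ;;  # Ubuntu natty: conmux retries as before
        *)      echo "unknown" ;;           # not covered by this report
    esac
}
```

On a Debian or Ubuntu host the installed version could be fed in with `classify_multiplex_version "$(dpkg-query -W -f='${Version}' libio-multiplex-perl)"`.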


Paul Larson (pwlars) wrote :

No, it's a USB-attached board issue, and snowball is the only board we have at the moment that requires a USB connection. :)
Did you see my email though? I've worked around this for now by just calling cu in a loop from conmux. Unless you see a downside to this, the upside is that it also fixes the issue we have with not being able to catch the first several seconds of boot, and having to work around that for snowball boards in lava-dispatcher.

However, for completeness, if this is a bug in libio-multiplex-perl, you should consider filing a bug against that package, describing how it is a general problem and not just a problem for our case.

Le Chi Thu (le-chi-thu) wrote :

Paul found a workaround. Here is his email.

Paul Larson <email address hidden>, 22 Nov, to me and Validation:

I commented on this in the bug also, but all I did was put cu in a loop. This actually fixed two issues:
1. exiting and not respawning when we hardreset the board
2. not seeing the first several seconds of boot, and having to handle snowball as a special case by doing a messy hardreset/wait/softreset loop. (this problem was one we had for a while).

As mentioned, it's a bit of a hack, but it seems to work pretty nicely. Reverting to an older version of the library would fix problem #1, but not #2.

Thanks,
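The "cu in a loop" idea described in this email could look something like the sketch below. It is not the actual change made to the conmux config, which is not shown in this report; `respawn` and its arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper, not the real conmux config change.
# respawn MAX_RUNS DELAY_SECS CMD [ARGS...]
# Runs CMD again every time it exits, pausing DELAY_SECS between runs.
# MAX_RUNS=0 means loop forever; a positive value stops after that many runs.
respawn() {
    max_runs=$1
    delay=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "console command exited with status $status, restarting in ${delay}s" >&2
        sleep "$delay"
    done
}
```

For a board console this might be invoked as `respawn 0 1 sg dialout "cu -l /dev/USBsnowball01 -s 115200"`, reusing the cu command quoted earlier in this bug; the one second delay is an arbitrary choice. Because cu is restarted as soon as it exits, the console is reattached without restarting the conmux daemon by hand.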

Changed in lava-lab:
status: New → Fix Released
status: Fix Released → Confirmed

Paul Larson (pwlars) wrote :

Thought this got marked fix-released with the change I did in the conmux config - please reopen and comment if you disagree

Changed in lava-lab:
assignee: Le Chi Thu (le-chi-thu) → Paul Larson (pwlars)
importance: Undecided → Medium
status: Confirmed → Fix Released