When fsck fails, the machine reboots and does the check again

Bug #204097 reported by Mary Gardiner
This bug report is a duplicate of: Bug #209416: fsck not repairing corruption on boot.
Affects           Status  Importance  Assigned to  Milestone
upstart (Ubuntu)  New     Undecided   Unassigned

Bug Description

Binary package hint: upstart

Ubuntu Hardy, relevant packages:

usplash 0.5.16
e2fsprogs 1.40.3-1
upstart 0.3.9-1

I have recently had some filesystem corruption due to a failing hard drive. This was originally discovered by one of the routine fscks performed (according to tune2fs) every 22 mounts on this particular partition. It seems that when fsck fails in a way that requires a manual check, the boot sequence stops for a moment, with usplash's status area going black, and then the machine reboots in the normal fashion and tries the automatic fsck again. I observed this happen twice (after that I realised what was happening, booted into the LiveCD and started copying all the data that I could).
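For anyone wanting to see how close a partition is to its next routine check, the counters mentioned above appear in the output of "tune2fs -l [device name]" as the "Mount count" and "Maximum mount count" fields. Below is a minimal Python sketch that parses that text. The function names and the sample text are made up for illustration, and the "count reached the maximum" rule is my understanding of how e2fsck decides, not something confirmed in this report.

```python
def mount_counts(tune2fs_output):
    """Return (mount_count, max_mount_count) parsed from `tune2fs -l` text."""
    fields = {}
    for line in tune2fs_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Mount count"]), int(fields["Maximum mount count"])


def check_due(mount_count, max_mount_count):
    """Assumed rule: a check is due once the count reaches the maximum.

    A maximum of 0 or -1 is treated as "mount-count checking disabled".
    """
    return max_mount_count > 0 and mount_count >= max_mount_count


# Illustrative excerpt only, not output captured from the reporter's machine.
SAMPLE = """\
Filesystem state:         clean
Mount count:              22
Maximum mount count:      22
"""

if __name__ == "__main__":
    count, maximum = mount_counts(SAMPLE)
    print(count, maximum, check_due(count, maximum))
```

On a real system the text would come from running tune2fs as root against the partition in question.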

While a manual fsck is not going to be an easy thing to wrap up prettily or walk a user through, it seems that usplash or upstart will keep retrying the automatic fsck over and over again, giving no error message other than the repeated unexplained reboots and checks of the drive.
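As background on what a boot script has to work with: the fsck(8) man page documents the exit status as a sum of condition bits, where 4 means errors were left uncorrected and 2 means the system should be rebooted. The sketch below decodes such a status in Python. It is only an illustration with hypothetical function names; I am not claiming it reflects what the Hardy boot scripts actually do.

```python
# Exit status bits as documented in the fsck(8) man page.
FSCK_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    return [text for bit, text in sorted(FSCK_FLAGS.items()) if status & bit]


def needs_manual_check(status):
    """True when fsck reported errors it did not fix (bit 4 set)."""
    return bool(status & 4)


if __name__ == "__main__":
    for status in (0, 1, 3, 4):
        print(status, decode_fsck_status(status), needs_manual_check(status))
```

A status with bit 4 set is the case described in this report, where rebooting and retrying the same automatic check cannot be expected to help.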

I do not think that this is a duplicate of bug 83831, at least on the face of it, as that bug seems to be about filesystem checks that pass, not ones that fail.

Andrew (keen101) wrote :

I would like to confirm.

It's a bit hard to understand from the explanation above, but I think I have much the same problem.

After installing the latest updates on my Hardy Heron today (Mar 22, 2008), I rebooted. On boot-up the normal splash screen appeared, and soon I noticed that underneath, in small orange letters, it said it was doing a routine disk check (because the disk had been mounted 22 times). That was OK, since I've seen it do a disk check with no problem before. However, at around 70% (maybe less), it stops and appears to hang. The system then proceeds to reboot. If left unattended, the system will repeat this procedure and reboot indefinitely. The only way to avoid it is to press ESC when it starts doing the disk check. That way it never hangs, and continues to boot normally.

Mary Gardiner (puzzlement) wrote : Re: [Bug 204097] Re: When fsck fails, the machine reboots and does the check again

Andrew's bug does have the same symptoms as mine, except for one unknown: we don't know if he had filesystem errors.

Andrew, do you know if you have filesystem errors? You can get the fsck to complete by booting into recovery mode (root shell), finding out which partition has Linux on it if you don't know ("fdisk -l" lists partitions) and running "fsck [device name]", e.g. "fsck /dev/sda1" or "fsck /dev/sda3" or whatever. (Incidentally, if you repair errors and new errors show up on subsequent boots, this can be a sign of a failing hard drive.) You can also run fsck from the Live (install) CD if you boot into it. You cannot run it during normal operation, as the filesystem must be read-only or unmounted.
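To illustrate the "read-only or unmounted" condition, here is a minimal Python sketch that looks a device up in text laid out like /proc/mounts (device, mount point, filesystem type, options) and reports whether it is mounted read-write. The function names and the sample text are hypothetical. It also assumes the device appears under the same name you pass in, which is not always true for the root filesystem.

```python
def mount_state(device, mounts_text):
    """Return 'unmounted', 'ro' or 'rw' for device, given /proc/mounts-style text."""
    state = "unmounted"
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == device:
            options = parts[3].split(",")
            # If the device is listed more than once, the last entry wins.
            state = "rw" if "rw" in options else "ro"
    return state


def safe_to_fsck(device, mounts_text):
    """fsck should only be run when the device is not mounted read-write."""
    return mount_state(device, mounts_text) != "rw"


# Illustrative sample only, not taken from any machine in this report.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sda3 /home ext3 ro 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for dev in ("/dev/sda1", "/dev/sda3", "/dev/sdb1"):
        print(dev, mount_state(dev, SAMPLE_MOUNTS), safe_to_fsck(dev, SAMPLE_MOUNTS))
```

On a running Linux system the text could be read from the file /proc/mounts instead of the sample string.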

Andrew (keen101) wrote :

Well, I did what you suggested. I booted into recovery mode, and ran fsck /dev/sda1.

It found a file system with errors. After fixing them, my system boots fine. Thanks for the help. I've never actually done a manual fsck check before.

I do not think I have a failing hard disk though. I think Hardy Heron just never installed right, because I was having a few problems to begin with. One of which was an error in wine about an lspci sound error or something. The fsck check fixed something called that, so maybe wine will run OK now. I had just chalked all that stuff up to bugs, since Hardy was still in alpha. I will keep an eye out for future filesystem errors nonetheless.

-Andrew Barney
(keen101)

Andrew (keen101) wrote :

In hindsight though, I have been using the "Ext2Fsd-0.37" program on Windows XP occasionally.

Who knows if I caused the errors accidentally on my system.

Oh, and P.S.: when the fsck froze and rebooted, it seemed to freeze at 30%, not 60%. Don't know if that helps though.

Jim Hutchinson (jphutch) wrote :

Yep. Same here. My system had errors and then got into an endless reboot pattern - or I suspect would have been endless if I hadn't stopped it. I booted the live CD and ran fsck. Attached is the output of the check showing the errors. To be honest, I find it hard to believe that so many errors were the result of a bad drive. I won't discount that it could be bad but as it's less than a year old and not heavily used it seems suspect. Also, why are the errors coming so soon after install? Coincidence perhaps but maybe worth looking into.

Mary Gardiner (puzzlement) wrote :

Maybe I've confused people with the bad hard drive thing. ONE fsck with errors is not necessarily the result of a bad hard drive. It could be a bad shutdown (although journalled filesystems like ext3 are less susceptible) or other things. It's if you need to REPEATEDLY manually fsck a hard drive (i.e. you have to do the recovery/LiveCD thing a bunch of times) within a short period of time that a bad hard drive should be suspected as a cause. *I* noticed this bug on what happened to be a bad hard drive; so far the rest of you have only had filesystem errors.

Andrew (keen101) wrote :

I have been having to power off my machine lately in Gutsy, because it freezes and does not respond even to "CTRL + ALT + BACKSPACE". So, lately I've just had to power off. That too may have caused my filesystem error.

I should probably report that as a bug too. But, I have no idea what to report it as.

Mary Gardiner (puzzlement) wrote :

Re the possible other bug: have a look at the file /var/log/syslog for entries from just before the crash and see if that gives you any clues.

Theodore Ts'o (tytso) wrote :

See my comment on bug 209416.

Greg Yates (gyates) wrote :

Thank you Mary Gardiner, your description of using fdisk and fsck was short and excellent. I have been working for three days trying to fix my broken Ubuntu partition. WinXP hiccuped and required a boot disk scan (I suspect from my son not removing his camera properly) and I think that it may have messed some things up. I wish more people were as good at explaining things as you were here.
