invoking dhclient3 with -1 causes issue if no dhcp server available

Bug #974284 reported by Scott Moser
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
isc-dhcp (Ubuntu) | Fix Released | High | Stéphane Graber |
Precise | Fix Released | High | Stéphane Graber |
Quantal | Fix Released | High | Stéphane Graber |

Bug Description

[rationale]
A patch was designed to fix this bug back in precise, but because of where it was placed in debian/patches/00list it was actually reverted at build time and so never fixed the bug.

In quantal, the reverting code in debian/rules is gone, so the patch has been applied ever since the 4.2 merge, so far without any reported problems.

[test case]
 1) Start dhclient -1 <interface> on a working network
 2) Unplug the network cable or stop the DHCP server
 3) Wait for the lease to expire
 4) Check that dhclient tries to get a new lease and when failing, keeps trying

The original behaviour of -1 would make step 4) try just once and then give up, causing dhclient to remove all addresses and exit on any machine that was unable to reach its DHCP server for at least the lease expiry time.

[regression potential]
This change definitely causes a slight change in behaviour, though based on this bug report and others, it's believed to be the behaviour most of our users want from -1.
The change itself has been applied to quantal without any regression and was tested on 12.04 in the past (before I messed up the ordering in the final upload ...).
The code change itself just makes "-1" use the same renewal behaviour as when called without "-1" (but still follows the standard "-1" behaviour for the first request).
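
The intended difference can be modelled with a small simulation. This is only a sketch of the behaviour described above, not the dhclient source: the function name `run_client`, the `patched` flag and the list-of-booleans input are all invented for illustration. Each element of `attempts` says whether one attempt to obtain or renew a lease succeeded.

```python
def run_client(attempts, onetry, patched):
    """Simulate a DHCP client over a sequence of attempt outcomes.

    attempts: iterable of bools, True if that attempt got or renewed a lease.
    onetry:   True models being started with -1.
    patched:  True models the behaviour described in this bug's fix.

    Returns "exited" if the client gives up, or "running" if it is still
    retrying once every outcome in `attempts` has been consumed.
    """
    for ok in attempts:
        if ok:
            if patched:
                # After the first successful lease, -1 no longer applies
                # and the client behaves as in normal mode.
                onetry = False
            continue
        if onetry:
            # A failure while -1 is still in effect makes the client exit.
            return "exited"
        # Normal mode: a failure just leads to another attempt.
    return "running"


# Lease obtained once, then the DHCP server disappears.
print(run_client([True, False, False], onetry=True, patched=False))  # exited
print(run_client([True, False, False], onetry=True, patched=True))   # running
```

In both variants a failure on the very first attempt still ends the run, which matches the "standard -1 behaviour for the first request" mentioned above.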

In bug 838968, we modified ifupdown to invoke dhclient3 with '-1' as a parameter [1], and subsequently changed the default timeout of dhclient in isc-dhcp3 from 60 seconds to 300 seconds [2].

The reason for this is that we now have a reliable "static-network-up" event that upstart jobs can start on once static networking is up. Here, static means any networking with an entry in /etc/network/interfaces.

That event is used by cloud-init and other things that depend on network.

The fallout of this is that if for some reason a server (or cloud instance, or anything really) boots and does not obtain a DHCP address within 5 minutes, it will give up forever. The previous behavior was that it would keep trying forever.

This scenario isn't terribly unrealistic. A power failure could take out a DHCP server and force it to fsck on reboot, so that a client machine comes up 5 minutes or more before the DHCP server is back.
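
The boot-time race can be written down as a toy calculation. This is a sketch under simplifying assumptions: it ignores retry intervals and backoff, and the function name `boot_outcome` and its return strings are made up for illustration. Only the 300 second timeout and the give-up-versus-retry distinction come from the description above.

```python
def boot_outcome(server_ready_at, timeout=300, onetry=True):
    """Model a client that starts asking for a lease at t = 0 seconds.

    server_ready_at: seconds after client start at which the DHCP server
                     begins answering.
    timeout:         seconds the client waits before treating the attempt
                     as failed.
    onetry:          True models being started with -1.
    """
    if server_ready_at <= timeout:
        return "lease within timeout"
    if onetry:
        # With -1 the client exits and nothing retries on its behalf.
        return "gave up"
    # Without -1 the client keeps retrying and eventually gets a lease.
    return "lease after retrying"


# DHCP server comes back 3 seconds after the client's timeout.
print(boot_outcome(303, onetry=True))   # gave up
print(boot_outcome(303, onetry=False))  # lease after retrying
```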

Issue was originally raised in #openstack-dev by rmk around 2012-04-05T06:42:19 [3]

--
[1] http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/precise/ifupdown/precise/revision/56
[2] http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/precise/isc-dhcp/precise/revision/32
[3] http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2012-04-05.log

Related bugs:
  * bug 838968: static-network-up event does not wait for interfaces to have an address

Scott Moser (smoser)
description: updated
description: updated
Rafi Khardalian (rkhardalian) wrote :

The problem is actually more prevalent than just boot time. If the DHCP server goes away at any point and the lease expires, dhclient will time out and exit without ever retrying. The result is that the system in question basically falls off the network, and the only way to recover is manual intervention. To be clear, this means recovery requires physical access or remote management capabilities.

It's straightforward to reproduce the problem. Configure an Ubuntu system as a DHCP client and set the lease time on the server to something short, like 60 seconds. After the system has successfully booted and grabbed an IP, shut down the DHCP server. Within 2 minutes, the DHCP client will time out, lose its IP address and drop off the network with no chance of recovering automatically.

Steve Langasek (vorlon) wrote :

I think the upshot here is that neither dhclient -1 nor dhclient without -1 gives us the behavior we need for reliable handling via ifupdown; so we either need to modify the behavior of -1 to be more sensible regarding dropped leases *after* the initial lease, or we need to add a new dhclient option that does the right thing. Stéphane, do you agree?

Changed in ifupdown (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
affects: ifupdown (Ubuntu) → isc-dhcp (Ubuntu)
Changed in isc-dhcp (Ubuntu):
assignee: Stéphane Graber (stgraber) → nobody
importance: Undecided → High
assignee: nobody → Stéphane Graber (stgraber)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in isc-dhcp (Ubuntu):
status: New → Confirmed
Stéphane Graber (stgraber) wrote :

16:58 < smoser> ie, to safeguard for the power failure case, where 5 minutes wasn't enough, but 5 minutes and 3 seconds would have been.
16:59 < stgraber> so we want dhclient to try for $TIMEOUT, if it gets a lease during that time, background itself and get into a renewal/retry loop, if it doesn't, then ???
16:59 < stgraber> for ??? I tend to prefer failing as I really don't want to run the ifupdown hooks unless we have a working connection
17:00 < stgraber> but you clearly prefer continuing regardless that'll make all the upstart jobs trigger as well as all upstart hooks (even though you don't have network access)

So to make this happen we'd need to patch isc-dhcp-client to add a new parameter (to avoid regressing -1 or the default mode) that essentially only triggers the initial wait for lease part of -1 but not the exit on failure to renew part.

Scott Moser (smoser) wrote : Re: [Bug 974284] Re: invoking dhclient3 with -1 causes issue if no dhcp server available

On Thu, 5 Apr 2012, Stéphane Graber wrote:

> 16:58 < smoser> ie, to safeguard for the power failure case, where 5 minutes wasn't enough, but 5 minutes and 3 seconds would have been.
> 16:59 < stgraber> so we want dhclient to try for $TIMEOUT, if it gets a lease during that time, background itself and get into a renewal/retry loop, if it doesn't, then ???
> 16:59 < stgraber> for ??? I tend to prefer failing as I really don't want to run the ifupdown hooks unless we have a working connection
> 17:00 < stgraber> but you clearly prefer continuing regardless that'll make all the upstart jobs trigger as well as all upstart hooks (even though you don't have network access)

Well, I don't know that I prefer that. Sorry I didn't see that earlier.
Obviously, triggering jobs that expect network when network isn't up isn't
a great solution.

What about exiting failure after $TIMEOUT, but forking into retry loop?

Stéphane Graber (stgraber) wrote :

That sounds reasonable and is what I implemented for the isc-dhcp change currently in the queue.

isc-dhcp will "exit 2" if the first attempt to get a lease fails; if it works, it then follows the same code path as a regular dhclient call.
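
A caller could tell the two outcomes apart by looking at the exit status. The following is a hypothetical wrapper, not code from ifupdown or isc-dhcp: the function name `initial_lease_obtained` is invented, and it assumes that exit status 0 means the initial lease was obtained. Only the status 2 for a failed first attempt comes from the comment above.

```python
import subprocess
import sys

NO_INITIAL_LEASE = 2  # exit status mentioned in the comment above


def initial_lease_obtained(command):
    """Run `command` and report whether it obtained an initial lease.

    Returns True on exit status 0 (assumed to mean success), False on
    exit status 2, and raises RuntimeError for any other status.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        return True
    if result.returncode == NO_INITIAL_LEASE:
        return False
    raise RuntimeError(f"unexpected exit status {result.returncode}")


# Demo with a stand-in command that simply exits with status 2,
# so the sketch can be tried without touching any network interface.
demo = [sys.executable, "-c", "raise SystemExit(2)"]
print(initial_lease_obtained(demo))  # prints False
```

On a real system the command would be something like `["dhclient", "-1", "eth0"]`, where the interface name is a placeholder.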

Stéphane Graber (stgraber) wrote :

Sorry, thanks to Adam for making me read Scott's comment properly ;)

Forking into a retry loop before exit 2 sounds really bad as you'll still get network coming up at some random point without triggering any of the ifupdown/upstart hooks/jobs.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.1.ESV-R4-0ubuntu5

---------------
isc-dhcp (4.1.ESV-R4-0ubuntu5) precise; urgency=low

  * When dhclient is called with -1, exit on failure to get a lease only
    when getting the initial lease. Once backgrounded, behave exactly like
    in normal mode. (LP: #974284)
 -- Stephane Graber <email address hidden> Tue, 10 Apr 2012 14:19:23 +0200

Changed in isc-dhcp (Ubuntu):
status: Confirmed → Fix Released
Marc Deslauriers (mdeslaur) wrote :

The patch to correct this issue never actually made it to precise, as it got added at the end of the 00series file, which gets reverted during build. Reopening.

Changed in isc-dhcp (Ubuntu):
status: Fix Released → Confirmed
Changed in isc-dhcp (Ubuntu Precise):
status: New → Confirmed
Changed in isc-dhcp (Ubuntu Quantal):
status: Confirmed → Fix Released
Changed in isc-dhcp (Ubuntu Precise):
importance: Undecided → High
assignee: nobody → Stéphane Graber (stgraber)
Changed in isc-dhcp (Ubuntu Quantal):
assignee: Stéphane Graber (stgraber) → nobody
description: updated
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Scott, or anyone else affected,

Accepted isc-dhcp into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/isc-dhcp/4.1.ESV-R4-0ubuntu5.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in isc-dhcp (Ubuntu Precise):
status: Confirmed → Fix Committed
tags: added: verification-needed
Steve Langasek (vorlon) wrote :

Scott, can you confirm that this fixes the issue with dhclient -1 for you? We're waiting for SRU verification before publishing this change.

Stéphane Graber (stgraber) wrote :

The patch is being applied but the result is still wrong. dhclient dies after the lease expires instead of retrying indefinitely.

This also suggests that the fix in Ubuntu 12.10 doesn't actually work... I'll have to take another go at that fix and update the patch in 12.10 and 12.04.

tags: added: verification-failed
removed: verification-needed
Changed in isc-dhcp (Ubuntu Precise):
status: Fix Committed → Triaged
Changed in isc-dhcp (Ubuntu Quantal):
status: Fix Released → Triaged
assignee: nobody → Stéphane Graber (stgraber)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.2.4-1ubuntu8

---------------
isc-dhcp (4.2.4-1ubuntu8) quantal; urgency=low

  [ Scott Moser ]
  * debian/apparmor-profile.dhcpd: use include directory to enable
    other packages to re-use isc-dhcp-server. (LP: #1049177)

  [ Stéphane Graber ]
  * Re-introduce the wait_for_rw code in dhclient-script which got lost
    in the last merge, this code is there for the few rare systems that
    aren't using resolvconf and don't have /etc mounted read/write by the
    time dhclient-script is called.
  * Update onetry_retry_after_initial_success to disable the onetry variable
    early enough to actually prevent dhclient from exiting. (LP: #974284)
 -- Stephane Graber <email address hidden> Wed, 12 Sep 2012 17:30:26 -0400

Changed in isc-dhcp (Ubuntu Quantal):
status: Triaged → Fix Released
Stéphane Graber (stgraber) wrote :

I ended up just moving part of the patch a bit earlier so that it actually does something.

I tested with "dhclient -1 -d" before and after, confirming that when there's no connectivity during the initial request it properly exits after a minute, but that if it gets a lease and then loses connectivity it never exits.

Stéphane Graber (stgraber) wrote :

Committed the same fix to my local isc-dhcp SRU branch; this will be pushed to proposed later this week.

Changed in isc-dhcp (Ubuntu Precise):
status: Triaged → Fix Committed
status: Fix Committed → In Progress
Clint Byrum (clint-fewbar) wrote :

Hello Scott, or anyone else affected,

Accepted isc-dhcp into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/isc-dhcp/4.1.ESV-R4-0ubuntu5.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in isc-dhcp (Ubuntu Precise):
status: In Progress → Fix Committed
tags: removed: verification-failed
tags: added: verification-needed
Stéphane Graber (stgraber) wrote :

I remember testing this and nobody reported any regression, good to go.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.1.ESV-R4-0ubuntu5.6

---------------
isc-dhcp (4.1.ESV-R4-0ubuntu5.6) precise-proposed; urgency=low

  [ Scott Moser ]
  * debian/apparmor-profile.dhcpd: use include directory to enable
    other packages to re-use isc-dhcp-server. (LP: #1049177)

  [ Stéphane Graber ]
  * Update onetry_retry_after_initial_success to disable the onetry variable
    early enough to actually prevent dhclient from exiting. (LP: #974284)
  * Update droppriv patch to also call initgroups() (LP: #727837)
 -- Stephane Graber <email address hidden> Tue, 18 Sep 2012 10:34:10 -0400

Changed in isc-dhcp (Ubuntu Precise):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
