Instance fails to start

Bug #610479 reported by C de-Avillez
This bug affects 1 person
Affects              Status        Importance  Assigned to      Milestone
Eucalyptus           Fix Released  Undecided   Unassigned
eucalyptus (Ubuntu)  Fix Released  High        Dustin Kirkland
Maverick             Fix Released  High        Dustin Kirkland

Bug Description

Instances fail to start, at a rate near 10%. All failed instances show up in euca-describe-instances with both public and private IPs set to 0.0.0.0, and stay in pending for a while (in our tests, around 20 minutes) before being forcefully terminated.
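
The failure signature described above (state pending, both IPs at 0.0.0.0) can be checked mechanically. The following is a minimal sketch, not part of the original report. It assumes that euca-describe-instances prints whitespace-separated INSTANCE rows with the instance ID in the second column, the public and private IPs in the fourth and fifth, and the state in the sixth; that column layout and the function name `stuck_instances` are assumptions for illustration.

```python
def stuck_instances(describe_output):
    """Return IDs of instances that are pending with both IPs at 0.0.0.0.

    Assumed row layout (not taken from this report):
    INSTANCE <id> <emi> <public-ip> <private-ip> <state> ...
    """
    stuck = []
    for line in describe_output.splitlines():
        fields = line.split()
        # Skip RESERVATION rows, blank lines, and anything too short to parse.
        if len(fields) < 6 or fields[0] != "INSTANCE":
            continue
        instance_id = fields[1]
        public_ip, private_ip, state = fields[3], fields[4], fields[5]
        if state == "pending" and public_ip == private_ip == "0.0.0.0":
            stuck.append(instance_id)
    return stuck
```

Feeding it the captured output of euca-describe-instances at intervals would show which instances never leave pending.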

Maverick, Eucalyptus 2.0.

eucalyptus-cc 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-cloud 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-common 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-gl 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-java-common 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-sc 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-walrus 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
libeucalyptus-commons-ext-java 0.5.0-0ubuntu2 eucalyptus-commons-ext install ok installed
uec-component-listener 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed

Revision history for this message
C de-Avillez (hggdh2) wrote :

I searched the logs for one such instance:

ubuntu@cempedak:/var/log/eucalyptus$ grep i-58C70904 *
cc.log.2:[Mon Jul 26 20:32:27 2010][013178][EUCADEBUG ] RunInstances(): running instance i-58C70904 with emiId emi-3C7E1C5B...
cc.log.2:[Mon Jul 26 20:42:45 2010][013178][EUCADEBUG ] TerminateInstances(): params: userId=(null), instIdsLen=1, firstInstId=i-58C70904
cloud-debug.log.2:20:32:27 INFO [ResourceToken:ClusterSink.16] :1280190747674:ResourceToken:77c754b4-9383-476a-bb07-a516fbb228a0:TOKEN_SPLIT:ResourceToken [addresses=[10.55.55.123], amount=1, cluster=UEC-TEST1, correlationId=77c754b4-9383-476a-bb07-a516fbb228a0, creationTime=Mon Jul 26 20:32:27 EDT 2010, instanceIds=[i-58C70904], networkTokens=[NetworkToken [cluster=UEC-TEST1, indexes=[4], name=admin-uectest-g0, networkName=uectest-g0, userName=admin, vlan=10]], sequenceNumber=398, userName=admin, vmType=c1.xlarge]:ClusterAllocator.<init>.129
cloud-debug.log.2:20:32:27 DEBUG [QueuedEventCallback:UEC-TEST1-ClusterAllocator-208] :1280190747934:QueuedEventCallback:QUEUE:class edu.ucsb.eucalyptus.cloud.VmRunType:[VmRunType reservationId=r-2EAC0570 userData= min=1 max=1 vlan=10 launchIndex=0 imageInfo=VmImageInfo [ancestorIds=[], imageId=emi-3C7E1C5B, imageLocation=http://10.55.55.2:8773/services/Walrus/maverick-20100726-amd64-20100726172535/maverick-server-uec-amd64.img.manifest.xml, kernelId=eki-D9422170, kernelLocation=http://10.55.55.2:8773/services/Walrus/maverick-20100726-amd64-20100726172535/maverick-server-uec-amd64-vmlinuz-virtual.manifest.xml, productCodes=[], ramdiskId=null, ramdiskLocation=null, size=1476395008] vmTypeInfo=VmTypeInfo [name='c1.xlarge', memory=2048, disk=20, cores=4] keyInfo=VmKeyInfo [fingerprint=10:14:76:bd:e7:4c:ec:96:20:da:2f:2d:5f:87:61:0c:ad:53:a6:40, name=uectest-k0, value=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCQTKr0e39y1gpz2e0w80xfgmmI5tIoGd7cLYLb2ii12gJJImJwyOTaViXoUn26aQ5iN3z4zzXGW0LxupqLHWDApTZYG4Aqu2GuaVyxcBTgCJ7qthoKrn7wjifJsr6gF5a8LrPWdPu+8WKEIxLy/o45Y99jDuBZHWvNG4SbnUS94XX7Z1dLwaSYMMVGryB0TFCJ8UJXNLOGsKqJnshJwjMiGi5LONhjmIfrqzncghcd1M+9gGJ1P30EG5rrgTaWyIGetI6oKY6EKip0FJjBXFBhEI9rEHruSZIt4A0ADxlldqYD72XSHTUnn3dSD+wtlzwKEF5uR7Ss5HXXe0ANbGvf admin@eucalyptus] instanceIds=[i-58C70904] macAddresses=[d0:0d:58:C7:09:04] networkNames=[uectest-g0] networkIndexList=[4] correlationId=77c754b4-9383-476a-bb07-a516fbb228a0-220298 userId=admin effectiveUserId=eucalyptus _return=false statusMessage=null]:StatefulMessageSet.run.104
cloud-debug.log.2:20:42:45 INFO [VmInstance:New I/O client worker #2-10] i-58C70904 state change: PENDING -> TERMINATED
cloud-debug.log.2:20:42:45 DEBUG [QueuedEventCallback:New I/O client worker #2-10] :1280191365437:QueuedEventCallback:QUEUE:class edu.ucsb.eucalyptus.msgs.TerminateInstancesType:[TerminateInstancesType instancesSet=[i-58C70904] correlationId=30b262eb-d044-492a-9040-1901ebbbff48 userId=null effectiveUserId=null _return=false statusMessage=null]:VmInstance.setState.247
cloud-debug.log.2:20:42:45 INFO [VmInstance:New I/O client worker #2-10] :1280191365438:VmInstance:VM_STATE:user=admin:instance=i-58C70904:type=c1.xlarge:state=TERMINATED:details=[]:SystemState.handle.147
cloud-debug.log.2:20:42:45 INFO [TerminateInstan...
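
To measure how long an instance sat in pending, the two cc.log entries above can be compared. This is a rough sketch under stated assumptions: it relies only on the bracketed timestamp format and the RunInstances()/TerminateInstances() markers visible in the grep output, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the leading "[Mon Jul 26 20:32:27 2010]" timestamp seen in cc.log lines.
CC_TIMESTAMP = re.compile(r"\[(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]")


def cc_log_time(line):
    """Parse the first bracketed cc.log timestamp on a line, or return None."""
    match = CC_TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def seconds_until_terminate(lines, instance_id):
    """Seconds between the first RunInstances() and the last TerminateInstances()
    entry that mention instance_id; None if either entry is missing."""
    started = terminated = None
    for line in lines:
        if instance_id not in line:
            continue
        stamp = cc_log_time(line)
        if stamp is None:
            continue
        if "RunInstances()" in line and started is None:
            started = stamp
        elif "TerminateInstances()" in line:
            terminated = stamp
    if started is None or terminated is None:
        return None
    return (terminated - started).total_seconds()
```

For i-58C70904 the two cc.log.2 lines are 20:32:27 and 20:42:45, which is 618 seconds between the run request and the terminate call.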


description: updated
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Dave Walker (davewalker) wrote :

@C de-Avillez: Is this behavior presenting similar characteristics to the pre SRU Lucid (1.6.2) version? Is it totally unrelated to bug #566792 ?

Revision history for this message
C de-Avillez (hggdh2) wrote :

@Dave W:

> Is this behavior presenting similar characteristics to the pre SRU Lucid (1.6.2) version?

No, the signature is distinct. On bug 566792 the instance would start up and reach RUNNING, with both private and public IPs set to the same (private) address. Here the instance never seems to get out of PENDING, and both the private and public IP addresses are shown as 0.0.0.0.

> Is it totally unrelated to bug #566792 ?

Given the above, yes, it seems totally unrelated. I have not yet had time to dig into it, though. I will try to repeat -- heh. I *will* repeat, it is guaranteed to fail -- and follow an instance to the NC.

Revision history for this message
C de-Avillez (hggdh2) wrote :

milestoning to Alpha3

Changed in eucalyptus (Ubuntu):
milestone: none → maverick-alpha-3
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Maverick):
milestone: maverick-alpha-3 → ubuntu-10.10-beta
Revision history for this message
Ye Wen (wenye) wrote :

Could you verify this is still happening with the latest commits? I guess one of my fixes solves the problem.

Revision history for this message
C de-Avillez (hggdh2) wrote :

Seems to still happen on r1219. I have just run a small (200-instance) test, and I got one such error. I will be uploading the logs to the bzr repository in a few minutes.

Revision history for this message
C de-Avillez (hggdh2) wrote :

logs uploaded:

bzr+ssh://bazaar.launchpad.net/~hggdh2/%2Bjunk/uec-qa/
Pushed up to revision 28.

Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Maverick):
assignee: nobody → Dave Walker (davewalker)
Revision history for this message
Dave Walker (davewalker) wrote :

Recent results indicate that this issue is fixed in an upstream snapshot.

Changed in eucalyptus (Ubuntu Maverick):
status: Confirmed → Fix Committed
Revision history for this message
Dave Walker (davewalker) wrote :

Marking Fix Released, as the fix is now released :)

Changed in eucalyptus (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
C de-Avillez (hggdh2) wrote :

Reopening. On my first test run on r1230 I got the following results:

2010-08-13 15:16:03,002 SUMMARY:INFO not-tested=16
2010-08-13 15:16:03,002 SUMMARY:INFO being-tested=0
2010-08-13 15:16:03,002 SUMMARY:INFO success=130
2010-08-13 15:16:03,002 SUMMARY:INFO failed=54
2010-08-13 15:16:03,002 SUMMARY:INFO rescheduled=0

I did not look at all of them, but pretty much all 54 failures did *not* reach a successful running state. Of course, this was a stress test (200 instances, as fast as possible, with a single NC with 16 cores). I then ran another test with a larger interval between euca-run-instances calls, so that we should always have available VMs. I still got about an 8% failure rate, all of them failures to start.
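
For comparing runs, the failure rate can be computed straight from the SUMMARY lines. This is a minimal sketch: the regular expression is based only on the five lines quoted above, the helper names are hypothetical, and dividing by success + failed (leaving out not-tested) is an assumption about how the rate should be defined.

```python
import re

# Matches e.g. "2010-08-13 15:16:03,002 SUMMARY:INFO failed=54".
SUMMARY_LINE = re.compile(r"SUMMARY:INFO\s+([\w-]+)=(\d+)")


def parse_summary(text):
    """Collect the SUMMARY counters from test-run output into a dict."""
    counts = {}
    for match in SUMMARY_LINE.finditer(text):
        counts[match.group(1)] = int(match.group(2))
    return counts


def failure_rate(counts):
    """Fraction of finished instances that failed (not-tested ones excluded)."""
    failed = counts.get("failed", 0)
    finished = counts.get("success", 0) + failed
    return failed / finished if finished else 0.0
```

For the stress run above this gives 54 / 184, roughly 29% of the instances that were actually tested.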

Logs are at lp:~hggdh2/uec-qa, revision 35.

Changed in eucalyptus (Ubuntu Maverick):
status: Fix Released → Triaged
Changed in eucalyptus (Ubuntu Maverick):
assignee: Dave Walker (davewalker) → Dustin Kirkland (kirkland)
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 2.0~bzr1231-0ubuntu1

---------------
eucalyptus (2.0~bzr1231-0ubuntu1) maverick; urgency=low

  * New upstream snapshot, -r1231, bugs fixed by upstream:
    - LP: #566792 - metadata service returns empty data with 200 OK
    - LP: #606243 - euca-describe-availability-zones verbose corrupted
    - LP: #563175 - should hold on to console logs after terminated
    - LP: #613832 - Cannot mark address as allocating
    - LP: #610479 - Instance fails to start
 -- Dustin Kirkland <email address hidden> Tue, 17 Aug 2010 12:49:28 -0400

Changed in eucalyptus (Ubuntu Maverick):
status: In Progress → Fix Released
Changed in eucalyptus:
status: New → Fix Released