Comment 17 for bug 566792

Scott Moser (smoser) wrote : Re: UEC guests sometimes fail on consuming user data (metadata service isn't ready)

More testing has led me to the following:
a.) On my own hardware I cannot reproduce the 200 OK response with empty metadata that we see in the data center rig. That is still an issue.
b.) In all my tests the metadata service eventually *does* come up. On average across my tests (hundreds if not thousands), the metadata service is up in less than 10 seconds. However, in the cases where it is not, it doesn't seem to come up until around 2 minutes.

The solution to 'b' from the instance's perspective is simply to retry for much longer than we initially did (which was ~30 seconds).
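A minimal sketch of what "retry for much longer" could look like, in Python. The helper name `retry_until`, the 180 second default and the 2 second interval are all assumptions chosen to sit comfortably above the ~2 minute worst case seen in testing; they are not taken from the actual guest code.

```python
import time


def retry_until(probe, timeout=180.0, interval=2.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns a truthy value or `timeout` seconds pass.

    Returns the probe result on success, or None once the deadline is reached.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result:
            return result
        if clock() >= deadline:
            # Gave up: the service never answered within the allowed window.
            return None
        sleep(interval)
```

The probe is whatever check the caller considers "service is up"; the loop itself only decides how long to keep asking.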

This is quite problematic for any sort of automated testing, because you essentially cannot call a system "failed" for several minutes. Both my tests and Mathias' tests give up after less than a minute of unreachability and call the instance failed.

We may have to redesign tests to accommodate this.

I will also say, though, that I've not seen a single case of "metadata service not available" on EC2 in the past 9 months. They seem to have fixed the issue that was present previously.

I plan on putting some better code in the "wait for metadata service" logic, complete with waiting (indefinitely) for a non-empty value in instance-id (which would work around the 200-OK failure, and give us a chance to catch it and debug when it does fail).
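Roughly what I have in mind, as a sketch only. The function names, the 2 second interval and the exact URL path are assumptions for illustration (169.254.169.254 is the usual EC2-style metadata address); the real change may look different.

```python
import http.client
import time
import urllib.request

# Assumed location of instance-id on an EC2-style metadata service.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def fetch_instance_id(url=METADATA_URL, timeout=5.0):
    """Return the instance-id body as text, or "" if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", "replace").strip()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return ""


def wait_for_instance_id(fetch=fetch_instance_id, interval=2.0,
                         sleep=time.sleep, log=print):
    """Block until fetch() returns a non-empty instance-id, then return it.

    An empty body is treated the same as an unreachable service, so a
    200 OK with no metadata keeps the loop going instead of passing through.
    """
    attempts = 0
    while True:
        attempts += 1
        value = fetch()
        if value:
            return value
        log("metadata service not ready (attempt %d), retrying" % attempts)
        sleep(interval)
```

Because this never gives up, an instance that hits the 200-OK case stays in the loop with a log trail, which leaves it around to be inspected.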