Ubuntu EC2 package mirror intermittent failures

Bug #932088 reported by Paul Sokolovsky
This bug affects 5 people
Affects                        Status        Importance  Assigned to         Milestone
Linaro Android Infrastructure  Fix Released  High        Paul Sokolovsky
Linaro CI                      Fix Released  High        Deepti B. Kalakeri

Bug Description

Not too frequently, but regularly (say, once a month), the https://android-build.linaro.org/ service, which is Jenkins with EC2 slaves, gets failures when initializing EC2 instances, a step which involves installing packages.

The log looks like:
Connecting to ec2-23-20-20-233.compute-1.amazonaws.com on port 22.
Waiting for SSH to come up. Sleeping 5.
Connecting to ec2-23-20-20-233.compute-1.amazonaws.com on port 22.
Connected via SSH.
Authenticating as ubuntu
Connecting to ec2-23-20-20-233.compute-1.amazonaws.com on port 22.
Connected via SSH.
Executing init script
+ apt-get update
Ign http://security.ubuntu.com natty-security InRelease
Get:1 http://security.ubuntu.com natty-security Release.gpg [198 B]
Get:2 http://security.ubuntu.com natty-security Release [39.8 kB]
Ign http://us-east-1.ec2.archive.ubuntu.com natty InRelease
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates InRelease
Ign http://us-east-1.ec2.archive.ubuntu.com natty Release.gpg
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates Release.gpg
Ign http://us-east-1.ec2.archive.ubuntu.com natty Release
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates Release
Ign http://us-east-1.ec2.archive.ubuntu.com natty/main amd64 Packages/DiffIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty/universe amd64 Packages/DiffIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty/main TranslationIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty/universe TranslationIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates/main amd64 Packages/DiffIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates/universe amd64 Packages/DiffIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates/main TranslationIndex
Ign http://us-east-1.ec2.archive.ubuntu.com natty-updates/universe TranslationIndex
Err http://us-east-1.ec2.archive.ubuntu.com natty/main Sources
  403 Forbidden [IP: 10.210.205.172 80]
Err http://us-east-1.ec2.archive.ubuntu.com natty/universe Sources
  403 Forbidden [IP: 10.210.205.172 80]
...

Trying to access a specific file manually shows interleaved 404 and 403 responses for it (the file indeed doesn't exist). In general, the situation looks like 403 is being intermittently returned for any URL (whether it would normally be a 200 or a 404 one).

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:12-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.250.142.223, 10.252.111.96, 10.202.26.15, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.250.142.223|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-02-14 13:42:12 ERROR 404: Not Found.

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:14-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.210.205.172, 10.250.142.223, 10.252.111.96, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.210.205.172|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-02-14 13:42:14 ERROR 403: Forbidden.

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:16-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.202.26.15, 10.210.205.172, 10.250.142.223, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.202.26.15|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-02-14 13:42:16 ERROR 404: Not Found.

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:17-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.252.111.96, 10.202.26.15, 10.210.205.172, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.252.111.96|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-02-14 13:42:17 ERROR 404: Not Found.

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:18-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.250.142.223, 10.252.111.96, 10.202.26.15, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.250.142.223|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-02-14 13:42:18 ERROR 404: Not Found.

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
--2012-02-14 13:42:19-- http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/natty/main/source/Sources
Resolving us-east-1.ec2.archive.ubuntu.com... 10.210.205.172, 10.250.142.223, 10.252.111.96, ...
Connecting to us-east-1.ec2.archive.ubuntu.com|10.210.205.172|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-02-14 13:42:19 ERROR 403: Forbidden.


summary: - Ubuntu EC2 mirror intermittent failures
+ Ubuntu EC2 package mirror intermittent failures
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Issues like these usually resolve themselves in an hour or two. But again, they happen with stable regularity.

Scott Moser (smoser)
affects: ubuntu-on-ec2 → ubuntu
tags: added: ec2-images
Changed in ubuntu:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

Hi,
  We're hoping to [very soon] retire the old EC2 mirror infrastructure and replace it with mirrors backed by S3 [1].
  When they go live, the rollover will be transparent to the user, but right now you can selectively choose the S3 mirrors, which should provide better uptime and speed.

  If you're using automated infrastructure, we'd *really* love for you to hit those mirrors instead of the default ones to give them a good test and report problems you have. In the thread at [2], I indicate how you can make use of these mirrors either via cloud-init data or 'sed' editing of /etc/apt/sources.list .
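
  For example, the sed route is a one-liner (a sketch of the approach; the stock EC2 images already point sources.list at the per-region <region>.ec2.archive.ubuntu.com host, so this just appends the S3 suffix):

$ sudo sed -i.dist 's,archive.ubuntu.com,archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list

  That turns e.g. http://us-east-1.ec2.archive.ubuntu.com/ubuntu into http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu while leaving security.ubuntu.com alone.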

--
[1] http://cloud.ubuntu.com/2012/01/regional-s3-backed-ec2-mirrors-available-for-testing/
[2] http://groups.google.com/group/ec2ubuntu/browse_thread/thread/507f58dd51b6b631

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Thanks for the quick reply, Scott. I'll RFC switching to the new S3 sources, and unless our service stakeholders raise serious concerns with that, I'm going to try them soon.

Changed in linaro-android-infrastructure:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

OK, this has been deployed for https://android-build.linaro.org/ . Watching the builds, it's all going well so far.

We'll share issues/comments (if any) on how it goes here, for the Ubuntu EC2 team's information.

Changed in linaro-android-infrastructure:
milestone: none → 2012.02
assignee: nobody → Paul Sokolovsky (pfalcon)
status: Triaged → Fix Committed
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

It has been working OK for android-build.linaro.org, and at the end of our monthly cycle we'd like to close this in the linaro-android-infrastructure project. We still hope this will stay open in Ubuntu for reference and to allow us to communicate any issues/observations. Thanks.

Changed in linaro-android-infrastructure:
status: Fix Committed → Fix Released
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

We started to see the following issues with the beta S3 mirror:

Get:40 http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/ natty/main zip amd64 3.0-3build1 [265 kB]
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/d/dpkg/libdpkg-perl_1.16.0~ubuntu7.1_all.deb
  Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/d/dpkg/dpkg-dev_1.16.0~ubuntu7.1_all.deb Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/b/build-essential/build-essential_11.5ubuntu1_amd64.deb Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/c/curl/curl_7.21.3-1ubuntu1.5_amd64.deb Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/e/emacsen-common/emacsen-common_1.4.19ubuntu2_all.deb Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/f/fakeroot/fakeroot_1.14.4-1ubuntu1_amd64.deb Size mismatch
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/libe/liberror-perl/liberror-perl_0.17-1_all.deb 416 Requested Range Not Satisfiable
Fetched 37.7 MB in 3min 36s (174 kB/s)
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Based on our logs, this has happened 9 times since we started to use the new mirror (i.e., 9 different EC2 instances had this issue during their initial setup); the names of the packages differ in each case, of course.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

We worked around that by adding retries for apt-get operations, with exponential backoff and an eventual abort after 6 retries (a 64 s wait before the last one). Of the 51 instances launched since that fix was deployed, 18 had retries kick in, and 10 still failed in the end (i.e. after 6 retries). However, all failures were in "apt-get update", not "apt-get install". The actual failures looked like:

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/main/binary-amd64/Packages 403 Forbidden

In other words, we're back to the same issue we had with the original EC2 mirror: weird 403 errors. However, I'm still getting a 403 on that URL right now, so maybe not all of those errors are really random; maybe there are indeed permission problems, or 403 is being misreported for absent files, or something similar.
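
For reference, the retry wrapper is roughly the following sketch (hypothetical shell, not our exact script; the function and package names are illustrative). It retries up to 6 times with waits of 2, 4, ... 64 seconds:

apt_retry() {
    # Run "apt-get $@", retrying with exponential backoff; give up after 6 retries.
    local delay=2 tries=0
    until apt-get "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -gt 6 ]; then
            echo "apt-get $* still failing after 6 retries, giving up" >&2
            return 1
        fi
        echo "apt-get $* failed, retry $tries/6 in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
    done
}

apt_retry update
apt_retry install -y build-essential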

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

This latest round of issues is tracked in our bug lp:941784, which may have more info.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Here's a good AWS forum thread from 2008: https://forums.aws.amazon.com/thread.jspa?threadID=21514 . According to it, "403 Forbidden" in AWS-speak means "there may be unusually long propagation delay".

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

The "416" errors are caused by Bug 798023.

I did find a minor bug in the flipping code that activates the S3 metadata. It should shake out by later tonight. Can you let me know what the results of the next night's tests are?

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

I found another bug in APT that appears to explain the hashsum/size mismatches. See Bug 948461

tags: added: mirrors
Revision history for this message
Scott Moser (smoser) wrote :

Hi Paul,
  Some more debugging info is in bug 948461, but it looks like, for the moment, if you set apt to use 'Acquire::http::Pipeline-Depth=0', then many of your errors may go away.
  Could you please try in your builds to do (prior to even apt-get update):
$ echo "Acquire::http::Pipeline-Depth 0;" | sudo tee /etc/apt/apt.conf.d/99no-pipelining

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Sorry for the delays in responding. The story on our side is that on 2012-03-06 we implemented auto-retries with exponential back-off for failed apt-get invocations in our scripts, and that "fixed" it for us; at least, I never saw slaves fail to launch. Soon after that we were swamped with another urgent issue, and I didn't even have time to analyze the logs to see what helped most: the auto-retries or the fixes you made to the mirror.

I'm back to it now, analyzing logs, and will follow up with more info soon. Thanks!

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

$ grep -l -a -E '^Failed to fetch.+s3' *
...
2012-03-06-slave-i-63954a07.log
2012-03-06-slave-i-7d12c119.log
2012-03-06-slave-i-e1c51a85.log
2012-03-07-slave-i-31c66355.log
2012-03-07-slave-i-fbd9029f.log
2012-03-08-slave-i-5961c23d.log
2012-03-08-slave-i-5f40ee3b.log
2012-03-08-slave-i-c97cdfad.log
2012-03-08-slave-i-e59e3281.log
2012-03-09-slave-i-69ec5a0d.log
2012-03-09-slave-i-9bdf69ff.log
2012-03-09-slave-i-b3a81cd7.log
2012-03-10-slave-i-bdb705d9.log
2012-03-10-slave-i-fd1ba899.log
2012-03-11-slave-i-0f59e16b.log
2012-03-12-slave-i-23058647.log
2012-03-12-slave-i-97ed6cf3.log
2012-03-12-slave-i-bb60e0df.log
2012-03-12-slave-i-e79b1b83.log
2012-03-13-slave-i-7d109819.log
2012-03-13-slave-i-ad5cd4c9.log
2012-03-14-slave-i-23ca5947.log
2012-03-14-slave-i-6931a10d.log
2012-03-14-slave-i-8d0a9ae9.log
2012-03-14-slave-i-ebe1728f.log
2012-03-14-slave-i-fb63fe9f.log
2012-03-15-slave-i-fb23b89f.log
2012-03-16-slave-i-e9d3b18d.log
2012-03-17-slave-i-d5b9d2b1.log
2012-03-18-slave-i-c1a3d3a5.log
2012-03-19-slave-i-1f90ef7b.log
2012-03-19-slave-i-f3dea597.log
2012-03-19-slave-i-fbb3d09f.log
2012-03-20-slave-i-bd195fd9.log
2012-03-20-slave-i-bf195fdb.log
2012-03-21-slave-i-1f8cc67b.log
2012-03-21-slave-i-2b5c174f.log
2012-03-21-slave-i-41b6f925.log
2012-03-21-slave-i-795b101d.log
2012-03-21-slave-i-957b37f1.log
2012-03-23-slave-i-2694cd42.log
2012-03-24-slave-i-d8b196bc.log
2012-03-25-slave-i-e2082a86.log
2012-03-26-slave-i-5852663c.log
2012-03-26-slave-i-78e4ce1c.log
2012-03-27-slave-i-70e4d414.log
2012-03-27-slave-i-88cdfbec.log
2012-03-28-slave-i-ce221aaa.log
2012-03-29-slave-i-3ce4e458.log
2012-03-29-slave-i-52919e36.log
2012-03-29-slave-i-5c232238.log
2012-04-02-slave-i-7fff1c18.log
2012-04-03-slave-i-25ff1742.log
2012-04-03-slave-i-271bf040.log
2012-04-03-slave-i-69bb4e0e.log
2012-04-04-slave-i-0f32c068.log
2012-04-04-slave-i-a5a153c2.log

So yes, we're still hitting it almost every day; it's just that the auto-retrying hides it from our eyes.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

echo "Acquire::http::Pipeline-Depth 0;" | sudo tee /etc/apt/apt.conf.d/99no-pipelining - was added to our EC2 instance init scripts, I'll be watching how it goes.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

The previous note is about applying this to android-build.linaro.org. Now it's also applied to ci.linaro.org, where we have slave apt-get issues too, albeit the errors we get there are different %).

Changed in linaro-android-infrastructure:
status: Fix Released → In Progress
milestone: 2012.02 → 2012.04
Revision history for this message
Scott Moser (smoser) wrote : Re: [Bug 932088] Re: Ubuntu EC2 package mirror intermittent failures

On Wed, 4 Apr 2012, Paul Sokolovsky wrote:

> $ grep -l -a -E '^Failed to fetch.+s3' *
> ...
> 2012-03-06-slave-i-63954a07.log
> 2012-04-02-slave-i-7fff1c18.log
<snip>
> 2012-04-04-slave-i-0f32c068.log
> 2012-04-04-slave-i-a5a153c2.log
>
> So yes, we're still having it almost everyday, it's just auto-retrying hides it from our eyes.

I'd really appreciate it if you could put the apt-pipelining change in
place and remove the update loop.

I.e., you can either use a daily AMI (or one of those just released today), or
sudo sh -c 'cat > /etc/apt/apt.conf.d/90cloud-init-pipelining' <<EOF
//Written by cloud-init per 'apt_pipelining'
Acquire::http::Pipeline-Depth "0";
EOF

I really, really value the automated results you're getting, so if you
could turn off the retries and add that, then yell loudly when it fails,
I'd really appreciate it.

If you have disabled pipelining in apt and are still seeing errors, please
yell loudly.

Revision history for this message
Scott Moser (smoser) wrote :

On Wed, 4 Apr 2012, Paul Sokolovsky wrote:

> Previous note is about applying to android-build.linaro.org. Now also
> applied to ci.linaro.org, where have slave apt-get issues, albeit errors
> we get there are different %).

Thank you.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Our slaves only seem to be failing when accessing http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/ now. ubuntu/dists/natty is fine. Is there a permission problem with that directory?

Revision history for this message
Scott Moser (smoser) wrote :

James, Thanks for the info, we're looking into it.

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

James, do you have some logs? I am not able to replicate this.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

I'm personally not sure what's happening here, but:

Just opening http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/ in a browser leads to a 403 XML response. OK, suppose it's limited to EC2 only. From our EC2 machine:

ubuntu@ip-10-243-34-224:~$ wget http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/
--2012-04-06 15:36:06-- http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/
Resolving us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com... 72.21.214.160
Connecting to us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com|72.21.214.160|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-06 15:36:06 ERROR 403: Forbidden.

But any URL under http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/* leads to a 403. So again, I'm not sure whether this is really a bug or just something we don't know how to use.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

On the other hand:

$ grep -l -a -E '^Failed to fetch.+s3' * | tail -n3
2012-04-03-slave-i-69bb4e0e.log
2012-04-04-slave-i-0f32c068.log
2012-04-04-slave-i-a5a153c2.log

The last line matches the line in comment #15, which I made shortly before turning off pipelining. Which means we haven't had (that specific) error since then!

$ ls -1|tail -n3
2012-04-06-slave-i-e3894484.log
2012-04-06-slave-i-e5894482.log
2012-04-06-slave-i-fd599a9a.log

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

OK, I've now eyeballed a few logs, and the tentative conclusion is: thank you very much, turning off pipelining seems to have solved our issues! Even those weird 403s don't pop up after 2012-04-04. If you're interested in what they were, here's an example:

2012-04-04-slave-i-a5a153c2.log:W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/universe/source/Sources 403 Forbidden
2012-04-04-slave-i-a5a153c2.log:W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/main/binary-amd64/Packages 403 Forbidden
2012-04-04-slave-i-a5a153c2.log:W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/natty-updates/universe/binary-amd64/Packages 403 Forbidden

All this applies to android-build.linaro.org so far; I'll check the state of ci.linaro.org later.

Revision history for this message
Scott Moser (smoser) wrote :

The 403s are in fact weird, but they are explainable. The reason for them is that the 'Sources' and 'Packages' files do not exist in uncompressed form on S3 *or* on the *.archive.ubuntu.com mirrors (see http://us.archive.ubuntu.com/ubuntu/dists/natty-updates/universe/source/ )

The reason you see the 403 is that apt got bad data for the Sources.bz2 (or Sources.gz) and decided to try the 'Sources' as a fallback. It does unfortunately hide the real issue quite well :)
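
One way to see this for yourself (just an illustration, using the same paths as above): fetch the compressed and uncompressed index directly and compare:

$ wget -S -O /dev/null http://us.archive.ubuntu.com/ubuntu/dists/natty-updates/universe/source/Sources.bz2
$ wget -S -O /dev/null http://us.archive.ubuntu.com/ubuntu/dists/natty-updates/universe/source/Sources

The first should succeed, while the second should return a 404 (or a 403 on the S3 mirror), matching the fallback behaviour described above.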

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

We just had a case of 10+ broken build slaves being started up in a short time. A typical Jenkins instance init log:

https://ci.linaro.org/jenkins/computer/i-185d097f/log

As you can see, "Acquire::http::Pipeline-Depth 0;" is set, but there's still a hashsum mismatch error (though the fact that a dozen slaves were started up in succession may mean there's actually a mismatch at the source).

Connecting to ec2-50-17-143-184.compute-1.amazonaws.com on port 22.
Waiting for SSH to come up. Sleeping 5.
Connecting to ec2-50-17-143-184.compute-1.amazonaws.com on port 22.
Waiting for SSH to come up. Sleeping 5.
Connecting to ec2-50-17-143-184.compute-1.amazonaws.com on port 22.
Waiting for SSH to come up. Sleeping 5.
Connecting to ec2-50-17-143-184.compute-1.amazonaws.com on port 22.
Connected via SSH.
Authenticating as ubuntu
Connecting to ec2-50-17-143-184.compute-1.amazonaws.com on port 22.
Connected via SSH.
Executing init script
+ echo+ tee /etc/apt/apt.conf.d/99no-pipelining
 Acquire::http::Pipeline-Depth 0;
Acquire::http::Pipeline-Depth 0;
+ apt-get update
Ign http://security.ubuntu.com precise-security InRelease
Hit http://security.ubuntu.com precise-security Release.gpg
Hit http://security.ubuntu.com precise-security Release
Ign http://us-east-1.ec2.archive.ubuntu.com precise InRelease
Ign http://us-east-1.ec2.archive.ubuntu.com precise-updates InRelease
Get:1 http://us-east-1.ec2.archive.ubuntu.com precise Release.gpg [198 B]
Hit http://us-east-1.ec2.archive.ubuntu.com precise-updates Release.gpg
Get:2 http://us-east-1.ec2.archive.ubuntu.com precise Release [49.6 kB]
Hit http://us-east-1.ec2.archive.ubuntu.com precise-updates Release
Get:3 http://us-east-1.ec2.archive.ubuntu.com precise/main Sources [932 kB]
Get:4 http://us-east-1.ec2.archive.ubuntu.com precise/universe Sources [5,030 kB]
Get:5 http://security.ubuntu.com precise-security/main Sources [14 B]
Get:6 http://security.ubuntu.com precise-security/universe Sources [14 B]
Get:7 http://us-east-1.ec2.archive.ubuntu.com precise/main amd64 Packages [1,272 kB]
Hit http://security.ubuntu.com precise-security/main amd64 Packages
Get:8 http://us-east-1.ec2.archive.ubuntu.com precise/universe amd64 Packages [4,803 kB]
Hit http://security.ubuntu.com precise-security/universe amd64 Packages
Hit http://security.ubuntu.com precise-security/main i386 Packages
Get:9 http://us-east-1.ec2.archive.ubuntu.com precise/main i386 Packages [1,272 kB]
Hit http://security.ubuntu.com precise-security/universe i386 Packages
Get:10 http://us-east-1.ec2.archive.ubuntu.com precise/universe i386 Packages [4,811 kB]
Hit http://security.ubuntu.com precise-security/main TranslationIndex
Get:11 http://us-east-1.ec2.archive.ubuntu.com precise/main TranslationIndex [3,706 B]
Get:12 http://us-east-1.ec2.archive.ubuntu.com precise/universe TranslationIndex [2,922 B]
Get:13 http://us-east-1.ec2.archive.ubuntu.com precise-updates/main Sources [14 B]
Get:14 http://us-east-1.ec2.archive.ubuntu.com precise-updates/universe Sources [14 B]
Hit http://us-east-1.ec2.archive.ubuntu.com precise-updates/main amd64 Packages
Hit http://us-east-1.ec2.archive.ubuntu.com precise-updates/universe amd64 Packages
Hit http://us-east-1.ec2.archive...


Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Also, if you look at that log carefully, you can see the following weird thing: we do two "apt-get update"s in a row, with a sleep 2 in between. The first update hits *only*

http://security.ubuntu.com
http://archive.ubuntu.com

and doesn't hit the in-EC2 mirror at all. The second update starts hitting http://us-east-1.ec2.archive.ubuntu.com
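
For context, the corresponding fragment of the init script is roughly this sketch (the real script may differ in details):

apt-get update || true    # first attempt; in the log above it never touches the in-EC2 mirror
sleep 2
apt-get update            # second attempt; this one starts hitting us-east-1.ec2.archive.ubuntu.com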

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

That's with ami-b5ea34dc just in case.

Revision history for this message
Scott Moser (smoser) wrote :

Paul, your link to https://ci.linaro.org/jenkins/computer/i-185d097f/log does not work for me; could you attach the full log?

Revision history for this message
Scott Moser (smoser) wrote :

Paul,
  I also just noticed, you're not using the S3 backed mirrors:
     sudo sed -i.dist 's,archive.ubuntu.com,archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list

I'm not saying there was not a problem, only that I thought (and hoped) you were using the s3 mirrors. I've just tried right now and have not been able to reproduce against the us-east-1.ec2.archive.ubuntu.com mirrors (we've not done the switch over to S3 yet).

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Yes, those slave log links expire with the slaves; that's why I pasted it in. https://bugs.launchpad.net/linaro-android-infrastructure/+bug/932088/comments/27/+download should give the same info as if it were attached.

And yes, this latest issue and log are from ci.linaro.org, which does not yet use the S3 mirror (the other host, android-build.linaro.org, does use it). And that issue subsided as abruptly as it started, so another +1 that it might have been some update in progress on the server.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Update on android-build.linaro.org (uses S3):

$ grep -l -a -E '^Failed to fetch.+s3' * | tail -n3
2012-04-03-slave-i-69bb4e0e.log
2012-04-04-slave-i-0f32c068.log
2012-04-04-slave-i-a5a153c2.log

$ grep -l -a "Hash Sum mismatch" * | tail -n3
2012-04-04-slave-i-25629242.log
2012-04-04-slave-i-2d1beb4a.log
2012-04-04-slave-i-a5a153c2.log

$ ls -1|tail -n3
2012-04-23-slave-i-f87a2e9f.log~
2012-04-23-slave-i-fedd9499.log
2012-04-23-slave-i-fedd9499.log~

I.e., with the S3 mirror (and pipelining disabled), we're really good.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

But on ci.linaro.org (non-S3 mirror):

$ grep -l -a "Hash Sum mismatch" *|wc
     40 40 1309

So I'm going to apply the same config to ci.linaro.org and see how it goes (it uses a different set of AMIs, so the results may not match android-build).

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

According to the comments in this bug, we have been able to resolve the 404 Not Found error and the HashSum issue by using the following 2 things.
1) sudo sed -i.dist 's,archive.ubuntu.com,archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list
2) echo "Acquire::http::Pipeline-Depth 0;" | sudo tee /etc/apt/apt.conf.d/99no-pipelining

I see that the ci.linaro.org init for the precise instance ami-b5ea34dc has been updated to use all of the above.
But even after applying those changes, I still continuously got the 404 Not Found errors this morning (though not the HashSum mismatch errors).
With this, builds started to queue up for longer.
Finally I had to change the AMI used from the beta2 instance ami-b5ea34dc to the beta1 instance ami-2061b349 and was able to proceed with the builds.

I believe the mirrors are hosted separately and their availability is independent of the EC2 instances we use.
In that case, how did the error go away when the instance was changed? Am I missing something?

Also, I see the following http URL in the error slave logs:
W: Failed to fetch http://us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages 404 Not Found

I guess it's malformed. I looked through the sed commands used; they looked fine to me.
sed -i.dist 's,archive.ubuntu.com,us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list
sed -i.bk 's,^\(.*://[^.]*.ec2.archive.ubuntu.com\)/,\1.s3.amazonaws.com/,' /etc/apt/sources.list

Any idea how the above URL appeared?

These don't appear in the slaves that are booting successfully.

Revision history for this message
Scott Moser (smoser) wrote :

On Tue, 24 Apr 2012, Deepti B. Kalakeri wrote:

> According to the comments in the bug above we have been able to resolve the 404 Not Found error and HashSum issue by using the following 2 things.
> 1) sudo sed -i.dist 's,archive.ubuntu.com,archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list
> 2) echo "Acquire::http::Pipeline-Depth 0;" | sudo tee /etc/apt/apt.conf.d/99no-pipelining
>
> I see on ci.linaro.org init for precise instance ami-b5ea34dc has been updated to use all the above.
> But, even after applying those changes I still I got the 404 file not found, though not the HashSum mismatch errors this morning continuously.
> With this the builds started to be queue in for longer time.
> Finally I had to change the ami instance used from beta2 ami-b5ea34dc instance to beta1 ami-2061b349 instance and was able to proceed with the builds.

I can't come up with an explanation for this.
This doesn't really make any sense to me.

I just launched an instance of ami-b5ea34dc, and ran the same 2 commands
above and cannot replicate the issues. After doing so, I also did:
  apt-get dist-upgrade
which downloaded 86M in 142 packages, and then
  apt-get --download-only --assume-yes install ubuntu-desktop^
which downloaded 454M in 1009 packages

> Also, I see the following http URL in the error slave logs:
> W: Failed to fetch http://us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages 404 Not Found

This is a very confusing error indeed. The fact is that 'Packages', as
opposed to Packages.gz or Packages.bz2, does not really exist anywhere.
The error you're seeing is because apt did not like the content it
got for Packages.gz or Packages.bz2 (likely both of them), and fell back
to trying Packages, which resulted in a 404.

It's annoying that apt basically hides the original error.

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

On Tue, Apr 24, 2012 at 6:30 PM, Scott Moser <email address hidden> wrote:

> On Tue, 24 Apr 2012, Deepti B. Kalakeri wrote:
>
> > According to the comments in the bug above we have been able to resolve
> the 404 Not Found error and HashSum issue by using the following 2 things.
> > 1) sudo sed -i.dist 's,archive.ubuntu.com,
> archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list
> > 2) echo "Acquire::http::Pipeline-Depth 0;" | sudo tee
> /etc/apt/apt.conf.d/99no-pipelining
> >
> > I see on ci.linaro.org init for precise instance ami-b5ea34dc has been
> updated to use all the above.
> > But, even after applying those changes I still I got the 404 file not
> found, though not the HashSum mismatch errors this morning continuously.
> > With this the builds started to be queue in for longer time.
> > Finally I had to change the ami instance used from beta2 ami-b5ea34dc
> instance to beta1 ami-2061b349 instance and was able to proceed with the
> builds.
>
> I can't come up with an explanation for this.
> This doesn't really make any sense to me.
>
> I just launched an instance of ami-b5ea34dc, and ran the same 2 commands
> above and cannot replicate the issues. After doing so, I also did:
> apt-get dist-upgrade
> which downloaded 86M in 142 packages, and then
> apt-get --download-only --assume-yes install ubuntu-desktop^
> which downloaded 454M in 1009 packages
>
>
Well, I have logs for the slave that used the ami-b5ea34dc instance and
failed to start with the 404 failure in the morning.
Now the same slaves using the ami-b5ea34dc instance are able to start
without the problem.
I am not an EC2 expert, so I am not sure why that failure occurred when I
tried earlier.
Anyway, I will keep watch and report if I see the problem again.

> Also, I see the following http URL in the error slave logs:
> > W: Failed to fetch
> http://us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages 404 Not Found
>
> This is a very confusing error indeed. The fact is that 'Packages', as
> opposed to Packages.gz or Packages.bz2 does not really exist anywhere.
> The error you're seeing is because the apt did not like the content it
> got for Packages.gz or Packages.bz2 (likely both of them) , and fell back
> to trying Packages, which resulted in a 404.
>
> Its annoying that apt basically hides the original error.

Revision history for this message
Scott Moser (smoser) wrote :

On Tue, 24 Apr 2012, Deepti B. Kalakeri wrote:

> Well I have logs for the slave that used ami instance ami-b5ea34dc and
> failed to start with the 404 failure in the morning.
> Now the same slaves using ami-b5ea34dc instance are able to start without
> the problem.
> I am not an EC2 expert so am not sure why that failure occurred when I
> tried earlier.
> Anyways will keep a watch and inform if I see the problem again.

I did not mean to imply that you had not seen these, only that I could not
reproduce. Do you have a link to the logs you have where it failed?

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

Log (i-5adcf83d_slave.log) for the slave using the ami-b5ea34dc instance which failed with the 404 error attached.

Revision history for this message
Scott Moser (smoser) wrote :

Deepti, as Ben pointed out, the log you posted has bad apt sources lines: they should use
   http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com
not
   http://us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Deepti...you are using the S3 bucket of:
us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com

Which should be:
us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com

Note the extra "us-east-1.ec2." in the URLs from the error logs. If you remove that, it should work for you.

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

On Wed, Apr 25, 2012 at 8:15 PM, Ben Howard <email address hidden> wrote:

> Deepti...you are using the S3 bucket of:
> us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com
>
> Which should be:
> us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com
>
> Note the extra "us-east-1.ec2." in the URL's from the error logs. If you
> remove that, it should work for you.
>

Right, that is what I pointed out in comment #35.
This appears sometimes (intermittently), and when it appears it obviously
fails.
Do you know why we get this malformed URL sometimes and not always?


Revision history for this message
Scott Moser (smoser) wrote :

On Wed, 25 Apr 2012, Deepti B. Kalakeri wrote:

> On Wed, Apr 25, 2012 at 8:15 PM, Ben Howard
> <email address hidden>wrote:
>
> > Deepti...you are using the S3 bucket of:
> > us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com
> >
> > Which should be:
> > us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com
> >
> > Note the extra "us-east-1.ec2." in the URL's from the error logs. If you
> > remove that, it should work for you.
> >
>
> Right, that is what I pointed in the comment #35.
> This appears sometimes(intermittenlty) and when that appears it is obvious
> it fails.
> Do you know why we should get this malformed sometimes and not always.

Well, I can't be sure, but I suspect you're somehow racing with
cloud-init.

Cloud-init is updating /etc/apt/sources.list on boot based on region from
a template.
I suspect you have something that is also editing /etc/apt/sources.list,
and not expecting it to have us-east-1.ec2.archive.ubuntu.com in it.
Then, if that something runs after cloud-init, you get 2 'us-east-1.ec2.'
hunks.

Just a hunch, but *something* is editing /etc/apt/sources.list and doing
this.
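
A hypothetical illustration of how that race produces the doubled host: suppose cloud-init has already written the regional mirror into sources.list, e.g.

deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ precise main universe

and a later init script then runs the unanchored substitution seen earlier in this bug:

sed -i 's,archive.ubuntu.com,us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com,g' /etc/apt/sources.list

The pattern matches inside the already-prefixed hostname, so the line becomes

deb http://us-east-1.ec2.us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/ precise main universe

which is exactly the malformed URL seen in the failing logs.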

David Zinman (dzinman)
Changed in linaro-android-infrastructure:
milestone: 2012.04 → 2012.05
Changed in linaro-ci:
assignee: nobody → Deepti B. Kalakeri (deeptik)
milestone: none → 2012.05
importance: Undecided → High
Revision history for this message
Fathi Boudra (fboudra) wrote :

We've been hitting the 404 Not Found error again on ci.linaro.org since yesterday. As mentioned in previous comments, "us-east-1.ec2." is duplicated. AFAICS, we don't use cloud-init; only a sed call manipulates sources.list.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

As of now (well, for a couple of hours at least) we're getting the following errors. These are known to happen from time to time and to dissolve after a while (we usually attribute them to an ongoing package repository update), but this one seems to be taking longer than usual, so I figured I'd post about it here.

Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gmp/libgmpxx4ldbl_4.3.2+dfsg-1ubuntu3_amd64.deb 403 Forbidden
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-4.5/libstdc++6-4.5-dev_4.5.2-8ubuntu4_amd64.deb 403 Forbidden
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-4.5/g++-4.5_4.5.2-8ubuntu4_amd64.deb 403 Forbidden
Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-defaults/g++_4.5.2-1ubuntu3_amd64.deb 403 Forbidden

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Switching back to the standard EC2 mirror (from the S3 one) fixed that for now.

Revision history for this message
Scott Moser (smoser) wrote :

On Tue, 22 May 2012, Paul Sokolovsky wrote:

> As of now (well, for couple of hours at least) we're having following
> errors. These are known to happen from time to time, and dissolve after
> some time (we usually attribute it to to ongoing package repository
> update), but this ones seems like taking longer then usual, so figure
> I'd post about it here.
>
> Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gmp/libgmpxx4ldbl_4.3.2+dfsg-1ubuntu3_amd64.deb 403 Forbidden
> Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-4.5/libstdc++6-4.5-dev_4.5.2-8ubuntu4_amd64.deb 403 Forbidden
> Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-4.5/g++-4.5_4.5.2-8ubuntu4_amd64.deb 403 Forbidden
> Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/pool/main/g/gcc-defaults/g++_4.5.2-1ubuntu3_amd64.deb 403 Forbidden

I've pinged Ben on this; he's looking into it.
If/when you see issues like this, please ping utlemming or smoser on
Freenode IRC immediately.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Thanks Scott!

I've just re-enabled the S3 mirror; it works OK now:

Get:23 http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/ natty/main g++-4.5 amd64 4.5.2-8ubuntu4 [6,511 kB]
Get:24 http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/ natty/main g++ amd64 4:4.5.2-1ubuntu3 [1,444 B]

Revision history for this message
Ingo (ingo-jaeckel) wrote :

This exact problem has been happening for me for about 8 hours now. Is anybody else seeing this too?

http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ natty-updates/main [...] 403 Forbidden

I also tried a workaround proposed by Scott Moser from http://markmail.org/message/ihueaulwm7k7swkb#query:+page:1+mid:asgcypbdw7lnrcaj+state:results.

But the problem still occurs. Any ideas?

Thanks,
Ingo

Revision history for this message
Felipe Reyes (freyes) wrote :

I hit this exact problem a few minutes ago and have been suffering from it for the last hour.

Revision history for this message
Michael Hope (michaelh1) wrote :

I've been having failures in cbuild starting a natty instance for the last 24 hours. Switching to S3 with no pipelining works around the problem.

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

I've been seeing similar errors as well for the past couple of hours:

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/precise-updates/universe/binary-amd64/Packages 404 Not Found

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/precise-updates/main/binary-i386/Packages 404 Not Found

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages 404 Not Found

E: Some index files failed to download. They have been ignored, or old ones used instead.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Just for the stats: Linaro's https://android-build.linaro.org/jenkins/ hasn't had any issues lately (it is on the S3 mirror, with pipelining disabled).

Changed in linaro-ci:
milestone: 2012.05 → 2012.06
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

This seems to be stably fixed for linaro-android-infrastructure, so I'm closing it there (it may still be pertinent to linaro-ci and ubuntu).

Changed in linaro-android-infrastructure:
status: In Progress → Fix Released
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Today we're having 403s on ci.linaro.org again; see the attachment for a sample log.

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

On Wed, Jun 6, 2012 at 12:17 PM, Paul Sokolovsky
<email address hidden> wrote:

> Today we're having 403's on ci.linaro.org again, see attachment for a
> sample log.
>
>

** Attachment added: "i-18e33761.log"
>
> https://bugs.launchpad.net/linaro-android-infrastructure/+bug/932088/+attachment/3177116/+files/i-18e33761.log
>

Right, I am looking into it.
By the way, you mention in comment #53 that you have disabled pipelining, but
the a-b* slave log shows something like

+ + echo Acquire::http::Pipeline-Depth 0;^M
tee /etc/apt/apt.conf.d/99no-pipelining^M
Acquire::http::Pipeline-Depth 0;^M

Is this something different from what you mentioned about pipelining?
Also, I am trying to access the S3 mirror as below, instead of just
us-east-1.ec2.archive.ubuntu.com, and I am still getting a 403 error.
$ wget http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages
--2012-06-06 07:11:27--  http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages
Resolving us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com... 72.21.211.168
Connecting to us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com|72.21.211.168|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-06-06 07:11:27 ERROR 403: Forbidden.

So, what are the exact changes on a-b* that fix this problem?


Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

> + + echo Acquire::http::Pipeline-Depth 0;^M
> tee /etc/apt/apt.conf.d/99no-pipelining^M
> Acquire::http::Pipeline-Depth 0;^M

Well, this is exactly the command that disables pipelining.

> so, what are the exact changes on a-b* that we have that fixes this problem ?

The best way will probably be to compare the slave init scripts on a-b and ci.*.

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

> so, what are the exact changes on a-b* that we have that fixes this problem ?

And it "fixed" it for Ubuntu version we use on android-build (Natty). They may be different/additional issues with OS versions we use on ci.*. That's why I think that we should just switch to custom AMI as a sustainable solution to this problem.

Changed in linaro-ci:
status: New → Confirmed
Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

As suggested in comment #53, I have disabled pipelining in apt and switched to the S3 mirror, and I see the following errors:

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/main/source/Sources 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/universe/source/Sources 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/main/binary-amd64/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/universe/binary-amd64/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/main/binary-i386/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise/universe/binary-i386/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/main/source/Sources 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/source/Sources 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/main/binary-amd64/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-amd64/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/main/binary-i386/Packages 403 Forbidden

W: Failed to fetch http://us-east-1.ec2.archive.ubuntu.com.s3.amazonaws.com/ubuntu/dists/precise-updates/universe/binary-i386/Packages 403 Forbidden

E: Some index files failed to download. They have been ignored, or old ones used instead.

Although the slaves were failing to start up using us-east-1.ec2.archive.ubuntu.com this morning, they are able to start fine with the main archive us-east-1.ec2.archive.ubuntu.com "now".

FYI, I am using the ami-3c994355 precise instance to bring up the slaves.
Any other suggestions as to why this could have happened with the main archive in the morning, and why it fails now with the S3 mirror?

Revision history for this message
Deepti B. Kalakeri (deeptik) wrote :

With pipelining disabled, using the main archive for packages seems to have fixed the issue.
We have not seen package installation failures because of the archive problems, and the EC2 slave deployments are quite stable now.
Closing this bug now; reopen it if it reappears.

Thanks!!!
Deepti.

Changed in linaro-ci:
status: Confirmed → Fix Committed
Changed in linaro-ci:
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
no longer affects: ubuntu