Comment 6 for bug 559230

C de-Avillez (hggdh2) wrote :

I tested it again this evening, with Dustin monitoring. We again used lucid-amd64-topo2, and based the installs on the daily server/UEC images (releases.ubuntu.com is not accessible from tamarind, so I could not use Beta2).

Installation was uneventful.

I then ran the config_single.yaml test. Instances started without problems, but neither the script nor I (manually) could ssh into them; every attempt failed with a timeout.

Just for the sake of it (I do not know what is, or is not, blocked by the firewall(s)), I ran a traceroute against one of the instances from cempedak. It reached marula (the CC), and every hop after that showed only asterisks (no reply).

I then logged in to marula and ssh-ed to an instance I had manually started. I *could* reach it (but failed, correctly, on public key -- I had not added a new key for this run, and the ones used by uec_test.py had already been revoked).
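
For anyone wanting to repeat the reachability check without telnet, here is a minimal sketch in Python. It is not part of uec_test.py; the function name probe_tcp and its return labels are made up for illustration. The point is only to tell a timeout (packets dropped somewhere on the path, which is what a routing or firewall problem tends to look like) apart from a refused connection (host reached, nothing listening) and from an open port.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'unreachable'.
    Hypothetical helper for illustration; not taken from uec_test.py.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the deadline: packets are likely being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError:
        # Anything else, e.g. "no route to host" or a name resolution failure.
        return "unreachable"
```

Running probe_tcp against the instance address from both cempedak and marula should show whether the two hosts see different results, which is the comparison I made by hand above.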

This is the log of the IRC chat between Dustin and myself:

2010-04-13 18:25:32 hggdh kirkland: nodes registered, running a single-instance test now
2010-04-13 18:33:02 hggdh kirkland: test running, log is being written to ~/uec-testing-scripts/resutls/single*
2010-04-13 18:33:09 hggdh kirkland: on cempedak
2010-04-13 18:33:20 kirkland hggdh: cool, and you can ssh in?
2010-04-13 18:35:08 hggdh kirkland: negative
2010-04-13 18:35:19 kirkland hggdh: cannot ssh in
2010-04-13 18:35:25 hggdh kirkland: ssh fails on timeout
2010-04-13 18:35:31 hggdh really sounds like routing
2010-04-13 18:36:18 kirkland hggdh: interesting
2010-04-13 18:36:25 kirkland hggdh: okay, put the log somewhere for me to check out
2010-04-13 18:38:27 hggdh kirkland: k. I just ran one instance by hand, and then tried to ssh into it -- fails with a timeout
2010-04-13 18:39:25 kirkland hggdh: okay, that's easy to reproduce
2010-04-13 18:39:27 kirkland hggdh: log?
2010-04-13 18:42:29 hggdh kirkland: people.c.c/~cerdea/single_test.log.2010-04-13_193218
2010-04-13 18:46:15 kirkland hggdh: rsync -aP people.canonical.com:~cerdea/single_test.log.2010-04-13_193218 .
2010-04-13 18:46:20 kirkland hggdh: file not found
2010-04-13 18:47:04 kirkland hggdh: found it, public_html
2010-04-13 18:47:27 hggdh heh. one wants it on public_html, another on the root ;-)
2010-04-13 18:49:35 kirkland hggdh: ls -alF users/admin/uectest-k0.priv
2010-04-13 18:50:07 kirkland hggdh: and cat that file, make sure it matches -----BEGIN RSA PRIVATE KEY-----
2010-04-13 18:50:33 kirkland hggdh: is that instance still running?
2010-04-13 18:50:43 kirkland hggdh: can you telnet to its port 22 ?
2010-04-13 18:51:03 hggdh kirkland: yes, the instance is still running
2010-04-13 18:52:00 hggdh kirkland: the priv key seems kosher
2010-04-13 18:52:27 kirkland hggdh: and telnet ?
2010-04-13 18:53:50 hggdh kirkland: timeout. Also, a traceroute (FWIW) reaches marula (the CC) and stops there
2010-04-13 18:54:07 kirkland hggdh: oh, interesting
2010-04-13 18:54:22 kirkland hggdh: that's got to be it
2010-04-13 18:54:25 hggdh kirkland: let me try to ssh from marula
2010-04-13 18:54:38 kirkland hggdh: yeah
2010-04-13 18:54:43 kirkland hggdh: scp the priv key over
2010-04-13 18:54:47 kirkland hggdh: and try from there
2010-04-13 18:55:15 hggdh kirkland: first test -- reachability -- successful
2010-04-13 18:55:21 hggdh will move the priv key there now
2010-04-13 18:55:21 kirkland hggdh: ack
2010-04-13 19:00:03 kirkland hggdh: and?
2010-04-13 19:00:13 hggdh kirkland: getting permission denied (pub key)
2010-04-13 19:00:30 hggdh kirkland: but the important piece is that I am *reaching* the instance
2010-04-13 19:00:34 kirkland hggdh: hrm, odd
2010-04-13 19:00:38 kirkland hggdh: agreed on that point
2010-04-13 19:00:49 kirkland hggdh: and you're doing ssh -i ./whatever.priv ubuntu@ip ?
2010-04-13 19:00:58 kirkland hggdh: and whatever.priv is perm'd 600
2010-04-13 19:01:17 hggdh kirkland: yes indeed, and will check again
2010-04-13 19:01:26 hggdh but on wrong permission ssh would bail out
2010-04-13 19:03:41 hggdh kirkland: and the full command is ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ./uectest-k0.priv ubuntu@10.55.55.100
2010-04-13 19:04:07 hggdh although sort of overworked, I admit
2010-04-13 19:04:24 kirkland hggdh: hmm, okay
2010-04-13 19:04:35 kirkland hggdh: it may be that the guest is having trouble getting out
2010-04-13 19:04:48 kirkland hggdh: or at least to have the key injected
2010-04-13 19:04:58 kirkland hggdh: okay, add your traceroute findings to that bug
2010-04-13 19:05:11 kirkland hggdh: and email mathias (cc me) the link to that log
2010-04-13 19:05:33 kirkland hggdh: i'm reassured that this appears to be a networking issue, but we'll need to get to the bottom of it
2010-04-13 19:05:38 kirkland hggdh: i gotta run for the night
2010-04-13 19:05:41 kirkland hggdh: thanks dude!
2010-04-13 19:05:55 hggdh kirkland: will do, and g'night
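
The two key-file checks Dustin asked for above (the file is perm'd 600, and it starts with the RSA private key header) can also be scripted. This is only a rough sketch under my own assumptions: check_private_key is a hypothetical helper, and the permission test (no group or other bits set) is an approximation of what ssh enforces, not a copy of it.

```python
import os
import stat

RSA_HEADER = "-----BEGIN RSA PRIVATE KEY-----"


def check_private_key(path):
    """Return a list of problems found with a private key file.

    Hypothetical helper for illustration; an empty list means both checks passed.
    """
    problems = []

    # Permission bits only (e.g. 0o600); any group/other bit counts as too open.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append("permissions %o allow group/other access" % mode)

    # The first line of a PEM-encoded RSA key is the BEGIN marker.
    with open(path, "r", errors="replace") as fh:
        first_line = fh.readline().strip()
    if first_line != RSA_HEADER:
        problems.append("first line is not the RSA private key header")

    return problems
```

For example, check_private_key("./uectest-k0.priv") would be the scripted equivalent of the ls and cat checks in the log. It says nothing about whether the matching public key was injected into the guest, which is the open question here.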