Parallel DNS lookups
Several pages in NAV's web interface, among them IP Device Info and Machine Tracker, perform potentially large numbers of reverse DNS lookups serially. This does not scale well and causes inordinately slow page loads. NAV should be able to perform DNS lookups asynchronously (in parallel). This can be implemented using either the GNU adns library (python-adns) or Twisted's Names library (by driving the reactor manually, since we're in a synchronous environment). ipdevinfo and machinetracker should have access to a simple API that hides the details of selecting and using either a parallel or a serial method of making DNS queries.
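A minimal sketch of what such an API could look like from the caller's side. It assumes a thread pool (`concurrent.futures`) as the parallel back end purely for illustration, instead of adns or Twisted Names; `reverse_lookup_all` and its `parallel`, `resolve` and `workers` parameters are hypothetical names, not NAV code. The point is that the caller passes a list of addresses and gets a dict back without knowing which strategy ran.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def _gethostname(addr):
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyaddr(addr)[0]


def _safe_lookup(resolve, addr):
    """Return the host name for addr, or None if the lookup fails."""
    try:
        return resolve(addr)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def reverse_lookup_all(addresses, parallel=True, resolve=_gethostname, workers=20):
    """Map each address to its host name (or None).

    Hypothetical API: callers never see whether the lookups ran serially
    or in parallel; the back end can be swapped without touching them.
    """
    addresses = list(addresses)
    if parallel and len(addresses) > 1:
        # Parallel strategy: one blocking lookup per worker thread
        with ThreadPoolExecutor(max_workers=workers) as pool:
            names = list(pool.map(lambda a: _safe_lookup(resolve, a), addresses))
    else:
        # Serial strategy: the behaviour the pages have today
        names = [_safe_lookup(resolve, a) for a in addresses]
    return dict(zip(addresses, names))
```

The `resolve` parameter exists mainly so the strategy can be tested without network access; a Twisted- or adns-based back end would replace the body of the parallel branch while keeping the same signature.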
Blueprint information
- Status: Complete
- Approver: Morten Brekkevold
- Priority: Medium
- Drafter: Morten Brekkevold
- Direction: Approved
- Assignee: Christian Strand Young
- Definition: Approved
- Series goal: Accepted for 3.10
- Implementation: Implemented
- Milestone target: 3.10.0
- Started by: Christian Strand Young
- Completed by: Morten Brekkevold
Whiteboard
adns does not support IPv6; reverse lookups happen to work, but that is just luck. The only solution, as I see it, is to use this: http://
https:/
It turns out that Twisted is more than good enough for the async lookups. It works like a charm and scales very well.
-------
The whole thing might fail with Twisted 11.0; ref. http://
Using Twisted in something that is not a Twisted process is somewhat bad practice. The implementation will also block everything until every Deferred has been resolved. Starting and stopping the reactor in the same process is not supposed to work; before 11.0 this was a bug. If the same process serves multiple requests through the API, the whole thing can crash.
The current implementation, using a DeferredList, works and is better than sequential lookups, but after a chat with some "gurus" it seems our use of Twisted is not quite right:
- inlineCallbacks should be used for better readability.
- We could nudge the reactor, though it is not clear how to do that with the DeferredList.
-------
The API fails when used in ipdevinfo. This is because ipdevinfo does a forward lookup first, then a reverse lookup. Both are called from the same Twisted process, so the reactor is asked to start twice in the same process, which is not possible.
A quick solution is to use getaddrinfo() for the forward lookups and asyncdns for the reverse lookups, since those are the slowest.
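A sketch of that split, assuming only the standard library for the forward step; `forward_lookup`, `lookup_host` and the `reverse_batch` parameter are hypothetical names. `reverse_batch` stands in for whatever batch reverse-lookup function is available (for instance one built on asyncdns), so only the reverse step ever touches the asynchronous machinery.

```python
import socket


def forward_lookup(hostname):
    """Return the unique IPv4/IPv6 addresses of hostname, using getaddrinfo()."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos})


def lookup_host(hostname, reverse_batch):
    """Forward-resolve synchronously, then reverse-resolve all results in one batch.

    reverse_batch is assumed to map a list of addresses to an
    {address: name} dict; it is called exactly once per host.
    """
    addresses = forward_lookup(hostname)
    return addresses, reverse_batch(addresses)
```

Because getaddrinfo() is an ordinary blocking call, the forward step needs no reactor at all, which sidesteps the "reactor started twice" failure described above.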
Sidenote:
A test with both forward and reverse lookups on uninett-
real 0m0.122s
user 0m0.016s
sys 0m0.012s
Page load timings:
- navdev with async: 6 s, 7 s, 7 s
- navdev without async: 6 s, 5 s, 7 s
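A quick way to summarize those runs is to compare the mean with the run-to-run spread. The numbers below are copied from the timings above; the script does nothing but arithmetic.

```python
# Page-load times in seconds for navdev, copied from the runs above
timings = {
    "with async": [6, 7, 7],
    "without async": [6, 5, 7],
}

means = {label: sum(runs) / len(runs) for label, runs in timings.items()}

for label, runs in timings.items():
    # Mean plus min/max shows how the difference compares to run-to-run noise
    print(f"{label}: mean {means[label]:.2f} s, range {min(runs)}-{max(runs)} s")
```

The means come out at about 6.67 s with async and 6.00 s without. With only three runs at one-second resolution, that difference is smaller than the spread between runs, so these numbers alone do not show a DNS speed-up or slowdown on this page.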
Perhaps something else is the real bottleneck in ipdevinfo?
-------
The latest code from Christian starts and stops the reactor, which cannot be done more than once in the same process. This is why the original suggestion was to drive the reactor loop manually using reactor.iterate(). Yes, we know it's a dirty hack, but it works. I've refactored the entire asyncdns module and tested it, both manually and with the Machine Tracker changes, and it seems to be working fine now.
So the only thing missing now is an implementation for ipdevinfo, which, judging by the whiteboard, I'm guessing you've already tried working on.