Main page slow to load with many nodes

Bug #1066775 reported by John A Meinel
This bug affects 1 person
Affects         Status         Importance   Assigned to      Milestone
MAAS            Fix Released   Critical     Raphaël Badin
  1.2           Fix Released   Critical     Julian Edwards
maas (Ubuntu)   Fix Released   High         Unassigned
  Quantal       Fix Released   High         Unassigned
  Raring        Fix Released   High         Unassigned

Bug Description

Just going to "http://<server>/MAAS/" seems to try to load some sort of information for all Nodes, and then as it gets the data it changes the number displayed.

Once you have lots of nodes (say 8000 or so), that page becomes very slow to update, and hangs at 0 for a long time.

Tags: scaling ui

John A Meinel (jameinel)
Changed in maas:
importance: Medium → Low
Julian Edwards (julian-edwards) wrote :

Raising to high since we're supposed to be handling a few thousand nodes in the 12.10 release.

How long is a long time, out of interest?

tags: added: ui
Changed in maas:
importance: Low → High
John A Meinel (jameinel) wrote :

Offhand I don't have a machine with "only" 8000 nodes.

I have an EC2 instance that currently has 72,000. The main page loads the original view in about 5s. Afterwards, it has sat with a spinner for 1m30s, and is finally at the point to display "0 nodes in this MAAS".

At the 2.5min mark, it still says 0 nodes, and has a spinner.

At the 5 minute mark, it still has a spinner and 0 nodes listed.

It is possible that other things are broken (like txlongpoll) in this particular configuration.

From memory, I would say that it was taking minutes to show me 758 nodes available out of the 8000 that I had added to the system.

After 10min or so, Firefox asked me if I wanted to stop the script, because it seemed broken.

To test it yourself, you can do:

bin/maas shell
# Then, inside the shell that opens:
from maasserver.testing.factory import factory
from maasserver.models import NodeGroup
# Make sure the master nodegroup exists before creating nodes.
ng = NodeGroup.objects.ensure_master()
for i in range(8000):
    node = factory.make_node(mac=True, set_hostname=True)

And that should just create 8000 nodes and you can see how long it takes to load '/MAAS/' on that machine.

Raphaël Badin (rvb) wrote :

This is mainly because the call to NodesHandler.list() is very expensive. 3 SQL queries are issued for each node.

The fact that we return the fields 'macaddress_set' and 'tag_names' is responsible for this (http://paste.ubuntu.com/1285463/).

In a single node record (http://paste.ubuntu.com/1285467/), the macaddress_set bit is expensive to compute: it has to fetch the MAC address and then it fetches (again) the node itself to compute the 'resource_uri' bit. This is obviously fairly stupid but the problem is that we can't use "nodes = nodes.prefetch_related('macaddress_set__node')" because internally, piston uses queryset.iterator (https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator) to iterate over the query and this discards any optimization made with prefetch_related(). I've tried adding the prefetch_related() call and changed the source of piston to use queryset.all() instead of queryset.iterator() and it works fine.

For the tag_names stuff, it's the same problem (the fact that piston uses queryset.iterator() forbids the usage of prefetch_related) but on top of that it seems that values_list() can't work with prefetch_related either. If I change piston to use queryset.all() instead of queryset.iterator() and change tag_names to not use values_list (http://paste.ubuntu.com/1285476/), then the number of queries is constant.
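
To make the query-count argument above concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is not MAAS, Django or piston code: the table layout, CountingDB, build_db and both list_nodes_* functions are hypothetical names invented for the illustration. The first function issues one extra query per node, the second fetches all MAC addresses in one go, which is similar in spirit to what prefetch_related() does (one extra query per relation rather than one per row).

```python
import sqlite3


class CountingDB:
    """Hypothetical helper: wraps a connection and counts SELECTs."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(num_nodes):
    """Create an in-memory database with one MAC address per node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, hostname TEXT)")
    conn.execute(
        "CREATE TABLE macaddress ("
        "id INTEGER PRIMARY KEY, node_id INTEGER, mac TEXT)")
    for i in range(1, num_nodes + 1):
        conn.execute(
            "INSERT INTO node (id, hostname) VALUES (?, ?)",
            (i, "node-%d" % i))
        conn.execute(
            "INSERT INTO macaddress (node_id, mac) VALUES (?, ?)",
            (i, "mac-%d" % i))
    return CountingDB(conn)


def list_nodes_one_query_per_node(db):
    """1 query for the nodes, then 1 more query for every single node."""
    result = []
    nodes = db.query("SELECT id, hostname FROM node ORDER BY id")
    for node_id, hostname in nodes:
        macs = db.query(
            "SELECT mac FROM macaddress WHERE node_id = ? ORDER BY id",
            (node_id,))
        result.append(
            {"hostname": hostname, "macaddress_set": [m for (m,) in macs]})
    return result


def list_nodes_batched(db):
    """2 queries in total, however many nodes there are."""
    macs_by_node = {}
    rows = db.query("SELECT node_id, mac FROM macaddress ORDER BY id")
    for node_id, mac in rows:
        macs_by_node.setdefault(node_id, []).append(mac)
    nodes = db.query("SELECT id, hostname FROM node ORDER BY id")
    return [
        {"hostname": hostname, "macaddress_set": macs_by_node.get(node_id, [])}
        for node_id, hostname in nodes]
```

With N nodes the first variant issues N + 1 queries and the second always issues 2, while both return the same data. The trade-off is that the batched variant holds all the MAC addresses in memory at once.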

Julian Edwards (julian-edwards) wrote :

John, thanks for clarifying.

Raphaël, Ok, let's get that fix in!

Changed in maas:
importance: High → Critical
Changed in maas:
milestone: none → 12.10-stabilization
Raphaël Badin (rvb) wrote :

> Raphaël, Ok, let's get that fix in!

Not really easy because, as I say in my comment, piston uses queryset.iterator() internally when serializing the objects and this forbids the usage of prefetch_related().

Maybe there is a way to trick piston into not using iterator()… but it's really not obvious how.

John A Meinel (jameinel) wrote :

If the result is a queryset, can't you just call list(queryset) and it will evaluate early, and then it will iterate over the result?

I suppose it depends if you actually want to stream the data out to the caller, but if you are okay with using all() then it seems fine to just use list().

John A Meinel (jameinel) wrote :

To be clear, change NodesHandler.list to finish with:

- return nodes.order_by('id')
+ return list(nodes.order_by('id'))

I think that will give us whatever prefetching we want, at the expense of having to load all the nodes into memory. But that is what we pay for the prefetch.

Raphaël Badin (rvb) wrote :

Well, John's suggestion is not working because the problem comes from the fact that we have to fetch (and piston uses iterator() for this) all the related MAC addresses for all the nodes.
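
A toy model of this point, under loud assumptions: FakeNode, lazy_nodes, serialize and count_queries are all hypothetical names, no Django or piston code is involved, and no prefetching is modelled. It only shows that evaluating the outer sequence early with list() does not change how many related lookups the serializer performs afterwards.

```python
class FakeNode:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, node_id, counter):
        self.id = node_id
        self._counter = counter

    def macaddresses(self):
        # Each call stands for one extra query for this node's MACs.
        self._counter["queries"] += 1
        return ["mac-of-%d" % self.id]


def lazy_nodes(count, counter):
    """Stand-in for a lazily evaluated query over the nodes."""
    counter["queries"] += 1  # the single query that fetches the nodes
    for i in range(count):
        yield FakeNode(i, counter)


def serialize(nodes):
    """Stand-in for a serializer that follows the relation per node."""
    return [{"id": n.id, "macaddress_set": n.macaddresses()} for n in nodes]


def count_queries(evaluate_early, count=100):
    counter = {"queries": 0}
    nodes = lazy_nodes(count, counter)
    if evaluate_early:
        nodes = list(nodes)  # the equivalent of list(queryset)
    serialize(nodes)
    return counter["queries"]
```

In this model both count_queries(False) and count_queries(True) come out at one query for the nodes plus one per node; only the moment at which the outer query runs changes.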

Raphaël Badin (rvb) wrote :

I think I have a plan which involves creating a tiny specialized emitter. I'll try that now but I think it'll work.

Raphaël Badin (rvb) wrote :

The structure of the Python code inside piston actually prevents me from subclassing and overriding the methods I'd need to override to fix this properly: I would need to override _related(), which is defined *inside* construct() (see https://bitbucket.org/jespern/django-piston/src/882d38485abc/piston/emitters.py#cl-136).

I'm looking into alternative ways to fix this.
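
A small sketch of why subclassing does not help here. The names construct() and _related() mirror the ones mentioned above, but the classes and method bodies are hypothetical and are not piston's actual code. A function defined inside a method is a local name of that method, so a subclass attribute with the same name is never consulted.

```python
class Emitter:
    """Hypothetical base class with a helper nested inside construct()."""

    def construct(self, data):
        def _related(items):
            # Local to construct(): not an attribute of the class.
            return [("per-item", item) for item in items]

        return _related(data)


class BatchedEmitter(Emitter):
    """Tries (and fails) to replace the nested helper."""

    def _related(self, items):
        # Never called by construct(), which only sees its own local
        # _related function.
        return [("batched", item) for item in items]
```

Calling BatchedEmitter().construct([1, 2]) still goes through the nested per-item helper, so the only way to change that behaviour by subclassing is to override construct() as a whole.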

Raphaël Badin (rvb)
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
status: Triaged → In Progress
Raphaël Badin (rvb)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in maas:
milestone: 12.10-stabilization → none
Changed in maas:
status: Fix Committed → Fix Released
James Page (james-page)
Changed in maas (Ubuntu Quantal):
status: New → Triaged
Changed in maas (Ubuntu Raring):
status: New → Triaged
Changed in maas (Ubuntu Quantal):
importance: Undecided → High
Changed in maas (Ubuntu Raring):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.2+bzr1349+dfsg-0ubuntu1

---------------
maas (1.2+bzr1349+dfsg-0ubuntu1) raring; urgency=low

  * New upstream bugfix release. Fixes:
    - The DNS configuration is not created if maas-dns is installed after
      the DNS config has been set up (LP: #1085865).
    - IPMI detection ends up with power_address of 0.0.0.0 (LP: #1064224)
    - Main page slow to load with many nodes (LP: #1066775)
    - maas-cluster-controller doesn't have images for
      provisioning (LP: #1068843)
    - Filestorage is unique to each appserver instance (LP: #1069734)
    - import_pxe_files does not include quantal (LP: #1069850)
    - maas-cli nodes new incomplete documentation (LP: #1070522)
    - DNS forward zone ends up with nonsensical entries (LP: #1070765)
    - The hostname of a node can still be changed once the node is in
      use. (LP: #1070774)
    - The zone name (attached to a cluster controller) can still be changed
      when it contains in-use nodes and DNS is managed. (LP: #1070775)
    - Duplicated prefix in the url used by the CLI (LP: #1075597)
    - Not importing Quantal boot images (LP: #1077180)
    - Nodes are deployed with wrong domain name. (LP: #1078744)
    - src/maasserver/api.py calls request.data.getlist with a 'default'
      parameter. That parameter is not supported by Django 1.3. (LP: #1080673)
    - API calls that return a node leak private data (LP: #1034318)
    - MAAS hostnames should be 5 easily disambiguated characters (LP: #1058998)
    - URI in API description wrong when accessing machine via alternative
      interface. (LP: #1059645)
    - Oops when renaming nodegroup w/o interface (LP: #1077075)
    - Error in log when using 'Start node' button: MAASAPINotFound: No user
      data available for this node. (LP: #1069603)

  [ Raphaël Badin ]
  * debian/maas-dns.postinst: Call write_dns_config (LP: #1085865).
  * debian/maas-dns.postinst: fix permissions and group ownership of
    file /etc/bind/maas/named.conf.rndc.maas. (LP: #1066935)

  [ Julian Edwards ]
  * debian/maas-region-controller.install: Remove installation of maas-gc; it
    is no longer required as upstream no longer stores files in the filesystem.
    (LP: #1069734)
  * debian/maas-cluster-controller.postinst: Ensure that /etc/maas/pserv.yaml
    is updated when reconfiguring. (LP: #1081212)

  [ Andres Rodriguez ]
  * debian/control:
    - maas-cluster-controller Conflicts with tftpd-hpa (LP: #1076028)
    - maas-dns: Conflicts with dnsmasq
    - Drop Dependency on rabbitmq-server for maas-cluster-controller.
      (LP: #1072744)
    - Add conflicts/replaces for maas-region-controller to
      maas-cluster-controller.
  * debian/maas-cluster-controller.config: If URL has been detected, add
    /MAAS if it doesn't contain it. This helps upgrades from versions where
    DEFAULT_MAAS_URL didn't use /MAAS.
  * Install maas-import-pxe-files and related files with
    maas-cluster-controller, as well as configure tgtd, as
    maas-region-controller no longer stores images. Thanks to Jeroen
    Vermuelen.

  [ Gavin Panella ]
  * debian/extras/99-maas: squashfs image download is no longer needed.
  * debian/maas-clu...

Changed in maas (Ubuntu Raring):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.2+bzr1373+dfsg-0ubuntu1

---------------
maas (1.2+bzr1373+dfsg-0ubuntu1) quantal-proposed; urgency=low

  * MAAS Stable Release Update (LP: #1109283):
    This SRU brings a new upstream release of MAAS that removes
    the usage of a cobbler code copy, 'maas-provision' as well as
    several bug fixes. Exception has been granted by the Technical
    Board to proceed. More information can be found in:
    https://lists.ubuntu.com/archives/ubuntu-devel-announce/2013-February/001012.html

  [ Andres Rodriguez ]
  * debian/control:
    - Change Conflicts/Replaces for Breaks/Replaces.
    - Conflicts on tftpd-hpa and dnsmasq.
    - Do not pre-depends, but Depends on ${misc:Depends} for 'maas'.

  [ Steve Langasek ]
  * postinst scripts are never called with 'reconfigure' as the script
    argument. Remove references to this (mythical) invocation.
  * always call 'set -e' from maintainer scripts instead of passing 'sh -e'
    as the interpreter, so that scripts will behave correctly when run via
    'sh -x'.
  * invoke-rc.d is never allowed to not exist - simplify scripts (and make
    them better policy-compliant) by invoking unconditionally. (The only
    possible exception is in the postrm, where it's *theoretically* possible
    for invoke-rc.d to be missing if the user has completely stripped
    down their system; that's a fairly unreasonable corner case, but we
    might as well be correct if it ever happens.)
  * db_get+db_set is a no-op; don't call db_set to push back a value we just
    got from db_get.
  * Omit superfluous calls to 'exit 0' at the end of each script.
  * Remove maas-cluster-controller prerm script, which called debconf for no
    reason.
  * Don't invoke debconf in the postrm script either, debhelper already does
    this for us.
  * Other miscellaneous maintainer script fixes
  * debian/maas-common.postinst: call adduser and addgroup unconditionally;
    the tools are already designed to DTRT, we don't need to check for the
    user/group existence before calling them nor should we worry about
    calling them only once on first install.
  * debian/maas-common.postrm: delete the maas group, not just the user,
    as the comment in the code implies we should do.
 -- Andres Rodriguez <email address hidden> Thu, 07 Mar 2013 14:22:35 -0500

Changed in maas (Ubuntu Quantal):
status: Triaged → Fix Released