New distributions for projector application
Further optimization of projector application by using special orbital distributions.
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: Alberto Garcia
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
Whiteboard
The main problem is that the loop over KB projectors is "global", and a substantial amount of work, proportional to the size of the system, has to be done by all MPI processes.
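A toy cost model makes the scaling problem concrete. This is an illustrative sketch only; the function names are hypothetical and do not correspond to actual SIESTA routines.

```python
# Toy cost model of the current "global" KB-projector loop (illustrative
# sketch; names are hypothetical, not actual SIESTA routines).

def global_loop_cost(n_projectors: int, n_procs: int) -> int:
    """Every MPI process visits every KB projector, so per-process work
    is proportional to the system size, independent of n_procs."""
    return n_projectors

def ideal_cost(n_projectors: int, n_procs: int) -> int:
    """Perfectly distributed work would shrink as processes are added."""
    return -(-n_projectors // n_procs)  # ceiling division

# Doubling the process count does not reduce the global-loop cost:
assert global_loop_cost(10_000, 8) == global_loop_cost(10_000, 16)
assert ideal_cost(10_000, 16) == 625
```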
The initial steps in the nlefsm optimization (blueprint projector-) were:
- Pre-computation of the atoms actually involved in KB overlaps with the orbitals handled by each process.
- Early exit from the loop after simpler tests.
- Other minor optimizations.
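The first two steps can be sketched as follows. This is a minimal Python model with a hypothetical data layout; the real nlefsm code is Fortran and considerably more involved.

```python
# Sketch of the two implemented ideas (hypothetical layout): precompute
# which KB atoms can overlap locally handled orbitals, then skip the
# loop body early for every other atom.

def precompute_relevant_atoms(local_orbitals, kb_atoms, overlap):
    """Record once which KB atoms overlap any locally handled orbital."""
    return {a for a in kb_atoms if any(overlap(o, a) for o in local_orbitals)}

def apply_projectors(local_orbitals, kb_atoms, overlap, work):
    relevant = precompute_relevant_atoms(local_orbitals, kb_atoms, overlap)
    n_pairs = 0
    for a in kb_atoms:
        if a not in relevant:   # early exit: cheap test, no real work done
            continue
        for o in local_orbitals:
            if overlap(o, a):
                work(o, a)
                n_pairs += 1
    return n_pairs

# 1-D toy: two orbitals, three KB atoms, overlap when closer than 2.0
n = apply_projectors([0.0, 1.0], [0.5, 5.0, 9.0],
                     lambda o, a: abs(o - a) < 2.0, lambda o, a: None)
assert n == 2  # only the atom at 0.5 overlaps, with both orbitals
```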
Further progress should now target the parallel data distribution.
Due to the use of a block-cyclic distribution, the orbitals handled by a given process are scattered all over the volume of the system, so a large number of KB projectors overlap with them. A different distribution based on orbital proximity should be used to reduce the load, making the number of KB projectors needed per process decrease as the number of MPI processes increases. There is already code (domain_decom) for setting up a domain-decomposition distribution, but it has not been exploited for this yet.
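A toy one-dimensional model illustrates why the distribution matters. This is a sketch only; real orbitals live in 3-D and the actual distributions are more involved. Orbital i sits at position i, and a KB projector site p reaches orbitals within a cutoff rc; we count the distinct projector sites one process must visit under each distribution.

```python
# Toy 1-D comparison of block-cyclic vs proximity-based orbital
# distributions (illustrative only; not the real SIESTA layout).

def kb_sites_needed(my_orbitals, n_sites, rc):
    """Distinct projector sites within rc of any locally owned orbital."""
    return {p for p in range(n_sites)
              for i in my_orbitals if abs(i - p) <= rc}

n_orb, n_procs, rc = 64, 4, 2
# Block-cyclic with blocksize 1: process 0 owns orbitals 0, 4, 8, ...
cyclic = [i for i in range(n_orb) if i % n_procs == 0]
# Proximity-based: process 0 owns one contiguous chunk of orbitals.
chunk = list(range(n_orb // n_procs))

assert len(kb_sites_needed(cyclic, n_orb, rc)) == 63  # almost every site
assert len(kb_sites_needed(chunk, n_orb, rc)) == 18   # chunk plus a halo
```

With the scattered (cyclic) ownership, process 0 must visit nearly all projector sites; with the contiguous chunk it visits only its own region plus a small halo, and that count shrinks as more processes are added.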
Note that the ScaLAPACK solver would still require a re-distribution to block-cyclic form, but this is a minor issue. A larger issue is that there is a lower limit to the size of the set of KB projectors needed by a process: even if the orbitals directly handled by a process are restricted to a single atom, several dozen atoms will have KBs that overlap with them and with those linked to them in H. So with the current algorithm we will hit another wall of inefficiency sooner or later, depending on the system size, and we should then look for other ideas.
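A back-of-the-envelope count shows where the "dozens of atoms" floor comes from. All numbers below are assumed for illustration and are not taken from any real calculation.

```python
# Rough count behind the per-process KB floor: even if a process holds
# the orbitals of a single atom, every atom within reach of those
# orbitals, or of the orbitals linked to them through H, contributes
# projectors. Density and radii below are assumed, illustrative values.
from math import pi

density = 0.007          # atoms per Bohr^3 (assumed, solid-like)
r_orb, r_kb = 5.0, 3.0   # assumed orbital and KB cutoff radii in Bohr
reach = 2 * r_orb + r_kb # direct overlap plus one hop through H
n_floor = density * 4 / 3 * pi * reach**3
assert round(n_floor) == 64  # dozens of atoms, whatever the process count
```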
<NICK: Perhaps we may test this with increasing block sizes and strictly sorted atoms. To test this we could do the following tasks:
1A. Take a geometry and strictly sort the atoms.
1B. Change the blocksize and see the effect in nlefsm.
2A. Take a geometry in any sorted configuration and generate an initial DM file.
2B. Feed the initial DM to Util/SpPivot, which can generate METIS-compatible input for the atomic sparsity pattern.
2C. Sort the geometry according to the METIS output.
2D. Change the blocksize and see the effect in nlefsm.
The above two methods should uncover which strategy may be good.
NICK>
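Step 1A above can be sketched as follows. A lexicographic sort on (z, y, x) is assumed here for the "strict sort"; any strict spatial ordering that gives nearby atoms nearby indices would do for the test.

```python
# Sketch of step 1A: "strictly sort" a geometry so that atoms close in
# space get nearby indices (assumed ordering: lexicographic on z, y, x).

def strict_sort(coords):
    """Return atom indices ordered lexicographically by (z, y, x)."""
    return sorted(range(len(coords)),
                  key=lambda i: (coords[i][2], coords[i][1], coords[i][0]))

atoms = [(1.0, 0.0, 2.0), (0.0, 0.0, 0.0), (0.5, 0.0, 1.0)]
assert strict_sort(atoms) == [1, 2, 0]  # ordered by increasing z
```

The sorted index list can then be used to permute the geometry before rerunning with different blocksizes (step 1B).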