New distributions for projector application
Further optimization of projector application by using special orbital distributions.
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: Alberto Garcia
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
Whiteboard
The main problem is that the loop over KB projectors is "global", and a substantial amount of work, proportional to the size of the system, has to be done by all MPI processes.
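A toy cost model makes the scaling problem concrete. This is an illustrative sketch only; the function names are hypothetical and do not correspond to actual SIESTA routines.

```python
# Toy cost model of the current "global" KB-projector loop (illustrative
# sketch; names are hypothetical, not actual SIESTA routines).

def global_loop_cost(n_projectors: int, n_procs: int) -> int:
    """Every MPI process visits every KB projector, so per-process work
    is proportional to the system size, independent of n_procs."""
    return n_projectors

def ideal_cost(n_projectors: int, n_procs: int) -> int:
    """Perfectly distributed work would shrink as processes are added."""
    return -(-n_projectors // n_procs)  # ceiling division

# Doubling the process count does not reduce the global-loop cost:
assert global_loop_cost(10_000, 8) == global_loop_cost(10_000, 16)
assert ideal_cost(10_000, 16) == 625
```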
The initial steps in the nlefsm optimization (blueprint projector-) were:
- Pre-computation of the atoms actually involved in KB overlaps with the orbitals handled by each process.
- Early exit from the loop after simpler tests.
- Other minor optimizations.
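The first two steps can be sketched as follows. This is a minimal Python model with a hypothetical data layout; the real nlefsm code is Fortran and considerably more involved.

```python
# Sketch of the two implemented ideas (hypothetical layout): precompute
# which KB atoms can overlap locally handled orbitals, then skip the
# loop body early for every other atom.

def precompute_relevant_atoms(local_orbitals, kb_atoms, overlap):
    """Record once which KB atoms overlap any locally handled orbital."""
    return {a for a in kb_atoms if any(overlap(o, a) for o in local_orbitals)}

def apply_projectors(local_orbitals, kb_atoms, overlap, work):
    relevant = precompute_relevant_atoms(local_orbitals, kb_atoms, overlap)
    n_pairs = 0
    for a in kb_atoms:
        if a not in relevant:   # early exit: cheap test, no real work done
            continue
        for o in local_orbitals:
            if overlap(o, a):
                work(o, a)
                n_pairs += 1
    return n_pairs

# 1-D toy: two orbitals, three KB atoms, overlap when closer than 2.0
n = apply_projectors([0.0, 1.0], [0.5, 5.0, 9.0],
                     lambda o, a: abs(o - a) < 2.0, lambda o, a: None)
assert n == 2  # only the atom at 0.5 overlaps, with both orbitals
```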
Further progress should now target the parallel data distribution.
Due to the use of a block-cyclic distribution, the orbitals handled by a given process are scattered all over the volume of the system, so a large number of KB projectors overlap with them. A different distribution based on orbital proximity should be used to reduce the load, making the number of KB projectors needed per process decrease as the number of MPI processes increases. There is already code (domain_decom) for setting up a domain-decomposition distribution, but it has not been exploited for this yet.
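A toy one-dimensional model illustrates why the distribution matters. This is a sketch only; real orbitals live in 3-D and the actual distributions are more involved. Orbital i sits at position i, and a KB projector site p reaches orbitals within a cutoff rc; we count the distinct projector sites one process must visit under each distribution.

```python
# Toy 1-D comparison of block-cyclic vs proximity-based orbital
# distributions (illustrative only; not the real SIESTA layout).

def kb_sites_needed(my_orbitals, n_sites, rc):
    """Distinct projector sites within rc of any locally owned orbital."""
    return {p for p in range(n_sites)
              for i in my_orbitals if abs(i - p) <= rc}

n_orb, n_procs, rc = 64, 4, 2
# Block-cyclic with blocksize 1: process 0 owns orbitals 0, 4, 8, ...
cyclic = [i for i in range(n_orb) if i % n_procs == 0]
# Proximity-based: process 0 owns one contiguous chunk of orbitals.
chunk = list(range(n_orb // n_procs))

assert len(kb_sites_needed(cyclic, n_orb, rc)) == 63  # almost every site
assert len(kb_sites_needed(chunk, n_orb, rc)) == 18   # chunk plus a halo
```

With the scattered (cyclic) ownership, process 0 must visit nearly all projector sites; with the contiguous chunk it visits only its own region plus a small halo, and that count shrinks as more processes are added.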
Note that the ScaLAPACK solver would still require a re-distribution to block-cyclic form, but this is a minor issue. A larger issue is that there is a lower limit to the size of the set of KB projectors needed by a process: even if the orbitals directly handled by a process are restricted to a single atom, several dozen atoms will have KBs that overlap with them and with those linked to them in H. So with the current algorithm we will hit another wall of inefficiency sooner or later, depending on the system size, and we should then look for other ideas.
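A back-of-the-envelope count shows where the "dozens of atoms" floor comes from. All numbers below are assumed for illustration and are not taken from any real calculation.

```python
# Rough count behind the per-process KB floor: even if a process holds
# the orbitals of a single atom, every atom within reach of those
# orbitals, or of the orbitals linked to them through H, contributes
# projectors. Density and radii below are assumed, illustrative values.
from math import pi

density = 0.007          # atoms per Bohr^3 (assumed, solid-like)
r_orb, r_kb = 5.0, 3.0   # assumed orbital and KB cutoff radii in Bohr
reach = 2 * r_orb + r_kb # direct overlap plus one hop through H
n_floor = density * 4 / 3 * pi * reach**3
assert round(n_floor) == 64  # dozens of atoms, whatever the process count
```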
<NICK: Perhaps we may test this with increasing block sizes and strictly sorted atoms. To test this we could do the following tasks:
1A. Take a geometry and strictly sort the atoms.
1B. Change the blocksize and see the effect in nlefsm.
2A. Take a geometry in any sorted configuration and generate an initial DM file.
2B. Feed the initial DM to Util/SpPivot, which can generate METIS-compatible input for the atomic sparsity pattern.
2C. Sort the geometry according to the METIS output.
2D. Change the blocksize and see the effect in nlefsm.
The above two methods should uncover which strategy may be good.
NICK>
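Step 1A above can be sketched as follows. A lexicographic sort on (z, y, x) is assumed here for the "strict sort"; any strict spatial ordering that gives nearby atoms nearby indices would do for the test.

```python
# Sketch of step 1A: "strictly sort" a geometry so that atoms close in
# space get nearby indices (assumed ordering: lexicographic on z, y, x).

def strict_sort(coords):
    """Return atom indices ordered lexicographically by (z, y, x)."""
    return sorted(range(len(coords)),
                  key=lambda i: (coords[i][2], coords[i][1], coords[i][0]))

atoms = [(1.0, 0.0, 2.0), (0.0, 0.0, 0.0), (0.5, 0.0, 1.0)]
assert strict_sort(atoms) == [1, 2, 0]  # ordered by increasing z
```

The sorted index list can then be used to permute the geometry before rerunning with different blocksizes (step 1B).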