Parallelisation of the LSMGenGeo geometry construction library

Registered by Dion Weatherley

The current implementation of LSMGenGeo is serial but considerably more flexible than the built-in geometry construction subroutines of ESyS-Particle. Either shared memory (OpenMP) or distributed memory (MPI) parallelisation of LSMGenGeo would greatly increase the size of models that can be constructed.

Blueprint information

Status:
Started
Approver:
Dion Weatherley
Priority:
Medium
Drafter:
Dion Weatherley
Direction:
Approved
Assignee:
SteffenAbe
Definition:
Discussion
Series goal:
None
Implementation:
Needs Infrastructure
Milestone target:
None
Started by
Dion Weatherley

Whiteboard

Bumped priority from low to medium because for some applications the geometry generation time is starting to become an issue.
I'd suggest focusing on a shared memory parallelisation, for two reasons:
1.) It is most likely going to be _much_ less complicated than a distributed memory (i.e. MPI) approach; in particular, a shared memory parallelisation can largely be done incrementally, and
2.) with current computer architectures we have at least 4-6 cores in most desktop PCs and 8+ cores in workstations, and those numbers are rising. So there is a fair bit of potential in a shared memory parallelisation.

However, I'm not convinced that OpenMP will be the way to go. As far as I understand it, the OpenMP programming model doesn't go much beyond vectorization / loop parallelisation, which might not fit the algorithms in gengeo too well.
After looking into the issue in more detail I believe that even something like TBB might be overkill - at least initially. Doing some very rough profiling confirmed that, not surprisingly, the vast majority of the time (>98% in my test cases) is spent in generatePacking.
So the first target in an incremental parallelization would be InsertGenerator3D::fillIn(...). Looking at this function it appears that the key change needed would be to make MNTable3D::insertChecked(...) thread-safe.

Steffen
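
A minimal sketch of what a thread-safe checked insertion could look like, assuming C++11 std::thread and std::mutex rather than any particular threading library. LockedTable, Sphere and fillConcurrently are hypothetical stand-ins invented for illustration; they are not the actual MNTable3D or InsertGenerator3D interfaces, and the brute-force overlap test replaces the real neighbour-table lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical particle type, not the gengeo Sphere class.
struct Sphere { double x, y, z, r; };

// Hypothetical stand-in for a neighbour table: a single mutex makes the
// "check for overlap, then insert" sequence atomic.
class LockedTable {
public:
    // Insert s only if it overlaps no stored sphere; returns true on success.
    bool insertChecked(const Sphere& s) {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (const Sphere& o : m_spheres) {
            const double dx = s.x - o.x, dy = s.y - o.y, dz = s.z - o.z;
            const double rsum = s.r + o.r;
            if (dx * dx + dy * dy + dz * dz < rsum * rsum) return false;
        }
        m_spheres.push_back(s);
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_spheres.size();
    }

private:
    mutable std::mutex m_mutex;
    std::vector<Sphere> m_spheres;
};

// Every thread tries to insert the same lattice of candidate spheres.
// With an atomic check-and-insert, each lattice site is filled exactly once.
std::size_t fillConcurrently(LockedTable& table, int nThreads, int nPerAxis) {
    assert(nThreads > 0 && nPerAxis > 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&table, nPerAxis]() {
            for (int i = 0; i < nPerAxis; ++i)
                for (int j = 0; j < nPerAxis; ++j)
                    for (int k = 0; k < nPerAxis; ++k)
                        table.insertChecked(Sphere{2.5 * i, 2.5 * j, 2.5 * k, 1.0});
        });
    }
    for (std::thread& th : threads) th.join();
    return table.size();
}
```

Note that a single global lock only demonstrates correctness: it serialises every insertion, so by itself it would give no speed-up. A finer-grained scheme (for example one lock per grid cell or group of cells) would presumably be needed for real gains.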

I agree that we should target generatePacking for parallelisation and that threads are a good way to go. I would shy away from using libraries that are too esoteric for this, to avoid dependency issues. Something like pthreads would ease portability IMO.

I've been wondering whether a type of subdomain decomposition would be an easy parallelisation strategy. Each thread does generatePacking (or fillIn) for its own subdomain, avoiding the grid cells in the overlap region with neighbouring subdomains. Once all threads are finished packing, the "master" thread fills in the gaps along the subdomain boundaries with another global call to fillIn.

Dion
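
The subdomain idea above could be sketched roughly as follows, assuming slabs along the x axis and C++11 std::thread. All names here (Sphere, overlaps, insertChecked, packBySubdomain) are hypothetical and do not correspond to the gengeo API. As a simplification, a thread accepts only spheres lying wholly inside its own slab, instead of skipping grid cells in an overlap region; the serial pass at the end plays the role of the "master" fillIn along the boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical particle type, not the gengeo Sphere class.
struct Sphere { double x, y, z, r; };

// True if the two spheres interpenetrate.
bool overlaps(const Sphere& a, const Sphere& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double rsum = a.r + b.r;
    return dx * dx + dy * dy + dz * dz < rsum * rsum;
}

// Serial checked insertion into a single container (brute-force check).
bool insertChecked(std::vector<Sphere>& packing, const Sphere& s) {
    for (const Sphere& o : packing)
        if (overlaps(s, o)) return false;
    packing.push_back(s);
    return true;
}

// Pack candidates using nThreads slabs along x in [xmin, xmax], followed by
// a serial pass over all candidates that fills the gaps at slab boundaries.
// If nParallel is given, it receives the count placed by the parallel phase.
std::vector<Sphere> packBySubdomain(const std::vector<Sphere>& candidates,
                                    double xmin, double xmax, int nThreads,
                                    std::size_t* nParallel = nullptr) {
    assert(nThreads > 0 && xmax > xmin);
    const double width = (xmax - xmin) / nThreads;
    // Shared slab edges, so neighbouring slabs agree on their boundary.
    auto edge = [xmin, width](int i) { return xmin + i * width; };

    // One private container per thread: no locking needed while packing.
    std::vector<std::vector<Sphere>> local(nThreads);
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&, t]() {
            const double lo = edge(t), hi = edge(t + 1);
            for (const Sphere& s : candidates) {
                // Skip anything that touches the neighbouring subdomains.
                if (s.x - s.r < lo || s.x + s.r > hi) continue;
                insertChecked(local[t], s);
            }
        });
    }
    for (std::thread& th : threads) th.join();

    // Merge the per-thread results; spheres from different slabs cannot
    // overlap because each lies wholly inside its own slab.
    std::vector<Sphere> packing;
    for (const std::vector<Sphere>& part : local)
        packing.insert(packing.end(), part.begin(), part.end());
    if (nParallel) *nParallel = packing.size();

    // "Master" pass: a global checked fill that closes the boundary gaps.
    // Candidates already placed are rejected because they overlap themselves.
    for (const Sphere& s : candidates) insertChecked(packing, s);
    return packing;
}
```

For example, nine candidate spheres of radius 0.4 centred at x = 0, 1, ..., 8 in the domain [0, 8] with two threads would leave the spheres at x = 0, 4 and 8 to the final serial pass, since each of them crosses a slab edge.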
