Parallelisation of the LSMGenGeo geometry construction library
The current implementation of LSMGenGeo is serial but considerably more flexible than the built-in geometry construction subroutines of ESyS-Particle. Either shared memory (OpenMP) or distributed memory (MPI) parallelisation of LSMGenGeo would greatly increase the size of models that can be constructed.
Blueprint information
- Status: Started
- Approver: Dion Weatherley
- Priority: Medium
- Drafter: Dion Weatherley
- Direction: Approved
- Assignee: SteffenAbe
- Definition: Discussion
- Series goal: None
- Implementation: Needs Infrastructure
- Milestone target: None
- Started by: Dion Weatherley
- Completed by:
Whiteboard
Bumped priority from low to medium because for some applications the geometry generation time is starting to become an issue.
I'd suggest focusing on a shared memory parallelisation, for two reasons:
1.) It is most likely going to be _much_ less complicated than a distributed memory (i.e. MPI) approach; in particular, a shared memory parallelisation can largely be done incrementally, and
2.) with current computer architectures we have at least 4-6 cores in most desktop PCs and 8+ cores in workstations, and those numbers are rising. So there is a fair bit of potential in a shared memory parallelisation.
However, I'm not convinced that OpenMP will be the way to go. As far as I understand it, the OpenMP programming model doesn't go much beyond vectorization / loop parallelisation, which might not fit the algorithms in gengeo too well.
After looking into the issue in more detail, I believe that even something like TBB might be overkill, at least initially. Some very rough profiling confirmed that, not surprisingly, the vast majority of the time (>98% in my test cases) is spent in generatePacking.
So the first target in an incremental parallelization would be InsertGenerator.
Steffen
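For anyone who wants to repeat the rough profiling on their own test cases, a coarse per-stage timer is usually enough to see where the time goes. The following is only a minimal sketch: `StageTimer` and the stage names are hypothetical and not part of gengeo, and the stages passed in would have to wrap whatever calls your own script makes.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <string>

// Hypothetical helper (not part of gengeo): accumulates wall-clock time per named stage.
class StageTimer {
    std::map<std::string, double> seconds_;

public:
    // Run one stage and add its elapsed wall-clock time to the named bucket.
    void run(const std::string& name, const std::function<void()>& stage) {
        const auto t0 = std::chrono::steady_clock::now();
        stage();
        const auto t1 = std::chrono::steady_clock::now();
        seconds_[name] += std::chrono::duration<double>(t1 - t0).count();
    }

    // Total time over all recorded stages.
    double total() const {
        double sum = 0.0;
        for (const auto& kv : seconds_) sum += kv.second;
        return sum;
    }

    // Fraction of the total spent in one stage (0 if unknown or nothing was measured).
    double fraction(const std::string& name) const {
        const auto it = seconds_.find(name);
        const double t = total();
        return (it == seconds_.end() || t <= 0.0) ? 0.0 : it->second / t;
    }
};
```

Wrapping the packing step and the remaining steps in separate `run` calls and comparing `fraction` values gives the same kind of rough split as quoted above. A real profiler would give per-function detail.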
I agree that we should target generatePacking for parallelisation and that threads are a good way to go. I would shy away from libraries that are too esoteric, to avoid dependency issues. Something like pthreads would ease portability IMO.
I've been wondering whether a type of subdomain decomposition would be an easy parallelisation strategy. Each thread does generatePacking (or fillIn) for its own subdomain, avoiding the grid cells in the overlap region with neighbouring subdomains. Once all threads are finished packing, the "master" thread fills in the gaps along the subdomain boundaries with another global call to fillIn.
Dion
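To make the subdomain idea above concrete, here is a minimal 2D sketch under heavy assumptions. It uses C++11 `std::thread` (typically a thin wrapper over pthreads on Linux), fixed-radius circles, vertical slabs as subdomains, and a brute-force overlap test instead of a neighbour grid. All names (`Circle`, `fillRegion`, `packParallel`, `isValidPacking`) are hypothetical and do not correspond to the gengeo API.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

struct Circle { double x, y, r; };

// True if c overlaps any circle already in `placed` (brute force, O(n)).
bool overlapsAny(const Circle& c, const std::vector<Circle>& placed) {
    for (const Circle& p : placed) {
        const double dx = c.x - p.x, dy = c.y - p.y, rs = c.r + p.r;
        if (dx * dx + dy * dy < rs * rs) return true;
    }
    return false;
}

// Random insertion of circles lying entirely inside [x0,x1] x [0,h].
// A candidate is kept only if it overlaps nothing in `fixed` or `out`.
void fillRegion(double x0, double x1, double h, double r, int tries, unsigned seed,
                const std::vector<Circle>& fixed, std::vector<Circle>& out) {
    if (x1 - x0 <= 2 * r || h <= 2 * r) return;  // region too small for one circle
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> ux(x0 + r, x1 - r), uy(r, h - r);
    for (int i = 0; i < tries; ++i) {
        const Circle c{ux(rng), uy(rng), r};
        if (!overlapsAny(c, fixed) && !overlapsAny(c, out)) out.push_back(c);
    }
}

// Each thread packs its own slab; circles stay inside the slab, so threads
// never need to see each other's data. The calling ("master") thread then
// runs one global fill that closes the gaps along the slab boundaries.
std::vector<Circle> packParallel(double w, double h, double r,
                                 int nThreads, int triesPerThread) {
    assert(nThreads > 0);
    const std::vector<Circle> none;                   // nothing pre-placed
    std::vector<std::vector<Circle>> local(nThreads); // one output list per thread
    std::vector<std::thread> workers;
    const double slab = w / nThreads;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            fillRegion(t * slab, (t + 1) * slab, h, r, triesPerThread,
                       1000u + t, none, local[t]);
        });
    }
    for (std::thread& th : workers) th.join();

    std::vector<Circle> all;
    for (const auto& part : local) all.insert(all.end(), part.begin(), part.end());
    fillRegion(0.0, w, h, r, triesPerThread, 42u, none, all);  // serial gap fill
    return all;
}

// Checks that every circle is inside the box and no two circles overlap.
bool isValidPacking(const std::vector<Circle>& cs, double w, double h) {
    const double eps = 1e-9;  // tolerance for floating-point round-off
    for (std::size_t i = 0; i < cs.size(); ++i) {
        const Circle& a = cs[i];
        if (a.x - a.r < -eps || a.x + a.r > w + eps ||
            a.y - a.r < -eps || a.y + a.r > h + eps) return false;
        for (std::size_t j = i + 1; j < cs.size(); ++j) {
            const double dx = a.x - cs[j].x, dy = a.y - cs[j].y, rs = a.r + cs[j].r;
            if (dx * dx + dy * dy < rs * rs - eps) return false;
        }
    }
    return true;
}
```

Because every circle placed by a worker lies fully inside that worker's slab, no locking is needed during the parallel phase. The price is that the final gap fill is serial, so how well this scales would depend on how much of the volume sits near subdomain boundaries. On some toolchains the sketch needs to be built with `-pthread`.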