ESyS-Particle

Parallelisation of the LSMGenGeo geometry construction library

Registered by Dion Weatherley on 2009-02-27

The current implementation of LSMGenGeo is serial but considerably more flexible than the built-in geometry construction subroutines of ESyS-Particle. Either shared memory (OpenMP) or distributed memory (MPI) parallelisation of LSMGenGeo would greatly increase the size of models that can be constructed.

Blueprint information

Status: Started

Approver: Dion Weatherley

Priority: Medium

Drafter: Dion Weatherley

Direction: Approved

Assignee: SteffenAbe

Definition: Discussion

Series goal: None

Implementation: Needs Infrastructure

Milestone target: None

Started by: Dion Weatherley on 2011-10-05

Whiteboard

Bumped priority from low to medium because, for some applications, geometry generation time is starting to become an issue.
I'd suggest focusing on a shared memory parallelisation, for two reasons:
1.) It is most likely going to be _much_ less complicated than a distributed memory (i.e. MPI) approach; in particular, a shared memory parallelisation can largely be done incrementally, and
2.) with current computer architectures we have at least 4-6 cores in most desktop PCs and 8+ cores in workstations, and those numbers are rising. So there is a fair bit of potential in a shared memory parallelisation.

However, I'm not convinced that OpenMP will be the way to go. As far as I understand it, the OpenMP programming model doesn't go much beyond vectorisation / loop parallelisation, which might not fit the algorithms in gengeo too well.
After looking into the issue in more detail, I believe that even something like TBB might be overkill, at least initially. Some very rough profiling confirmed that, not surprisingly, the vast majority of the time (>98% in my test cases) is spent in generatePacking.
So the first target in an incremental parallelisation would be InsertGenerator3D::fillIn(...). Looking at this function, it appears that the key change needed would be to make MNTable3D::insertChecked(...) thread-safe.

Steffen
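
To make the "thread-safe insertChecked" idea concrete, here is a minimal sketch using only standard C++ threads and a mutex. It is not the gengeo code: Sphere, SimpleTable, fillParallel and anyOverlap are hypothetical stand-ins, and the real MNTable3D::insertChecked(...) has a different signature and a grid-based neighbour search. The only point illustrated is that the overlap check and the insert have to happen under the same lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a gengeo sphere (not the real Sphere class).
struct Sphere {
    double x, y, z, r;
};

// Hypothetical stand-in for MNTable3D: a flat list guarded by one mutex.
class SimpleTable {
public:
    // Insert s only if it overlaps no stored sphere. Check and insert share
    // one lock, so two threads cannot both pass the check and then insert
    // mutually overlapping spheres.
    bool insertChecked(const Sphere& s) {
        assert(s.r > 0.0);
        std::lock_guard<std::mutex> lock(m_mutex);
        for (const Sphere& o : m_spheres) {
            const double dx = s.x - o.x, dy = s.y - o.y, dz = s.z - o.z;
            const double rsum = s.r + o.r;
            if (dx * dx + dy * dy + dz * dz < rsum * rsum) return false;
        }
        m_spheres.push_back(s);
        return true;
    }

    // Copy of the current contents, taken under the lock.
    std::vector<Sphere> snapshot() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_spheres;
    }

private:
    mutable std::mutex m_mutex;
    std::vector<Sphere> m_spheres;
};

// Fill the unit cube by random insertion from nThreads concurrent threads.
std::vector<Sphere> fillParallel(unsigned nThreads, int triesPerThread) {
    SimpleTable table;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&table, t, triesPerThread] {
            std::mt19937 rng(1234u + t);  // one generator per thread
            std::uniform_real_distribution<double> pos(0.0, 1.0);
            std::uniform_real_distribution<double> rad(0.02, 0.05);
            for (int i = 0; i < triesPerThread; ++i) {
                table.insertChecked({pos(rng), pos(rng), pos(rng), rad(rng)});
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return table.snapshot();
}

// True if any two spheres in v overlap (same criterion as insertChecked).
bool anyOverlap(const std::vector<Sphere>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            const double dx = v[i].x - v[j].x, dy = v[i].y - v[j].y,
                         dz = v[i].z - v[j].z;
            const double rsum = v[i].r + v[j].r;
            if (dx * dx + dy * dy + dz * dz < rsum * rsum) return true;
        }
    }
    return false;
}
```

A single global lock like this effectively serialises the inserts, so on its own it would give little or no speed-up; it only shows where the critical section sits. A finer-grained scheme (for example, one lock per grid cell or per group of cells) would presumably be needed for real gains, and that is an assumption to be checked against the actual MNTable3D layout.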

I agree that we should target generatePacking for parallelisation and that threads are a good way to go. I would shy away from libraries that are too esoteric for this, to avoid dependency issues. Something like pthreads would ease portability IMO.

I've been wondering whether a type of subdomain decomposition would be an easy parallelisation strategy. Each thread does generatePacking (or fillIn) for its own subdomain, avoiding the grid cells in the overlap region with neighbouring subdomains. Once all threads are finished packing, the "master" thread fills in the gaps along the subdomain boundaries with another global call to fillIn.

Dion
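
The subdomain idea above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not gengeo code: the domain is the unit cube cut into slabs along x, packing is plain random insertion, and Sphere, fillSlab, packDecomposed and countOverlaps are hypothetical names. It uses std::thread rather than pthreads purely to keep the example short. Each thread only accepts spheres that lie entirely inside its own slab, so the threads need no locking; the "master" then runs one serial pass over the whole domain to fill the gaps along the slab boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a gengeo sphere.
struct Sphere {
    double x, y, z, r;
};

bool overlaps(const Sphere& a, const Sphere& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double rsum = a.r + b.r;
    return dx * dx + dy * dy + dz * dz < rsum * rsum;
}

// Serial checked insert; the caller must have exclusive access to v.
bool insertChecked(std::vector<Sphere>& v, const Sphere& s) {
    for (const Sphere& o : v) {
        if (overlaps(s, o)) return false;
    }
    v.push_back(s);
    return true;
}

// Pack one slab x0 <= x <= x1. Spheres crossing the slab faces are rejected,
// which leaves the boundary region empty for the later global pass.
void fillSlab(std::vector<Sphere>& out, double x0, double x1,
              unsigned seed, int tries) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> px(x0, x1), pyz(0.0, 1.0);
    std::uniform_real_distribution<double> rad(0.02, 0.05);
    for (int i = 0; i < tries; ++i) {
        const Sphere s{px(rng), pyz(rng), pyz(rng), rad(rng)};
        if (s.x - s.r < x0 || s.x + s.r > x1) continue;
        insertChecked(out, s);
    }
}

std::vector<Sphere> packDecomposed(unsigned nThreads, int triesPerThread,
                                   int gapTries) {
    assert(nThreads > 0);
    // One private container per thread, sized up front so that the
    // references handed to the threads stay valid.
    std::vector<std::vector<Sphere>> local(nThreads);
    std::vector<std::thread> workers;
    const double width = 1.0 / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&local, t, width, triesPerThread] {
            fillSlab(local[t], t * width, (t + 1) * width, 100u + t,
                     triesPerThread);
        });
    }
    for (std::thread& w : workers) w.join();

    // Merge: spheres from different slabs cannot overlap by construction.
    std::vector<Sphere> all;
    for (const std::vector<Sphere>& l : local) {
        all.insert(all.end(), l.begin(), l.end());
    }

    // "Master" pass: serial fill over the whole domain closes the gaps.
    std::mt19937 rng(99u);
    std::uniform_real_distribution<double> pos(0.0, 1.0);
    std::uniform_real_distribution<double> rad(0.02, 0.05);
    for (int i = 0; i < gapTries; ++i) {
        insertChecked(all, {pos(rng), pos(rng), pos(rng), rad(rng)});
    }
    return all;
}

// Number of overlapping pairs in v; 0 for a valid packing.
std::size_t countOverlaps(const std::vector<Sphere>& v) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (overlaps(v[i], v[j])) ++n;
        }
    }
    return n;
}
```

In this sketch the final pass is still serial, so the benefit depends on the boundary regions being small compared with the slab interiors. Whether that holds for realistic gengeo volumes and particle size ranges would need to be measured.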
