IO revamp

Registered by Nick Papior on 2016-03-01

All IO routines for data that should/could be read back in should be revamped. Each such file should carry a "version" ID.

Blueprint information

Status:
Not started
Approver:
None
Priority:
Medium
Drafter:
Nick Papior
Direction:
Needs approval
Assignee:
Nick Papior
Definition:
New
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Whiteboard

This blueprint emphasizes the need for new IO file formats.

Importantly, the current IO files are somewhat limited by their "strict" record order.
I suggest that all files contain a header ID which denotes the file version.

This will have several advantages:

- If one later decides that the default file format is poorly constructed, creating a compatibility layer between different versions is _extremely_ easy.
- It also makes it possible to switch between two IO strategies for exascale computing, where funnelling all IO through a single process is extremely inefficient.
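
To illustrate the first point, here is a minimal sketch of what such a compatibility layer could look like, written in Python for brevity. The two layouts (version 1 as row-major, version 2 as column-major) and the names `read_v1`, `read_v2`, `READERS` and `read_any` are purely hypothetical and are not part of any existing format.

```python
def read_v1(flat, nrow, ncol):
    # Hypothetical version 1: elements stored row by row.
    return [flat[i * ncol:(i + 1) * ncol] for i in range(nrow)]


def read_v2(flat, nrow, ncol):
    # Hypothetical version 2: elements stored column by column (Fortran order).
    return [[flat[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


# One reader per integer file version; supporting a new version means
# adding one entry here, while older files stay readable.
READERS = {1: read_v1, 2: read_v2}


def read_any(version, flat, nrow, ncol):
    """Dispatch on the integer version ID and return the matrix as rows."""
    try:
        reader = READERS[version]
    except KeyError:
        raise ValueError(f"unsupported file version: {version}") from None
    return reader(flat, nrow, ncol)
```

Because the version is a plain integer, the dispatch is a single dictionary lookup and callers never need to know which layout was on disk.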

An explicit suggestion for a Fortran record-based IO file could be:
1. <specifier>, <content>, <file-version>
 - <specifier> 1 == Sparse matrix (<sparse>, <extra-dim>), 2 == Sparse matrix (<extra-dim>, <sparse>), 3 == dense matrix, 4 == grid, etc.
 - <content> 1 == Hamiltonian, 2 == Overlap, 3 == DM, 4 == Rho, 5 == dRho, etc.
   Of course <specifier> and <content> are highly linked.
   From <content> and <specifier> the data is "fixed" and uniquely defined.
 - <file-version> is the storage format of the data.
   - The <file-version> _has_ to be an integer. An integer is easier to compare, and it allows 2^31 different versions, which I think is more than enough.
   - The other option is using strings, which I think is sub-optimal. The length of the string then also becomes an issue, and different codes would have to write the string identically (UPPER/lower case). For consistency and ease I think using an integer is the best way. Generally, writing characters is bad practice.
2. <n-dimensions>
  <n-dimensions> specifies the number of dimensions of each element.
3. <1-dim-size>, <2-dim-size>, ..., <<n-dimensions>-dim-size>
4. Actual data is stored here.
  In general, the way the data is stored depends on the <file-version>.
5. Additional data dependent on <file-version>
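
As a rough illustration of records 1 to 3, the sketch below writes and reads such a header from Python. It assumes the common sequential unformatted layout in which every record is framed by a 4-byte length marker before and after the payload, little-endian, with 4-byte integers; other compilers or compiler flags may differ. The function names and the example codes are hypothetical.

```python
import io
import struct

# Example codes taken from the lists above.
SPECIFIER_DENSE = 3
CONTENT_HAMILTONIAN = 1


def write_record(f, payload: bytes) -> None:
    """Write one record as: length marker, payload, length marker."""
    marker = struct.pack("<i", len(payload))
    f.write(marker + payload + marker)


def read_record(f) -> bytes:
    """Read one record and check that both length markers agree."""
    (n,) = struct.unpack("<i", f.read(4))
    payload = f.read(n)
    (m,) = struct.unpack("<i", f.read(4))
    if n != m:
        raise ValueError("record markers do not match")
    return payload


def write_header(f, specifier, content, version, shape):
    # Record 1: <specifier>, <content>, <file-version>
    write_record(f, struct.pack("<3i", specifier, content, version))
    # Record 2: <n-dimensions>
    write_record(f, struct.pack("<i", len(shape)))
    # Record 3: one size per dimension
    write_record(f, struct.pack(f"<{len(shape)}i", *shape))


def read_header(f):
    specifier, content, version = struct.unpack("<3i", read_record(f))
    (ndim,) = struct.unpack("<i", read_record(f))
    shape = struct.unpack(f"<{ndim}i", read_record(f))
    return specifier, content, version, shape


# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_header(buf, SPECIFIER_DENSE, CONTENT_HAMILTONIAN, 1, (10, 10))
buf.seek(0)
header = read_header(buf)
```

A reader only needs these three records to decide how to interpret everything that follows, so records 4 and 5 can change freely between file versions.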

Having this basic infrastructure of IO routines also enables us to more easily extract them to a consistent library.

===== (7 Aug, 2016, NRP)
Thinking about this further, I think our first step should be to get NetCDF4 up and running fully in parallel. Tools can then easily be created to convert to the binary DM, ... formats.
