Serialization of engine output

Registered by Lars Butler

1) Currently, the OpenQuake engine is designed to output calculation results in the form of NRML XML. In some cases, SVGs and GeoTIFFs are generated as well. Output artifacts include:
- Hazard curves
- Hazard maps
- GMFs (Ground Motion Fields)
- Loss curves
- Loss maps

To start with, we need a clean separation between the calculation code and output generation code.
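
As a minimal sketch of the separation this implies, the calculation code could produce plain result objects and hand them to a writer object without knowing where they end up. All names below (ResultWriter, DbWriter, XmlWriter, run_calculation) are hypothetical illustrations, not existing engine classes:

    import xml.etree.ElementTree as etree

    class ResultWriter(object):
        """Hypothetical interface: calculators hand results to a writer
        and never know whether they end up in the DB or in an XML file."""
        def serialize(self, results):
            raise NotImplementedError

    class DbWriter(ResultWriter):
        """Sketch of a DB-backed writer; a real one would insert rows
        through the engine's DB layer instead of appending to a list."""
        def __init__(self, db):
            self.db = db  # stand-in for a DB connection/session
        def serialize(self, results):
            for site, value in results:
                self.db.append(("hazard_curve", site, value))

    class XmlWriter(ResultWriter):
        """Sketch of an XML writer (element names are made up, not NRML)."""
        def __init__(self, path):
            self.path = path
        def serialize(self, results):
            root = etree.Element("hazardResultSet")
            for site, value in results:
                etree.SubElement(root, "hazardCurve", site=site).text = str(value)
            etree.ElementTree(root).write(self.path)

    def run_calculation(job_params, writer):
        # the calculation itself stays free of any output concerns
        results = [("POINT(1 1)", 0.1), ("POINT(2 2)", 0.2)]
        writer.serialize(results)

    run_calculation({"CALCULATION_MODE": "classical"}, DbWriter(db=[]))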

2) We want to be able to choose where the calculation results are stored; the OpenQuake DB should be the default choice. (This could be a command line option, like --output=db or --output=xml.)
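
A sketch of what such a flag could look like, using argparse; the actual option name and how bin/openquake parses its arguments may differ:

    import argparse

    # illustrative only; not the real bin/openquake argument parser
    parser = argparse.ArgumentParser(prog="openquake")
    parser.add_argument("--output", choices=["db", "xml"], default="db",
                        help="where calculation results are stored "
                             "(the OpenQuake DB is the default)")

    args = parser.parse_args(["--output", "xml"])
    print(args.output)  # -> xml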

3) We want to store the calculation results in the database (and use them as input for further processing where that makes sense); producing XML output will then be an additional, separate step.
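
In other words, the database becomes the only place the calculation writes to, and the XML export reads back from it on demand. A toy sketch of that two-step flow, using sqlite3 as a stand-in for the OpenQuake DB, with made-up table and element names:

    import sqlite3
    import xml.etree.ElementTree as etree

    db = sqlite3.connect(":memory:")  # stand-in for the OpenQuake DB
    db.execute("CREATE TABLE hazard_curve (job_id INTEGER, site TEXT, poes TEXT)")

    # step 1: the calculation stores its results in the database only
    db.executemany("INSERT INTO hazard_curve VALUES (?, ?, ?)",
                   [(1, "POINT(1 1)", "0.1 0.05"),
                    (1, "POINT(2 2)", "0.2 0.07")])

    # step 2 (additional, on demand): read the stored results back and
    # serialize them to XML as a separate post-processing step
    def export_to_xml(connection, job_id):
        root = etree.Element("hazardCurves", job=str(job_id))
        rows = connection.execute(
            "SELECT site, poes FROM hazard_curve WHERE job_id = ?", (job_id,))
        for site, poes in rows:
            etree.SubElement(root, "curve", site=site).text = poes
        return etree.tostring(root)

    print(export_to_xml(db, 1))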

Blueprint information

Status:
Complete
Approver:
John Tarter
Priority:
Medium
Drafter:
Lars Butler
Direction:
Approved
Assignee:
None
Definition:
Approved
Series goal:
Accepted for 0.4
Implementation:
Implemented
Milestone target:
0.4.1
Started by:
John Tarter
Completed by:
John Tarter

Whiteboard

The final aim is not clear to me: what needs to be serialized to the database,
and what will still be stored in the kvs, given that some outputs are inputs
of subsequent steps in the calculation?

This is a very rough sketch of how oq works, as I understand it:

When running bin/openquake:

    config.gem, other_input.gem are parsed and stored in kvs as a "job"

Inside workers:

    jobs are executed and output goes to kvs and xml files

    if the calculation is multi-step (e.g. risk) the input for the following
    step is taken from the kvs
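
A toy version of that flow, with a plain dict standing in for the real key-value store and made-up key names, just to make the hand-off between steps explicit:

    kvs = {}  # stand-in for the real key-value store; key names are made up

    def submit_job(job_id, config):
        # bin/openquake: parse config.gem & co. and stash them as the "job"
        kvs["job:%d" % job_id] = config

    def hazard_step(job_id):
        config = kvs["job:%d" % job_id]                # read the job back
        curves = {"POINT(1 1)": [0.1, 0.05]}           # pretend calculation
        kvs["job:%d:hazard_curves" % job_id] = curves  # output goes to kvs

    def risk_step(job_id):
        # a multi-step calculation takes its input from the kvs
        curves = kvs["job:%d:hazard_curves" % job_id]
        kvs["job:%d:loss_curves" % job_id] = dict(
            (site, sum(poes)) for site, poes in curves.items())

    submit_job(7, {"CALCULATION_MODE": "classical"})
    hazard_step(7)
    risk_step(7)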

When using the web GUI (for the limited case of hazard maps, the only one
currently implemented):

    config.gem, other_input.gem are parsed and stored in kvs as a "job" AND partially in the db as an OqJob record

Inside workers:

    jobs are executed with param SERIALIZE_MAPS_TO_DB=True

    output goes to the Output and HazardMapData tables in the database (Output has a reference to OqJob)
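
A toy version of that branch, with in-memory lists standing in for the Output and HazardMapData tables; the names mirror the description above, but none of this is the engine's actual model code:

    # stand-ins for the DB tables; field names are illustrative only
    OUTPUTS, HAZARD_MAP_DATA = [], []

    def serialize_hazard_map(oq_job_id, sites_to_values, params):
        if params.get("SERIALIZE_MAPS_TO_DB"):
            output_id = len(OUTPUTS) + 1
            # the Output row keeps the reference back to the OqJob record
            OUTPUTS.append({"id": output_id, "oq_job_id": oq_job_id,
                            "output_type": "hazard_map"})
            for site, value in sites_to_values.items():
                HAZARD_MAP_DATA.append({"output_id": output_id,
                                        "location": site, "value": value})
        else:
            pass  # otherwise fall back to writing the NRML XML file

    serialize_hazard_map(42, {"POINT(1 1)": 0.25},
                         {"SERIALIZE_MAPS_TO_DB": True})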

It was decided to always store the output artifacts (hazard curves, ...) in the database and remove them from the KVS; besides avoiding duplicated data, this is a step towards organizing the computation as a pipeline of independent steps.
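
That decision essentially turns the kvs sketch above into a DB-centred one: each step reads its inputs from the database and writes its outputs back there, so the steps stay independent and nothing is stored twice. A rough sketch, with all names illustrative:

    def run_pipeline(store, job_id, steps):
        for step in steps:
            step(store, job_id)  # each step: DB in, DB out, no kvs copy

    def hazard(store, job_id):
        store[("hazard_curves", job_id)] = {"POINT(1 1)": [0.1, 0.05]}

    def risk(store, job_id):
        curves = store[("hazard_curves", job_id)]  # input read from the DB
        store[("loss_curves", job_id)] = dict(
            (site, sum(poes)) for site, poes in curves.items())

    run_pipeline({}, 7, [hazard, risk])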

