Serialization of engine output
1) Currently, the OpenQuake engine outputs calculation results as NRML XML; in some cases, SVGs and GeoTIFFs are generated as well. Output artifacts include:
- Hazard curves
- Hazard maps
- GMFs (Ground Motion Fields)
- Loss curves
- Loss maps
To start with, we need a clean separation between the calculation code and output generation code.
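As a minimal sketch of what "clean separation" could look like: the calculation code produces plain result objects, and the choice of output (DB row vs. XML) lives in separate writer classes. All class names here are hypothetical, not taken from the OpenQuake codebase.

```python
class HazardCurveResult:
    """Plain container for a computed hazard curve (hypothetical name)."""
    def __init__(self, site, poes):
        self.site = site    # e.g. "lon lat" string
        self.poes = poes    # probabilities of exceedance

class DBWriter:
    """Persists results to a backing store (a list stands in for the OQ DB)."""
    def __init__(self):
        self.rows = []
    def serialize(self, result):
        self.rows.append((result.site, result.poes))

class XMLWriter:
    """Renders a result as simplified NRML-like XML (not the real schema)."""
    def serialize(self, result):
        poes = " ".join(str(p) for p in result.poes)
        return "<hazardCurve site='%s'>%s</hazardCurve>" % (result.site, poes)

# The calculator only produces results; the caller picks the writer.
result = HazardCurveResult("8.0 45.0", [0.9, 0.5, 0.1])
writer = DBWriter()
writer.serialize(result)
```

With this split, adding a new output format means adding a writer, without touching the calculation code.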
2) We want to be able to choose where the calculation results are stored; the OpenQuake DB should be the default choice. (This could be a command line option, like --output=db or --output=xml.)
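A hedged sketch of how such a flag could be wired up with argparse; the actual option name and accepted values in bin/openquake may differ.

```python
import argparse

# Hypothetical CLI fragment: --output selects the result destination,
# defaulting to the OpenQuake DB as proposed above.
parser = argparse.ArgumentParser(prog="openquake")
parser.add_argument(
    "--output", choices=["db", "xml"], default="db",
    help="where to store calculation results (default: db)")

args = parser.parse_args(["--output", "xml"])
```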
3) We want to store the calculation results in the database (and use them as input for further processing when it makes sense); producing XML output will be an additional step.
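The two-step idea above (DB first, XML as an optional export) can be illustrated as follows. A dict stands in for the OQ DB, and the element names are simplified placeholders, not the real NRML schema.

```python
import xml.etree.ElementTree as ET

db = {}  # stand-in for the OpenQuake database

def store_result(job_id, curves):
    """Step 1: the calculation writes its results to the 'database'."""
    db[job_id] = curves

def export_xml(job_id):
    """Step 2 (optional): produce XML from the stored results."""
    root = ET.Element("hazardResult", jobId=str(job_id))
    for site, poes in db[job_id]:
        curve = ET.SubElement(root, "hazardCurve", site=site)
        curve.text = " ".join(str(p) for p in poes)
    return ET.tostring(root)

store_result(42, [("8.0 45.0", [0.9, 0.5, 0.1])])
xml_bytes = export_xml(42)
```

Because the XML is derived from the stored rows, it can be regenerated at any time without rerunning the calculation.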
Blueprint information
- Status: Complete
- Approver: John Tarter
- Priority: Medium
- Drafter: Lars Butler
- Direction: Approved
- Assignee: None
- Definition: Approved
- Series goal: Accepted for 0.4
- Implementation: Implemented
- Milestone target: 0.4.1
- Started by: John Tarter
- Completed by: John Tarter
Related bugs
Bug #797602: Serialize Hazard Curves to the OQ DB (Fix Released)
Bug #797703: Serialize Hazard Maps to the OQ DB (Fix Released)
Bug #797704: Serialize Loss Curves to the OQ DB (Fix Released)
Bug #797708: Serialize Loss Maps to the OQ DB (Fix Released)
Bug #797728: Serialize GMFs to the OQ DB (Fix Released)
Bug #797761: Option to specify engine output destination (Fix Released)
Bug #797765: Clean up bin/openquake (Fix Released)
Bug #804332: Use the OQ DB as the input for the risk calculations (Fix Released)
Bug #809215: All OQ results must be stored in the postgres database (Fix Released)
Bug #809301: Remove hazard curve/GMF data from redis (Fix Released)
Bug #809410: serialize loss metadata (Fix Released)
Bug #809780: DB serialization slow (Fix Released)
Whiteboard
The final aim is not clear to me: what needs to be serialized to the database
and what will still be stored in the kvs, in the light of the fact that some
outputs are inputs of subsequent steps in the calculation?
This is a very rough sketch of how I understand oq works:
- When running bin/openquake:
  - config.gem and other_input.gem are parsed and stored in the kvs as a "job"
  - inside the workers, jobs are executed and the output goes to the kvs and to xml files
  - if the calculation is multi-step (e.g. risk), the input for the following step is taken from the kvs
- When using the web GUI (for the limited case of hazard maps, the only one currently implemented):
  - config.gem and other_input.gem are parsed and stored in the kvs as a "job" AND partially in the db as an OqJob
  - inside the workers, jobs are executed with the param SERIALIZE_
  - the output goes to the Output and HazardMapData tables in the database (Output has a reference to OqJob)
It was decided to always store the output artifacts (hazard curves, ...) in the database and remove them from the KVS; besides avoiding duplicated data, this is a step towards organizing the computation as a pipeline of independent steps.
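The pipeline idea can be sketched as follows: every step reads its inputs from the DB and writes its outputs back, so steps no longer hand data to each other through the KVS. The dict-backed "db", the step names, and the toy loss formula are all assumptions for illustration only.

```python
db = {}  # stand-in for the OpenQuake database

def hazard_step(job_id):
    """Computes hazard curves (faked here) and serializes them to the DB."""
    curves = {"site_a": [0.9, 0.5, 0.1]}
    db[(job_id, "hazard_curves")] = curves

def risk_step(job_id):
    """Reads hazard curves from the DB, not from the KVS, and stores losses."""
    curves = db[(job_id, "hazard_curves")]
    losses = {site: sum(poes) for site, poes in curves.items()}  # toy formula
    db[(job_id, "loss_curves")] = losses

# Each step only depends on the DB state, so the pipeline stages are
# independent and individually re-runnable.
hazard_step(1)
risk_step(1)
```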