Serialization of engine output
1) Currently, the OpenQuake engine outputs calculation results as NRML XML; in some cases, SVGs and GeoTIFFs are generated as well. Output artifacts include:
- Hazard curves
- Hazard maps
- GMFs (Ground Motion Fields)
- Loss curves
- Loss maps
To start with, we need a clean separation between the calculation code and output generation code.
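As a minimal sketch of what "clean separation" could look like: the calculation code produces plain result objects, and the choice of output (DB row vs. XML) lives in separate writer classes. All class names here are hypothetical, not taken from the OpenQuake codebase.

```python
class HazardCurveResult:
    """Plain container for a computed hazard curve (hypothetical name)."""
    def __init__(self, site, poes):
        self.site = site    # e.g. "lon lat" string
        self.poes = poes    # probabilities of exceedance

class DBWriter:
    """Persists results to a backing store (a list stands in for the OQ DB)."""
    def __init__(self):
        self.rows = []
    def serialize(self, result):
        self.rows.append((result.site, result.poes))

class XMLWriter:
    """Renders a result as simplified NRML-like XML (not the real schema)."""
    def serialize(self, result):
        poes = " ".join(str(p) for p in result.poes)
        return "<hazardCurve site='%s'>%s</hazardCurve>" % (result.site, poes)

# The calculator only produces results; the caller picks the writer.
result = HazardCurveResult("8.0 45.0", [0.9, 0.5, 0.1])
writer = DBWriter()
writer.serialize(result)
```

With this split, adding a new output format means adding a writer, without touching the calculation code.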
2) We want to be able to choose where the calculation results are stored; the OpenQuake DB should be the default choice. (This could be a command line option, like --output=db or --output=xml.)
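A hedged sketch of how such a flag could be wired up with argparse; the actual option name and accepted values in bin/openquake may differ.

```python
import argparse

# Hypothetical CLI fragment: --output selects the result destination,
# defaulting to the OpenQuake DB as proposed above.
parser = argparse.ArgumentParser(prog="openquake")
parser.add_argument(
    "--output", choices=["db", "xml"], default="db",
    help="where to store calculation results (default: db)")

args = parser.parse_args(["--output", "xml"])
```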
3) We want to store the calculation results in the database (and use them as input for further processing when it makes sense); producing XML output will be an additional step.
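The two-step idea above (DB first, XML as an optional export) can be illustrated as follows. A dict stands in for the OQ DB, and the element names are simplified placeholders, not the real NRML schema.

```python
import xml.etree.ElementTree as ET

db = {}  # stand-in for the OpenQuake database

def store_result(job_id, curves):
    """Step 1: the calculation writes its results to the 'database'."""
    db[job_id] = curves

def export_xml(job_id):
    """Step 2 (optional): produce XML from the stored results."""
    root = ET.Element("hazardResult", jobId=str(job_id))
    for site, poes in db[job_id]:
        curve = ET.SubElement(root, "hazardCurve", site=site)
        curve.text = " ".join(str(p) for p in poes)
    return ET.tostring(root)

store_result(42, [("8.0 45.0", [0.9, 0.5, 0.1])])
xml_bytes = export_xml(42)
```

Because the XML is derived from the stored rows, it can be regenerated at any time without rerunning the calculation.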
Blueprint information
- Status: Complete
- Approver: John Tarter
- Priority: Medium
- Drafter: Lars Butler
- Direction: Approved
- Assignee: None
- Definition: Approved
- Series goal: Accepted for 0.4
- Implementation: Implemented
- Milestone target: 0.4.1
- Started by: John Tarter
- Completed by: John Tarter
Related bugs
Bug #797602: Serialize Hazard Curves to the OQ DB (Fix Released)
Bug #797703: Serialize Hazard Maps to the OQ DB (Fix Released)
Bug #797704: Serialize Loss Curves to the OQ DB (Fix Released)
Bug #797708: Serialize Loss Maps to the OQ DB (Fix Released)
Bug #797728: Serialize GMFs to the OQ DB (Fix Released)
Bug #797761: Option to specify engine output destination (Fix Released)
Bug #797765: Clean up bin/openquake (Fix Released)
Bug #804332: Use the OQ DB as the input for the risk calculations (Fix Released)
Bug #809215: All OQ results must be stored in the postgres database (Fix Released)
Bug #809301: Remove hazard curve/GMF data from redis (Fix Released)
Bug #809410: serialize loss metadata (Fix Released)
Bug #809780: DB serialization slow (Fix Released)
Whiteboard
The final aim is not clear to me: what needs to be serialized to the database
and what will still be stored in the kvs, in the light of the fact that some
outputs are inputs of subsequent steps in the calculation?
This is a very rough sketch of how I understand oq works:
- When running bin/openquake:
  - config.gem and other_input.gem are parsed and stored in the kvs as a "job"
  - inside the workers, jobs are executed and the output goes to the kvs and to xml files
  - if the calculation is multi-step (e.g. risk), the input for the following step is taken from the kvs
- When using the web GUI (for the limited case of hazard maps, the only one currently implemented):
  - config.gem and other_input.gem are parsed and stored in the kvs as a "job" AND partially in the db as an OqJob
  - inside the workers, jobs are executed with the param SERIALIZE_
  - the output goes to the Output and HazardMapData tables in the database (Output has a reference to OqJob)
It was decided to always store the output artifacts (hazard curves, ...) in the database and remove them from the KVS; besides avoiding duplicated data, this is a step towards organizing the computation as a pipeline of independent steps.
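The pipeline idea can be sketched as follows: every step reads its inputs from the DB and writes its outputs back, so steps no longer hand data to each other through the KVS. The dict-backed "db", the step names, and the toy loss formula are all assumptions for illustration only.

```python
db = {}  # stand-in for the OpenQuake database

def hazard_step(job_id):
    """Computes hazard curves (faked here) and serializes them to the DB."""
    curves = {"site_a": [0.9, 0.5, 0.1]}
    db[(job_id, "hazard_curves")] = curves

def risk_step(job_id):
    """Reads hazard curves from the DB, not from the KVS, and stores losses."""
    curves = db[(job_id, "hazard_curves")]
    losses = {site: sum(poes) for site, poes in curves.items()}  # toy formula
    db[(job_id, "loss_curves")] = losses

# Each step only depends on the DB state, so the pipeline stages are
# independent and individually re-runnable.
hazard_step(1)
risk_step(1)
```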