Catalog Homogenisation Tool requirement list

Registered by matley

In what follows, we describe the main requirements for the first (and, in part, the second) version of the CHT APIs.

1) The user should be able to import seismic event catalogues in the following formats: ISF Bulletin v.1.0, USGS CSV format, QuakeML. In the first version only the ISF Bulletin format should be considered.

2) The user should be able to query the catalogues to retrieve the subset of earthquakes containing measures [solutions] recorded in the user-defined "native" and "target" magnitude scales. The user should be able to filter events by different criteria (time, magnitude, position, maximum allowed error, agency, etc.). See https://bugs.launchpad.net/openquake/+bug/979915 for query examples.
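
A minimal sketch of the kind of filtering described in point 2, assuming a simple list-of-dicts representation of measures; the field names (time, value, standard_error, agency) and the function name are illustrative, not part of the CHT API:

    def filter_measures(measures, time_range=None, magnitude_range=None,
                        max_error=None, agencies=None):
        # Keep only the measures that satisfy every user-supplied criterion.
        selected = []
        for m in measures:
            if time_range and not (time_range[0] <= m['time'] <= time_range[1]):
                continue
            if magnitude_range and not (magnitude_range[0] <= m['value'] <= magnitude_range[1]):
                continue
            if max_error is not None and m['standard_error'] > max_error:
                continue
            if agencies and m['agency'] not in agencies:
                continue
            selected.append(m)
        return selected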

3) When importing from multiple sources, it can be difficult to associate measures (coming from different sources) with seismic events.
The user should be able to easily group the measures belonging to each distinct seismic event and to select a criterion to resolve any conflicts. A conflict can occur when two different source catalogues provide two different magnitude values in the same magnitude scale for the same event.
In the first version (using the ISF Bulletin format) the grouping should be done by using the source_key, i.e. the key used by the different event catalogues to identify events. Thus, we simply rely on the association provided by the source catalogue.
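
A minimal sketch of the first-version grouping, assuming each measure carries the source_key of its event as a plain field; the dict-of-lists layout is an assumption for illustration:

    from collections import defaultdict

    def group_by_source_key(measures):
        # Measures sharing the same source_key belong to the same event.
        events = defaultdict(list)
        for m in measures:
            events[m['source_key']].append(m)
        return events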

In the second version the user should be able to merge catalogues from multiple different sources. In this case there is no single source_key identifying all the measures that represent the same event. It is therefore necessary to introduce a computational search that is able to identify, for each event, the group of measures from all the merged catalogues that represent that event. One possible option is to select a clustering algorithm that uses the time dimension as the clustering criterion (possibly supported by the source_key field), and then use the distance and magnitude dimensions to provide further checks on the events identified as possible "duplicates". The user should be able to select the clustering algorithm (http://docs.scipy.org/doc/scipy/reference/cluster.html) and input the algorithm parameters (or choose not to cluster at all).
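
A minimal sketch of time-based clustering with scipy, as one possible grouping strategy for the second version; the 30-second window, the single-linkage method and the flat array of origin times are illustrative assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import fclusterdata

    def group_by_origin_time(origin_times, window_seconds=30.0):
        # origin_times: origin times in seconds (e.g. POSIX timestamps).
        # Measures sharing a returned label are candidate duplicates of the
        # same event, to be checked further against distance and magnitude.
        data = np.asarray(origin_times, dtype=float).reshape(-1, 1)
        return fclusterdata(data, t=window_seconds, criterion='distance', method='single')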

4) The user should be able to select a target and a native magnitude. The user should be able to state how missing uncertainty data for measures in the target or native magnitude are handled. In the first version the only strategy will be to discard measures with missing data. In the second version the user should be able to specify a per-session default value for each missing column value.
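
A minimal sketch of the two strategies for missing uncertainty values; the field name standard_error and the per-session default are illustrative assumptions:

    def handle_missing_errors(measures, default_error=None):
        # First version: default_error is None and measures lacking an error
        # are discarded.  Second version: a per-session default replaces the
        # missing value.
        kept = []
        for m in measures:
            if m.get('standard_error') is None:
                if default_error is None:
                    continue
                m = dict(m, standard_error=default_error)
            kept.append(m)
        return kept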

5) For each event, the measures in the native and target magnitudes are chosen if available. When multiple measures of interest are found for the same event, the user can specify criteria to select the preferred measure. In the first version we provide some simple criteria (e.g. random pick). In the second version, the user should be able to specify a ranking of agencies for both the native and the target magnitude, in order to select the measures for each event.
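
A minimal sketch of per-event measure selection; the agency ranking list and the random fallback are illustrative assumptions:

    import random

    def select_measure(measures_for_event, agency_ranking=None):
        # With a ranking, the measure from the highest-ranked agency wins
        # (second version); otherwise a random pick is used (first version).
        if agency_ranking:
            ranked = [m for m in measures_for_event if m['agency'] in agency_ranking]
            if ranked:
                return min(ranked, key=lambda m: agency_ranking.index(m['agency']))
        return random.choice(measures_for_event)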

6) The user should be able to apply a regression algorithm (in the first version, we will use ODR, http://docs.scipy.org/doc/scipy/reference/generated/scipy.odr.ODR.html) to define an empirical model relating the "native" magnitude to the "target" magnitude (possibly preprocessed as described in points 2, 3, 4 and 5). The user should be able to choose the model function from among a linear, a polynomial and a piecewise model, and to specify the ODR parameters as well.
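
A minimal sketch of an orthogonal distance regression with scipy.odr for the linear case M_target = a * M_native + b; the array names, starting values and error columns are illustrative assumptions:

    from scipy import odr

    def fit_linear_model(native, target, native_err, target_err):
        # Fit M_target = beta[0] * M_native + beta[1] with errors on both axes.
        linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
        data = odr.RealData(native, target, sx=native_err, sy=target_err)
        output = odr.ODR(data, linear, beta0=[1.0, 0.0]).run()
        return output.beta, output.sd_beta  # fitted [a, b] and their standard errors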

7) The user should be able to plot the results of all regression models and data sets (both the scatter plot of the data and the output function obtained at point 6 or 8).
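
A minimal plotting sketch with matplotlib, showing the selected measures with their error bars together with the fitted model; the function and argument names are illustrative assumptions:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_regression(native, target, native_err, target_err, model_fn, beta):
        native = np.asarray(native)
        xs = np.linspace(native.min(), native.max(), 200)
        plt.errorbar(native, target, xerr=native_err, yerr=target_err, fmt='o', label='measures')
        plt.plot(xs, model_fn(beta, xs), label='fitted model')
        plt.xlabel('native magnitude')
        plt.ylabel('target magnitude')
        plt.legend()
        plt.show()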

8) The user should be able to apply the empirical models to convert, for each event, the magnitude given in the native magnitude scale to one in the target scale (for events in which the native magnitude is reported, but not the target magnitude). The user should have the following options:
i) The ability to modify or replace an empirical model defined by the current regression tools with an alternative empirical or physical model to use for conversion, in cases where the user believes the empirical model to be insufficient or incorrect.
ii) As events may be reported with measures in several "native" scales, the user should be able to specify an order of preference for selecting which scale (and corresponding model) to use for converting to the target magnitude.
iii) As there will likely be uncertainty both in the observed magnitude in the native scale and in the empirical model used to convert the native magnitude scale to the target magnitude scale, the two uncertainties should be correctly combined to give an uncertainty on the output (target) magnitude:
\sigma_{Target} = \sqrt{ \sigma_{Native}^{2} \left( \frac{\partial f}{\partial M} \right)^{2} + \sigma_{Model}^{2} }
where

f(M) is the empirical model relating the native magnitude to the target magnitude
\sigma_{Native} is the measurement error (in standard deviations) of the native magnitude
\sigma_{Model} is the standard deviation (scatter) of the empirical model relating the native magnitude to the target magnitude
\sigma_{Target} is the output error on the target magnitude
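
A minimal numerical sketch of the combination above, assuming the linear model M_target = a * M_native + b (so that the partial derivative df/dM is simply a); the function name is illustrative:

    import numpy as np

    def target_sigma(sigma_native, sigma_model, dfdm):
        # sigma_Target = sqrt(sigma_Native^2 * (df/dM)^2 + sigma_Model^2)
        return np.sqrt((sigma_native * dfdm) ** 2 + sigma_model ** 2)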

Blueprint information

Status:
Started
Approver:
Graeme Weatherill
Priority:
Medium
Drafter:
matley
Direction:
Approved
Assignee:
Giuseppe Vallarelli
Definition:
Approved
Series goal:
None
Implementation:
Good progress
Milestone target:
None
Started by
Giuseppe Vallarelli
