Catalog Homogenisation Tool requirement list

Registered by matley

In what follows, we describe the main requirements for the first (and, in part, the second) version of the CHT APIs.

1) The user should be able to import seismic event catalogues in the following formats: ISF Bulletin v.1.0, USGS CSV format, QuakeML. In the first version only the ISF Bulletin format should be considered.

2) The user should be able to query the catalogues to retrieve the subset of earthquakes containing measures [solutions] recorded in the user-defined "native" and "target" magnitude scales. The user should be able to filter events by different criteria (time, magnitude, position, maximum allowed error, agency, etc.). See https://bugs.launchpad.net/openquake/+bug/979915 for query examples.
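
A minimal sketch of the kind of filtering described in point 2, assuming a simple list-of-dicts representation of measures; the field names (time, value, standard_error, agency) and the function name are illustrative, not part of the CHT API:

    def filter_measures(measures, time_range=None, magnitude_range=None,
                        max_error=None, agencies=None):
        # Keep only the measures that satisfy every user-supplied criterion.
        selected = []
        for m in measures:
            if time_range and not (time_range[0] <= m['time'] <= time_range[1]):
                continue
            if magnitude_range and not (magnitude_range[0] <= m['value'] <= magnitude_range[1]):
                continue
            if max_error is not None and m['standard_error'] > max_error:
                continue
            if agencies and m['agency'] not in agencies:
                continue
            selected.append(m)
        return selected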

3) When importing from multiple sources, it can be difficult to associate measures (coming from different sources) with seismic events.
The user should be able to easily group the measures belonging to each distinct seismic event and to select a criterion to resolve any conflicts. A conflict can occur when two different source catalogues provide two different magnitude values in the same magnitude scale for the same event.
In the first version (using the ISF Bulletin format) the grouping should be done by using the source_key, i.e. the key used by the different event catalogues to identify events. Thus, we simply rely on the association provided by the source catalogue.
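
A minimal sketch of the first-version grouping, assuming each measure carries the source_key of its event as a plain field; the dict-of-lists layout is an assumption for illustration:

    from collections import defaultdict

    def group_by_source_key(measures):
        # Measures sharing the same source_key belong to the same event.
        events = defaultdict(list)
        for m in measures:
            events[m['source_key']].append(m)
        return events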

In the second version the user should be able to merge catalogues from multiple different sources. In this case there is no single source_key identifying all the measures that represent the same event. It is therefore necessary to introduce a computational search that is able to identify, for each event, the group of measures from all the merged catalogues that represent that event. One possible option is to select a clustering algorithm that uses the time dimension as the clustering criterion (possibly supported by the source_key field), and then use the distance and magnitude dimensions to provide further checks on the events identified as possible "duplicates". The user should be able to select the clustering algorithm (http://docs.scipy.org/doc/scipy/reference/cluster.html) and input the algorithm parameters (or choose not to cluster at all).
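
A minimal sketch of time-based clustering with scipy, as one possible grouping strategy for the second version; the 30-second window, the single-linkage method and the flat array of origin times are illustrative assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import fclusterdata

    def group_by_origin_time(origin_times, window_seconds=30.0):
        # origin_times: origin times in seconds (e.g. POSIX timestamps).
        # Measures sharing a returned label are candidate duplicates of the
        # same event, to be checked further against distance and magnitude.
        data = np.asarray(origin_times, dtype=float).reshape(-1, 1)
        return fclusterdata(data, t=window_seconds, criterion='distance', method='single')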

4) The user should be able to select a target and a native magnitude. The user should be able to state how missing uncertainty data for measures in the target or native magnitude are handled. In the first version the only strategy will be to discard measures with missing data. In the second version the user should be able to specify a per-session default value for each missing column value.
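
A minimal sketch of the two strategies for missing uncertainty values; the field name standard_error and the per-session default are illustrative assumptions:

    def handle_missing_errors(measures, default_error=None):
        # First version: default_error is None and measures lacking an error
        # are discarded.  Second version: a per-session default replaces the
        # missing value.
        kept = []
        for m in measures:
            if m.get('standard_error') is None:
                if default_error is None:
                    continue
                m = dict(m, standard_error=default_error)
            kept.append(m)
        return kept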

5) For each event, the measures in the native and target magnitudes are chosen if available. When multiple measures of interest are found for the same event, the user can specify criteria to select the preferred measure. In the first version we provide some simple criteria (e.g. random pick). In the second version, the user should be able to specify a ranking of agencies for both the native and the target magnitude, in order to select the measures for each event.
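
A minimal sketch of per-event measure selection; the agency ranking list and the random fallback are illustrative assumptions:

    import random

    def select_measure(measures_for_event, agency_ranking=None):
        # With a ranking, the measure from the highest-ranked agency wins
        # (second version); otherwise a random pick is used (first version).
        if agency_ranking:
            ranked = [m for m in measures_for_event if m['agency'] in agency_ranking]
            if ranked:
                return min(ranked, key=lambda m: agency_ranking.index(m['agency']))
        return random.choice(measures_for_event)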

6) The user should be able to apply a regression algorithm (in the first version, we will use ODR, http://docs.scipy.org/doc/scipy/reference/generated/scipy.odr.ODR.html) to define an empirical model relating the "native" magnitude to the "target" magnitude (possibly preprocessed as described in points 2, 3, 4 and 5). The user should be able to choose the model function from among a linear, a polynomial and a piecewise model, and to specify the ODR parameters as well.
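
A minimal sketch of an orthogonal distance regression with scipy.odr for the linear case M_target = a * M_native + b; the array names, starting values and error columns are illustrative assumptions:

    from scipy import odr

    def fit_linear_model(native, target, native_err, target_err):
        # Fit M_target = beta[0] * M_native + beta[1] with errors on both axes.
        linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
        data = odr.RealData(native, target, sx=native_err, sy=target_err)
        output = odr.ODR(data, linear, beta0=[1.0, 0.0]).run()
        return output.beta, output.sd_beta  # fitted [a, b] and their standard errors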

7) The user should be able to plot the results of all regression models and data sets (both the scatter plot of the data and the output function obtained at point 6 or 8).
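
A minimal plotting sketch with matplotlib, showing the selected measures with their error bars together with the fitted model; the function and argument names are illustrative assumptions:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_regression(native, target, native_err, target_err, model_fn, beta):
        native = np.asarray(native)
        xs = np.linspace(native.min(), native.max(), 200)
        plt.errorbar(native, target, xerr=native_err, yerr=target_err, fmt='o', label='measures')
        plt.plot(xs, model_fn(beta, xs), label='fitted model')
        plt.xlabel('native magnitude')
        plt.ylabel('target magnitude')
        plt.legend()
        plt.show()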

8) The user should be able to apply the empirical models to convert, for each event, the magnitude given in the native magnitude scale to one in the target scale (for events in which the native magnitude is reported, but not the target magnitude). The user should have the following options:
i) The ability to modify or replace an empirical model defined by the current regression tools with an alternative empirical or physical model to use for conversion, in cases where the user believes the empirical model to be insufficient or incorrect.
ii) As events may be reported with measures in several "native" scales, the user should be able to specify an order of preference for selecting which scale (and corresponding model) to use for converting to the target magnitude.
iii) As there will likely be uncertainty both in the observed magnitude in the native scale and in the empirical model used to convert the native magnitude scale to the target magnitude scale, the two uncertainties should be correctly combined to give an uncertainty on the output (target) magnitude:
\sigma_{Target} = \sqrt{ \sigma_{Native}^{2} \left( \frac{\partial f}{\partial M} \right)^{2} + \sigma_{Model}^{2} }
where

f(M) is the empirical model relating the native magnitude to the target magnitude
\sigma_{Native} is the measurement error (in standard deviations) of the native magnitude
\sigma_{Model} is the standard deviation (scatter) of the empirical model relating the native magnitude to the target magnitude
\sigma_{Target} is the output error on the target magnitude
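
A minimal numerical sketch of the combination above, assuming the linear model M_target = a * M_native + b (so that the partial derivative df/dM is simply a); the function name is illustrative:

    import numpy as np

    def target_sigma(sigma_native, sigma_model, dfdm):
        # sigma_Target = sqrt(sigma_Native^2 * (df/dM)^2 + sigma_Model^2)
        return np.sqrt((sigma_native * dfdm) ** 2 + sigma_model ** 2)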

Blueprint information

Status:
Started
Approver:
Graeme Weatherill
Priority:
Medium
Drafter:
matley
Direction:
Approved
Assignee:
Giuseppe Vallarelli
Definition:
Approved
Series goal:
None
Implementation:
Good progress
Milestone target:
None
Started by
Giuseppe Vallarelli
