Create catalogue processing and homogenisation tools (MTK 2nd workflow)

Registered by Graeme Weatherill

A new set of processing tools is required for the MTK 2nd workflow. In the first workflow it was assumed that the input catalogue had already been prepared correctly and was homogeneous (i.e. only one solution per event, a single magnitude scale, etc.). Many MTK users have requested that the MTK include tools to help prepare the catalogue for this purpose. In practice, for a given region of the world there are often many catalogues available. Each catalogue may originate from a different recording agency, or may have been compiled by a different organisation. Consequently, for many events in the catalogue there may be different interpretations of the location (longitude, latitude, depth) of each event. Similarly, there are usually several different estimates of the event magnitude, often recorded in different magnitude scales. Catalogue preparation and homogenisation is usually broken down into the following steps:

1) Identification of duplicates (i.e. the set of solutions that describe the same event)
2) Selection of the preferred solution from the set of solutions for each event
3) Creation of empirical formulae to describe the correlation between different magnitude scales and their relation to the target magnitude scale (moment magnitude, Mw)
4) Calculation of the target magnitude (Mw) for each event by applying the empirical formulae to the recorded magnitude(s) for each event
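
As a rough orientation, the four steps could chain together as in the following minimal Python sketch (all function names are hypothetical placeholders, not existing MTK API; each step is expanded in the whiteboard below):

def prepare_catalogue(events, find_duplicates, select_preferred,
                      derive_models, convert_to_mw):
    """Run the four preparation steps over a merged set of event solutions."""
    groups = find_duplicates(events)          # 1) group solutions describing the same event
    preferred = select_preferred(groups)      # 2) one preferred solution per group
    models = derive_models(preferred)         # 3) empirical native-to-Mw conversion models
    return convert_to_mw(preferred, models)   # 4) assign a target Mw to every event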

Blueprint information

Status: Started
Approver: John Tarter
Priority: Medium
Drafter: Graeme Weatherill
Direction: Approved
Assignee: Giuseppe Vallarelli
Definition: Drafting
Series goal: None
Implementation: Started
Milestone target: 0.6.1
Started by: Giuseppe Vallarelli

Whiteboard

Catalogue Homogenisation Tool – Outline

Component 1: Duplicate Finding

If merging catalogues from several different sources, the first task is to identify events that the user believes are duplicates. This is done via a simple “fixed windowing” algorithm, based on a principle similar to that of the declustering procedures. For each event the catalogue is searched within a time and distance window (or a time, distance and magnitude window). If one or more events are found inside the search windows then the “group” is identified as a potential set of duplicates.
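
A minimal sketch of the fixed-window grouping is given below, assuming a simple in-memory event representation (the Event class, field names and default window sizes are illustrative, not existing MTK code):

from dataclasses import dataclass
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

@dataclass
class Event:
    event_id: str
    agency: str
    time: datetime
    longitude: float
    latitude: float
    magnitude: float
    group_id: int = -1

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2.0) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2.0) ** 2
    return 2.0 * 6371.0 * asin(sqrt(a))

def find_duplicates(events, time_window=timedelta(seconds=60),
                    distance_window_km=100.0, magnitude_window=None):
    """Assign a common group_id to solutions falling inside the fixed windows.

    The catalogue must already be sorted chronologically; window sizes are
    illustrative defaults, not recommended values.
    """
    next_group = 0
    for i, event in enumerate(events):
        if event.group_id < 0:
            event.group_id = next_group
            next_group += 1
        for other in events[i + 1:]:
            if other.time - event.time > time_window:
                break  # time-sorted catalogue: no later event can match
            if haversine_km(event.longitude, event.latitude,
                            other.longitude, other.latitude) > distance_window_km:
                continue
            if magnitude_window is not None and \
                    abs(event.magnitude - other.magnitude) > magnitude_window:
                continue
            other.group_id = event.group_id
    return events

Each group_id would then play the role of the unique event ID mentioned in the tasks below.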

Tasks:
  i) Read in one or more catalogues – need to include IMS1.0 and QuakeML formats, and a slightly refactored CSV format
 ii) Merge the catalogues into a single catalogue (array?)
iii) Sort the merged catalogue into chronological order
 iv) Apply duplicate finding – assign a unique event ID to each group of duplicates

Component 2: Selection

In order to derive a homogeneous catalogue, the user needs to select a “preferred” solution from each group of duplicates. Ideally this should be as flexible as possible, allowing the user to make well-informed modelling decisions. There are two possible ways of doing this:

1) For each group, display the events in the group and allow the user to select the preferred event, e.g:
#. GroupID,AGENCY,EventID,year,month,day,hour,minute,second,...
1. 0001,XXX,0001,1900,1,2,3,4,5,6.1,...
2. 0001,YYY,0002,1900,1,2,3,4,5,6.3,...
3. 0001,ZZZ,0003,1900,1,2,3,4,5,6.5,...
Enter number of preferred solution or 0 to override (i.e. no duplicates in this group): 2

This approach has several pros and cons:

+ Allows the user to make nuanced decisions
+ User can possibly identify false positives in the group (i.e. events that are not duplicates being assigned as duplicates)
- More complicated to write the basic UI
- If the duplicate search identifies thousands of groups then it is unrealistic to expect the user to persist with manual selection
- Possible loss of transparency and replicability
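
As an illustration of the first option, a prompt-driven selection loop might look like the following rough sketch (the groups structure and Event fields follow the Component 1 sketch above and are not existing MTK code):

def select_interactively(groups):
    """Ask the user to pick the preferred solution for each duplicate group.

    `groups` maps a group id to its list of candidate Event solutions;
    entering 0 splits the group, i.e. keeps every solution as its own event.
    """
    preferred = []
    for group_id, solutions in sorted(groups.items()):
        print("Group %s" % group_id)
        for number, event in enumerate(solutions, start=1):
            print("%d. %s %s %s M%.1f" % (number, event.agency, event.event_id,
                                          event.time.isoformat(), event.magnitude))
        choice = int(input("Enter number of preferred solution or 0 to override: "))
        if choice == 0:
            preferred.extend(solutions)            # no duplicates in this group
        else:
            preferred.append(solutions[choice - 1])
    return preferred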

2) A more automated approach would be to allow the user to specify a selection hierarchy. For example, if four catalogues (VVV, XXX, YYY, ZZZ) are merged then the user should specify which agency's solution to take as priority by default. This could be done inside the config file, for example:
Selection_Ranking {
    Attribute: Agency,
    # Preferred ranking (high to low)
    Hierarchy: YYY, XXX, ZZZ, VVV,
    Split_if_key_not_found: No
}

In this process the hierarchical selection would be applied based on the agency attribute. Here, if a solution from agency YYY is found then that would be the preferred solution; if not, then if a solution from XXX is found then that is preferred, and so on. If the user does not specify all the agencies found in the catalogue then groups might exist in which none of the solutions comes from a ranked agency. In these circumstances the user could have two options: i) remove the group from the catalogue (i.e. Split_if_key_not_found: No), or ii) split the group and retain all the solutions as individual events (i.e. Split_if_key_not_found: Yes).

This may be sufficient at a basic level. At a higher level it would be good to allow the user to apply a little more judgement, such as adopting different hierarchies for different periods of time (e.g. 1900 – 1963: hierarchy 1; 1964 – 1990: hierarchy 2; 1990 – end: hierarchy 3), or even within geographical regions/polygons.
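
A minimal sketch of the basic hierarchical selection, assuming the same hypothetical Event/groups structures as above, could look like this (time- or region-dependent hierarchies would simply swap in a different ranking per event before the lookup):

def select_by_hierarchy(groups, hierarchy, split_if_key_not_found=False,
                        attribute="agency"):
    """Pick one preferred solution per group according to an agency ranking.

    `hierarchy` is ordered from most to least preferred, e.g.
    ["YYY", "XXX", "ZZZ", "VVV"]; groups containing only unranked agencies are
    either split into individual events or dropped, mirroring the
    Split_if_key_not_found switch in the config example above.
    """
    rank = {key: position for position, key in enumerate(hierarchy)}
    preferred = []
    for solutions in groups.values():
        ranked = [event for event in solutions
                  if getattr(event, attribute) in rank]
        if ranked:
            preferred.append(min(ranked,
                                 key=lambda event: rank[getattr(event, attribute)]))
        elif split_if_key_not_found:
            preferred.extend(solutions)   # retain all solutions as individual events
        # otherwise the whole group is removed from the catalogue
    return preferred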

Tasks:

i) Create hierarchy selection algorithm
ii) Define user input methodology (e.g. file type/config settings)
iii) Create user interface version (later modification?)

Component 3: Magnitude Comparison

In most catalogues the magnitudes of events are recorded in different native scales, and hence they need to be homogenised into a target scale (usually moment magnitude, Mw). To do this, conversion equations are applied. Some global conversion equations exist, and these could be hard-coded or user-selected (see Magnitude Homogenisation). Mostly these conversion equations are derived by statistical regression on pairs of magnitude estimates for earthquakes recorded in both the native and target scales.

The regressions between magnitude scales may be done either collectively (i.e. on all events recorded in both the native and target scales) or on the basis of agency/catalogue. Future modifications could include temporal or spatial variation, but these would require further statistical tests to determine whether the conversion equations are significantly different for each time or spatial window, given that each window will contain fewer events.
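
As an example of the simplest such regression, an ordinary least-squares fit between a native scale and Mw, together with its residual standard deviation, might be sketched as below (the function name and interface are hypothetical; the chi-square/orthogonal options listed in the tasks would account for measurement error in both scales, which this simple form ignores):

import numpy as np

def fit_linear_conversion(native_magnitudes, target_magnitudes):
    """Least-squares fit Mw = m * M_native + c, plus the residual standard deviation.

    Inputs are paired observations for events recorded in both scales.
    """
    native = np.asarray(native_magnitudes, dtype=float)
    target = np.asarray(target_magnitudes, dtype=float)
    slope, intercept = np.polyfit(native, target, 1)
    residuals = target - (slope * native + intercept)
    sigma = residuals.std(ddof=2)   # two fitted parameters
    return slope, intercept, sigma

# Illustrative usage with made-up Ms/Mw pairs:
# m, c, sigma = fit_linear_conversion([5.0, 5.5, 6.0, 6.5], [5.1, 5.6, 6.2, 6.6])
# mw_estimate = m * 5.8 + c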

Tasks:
  i) Build selection tools
 ii) Build regression calculators (linear, bi-linear, power-law, chi2)
iii) Plot regressions (?) and/or store regression info (possible file type? ascii, xml?)

Component 4: Magnitude Homogenisation

The conversion of native magnitudes into target magnitudes is another user-specified modelling process. In general, the process is a set of IF-THEN rules, e.g.:

IF Native Magnitude is Ms
    IF Ms > 6.0 THEN apply Model 1
    ELSE apply Model 2
ELSE IF Native Magnitude is mb
    IF AGENCY == XXX THEN apply Model 3
    ELSE IF AGENCY == YYY THEN apply Model 4
    ELSE apply Model 5
ELSE IF ...
ELSE Reject

This formulation can map into a binary-tree or binary-heap style of execution, which should be flexible enough to allow the user to specify arbitrarily complex models. At the very least, there are some “global” models that could be hard-coded, so it would be helpful to be able to implement these empirical functions in the code quickly. Most functions take a simple form (1 – 2 lines of code, usually of the form y = mx + c ± σ).
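
One simple way to realise this, shown as a sketch rather than a committed design, is an ordered list of (condition, model) pairs evaluated first-match-wins; it flattens the tree into a rule cascade, and all attribute names and model coefficients below are placeholders:

def convert_magnitude(event, rules):
    """Apply the first rule whose condition matches; return None to reject."""
    for condition, model in rules:
        if condition(event):
            return model(event)
    return None   # no applicable conversion: reject the event

# Hypothetical rule set mirroring the cascade above (coefficients are placeholders):
rules = [
    (lambda e: e.scale == "Ms" and e.magnitude > 6.0, lambda e: 1.00 * e.magnitude + 0.10),  # Model 1
    (lambda e: e.scale == "Ms",                       lambda e: 0.85 * e.magnitude + 1.03),  # Model 2
    (lambda e: e.scale == "mb" and e.agency == "XXX", lambda e: 1.20 * e.magnitude - 1.00),  # Model 3
    (lambda e: e.scale == "mb" and e.agency == "YYY", lambda e: 1.10 * e.magnitude - 0.50),  # Model 4
    (lambda e: e.scale == "mb",                       lambda e: 1.15 * e.magnitude - 0.80),  # Model 5
]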


Work Items
