Multidimensional datamart and report tables

Registered by Ola Hodne Titlestad on 2009-03-26

The current datamart and report table features are limited in regards to dimensional data and needs improvement. At the moment aggregation is only possible for a specific data element + categoryoptioncombo or for the root data element itself. Basically only the dimensions that you use for data registration directly. E.g. you can get "Malaria cases" or "Malaria cases"+"Male"+"<5", but not subtotals (or categoryoptions) like "Malaria cases"+"Male" or "Malaria cases"+">5".

There are roughly two types of datamart tables needed, 1) report tables with custom categoryoptions and categoryoptioncombos as columns, and 2) a set of resource tables with categories and data element group sets as columns (and their options and groups as values) linked to data elements. 1) is needed for canned reports e.g. in BIRT or JasperReports building on the report table + report functionality, and 2) is for more flexible data analysis in pivot tables or other OLAP based tools.

1) Report tables with catagoryoptions and categoryoptioncombos as columns
These are tabular reports with Data Element names listed on rows of the first column and then have a custom list of columns with categoryoptions and categoryoptioncombos, similar to data entry forms, but more flexible in terms of use of subtotals and totals.

TABLE 1A
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadE1mZlFsX1psSVI3bVoybHBIUFFoS2c&hl=en

NB Table 1 is now implemented.

A slightly more advanced table could use a data element group set to group data elements. E.g. a group set "Disease category" with the two groups "Communicable" and "Non-communicable" could be used to group the diseases in the data element name column.

TABLE 1B
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadGJnMzdiMjJrak5hTElLNlpiYlpLQXc&hl=en

2) Use of multidimensional data in external tools (e.g. Excel pivots or some OLAP tool)

(JUMP TO EXAMPLES BELOW AND READ LATER IF NECESSARY)

To make sure the process of getting data into an external pivot table becomes as simple as possible there is a need to provide database queries or views that give the user exactly what is needed without any other need for database queries. In the future we would like excel pivots or perhaps OLAP fact tables to be generated automatically from the UI, but for now we can continue the database view approach which works quite well, and has been there for many years of 1.3 and 1.4 as well.

Here the focus is not on being able to design a fixed look as in the examples above, but to provide data together with many dimensions (categories and/or data element group sets) so that the users can slice and dice as they would like.

aggregateddatavalue and aggregatedindicatorvalue are the two source tables (datamart) for such multidimensional pivot views, and one of these two will be joined with various resource tables to provide more information to the data. Currently we have the orgunitgroupsetexclusivestructure table that provides additional metadata about the orgunits, such as their type and owenership, or any other exclusive orgunit groupset. Another resource table is the orgunitstructure table that provides the full hierarchy up from any given orgunit, which is needed to be able to build an orgunit hierarchy in any OLAP tool.

To make use of both the data element categories and the data element groups there is a need for more resource tables that can be joined with the datamart tables. Catgories are more complicated as they are directly linked to the datavalue through the categoryoptioncombo which is part of a datavalue's primary key. Data element groups are additional metadata tagged onto the data element and are therefore simpler to use, anyway the two need to be treated differently here.

So I propose two new types of resource tables,
1) a data element group set exclusive table for all group sets and groups, and
2) one dataset dimension table for every multidimensional dataset in the database.

Here are some examples:

Dataset dimension resource table for EPI dataset:

https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadFE3SVcxTFVqeVNpQXZpdzhaWWlMcnc&hl=en

Dataset dimension resource table for morbidity dataset:
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadGxiaUlwV2VYbE5weUdmMlNBQk1IdHc&hl=en

Dataelement groupset exclusive structure:
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadHRVYjBxdkxrLVFEbTc5d3h2TTVpT2c&hl=en

Here are some examples of possible pivot views building on the tables above:

Pivot view for 1 datatset with dimensions (EPI):
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadGFZOTBTQTBsamtRYndJRjNuWkpmUGc&hl=en

Pivot view for 1 dataset with dimensions (Morbidity):
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadFdmYkVzbUFaUHFhRGU0bllCaXRMR1E&hl=en

Pivot view for all data elements with group set dimensions:
https://spreadsheets.google.com/ccc?key=0Ar-LT8UHNjOadDdEY2FjX2RZTk45WmxNU0FMeDZuU2c&hl=en

(For this table I had to use a GROUP BY statement in the view to sum up all categoryoptioncombos for the dataelement. This could be simplified by providing a datamart table (aggregateddatavalue) without categoryoptioncombos and with the totals for data elements only.

Blueprint information

Status:
Complete
Approver:
None
Priority:
High
Drafter:
Ola Hodne Titlestad
Direction:
Needs approval
Assignee:
Lars Helge Øverland
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
milestone icon 2.0.5
Started by
Lars Helge Øverland on 2009-11-05
Completed by
Lars Helge Øverland on 2010-07-25

Related branches

Sprints

Whiteboard

Updated the whole text on Dec 2.

(?)

Work Items

This blueprint contains Public information 
Everyone can see this information.

Subscribers

No subscribers.