(Validation) Validate data value fields with regular expressions

Registered by jason.p.pickering on 2010-01-26

Validation rules and/or data integrity rules should be expanded to allow for evaluations of data fields with regular expressions. For instance, one could check to ensure that a data value is actually an integer, with no extraneous white space which has been known to happen. National identity numbers could be validated to ensure they have been entered properly, particularly important for the patient module. URLs and email addresses could be validated to ensure they actually are what they should be. Another use would be the structure of files, such as the organisational unit and data elements. In Zambia, all organisational units begin with the province to which they belong (looks like this was something done in ZA as well). We could assign a regex such as ^(ce|no|ls|no|nw) [\w* ]+$ to allow only values that have one of the prefixes listed in the regex, followed by a single space, followed by at least one word with single spaces. This might be a bit restrictive in practice but the idea is clear. Additionally, indicators and data elements often follow naming conventions. In the absence of relational integrity in the database, regular expressions could be used instead to enforce these conventions.
 Basic rules could be shipped prepopulated as part of the base system, while the user could be allowed to implement their own in the data integrity module.

Blueprint information

Status:
Not started
Approver:
Lars Helge Øverland
Priority:
Undefined
Drafter:
jason.p.pickering
Direction:
Needs approval
Assignee:
None
Definition:
Drafting
Series goal:
Accepted for trunk
Implementation:
Unknown
Milestone target:
None

Related branches

Sprints

Whiteboard

I am thinking of something like this....

The user would create a regular expression for later assignment to a
database object. The user would select a database table (object) and
field for validation. For instance, lets say we want to validate that
there are no trailing spaces in an organization name.

So, we would create a rule called "Trailing spaces are not allowed"

We would create this rule, and assign a description and a regular
expression to it.

in this case, it would probably be something really simple like '\s+$'

So, i think we need two objects.

1) A persistence object (REGEX) that stores the following fields for a number of regular expressions. Regular expressions would be expected to be in Java regular expression syntax.

a) regexid (PK, integer)
b) name (string)
c) code (string, unique)
d) expression (string)
e) locale (NOT NULL DEFAULT en_GB)
f) description (string)
g) resolution (telling the user how to solve this problem) (string)

2) Object/property Validation style checks- A table (REGEX_OBJECT) to assign regular expressions to sets of properties within an object. The regex would be applied to all properties of a particular object, for instance all "name" properties of the "organisationunit" object. This table would be used in the context of interactive data validation, similar to the existing data integrity checks procedures during data entry, data import, and through ad-hoc data integrity checks.. Regexes could be reused for many different objects (e.g. Names in organisationunit and names in dataelement) (regex_members) by assigning a defined regex from the

a) regexid (int) (NOT NULL, FK reference to REGEX regexid )
b) object (string) (NOT NULL) (reference to a particular object, for instance organisationunit)
c) property (string) (NOT NULL) (reference to a particular property, for instance name in organisationunit)
d) negate (specifies whether the expression should be negated or not) (boolean) (NOT NULL DEFAULT 0)
e) caseinsentive (boolean) (specifies whether the search should be made case insensitive (NOT NULL DEFAULT 0)

The regexid would reference a particular regex in the REGEX object. The "object" field would reference a given object/table (for instance organisationunit). "property" would reference a particular property/field in the table (for instance "name"). "locale" would establish a locale that this regex should be applied to. Implementors could then chose to assign a particular regex to a given locale, or if the field was left blank, the regex would be applied to all locales. "negate" specifies whether the results of the regex should be negated. "caseinsensitive" specifies whether or not the regex should be case sensitive.

A third UI element would be used to allow users to run particular regex checks. A list of possible regex validations would be presented to the user, and the user would tick the regex checks they would like to run. The user would press start or cancel.
Upon pressing start, each regex would be run in turn. Properties that match the regex would be presented to the user as a list. Each list would present the name of the regex, the problem, and the suggested resolution (persisted in the regex) table.

3) Javascript checks- Currently, many of the validation rules are stored statically in JavaScript in the DHIS2 user interface. Instead of statically defining these rules in the JavaScript code, a generic Javascript function would be used to validate user input against a regular expression that has been stored in the database and evaluated by the server. As a more detailed example a snippet of code from /dhis2/dhis2/dhis-2/dhis-web/dhis-web-commons-resources/src/main/webapp/dhis-web-commons/util/validate/additional-methods.js

jQuery.validator
    .addMethod("letterswithbasicspecialchars", function(value, element) {
        return this.optional(element)
        || /^[\w-.,()\/%'\"\s]+$/i.test(value);
    },
    "Please Letters, numbers, spaces or some special chars like .,-,%,(,) only ");

In the context of the current blueprint, a regular expression would be stored in the REGEX object like this..

a) regexid (1)
b) name (Letters with basic special characters)
c) code (letterswithbasicspecialchars)
d) expression (^[\w-.,()\/%'\"\s]+$)
e) locale (en_GB)
f) description ("Allow only letters with some special characters such as dashes, periods, commas, parentheses, etc.")
g) resolution ("Please Letters, numbers, spaces or some special chars like .,-,%,(,) only ")

Javascript functions would make an AJAX call to the server to determine the validity of the expression, or at the point that the data is submitted by the form. The function would accept the following parameters
1) String to be validated
2) locale (from the current UI)
3) code of regular expression (e.g. letterswithbasicspecialcharacters)

The function would return a boolean value to determine whether the input has satisfied the regular expression or not.

 Another approach would be to rely on the REGEX_OBJECT table and construct the function without reference to a particular regular expression. Data would be submitted to the server and would be evaluated against the REGEX_OBJECT table. A boolean value would be returned to the user interface, along with the "resolution" message. This would then allow implementers to dynamically define their own regular expressions for object properties both in the UI and for the purpose of data validation checks through the interface to the REGEX_OBJECT.

3) Data import and export.
Similar to the approach above, system functions would be refactored to make reference to objects/properties that the user will import or export. Each property would be (optionally) evaluated and the user would be informed of the properties that do not satisfy regular expressions that have been assigned to these properties by the REGEX_OBJECT. Data import and export could be optionally allowed, so that the user could import data into the system, and resolve data integrity violations later.

4) Data values.

Validation of certain data data values is required. For instance integers, real numbers, negative integers, positive integers. In this case another object regex_datavalue would be implemented.

a) regexid (NOT NULL, FK reference to REGEX regexid )
b) dataelementid (NOT NULL, FK reference to REGEX dataelementid )
c) locale (NULLABLE with NULL applying to all locales)

For instance, ^(0|-?[1-9]\d*)\.d$ could be used to force data entry of a number with a single decimal point and a single digit, optionally with a sign and would be appropriate for data entry of real numbers.

(?)

Work Items

This blueprint contains Public information 
Everyone can see this information.

Subscribers

No subscribers.