Implement sensible versioning of congregation & group data
Databases typically identify records by assigning a unique ID (normally an auto-incremented integer) in an 'id' column in every table. This works fine when you have one database. But we're dealing with federated data: data in many databases, each of which has its own set of IDs, and normally the RCL database will have no access to non-RCL databases' ID columns. The best way to deal with this situation at present is to use UUIDs (universally unique identifiers), which are 128-bit values; the randomly generated kind (version 4) is so unlikely to collide that it can be treated as unique across databases. But non-RCL databases (feed sources) normally won't provide us UUIDs.
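As a minimal sketch of assigning an RCL UUID, Python's standard uuid module can generate a random (version 4) UUID. The helper name new_rcl_uuid and the record layout are assumptions made for illustration, not part of this blueprint.

```python
import uuid


def new_rcl_uuid() -> str:
    """Return a random (version 4) UUID in its 36-character string form."""
    return str(uuid.uuid4())


# Hypothetical record: the 'uuid' field, not a database-local 'id',
# is what identifies the congregation across federated databases.
record = {"name": "Grace Presbyterian Church", "uuid": new_rcl_uuid()}
```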
When we update a feed's cache with new data from that feed, and the feed provides its own IDs (though not UUIDs) for each congregation/group (e.g., as a URL parameter like ?id=123), we can record that ID not as the RCL ID but as the feed-specific ID for that congregation/group, and thereby avoid creating duplicates within that feed's data. This will not prevent duplicates in other cases, though, such as feeds that provide no IDs at all. It may also fail if a feed source unexpectedly rearranges its IDs.
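The feed-specific ID idea can be sketched as an update-or-insert keyed on the pair (feed, feed-specific ID). The cache here is a plain dict and the names upsert_from_feed, feed_name, and feed_specific_id are hypothetical; the real cache would live in the database.

```python
def upsert_from_feed(cache, feed_name, feed_specific_id, data):
    """Update the cached record for this feed's ID, or create it if absent.

    The feed's own ID is stored as a feed-specific ID, never as the RCL ID,
    so IDs from different feeds cannot collide with each other.
    """
    key = (feed_name, feed_specific_id)
    if key in cache:
        # Same feed, same feed-specific ID: treat as an update, not a new record.
        cache[key].update(data)
    else:
        cache[key] = dict(data, feed=feed_name, feed_specific_id=feed_specific_id)
    return cache[key]
```

Reading the same feed twice with ?id=123 then updates one cached record instead of creating a second one. As noted above, this only helps within a single feed and only while the feed keeps its IDs stable.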
So, in addition to what is described in other blueprints already, we need to decide how to match
a) a congregation/group coming from a feed that doesn't already associate the congregation/group with a RCL UUID with
b) the right RCL congregation/group UUID.
Perhaps do it via similarity heuristics (like difflib http://
We may also need to prioritize different fields - a congregation's name is more likely to remain the same than its street address. Likewise, we can consider prioritizing the beginning & end of some fields differently. E.g., changes after the first two words in the street address are less likely to be significant than changes to the first two words. Changes after the first word of the congregation name are less likely to be significant than changes to the first word.
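One way to sketch this prioritization uses difflib.SequenceMatcher from the Python standard library. All weights below (FIELD_WEIGHTS, LEADING_WORDS, HEAD_WEIGHT) are made-up illustrative values, and the field names name and street_address are assumptions; real values would need tuning against actual feed data.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the name counts for more than the street address.
FIELD_WEIGHTS = {"name": 0.7, "street_address": 0.3}
# How many leading words of each field are treated as most significant.
LEADING_WORDS = {"name": 1, "street_address": 2}
# Share of a field's score that comes from its leading words.
HEAD_WEIGHT = 0.75


def ratio(a, b):
    """Case-insensitive similarity in [0, 1]; two empty strings score 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def field_similarity(field, a, b):
    """Compare the leading words and the remainder separately, then blend."""
    n = LEADING_WORDS[field]
    a_words, b_words = a.split(), b.split()
    head = ratio(" ".join(a_words[:n]), " ".join(b_words[:n]))
    tail = ratio(" ".join(a_words[n:]), " ".join(b_words[n:]))
    return HEAD_WEIGHT * head + (1 - HEAD_WEIGHT) * tail


def record_similarity(a, b):
    """Weighted sum of per-field similarities for two record dicts."""
    return sum(
        weight * field_similarity(field, a.get(field, ""), b.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )
```

With these weights, changing "123 Main Street" to "123 Main St" lowers the score less than changing it to "456 Oak Street", and a change to the first word of the name costs more than a change later in the name.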
Blueprint information
- Status: Not started
- Approver: None
- Priority: Essential
- Drafter: None
- Direction: Approved
- Assignee: None
- Definition: Drafting
- Series goal: Accepted for couchapp-backbone
- Implementation: Unknown
- Milestone target: 0.3.5
- Started by:
- Completed by:
Whiteboard
One good way to do versioning in CouchDB is described at http://
----------
It seems this versioning plan will require the following tables and fields:
feed
- id
- name
- uuid
congregation
- id
- name
- etc...
- uuid
- version_date
- feed_id # foreign key to feed.uuid, identifying from which feed this version came
group
- id
- name
- etc...
- uuid
- version_date
- feed_id # foreign key to feed.uuid, identifying from which feed this version came
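The tables above can be sketched with SQLite through Python's sqlite3 module. SQLite is only a stand-in chosen so the example is self-contained; the column types, the make_db and latest_congregation helpers, and the ISO-8601 text format for version_date are all assumptions. Note that group is a reserved word in SQL, so the table name has to be quoted, and that uuid is deliberately not unique in congregation or "group" because each version of a record is a separate row sharing the same UUID.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feed (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    uuid TEXT NOT NULL UNIQUE
);
CREATE TABLE congregation (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    name         TEXT NOT NULL,
    uuid         TEXT NOT NULL,             -- shared by all versions of a record
    version_date TEXT NOT NULL,             -- ISO-8601 timestamp (assumed format)
    feed_id      TEXT REFERENCES feed(uuid) -- feed this version came from
);
CREATE TABLE "group" (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    name         TEXT NOT NULL,
    uuid         TEXT NOT NULL,
    version_date TEXT NOT NULL,
    feed_id      TEXT REFERENCES feed(uuid)
);
"""


def make_db():
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def latest_congregation(conn, record_uuid):
    """Return (name, version_date) of the newest version, or None if unknown."""
    return conn.execute(
        "SELECT name, version_date FROM congregation "
        "WHERE uuid = ? ORDER BY version_date DESC, id DESC LIMIT 1",
        (record_uuid,),
    ).fetchone()
```

Storing every version as its own row means an update is an INSERT rather than an UPDATE, and the current state of a congregation is simply its row with the latest version_date.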
When a user creates a new record in the database, the model (or maybe the controller that writes to the model/database) should have a function that applies a Levenshtein similarity comparison to the submitted data, comparing it with existing records. It then shows the user a list of existing records that might be the same congregation or group the user has in mind (i.e., might share its UUID), so the user can state whether the data is for a new congregation/group or is an update to an existing one.
When a feed is read and the feed's ID structure or data doesn't match existing data, we'll need an automated version of the Levenshtein comparison described above to decide which existing record, if any, the feed record corresponds to.
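Both cases above, the candidate list shown to a user and the automated decision for feed data, can be sketched on top of one Levenshtein distance function. This is an illustration only: it compares names alone, and the function names and the thresholds (0.6 and 0.9) are assumptions, not values decided in this blueprint.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1], normalized by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def possible_matches(submitted_name, existing, limit=5, min_similarity=0.6):
    """Return up to `limit` (score, record) pairs, best match first."""
    scored = [(similarity(submitted_name, r["name"]), r) for r in existing]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


def classify_feed_record(name, existing, auto_match=0.9, review=0.6):
    """Automated variant: return ('match' | 'review' | 'new', uuid or None)."""
    matches = possible_matches(name, existing, limit=1, min_similarity=review)
    if not matches:
        return ("new", None)
    score, record = matches[0]
    if score >= auto_match:
        return ("match", record["uuid"])
    # Similar but not close enough to trust automatically: flag for a human.
    return ("review", record["uuid"])
```

For user-submitted data, possible_matches would feed the list the user picks from. For feed data, classify_feed_record accepts only very close matches automatically and leaves the middle band for manual review, which limits the damage a wrong automatic merge could do.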
The section at http://