Diamond: develop a tool for finding entries in the schema that are unused

Registered by Patrick Farrell

We currently have the diamond_validation test, which checks that every flml file is valid with respect to the current schema. I would also like another complementary tool: given a schema and a set of flml files, which entries in the schema are never used?
The reason for this is that parsing the schema is the most expensive part of diamond's operation. As cruft accumulates in the schema, diamond only gets slower and slower. However, if we can automatically identify schema cruft, then it can be stripped and everyone will be happy.
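The core of such a tool can be sketched as a set difference: collect every element path that appears in the flml files, then subtract that from the set of paths the schema defines. The sketch below is only an illustration under stated assumptions, not diamond's implementation. The names used_paths and unused_entries are hypothetical, the schema is assumed to have already been flattened into a set of path strings, and the element names in the usage example are invented. It also ignores name attributes and choice elements, which a real tool would need to handle.

```python
import xml.etree.ElementTree as ET


def used_paths(xml_text):
    """Return the set of element paths (e.g. '/a/b') present in one XML document."""
    root = ET.fromstring(xml_text)
    found = set()
    # Iterative depth-first walk, carrying the path to each node.
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        found.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return found


def unused_entries(schema_paths, documents):
    """Return the schema paths that appear in none of the documents, sorted.

    schema_paths: iterable of path strings (assumed to be precomputed
                  from the schema by some other step).
    documents:    iterable of XML strings, one per options file.
    """
    used = set()
    for text in documents:
        used |= used_paths(text)
    return sorted(set(schema_paths) - used)
```

A usage example with invented element names:

```python
schema_paths = {"/options", "/options/mesh", "/options/timestep", "/options/legacy_flag"}
documents = ["<options><mesh/></options>", "<options><timestep/></options>"]
print(unused_entries(schema_paths, documents))
```

Because each file only contributes to the set of used paths, the cost of the schema traversal is paid once no matter how many files are checked.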

Blueprint information

Status:
Started
Approver:
Patrick Farrell
Priority:
Undefined
Drafter:
Patrick Farrell
Direction:
Needs approval
Assignee:
Fraser Waters
Definition:
New
Series goal:
None
Implementation:
Beta Available
Milestone target:
None
Started by
Fraser Waters

Whiteboard

[fwaters]
So this is almost working, but I'm running up against a problem. It's slow, like really slow, and I don't think there's a lot we can do about it. The schemas are huge, and iterating over the entire thing 3 times (once to find the full set, once to populate the treeview and once to color it correctly) is going to take a while. Hell, iterating once takes a while.

[pefarrell]
Slow means -- an hour? a day? a week?

[fwaters]
About half an hour currently. That's only comparing with one flml file, but the actual flml files don't take as long to process, so doing more shouldn't significantly slow the process.

[pefarrell]
I don't know exactly what approach you're taking, but it sounds like there's something fundamentally wrong with the approach if it takes that long. Why would registering the used parts of the schema take any longer than reading the flml?

[fwaters]
Reading the flml only has to read the schema as far as it matches the flml. Reading the whole schema means reading in over 2000 elements (for the flml schema). Just building up the set of paths from that takes about 3 minutes, and we have to iterate over it 3 times, which takes about 10 minutes in total. So I guess half an hour was an overestimate, but it's not fast.
