Diamond: develop a tool for finding entries in the schema that are unused
We currently have the diamond_validation test, which checks that every flml file is valid with respect to the current schema. I would also like a complementary tool: given a schema and a set of flml files, report which entries in the schema are never used.
The reason for this is that parsing the schema is the most expensive part of diamond's operation. As cruft accumulates in the schema, diamond only gets slower and slower. However, if we can automatically identify schema cruft, then it can be stripped and everyone will be happy.
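At its core, the proposed tool is a set difference: collect every element path in the schema, collect every path exercised by the flml files, and report the paths in the first set but not the second. A minimal sketch of that idea follows; the nested-dict tree representation and the `collect_paths` helper are illustrative assumptions, not diamond's real schema API.

```python
def collect_paths(tree, prefix=""):
    """Recursively collect every element path in a nested-dict tree."""
    paths = set()
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        paths.add(path)
        if isinstance(child, dict):
            paths |= collect_paths(child, path)
    return paths

def unused_entries(schema_tree, flml_trees):
    """Schema paths that no flml file ever exercises."""
    schema_paths = collect_paths(schema_tree)
    used_paths = set()
    for flml in flml_trees:
        used_paths |= collect_paths(flml)
    return schema_paths - used_paths

# Toy example: one schema, one flml that only uses part of it.
schema = {"fluidity_options": {"geometry": {"dimension": None,
                                            "quadrature": None},
                               "legacy_option": None}}
flml = {"fluidity_options": {"geometry": {"dimension": None}}}
print(sorted(unused_entries(schema, [flml])))
# → ['/fluidity_options/geometry/quadrature', '/fluidity_options/legacy_option']
```

The expensive part in practice is not the set difference but building the full schema path set in the first place, which is what the whiteboard discussion below is about.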
Blueprint information
- Status: Started
- Approver: Patrick Farrell
- Priority: Undefined
- Drafter: Patrick Farrell
- Direction: Needs approval
- Assignee: Fraser Waters
- Definition: New
- Series goal: None
- Implementation: Beta Available
- Milestone target: None
- Started by: Fraser Waters
- Completed by:
Whiteboard
[fwaters]
So this is almost working, but I'm running up against a problem. It's slow, like really slow, and I don't think there's a lot we can do about it. The schemas are huge, and iterating over the entire thing 3 times (once to find the full set, once to populate the treeview, and once to colour it correctly) is going to take a while. Hell, iterating once takes a while.
[pefarrell]
Slow means -- an hour? a day? a week?
[fwaters]
About half an hour currently. That's only comparing against one flml file, but the actual flml files don't take as long to process, so doing more shouldn't significantly slow things down.
[pefarrell]
I don't know exactly what approach you're taking, but it sounds like there's something fundamentally wrong with it if it takes that long. Why would registering the used parts of the schema take any longer than reading the flml?
[fwaters]
Reading an flml only has to read the schema as far as it matches the flml. Reading the whole schema means reading in over 2000 elements (for flml). Just building up the set of paths from that takes about 3 minutes, and we have to iterate over it 3 times, which takes about 10 minutes in total. So I guess half an hour was an overestimate, but it's not fast.
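One way to avoid paying for three separate traversals is to do a single depth-first walk and build the full set, the treeview rows, and the colouring in the same pass. This is only a sketch of that idea: the nested-dict tree, the `used_paths` set, and the row/colour representations are hypothetical stand-ins for diamond's real data structures.

```python
def walk(tree, prefix=""):
    """Single depth-first pass yielding (path, depth) for every element."""
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        yield path, path.count("/") - 1
        if isinstance(child, dict):
            yield from walk(child, path)

def build_view(schema_tree, used_paths):
    """One traversal builds all three products that previously took three."""
    full_set, rows, colours = set(), [], {}
    for path, depth in walk(schema_tree):
        full_set.add(path)                                  # pass 1: full set
        rows.append("  " * depth + path.rsplit("/", 1)[-1])  # pass 2: treeview
        colours[path] = "black" if path in used_paths else "red"  # pass 3: colour
    return full_set, rows, colours
```

Under this assumption the schema is read once instead of three times, which would cut the roughly 10 minutes of iteration described above to a third, though it does nothing about the cost of the initial parse itself.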