Extract metadata from resources
One of the main tasks is to guess metadata from resources.
'libextractor' is a C library that helps guess metadata from files of arbitrary type, but it does not attach any semantics to that metadata. Vazaar uses the Python bindings for this library.
Blueprint information
- Status: Started
- Approver: Tomás Vírseda García
- Priority: Essential
- Drafter: Tomás Vírseda García
- Direction: Approved
- Assignee: Tomás Vírseda García
- Definition: Drafting
- Series goal: None
- Implementation: Started
- Milestone target: None
- Started by: Tomás Vírseda García
- Completed by:
Whiteboard
Getting metadata with Python is easy:
--- sample code ---
import extractor
xtract = extractor.
keys = xtract.
for keyword_type, keyword in keys:
    print "t(%s) -> k(%s)" % (keyword_type, keyword)
--- results: t(ype) -> k(ey) ---
t(description) -> k(Speedsound)
t(mimetype) -> k(application/ogg)
t(copyright) -> k(2008 Speedsound)
t(location) -> k(http://
t(date) -> k(2008)
t(album) -> k(Groove Connection)
t(artist) -> k(Speedsound)
t(publisher) -> k(Xiph.Org libVorbis)
t(mimetype) -> k(application/ogg)
---
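The results above list the same pair more than once (mimetype appears twice). A minimal sketch of grouping the pairs by keyword type and dropping exact repeats could look like the following. It assumes the pairs are (keyword_type, keyword) tuples, as in the loop above; the helper name group_keywords is hypothetical, and the sample list is copied from the results (minus the truncated location entry) so the sketch runs without libextractor.

```python
# Sample (keyword_type, keyword) pairs copied from the results above;
# in real use they would come from the extractor loop.
pairs = [
    ("description", "Speedsound"),
    ("mimetype", "application/ogg"),
    ("copyright", "2008 Speedsound"),
    ("date", "2008"),
    ("album", "Groove Connection"),
    ("artist", "Speedsound"),
    ("publisher", "Xiph.Org libVorbis"),
    ("mimetype", "application/ogg"),
]


def group_keywords(pairs):
    """Group keywords by type, dropping exact repeats but keeping order.

    Hypothetical helper, not part of libextractor or Vazaar.
    """
    grouped = {}
    for keyword_type, keyword in pairs:
        values = grouped.setdefault(keyword_type, [])
        if keyword not in values:
            values.append(keyword)
    return grouped


metadata = group_keywords(pairs)
for keyword_type in sorted(metadata):
    print("%s: %s" % (keyword_type, ", ".join(metadata[keyword_type])))
```

Keeping a list per type, rather than a single value, leaves room for files that carry several distinct keywords of the same type.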
But the results do not mean anything by themselves. So the best solution is to map the results onto the Dublin Core Metadata Element Set [DCMI Recommendation]. NEPOMUK classes are based on Dublin Core.
'keyword_type' -> DC -> NEPOMUK
'description' -> dc:description -> nao:description
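The chain above can be sketched as two lookup tables. Only the 'description' row comes from this blueprint; every other entry is an illustrative assumption about which Dublin Core element fits each keyword type, and the names KEYWORD_TO_DC, DC_TO_NEPOMUK and to_semantic are hypothetical. NEPOMUK terms other than nao:description are deliberately left out rather than guessed.

```python
# keyword_type -> Dublin Core element.
KEYWORD_TO_DC = {
    "description": "dc:description",  # from the blueprint
    "artist": "dc:creator",           # assumption
    "publisher": "dc:publisher",      # assumption
    "date": "dc:date",                # assumption
    "copyright": "dc:rights",         # assumption
    "mimetype": "dc:format",          # assumption
}

# Dublin Core element -> NEPOMUK property.
DC_TO_NEPOMUK = {
    "dc:description": "nao:description",  # from the blueprint
}


def to_semantic(keyword_type):
    """Return (dc_term, nepomuk_term); either may be None if unmapped."""
    dc_term = KEYWORD_TO_DC.get(keyword_type)
    nepomuk_term = DC_TO_NEPOMUK.get(dc_term)
    return dc_term, nepomuk_term


for keyword_type in ("description", "artist", "album"):
    print("%s -> %s -> %s" % ((keyword_type,) + to_semantic(keyword_type)))
```

Returning None for unmapped types keeps the gaps visible, so keyword types such as 'album' can be reviewed and mapped by hand later.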