Extract metadata from resources

Registered by Tomás Vírseda García

One of the main tasks is to guess metadata from resources.

'libextractor' is a C library that helps to guess metadata from files of arbitrary type. However, it does not attach any semantics to the metadata it returns. Vazaar uses the Python bindings for this library.

Blueprint information

Status: Started
Approver: Tomás Vírseda García
Priority: Essential
Drafter: Tomás Vírseda García
Direction: Approved
Assignee: Tomás Vírseda García
Definition: Drafting
Series goal: None
Implementation: Started
Milestone target: None
Started by: Tomás Vírseda García

Whiteboard

Getting metadata with Python is easy:

--- sample code ---
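import sys
import extractor

path = sys.argv[1]  # file to inspect, taken from the command line
xtract = extractor.Extractor()
keys = xtract.extract(path)  # list of (keyword_type, keyword) pairs

for keyword_type, keyword in keys:
    print("t(%s) -> k(%s)" % (keyword_type, keyword))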

--- results: t(ype) -> k(ey) ---
t(description) -> k(Speedsound)
t(mimetype) -> k(application/ogg)
t(copyright) -> k(2008 Speedsound)
t(location) -> k(http://jamendo.com)
t(date) -> k(2008)
t(album) -> k(Groove Connection)
t(artist) -> k(Speedsound)
t(publisher) -> k(Xiph.Org libVorbis)
t(mimetype) -> k(application/ogg)
---

But the results do not mean anything by themselves. So the best solution is to map them onto the Dublin Core Metadata Element Set [DCMI Recommendation]. NEPOMUK classes are based on Dublin Core.

'keyword_type' -> DC -> NEPOMUK
'description' -> dc:description -> nao:description

(?)
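
A rough sketch of how that mapping step might look, using the sample results above as input. Only the 'description' -> dc:description entry comes from these notes; every other entry in the table, and the names KEYWORD_TO_DC and to_dublin_core, are assumptions made for illustration. Keyword types without an obvious Dublin Core element (here 'album' and 'location') are left unmapped rather than guessed, and repeated values such as the double mimetype are collapsed.

```python
# Hypothetical lookup table: libextractor keyword type -> Dublin Core element.
# Only 'description' is taken from the notes; the rest are assumptions.
KEYWORD_TO_DC = {
    "description": "dc:description",
    "artist": "dc:creator",       # assumption
    "publisher": "dc:publisher",  # assumption
    "date": "dc:date",            # assumption
    "mimetype": "dc:format",      # assumption
    "copyright": "dc:rights",     # assumption
}


def to_dublin_core(keys):
    """Group (keyword_type, keyword) pairs by Dublin Core element.

    Returns (mapped, unmapped): a dict of element -> list of unique values,
    and a list of the pairs that have no entry in KEYWORD_TO_DC.
    """
    mapped = {}
    unmapped = []
    for keyword_type, keyword in keys:
        term = KEYWORD_TO_DC.get(keyword_type)
        if term is None:
            unmapped.append((keyword_type, keyword))
            continue
        values = mapped.setdefault(term, [])
        if keyword not in values:  # drop repeated values
            values.append(keyword)
    return mapped, unmapped


# The sample results listed above, written as the pairs the loop iterates over.
sample = [
    ("description", "Speedsound"),
    ("mimetype", "application/ogg"),
    ("copyright", "2008 Speedsound"),
    ("location", "http://jamendo.com"),
    ("date", "2008"),
    ("album", "Groove Connection"),
    ("artist", "Speedsound"),
    ("publisher", "Xiph.Org libVorbis"),
    ("mimetype", "application/ogg"),
]

mapped, unmapped = to_dublin_core(sample)
for term in sorted(mapped):
    print("%s -> %s" % (term, ", ".join(mapped[term])))
```

The second step (DC -> NEPOMUK) could reuse the same table-lookup approach once the matching NEPOMUK properties are settled.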
