Vazaar

Extract metadata from resources

Registered by Tomás Vírseda García on 2009-12-18

One of the most important tasks is guessing metadata from resources.

'libextractor' is a C library that extracts metadata from files of arbitrary type. However, it attaches no semantics to the metadata it returns. Vazaar uses the Python bindings for this library.

Blueprint information

Status: Started

Approver: Tomás Vírseda García

Priority: Essential

Drafter: Tomás Vírseda García

Direction: Approved

Assignee: Tomás Vírseda García

Definition: Drafting

Series goal: None

Implementation: Started

Milestone target: None

Started by: Tomás Vírseda García on 2009-12-18

Whiteboard

Getting metadata with Python is easy:

--- sample code ---
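import extractor

# 'path' is the file to inspect (placeholder value)
path = "/path/to/resource"

xtract = extractor.Extractor()
# extract() yields (keyword_type, keyword) pairs
keys = xtract.extract(path)

for keyword_type, keyword in keys:
    print("t(%s) -> k(%s)" % (keyword_type, keyword))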

--- results: t(ype) -> k(ey) ---
t(description) -> k(Speedsound)
t(mimetype) -> k(application/ogg)
t(copyright) -> k(2008 Speedsound)
t(location) -> k(http://jamendo.com)
t(date) -> k(2008)
t(album) -> k(Groove Connection)
t(artist) -> k(Speedsound)
t(publisher) -> k(Xiph.Org libVorbis)
t(mimetype) -> k(application/ogg)
---
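The same pair can show up more than once in the results (see mimetype above). A minimal sketch of grouping the pairs by type and dropping exact duplicates follows. `group_keywords` is a hypothetical helper, not part of the libextractor bindings, and the sample list just repeats a few of the results shown above.

```python
from collections import OrderedDict


def group_keywords(pairs):
    """Group (keyword_type, keyword) pairs by type, dropping exact duplicates."""
    grouped = OrderedDict()
    for keyword_type, keyword in pairs:
        values = grouped.setdefault(keyword_type, [])
        if keyword not in values:
            values.append(keyword)
    return grouped


# Sample pairs copied from the results above
sample = [
    ("description", "Speedsound"),
    ("mimetype", "application/ogg"),
    ("artist", "Speedsound"),
    ("mimetype", "application/ogg"),
]

for keyword_type, values in group_keywords(sample).items():
    print("t(%s) -> k(%s)" % (keyword_type, ", ".join(values)))
```

Keeping a list per type preserves the case where one type legitimately carries several distinct values.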

But the results do not mean anything by themselves. The best solution is to map them onto the Dublin Core Metadata Element Set [DCMI Recommendation]. NEPOMUK classes are based on Dublin Core.

'keyword_type' -> DC -> NEPOMUK
'description' -> dc:description -> nao:description

(?)
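
The mapping chain above could be sketched as a lookup table. Only the 'description' row comes from this blueprint. The other Dublin Core assignments are assumptions for illustration and would need review, and the NEPOMUK column is left as None wherever the blueprint does not name a term. `KEYWORD_MAP` and `to_semantic` are hypothetical names.

```python
# keyword_type -> (Dublin Core element, NEPOMUK property or None)
KEYWORD_MAP = {
    "description": ("dc:description", "nao:description"),  # from the blueprint
    # The rows below are assumed mappings, not confirmed by the blueprint
    "artist": ("dc:creator", None),
    "publisher": ("dc:publisher", None),
    "date": ("dc:date", None),
    "mimetype": ("dc:format", None),
    "copyright": ("dc:rights", None),
}


def to_semantic(pairs):
    """Translate (keyword_type, keyword) pairs into (dc, nepomuk, keyword) triples.

    Pairs whose type has no entry in KEYWORD_MAP are skipped.
    """
    triples = []
    for keyword_type, keyword in pairs:
        if keyword_type in KEYWORD_MAP:
            dc_term, nepomuk_term = KEYWORD_MAP[keyword_type]
            triples.append((dc_term, nepomuk_term, keyword))
    return triples


pairs = [("description", "Speedsound"), ("album", "Groove Connection")]
for triple in to_semantic(pairs):
    print("%s | %s | %s" % triple)
```

Unmapped types such as 'album' are skipped here; whether to drop them or keep them under a fallback term is still an open question.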
