Distributed Media Library

Registered by Jason Gerard DeRose on 2010-10-13

Let's talk about where it makes sense to integrate dmedia (or other Novacut components) into Ubuntu One, and which applications might be good candidates for adding dmedia support (PiTiVi, Shotwell?).

The Distributed Media Library (dmedia) is designed to bring Media Asset Management (MAM) to the freedesktop. It's designed to be suitable for both content creation (e.g. photo editors) and consumption (e.g. video players), and for both casual and professional apps. Key features are:

    * Deduplication - media files are ID'd by their content-hash
    * Synchronization and backup (e.g., you import some photos on a netbook, then they are automatically merged onto your desktop or cloud storage)
    * Reclamation - tracks where media files are stored, so it knows when a file can be safely deleted to free space
    * Graceful use of a large library from a device with limited storage (e.g., netbook, tablet, phone)

Meta-data for the media files is stored in CouchDB, which means it will be easy to integrate with Ubuntu One. Each media file has a corresponding document in CouchDB, and the media file content-hash is used for the document ID:

      {"_id": "UB2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW"}
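A minimal sketch of deriving such an ID, assuming it is a base32-encoded SHA-1 digest of the file content. The 32-character example ID is consistent with that length, but the hash function is not named here, and `content_id` and `new_doc` are hypothetical names rather than the dmedia API:

```python
import base64
import hashlib
import io


def content_id(fp, chunk_size=1024 * 1024):
    """Return the base32-encoded SHA-1 digest of a binary stream.

    Assumption: SHA-1 + base32 is only a guess that matches the
    32-character example ID; the spec text does not name the hash.
    """
    h = hashlib.sha1()
    # Read in chunks so large media files never have to fit in memory.
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        h.update(chunk)
    return base64.b32encode(h.digest()).decode("ascii")


def new_doc(fp):
    """Build a minimal CouchDB-style document keyed by the content-hash."""
    return {"_id": content_id(fp)}


# Identical content always yields the same ID, which is what makes
# deduplication fall out of the design.
doc_a = new_doc(io.BytesIO(b"same bytes"))
doc_b = new_doc(io.BytesIO(b"same bytes"))
doc_c = new_doc(io.BytesIO(b"other bytes"))
```

Here `doc_a` and `doc_b` get the same `_id`, while `doc_c` gets a different one.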

The media files are stored on the file system in a special layout according to the content-hash:

      ~/.dmedia/UB/2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW.png
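The layout in the example can be sketched as follows: the first two characters of the ID become a directory name, and the remainder plus the extension becomes the file name. The function name `media_path` is hypothetical, and the split point is inferred only from the example path above:

```python
import posixpath


def media_path(base, doc_id, ext=None):
    """Map a content-hash ID to a path under the library directory.

    Assumption: a two-character directory prefix, as in the example path.
    """
    name = doc_id[2:]
    if ext:
        name = name + "." + ext
    return posixpath.join(base, doc_id[:2], name)


example = media_path("~/.dmedia", "UB2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW", "png")
```

With the example ID, `example` is `~/.dmedia/UB/2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW.png`.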

Media files are moved between computers over HTTP. The meta-data for the entire library is stored on each device (meta-data is small), but a given device might have only a small subset of the media files (media files are big).
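Because every device holds the full meta-data, a device can decide locally whether its copy of a file is safe to delete. A rough sketch, assuming each document carries a `stored` mapping from a store name to some details about that copy. The field name, the store names, and the one-other-copy threshold are illustrative assumptions, not the dmedia schema:

```python
def can_reclaim(doc, local_store, min_other_copies=1):
    """Return True if the local copy can be deleted without losing the file.

    Assumption: doc["stored"] maps store names to per-copy details.
    """
    stored = doc.get("stored", {})
    if local_store not in stored:
        return False  # nothing to reclaim on this device
    others = [name for name in stored if name != local_store]
    return len(others) >= min_other_copies


doc = {
    "_id": "UB2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW",
    "stored": {"netbook": {}, "desktop": {}},
}
only_here = {"_id": doc["_id"], "stored": {"netbook": {}}}
```

With these documents, `can_reclaim(doc, "netbook")` is true because the desktop also holds the file, while `can_reclaim(only_here, "netbook")` is false.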

Blueprint information

Status: Started
Approver: Stuart Langridge
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: Drafting
Series goal: Accepted for natty
Implementation: Good progress
Milestone target: None
Started by: Jason Gerard DeRose on 2010-10-14

Whiteboard

[exact gobby doc from session]

https://wiki.ubuntu.com/Specs/N/DistributedMediaLibrary is a high-level spec,
which explains the basic model.

Media files are stored on the machine itself, shipped around with HTTP;
metadata stored in CouchDB and available everywhere.
Switchable back-ends for storing media files.

 * perhaps use a UDF for each media project, so when you open a project on your
   netbook, all media files used are downloaded (but not all your media files).
   - It still doesn't solve the issue of having 500GB of Music files and not
     wanting to download it all.
     - you can unsubscribe the UDF with music on your netbook.
       - Sure, but what if I want only to access some of my music (or photos,
         or movies etc). What I'd like to see is some sort of "stub file"
         or entry that you can click and get access to the actual file (i.e.
         it's downloaded from the cloud/server/add your usecase) on demand.
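One possible reading of the "stub" idea above, as a sketch only: the library knows every ID from the meta-data, and a file's bytes are fetched the first time it is opened. The class name `LazyLibrary` and the `fetch` callable (which would wrap an HTTP download in practice) are hypothetical:

```python
import os


class LazyLibrary:
    """Hypothetical on-demand library: fetch a file only when first opened."""

    def __init__(self, base, fetch):
        self.base = base    # local directory holding downloaded files
        self.fetch = fetch  # callable(doc_id) -> bytes, e.g. an HTTP GET

    def path(self, doc_id):
        # Same two-character prefix layout as the example path in the spec.
        return os.path.join(self.base, doc_id[:2], doc_id[2:])

    def is_local(self, doc_id):
        return os.path.isfile(self.path(doc_id))

    def open(self, doc_id):
        """Return a readable binary file, downloading it first if needed."""
        path = self.path(doc_id)
        if not self.is_local(doc_id):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fp:
                fp.write(self.fetch(doc_id))
        return open(path, "rb")
```

Opening the same ID a second time reads the local copy and does not call `fetch` again.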

Question: Does this blueprint also consider things like making music players share the same database? Or is that a completely separate issue? Or just something that could come further down the road?
Answer: this blueprint does not consider that; this is pretty much purely about
Novacut. There's obviously a bunch of stuff that novacut will think through which
has wider application to things like music players sharing the same DB, though.

Something that would be good would be a standard way to serialise an action on a video/image/music into a description -- Shotwell does this, and if novacut did the serialisation the same way then they can both store into couchdb and collaborate on them.

[jderose - my answer to some questions above]

"""Question: Does this blueprint also consider things like making music players share the same database? Or is that a completely separate issue? Or just something that could come further down the road?"""

 The "official" answer (not sure who answered on the gobby doc, but it wasn't anyone from the Novacut team): Yes! dmedia is not Novacut specific. I'm keenly focused on keeping dmedia simple, generic, and usable by a wide range of applications. I want to get this important user data out of application-specific silos and enable more fluid integration with the desktop (especially the social desktop).

"""Something that would be good would be a standard way to serialise an action on a video/image/music into a description -- Shotwell does this, and if novacut did the serialisation the same way then they can both store into couchdb and collaborate on them."""

 Yes, this is exactly what I want to accomplish, although it isn't technically in the scope for dmedia itself (but dmedia provides the foundation). Here's a use case I want to enable: you edit a video in a home-use-oriented editor; later you incorporate that edit into a more complicated edit in a pro-use-oriented editor. I believe home-use and pro-use editors call for different user interfaces because the nature of the workflow is different enough (of course, I could be wrong). But either way, there's no reason the editors shouldn't use the same edit description (serialization). And this also means they can use the same backend to render the edits, which is a huge common component and a big win for everyone.

[jderose - summary of UDS conversations, next steps]

I'll try to summarize the dmedia session and the many dmedia/Novacut discussions I had with different people at UDS. If I've misrepresented anyone or screwed up details, please jump in and fix it.

dmedia + Shotwell
-----------------

    * I got the impression that Adam and Jim feel the dmedia design overall will be easy to map into Shotwell and they seem excited about the capabilities dmedia will bring
    * One problematic use case Jim brought up is when Shotwell modifies the original file's EXIF data, changing the content-hash. I personally feel this use case is rare enough that the burden should be on this use case rather than changing the core dmedia design. dmedia captures the 99% very well... pro media apps are universally non-destructive, as are most consumer media apps these days. Also, dmedia provides an elegant way to have important meta-data available everywhere, even when the media file itself isn't available locally. Of course, when a photo is exported/rendered to share on the net or whatever, we obviously can write whatever EXIF data needed to the exported file... the point is just to never alter the original master file.
    * I don't think anyone thinks we should try to replace the current Shotwell database with dmedia in this cycle (if ever... dmedia can be just one of several options). Seems like the best way forward is to integrate via a plugin, so we need to wait for the Shotwell plugin architecture to be in place first.
    * Jim - I got the impression that you feel the plugin architecture isn't realistic for the N cycle. Am I correct on this? How about for the O cycle?
    * Jim and I had some great talks about standardizing the edit description semantics... standardizing the semantics is the important thing and something we will be working on immediately. The exact serialization is less important to standardize right now, but Novacut is using a graph-based JSON description stored in CouchDB. But if, say, Shotwell uses XML (I don't know if it does, just using it as an example), that's not a problem as long as you can do a lossless round-trip between different serialization formats.
    * Summary - there may not be any Shotwell + dmedia integration this cycle, but we will be in close communication so that we're both moving toward a place where the integration will be easy to do when the time comes. Adam and Jim, you both rock! Great to meet you!
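To make the "lossless round-trip" requirement concrete, here is a small sketch that converts a single edit node between a JSON-style dict and XML and back. The node fields (`type`, `src`, `start`, `stop`) and the XML shape are invented for illustration; they are not the Novacut or Shotwell formats. Each value is stored as JSON text inside an XML attribute so that numbers and strings keep their types:

```python
import json
import xml.etree.ElementTree as ET


def to_xml(node):
    """Serialize a flat edit node (dict) to an XML string."""
    elem = ET.Element("node")
    for key in sorted(node):
        # JSON-encode each value so 10 and "10" stay distinguishable.
        elem.set(key, json.dumps(node[key]))
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml() back into a dict."""
    elem = ET.fromstring(text)
    return {key: json.loads(value) for key, value in elem.attrib.items()}


# Hypothetical edit node: a slice of frames from one source file.
node = {
    "type": "slice",
    "src": "UB2VSXLKXSXBP44VQ5MHBPRBLVH7QWEW",
    "start": 10,
    "stop": 50,
}
round_tripped = from_xml(to_xml(node))
```

The round trip is lossless when `round_tripped == node` holds, which is the property two applications would need in order to share edits across formats.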

(From Jim):

* Regarding never touching the master file, it should be understood that in the world of photography, it's not so clear-cut. Some users love that we leave their masters pristine, but there is a vocal segment of our population that do want us to update their masters' metadata, specifically: title, description (which Shotwell doesn't have yet), tags/keywords, exposure date/time, and orientation (which allows for lossless rotations). Just this week, I added a feature to Shotwell to commit these metadata items (there may be more in the future) to the master files in the background. This is an optional feature, but highly requested.

I know that in the world of MP3, this is also not unusual, since people often update metadata to fill-in or correct artist name, album, etc.

I know this greatly complicates the problem -- it certainly complicated Shotwell, which was envisioned as being entirely non-destructive -- but I really feel more consideration and investigation should be made. A hash of the file contents is vital for a system like this, but I'm unsure it's sufficient as a content ID. Just my thoughts.

* The current thinking on plug-ins in Shotwell is would-very-much-like-to-have in our next release, which (knock on wood) would be ready in time for Natty. No promises, however.

* I've spoken about this before, but to add it to the permanent record, I would recommend looking at XMP as the metadata format to store and transport in dmedia. It's a well thought out spec, all XML, highly extensible and supports custom namespaces. I call it The One Metadata to Rule Them All.
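Jim's first point can be illustrated with a toy example: when embedded metadata is rewritten in place, the image data is unchanged but the hash of the whole file is not. The byte strings below are stand-ins, not real EXIF, and the base32 SHA-1 ID is an assumption consistent with the 32-character example ID in the spec:

```python
import base64
import hashlib


def content_id(data):
    """Assumed ID scheme: base32-encoded SHA-1 of the whole file."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


pixels = b"\x00\x01\x02\x03" * 1024    # stand-in for the image data
original = b"title=;" + pixels         # toy "file" with an empty title
retitled = b"title=Sunset;" + pixels   # same pixels, metadata edited in place

same_pixels = original.endswith(pixels) and retitled.endswith(pixels)
ids_match = content_id(original) == content_id(retitled)
```

Here `same_pixels` is true while `ids_match` is false, so a whole-file hash treats the retitled master as a different file.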

dmedia + UbuntuOne
------------------

    * All the UbuntuOne people have been very enthusiastic about dmedia and Novacut - thanks everyone for the encouragement and technical guidance!
    * Fortunately, for the hard dmedia problem (bi-directionally syncing the DB), we already have full UbuntuOne integration with basically no effort (thanks to CouchDB, desktopcouch, and the great UbuntuOne CouchDB sync infrastructure)
    * Unfortunately, as the UbuntuOne file sync works at the directory level, there's not an easy way to map dmedia into the UbuntuOne file sync for cases when only an arbitrary and changing subset of the files will be on a given device (basically determined by the access patterns and storage capacity of a particular device... phone vs netbook vs workstation vs storage cluster). This is sort of the dmedia killer feature, and in pro video production especially, having only a partial library on a given machine will be the rule, not the exception. It will also generally be the rule when it comes to video consumption, or when talking phones, whether video, audio, or photos.
    * On the other hand, the dmedia file transport isn't a hard problem. We plan to make this part pluggable to support multiple backends (dmedia native, S3, removable devices, whatever). I think the UbuntuOne file sync works well for what it's designed to do. Syncing read-only intrinsically named media files is quite a different problem than bidirectionally syncing file modifications and renames on your Documents directory. I would personally like to see UbuntuOne run the native dmedia storage server for dmedia use. Stuart Langridge... let's have a Skype chat about this. Even if UbuntuOne running the dmedia storage server might be a ways off, I'd like to accommodate the UbuntuOne infrastructure design/quirks in the dmedia storage server so that it's easy to run on UbuntuOne when the time is right.
    * I really want to see dmedia in Natty and think it's a totally realistic goal. At minimum, 1) the DB must sync with UbuntuOne (done!), and 2) some kind of file sync must be in place, even if just over localnet and/or personal S3 account. Ideally, file sync would be available via UbuntuOne and there would be an app or two with dmedia integration.
    * Summary - let's get dmedia into Natty! What does the Novacut team need to get done in the next 2 weeks to meet the Nov 25 Feature-Definition-Freeze criteria?
    * Special thanks to Stuart, Chad, and Manuel for fielding my many desktopcouch questions. All the UbuntuOne peeps rock!
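A sketch of what a pluggable file transport could look like, under the assumption that every backend only needs to answer "do you have this ID?", return the bytes, and accept new bytes. The class and method names are hypothetical, and `MemoryBackend` stands in for a real store such as a directory, S3, or a removable drive:

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface for one place media files can live."""

    @abstractmethod
    def has(self, doc_id):
        """Return True if this backend holds the file."""

    @abstractmethod
    def get(self, doc_id):
        """Return the file content as bytes."""

    @abstractmethod
    def put(self, doc_id, data):
        """Store the file content under its content-hash ID."""


class MemoryBackend(Backend):
    """In-memory stand-in for a real store."""

    def __init__(self):
        self._files = {}

    def has(self, doc_id):
        return doc_id in self._files

    def get(self, doc_id):
        return self._files[doc_id]

    def put(self, doc_id, data):
        self._files[doc_id] = data


def transfer(doc_id, src, dst):
    """Copy a file between backends; return True if bytes were moved."""
    if dst.has(doc_id):
        return False  # files are read-only, so an existing copy is current
    dst.put(doc_id, src.get(doc_id))
    return True
```

Because files are read-only and named by their content, `transfer` needs no conflict handling: a second call for the same ID is a no-op. That is the contrast with bidirectional syncing of a Documents directory drawn above.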

dmedia + CouchOne
-----------------

    * Likewise, all the CouchOne people have been very enthusiastic about dmedia and Novacut - thanks everyone for the encouragement and technical guidance!
    * From talking with Jason Smith, I got the impression that CouchOne is keenly interested in having dmedia run on phones. This has always been on the Novacut radar, just not the immediate radar. But if CouchOne can aid us a bit on this front, we want to make this a high priority (something targeted for the N cycle).
    * dmedia is simple and mostly just leverages CouchDB. The native dmedia file transfer will use simple HTTP. So although the reference dmedia implementation is being done in Python, reimplementing it in a different language won't be much work (if I do a decent job keeping things simple). Plus some desktop dmedia features don't make sense on a phone (like rendering low-rez proxy versions). The most phone-appropriate design might work quite a bit differently too. Like perhaps on phones dmedia should store the media files in CouchDB instead of on the file system using the content-hash-based layout.
    * The Novacut team doesn't have any Android/iOS development experience, so we would need some help here. Plus I'm personally allergic to java and apples. :)
    * Summary - let's get dmedia on phones! CouchOne people, let's have a chat soon to make sure the dmedia design decisions I'm making will also work well on phones. And if you already have some ideas about how you'd implement dmedia on Android or iOS, I'd love to know what you're thinkin'.
    * Special thanks to Jason Smith for spending like 5 hours helping me understand CouchDB views and best practices better. All the CouchOne peeps rock! Hi, Jan!
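The phone idea above, storing the media file inside CouchDB rather than on the file system, could look roughly like this. CouchDB accepts inline attachments as base64 data under a document's `_attachments` key; the helper name, the attachment name `media`, and the base32 SHA-1 ID are assumptions made for this sketch:

```python
import base64
import hashlib


def doc_with_attachment(data, content_type, name="media"):
    """Build a document that carries the media file as an inline attachment."""
    # Assumed ID scheme: base32-encoded SHA-1 of the file content.
    _id = base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
    return {
        "_id": _id,
        "_attachments": {
            name: {
                "content_type": content_type,
                "data": base64.b64encode(data).decode("ascii"),
            },
        },
    }


doc = doc_with_attachment(b"\x89PNG...toy bytes", "image/png")
```

Saving such a document would then store and replicate the file along with its meta-data, at the cost of no longer keeping only a subset of files on the device unless replication is filtered.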

Next steps
----------

After taking a week to digest everything, I'm ready to lay out the dmedia architecture and milestones for the Natty cycle. I'm working on an architecture diagram that I will probably explain through, you know, a trademark video with pieces of paper. Should be published in the next few days.

Last weekend I watched college football with rockstar during which he explained a lot of Launchpad best practices as far as making it easy for people to understand the roadmap and take on small features. There are just a few bugs filed so far, but many more will be filed shortly. I'm tagging them with 'easy':

    https://bugs.launchpad.net/dmedia/+bugs?field.tag=easy

Lastly, there is now a #novacut IRC channel where we'll idle. We'll also start doing a short IRC meeting once a week to keep our process transparent and make it easy for people to join in.

[jderose]

* Completed professional file import UX design - https://wiki.ubuntu.com/AyatanaDmediaLovefest
* Released dmedia 0.1 - https://launchpad.net/dmedia/+announcement/7271

[jderose]

* Released dmedia 0.2 - http://blog.novacut.com/2010/12/announcing-dmedia-02-feature-frenzy.html
* Video of pro file import UX in action - http://vimeo.com/18287329

[jderose]

* Released dmedia 0.3 - http://blog.novacut.com/2011/01/announcing-dmedia-03-made-of-web.html

[jderose]

* Good progress on formal, test-driven definition of dmedia CouchDB schema - http://bazaar.launchpad.net/~dmedia/dmedia/trunk/view/head:/dmedia/schema.py

[jderose]

* Released dmedia 0.4 - http://blog.novacut.com/2011/02/announcing-dmedia-04-forplay.html
* FileStore is basically feature complete - http://bazaar.launchpad.net/~dmedia/dmedia/trunk/view/head:/dmedia/filestore.py

[jderose]

* Released dmedia 0.5 - http://blog.novacut.com/2011/03/announcing-dmedia-05-so-shiny.html

[jderose]

* Released dmedia 0.6 - http://blog.novacut.com/2011/04/announcing-dmedia-06-go-time.html

Work Items