Proposal for Voice Driven User Interface

Registered by David Sugar

This specification (through the soon-to-be-created associated wiki page) will cover both the challenges and the start of a roadmap for voice-driven user interfaces in Ubuntu, focusing on how they may apply specifically to the Ubuntu mobile platforms. Any goals that are practical to achieve in the Karmic release timeframe will also be identified. This blueprint also relates strongly to other work being done in both accessibility and core platform sound.

Blueprint information

Status:
Started
Approver:
David Mandala
Priority:
Low
Drafter:
David Sugar
Direction:
Needs approval
Assignee:
None
Definition:
Review
Series goal:
None
Implementation:
Started
Milestone target:
None
Started by
David Sugar

Related branches

Sprints

Whiteboard

Work Items:
Updated packaging of pocketsphinx: DONE
Acceptance of pocketsphinx in archive: DONE
Rhythmbox changes: DEFERRED

Update 2009-06-26: To clarify, there are no essential, required, or otherwise blocking issues in this spec for Karmic, though there are items which could be started early for Karmic+1 if deemed important for that. It is also worth noting that some changes had been proposed for Rhythmbox, and these may need to be reconsidered if Banshee does become the default media player for Karmic.

---

As per UDS, a team and mailing list have been formed for voice-driven UI in Ubuntu:

https://launchpad.net/~voice-driven-ui
...

It is worth noting that there is a version of pocketsphinx packaged and in REVU. The package is a bit broken and seems to have been stuck for a year or more ;), but it would likely be easy enough to fix and get working.

(Regarding pocketsphinx - I've started looking at finishing the packaging. AFAICS, it'll depend on the currently unpackaged sphinxbase library, so I'll start there. --michael.nelson)

(sphinxbase is also stuck in REVU, and I'm not really sure how the package is broken, unless it has problems with Intrepid/Jaunty ... no idea why it's been stuck in REVU for almost two years now. --dhd)

For TTS I am thinking of Festival, which people have already got working with ALSA and PulseAudio back-ends: https://help.ubuntu.com/community/TextToSpeech

Also, this should be tracked alongside the accessibility, Orca, and screen-reader frameworks. Any TTS-related UI for speech should probably tie back to or through Orca.

https://help.ubuntu.com/community/Accessibility

There's also espeak for TTS, which is installed by default and has a C/C++ library (I started working on Python bindings, but those will still need time to be useful: https://launchpad.net/python-espeak). For voice recognition, Julius and the Voxforge speech corpora are packaged in Ubuntu. -- RainCT

Simon also looks pretty promising, but I couldn't get it to work. http://sourceforge.net/projects/speech2text/ -- RainCT

TTS APIs in Linux, particularly for GNOME/screen reading, are going to change somewhat in the next 12 months, moving over to speech-dispatcher, which will hopefully become the single Linux speech API. Espeak is the best bet for multilingual TTS. (TheMuso)
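
As a rough illustration of the espeak option discussed above, and without relying on any Python bindings, the espeak command-line tool can be driven from Python via a subprocess. This is only a minimal sketch: the helper names build_espeak_command and speak are hypothetical, and the default voice ("en") and speed (175 words per minute) are assumptions rather than anything decided in this blueprint. Only espeak's -v (voice) and -s (speed) options are used.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed=175):
    """Return the argument list for one espeak call (hypothetical helper)."""
    # -v selects the voice/language, -s sets the speed in words per minute.
    # The defaults here are assumptions chosen for illustration.
    return ["espeak", "-v", voice, "-s", str(speed), text]


def speak(text, voice="en", speed=175):
    """Speak text through espeak if it is installed.

    Returns True if espeak was invoked, False if it is not on the PATH.
    """
    if shutil.which("espeak") is None:
        return False
    subprocess.run(build_espeak_command(text, voice, speed), check=True)
    return True
```

Calling speak("Hello from Ubuntu") would then produce audio on a system where espeak is installed; once speech-dispatcher becomes the common API, a wrapper like this would presumably be replaced by a call through it.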

I've just found another application (which uses Sphinx 3) that looks quite interesting: http://sourceforge.net/projects/voicekey/. -- RainCT

I've just found a table comparing different voice recognition engines. According to it, Julius is the best one available, beating even proprietary solutions when it comes to the amount of correctly recognized speech, though I don't know how old the table is. If you want to have a look at it: http://translate.google.com/translate?u=http%3A%2F%2Fsimon-listens.org%2Findex.php%3Fid%3D124&langpair=de|en&hl=en&ie=UTF8 -- RainCT
