Sunday 6 May 2012

The journey to a text-to-speech PVR

TVonics recently launched their "talking PVR" product in association with the Royal National Institute of Blind People (RNIB). A casual observer might think this was purely a commercial development, but for me it is the result of four years of effort and a big personal goal.

I won't take credit for the final implementation; the team at TVonics fully deserve that. My role was in conceiving the technical implementation and producing a demo that helped secure the interest and collaboration of the RNIB. It took a few years to get there, and it was the evolution of digital TV hardware that made it realistically possible and financially viable to put this in a mass market product at little or no extra cost to the end user.

The cost aspect is very important. Accessibility products sit at a difficult junction: they serve the specific needs of a small community of disabled users, yet they should match the features and quality of products for non-disabled users without carrying a large premium. The best and most sustainable strategy for achieving this is to add accessibility as a standard feature of an existing product line rather than commission a new product. If this can be achieved without increasing the hardware specification, and hence the cost of the product for everyone, then it simply becomes a question of the cost of development and maintenance.

My journey started after the development and launch of TVonics' first twin-record PVR, the DTR-Z250. One of the directors asked the software department about the possibility of putting text-to-speech (TTS) support on the product. Having recently gone through the pain of the death of my grandfather, I recalled his many years of service to the local talking newspaper "Sounds & Voices". I very much admired him for that and hoped I could contribute something to the benefit of the blind and partially sighted myself. Finding a way to make text-to-speech possible on an affordable product would be a fitting tribute.

At the time we had very limited storage, memory and processing power; in a product like this these are generally constrained to be just enough to do the job effectively. This left very little room for extras over and above the normal operation of the product. I tried to get the best commercially compatible open source solution I had access to running, but there were too many constraints to make even a reasonable technology demo. It also quickly dawned upon us that at the very least we would have to increase the hardware specification compared to the standard product, and this would be expensive. I dropped the idea for a few years while we shifted platform a few times and had important OEM projects to work on.

The breakthrough came with the launch of text-to-speech on Google's Android platform in Android 1.6, aka Donut. At some point after the launch I recalled that, because Android is open source, I should be able to find the source code for their text-to-speech solution and see how they achieved it. The beauty of Android is that Google appear to have licensed a number of commercial products and then put them out under commercially friendly Apache or BSD-type licenses.

This should allow anyone, even those outside of mobile development, to take these components and integrate them into a commercial product, but don't quote me on that; I'm not a legal expert! The text-to-speech engine in Android is called picoTTS and is developed by a company called SVOX. In the Android source code this is provided as an SDK along with the integration code for the Android platform.

Someone giving a demo of the text-to-speech feature on the TVonics Freeview HD product

Once I had extracted it and played with it over a few lunch breaks, I realised that I could use it with our then-current hardware to potentially achieve the aim of adding text-to-speech to the product at no additional cost. After messing about with funny phrases and giving the speech engine a foul mouth, I proceeded to put some simple user interface integration in there. Once again I had to shelve the project, because we were making the transition from Freeview to Freeview HD and in the process we changed from using externally supplied digital TV middleware to licensing another solution we could bring in house. This meant I had to rewrite my initial work to get back to where I was. However, this time it was better, as I could do it in a far more integrated way. At this point I was able to pitch the idea formally to my management and get some time and support to sort out the technical issues. Finally, with some initial work from myself and the help of another engineer, we put some support for text-to-speech into the user interface to create our demonstration version.

A demo was given by the management, from which the partnership was formed. It was not long afterwards that I made the decision to leave TVonics. Shortly after I left, other software engineers took my work and added a different text-to-speech engine suggested by the RNIB, called IVONA, and a lot of work went into making the entire user interface usable by someone with little or no vision. I suspect the user interface was even harder work, but this is what separates an interesting technology demo from a usable, well-integrated and useful product.

Finally, on 3rd May 2012, text-to-speech on the TVonics Freeview HD product range was launched. It can be purchased pre-loaded on the PVR from the RNIB, or you can upgrade existing products. I downloaded it and tried it out, and it was everything I wanted it to be. I hope this feature will make a positive difference to some people out there, and well done to TVonics for seeing the initial rough idea through to launch.

If it makes a difference for you, I'd love to hear about it in the comments. It would mean a lot to me.