Automatic Speech Recognition software event, University of Leeds – Rhys Morgan writes:

Saturday 6th of February saw YTI members once again battling the elements, this time to get to the University of Leeds for an extremely well-attended event on Automatic Speech Recognition (ASR) software led by the ever-enthusiastic Dragoș Ciobanu.

While most of the audience had never used ASR (even though it turns out that everybody with a smartphone can use it to write messages), there were mixed opinions about its benefits amongst the few audience members who already had experience with it. The presentation was a good opportunity to learn about how ASR works and to discuss the pros and cons of using it.

Essentially, ASR works by recognising phonemes, then using probability scores to predict which word was said. It predicts phrases in a similar way, based on the probability of certain words following certain others.
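For readers who like to see the idea in miniature, here is a rough sketch in Python of how probability scores might be combined. It is a toy, not how any real ASR product is built: the word candidates, the scores and the names (`acoustic_candidates`, `bigram`, `best_transcript`) are all invented for illustration.

```python
from itertools import product

# Toy illustration only: every number below is made up for the example.

# Step 1: for each stretch of sound, the recogniser proposes candidate words
# with an acoustic probability. Words that sound alike score the same here.
acoustic_candidates = [
    {"i": 1.0},
    {"know": 0.5, "no": 0.5},
    {"the": 1.0},
    {"answer": 1.0},
]

# Step 2: probability of a word following the previous word.
# "<s>" marks the start of the sentence.
bigram = {
    ("<s>", "i"): 0.2,
    ("i", "know"): 0.1,
    ("i", "no"): 0.001,
    ("know", "the"): 0.2,
    ("no", "the"): 0.01,
    ("the", "answer"): 0.05,
}
UNSEEN = 1e-6  # small fallback score for word pairs not in the table


def score(words, candidates):
    """Multiply the acoustic and word-sequence probabilities for one guess."""
    total = 1.0
    previous = "<s>"
    for position, word in enumerate(words):
        total *= candidates[position][word]
        total *= bigram.get((previous, word), UNSEEN)
        previous = word
    return total


def best_transcript(candidates):
    """Try every combination of candidate words and keep the likeliest."""
    options = product(*(slot.keys() for slot in candidates))
    return max(options, key=lambda words: score(words, candidates))


print(" ".join(best_transcript(acoustic_candidates)))
```

With these invented scores the sketch settles on "i know the answer", because "know" is far more likely than "no" to follow "i". Real systems search much larger sets of candidates and do not try every combination as this toy does.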

The reported benefits of using ASR relate to both a translator’s productivity and their health:

–          ASR gives, on average, a 30% productivity increase. Some professional translators surveyed even claimed a 500% increase, but remember that people like to brag and exaggerate.

–          It can help to prevent eye strain, neck pain and back pain by allowing you to move away from your desk and computer screen as you dictate (if you have a wireless microphone, of course).

–          It can also help to prevent repetitive strain injuries by allowing you to type less. A normal working day for a professional translator can involve 96,000 key presses, or 16 tons of force being applied by the fingers every day!

The disadvantages of using ASR were mostly related to operating the software and the amount of editing that can be required if the software doesn’t do a good job of understanding your speech. For ASR to work optimally, you need to:

–          Modify your speech. Your pace must be normal, and you should speak with predictable sound patterns. You must not speak syllable by syllable, as ASR will likely transcribe each syllable as a separate word.

–          Speak with punctuation (“For example comma like this full stop”).
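To give a feel for what "speaking with punctuation" involves, the sketch below turns spoken commands into symbols. It is only an assumption about how this could be done: the command list `SPOKEN_PUNCTUATION` and the function `apply_spoken_punctuation` are hypothetical, and real products recognise their own sets of commands.

```python
# Hypothetical mapping of spoken commands to symbols; real products differ.
SPOKEN_PUNCTUATION = {
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
}


def apply_spoken_punctuation(dictated):
    """Replace spoken punctuation commands with the symbols they name."""
    words = dictated.split()
    output = []
    i = 0
    while i < len(words):
        two_words = " ".join(words[i:i + 2]).lower()
        one_word = words[i].lower()
        # Check two-word commands ("full stop") before one-word ones ("comma").
        if two_words in SPOKEN_PUNCTUATION:
            symbol = SPOKEN_PUNCTUATION[two_words]
            i += 2
        elif one_word in SPOKEN_PUNCTUATION:
            symbol = SPOKEN_PUNCTUATION[one_word]
            i += 1
        else:
            output.append(words[i])
            i += 1
            continue
        # Attach the symbol to the previous word rather than leaving a gap.
        if output:
            output[-1] += symbol
        else:
            output.append(symbol)
    return " ".join(output)


print(apply_spoken_punctuation("For example comma like this full stop"))
```

Run on the dictated sentence above, this prints "For example, like this." Note that the sketch cannot tell a command from an ordinary use of the word "comma", which is one reason dictating takes practice.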

Although it uses probability scores, ASR can still be confused by homophones such as ‘know’ and ‘no’. Again, this can mean a fair amount of manual editing unless you master controlling your entire computer with your voice.

The true test of the software came at the end of the presentation, when audience members tried out a particular piece of ASR software, namely Dragon NaturallySpeaking. Faced with a rather broad Glaswegian accent, the poor thing didn’t know what had hit it. It made mistakes which had the audience in fits of laughter, and there seemed to be no way back for it. The software had one last trick up its sleeve, though – it could be trained to recognise a new user’s speech. So, a few minutes later and having been given a crash course in Glaswegian, Dragon NaturallySpeaking found its feet and transcribed nearly everything being said perfectly. Another victory for technology.

Personally, I believe that ASR software would be worth the investment if you learnt to use it properly and made the most of it. That being said, there are some people who say it isn’t worth the hassle. The only way to find out if it works for you is to try it for yourself!