Speech Graphics Ltd is a spin-out company from the University of Edinburgh, building on research into the animation of talking heads carried out during 2006-2011. Speech Graphics' technology is the first high-fidelity lip-sync solution driven by audio. The company markets a multilingual, scalable solution to audio-driven animation that uses acoustic analysis and muscle dynamics to drive the faces of computer game characters, accurately matching the words and emotion in the audio. This industry-leading technology has been used to animate characters in computer games developed by Supermassive Games in 2012 and in music videos for artists such as Kanye West in 2013.
This impact case study provides evidence of economic impacts of our research because:
i) a spin-out company, Speech Graphics Ltd, has been created, established its viability, and gained international recognition;
ii) the computer games industry and the music video industry have adopted a new technology founded on University of Edinburgh research into a novel technique to synthesize lip motion trajectories using Trajectory Hidden Markov Models (sketched in outline after this list); and
iii) this has improved the cost-effective creation of computer games that can be sold worldwide, because their dialogue can be localised into different languages more easily, with rapid creation of high-quality facial animation replacing a combination of motion capture and manual animation.
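The generation step behind this technique can be illustrated in outline. In a trajectory HMM, each frame of an animation parameter (for example, degree of lip opening) has a Gaussian distribution over static and delta features, and a smooth trajectory is recovered as the maximum-likelihood solution under the delta constraints. The Python sketch below is a minimal illustration under simplifying assumptions (one parameter, per-frame means and variances already assigned, first-order deltas only); the function and variable names are ours, and it is not Speech Graphics' implementation.

    import numpy as np

    def generate_trajectory(means, variances):
        # means/variances: length-2T vectors ordered [static_1..static_T, delta_1..delta_T].
        # The delta constraint delta_t = c_t - c_{t-1} is encoded in W, so the
        # maximum-likelihood static trajectory is c* = (W' S^-1 W)^-1 W' S^-1 means.
        T = means.shape[0] // 2
        W = np.zeros((2 * T, T))
        W[:T, :] = np.eye(T)                 # static rows: o_t = c_t
        for t in range(T):                   # delta rows: d_t = c_t - c_{t-1}
            W[T + t, t] = 1.0
            if t > 0:
                W[T + t, t - 1] = -1.0
        P = np.diag(1.0 / variances)         # inverse (diagonal) covariance
        return np.linalg.solve(W.T @ P @ W, W.T @ P @ means)

    # Toy example: five frames of a single lip-opening parameter.
    means = np.concatenate([[0.1, 0.5, 0.9, 0.5, 0.1], np.zeros(5)])
    variances = np.concatenate([np.full(5, 0.01), np.full(5, 0.05)])
    print(generate_trajectory(means, variances))

The delta terms are what couple neighbouring frames, so the recovered trajectory is smooth rather than a sequence of independent per-frame targets.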
Edinburgh's research in multilingual speech synthesis has had clinical and commercial impact, and has resulted in a large and diverse community of users.
Clinical applications: Our research has enabled the construction of natural-sounding, personalised synthetic voices from recordings of speech from people with disordered speech due to conditions such as Parkinson's disease or Motor Neurone Disease. These synthetic voices are used in assistive technology devices that allow sufferers of these conditions to communicate more easily and effectively.
Commercial take-up: Our research has achieved commercial impact through the licensing of technology components, and through the activities of start-up companies.
Community of users: The Festival Speech Synthesis System (v2.1 released in November 2010) is a complete open-source text-to-speech system released under an unrestrictive X11-type license, and is distributed as part of many major Linux distributions.
Forensic speaker comparison is the analysis of recorded speech with evidential value in legal (usually criminal) cases. It is now routinely undertaken in the UK (ca. 600 cases annually) and increasingly elsewhere. It is vital that casework is underpinned by robust research, that reliable methods are applied, and that evidential results are framed appropriately. York hosts one of the world's largest research groups in forensic speech science and in those academic disciplines (phonetics, sociolinguistics, sociophonetics) that provide the essential foundation for this applied field. The impacts of York research are felt through (i) enhancing understanding of variation in speech; (ii) applying research findings via collaboration in casework and research with J P French Associates (JPFA), one of the world's leading laboratories; (iii) providing doctoral research supervision for JPFA staff and professional training for other experts; (iv) providing expert evidence in legal cases in the UK and internationally; and (v) improving policy on expert evidence in the UK.
One of the world-leading systems for large-vocabulary Automatic Speech Recognition (ASR) has been developed by a team led from the University of Sheffield. This system won the international evaluation campaigns for rich speech transcription organised by the US National Institute of Standards and Technology (NIST) in 2007 and 2009. It has led directly to the creation of one spin-out, has been largely instrumental in the launch of a second, has had significant impact on the development and growth of three existing companies, and has made highly advanced technology available free for the first time to a broad range of individual and organisational users, with applications including language learning, speech-to-speech translation and access to education for those with reading and writing difficulties.
Stroke and other forms of brain injury often result in debilitating communication impairments. For example, patients with acquired apraxia of speech (AOS) experience difficulties that affect their capacity to verbally express thoughts and needs. Such individuals have benefitted from the development of a novel computerised treatment — "Sheffield Word" (SWORD). Patients who took part in clinical trials showed improvements in aspects of speech that were impaired after stroke. SWORD is now used by healthcare teams worldwide, providing benefits to a large patient population. The SWORD computerised treatment is convenient to use at home, fosters users' autonomy, and delivers higher treatment doses than possible through traditional clinical sessions. Clinicians who treat AOS have also benefitted through education, training and access to online materials about SWORD which were provided by the research team.
Nearly every large-vocabulary speech recognition system in current use employs outputs from fundamental research carried out in the University of Cambridge Department of Engineering (DoEng) on adaptation of Hidden Markov Models (HMMs). One example of the commercial application of these outputs is their use on the Microsoft Windows desktop for both the command and control functions and the dictation functions. Approximately one billion copies of Windows have been shipped since 2008. Other examples show the outputs used in the automatic transcription of a wide range of types of data. [text removed for publication]
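To indicate what HMM adaptation involves, the sketch below illustrates maximum likelihood linear regression (MLLR)-style mean adaptation, a widely used technique associated with this line of research: a shared affine transform is estimated from a small amount of a new speaker's data and applied to every Gaussian mean in the acoustic model. This is a simplified least-squares stand-in for the full maximum-likelihood (EM) estimation; all names and data are illustrative assumptions, not the DoEng or any commercial implementation.

    import numpy as np

    def estimate_mllr_transform(si_means, adapted_targets):
        # Estimate A (d x d) and b (d,) such that A @ mu + b approximates the
        # per-Gaussian means observed in the adaptation data (least squares).
        d = si_means.shape[1]
        X = np.hstack([si_means, np.ones((si_means.shape[0], 1))])   # [mu, 1]
        Wt, *_ = np.linalg.lstsq(X, adapted_targets, rcond=None)     # (d+1) x d
        return Wt[:d].T, Wt[d]

    def adapt_means(si_means, A, b):
        # Apply the shared transform to all speaker-independent Gaussian means.
        return si_means @ A.T + b

    # Toy example: 50 Gaussians with 3-dimensional means, "new speaker" simulated
    # by a known shift and scaling plus noise.
    rng = np.random.default_rng(0)
    si_means = rng.normal(size=(50, 3))
    true_A, true_b = np.diag([1.1, 0.9, 1.0]), np.array([0.5, -0.2, 0.1])
    targets = si_means @ true_A.T + true_b + 0.01 * rng.normal(size=(50, 3))
    A, b = estimate_mllr_transform(si_means, targets)
    adapted = adapt_means(si_means, A, b)

Because one transform is shared across many Gaussians, only a small amount of speaker data is needed, which is why this family of techniques suits desktop dictation and command-and-control use.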
Articulate Instruments Ltd. was founded in 2003 as a research, design, manufacturing and consulting company for users of phonetic instrumentation. It invents, designs, markets and supports instrumental technologies for normative and clinical speech science and for the diagnosis and treatment of speech disorders. Products include electronic systems, headsets, software, and methodologies, underpinned by QMU research. Clinical use of the relevant products as medical devices requires "CE marking" to demonstrate on-going safety and support; this was first achieved in 2004.
Impact relates primarily to the company's on-going financial health and its non-academic customer base. In its first 10 years, turnover averaged ~£120k, with over 200 customers internationally, of whom more than 50 were non-academic.
GSM and 3G mobile systems do not currently support end-to-end security in the form of encryption for speech. Research at Surrey has created new speech technology which allows complete end-to-end security via the mobile speech channel. This is the world's first mobile phone system secure from eavesdropping, and it is available anywhere there is mobile coverage.
A Surrey spin-out, MulSys Ltd., has licensed the technology to security agencies and is now developing a mass-market product.
Increasingly in court cases the recorded voice of a perpetrator has to be compared with that of a suspect. Research on speaker characteristics carried out by, and under the supervision of, Prof. Nolan has directly contributed to the work of those offering forensic speech services commercially or developing relevant speech processing software. Impact arises from seminal ideas such as LTF (Long Term Formant) analysis, and from the 100-speaker 'DyViS' accent-matched database. The latter has directly enabled: the testing of an automatic speaker recognition system preparatory to its incorporation into forensic casework; the development of speaker recognition and speaker separation software; the adoption of systematic 'voice quality' analysis; and the availability for casework of population statistics on pitch and disfluencies. Public engagement has raised awareness of the possibilities and limitations of speaker identification among legal and general audiences.
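For readers unfamiliar with LTF analysis, the core idea is that, rather than comparing individual vowels, the analyst pools formant measurements (typically F1-F3) over all vocalic portions of a recording and compares the resulting long-term distributions between questioned and suspect samples. The Python sketch below is a minimal illustration with assumed names and synthetic data, not casework software; it presumes formant tracks have already been measured with a phonetics tool.

    import numpy as np

    def ltf_profile(formant_tracks):
        # Long-term formant profile: mean and standard deviation of each formant
        # pooled over all measured frames.
        # formant_tracks: array of shape (n_frames, 3) holding F1, F2, F3 in Hz.
        return formant_tracks.mean(axis=0), formant_tracks.std(axis=0)

    def compare_profiles(questioned, suspect):
        # Crude similarity measure: absolute difference (Hz) between the two
        # speakers' long-term formant means, formant by formant.
        (q_mean, _), (s_mean, _) = ltf_profile(questioned), ltf_profile(suspect)
        return np.abs(q_mean - s_mean)

    # Toy example with synthetic tracks; real casework uses measured formants and
    # interprets differences against reference population statistics.
    rng = np.random.default_rng(1)
    questioned = rng.normal([500, 1500, 2500], [80, 150, 200], size=(400, 3))
    suspect = rng.normal([510, 1480, 2550], [85, 160, 210], size=(500, 3))
    print(compare_profiles(questioned, suspect))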
The impact is primarily in public health. It mainly concerns the adoption of, and demand for, a speech research technology, Electropalatography (EPG), for the clinical diagnosis and treatment of speech disorders. Our continuing long-term, interdisciplinary research into EPG has increased this impact since the previous RAE2008 census period, by which time the UOA had already been awarded a Queen's Anniversary Prize (2002) for work towards the clinical application of speech science.
Financial support from the charitable sector and the NHS for the training of classroom assistants and speech and language therapists (SLTs) in EPG therapy is highlighted, along with user testimonials, unmet demand, and small-scale provision of the therapy.