One of the world-leading systems for large-vocabulary Automatic Speech Recognition (ASR) has been developed by a team led from the University of Sheffield. This system, which won the international evaluation campaigns for rich speech transcription organised by the US National Institute of Standards and Technology (NIST) in 2007 and 2009, has led directly to the creation of one spin-out, has been largely instrumental in the launch of a second, and has had significant impact on the development and growth of three existing companies. It has also made high-quality ASR freely available for the first time to a broad range of individual and organisational users, with applications including language learning, speech-to-speech translation, and access to education for those with reading and writing difficulties.
Research in robust speech enhancement and audio-visual processing has led to impact on a range of fronts:
(i) Collaboration with CSR, a leading $1 billion consumer electronics company, has shaped its R&D agenda in speech enhancement, inspired ideas for new product improvements, and helped establish Belfast as an audio research centre of excellence within the company.
(ii) Our technology has changed the strategic R&D direction of a company delivering healthcare monitoring systems, with the potential for multi-million pound savings in NHS budgets.
(iii) Audio-visual speech processing research has led to a proof-of-concept biometric system, Liopa: a novel, robust and convenient person authentication and verification technology exploiting lip and facial movements (www.liopa.co.uk). A start-up company to commercialise this product is at an advanced stage of being established. The product and commercialisation strategy was awarded First Prize in the Digital Media and Software category of the 2013 NISP Connect £25K entrepreneurship competition, and the first commercial partner for Liopa has been engaged.
(iv) A system-on-chip implementation of a version of our speech recognition engine, developed through an EPSRC project, was awarded First Prize in the High Technology category of the 2010 NISP £25K Awards competition and contributed to the founding of a spin-out company, Analytics Engines (www.analyticsengines.com).
Speech Graphics Ltd is a spin-out company from the University of Edinburgh, building on research into the animation of talking heads carried out during 2006-2011. Speech Graphics' technology is the first high-fidelity lip-sync solution driven by audio. The company markets a multilingual, scalable solution to audio-driven animation that uses acoustic analysis and muscle dynamics to drive the faces of computer game characters, accurately matching the words and emotion in the audio. The industry-leading technology developed by Speech Graphics has been used to animate characters in computer games developed by Supermassive Games in 2012 and in music videos for artists such as Kanye West in 2013.
This impact case study provides evidence of economic impacts of
our research because:
i) a spin-out company, Speech Graphics Ltd, has been created, established
its viability, and gained international recognition;
ii) the computer games industry and the music video industry have adopted a new technology founded on University of Edinburgh research into a novel technique for synthesising lip motion trajectories using Trajectory Hidden Markov Models (see the sketch following this list); and
iii) this has made the creation of computer games more cost-effective: games can be sold worldwide because their dialogue is more easily localised into different human languages, with rapid creation of high-quality facial animation replacing a combination of motion capture and manual animation.
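The Trajectory Hidden Markov Model technique mentioned in (ii) rests on a well-known step, maximum-likelihood parameter generation (MLPG): the HMM supplies per-frame means and variances for both static and delta (velocity) features, and the generated trajectory is the single smooth curve that best satisfies both at once. The sketch below is our own minimal illustration of that step, assuming a simple central-difference delta window and diagonal covariances; it is not Speech Graphics' code.

```python
import numpy as np

def mlpg(mu_static, mu_delta, var_static, var_delta):
    """Maximum-likelihood parameter generation: solve
    c = (W' P W)^{-1} W' P mu, where W stacks the static (identity)
    and delta windows and P is the diagonal precision matrix."""
    T = len(mu_static)
    D = np.zeros((T, T))                      # central-difference deltas
    for t in range(T):
        D[t, max(t - 1, 0)] -= 0.5
        D[t, min(t + 1, T - 1)] += 0.5
    W = np.vstack([np.eye(T), D])             # (2T, T)
    mu = np.concatenate([mu_static, mu_delta])
    P = np.diag(1.0 / np.concatenate([var_static, var_delta]))
    return np.linalg.solve(W.T @ P @ W, W.T @ P @ mu)

# Step-like static means from three HMM states come out as one
# smooth trajectory once the delta constraints are imposed.
mu_s = np.array([0.0] * 5 + [1.0] * 5 + [0.2] * 5)
traj = mlpg(mu_s, np.zeros(15), np.full(15, 0.1), np.full(15, 0.01))
```

Applied per lip-motion parameter, this is what lets a model trained on speech produce trajectories that move smoothly and naturally rather than jumping between state means.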
Edinburgh's research in multilingual speech synthesis has had clinical and commercial impact, and has resulted in a large and diverse community of users.
Clinical applications: Our research has enabled the construction
of natural-sounding, personalised synthetic voices from recordings of
speech from people with disordered speech due to conditions such as
Parkinson's disease or Motor Neurone Disease. These synthetic voices are
used in assistive technology devices that allow sufferers of these
conditions to communicate more easily and effectively.
Commercial take-up: Our research has achieved commercial impact
through the licensing of technology components, and through the activities
of start-up companies.
Community of users: The Festival Speech Synthesis System (v2.1
released in November 2010) is a complete open-source text-to-speech system
released under an unrestrictive X11-type license, and is distributed as
part of many major Linux distributions.
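Festival can be driven directly from scripts as well as interactively; a minimal usage sketch, assuming the `festival` binary from a standard Linux distribution is on the PATH:

```python
import subprocess

# Festival's --tts mode reads text from stdin and speaks it aloud
# using the default voice.
subprocess.run(
    ["festival", "--tts"],
    input=b"The Festival Speech Synthesis System speaking.",
    check=True,
)
```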
GSM and 3G mobile systems do not currently support end-to-end security in the form of encryption for speech. Research at Surrey has created new speech technology which allows complete end-to-end security via the mobile speech channel. This is the first mobile phone system in the world to be secure from eavesdropping, and it is available anywhere there is mobile coverage.
A Surrey spin-out, MulSys Ltd., has licensed the technology to security agencies and is now developing a mass-market product.
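The case study does not describe the mechanism, and the hard research problem, getting ciphertext through the lossy mobile voice codec, is not shown here. But the end-to-end principle itself is simple: frames are encrypted on the sending handset and decrypted only on the receiving one, so no intermediate network node can eavesdrop. A minimal sketch of that principle (our illustration only), using authenticated AES-GCM encryption:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)  # shared only by the two handsets

def encrypt_frame(aes: AESGCM, seq: int, pcm_frame: bytes) -> bytes:
    nonce = seq.to_bytes(12, "big")        # unique nonce per frame
    return aes.encrypt(nonce, pcm_frame, None)

def decrypt_frame(aes: AESGCM, seq: int, ciphertext: bytes) -> bytes:
    return aes.decrypt(seq.to_bytes(12, "big"), ciphertext, None)

aes = AESGCM(key)
frame = os.urandom(320)                    # stands in for one 20 ms PCM frame
assert decrypt_frame(aes, 1, encrypt_frame(aes, 1, frame)) == frame
```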
Forensic speaker comparison is the analysis of recorded speech with
evidential value in legal (usually criminal) cases. It is now routinely
undertaken in the UK (ca. 600 cases annually) and increasingly elsewhere.
It is vital that casework is underpinned by robust research, that reliable
methods are applied, and that evidential results are framed appropriately.
York hosts one of the world's largest research groups in forensic speech
science, and in those academic disciplines (phonetics, sociolinguistics,
sociophonetics) that provide the essential foundation for this applied
field. The impacts of York research are felt through (i) enhancing
understanding of variation in speech; (ii) applying research findings via
collaboration in casework and research with J P French Associates (JPFA),
one of the world's leading laboratories; (iii) providing doctoral research
supervision for JPFA staff and professional training for other experts;
(iv) providing expert evidence in legal cases in the UK and
internationally; and (v) improving policy on expert evidence in the UK.
Stroke and other forms of brain injury often result in debilitating
communication impairments. For example, patients with acquired apraxia of
speech (AOS) experience difficulties that affect their capacity to
verbally express thoughts and needs. Such individuals have benefitted from
the development of a novel computerised treatment — "Sheffield
Word" (SWORD). Patients who took part in clinical trials showed improvements
in aspects of speech that were impaired after stroke. SWORD is now
used by healthcare teams worldwide, providing benefits to a large patient
population. The SWORD computerised treatment is convenient to use at
home, fosters users' autonomy, and delivers higher treatment doses than is possible through traditional clinical sessions.
Clinicians who treat AOS have also benefitted through education, training and access to online materials about SWORD provided by the research team.
Articulate Instruments Ltd. was founded in 2003 as a research, design,
manufacturing and consulting company for users of phonetic
instrumentation. It invents, designs, markets and supports instrumental
technologies for normative and clinical speech science and for the
diagnosis and treatment of speech disorders. Products include electronic
systems, headsets, software, and methodologies, underpinned by QMU
research. Clinical use of relevant products as medical devices requires
"CE marking" to prove on-going safety and support, first achieved in
Impact relates primarily to the company's on-going financial
health and its non-academic customer base. In its first 10
years, turnover averaged ~£120k, with over 200 customers internationally,
of whom more than 50 were non-academic.
Increasingly in court cases the recorded voice of a perpetrator has to be compared with that of a
suspect. Research on speaker characteristics carried out by, or under the direction of, Prof. Nolan has directly contributed to the work of those offering forensic speech services commercially or developing relevant speech processing software. Impact arises from seminal ideas such as LTF (Long Term Formant) analysis, sketched below, and from the 100-speaker `DyViS' accent-matched database. The latter has directly enabled: the testing of an automatic speaker recognition system preparatory to its incorporation into forensic casework; the development of speaker recognition and speaker separation software; the adoption of systematic `voice quality' analysis; and the availability for casework of population statistics on pitch and disfluencies. Public engagement has raised awareness of the possibilities and limitations of speaker identification in legal and general contexts.
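To make the LTF idea concrete, the sketch below (our own illustration, not the Cambridge implementation) pools LPC-based formant estimates over all sufficiently energetic frames of a recording and reports the long-term mean and spread of F1-F3, the kind of distributional speaker characteristic LTF analysis exploits; the energy gate and LPC settings are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def frame_formants(frame, fs, order=12):
    """Estimate formants of one frame via autocorrelation LPC: fit an
    all-pole model, then read resonance frequencies from the angles
    of the prediction polynomial's complex roots."""
    frame = lfilter([1.0, -0.97], [1.0], frame)    # pre-emphasis
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, "full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # Yule-Walker equations
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[(roots.imag > 0.01) & (np.abs(roots) > 0.8)]
    return np.sort(np.angle(roots)) * fs / (2 * np.pi)

def long_term_formants(signal, fs, frame_ms=25, hop_ms=10):
    """Long-Term Formant analysis: pool per-frame F1-F3 estimates over
    a whole recording and summarise their distributions."""
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    tracks = {1: [], 2: [], 3: []}
    for start in range(0, len(signal) - flen, hop):
        frame = signal[start:start + flen]
        if np.sqrt(np.mean(frame ** 2)) < 0.01:    # crude energy gate
            continue
        for i, f in enumerate(frame_formants(frame, fs)[:3], start=1):
            tracks[i].append(f)
    return {f"F{i}": (np.mean(v), np.std(v)) for i, v in tracks.items() if v}
```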
Our research is concerned with enabling access to mathematical literature for users with visual impairments (i.e., blind or partially sighted users) or print impairments (i.e., users with specific learning disabilities such as dyslexia or dysgraphia). The impact is therefore primarily societal: we enable visually and print-impaired learners to access scientific and mathematical knowledge from which they were previously excluded, thereby furthering an inclusive teaching and learning environment. With the number of people with learning disabilities exceeding 1 million (http://www.learningdisabilities.org.uk/help-information/Learning-Disability-Statistics-/) and the number of visually impaired people in the UK alone predicted to rise to over 2 million by 2020 (http://www.rnib.org.uk/aboutus/Research/statistics/Pages/statistics.aspx), the work is significant in providing equal opportunities to learners in the STEM subjects.
Our research has led to work with Google Inc. to enhance mathematics accessibility on the Web via the ChromeVox screen reader. It enables full text-to-speech translation of mathematics on the Web for all users of the Chrome browser and Android platforms, and has been included in ChromeVox since version 27, released 9/5/2013; ChromeVox had 31,518 downloads from the Chrome store as of 11/10/2013.
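As a simplified illustration of what text-to-speech translation of mathematics involves (this is not the ChromeVox code, and the spoken renderings are our own assumptions), the sketch below walks a small MathML fragment and produces a spoken string:

```python
import xml.etree.ElementTree as ET

def speak(node):
    """Render a MathML subtree as spoken text; real rule sets are far
    richer, handling nesting depth, pauses and prosody."""
    tag = node.tag.split("}")[-1]          # drop any MathML namespace
    kids = [speak(child) for child in node]
    if tag in ("mi", "mn", "mo", "mtext"):
        return {"+": "plus", "-": "minus", "=": "equals"}.get(node.text, node.text)
    if tag == "mfrac":
        return f"{kids[0]} over {kids[1]}"
    if tag == "msup":
        return f"{kids[0]} to the power {kids[1]}"
    if tag == "msqrt":
        return "square root of " + " ".join(kids)
    return " ".join(kids)                  # math, mrow, fallthrough

mathml = ("<math><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
          "<mn>2</mn></mfrac></math>")
print(speak(ET.fromstring(mathml)))        # -> "a plus b over 2"
```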
Our work has also resulted in an assistive technology tool, called MaxTract, for providing access to teaching and learning material. It has been deployed within digital mathematics libraries to enhance the accessibility of online material. Through direct feedback we are aware of a number of visually impaired users who have actively used our tool.