Research in robust speech enhancement and audio-visual processing has had
impact on several fronts:
(i) Collaboration with CSR, a leading $1 billion consumer
electronics company, has shaped its R&D agenda in speech
enhancement, has inspired ideas for new product improvements, and has
helped establish Belfast as an audio research centre of excellence within
(ii) Our technology has changed the strategic R&D direction of a
company delivering healthcare monitoring systems, with potential
for multi-million pound savings in NHS budgets.
(iii) Audio-visual speech processing research has led to a
proof-of-concept biometric system, Liopa: a novel,
robust and convenient person authentication and verification technology
exploiting lip and facial movements (www.liopa.co.uk). A start-up company
is in an advanced stage of being established to commercialise this
product. The product and commercialisation strategy was awarded First
Prize in the 2013 NISP Connect £25K entrepreneurship competition in the
Digital Media and Software category. The first commercial partner for
Liopa has been engaged.
(iv) A system-on-chip implementation of a version of our speech
recognition engine, which was developed through an EPSRC project, was
awarded first prize in the High Technology Award in the 2010 NISP £25K
Awards competition, and contributed to the founding of a spin-out company,
Analytics Engines (www.analyticsengines.com).
Edinburgh's research in multilingual speech synthesis has had clinical
and commercial impact, and has resulted in a large and diverse community
Clinical applications: Our research has enabled the construction
of natural-sounding, personalised synthetic voices from recordings of
speech from people with disordered speech due to conditions such as
Parkinson's disease or Motor Neurone Disease. These synthetic voices are
used in assistive technology devices that allow people with these
conditions to communicate more easily and effectively.
Commercial take-up: Our research has achieved commercial impact
through the licensing of technology components, and through the activities
of start-up companies.
Community of users: The Festival Speech Synthesis System (v2.1
released in November 2010) is a complete open-source text-to-speech system
released under an unrestrictive X11-type license, and is distributed as
part of many major Linux distributions.
Forensic speaker comparison is the analysis of recorded speech with
evidential value in legal (usually criminal) cases. It is now routinely
undertaken in the UK (ca. 600 cases annually) and increasingly elsewhere.
It is vital that casework is underpinned by robust research, that reliable
methods are applied, and that evidential results are framed appropriately.
York is one of the world's largest research groups in forensic speech
science, and in those academic disciplines (phonetics, sociolinguistics,
sociophonetics) that provide the essential foundation for this applied
field. The impacts of York research are felt through (i) enhancing
understanding of variation in speech; (ii) applying research findings via
collaboration in casework and research with J P French Associates (JPFA),
one of the world's leading laboratories; (iii) providing doctoral research
supervision for JPFA staff and professional training for other experts;
(iv) providing expert evidence in legal cases in the UK and
internationally; and (v) improving policy on expert evidence in the UK.
Nearly every large-vocabulary speech recognition system in current use
employs outputs from fundamental research carried out in the University of
Cambridge Department of Engineering (DoEng) on adaptation of Hidden Markov
Models (HMMs). One example of the commercial application of these outputs
is their use on the Microsoft Windows desktop for both the command and
control functions and the dictation functions. Approximately one billion
copies of Windows have been shipped since 2008. Other examples show the
outputs used in the automatic transcription of a wide range of types of
data. [text removed for publication]
Stroke and other forms of brain injury often result in debilitating
communication impairments. For example, patients with acquired apraxia of
speech (AOS) experience difficulties that affect their capacity to
verbally express thoughts and needs. Such individuals have benefitted from
the development of a novel computerised treatment — "Sheffield
Word" (SWORD). Patients who took part in clinical trials showed improvements
in aspects of speech that were impaired after stroke. SWORD is now
used by healthcare teams worldwide, providing benefits to a large patient
population. The SWORD computerised treatment is convenient to use at
home, fosters users' autonomy, and delivers higher treatment
doses than is possible through traditional clinical sessions.
Clinicians who treat AOS have also benefitted through education, training
and access to online materials about SWORD which were provided by the
One of the world's leading systems for large-vocabulary Automatic Speech
Recognition (ASR) has been developed by a team led from the University of
Sheffield. This system, which won the international evaluation campaigns
for rich speech transcription organised by the US National Institute of
Standards and Technology (NIST) in 2007 and 2009, has led directly to the
creation of one spin-out, has been largely instrumental in the launch of a
second, has had significant impact on the development and growth of three
existing companies, and has made highly
available free for the first time to a broad range of individual and
organisational users, with applications including language learning,
speech-to-speech translation and access to education for those with
reading and writing difficulties.
The impact is primarily in Public Health. It mainly concerns the adoption
of and demand for a speech research technology, Electropalatography (EPG),
for clinical diagnosis and treatment of speech disorders. Our continuing
long-term and interdisciplinary research into EPG has increased our impact
in this census period relative to the previous one (RAE2008), during which time the UOA
had already been awarded a Queen's Anniversary Prize (2002) for working
towards the clinical application of speech science.
Financial support from the charitable sector and the NHS for the training
of classroom assistants and SLTs in EPG therapy is highlighted, along with
user testimonials, unmet demand, and small-scale provision of the therapy.
Articulate Instruments Ltd. was founded in 2003 as a research, design,
manufacturing and consulting company for users of phonetic
instrumentation. It invents, designs, markets and supports instrumental
technologies for normative and clinical speech science and for the
diagnosis and treatment of speech disorders. Products include electronic
systems, headsets, software, and methodologies, underpinned by QMU
research. Clinical use of relevant products as medical devices requires
"CE marking" to prove on-going safety and support, first achieved in
Impact relates primarily to the company's on-going financial
health and its non-academic customer base. In its first 10
years, turnover averaged ~£120k, with over 200 customers internationally,
of whom more than 50 were non-academic.
GSM and 3G mobile systems do not currently support end-to-end security in
the form of encryption for speech. Research at Surrey has created new
speech technology which allows complete end-to-end security via the mobile
speech channel. This system, the world's first mobile phone system secure
from eavesdropping, is available anywhere there is mobile coverage.
A Surrey spin-out, MulSys Ltd., has licensed the technology to security
agencies and is now developing a mass market product.
Increasingly in court cases the recorded voice of a perpetrator has to be compared with that of a
suspect. Research on speaker characteristics carried out by or under Prof. Nolan has directly
contributed to the work of those offering forensic speech services commercially or developing
relevant speech processing software. Impact arises from seminal ideas such as LTF (Long Term
Formant) analysis, and from the 100-speaker 'DyViS' accent-matched database. The latter has
directly enabled: the testing of an automatic speaker recognition system preparatory to its
incorporation into forensic casework; the development of speaker recognition and speaker
separation software; the adoption of systematic 'voice quality' analysis; and the availability for
casework of population statistics on pitch and disfluencies. Public engagement has raised
awareness of the possibilities and limitations of speaker identification in legal and general