GATE (a General Architecture for Text Engineering; see http://gate.ac.uk/) is an experimental apparatus, R&D platform and software suite with very wide impact in society and industry. There are many examples of applications: the UK National Archive uses it to provide sophisticated search mechanisms over its .gov.uk holdings; Oracle includes it in its semantics offering; Garlik Ltd. uses it to mine the web for data that might lead to identity theft; Innovantage uses it in intelligent recruiting products; Fizzback uses it for customer feedback analysis; the British Library uses it for environmental science literature indexing; and the Stationery Office uses it for value-added services on top of its legal databases. It has been adopted as a fundamental piece of web infrastructure by major organisations like the BBC, Euromoney and the Press Association, enabling them to integrate huge volumes of data with up-to-the-minute currency at an affordable cost, delivering cost savings and new products.
COnnecting REpositories (CORE) is a system for aggregating, harvesting and semantically enriching documents. As at July 2013, CORE contains 15m+ open access research papers from worldwide repositories and journals, on any topic and in more than 40 languages. In July 2013, CORE recorded 500k+ visits from 90k+ unique visitors. By processing both full-text and metadata, CORE serves four communities: researchers searching research materials; repository managers needing analytical information about their repositories; funders wanting to evaluate the impact of funded projects; and developers of new knowledge-mining technologies. The CORE semantic recommender has been integrated with digital libraries and repositories of cultural institutions, including the European Library and UNESCO. CORE has been selected to be the metadata aggregator of the UK's national open access services.
State-of-the-art reasoning systems developed in the UoA have underpinned the standardisation of ontology languages, and play a critical role in numerous applications. For example, HermiT, software developed in the UoA, is being used by Électricité de France (EDF) to provide bespoke energy saving advice to 265,000 customers in France, and a roll-out of the system to all 17 million of its customers is planned.
Research carried out at Sussex into the automatic grammatical analysis of English text has enabled and enhanced a range of commercial text-processing applications and services. These include an automatic SMS question-answering service and a computer system that grades essays written by learners of English as a second language. Over the REF period there has been substantial economic impact on a spin-out company, whose viability has been established through revenue of around £500k from licensing, development and maintenance contracts for these applications.
The security of data in printing and network environments is an area of increasing concern to individuals, businesses, government organisations and security agencies throughout the world. Mathematical algorithms developed at the School of Mathematics at Cardiff University represent a significant step-change in existing data security techniques. The algorithms enable greater security in automatic document classification and summarisation, information retrieval and image understanding. Hewlett-Packard (HP), the world's leading PC vendor, funded the research underpinning this development and patented the resulting software, with the aim of strengthening its position as the market leader in this sector of the global information technology industry. HP has incorporated the algorithms in a schedule of upgrades to improve the key security features in over ten million of its electronic devices. Accordingly, the impact claimed is the mitigation of data security risks for HP users and clients, and substantial economic gain for the company.
Based in the School of English, the Research and Development Unit for English Studies (RDUES) conducts research in the field of corpus linguistics and develops innovative software tools to allow a wide range of external audiences to locate, annotate and use electronic data more effectively. This case study details work carried out by the RDUES team (Matt Gee, Andrew Kehoe, Antoinette Renouf) in building large-scale corpora of web texts, from which examples of language use have been extracted, analysed, and presented in a form suitable for teaching and research across and beyond HE, including collaboration with commercial partners.
The Statistical Cybermetrics Research Group (SCRG) has developed social science sentiment analysis methods that estimate the strength of positive and negative sentiment in short informal social web text. These methods are encapsulated in the SentiStrength software, which is sold commercially, used commercially to develop socially useful computing applications (e.g., question answering systems, customer relations management systems), used to engage the public in science-related entertaining events, and used for data journalism to inform the public about specific news events. The research includes the development and evaluation of new sentiment analysis techniques that can detect informal expressions of sentiment in social web texts and that can detect the strength of positive and negative sentiment and not just its polarity. The research also includes the development of commercially viable software that includes the sentiment analysis methods.
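The idea of reporting positive and negative strength separately, rather than a single polarity label, can be illustrated with a minimal sketch. This is not the SentiStrength algorithm or its API: the lexicon, the intensifier list, the function name `score` and the 1 to 5 / -1 to -5 scales are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon (assumption): word -> signed strength.
TOY_LEXICON = {
    "love": 3, "great": 2, "good": 1, "nice": 1,
    "hate": -3, "awful": -3, "bad": -1, "boring": -1,
}
# Hypothetical intensifiers (assumption): each adds one strength step
# to the next sentiment word.
BOOSTERS = {"very": 1, "really": 1, "so": 1}


def score(text):
    """Return (positive, negative) strengths for a short text.

    Positive strength runs from 1 (none) to 5 (strong); negative strength
    runs from -1 (none) to -5 (strong). Both are reported, so a mixed
    message keeps both signals instead of collapsing to one label.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos, neg = 1, -1  # 1 and -1 mean "no sentiment detected"
    boost = 0
    for tok in tokens:
        if tok in BOOSTERS:
            boost += BOOSTERS[tok]
            continue
        strength = TOY_LEXICON.get(tok, 0)
        if strength > 0:
            # Keep the strongest positive expression seen so far.
            pos = max(pos, min(5, 1 + strength + boost))
        elif strength < 0:
            # Keep the strongest negative expression seen so far.
            neg = min(neg, max(-5, -1 + strength - boost))
        boost = 0  # an intensifier only affects the word right after it
    return pos, neg


if __name__ == "__main__":
    print(score("I really love the venue but the queue was awful"))
```

Taking the maximum strength per scale, rather than summing, is one simple design choice among several; it keeps short texts with a single strong expression from being diluted by neutral words.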
The research has economic impact by enhancing the performance of commercial software systems, benefitting the owners of these systems (e.g., Yahoo!, Inbenta, Gemius, New Cities Foundation). It also has economic impact by enhancing the customer relations of companies using sentiment-enhanced customer relations management systems, and through a traffic congestion detection system that helps people to get to work on time. It has wide public services impact by helping people to find answers to their questions (via Yahoo! Answers). It has societal impact by supporting newsworthy analyses of social phenomena for the media. It has enhanced cultural life by driving spectacular lightshows during the London Olympics.
Essex research into the practical deployment of computational grammar theories, tools and techniques led to the expertise of Dr Doug Arnold being sought between 2009 and 2011 by BAE Systems, a leading UK manufacturer of advanced defence and security systems. Arnold advised the company on the design of two prototype natural-language interfaces for responding to emergency situations and sharing sensitive data across organisations. The projects' goals were met and his contribution enabled BAE Systems to develop feasibility-of-concept demonstration systems. His practical expertise in Natural Language Processing provided the company with an appreciation of the limits of particular tools and helped it to avoid undertaking over-ambitious projects.
The Natural Language Toolkit (NLTK) is a widely adopted Python library for natural language processing. NLTK is run as an open source project. Three project leaders, Steven Bird (Melbourne University), Edward Loper (BBN, Boston) and Ewan Klein (University of Edinburgh), provide the strategic direction of the NLTK project.
NLTK has been widely used in academia, commercial and non-profit organisations and public bodies, including Stanford University and the Educational Testing Service (ETS), which administers widely recognised tests across more than 180 countries. NLTK has played an important role in making core natural language processing techniques easy to grasp, easy to integrate with other software tools, and easy to deploy.
Worldwide impact on language learners and others has been generated by the development at Lancaster of a ground-breaking natural language processing tool (CLAWS4), and an associated unique collection of natural language data (the British National Corpus, or BNC). Some highlights selected from the primary impacts are as follows:
The pathways to impact have been primarily via consultancy and via licensing of software IP. The impact itself is largely on the language learners, i.e. users of products such as the above. There is a secondary economic impact on a UK SME which has licensed our software.