COnnecting REpositories (CORE) is a system for aggregating, harvesting and semantically enriching documents. As of July 2013, CORE contained more than 15 million open access research papers from worldwide repositories and journals, on any topic and in more than 40 languages, and in that month recorded over 500,000 visits from more than 90,000 unique visitors. By processing both full text and metadata, CORE serves four communities: researchers searching for research materials; repository managers needing analytical information about their repositories; funders wanting to evaluate the impact of funded projects; and developers of new knowledge-mining technologies. The CORE semantic recommender has been integrated with digital libraries and repositories of cultural institutions, including The European Library and UNESCO. CORE has been selected as the metadata aggregator for the UK's national open access services.
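The aggregation described above depends on harvesting metadata from repositories; the sketch below is a minimal illustration of that step using the OAI-PMH protocol that open access repositories commonly expose. The repository endpoint is a hypothetical placeholder, and this is not CORE's own harvesting code.

```python
# Minimal OAI-PMH harvesting sketch; the endpoint below is a hypothetical
# placeholder, not a CORE or real repository address.
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://repository.example.ac.uk/oai"

resp = requests.get(OAI_ENDPOINT, params={"verb": "ListRecords",
                                           "metadataPrefix": "oai_dc"})
root = ET.fromstring(resp.content)

ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Print the Dublin Core title of each harvested record.
for record in root.findall(".//oai:record", ns):
    title = record.find(".//dc:title", ns)
    if title is not None:
        print(title.text)
```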
Research carried out at Sussex into the automatic grammatical analysis of English text has enabled and enhanced a range of commercial text-processing applications and services. These include an automatic SMS question-answering service and a computer system that grades essays written by learners of English as a second language. Over the REF period there has been substantial economic impact on a spin-out company, whose viability has been established through revenue of around £500k from licensing, development and maintenance contracts for these applications.
Extracting information and meaning from natural language text is central to a wide variety of computer applications, ranging from social media opinion mining to the processing of patient health-care records. Sentic Computing, pioneered at the University of Stirling, underpins a unique set of related tools for incorporating emotion and sentiment analysis into natural language processing. These tools are being employed in commercial products, with improvements of up to 20% reported in the accuracy of textual analysis, matching or even exceeding human performance (Zoral Labs). Current applications include social media monitoring as part of a web content management system (Sitekit Solutions Ltd), personal photo management systems (HP Labs India) and patient opinion mining (Patient Opinion Ltd). Impact has also been achieved through direct collaboration with other commercial partners such as Microsoft Research Asia, TrustPilot and Abies Ltd. Moreover, international organisations such as the Brain Sciences Foundation and the A*STAR Institute of High Performance Computing have realised major impact by drawing on our research.
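As a generic illustration of the kind of sentence-level polarity scoring such sentiment tools perform, the sketch below uses NLTK's general-purpose VADER analyser; it is not the Sentic Computing toolset, and the example text and scores are illustrative only.

```python
# Generic sentiment-scoring illustration using NLTK's VADER analyser;
# this is not the Sentic Computing toolset itself.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-off lexicon download

sia = SentimentIntensityAnalyzer()
review = "The new photo manager is brilliant, although syncing is a bit slow."
print(sia.polarity_scores(review))
# e.g. {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...} with compound > 0
```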
Based in the School of English, the Research and Development Unit for English Studies (RDUES) conducts research in the field of corpus linguistics and develops innovative software tools to allow a wide range of external audiences to locate, annotate and use electronic data more effectively. This case study details work carried out by the RDUES team (Matt Gee, Andrew Kehoe, Antoinette Renouf) in building large-scale corpora of web texts, from which examples of language use have been extracted, analysed, and presented in a form suitable for teaching and research across and beyond HE, including collaboration with commercial partners.
Data-to-text utilises Natural Language Generation (NLG) technology that allows computer systems to generate narrative summaries of complex data sets. These summaries help experts, professionals and managers to understand the information contained within large and complex data sets more quickly and effectively. The technology has been developed since 2000 by Prof Reiter and Dr Sripada at the University of Aberdeen, supported by several EPSRC grants. The impact from the research has two dimensions.
As economic impact, a spinout company, Data2Text (www.data2text.com), was created in late 2009 to commercialise the research. As of May 2013, Data2Text had 14 employees. Much of Data2Text's work is carried out in collaboration with another UK company, Arria NLG (www.arria.com), which as of May 2013 had about 25 employees, most of whom were involved in collaborative projects with Data2Text.
As impact on practitioners and professional services, case studies have been developed in the oil and gas sector, in weather forecasting, and in healthcare, where NLG provides tools to rapidly produce narrative reports that facilitate planning and decision making, bringing benefits in terms of improved access to information and resultant cost and/or time savings. In addition, the research led to the creation of simplenlg (http://simplenlg.googlecode.com/), an open-source software package which performs some basic natural language generation tasks. The simplenlg package is used by several companies, including Agfa, Nuance and Siemens, as well as Data2Text and Arria NLG.
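As a rough sketch of the data-to-text idea described above (hand-rolled for illustration, not Data2Text's system or the simplenlg API), a few rules suffice to turn numeric readings into a short narrative:

```python
# Hand-rolled data-to-text sketch: map numeric readings to a narrative summary.
# Illustrative only; not Data2Text or simplenlg code.
def summarise_wind(readings):
    """readings: list of (hour, wind_speed_in_knots) tuples, in time order."""
    _, start_speed = readings[0]
    _, end_speed = readings[-1]
    peak_hour, peak_speed = max(readings, key=lambda r: r[1])

    if end_speed > 1.5 * start_speed:
        trend = "increasing markedly"
    elif end_speed > start_speed:
        trend = "increasing"
    elif end_speed < start_speed:
        trend = "easing"
    else:
        trend = "remaining steady"

    return (f"Wind speeds are {trend} through the period, "
            f"peaking at {peak_speed} knots around {peak_hour:02d}:00.")

print(summarise_wind([(6, 10), (12, 18), (18, 27)]))
# -> Wind speeds are increasing markedly through the period,
#    peaking at 27 knots around 18:00.
```

Full data-to-text systems add document planning, microplanning and grammatical realisation (the stage that simplenlg supports) on top of this kind of signal analysis.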
State-of-the-art reasoning systems developed in the UoA have underpinned the standardisation of ontology languages and play a critical role in numerous applications. For example, HermiT, software developed in the UoA, is being used by Électricité de France (EDF) to provide bespoke energy-saving advice to 265,000 customers in France, and a roll-out of the system to all of its 17 million customers is planned.
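To give a flavour of what an OWL reasoner such as HermiT does (classifying individuals against defined classes), here is a small sketch using the owlready2 Python package, which bundles HermiT as its default reasoner and requires a Java runtime. The ontology, classes and individuals are invented for illustration and have no connection to EDF's system.

```python
# Illustrative OWL reasoning sketch using owlready2, which invokes HermiT
# (its default reasoner) under the hood; requires a Java runtime.
# All names below are invented for illustration.
from owlready2 import get_ontology, Thing, sync_reasoner

onto = get_ontology("http://example.org/energy-advice.owl")

with onto:
    class Household(Thing): pass
    class Appliance(Thing): pass
    class owns(Household >> Appliance): pass
    class ElectricHeater(Appliance): pass

    # A defined class: any household that owns an electric heater.
    class HeaterOwner(Household):
        equivalent_to = [Household & owns.some(ElectricHeater)]

    home = Household("home_42")
    home.owns = [ElectricHeater("hallway_heater")]

# Run the reasoner and write inferred classifications back into the ontology.
with onto:
    sync_reasoner()

print(home.is_a)  # expected to now include HeaterOwner
```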
In this case study we present the TrOWL technology developed at the University of Aberdeen, which enables more efficient and scalable exploitation of semantic data. TrOWL and its component algorithms (REL, Quill and the Aberdeen Profile Checker) have had non-academic impact in two key areas. With respect to practitioners and professional services, the technology has enabled the introduction of two important World Wide Web Consortium (W3C) standards: OWL 2 and SPARQL 1.1. This has changed the way many companies work, across a range of sectors. Further, through partnership with specific companies, the use of TrOWL has changed how they operate and the technical solutions they provide to clients. These collaborations have led to economic impacts in companies such as Oracle, in "mitigat[ing] the losses of potential customers", and IBM, in "using the TrOWL reasoning infrastructure in [their] Smarter Cities solutions".
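To make the SPARQL 1.1 reference concrete, the sketch below runs a property-path query, one of the features added in the 1.1 revision, using the rdflib Python library; it illustrates the standard rather than TrOWL itself, and the data is invented.

```python
# SPARQL 1.1 property-path query over a toy graph, using rdflib; this
# illustrates the standard itself rather than TrOWL, and the data is invented.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:sensor1 ex:partOf ex:rack1 .
    ex:rack1   ex:partOf ex:datacentre1 .
""", format="turtle")

# The '+' property path (new in SPARQL 1.1) follows ex:partOf transitively.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?container WHERE { ex:sensor1 ex:partOf+ ?container }
""")
for row in results:
    print(row.container)  # ex:rack1, then ex:datacentre1
```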
The impact of this work stems from the provision of better-quality information models, and is manifest via: (a) reduced cost through improved reuse and less rework; (b) improved system interoperability; and (c) enhanced assurance and checking that information requirements are supported by the resultant systems. The approach has been applied in commercial environments, such as Shell (UK), where it has reduced development costs by up to 50% ($1m in one case). It has also been applied in the defence environment, forming part of underpinning standards currently being implemented by the UK and Swedish Armed Forces.
The security of data in printing and network environments is an area of increasing concern to individuals, businesses, government organisations and security agencies throughout the world. Mathematical algorithms developed at the School of Mathematics at Cardiff University represent a significant step-change in existing data security techniques. The algorithms enable greater security in automatic document classification and summarisation, information retrieval and image understanding. Hewlett-Packard (HP), the world's leading PC vendor, funded the research underpinning this development and patented the resulting software, with the aim of strengthening its position as the market leader in this sector of the global information technology industry. HP has incorporated the algorithms in a schedule of upgrades to improve the key security features in over ten million of its electronic devices. Accordingly, the impact claimed is mitigating data security risks for HP users and clients and substantial economic gain for the company.
The Natural Language Toolkit (NLTK) is a widely adopted Python library for natural language processing. NLTK is run as an open-source project. Three project leaders, Steven Bird (University of Melbourne), Edward Loper (BBN, Boston) and Ewan Klein (University of Edinburgh), provide the strategic direction of the NLTK project.
NLTK has been widely used in academia, in commercial and non-profit organisations, and in public bodies, including Stanford University and the Educational Testing Service (ETS), which administers widely recognised tests in more than 180 countries. NLTK has played an important role in making core natural language processing techniques easy to grasp, easy to integrate with other software tools, and easy to deploy.
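For instance, a few lines of NLTK are enough to tokenise and part-of-speech tag a sentence; the resource names below are those used in recent NLTK releases and may differ slightly across versions.

```python
# Minimal NLTK pipeline: tokenisation followed by part-of-speech tagging.
# Resource names may vary slightly between NLTK releases.
import nltk

nltk.download("punkt")                        # tokeniser models (one-off)
nltk.download("averaged_perceptron_tagger")   # POS tagger model (one-off)

sentence = "NLTK makes core natural language processing techniques easy to grasp."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('NLTK', 'NNP'), ('makes', 'VBZ'), ('core', 'JJ'), ...]
```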