Providing Accessibility to Scientific Documents to Visually Impaired Readers via Mathematical Formula Recognition

Submitting Institution

University of Birmingham

Unit of Assessment

Computer Science and Informatics

Summary Impact Type


Research Subject Area(s)

Information and Computing Sciences: Artificial Intelligence and Image Processing, Information Systems
Psychology and Cognitive Sciences: Psychology


Summary of the impact

Our research is concerned with enabling access to mathematical literature for users with visual impairments (i.e., blind or partially sighted users) or print impairments (i.e., users with specific learning disabilities like dyslexia or dysgraphia). The impact is therefore primarily of a societal nature: we enable visually and print impaired learners to access scientific and mathematical knowledge from which they were previously excluded, thereby furthering an inclusive teaching and learning environment. With the number of people with learning disabilities being over 1 million and the number of visually impaired people predicted to rise above 2 million by 2020 in the UK alone, the work is significant in providing equal opportunities to learners in the STEM subjects.

Our research has led to work with Google Inc. to enhance mathematics accessibility on the Web via the screen-reader ChromeVox. It enables the full text-to-speech translation of mathematics on the Web for all users of the Chrome browser and Android platforms. It has been included in ChromeVox since version 27, released 9/5/2013, which had been downloaded 31,518 times from the Chrome store as of 11/10/2013.

Our work has also resulted in an assistive technology tool, called MaxTract, for providing access to teaching and learning material. It has been deployed within digital mathematics libraries to enhance accessibility to online material. Through direct feedback we are aware of a number of visually impaired users who have actively used our tool.

Underpinning research

The research for this case study is in scientific document analysis, especially in mathematical formula recognition. Its aim is to find and analyse mathematical expressions in documents in order to transform them into a format that can be machine analysed and processed. While the recognition of regular text, e.g. via optical character recognition from scanned documents, is fairly routine, the recognition of formulae is a highly non-trivial task due to the complex two dimensional layout of mathematical expressions, the importance of special attributes, like accents and fonts, and the lack of dictionaries usable for correction of recognition results.

We have developed a linear grammar that processes streams of characters with associated spatial information (character bounding boxes and their absolute positions on a page) and parses them into a parse tree that reflects the two dimensional structure of formulas. The grammar employs constraints on the relative positioning and size of neighbouring characters and sets of characters in order to construct the two dimensional formula layout. This makes the recognition process entirely independent of any information on the actual characters and purely a heuristic function of spatial character distribution. The resulting parse trees are an effective data structure for translation into diverse mathematical markup formats, such as LaTeX and (presentation) MathML, but also allow for a description in natural language [1].
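The kind of purely geometric constraint described above can be illustrated with a minimal sketch. It is not the grammar from [1]: the `Box` type, the `relation` function and the numeric thresholds are all assumptions chosen for illustration. It classifies a character relative to its left neighbour from bounding boxes alone, without looking at which characters they are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; y grows downwards, as on a page."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def centre_y(self) -> float:
        return (self.top + self.bottom) / 2


def relation(base: Box, nxt: Box, shrink: float = 0.8) -> str:
    """Classify how `nxt` sits relative to `base` using geometry only.

    The thresholds (0.8, 0.4, 0.6) are illustrative assumptions,
    not values taken from the published work.
    """
    smaller = nxt.height <= shrink * base.height
    if smaller and nxt.centre_y < base.top + 0.4 * base.height:
        return "superscript"
    if smaller and nxt.centre_y > base.top + 0.6 * base.height:
        return "subscript"
    return "inline"


x = Box(0, 10, 10, 20)
print(relation(x, Box(10, 8, 15, 14)))   # superscript
print(relation(x, Box(10, 16, 15, 22)))  # subscript
print(relation(x, Box(10, 10, 20, 20)))  # inline
```

A full grammar would combine many such local decisions into a parse tree; the sketch only shows that a single decision needs no character identity.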

As a further refinement, we have improved the interpretation of the parse tree into two dimensional formulas by introducing some semantic interpretation of its components. Specifically, we take the separation distance between character sequences and the character fonts into account when constructing formulas and clustering component subformulas [2].
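As a rough illustration of clustering by separation distance, the following sketch groups the horizontal extents of characters on one line into clusters whenever the gap between them stays below a threshold. The function name `cluster_by_gap` and the `max_gap` parameter are hypothetical, and font information, which [2] also uses, is left out for brevity.

```python
def cluster_by_gap(spans, max_gap):
    """Group (left, right) horizontal extents into clusters.

    spans:   list of (left, right) tuples, in any order.
    max_gap: largest horizontal gap still treated as part of the same
             cluster (an assumed, tunable threshold).
    """
    clusters = []
    reach = None  # right-most edge seen so far in the current cluster
    for left, right in sorted(spans):
        if reach is not None and left - reach <= max_gap:
            clusters[-1].append((left, right))
            reach = max(reach, right)
        else:
            clusters.append([(left, right)])
            reach = right
    return clusters


spans = [(20, 25), (0, 5), (6, 10), (26, 30)]
print(cluster_by_gap(spans, max_gap=2))
# [[(0, 5), (6, 10)], [(20, 25), (26, 30)]]
```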

Our work has led to the development of a novel tool, MaxTract, which enables the extraction of content from PDF documents. Unlike comparable tools that perform PDF re-engineering, MaxTract does not treat mathematical content as noise and omit it. Instead, mathematical expressions are analysed using our grammatical approach and translated into processable formats using modular output drivers, which allows easy customisation of the system for different application domains. In particular, MaxTract can generate standard mathematical markup formats (LaTeX, presentation MathML) as well as descriptions in natural language, which can be given directly to speech engines. In a comparison with a state-of-the-art approach to analysing mathematical PDF documents [3], MaxTract was shown to be the more reliable tool for re-engineering PDF documents.
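The idea of modular output drivers over a single parse tree can be sketched as follows. This is an assumption-laden toy, not MaxTract's actual design: the node types `Sym`, `Sup` and `Frac`, the `DRIVERS` registry and the `render` function are invented here for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Sym:
    text: str


@dataclass
class Sup:
    base: "Node"
    exponent: "Node"


@dataclass
class Frac:
    num: "Node"
    den: "Node"


Node = Union[Sym, Sup, Frac]


def to_latex(n: Node) -> str:
    """Hypothetical LaTeX driver."""
    if isinstance(n, Sym):
        return n.text
    if isinstance(n, Sup):
        return f"{{{to_latex(n.base)}}}^{{{to_latex(n.exponent)}}}"
    if isinstance(n, Frac):
        return f"\\frac{{{to_latex(n.num)}}}{{{to_latex(n.den)}}}"
    raise TypeError(f"unsupported node: {n!r}")


def to_mathml(n: Node) -> str:
    """Hypothetical presentation MathML driver."""
    if isinstance(n, Sym):
        tag = "mn" if n.text.isdigit() else "mi"
        return f"<{tag}>{n.text}</{tag}>"
    if isinstance(n, Sup):
        return f"<msup>{to_mathml(n.base)}{to_mathml(n.exponent)}</msup>"
    if isinstance(n, Frac):
        return f"<mfrac>{to_mathml(n.num)}{to_mathml(n.den)}</mfrac>"
    raise TypeError(f"unsupported node: {n!r}")


# Adding an output format means adding one entry here.
DRIVERS = {"latex": to_latex, "mathml": to_mathml}


def render(tree: Node, fmt: str) -> str:
    return DRIVERS[fmt](tree)


tree = Frac(Sym("1"), Sup(Sym("x"), Sym("2")))
print(render(tree, "latex"))   # \frac{1}{{x}^{2}}
print(render(tree, "mathml"))
```

Because each driver only walks the tree, a natural-language driver for speech engines could be registered in the same way without touching the recognition step.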

The work on formula recognition and accessibility led to an invitation to Sorge by TV Raman, the head of the accessibility engineering group at Google, to apply for a visiting faculty project with Google for a sabbatical in the 2012/13 academic year. This project was accepted and Sorge worked with the ChromeVox team at Google to provide text-to-speech translation of mathematics in a fully-fledged web-based screen-reader. In his year at Google, Sorge developed a rule-based approach to speaking mathematical expressions on the web that are either given as MathML markup or rendered via MathJax. He implemented a speech rule engine that is now embedded as a core feature of ChromeVox, exposing a public API for easy customisation of speech rules by authors of websites. In addition, his framework enables the speech translation of image-based maths expressions with alternative text markup, making maths on websites like Wikipedia or Mathworld fully accessible.
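A rule-based translation of MathML into speech can be sketched in a few lines. The following is not the ChromeVox engine or its API: the `rule` decorator, the `RULES` table, the `speak` function and the spoken phrases are assumptions made for illustration, and the input is namespace-free MathML for simplicity.

```python
import xml.etree.ElementTree as ET

RULES = {}  # maps a MathML tag name to a function producing speech text


def rule(tag):
    """Register a speech rule for one tag (hypothetical customisation hook)."""
    def register(fn):
        RULES[tag] = fn
        return fn
    return register


def speak(el):
    handler = RULES.get(el.tag)
    if handler is None:
        # Default rule: speak the children from left to right.
        return " ".join(speak(child) for child in el)
    return handler(el)


OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


@rule("mi")
@rule("mn")
def leaf(el):
    return (el.text or "").strip()


@rule("mo")
def operator(el):
    text = (el.text or "").strip()
    return OPERATORS.get(text, text)


@rule("msup")
def superscript(el):
    base, exponent = list(el)
    return f"{speak(base)} to the power of {speak(exponent)}"


@rule("mfrac")
def fraction(el):
    num, den = list(el)
    return f"{speak(num)} over {speak(den)}"


markup = ("<math><mrow><msup><mi>x</mi><mn>2</mn></msup>"
          "<mo>+</mo><mn>1</mn></mrow></math>")
print(speak(ET.fromstring(markup)))  # x to the power of 2 plus 1
```

In this sketch a site author would customise speech by registering a different function for a tag, for example replacing the `msup` rule so that an exponent of 2 is read as "squared".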

Furthermore, Sorge developed and implemented an approach that allows visually impaired readers to engage interactively with mathematical content by systematically exploring subexpressions. He also developed a novel transformation of presentation MathML markup into a semantic interpretation, which was a continuation of the work in [2,4] and is geared towards a more natural reading experience for K-12 and undergraduate mathematics. These were included in the ChromeVox version 30 and 31 releases.
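Interactive exploration of subexpressions amounts to moving a cursor through an expression tree and speaking the node under the cursor. The sketch below shows one possible shape of such a cursor; the `Node` and `Explorer` classes and the movement names are hypothetical and do not describe how ChromeVox implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # text to be spoken for this subexpression
    children: list = field(default_factory=list)


class Explorer:
    """Cursor over an expression tree; every move returns the new label."""

    def __init__(self, root):
        self.root = root
        self.position = []          # child indices from the root to the cursor

    def _node_at(self, path):
        node = self.root
        for index in path:
            node = node.children[index]
        return node

    def current(self):
        return self._node_at(self.position)

    def down(self):
        if self.current().children:
            self.position.append(0)
        return self.current().label

    def up(self):
        if self.position:
            self.position.pop()
        return self.current().label

    def right(self):
        if self.position:
            siblings = self._node_at(self.position[:-1]).children
            if self.position[-1] + 1 < len(siblings):
                self.position[-1] += 1
        return self.current().label


tree = Node("x squared plus one", [
    Node("x squared", [Node("x"), Node("two")]),
    Node("plus"),
    Node("one"),
])
explorer = Explorer(tree)
print(explorer.down())   # x squared
print(explorer.down())   # x
print(explorer.right())  # two
print(explorer.up())     # x squared
print(explorer.right())  # plus
```

Moves that are not possible, such as going right from the last sibling, leave the cursor in place, so the reader always hears a valid subexpression.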

The Birmingham research team led by Sorge (Senior Lecturer) includes Josef Baker (2006-13), PhD student and then RA, and Alan Sexton (2006-12), Lecturer.

References to the research

The key publication for our research is [1], which established the grammatical approach leading to a flexible representation of mathematical expressions that was central to the rest of the research. [2] and [4] contain the initial steps towards semantic analysis of formulas and layout reconstruction, which were significantly expanded by Sorge's work at Google and which were released with ChromeVox version 31.


[1] J. Baker, A. Sexton, and V. Sorge. A linear grammar approach to mathematical formula recognition from PDF. In J. Carette, et al (eds), Intelligent Computer Mathematics — Joint Proceedings of Calculemus 2009 and MKM 2009, volume 5625 of LNAI, Springer Verlag, Berlin, Germany. (MKM Best Paper Award) doi: 10.1007/978-3-642-02614-0_19


[2] J. Baker, A. Sexton, and V. Sorge. Faithful mathematical formula recognition from pdf documents. In 9th IAPR International Workshop on Document Analysis Systems, pages 485-492, Boston, USA, June 9-11 2010. ACM Press. doi: 10.1145/1815330.1815393


[3] J. Baker, A. Sexton, V. Sorge, and Masakazu Suzuki. Comparing approaches to mathematical document analysis from PDF. In International Conference in Document Analysis and Recognition, pages 463-467, Beijing, China, 2011. IEEE Computer Society Press, Los Alamitos, CA, USA. doi:


[4] J. B. Baker, A. P. Sexton, and V. Sorge. Towards reverse engineering of PDF documents. In P. Sojka and T. Bouche, editors, Towards a Digital Mathematics Library, DML 2011, pages 65-75, Bertinoro, Italy, July 2011. Masaryk University Press. ISBN 978-80-210-5542-1. Available from:


[5] European digital mathematics library. EU CIP-ICT Grant, 1st February 2009 - 31st January 2013. Euros 373,160. PI: Volker Sorge, CIs: Alan Sexton, Mark Lee, RA: Josef Baker

[6] Improving accessibility to mathematical teaching resources. JISC OER Rapid Innovation Grant, 1st April - 30th September 2012. £22,460. PI: Volker Sorge, RA: Josef Baker

Details of the impact

The impact of our research is of a societal nature: it enables visually and print impaired users to access scientific and mathematical literature from which they were previously excluded. Making scientific and teaching material accessible for visually impaired students is essential in an inclusive teaching and learning environment. However, it is still very difficult and expensive to make mathematical documents accessible, which is a major obstacle for visually impaired learners wanting to pursue subjects such as Mathematics, Physics, or Computer Science in both further and higher education. Our research therefore provides a novel means to further the inclusion of the visually impaired in the mathematical sciences.

Text to Speech for Mathematics, Google, Inc., Mountain View, USA. From September 2012 to September 2013 Sorge spent a sabbatical year working with the accessibility engineering group at Google, Inc. to enable full text-to-speech translation of mathematics with the ChromeVox screen-reader. The initial release of the work was launched at Google I/O in May 2013 [7], and a further release in September 2013 provides enhanced semantic interpretation and an API for customisation of the rule engine [8]. ChromeVox is an open-source [9], web-based screen-reader for Chrome on all platforms. As of 11/10/2013 it had been downloaded 31,518 times from the Chrome store [10]. ChromeVox is also the screen reading solution for web content on Android devices via WebViews. While there have been over 1 billion Android activations, the exact number of users of accessibility services on these devices is confidential.

Furthermore, the WebViews API is also used by ePub3 readers to make content accessible. An example is the IDEAL Group Reader [11], which concentrates on STEM material and whose beta version had been installed around 35,000 times as of 9th July 2013.

From PDF to Accessible Mathematics via MaxTract. MaxTract is available as a free web service. It allows users to upload PDF documents and receive them back in an accessible format. We have also worked directly with users via email support and at specialist workshops on e-inclusion for sciences and mathematics. To obtain feedback, both to enhance the system and to assess its impact, we have conducted a case study with a number of blind students and researchers at Johannes Kepler University, Linz, Austria and the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, where two of the foremost European Centres for Integrated Studies are located. The following are two quotes from questionnaire replies:

"Before MaxTract, if I wanted to access a PDF document with mathematical contents I needed to hire a person who scanned the document and wrote the mathematical formulae into it by hand. This is now eliminated — I have digital access, although not yet optimal one, which eliminates the expense of hiring a person to do the work mentioned."

"It is faster. As I already said, without MaxTract other people are involved for me to receive accessible material. [...] Using MaxTract does convert when I submit a document within a few minutes.

It is cheaper — not for me in person but for the university. [...] Using MaxTract is free. And it is easier. I submit a document and after a few minutes I get a converted one back. No-one has to be asked, no checking for errors, no waiting." [sources 12-16]

In addition, we are deploying the results of our research in knowledge transfer projects, which demonstrate the impact of our work outside the immediate scientific community.

European Digital Mathematics Library [5]. This is a project under the CIP-ICT programme (a programme that is explicitly for knowledge transfer only and does not allow new research within a project). Its aim is to assemble the European Digital Mathematics Library (EuDML), which offers working mathematicians a central portal to a large collection of mathematical literature and provides functionality that is specific to mathematics. We were a technology provider in the project, integrating MaxTract into the EuDML workflow to enhance over 5000 documents. This enables advanced search features on mathematical formulas to support research mathematicians and engineers, and provides accessibility for some of the content for visually impaired users [4].

Improving Accessibility to Mathematical Teaching Resources [6] was a JISC OER Rapid Innovation grant, classified as teaching only, with the aim of developing MaxTract from a research prototype into an assistive technology tool that meets the identifiable needs of accessibility support practitioners for providing access to mathematical teaching material in higher and further education. The resulting assistive technology tool has been successfully used in the OERPUB project.

Sources to corroborate the impact

[7] Google I/O 2013 - Advancing Web Accessibility with ChromeVox, showing Sorge explaining ChromeVox along with Google engineers Charles Chen and David Tseng. Accessed on 11/10/2013.

[8] Google Developers Live - Spoken Mathematics on the Web, showing Sorge explaining the ChromeVox API for Mathematics along with Google engineer David Tseng. Accessed on 11/10/2013.

[9] Google open source repository of ChromeVox exhibits Sorge's authorship behind the mathematics part. Accessed on 11/10/2013. E.g.,


[11] IDEAL Group Reader® Beta II. Accessed on 31/10/2013.

[12] [text removed for publication] Answers to Questionnaire, 9 March 2013.

[13] [text removed for publication] Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany. Answers to Questionnaire, 26 March 2013.

[14] [text removed for publication] Vienna University of Technology, Vienna, Austria. Answers to Questionnaire, 15 March 2013.

[15] [text removed for publication] Johannes Kepler University, Linz, Austria. Answers to Questionnaire, 29 March 2013.

[16] [text removed for publication] Johannes Kepler University, Linz, Austria. Answers to Questionnaire, 18 March 2013.