### Providing Accessibility to Scientific Documents to Visually Impaired Readers via Mathematical Formula Recognition

**Submitting Institution**

University of Birmingham

**Unit of Assessment**

Computer Science and Informatics

**Summary Impact Type**

Societal

**Research Subject Area(s)**

Information and Computing Sciences: Artificial Intelligence and Image Processing, Information Systems

Psychology and Cognitive Sciences: Psychology

**Summary of the impact**

Our research is concerned with enabling access to mathematical literature for users with visual impairments (i.e., blind or partially sighted users) or print impairments (i.e., users with specific learning disabilities such as dyslexia or dysgraphia). The impact is therefore primarily societal: we give visually and print impaired learners access to scientific and mathematical knowledge from which they were previously excluded, thereby furthering an inclusive teaching and learning environment. With more than 1 million people with learning disabilities (http://www.learningdisabilities.org.uk/help-information/Learning-Disability-Statistics-/) and the number of visually impaired people predicted to rise above 2 million by 2020 in the UK alone (http://www.rnib.org.uk/aboutus/Research/statistics/Pages/statistics.aspx), the work is significant in providing equal opportunities to learners in the STEM subjects.

Our research has led to work with Google Inc. to enhance mathematics
accessibility on the Web via the screen-reader ChromeVox. It enables
full text-to-speech translation of mathematics on the Web **for all
users of the Chrome browser and Android platforms** and has been
included in ChromeVox since version 27, released on 9/5/2013. ChromeVox
had 31,518 downloads from the Chrome store as of 11/10/2013.

Our work has also resulted in an assistive technology tool, called MaxTract, for providing access to teaching and learning material. It has been deployed within digital mathematics libraries to enhance accessibility to online material. Through direct feedback we are aware of a number of visually impaired users who have actively used our tool.

**Underpinning research**

The research for this case study is in scientific document analysis, especially in mathematical formula recognition. Its aim is to find and analyse mathematical expressions in documents in order to transform them into a format that can be analysed and processed by machine. While the recognition of regular text, e.g. via optical character recognition from scanned documents, is fairly routine, the recognition of formulae is a highly non-trivial task due to the complex two-dimensional layout of mathematical expressions, the importance of special attributes, like accents and fonts, and the lack of dictionaries usable for correction of recognition results.

We have developed a linear grammar that processes streams of characters with associated spatial information (character bounding boxes and their absolute positions on a page) and parses them into a parse tree that reflects the two-dimensional structure of formulas. The grammar employs constraints on the relative positioning and size of neighbouring characters and sets of characters in order to construct the two-dimensional formula layout. This makes the recognition process entirely independent of any information on the actual characters and purely a heuristic function of spatial character distribution. The resulting parse trees are an effective data structure for translation into diverse mathematical markup formats, such as LaTeX and (presentation) MathML, and also allow for a description in natural language [1].
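To illustrate the idea of recovering layout from geometry alone, the following is a minimal sketch and not the grammar published in [1]. It classifies each character relative to the current baseline character using only bounding boxes. The `Glyph` type, the `relation` and `to_latex` functions, and the numeric thresholds are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """A character with its bounding box; y grows downwards as on a page."""
    char: str
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def centre_y(self) -> float:
        return (self.top + self.bottom) / 2


def relation(base: Glyph, nxt: Glyph, shrink: float = 0.8) -> str:
    """Classify nxt relative to base from geometry only.

    The thresholds (shrink factor, quarter-height bands) are illustrative
    assumptions, not values taken from the published work.
    """
    smaller = nxt.height <= shrink * base.height
    if smaller and nxt.centre_y < base.top + 0.25 * base.height:
        return "superscript"
    if smaller and nxt.centre_y > base.bottom - 0.25 * base.height:
        return "subscript"
    return "inline"


def to_latex(glyphs: list[Glyph]) -> str:
    """Fold a left-to-right glyph stream into a LaTeX string."""
    if not glyphs:
        return ""
    base = glyphs[0]
    out = base.char
    for g in glyphs[1:]:
        rel = relation(base, g)
        if rel == "superscript":
            out += "^{" + g.char + "}"
        elif rel == "subscript":
            out += "_{" + g.char + "}"
        else:
            out += g.char
            base = g  # an inline glyph becomes the new baseline reference
    return out


stream = [
    Glyph("x", 0, 10, 10, 20),
    Glyph("2", 10, 6, 15, 12),   # small and raised
    Glyph("+", 16, 12, 24, 18),
    Glyph("y", 25, 10, 35, 20),
    Glyph("i", 35, 16, 39, 22),  # small and lowered
]
print(to_latex(stream))
```

Note that the character identities are never inspected when deciding the layout; only sizes and positions are used, which mirrors the independence from character information described above.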

As a further refinement, we have improved the interpretation of the parse tree into two-dimensional formulas by introducing some semantic interpretation of its components. Specifically, we take the separation distance of character sequences and the character fonts into account when constructing formulas and clustering component subformulas [2].
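As a rough illustration of how separation distance and font can guide clustering, the sketch below groups consecutive characters that share a font and sit close together. It is only an assumed simplification of the approach in [2]; the `Char` type, the `cluster` function, the font labels and the `max_gap` threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """A character with its horizontal extent and font name."""
    text: str
    left: float
    right: float
    font: str


def cluster(chars: list[Char], max_gap: float = 1.0) -> list[str]:
    """Group consecutive characters with the same font and a small gap.

    max_gap is an illustrative threshold in page units (an assumption).
    """
    groups: list[list[Char]] = []
    for c in chars:
        prev = groups[-1][-1] if groups else None
        same_token = (
            prev is not None
            and c.font == prev.font
            and c.left - prev.right <= max_gap
        )
        if same_token:
            groups[-1].append(c)
        else:
            groups.append([c])
    return ["".join(c.text for c in g) for g in groups]


line = [
    Char("s", 0.0, 4.0, "roman"),
    Char("i", 4.5, 6.0, "roman"),
    Char("n", 6.5, 10.0, "roman"),
    Char("x", 12.0, 16.0, "italic"),
]
print(cluster(line))
```

Here the upright letters are merged into a single token that a later stage could treat as a function name, while the italic letter remains a separate variable.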

Our work has led to the development of a novel tool, MaxTract, which enables the extraction of content from PDF documents. Unlike comparable tools that perform PDF re-engineering, MaxTract does not treat mathematical content as noise and omit it. Instead, mathematical expressions are analysed using our grammatical approach and translated into processable formats using modular output drivers, which allows easy customisation of the system for different application domains. In particular, MaxTract can generate standard mathematical markup formats (LaTeX, presentation MathML) as well as descriptions in natural language, which can be given directly to speech engines. In a comparison with a state-of-the-art approach to analysing mathematical PDF documents [3], MaxTract was shown to be the more reliable tool for re-engineering PDF documents.
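The notion of modular output drivers can be sketched as follows. This is a hypothetical miniature and not MaxTract's actual code: the node types `Sym`, `Sup` and `Frac`, the `DRIVERS` registry and the `render` function are names invented for this example. Each driver walks the same parse tree and emits a different target format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Sym:
    name: str


@dataclass
class Sup:
    base: Node
    exponent: Node


@dataclass
class Frac:
    num: Node
    den: Node


Node = Union[Sym, Sup, Frac]


def latex(n: Node) -> str:
    if isinstance(n, Sym):
        return n.name
    if isinstance(n, Sup):
        return f"{latex(n.base)}^{{{latex(n.exponent)}}}"
    return f"\\frac{{{latex(n.num)}}}{{{latex(n.den)}}}"


def mathml(n: Node) -> str:
    if isinstance(n, Sym):
        tag = "mn" if n.name.isdigit() else "mi"
        return f"<{tag}>{n.name}</{tag}>"
    if isinstance(n, Sup):
        return f"<msup>{mathml(n.base)}{mathml(n.exponent)}</msup>"
    return f"<mfrac>{mathml(n.num)}{mathml(n.den)}</mfrac>"


def speech(n: Node) -> str:
    if isinstance(n, Sym):
        return n.name
    if isinstance(n, Sup):
        return f"{speech(n.base)} to the power of {speech(n.exponent)}"
    return f"fraction {speech(n.num)} over {speech(n.den)} end fraction"


# Adding a new output format only requires registering another driver.
DRIVERS: dict[str, Callable[[Node], str]] = {
    "latex": latex,
    "mathml": mathml,
    "speech": speech,
}


def render(tree: Node, fmt: str) -> str:
    return DRIVERS[fmt](tree)


tree = Frac(Sup(Sym("x"), Sym("2")), Sym("y"))
for fmt in DRIVERS:
    print(fmt, "->", render(tree, fmt))
```

Keeping the drivers separate from the tree means the recognition stage does not need to know which formats exist.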

The work on formula recognition and accessibility led to an invitation to Sorge by TV Raman, the head of the accessibility engineering group at Google, to apply for a visiting faculty project with Google for a sabbatical in the 2012/13 academic year. This project was accepted and Sorge worked with the ChromeVox team at Google to provide text-to-speech translation of mathematics in a fully-fledged web-based screen-reader. In his year at Google, Sorge developed a rule-based approach to speaking mathematical expressions on the web that are either given as MathML markup or rendered via MathJax. He implemented a speech rule engine that is now embedded as a core feature of ChromeVox, exposing a public API for easy customisation of speech rules by authors of websites. In addition, his framework enables the speech translation of image-based maths expressions with alternative text markup, making maths on websites like Wikipedia or Mathworld fully accessible.
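A rule-based translation of MathML into speech can be sketched in a few lines. The example below is an assumption-laden toy and does not reproduce the ChromeVox speech rule engine or its API: the class `SpeechRuleEngine`, its methods `add_rule` and `speak`, the helper `default_engine` and the spoken wording are all invented here. It also assumes MathML without an XML namespace.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable

Rule = Callable[["SpeechRuleEngine", ET.Element], str]

# Illustrative operator names; a real rule set would be far larger.
OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


class SpeechRuleEngine:
    """Maps element tags to speech rules; later rules override earlier ones."""

    def __init__(self) -> None:
        self.rules: dict[str, Rule] = {}

    def add_rule(self, tag: str, rule: Rule) -> None:
        self.rules[tag] = rule

    def speak(self, node: ET.Element) -> str:
        rule = self.rules.get(node.tag)
        if rule is not None:
            return rule(self, node)
        # Fallback: speak the children in order, or the text of a leaf.
        parts = [self.speak(child) for child in node]
        return " ".join(parts) if parts else (node.text or "").strip()


def speak_operator(engine: SpeechRuleEngine, node: ET.Element) -> str:
    symbol = (node.text or "").strip()
    return OPERATORS.get(symbol, symbol)


def default_engine() -> SpeechRuleEngine:
    engine = SpeechRuleEngine()
    engine.add_rule("mo", speak_operator)
    engine.add_rule(
        "mfrac", lambda e, n: f"{e.speak(n[0])} over {e.speak(n[1])}"
    )
    engine.add_rule(
        "msup", lambda e, n: f"{e.speak(n[0])} to the power of {e.speak(n[1])}"
    )
    return engine


markup = "<math><mi>x</mi><mo>=</mo><mfrac><mn>1</mn><mn>2</mn></mfrac></math>"
engine = default_engine()
print(engine.speak(ET.fromstring(markup)))

# A site author could replace a rule to change how fractions are spoken.
engine.add_rule(
    "mfrac",
    lambda e, n: f"the fraction {e.speak(n[0])} divided by {e.speak(n[1])}",
)
print(engine.speak(ET.fromstring(markup)))
```

The second call shows the kind of customisation that a public rule API makes possible: the markup is unchanged, only the rule for one element type differs.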

Furthermore, Sorge developed and implemented an approach that allows visually impaired readers to engage interactively with mathematical content by systematically exploring subexpressions, as well as a novel transformation of presentation MathML markup into a semantic interpretation. The latter was a continuation of the work in [2,4] and is geared towards a more natural reading experience for K-12 and undergraduate mathematics. These were included in the ChromeVox version 30 and 31 releases.
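Interactive exploration of subexpressions can be pictured as moving a focus through an expression tree. The sketch below is a hypothetical illustration and not the ChromeVox implementation: the `Expr` and `Explorer` types and the navigation commands `down`, `up` and `sideways` are assumptions. Each command returns the text that would be spoken for the newly focused subexpression.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Expr:
    """A subexpression with its spoken form and its component parts."""
    speech: str
    children: list[Expr] = field(default_factory=list)


class Explorer:
    """Keeps a focus inside an expression tree and moves it on request."""

    def __init__(self, root: Expr) -> None:
        self.root = root
        self.indices: list[int] = []  # child index taken at each level

    def _node(self, indices: list[int]) -> Expr:
        node = self.root
        for i in indices:
            node = node.children[i]
        return node

    @property
    def focus(self) -> Expr:
        return self._node(self.indices)

    def down(self) -> str:
        """Enter the first component of the focused expression, if any."""
        if self.focus.children:
            self.indices.append(0)
        return self.focus.speech

    def up(self) -> str:
        """Return to the enclosing expression, if any."""
        if self.indices:
            self.indices.pop()
        return self.focus.speech

    def sideways(self, step: int) -> str:
        """Move to a sibling (step = +1 or -1); stay put at either end."""
        if self.indices:
            parent = self._node(self.indices[:-1])
            target = self.indices[-1] + step
            if 0 <= target < len(parent.children):
                self.indices[-1] = target
        return self.focus.speech


formula = Expr(
    "x squared plus y",
    [Expr("x squared", [Expr("x"), Expr("2")]), Expr("plus"), Expr("y")],
)
explorer = Explorer(formula)
for spoken in (explorer.down(), explorer.down(), explorer.sideways(1), explorer.up()):
    print(spoken)
```

Tracking the focus by child indices rather than by node equality keeps navigation correct even when two sibling subexpressions are identical.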

The Birmingham research team led by Sorge (Senior Lecturer) includes Josef Baker (2006-13), PhD student and then RA, and Alan Sexton (2006-12), Lecturer.

**References to the research**

The key publication for our research is [1], which established the grammatical approach leading to a flexible representation of mathematical expressions that was central to the rest of the research. [2] and [4] contain the initial steps towards semantic analysis of formulas and layout reconstruction, which were significantly expanded by Sorge's work at Google and which were released with ChromeVox version 31.

**Publications**

[1] J. Baker, A. Sexton, and V. Sorge. A linear grammar approach to mathematical formula recognition from PDF. In J. Carette, et al (eds), Intelligent Computer Mathematics — Joint Proceedings of Calculemus 2009 and MKM 2009, volume 5625 of LNAI, Springer Verlag, Berlin, Germany. (MKM Best Paper Award) doi: 10.1007/978-3-642-02614-0_19

[2] J. Baker, A. Sexton, and V. Sorge. Faithful mathematical formula recognition from PDF documents. In 9th IAPR International Workshop on Document Analysis Systems, pages 485-492, Boston, USA, June 9-11 2010. ACM Press. doi: 10.1145/1815330.1815393

[3] J. Baker, A. Sexton, V. Sorge, and M. Suzuki. Comparing approaches to mathematical document analysis from PDF. In International Conference in Document Analysis and Recognition, pages 463-467, Beijing, China, 2011. IEEE Computer Society Press, Los Alamitos, CA, USA. doi: 10.1109/ICDAR.2011.99

[4] J. Baker, A. Sexton, and V. Sorge. Towards reverse engineering of PDF documents. In P. Sojka and T. Bouche (eds), Towards a Digital Mathematics Library, DML 2011, pages 65-75, Bertinoro, Italy, July 2011. Masaryk University Press. ISBN 978-80-210-5542-1. Available from: http://www.cs.bham.ac.uk/~jbb/documents/dml11.pdf

**Projects**

[5] European digital mathematics library. EU CIP-ICT Grant, 1st
February 2009 - 31st January 2013. €373,160. PI: Volker
Sorge, CIs: Alan Sexton, Mark Lee, RA: Josef Baker

[6] Improving accessibility to mathematical teaching resources. JISC OER
Rapid Innovation Grant, 1st April - 30th September
2012. £22,460. PI: Volker Sorge, RA: Josef Baker

**Details of the impact**

The impact of our research is of a societal nature: it gives visually and print impaired users access to scientific and mathematical literature from which they were previously excluded. Making scientific and teaching material accessible for visually impaired students is essential in an inclusive teaching and learning environment. However, it is still very difficult and expensive to make mathematical documents accessible, which is a major obstacle for visually impaired learners wanting to pursue subjects such as Mathematics, Physics, or Computer Science in both further and higher education. Our research therefore provides a novel means of furthering the inclusion of the visually impaired in the mathematical sciences.

**Text to Speech for Mathematics, Google, Inc., Mountain View, USA.**
From September 2012 to September 2013 Sorge spent a sabbatical year
working with the accessibility engineering group at Google, Inc. to enable
full text-to-speech translation of mathematics with the ChromeVox
screen-reader. The initial release of the work was launched at Google I/O in
May 2013 [7], and a further release in September 2013 provides enhanced
semantic interpretation and an API for customisation of the rule engine
[8]. ChromeVox is an open-source [9], web-based screen-reader for
Chrome on all platforms. As of 11/10/2013 it had been downloaded 31,518
times from the Chrome store [10]. ChromeVox is also the screen reading
solution for web content on Android devices via WebViews. While there have
been over 1 billion Android activations, the exact number of users of
accessibility services on these devices is confidential.

Furthermore, the WebViews API is also used by ePub3 readers to make
content accessible. An example is the IDEAL Group Reader [11], which
concentrates on STEM material and had been installed in beta around
35,000 times as of 9th July 2013.

**From PDF to Accessible Mathematics via MaxTract.** MaxTract is
available as a free web service at http://www.cs.bham.ac.uk/research/groupings/reasoning/sdag/submit.php.
It allows users to upload PDF documents and receive them back in an
accessible format. We have also worked directly with users via email
support and at specialist workshops on e-inclusion for sciences and
mathematics. To obtain feedback, both to enhance the system and to assess its
impact, we have conducted a case study with a number of blind students and
researchers at Johannes Kepler University, Linz, Austria and the Karlsruhe
Institute of Technology (KIT), Karlsruhe, Germany, where two of the
foremost European Centres for Integrated Studies are located. The
following are two quotes from questionnaire replies:

"*Before MaxTract, if I wanted to access a PDF document with
mathematical contents I needed to hire a person who scanned the document
and wrote the mathematical formulae into it by hand. This is now
eliminated — I have digital access, although not yet optimal one, which
eliminates the expense of hiring a person to do the work mentioned.*"

"*It is faster. As I already said, without MaxTract other people are
involved for me to receive accessible material. [...] Using MaxTract
does convert when I submit a document within a few minutes.*

*It is cheaper — not for me in person but for the university. [...]
Using MaxTract is free. And it is easier. I submit a document
and after a few minutes I get a converted one back. No-one has to be
asked, no checking for errors, no waiting.*" [sources 12-16]

In addition, we are deploying the results of our research in knowledge transfer projects, which demonstrate the impact of our work outside the immediate scientific community.

**European Digital Mathematics Library [5]** This is a project under
the CIP-ICT programme (a programme that is explicitly for
knowledge transfer only and does not allow new research within a project). Its
aim is to assemble the European Digital Mathematics Library (EuDML),
which offers working mathematicians a central portal to a large
collection of mathematical literature and provides functionality
that is specific to mathematics. We were a technology provider in the
project, integrating MaxTract into the EuDML workflow to enhance
over 5000 documents. This enables advanced search features on mathematical
formulas to support research mathematicians and engineers, and provides
accessibility to some of the content for visually impaired users [4].

**Improving Accessibility to Mathematical Teaching Resources [6]** was
a JISC OER Rapid Innovation grant, classified as teaching only, with the
aim of developing MaxTract from a research prototype into an assistive
technology tool that meets the identifiable needs of accessibility support
practitioners for providing access to mathematical teaching material in
higher and further education. The resulting assistive technology tool has
been successfully used in the OERPUB project (http://kefletcher.blogspot.com/).

**Sources to corroborate the impact**

[7] Google I/O 2013 - Advancing Web Accessibility with ChromeVox, showing Sorge explaining ChromeVox along with Google engineers Charles Chen and David Tseng. Accessed on 11/10/2013. http://www.youtube.com/watch?v=YyWu9HB9QtU

[8] Google Developers Live - Spoken Mathematics on the Web, showing Sorge explaining the ChromeVox API for Mathematics along with Google engineer David Tseng. Accessed on 11/10/2013. https://developers.google.com/live/shows/5881057312243712.

[9] Google open source repository of ChromeVox exhibits Sorge's authorship behind the mathematics part. Accessed on 11/10/2013. E.g., https://code.google.com/p/google-axs-chrome/source/browse/trunk/chromevox/speech_rules/speech_rule_engine.js

[10] https://chrome.google.com/webstore/detail/chromevox/kgejglhpjiefppelpmljglcjbhoiplfn

[11] IDEAL Group Reader® Beta II, https://play.google.com/store/apps/details?id=org.easyaccess.epubreader (Accessed on 31/10/2013)

[12] [text removed for publication] Answers to Questionnaire, 9 March 2013.

[13] [text removed for publication] Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany. Answers to Questionnaire, 26 March 2013.

[14] [text removed for publication] Vienna University of Technology, Vienna, Austria. Answers to Questionnaire, 15 March 2013.

[15] [text removed for publication] Johannes Kepler University, Linz, Austria. Answers to Questionnaire, 29 March 2013.

[16] [text removed for publication] Johannes Kepler University, Linz, Austria. Answers to Questionnaire, 18 March 2013.