Milepost GCC and compiler research at Edinburgh
Submitting Institution
University of EdinburghUnit of Assessment
Computer Science and InformaticsSummary Impact Type
TechnologicalResearch Subject Area(s)
Information and Computing Sciences: Computation Theory and Mathematics, Computer Software
Technology: Computer Hardware
Summary of the impact
Compiler research at Edinburgh over the last decade has had significant
industrial and commercial impact. Early work on pointer conversion is now
available in Intel's commercial compilers. Later ground-breaking work on
machine-learning based compilation led to the release of MilePost GCC, an
enhanced version of the world's widest-used open source compiler supported
by IBM. More recent work on parallelism discovery and machine-learning
mapping has led to a new ARM Centre of Excellence at Edinburgh.
Underpinning research
University of Edinburgh researchers involved in this case study are
listed below.
Professor Michael O’Boyle, 1997–date |
Professor Christopher Williams, 1998–date |
Professor Nigel Topham, 2003–date |
Björn Franke, Reader, 2003–date |
Christophe Dubach, PhD Edinburgh 2009,
Lecturer, RAEng Research Fellow and Intel
Honor Fellow |
Hugh Leather, PhD Edinburgh 2010,
Lecturer, RAEng Research Fellow |
Timothy M. Jones, PhD UoE 2006, RAEng /
EPSRC Research Fellow. Left UoE 2011. |
Grigori Fursin, PhD Edinburgh 2004.
Research Assistant. Left UoE 2005. |
2.1. Pointer conversion
Embedded systems account for the vast majority of shipped processors and
require high performance and energy efficiency at low cost. Until recently
the compiler technology for such systems was poor. This was partly due to
unconventional processor architectures and the pointer-based structure of
the programs. Franke and O'Boyle (2001) developed the first pointer
conversion scheme that automatically recovers linear array accesses in
digital signal processing applications. This opened up the possibility of
applying the large body of literature in high-level transformations to DSP
programs for the first time to dramatic effect. Embedded systems are now
parallel and multi-core in nature. However, the complex and non-standard
memory model of such systems means that they are extremely difficult to
program. Franke and O'Boyle (2003) developed the first ever
auto-parallelisation approach for multiple address space DSPs. This
required the combination of pointer-recovery and a new rank-modifying
transformation framework to reconcile location of memory addresses and
enable communication optimisation.
2.2. Iterative compilation and auto-tuning via machine learning
Traditional approaches to optimisation rely on static models of
program/processor interaction. O'Boyle (1998) was the first to show that
such an approach poorly models the interaction and is fundamentally
flawed. This led to work in iterative compilation that formulated the
transformations available as a formal optimisation space and applied
search-based techniques. This work has been widely used and shown to
outperform all existing techniques. Iterative compilation and auto-tuning
are now standard topics in compiler- and performance-based conferences.
Our research work has incorporated machine-learning techniques directly
into the search, modelling transformation spaces as Markov processes,
which can then be learnt [1]. This has been used to speed up the
performance of iterative compilation by an order of magnitude and
dramatically improve the performance of Just-In-Time (JIT) compilation.
This research has led to the development of compilers that can self-adapt
and learn about the optimisation space automatically, outperforming the
best hand-tuned compiler-writer heuristics.
2.3. Applying machine learning to compilers and architectures
The machine-learning-based approach has extended beyond compiler
optimisation to consider the compiler/architecture design space. Dubach
and O'Boyle [3] developed modelling approaches that could simulate and
predict the performance of any architecture configuration. This approach
was then extended [4] to predict the performance of an optimising compiler
on any architecture and finally to automatically generate an optimising
compiler for any architecture. In addition, we have developed techniques
that dynamically adjust hardware to the predicted best on-line
configuration allowing hardware to adapt to workloads, reducing energy
consumption [5].
2.4. Innovations in auto-parallelisation
Since 2009, Franke and O'Boyle have developed a unique approach to
auto-parallelisation. First they developed a machine-learning-based
approach to mapping different forms of parallelism to varying
architectures outperforming all existing techniques. In 2009, they
developed an innovative approach to determining the best mapping of
parallelism with profile-directed discovery of parallelism [2]. This has
then been extended to the heterogeneous multi-core space. Franke and
Topham's research on parallel JIT compilation [6] has contributed to the
scientific and commercial success of the ArcSim dynamic binary translator.
Parallel JIT compilation is a novel concept to hide JIT compilation
latency and to increase compiler throughput on standard multi-core host
machines. This results in unprecedented simulation speeds of single-core
and multi-core simulators beyond those of actual speed-optimised silicon
implementations of the system under simulation.
References to the research
3.1. Publications
1. Using Machine Learning to Focus Iterative Optimization. F.
Agakov, E.V. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J.
Thomson, M. Toussaint, and C.K.I. Williams, Proceedings of the
International Symposium on Code Generation and Optimization (CGO '06),
pages 295-305, March 2006. (doi: 10.1109/CGO.2006.37)
2. Towards a Holistic Approach to Auto-Parallelization: Integrating
Profile-Driven Parallelism Detection and Machine-Learning Based Mapping.
Z. Wang, B. Franke and M. O'Boyle, Proceedings of the ACM SIGPLAN 2009
Conference on Programming Language Design and Implementation (PLDI '09),
June 2009. Pages 177-187. (doi: 10.1145/1542476.1542496)
3. Portable Compiler Optimization Across Embedded Programs and
Microarchitectures using Machine Learning. C. Dubach, T.M. Jones,
E.V. Bonilla, G. Fursin and M.F.P. O'Boyle, 42nd IEEE/ACM International
Symposium on Microarchitecture (MICRO '09), December 2009. Pages 78-88.
(doi: 10.1145/1669112.1669124)
4. Partitioning Streaming Parallelism for Multi-cores: A Machine
Learning Based Approach. Z. Wang and M. O'Boyle, In 19th
International Conference on Parallel Architectures and Compilation
Techniques (PACT '10), September 2010. Pages 307-318. (doi: 10.1145/1854273.1854313)
5. Predictive Model for Dynamic Microarchitectural Adaptivity Control.
C. Dubach, T.M. Jones, E.V. Bonilla, and M.F.P. O'Boyle, In 43rd IEEE/ACM
International Symposium on Microarchitecture (MICRO '10), December 2010.
Pages 485-496. (doi: 10.1109/MICRO.2010.14)
6. Generalized Just-In-Time Trace Compilation using a Parallel Task
Farm in a Dynamic Binary Translator. Igor Bøhm, T.J.K. Edler von
Koch, S. Kyle, B. Franke, and N. Topham, Proceedings of the 32nd ACM
SIGPLAN conference on Programming Language Design and Implementation (PLDI
'11), June 2011, San Jose, California, USA. Pages 74-85. (doi: 10.1145/1993498.1993508)
References [1], [2] and [3] above are most indicative of the quality of
the underpinning research.
3.2. Research grants and funding
• EP/G000691 Machine Learning for Thread Level Speculation on Multicore
architectures £350,652
• EP/I013539 Dynamic Adaptation in Heterogeneous Multicore Embedded
Processors £1,217,557
• EP/H051988 A predictive modelling based approach to portable parallel
compilation for heterogeneous multi-cores £494,120
• EP/K008730 PAMELA: A Panoramic Approach to the Many-Core Landscape
£4,135,048 (3 partners)
• EU HiPEAC 2 Network of Excellence FP7 c £400,000 2008-2012
• EU HiPEAC 3 Network of Excellence FP7 c £400,000 2012-2016
• EU TETRACOM — technology transfer project c. £100,000
3.3. Awards and fellowships
• Tim Jones, Christophe Dubach, Hugh Leather, Christian Fensch — Royal
Academy of Engineering Five-year Research Fellowships
• Christophe Dubach CPHC/BCS Distinguished Dissertation award 2009
Details of the impact
4.1. Impact of pointer conversion
Pointer conversion is now available in Intel's commercial icc
compiler. This was added in 2005, and continues to be used in versions
11.0, 11.1, 12.0, 12.1, and 13.0 of this compiler, released in 2008, 2009,
2010, 2011, and 2012. Intel dominates the desktop and high-end processor
market. Research undertaken at Edinburgh is now used to improve code
performance on Intel platforms across the world. This is a wide impact
since the vast majority of desktop machines are Intel-based: according to
http://www.cpubenchmark.net/market_share.html,
estimates of market share since 2008 show that Intel has between 70% and
73% of the x86 processor market with ARM providing almost all the rest. A
smaller scale company CAPS-Enterprise (approximately 20 people) are also
known to have implemented this technique in their software tool chain,
which is used by Intel in their library development.
4.2. Impact of machine-learning-based approaches
GCC is the most widely used compiler in the world. It is open-source and
has a large community of academic and industrial contributors of which IBM
is the leader. Working with IBM we developed MilePost GCC, a compiler that
automatically learns to optimise [A, B, C]. The learning component is
available as a simple plug-in that determines optimisation based on prior
knowledge. Uniquely this can access a shared database allowing
community-based continuous optimisation. There have been 643 downloads by
developers, the number of end users is not known. This work led to the
creation of the Collaborative Tuning resource [D], a platform for exchange
of best practice in performance optimisation of program code.
4.3. Impact of machine-learning-based approaches
Our work on compiler/architecture co-design in collaboration with the
architecture group at the School of Informatics influenced the design of
the reconfigurable EnCore processor. The associated ArcSim
high-performance architecture simulator is based on the parallel JIT
compiler technology developed by us. EnCore and ArcSim are the subject of
a separate School of Informatics REF impact case study.
4.4. Impact of auto-parallelisation research
Combining our experience in parallelisation with machine-learning-based
optimisation has led to a major breakthrough in the area of
auto-parallelisation. This was recognised when ARM made a substantial
investment in a heterogeneous parallelism centre of excellence at
Edinburgh [E]. This is ARM's first centre of excellence outside the
University of Michigan. The centre funds fundamental research in data
centre scale parallelism leading to patentable ARM IP. We are currently
jointly working with ARM on an LLVM-based OpenCL compiler based on this
work. This work has attracted considerable industrial interest: NVIDIA has
made one of our students a fellow for our work on heterogeneous
parallelisation while Freescale, Imagination Technology and IBM are
collaborating on a variety of projects. Samsung is developing a prototype
based on our JIT technology. The pioneering work on profile-directed
parallelisation is the on-going subject of commercialisation. The
University of Edinburgh and Samsung have signed a collaboration agreement
[F] publicised at http://wcms.inf.ed.ac.uk/icsa/news/samsung-research-collaboration.
4.5. Details of on-going collaboration arrangements with industrial
partners
The ARM centre of excellence has two components: an overarching
collaboration agreement, and student project agreements. This allows
intellectual property to be jointly created and exploited by all parties.
Students have a supervisor at both ARM and Edinburgh. They are paid an
enhanced stipend and undertake a three-month internship during their
studies.
In 2012 Intel announced expansion of its Intel Doctoral Student Honour
Programme into Europe. The University of Edinburgh was one of only three
universities in the UK to be selected. In 2012 one of our students
Bhargava Rajaram was awarded an Intel PhD fellowship [G]. Christophe
Dubach was awarded an Intel Early Career Faculty award: this was the only
award made to a UK academic [H].
Sources to corroborate the impact
A. MilePostGCC press release. http://www-03.ibm.com/press/us/en/pressrelease/27874.wss
B. High-Impact ICT research: "Machine-learning revolutionises software
development". http://cordis.europa.eu/ictresults/index.cfm?section=news&tpl=article&ID=91208
C. An open-source machine-learning compiler that intelligently optimizes
applications. Dr Dobb's Software Journal.
http://www.drdobbs.com/open-source/milepost-gcc-now-available/218102130
D. The Collective Tuning website. http://ctuning.org
E. University of Edinburgh and ARM Research Centre of Excellence
Framework agreement. This is a commercially sensitive document describing
the details of the collaboration agreement between the University of
Edinburgh and ARM. Copies can be made available on request.
F. University of Edinburgh and Samsung Research Collaboration agreement.
This is a commercially sensitive document which describing the details of
the collaboration agreement between the University of Edinburgh and
Samsung. Copies can be made available on request.
G. Intel Doctoral Student Honour Programme.
http://www.intel.com/content/www/us/en/education/university/intel-2012-doctoral-student-honor-awardees.html?wapkw=2012+doctoral+student+honor+awardees
H. Intel University Collaborative Research
https://www.intel-university-collaboration.net/?ai1ec_event=early-career-faculty-awards
Copies of these web page sources are available at http://ref2014.inf.ed.ac.uk/impact