Publications
Work-related, self-published writings not affiliated with research institutes:
On dr-knz.net
My Go executable files are still large.
April 2021.
Errors vs. exceptions in Go and C++ in 2020.
December 2020.
The Go low-level calling convention on x86-64 (updated).
November 2020.
Cloud-Native Security has Two R’s, not Three
September 2020.
-
September 2020.
Local connection overheads in PostgreSQL and CockroachDB.
April 2020.
Data flows and security architecture in CockroachDB.
February 2020.
Why are my Go executable files so large?
March 2019.
Measuring errors vs. exceptions in Go and C++.
September 2018.
Measuring multiple return values in Go and C++.
August 2018.
Measuring argument passing in Go and C++.
August 2018.
The Go low-level calling convention on x86-64.
July 2018.
CS PhD student in the Netherlands: to be or not to be?
January 2015.
On the future of computer science.
September 2014.
How good are you at programming?
July 2014.
Rust for functional programmers.
July 2014.
-
April 2014.
Haskell for OCaml programmers.
March 2014.
-
January 2014.
-
2 September 2013.
Third-party channels
Nested transactions in CockroachDB 20.1.
Cockroach Labs Blog, June 2020.
Why are my Go executable files so large?
Cockroach Labs Blog, April 2019. Edited version of my previous article from March 2019.
-
CockroachDB project, GitHub, September 2018.
Why CockroachDB and PostgreSQL are compatible.
Cockroach Labs Blog, August 2018. Edited version of my previous blog post from May 2018.
Local and distributed query processing in CockroachDB.
Cockroach Labs Blog, June 2017.
On the Way to Better SQL Joins in CockroachDB.
Cockroach Labs Blog, February 2017.
-
Cockroach Labs Blog, November 2016.
Squashing a Schrödinbug With Strong Typing.
Cockroach Labs Blog, August 2016.
Modesty in Simplicity: CockroachDB’s JOIN.
Cockroach Labs Blog, July 2016.
Critters in a Jar: Running CockroachDB on FreeBSD.
Cockroach Labs Blog, July 2016.
Revisiting SQL typing in CockroachDB, with Nathan VanBenschoten.
Cockroach Labs Blog, June 2016.
DIY Jepsen Testing CockroachDB.
Cockroach Labs Blog, April 2016.
Hacker-Friendly Systems for Systems Innovation (Extrinsically Adaptable Systems).
Cryptome 2013-0854, July 2013.
Academic publications
My academic publication stream has been on hiatus since January 2016: I currently have access to a funding source that lets me continue research (and teaching) without the pressure to publish, and I intend to exploit this opportunity for as long as I can.
@inproceedings{poss20sigmod, author = {Taft, Rebecca and Sharif, Irfan and Matei, Andrei and VanBenschoten, Nathan and Lewis, Jordan and Grieger, Tobias and Niemi, Kai and Woods, Andy and Birzin, Anne and Poss, Raphael and Bardea, Paul and Ranade, Amruta and Darnell, Ben and Gruneir, Bram and Jaffray, Justin and Zhang, Lucy and Mattis, Peter}, title = {{CockroachDB}: The Resilient Geo-Distributed {SQL} Database}, year = {2020}, isbn = {9781450367356}, publisher = {ACM}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3318464.3386134}, doi = {10.1145/3318464.3386134}, booktitle = {Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data}, pages = {1493--1509}, numpages = {17}, location = {Portland, OR, USA}, series = {SIGMOD '20} }
@article{poss15tpds, Abstract = {This article advocates the use of new architectural features commonly found in many-cores to replace the machine model underlying Unix-like operating systems. We present a general Abstract Many-core Machine Model (AM3), a proof-of-concept implementation and first evaluation results in the context of an emerging many-core, hardware multi-threaded architecture without support for interrupts. Our proposed approach makes it possible to reuse off-the-shelf multithreaded/multiprocess software on massively parallel architectures, without need to change code to use custom programming models like CUDA or OpenCL. Benefits include higher hardware utilization, higher performance and higher energy efficiency for workloads common to general-purpose platforms, such as in datacenters and Clouds. The benefits also include simpler software control over the hardware platform, an enabling factor for the further evolution of parallel programming languages.}, Author = {{Raphael~`kena'} Poss and Koen Koning}, Doi = {10.1109/TPDS.2015.2492542}, Urldoi = {http://dx.doi.org/10.1109/TPDS.2015.2492542}, Issn = {1045-9219}, Urllocal = {pub/poss.15.tpds.pdf}, Journal = {IEEE Trans. Parallel Distrib. Syst.}, Month = {October}, Title = {{AM$^3$}: Towards a hardware {Unix} accelerator for many-cores}, Volume = {26}, Year = {2015}}
@manual{poss14liv, Author = {{Raphael~`kena'} Poss and Robert {van Wijk} and the Computer Science teaching staff of the University of Amsterdam}, Month = {August}, Organization = {University of Amsterdam}, Title = {Leerlijn informaticavaardigheden}, Url = {http://liv.science.uva.nl/}, Year = {2015}, }
@article{poss15phd, Author = {{Raphael~`kena'} Poss}, Month = {January}, Title = {{CS} {PhD} student in the Netherlands: to be or not to be?}, Url = {http://dr-knz.net/so-you-want-to-apply-for-a-cs-phd.html}, Urlpdf = {so-you-want-to-apply-for-a-cs-phd.pdf}, Year = {2015}, }
@article{poss14cs, Author = {{Raphael~`kena'} Poss}, Month = {September}, Title = {On the future of computer science}, Url = {http://dr-knz.net/on-the-future.html}, Urlpdf = {on-the-future.pdf}, Year = {2014}, }
@inproceedings{fu14ares, Abstract = {Transient fault recovery is important in processor availability. However, significant silicon or performance overheads are characteristics of existing techniques. We uncover an opportunity to reduce the overheads dramatically in modern processors that appears as a side-effect of introducing hardware multithreading to improve performance. We observe that threads are usually short code sequences with no branches and few memory side-effects, which means that the number of checkpoints is small and constant. In addition, the state structures of a thread already presented in hardware can be reused to provide checkpointing. In this paper, we demonstrate this principle of using a hardware/software co-design called Rethread, which features compiler-generated code annotations and automatic recovery in hardware by restarting threads. This approach provides the ability to recover from transient faults without dedicated hardware. Moreover, results show performance degradation under both fault-free condition ($<$5\%) and as a function of fault rate.}, Address = {University of Fribourg, Switzerland}, Author = {Jian Fu and Qiang Yang and Raphael Poss and Chris Jesshope and Chunyuan Zhang}, Booktitle = {Proc. 9th International Conference on Availability, Reliability and Security (ARES'14)}, Doi = {10.1109/ARES.2014.18}, Urldoi = {http://dx.doi.org/10.1109/ARES.2014.18}, Month = {September}, Pages = {88--93}, Publisher = {IEEE}, Title = {Rethread: A Low-cost Transient Fault Recovery Scheme for Multithreaded Processors}, Year = {2014}, }
@inproceedings{mirfan14simultech, Abstract = {The simulation of fine-grained latency tolerance based on the dynamic state of the system in high-level simulation of many-core systems is a challenging simulation problem. We have introduced a high-level simulation technique for microthreaded many-core systems based on the assumption that the throughput of the program can always be one cycle per instruction as these systems have fine-grained latency tolerance. However, this assumption is not always true if there are insufficient threads in the pipeline and hence long latency operations are not tolerated. In this paper we introduce Signatures to classify low-level instructions in high-level categories and estimate the performance of basic blocks during the simulation based on the concurrent threads in the pipeline. The simulation of fine-grained latency tolerance improves accuracy in the high-level simulation of many-core systems.}, Address = {Vienna, Austria}, Author = {Irfan Uddin and Raphael Poss and Chris Jesshope}, Booktitle = {Proc. 4th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH 2014)}, Doi = {10.5220/0004982405090516}, Urldoi = {http://dx.doi.org/10.5220/0004982405090516}, Isbn = {978-989-758-038-3}, Month = {August}, Pages = {509--516}, Publisher = {Scitepress}, Title = {Signature-based high-level simulation of microthreaded many-core architectures}, Year = {2014}, }
@article{poss14rust, Author = {{Raphael~`kena'} Poss}, Eprint = {1407.5670}, Month = {July}, Title = {Rust for functional programmers}, Url = {http://dr-knz.net/rust-for-functional-programmers.html}, Urlpdf = {rust-for-functional-programmers.pdf}, Year = {2014}, }
@article{poss14pl, Author = {{Raphael~`kena'} Poss}, Month = {July}, Title = {How good are you at programming?---A {CEFR}-like approach to measure programming proficiency}, Url = {http://dr-knz.net/programming-levels.html}, Urlpdf = {programming-levels.pdf}, Year = {2014}, }
@inproceedings{poss14trust, Abstract = {Recent work on academic publishing has focused on transparency, to eliminate skews in favor of results channeled through already established publishers. This movement, called ``open peer review'', will require infrastructure. So far, proposed realizations of open peer review have relied on centralized coordinating platforms; this is unsatisfactory as this architectural choice stays vulnerable to long-term predatory commercial capture and data loss. Instead, we propose ``Academia 2.0'', a combination of both true peer-to-peer, distributed scientific dissemination channels, and their accompanying workflows for open peer review. It features safe decoupling of storage, indexing and search sites and supports research metrics. Our proposal relies on the existence of semantic web sites for researchers and powerful Internet search engines, an assumption which did not hold 10 years ago. We also introduce post-hoc citations, a key mechanism for quality control, impact measurement and post-hoc credit attribution for previous work. Due to the technology involved, computer engineering is likely the scientific field with the most potential to try out and evaluate our model.}, Acmid = {2618139}, Address = {Edinburgh, UK}, Author = {Raphael Poss and Sebastian Altmeyer and Mark Thompson and Rob Jelier}, Booktitle = {Proc 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST'14)}, Doi = {10.1145/2618137.2618139}, Urldoi = {http://dx.doi.org/10.1145/2618137.2618139}, Isbn = {978-1-4503-2951-4}, Month = {June}, Pages = {3:1--3:6}, Publisher = {ACM}, Title = {{Academia 2.0}: removing the publisher middle-man while retaining impact}, Urllocal = {pub/poss.14.trust.pdf}, Year = {2014}, }
@techreport{poss14trust2, Abstract = {"Academia 2.0" is a proposal to organize scientific publishing around true peer-to-peer distributed dissemination channels and eliminate the traditional role of the academic publisher. This model will be first presented at the 2014 workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST'14) in the form of a high-level overview, so as to stimulate discussion and gather feedback on its merits and feasibility. This report complements the 6-page introductory article presented at TRUST, by answering the reviewer's comments in detail and reviewing the related work on open peer review.}, Author = {R. Poss and S. Altmeyer and M. Thompson and R. Jelier}, Eprint = {1404.7753}, Institution = {University of Amsterdam}, Month = {May}, Read = {1}, Title = {Aca 2.0: Questions and Answers}, Urllocal = {pub/poss.14.trust2.pdf}, Year = {2014}, }
@article{poss14cfs, Author = {{Raphael~`kena'} Poss}, Eprint = {1405.3073}, Month = {April}, Title = {Categories from scratch}, Url = {http://dr-knz.net/categories-from-scratch.html}, Urlpdf = {categories-from-scratch.pdf}, Year = {2014}, }
@article{poss14hs4mlp, Author = {{Raphael~`kena'} Poss}, Eprint = {1405.3072}, Month = {March}, Title = {Haskell for {OCaml} programmers}, Url = {http://dr-knz.net/haskell-for-ocaml-programmers.html}, Urlpdf = {haskell-for-ocaml-programmers.pdf}, Year = {2014}, }
@inproceedings{fu14date, Abstract = {This paper designs and implements the Redundant Multi-Threading (RMT) in a Data-flow scheduled Multi-Threaded (DMT) multicore processor, called Data-flow scheduled Redundant Multi-Threading (DRMT). Meanwhile, It presents Asynchronous Output Comparison (AOC) for RMT techniques to avoid fault detection related inter-core communication and alleviate the performance and hardware overheads induced by output comparison. Results show that the performance overhead of DRMT is less than 60\% even when the number of threads is four times the number of processing elements. Also the performance and hardware overheads of AOC are insignificant.}, Address = {Dresden, Germany}, Author = {Jian Fu and Qiang Yang and Raphael Poss and Chris Jesshope and Chunyuan Zhang}, Booktitle = {Proc. 2014 Conference on Design, Automation and Test in Europe (DATE'14)}, Doi = {10.7873/DATE.2014.076}, Urldoi = {http://dx.doi.org/10.7873/DATE.2014.076}, Month = {March}, Pages = {1--4}, Publisher = {IEEE}, Title = {A Fault Detection Mechanism in a Data-flow Scheduled Multithreaded Processor}, Year = {2014}, }
@inproceedings{mirfan14pdp, Abstract = {High-level simulation is becoming commonly used for design space exploration of many-core systems. We have been working on high-level simulation techniques for the microthreaded many-core architecture at the University of Amsterdam. In previous work different levels of high-level simulation for instruction execution have been proposed, where the objective of every level is to keep the highest possible abstraction in order to achieve the least complexity and highest simulation speed with a compromise on the amount of accuracy. In this article we propose a new breakthrough in abstraction by simulating entire components in applications using analytical models. This simulation technique greatly reduces the complexity of the simulator and increases the simulation speed by orders of magnitude compared to the other levels of the high-level simulator, without affecting the simulation accuracy.}, Address = {Turin, Italy}, Author = {Irfan Uddin and Raphael Poss and Chris Jesshope}, Booktitle = {Proc. 22nd Euromicro International Conference on Parallel, distributed and network-based processing (PDP'14)}, Doi = {10.1109/PDP.2014.81}, Urldoi = {http://dx.doi.org/10.1109/PDP.2014.81}, Issn = {1066-6192}, Month = {February}, Pages = {344--351}, Publisher = {IEEE Computer Society}, Title = {Analytical-based high-level simulation of the microthreaded many-cores architectures}, Year = {2014}, }
@inproceedings{poss14oopsle, Abstract = {The innovation of DSLs was the recognition that each application domain has its few idiomatic patterns of language use, found often in that domain and rarely in others. Capturing these idioms in the language design makes a DSL and yields gains in productivity, reliability and maintainability. Similarly, different groups of programmers have different predominant cognitive quirks. In this article I argue that programmers are attracted to some types of languages that resonate with their quirks and reluctant to use others that grate against them. Hence the question: could we tailor or evolve programming languages to the particular personality of their users? Due to the sheer diversity of personality types, any answer should be combined with automated language generation. The potential benefits include a leap in productivity and more social diversity in software engineering workplaces. The main pitfall is the risk of introducing new language barriers between people and decreased code reuse. However this may be avoidable by combining automated language generation with shared fundamental semantic building blocks.}, Address = {Antwerp, Belgium}, Author = {Raphael Poss}, Booktitle = {Proc. 2nd International Workshop on Open and Original Problems in Software Language Engineering (OOPSLE'14)}, Editor = {Anya Helene Bagge and Vadim Zaytsev}, Month = {February}, Pages = {15--18}, Title = {People-Specific Languages: a case for automated programming language generation by reverse-engineering programmer minds}, Url = {http://oopsle.github.io/2014/abstracts.pdf}, Year = {2014}, }
@inbook{poss13csh, Author = {Raphael Poss}, Chapter = {Multicore Architectures and Their Software Landscape (Chapter 24)}, Doi = {10.1201/b16812}, Urldoi = {http://dx.doi.org/10.1201/b16812}, Edition = {Third}, Editor = {Teofilo Gonzalez and Jorge Diaz-Herrera and Allen Tucker}, Isbn = {978-1-4398-9852-9}, Publisher = {Chapman and Hall/CRC}, Read = {1}, Title = {Computing Handbook, Third Edition}, Url = {http://www.crcpress.com/product/isbn/9781439898529}, Urllocal = {pub/poss.13.csh.pdf}, Volume = {Computer Science and Software Engineering}, Year = {2014}, }
@article{irfan14jsa, Abstract = {The accuracy of simulated cycles in high-level simulators is generally less than the accuracy in detailed simulators for a single-core systems, because high-level simulators simulate the behaviour of components rather than the components themselves as in detailed simulators. The simulation problem becomes more challenging when simulating many-core systems, where many cores are executing instructions concurrently. In these systems data may be accessed from multiple caches and the abstraction of the instruction execution has to consider the dynamic resource sharing on the whole chip. The problem becomes even more challenging in microthreaded many-core systems, because there may exist concurrent hardware threads. Which means that the latency of long latency operations can be tolerated from many cycles to just few cycles. We have previously presented a simulation technique to improve the accuracy in high-level simulation of microthreaded many-core systems, known as Signature-based high-level simulator, which adapts the throughput of the program based on the type of instructions, number of instructions and number of active threads in the pipeline. However, it disregards the access to different levels of the caches on the many-core system. Accessing L1-cache has far less latency than accessing off-chip memory and if the core is not able to tolerate latency, different levels of caches can not be treated equally. The distributed cache network along with the synchronization-aware coherency protocol in the Microgrid is a complicated memory architecture and it is difficult to simulate its behaviour at a high-level. 
In this article we present a high-level cache model, which aims to improve the accuracy in high-level simulators for general-purpose many-core systems by adding little complexity to the simulator and without affecting the simulation speed.}, Author = {Irfan Uddin and Raphael Poss and Chris Jesshope}, Doi = {10.1016/j.sysarc.2014.05.003}, Urldoi = {http://dx.doi.org/10.1016/j.sysarc.2014.05.003}, Issn = {1383-7621}, Journal = {Journal of Systems Architecture}, Number = {7}, Pages = {529--552}, Title = {Cache-based high-level simulation of the microthreaded many-core architectures}, Volume = {60}, Year = {2014}, }
@article{poss13micpro, Abstract = {To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional ``accelerator'' approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate irregular long latencies on chip, a scale-invariant programming model, a distributed chip resource model, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This article describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. This article also presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilisation of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.}, Author = {Raphael Poss and Mike Lankamp and Qiang Yang and Jian Fu and Michiel W. 
{van Tol} and Irfan Uddin and Chris Jesshope}, Doi = {10.1016/j.micpro.2013.05.004}, Urldoi = {http://dx.doi.org/10.1016/j.micpro.2013.05.004}, Issn = {0141-9331}, Journal = {Microprocessors and Microsystems}, Month = {November}, Number = {8}, Pages = {1090--1101}, Read = {1}, Title = {{Apple-CORE}: harnessing general-purpose many-cores with hardware concurrency management}, Urllocal = {pub/poss.13.micpro.pdf}, Volume = {37}, Year = {2013}, }
@article{poss13bench, Abstract = {This article highlights how small modifications to either the source code of a benchmark program or the compilation options may impact its behavior on a specific machine. It argues that for evaluating machines, benchmark providers and users be careful to ensure reproducibility of results based on the machine code actually running on the hardware and not just source code. The article uses color to grayscale conversion of digital images as a running example.}, Author = {{Raphael~`kena'} Poss}, Journal = {Computing Research Repository}, Month = {September}, Read = {1}, Title = {Machines are benchmarked by code, not algorithms}, Url = {http://arxiv.org/abs/1309.0534}, Urllocal = {pub/poss.13.bench.pdf}, Year = {2013}, }
@article{poss13unix, Author = {{Raphael~`kena'} Poss}, Month = {September}, Title = {Introductie Unix --- De eerste dag overleven}, Url = {http://dr-knz.net/intro-unix.html}, Urlpdf = {intro-unix.pdf}, Year = {2013}, }
@article{poss13iocosts, Abstract = {Is there a relationship between computing costs and the confidence people place in the behavior of computing systems? What are the tuning knobs one can use to optimize systems for human confidence instead of correctness in purely abstract models? This report explores these questions by reviewing the mechanisms by which people build confidence in the match between the physical world behavior of machines and their abstract intuition of this behavior according to models or programming language semantics. We highlight in particular that a bottom-up approach relies on arbitrary trust in the accuracy of I/O devices, and that there exists clear cost trade-offs in the use of I/O devices in computing systems. We also show various methods which alleviate the need to trust I/O devices arbitrarily and instead build confidence incrementally "from the outside" by considering systems as black box entities. We highlight cases where these approaches can reach a given confidence level at a lower cost than bottom-up approaches. }, Author = {{Raphael~`kena'} Poss}, Journal = {Computing Research Repository}, Month = {August}, Read = {1}, Title = {Optimizing for confidence---Costs and opportunities at the frontier between abstraction and reality}, Url = {http://arxiv.org/abs/1308.1602}, Urllocal = {pub/poss.13.iocosts.pdf}, Year = {2013}, }
@inproceedings{poss13samos, Abstract = {This article presents MGSim, an open source discrete event simulator for on-chip hardware components developed at the University of Amsterdam. MGSim is used as research and teaching vehicle to study the fine-grained hardware/software interactions on many-core chips with and without hardware multithreading. MGSim's component library includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a multi-core architecture with hardware concurrency management. MGSim is furthermore written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.}, Author = {Raphael Poss and Mike Lankamp and Qiang Yang and Jian Fu and Irfan Uddin and Chris Jesshope}, Booktitle = {Proc. Intl. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS XIII)}, Doi = {10.1109/SAMOS.2013.6621109}, Urldoi = {http://dx.doi.org/10.1109/SAMOS.2013.6621109}, Month = {July}, Pages = {80--87}, Publisher = {IEEE}, Read = {1}, Title = {{MGSim}---A simulation Environment for Multi-Core Research and Education}, Urllocal = {pub/poss.13.samos.pdf}, Year = {2013}, }
@inproceedings{fu13samos, Abstract = {The vulnerability of multi-core processors is increasing due to tighter design margins and greater susceptibility to interference. Moreover, concurrent programming environments are the norm in the exploitation of multi-core systems. In this paper, we present an on-demand thread-level fault detection mechanism for multi-cores. The main contribution is on-demand redundancy, which allows users to set the redundancy scope in the concurrent code. To achieve this we introduce intelligent redundant thread creation and synchronization, which manages concurrency and synchronization between the redundant threads via the master. This framework was implemented in an emulation of a multi-threaded, many-core processor with single, in-order issue cores. It was evaluated by a range of programs in image and signal processing, and encryption. The performance overhead of redundancy is less than 11\% for single core execution and is always less than 100\% for all scenarios. This efficiency derives from the platform's hardware concurrency management and latency tolerance.}, Author = {Jian Fu and Qiang Yang and Raphael Poss and Chris Jesshope and Chunyuan Zhang}, Booktitle = {Proc. Intl. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS XIII)}, Doi = {10.1109/SAMOS.2013.6621132}, Urldoi = {http://dx.doi.org/10.1109/SAMOS.2013.6621132}, Month = {July}, Pages = {255--262}, Publisher = {IEEE}, Read = {1}, Title = {On-demand Thread-level Fault Detection in a Concurrent Programming Environment}, Urllocal = {pub/fu.13.samos.pdf}, Year = {2013}, }
@article{poss13ctc, Abstract = {How can one recognize coordination languages and technologies? As this report shows, the common approach that contrasts coordination with computation is intellectually unsound: depending on the selected understanding of the word "computation", it either captures too many or too few programming languages. Instead, we argue for objective criteria that can be used to evaluate how well programming technologies offer coordination services. Of the various criteria commonly used in this community, we are able to isolate three that are strongly characterizing: black-box componentization, which we had identified previously, but also interface extensibility and customizability of run-time optimization goals. These criteria are well matched by Intel's Concurrent Collections and AstraKahn, and also by OpenCL, POSIX and VMWare ESX. }, Author = {{Raphael~`kena'} Poss}, Journal = {Computing Research Repository}, Month = {July}, Read = {1}, Title = {Characterizing traits of coordination}, Url = {http://arxiv.org/abs/1307.4827}, Urllocal = {pub/poss.13.ctc.pdf}, Year = {2013}, }
@techreport{poss13spnet, Abstract = {This technical report introduces S+Net, a compositional coordination language for streaming networks with extra-functional semantics. Compositionality simplifies the specification of complex parallel and distributed applications; extra-functional semantics allow the application designer to reason about and control resource usage, performance and fault handling. The key feature of S+Net is that functional and extra-functional semantics are defined orthogonally from each other. S+Net can be seen as a simultaneous simplification and extension of the existing coordination language S-Net, that gives control of extra-functional behavior to the S-Net programmer. S+Net can also be seen as a transitional research step between S-Net and AstraKahn, another coordination language currently being designed at the University of Hertfordshire. In contrast with AstraKahn which constitutes a re-design from the ground up, S+Net preserves the basic operational semantics of S-Net and thus provides an incremental introduction of extra-functional control in an existing language.}, Author = {Raphael Poss and Merijn Verstraaten and Frank Penczek and Clemens Grelck and Raimund Kirner and Alex Shafarenko}, Institution = {University of Amsterdam and University of Hertfordshire}, Month = {June}, Number = {arXiv:1306.2743v1 [cs.PL]}, Read = {1}, Title = {{S+Net}: extending functional coordination with extra-functional semantics}, Url = {http://arxiv.org/abs/1306.2743}, Urllocal = {pub/poss.13.spnet.pdf}, Year = {2013}, }
@article{poss13exadapt, Abstract = {Are there qualitative and quantitative traits of system design that contribute to the ability of people to further innovate? We propose that extrinsic adaptability, the ability given to secondary parties to change a system to match new requirements not envisioned by the primary provider, is such a trait. "Extrinsic adaptation" encompasses the popular concepts of "workaround", "fast prototype extension" or "hack", and extrinsic adaptability is thus a measure of how friendly a system is to tinkering by curious minds. In this report, we give "hackability" or "hacker-friendliness" scientific credentials by formulating and studying a generalization of the concept. During this exercise, we find that system changes by secondary parties fall on a subjective gradient of acceptability, with extrinsic adaptations on one side which confidently preserve existing system features, and invasive modifications on the other side which are perceived to be disruptive to existing system features. Where a change is positioned on this gradient is dependent on how an external observer perceives component boundaries within the changed system. We also find that the existence of objective cost functions can alleviate but not fully eliminate this subjectiveness. The study also enables us to formulate an ethical imperative for system designers to promote extrinsic adaptability.}, Author = {{Raphael~`kena'} Poss}, Journal = {Computing Research Repository}, Month = {June}, Read = {1}, Title = {Extrinsically adaptable systems}, Url = {http://arxiv.org/abs/1306.5445}, Urllocal = {pub/poss.13.exadapt.pdf}, Year = {2013}, }
@article{poss13coord, Abstract = {Is there a characteristic of coordination languages that makes them qualitatively different from general programming languages and deserves special academic attention? This report proposes a nuanced answer in three parts. The first part highlights that coordination languages are the means by which composite software applications can be specified using components that are only available separately, or later in time, via standard interfacing mechanisms. The second part highlights that most currently used languages provide mechanisms to use externally provided components, and thus exhibit some elements of coordination. However not all do, and the availability of an external interface thus forms an objective and qualitative criterion that distinguishes coordination. The third part argues that despite the qualitative difference, the segregation of academic attention away from general language design and implementation has non-obvious cost trade-offs. }, Author = {{Raphael~`kena'} Poss}, Institution = {University of Amsterdam}, Journal = {Computing Research Repository}, Month = {June}, Read = {1}, Title = {The essence of component-based design and coordination}, Url = {http://arxiv.org/abs/1306.3375}, Urllocal = {pub/poss.13.coord.pdf}, Year = {2013}, }
@techreport{poss13mg, Abstract = {This report lays flat my personal views on D-RISC and Microgrids as of March 2013. It reflects the opinions and insights that I have gained from working on this project during the period 2008-2013. This report is structured in two parts: deconstruction and reconstruction. In the deconstruction phase, I review what I believe are the fundamental motivation and goals of the D-RISC/Microgrids enterprise, and identify what I judge are shortcomings: that the project did not deliver on its expectations, that fundamental questions are left unanswered, and that its original motivation may not even be relevant in scientific research any more in this day and age. In the reconstruction phase, I start by identifying the merits of the current D-RISC/Microgrids technology and know-how taken at face value, re-motivate its existence from a different angle, and suggest new, relevant research questions that could justify continued scientific investment.}, Author = {{Raphael~`kena'} Poss}, Institution = {University of Amsterdam}, Month = {March}, Number = {arXiv:1303.4892v1 [cs.AR]}, Read = {1}, Title = {On whether and how {D-RISC} and {Microgrids} can be kept relevant (self-assessment report)}, Url = {http://arxiv.org/abs/1303.4892}, Urllocal = {pub/poss.13.mg.pdf}, Year = {2013}, }
@article{yang13tecs, Abstract = {When hardware cache coherence scales to many cores on chip, the coherence protocol of the shared memory system may offset the benefit from massive hardware concurrency. In this article, we investigate the cost of a write-update policy in terms of on-chip memory network traffic and its adverse effects on the system performance based on a multi-threaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure without changing the protocol. We find that in the context of massive concurrency, by introducing a write-merging buffer with 0.46% area overhead to each core, applications with good locality and concurrency are boosted up by 18.74% in performance on average. Other applications also benefit from this addition and even achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.}, Acmid = {2567931}, Address = {New York, NY, USA}, Author = {Qiang Yang and Jian Fu and Raphael Poss and Chris Jesshope}, Doi = {10.1145/2567931}, Urldoi = {http://dx.doi.org/10.1145/2567931}, Issn = {1539-9087}, Journal = {ACM Trans. Embed. Comput. Syst.}, Month = {March}, Number = {3s}, Pages = {103:1--103:21}, Publisher = {ACM}, Title = {On-Chip Traffic Regulation to Reduce Coherence Protocol Cost on a Micro-threaded Many-Core Architecture with Distributed Caches}, Volume = {13}, Year = {2013}, }
@techreport{lankamp13mgsim, Abstract = {MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on many-core and hardware multithreaded processors. It includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a many-core architecture with hardware concurrency management. MGSim is furthermore written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.}, Author = {Mike Lankamp and Raphael Poss and Qiang Yang and Jian Fu and Irfan Uddin and Chris R. Jesshope}, Institution = {University of Amsterdam}, Month = {February}, Number = {arXiv:1302.1390v1 [cs.AR]}, Read = {1}, Title = {{MGSim}---Simulation tools for multi-core processor architectures}, Url = {http://arxiv.org/abs/1302.1390}, Urllocal = {pub/lankamp.13.mgsim.pdf}, Year = {2013}, }
@inproceedings{verstraaten13fdcoma, Abstract = {We propose an extension to S-NET's light-weight parallel execution layer (LPEL): dynamic migration of tasks between cores for improved load balancing and higher throughput of S-NET streaming networks. We sketch out the necessary implementation steps and empirically analyse the impact of task migration on a variety of S-NET applications.}, Author = {Merijn Verstraaten and Stefan Kok and Raphael Poss and Clemens Grelck}, Booktitle = {Proc. 2nd HiPEAC Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures}, Editor = {Clemens Grelck and Kevin Hammond and Sven-Bodo Scholz}, Month = {January}, Read = {1}, Title = {Task Migration for {S-Net/LPEL}}, Url = {http://www.project-advance.eu/wp-content/uploads/2012/07/proceedings.pdf}, Urllocal = {pub/verstraaten.13.fdcoma.pdf}, Year = {2013}, }
@inproceedings{mckenzie13fdcoma, Abstract = {We consider an ant-colony optimisation problem implemented on a multicore system as a collection of asynchronous stream-processing components under the control of the S-NET coordination language. Statistical analysis and visualisation techniques are used to study the behaviour of the application, and this enables us to discover and correct problems in both the application program and the run-time system underlying S-NET.}, Author = {Kenneth MacKenzie and Philip Kaj Ferdinand H\"{o}lzenspies and Kevin Hammond and Raimund Kirner and Nguyen Vu Tien Nga and Rene te Boekhorst and Clemens Grelck and Raphael Poss and Merijn Verstraaten}, Booktitle = {Proc. 2nd HiPEAC Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures}, Editor = {Clemens Grelck and Kevin Hammond and Sven-Bodo Scholz}, Month = {January}, Read = {1}, Title = {Statistical Performance Analysis of an Ant-Colony Optimisation Application in {S-Net}}, Url = {http://www.project-advance.eu/wp-content/uploads/2012/07/proceedings.pdf}, Urllocal = {pub/mckenzie.13.fdcoma.pdf}, Year = {2013}, }
@inproceedings{poss12dsd, Abstract = {To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. The corresponding hardware implementation provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional ``accelerator'' approach, Microgrids are intended to be used as components in distributed systems on chip that consider both clusters of small cores and optional larger cores optimized towards sequential performance as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate operations with irregular long latencies, a scale-invariant programming model, a distributed vision of the chip's structure, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This paper describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. The reference chip parameters include 128 cores, a 4 MB on-chip distributed cache network and four DDR3-1600 memory channels. This paper presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilisation of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.}, Author = {Raphael Poss and Mike Lankamp and Qiang Yang and Jian Fu and Michiel W. {van Tol} and Chris Jesshope}, Booktitle = {Proc. 
15th Euromicro Conference on Digital System Design (DSD 2012)}, Doi = {10.1109/DSD.2012.25}, Urldoi = {http://dx.doi.org/10.1109/DSD.2012.25}, Editor = {Smail Niar}, Isbn = {978-0-7695-4798-5}, Month = {September}, Publisher = {IEEE Computer Society}, Read = {1}, Title = {{Apple-CORE}: {Microgrids} of {SVP} cores (invited paper)}, Urllocal = {pub/poss.12.dsd.pdf}, Year = {2012}, }
@inproceedings{poss12nfsp, Abstract = {Optimising software for efficiency on a parallel hardware platform by analysing the performance of the application is often a complex and time-consuming task. In this paper we present a constraint annotation and aggregation system that allows programmers to annotate code by using a dedicated language for describing functional and extra-functional properties, such as for example algorithmic complexity, scaling factors or the number of required cores. The goal is to derive properties of the entire application that are parametrised over characteristics of the execution platform to assist programmers in better understanding the behaviour of an application and to assist the execution platform in making informed mapping and scheduling decisions.}, Address = {New York, NY, USA}, Author = {Frank Penczek and Raimund Kirner and Raphael Poss and Clemens Grelck and Alex Shafarenko}, Booktitle = {Proc. 4th International Workshop on Non-functional System Properties in Domain Specific Modeling Languages}, Doi = {10.1145/2420942.2420947}, Urldoi = {http://dx.doi.org/10.1145/2420942.2420947}, Isbn = {978-1-4503-1807-5}, Location = {Innsbruck, Austria}, Month = {September}, Numpages = {6}, Pages = {5:1--5:6}, Publisher = {ACM}, Read = {1}, Series = {NFPinDSML '12}, Title = {An Infrastructure for Multi-Level Optimisation through Property Annotation and Aggregation}, Urllocal = {pub/poss.12.nfsp.pdf}, Year = {2012}, }
@phdthesis{poss12, Abstract = {Multi-core chips are currently in the spotlight as a potential means to overcome the limits of frequency scaling for performance increases in processors. In this direction, the CSA group at the University of Amsterdam is investigating a new design for processors towards faster and more efficient general-purpose multi-core chips. However this design changes the interface between the hardware and software, compared to existing chips, in ways that have not been dared previously. Consequently, the concepts underlying existing operating systems and compilers must be adapted before this new design can be fully integrated and evaluated in computing systems. This dissertation investigates the impact of the changes in the machine interface on operating software and makes four contributions. The first contribution is a comprehensive presentation of the design proposed by the CSA group. The second contribution is formed by technology that demonstrates that the chip can be programmed using standard programming tools. The third contribution is a demonstration that the hardware components can be optimized by starting to implement operating software during the hardware design instead of afterwards. The fourth contribution is an analysis of which parts of the hardware design will require further improvements before it can be fully accepted as a general-purpose platform. The first conclusion is a confirmation that the specific design considered can yield higher performance at lower cost with relatively minimal implementation effort in software. 
The second conclusion is that the processor interface can be redefined while designing multi-core chips as long as the design work is carried out hand in hand with operating software providers.}, Author = {Poss, {Raphael `kena'}}, Doi = {11245/2.109482}, Urldoi = {http://dx.doi.org/11245/2.109482}, Isbn = {978-94-6108-320-3}, Month = {September}, Publisher = {Gildeprint Drukkerijen}, Read = {1}, School = {University of Amsterdam}, Title = {On the realizability of hardware microthreading---Revisiting the general-purpose processor interface: consequences and challenges}, Url = {http://www.raphael.poss.name/on-the-realizability-of-hardware-microthreading/}, Year = {2012}, }
@techreport{poss12sl, Abstract = {Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current implementations assume a single address space shared memory. We investigate and extend SVP to handle distributed environments, and discuss a prototype SVP implementation which transparently supports execution on heterogeneous distributed memory clusters over TCP/IP connections, while retaining the original SVP programming model. }, Author = {{Raphael~`kena'} Poss}, Institution = {University of Amsterdam}, Month = {August}, Number = {arXiv:1208.4572v1 [cs.PL]}, Read = {1}, Title = {{SL}---a ``quick and dirty'' but working intermediate language for {SVP} systems}, Url = {http://arxiv.org/abs/1208.4572}, Urllocal = {pub/poss.12.sl.pdf}, Year = {2012}, }
@inproceedings{poss12interact, Abstract = {This paper revisits non-deferred reference counting, a common technique to ensure that potentially shared large heap objects can be reused safely when they are both input and output to computations. Traditionally, thread-safe reference counting exploits implicit memory-based communication of counter data and requires means to achieve a globally consistent memory state, either using barriers or locks. Acknowledging the distributed nature of upcoming many-core chips, we have developed a novel approach that keeps reference counters at single physical locations and ships the counting operations asynchronously to these locations using hardware primitives, rather than implicitly moving the counter data between threads. Compared to previous methods, our approach does not require full cache coherency.}, Author = {Raphael Poss and Clemens Grelck and Stephan Herhut and Sven-Bodo Scholz}, Booktitle = {Proc. 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT'16)}, Doi = {10.1109/INTERACT.2012.6339625}, Urldoi = {http://dx.doi.org/10.1109/INTERACT.2012.6339625}, Isbn = {978-1-4673-2613-1}, Issn = {1550-6207}, Month = {February}, Pages = {41--48}, Publisher = {IEEE}, Read = {1}, Title = {Lazy Reference Counting for the {Microgrid}}, Urllocal = {pub/poss.12.interact.pdf}, Year = {2012}, }
@inproceedings{poss12rapido, Abstract = {The EU Apple-CORE project has explored the design and implementation of novel general-purpose many-core chips featuring hardware microthreading and hardware support for concurrency management. The introduction of the latter in the cores' ISA has required simultaneous investigation into compilers and multiple layers of the software stack, including operating systems. The main challenge in such vertical approaches is the cost of implementing simultaneously a detailed simulation of new hardware components and a complete system platform suitable to run large software benchmarks. In this paper, we describe our use case and our solutions to this challenge.}, Acmid = {2162134}, Author = {Poss, Raphael and Lankamp, Mike and Uddin, M. Irfan and S\'{y}kora, Jaroslav and Kafka, Leo\v{s}}, Booktitle = {Proc. 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools}, Doi = {10.1145/2162131.2162134}, Urldoi = {http://dx.doi.org/10.1145/2162131.2162134}, Isbn = {978-1-4503-1114-4}, Keywords = {hardware multithreading, hardware/software co-design, many-core architecture, simulation, system design, system evaluation, system-on-chip design, vertical approach}, Location = {Paris, France}, Month = {January}, Numpages = {8}, Pages = {17--24}, Publisher = {ACM}, Read = {1}, Series = {RAPIDO '12}, Title = {Heterogeneous integration to simplify many-core architecture simulations}, Urllocal = {pub/poss.12.rapido.pdf}, Year = {2012}, }
@inproceedings{mirfan12, Abstract = {The current many-core architectures are generally evaluated using cycle-accurate simulations. However these detailed simulations of the architecture make the evaluation of large programs very slow. Since the focus in many-core architecture is shifting from the performance of individual cores to the overall behavior of the chip, high-level simulations are becoming necessary, which evaluate the same architecture at less detailed level and allow the designer to make quick and reasonably accurate design decisions. We have developed a high-level simulator for the design space exploration of the Microgrid, which is a many-core architecture comprised of many fine-grained multi-threaded cores. This simulator allows us to investigate mapping and scheduling strategies of families (i.e. groups of threads) in developing an operating environment for the Microgrid. The previous method to count and evaluate the workload in basic blocks was not accurate enough. The key problem was that with many concurrent threads the latency of certain instructions is hidden because of the multi-threaded nature of the core. This paper presents a technique to determine the execution time of different types of instructions with thread concurrency. We believe to achieve high accuracy in evaluating programs in the high-level simulator.}, Acmid = {2162132}, Address = {New York, NY, USA}, Author = {Uddin, M. Irfan and Jesshope, Chris R. and van Tol, Michiel W. 
and Poss, Raphael}, Booktitle = {Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools}, Doi = {10.1145/2162131.2162132}, Urldoi = {http://dx.doi.org/10.1145/2162131.2162132}, Isbn = {978-1-4503-1114-4}, Keywords = {automatic annotation of basic blocks with performance, estimation, performance estimation}, Location = {Paris, France}, Month = {January}, Numpages = {8}, Pages = {1--8}, Publisher = {ACM}, Read = {1}, Series = {RAPIDO '12}, Title = {Collecting signatures to model latency tolerance in high-level simulations of microthreaded cores}, Urllocal = {pub/mirfan.12.pdf}, Year = {2012}, }
@misc{poss11adv16, Author = {Raphael Poss and Clemens Grelck and Merijn Verstraaten}, Month = {November}, Title = {Implementation of {SVP} on at least one target (Software), {ADVANCE} deliverable {D16}}, Url = {http://www.project-advance.eu/deliverables/}, Urllocal = {pub/poss.11.adv16.pdf}, Year = {2011}, }
@incollection{herhut11ifl, Abstract = {We present a first evaluation of our novel approach for non-deferred reference counting on the Microgrid many-core architecture. Non-deferred reference counting is a fundamental building block of implicit heap management of functional array languages in general and Single Assignment C in particular. Existing lock-free approaches for multi-core and SMP settings do not scale well for large numbers of cores in emerging many-core platforms. We, instead, employ a dedicated core for reference counting and use asynchronous messaging to emit reference counting operations. This novel approach decouples computational workload from reference-counting overhead. Experiments using cycle-accurate simulation of a realistic Microgrid show that, by exploiting asynchronism, we are able to tolerate even worst-case reference counting loads reasonably well. Scalability is essentially limited only by the combined sequential runtime of all reference counting operations, in accordance with Amdahl's law. Even though developed in the context of Single Assignment C and the Microgrid, our approach is applicable to a wide range of languages and platforms.}, Author = {Herhut, Stephan and Joslin, Carl and Scholz, Sven-Bodo and Poss, Raphael and Grelck, Clemens}, Booktitle = {Implementation and Application of Functional Languages}, Doi = {10.1007/978-3-642-24276-2_12}, Urldoi = {http://dx.doi.org/10.1007/978-3-642-24276-2_12}, Editor = {J. Haage and M. Moraz\'an}, Isbn = {978-3-642-24275-5}, Month = {October}, Pages = {185--202}, Publisher = {Springer Berlin / Heidelberg}, Read = {1}, Series = {Lecture Notes in Computer Science}, Title = {Concurrent Non-Deferred Reference Counting on the {Microgrid}: First Experiences}, Urllocal = {pub/herhut.11.ifl.pdf}, Volume = {6647}, Year = {2011}, }
@incollection{bernard10hppc, Abstract = {Many-core architectures are a commercial reality, but programming them efficiently is still a challenge, especially if the mix is heterogeneous. Here granularity must be addressed, i.e. when to make use of concurrency resources and when not to. We have designed a data-driven, fine-grained concurrent execution model (SVP) that captures concurrency in a resource-agnostic way. Our approach separates the concern of describing a concurrent computation from its mapping and scheduling. We have implemented this model as a novel many-core architecture programmed with a language called muTC. In this paper we demonstrate how we achieve our goal of resource-agnostic programming on this target, where heterogeneity is exposed as arbitrarily sized clusters of cores. }, Author = {Thomas Bernard and Clemens Grelck and Michael Hicks and Chris Jesshope and Raphael Poss}, Booktitle = {Euro-Par 2010 Parallel Processing Workshops}, Doi = {10.1007/978-3-642-21878-1_14}, Urldoi = {http://dx.doi.org/10.1007/978-3-642-21878-1_14}, Editor = {Guarracino, Mario and Vivien, Fr\'ed\'eric and Tr\"aff, Jesper and Cannatoro, Mario and Danelutto, Marco and Hast, Anders and Perla, Francesca and Kn{\"u}pfer, Andreas and Di Martino, Beniamino and Alexander, Michael}, Isbn = {978-3-642-21877-4}, Month = {August}, Pages = {109--116}, Publisher = {Springer Berlin / Heidelberg}, Read = {1}, Series = {Lecture Notes in Computer Science}, Title = {Resource-agnostic programming for many-core {Microgrids}}, Urllocal = {pub/bernard.10.hppc.pdf}, Volume = {6586}, Year = {2011}, }
@misc{rolls11ac23, Author = {D. Rolls and C. Joslin and Sven-Bodo Scholz and C. Jesshope and R. Poss}, Month = {July}, Title = {Final report of benchmark evaluations in different programming paradigms, {Apple-CORE} deliverable {D2.3}}, Url = {http://apple-core.info/research.html}, Urllocal = {pub/rolls.11.ac23.pdf}, Year = {2011}, }
@techreport{lankamp11mgsim14, Author = {Mike Lankamp and Michiel W. {van Tol} and Chris Jesshope and Raphael Poss}, Institution = {University of Amsterdam}, Month = {May}, Number = {[mgsim14]}, Title = {Hardware {I/O} interface on the {Microgrid}}, Url = {https://notes.svp-home.org/mgsim14.html}, Year = {2011}, }
@misc{poss10adv6, Author = {Raphael Poss and Raimund Kirner}, Month = {November}, Title = {Hardware virtualisation notation, {ADVANCE} deliverable {D6}}, Url = {http://www.project-advance.eu/deliverables/}, Urllocal = {pub/poss.10.adv6.pdf}, Year = {2010}, }
@inproceedings{grelck10eric, Abstract = {The multi-core/many-core revolution has brought up a hardly precedented diversity in computer architecture. While parallelism is the common property, granularity of concurrent processing resources may easily span multiple orders of magnitude. This requires design decisions in the organisation of concurrent program execution to be made differently depending on the concrete execution platform. We propose a hardware virtualisation layer that separates the aspect of granularity from the expression (or detection) of concurrency and provides uniform access to concurrent computing resources.}, Address = {Braunschweig, Germany}, Author = {Clemens Grelck and Raphael Poss and Chris Jesshope}, Booktitle = {Proc. Intel European Research and Innovation Conference (ERIC'10)}, Month = {October}, Title = {Hardware virtualisation for heterogeneous many-core systems}, Year = {2010}, }
@misc{poss10hppc-pres, Author = {Raphael Poss}, Month = {September}, Read = {1}, Title = {Resource-agnostic programming of microgrids (talk at {HPPC}'10)}, Url = {http://www.hppc-workshop.org/HPPC10-Poss.pdf}, Year = {2010}, }
@misc{hicks10ac53, Author = {M.A. Hicks and R. Poss and C. Jesshope and M.W. {van Tol} and M. Lankamp}, Month = {September}, Title = {Report on Porting Operating System to {SVP/Microgrid} Platform, {Apple-CORE} deliverable {D5.3}}, Url = {http://apple-core.info/research.html}, Urllocal = {pub/hicks.10.ac53.pdf}, Year = {2010}, }
@inproceedings{poss10amp, Abstract = {There exist several divides between implicit and explicit paradigms in concurrent programming models, for example between the assumption of coherent shared memory (e.g. OpenMP), and the assumption of distributed memory (e.g. MPI). Explicit paradigms exist to provide control to programmers, but cause scalability concerns: programs need to be adapted whenever the granularity of concurrency changes. With the rise of large heterogeneous pools of computing resources, we must increasingly distribute tasks automatically. Implicit paradigms allow this in theory and are desirable for expressivity and intuitiveness, but their scalability in heterogeneous environments is yet unclear. In this position paper, we propose to consolidate previous knowledge by seeking more implicit concurrent programming models that combine three properties. The first desirable property is resource agnosticism, where programs separate clearly the description of computations from the description of task distribution to resources. The second property is scoped synchronization, where programs express no more synchronization than required by the described computation. The third property is the visibility of data dependencies between tasks by compilers and run-time systems. Only when these properties exist together, it becomes possible to automatically tailor programs to heterogeneous target systems and achieve efficient execution. 
We show how specializability is needed to optimize this process.}, Address = {Toronto, Canada}, Author = {Raphael Poss and Chris Jesshope}, Booktitle = {The First Workshop on Advances in Message Passing (AMP'10)}, Keywords = {programming models, programming languages, concurrency, considered harmful}, Month = {June}, Read = {1}, Title = {Towards scalable implicit communication and synchronization}, Url = {http://www.cs.rochester.edu/u/cding/amp/papers/pos/Towards%20Scalable%20Implicit%20Communication%20and%20Synchronization.pdf}, Urllocal = {pub/poss.10.amp.pdf}, Year = {2010}, }
@incollection{jesshope09parco, Abstract = {In this paper we will introduce work being supported by the EU in the Apple-CORE project (http://www.apple-core.info). This project is pushing the boundaries of programming and systems development in multi-core architectures in an attempt to make multi-core go mainstream, i.e. continuing the current trends in low-power, multi-core architecture to thousands of cores on chip and supporting this in the context of the next generations of PCs. This work supports dataflow principles but with a conventional programming style. The paper describes the underlying execution model, a core design based on this model and its emulation in software. We also consider system issues that impact security. The major benefits of this approach include asynchrony, i.e. the ability to tolerate long latency operations without impacting performance and binary compatibility. We present results that show very high efficiency and good scalability despite the high memory access latency in the proposed chip architecture. }, Author = {Chris Jesshope and Michael Hicks and Mike Lankamp and Raphael Poss and Li Zhang}, Booktitle = {Parallel Computing: From Multicores and GPU's to Petascale}, Doi = {10.3233/978-1-60750-530-3-16}, Urldoi = {http://dx.doi.org/10.3233/978-1-60750-530-3-16}, Editor = {Barbara Chapman and Fr{\'e}d{\'e}ric Desprez and Gerhard R. Joubert and Alain Lichnewsky and Frans Peters and Thierry Priol}, Isbn = {978-1-60750-529-7}, Pages = {16--31}, Publisher = {{IOS} Press}, Read = {1}, Series = {Advances in Parallel Computing}, Title = {Making multi-cores mainstream -- from security to scalability}, Urllocal = {pub/jesshope.09.parco.pdf}, Volume = {19}, Year = {2010}, }
@misc{poss09ac54, Author = {Raphael Poss}, Month = {May}, Title = {Core compiler, {Apple-CORE} deliverable {D5.4}}, Url = {http://apple-core.info/research.html}, Urllocal = {pub/poss.09.ac54.pdf}, Year = {2009}, }
@misc{masters08ac52, Author = {J. Masters and M. Lankamp and C. Jesshope and R. Poss and E. Hielscher}, Month = {December}, Title = {Report on memory protection in microthreaded processors, {Apple-CORE} deliverable {D5.2}}, Url = {http://apple-core.info/research.html}, Urllocal = {pub/masters.08.ac52.pdf}, Year = {2008}, }
@inproceedings{burrus03mpool, Abstract = {Object-oriented and generic programming are both supported in C++. OOP provides high expressiveness whereas GP leads to more efficient programs by avoiding dynamic typing. This paper presents SCOOP, a new paradigm which enables both classical OO design and high performance in C++ by mixing OOP and GP. We show how classical and advanced OO features such as virtual methods, multiple inheritance, argument covariance, virtual types and multimethods can be implemented in a fully statically typed model, hence without run-time overhead.}, Address = {Anaheim, CA, USA}, Author = {Nicolas Burrus and Alexandre Duret-Lutz and Thierry G{\'e}raud and David Lesage and Raphael Poss}, Booktitle = {Proceedings of the Workshop on Multiple Paradigm with OO Languages (MPOOL'03)}, Keywords = {generic programming, performance, C++}, Month = {October}, Project = {Olena}, Read = {Oui}, Title = {A static {C++} object-oriented programming ({SCOOP}) paradigm mixing benefits of traditional {OOP} and generic programming}, Urllocal = {pub/burrus.03.mpool.pdf}, Year = {2003}, }
@inproceedings{lombardy03ciaa, Abstract = {This paper reports on a new software platform dedicated to the computation with automata and transducers, called Vaucanson, the main feature of which is the capacity of dealing with automata whose labels may belong to various algebraic structures. The paper successively shows how Vaucanson allows to program algorithms on automata in a way which is very close to the mathematical expression of the algorithm, describes some features of the Vaucanson platform, including the fact that the very rich data structure used to implement automata does not weigh too much on the performance and finally explains the main issues of the programming design that allow to achieve both genericity and efficiency.}, Address = {Santa Barbara, CA, USA}, Author = {Sylvain Lombardy and Raphael Poss and Yann R{\'e}gis-Gianas and Jacques Sakarovitch}, Booktitle = {Proc. 8th International Conference on Implementation and Application of Automata (CIAA'03)}, Doi = {10.1007/3-540-45089-0_10}, Urldoi = {http://dx.doi.org/10.1007/3-540-45089-0_10}, Keywords = {Vaucanson, finite state automata, C++, generic programming}, Month = {July}, Pages = {96--107}, Project = {Vaucanson}, Publisher = {Springer-Verlag}, Read = {Oui}, Series = {Lecture Notes in Computer Science Series}, Title = {Introducing {V}aucanson}, Urllocal = {pub/lombardy.03.ciaa.pdf}, Volume = {2759}, Year = {2003}, }
@inproceedings{regisgianas03poosc, Abstract = {Vaucanson is a C++ generic library for weighted finite state machine manipulation. For the sake of generality, FSM are defined using algebraic structures such as alphabet (for the letters), free monoid (for the words), semiring (for the weights) and series (mapping from words to weights). As usual, what is at stake is to maintain efficiency while providing a high-level layer for the writing of generic algorithms. Yet, one of the particularities of FSM manipulation is the need of a fine grained specialization power on an object which is both an algebraic concept and an intensive computing machine.}, Address = {Darmstadt, Germany}, Author = {Yann R{\'e}gis-Gianas and Raphael Poss}, Booktitle = {Proc. Workshop on Parallel/High-performance Object-Oriented Scientific Computing (POOSC; in conjunction with ECOOP)}, Editor = {J{\"o}rg Striegnitz and Kei Davis}, Keywords = {Vaucanson, C++, generic programming}, Month = {July}, Number = {FZJ-ZAM-IB-2003-09}, Pages = {71--82}, Project = {Vaucanson}, Read = {Oui}, Series = {John von Neumann Institute for Computing (NIC)}, Title = {On orthogonal specialization in {C++}: dealing with efficiency and algebraic abstraction in {V}aucanson}, Urllocal = {pub/regisgianas.03.poosc.pdf}, Year = {2003}, }