Apple-CORE: harnessing general-purpose many-cores with hardware concurrency management.
Poss, R.; Lankamp, M.; Yang, Q.; Fu, J.; van Tol, M. W.; Uddin, I.; and Jesshope, C.
Microprocessors and Microsystems, 37(8): 1090–1101. November 2013.
@article{poss13micpro,
Abstract = {To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional ``accelerator'' approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate irregular long latencies on chip, a scale-invariant programming model, a distributed chip resource model, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This article describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. This article also presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilisation of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.},
Author = {Raphael Poss and Mike Lankamp and Qiang Yang and Jian Fu and Michiel W. {van Tol} and Irfan Uddin and Chris Jesshope},
Doi = {10.1016/j.micpro.2013.05.004}, Urldoi = {http://dx.doi.org/10.1016/j.micpro.2013.05.004},
Issn = {0141-9331},
Journal = {Microprocessors and Microsystems},
Month = {November},
Number = {8},
Pages = {1090--1101},
Read = {1},
Title = {{Apple-CORE}: harnessing general-purpose many-cores with hardware concurrency management},
Urllocal = {pub/poss.13.micpro.pdf},
Volume = {37},
Year = {2013},
}
To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional “accelerator” approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate irregular long latencies on chip, a scale-invariant programming model, a distributed chip resource model, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This article describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. This article also presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilisation of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.
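For readers unfamiliar with the create/sync style of the SVP interface mentioned above, the bulk creation and synchronisation of a family of index-parameterised threads can be loosely pictured with ordinary POSIX threads. The sketch below is only an analogy in plain C, not the SVP/SL primitives themselves; FAMILY_SIZE, family_member and the partial array are invented for illustration.

#include <pthread.h>
#include <stdio.h>

#define FAMILY_SIZE 8            /* hypothetical number of worker threads */

static double partial[FAMILY_SIZE];

/* Each "family member" works on its own index, as an SVP thread would. */
static void *family_member(void *arg)
{
    long i = (long)arg;
    partial[i] = (double)i * i;  /* stand-in for real per-index work */
    return NULL;
}

int main(void)
{
    pthread_t family[FAMILY_SIZE];

    /* "create": start the whole family of index-parameterised threads. */
    for (long i = 0; i < FAMILY_SIZE; i++)
        pthread_create(&family[i], NULL, family_member, (void *)i);

    /* "sync": the parent blocks until every family member has finished. */
    double sum = 0.0;
    for (long i = 0; i < FAMILY_SIZE; i++) {
        pthread_join(family[i], NULL);
        sum += partial[i];
    }
    printf("sum = %g\n", sum);
    return 0;
}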
Machines are benchmarked by code, not algorithms.
Poss, R.
Computing Research Repository. September 2013.
@article{poss13bench,
Abstract = {This article highlights how small modifications to either the source code of a benchmark program or the compilation options may impact its behavior on a specific machine. It argues that for evaluating machines, benchmark providers and users be careful to ensure reproducibility of results based on the machine code actually running on the hardware and not just source code. The article uses color to grayscale conversion of digital images as a running example.},
Author = {{Raphael~`kena'} Poss},
Journal = {Computing Research Repository},
Month = {September},
Read = {1},
Title = {Machines are benchmarked by code, not algorithms},
Url = {http://arxiv.org/abs/1309.0534},
Urllocal = {pub/poss.13.bench.pdf},
Year = {2013},
}
This article highlights how small modifications to either the source code of a benchmark program or the compilation options may impact its behavior on a specific machine. It argues that for evaluating machines, benchmark providers and users be careful to ensure reproducibility of results based on the machine code actually running on the hardware and not just source code. The article uses color to grayscale conversion of digital images as a running example.
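The running example, colour-to-greyscale conversion, makes the point easy to see: two source-level variants of the same algorithm, or the same source compiled with different optimisation flags (for example -O2 versus -O3 -ffast-math), produce different machine code with different rounding behaviour and performance. The fragment below is a generic illustration in C, not code taken from the paper; gray_float and gray_fixed are invented names and the weights are the standard ITU-R BT.601 luma coefficients.

#include <stdint.h>
#include <stddef.h>

/* Floating-point variant: the compiler may or may not vectorise this,
 * depending on the optimisation flags in effect. */
void gray_float(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                uint8_t *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = (uint8_t)(0.299f * r[i] + 0.587f * g[i] + 0.114f * b[i]);
}

/* Fixed-point variant: the same algorithm on paper, but different machine
 * code, different rounding and usually different performance. */
void gray_fixed(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                uint8_t *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = (uint8_t)((77 * r[i] + 150 * g[i] + 29 * b[i]) >> 8);
}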
Optimizing for confidence—Costs and opportunities at the frontier between abstraction and reality.
Poss, R.
Computing Research Repository. August 2013.
@article{poss13iocosts,
Abstract = {Is there a relationship between computing costs and the confidence people place in the behavior of computing systems? What are the tuning knobs one can use to optimize systems for human confidence instead of correctness in purely abstract models? This report explores these questions by reviewing the mechanisms by which people build confidence in the match between the physical world behavior of machines and their abstract intuition of this behavior according to models or programming language semantics. We highlight in particular that a bottom-up approach relies on arbitrary trust in the accuracy of I/O devices, and that there exist clear cost trade-offs in the use of I/O devices in computing systems. We also show various methods which alleviate the need to trust I/O devices arbitrarily and instead build confidence incrementally "from the outside" by considering systems as black box entities. We highlight cases where these approaches can reach a given confidence level at a lower cost than bottom-up approaches.},
Author = {{Raphael~`kena'} Poss},
Journal = {Computing Research Repository},
Month = {August},
Read = {1},
Title = {Optimizing for confidence---Costs and opportunities at the frontier between abstraction and reality},
Url = {http://arxiv.org/abs/1308.1602},
Urllocal = {pub/poss.13.iocosts.pdf},
Year = {2013},
}
Is there a relationship between computing costs and the confidence people place in the behavior of computing systems? What are the tuning knobs one can use to optimize systems for human confidence instead of correctness in purely abstract models? This report explores these questions by reviewing the mechanisms by which people build confidence in the match between the physical world behavior of machines and their abstract intuition of this behavior according to models or programming language semantics. We highlight in particular that a bottom-up approach relies on arbitrary trust in the accuracy of I/O devices, and that there exist clear cost trade-offs in the use of I/O devices in computing systems. We also show various methods which alleviate the need to trust I/O devices arbitrarily and instead build confidence incrementally “from the outside” by considering systems as black box entities. We highlight cases where these approaches can reach a given confidence level at a lower cost than bottom-up approaches.
On-demand Thread-level Fault Detection in a Concurrent Programming Environment.
Fu, J.; Yang, Q.; Poss, R.; Jesshope, C.; and Zhang, C.
In Proc. Intl. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS XIII), pages 255–262, July 2013. IEEE.
@inproceedings{fu13samos,
Abstract = {The vulnerability of multi-core processors is increasing due to tighter design margins and greater susceptibility to interference. Moreover, concurrent programming environments are the norm in the exploitation of multi-core systems. In this paper, we present an on-demand thread-level fault detection mechanism for multi-cores. The main contribution is on-demand redundancy, which allows users to set the redundancy scope in the concurrent code. To achieve this we introduce intelligent redundant thread creation and synchronization, which manages concurrency and synchronization between the redundant threads via the master. This framework was implemented in an emulation of a multi-threaded, many-core processor with single, in-order issue cores. It was evaluated by a range of programs in image and signal processing, and encryption. The performance overhead of redundancy is less than 11% for single core execution and is always less than 100% for all scenarios. This efficiency derives from the platform's hardware concurrency management and latency tolerance.},
Author = {Jian Fu and Qiang Yang and Raphael Poss and Chris Jesshope and Chunyuan Zhang},
Booktitle = {Proc. Intl. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS XIII)},
Doi = {10.1109/SAMOS.2013.6621132}, Urldoi = {http://dx.doi.org/10.1109/SAMOS.2013.6621132},
Month = {July},
Pages = {255--262},
Publisher = {IEEE},
Read = {1},
Title = {On-demand Thread-level Fault Detection in a Concurrent Programming Environment},
Urllocal = {pub/fu.13.samos.pdf},
Year = {2013},
}
The vulnerability of multi-core processors is increasing due to tighter design margins and greater susceptibility to interference. Moreover, concurrent programming environments are the norm in the exploitation of multi-core systems. In this paper, we present an on-demand thread-level fault detection mechanism for multi-cores. The main contribution is on-demand redundancy, which allows users to set the redundancy scope in the concurrent code. To achieve this we introduce intelligent redundant thread creation and synchronization, which manages concurrency and synchronization between the redundant threads via the master. This framework was implemented in an emulation of a multi-threaded, many-core processor with single, in-order issue cores. It was evaluated by a range of programs in image and signal processing, and encryption. The performance overhead of redundancy is less than 11% for single core execution and is always less than 100% for all scenarios. This efficiency derives from the platform’s hardware concurrency management and latency tolerance.
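The notion of an on-demand redundancy scope can be pictured in software: within a region selected by the user, the computation is executed twice, once by the master and once by a redundant thread, and the two results are compared before execution continues. The sketch below uses plain C and POSIX threads as an analogy only; the paper's mechanism creates and synchronises the redundant threads through the platform's hardware concurrency management, and checked_compute, struct job and the fault return code are invented names.

#include <pthread.h>
#include <stdio.h>

/* The computation whose result we want to check. */
static void compute(int *out, const int *in, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += in[i] * in[i];
    *out = acc;
}

struct job { const int *in; int n; int out; };

static void *redundant_worker(void *arg)
{
    struct job *j = arg;
    compute(&j->out, j->in, j->n);   /* redundant copy of the same region */
    return NULL;
}

/* Hypothetical "redundancy scope": run the region twice in parallel and
 * report a detected fault if the two results disagree. */
int checked_compute(int *out, const int *in, int n)
{
    pthread_t t;
    struct job shadow = { in, n, 0 };
    pthread_create(&t, NULL, redundant_worker, &shadow);

    int master = 0;
    compute(&master, in, n);         /* the master executes the region itself */
    pthread_join(t, NULL);

    if (master != shadow.out)
        return -1;                   /* mismatch: fault detected */
    *out = master;
    return 0;
}

int main(void)
{
    int data[4] = { 1, 2, 3, 4 };
    int result;
    if (checked_compute(&result, data, 4) == 0)
        printf("result = %d\n", result);
    else
        printf("fault detected\n");
    return 0;
}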
Characterizing traits of coordination.
Poss, R.
Computing Research Repository. July 2013.
@article{poss13ctc,
Abstract = {How can one recognize coordination languages and technologies? As this report shows, the common approach that contrasts coordination with computation is intellectually unsound: depending on the selected understanding of the word "computation", it either captures too many or too few programming languages. Instead, we argue for objective criteria that can be used to evaluate how well programming technologies offer coordination services. Of the various criteria commonly used in this community, we are able to isolate three that are strongly characterizing: black-box componentization, which we had identified previously, but also interface extensibility and customizability of run-time optimization goals. These criteria are well matched by Intel's Concurrent Collections and AstraKahn, and also by OpenCL, POSIX and VMWare ESX. },
Author = {{Raphael~`kena'} Poss},
Journal = {Computing Research Repository},
Month = {July},
Read = {1},
Title = {Characterizing traits of coordination},
Url = {http://arxiv.org/abs/1307.4827},
Urllocal = {pub/poss.13.ctc.pdf},
Year = {2013},
}
How can one recognize coordination languages and technologies? As this report shows, the common approach that contrasts coordination with computation is intellectually unsound: depending on the selected understanding of the word “computation”, it either captures too many or too few programming languages. Instead, we argue for objective criteria that can be used to evaluate how well programming technologies offer coordination services. Of the various criteria commonly used in this community, we are able to isolate three that are strongly characterizing: black-box componentization, which we had identified previously, but also interface extensibility and customizability of run-time optimization goals. These criteria are well matched by Intel’s Concurrent Collections and AstraKahn, and also by OpenCL, POSIX and VMWare ESX.
Extrinsically adaptable systems.
Poss, R.
Computing Research Repository. June 2013.
@article{poss13exadapt,
Abstract = {Are there qualitative and quantitative traits of system design that
contribute to the ability of people to further innovate? We propose that
extrinsic adaptability, the ability given to secondary parties to change a
system to match new requirements not envisioned by the primary provider, is
such a trait. "Extrinsic adaptation" encompasses the popular concepts of
"workaround", "fast prototype extension" or "hack", and extrinsic adaptability
is thus a measure of how friendly a system is to tinkering by curious minds. In
this report, we give "hackability" or "hacker-friendliness" scientific
credentials by formulating and studying a generalization of the concept. During
this exercise, we find that system changes by secondary parties fall on a
subjective gradient of acceptability, with extrinsic adaptations on one side
which confidently preserve existing system features, and invasive modifications
on the other side which are perceived to be disruptive to existing system
features. Where a change is positioned on this gradient is dependent on how an
external observer perceives component boundaries within the changed system. We
also find that the existence of objective cost functions can alleviate but not
fully eliminate this subjectiveness. The study also enables us to formulate an
ethical imperative for system designers to promote extrinsic adaptability.},
Author = {{Raphael~`kena'} Poss},
Journal = {Computing Research Repository},
Month = {June},
Read = {1},
Title = {Extrinsically adaptable systems},
Url = {http://arxiv.org/abs/1306.5445},
Urllocal = {pub/poss.13.exadapt.pdf},
Year = {2013},
}
Are there qualitative and quantitative traits of system design that contribute to the ability of people to further innovate? We propose that extrinsic adaptability, the ability given to secondary parties to change a system to match new requirements not envisioned by the primary provider, is such a trait. “Extrinsic adaptation” encompasses the popular concepts of “workaround”, “fast prototype extension” or “hack”, and extrinsic adaptability is thus a measure of how friendly a system is to tinkering by curious minds. In this report, we give “hackability” or “hacker-friendliness” scientific credentials by formulating and studying a generalization of the concept. During this exercise, we find that system changes by secondary parties fall on a subjective gradient of acceptability, with extrinsic adaptations on one side which confidently preserve existing system features, and invasive modifications on the other side which are perceived to be disruptive to existing system features. Where a change is positioned on this gradient is dependent on how an external observer perceives component boundaries within the changed system. We also find that the existence of objective cost functions can alleviate but not fully eliminate this subjectiveness. The study also enables us to formulate an ethical imperative for system designers to promote extrinsic adaptability.
The essence of component-based design and coordination.
Poss, R.
Computing Research Repository. June 2013.
@article{poss13coord,
Abstract = {Is there a characteristic of coordination languages that makes them qualitatively different from general programming languages and deserves special academic attention? This report proposes a nuanced answer in three parts. The first part highlights that coordination languages are the means by which composite software applications can be specified using components that are only available separately, or later in time, via standard interfacing mechanisms. The second part highlights that most currently used languages provide mechanisms to use externally provided components, and thus exhibit some elements of coordination. However not all do, and the availability of an external interface thus forms an objective and qualitative criterion that distinguishes coordination. The third part argues that despite the qualitative difference, the segregation of academic attention away from general language design and implementation has non-obvious cost trade-offs. },
Author = {{Raphael~`kena'} Poss},
Institution = {University of Amsterdam},
Journal = {Computing Research Repository},
Month = {June},
Read = {1},
Title = {The essence of component-based design and coordination},
Url = {http://arxiv.org/abs/1306.3375},
Urllocal = {pub/poss.13.coord.pdf},
Year = {2013},
}
Is there a characteristic of coordination languages that makes them qualitatively different from general programming languages and deserves special academic attention? This report proposes a nuanced answer in three parts. The first part highlights that coordination languages are the means by which composite software applications can be specified using components that are only available separately, or later in time, via standard interfacing mechanisms. The second part highlights that most currently used languages provide mechanisms to use externally provided components, and thus exhibit some elements of coordination. However not all do, and the availability of an external interface thus forms an objective and qualitative criterion that distinguishes coordination. The third part argues that despite the qualitative difference, the segregation of academic attention away from general language design and implementation has non-obvious cost trade-offs.
On whether and how D-RISC and Microgrids can be kept relevant (self-assessment report).
Poss, R.
Technical Report arXiv:1303.4892v1 [cs.AR], University of Amsterdam, March 2013.
@techreport{poss13mg,
Abstract = {This report lays flat my personal views on D-RISC and Microgrids as of March 2013. It reflects the opinions and insights that I have gained from working on this project during the period 2008-2013. This report is structured in two parts: deconstruction and reconstruction. In the deconstruction phase, I review what I believe are the fundamental motivation and goals of the D-RISC/Microgrids enterprise, and identify what I judge are shortcomings: that the project did not deliver on its expectations, that fundamental questions are left unanswered, and that its original motivation may not even be relevant in scientific research any more in this day and age. In the reconstruction phase, I start by identifying the merits of the current D-RISC/Microgrids technology and know-how taken at face value, re-motivate its existence from a different angle, and suggest new, relevant research questions that could justify continued scientific investment.},
Author = {{Raphael~`kena'} Poss},
Institution = {University of Amsterdam},
Month = {March},
Number = {arXiv:1303.4892v1 [cs.AR]},
Read = {1},
Title = {On whether and how {D-RISC} and {Microgrids} can be kept relevant (self-assessment report)},
Url = {http://arxiv.org/abs/1303.4892},
Urllocal = {pub/poss.13.mg.pdf},
Year = {2013},
}
This report lays flat my personal views on D-RISC and Microgrids as of March 2013. It reflects the opinions and insights that I have gained from working on this project during the period 2008-2013. This report is structured in two parts: deconstruction and reconstruction. In the deconstruction phase, I review what I believe are the fundamental motivation and goals of the D-RISC/Microgrids enterprise, and identify what I judge are shortcomings: that the project did not deliver on its expectations, that fundamental questions are left unanswered, and that its original motivation may not even be relevant in scientific research any more in this day and age. In the reconstruction phase, I start by identifying the merits of the current D-RISC/Microgrids technology and know-how taken at face value, re-motivate its existence from a different angle, and suggest new, relevant research questions that could justify continued scientific investment.
On-Chip Traffic Regulation to Reduce Coherence Protocol Cost on a Micro-threaded Many-Core Architecture with Distributed Caches.
Yang, Q.; Fu, J.; Poss, R.; and Jesshope, C.
ACM Trans. Embed. Comput. Syst., 13(3s): 103:1–103:21. March 2013.
@article{yang13tecs,
Abstract = {When hardware cache coherence scales to many cores on chip, the coherence protocol of the shared memory system may offset the benefit from massive hardware concurrency. In this article, we investigate the cost of a write-update policy in terms of on-chip memory network traffic and its adverse effects on the system performance based on a multi-threaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure without changing the protocol. We find that in the context of massive concurrency, by introducing a write-merging buffer with 0.46% area overhead to each core, applications with good locality and concurrency are boosted up by 18.74% in performance on average. Other applications also benefit from this addition and even achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.},
Acmid = {2567931},
Address = {New York, NY, USA},
Author = {Qiang Yang and Jian Fu and Raphael Poss and Chris Jesshope},
Doi = {10.1145/2567931}, Urldoi = {http://dx.doi.org/10.1145/2567931},
Issn = {1539-9087},
Journal = {ACM Trans. Embed. Comput. Syst.},
Month = {March},
Number = {3s},
Pages = {103:1--103:21},
Publisher = {ACM},
Title = {On-Chip Traffic Regulation to Reduce Coherence Protocol Cost on a Micro-threaded Many-Core Architecture with Distributed Caches},
Volume = {13},
Year = {2013},
}
When hardware cache coherence scales to many cores on chip, the coherence protocol of the shared memory system may offset the benefit from massive hardware concurrency. In this article, we investigate the cost of a write-update policy in terms of on-chip memory network traffic and its adverse effects on the system performance based on a multi-threaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure without changing the protocol. We find that in the context of massive concurrency, by introducing a write-merging buffer with 0.46% area overhead to each core, applications with good locality and concurrency are boosted up by 18.74% in performance on average. Other applications also benefit from this addition and even achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.
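A write-merging buffer of the kind evaluated in this work can be pictured as a small per-core table of pending cache-line updates: successive stores to the same line are coalesced locally and leave the core as a single message on the on-chip memory network instead of one message per store. The sketch below is a simplified software model for illustration only; LINE_SIZE, WMB_ENTRIES, the eviction policy and send_update_to_network are assumptions and do not reflect the hardware parameters reported in the article.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define LINE_SIZE   64          /* bytes per cache line (assumed) */
#define WMB_ENTRIES 4           /* number of merge entries (assumed) */

struct wmb_entry {
    bool     valid;
    uint64_t line_addr;         /* cache-line address of the pending update */
    uint8_t  data[LINE_SIZE];   /* merged bytes for that line */
    uint64_t dirty_mask;        /* one bit per dirty byte */
};

static struct wmb_entry wmb[WMB_ENTRIES];

/* Placeholder for the network message a real design would send. */
static void send_update_to_network(const struct wmb_entry *e) { (void)e; }

/* Merge a one-byte store into the buffer; only when no entry matches (and an
 * older entry must be evicted) does a message leave the core. */
void wmb_write(uint64_t addr, uint8_t byte)
{
    uint64_t line = addr / LINE_SIZE;
    unsigned off  = addr % LINE_SIZE;

    for (int i = 0; i < WMB_ENTRIES; i++) {
        if (wmb[i].valid && wmb[i].line_addr == line) {
            wmb[i].data[off] = byte;            /* merge with pending write */
            wmb[i].dirty_mask |= 1ull << off;
            return;
        }
    }
    /* No matching entry: evict slot 0 (simplistic policy) and reuse it. */
    if (wmb[0].valid)
        send_update_to_network(&wmb[0]);
    memset(&wmb[0], 0, sizeof wmb[0]);
    wmb[0].valid = true;
    wmb[0].line_addr = line;
    wmb[0].data[off] = byte;
    wmb[0].dirty_mask = 1ull << off;
}

Calling wmb_write(0x1000, 1) and then wmb_write(0x1001, 2), for instance, produces a single merged entry and no network traffic until that entry is evicted, which is the traffic reduction the article quantifies.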
Task Migration for S-Net/LPEL.
Verstraaten, M.; Kok, S.; Poss, R.; and Grelck, C.
In Grelck, C.; Hammond, K.; and Scholz, S., editors, Proc. 2nd HiPEAC Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures, January 2013.
@inproceedings{verstraaten13fdcoma,
Abstract = {We propose an extension to S-NET's light-weight parallel execution layer (LPEL): dynamic migration of tasks between cores for improved load balancing and higher throughput of S-NET streaming networks. We sketch out the necessary implementation steps and empirically analyse the impact of task migration on a variety of S-NET applications.},
Author = {Merijn Verstraaten and Stefan Kok and Raphael Poss and Clemens Grelck},
Booktitle = {Proc. 2nd HiPEAC Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures},
Editor = {Clemens Grelck and Kevin Hammond and Sven-Bodo Scholz},
Month = {January},
Read = {1},
Title = {Task Migration for {S-Net/LPEL}},
Url = {http://www.project-advance.eu/wp-content/uploads/2012/07/proceedings.pdf},
Urllocal = {pub/verstraaten.13.fdcoma.pdf},
Year = {2013},
}
We propose an extension to S-NET’s light-weight parallel execution layer (LPEL): dynamic migration of tasks between cores for improved load balancing and higher throughput of S-NET streaming networks. We sketch out the necessary implementation steps and empirically analyse the impact of task migration on a variety of S-NET applications.
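The migration decision itself can be pictured as a periodic balancing pass over per-worker task queues: when one queue grows much longer than another, a task is moved from the busiest worker to the idlest one. The fragment below is a deliberately naive, single-threaded C illustration of that policy, not the S-Net/LPEL implementation; NUM_WORKERS, MAX_TASKS and the imbalance threshold of 2 are invented.

#include <stddef.h>

#define NUM_WORKERS 4
#define MAX_TASKS   64

struct worker {
    int    tasks[MAX_TASKS];   /* task identifiers queued on this core */
    size_t len;                /* current queue length */
};

/* Move the last task of the busiest worker to the idlest worker whenever
 * the imbalance exceeds a fixed threshold. */
void balance(struct worker w[NUM_WORKERS])
{
    size_t busiest = 0, idlest = 0;
    for (size_t i = 1; i < NUM_WORKERS; i++) {
        if (w[i].len > w[busiest].len) busiest = i;
        if (w[i].len < w[idlest].len)  idlest = i;
    }
    if (w[busiest].len > w[idlest].len + 2 && w[idlest].len < MAX_TASKS) {
        int task = w[busiest].tasks[--w[busiest].len];   /* migrate one task */
        w[idlest].tasks[w[idlest].len++] = task;
    }
}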