Project proposals

The following list collects open research topics and proposals for student graduation projects. Contact me in case of interest!

Contents

Open topics

Robustness analysis of distributed databases via fault injection

Problem statement:

With the general interest towards Clouds, multiple database engines have appeared that claim to be Cloud-ready, ie. both scalable and robust: RethinkDB, etcd, Cassandra, CockroachDB, etc. Robustness is important because Cloud infrastructures usually do not guarantee hardware stability of individual servers; merely the availability of resources for a given budget (a faulty server is replaced for free). However Database products claim robustness merely in the context of a server that stops, and usually fail to test what happens when the underlying storage (hard drive, SSD) becomes faulty and starts corrupting data. It is thus largely unknown how robust these databases are in the face of data errors.

Project description

The candidate will work to answer the following question: how do state-of-the-art database engines react to the appearance of data errors in the underlying storage? To answer this question the candidate will design and develop a fault injection framework that runs different database engines in virtual machines (VMs) and creates faults in virtual hard drives.

Designing an OS that is Functional ⋀ Safe ⋀ Productive for education (Excellence)

Problem statement

As Moore’s law is coming to an end, the industry has started to diversify processor types again. New operating systems will be needed to support this diversity while paying attention to the new challenges of the 21th century: security and the cost of software engineering. Unfortunately most technical experts currently active have been trained using education curricula specialized to the PC platform and 40 years old technology with a poor security record and high engineering costs (C and Unix). We thus need new education materials that train people to work on OS issues in a modern context.

Project description

The candidate will work to design and implement the basic bricks of a new operating system suitable for use in the BSc education curriculum (a.k.a a new sort of MINIX). This must use a modern programming language with the desired characteristics, for example O’Caml or Rust.

Changes in context: an interactive visualisation for git blame

Problem statement:

When a programmer joins an existing project with an already large source base, this person usually has a hard time answering the question “why is this code written this way?” or “what problem or feature has motivated this code?” for many places in the source code. Luckily, version control systems like Git maintain a database of changes over time which can be mined to obtain this information: git blame for example can annotate individual lines of code with who and when modified that line over time. However these tools are rather dumb and can only explain individual lines or files, and fail to deliver the big picture. A need thus exists for a tool that presents related change sets with context for better understanding.

Project description:

The candidate will design and develop a new tool in the git toolbox that mines a Git repository for information about a specific portion of source code, then presents the history of that portion together with all related changes in other files. The technical challenge will be to use concurrency properly so that the user interface stays responsive even when the analysis has to process a large repository. The interface should support terminal output (preferred) or a portable UI toolkit.

Secure on-chip protocols for clouds-on-a-chip

Problem statement:

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.

Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970’s for single-processor systems, namely the virtualization of shared memory using virtual address translation within the core.

The problem with this old technique is that it assumes that the connection between cores is “secure”. The physical memory accesses are communicated over the chip without any protection: if a VM running on core A exchanges data with off-chip memory, a VM running on core B that runs malicious code can exploit hardware errors or hardware design bugs to snoop and tamper with the traffic of core A.

Project description:

To make Clouds-on-a-chip viable from a security perspective, further research is needed to harden the on-chip protocols, in particular the protocols for accessing memory, virtual address translation and the routing of I/O data and interrupts.

The candidate for this project should perform a thorough analysis of the various on-chip protocols required to implement VMs on individual cores, then design protocol modifications that provide resistance against snooping and tampering by other cores on the same chip, together with an analysis of the corresponding overheads in hardware complexity and operating costs (extra network latencies and/or energy usage).

The research will be carried out in a simulation environment so that inspection of on-chip network traffic becomes possible. Simulation tools will be provided prior to the start of the project.

Efficient networking for clouds-on-a-chip

Problem statement:

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their application as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.

Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970’s for single-processor systems, namely the use of shared memory to communicate data between processes running on the same processor.

As multi-core chip become prevalent, we can do better and use more modern techniques. In particular, the direct connections between cores on the chip can be used to implement a faster network than using the off-chip shared memory.

Project description:

To demonstrate that direct use of on-chip networks yield better networking between VMs on the same chip than using shared memory.

The challenge in this project is that the on-chip network is programmatically different than “regular” network adapters like Ethernet, so we cannot use existing network stacks as-is.

The project candidate will thus need to explore the adaptation and simplification of an existing network stack to use on-chip networking.

The research should be carried out either on a current multi-core product or simulations of future many-core accelerators. Simulation technology will be provided as needed.

CivC 2.0 - a new and better framework for the compiler course (Excellence)

Problem statement:

The compilation (“Compilerbouw”) course at the UvA has a long-running practical assignment where students must implement a compiler for a mini-language called CivC. The assignments should test the ability of the students to apply the general methods seen during the lectures on a real compiler. However, the learning experience is slowed by two specific issues. The first is that CivC has a large number of language elements (“AST node types”), so each program transformation must implement many different cases. The second is that the assignment is based on a framework implemented using the C language, which is a low-productivity language and thus slows down the programming tasks. Students spend more time “fixing C problems” than really solving compilation issues. A need thus arises to both simplify the framework, increase programming productivity and simplify the language so that students can look at more diverse aspects of program transformation.

Project description:

The project should aim at providing 1) an improved framework; 2) an improved assignment text; and 3) an improved reference implementation and test suite. The improved framework must enable students to complete the assignment while writing less code; and ensure a programming style that limits programming errors. The improved assignment text should specify a language with less AST node types while keeping the teaching objectives strong for a 6 EC course.

The suggested approach is to a) introduce Pattern Matching and Tail Recursion as a new feature in CivC in the assignment and the framework; then b) organize the assignment, framework and reference implementation, so that many features of CivC are transformed into pattern matches early on in the compiler: IF (match on True/False), WHILE/FOR (match on condition + recursion), && and || (nested pattern matches). This way, many AST nodes are replaced by a single pattern match node.

Finally, the framework could be provided in a high-level language that is both a) very functional with strong implicit typing, which will make it possible to complete the assignment with less code; and b) very easy to understand to students who do not know this language already. ML is a good candidate but other choices can be proposed.

Parallel implementation of arbitrary precision arithmetic

After two decades of investment on frequency increases, CPU manufacturers have hit the power wall and the memory wall and instead shift their attention back to explicit concurrency to improve performance. However concurrency is hard to program with and manage. The CSA group at the University of Amsterdam is developing a vertical solution to these issues that comprises a novel many-core architecture, operating systems, software run-time systems and compilers.

Project description:

Arbitrary-precision arithmetic [1] is a technique whereby calculations are performed on numbers whose digits of precision are limited only by the available memory of the host system. This contrasts with the faster fixed-precision arithmetic found in most ALU hardware, which typically offers between 6 and 16 decimal digits. A common application is public-key cryptography, another is checking the results of fixed-precision calculations.

[1]http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

The main task of our project is to implement (or port) a library of aribitrary precision arithmetic operations to our concurrent programming language SL, an extension of C with concurrency constructs. The implementation can reuse existing open source algorithms but should introduce the concurrency constructs to obtain speedups for large integer addition, substraction, multiplication and division.

Targeting the supercomputing language Chapel to the Microgrid (Excellence)

Chapel is a new parallel programming language being developed by Cray. It is being developed to increase supercomputer productivity. Chapel strives to improve the programmability of parallel computers in general and the Cascade system in particular, by providing a higher level of expression than current programming languages do and by improving the separation between algorithmic expression and data structure implementation details.

As a result of two decades of research, the CSA group at the University of Amsterdam has designed a novel chip architecture called the Microgrid. This connects hundreds of multithreaded cores through a high-bandwidth on-chip network with low-latency core-to-core synchronization channels in hardware. The purpose of this chip is to both explore programmability issues with fine-grained hardware multithreading and address the problems of efficiency and energy usage in many-core systems on chip.

Project description:

This project proposes to retarget the Chapel compiler back-end and run-time system to the Microgrid. The task will be facilitated by reusing the existing support in Chapel for fine-grained microthreading and dataflow synchronization on the Cray XMT architecture, which is similar to the mechanisms available on the Microgrid cores. The C code generated by the compiler and/or the run-time system would need to be adapted to make use of the C language extensions that give access to the Microgrid features.

This work intends to answer the following research questions:

  • what are the challenges involved in retargeting programs and algorithms designed for supercomputers to a many-core system on chip?
  • to which extent is the run-time system able to adapt the concurrency constructs in Chapel programs to the fine granularity of multithreaded cores with register-based dataflow synchronization?
  • what class of existing Chapel programs can be reused as-is on the Microgrid while achieving high pipeline utilization and scalability across multiple cores?
  • how much semantic overlap exists between the concept of locale in Chapel and the concept of place on the Microgrid?

The research will first select a subset of the features of Chapel’s run-time system, that is sufficient to explore the topics above. A set of reference Chapel programs will then be selected to serve both as a test suite and benchmarks for the implementation. Evaluation will occur on a cycle-accurate Microgrid simulation to allow detailed performance monitoring and ease debugging.

Crazy research” - long term and risky topics

People-Specific languages and automatic programming language generation

(this idea was presented and well-received at OOPSLE 2014)

Abstract:

The innovation of DSLs was the recognition that each application domain has its few idiomatic patterns of language use, found often in that domain and rarely in others. Capturing these idioms in the language design makes a DSL and yields gains in productivity, reliability and maintainability. Similarly, different groups of programmers have different predominant cognitive quirks. I argue that programmers are attracted to some types of languages that resonate with their quirks and reluctant to use others that grate against them. Hence the question: could we tailor or evolve programming languages to the particular personality of their users? Due to the sheer diversity of personality types, any answer should be combined with automated language generation. The potential benefits include a leap in productivity and more social diversity in software engineering workplaces. The main pitfall is the risk of introducing new language barriers between people and decreased code reuse. However this may be avoidable by combining automated language generation with shared fundamental semantic building blocks.

Open research questions:

This work will likely require more than a decade of inter-disciplinary research before convincing results can be obtained. Any concrete effort to deliver PSLs on demand will face at least the following fundamental questions:

  • How to quantitatively compare the benefits of a PSL delivered to a person who never programmed before, with this person’s potential performance using one or more previously existing language?

    This is a consequence of the “first use bias”: programmers are influenced by the first language(s) they learn, and a PSL produced for an already experienced programmer may not differ significantly from the language(s) they already know and use. Overcoming this experimental obstacle will be key to scientifically validate the benefits of PSLs.

  • Can we make different PSLs interoperable?

    This will be key to preserve code reuse: while the current economical pressure for expertise reuse, an obstacle to the adoption of DSLs, would be diminished by PSLs, code reuse stays crucial for productivity (“avoid solving the same problem twice”). However achieving inter-operability between PSLs requires a common semantico-linguistic system between humans, the existence of which is a known open problem in linguistics.

  • How to design a meta-meta-language for PSL generation?

    This language’s library, if/when it exists, must necessarily feature at least composable grammars, composable type systems, composable semantics, composable documentation, as building blocks, all of which are still currently research topics. Whether this is at all possible is still an open question of computing theory.

The philosophical, linguistic and sociological insights such exercise would deliver would be ample reward on their own regardless of the eventual success of PSLs.

Bio warfare against mega corps

Human beings are a disease, a cancer of this planet.—-Agent Smith, The Matrix

With support from international free trade agreements, strong homogeneizing cultural influences, but especially the software revolution and the Internet, we have seen recently the rise of giant international organizations like IBM, SAP, Microsoft, Google, Amazon, Samsung and Apple with a totally unprecedented impact on human society. Even during the middle ages, the most successful church or lord would never have dreamt to control the wealth production commitment of a few hundred thousands peons at a time, a mere trifle for today’s mega-corporations.

Only idealists are left to pretend these entities are still under control of people, and that their impact on society can be regulated only using laws made by people for people. The growing consensus is that once an organization gets large enough it gets “a life of its own,” starts to escape regulation and displaces the responsibility for its continued existence from the conscious hands of its owners towards a nebulous corporate identity.

We need to better understand these organizations, both to predict their future evolutions and to design regulatory systems to keep them in check. For this purpose, I propose the following novel theory: that large public-facing IT organizations, when modeled as a symbiosis between a) evolving complex systems of human thought and knowledge, ie. memeplexes and b) evolving complex systems of software code and IT infrastructure, ie. techplexes, are subject to diseases with causes and outcomes predictable using biological models. In particular, I propose to recognize separately congenial affects due to “genetic” mishaps, and viral and parasitic affects due to “external” influences.

The idea to view memeplexes, ie. complex organizations of human thoughts, as evolving para-biological systems is not new, but is limited: memeplexes are not bounded to a particular organization. The innovation in my idea is to recognize that techplexes are the stuff that create a social identity to the memeplexes of IT megacorps, although techplexes cannot themselves exist without their managing humans and their knowledge/thoughts. That’s why I am quite convinced they interact in symbiosis, ie. they form a symbiont.

From this we could do scientific research as follows.

  • we could formalize the theory by establishing which traits of mega-corporations should be part of the para-biological model, and clearly defining the terms and concepts particular to memeplex-techplex symbionts.
  • we could test this theory retrospectively, by studying systemic problems encountered in the past large corporations.
  • we could apply this theory by considering, of course “only as an intellectual exercise,” whether diseases specific to these para-organisms could be engineered and used to regulate them in lieu of traditional legal frameworks.

In short: create “diseases” that stunt the development of large megacorps and prevent them from hurting society.

Past / completed topics

Fault injection in multi-core simulators (Excellence)

Problem statement:

Computer architects in charge of designing tomorrow’s computer processors use simulators to predict and study the behavior of the future chips. Because parallelism is currently accepted as the way to the future in processor design, multi-core simulators have become common in the last five years. Yet most architecture simulators that exist nowadays focus on the function of a chip: how data moves from one component to another, how software programs control the components in the processor, etc. Simulation results are obtained assuming that everything works according to design. Unfortunately, in practice larger processors also mean there are more faults during operation: cosmic rays and heat in particular cause bits to change value randomly. So far, multi-core simulators do not simulate the existence of these faults. To make the quality of architecture predictions better, we thus need algorithms to inject simulated faults in multi-core simulators.

Project description:

This project should aim at injecting data faults in an existing multi-core simulation framework. The user should be able to configure where and when faults are injected, and see in the simulation results when faults have occurred. The user should be able to choose between deterministic faults (at a specific, chosen moment) and random faults defined using a specific distribution.

The suggested approach is to extend an existing multi-core simulator previously developed by students at the UvA (MGSim, uses C++).

Incremental visualisation and analysis for very large event traces

Problem statement:

Simulation is one of the techniques used in science to analyse complex processes (physical systems, large computer systems, etc). In a simulation, a model of the process is implemented as a program that replicates the internal interactions, and “imitates” them over time in a controlled environment. The output of a simulation is an event trace, which can then be further analysed. However, recently the processes being simulated are becoming so complex that event traces become “crazingly large”, from multiple gigabytes to even terabytes or beyond. With this sizes, human operators or trace analysis tools cannot load the entire trace in the computer’s memory before examining it. To support scientific analysis of large systems, we thus need incremental algorithms and visualization tools able to analyse large traces without needing to load them all at once.

Project description:

This project should aim at designing a visualisation tool for large simulation event traces. The user should be able to point the tool to a large file, and get an overview of the overall trace, and only obtain a detailed view for a part of the trace (“view window”). The project should investigate data windowing and caching techniques that minimize the amount of I/O operations to the trace file. The user should be able to specify an analysis algorithm or transformation formula in a scripting language, so that the program can run incrementally without loading all the file at once.

The suggested approach is to use a high-level, dynamic language with both powerful I/O abilities and GUI functions (Python, ML, Haskell, etc.) and focus on traces generated by an existing detailed multi-core simulator previously developed by students at the UvA (MGSim, uses C++).

Inference of per-process energy usage in multi-cores (Excellence)

Problem statement:

Why is my phone’s battery suddenly empty today?” Many smartphone owners ask themselves at least once which program is “not behaving properly” and causes their battery level to go down faster than usual. The problem is that operating systems cannot directly collect energy usage statistics per program or application: the energy sensors are connected to the entire processor or memory, and measure combined energy usage. The operating system only has detailed information of processing time and memory usage.

Project description:

The candidate will work to answer the following question: is it possible to compute energy usage per process, using combined energy measurements and per-process time/space measurements as input?

Three answers will be acceptable: 1) “yes,” to be supported by mathematical formulas to do this computation, and empirical evidence that the formulas are correct; 2) “no,” to be supported by a semi-formal argument why this is not possible; and 3) “maybe/don’t know, BUT…” where the first question is not answered but the candidate builds a good software framework that can be used by future researchers to manipulate measurement statistics more efficiently.

The project should be based on the energy sensors on ARM BIG.little (DROID UX3) and Intel SCC, which will be provided by the supervisor.

Graphical geography-based selection of news items

Problem statement:

Usually news items are presented as text around a specific topic, and an observer will infer the location from the text. Likewise, search engines are topic- or keyword-based: there are currently no mechanisms to find news items related to a specific location on Earth. For example, one can find news about the events in Kiev by using both “Kiev” and “Ukraine” as search terms. In contrast, to answer “What is currently happening around the Black Sea?” one must search separately for keywords related to Ukraine, Russia, Georgia, Turkey, Bulgaria and Romania. This limits the understanding of news by the public, because the high-level strategic significance of news is often related to large geographical features which are not directly represented by country borders or local keywords. A need thus exists to geolocate news feeds and enable geography-based selection of news items.

Project description:

The project should aim at providing an Earth map with a “live” representation of news items as positions on the map. The proposed solution should integrate a map visualization tool, “news feeds” from various sources (for example online newspapers, Bing and Google News) and a text recognition algorithm able to recognize place names in the text and look them up in geolocation databases. The user should be able to select a region on Earth and time period, and see a list of URLs to news items for that region and period.

Both web-based solutions and standalone programs are accepted.

Automatic BibBase interface for research web pages

Problem statement:

Scientific researchers often move from one research group to another during their career. Also, different individuals have different preferences regarding the software they use to maintain their information. Therefore, it is commonly accepted that each researcher in a group or department maintains their own home page and personal information in the way they personally see fit, which they can keep as they move from one group to another during their career. However, for marketing purposes it is also desirable to maintain a single “group web site” that shows the work of the people currently working in the group. One particular goal of such a web site is to display the publications of all researchers. However, researchers already maintain their bibliography using separate tools and are unlikely to agree to spend more time to copy their bibliography to the common web site. It is thus necessary to implement an interface that collects the bibliography of multiple researchers automatically to publish on a group web site.

Project description:

The project should aim at proposing a tool or framework able to scan the web page or home directory of different researchers, detect bibliography information automatically, merge it in a single database, then use an existing presentation tool to generate a web site. The solution should allow the scientists to mark some entries to be excluded from the common publication page.

The suggested approach is to scan web pages (HTML) and file repositories (SVN/Git) and detect the BibTeX format, and subsequently use BibBase (http://bibbase.org/) to generate the common bibliography web page. High-level languages with easy access to web pages, file system and text processing (eg. Python, Perl or other functional languages) are recommended.

Automatic network layout using Moore and Hilbert curves

Problem statement:

Network layout is the process of choosing where to position components in space that must be connected together. For example, when designing a multi-core processor, the cores must be positioned on the 2D chip in a way that minimizes the connection lengths. Traditionally, high-level layout for multi-cores is a somewhat manual process that occurs once per chip design. A particular multi-core chip is the Microgrid, a novel parallel processor developed at the University of Amsterdam. A particular feature is that the Microgrid can be customized on demand to different number of cores, from 1 to 1000’s of cores, so manual layout is impractical. Moreover, the Microgrid cores are connected both to a network of memory caches, which requires that cores sharing the same cache be positioned next to each other; and a linear network between all cores. To optimize both layouts it is thus necessary to use a Moore or Hilbert curve to position all cores on the chip. Moore and Hilbert curves are fractal shapes that connect points in 2D without crossovers and while respecting locality.

Project description:

The project should aim at designing a placement algorithm for the cores of a many-core chip using a Moore or Hilbert curve. The algorithm must be able to handle a varying number of cores, and produce a list of 2D coordinates for each core’s position as output. The algorithm should also take into account constraints to place certain cores next to the edge of the chip.

The suggested approach is to extend the Microgrid simulator MGSim (C++) and implement the algorithm as a new phase of the chip instantiation code.

Multi-touch visualisation of multi-core simulations

Problem statement:

Future processors will have more and more cores and accelerators. Computer architects and operating system designers must work together to decide how to best design and manage multi-cores with accelerators. Simulation is the tool these people use to prepare a future design and predict how it will perform. However, most simulation tools nowadays are very linear and not very human-friendly. Some only report text files. Others with high-level interfaces only show a single processor or memory. To make research in multi-core easier, we need more powerful interactive visualizations of multi-core simulators.

Project description:

This project should aim at designing a multi-touch interface to a simulation of a multi-core system. The user should be able to see which processes (programs) are running on which core; and how they communicate with each other and with memory. The user should be able to influence program and data placement on the multi-core processor using the touch interface.

The suggested approach is to extend and integrate two tools previously developed by students of the UvA: an existing “simple” multi-core visualization and an existing multi-core simulator. This technology is based on the Python and C++ languages.

Trusting web sites using geographical consistency (Excellence)

Problem statement:

Roberta is living in the Netherlands and has worked in Canada. She usually orders items online using E-Bay and Amazon from the USA, and online stores from both Canada and the Netherlands. In the fall of 2013, she ordered an item from a web site that looked Canadian. Many weeks later, her item was still not delivered, but her bank statement showed the payment was processed in Beijing, China.

A factor common to most malicious web sites is the use of different geographical regions for different technologies: the domain name is registered in one country, the SSL certificate in another, the IP address of the server perhaps in yet another. Often, the user browsing from the malicious web site to a payment engine will also cross a country boundary. This geographical separation is highly unusual in “honest” web sites.

Many such dishonest activities could thus be detected (and prevented) by geolocating the different technology components involved in the trust relationship between user, browser, web server and web site.

Project description:

The project should aim at providing a tracking tool which geolocates the web technologies and warns the user when an inconsistency is detected (eg “You are accessing a domain name ending with .ca but the IP address is in China, do you want to continue?”). The tool should attempt to geolocate the domain name, both local and remote IP address, SSL certificate, SSL certification authorites, and the transitions when the user navigates from one page to another.

The suggested approach is to consider either a web browser extension (Firefox/Chrome) or build a program running in the background and scanning the network communication on a personal computer.

Minecraft-to-C compiler for Raspberry Pi (Excellence)

Problem statement:

The Raspberry Pi is a small computer platorm produced for education and “hardware hacking”, able to run Linux and control electronic components. Minecraft is a popular game where the player interacts with a virtual world. Within Minecraft, the player can use “redstone” to define virtual circuits and automatic mechanisms. This Minecraft feature has become an extremely popular way for beginners to learn about logic and circuit programming. In 2013, Angus Taggart has created am interface which allows Minecraft players to control the electric signals of the Raspberry Pi from within the game: http://hackaday.com/2013/01/30/controlling-minecraft-with-a-raspberry-pi/ With this, beginners can easily design and implement automated mechanisms in the virtual 3D world with input and effects on the real world. However, the Minecraft game engine is relatively slow and these user-defined circuits cannot run faster than a few steps per second. It becomes thus interesting to compile Minecraft-defined “programs” to C code that can run natively on the Raspberry Pi.

Project description:

This project should aim at designing a compiler able to read a Minecraft save file, detect the Redstone circuits defined therein, and generate a C program as output which performs the same signal transformations. When the Minecraft world uses A. Taggart’s Minecraft extension, the generated C code should control the same Raspberry Pi signals as defined in the Minecraft world.

The suggested approach is to first develop an abstract algorithm that detects redstone circuits in the Minecraft 3D environment and produces a virtual circuit description; then use a functional language (ML/LISP/Haskell/etc) to implement the algorithm and translate virtual circuits to the C language.

Multi-modal visualisation of resource consumption in computing systems

Problem statement:

Why is my phone’s battery suddenly empty today?” Many smartphone owners ask themselves at least once which program is “not behaving properly” and causes their battery level to go down faster than usual. The same applies to main memory (RAM) usage, or flash storage (filesystem). In general, the operating systems of our daily appliances do a very poor job to report how much system resources are consumed by each application, nor do they give us control over resource usage. To improve upon this situation, we should improve monitoring tools to report the consumption of different types of resources by each application, ie develop new multi-modal visualizations of resource consumption.

Project description:

This project should aim at developing a “task manager” that displays resource usage of runnning processes visually. The user should be able to select different modes (“views”) for energy, memory, filesystem, network bandwidth, CPU load and get separate reports. When the feature is available in the underlying operating system, the user should be able to place a contraint on the maximum resource usage (eg. filesystem quota, or CPU priority) using a touch interface.

The suggested approach is to develop a Python/Kivy front-end interface to a Linux system, using Linux’ recent “cgroups” feature to monitor and control resource usage by each process or thread.

Automated outfit evaluation based on color recognition

Background:

An important part of selecting an outfit, clothing-wise but also including nail polish, scarves, ties, etc. is to ensure that the selected combination of colors is aesthetically pleasing.

What constitutes “aesthetically pleasing” combinations is, of course, subject to personal taste. However, the consensus is that some combinations can be evaluated based based on numerical metrics about the colors: either the colors “match” or they “do not match”, with a possible grading in between. For example:

  • items with close hues that are not complementary usually do not match;
  • items with same hue but different intensity usually do match;
  • items with complementary hues but different saturations usually do not match,
  • etc.

These criteria can be expressed as mathematical functions on the hue, saturation and value/intensity (HSV) of the colors, as well as their temperature. HSV and temperature are other ways than RGB to encode colors.

Problem statement:

Some people have limited skills to recognize differences in colors (eg. color blindness) or have not received an education in color matching. For these individuals, a situation can occur where they inadvertedly select combinations of outfit items that are visually unpleasing to other people, without knowing.

It would therefore be useful to provide a tool, for example an application in a mobile device with embedded camera, which would evaluate the quality of color matches based on a photo of the person with the desired outfit on a white background. The application could then report:

  • a big green checkmark “yes, this is good” if no problems are found, ie when the combination of colors follows existing color harmony standards,
  • how well (with a grade, e.g between 0 and 10) the colors are compatible,
  • whether there is a single color that should be removed to obtain a more aesthetically pleasing result.

Music-based quality control in software engineering

Background:

Agile software development consists in iterative and incremental programming, where the construction of software happen in many phases quickly in sucession. A key aspect is that quality control is performed by continuous integration, where new features are regularly tested by an automatic system, without human supervision. Only when problems are detected (eg. a bug) is the relevant programmer informed.

As a mechanism to signal the status of testing, build light indicators have been built in the past. For example a popular choice is the green and red lava lamp: green when the build is successful and red when something is wrong.

Problem statement:

The signaling of software quality via a medium other than e-mail or instant messaging is desirable, because a programmer typically does not look at their messaging all the time. However, people with reduced eyesight, or groups that work in separate offices, cannot easily make use of visual indicators. It may thus be desirable to explore sound as a possible signal, in particular changing the style of music that a programmer listens to, when “something goes wrong”.

Trap handling with Hardware Multi-Threading (Excellence)

Background:

Any interactive application on a computing system requires a hardware mechanism to signal the availability of external input to sofware. Historically, computers have used uni-processors with only one hardware thread of execution (one Program Counter in hardware). Therefore, the only mechanism available to signal external events so far has been the interrupt: a circuit that would, upon an external event, force the program counter to be set to a predefined value to run an interrupt handler, e.g. an event handler in software.

The renewed interest in parallel processors since the turn of the 21th century has brought a new feature that is now commonplace: hardware multithreading, supported by multiple separate Program Counters and registers in processors. Hardware multithreading has become an architectural technique to increase performance and efficiency.

So far, interrupts and hardware multithreading are combined in processor architectures by adding interrupts to hardware threads: when any single thread receives an external event, its program counter is overridden to a new value. This can happen separately to each hardware thread in the processor.

Problem statement:

Interrupt handling is expensive to implement in hardware and costs a performance overhead. Indeed, the current program counter and hardware registers must be saved to memory, a new state set up and the program counter initialized. When the interrupt handler terminates, the old state must be restored from memory. The extra time to save/restore the state and the corresponding memory traffic are repeated for each new external event.

Interestingly, once a processor supports multiple hardware threads, interrupts are not any more the only mechanism available to signal external events. Instead, it could be possible to reserve one of the hardware threads for event handling, keep it idle most of the time, and simply “wake it up” upon an external event to run the software event handler. The advantage would be that hardware threads already have their own, private registers: the overhead to save and restore the processor state, as required by interrupts, then disappears.

This would yield more hardware efficiency in nearly all computing applications.

Interactive shell for the Microgrid platform

Background:

Current and future computing systems are relying increasingly heavily on multi-core processors to reach higher computing performance. However, as the number of cores increases, more intelligence is required to coordinate the software running on all cores.

The CSA group at the University of Amsterdam is developing the Microgrid, a processor chip architecture with on-chip intelligence for the coordination of software on multi-cores. This intelligence is implemented directly in hardware, to obtain higher efficiency and lower performance overhead compared to traditional software solutions. In the past five years, research on the Microgrid has reached a point where the platform is able to most general-purpose code written in C. However, the new hardware intelligence of Microgrids makes existing operating systems (eg. Linux, Windows etc) largely inefficient and inappropriate. Instead, new operating software must be developed. With contributions of both researchers and students alike, some existing system software has been rewritten or ported to the new platform. For example, a C run-time library, a Scheme interpreter, a Python interpreter are available.

Problem statement:

At the time of this writing, the basic operating software of the Microgrid (MGOS) can still only execute one application at a time, even though the hardware could execute more.

To demonstrate the benefits of Microgrids for “larger” workloads with multiple simultaneous applications launched interactively by the user, the MGOS must be extended to support separate programs. Since a tasks cheduler, dynamic program loader and terminal driver are already available, this extension is akin to simply developing a new command-line shell on a Unix platform.

Multi-Core Armaggeddon

Background:

The original Core War (or Core Wars) is a programming game in which two or more battle programs (called “warriors”) compete for the control of a virtual computer called the “Memory Array Redcode Simulator”. These battle programs are written in an abstract assembly language called Redcode. At the start each battle program is put into the memory array at a random location, after which each battle program can execute one instruction in turn. The object of the game is to cause all processes of the opposing program(s) to terminate (which happens if it executes a special instruction), leaving the victorious program in sole possession of the machine.

The Core War game is often use in computing science education to help students discover and learn machine-level programming and the close relationship between hardware and software at the machine interface. In other words, it is a fun and productive way to learn about computer architecture, compilers and operating systems.

Problem statement:

The original Core Wars game is based on a single-core virtual machine. Since the turn of the 21th century, most new computers contain multiple processors next to each other, i.e. multi-cores. New “fun” activities will be needed to help newcomers become acquainted to multi-cores.

Dynamic library loader for a shared address space

One of the optimization on the road to scalable performance in future many-core chips is to share caches and address translation entries across multiple cores, possibly containing multiple hardware threads. This is necessary to remove TLB and translation overheads from the critical path to memory.

When cores share a single virtual address space, multiple programs sharing the cores must also share the address space. This is possible since most programming languages (starting from C) do not make the assumption of distinct address spaces between programs.

This project aims to facilitate current and future research on shared address space systems in many-core chips, by implementing a loader program which can place multiple separate applications together in the same memory space.

Project description:

When multiple distinct programs are loaded simultaneously in a single address space, they must be placed in distinct regions of memory. This differs from other systems where each process has its own address space and where programs code and data can be loaded at a fixed address. The main task of our project is to design and implement a loader algorithm that can relocate program executables (containing code and data) into the system’s shared memory at separate locations.

To support this work, the CSA group at the University of Amsterdam provides a virtual platform that emulates a many-core chip containing cores that share a virtual address space. A C compiler and programming environment is also provided for this platform. To facilitate the task, the compiler generates position-independent code and uses relative addressing for global variables. Also, the generated executable files use the relocatable ELF format.

The loader algorithm must thus compute a region in the address space large enough to contain the program, load the binary data using an existing ELF reader, then provide the base address during program initialization. The implementation will reuse the libelf ELF object file access library.

Implementing Scheme in a massively concurrent architecture (Excellence)

After two decades of investment on frequency increases, CPU manufacturers have hit the power wall and the memory wall and instead shift their attention back to explicit concurrency to improve performance. However concurrency is hard to program with and manage. To address these issues, the CSA group at the University of Amsterdam is developing a custom many-core architecture called the Microgrid. It features a massively concurrent system with thousands of hardware threads on chip and a highly distributed operating system.

Project description:

Our on-chip architecture is equipped with its own I/O channels, operating systems, software stack including compilers for the statically typed languages <B5>TC, C, SAC and SL. In order to widen its audience, we propose to implement an interpreter for a dynamic programming language, Scheme [2].

[2]http://en.wikipedia.org/wiki/Scheme_(programming_language)

Scheme is a dialect of Lisp, which in turn was originally created as a practical mathematical notation for computer programs, influenced by the notation of Alonzo Church’s lambda calculus. It quickly became the favored programming language for artificial intelligence (AI) research. As one of the earliest programming languages, Lisp pioneered many ideas in computer science, including tree data structures, automatic storage management, dynamic typing, and the self-hosting compiler.

The main task of our project is to provide a simple interpreter for Scheme on our architecture that supports only arithmetic, lambda expressions, conditionals, recursion and a read-eval-print loop. This will be written in C with SL extensions and should use the hardware concurrency offered by the Microgrid to speed up the Scheme evaluation of independent expressions. Existing Scheme implementations can be reused for inspiration.


Raphael's academic home page © Raphael ‘kena’ Poss. Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome.