Papers


A real-time Java chip-multiprocessor (Christof Pitter and Martin Schoeberl)

ACM Trans. Embed. Comput. Syst., 10(1):9:1–34, 2010.

 

Abstract:

Chip-multiprocessors are an emerging trend for embedded systems. In this article, we introduce a real-time Java multiprocessor called JopCMP. It is a symmetric shared-memory multiprocessor, and consists of up to eight Java Optimized Processor (JOP) cores, an arbitration control device, and a shared memory. All components are interconnected via a system on chip bus. The arbiter synchronizes the access of multiple CPUs to the shared main memory. In this article, three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed priority, a fair-based, and a time-sliced arbiter.

Tasks running on different CPUs of a chip-multiprocessor (CMP) influence each others' execution times when accessing a shared memory. Therefore, the system needs an arbiter that is able to limit the worst-case execution time of a task running on a CPU, even though tasks executing simultaneously on other CPUs access the main memory. Our research shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. The timing analysis of tasks, executing on the CMP using time-sliced memory arbitration, leads to viable worst-case execution time bounds.

The time-sliced arbiter divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration scheme allows for a calculation of upper bounds of Java application worst-case execution times, depending on the number of CPUs, the time slot size, and the memory access time. Examples of worst-case execution time calculation are presented, and the analyzed results of a real-world application task are compared to measured execution time results. Finally, we evaluate the tradeoffs when using a time-predictable solution compared to using average-case optimized chip-multiprocessors, applying three different benchmarks. These experiments are carried out by executing the programs on the CMP prototype.

 

Download: PDF , doi

 


Nonblocking real-time garbage collection (Martin Schoeberl and Wolfgang Puffitsch)

ACM Trans. Embed. Comput. Syst., 10(1):6:1–28, 2010.

 

Abstract:

A real-time garbage collector has to fulfill two basic properties: ensure that programs with bounded allocation rates do not run out of memory and provide short blocking times. Even for incremental garbage collectors, two major sources of blocking exist, namely, root scanning and heap compaction. Finding root nodes of an object graph is an integral part of tracing garbage collectors and cannot be circumvented. Heap compaction is necessary to avoid probably unbounded heap fragmentation, which in turn would lead to unacceptably high memory consumption. In this article, we propose solutions to both issues.

Thread stacks are local to a thread, and root scanning, therefore, only needs to be atomic with respect to the thread whose stack is scanned. This fact can be utilized by either blocking only the thread whose stack is scanned, or by delegating the responsibility for root scanning to the application threads. The latter solution eliminates blocking due to root scanning completely. The impact of this solution on the execution time of a garbage collector is shown for two different variants of such a root scanning algorithm.

During heap compaction, objects are copied. Copying is usually performed atomically to avoid interference with application threads, which could render the state of an object inconsistent. Copying of large objects and especially large arrays introduces long blocking times that are unacceptable for real-time systems. In this article, an interruptible copy unit is presented that implements nonblocking object copy. The unit can be interrupted after a single word move.

We evaluate a real-time garbage collector that uses the proposed techniques on a Java processor. With this garbage collector, it is possible to run high-priority hard real-time tasks at 10 kHz parallel to the garbage collection task on a 100 MHz system.

 

Download: PDF , doi

 


Facing the Challenges for Real-Time Software Development on Multi-Cores (Dr. Fridtjof Siebert)
Embedded Systems Conference (ESC) Boston, 21-23 September 2010

Abstract:
Multicore systems introduce new challenges for the developers of embedded systems. C, C++, and Java developers will pro?t from this class by understanding where caution is required in the use of certain programming language mechanisms. Examples include the memory model provided by the language implementation, effects of compiler optimizations, atomic operations, and use of the volatile modi?er. Multicore often does not produce the expected performance gain due to synchronization problems; practical alternative multithreading programming approaches, such as lock-free algorithms, will be presented to help developers maximize the performance potential of multicore processors.


Download: PDF (170 kB)

 


Design and Implementation of Real-Time Transactional Memory (Martin Schoeberl and Peter Hilber)
In Proceedings of the 20th International Conference on Field Programmable Logic and Applications (FPL 2010), Milano, Italy, August 2010. IEEE Computer Society.

 

Abstract:

Transactional memory is a promising, optimistic synchronization mechanism for chip-multiprocessor systems.
The simplicity of atomic sections, instead of using explicit locks, is also appealing for real-time systems. In this paper
an implementation of real-time transactional memory (RTTM) in the context of a real-time Java chip-multiprocessor (CMP) is
presented. To provide a predictable and analyzable solution of transactional memory, the transaction buffer is organized fully
associative. Evaluation in an FPGA shows that an associativity of up to 64-way is possible without degrading the overall system
performance. The paper presents synthesis results for different RTTM configurations and different number of processor cores
in the CMP system. A CMP system with up to 8 processor cores with RTTM support is feasible in an Altera Cyclone-II FPGA.

 

Download: PDF , doi

 


Concurrent, Parallel, Real-Time Garbage-Collection (Dr. Fridtjof Siebert)
International Symposium on Memory Management (ISMM), Toronto, 5-6 June 2010

Abstract:
With the current developments in CPU implementations, it becomes obvious that ever more parallel multicore systems will be used even in embedded controllers that require real-time guarantees. When garbage collection is used in these systems, parallel and concurrent garbage collection brings important performance advantages in the average case. In a real-time system, however, guarantees on the GC’s performance in the worst case are required.

This paper explains how the single-CPU real-time GC of the Java implementation JamaicaVM was changed to make it a hard real-time garbage collector that is parallel and concurrent. Parallel means that an arbitrary number of CPUs may perform GC work in parallel, while concurrent means that the GC work can be performed concurrently to the application code without preempting the application. In addition, the single units of work that this garbage collector has to perform are very small and uniform and the total amount of GC work is bounded by a function of the heap size, such that it becomes possible for any application that has a bounded amount of reachable memory to run the GC work such
that suf?cient GC progress can be ensured for the application never to run out of heap space.

 

Download: PDF (170 kB)

 


Challenges for the Real­Time Software Development for Multicore Systems (Dr. Fridtjof Siebert)
MultiCoreExpo 2010, San Jose, 27.-29. April 2010


Abstract:
Multicore systems introduce new challenges for the developers of embedded systems. Problems and techniques to avoid them will be presented. Mechanisms present in programming languages such as C/C++ and Java will be compared. Examples include the importance of the memory model provided by the language implementation and how it may cause false program behaviour. The effects of compiler optimization, atomic operations, and the volatile modifier will be explained. The use of multicore systems may have surprising negative effects on performance. How these effects can be avoided by avoiding synchronization or using lock-free algorithms will be presented. The implications that come from real-time requirements in embedded systems on these techniques will be taken into special account.

Download Slides: PDF (1.0 Mb)

 


Multicore Systems -- Challenges \\for the Real-Time Software Developer (Dr. Fridtjof Siebert)
ERTS² 2010 , Toulouse, 19.-20. May 2010

Abstract:
Multicore systems have become the norm for desktop computer systems. The percentage of multicore systems in the embedded domain is still
marginal, but growing at an incredible pace such that multicore will become the norm in the embedded area as well. However, embedded systems have additional requirements with respect to safety, reliability, and real-time behaviour. The use of parallel multicore systems introduces new challenges to the embedded systems developer who has to fulfil these requirements when developing new software or porting existing code to multicore systems.

This paper gives an overview over typical problems that arise with the use of multicore systems. Unfortunately, a multicore system in many cases does not behave the same as a single core system with multithreading. These differences may result in severe program errors. A good understanding of the sources of these errors is required.

It will be explained how problems on multicore systems can be avoided. Mechanisms present in different programming languages such as C/C++ and Java will be presented. An important role is played by the memory model provided by the language implementation and how this may manifest in false program behaviour. The memory model permits different compiler and CPU optimisation on atomic and non-atomic operations, and allow different semantics of the volatile modifier.

The use of multicore systems may have very surprising negative effects on performance. The paper describes these effects and how they can be
avoided by avoiding synchronization or using lock-free algorithms. The implications that come from the real-time requirements on these techniques will be taken into special account.

Download: PDF (160 kB)

 


WCET driven design space exploration of an object cache (Benedikt Huber, Wolfgang Puffitsch, and Martin Schoeberl)
In Proceedings of the 8th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2010), pages 26–35, New York, NY, USA, 2010. ACM.

 

Abstract:

In order to guarantee that real-time systems meet their timing specification, static execution time bounds need to be calculated. Not considering execution time predictability led to architectures which perform well in the average case, but require very pessimistic assumptions when bounding the worst-case execution time (WCET).

Computer architecture design is driven by simulations of standard benchmarks estimating the expected average case performance. The design decisions derived from this design methodology do not necessarily result in a WCET analysis-friendly design. Aiming for a time-predictable computer architecture, we propose to employ WCET analysis techniques for the design space exploration of processor architectures. We exemplify this approach by a WCET driven design of a cache for heap allocated objects.

Depending on the main memory properties (latency and bandwidth), different cache organizations result in the lowest WCET. The evaluation reveals that for certain cache configurations, the analyzed hit rate is comparable to the average case hit rate obtained by measurements. We believe that an early architecture exploration by means of static timing analysis techniques helps to identify configurations suitable for hard real-time systems.

 

Download: PDF , doi

 


Cyclic executive for safety-critical Java on chip-multiprocessors (Anders P. Ravn and Martin Schoeberl)
In Proceedings of the 8th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2010), pages 63–69, New York, NY, USA, 2010. ACM.

 

Abstract:

Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well-known that finding optimal schedules is a computationally hard problem. In this paper, we present a solution, that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors we have implemented it in the context of safety-critical Java on a Java processor.

 

Download: PDF , doi

 


Worst-case abalysis of heap allocations (Wolfgang Puffitsch, Benedikt Huber, and Martin Schoeberl)
In Proceedings of the 4th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2010), 2010.

 

Abstract:

In object oriented languages, dynamic memory allocation is a fundamental concept. When using such a language in hard real-time
systems, it becomes important to bound both the worst-case execution time and the worst-case memory consumption. In this paper, we present
an analysis to determine the worst-case heap allocations of tasks. The analysis builds upon techniques that are well established for worst-case
execution time analysis. The di erence is that the cost function is not the execution time of instructions in clock cycles, but the allocation in
bytes. In contrast to worst-case execution time analysis, worst-case heap allocation analysis is not processor dependent. However, the cost function
depends on the object layout of the runtime system. The analysis is evaluated with several real-time benchmarks to establish the usefulness
of the analysis, and to compare the memory consumption of di erent object layouts.

 

Download: PDF

 


A locality model for the real-time specification for Java (Malik, Abdul Haseeb, Wellings, Andy, Chang, Yang)
Proceedings of the 8th International Workshop on Java Technologies for Real-Time and Embedded Systems 2010 (JTRES 2010)

Abstract:
The memory architecture of non-uniform memory access (NUMA) systems cause applications to experience variable delays when accessing the main memory. The Real-Time Specification for Java assumes that all memory is uniformly accessed and provides limited support to control the allocation policies of threads and objects. As a result, programmers are unable to predict the behaviour of applications running on NUMA systems. This paper proposes a framework which gives visibility and more control to the programmers over the allocation policies of threads and objects on NUMA systems. A prototype implementation running on top of jRate and Linux has been evaluated on a 16 processor cc-NUMA platform by using the Sieve of Eratosthenes algorithm. Our initial results show that a 4 fold improvement in performance can be obtained by giving the programmer control over thread and object placement.

 

Download: PDF , bibtex

 


Asynchronous event handling and safety critical Java (Wellings, Andy, Kim, MinSeong)
Proceedings of the 8th International Workshop on Java Technologies for Real-Time and Embedded Systems 2010 (JTRES 2010)

Abstract:
Over the last few years, JSR 302 has been developing a subset of Java augmented by the RTSJ for use in safety critical systems. The concurrency model supported by Safety Critical Java (SCJ) relies, almost exclusively, on an event-based model rather than a thread-based model. This paper reviews the advantages and disadvantages of the two models and gives the pragmatic reasons why SCJ has adopted the former model. It argues that by basing the SCJ classes on the RTSJ's BoundAsyncEvent class, some inconsistencies exist between the SCJ and the RTSJ models. Furthermore, some of the optimization that are possible when mapping handlers to server threads are inhibited, even though the programming restrictions necessary for these optimization are imposed by the SCJ specification. A revised model is presented that has a slightly more complicated API but is more consistent with the RTSJ and does allow the optimizations. However, there is a resulting increase in the necessary run-time support, particularly for multiprocessor implementations.

Download: PDF , bibtex

 


Improved Schedulability Analysis for Multiprocessor Systems with Resource Sharing (Y. Chang, R.I. Davis, A. Wellings)
Accepted for 18th International Conference on Real-Time and Network Systems, 2010 (RTNS 2010)

Abstract:
This report presents our recent efforts to close the gap between the state-of-the-art homogeneous (or identical) multiprocessor and uniprocessor schedulability analyses in the context of resource sharing. Although many multiprocessor resource sharing protocols have been proposed, their impact on the schedulability of real-time tasks is largely ignored in most existing literature. Recently, work has been done to integrate queue locks (FIFO-queue-based non-preemptive spin locks) with multiprocessor schedulability analysis but the techniques used introduce a substantial amount of pessimism, some of which, as explained in this report, can be easily eliminated. For global fixed task priority preemptive multiprocessor systems, this pessimism impacts low priority tasks, greatly reducing the number of tasksets that can be recognised as schedulable. We develop a new schedulability analysis lp-CDW to target this issue specifically. By combing lp-CDW with existing techniques, we significantly increase the number of tasksets that can be recognised as schedulable.

Download: PDF , bibtex

 


Investigating average versus worst-case timing behavior of data caches and data scratchpads (Jack Whitham, Neil Audsley)
Proc. ECRTS 2010

Abstract:
This paper shows that a program using a time predictable memory system for data storage can achieve a similar worst-case execution time (WCET) to the average-case execution time (ACET) using a conventional heuristic-based memory system including a data cache. This result is useful within any embedded system where time-predictability and performance are both important, particularly hard real-time systems carrying out intensive data processing activities. It is a counter-example to the conventional wisdom that time-predictable means “slow” in comparison to ACET-focused heuristics. To carry out the investigation, 36 “memory access models” are derived from benchmark programs and assumed to be representative of typical code. The models generate LOAD/STORE instructions to exercise a data cache or scratchpad memory management unit (SMMU). The ACET is determined for the data cache and the WCET is determined for the SMMU. After improvements are applied, results show that the SMMU WCET is within 5% of the data cache ACET for 34 models. In 16 of 36 cases, the SMMU WCET is better than the data cache ACET.

Download: PDF , bibtex

 


Studying the Applicability of the Scratchpad Memory Management Unit (Jack Whitham, Neil Audsley)
Proc. RTAS 2010


Abstract:
A combination of a scratchpad and scratchpad memory management unit (SMMU) has been proposed as a way to implement fast and time-predictable memory access operations in programs that use dynamic data structures. A memory access operation is time-predictable if its execution time is known or bounded–this is important within a hard real-time task so that the worst-case execution time (WCET) can be determined. However, the requirement for time-predictability does not remove the conventional requirement for efficiency: operations must be serviced as quickly as possible under worst-case conditions. This paper studies the capabilities of the SMMU when applied to a number of benchmark programs. A new allocation algorithm is proposed to dynamically manage the scratchpad space. In many cases, the SMMU vastly reduces the number of accesses to dynamic data structures stored in external memory along the worst-case execution path (WCEP). Across all the benchmarks, an average of 47% of accesses are rerouted to scratchpad, with nearly 100% for some programs. In previous scratchpad-based work, time-predictability could only be assured for these operations using external memory. The paper also examines situations in which the SMMU does not perform so well, and discusses how these could be addressed.

Download: PDF , bibtex

 


RTTM: Real-time transactional memory (Martin Schoeberl, Florian Brandner, and Jan Vitek)
In Proceedings of the 25th ACM Symposium on Applied Computing, Sierre, Switzerland, March 2010. ACM Press. (ACM 2010)

 

Abstract:

Hardware transactional memory is a promising synchronization technology for chip-multiprocessors. It simplifies programming of concurrent applications and allows for higher concurrency than lock based synchronization. Standard transactional memory is optimized for average case throughput, but for real-time systems we are interested in worst-case execution times. We propose real-time transactional memory (RTTM) as a time-predictable synchronization solution for chip-multiprocessors in real-time systems. We define the hardware for time-predictable transactions and provide a bound for the maximum transaction retries. The proposed RTTM is evaluated with a simulation of a Java chip-multiprocessor.

 

Download: PDF , doi

 


A Survey of Hard Real-Time Scheduling Algorithms and Schedulability Analysis Techniques for Multiprocessor Systems (R.I. Davis, A. Burns)
Accepted for ACM Computer Surveys

Abstract:
This survey covers hard real-time scheduling algorithms and schedulability analysis techniques for homogeneous multiprocessor systems. It reviews the key results in this field from its origins in the late 1960's to the latest research published in late 2009. The survey outlines fundamental results about multiprocessor real-time scheduling that hold independent of the scheduling algorithms employed. It provides a taxonomy of the different scheduling methods, and considers the various performance metrics that can be used for comparison purposes. A detailed review is provided covering partitioned, global, and hybrid scheduling algorithms, approaches to resource sharing, and the latest results from empirical investigations. The survey identifies open issues, key research challenges and likely productive research directions.

Download: PDF , bibtex

 


Unit testing for multi-threaded Java programs (Gábor Szeder)
PADTAD'09: Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging 2009

Abstract:
Unit testing focuses on testing the smallest units that comprise a software system and allows developers to detect bugs early in the development process.  However, unit testing is unsuitable for detecting concurrency bugs in multi-threaded programs. Most concurrency bug detection tools work only with full programs. Hence, they can only be used at a later phase of development, thus postponing the detection of concurrency bugs.

This paper presents a unit testing framework for detecting concurrency bugs in multi-threaded Java programs. Concurrency bugs are uncovered by applying model checking techniques to explore all possible execution paths of concurrent unit tests. We have implemented our approach based on the JUnit unit testing framework and the Java PathFinder software model checker. Concurrent unit tests can be integrated seamlessly into existing unit test suites. Our approach does not report false positives.

Download: PDF , bibtex

 


Using Hardware Methods to Improve Time-predictable Performance in Real-time Java Systems (Jack Whitham, Neil Audsley, Martin Schoeberl)
Proc. JTRES 2009

Abstract:
This paper describes hardware methods, a lightweight and platform-independent scheme for linking real-time Java code to co-processors implemented using a hardware description language (HDL). Intended for use in embedded systems, hardware methods have similar semantics to the native methods used to interface Java code to legacy C/C++ software, but are also time-predictable, facilitating accurate worst-case execution time (WCET) analysis. By reference to several examples, the paper demonstrates the applicability of hardware methods and shows that they can (1) reduce the WCET of embedded real-time Java, and (2) improve the quality of WCET estimates in the presence of infeasible paths.

Download: PDF , bibtex

 


Enhancing the platform independence of the real-time specification for Java (A.J. Wellings, Y. Chang, T. Richardson)
Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems 2009 (JTRES 2009)

Abstract:
The Real-Time Specification for Java (RTSJ) extends Java's support for platform-independence into the realms of real-time systems. It supports an environment that facilitates on-line feasibility analysis. However, an RTSJ program is normally run on an execution platform that might be shared with other (possibly non Java) applications. This paper explores how the notion of service contracts can be used to enhance the platform-independence of RTSJ applications. It also considers the role of real-time application components within an RTSJ environment.

Download: PDF , bibtex

 


Priority Assignment for Global Fixed Priority Pre-emptive Scheduling in Multiprocessor Real-Time Systems (R.I.Davis, A.Burns)
Real-Time Systems Symposium (RTSS) 2009

Abstract:
This paper addresses the problem of priority assignment in multiprocessor real-time systems using global fixed task-priority pre-emptive scheduling. In this paper, we prove that Audsley's Optimal Priority Assignment (OPA) algorithm, originally devised for uniprocessor scheduling, is applicable to the multiprocessor case, provided that three conditions hold with respect to the schedulability tests used. Our empirical investigations show that the combination of optimal priority assignment policy and a simple compatible schedulability test is highly effective, in terms of the number of tasksets deemed to be schedulable. We also examine the performance of heuristic priority assignment policies such as Deadline Monotonic, and an extension of the TkC priority assignment policy called DkC that can be used with any schedulability test. Here we find that Deadline Monotonic priority assignment has relatively poor performance in the multiprocessor case, while DkC priority assignment is highly effective.

Download: N/A , bibtex

 


Implementing Time-predictable Load and Store Operations (Jack Whitham, Neil Audsley)

Proc. EMSOFT 2009


Abstract:
Scratchpads have been widely proposed as an alternative to caches for embedded systems. Advantages of scratchpads include reduced energy consumption in comparison to a cache and access latencies that are independent of the preceding memory access pattern. The latter property makes memory accesses time-predictable, which is useful for hard real-time tasks as the worst-case execution time (WCET) must be safely estimated in order to check that the system will meet timing requirements. However, data must be explicitly moved between scratchpad and external memory as a task executes in order to make best use of the limited scratchpad space. When dynamic data is moved, issues such as pointer aliasing and pointer invalidation become problematic. Previous work has proposed solutions that are not suitable for hard real-time tasks because memory accesses are not time-predictable. This paper proposes the scratchpad memory management unit (SMMU) as an enhancement to scratchpad technology. The SMMU implements an alternative solution to the pointer aliasing and pointer invalidation problems which (1) does not require whole-program pointer analysis and (2) makes every memory access operation time-predictable. This allows WCET analysis to be applied to hard-real time tasks which use a scratchpad and dynamic data, but results are also applicable in the wider context of minimizing energy consumption or average execution time. Experiments using C software show that the combination of an SMMU and scratchpad compares favorably with the best and worst case performance of a conventional data cache.

Download: PDF , bibtex

 


JEOPARD - Java Enviroment for Parallel Real-Time Development  (Invited Paper) (Fridtjof Siebert)

ISORC 2009, Tokyo, Japan, 17-20 March 2009


Abstract:

Multicore systems have become standard for desktop computers today. Current operating systems and software development tools provide straightforward means to use the additional computing power. However, a more fundamental change in the design and development of software is required to fully exploit the power of multicore systems. Furthermore, the fast growing market of embedded systems is currently largely unaffected by the introduction of multicore systems. This will change quickly in the future, which will mean that there will be a demand on ef?cient development of reliable embedded software that can give real-time guarantees and exploit the available power on multicore systems. The JEOPARD project addresses this demand by devel- oping Java software tools to exploit multicore power while ensuring correctness and predictable timing. This paper gives an overview of the JEOPARD project and focuses on key technical issues such as real-time scheduling and real-time garbage collection on multi-core systems.

Download: PDF (349kB)

 


A hardware abstraction layer in Java (Martin Schoeberl, Stephan Korsholm, Tomas Kalibera, and Anders P. Ravn)
Trans. on Embedded Computing Sys., 42 pages, accepted 2009.

 

Download: PDF

 


Non-blocking real-time garbage collection (Martin Schoeberl and Wolfgang Puffitsch)

Trans. on Embedded Computing Sys., 26 pages,accepted, 2009.

 


A real-time Javachip-multiprocessor (Christof Pitter and Martin Schoeberl)

Trans. on Embedded Computing Sys., 31 pages,accepted, 2009

 


Towards time-predictable data caches for chip-multiprocessors (Martin Schoeberl, Wolfgang Puffitsch, and Benedikt Huber)
In Proceedings of the Seventh IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS 2009), 2009. Springer, November 2009.

 

Abstract:

Future embedded systems are expected to use chip-multiprocessors to provide the execution power for increasingly demanding applications. Multiprocessors increase the pressure on the memory bandwidth and processor local caching is mandatory. However, data caches are known to be very hard to integrate into the worst-case execution time (WCET) analysis. We tackle this issue from the computer architecture side: provide a data cache organization that enables tight WCET analysis. Similar to the cache splitting between instruction and data, we argue to split the data cache for different data areas. In this paper we show cache simulation results for the split-cache organization, propose the modularization of the data cache analysis for the different data areas, and evaluate the implementation costs in a prototype chip-multiprocessor system.

 

Download: PDF

 


A single-path chip-multiprocessor system (Martin Schoeberl, Peter Puschner, and Raimund Kirner)
In Proceedings of the Seventh IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS 2009), 2009. Springer, November 2009.

 

Abstract:

In this paper we explore the combination of a time-predictable chipmultiprocessor system with the single-path programming paradigm. Time-sliced
arbitration of the main memory access provides time-predictable memory load and store instructions. Single-path programming avoids control flow dependent timing variations. To keep the execution time of tasks constant, even in the case of shared memory access of several processor cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit.

 

Download: PDF

 


Using hardware methods to improve timepredictable performance in real-time Java systems. (Jack Whitham, Neil Audsley, and Martin Schoeberl)
In Proceedings of the 7th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2009), Madrid, Spain, September 2009. ACM Press. 

 

Abstract:

This paper describes hardware methods, a lightweight and platform-independent scheme for linking real-time Java code to co-processors implemented using a hardware description language (HDL). Intended for use in embedded systems, hardware methods have similar semantics to the native methods used to interface Java code to legacy C/C++ software, but are also time-predictable, facilitating accurate worstcase execution time (WCET) analysis. By reference to several examples, the paper demonstrates the applicability of hardware methods and shows that they can (1) reduce the WCET of  embedded real-time Java, and (2) improve the quality of WCET estimates in the presence of infeasible paths.

 

Download: PDF

 


Is chip-multiprocessing the end of real-time scheduling? (Martin Schoeberl and Peter Puschner)
In Proceedings of the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis, Dublin, Ireland, July 2009. OCG.

 

Abstract:

Chip-multiprocessing is considered the future path for performance enhancements in computer architecture. Eight processor cores on a single chip are state-of-the art and several hundreds of cores on a single die are expected in the near future. General purpose computing is facing the challenge how
to use the many cores. However, in embedded real-time systems thread-level parallelism is naturally used. In this paper we assume a system where we can dedicate a single core for each thread. In that case classic real-time scheduling disappears. However, the threads, running on their dedicated
core, still compete for a shared resource, the main memory. A time-sliced memory arbiter is used to avoid timing influences between threads. The schedule of the arbiter is integrated into the worst-case execution time (WCET) analysis. The WCET results are used as a feedback to regenerate the arbiter schedule. Therefore, we schedule memory access instead of CPU time.

 

Download: PDF, bibtex

 


Time-predictable cache organization (Martin Schoeberl)

In Proceedings of the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009), Tokyo, Japan, March 2009. IEEE Computer Society. Invited paper.

 

Abstract:

Caches are a mandatory feature of current processors to deliver instructions and data to a fast processor pipeline. However, standard cache organizations are designed to increase the average case performance. They are hard to model for worst-case execution time (WCET) analysis.
Unknown abstract cache states during the analysis result in conservative WCET bounds. Therefore, we propose to adapt the cache organization to simplify the analysis. The data cache is split into several independent caches for the stack, static data, constants, and heap allocated data.

 

Download: PDF, bibtex

 


Thread-local scope caching for real-time Java (Andy Wellings and Martin Schoeberl)

In Proceedings of the 12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing ISORC 2009, Tokyo, Japan, 17-20 March 2009 . IEEE Computer Society.

 

Abstract:

There is increasing convergence between the fields of parallel and embedded computing. The demand for more functionality in embedded devices means that complex multicore architectures will be used. In order to promote scalability and obtain predictability, on-chip processor-local private
memory subsystems will be used. Whilst at the hardware level this is technical feasible, the more pressing problem is how such memory is presented to the programmer and how its local access is policed. In this paper we illustrate how Java augmented by the Real-time Specification for Java can be used to present the abstraction of a thread-local scoped memory area. We show how to enforce access to the memory area to a single realtime thread. We implement the model on the JOP multiprocessor system and report on our experiences.

 

Download: PDF, bibtex

 


Limits of Parallel Marking Garbage Collection (Fridtjof Siebert)

ISMM'08 - International Symposium on Memory Management - 7-8 June 2008, Tucson, Arizona, USA

 

Abstract:
More and more, parallel multicore systems will be used even in low-end devices such as embedded controllers that require realtime guarantees. When garbage collection is used in these systems, parallel or concurrent garbage collection brings important performance advantages. In the context of realtime systems, it has to be shown that a parallel garbage collector implementation not only performs well in most cases, but guarantees on its performance in the worst case are required.

This paper analyses the difficulties a parallel mark-and-sweep garbage collector faces during a parallel mark phase. The performance of such a garbage collector degrades when only some of the available processors can perform scanning work in the mark phase. Whenever the $grey$ set contains fewer elements than the number of available processors, some processors may be stalled waiting for new objects to be added to the $grey$ set. This paper gives an upper bound for the number of stalls that may occur as a function of simple properties of the memory graph.

This upper bound is then determined for the Java applications that are part of the SPECjvm98 benchmark suite and the theoretical worst-case scalability of a parallel mark phase is analysed. The presented approach is then applied to a Java virtual machine that has uniform mark steps, which at first results in poor worst-case scalability. A small change in the implementation is then proposed and analysed to achieve good scalability even in the worst case.

 

Download: PDF (231kB)

 


Performance Evaluation of a Java Chip-Multiprocessor (Christof Pitter and Martin Schoeberl)

Proceedings of the 3rd IEEE Symposium on Industrial Embedded Systems (SIES 2008)
Date: Jun. 2008

 

Abstract:

Chip multiprocessing design is an emerging trend for embedded systems. In this paper, we introduce a Java multiprocessor system-on-chip called JopCMP. It is a symmetric shared-memory multiprocessor and consists of up to 8 Java Optimized Processor (JOP) cores, an arbitration control device,
and a global shared memory. All components are interconnected with a system-on-chip bus. This paper focuses on the performance evaluation of different hardware configurations of the multicore system. Therefore, we vary the instruction cache sizes, the number of processors and the memory bandwidth. Within our experiments, we measure the performance by running three benchmarks on real hardware: an embedded application from industry, a computationally intensive matrix multiplication and a synthetic benchmark that continuously accesses a shared data structure. Two different
field-programmable gate arrays are used for the presented experiments. Our results illustrate the promises and limits of the proposed multiprocessor architecture concerning synchronization, memory bandwidth and caching. Furthermore, we compare the performance and size of JopCMP with a complex Java processor.

 

Download: PDF, bibtex

 


Non-blocking Object Copy for Real-Time Garbage Collection (Martin Schoeberl and Wolfgang Puffitsch)

Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008)
Date: September 2008

 

Abstract:

A real-time garbage collector has to fulfill two conflicting properties: avoid heap fragmentation and provide short blocking time. The heap needs to be compacted to avoid probably unbounded fragmentation. During compaction all objects are copied; copying is usually performed atomically to avoid interference with mutator threads. Copying of large objects and especially large arrays introduces long blocking times that are unacceptable for real-time
systems. In this paper an interruptible copy unit is presented that implements non-blocking object copy. The unit intercepts object and array field
access and redirects the access either to the source or destination part of the moving object. The unit can be interrupted after a single word move. The resulting maximum blocking time is the time for a word read and write. We have implemented the proposed non-blocking copy unit in the Java processor JOP and are able to high priority real-time tasks at 10 kHz parallel to the garbage collection task on a 100 MHz system.

 

Download: PDF

 


Non-Blocking Root Scanning for Real-Time Garbage Collection (Wolfgang Puffitsch and Martin Schoeberl)

Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008)
Date: September 2008

 

Abstract:

Root scanning is a well known source of blocking times due to garbage collection. In this paper, we show that root scanning only needs to be atomic with respect to the thread whose stack is scanned. We propose two solutions to utilize this fact: (a) block only the thread whose stack is scanned, or (b) shift the responsibility for root scanning from the garbage collector to the application threads. The latter solution eliminates blocking due to root scanning completely. Furthermore, we show that a snapshot-atbeginning write barrier is sufficient to ensure the consistency of the root set even if local root sets are scanned independently of each other. The impact of solution (b) on the execution time of a garbage collector is shown for two different variants of the root scanning algorithm. Finally, we evaluate the resulting real-time garbage collector in a real system to confirm our theoretical findings.

 

Download: PDF


Decoupled Root Scanning in Multi-Processor Systems (Wolfgang Puffitsch)

Proceedings of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '08)
Date: October 2008

 
Copyright © 2017 Jeopard - Java Environment for Parallel Realtime Development. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.