Single Chip Parallel Computers

To overcome the stalled growth in the processing capacity of single microprocessors, single chip parallel computers were introduced, known as chip multiprocessors or multicore processors. The idea behind this architecture is to place two or more processor cores on a single chip, giving an organization similar to a shared memory multiprocessor. The number of cores that can be placed on a chip, and hence the number of instructions that can be processed per second, can keep increasing according to Moore's law [15] even without increasing the clock speed. If higher performance is required, more processors can be added under this architecture.
Although parallel computing structures have been studied for more than 40 years, few widely appreciated programs have been written for this architecture. Writing programs for parallel architectures is much harder than for sequential ones: coding, debugging, and testing parallel programs is difficult because there are no well defined standards for debugging and testing them [3].

Database Systems and Transactions

In [17], Herb Sutter and James Larus pointed out that "the concurrency revolution is primarily a software revolution. The problem is not building multicore hardware, but programming it in a way that lets mainstream applications benefit from the continued exponential growth in CPU performance." Writing programs for parallel systems has always been a difficult task; the database community, on the other hand, has been using concurrency successfully for decades.
Databases work successfully on both parallel and sequential architectures, and all concurrency control mechanisms are handled implicitly by the database. At the core of database systems are transactions. A transaction is a set of instructions executed as a whole, used to access and modify concurrent objects. A transaction succeeds when it changes some state in the underlying database, and is unsuccessful, or aborted, when the state of the database remains unchanged. Transactions are implemented by a database system or by a transaction monitor, which hides the complex details from the user and provides a simple interface to communicate with [3].
Since databases were handling transactions well on both sequential and parallel architectures and producing satisfactory results, using transactions in memory was considered a good fit for parallel architectures; the basic idea of transactional memory is thus taken from database transactions. Database transactions and transactional memory differ in some respects because of their different design requirements: database transactions are saved on disk and can afford to take more time to execute, whereas transactional memory operates in main memory, where a transaction must complete in a very short time. A database transaction consists of a set of instructions that execute indivisibly and appear instantaneous. Database transactions have four basic properties, Atomicity, Consistency, Isolation and Durability, collectively called ACID.
Atomicity: Atomicity means that either all actions complete successfully or none of them takes effect. If a transaction fails partway through for some reason, it fails completely and needs to be restarted.
Consistency: Consistency means a transition from one stable state to another stable state. When a transaction takes place in a database or memory, that database or memory is only considered consistent once the transaction has either committed or aborted cleanly.
Isolation: Isolation means a transaction produces correct results without interference from other transactions. Transactions running in parallel have no effect on each other's execution, and a successful transaction neither affects nor is affected by concurrent transactions while it executes.
Durability: The final property is durability, and it is specific to database transactions. Durability means that once changes have been made in a database they are made permanent. This property is only needed in databases, because memory transactions become obsolete as soon as the system shuts down.

Transactions vs. Locks

Figure 1, taken from [1], gives an idea of how performance improves when transactions are compared with coarse-grained and fine-grained locking.
Figure 1: Performance of Transactions vs. Locks [1]
The figure compares three different versions of a hash map, measuring the time each version takes to complete a fixed set of insert, update and delete operations on a 16-way Symmetric Multiprocessor (SMP) machine. As the figure shows, increasing the number of processors does not help the coarse-grained version, while the fine-grained and transactional versions perform better. Coarse-grained locking is therefore not scalable, whereas fine-grained locking and transactions are. According to Adl-Tabatabai et al. in [1], transactions give the same results as fine-grained locking with less programming effort.
Selecting an appropriate locking strategy for a given problem is hard and time consuming, and it becomes even more difficult because of the additional challenges of lock-based programming presented in [18]. Due to the problems and drawbacks listed below, lock-based parallel programming is not a suitable paradigm for an average programmer.
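To make the comparison concrete, the sketch below (not taken from [1]) contrasts the three synchronization styles on a simplified shared structure, a bucketed counter array standing in for the hash map of Figure 1. The transactional version uses GCC's __transaction_atomic extension (compiled with -fgnu-tm) as one concrete notation; the names and sizes are illustrative only.

// Sketch, not from [1]: coarse-grained locking, fine-grained locking, and a
// transaction, applied to a simplified bucketed counter (a stand-in for the
// hash map of Figure 1). The transactional version assumes GCC with -fgnu-tm.
#include <mutex>

constexpr int kBuckets = 64;

struct CoarseGrained {
    long buckets[kBuckets] = {};
    std::mutex global_lock;                      // one lock serializes every operation
    void add(int key) {
        std::lock_guard<std::mutex> g(global_lock);
        buckets[key % kBuckets] += 1;
    }
};

struct FineGrained {
    long buckets[kBuckets] = {};
    std::mutex bucket_lock[kBuckets];            // one lock per bucket allows parallelism
    void add(int key) {
        std::lock_guard<std::mutex> g(bucket_lock[key % kBuckets]);
        buckets[key % kBuckets] += 1;
    }
};

struct Transactional {
    long buckets[kBuckets] = {};
    void add(int key) {
        __transaction_atomic {                   // the TM runtime detects conflicts,
            buckets[key % kBuckets] += 1;        // so no locking granularity is chosen
        }
    }
};

The point of Figure 1 is that the coarse-grained version stops scaling as processors are added, while the fine-grained and transactional versions scale, and only the transactional one achieves this without the programmer choosing a locking granularity.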
• Deadlock occurs when two or more threads each acquire locks that other threads need in order to proceed, creating a circular dependence that cannot be resolved. Since each thread waits for a lock held by another, none of them can make progress, and the application hangs. Deadlock can easily arise if fine-grained locking is used and no strict order of lock acquisition is enforced. Where such ordering is not sufficient, deadlock detection and resolution schemes can act as a backup, but these schemes are difficult to implement and are themselves vulnerable to livelock, in which threads repeatedly interfere with each other and progress diminishes (a minimal sketch of the lock-ordering problem follows this list).
• Convoying occurs when a thread holding a lock is descheduled. While it sleeps, other threads run until they need that lock, so many threads end up waiting for the same lock. When the lock is released, all waiting threads contend for it, causing excessive context switching. Unlike a deadlock, the application continues to make progress, but at a much slower pace.
• Priority inversion occurs when a lower-priority thread holds a lock required by a higher-priority thread. The high-priority thread must then stop until the lock is released by the lower-priority thread, effectively and temporarily demoting it to the lower thread's priority level. If a medium-priority thread is also present, it may further delay both the high- and low-priority threads, inverting the medium and high priorities. Priority inversion is a serious problem for real-time systems, because a blocked high-priority thread may breach timing and response guarantees. In general purpose computing, high-priority threads are quite often used for user interaction tasks rather than critical ones, so the priority reduction mainly degrades application responsiveness.
• Lock-based code is not composable: it is practically impossible to combine lock-protected atomic operations into larger operations that remain atomic.
• Finally, lock-based code is susceptible to faults and failures, i.e., it lacks fault tolerance. If a single thread holding a lock fails, all other threads requiring that lock eventually stop making progress. Such failures become more likely as the number of processors in parallel machines grows.
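As a minimal illustration of the first problem above, the following sketch (with illustrative names) shows how two threads that acquire the same two mutexes in opposite orders can deadlock:

// Sketch: the classic lock-ordering deadlock. Each thread may end up holding one
// mutex while waiting forever for the other.
#include <mutex>
#include <thread>

std::mutex lock_a, lock_b;

void worker1() {
    std::lock_guard<std::mutex> ga(lock_a);   // holds A ...
    std::lock_guard<std::mutex> gb(lock_b);   // ... then waits for B
}

void worker2() {
    std::lock_guard<std::mutex> gb(lock_b);   // holds B ...
    std::lock_guard<std::mutex> ga(lock_a);   // ... then waits for A -> possible deadlock
}

int main() {
    std::thread t1(worker1), t2(worker2);
    t1.join();
    t2.join();                                // may never return
}

The usual remedies, a fixed global acquisition order or std::scoped_lock(lock_a, lock_b), are exactly the kind of manual discipline that transactional memory is meant to remove.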

Transactional Memory

Transactional memory is a lock-free synchronization mechanism defined for multiprocessor architectures and an alternative to lock-based programming. It gives programmers an easy way to perform read-modify-write operations on independently chosen words of memory, and it makes parallel programming easier by allowing programmers to enclose multiple statements that access shared memory in a transaction. Isolation is the primary task of transactions; however, failure atomicity and consistency are also important. In a TM system, failure atomicity provides automatic recovery from errors, but if a transaction is left in an inconsistent state a program cannot produce consistent and correct results. Since a failed transaction can leave results in an inconsistent state, there must be a proper mechanism to revert changes to a previous consistent state.
Many proposed TM systems exist, ranging from full hardware solutions (HTM) [19-22] to full software approaches (STM) [23-26]. Hybrid TM (HyTM) [6, 27-29] is a further approach that combines the best of both: the performance of HTM with the virtualization, cost, and flexibility of STM.
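As a small sketch of this programming model (again using GCC's __transaction_atomic notation rather than any specific system discussed later in this thesis), a multi-word update can be enclosed in a transaction so that either both writes become visible or, on abort, neither does:

// Sketch: enclosing two shared-memory writes in one transaction. If the
// transaction aborts, neither write is visible, which is the failure atomicity
// described above. Assumes GCC with -fgnu-tm; the account array is illustrative.
long account[2] = {100, 0};

void transfer(int from, int to, long amount) {
    __transaction_atomic {
        account[from] -= amount;
        account[to]   += amount;
    }
}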

Hardware Transactional Memory

The idea of hardware transactional memory was introduced by Herlihy and Moss [4] in 1993. Hardware transactional memory (HTM) was first presented as a cache and cache-coherency mechanism to ease lock-free synchronization [30]. An HTM system must provide atomicity and isolation for application threads operating on shared data without sacrificing concurrency. It supports atomicity through architectural means [19] and offers strong isolation. It also provides outstanding performance with little overhead; however, it often lacks generality. Because the speculative updated state is kept in hardware, HTM is fast but suffers from resource limitations [31].
Modern HTMs are divided into those that support unbounded transactions and those that support large but bounded transactions; most of them concentrate on mechanisms for seamlessly enlarging the buffering available to transactions [3]. Bounded HTMs enforce limits on the transaction working set size, guaranteeing that transactions within that size will be able to commit. Best-effort HTMs implement such limits by leveraging the buffering already available in the L1 and L2 caches. Unbounded HTMs have been proposed more recently; in contrast to bounded HTMs, they allow a transaction to survive context switch events [3], but they are complex and costly to implement. Although STMs are attracting a great deal of attention these days, HTMs remain likely to prevail: they can be used with existing hardware products and offer an early prospect of gaining experience with actual TM systems and programming models.
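As one concrete example of a best-effort HTM available on existing hardware, the sketch below uses Intel's RTM intrinsics with a simple software fallback path. It assumes an x86 processor with TSX support and compilation with -mrtm; the flag-based fallback lock and the single hardware attempt are illustrative simplifications.

// Sketch: a best-effort hardware transaction (Intel RTM) with a software fallback.
// Real systems retry the hardware path several times and use a more refined
// fallback than this single flag lock.
#include <immintrin.h>
#include <atomic>

std::atomic<bool> fallback_locked{false};
long shared_counter = 0;

void increment() {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {                        // hardware transaction running
        if (fallback_locked.load(std::memory_order_relaxed))
            _xabort(0xff);                                  // a fallback holder is active
        shared_counter += 1;                                // speculative update in the cache
        _xend();                                            // commit in hardware
    } else {                                                // abort: capacity, conflict, ...
        while (fallback_locked.exchange(true, std::memory_order_acquire))
            ;                                               // spin on the fallback lock
        shared_counter += 1;
        fallback_locked.store(false, std::memory_order_release);
    }
}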

Software Transactional Memory

The idea of software transactional memory was introduced in 1995 by N. Shavit and D. Touitou [5]. Software transactional memory (STM) implements all TM mechanisms in software without imposing any hardware requirements, which gives STMs better flexibility and generality. An STM can be defined as a non-blocking synchronization mechanism in which sequential objects are automatically converted into concurrent objects. In STM, a transaction is a finite sequence of instructions that atomically modifies a set of concurrent objects [32]. The STM system supports atomicity through languages, compilers, and libraries [19].
Recent research in STM systems has focused mainly on realistic dynamic transactions, and the latest work has made them a useful tool for experimenting and prototyping. Since software is more flexible than hardware, new algorithms can be implemented and tried out easily, and STMs can build on features such as garbage collection that are already available in many languages [3]. In addition, STMs are a critical component of a number of hybrid TMs that leverage HTM hardware, so they provide a foundation for building more efficient HTMs. For these reasons, this thesis primarily considers STM systems.
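Word-based STM libraries such as TL2 and TinySTM expose this functionality through explicit calls that begin a transaction, read and write individual words through the runtime, and commit. The sketch below mimics that interface with hypothetical tx_* names; to stay self-contained its "runtime" is a degenerate one (a single global lock), whereas a real STM would log or buffer the accesses and detect conflicts instead.

// Sketch of the library-level STM interface, with a deliberately trivial runtime.
// The tx_* names are hypothetical, not the API of any specific STM system.
#include <mutex>

namespace toy_stm {
    std::mutex runtime_lock;                                  // stand-in for real conflict detection

    void tx_begin()                       { runtime_lock.lock(); }
    long tx_read(const long* addr)        { return *addr; }   // real STM: record in read set
    void tx_write(long* addr, long value) { *addr = value; }  // real STM: buffer or undo-log
    void tx_commit()                      { runtime_lock.unlock(); }
}

long account[2] = {100, 0};

// The same transfer as in the earlier atomic-block sketch, but every shared access
// is routed through the STM's read and write barriers.
void transfer(int from, int to, long amount) {
    toy_stm::tx_begin();
    toy_stm::tx_write(&account[from], toy_stm::tx_read(&account[from]) - amount);
    toy_stm::tx_write(&account[to],   toy_stm::tx_read(&account[to])   + amount);
    toy_stm::tx_commit();
}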
STM vs. database transactions: An STM does not need to preserve its transactions across a system crash the way a database does, and, as discussed by Felber et al. [33], its concurrency analysis is a sensitive and crucial issue that needs the programmer's full attention. The main differences are the following.
• Durability is a requirement for database transactions, but an STM system does not need to preserve its transactions across a crash; memory transactions therefore do not need durability the way database transactions do.
• In terms of programming languages, database transactions run as SQL statements, where each statement runs as a single transaction and different transactions cooperate to accomplish a task. In memory transactions, it is the programmer's responsibility to define the block of code that runs atomically.
• In terms of semantics, databases use serializability to protect their data from unexpected behavior. Serializability means that a concurrent execution of transactions is correct if it produces the same results as some execution in which the transactions run serially, without overlapping in time. Concurrency analysis is a sensitive and crucial issue that needs the programmer's full attention: because data can be accessed from both transactional and non-transactional code, any shortfall in concurrency analysis may lead to completely inconsistent results. Concurrent STM transactions may also suffer read-write conflicts that produce non-serializable results, which the STM runtime should prevent by implementing recoverability. A further problem is handling the conflict caused by a transaction reading between two updates of concurrent transactions that overwrite each other.
• Transformation of code into transactional code is also a challenge in STM. In databases, non-transactional code inherently runs as a transaction; in STM this is achieved either by separating transactional from non-transactional code or by dynamically categorizing their accesses to shared objects. Monitoring read and write accesses is crucial in the implementation of STM transactions, and distinguishing between these accesses is challenging, since even encapsulation is not sufficient to separate them. Finally, to gain optimal performance, STM transactions are designed to run on multi-core systems, unlike database transactions, and achieving that level of performance is yet another challenging task in STM.

Hybrid Transactional Memory

Hybrid Transactional Memory (HyTM) was introduced in 2006 by P. Damron et al. [6], who worked on a new approach that lets transactional memory run on existing systems. HyTM has the flavor of both HTM and STM, and can give good performance while remaining scalable: it uses HTM to obtain better performance for transactions that do not exceed the hardware limitations, and falls back to executing transactions in STM otherwise. When STMs are combined with HTMs in this way, they provide support for unbounded transactions without requiring complex hardware. In HyTM, small transactions run with the lower overhead of HTM, while larger transactions fall back onto an unbounded STM. This model of transaction handling is appealing because it offers the flexibility of adding new hardware with lower development and testing cost and decreased risk [27].

STM Design Alternatives

Design differences between STM systems can affect a system's programming model and performance [3]. In this section we review some distinctive STM design alternatives that have already been explored in the literature. Our purpose is to identify the impact of these design differences on system performance.

Transaction Granularity

Transaction granularity is the basic unit of storage over which an STM system detects conflicts [3]. Word-based and object-based are the two classes of transaction granularity in STMs. Detecting conflicts at the word level gives the highest accuracy, but it also carries higher communication and bookkeeping costs. In object-based STMs, conflicts are managed at object-level granularity, which suits an object-oriented language approach and is more understandable, easier to implement, and less expensive.
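The two granularities can be sketched as metadata layouts (sizes and names are illustrative; the word-based layout follows the per-stripe lock table used by systems such as TL2 and TinySTM, while the object-based layout attaches a header to each object):

// Sketch: conflict-detection metadata for the two granularities.
#include <atomic>
#include <cstddef>
#include <cstdint>

// Word-based: a global table of versioned locks, indexed by hashing the address of
// every word a transaction reads or writes (TL2/TinySTM style; 2^20 entries here).
constexpr std::size_t kTableSize = 1u << 20;
std::atomic<uint64_t> lock_table[kTableSize];

inline std::atomic<uint64_t>& lock_for(const void* addr) {
    return lock_table[(reinterpret_cast<std::uintptr_t>(addr) >> 3) % kTableSize];
}

// Object-based: each transactional object carries its own metadata header, so a
// conflict on any field of the object counts as a conflict on the whole object.
struct TxObjectHeader {
    std::atomic<uint64_t> version_and_owner{0};
};

struct Account : TxObjectHeader {
    long balance = 0;
    long overdraft_limit = 0;
};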

Update Policy

A transaction normally updates an object by modifying its contents, and when it completes successfully it replaces the object's original values with the updated values. Based on the update strategy, direct update and deferred update are the two alternatives described in [3].
Direct update: In direct update, a transaction modifies the value of an object in place. Direct update requires a mechanism for recording the original value of an updated object, so that the update can be reversed if the transaction aborts.
Deferred update: In deferred update, a transaction updates the value of an object in a location that is private to the transaction, and makes sure it reads the updated value from that location. The object itself is only updated when the transaction commits, by copying the values from the private copy; if the transaction aborts, the private copy is discarded.

Write Policy

Whenever a transaction executes, it may make changes to shared resources, and atomically it either makes all of its changes or none of them. A transaction does not always commit successfully; it may also abort, so STM systems provide a mechanism to handle both successful commits and aborts. The two approaches used to handle this are write-through (undo) and buffered write, as described in [34].
Write-through or undo: In this approach changes are written directly to shared memory. To stay safe, each transaction keeps an undo list and reverts its updates in case it needs to abort. The write-through approach is fast because changes are made directly to shared memory, but aborting a transaction can be very expensive, as all of the changes made must be undone.
Buffered write: In this approach, writes are buffered and changes are only made to shared memory upon a successful commit. Aborting a transaction is then simple, because no changes have been made to shared memory; committing a transaction copies the values from the buffer into shared memory.
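The bookkeeping behind these alternatives can be sketched as follows. The structures are illustrative only and cover both pairs of terms above: the undo log corresponds to direct update / write-through, and the private write buffer corresponds to deferred update / buffered write; conflict detection and validation are omitted.

// Sketch: per-transaction bookkeeping for the two policies.
#include <unordered_map>
#include <utility>
#include <vector>

// Direct update / write-through: write shared memory in place, log old values
// so that an abort can undo them. Commit is cheap, abort is expensive.
struct DirectUpdateTx {
    std::vector<std::pair<long*, long>> undo_log;             // (address, original value)

    void write(long* addr, long value) {
        undo_log.push_back({addr, *addr});                     // remember the old value first
        *addr = value;                                         // update shared memory directly
    }
    void commit() { undo_log.clear(); }                        // nothing left to do
    void abort() {                                             // roll back in reverse order
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
            *(it->first) = it->second;
        undo_log.clear();
    }
};

// Deferred update / buffered write: keep new values in a private buffer and copy
// them to shared memory only on commit. Abort is cheap, commit does the copying.
struct DeferredUpdateTx {
    std::unordered_map<long*, long> write_buffer;              // private copies of updated words

    void write(long* addr, long value) { write_buffer[addr] = value; }
    long read(long* addr) {                                    // must see the tx's own writes
        auto it = write_buffer.find(addr);
        return it != write_buffer.end() ? it->second : *addr;
    }
    void commit() {
        for (auto& entry : write_buffer) *entry.first = entry.second;
        write_buffer.clear();
    }
    void abort() { write_buffer.clear(); }                     // discard the private copy
};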

Table of Contents

INTRODUCTION
THESIS OUTLINE
1 CHAPTER 1: PROBLEM DEFINITION
1.1 PROBLEM FOCUS
1.2 AIMS AND OBJECTIVES
1.3 RESEARCH QUESTIONS
2 CHAPTER 2: BACKGROUND
2.1 SINGLE CHIP PARALLEL COMPUTERS
2.2 DATABASE SYSTEMS AND TRANSACTIONS
2.3 TRANSACTIONS VS. LOCKS
2.4 TRANSACTIONAL MEMORY
2.4.1 Hardware Transactional Memory
2.4.2 Software Transactional Memory
2.4.3 Hybrid Transactional Memory
2.5 STM DESIGN ALTERNATIVES
2.5.1 Transaction Granularity
2.5.2 Update Policy
2.5.3 Write Policy
2.5.4 Acquire Policy
2.5.5 Read Policy
2.5.6 Conflict Detection
2.5.7 Concurrency Control
2.5.8 Memory Management
2.5.9 Contention Management
3 CHAPTER 3: METHODOLOGY
3.1 QUALITATIVE RESEARCH METHODOLOGY
3.1.1 Literature Review
3.1.2 Background Study
3.1.3 Selection and Suitability of STM systems
3.1.4 Selection and Suitability of Benchmarks
3.2 QUANTITATIVE RESEARCH METHODOLOGY
3.2.1 Selection and Suitability of STM Performance Metrics
3.2.2 Experimentation
3.2.3 Analysis of Gathered Results
4 CHAPTER 4: THEORETICAL WORK
4.1 RSTM
4.1.1 RSTM Overview
4.1.2 Design Features
4.1.3 Implementation
4.2 TL2
4.2.1 TL2 Overview
4.2.2 Global Version Clock
4.2.3 TL2 – Algorithm
4.2.4 TL2 – Variants
4.3 TINYSTM
4.3.1 TinySTM Overview
4.3.2 TinySTM – Algorithm
4.3.3 Implementation
4.3.4 Hierarchical Locking
4.3.5 Dynamic Tuning
4.4 SWISSTM
4.4.1 SwissTM Overview
4.4.2 Design philosophy
4.4.3 Locking granularity
4.4.4 Contention Manager – Algorithm
4.5 STM FEATURE COMPARISON
5 CHAPTER 5: EMPIRICAL STUDY
5.1 EXPERIMENTAL PLATFORM
5.2 STAMP BENCHMARK
5.2.1 STAMP – Design
5.2.2 STAMP – Applications
5.3 APPLICATION TM CHARACTERISTICS
6 CHAPTER 6: EMPIRICAL RESULTS
6.1 A FIRST-ORDER RUNTIME ANALYSIS
6.2 ANALYZED METRICS
6.2.1 Aborts per Commit (ApC)
6.2.2 Transaction Retry Rate
6.3 VALIDITY THREATS
6.3.1 Internal Validity
6.3.2 External Validity
7 CHAPTER 7: DISCUSSION AND RELATED WORK
CONCLUSION
FUTURE WORK
REFERENCES
