Get Complete Project Material File(s) Now! »
Lexicon
Because of trending topics on cloud computing, virtualization, hypervisors and so on, we will provide some definitions here. We begin with real life comparisons to give an intuition on two key concepts: virtualization and abstraction. Afterwards, we will illustrate these two concepts with computing-related examples.
Virtualization or abstraction?
In optics, a virtual image refers to an image of an actual object which appears to be located at the point of apparent divergence. In practice, the rays never converge, which is why a virtual image cannot be projected on screen (see Fig 2.1). A virtual object is a representation of an actual object that could exist, but does not.
Figure 2.1: A is a source sending light through a mirror. All the rays seem to come from A’, the virtual image of A. Thus, it seems that A is located behind the mirror, when in fact it is in front of it.
In real life, talking about television is an abstraction. Everybody (almost) has an object called television, which can represent slightly different objects. It is a screen with speakers that displays animated pictures and sounds. When talking about a television, we refer to it has its function, not what it is. For instance, no information is given on its size, resolution, display technology etc…
Definition 1 (Virtualization). Virtualizing an object consists of creating a virtual version of that object. A virtual object does not physically exist as such, but is made by software to appear to do so.
Definition 2 (Abstraction). Abstraction comes from the latin verb “abstrahere” which means “draw away”. An abstraction considers something as a general quality or character-istic, apart from concrete realities, specific objects, or actual instances.
Example of virtual systems in computing
Computer science makes a heavy use of abstractions, and virtualization. The limit of those concepts is sometimes fuzzy, but it often corresponds to links between different layers. The following sections illustrate those concepts applied to computer hardware, and more specifically memory and microprocessor.
Virtual memory
A common usage for virtualization in computing is virtual memory. Tanenbaum [1] provides an overview of virtual memory. The basic idea behind virtual memory is that combined size of program, data and stack may exceed the amount of physical memory available. The operating system keeps the parts of the program that are currently in use in the main memory, and the rest on disk. This mechanism is called swapping.
Virtual memory also allows multiprogramming, that is multiple processes running at once. Of course, if only one CPU is available, only one process will run at a time. The scheduler will implement time sharing on the CPU, to make users feel like several processes run concurrently.
Another issue tackled by virtual memory is relocation. When compiling a program, no information are provided to the linker regarding usable addresses. Additionally, programs shoud run on any system without memory restriction (size or available address space). To achieve that, virtual addresses are used. Processes manipulate virtual addresses, which are not the actual addresses where data is physically stored. The latter are called physical ad-dresses. The operating system ensures that the memory management system is configured accordingly so that virtual addresses are mapped on the physical addresses associated to the running process. Each process is free to use any virtual address, as long as there is no conflict in physical addresses1 . The physical address conflict or swapping (in case of memory exhaustion) is handled by the operating system.
Virtual CPU
There are several CPU emulators: BOCHS [2] or I will focus only on QEMU and BOCHS, which are
According to BOCHS’ description, “BOCHS is a highly portable open source IA-32 (x86) PC emulator written in C++, that runs on most popular platforms. It includes emula-tion of the Intel x86 CPU, common I/O devices, and a custom BIOS.” Basically, the code mainly consists of a large decoding-loop which models the fetch-decode-execute actions of “QEMU is a FAST! processor emulator using a portable dynamic translator.” [7] . It uses dynamic translation to native code for reasonable speed. When it first encounters a piece of code, QEMUconverts it to the host instruction set. Usually dynamic translators are very complicated and highly CPU dependent. QEMUuses some tricks which make it relatively easily portable and simple while achieving good performances [8].
Despite the obvious design differences (BOCHS beeing static based and QEMU dynamic), they both are an exact replica of the emulated CPU. That is, emulated code cannot make the difference between native on-cpu execution and an emulated one.
The processes case
Another example is processes [9]. Processes are fundamental for multiprogramming sys-tems (also known as multitask) [10]. They are the objects manipulated by the scheduler to implement context-switching. A process is an instance of a computer program being executed. It contains an image of the associated program, and a state of the underlying hardware:
• virtual interrupts (signals);
• virtual memory;
• virtual CPU (general purpose and status registers);
• hardware access through software interruptions.
Accessing hardware through interruptions characterizes it in term of features, not as a specified piece of hardware. For instance, performing a write syscall makes no difference whether the data will be written on a mechanical hard drive or on a USB flash drive. This turns the process into an abstraction of the hardware rather than a virtualized represen-tation.
Remark. virtual addresses can also be seen as an abstraction. On x86, PAE paging trans-lates 32-bit linear addresses to 52-bit physical addresses [11].
Example of abstractions in computing
We have seen that the limit between abstraction and virtualization can be thin. In this section, we will illustrate the use of abstractions in well known areas of computing.
TCP/IP
We use TCP over IP on a daily basis for many different usages like browsing internet, sending emails, synchronize agendas etc… TCP is a reliable transport protocol, built on top of IP, the network layer in the OSI stack (ISO/IEC 7498-1). It provides an end-to-end service to applications running on end hosts. If you send a message over TCP, it will eventually be delivered without being altered. IP on the other hand, provides an unreliable datagram service and must be implemented by all systems addressable on the Internet. TCP deals with problems such as packet loss, duplication, and reordering that are not detected nor corrected by the IP layer. It provides a reliable flow of data between two hosts. It is concerned with things such as dividing the data passed to it from the application into appropriately sized chunks for the network layer below, acknowledging received packets, and setting timeouts to ensure that the other end acknowledges packets that are sent, and because this reliable flow of data is provided by the transport layer, the application layer can ignore all these details [12].
TCP is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers [13]. It lets the user communicate with remote or local nodes without handling complex problems such as packets fragmen-tation, retransmission, reordering etc… Hence, TCP provides an abstraction for a reliable communication channel.
Operating systems
Operating systems can be depicted by two unrelated functions. Namely, extending the machine and managing resources [1]. The latter is not relevant for us now, since it only ensures (fair if possible) sharing of resources among different processes. The former, on the other hand is precisely an abstraction as defined by Def. 2. Fig. 2.2 illustrates a command run in an operating system. It looks very simple and straightforward. In fact, the operating system executes many operations to perform this simple task, as shown in Fig 2.3.
In this shell session, the user asks the system to display the content of the file /etc/passwd. The filename itself is an abstraction which exposes complex file layouts under a conveniant, human-readable file hierarchy. The user can browse this hierarchy using a path from the root item to the file separating each node from another with a delimiter. In fact, most filesystems address files using inode numbers. When typing this command, several opera-tions happen: the current shell will fork to create a new child process, then exec to replace the current (child) shell process by cat, with the parameter /etc/passwd. Cat will then open (2) /etc/passwd, read (2) its content, and write (2) it back on its standard out-put, and finally close (2) the file. That abstraction is provided by processes, as claimed in 2.1.2.3 (p 22). The operating system’s abstraction is hidden behind the read implemen-tation. For this example, we assume that /etc/passwd is actually stored on disk (and not on an NFS share for instance). As shown in Figure 2.3, there are quite many functions called by sys_read. The kernel provides an abstraction of the underlying filsystem (so that read can work on every filesystem). The common part is handled by the VFS layer. Afterwards, the VFS deals with memory pages, and constructs bio which are gathered in requests. Later on, those requests are submitted to the disk. Disk specificities (cache size, buffer size, mechanical or flash-based) are handled by lower level code.
Table of contents :
1 Introduction
1.1 Context
1.2 Claim
1.3 Document structure
2 State of the art
2.1 Lexicon
2.1.1 Virtualization or abstraction?
2.1.2 Example of virtual systems in computing
2.1.3 Example of abstractions in computing
2.2 System virtualization mechanisms
2.2.1 Prologue
2.2.2 Software approach
2.2.3 Hardware approach
2.2.4 Hybrid approaches
2.2.5 Summary
2.3 Certified systems and Common Criteria
2.3.1 Landscape
2.3.2 Common Criteria
3 Problem statement
3.1 Hypervisors and security
3.1.1 Introduction
3.1.2 How does virtualization instructions enable security?
3.1.3 How do formal methods provide additional security?
3.1.4 Existing hypervisors with security focus
3.2 Software security and certified products
3.2.1 A market requirement
3.2.2 Certified products
3.3 Certification’s impact on design
3.3.1 How to gain confidence in software?
3.3.2 Issues with end to end proof
3.4 A new architecture for embedded hypervisors
3.4.1 What is to be formalized?
3.4.2 Factorizing formalization: confound ISA and hardware specification
3.4.3 Factorizing the interpreter by executing code directly on the CPU
3.5 Discussion
4 Contribution
4.1 Prototype
4.1.1 Foreword: general design of the VMM
4.1.2 Design
4.1.3 Analysing guest code
4.1.4 Interrupt handling
4.1.5 Tracking privileges changes
4.1.6 Metrics
4.2 Performance overview
4.2.1 Benchmarks
4.2.2 Evaluation
4.2.3 Results
4.2.4 Analysis
4.3 Hypervising cost – Micro benchmarks overview
4.3.1 Benchmarks
4.3.2 Results
4.3.3 Discussion
4.4 Conclusion
5 In depth implementation details
5.1 General concepts about the ARM architecture and instruction set
5.1.1 The ARM architecture overview
5.1.2 The ARM instruction set families
5.1.3 The ARM instruction set encoding
5.2 Analyser’s implementation: the instruction scanner
5.2.1 Instruction behaviors
5.2.2 Instruction matching
5.3 Arbitration’s implementation
5.3.1 Trap to the hypervisor: the callback mechanism
5.3.2 Tracking indirect branches
5.3.3 Condition validation
5.3.4 Conditional blocks
5.3.5 Summary
5.4 Example on a simple guest
5.4.1 description
5.5 Summary
6 Conclusion and future work
6.1 Synthesis
6.2 Future work
6.2.1 Increasing the hypervisor performances
6.2.2 Reducing the context-switch cost
6.2.3 Expectations and foreseeable issues with a formal verification
6.2.4 Security assessment
6.3 Personal feedback