Jonathan Worthington :: jnthn.net

Virtual Machine Glossary

The area of virtual machines has lots of bits of terminology. This page will hopefully demistify some of it.

Index

Either select the term you're interested in, or jump to the start of the list if you want to read everything.

G J O R S

Garbage Collection

Virtual machines for high level languages usually provide memory management support in the form of a garbage collector. Instead of having to explicitly free memory when it is no longer in use, GC enables this to happen automatically, with some loss of control over just how soon the memory is freed.

In general, garbage collectors decide what is no longer in use by starting with a root set of objects (using object to mean an area of memory that is used for something) that are known to be in use and exploring what other objects are referenced from by them. This is done recursively, until all objects that are still reachable (that is, referenced directly or indirectly from the root set) have been found. This phase is sometimes called dead object detection or marking.

Objects that are found to be unreachable are then freed up so that the memory may be used for other purposes. Some GC implementations do this as a distinct phase from dead object detection, while others always do the two together.

Back To Top

JIT

JIT is short for "Just In Time" and is often followed by the word "compiler". A JIT compiler takes the virtual machine's instructions and translates them into machine code for the current hardware platform. This is done as a block of code is needed, though what constitutes a block varies widely; it may be a single method/subroutine or a much larger unit of code. Then, when that block of code is to be executed, the CPU does a jump to the start of the machine code. The memory holding the code may need to be marked as executable for this to work.

JIT enables one or more virtual machine instructions to map to a single hardware instruction, allowing for performance approaching that offerred by compiled languages. There is no need to JIT every single instruction; it is possible to call the normal routines that a normal interpreter would call.

JIT compilation has the notable overhead of the time taken to produce the machine code. In some cases, this can outweigh the gains made by being able to execute the instructions more quickly. Research suggests that keeping statistics on code usage or using other heuristics to determine whether or not to JIT code may be worthwhile.

As the idea is to speed up execution, machine code produced by a JIT compiler tends not to be optimized as this would take time. Using usage statistics to decide whether or not to optimize the machine code is a possible strategy. It is possible that, as the JIT compiler has runtime information available to it to assist with making optimizations, one day virtual machines could out-perform traditional native code applications.

Back To Top

Opcode

The term opcode, short for operation code and sometimes shortened to op, tends to refer to an instruction in the virtual machine's instruction set.

Back To Top

Reference Counting

Reference counting is one method of implementing garbage collection. The basic idea is that every object in memory has a reference count, which is a number specifying how many things are referencing that object. When the count drops to zero, that means that there is no possible way that the object could be alive and therefore the memory associated with it can be reclaimed and used for other things.

One problem with reference counting is that if an object A references another object B and B also references A and nothing else references either of them, they still both have a reference count of 1, so never get freed. In addition, there is a burden accross the virtual machine's source code to always maintain the reference counts. The runtime costs of doing so aside, it means the complexity for memory management is spread throughout the codebase. Other schemes for garbage collection isolate the complexity to a small chunk of code.

Back To Top

Register Machine

Some virtual machiens are implemented as register machines. Registers are small, numbered and fast-access areas of storage that are used to hold working data. Opcodes consist of the instruction name and a number of registers; for example, the add op might specify 3 registers: the destination register for the result of the calculation and 2 source registers for the numbers that are being added together. An example of a virtual machine that has a register based architecture is Parrot, which has registers of four types. Other virtual machines choose to work as a stack machine.

Back To Top

Stack Frame

A stack is commonly used to keep track of what functions or methods have been called. Every time another one is called, data relating to the call is added to the top of the stack. This set data is called a stack frame. It generally has a structure as shown in the diagram below.

+---------------------+ <-- Top of the stack | | | Evaluation stack | | | +---------------------+ | | | Local Variables | | | +---------------------+ | Last Stack Frame | +---------------------+ | Return Address | +---------------------+ | | | Arguments | | | +---------------------+

For virtual machines that pass parameters using the stack, the first thing that appears in the stack frame are the arguments that are being passed. The address in the code to return to after the function has been executed and a reference to where on the stack the previous stack frame started usually follow this, in some order. After this is space for storing any local variables in the function that is being called. Stack space beyond that may be used for evaluating the program in the case the virtual machine is a stack machine, or used by some register machines when they run out of registers (but not all).

Back To Top

Stack Machine

Some virtual machiens are implemented as stack machines. A stack is a data structure where items are pushed onto and popped off the top, just like a real-life stack of items. In reality, certain operations also look at items further down the stack.

Instructions for stack machines all do stack manipulations. For example, there is an instruction to load some data from memory and put it on the top of the stack, and a similar instruction to take data off the stack and store it. To add two numbers together, the two numbers would be pushed onto the stack and then the add instruction would be carried out. The add instruction would take the two numbers on the top of the stack, removing them from the stack in the process, add them together and put the result on the stack.

Virtual machines that are not stack based (e.g. those that are register machines) may also use a stack for other purposes. Examples of virtual machines that are stack based include the JVM (Java Virtual Machine) and the .NET CLI.

Back To Top

Stack Walking

Stack walking is usually done as part of garbage collection to find the root set of objects that are still reachable. The stack is made up of stack frames, which will include references to objects in memory (for this is where local variables are kept). Walking the stack simply involves looking at every location of the stack between two points (the current stack top and a position that is considered to be the bottom). Conservative garbage collectors have to treat every single value on the stack that matches the address of an object as a reference to it, even though it may just be another item of data that happens to be of the same value. Fixed garbage collectors know what the types of values on the stack are and only needs to consider those that are certainly references to other objects.

Back To Top

All content Copyright (C) Jonathan Worthington 2003-2005 unless otherwise stated.