Computing in the real world

Real World Computing

Where did the speed go?

5th March 2008 [PC Pro]

There's a need for a utility that analyses system performance more rigorously and in more detail (that's where the devil is said to lurk). Consider an application that manipulates a large collection of files: conventional performance tools will show the disk in heavy use and may even suggest that disk I/O is the problem, but they won't show whether the specific application is at fault and, if so, which part of it. Is the app opening too many files, or accessing some particular file more often than others? This style of analysis means we need to see what's going on under the hood of the application - which files is it opening, and how much data is it reading from or writing to each one? Most Unix systems have an application called "truss" that will show what system calls an application is making, but this sort of information is too diffuse to be of much use.
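To make the question concrete, here's a minimal Python sketch of the kind of per-file summary we'd like to end up with. It assumes the raw system-call trace has already been reduced to simple (operation, path, byte count) records; that record format, the function name summarise and the file paths are all invented for illustration and aren't the output of truss or any other real tool.

```python
from collections import defaultdict


def summarise(events):
    """Fold (operation, path, nbytes) records into per-file totals."""
    totals = defaultdict(lambda: {"opens": 0, "read": 0, "written": 0})
    for op, path, nbytes in events:
        entry = totals[path]
        if op == "open":
            entry["opens"] += 1
        elif op == "read":
            entry["read"] += nbytes
        elif op == "write":
            entry["written"] += nbytes
    return dict(totals)


# Hypothetical trace records (assumed format, not real tool output).
events = [
    ("open", "/var/data/index.db", 0),
    ("read", "/var/data/index.db", 4096),
    ("read", "/var/data/index.db", 8192),
    ("open", "/var/log/app.log", 0),
    ("write", "/var/log/app.log", 512),
    ("open", "/var/data/index.db", 0),
]

report = summarise(events)

# List the most frequently opened files first.
for path, t in sorted(report.items(), key=lambda kv: kv[1]["opens"], reverse=True):
    print(path, t["opens"], t["read"], t["written"])
```

A raw list of system calls contains all of this information, but only the aggregated view answers the question of which file is being hammered.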

Dtrace

A modern operating system, whether Windows or some Unix variant, maintains a large collection of counters about its various operations. We use these counters to reveal the external performance of the system, but the operating system itself also uses them to tailor its own behaviour. Within the operating system, there's always a crucial component called the scheduler, which decides how much time on the processors (or processor cores) each process gets relative to the others. The most refined of these scheduling modules are referred to as "fair-share" schedulers, based on a set of design ideas developed in the 1970s.
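As a rough illustration of how a scheduler can be driven by counters, the Python sketch below gives each group a number of shares and a decaying counter of the CPU time it has recently used, then always runs the group with the lowest usage per share. This is a deliberately simplified model built on my own assumptions (the Group class, the decay factor of 0.9 and the tie-breaking rule are all invented); it isn't the algorithm used by any particular operating system.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    shares: int          # relative entitlement to CPU time
    usage: float = 0.0   # decaying counter of CPU time recently consumed


def pick_next(groups):
    """Choose the group with the lowest usage per share (first one wins ties)."""
    return min(groups, key=lambda g: g.usage / g.shares)


def run(groups, ticks, decay=0.9):
    """Simulate `ticks` time slices and return how many each group received."""
    slices = {g.name: 0 for g in groups}
    for _ in range(ticks):
        chosen = pick_next(groups)
        chosen.usage += 1.0          # charge the slice to the chosen group
        slices[chosen.name] += 1
        for g in groups:             # older usage gradually counts for less
            g.usage *= decay
    return slices


groups = [Group("A", shares=3), Group("B", shares=1)]
slices = run(groups, ticks=400)
print(slices)
```

Over a long run, group A should receive roughly three times as many slices as group B. The point to notice is that every decision depends on the current state of the counters, so the order in which events arrive changes what happens next.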

The problem with fair-share schedulers is that the schedule they generate is based on information collected by counters, which makes the scheduling process quite complex. Aside from updating its counters, the operating system has to handle a huge number of events. Processes are continually asking it to do something such as read from a disk, peripherals are announcing that they've done something such as transferring a disk block into main memory, and internal decisions - such as which process to run next - have to be taken. Machine performance will be related to how these events get ordered in time with respect to changes in the counters, and understanding this whole sequence is virtually impossible. A different approach is required; namely, to develop a theory of the operation of the system and then apply this theory to the system.

This requires more proactive system administration - the system administrator no longer just sits there ploughing through pages of numbers and graphs, but must attempt to develop a theory for the system's operation and then go out to prove this theory correct. What's needed, at the highest level and in its purest form, is a language for developing and proving such theories.

Dtrace is just such a programming language for monitoring systems. The idea is that as a system runs, it carries out actions that generate events; dtrace is based on the idea of probes, which are handlers for these events. Put simply, your dtrace program creates probes that fire whenever a particular event occurs, and these probes can report more than just the fact that the event occurred - they can collect information about the event. Depending on the probe being monitored, a number of different types of information can be returned. For example, if you're probing a system call then all the arguments to that call will be available. Kernel modules and some other programs provide sets of probes and so are known as providers - a dtrace program can associate actions with the probes implemented by providers.
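The relationship between providers, probes and actions can be mimicked in a few lines of Python. This is a toy model of the idea only, not dtrace's own language or API: the Tracer class, the probe names "syscall:open" and "syscall:read" and the argument lists are all made up for the sake of the sketch.

```python
from collections import Counter, defaultdict


class Tracer:
    """Toy event dispatcher: actions are attached to named probes."""

    def __init__(self):
        self._actions = defaultdict(list)

    def probe(self, name, action):
        """Register `action` to run whenever the probe `name` fires."""
        self._actions[name].append(action)

    def fire(self, name, *args):
        """Called by a provider when an event occurs; passes on its arguments."""
        for action in self._actions.get(name, []):
            action(*args)


tracer = Tracer()
opens = Counter()       # number of opens per path
bytes_read = Counter()  # bytes requested per file descriptor


def on_open(path, flags):
    opens[path] += 1


def on_read(fd, nbytes):
    bytes_read[fd] += nbytes


tracer.probe("syscall:open", on_open)
tracer.probe("syscall:read", on_read)

# A stand-in "provider" firing probes as hypothetical system calls happen.
tracer.fire("syscall:open", "/etc/passwd", 0)
tracer.fire("syscall:read", 3, 1024)
tracer.fire("syscall:read", 3, 512)
tracer.fire("syscall:open", "/etc/passwd", 0)
tracer.fire("syscall:close", 3)  # no action attached, so nothing happens

print(dict(opens), dict(bytes_read))
```

The actions see the arguments of each event rather than merely the fact that it happened, which is what lets a probe collect information about the call instead of just counting it.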
