Chapter 3. Understanding How SystemTap Works

SystemTap allows users to write and reuse simple scripts to deeply examine the activities of a running Linux system. These scripts can be designed to extract data, filter it, and summarize it quickly (and safely), enabling the diagnosis of complex performance (or even functional) problems.

The essential idea behind a SystemTap script is to name events, and to give them handlers. When SystemTap runs the script, SystemTap monitors for the event; once the event occurs, the Linux kernel then runs the handler as a quick sub-routine, then resumes.

There are several kinds of events: entering or exiting a function, timer expiration, session termination, and so on. A handler is a series of script language statements that specify the work to be done whenever the event occurs. This work normally includes extracting data from the event context, storing it in internal variables, and printing results.

3.1. Architecture

A SystemTap session begins when you run a SystemTap script. This session occurs in the following fashion:

Procedure 3.1. SystemTap Session

First, SystemTap checks the script against the existing tapset library (normally in /usr/share/systemtap/tapset/) for any tapsets used. SystemTap will then substitute any located tapsets with their corresponding definitions in the tapset library.
SystemTap then translates the script to C, running the system C compiler to create a kernel module from it. The tools that perform this step are contained in the systemtap package (refer to Section 2.1.1, "Installing SystemTap" for more information).
SystemTap loads the module, then enables all the probes (events and handlers) in the script. The staprun program in the systemtap-runtime package (refer to Section 2.1.1, "Installing SystemTap" for more information) provides this functionality.
As the events occur, their corresponding handlers are executed.
Once the SystemTap session is terminated, the probes are disabled, and the kernel module is unloaded.

This sequence is driven from a single command-line program: stap. This program is SystemTap's main front-end tool. For more information about stap, refer to man stap (once SystemTap is properly installed on your machine).

3.2. SystemTap Scripts

For the most part, SystemTap scripts are the foundation of each SystemTap session. SystemTap scripts instruct SystemTap on what type of information to collect, and what to do once that information is collected.

As stated in Chapter 3, Understanding How SystemTap Works, SystemTap scripts are made up of two components: events and handlers. Once a SystemTap session is underway, SystemTap monitors the operating system for the specified events and executes the handlers as they occur.

Note

An event and its corresponding handler are collectively called a probe. A SystemTap script can have multiple probes.

A probe's handler is commonly referred to as a probe body.

In terms of application development, using events and handlers is similar to instrumenting the code by inserting diagnostic print statements in a program's sequence of commands. These diagnostic print statements allow you to view a history of commands executed once the program is run.

SystemTap scripts allow you to insert instrumentation without recompiling the code, and they allow more flexibility with regard to handlers. Events serve as the triggers for handlers to run; handlers can be written to record specified data and print it in a certain manner.

Format

SystemTap scripts use the file extension .stp, and contain probes written in the following format:

probe event {statements}

SystemTap supports multiple events per probe; multiple events are delimited by a comma (,). If multiple events are specified in a single probe, SystemTap will execute the handler when any of the specified events occurs.

Each probe has a corresponding statement block. This statement block is enclosed in braces ({ }) and contains the statements to be executed per event. SystemTap executes these statements in sequence; special separators or terminators are generally not necessary between multiple statements.

Note

Statement blocks in SystemTap scripts follow the same syntax and semantics as the C programming language. A statement block can be nested within another statement block.

SystemTap allows you to write functions to factor out code to be used by a number of probes. Thus, rather than repeatedly writing the same series of statements in multiple probes, you can just place the instructions in a function, as in:

function function_name(arguments) {statements}
probe event {function_name(arguments)}

The statements in function_name are executed when the probe for event executes. The arguments are optional values passed into the function.

Important

Section 3.2, "SystemTap Scripts" is designed to introduce readers to the basics of SystemTap scripts. To understand SystemTap scripts better, it is advisable that you refer to Chapter 4, Useful SystemTap Scripts; each section therein provides a detailed explanation of the script, its events, handlers, and expected output.

3.2.1. Event

SystemTap events can be broadly classified into two types: synchronous and asynchronous.

Synchronous Events

A synchronous event occurs when any process executes an instruction at a particular location in kernel code. This gives other events a reference point from which more contextual data may be available.

Examples of synchronous events include:

syscall.system_call

The entry to the system call system_call. To monitor the exit of a system call instead, append .return to the event. For example, to specify the entry and exit of the system call close, use syscall.close and syscall.close.return respectively.

vfs.file_operation

The entry to the file_operation event for Virtual File System (VFS). As with the syscall event, appending .return to the event monitors the exit of the file_operation operation.

kernel.function("function")

The entry to the kernel function function. For example, kernel.function("sys_open") refers to the "event" that occurs when the kernel function sys_open is called by any thread in the system. To specify the return of the kernel function sys_open, append the return string to the event statement; i.e. kernel.function("sys_open").return.

When defining probe events, you can use asterisk (*) for wildcards. You can also trace the entry or exit of a function in a kernel source file. Consider the following example:

Example 3.1. wildcards.stp

probe kernel.function("*@net/socket.c") { }
probe kernel.function("*@net/socket.c").return { }

In the previous example, the first probe's event specifies the entry of ALL functions in the kernel source file net/socket.c. The second probe specifies the exit of all those functions. Note that in this example, there are no statements in the handler; as such, no information will be collected or displayed.

kernel.trace("tracepoint")

The static probe for tracepoint. Recent kernels (2.6.30 and newer) include instrumentation for specific events in the kernel. These events are statically marked with tracepoints. One example of a tracepoint available in SystemTap is kernel.trace("kfree_skb"), which fires each time a network buffer is freed in the kernel.

module("module").function("function")

Allows you to probe functions within modules. For example:

Example 3.2. moduleprobe.stp

probe module("ext3").function("*") { }
probe module("ext3").function("*").return { }

The first probe in Example 3.2, "moduleprobe.stp" points to the entry of all functions for the ext3 module. The second probe points to the exits of all functions for that same module; the use of the .return suffix is similar to kernel.function(). Note that the probes in Example 3.2, "moduleprobe.stp" do not contain statements in the probe handlers, and as such will not print any useful data (as in Example 3.1, "wildcards.stp").

A system's kernel modules are typically located in /lib/modules/kernel_version, where kernel_version refers to the currently loaded kernel version. Modules use the file name extension .ko.

Asynchronous Events

Asynchronous events are not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs.

Examples of asynchronous events include:

begin

The startup of a SystemTap session; i.e. as soon as the SystemTap script is run.

end

The end of a SystemTap session.

timer events

An event that specifies a handler to be executed periodically. For example:

Example 3.3. timer-s.stp

probe timer.s(4)
{
  printf("hello world\n")
}

Example 3.3, "timer-s.stp" is an example of a probe that prints hello world every 4 seconds. Note that you can also use the following timer events:

timer.ms(milliseconds)
timer.us(microseconds)
timer.ns(nanoseconds)
timer.hz(hertz)
timer.jiffies(jiffies)

When used in conjunction with other probes that collect information, timer events allow you to print periodic updates and see how that information changes over time.

Important

SystemTap supports the use of a large collection of probe events. For more information about supported events, refer to man stapprobes. The SEE ALSO section of man stapprobes also contains links to other man pages that discuss supported events for specific subsystems and components.

3.2.2. SystemTap Handler/Body

Consider the following sample script:

Example 3.4. helloworld.stp

probe begin
{
  printf ("hello world\n")
  exit ()
}

In Example 3.4, "helloworld.stp", the event begin (i.e. the start of the session) triggers the handler enclosed in { }, which simply prints hello world followed by a new-line, then exits.

Note

SystemTap scripts continue to run until the exit() function executes. To stop a running script manually, interrupt it with Ctrl+C.

printf ( ) Statements

The printf () statement is one of the simplest functions for printing data. printf () can also be used to display data using a wide variety of SystemTap functions in the following format:

printf ("format string\n", arguments)

The format string specifies how arguments should be printed. The format string of Example 3.4, "helloworld.stp" simply instructs SystemTap to print hello world, and contains no format specifiers.

You can use the format specifiers %s (for strings) and %d (for numbers) in format strings, depending on your list of arguments. Format strings can have multiple format specifiers, each matching a corresponding argument; multiple arguments are delimited by a comma (,).

Note

Semantically, the SystemTap printf function is very similar to its C language counterpart. The aforementioned syntax and format for SystemTap's printf function is identical to that of the C-style printf.

To illustrate this, consider the following probe example:

Example 3.5. variables-in-printf-statements.stp

probe syscall.open
{
  printf ("%s(%d) open\n", execname(), pid())
}

Example 3.5, "variables-in-printf-statements.stp" instructs SystemTap to probe all entries to the system call open; for each event, it prints the current execname() (a string with the executable name) and pid() (the current process ID number), followed by the word open. A snippet of this probe's output would look like:

vmware-guestd(2206) open
hald(2360) open
hald(2360) open
hald(2360) open
df(3433) open
df(3433) open
df(3433) open
hald(2360) open
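The way the format string in Example 3.5 is filled in can be mimicked outside SystemTap. The following is a minimal Python sketch, not SystemTap code: it relies on Python's % operator accepting the same C-style %s and %d specifiers, and the helper name format_open_event is hypothetical. The process names and IDs are sample values taken from the output above.

```python
# Python model of how a printf-style format string is filled in.
# The (name, pid) pairs are sample values copied from the output above.
def format_open_event(execname, pid):
    # %s takes the string argument and %d takes the integer argument,
    # in the same left-to-right order as in the SystemTap printf call.
    return "%s(%d) open" % (execname, pid)


events = [("vmware-guestd", 2206), ("hald", 2360), ("df", 3433)]
for name, pid in events:
    print(format_open_event(name, pid))
```

Each specifier consumes one argument, so a format string with two specifiers needs exactly two arguments after it.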

SystemTap Functions

SystemTap supports a wide variety of functions that can be used as printf () arguments. Example 3.5, "variables-in-printf-statements.stp" uses the SystemTap functions execname() (name of the process that called a kernel function/performed a system call) and pid() (current process ID).

The following is a list of commonly-used SystemTap functions:

tid()

The ID of the current thread.

uid()

The ID of the current user.

cpu()

The current CPU number.

gettimeofday_s()

The number of seconds since UNIX epoch (January 1, 1970).

ctime()

Convert number of seconds since UNIX epoch to date.

pp()

A string describing the probe point currently being handled.

thread_indent()

This particular function is quite useful, providing you with a way to better organize your print results. The function takes one argument, an indentation delta, which indicates how many spaces to add or remove from a thread's "indentation counter". It then returns a string with some generic trace data along with an appropriate number of indentation spaces.

The generic data included in the returned string includes a timestamp (number of microseconds since the first call to thread_indent() by the thread), a process name, and the thread ID. This allows you to identify what functions were called, who called them, and the duration of each function call.

If call entries and exits immediately precede each other, it is easy to match them. However, in most cases, after a first function call entry is made several other call entries and exits may be made before the first call exits. The indentation counter helps you match an entry with its corresponding exit by indenting the next function call if it is not the exit of the previous one.

Consider the following example on the use of thread_indent():

Example 3.6. thread_indent.stp

probe kernel.function("*@net/socket.c")
{
  printf ("%s -> %s\n", thread_indent(1), probefunc())
}
probe kernel.function("*@net/socket.c").return
{
  printf ("%s <- %s\n", thread_indent(-1), probefunc())
}

Example 3.6, "thread_indent.stp" prints out the thread_indent() and probe functions at each event in the following format:

0 ftp(7223): -> sys_socketcall
1159 ftp(7223):  -> sys_socket
2173 ftp(7223):   -> __sock_create
2286 ftp(7223):    -> sock_alloc_inode
2737 ftp(7223):    <- sock_alloc_inode
3349 ftp(7223):    -> sock_alloc
3389 ftp(7223):    <- sock_alloc
3417 ftp(7223):   <- __sock_create
4117 ftp(7223):   -> sock_create
4160 ftp(7223):   <- sock_create
4301 ftp(7223):   -> sock_map_fd
4644 ftp(7223):    -> sock_map_file
4699 ftp(7223):    <- sock_map_file
4715 ftp(7223):   <- sock_map_fd
4732 ftp(7223):  <- sys_socket
4775 ftp(7223): <- sys_socketcall

This sample output contains the following information:

The time (in microseconds) since the initial thread_indent() call for the thread (included in the string from thread_indent()).
The process name (and its corresponding ID) that made the function call (included in the string from thread_indent()).
An arrow signifying whether the call was an entry (->) or an exit (<-); the indentations help you match specific function call entries with their corresponding exits.
The name of the function called by the process.
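The indentation counter described above can be modelled in a few lines. The following Python sketch is a simplified illustration, not the tapset implementation: the class name IndentModel is hypothetical, timestamps are left out, and it assumes that an entry is printed at the new depth while an exit is printed at the depth of the call being left.

```python
# Simplified Python model of the per-thread indentation counter.
# Assumption: this mirrors the behaviour described in the text, not the
# actual tapset implementation; timestamps are omitted.
class IndentModel:
    def __init__(self):
        self.depth = {}  # thread ID -> current indentation counter

    def thread_indent(self, tid, name, delta):
        current = self.depth.get(tid, 0)
        # An entry (+1) is shown at the new depth, an exit (-1) at the
        # depth of the call being left, so matching lines align.
        shown = current + max(delta, 0)
        self.depth[tid] = current + delta
        return "%s(%d):%s" % (name, tid, " " * (shown - 1))


model = IndentModel()
calls = [(1, "sys_socket"), (1, "sock_alloc"),
         (-1, "sock_alloc"), (-1, "sys_socket")]
for delta, func in calls:
    arrow = "->" if delta > 0 else "<-"
    print("%s %s %s" % (model.thread_indent(7223, "ftp", delta), arrow, func))
```

Because the counter is kept per thread ID, interleaved calls from different threads do not disturb each other's indentation.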

name

Identifies the name of a specific system call. This variable can only be used in probes that use the event syscall.system_call.

target()

Used in conjunction with stap script -x process ID or stap script -c command. If you want to specify a script to take an argument of a process ID or command, use target() as the variable in the script to refer to it. For example:

Example 3.7. targetexample.stp

probe syscall.* {
  if (pid() == target())
    printf("%s\n", name)
}

When Example 3.7, "targetexample.stp" is run with the argument -x process ID, it watches all system calls (as specified by the event syscall.*) and prints out the name of all system calls made by the specified process.

This has the same effect as specifying if (pid() == process ID) each time you wish to target a specific process. However, using target() makes it easier for you to re-use the script, giving you the ability to simply pass a process ID as an argument each time you wish to run the script (e.g. stap targetexample.stp -x process ID).

For more information about supported SystemTap functions, refer to man stapfuncs.

3.3. Basic SystemTap Handler Constructs

SystemTap supports the use of several basic constructs in handlers. The syntax for most of these handler constructs is based on C and awk syntax. This section describes several of the most useful SystemTap handler constructs, which should provide you with enough information to write simple yet useful SystemTap scripts.

3.3.1. Variables

Variables can be used freely throughout a handler; simply choose a name, assign a value from a function or expression to it, and use it in an expression. SystemTap automatically identifies whether a variable should be typed as a string or integer, based on the type of the values assigned to it. For instance, if you set the variable foo to gettimeofday_s() (as in foo = gettimeofday_s()), then foo is typed as a number and can be printed in a printf() with the integer format specifier (%d).

Note, however, that by default variables are only local to the probe they are used in. This means that variables are initialized, used, and disposed of at each probe handler invocation. To share a variable between probes, declare the variable name using global outside of the probes. Consider the following example:

Example 3.8. timer-jiffies.stp

global count_jiffies, count_ms
probe timer.jiffies(100) { count_jiffies ++ }
probe timer.ms(100) { count_ms ++ }
probe timer.ms(12345)
{
  hz=(1000*count_jiffies) / count_ms
  printf ("jiffies:ms ratio %d:%d => CONFIG_HZ=%d\n", count_jiffies, count_ms, hz)
  exit ()
}

Example 3.8, "timer-jiffies.stp" computes the CONFIG_HZ setting of the kernel by using timers that count jiffies and milliseconds, then computing the ratio between the two counts. The global statement allows the variables count_jiffies and count_ms (set in their own respective probes) to be shared with probe timer.ms(12345).

Note

The ++ notation in Example 3.8, "timer-jiffies.stp" (i.e. count_jiffies ++ and count_ms ++) is used to increment the value of a variable by 1. In the following probe, count_jiffies is incremented by 1 every 100 jiffies:

probe timer.jiffies(100) { count_jiffies ++ }

In this instance, SystemTap understands that count_jiffies is an integer. Because no initial value was assigned to count_jiffies, its initial value is zero by default.
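The arithmetic in the final probe of Example 3.8 can be checked with a small worked model. The following Python sketch is an idealized illustration, not SystemTap code: it assumes every timer fires exactly on schedule, the function name estimate_hz is hypothetical, and the CONFIG_HZ values fed into it are arbitrary examples.

```python
# Idealized model: each timer fires exactly on schedule, and the
# CONFIG_HZ values passed in are hypothetical examples.
def estimate_hz(config_hz, run_ms=12345):
    # timer.jiffies(100) fires once per 100 jiffies, that is, every
    # 100 * 1000 / config_hz milliseconds.
    count_jiffies = run_ms * config_hz // (100 * 1000)
    # timer.ms(100) fires once per 100 milliseconds.
    count_ms = run_ms // 100
    # Same integer arithmetic as the script's final probe.
    return (1000 * count_jiffies) // count_ms


for hz in (100, 250, 1000):
    print(hz, estimate_hz(hz))
```

Because both counters only advance in whole steps and the division is an integer division, the computed value is an approximation of the configured rate rather than an exact match in every case.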

3.3.2. Conditional Statements

In some cases, the output of a SystemTap script may be too large. To address this, you need to further refine the script's logic in order to narrow the output to something more relevant or useful to your probe.

You can do this by using conditionals in handlers. SystemTap accepts the following types of conditional statements:

If/Else Statements

Format:

if (condition)
  statement1
else
  statement2

The statement1 is executed if the condition expression is non-zero. The statement2 is executed if the condition expression is zero. The else clause (else statement2) is optional. Both statement1 and statement2 can be statement blocks.

Example 3.9. ifelse.stp

global countread, countnonread
probe kernel.function("vfs_read"),kernel.function("vfs_write")
{
  if (probefunc()=="vfs_read")
    countread ++
  else
    countnonread ++
}
probe timer.s(5) { exit() }
probe end
{
  printf("VFS reads total %d\n VFS writes total %d\n", countread, countnonread)
}

Example 3.9, "ifelse.stp" is a script that counts how many virtual file system reads (vfs_read) and writes (vfs_write) the system performs within a 5-second span. When run, the script increments the value of the variable countread by 1 if the name of the function it probed matches vfs_read (as noted by the condition if (probefunc()=="vfs_read")); otherwise, it increments countnonread (else countnonread ++).

While Loops

Format:

while (condition)
  statement

As long as condition is non-zero, statement is executed. The statement is often a statement block, and it must change a value so that condition eventually becomes zero.

For Loops

Format:

for (initialization; conditional; increment) statement

The for loop is simply shorthand for a while loop. The following is the equivalent while loop:

initialization
while (conditional) {
   statement
   increment
}

Conditional Operators

Aside from == ("is equal to"), you can also use the following operators in your conditional statements:

>=: Greater than or equal to
<=: Less than or equal to
!=: Is not equal to

3.3.3. Command-Line Arguments

You can also allow a SystemTap script to accept simple command-line arguments using a $ or @ immediately followed by the number of the argument on the command line. Use $ if you are expecting the user to enter an integer as a command-line argument, and @ if you are expecting a string.

Example 3.10. commandlineargs.stp

probe kernel.function(@1) { }
probe kernel.function(@1).return { }

Example 3.10, "commandlineargs.stp" is similar to Example 3.1, "wildcards.stp", except that it allows you to pass the kernel function to be probed as a command-line argument (as in stap commandlineargs.stp kernel function). You can also specify the script to accept multiple command-line arguments, noting them as @1, @2, and so on, in the order they are entered by the user.

3.4. Associative Arrays

SystemTap also supports the use of associative arrays. While an ordinary variable represents a single value, associative arrays can represent a collection of values. Simply put, an associative array is a collection of unique keys; each key in the array has a value associated with it.

Since associative arrays are normally processed in multiple probes (as we will demonstrate later), they should be declared as global variables in the SystemTap script. The syntax for accessing an element in an associative array is similar to that of awk, and is as follows:

array_name[index_expression]

Here, the array_name is any arbitrary name the array uses. The index_expression is used to refer to a specific unique key in the array. To illustrate, let us try to build an array named foo that specifies the ages of three people (i.e. the unique keys): tom, dick, and harry. To assign them the ages (i.e. associated values) of 23, 24, and 25 respectively, we'd use the following array statements:

Example 3.11. Basic Array Statements

foo["tom"] = 23
foo["dick"] = 24
foo["harry"] = 25

You can specify up to nine index expressions in an array statement, each one delimited by a comma (,). This is useful if you wish to have a key that contains multiple pieces of information. The following line from disktop.stp uses 5 elements for the key: process ID, executable name, user ID, parent process ID, and string "W". It associates the value of devname with that key.

device[pid(),execname(),uid(),ppid(),"W"] = devname

Important

All associative arrays must be declared as global, regardless of whether the associative array is used in one or multiple probes.

3.5. Array Operations in SystemTap

This section enumerates some of the most commonly used array operations in SystemTap.

3.5.1. Assigning an Associated Value

Use = to set an associated value to indexed unique pairs, as in:

array_name[index_expression] = value

Example 3.11, "Basic Array Statements" shows a very basic example of how to set an explicit associated value to a unique key. You can also use a handler function as both your index_expression and value. For example, you can use arrays to set a timestamp as the associated value to a thread ID (which you wish to use as your unique key), as in:

Example 3.12. Associating Timestamps to Process Names

foo[tid()] = gettimeofday_s()

Whenever an event invokes the statement in Example 3.12, "Associating Timestamps to Process Names", SystemTap returns the appropriate tid() value (i.e. the ID of a thread, which is then used as the unique key). At the same time, SystemTap also uses the function gettimeofday_s() to set the corresponding timestamp as the associated value to the unique key defined by the function tid(). This creates an array composed of key pairs containing thread IDs and timestamps.

In this same example, if tid() returns a value that is already defined in the array foo, the assignment discards the original associated value and replaces it with the current timestamp from gettimeofday_s().

3.5.2. Reading Values From Arrays

You can also read values from an array the same way you would read the value of a variable. To do so, include the array_name[index_expression] statement as an element in a mathematical expression. For example:

Example 3.13. Using Array Values in Simple Computations

delta = gettimeofday_s() - foo[tid()]

This example assumes that the array foo was built using the construct in Example 3.12, "Associating Timestamps to Process Names" (from Section 3.5.1, "Assigning an Associated Value"). This sets a timestamp that will serve as a reference point, to be used in computing for delta.

The construct in Example 3.13, "Using Array Values in Simple Computations" computes a value for the variable delta by subtracting the associated value of the key tid() from the current gettimeofday_s(). The construct does this by reading the value of tid() from the array. This particular construct is useful for determining the time between two events, such as the start and completion of a read operation.

Note

If the index_expression cannot find the unique key, it returns a value of 0 (for numerical operations, such as Example 3.13, "Using Array Values in Simple Computations") or a null/empty string value (for string operations) by default.
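The assignment and read behaviour described in Section 3.5.1 and Section 3.5.2 can be summarized in a short model. The following Python sketch is an illustration only, not SystemTap code: a plain dict stands in for the global array foo, the helper names assign and delta are hypothetical, and the thread IDs and timestamps are made-up sample values.

```python
# Python model of associative-array reads and writes.
# Assumption: a plain dict stands in for a global SystemTap array, and
# the thread IDs and timestamps are made-up sample values.
foo = {}


def assign(tid, now):
    foo[tid] = now  # a repeated key overwrites the previous value


def delta(tid, now):
    # Reading a key that is not in the array yields 0 for numbers,
    # and the read itself does not add the key to the array.
    return now - foo.get(tid, 0)


assign(2360, 100)
assign(2360, 130)        # replaces the timestamp stored at 100
print(delta(2360, 142))  # time since the latest assignment
print(delta(9999, 142))  # unknown key: the stored value defaults to 0
```

The second call shows why a script that computes time differences usually needs to check that the key exists first; otherwise the "difference" is simply the current timestamp.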

3.5.3. Incrementing Associated Values

Use ++ to increment the associated value of a unique key in an array, as in:

array_name[index_expression] ++

Again, you can also use a handler function for your index_expression. For example, if you wanted to tally how many times a specific process performed a read to the virtual file system (using the event vfs.read), you can use the following probe:

Example 3.14. vfsreads.stp

probe vfs.read
{
  reads[execname()] ++
}

In Example 3.14, "vfsreads.stp", the first time that the probe returns the process name gnome-terminal (i.e. the first time gnome-terminal performs a VFS read), that process name is set as the unique key gnome-terminal with an associated value of 1. The next time that the probe returns the process name gnome-terminal, SystemTap increments the associated value of gnome-terminal by 1. SystemTap performs this operation for all process names as the probe returns them.

3.5.4. Processing Multiple Elements in an Array

Once you've collected enough information in an array, you will need to retrieve and process all elements in that array to make it useful. Consider Example 3.14, "vfsreads.stp": the script collects information about how many VFS reads each process performs, but does not specify what to do with it. The obvious means for making Example 3.14, "vfsreads.stp" useful is to print the key pairs in the array reads, but how?

The best way to process all key pairs in an array (as an iteration) is to use the foreach statement. Consider the following example:

Example 3.15. cumulative-vfsreads.stp

global reads
probe vfs.read
{
  reads[execname()] ++
}
probe timer.s(3)
{
  foreach (count in reads)
    printf("%s : %d \n", count, reads[count])
}

In the second probe of Example 3.15, "cumulative-vfsreads.stp", the foreach statement uses the variable count to reference each iteration of a unique key in the array reads. The reads[count] array statement in the same probe retrieves the associated value of each unique key.

Given what we know about the first probe in Example 3.15, "cumulative-vfsreads.stp", the script prints VFS-read statistics every 3 seconds, displaying names of processes that performed a VFS-read along with a corresponding VFS-read count.

Now, remember that the foreach statement in Example 3.15, "cumulative-vfsreads.stp" prints all iterations of process names in the array, and in no particular order. You can instruct the script to process the iterations in a particular order by using + (ascending) or - (descending). In addition, you can also limit the number of iterations the script needs to process with the limit value option.

For example, consider the following replacement probe:

probe timer.s(3)
{
  foreach (count in reads- limit 10)
    printf("%s : %d \n", count, reads[count])
}

This foreach statement instructs the script to process the elements in the array reads in descending order (of associated value). The limit 10 option instructs the foreach to only process the first ten iterations (i.e. print the first 10, starting with the highest value).
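The effect of the ordering suffix and the limit option can be pictured as a sort followed by a truncation. The following Python sketch is a model of that behaviour, not SystemTap code: the function name foreach_sorted is hypothetical, and the process names and counts are invented sample data.

```python
# Python model of "foreach (count in reads- limit N)".
# The process names and counts are invented sample data.
def foreach_sorted(array, descending=True, limit=None):
    # Order the keys by their associated value, then truncate.
    keys = sorted(array, key=array.get, reverse=descending)
    return keys if limit is None else keys[:limit]


reads = {"hald": 12, "stapio": 3, "gnome-terminal": 40, "df": 7}
for count in foreach_sorted(reads, descending=True, limit=3):
    print("%s : %d" % (count, reads[count]))
```

Passing descending=False corresponds to the + suffix, and leaving limit unset corresponds to a foreach without the limit option.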

3.5.5. Clearing/Deleting Arrays and Array Elements

Sometimes, you may need to clear the associated values in array elements, or reset an entire array for re-use in another probe. Example 3.15, "cumulative-vfsreads.stp" in Section 3.5.4, "Processing Multiple Elements in an Array" allows you to track how the number of VFS reads per process grows over time, but it does not show you the number of VFS reads each process makes per 3-second period.

To do that, you will need to clear the values accumulated by the array. You can accomplish this using the delete operator to delete elements in an array, or an entire array. Consider the following example:

Example 3.16. noncumulative-vfsreads.stp

global reads
probe vfs.read
{
  reads[execname()] ++
}
probe timer.s(3)
{
  foreach (count in reads)
    printf("%s : %d \n", count, reads[count])
  delete reads
}

In Example 3.16, "noncumulative-vfsreads.stp", the second probe prints the number of VFS reads each process made within the probed 3-second period only. The delete reads statement clears the reads array within the probe.

Note

You can have multiple array operations within the same probe. Using the examples from Section 3.5.4, "Processing Multiple Elements in an Array" and Section 3.5.5, "Clearing/Deleting Arrays and Array Elements", you can track the number of VFS reads each process makes per 3-second period and tally the cumulative VFS reads of those same processes. Consider the following example:

global reads, totalreads
probe vfs.read
{
  reads[execname()] ++
  totalreads[execname()] ++
}
probe timer.s(3)
{
  printf("=======\n")
  foreach (count in reads-)
    printf("%s : %d \n", count, reads[count])
  delete reads
}
probe end
{
  printf("TOTALS\n")
  foreach (total in totalreads-)
    printf("%s : %d \n", total, totalreads[total])
}

In this example, the arrays reads and totalreads track the same information, and are printed out in a similar fashion. The only difference here is that reads is cleared every 3-second period, whereas totalreads keeps growing.
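The difference between the two arrays can be traced by hand on a small event stream. The following Python sketch is a model of the bookkeeping, not SystemTap code: the event stream is invented, with each inner list holding the process names that performed a read during one 3-second period, and the name per_interval is hypothetical.

```python
# Python model of per-interval and cumulative counters.
# The event stream is invented: each inner list holds the process names
# that performed a read during one 3-second interval.
intervals = [["hald", "df", "hald"], ["df"], ["hald", "hald"]]

reads, totalreads = {}, {}
per_interval = []
for events in intervals:
    for name in events:
        reads[name] = reads.get(name, 0) + 1
        totalreads[name] = totalreads.get(name, 0) + 1
    per_interval.append(dict(reads))  # what the timer probe would print
    reads.clear()                     # equivalent of "delete reads"

print(per_interval)
print(totalreads)
```

Clearing reads after each report is what keeps the per-period numbers from accumulating, while totalreads is never cleared and therefore ends with the overall tally.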

3.5.6. Using Arrays in Conditional Statements

You can also use associative arrays in if statements. This is useful if you want to execute a subroutine once a value in the array matches a certain condition. Consider the following example:

Example 3.17. vfsreads-print-if-1kb.stp

global reads
probe vfs.read
{
  reads[execname()] ++
}
probe timer.s(3)
{
  printf("=======\n")
  foreach (count in reads-)
    if (reads[count] >= 1024)
      printf("%s : %dkB \n", count, reads[count]/1024)
    else
      printf("%s : %dB \n", count, reads[count])
}

Every three seconds, Example 3.17, "vfsreads-print-if-1kb.stp" prints out a list of all processes, along with how many times each process performed a VFS read. If the associated value of a process name is equal to or greater than 1024, the if statement in the script converts and prints it out in kB.

Testing for Membership

You can also test whether a specific unique key is a member of an array. Further, membership in an array can be used in if statements, as in:

if([index_expression] in array_name) statement

To illustrate this, consider the following example:

Example 3.18. vfsreads-stop-on-stapio2.stp

global reads
probe vfs.read
{
  reads[execname()] ++
}
probe timer.s(3)
{
  printf("=======\n")
  foreach (count in reads+)
    printf("%s : %d \n", count, reads[count])
  if(["stapio"] in reads) {
    printf("stapio read detected, exiting\n")
    exit()
  }
}

The if(["stapio"] in reads) statement instructs the script to print stapio read detected, exiting once the unique key stapio is added to the array reads.

3.5.7. Computing for Statistical Aggregates

Statistical aggregates are used to collect statistics on numerical values where it is important to accumulate new data quickly and in large volume (i.e. storing only aggregated stream statistics). Statistical aggregates can be used in global variables or as elements in an array.

To add a value to a statistical aggregate, use the operator <<< value.

Example 3.19. stat-aggregates.stp

global reads

probe vfs.read
{
  reads[execname()] <<< $count
}

In Example 3.19, "stat-aggregates.stp", the operator <<< $count feeds the value of $count into the statistical aggregate associated with the corresponding execname() in the reads array. Remember, these values are accumulated: they are not added to a single associated value for each unique key, nor do they replace the current associated value. In a manner of speaking, think of each unique key (execname()) as having multiple associated values, with one more recorded on each run of the probe handler.

Note

In the context of Example 3.19, "stat-aggregates.stp", $count is the count argument of the probed kernel function; it gives the amount of data that the process returned by execname() asked to read from the virtual file system.

To extract data collected by statistical aggregates, use the syntax format @extractor(variable/array index expression). extractor can be any of the following integer extractors:

count: Returns the number of all values stored into the variable/array index expression. Given the sample probe in Example 3.19, "stat-aggregates.stp", the expression @count(reads[execname()]) will return how many values are stored for each unique key in the array reads.
sum: Returns the sum of all values stored into the variable/array index expression. Again, given the sample probe in Example 3.19, "stat-aggregates.stp", the expression @sum(reads[execname()]) will return the total of all values stored for each unique key in the array reads.
min: Returns the smallest among all the values stored in the variable/array index expression.
max: Returns the largest among all the values stored in the variable/array index expression.
avg: Returns the average of all values stored in the variable/array index expression.
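The extractors listed above can be combined in a single report. The following is a minimal sketch, assuming that kernel debuginfo is available so that $count (the count argument of the probed vfs_read function) can be resolved; the variable name name is illustrative only.

```systemtap
global reads

probe vfs.read
{
  # Record the requested read size for this process name.
  reads[execname()] <<< $count
}

probe timer.s(3)
{
  # Print every integer extractor for each process name.
  foreach (name in reads)
    printf("%s : count=%d sum=%d min=%d max=%d avg=%d\n",
           name, @count(reads[name]), @sum(reads[name]),
           @min(reads[name]), @max(reads[name]), @avg(reads[name]))

  # Start the next 3-second period with empty statistics.
  delete reads
}
```

Each extractor is applied to one element of the array (reads[name]), not to the array as a whole.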

When using statistical aggregates, you can also build array constructs that use multiple index expressions (to a maximum of 5). This is helpful in capturing additional contextual information during a probe. For example:

Example 3.20. Multiple Array Indexes

global reads

probe vfs.read
{
  reads[execname(),pid()] <<< 1
}

probe timer.s(3)
{
  foreach([var1,var2] in reads)
    printf("%s (%d) : %d \n", var1, var2, @count(reads[var1,var2]))
}

In Example 3.20, "Multiple Array Indexes", the first probe tracks how many times each process performs a VFS read. What makes this different from earlier examples is that this array associates a performed read to both a process name and its corresponding process ID.

The second probe in Example 3.20, "Multiple Array Indexes" demonstrates how to process and print the information collected by the array reads. Note how the foreach statement uses the same number of variables (i.e. var1 and var2) contained in the first instance of the array reads from the first probe.
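The same two-index array can be combined with the sorting suffix and the limit keyword of foreach to report only the busiest entries. This is a sketch under one assumption: for arrays of statistical aggregates, a sort with + or - is understood to order by @count by default, so check the stap(1) man page for your version. The names name and id are illustrative.

```systemtap
global reads

probe vfs.read
{
  # One sample per read, keyed by process name and process ID.
  reads[execname(), pid()] <<< 1
}

probe timer.s(3)
{
  # "reads-" iterates in descending order; "limit 5" stops after five entries.
  foreach ([name, id] in reads- limit 5)
    printf("%s (%d) : %d\n", name, id, @count(reads[name, id]))

  # Reset the statistics for the next period.
  delete reads
}
```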

3.6. Tapsets

Tapsets are scripts that form a library of pre-written probes and functions to be used in SystemTap scripts. When a user runs a SystemTap script, SystemTap checks the script's probe events and handlers against the tapset library; SystemTap then loads the corresponding probes and functions before translating the script to C (refer to Section 3.1, "Architecture" for information on what transpires in a SystemTap session).

Like SystemTap scripts, tapsets use the file name extension .stp. The standard library of tapsets is located in /usr/share/systemtap/tapset/ by default. However, unlike SystemTap scripts, tapsets are not meant for direct execution; rather, they constitute the library from which other scripts can pull definitions.

Simply put, the tapset library is an abstraction layer designed to make it easier for users to define events and functions. In a manner of speaking, tapsets provide useful aliases for functions that users may want to specify as an event; knowing the proper alias to use is, for the most part, easier than remembering specific kernel functions that might vary between kernel versions.

Several handlers and functions in Section 3.2.1, "Event" and SystemTap Functions are defined in tapsets. For example, thread_indent() is defined in indent.stp.
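To give a sense of what an alias looks like, the following sketch defines one and then uses it in the same script. It is not taken from the tapset library: the alias name example.vfs_read and the variable bytes_requested are hypothetical, and the script assumes that kernel debuginfo is available so that $count can be resolved. In a real tapset, the alias definition would live in its own .stp file in the tapset directory.

```systemtap
# Hypothetical alias: gives the raw kernel function a friendlier event name.
probe example.vfs_read = kernel.function("vfs_read")
{
  # Expose the raw kernel argument under a descriptive name.
  bytes_requested = $count
}

# A script can now name the alias instead of the kernel function.
probe example.vfs_read
{
  printf("%s asked for %d bytes\n", execname(), bytes_requested)
}

probe timer.s(3)
{
  # Stop after three seconds.
  exit()
}
```

If the underlying kernel function were renamed in a later kernel, only the alias definition would need to change; scripts that use example.vfs_read would stay the same.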
