| SystemTap Beginners GuideChapter 4. Useful SystemTap ScriptsThis chapter enumerates several SystemTap scripts you can use to monitor and investigate different subsystems. All of these scripts are available at /usr/share/systemtap/testsuite/systemtap.examples/ once you install the systemtap-testsuite RPM. The following sections showcase scripts that trace network-related functions and build a profile of network activity. This section describes how to profile network activity. nettop.stp provides a glimpse into how much network traffic each process is generating on a machine. nettop.stp #! /usr/bin/env stapglobal ifxmit, ifrecvglobal ifmergedprobe netdev.transmit{ ifxmit[pid(), dev_name, execname(), uid()] <<< length}probe netdev.receive{ ifrecv[pid(), dev_name, execname(), uid()] <<< length}function print_activity(){ printf("%5s %5s %-7s %7s %7s %7s %7s %-15s\n", "PID", "UID", "DEV", "XMIT_PK", "RECV_PK", "XMIT_KB", "RECV_KB", "COMMAND") foreach ([pid, dev, exec, uid] in ifrecv) { ifmerged[pid, dev, exec, uid] += @count(ifrecv[pid,dev,exec,uid]); } foreach ([pid, dev, exec, uid] in ifxmit) { ifmerged[pid, dev, exec, uid] += @count(ifxmit[pid,dev,exec,uid]); } foreach ([pid, dev, exec, uid] in ifmerged-) { n_xmit = @count(ifxmit[pid, dev, exec, uid]) n_recv = @count(ifrecv[pid, dev, exec, uid]) printf("%5d %5d %-7s %7d %7d %7d %7d %-15s\n", pid, uid, dev, n_xmit, n_recv, n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0, n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0, exec) } print("\n") delete ifxmit delete ifrecv delete ifmerged}probe timer.ms(5000), end, error{ print_activity()} Note that function print_activity() uses the following expressions: n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0 These expressions are if/else conditionals. The first statement is simply a more concise way of writing the following psuedo code: if n_recv != 0 then @sum(ifrecv[pid, dev, exec, uid])/1024else 0 nettop.stp tracks which processes are generating network traffic on the system, and provides the following information about each process: PID - the ID of the listed process.
UID - user ID. A user ID of 0 refers to the root user.
DEV - which ethernet device the process used to send / receive data (e.g. eth0, eth1)
XMIT_PK - number of packets transmitted by the process
RECV_PK - number of packets received by the process
XMIT_KB - amount of data sent by the process, in kilobytes
RECV_KB - amount of data received by the service, in kilobytes
Example 4.1. nettop.stp Sample Output [...] PID UID DEV XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND 0 0 eth0 0 5 0 0 swapper 11178 0 eth0 2 0 0 0 synergyc PID UID DEV XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND 2886 4 eth0 79 0 5 0 cups-polld 11362 0 eth0 0 61 0 5 firefox 0 0 eth0 3 32 0 3 swapper 2886 4 lo 4 4 0 0 cups-polld 11178 0 eth0 3 0 0 0 synergyc PID UID DEV XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND 0 0 eth0 0 6 0 0 swapper 2886 4 lo 2 2 0 0 cups-polld 11178 0 eth0 3 0 0 0 synergyc 3611 0 eth0 0 1 0 0 Xorg PID UID DEV XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND 0 0 eth0 3 42 0 2 swapper 11178 0 eth0 43 1 3 0 synergyc 11362 0 eth0 0 7 0 0 firefox 3897 0 eth0 0 1 0 0 multiload-apple[...] 4.1.2. Tracing Functions Called in Network Socket CodeThis section describes how to trace functions called from the kernel's net/socket.c file. This task helps you identify, in finer detail, how each process interacts with the network at the kernel level. Example 4.2. socket-trace.stp Sample Output [...]0 Xorg(3611): -> sock_poll3 Xorg(3611): <- sock_poll0 Xorg(3611): -> sock_poll3 Xorg(3611): <- sock_poll0 gnome-terminal(11106): -> sock_poll5 gnome-terminal(11106): <- sock_poll0 scim-bridge(3883): -> sock_poll3 scim-bridge(3883): <- sock_poll0 scim-bridge(3883): -> sys_socketcall4 scim-bridge(3883): -> sys_recv8 scim-bridge(3883): -> sys_recvfrom12 scim-bridge(3883):-> sock_from_file16 scim-bridge(3883):<- sock_from_file20 scim-bridge(3883):-> sock_recvmsg24 scim-bridge(3883):<- sock_recvmsg28 scim-bridge(3883): <- sys_recvfrom31 scim-bridge(3883): <- sys_recv35 scim-bridge(3883): <- sys_socketcall[...] 4.1.3. Monitoring Incoming TCP ConnectionsThis section illustrates how to monitor incoming TCP connections. This task is useful in identifying any unauthorized, suspicious, or otherwise unwanted network access requests in real time. While tcp_connections.stp is running, it will print out the following information about any incoming TCP connections accepted by the system in real time: Current UID CMD - the command accepting the connection
PID of the command
Port used by the connection IP address from which the TCP connection originated
Example 4.3. tcp_connections.stp Sample Output UID CMD PID PORT IP_SOURCE0 sshd 3165 22 10.64.0.2270 sshd 3165 22 10.64.0.227 4.1.4. Monitoring Network Packets Drops in Kernel The network stack in Linux can discard packets for various reasons. Some Linux kernels include a tracepoint, kernel.trace("kfree_skb") , which easily tracks where packets are discarded. dropwatch.stp uses kernel.trace("kfree_skb") to trace packet discards; the script summarizes which locations discard packets every five-second interval. The kernel.trace("kfree_skb") traces which places in the kernel drop network packets. The kernel.trace("kfree_skb") has two arguments: a pointer to the buffer being freed ($skb ) and the location in kernel code the buffer is being freed ($location ). Example 4.4. dropwatch.stp Sample Output Monitoring for dropped packets51 packets dropped at location 0xffffffff8024cd0f2 packets dropped at location 0xffffffff8044b47251 packets dropped at location 0xffffffff8024cd0f1 packets dropped at location 0xffffffff8044b47297 packets dropped at location 0xffffffff8024cd0f1 packets dropped at location 0xffffffff8044b472Stopping dropped packet monitor To make the location of packet drops more meaningful, refer to the /boot/System.map-`uname -r` file. This file lists the starting addresses for each function, allowing you to map the addresses in the output of Example 4.4, "dropwatch.stp Sample Output" to a specific function name. Given the following snippet of the /boot/System.map-`uname -r` file, the address 0xffffffff8024cd0f maps to the function unix_stream_recvmsg and the address 0xffffffff8044b472 maps to the function arp_rcv : [...]ffffffff8024c5cd T unlock_new_inodeffffffff8024c5da t unix_stream_sendmsgffffffff8024c920 t unix_stream_recvmsgffffffff8024cea1 t udp_v4_lookup_longway[...]ffffffff8044addc t arp_processffffffff8044b360 t arp_rcvffffffff8044b487 t parp_redoffffffff8044b48c t arp_solicit[...] The following sections showcase scripts that monitor disk and I/O activity. 4.2.1. Summarizing Disk Read/Write TrafficThis section describes how to identify which processes are performing the heaviest disk reads/writes to the system. disktop.stp #!/usr/bin/stap## Copyright (C) 2007 Oracle Corp.## Get the status of reading/writing disk every 5 seconds,# output top ten entries ## This is free software,GNU General Public License (GPL);# either version 2, or (at your option) any later version.## Usage:# ./disktop.stp#global io_stat,deviceglobal read_bytes,write_bytesprobe vfs.read.return { if ($return>0) { if (devname!="N/A") {/*skip read from cache*/ io_stat[pid(),execname(),uid(),ppid(),"R"] += $return device[pid(),execname(),uid(),ppid(),"R"] = devname read_bytes += $return } }}probe vfs.write.return { if ($return>0) { if (devname!="N/A") { /*skip update cache*/ io_stat[pid(),execname(),uid(),ppid(),"W"] += $return device[pid(),execname(),uid(),ppid(),"W"] = devname write_bytes += $return } }}probe timer.ms(5000) { /* skip non-read/write disk */ if (read_bytes+write_bytes) { printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n", ctime(gettimeofday_s()), "Average:", ((read_bytes+write_bytes)/1024)/5, "Read:",read_bytes/1024, "Write:",write_bytes/1024) /* print header */ printf("%8s %8s %8s %25s %8s %4s %12s\n", "UID","PID","PPID","CMD","DEVICE","T","BYTES") } /* print top ten I/O */ foreach ([process,cmd,userid,parent,action] in io_stat- limit 10) printf("%8d %8d %8d %25s %8s %4s %12d\n", userid,process,parent,cmd, device[process,cmd,userid,parent,action], action,io_stat[process,cmd,userid,parent,action]) /* clear data */ delete io_stat delete device read_bytes = 0 write_bytes = 0 }probe end{ delete io_stat delete device delete read_bytes delete write_bytes} UID - user ID. A user ID of 0 refers to the root user.
PID - the ID of the listed process.
PPID - the process ID of the listed process's parent process.
CMD - the name of the listed process.
DEVICE - which storage device the listed process is reading from or writing to.
T - the type of action performed by the listed process; W refers to write, while R refers to read.
BYTES - the amount of data read to or written from disk.
The time and date in the output of disktop.stp is returned by the functions ctime() and gettimeofday_s() . ctime() derives calendar time in terms of seconds passed since the Unix epoch (January 1, 1970). gettimeofday_s() counts the actual number of seconds since Unix epoch, which gives a fairly accurate human-readable timestamp for the output. In this script, the $return is a local variable that stores the actual number of bytes each process reads or writes from the virtual file system. $return can only be used in return probes (e.g. vfs.read.return and vfs.read.return ). Example 4.5. disktop.stp Sample Output [...]Mon Sep 29 03:38:28 2008 , Average: 19Kb/sec, Read: 7Kb, Write: 89KbUID PID PPID CMD DEVICE T BYTES0 26319 26294 firefox sda5 W 902290 2758 2757 pam_timestamp_c sda5 R 80640 2885 1 cupsd sda5 W 1678Mon Sep 29 03:38:38 2008 , Average: 1Kb/sec, Read: 7Kb, Write: 1KbUID PID PPID CMD DEVICE T BYTES0 2758 2757 pam_timestamp_c sda5 R 80640 2885 1 cupsd sda5 W 1678 4.2.2. Tracking I/O Time For Each File Read or WriteThis section describes how to monitor the amount of time it takes for each process to read from or write to any file. This is useful if you wish to determine what files are slow to load on a given system. iotime.stp tracks each time a system call opens, closes, reads from, and writes to a file. For each file any system call accesses, iotime.stp counts the number of microseconds it takes for any reads or writes to finish and tracks the amount of data (in bytes) read from or written to the file. iotime.stp also uses the local variable $count to track the amount of data (in bytes) that any system call attempts to read or write. Note that $return (as used in disktop.stp from Section 4.2.1, "Summarizing Disk Read/Write Traffic") stores the actual amount of data read/written. $count can only be used on probes that track data reads or writes (e.g. syscall.read and syscall.write ). Example 4.6. iotime.stp Sample Output [...]825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9[...]117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7[...]3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 03973744 2886 (sendmail) iotime /proc/loadavg time: 11[...] A timestamp, in microseconds. Process ID and process name. An access or iotime flag. The file accessed.
If a process was able to read or write any data, a pair of access and iotime lines should appear together. The access line's timestamp refers to the time that a given process started accessing a file; at the end of the line, it will show the amount of data read/written (in bytes). The iotime line will show the amount of time (in microseconds) that the process took in order to perform the read or write. If an access line is not followed by an iotime line, it simply means that the process did not read or write any data. 4.2.3. Track Cumulative IOThis section describes how to track the cumulative amount of I/O to the system. traceio.stp prints the top ten executables generating I/O traffic over time. In addition, it also tracks the cumulative amount of I/O reads and writes done by those ten executables. This information is tracked and printed out in 1-second intervals, and in descending order. Example 4.7. traceio.stp Sample Output [...] Xorg r: 583401 KiB w: 0 KiB floaters r: 96 KiB w: 7130 KiBmultiload-apple r: 538 KiB w: 537 KiB sshd r: 71 KiB w: 72 KiBpam_timestamp_c r: 138 KiB w: 0 KiB staprun r: 51 KiB w: 51 KiB snmpd r: 46 KiB w: 0 KiB pcscd r: 28 KiB w: 0 KiB irqbalance r: 27 KiB w: 4 KiB cupsd r: 4 KiB w: 18 KiB Xorg r: 588140 KiB w: 0 KiB floaters r: 97 KiB w: 7143 KiBmultiload-apple r: 543 KiB w: 542 KiB sshd r: 72 KiB w: 72 KiBpam_timestamp_c r: 138 KiB w: 0 KiB staprun r: 51 KiB w: 51 KiB snmpd r: 46 KiB w: 0 KiB pcscd r: 28 KiB w: 0 KiB irqbalance r: 27 KiB w: 4 KiB cupsd r: 4 KiB w: 18 KiB 4.2.4. I/O Monitoring (By Device)This section describes how to monitor I/O activity on a specific device. traceio2.stp takes 1 argument: the whole device number. To get this number, use stat -c "0x%D" directory , where directory
is located in the device you wish to monitor. The usrdev2kerndev() function converts the whole device number into the format understood by the kernel. The output produced by usrdev2kerndev() is used in conjunction with the MKDEV() , MINOR() , and MAJOR() functions to determine the major and minor numbers of a specific device. The output of traceio2.stp includes the name and ID of any process performing a read/write, the function it is performing (i.e. vfs_read or vfs_write ), and the kernel device number. The following example is an excerpt from the full output of stap traceio2.stp 0x805 , where 0x805 is the whole device number of /home . /home resides in /dev/sda5 , which is the device we wish to monitor. Example 4.8. traceio2.stp Sample Output [...]synergyc(3722) vfs_read 0x800005synergyc(3722) vfs_read 0x800005cupsd(2889) vfs_write 0x800005cupsd(2889) vfs_write 0x800005cupsd(2889) vfs_write 0x800005[...] 4.2.5. Monitoring Reads and Writes to a FileThis section describes how to monitor reads from and writes to a file in real time. inodewatch.stp takes the following information about the file as arguments on the command line: To get this information, use stat -c '%D %i' filename , where filename is an absolute path. For instance: if you wish to monitor /etc/crontab , run stat -c '%D %i' /etc/crontab first. This gives the following output: 805 1078319 805 is the base-16 (hexadecimal) device number. The lower two digits are the minor device number and the upper digits are the major number. 1078319 is the inode number. To start monitoring /etc/crontab , run stap inodewatch.stp 0x8 0x05 1078319 (The 0x prefixes indicate base-16 values).
The output of this command contains the name and ID of any process performing a read/write, the function it is performing (i.e. vfs_read or vfs_write ), the device number (in hex format), and the inode number. Example 4.9, "inodewatch.stp Sample Output" contains the output of stap inodewatch.stp 0x8 0x05 1078319 (when cat /etc/crontab is executed while the script is running) : Example 4.9. inodewatch.stp Sample Output cat(16437) vfs_read 0x800005/1078319cat(16437) vfs_read 0x800005/1078319 4.2.6. Monitoring Changes to File AttributesThis section describes how to monitor if any processes are changing the attributes of a targeted file, in real time. Example 4.10. inodewatch2-simple.stp Sample Output chmod(17448) inode_setattr 0x800005/6011835 100777 500chmod(17449) inode_setattr 0x800005/6011835 100666 500 The following sections showcase scripts that profile kernel activity by monitoring function calls. 4.3.1. Counting Function Calls MadeThis section describes how to identify how many times the system called a specific kernel function in a 30-second sample. Depending on your use of wildcards, you can also use this script to target multiple kernel functions. functioncallcount.stp takes the targeted kernel function as an argument. The argument supports wildcards, which enables you to target multiple kernel functions up to a certain extent. Example 4.11. functioncallcount.stp Sample Output [...]__vma_link 97__vma_link_file 66__vma_link_list 97__vma_link_rb 97__xchg 103add_page_to_active_list 102add_page_to_inactive_list 19add_to_page_cache 19add_to_page_cache_lru 7all_vm_events 6alloc_pages_node 4630alloc_slabmgmt 67anon_vma_alloc 62anon_vma_free 62anon_vma_lock 66anon_vma_prepare 98anon_vma_unlink 97anon_vma_unlock 66arch_get_unmapped_area_topdown 94arch_get_unmapped_exec_area 3arch_unmap_area_topdown 97atomic_add 2atomic_add_negative 97atomic_dec_and_test 5153atomic_inc 470atomic_inc_and_test 1[...] 4.3.2. Call Graph TracingThis section describes how to trace incoming and outgoing function calls. The function(s) whose entry/exit you'd like to trace ($1 ). A second optional trigger function ($2 ), which enables or disables tracing on a per-thread basis. Tracing in each thread will continue as long as the trigger function has not exited yet.
para-callgraph.stp uses thread_indent() ; as such, its output contains the timestamp, process name, and thread ID of $1 (i.e. the probe function you are tracing). For more information about thread_indent() , refer to its entry in SystemTap Functions. The following example contains an excerpt from the output for stap para-callgraph.stp 'kernel.function("*@fs/*.c")' 'kernel.function("sys_read")' : Example 4.12. para-callgraph.stp Sample Output [...] 267 gnome-terminal(2921): <-do_sync_read return=0xfffffffffffffff5 269 gnome-terminal(2921):<-vfs_read return=0xfffffffffffffff5 0 gnome-terminal(2921):->fput file=0xffff880111eebbc0 2 gnome-terminal(2921):<-fput 0 gnome-terminal(2921):->fget_light fd=0x3 fput_needed=0xffff88010544df54 3 gnome-terminal(2921):<-fget_light return=0xffff8801116ce980 0 gnome-terminal(2921):->vfs_read file=0xffff8801116ce980 buf=0xc86504 count=0x1000 pos=0xffff88010544df48 4 gnome-terminal(2921): ->rw_verify_area read_write=0x0 file=0xffff8801116ce980 ppos=0xffff88010544df48 count=0x1000 7 gnome-terminal(2921): <-rw_verify_area return=0x1000 12 gnome-terminal(2921): ->do_sync_read filp=0xffff8801116ce980 buf=0xc86504 len=0x1000 ppos=0xffff88010544df48 15 gnome-terminal(2921): <-do_sync_read return=0xfffffffffffffff5 18 gnome-terminal(2921):<-vfs_read return=0xfffffffffffffff5 0 gnome-terminal(2921):->fput file=0xffff8801116ce980 4.3.3. Determining Time Spent in Kernel and User SpaceThis section illustrates how to determine the amount of time any given thread is spending in either kernel or user-space. thread-times.stp lists the top 20 processes currently taking up CPU time within a 5-second sample, along with the total number of CPU ticks made during the sample. The output of this script also notes the percentage of CPU time each process used, as well as whether that time was spent in kernel space or user space. Example 4.13. thread-times.stp Sample Output tid %user %kernel (of 20002 ticks) 0 0.00% 87.88%32169 5.24% 0.03% 9815 3.33% 0.36% 9859 0.95% 0.00% 3611 0.56% 0.12% 9861 0.62% 0.01%11106 0.37% 0.02%32167 0.08% 0.08% 3897 0.01% 0.08% 3800 0.03% 0.00% 2886 0.02% 0.00% 3243 0.00% 0.01% 3862 0.01% 0.00% 3782 0.00% 0.00%21767 0.00% 0.00% 2522 0.00% 0.00% 3883 0.00% 0.00% 3775 0.00% 0.00% 3943 0.00% 0.00% 3873 0.00% 0.00% 4.3.4. Monitoring Polling ApplicationsThis section describes how to identify and monitor which applications are polling. Doing so allows you to track unnecessary or excessive polling, which can help you pinpoint areas for improvement in terms of CPU usage and power savings. timeout.stp tracks how many times each application used the following system calls over time: poll
select
epoll
itimer
futex
nanosleep
signal
In some applications, these system calls are used excessively. As such, they are normally identified as "likely culprits" for polling applications. Note, however, that an application may be using a different system call to poll excessively; sometimes, it is useful to find out the top system calls used by the system (refer to Section 4.3.5, "Tracking Most Frequently Used System Calls" for instructions). Doing so can help you identify any additional suspects, which you can add to timeout.stp for tracking. Example 4.14. timeout.stp Sample Output uid | poll select epoll itimer futex nanosle signal| process28937 | 148793 0 0 4727 37288 0 0| firefox22945 | 0 56949 0 1 0 0 0| scim-bridge 0 | 0 0 0 36414 0 0 0| swapper 4275 | 23140 0 0 1 0 0 0| mixer_applet2 4191 | 0 14405 0 0 0 0 0| scim-launcher22941 | 7908 1 0 62 0 0 0| gnome-terminal 4261 | 0 0 0 2 0 7622 0| escd 3695 | 0 0 0 0 0 7622 0| gdm-binary 3483 | 0 7206 0 0 0 0 0| dhcdbd 4189 | 6916 0 0 2 0 0 0| scim-panel-gtk 1863 | 5767 0 0 0 0 0 0| iscsid 2562 | 0 2881 0 1 0 1438 0| pcscd 4257 | 4255 0 0 1 0 0 0| gnome-power-man 4278 | 3876 0 0 60 0 0 0| multiload-apple 4083 | 0 1331 0 1728 0 0 0| Xorg 3921 | 1603 0 0 0 0 0 0| gam_server 4248 | 1591 0 0 0 0 0 0| nm-applet 3165 | 0 1441 0 0 0 0 0| xterm29548 | 0 1440 0 0 0 0 0| httpd 1862 | 0 0 0 0 0 1438 0| iscsid You can increase the sample time by editing the timer in the second probe ( timer.s() ). The output of functioncallcount.stp contains the name and UID of the top 20 polling applications, along with how many times each application performed each polling system call (over time). Example 4.14, "timeout.stp Sample Output" contains an excerpt of the script: 4.3.5. Tracking Most Frequently Used System Callspoll
select
epoll
itimer
futex
nanosleep
signal
However, in some systems, a different system call might be responsible for excessive polling. If you suspect that a polling application is using a different system call to poll, you need to identify first the top system calls used by the system. To do this, use topsys.stp. Example 4.15. topsys.stp Sample Output -------------------------------------------------------------- SYSCALL COUNT gettimeofday 1857 read 1821 ioctl 1568 poll 1033 close 638 open 503 select 455 write 391 writev 335 futex 303 recvmsg 251 socket 137 clock_gettime 124 rt_sigprocmask 121 sendto 120 setitimer 106 stat 90 time 81 sigreturn 72 fstat 66-------------------------------------------------------------- 4.3.6. Tracking System Call Volume Per ProcessThis section illustrates how to determine which processes are performing the highest volume of system calls. In previous sections, we've described how to monitor the top system calls used by the system over time ( Section 4.3.5, "Tracking Most Frequently Used System Calls"). We've also described how to identify which applications use a specific set of "polling suspect" system calls the most ( Section 4.3.4, "Monitoring Polling Applications"). Monitoring the volume of system calls made by each process provides more data in investigating your system for polling processes and other resource hogs. Example 4.16. topsys.stp Sample Output Collecting data... Type Ctrl-C to exit and display results#SysCalls Process Name1577 multiload-apple692 synergyc408 pcscd376 mixer_applet2299 gnome-terminal293 Xorg206 scim-panel-gtk95 gnome-power-man90 artsd85 dhcdbd84 scim-bridge78 gnome-screensav66 scim-launcher[...] If you prefer the output to display the process IDs instead of the process names, use the following script instead. As indicated in the output, you need to manually exit the script in order to display the results. You can add a timed expiration to either script by simply adding a timer.s() probe; for example, to instruct the script to expire after 5 seconds, add the following probe to the script: probe timer.s(5){exit()} 4.4. Identifying Contended User-Space LocksThis section describes how to identify contended user-space locks throughout the system within a specific time period. The ability to identify contended user-space locks can help you investigate hangs that you suspect may be caused by futex contentions. Simply put, a futex contention occurs when multiple processes are trying to access the same region of memory. In some cases, this can result in a deadlock between the processes in contention, thereby appearing as an application hang. futexes.stp #! /usr/bin/env stap# This script tries to identify contended user-space locks by hooking# into the futex system call.global thread_thislock # shortglobal thread_blocktime # global FUTEX_WAIT = 0 /*, FUTEX_WAKE = 1 */global lock_waits # long-lived stats on (tid,lock) blockage elapsed timeglobal process_names # long-lived pid-to-execname mappingprobe syscall.futex { if (op != FUTEX_WAIT) next # don't care about WAKE event originator t = tid () process_names[pid()] = execname() thread_thislock[t] = $uaddr thread_blocktime[t] = gettimeofday_us()}probe syscall.futex.return { t = tid() ts = thread_blocktime[t] if (ts) { elapsed = gettimeofday_us() - ts lock_waits[pid(), thread_thislock[t]] <<< elapsed delete thread_blocktime[t] delete thread_thislock[t] }}probe end { foreach ([pid+, lock] in lock_waits) printf ("%s[%d] lock %p contended %d times, %d avg us\n", process_names[pid], pid, lock, @count(lock_waits[pid,lock]), @avg(lock_waits[pid,lock]))} futexes.stp needs to be manually stopped; upon exit, it prints the following information: Name and ID of the process responsible for a contention The region of memory it contested How many times the region of memory was contended Average time of contention throughout the probe
Example 4.17. futexes.stp Sample Output [...]automount[2825] lock 0x00bc7784 contended 18 times, 999931 avg ussynergyc[3686] lock 0x0861e96c contended 192 times, 101991 avg ussynergyc[3758] lock 0x08d98744 contended 192 times, 101990 avg ussynergyc[3938] lock 0x0982a8b4 contended 192 times, 101997 avg us[...] SystemTap Beginners GuideChapter 5. Understanding SystemTap ErrorsThis chapter explains the most common errors you may encounter while using SystemTap. 5.1. Parse and Semantic ErrorsThese types of errors occur while SystemTap attempts to parse and translate the script into C, prior to being converted into a kernel module. For example type errors result from operations that assign invalid values to variables or arrays. The following invalid SystemTap script is missing its probe handlers: probe vfs.readprobe vfs.write It results in the following error message showing that the parser was expecting something other than the probe keyword in column 1 of line 2: parse error: expected one of '. , ( ? ! { = +='saw: keyword at perror.stp:2:11 parse error(s). If you are sure of the safety of any similar constructs in the script and are member of stapdev group (or have root privileges), run the script in "guru" mode by using the option -g (i.e. stap -g script ). Example 5.1. error-variable.stp probe syscall.open{ printf ("%d(%d) open\n", execname(), pid())} probe begin { printf("x") = 1 } 5.2. Run Time Errors and WarningsRuntime errors and warnings occur when the SystemTap instrumentation has been installed and is collecting data on the system. This chapter enumerates other references for more information about SystemTap. It is advisable that you refer to these sources in the course of writing advanced probes and tapsets. - SystemTap Wiki
- SystemTap Tutorial
Much of the content in this book comes from the SystemTap Tutorial. The SystemTap Tutorial is a more appropriate reference for users with intermediate to advanced knowledge of C++ and kernel development, and can be found at http://sourceware.org/systemtap/tutorial/. - man stapprobes
The stapprobes man page enumerates a variety of probe points supported by SystemTap, along with additional aliases defined by the SystemTap tapset library. The bottom of the man page includes a list of other man pages enumerating similar probe points for specific system components, such as stapprobes.scsi , stapprobes.kprocess , stapprobes.signal , etc. - man stapfuncs
The stapfuncs man page enumerates numerous functions supported by the SystemTap tapset library, along with the prescribed syntax for each one. Note, however, that this is not a complete list of all supported functions; there are more undocumented functions available. - SystemTap Language Reference
This document is a comprehensive reference of SystemTap's language constructs and syntax. It is recommended for users with a rudimentary to intermediate knowledge of C++ and other similar programming languages. The SystemTap Language Reference is available to all users at http://sourceware.org/systemtap/langref/- Tapset Developers Guide
Once you have sufficient proficiency in writing SystemTap scripts, you can then try your hand out on writing your own tapsets. The Tapset Developers Guide describes how to add functions to your tapset library. - Test Suite
The systemtap-testsuite package allows you to test the entire SystemTap toolchain without having to build from source. In addition, it also contains numerous examples of SystemTap scripts you can study and test; some of these scripts are also documented in Chapter 4, Useful SystemTap Scripts. By default, the example scripts included in systemtap-testsuite are located in /usr/share/systemtap/testsuite/systemtap.examples .
Revision History |
---|
Revision 2-25 | Tue Feb 17 2013 | Jacquelynn East | Version for 6.4 GA release |
| Revision 2-22 | Tue May 17 2011 | Jacquelynn East | | Revision 2-18 | Mon May 16 2011 | Jacquelynn East | | Revision 2-15 | Mon Nov 29 2010 | Michael Hideo-Smith | | Revision 2-14 | Thu Nov 18 2010 | Jacquelynn East | | Revision 2-13 | Thu Nov 18 2010 | Jacquelynn East | |
|
| |