Page fault

A page fault (sometimes #pf or pf) is a trap to the software raised by the hardware when a program accesses a page that is mapped in the virtual address space, but not loaded in physical memory. In the typical case the operating system tries to handle the page fault by making the required page accessible at a location in physical memory or terminates the program in the case of an illegal access. The hardware that detects a page fault is the memory management unit in a processor. The exception handling software that handles the page fault is generally part of the operating system.

Contrary to what the name 'page fault' might suggest, page faults are not always errors and are common and necessary to increase the amount of memory available to programs in any operating system that utilizes virtual memory, including Microsoft Windows, Unix-like systems (including Mac OS X, Linux, *BSD, Solaris, AIX, and HP-UX), and z/OS. Microsoft uses the term hard fault in more recent versions of the Resource Monitor (e.g., Windows Vista) to mean 'page fault'.^[1]

1 Types
2 Handling illegal accesses and invalid page faults
3 Performance
4 References
5 External links

Types

Minor

If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory and indicate that the page is loaded in memory; it does not need to read the page into memory. This could happen if the memory is shared by different programs and the page is already brought into memory for other programs. The page could also have been removed from a process's Working Set, but not yet written to disk or erased, such as in operating systems that use Secondary Page Caching. For example, HP OpenVMS may remove a page that does not need to be written to disk (if it has remained unchanged since it was last read from disk, for example) and place it on a Free Page List if the working set is deemed too large. However, the page contents are not overwritten until the page is assigned elsewhere, meaning it is still available if it is referenced by the original process before being allocated. Since these faults do not involve disk latency, they are faster and less expensive than major page faults.

Major

If the page is not loaded in memory at the time the fault is generated, then it is called a major or hard page fault. The page fault handler in the operating system needs to find a free page in memory, or choose another non-free page in memory to be used for this page's data, which might be used by another process. In this latter case, the OS first needs to write out the data in that page if it hasn't already been written out since it was last modified, and mark that page as not being loaded into memory in its process page table. Once the page has thus been made available, the OS can read the data for the new page into the physical page, and then make the entry for that page in the memory management unit point to the page in memory and indicate that the page is loaded in memory.^{[clarification needed]} Major faults are more expensive than minor page faults and add disk latency to the interrupted program's execution. This is the mechanism used by an operating system to increase the amount of program memory available on demand. The operating system delays loading parts of the program from disk until the program attempts to use it and the page fault is generated.

Invalid

If a page fault occurs for a reference to an address that's not part of the virtual address space, so that there can't be a page in memory corresponding to it, then it is called an invalid page fault. The page fault handler in the operating system then needs to terminate the code that made the reference, or deliver an indication to that code that the reference was invalid. A null pointer is usually represented as a pointer to address 0 in the address space; many operating systems set up the memory management unit to indicate that the page that contains that address is not in memory, and do not include that page in the virtual address space, so that attempts to read or write the memory referenced by a null pointer get an invalid page fault.

Handling illegal accesses and invalid page faults

Illegal accesses and invalid page faults can result in a program crash, segmentation error, bus error or core dump depending on the operating system environment. Often these problems are caused by software bugs, but hardware memory errors, such as those caused by overclocking, may corrupt pointers and make correct software fail.

Operating systems such as Windows and UNIX (and other UNIX-like systems) provide differing mechanisms for reporting errors caused by page faults. Windows uses structured exception handling to report page fault-based invalid accesses as access violation exceptions, and UNIX (and UNIX-like) systems typically use signals, such as SIGSEGV, to report these error conditions to programs.

If the program receiving the error does not handle it, the operating system performs a default action, typically involving the termination of the running process that caused the error condition, and notifying the user that the program has malfunctioned. Recent versions of Windows often report such problems by simply stating something like "this program must close" (an experienced user or programmer with access to a debugger can still retrieve detailed information). Recent Windows versions also write a minidump (similar in principle to a core dump) describing the state of the crashed process. UNIX and UNIX-like operating systems report these conditions to the user with error messages such as "segmentation violation", or "bus error", and may also produce a core dump.

Performance

Page faults, by their very nature, degrade the performance of a program or operating system and in the degenerate case can cause thrashing. Optimizations to programs and the operating system that reduce the number of page faults improve the performance of the program or even the entire system. The two primary focuses of the optimization effort are reducing overall memory usage and improving memory locality. Generally, making more physical memory available also reduces page faults. Many page replacement algorithms have been proposed, such as implementing heuristic-based algorithms to reduce the incidence of page faults.

An average hard disk has an average rotational latency of 3ms, a seek-time of 5ms, and a transfer-time of 0.05ms/page. So the total time for paging comes in near 8ms (8 000 000 ns). If the memory access time is 200ns, then the page fault would make the operation about 40,000 times slower. To reduce the page faults in the system, programmers must make use of an appropriate page replacement algorithm that suits the current requirements and maximizes the page hits.

References

John L. Hennessy, David A. Patterson, Computer Architecture, A Quantitative Approach (ISBN 1-55860-724-2)
Tanenbaum, Andrew S. Operating Systems: Design and Implementation (Second Edition). New Jersey: Prentice-Hall 1997.
Intel Architecture Software Developer's Manual–Volume 3: System Programming

^ cf. Resource View Help in Microsoft operating systems

External links

"So What Is A Page Fault?" from OSR Online (a Windows-specific explanation)
"Virtual Memory Details" from the Red Hat website.
"UnhandledExceptionFilter (Windows)" from MSDN Online.

Contents