
Performance Tuning Guide

Optimizing subsystem throughput in Red Hat Enterprise Linux 6

Edition 4.0

Red Hat Subject Matter Experts

Edited by

Don Domingo

Edited by

Laura Bailey

Legal Notice

Copyright © 2011 Red Hat, Inc. and others.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution-Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
All other trademarks are the property of their respective owners.


1801 Varsity Drive
Raleigh, NC 27606-2072 USA
Phone: +1 919 754 3700
Phone: 888 733 4281
Fax: +1 919 754 3701

Table of Contents

Abstract

The Performance Tuning Guide describes how to optimize the performance of a system running Red Hat Enterprise Linux 6. It also documents performance-related upgrades in Red Hat Enterprise Linux 6.
While this guide contains procedures that are field-tested and proven, Red Hat recommends that you properly test all planned configurations in a testing environment before applying them to a production environment. You should also back up all your data and pre-tuning configurations.
Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
2.2. We Need Feedback!
1. Overview
1.1. Audience
1.2. Horizontal Scalability
1.2.1. Parallel Computing
1.3. Distributed Systems
1.3.1. Communication
1.3.2. Storage
1.3.3. Converged Networks
2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
2.2. Ticket Spinlocks
2.3. Dynamic List Structure
2.4. Tickless Kernel
2.5. Control Groups
2.6. Storage and File System Improvements
3. Monitoring and Analyzing System Performance
3.1. The proc File System
3.2. GNOME and KDE System Monitors
3.3. Built-in Command-line Monitoring Tools
3.4. Tuned and ktune
3.5. Application Profilers
3.5.1. SystemTap
3.5.2. OProfile
3.5.3. Valgrind
3.5.4. Perf
3.6. Red Hat Enterprise MRG
4. CPU
4.1. CPU Topology
4.1.1. CPU and NUMA Topology
4.1.2. Tuning CPU Performance
4.1.3. numastat
4.1.4. NUMA Affinity Management Daemon (numad)
4.2. CPU Scheduling
4.2.1. Realtime scheduling policies
4.2.2. Normal scheduling policies
4.2.3. Policy selection
4.3. Interrupts and IRQ Tuning
4.4. Enhancements to NUMA in Red Hat Enterprise Linux 6
4.4.1. Bare-metal and Scalability Optimizations
4.4.2. Virtualization Optimizations
5. Memory
5.1. Huge Translation Lookaside Buffer (HugeTLB)
5.2. Huge Pages and Transparent Huge Pages
5.3. Using Valgrind to Profile Memory Usage
5.3.1. Profiling Memory Usage with Memcheck
5.3.2. Profiling Cache Usage with Cachegrind
5.3.3. Profiling Heap and Stack Space with Massif
5.4. Capacity Tuning
5.5. Tuning Virtual Memory
6. Input/Output
6.1. Features
6.2. Analysis
6.3. Tools
6.4. Configuration
6.4.1. Completely Fair Queuing (CFQ)
6.4.2. Deadline I/O Scheduler
6.4.3. Noop
7. File Systems
7.1. Tuning Considerations for File Systems
7.1.1. Formatting Options
7.1.2. Mount Options
7.1.3. File system maintenance
7.1.4. Application Considerations
7.2. Profiles for file system performance
7.3. File Systems
7.3.1. The Ext4 File System
7.3.2. The XFS File System
7.4. Clustering
7.4.1. Global File System 2
8. Networking
8.1. Network Performance Enhancements
8.2. Optimized Network Settings
8.3. Overview of Packet Reception
8.4. Resolving Common Queuing/Frame Loss Issues
8.4.1. NIC Hardware Buffer
8.4.2. Socket Queue
8.5. Multicast Considerations
A. Revision History

Preface

1. Document Conventions

This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.
In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts set. The Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not, alternative but equivalent typefaces are displayed. Note: Red Hat Enterprise Linux 5 and later includes the Liberation Fonts set by default.

1.1. Typographic Conventions

Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.
Mono-spaced Bold
Used to highlight system input, including shell commands, file names and paths. Also used to highlight keys and key combinations. For example:
To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.
The above includes a file name, a shell command and a key, all presented in mono-spaced bold and all distinguishable thanks to context.
Key combinations can be distinguished from an individual key by the plus sign that connects each part of a key combination. For example:
Press Enter to execute the command.
Press Ctrl+Alt+F2 to switch to a virtual terminal.
The first example highlights a particular key to press. The second example highlights a key combination: a set of three keys pressed simultaneously.
If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.
Proportional Bold
This denotes words or phrases encountered on a system, including application names; dialog box text; labeled buttons; check-box and radio button labels; menu titles and sub-menu titles. For example:
Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, click the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).
To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find… from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.
The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.
Mono-spaced Bold Italic or Proportional Bold Italic
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.
To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.
Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:
Publican is a DocBook publishing system.

1.2. Pull-quote Conventions

Terminal output and source code listings are set off visually from the surrounding text.
Output sent to a terminal is set in mono-spaced roman and presented thus:
books        Desktop   documentation  drafts  mss    photos   stuff  svn
books_tests  Desktop1  downloads      images  notes  scripts  svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
package org.jboss.book.jca.ex1;

import javax.naming.InitialContext;

public class ExClient
{
   public static void main(String args[]) throws Exception
   {
      InitialContext iniCtx = new InitialContext();
      Object         ref    = iniCtx.lookup("EchoBean");
      EchoHome       home   = (EchoHome) ref;
      Echo           echo   = home.create();

      System.out.println("Created Echo");
      System.out.println("Echo.echo('Hello') = " + echo.echo("Hello"));
   }
}

1.3. Notes and Warnings

Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.

Note

Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.

Important

Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled 'Important' will not cause data loss but may cause irritation and frustration.

Warning

Warnings should not be ignored. Ignoring warnings will most likely cause data loss.

2. Getting Help and Giving Feedback

2.1. Do You Need Help?

If you experience difficulty with a procedure described in this documentation, visit the Red Hat Customer Portal at http://access.redhat.com. Through the customer portal, you can:
  • search or browse through a knowledgebase of technical support articles about Red Hat products.
  • submit a support case to Red Hat Global Support Services (GSS).
  • access other product documentation.
Red Hat also hosts a large number of electronic mailing lists for discussion of Red Hat software and technology. You can find a list of publicly available mailing lists at https://www.redhat.com/mailman/listinfo. Click on the name of any mailing list to subscribe to that list or to access the list archives.

2.2. We Need Feedback!

If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red Hat Enterprise Linux 6.
When submitting a bug report, be sure to mention the manual's identifier: doc-Performance_Tuning_Guide
If you have a suggestion for improving the documentation, try to be as specific as possible when describing it. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.

Chapter 1. Overview

The Performance Tuning Guide is a comprehensive reference on the configuration and optimization of Red Hat Enterprise Linux. While this release also contains information on Red Hat Enterprise Linux 5 performance capabilities, all instructions supplied herein are specific to Red Hat Enterprise Linux 6.
This book is divided into chapters discussing specific subsystems in Red Hat Enterprise Linux. The Performance Tuning Guide focuses on three major themes per subsystem:
Features
Each subsystem chapter describes performance features unique to (or implemented differently in) Red Hat Enterprise Linux 6. These chapters also discuss Red Hat Enterprise Linux 6 updates that significantly improved the performance of specific subsystems over Red Hat Enterprise Linux 5.
Analysis
The book also enumerates performance indicators for each specific subsystem. Typical values for these indicators are described in the context of specific services, helping you understand their significance in real-world, production systems.
In addition, the Performance Tuning Guide also shows different ways of retrieving performance data (that is, profiling) for a subsystem. Note that some of the profiling tools showcased here are documented in greater detail elsewhere.
Configuration
Perhaps the most important information in this book is the set of instructions on how to adjust the performance of a specific subsystem in Red Hat Enterprise Linux 6. The Performance Tuning Guide explains how to fine-tune a Red Hat Enterprise Linux 6 subsystem for specific services.
Keep in mind that tweaking a specific subsystem's performance may affect the performance of another, sometimes adversely. The default configuration of Red Hat Enterprise Linux 6 is optimal for most services running under moderate loads.
The procedures enumerated in the Performance Tuning Guide were tested extensively by Red Hat engineers in both lab and field. However, Red Hat recommends that you properly test all planned configurations in a secure testing environment before applying them to your production servers. You should also back up all data and configuration information before you start tuning your system.

1.1. Audience

This book is suitable for two types of readers:
System/Business Analyst
This book enumerates and explains Red Hat Enterprise Linux 6 performance features at a high level, providing enough information on how subsystems perform for specific workloads (both by default and when optimized). The level of detail used in describing Red Hat Enterprise Linux 6 performance features helps potential customers and sales engineers understand the suitability of this platform in providing resource-intensive services at an acceptable level.
The Performance Tuning Guide also provides links to more detailed documentation on each feature whenever possible. At that detail level, readers can understand these performance features enough to form a high-level strategy in deploying and optimizing Red Hat Enterprise Linux 6. This allows readers to both develop and evaluate infrastructure proposals.
This feature-focused level of documentation is suitable for readers with a high-level understanding of Linux subsystems and enterprise-level networks.
System Administrator
The procedures enumerated in this book are suitable for system administrators with an RHCE [1] skill level (or its equivalent, that is, 3-5 years of experience deploying and managing Linux). The Performance Tuning Guide aims to provide as much detail as possible about the effects of each configuration; this means describing any performance trade-offs that may occur.
The underlying skill in performance tuning lies not in knowing how to analyze and tune a subsystem. Rather, a system administrator adept at performance tuning knows how to balance and optimize a Red Hat Enterprise Linux 6 system for a specific purpose. This means also knowing which trade-offs and performance penalties are acceptable when attempting to implement a configuration designed to boost a specific subsystem's performance.

1.2. Horizontal Scalability

Red Hat's efforts in improving the performance of Red Hat Enterprise Linux 6 focus on scalability. Performance-boosting features are evaluated primarily based on how they affect the platform's performance in different areas of the workload spectrum - that is, from the lonely web server to the server farm mainframe.
Focusing on scalability allows Red Hat Enterprise Linux to maintain its versatility for different types of workloads and purposes. At the same time, this means that as your business grows and your workload scales up, re-configuring your server environment is less prohibitive (in terms of cost and man-hours) and more intuitive.
Red Hat makes improvements to Red Hat Enterprise Linux for both horizontal scalability and vertical scalability; however, horizontal scalability is the more generally applicable use case. The idea behind horizontal scalability is to use multiple standard computers to distribute heavy workloads in order to improve performance and reliability.
In a typical server farm, these standard computers come in the form of 1U rack-mounted servers and blade servers. Each standard computer may be as small as a simple two-socket system, although some server farms use large systems with more sockets. Some enterprise-grade networks mix large and small systems; in such cases, the large systems are high performance servers (for example, database servers) and the small ones are dedicated application servers (for example, web or mail servers).
This type of scalability simplifies the growth of your IT infrastructure: a medium-sized business with an appropriate load might only need two pizza box servers to suit all their needs. As the business hires more people, expands its operations, increases its sales volumes and so forth, its IT requirements increase in both volume and complexity. Horizontal scalability allows IT to simply deploy additional machines with (mostly) identical configurations as their predecessors.
To summarize, horizontal scalability adds a layer of abstraction that simplifies system hardware administration. By developing the Red Hat Enterprise Linux platform to scale horizontally, increasing the capacity and performance of IT services can be as simple as adding new, easily configured machines.

1.2.1. Parallel Computing

Users benefit from Red Hat Enterprise Linux's horizontal scalability not just because it simplifies system hardware administration, but also because horizontal scalability is a suitable development philosophy given the current trends in hardware advancement.
Consider this: most complex enterprise applications have thousands of tasks that must be performed simultaneously, with different coordination methods between tasks. While early computers had a single-core processor to juggle all these tasks, virtually all processors available today have multiple cores. Effectively, modern computers put multiple cores in a single socket, making even single-socket desktops or laptops multi-processor systems.
As of 2010, standard Intel and AMD processors were available with two to sixteen cores. Such processors are prevalent in pizza box or blade servers, which can now contain as many as 40 cores. These low-cost, high-performance systems bring large system capabilities and characteristics into the mainstream.
To achieve the best performance and utilization of a system, each core must be kept busy. This means that 32 separate tasks must be running to take advantage of a 32-core blade server. If a blade chassis contains ten of these 32-core blades, then the entire setup can process a minimum of 320 tasks simultaneously. If these tasks are part of a single job, they must be coordinated.
Red Hat Enterprise Linux was developed to adapt well to hardware development trends and ensure that businesses can fully benefit from them. Section 1.3, "Distributed Systems" explores the technologies that enable Red Hat Enterprise Linux's horizontal scalability in greater detail.

1.3. Distributed Systems

To fully realize horizontal scalability, Red Hat Enterprise Linux uses many components of distributed computing. The technologies that make up distributed computing are divided into three layers:
Communication
Horizontal scalability requires many tasks to be performed simultaneously (in parallel). As such, these tasks must have interprocess communication to coordinate their work. Further, a platform with horizontal scalability should be able to share tasks across multiple systems.
Storage
Storage via local disks is not sufficient to address the requirements of horizontal scalability. Some form of distributed or shared storage is needed, one with a layer of abstraction that allows a single storage volume's capacity to grow seamlessly with the addition of new storage hardware.
Management
The most important duty in distributed computing falls to the management layer. This management layer coordinates all software and hardware components, efficiently managing communication, storage, and the usage of shared resources.
The following sections describe the technologies within each layer in more detail.

1.3.1. Communication

The communication layer ensures the transport of data, and is composed of two parts:
  • Hardware
  • Software
The simplest (and fastest) way for multiple systems to communicate is through shared memory. This entails the usage of familiar memory read/write operations; shared memory has the high bandwidth, low latency, and low overhead of ordinary memory read/write operations.
Ethernet
The most common way of communicating between computers is over Ethernet. Today, Gigabit Ethernet (GbE) is provided by default on systems, and most servers include 2-4 ports of Gigabit Ethernet. GbE provides good bandwidth and latency. This is the foundation of most distributed systems in use today. Even when systems include faster network hardware, it is still common to use GbE for a dedicated management interface.
10GbE
Ten Gigabit Ethernet (10GbE) is rapidly growing in acceptance for high end and even mid-range servers. 10GbE provides ten times the bandwidth of GbE. One of its major advantages is with modern multi-core processors, where it restores the balance between communication and computing. You can compare a single core system using GbE to an eight core system using 10GbE. Used in this way, 10GbE is especially valuable for maintaining overall system performance and avoiding communication bottlenecks.
Unfortunately, 10GbE is expensive. While the cost of 10GbE NICs has come down, the price of interconnect (especially fibre optics) remains high, and 10GbE network switches are extremely expensive. We can expect these prices to decline over time, but 10GbE today is most heavily used in server room backbones and performance-critical applications.
Infiniband
Infiniband offers even higher performance than 10GbE. In addition to TCP/IP and UDP network connections used with Ethernet, Infiniband also supports shared memory communication. This allows Infiniband to work between systems via remote direct memory access (RDMA).
The use of RDMA allows Infiniband to move data directly between systems without the overhead of TCP/IP or socket connections. In turn, this reduces latency, which is critical to some applications.
Infiniband is most commonly used in High Performance Technical Computing (HPTC) applications which require high bandwidth, low latency and low overhead. Many supercomputing applications benefit from this, to the point that the best way to improve performance is by investing in Infiniband rather than faster processors or more memory.
RoCE
RDMA over Converged Ethernet (RoCE) implements Infiniband-style communications (including RDMA) over a 10GbE infrastructure. Given the cost improvements associated with the growing volume of 10GbE products, it is reasonable to expect wider usage of RDMA and RoCE in a wide range of systems and applications.
Each of these communication methods is fully supported by Red Hat for use with Red Hat Enterprise Linux 6.

1.3.2. Storage

An environment that uses distributed computing uses multiple instances of shared storage. This can mean one of two things:
  • Multiple systems storing data in a single location
  • A storage unit (e.g. a volume) composed of multiple storage appliances
The most familiar example of storage is the local disk drive mounted on a system. This is appropriate for IT operations where all applications are hosted on one host, or even a small number of hosts. However, as the infrastructure scales to dozens or even hundreds of systems, managing that many local storage disks becomes difficult and complicated.
Distributed storage adds a layer to ease and automate storage hardware administration as the business scales. Having multiple systems share a handful of storage instances reduces the number of devices the administrator needs to manage.
Consolidating the storage capabilities of multiple storage appliances into one volume helps both users and administrators. This type of distributed storage provides a layer of abstraction to storage pools: users see a single unit of storage, which an administrator can easily grow by adding more hardware. Some technologies that enable distributed storage also provide added benefits, such as failover and multipathing.
NFS
Network File System (NFS) allows multiple servers or users to mount and use the same instance of remote storage via TCP or UDP. NFS is commonly used to hold data shared by multiple applications. It is also convenient for bulk storage of large amounts of data.
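For illustration, mounting a remote NFS export requires a single command on the client. The following is a minimal sketch; the server name (fileserver), export path (/export/data), and mount point are placeholder assumptions, not values taken from this guide:

# Mount a remote NFS export over TCP (server and path names are hypothetical)
mount -t nfs -o tcp fileserver:/export/data /mnt/data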
SAN
Storage Area Networks (SANs) use either Fibre Channel or iSCSI protocol to provide remote access to storage. Fibre Channel infrastructure (such as Fibre Channel host bus adapters, switches, and storage arrays) combines high performance, high bandwidth, and massive storage. SANs separate storage from processing, providing considerable flexibility in system design.
The other major advantage of SANs is that they provide a management environment for performing major storage hardware administrative tasks. These tasks include:
  • Controlling access to storage
  • Managing large amounts of data
  • Provisioning systems
  • Backing up and replicating data
  • Taking snapshots
  • Supporting system failover
  • Ensuring data integrity
  • Migrating data
GFS2
The Red Hat Global File System 2 (GFS2) file system provides several specialized capabilities. The basic function of GFS2 is to provide a single file system, including concurrent read/write access, shared across multiple members of a cluster. This means that each member of the cluster sees exactly the same data "on disk" in the GFS2 filesystem.
GFS2 allows all systems to have concurrent access to the "disk". To maintain data integrity, GFS2 uses a Distributed Lock Manager (DLM), which only allows one system to write to a specific location at a time.
GFS2 is especially well-suited for failover applications that require high availability in storage.
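As a hedged sketch of how such a file system is created, mkfs.gfs2 takes a locking protocol, a cluster name and file system name pair, and one journal per cluster node; every name below is a placeholder assumption:

# Create a GFS2 file system for a four-node cluster (all names are placeholders)
mkfs.gfs2 -p lock_dlm -t mycluster:shared_fs -j 4 /dev/clustervg/sharedlv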
For further information about GFS2, refer to the Global File System 2 guide. For further information about storage in general, refer to the Storage Administration Guide. Both are available from http://access.redhat.com/knowledge/docs/Red_Hat_Enterprise_Linux/.

1.3.3. Converged Networks

Communication over the network is normally done through Ethernet, with storage traffic using a dedicated Fibre Channel SAN environment. It is common to have a dedicated network or serial link for system management, and perhaps even heartbeat[2]. As a result, a single server is typically on multiple networks.
Providing multiple connections on each server is expensive, bulky, and complex to manage. This gave rise to the need for a way to consolidate all connections into one. Fibre Channel over Ethernet (FCoE) and Internet SCSI (iSCSI) address this need.
FCoE
With FCoE, standard Fibre Channel commands and data packets are transported over a 10GbE physical infrastructure via a single converged network adapter (CNA). Standard TCP/IP Ethernet traffic and Fibre Channel storage operations can be transported via the same link. FCoE uses one physical network interface card (and one cable) for multiple logical network/storage connections.
FCoE offers the following advantages:
Reduced number of connections
FCoE reduces the number of network connections to a server by half. You can still choose to have multiple connections for performance or availability; however, a single connection provides both storage and network connectivity. This is especially helpful for pizza box servers and blade servers, since they both have very limited space for components.
Lower cost
A reduced number of connections immediately means a reduced number of cables, switches, and other networking equipment. Ethernet's history also features great economies of scale; the cost of networks drops dramatically as the number of devices in the market goes from millions to billions, as was seen in the decline in the price of 100Mb Ethernet and gigabit Ethernet devices.
Similarly, 10GbE will become cheaper as more businesses adopt it. Also, as CNA hardware is integrated into a single chip, widespread use will increase its volume in the market, which will result in a significant price drop over time.
iSCSI
Internet SCSI (iSCSI) is another type of converged network protocol; it is an alternative to FCoE. Like fibre channel, iSCSI provides block-level storage over a network. However, iSCSI does not provide a complete management environment. The main advantage of iSCSI over FCoE is that iSCSI provides much of the capability and flexibility of fibre channel, but at a lower cost.
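As a brief sketch of a typical client-side setup, the iscsiadm utility discovers targets on a portal and logs in to one of them; the portal address and target IQN below are placeholder assumptions:

# Discover iSCSI targets exported by a portal (address is a placeholder)
iscsiadm -m discovery -t sendtargets -p 192.168.1.100
# Log in to a discovered target (IQN is a placeholder)
iscsiadm -m node -T iqn.2011-01.com.example:storage.disk1 -p 192.168.1.100 --login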


[1] Red Hat Certified Engineer. For more information, refer to http://www.redhat.com/training/certifications/rhce/.
[2] Heartbeat is the exchange of messages between systems to ensure that each system is still functioning. If a system "loses heartbeat" it is assumed to have failed and is shut down, with another system taking over for it.

Chapter 2. Red Hat Enterprise Linux 6 Performance Features

2.1. 64-Bit Support

Red Hat Enterprise Linux 6 supports 64-bit processors; these processors can theoretically use up to 18 exabytes of memory. As of general availability (GA), Red Hat Enterprise Linux 6 is tested and certified to support up to 8TB of physical memory.
The size of memory supported by Red Hat Enterprise Linux 6 is expected to grow over several minor updates, as Red Hat continues to introduce and improve more features that enable the use of larger memory blocks. Examples of such improvements (as of Red Hat Enterprise Linux 6 GA) are:
  • Huge pages and transparent huge pages
  • Non-Uniform Memory Access improvements
These improvements are outlined in greater detail in the sections that follow.
Huge pages and transparent huge pages
The implementation of huge pages in Red Hat Enterprise Linux 6 allows the system to manage memory use efficiently across different memory workloads. Huge pages dynamically utilize 2 MB pages compared to the standard 4 KB page size, allowing applications to scale well when processing gigabytes or even terabytes of memory.
Huge pages are difficult to manually create, manage, and use. To address this, Red Hat Enterprise Linux 6 also features the use of transparent huge pages (THP). THP automatically manages many of the complexities involved in the use of huge pages.
For more information on huge pages and THP, refer to Section 5.2, "Huge Pages and Transparent Huge Pages".
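As a quick illustration of how these features can be observed and used on a running system, the following sketch reads huge page counters from /proc/meminfo and reserves a pool of static huge pages; the value 512 is an arbitrary example, not a recommendation:

# AnonHugePages in /proc/meminfo grows when THP is actively used
grep AnonHugePages /proc/meminfo
# Reserve 512 static 2 MB huge pages (example value; size this to your workload)
echo 512 > /proc/sys/vm/nr_hugepages
grep HugePages /proc/meminfo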
NUMA improvements
Many new systems now support Non-Uniform Memory Access (NUMA). NUMA simplifies the design and creation of hardware for large systems; however, it also adds a layer of complexity to application development. For example, NUMA implements both local and remote memory, where remote memory can take several times longer to access than local memory. This feature (among others) has many performance implications that impact how operating systems, applications, and system configurations should be deployed.
Red Hat Enterprise Linux 6 is better optimized for NUMA use, thanks to several additional features that help manage users and applications on NUMA systems. These features include CPU affinity, CPU pinning (cpusets), numactl and control groups, which allow a process (affinity) or application (pinning) to "bind" to a specific CPU or set of CPUs.
For more information about NUMA support in Red Hat Enterprise Linux 6, refer to Section 4.1.1, "CPU and NUMA Topology".
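For illustration, the numactl utility can display the detected NUMA topology and bind a process to a specific node; the node number and application path below are example assumptions:

# Display NUMA nodes, their CPUs, and their local memory sizes
numactl --hardware
# Run an application with both its CPUs and its memory confined to node 0
numactl --cpunodebind=0 --membind=0 /usr/local/bin/my_application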

2.2. Ticket Spinlocks

A key part of any system design is ensuring that one process does not alter memory used by another process. Uncontrolled data change in memory can result in data corruption and system crashes. To prevent this, the operating system allows a process to lock a piece of memory, perform an operation, then unlock or "free" the memory.
One common implementation of memory locking is through spin locks, which allow a process to keep checking to see if a lock is available and take the lock as soon as it becomes available. If there are multiple processes competing for the same lock, the first one to request the lock after it has been freed gets it. When all processes have the same access to memory, this approach is "fair" and works quite well.
Unfortunately, on a NUMA system, not all processes have equal access to the locks. Processes on the same NUMA node as the lock have an unfair advantage in obtaining the lock. Processes on remote NUMA nodes experience lock starvation and degraded performance.
To address this, Red Hat Enterprise Linux implemented ticket spinlocks. This feature adds a reservation queue mechanism to the lock, allowing all processes to take a lock in the order that they requested it. This eliminates timing problems and unfair advantages in lock requests.
While a ticket spinlock has slightly more overhead than an ordinary spinlock, it scales better and provides better performance on NUMA systems.

2.3. Dynamic List Structure

The operating system requires a set of information on each processor in the system. In Red Hat Enterprise Linux 5, this set of information was allocated to a fixed-size array in memory. Information on each individual processor was obtained by indexing into this array. This method was fast, easy, and straightforward for systems that contained relatively few processors.
However, as the number of processors for a system grows, this method produces significant overhead. Because the fixed-size array in memory is a single, shared resource, it can become a bottleneck as more processors attempt to access it at the same time.
To address this, Red Hat Enterprise Linux 6 uses a dynamic list structure for processor information. This allows the array used for processor information to be allocated dynamically: if there are only eight processors in the system, then only eight entries are created in the list. If there are 2048 processors, then 2048 entries are created as well.
A dynamic list structure allows more fine-grained locking. For example, if information needs to be updated at the same time for processors 6, 72, 183, 657, 931 and 1546, this can be done with greater parallelism. Situations like this obviously occur much more frequently on large, high-performance systems than on small systems.

2.4. Tickless Kernel

In previous versions of Red Hat Enterprise Linux, the kernel used a timer-based mechanism that continuously produced a system interrupt. During each interrupt, the system polled; that is, it checked to see if there was work to be done.
Depending on the setting, this system interrupt or timer tick could occur several hundred or several thousand times per second. This happened every second, regardless of the system's workload. On a lightly loaded system, this impacts power consumption by preventing the processor from effectively using sleep states. The system uses the least power when it is in a sleep state.
The most power-efficient way for a system to operate is to do work as quickly as possible, go into the deepest sleep state possible, and sleep as long as possible. To implement this, Red Hat Enterprise Linux 6 uses a tickless kernel. With this, the interrupt timer has been removed from the idle loop, transforming Red Hat Enterprise Linux 6 into a completely interrupt-driven environment.
The tickless kernel allows the system to go into deep sleep states during idle times, and respond quickly when there is work to be done.
For further information, refer to the Power Management Guide, available from http://access.redhat.com/knowledge/docs/Red_Hat_Enterprise_Linux/.
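As a simple check, the build configuration of the running kernel records whether dynamic ticks are enabled; this sketch assumes the config file is in its standard location under /boot:

# Confirm the running kernel was built with tickless (dynamic tick) support
grep NO_HZ /boot/config-$(uname -r)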

2.5. Control Groups

Red Hat Enterprise Linux provides many useful options for performance tuning. Large systems, scaling to hundreds of processors, can be tuned to deliver superb performance. But tuning these systems requires considerable expertise and a well-defined workload. When large systems were expensive and few in number, it was acceptable to give them special treatment. Now that these systems are mainstream, more effective tools are needed.
To further complicate things, more powerful systems are being used now for service consolidation. Workloads that may have been running on four to eight older servers are now placed into a single server. And as discussed earlier in Section 1.2.1, "Parallel Computing", many mid-range systems nowadays contain more cores than yesterday's high-performance machines.
Many modern applications are designed for parallel processing, using multiple threads or processes to improve performance. However, few applications can make effective use of more than eight threads. Thus, multiple applications typically need to be installed on a 32-CPU system to maximize capacity.
Consider the situation: small, inexpensive mainstream systems are now at parity with the performance of yesterday's expensive, high-performance machines. Cheaper high-performance machines gave system architects the ability to consolidate more services to fewer machines.
However, some resources (such as I/O and network communications) are shared, and do not grow as fast as CPU count. As such, a system housing multiple applications can experience degraded overall performance when one application hogs too much of a single resource.
To address this, Red Hat Enterprise Linux 6 now supports control groups (cgroups). Cgroups allow administrators to allocate resources to specific tasks as needed. This means, for example, being able to allocate 80% of four CPUs, 60GB of memory, and 40% of disk I/O to a database application. A web application running on the same system could be given two CPUs, 2GB of memory, and 50% of available network bandwidth.
As a result, both database and web applications deliver good performance, as the system prevents both from excessively consuming system resources. In addition, many aspects of cgroups are self-tuning, allowing the system to respond accordingly to changes in workload.
A cgroup has two major components:
  • A list of tasks assigned to the cgroup
  • Resources allocated to those tasks
Tasks assigned to the cgroup run within the cgroup. Any child tasks they spawn also run within the cgroup. This allows an administrator to manage an entire application as a single unit. An administrator can also configure allocations for the following resources:
  • CPUsets
  • Memory
  • I/O
  • Network (bandwidth)
Within CPUsets, cgroups allow administrators to configure the number of CPUs, affinity for specific CPUs or nodes [3], and the amount of CPU time used by a set of tasks. Using cgroups to configure CPUsets is vital for ensuring good overall performance, preventing an application from consuming excessive resources at the cost of other tasks while simultaneously ensuring that the application is not starved for CPU time.
I/O bandwidth and network bandwidth are managed by other resource controllers. Again, the resource controllers allow you to determine how much bandwidth the tasks in a cgroup can consume, and ensure that the tasks in a cgroup neither consume excessive resources nor are starved of resources.
Cgroups allow the administrator to define and allocate, at a high level, the system resources that various applications need (and will) consume. The system then automatically manages and balances the various applications, delivering good predictable performance and optimizing the performance of the overall system.
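As a minimal sketch of this workflow, the commands below mount the cpuset controller, create a group, confine it to a range of CPUs and a memory node, and move a process into it. The mount point, group name, CPU range, and PID are all illustrative assumptions:

# Mount the cpuset controller and create a group (names are examples)
mkdir -p /cgroup/cpuset
mount -t cgroup -o cpuset cpuset /cgroup/cpuset
mkdir /cgroup/cpuset/dbgroup
# Confine the group to CPUs 0-3 on memory node 0
echo 0-3 > /cgroup/cpuset/dbgroup/cpuset.cpus
echo 0 > /cgroup/cpuset/dbgroup/cpuset.mems
# Move an existing process (PID 1234, for example) into the group
echo 1234 > /cgroup/cpuset/dbgroup/tasks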
For more information on how to use control groups, refer to the Resource Management Guide, available from http://access.redhat.com/knowledge/docs/Red_Hat_Enterprise_Linux/.

2.6. Storage and File System Improvements

Red Hat Enterprise Linux 6 also features several improvements to storage and file system management. Two of the most notable advances in this version are ext4 and XFS support. For more comprehensive coverage of performance improvements relating to storage and file systems, refer to Chapter 7, File Systems.
Ext4
Ext4 is the default file system for Red Hat Enterprise Linux 6. It is the fourth generation version of the EXT file system family, supporting a theoretical maximum file system size of 1 exabyte, and single file maximum size of 16TB. Red Hat Enterprise Linux 6 supports a maximum file system size of 16TB, and a single file maximum size of 16TB. Other than a much larger storage capacity, ext4 also includes several new features, such as:
  • Extent-based metadata
  • Delayed allocation
  • Journal check-summing
For more information about the ext4 file system, refer to Section 7.3.1, "The Ext4 File System".
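For illustration, creating and mounting an ext4 file system takes two commands; the device and mount point names below are hypothetical:

# Format a partition as ext4 and mount it (device name is hypothetical)
mkfs.ext4 /dev/sdb1
mount -t ext4 /dev/sdb1 /mnt/data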
XFS
XFS is a robust and mature 64-bit journaling file system that supports very large files and file systems on a single host. This file system was originally developed by SGI, and has a long history of running on extremely large servers and storage arrays. XFS features include:
  • Delayed allocation
  • Dynamically-allocated inodes
  • B-tree indexing for scalability of free space management
  • Online defragmentation and file system growing
  • Sophisticated metadata read-ahead algorithms
While XFS scales to exabytes, the maximum XFS file system size supported by Red Hat is 100TB. For more information about XFS, refer to Section 7.3.2, "The XFS File System".
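As a brief sketch (device and mount point are placeholders), an XFS file system is created with mkfs.xfs and, because XFS supports online growing, can later be expanded while still mounted:

# Format and mount an XFS volume
mkfs.xfs /dev/sdc1
mount -t xfs /dev/sdc1 /mnt/scratch
# Grow the mounted file system to fill its underlying device
xfs_growfs /mnt/scratch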
Large Boot Drives
Traditional BIOS supports a maximum disk size of 2.2TB. Red Hat Enterprise Linux 6 systems using BIOS can support disks larger than 2.2TB by using a new disk structure called the GUID Partition Table (GPT). GPT can only be used for data disks; it cannot be used for boot drives with BIOS; therefore, boot drives can only be a maximum of 2.2TB in size. The BIOS was originally created for the IBM PC; while BIOS has evolved considerably to adapt to modern hardware, Unified Extensible Firmware Interface (UEFI) is designed to support new and emerging hardware.
Red Hat Enterprise Linux 6 also supports UEFI, which can be used to replace BIOS (still supported). Systems with UEFI running Red Hat Enterprise Linux 6 allow the use of GPT and 2.2TB (and larger) partitions for both boot and data partitions.
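As an illustrative example (the device name is a placeholder), the parted utility can label a data disk with GPT and create a partition spanning the entire device:

# Label a data disk with GPT so partitions beyond 2.2TB become possible
parted /dev/sdd mklabel gpt
# Create a single partition that uses the whole disk
parted /dev/sdd mkpart primary 0% 100%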

Important - UEFI for 32-bit x86 systems

Red Hat Enterprise Linux 6 does not support UEFI for 32-bit x86 systems.

Important - UEFI for AMD64 and Intel 64

Note that the boot configurations of UEFI and BIOS differ significantly from each other. Therefore, the installed system must boot using the same firmware that was used during installation. You cannot install the operating system on a system that uses BIOS and then boot this installation on a system that uses UEFI.
Red Hat Enterprise Linux 6 supports version 2.2 of the UEFI specification. Hardware that supports version 2.3 of the UEFI specification or later should boot and operate with Red Hat Enterprise Linux 6, but the additional functionality defined by these later specifications will not be available. The UEFI specifications are available from http://www.uefi.org/specs/agreement/.


[3] A node is generally defined as a set of CPUs or cores within a socket.