Processor – Most Important Counters
Every application makes use of processor (CPU) resources during execution. Requests for processor resources are divided between user-state and system-state processing. User-state processing is the time the CPU spends running the user's program in the user state. It includes time spent executing library calls, but does not include time spent in the kernel on the program's behalf. System-state processing is the time the CPU spends in the system state on behalf of this program. All I/O routines require kernel services.
It is usually easy to recognize a CPU bottleneck: the overall CPU utilization (average across all existing processors) is at or near 100%, and there are always processes waiting to be served. However, it is not always easy to find out why a CPU bottleneck occurs. Therefore, it is very important to know in advance how the application behaves under normal conditions, and to use that as a baseline when analyzing the load.
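The recognition rule above can be sketched as a small check over a series of samples. This is only an illustration: the `CpuSample` type, its field names, and the 95% default threshold are assumptions made for the example, not values taken from any monitoring product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CpuSample:
    """One observation interval (hypothetical structure for this sketch)."""
    utilization_pct: float  # average utilization across all processors, 0-100
    run_queue: int          # number of processes waiting to be served


def is_cpu_bottleneck(samples: Sequence[CpuSample],
                      busy_threshold_pct: float = 95.0) -> bool:
    """Return True only if every sample is near saturation AND has waiters.

    The threshold is an assumed stand-in for "at or near 100%".
    """
    if not samples:
        return False
    return all(
        s.utilization_pct >= busy_threshold_pct and s.run_queue > 0
        for s in samples
    )
```

Requiring both conditions in every sample mirrors the wording "always processes waiting": a short spike to 100% with an empty run queue is not flagged.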
UNIX is a powerful and very flexible operating system. It allows users to run processes as needed, either in the foreground or in the background. Programs running in the foreground can both read from and write to the terminal, while those running in the background cannot read from it.
Performance counters are available that measure how much CPU processing time specific threads and other executable units of work consume. These processor utilization measurements allow you to determine which applications are responsible for CPU consumption.
While there is no generic facility available on all UNIX flavors, using HP SiteScope’s Process object gives statistical information per selected process/thread where the following data is available (not all counters are available on all variants):
- CPU. CPU utilization per selected process in percentage points of overall CPU usage.
- MEMSIZE. Amount of memory consumed by the selected process.
- PID. Process ID as registered with the operating system.
- THREADS. Number of threads forked by the selected process.
- USER. Number of user sessions.
If HP SiteScope does not provide sufficient process monitoring detail, the built-in UNIX commands can always be used:
- ps. Shows a static list of currently running processes, with details such as PID, memory used, and the command line used to run each process. In most cases, adding the -aux options is recommended, as they add data on the owning user and on processes that are not attached to a terminal.
- top. Shows a list of all currently running processes and the amount of memory occupied by them. The top command automatically updates the list every few seconds to display active processes on the computer.
- proc tools. Provide even more information about processes. These tools should be used with caution because some of them suspend the target process while they inspect it. On Solaris, the proc tools are located in /usr/proc/bin and include pfiles (open files of a process), pflags (status information and flags for processes), pldd (dynamic libraries linked into each process), pmap (address space map for processes), psig (signal actions and handlers), prun (resumes a stopped process), pstack (stack trace), and pstop (suspends the execution of a specific process).
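As an illustration of working with ps output, the following sketch ranks processes by CPU share. It assumes the BSD-style column layout printed by `ps aux` (USER, PID, %CPU, ... COMMAND). Other UNIX variants print different columns, so the field positions would need adjusting. The sample text and the function name are made up for the example.

```python
from typing import List, Tuple

# Made-up sample in the BSD-style `ps aux` layout (an assumption of this sketch).
SAMPLE_PS_OUTPUT = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  16896  9012 ?        Ss   09:00   0:01 /sbin/init
oracle    2231 42.5 12.3 901234 50321 ?        Sl   09:02  12:44 ora_pmon_DB1
www       3310  7.1  1.9 210044 15820 ?        S    09:05   1:03 httpd -k start
"""


def top_cpu_consumers(ps_output: str, limit: int = 5) -> List[Tuple[int, float, str]]:
    """Return (pid, cpu_percent, command) tuples, highest CPU share first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields so a command with spaces stays whole.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # ignore malformed or truncated lines
        rows.append((int(fields[1]), float(fields[2]), fields[10]))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

On a live system, the same function could be fed the captured output of the ps command instead of the sample string.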
Memory – Most Important Counters
UNIX maintains physical (resident) and virtual memory. Operating systems shield the actual amount of memory on hand from applications – hence they tend to overstate its availability. UNIX uses the term virtual memory which essentially includes the amount of memory allocated by programs for all their data, including shared memory, heap space, program text, shared libraries, and memory-mapped files. The total amount of virtual memory allocated to all processes on the system roughly translates to the amount of swap space that will be reserved (with the exception of program text). Virtual memory actually has little to do with how much actual physical memory is allocated, because not all data mapped into virtual memory will be active (‘Resident’) in physical memory. When the program gets an “out of memory” error, it typically means it is out of reservable swap space (Virtual memory), not out of physical (Resident) memory.
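To make the virtual versus resident distinction concrete, the sketch below reads both figures for one process. It assumes the Linux `/proc/<pid>/status` format, where `VmSize` is the virtual size and `VmRSS` the resident set size. Other UNIX variants expose this data differently. The sample text and the function name are illustrative only.

```python
from typing import Dict

# Made-up excerpt in the Linux /proc/<pid>/status format (assumed for this sketch).
SAMPLE_STATUS = """\
Name:   java
VmSize:  4194304 kB
VmRSS:    524288 kB
Threads:        42
"""


def memory_footprint_kb(status_text: str) -> Dict[str, int]:
    """Extract the virtual and resident sizes, in kB, from status text."""
    wanted = {"VmSize": "virtual_kb", "VmRSS": "resident_kb"}
    result: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            # The value looks like "  4194304 kB"; keep only the number.
            result[wanted[key]] = int(value.split()[0])
    return result
```

In the sample, the process has mapped 4 GB of virtual memory while only 512 MB is resident, which is the kind of gap the paragraph above describes.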
A shortage of RAM often shows up indirectly as a disk performance problem, when excessive paging to disk consumes too much of the available disk bandwidth. Consequently, paging rates to disk are an important memory performance indicator.
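Paging counters are usually cumulative, so a rate has to be derived from two snapshots taken a known interval apart. The sketch below shows that arithmetic. The counter names `pswpin` and `pswpout` follow Linux `/proc/vmstat` and are an assumption here; other variants report paging through tools such as vmstat or sar under different names.

```python
from typing import Dict


def paging_rates(before: Dict[str, int], after: Dict[str, int],
                 interval_s: float) -> Dict[str, float]:
    """Convert two cumulative counter snapshots into pages per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after[name] - before[name]) / interval_s
        for name in ("pswpin", "pswpout")  # assumed counter names
    }


# Example with made-up counter values sampled 10 seconds apart.
rates = paging_rates({"pswpin": 1000, "pswpout": 5000},
                     {"pswpin": 1600, "pswpout": 8000},
                     interval_s=10.0)
```

Comparing such rates against the application's baseline is more informative than looking at any single absolute value.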
It is commonly said that memory today is relatively cheap – hence buying more memory can solve all problems. However, having large amounts of physical memory does not prevent a shortage of virtual memory and may lead to fatal crashes in case of memory leaks when the application does not release allocated memory after usage. In some cases, if the underlying UNIX system is set to host a database or similar high volume transaction processing application, adding a lot of memory may significantly improve database performance by allowing a larger in-memory cache.
When observing a shortage of available RAM, it is often important to determine how the allocated physical memory is being used and to count the resident pages of a problematic process, known as its resident set. In addition to the common counters below, it is important to track the usage of cached and buffered memory: a decline in the amount of free memory does not necessarily indicate a memory leak, because that memory may simply have become part of the cache and buffers (see %rcache/%wcache and bread/s, bwrit/s on Solaris and HP-UX, and Cached and Buffers on Linux).
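The point about cached and buffered memory can be illustrated with a short calculation. The sketch assumes the Linux `/proc/meminfo` field names (`MemTotal`, `MemFree`, `Buffers`, `Cached`). The sample figures and the function name are invented for the example, and treating all cache as reclaimable is a simplification.

```python
from typing import Dict

# Made-up excerpt in the Linux /proc/meminfo format (assumed for this sketch).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
Buffers:          256000 kB
Cached:          6144000 kB
"""


def free_and_cache_summary(meminfo_text: str) -> Dict[str, float]:
    """Report free memory alone and free memory plus buffers and cache."""
    values: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key] = int(parts[0])  # numeric value in kB
    free = values["MemFree"]
    cache = values["Buffers"] + values["Cached"]
    return {
        "free_kb": free,
        "cache_kb": cache,
        "free_plus_cache_kb": free + cache,
        "free_plus_cache_pct": round(100.0 * (free + cache) / values["MemTotal"], 1),
    }
```

In the sample, only about 3% of memory is strictly free, yet roughly 42% is either free or held by buffers and cache, so the low free figure alone would not indicate a leak.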
I/O – Most Important Counters
UNIX maintains both physical and logical disk operations through its I/O stack. A logical volume represents a single file system with a unique mount point. A physical (raw) volume is the internal representation of a specific storage device, whether SCSI, RAID, SATA, or another technology.
When using complex storage systems such as array controllers or RAID, the underlying physical disk hardware characteristics are not directly visible to the operating system. These characteristics (namely, the number of disks, the speed of the disks, their seek time, rotational speed, and bit density, as well as optimization features such as on-board memory buffers) can have a major impact on performance. Advanced features like memory buffers and command queueing can boost performance by 25–50 percent.
It is important to be proactive about disk performance because it tends to degrade rapidly, particularly when disk-paging activity occurs.
Network – Most Important Counters
Networking performance has become ever more important with the proliferation of distributed and cloud applications. However, UNIX operating systems usually provide only limited statistics, at two levels: the lowest level of the hardware interface, and the higher level of network protocols such as TCP/IP. Network interface statistics are gathered by software embedded in the network interface driver layer. This software counts the number of packets that are sent and received.
Network statistics are gathered through UNIX facilities such as netstat, netperf, iozone, and nfsstat (for NFS monitoring), with one set of interface statistics for every network interface chip or card that is installed. HP products like Network Node Manager and SiteScope can collect statistics over time to give insight into the real causes of performance bottlenecks.
Networking bottlenecks are tricky to catch and analyze. Packet rates, collision rates and error rates do not always point to the cause of the problem:
- Only excessive collision rates may indicate a network bottleneck. If their level stays relatively low over time, it is usually normal behavior. Collisions, which are essentially errors, happen as a result of mismatches in either duplex or speed settings. When the mismatch is corrected, collision rates go down and performance improves.
- A sudden increase in packet rates, together with a long network output queue, can also indicate a network bottleneck. However, reaching an informed decision requires observing the pattern of behavior over time.
- If NFS is used extensively, watch the data collected by nfsstat, especially on the server side. If NFS statistics show a lot of activity caused by one specific client, it is recommended to run the tool on that client host to identify the process.
- High system-mode CPU utilization or a high interrupt rate on one processor, while the others are mostly idle, can point to a network bottleneck. Device misconfiguration or faulty hardware may be the reason, so check both.
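Collision and error counts are easier to track over time when expressed as percentages of the packets handled. The following sketch assumes per-interface counters such as those printed by `netstat -i` on many UNIX variants (input packets, input errors, output packets, output errors, collisions). The function and parameter names are chosen for the example, and no threshold for "excessive" is implied.

```python
from typing import Dict


def interface_rates(ipkts: int, ierrs: int, opkts: int, oerrs: int,
                    colls: int) -> Dict[str, float]:
    """Express collision and error counters as percentages of packets."""

    def pct(part: int, whole: int) -> float:
        # An interface with no traffic has no meaningful rate; report 0.0.
        return 100.0 * part / whole if whole else 0.0

    return {
        "collision_pct": pct(colls, opkts),    # collisions per packet sent
        "input_error_pct": pct(ierrs, ipkts),
        "output_error_pct": pct(oerrs, opkts),
    }


# Example with made-up counter values for one interface.
example = interface_rates(ipkts=100000, ierrs=50, opkts=200000, oerrs=0, colls=3000)
```

Because the counters are cumulative, computing the rates from the difference between two samples gives a better view of current behavior than the totals since boot.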