Microsoft Exchange Monitoring

This is the second article in a series about Exchange Server, following up on the article on monitoring Exchange Server with Monitis.

In the previous post we discussed the Exchange Message and Service Counters. In this article we’ll go over the Memory and Network Monitoring metrics.

Memory monitoring

You can use Event Viewer and Performance Logs and Alerts to monitor for virtual memory problems on an Exchange Server. For example, Event 9582 appears as a warning event in the application log when the largest free block of virtual memory decreases to 32 MB. If you see this warning, you should restart the Exchange store process at the next opportunity. If the largest block decreases to 16 MB, Event 9582 appears again as an error event, indicating that the server might fail soon, and that you should restart the server at the earliest opportunity. Failure to act on these events can cause sporadic mail delivery and IMAIL conversion failures.
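The two Event 9582 thresholds above can be expressed as a small check. This is only a sketch: the function name and the "ok"/"warning"/"error" labels are illustrative, and the caller is assumed to have already obtained the size of the largest free virtual memory block by some other means.

```python
MB = 1024 * 1024  # bytes per megabyte

def classify_largest_free_block(largest_free_block_bytes: int) -> str:
    """Map the largest free virtual memory block to the Event 9582 severity.

    Thresholds come from the text above: 32 MB logs a warning,
    16 MB logs an error. The returned labels are hypothetical names.
    """
    if largest_free_block_bytes <= 16 * MB:
        return "error"    # restart the server at the earliest opportunity
    if largest_free_block_bytes <= 32 * MB:
        return "warning"  # restart the Exchange store process when possible
    return "ok"
```

For example, a largest free block of 24 MB falls between the two thresholds and is classified as a warning.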

In Performance Logs and Alerts, you should monitor counters in the following list.

Memory\Pool Paged Bytes shows the portion of shared system memory that can be paged to the disk paging file. Paged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

When the /3GB switch is used, paged pool values larger than 200 MB indicate a problem except when backups are running. During backups, each page in the cache manager is copied into the paged pool, which increases the paged pool size.

Memory\Pool Nonpaged Bytes consists of system virtual addresses that are guaranteed to be resident in physical memory at all times and can thus be accessed from any address space without incurring paging input/output (I/O). Like paged pool, nonpaged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

Memory\% Committed Bytes In Use shows the ratio of Memory\Committed Bytes to Memory\Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file. If the paging file is enlarged, the commit limit increases, and the ratio is reduced. This counter displays the current percentage value only; it isn’t an average. Trigger an alert if the use of virtual memory exceeds 80%.
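To make the ratio concrete, here is a minimal sketch of how the percentage and the 80% alert could be computed from the two underlying counter values. The function names are made up for illustration, and the values are assumed to have been sampled already.

```python
def committed_bytes_in_use_pct(committed_bytes: int, commit_limit: int) -> float:
    """Return Committed Bytes as a percentage of Commit Limit."""
    if commit_limit <= 0:
        raise ValueError("commit_limit must be positive")
    return 100.0 * committed_bytes / commit_limit

def should_alert_on_commit(committed_bytes: int, commit_limit: int,
                           threshold_pct: float = 80.0) -> bool:
    """True when the ratio exceeds the 80% threshold suggested above."""
    return committed_bytes_in_use_pct(committed_bytes, commit_limit) > threshold_pct
```

With 6 GB committed against an 8 GB commit limit the ratio is 75%. Enlarging the paging file so the limit becomes 12 GB drops the same workload to 50%, which is the effect described above.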

Memory\Committed Bytes shows the amount of committed virtual memory, in bytes. Committed memory is the physical memory that has space reserved on the disk paging files. There can be one or more paging files on each physical drive. This counter displays the last observed value only; it isn’t an average.

 

Memory\Pages/sec measures memory paging in the virtual memory paging file. A sustained high number of pages per second indicates the need for additional memory. Brief spikes generally do not indicate a problem and can be ignored. A good idea here is to configure a PerfMon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system.
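One way to separate sustained paging from brief spikes is to require several consecutive samples above the limit. The sketch below assumes a list of Pages/sec samples collected at a fixed interval; the function name and the default of five consecutive samples are assumptions chosen for illustration, not values from PerfMon.

```python
from typing import Sequence

def sustained_paging(samples: Sequence[float], paging_disks: int,
                     per_disk_limit: float = 50.0,
                     min_consecutive: int = 5) -> bool:
    """Return True if Pages/sec stays above the limit for a run of samples.

    The limit is 50 pages/sec per paging disk, as suggested above.
    min_consecutive is a hypothetical tuning knob that filters out spikes.
    """
    limit = per_disk_limit * paging_disks
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False
```

A single spike to 400 pages/sec surrounded by low values would not trigger this check, while five samples in a row above the limit would.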

Memory\Available Bytes: If this counter is greater than 10% of the actual RAM in your machine, then you probably have more than enough RAM and don’t need to worry. We recommend that you create a performance log for this counter and monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM.
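The 10% and 2% figures above translate into a simple three-way classification. This is a sketch only; the function name and the "healthy"/"watch"/"alert" labels are illustrative, and the installed RAM size is assumed to be known.

```python
def classify_available_memory(available_bytes: int, installed_ram_bytes: int) -> str:
    """Classify Available Bytes relative to installed RAM.

    Above 10% of RAM is treated as healthy, below 2% as an alert,
    and anything in between as a range worth watching for a downward trend.
    """
    if installed_ram_bytes <= 0:
        raise ValueError("installed_ram_bytes must be positive")
    ratio = available_bytes / installed_ram_bytes
    if ratio < 0.02:
        return "alert"
    if ratio > 0.10:
        return "healthy"
    return "watch"
```

On a server with 16 GB of RAM, 2 GB available (12.5%) is healthy, 1 GB (6.25%) is in the watch range, and 256 MB (about 1.6%) triggers the alert.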

Memory\Cache Bytes shows the current size, in bytes, of the file system cache. By default, the cache uses up to 50% of available physical memory. The counter value is the sum of Memory\System Cache Resident Bytes, Memory\System Driver Resident Bytes, Memory\System Code Resident Bytes, and Memory\Pool Paged Resident Bytes.

The value of Memory\Cache Bytes should remain steady after applications cache their memory usage. Check for large dips in this counter, which could be attributed to working set trimming and excessive paging. The file system cache is used by the content index catalog and continuous replication log copying. The counter measures the working set for the system, that is, the number of allocated pages that kernel threads can address without generating a page fault.

Memory\Transition Faults/sec measures how often recently trimmed pages on the standby list are re-referenced. If this counter slowly starts to rise over time, it could also indicate that you’re reaching a point where you no longer have enough RAM for your server to function well.
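A slow rise is easier to spot with a trend line than by eye. The sketch below fits a least-squares line through equally spaced samples and returns its slope; a persistently positive slope over a long window would suggest the upward drift described above. The function name is hypothetical, and how long a window to use is left as an assumption for the reader to tune.

```python
from typing import Sequence

def trend_slope(samples: Sequence[float]) -> float:
    """Least-squares slope of the samples, in counter units per sample interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2            # mean of the sample indexes 0..n-1
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

For instance, daily averages of 10, 12, 14, and 16 faults per second give a slope of 2 per day, while a flat series gives a slope of 0.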

Network monitoring

 

Much of the network interface subsystem is tuned automatically. Server-based network adapters are capable of detecting the type and level of traffic passing through the network interface, and they self-tune to reflect this information. Beyond making sure that you have the latest device driver on the server, there is not much to do here.

For mailbox servers, a full duplex 100 megabits per second (Mbps) network connection is typically sufficient. However, if you plan to back up and restore across the network, consider using gigabit Ethernet (1,000 Mbps or 1 gigabit per second [Gbps]).

Generally, the greatest bottleneck in a front-end and back-end server configuration is the network that separates the two sets of servers. Front-end servers can consume a 100 Mbps LAN connection. Therefore, consider multiple switched fast Ethernet networks or gigabit Ethernet connections. Performance-related issues may also occur because your hardware, firmware, or software drivers are not designed to work in your configuration.

Exchange is layered on the corporate network. For example, the Active Directory Global Catalog servers are especially important for Exchange, and a Global Catalog that is experiencing problems can adversely affect Exchange. Responsibility for monitoring the underlying network generally falls to a network operations group rather than to the Exchange administrators. Rather than duplicate the work of the network operations group, the Exchange administrators should rely on existing network monitoring facilities.

However, the Exchange administrators should ensure that the network operations group notifies them if a network problem occurs that will affect the Exchange messaging environment. An Urgent alert should be generated if the network problem will also cause the messaging backbone to be down.

The following list shows common network counters.

 

Network Interface(*)\Bytes Total/sec Indicates the rate at which the network adapter is processing data bytes. This counter includes all application and file data, in addition to protocol information such as packet headers.

Network Interface(*)\Packets Outbound Errors Indicates the number of outbound packets that couldn’t be transmitted because of errors.

TCPv4\Connections Established Shows the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT. The number of TCP connections that can be established is constrained by the size of the nonpaged pool. When the nonpaged pool is depleted, no new connections can be established.

TCPv6\Connection Failures Shows the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.

TCPv4\Connections Reset Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.

TCPv6\Connections Reset Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
