.NET Instrumentation Workshop

7. Performance Counters

Windows XP provides a mechanism to collect and display performance information generated by applications. Performance information is whatever makes sense to the application. If an application transfers data then a suitable counter could be bytes moved per second, or the total number of bytes moved. The system also provides counters that indicate how the system is faring (for example the amount of free memory) and it provides counters for individual processes (for example the working set, number of threads or the number of open handles).

The operating system provides a tool called the performance monitor (perfmon) to gather and display performance counter information. Since perfmon is a separate process from the application generating the performance counter information, collecting performance data requires inter-process communication. The model used by perfmon allows it to gather a specific performance counter from your process, or to ask for all performance counters from your process, so the mechanism involves a query function. Furthermore, gathering performance counter data is not limited to perfmon: any application can gather performance data, and to allow this Windows provides a public API with a querying mechanism that lets your application determine which processes are providing performance data and what the data is.

In unmanaged code generating and reading performance counter data is a pain to write. It really is a pain. There have been several attempts to provide libraries that abstract away some of the more tedious parts, but they still do not get away from the fact that the performance monitor API is not the simplest to work with.

The real joy of .NET instrumentation is that Microsoft have done all the hard work for you, and what's more, they have done it in such a way that you do not lose any of the functionality that you would have had if you had accessed the API with unmanaged code. In fact, because writing and accessing performance counters is so easy with .NET you can achieve more with it than with the unmanaged API. The solution provided by Microsoft is elegant and flexible. I wish the same developers had worked on the .NET EventLog class because, as you'll see on the next page, the EventLog class is inelegant and lazily written.

7.1 Performance Monitor Example

Performance monitoring uses terminology that is not immediately understandable, so in this section I will describe what the terms mean and where they come from. I will present an example monitoring a .NET application using perfmon to illustrate the terms.

Every NT machine (XP, Windows 2000, Windows 2003 Server and Vista) has performance counters and you can view the counters of any machine that you have administrative access to. Once you have chosen the machine, you have to determine which performance object you wish to access. The term performance object is fairly abstract. It relates to the type of data you want to read. Examples of performance objects are: processor, process, thread and memory. All of these are 'objects', but not in the OO sense. A better term would have been class, because they refer to a type of data. In fact, it gets a bit more complicated because some objects have instances and some do not. When you use the processor, process or thread object you have to specify which instance of these objects you want to use, that is, which physical CPU, which process or which thread. Other performance objects, like the memory object, do not have more than one instance, because there is just one memory 'entity' in the machine.

Each object (or object instance) has one or more performance counters. Counters are the actual values that you will view, so for a process this includes the number of threads executing in the process and the working set (the amount of memory) used by the process. These counters can be related to counters in other objects, so you can view the amount of available memory from the memory object; or the amount of virtual memory used by a process through one of the Process instances. If an object can have instances, and there are no instances on the system, then you will not be able to access the counters for that object. However, if there are instances available, then you will see a pseudo instance called _Total, which gives access to the total of a specified counter in all the instances.
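The relationship between objects, instances and counters can also be explored from code. The following is a minimal sketch, assuming the .NET System.Diagnostics classes are used as the reader (a performance object is called a category there). The class name ListCounters is made up for illustration, and the code only runs on Windows.

```csharp
using System;
using System.Diagnostics;

class ListCounters
{
   static void Main()
   {
      // "Process" is the performance object (a 'category' in .NET terms).
      PerformanceCounterCategory category = new PerformanceCounterCategory("Process");

      // Each running process is an instance; _Total is the pseudo instance.
      string[] instances = category.GetInstanceNames();
      Console.WriteLine("{0} has {1} instances", category.CategoryName, instances.Length);

      // List the counters that can be read through the _Total pseudo instance.
      foreach (PerformanceCounter counter in category.GetCounters("_Total"))
      {
         Console.WriteLine("   {0}", counter.CounterName);
      }
   }
}
```

For an object without instances, such as Memory, GetInstanceNames returns an empty array and the counters are obtained by calling GetCounters with no argument.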

Counters can be instantaneous counts, like the number of handles used in a process; or a cumulative value, like the elapsed time of a process; or a rate, like the number of page faults per second; or they can even be values that ordinarily cannot be graphed, like a thread's state (where the state is given a numeric value and the 'height' of this value from the graph's baseline is meaningless). As you'll see later, the type of the counter determines how a counter reader will display the counter and it will also determine if the counter reader has to perform a calculation.
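To make the 'calculation' concrete, here is a small sketch of what a reader might do for a simple rate counter: the provider publishes a running total, and the reader divides the change between two samples by the elapsed time. The class name, method name and sample values are hypothetical, and real counter types use more elaborate formulas than this.

```csharp
using System;

public class RateExample
{
   // A rate counter is published as a raw running total; the reader takes
   // two samples and divides the change in value by the elapsed time.
   public static double PerSecond(long value0, long value1, double elapsedSeconds)
   {
      if (elapsedSeconds <= 0)
      {
         throw new ArgumentException("elapsedSeconds must be positive");
      }
      return (value1 - value0) / elapsedSeconds;
   }

   static void Main()
   {
      // Hypothetical raw samples: 1200 page faults in total at the first
      // reading and 1500 at a second reading taken two seconds later.
      Console.WriteLine(PerSecond(1200, 1500, 2.0));   // prints 150
   }
}
```

An instantaneous counter, such as a handle count, needs no such calculation: the reader simply displays the latest raw value.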

To test this, all you need is a really simple application. Create a file with this code:

using System;
using System.Windows.Forms;

class App : Form
{
   static void Main()
   {
      Application.Run(new App());
   }
}

Compile this as a GUI application:

csc /t:winexe test.cs

This is a simple process that merely shows a form. Run the application, but do not close it, just leave it running on the desktop.

Now start the performance monitor, from the Control Panel, Administrative Tools, Performance, or by typing perfmon on the command line or in the Run dialog. Ensure that System Monitor is selected (it should be) in the left hand tree view so that you can see the counter graphed and then click on the Add button (the + button, or press Ctrl-I).

In this screen shot you'll see that perfmon has determined the name of my machine (MARS) and will get the counters for that machine; this is the same as Use local machine counters. You can monitor counters on any machine that you have access to, but note that since performance counters require registration they can only be accessed from machines that have the appropriate registration. The system counters will be available on all machines.

The first step is to select the appropriate object from the Performance object drop down list. Peruse the various objects in this control and then select Process. At this point notice that the Select instance from list list box will be populated with all the processes on your machine. Scroll down this list and select the test process. Now scroll through the entries in the Select counters from list list box, select one, and then click on the Explain button. You should see a modeless dialog with the help text for the selected counter. For this test, make sure that the Working Set counter is selected (and that the test application is still selected) and click on the Add button, then click on the Close button to close the dialog. (I will explain what the working set is in a moment, but for now, a partial explanation is that it is the amount of RAM being used by the process.)

You'll see that the performance monitor will start gathering information. On my machine I get a value of about 8.2Mb for the working set. Now task switch to the test application and minimise it. Take a look at the performance monitor again. You'll find that the working set has reduced considerably (in my case to 850Kb). Restore the application and notice that the working set rises, but not to the previous high (on my machine it rises to 2.7Mb).

So what is happening here? Well, when the application is loaded Windows loads all of the DLLs that it needs. Since this is a .NET GUI application the application is JIT compiled on a per-method basis for each method that is used, and when a method uses a new .NET assembly, that assembly will be loaded and if a native image is not found for the assembly then the method will be JIT compiled. If a native image is available for the assembly then clearly this extra JIT step is not needed.

But note what I said, if a method calls code in another assembly and that assembly has not already been loaded then when the method is called the assembly is loaded. For example, if you have an application that has a button which makes a call to code in, say, System.Data, then the assembly will only be loaded the first time that a class in that assembly is actually called. But once that assembly has been loaded it will always be in your process (.NET does not have a mechanism to unload an individual assembly, but you can unload an application domain which will unload all the assemblies it uses). Similarly, if you call an unmanaged DLL through interop, that DLL will only be loaded the first time that you call the interop code and once loaded, the DLL will remain in the application domain until the domain is unloaded; if you only use the DLL once you will still have the DLL in memory until the domain is unloaded.

In addition to this, Windows manages your memory usage through virtual memory. Physical memory is the actual RAM in your machine. However, when an application is loaded Windows assigns to it virtual memory, some of which will be physical memory and the remainder will be disk based. The pages of memory that are infrequently used are candidates to be paged out to disk. Using disk space (the paging file) like this enables you to appear to have more memory than your machine actually has, and allows you to run more applications. When an application requests a memory page that is not in physical memory a page fault occurs and the operating system loads the page from disk into physical memory. The amount of memory represented by virtual pages in physical memory is called the working set.

Accessing disk is orders of magnitude slower than accessing RAM and a large number of page faults is an indication that your machine has too little RAM. In fact monitoring page faults is such an important measure that perfmon provides a performance counter called Page Faults/sec for each process and for the system in total. In addition the Process object has a Working Set counter that gives the size of the working set for a process (ie the amount of physical memory being used) and a Page File Bytes counter that gives the number of bytes the process has reserved in the paging file.
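The same two counters can be read from code as well as from perfmon. This is a minimal sketch, assuming the .NET PerformanceCounter class is used as the reader and that the test application from this section is running (its instance name is the process name without the .exe extension). The class name ReadCounters is made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class ReadCounters
{
   static void Main()
   {
      // Object, counter and instance names as they appear in perfmon.
      PerformanceCounter workingSet =
         new PerformanceCounter("Process", "Working Set", "test");
      PerformanceCounter pageFaults =
         new PerformanceCounter("Process", "Page Faults/sec", "test");

      // A rate needs two samples, so the first NextValue call returns 0.
      pageFaults.NextValue();

      for (int i = 0; i < 10; i++)
      {
         Thread.Sleep(1000);
         Console.WriteLine("Working set: {0:N0} bytes, page faults/sec: {1:F1}",
            workingSet.NextValue(), pageFaults.NextValue());
      }
   }
}
```

If you minimize and restore the test application while this runs, the working set value should drop and then rise again, accompanied by a burst of page faults as pages are brought back in.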

When you minimize the application the operating system deduces that you have finished using the application for the time being and it takes the attitude that the application that is top-most is the one that you are using and hence the one which you want to have the best performance.

The System control panel applet allows you to configure this behaviour. From the command line you can type Rundll32.exe shell32.dll,Control_RunDLL sysdm.cpl or you can simply go to the control panel and double click on the System applet. Then select the Advanced tab. The Performance section has a Settings button and the Advanced tab of the dialog it shows lets you choose whether Windows gives various performance benefits to foreground or background applications.

Since the system assumes that the minimised application is not being used it can free up its physical memory usage by paging as many pages as it can to the paging file. (In fact, the paging file will already have a page for each page in the working set, so the operating system only needs to copy to the paging file the pages that have changed.) This is known as trimming the working set and is the reason for the dramatic drop in memory usage (8.2Mb to 850Kb in my case). You can force this to happen by calling the Windows API function EmptyWorkingSet.

Close the test application. You should see that the performance gathering will stop. Start a new instance of the test application and look at the performance monitor. The performance gathering will start again, and this time the working set will show a value of 8.2Mb again. Close the application again.

Edit test.cs to add a button that calls EmptyWorkingSet:

using System;
using System.Windows.Forms;
using System.Runtime.InteropServices;
using System.Diagnostics;

class App : Form
{
   [DllImport("psapi")]
   static extern int EmptyWorkingSet(IntPtr handle);

   App()
   {
      Button button = new Button();
      button.Text = "Trim Working Set";
      button.Click += new EventHandler(Trim);
      button.Dock = DockStyle.Fill;
      this.Controls.Add(button);
   }

   void Trim(object o, EventArgs a)
   {
      EmptyWorkingSet(Process.GetCurrentProcess().Handle);
   }

   static void Main()
   {
      Application.Run(new App());
   }
}

Compile this and run it. Note the value of the working set (look in the Last box) and repeat the test you did before: minimize, note the working set, restore, note the working set. In my tests I get values similar to these: 9Mb, 0.9Mb, 3Mb. Now close the application, start it again and note the working set. Now click on the button. You should find that the working set will reduce (I get a reduction from 9Mb to 1Mb) but if I move the window then the working set rises to about 3Mb, which is still a lot less than the initial value.

So what's happening here? Well, this example uses Windows Forms which means that it uses System.Windows.Forms.dll which accesses various Windows GDI and GDI+ DLLs. To see this, close test and run it again, leaving it open while you run the command line tasklist utility:

tasklist /m /fi "imagename eq test.exe"

This says "show the DLLs loaded by test.exe". The result will look like this:

Image Name                   PID Modules
========================= ====== =============================================
test.exe                     844 ntdll.dll, mscoree.dll, KERNEL32.dll,
                                 ADVAPI32.dll, RPCRT4.dll, SHLWAPI.dll,
                                 GDI32.dll, USER32.dll, msvcrt.dll,
                                 IMM32.DLL, LPK.DLL, USP10.dll, mscorwks.dll,
                                 MSVCR80.dll, shell32.dll, comctl32.dll,
                                 comctl32.dll, mscorlib.ni.dll, ole32.dll,
                                 System.ni.dll, System.Drawing.ni.dll,
                                 System.Windows.Forms.ni.dll, mscorjit.dll,
                                 uxtheme.dll, MSCTF.dll, gdiplus.dll,
                                 msctfime.ime, OLEAUT32.DLL

Since this application does not perform any inter-process calls it will not need to call rpcrt4.dll; equally, this application does not use Windows common controls so it will not use comctl32.dll (either version). However, these DLLs will be dependencies of one or more of the other DLLs that the application will use, which is why they have been loaded. When the working set is trimmed the pages that contain these DLLs will be removed, but since they are in the paging file it means that if a function in one of those DLLs is required the appropriate pages can be paged back into the working set. The test application has an interop call to the psapi.dll library but notice that from this first test it is not included in the module list. The reason is that the DLL has not been called yet. Click on the button so that EmptyWorkingSet is called and then run tasklist again. This time you'll see that psapi.dll will be listed. This illustrates that platform invoke will only load a DLL the first time it is called and after that the DLL will remain in memory.
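The tasklist experiment can also be sketched in code by asking the process for its own module list before and after the first platform invoke call. This is an illustrative console program rather than the test application itself, and the class name LazyLoad and the helper IsLoaded are made up. On later versions of Windows the result of the first check may differ if something else has already loaded psapi.dll.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public class LazyLoad
{
   [DllImport("psapi")]
   static extern int EmptyWorkingSet(IntPtr handle);

   // Returns true if a module with this file name is loaded in the process.
   public static bool IsLoaded(string moduleName)
   {
      // GetCurrentProcess returns a new object, and so a fresh module
      // list, each time it is called.
      foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
      {
         if (string.Compare(module.ModuleName, moduleName, true) == 0)
         {
            return true;
         }
      }
      return false;
   }

   static void Main()
   {
      Console.WriteLine("Before the call: {0}", IsLoaded("psapi.dll"));
      EmptyWorkingSet(Process.GetCurrentProcess().Handle);
      Console.WriteLine("After the call:  {0}", IsLoaded("psapi.dll"));
   }
}
```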

Of course trimming the working set does more than just this, it can remove some pages of a DLL from the working set and leave other pages from the same DLL in the working set, and this can be seen by the results above. When the working set is first trimmed it reduces to 1Mb, but when the window is moved (requiring other system methods to be called) it rises to 3Mb as pages are moved back into the working set.

As you can see from this example, the performance monitor allows you to look at various system values concerning your application. This example has concentrated only on the working set, the pages of physical memory used by your application, and it has shown that by one simple function call, you can reduce the working set to about a third. However, you can see from perusing the performance monitor counters that there are many other things you can monitor.

For example, threads are quite expensive, so you really should keep the number of threads in your application to a minimum (a good rule of thumb is that normally you should not have more than four threads for each processor your machine has, but it is acceptable to create more threads for brief bursts to handle, for example, a sudden heavy workload). If your application creates threads then you should make sure that you restrict the number created. You can use perfmon to monitor the number of threads that the application creates and if you find that the application creates a large number of threads lasting for a large period of time then you should review your code.

Another example is Windows system handles. One situation when handles are used is with files: when you open a file you will be given a handle; when you have finished using a file you must close the handle. Closing a handle allows Windows to clear the resources that were associated with it. If you do not close a handle then that will be a resource, and hence a memory, leak. The performance monitor allows you to monitor the total number of handles that a process has (Process object Handle Count counter) and you can use it to monitor an application over as long a period of time as you wish. If you find that the handle count rises over time the likelihood is that you are leaking a handle. This usually happens because you are making platform invoke calls to functions that create handles but do not release them.
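A process can also watch its own handle count. The sketch below uses the Process.HandleCount property as a simpler in-process stand-in for the Handle Count counter, and deliberately holds 50 files open to show the count rising and then falling once they are closed. The class name HandleWatch and the figure of 50 are arbitrary choices for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public class HandleWatch
{
   // Process caches its data, so take a new snapshot for every reading.
   public static int HandleCount()
   {
      using (Process self = Process.GetCurrentProcess())
      {
         return self.HandleCount;
      }
   }

   static void Main()
   {
      Console.WriteLine("Handles at start: {0}", HandleCount());

      // Simulate a leak: open 50 temporary files and do not close them.
      FileStream[] files = new FileStream[50];
      for (int i = 0; i < files.Length; i++)
      {
         files[i] = new FileStream(Path.GetTempFileName(), FileMode.Open);
      }
      Console.WriteLine("Handles while files are open: {0}", HandleCount());

      // Closing each stream releases its handle; then tidy up the file.
      foreach (FileStream file in files)
      {
         string name = file.Name;
         file.Close();
         File.Delete(name);
      }
      Console.WriteLine("Handles after closing: {0}", HandleCount());
   }
}
```

The same Process object exposes a Threads collection, so Threads.Count can be sampled in the same way to keep an eye on the number of threads.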

7.2 Performance Monitor Registration

In the last section you saw that the performance monitor architecture defines an object as being a collection of associated measurements, each of which is called a counter. An object can be system wide, or there can be several instances of objects. For each object and each counter there is a help text that perfmon can display to the user. When the user selects a counter perfmon clearly must be able to gather the values of this counter. As you have seen, many counters are provided by the operating system, but applications can provide their own counters. This means that perfmon must be able to obtain those counters from those applications (and hence make some kind of inter-process call), and to do this it must be able to get information about which application has those counters and how to access them. Since the performance monitor has been in Windows since the very first version of NT you will understand that all of this information will be held in the repository that Windows has used all of that time: the system registry.

As an example, run the performance monitor and click on the + button to add a counter. In the Performance object box scroll up to the top. There you will see objects for the .NET framework and ASP.NET:

These objects were added when you installed the .NET framework (and hence ASP.NET) on your machine. They allow you to monitor the performance of .NET and ASP.NET applications. Close down this dialog and perfmon.

The .NET objects and their counters are summarised here:

Object | Description
.NET CLR Data | For SqlClient classes this gives: the current number of pools associated with the process; the current number of connections, pooled or not; the current number of connections in all pools associated with the process; the highest number of connections in all pools since the process started; the total number of command executions that have failed for any reason; and the total number of connection open attempts that have failed for any reason
.NET CLR Exceptions | This gives: a count and rate of the number of exceptions thrown; the rate at which exception filters are executed; the rate at which finally clauses are executed; and the rate at which stack frames are traversed from the throw to the catch
.NET CLR Interop | This gives: a count of the current COM callable wrappers; the total number of times marshalling from managed to unmanaged and vice versa has occurred; and the current number of marshalling stubs
.NET CLR Jit | This gives: the total number of IL bytes JITted; the total number of IL methods JITted; the percentage of the execution time (since a JIT was last performed) the application has been JITting code; the rate at which IL bytes are JITted; and the number of JIT attempts that have failed
.NET CLR Loading | This shows: the size of the memory committed (ie reserved in paged memory) by the class loader; the current number of application domains in the application; the current number of assemblies loaded; the number of classes loaded from all assemblies; the rate of loading application domains; the rate of unloading application domains; the rate of loading assemblies; the rate of failure to load classes; the number of application domains loaded since the application started; the total number of application domains unloaded; the total number of assemblies loaded; and the total number of classes loaded since the application started
.NET CLR LocksAndThreads | This gives: the number of current managed threads in the application; the number of OS threads used as the underlying thread for the managed threads created by the application; the current number of threads used by the runtime; the rate that threads are used by the runtime for managed threads; the total number of threads used by the runtime; the rate that threads obtain locks; the number of threads waiting for a lock; the rate of the threads waiting for a lock; the total number of threads that have waited for a lock; and the rate that threads fail to get a lock
.NET CLR Memory | This gives: the number of GC handles (ie handles to external resources); the number of times GC.Collect was called; the number of pinned objects at the last garbage collection; the number of synchronization blocks (with weak references; note that the counter name misspells this as Sink Blocks); the amount of committed memory used by the garbage collector; the amount of reserved memory used by the garbage collector; the percentage of time spent in garbage collection (for the last GC); the rate that memory is allocated on the GC heap; the number of objects in the last garbage collection that were finalised (and so weren't garbage collected); the size of the large (20Kb or more) object heap (whose objects are not promoted through GC generations); and, for the three GC generations (0, 1 and 2), counters for the number of times the generation has been collected, the size of the heap, and the rate that objects are promoted to the next generation because of finalisation or because they are used by a long lived object
.NET CLR Networking | This gives: the total number of bytes received over sockets since the process started; the total number of bytes sent over sockets since the process started; the number of socket connections established since the process started; and the number of datagram packets sent and the number of datagram packets received since the process started
.NET CLR Remoting | This gives: the total number of channels registered since the application started; the total number of proxies created since the application started; the current number of context bound classes loaded; the rate that context bound objects are allocated; the current number of contexts; the rate that remote calls are invoked; and the number of remote calls that have been made
.NET CLR Security | This gives: the total number of link time code access security checks that have been performed; the percentage of time that has been spent doing security checks; the depth of the stack during the last code access security check; and the total number of code access security checks that have been performed
.NET Data Provider for Oracle | Connection information (connections, disconnections, number of pooled and nonpooled connections) for the Oracle data provider
.NET Data Provider for SqlServer | Connection information (connections, disconnections, number of pooled and nonpooled connections) for the SQL Server data provider

To get the counter descriptions you can use the Explain button on the Add Counter dialog, or you can access the source of these descriptions. Performance monitor information is stored in two places in the registry. The first place is:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib

Under this key is a key named for the machine locale, because it holds text data and this data is locale specific. The locale key has two values of type REG_MULTI_SZ, that is, they contain multiple strings. They are called Counter and Help. In spite of the name, these values actually contain information about performance objects as well as performance counters. The Counter strings come in pairs (each on a separate line): an even number, then the name of the counter (or object). For example, here's a sample from my machine (the numbers will be different on your machine):

3398
.NET CLR Interop
3400
# of CCWs
3402
# of Stubs
3404
# of marshalling
3406
# of TLB imports / sec
3408
# of TLB exports / sec

This shows that the .NET CLR Interop object is number 3398 and the # of marshalling counter is number 3404.

The Help strings are in pairs too: an odd number and the help string. These are related to the strings in the Counter value: the help string of an object (or counter) has an identifying number one more than the number of the object (or counter) in the Counter value. For example, here are the help strings for the counters given above:

3399
Stats for CLR interop.
3401
This counter displays the current number of Com-Callable-Wrappers (CCWs). A CCW is a proxy for the .NET managed object being referenced from unmanaged COM client(s). This counter was designed to indicate the number of managed objects being referenced by unmanaged COM code.
3403
This counter displays the current number of stubs created by the CLR. Stubs are responsible for marshalling arguments and return values from managed to unmanaged code and vice versa; during a COM Interop call or PInvoke call.
3405
This counter displays the total number of times arguments and return values have been marshaled from managed to unmanaged code and vice versa since the start of the application. This counter is not incremented if the stubs are inlined. (Stubs are responsible for marshalling arguments and return values). Stubs usually get inlined if the marshalling overhead is small.
3407
Reserved for future use.
3409
Reserved for future use.
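The pairing between the two values is easy to express in code. The following sketch parses the alternating number and text strings into dictionaries and then finds the help text for each name by adding one to its number. The class name PerfStrings and the method Parse are made up for illustration, and the sample strings are taken (one of them abbreviated) from the listings above rather than read from the registry.

```csharp
using System;
using System.Collections.Generic;

public class PerfStrings
{
   // Turns the alternating "number, text" strings of a REG_MULTI_SZ value
   // into a dictionary keyed by the number.
   public static Dictionary<int, string> Parse(string[] pairs)
   {
      Dictionary<int, string> table = new Dictionary<int, string>();
      for (int i = 0; i + 1 < pairs.Length; i += 2)
      {
         table[int.Parse(pairs[i])] = pairs[i + 1];
      }
      return table;
   }

   static void Main()
   {
      // Sample strings from the listings above (the second help text is cut short).
      string[] counter = { "3398", ".NET CLR Interop", "3400", "# of CCWs" };
      string[] help = { "3399", "Stats for CLR interop.",
                        "3401", "This counter displays the current number of CCWs." };

      Dictionary<int, string> names = Parse(counter);
      Dictionary<int, string> helps = Parse(help);

      foreach (KeyValuePair<int, string> name in names)
      {
         // The help text for item N is stored under N + 1.
         Console.WriteLine("{0} {1}: {2}", name.Key, name.Value, helps[name.Key + 1]);
      }
   }
}
```

On a real machine the arrays could be obtained with RegistryKey.GetValue, which returns a string array for a REG_MULTI_SZ value.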

To add your own objects and counters you have to append appropriate strings to these registry values. To help you, the Perflib key also has values called Last Counter and Last Help. An installation program clearly has to give the object and counter names, and object and counter help text, numbers greater than these values and then update these values once the new data has been added. There is also a value called Base Index below which are the numbers assigned exclusively to Windows.
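As a sketch of the numbering an installation program might use, the code below allocates the next even number for each name and the next odd number for each help string, working up from the current Last Counter and Last Help values. The starting values and the item names are hypothetical, and the scheme of simply adding two for each new item is an assumption that follows from the even and odd pairing described above.

```csharp
using System;

public class IndexAllocator
{
   // Name indexes are even, so each new item takes the next even number
   // above the current Last Counter value.
   public static int NameIndex(int lastCounter, int item)
   {
      return lastCounter + 2 * (item + 1);
   }

   // Help indexes are odd, one above the matching name index.
   public static int HelpIndex(int lastHelp, int item)
   {
      return lastHelp + 2 * (item + 1);
   }

   static void Main()
   {
      // Hypothetical values; the real ones are read from the Perflib key.
      int lastCounter = 3408, lastHelp = 3409;
      string[] items = { "MyApp (object)", "Bytes Moved", "Bytes Moved / sec" };

      for (int i = 0; i < items.Length; i++)
      {
         Console.WriteLine("name {0}, help {1}: {2}",
            NameIndex(lastCounter, i), HelpIndex(lastHelp, i), items[i]);
      }

      // The installer writes these back as the new Last Counter and Last Help.
      Console.WriteLine("New Last Counter {0}, new Last Help {1}",
         NameIndex(lastCounter, items.Length - 1),
         HelpIndex(lastHelp, items.Length - 1));
   }
}
```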

Performance counters refer to something within an application, therefore, when the performance monitor (or any process that reads performance counters) reads a counter it must make an inter-process call. The reader of performance counters is not responsible for performing inter-process communication, and the Windows SDK does not mandate any particular inter-process communication mechanism. Instead (as you'll see in a moment) the reader process just reads the performance data using the performance monitor API and the IPC is performed automatically. This means that if you add performance counters to your application you will have to write IPC code (that is, if you were writing performance counters using the unmanaged API; as you'll see later, .NET does all the work for you).

The performance monitor API specifies that the provider of the counters writes a DLL that acts as the reader endpoint for the IPC. This DLL will be loaded into the reader and it exposes functions that the reader calls. This decouples the reader process from the IPC; indeed the reader knows nothing about the IPC, it just knows about the standard entrypoints of the DLL.

The DLL is registered under:

HKLM\System\CurrentControlSet\Services\<name>\Performance

where <name> is a unique name you give your application. Under this key is a value called Library that gives the path to your DLL. Note that although this key appears under the Services key, the performance monitor is not a service and your application does not have to be a service either.

From the command line start the registry editor (regedit) and locate the Services key mentioned above. You'll see that there are keys that correspond to some of the .NET data that can be accessed through perfmon:

The .NETFramework key indicates that mscoree.dll provides the reader (ie perfmon) side of the IPC. As you can see it also has values indicating the numbers for the first and last counters it supports.

Such a DLL must export three functions: one to 'open' (or initialize) the performance counters, one to 'close' (or cleanup) the counters and a final one to retrieve the counter values. These functions are exported from the DLL by name and the name for each is specified in values in the application's key: Open, Close and Collect, respectively. In the case of mscoree.dll these three entrypoints are called OpenCtrs, CloseCtrs and CollectCtrs, but if you were to write your own DLL then you can use any names you like.

These are the values required by the performance counter API, but the writer of the performance monitor DLL can store other values. As I mentioned earlier, when an application registers its counter information it has to use numbers that are not used by other perfmon DLLs and to do this its installation application accesses the last counter number and increments it to give the number of its first counter. This means that the writer of a perfmon DLL will not know the numbers associated with its counters at compile time. One way to get round this is to get the application's installation program to write the values it obtained for this application into the application's key. This is why the key in the screenshot above has the values First Counter, Last Counter, First Help, Last Help.

This DLL is loaded in-process into the reader of the performance counters, and the Collect function will perform some IPC to access the associated application and pass the request for counters. The Collect function will then wait for the application to gather the information and return it via the IPC. The DLL then returns the data to the caller of the Collect function. This is summarized in the following picture:

Using the values in the picture, the registry will have an entry called:

HKLM\System\CurrentControlSet\Services\MyApp\Performance

This key will have a value giving the path to MyPerfLib.dll in the Library value, and will have a value of MyCollect for the Collect value.
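The registration of any perfmon DLL can be inspected from code as well as from regedit. This sketch reads the values described above from a Performance key using the Microsoft.Win32 registry classes. The class name PerfDllInfo and the helper PerformanceKeyPath are made up, and .NETFramework is simply the service name mentioned earlier, used here as a default.

```csharp
using System;
using Microsoft.Win32;

public class PerfDllInfo
{
   // Builds the path (relative to HKLM) of a service's Performance key.
   public static string PerformanceKeyPath(string name)
   {
      return @"SYSTEM\CurrentControlSet\Services\" + name + @"\Performance";
   }

   static void Main(string[] args)
   {
      // Pass another service name on the command line to inspect it.
      string name = args.Length > 0 ? args[0] : ".NETFramework";

      using (RegistryKey key = Registry.LocalMachine.OpenSubKey(PerformanceKeyPath(name)))
      {
         if (key == null)
         {
            Console.WriteLine("No Performance key for {0}", name);
            return;
         }

         string[] values = { "Library", "Open", "Collect", "Close",
                             "First Counter", "Last Counter",
                             "First Help", "Last Help" };
         foreach (string value in values)
         {
            // GetValue returns null if the value does not exist.
            Console.WriteLine("{0} = {1}", value, key.GetValue(value));
         }
      }
   }
}
```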

Notice that perfmon is a system tool and can display system performance data. perfmon loads your DLL which means that your code will run within a tool provided by the operating system, and so that means that perfmon is dependent upon how good your coding is. (Note again, that this is for unmanaged performance monitor code.) Because of this, perfmon takes steps to protect itself against bad code: it spawns a separate thread for each performance monitor DLL, and it wraps the thread function in structured exception handling so if the thread becomes unresponsive then perfmon can kill it, and if your code throws an exception then it will be caught so that it will not kill perfmon.
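The same defensive idea can be sketched in managed code. This is only an analogy and not how perfmon is implemented: the untrusted call runs on its own thread, exceptions are caught so that they cannot bring down the host, and a call that overruns its time limit is abandoned. The names GuardedCall, Collector and TryCollect are made up for illustration.

```csharp
using System;
using System.Threading;

public class GuardedCall
{
   public delegate void Collector();

   // Runs collector on its own thread. Returns false if it throws or if it
   // has not finished within timeoutMs milliseconds.
   public static bool TryCollect(Collector collector, int timeoutMs)
   {
      bool succeeded = false;
      Thread worker = new Thread(delegate()
      {
         try
         {
            collector();
            succeeded = true;
         }
         catch (Exception)
         {
            // Swallow the failure so that it cannot kill the host.
         }
      });
      worker.IsBackground = true;   // an abandoned thread will not keep the process alive
      worker.Start();

      if (!worker.Join(timeoutMs))
      {
         return false;              // unresponsive: give up waiting
      }
      return succeeded;
   }

   static void Main()
   {
      Console.WriteLine(TryCollect(delegate() { }, 1000));
      Console.WriteLine(TryCollect(delegate() { throw new InvalidOperationException(); }, 1000));
      Console.WriteLine(TryCollect(delegate() { Thread.Sleep(Timeout.Infinite); }, 200));
   }
}
```

Only the first call reports success; the second fails because the delegate throws and the third because it never returns.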

If your code accesses performance data remotely then an RPC call will be made from perfmon to winlogon.exe on the remote machine and the perfmon DLL will be loaded in that process. winlogon.exe is vital to an XP system and if it dies then the whole system will die. Again, winlogon.exe will take steps to make sure that a poorly written perfmon DLL will not kill it, but it cannot protect itself from everything. For example, if the perfmon DLL makes a call to ExitProcess that could be disastrous for the remote machine.

Equally, if your application uses the unmanaged perfmon API to read performance data generated by another application, that application's perfmon DLL will be loaded into your application: you are inviting someone else's code into your process. Thus you have to be careful about what counters you monitor and make sure that you only monitor counters provided by publishers that you trust. This is not a problem with the .NET performance counter classes because the IPC code, and hence the perfmon DLL that is injected into a reader, is provided by the framework, so if you trust the framework then you will trust the .NET perfmon DLLs too. An interesting corollary is that if the reader is an unmanaged application (like perfmon itself) then the framework will be hosted to allow the framework's perfmon DLL to be called.

But what is the perfmon API to access performance counters? Well, there is no specific unmanaged function you must call; instead your application reads a specific registry key. You do this by accessing the special hive, HKEY_PERFORMANCE_DATA. In response to this, the registry API will direct your calls to the performance monitor API, which will then use the name of the 'key' that you want to read to determine the perfmon DLL to load. It will then load the library, initialise it and call the Collect function.

At this point I will not go into any more detail because it gets complicated and the .NET API hides all of these details from you. In essence, the performance data reader calls RegQueryValueEx and for the 'key' it passes a list of numbers that are the IDs of the counters the reader wants. The perfmon API calls the Collect function on the appropriate DLLs and passes the string of numbers, the Collect function has to parse this string to determine if it can supply the counters. If it can, then it fills a caller allocated buffer with the requested counter values; a performance monitor reader can pass the string Global to get the perfmon DLL to return data for all of the counters it supports.
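To make the request format a little more concrete, here is a minimal sketch of the parsing step that a Collect function has to perform. It is written in C# purely for illustration (a real perfmon DLL is unmanaged code), and the CounterRequest class and its members are hypothetical names, not part of any API. It assumes that the IDs arrive as a space-separated list of decimal numbers, or as the single word Global, and that the DLL knows its own range from the First Counter and Last Counter values described earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: illustrates the parsing only, not the real unmanaged interface.
static class CounterRequest
{
   // Returns null for "Global" (meaning all counters), otherwise the requested IDs.
   public static List<int> Parse(string request)
   {
      if (string.Equals(request.Trim(), "Global", StringComparison.OrdinalIgnoreCase))
         return null;

      List<int> ids = new List<int>();
      string[] tokens = request.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
      foreach (string token in tokens)
      {
         int id;
         if (int.TryParse(token, out id)) ids.Add(id);   // ignore anything that is not a number
      }
      return ids;
   }

   // True if at least one requested ID falls in this DLL's registered range.
   public static bool CanSupply(List<int> requested, int firstCounter, int lastCounter)
   {
      if (requested == null) return true;   // Global: supply everything we have
      foreach (int id in requested)
      {
         if (id >= firstCounter && id <= lastCounter) return true;
      }
      return false;
   }
}

class Demo
{
   static void Main()
   {
      // The numbers are invented; real values are assigned at installation time.
      List<int> ids = CounterRequest.Parse("2000 2002");
      Console.WriteLine(CounterRequest.CanSupply(ids, 2000, 2010));
      Console.WriteLine(CounterRequest.CanSupply(CounterRequest.Parse("Global"), 2000, 2010));
   }
}
```

A DLL that finds none of its own IDs in the request simply reports that it cannot help, which is how the API can ask every registered DLL in turn.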

This raises the question of how the performance monitor API knows which perfmon DLL to call. The answer is that it does not know. Instead, it loads all the perfmon DLLs registered with the system and asks each one (by passing the counter's number to the Collect function) whether it supports the counter. If a DLL does not support the specified counter it returns an error value. When a DLL returns a non-error status code, the API knows that it has the correct DLL.

However, because you can request more than one counter from an application, and you can request counters from one or more objects and from more than one instance, it means that a mixture of data is returned, so the perfmon API requires that the data is returned in a specific format which has multiple nested structures. Constructing these structures and extracting data from these structures is tedious and error prone.

The .NET API does all of this for you (registering the perfmon DLL, the IPC code to talk between the reader and counter provider, generating the perfmon structures and extracting data from such structures). This makes accessing perfmon data and providing perfmon counters in your application incredibly easy. I've said it before, and I'll say it again: I wish the designers of these classes had also designed the EventLog class.

7.3 Reading .NET Performance Counters

The two most important classes in System.Diagnostics for performance monitoring are the PerformanceCounter and PerformanceCounterCategory classes. The former represents a performance monitor counter and the latter represents a performance monitor object, which makes perfect sense because the term 'performance monitor object' is confusing. These two classes are used to read and write values, that is, they are used by perfmon readers and by applications that supply perfmon data. In this section you'll see how to read perfmon data.

Start by creating this simple application (app.cs):

using System;
using System.Diagnostics;

class App
{
   static void Main()
   {
      PerformanceCounterCategory[] cats = PerformanceCounterCategory.GetCategories();
      foreach (PerformanceCounterCategory cat in cats)
      {
         Console.WriteLine(cat.CategoryName);
      }
   }
}

Compile this code (csc app.cs) and run it. This code reads all of the categories and then prints out the names on the command line. It will produce the equivalent of the contents of the Performance object drop down list on the Add Counters dialog. Now add the following:

PerformanceCounterCategory[] cats = PerformanceCounterCategory.GetCategories();
foreach (PerformanceCounterCategory cat in cats)
{
   Console.WriteLine(cat.CategoryName);
   string[] instances = cat.GetInstanceNames();
   foreach (string instance in instances)
   {
      Console.WriteLine("\t" + instance);
   }
}

Compile and run this code. This time you will get a list of each instance that is available for each category. Note that some categories do not have instances. This complicates things a little because if a category supports instances then to get access to counters you have to provide an instance name, but if the category does not support instances you have to access it without an instance name (you can, however, use an empty string as the instance name).

To get the counter names change the code like this:

PerformanceCounterCategory[] cats = PerformanceCounterCategory.GetCategories();
foreach (PerformanceCounterCategory cat in cats)
{
   Console.WriteLine(cat.CategoryName);
   string[] instances = cat.GetInstanceNames();
   if (instances.Length > 0)
   {
      foreach (string instance in instances)
      {
         Console.WriteLine("\t" + instance);
      }
      PerformanceCounter[] counters = cat.GetCounters(instances[0]);
      foreach (PerformanceCounter counter in counters)
      {
         Console.WriteLine("\t\t" + counter.CounterName);
         counter.Close();
      }
   }
   else
   {
      PerformanceCounter[] counters = cat.GetCounters();
      foreach (PerformanceCounter counter in counters)
      {
         Console.WriteLine("\t" + counter.CounterName);
         counter.Close();
      }
   }
}

The PerformanceCounter class derives from Component, which means that it implements IDisposable. Whenever you see this interface you should assume that the class that implements it holds a resource (typically an unmanaged one) that must be released as soon as possible. A class that implements IDisposable will have a Dispose method, and often another method like Close to release these resources. PerformanceCounter has a Close method and so in this code I ensure that this method is called when the counter object has been used. Since all instances in a category will have the same counters I have made sure that if a category has more than one instance the counter names are only printed once.

Compile and run this code. Now you will have the same information that is present in the Performance object, Select instances from list and Select counters from list controls on the Add Counters dialog.

You can see that the key to accessing counters is a category object. Through a category object you can call InstanceExists to see if a specific instance is available through that category. There are also two static versions of this method: one tests whether a specific instance is available for a particular category on the current machine, and the other tests whether that instance/category combination is available on another machine. You can also call CounterExists to see if there is a specified counter on the category. Again, there are two static versions to test for a counter/category pair on the current machine and on another machine. All of these methods only test for the condition (and return a Boolean). To get access to the counter you will have to pass counter, category and instance information to the PerformanceCounter constructor.

Create a file to test for a specific instance (display.cs):

using System;
using System.Diagnostics;
using System.Threading;

class App
{
   static void Main(string[] args)
   {
      if (args.Length < 1) return;
      while (!PerformanceCounterCategory.InstanceExists(args[0], "Process"))
      {
         Thread.Sleep(0);
      }
      Console.WriteLine("Found {0}", args[0]);

      try
      {
         using (PerformanceCounter time = new PerformanceCounter("Process", "% User Time", args[0]))
         {
         }
      } catch (InvalidOperationException){}
   }
}

The idea is that the code continuously tests to see if the specified instance exists. If the instance does not exist the code calls Thread.Sleep(0). This is not strictly necessary in this example because there is only one application thread, but in a multi-threaded application it would be important. But, you may ask, why wait zero time? Well, whatever value is passed to Sleep, the calling thread gives up the rest of its time slice, so another thread that is ready to run will be allowed to execute. So if this was a multi-threaded application another thread in the application would be allowed to perform a 'time slice' of work, the 'time slice' being determined by the operating system. Calling Thread.Sleep(0) says "let another thread do some work and when it has finished let me do some work". Bear in mind that if no other thread is ready to run, the call returns immediately and the loop simply spins. If the loop finds that the instance exists the user is informed and then a counter is obtained. Note that I use the using statement because the counter must be closed when you have finished with it.

Compile this code (csc display.cs). Make sure that there are no instances of notepad running and then run the example to test for notepad (display notepad). You should see that the application pauses and there is no output. Now start notepad and you should find that a message is printed and the application finishes.

While notepad is still running, run the application again. Notice that there is a perceivable delay between when the application starts and when it prints out that it has found notepad. This is because the performance monitor API takes a while to load and query the perfmon DLLs to see if they support the specified category and if so, whether they have the specified instance.
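The loop in display.cs polls as fast as it can. As an alternative (this is not what the example above does), the following sketch waits a fixed interval between tests and gives up after a timeout. The Poll class, the Condition delegate and the parameter names are all hypothetical; in display.cs the condition would be the InstanceExists test, but here a stand-in is used so that the sketch runs anywhere.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical delegate type for the test being polled.
delegate bool Condition();

static class Poll
{
   // Calls condition every pollMs milliseconds until it returns true
   // or until timeoutMs milliseconds have passed.
   public static bool Until(Condition condition, int pollMs, int timeoutMs)
   {
      Stopwatch clock = Stopwatch.StartNew();
      while (!condition())
      {
         if (clock.ElapsedMilliseconds >= timeoutMs) return false;
         Thread.Sleep(pollMs);   // a non-zero sleep leaves the processor free between tests
      }
      return true;
   }
}

class Demo
{
   static void Main()
   {
      int calls = 0;
      // Stand-in condition that succeeds on the third test.
      bool found = Poll.Until(delegate { calls++; return calls >= 3; }, 10, 1000);
      Console.WriteLine("Found: {0} after {1} tests", found, calls);
   }
}
```

Since each test of a perfmon category is itself slow, a polling interval of a second or so costs little in responsiveness.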

Now let's get a value from the counter. As you can see, the counter is called % User Time, and the help text is:

% User Time is the percentage of elapsed time that the process threads spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows executive, kernel, and device drivers. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.

The only relevant bit is the first sentence, that is, the counter gives the proportion of time that all the threads are executing in user mode, and in effect, it is a measure of the process doing something. Since it is a percentage value, it is a comparison with something else; in this case, the time spent in user mode is compared to the entire elapsed time. Other types of measurements are rates (how a value varies over time) or an instantaneous value. A rate clearly requires that at least two values are measured (to get the change in the value) and the time between the measurements.

The perfmon API defines that the value can be a value, a counter, a rate, or a fraction. If it is a counter then the value will increment between reads, and the counter description must indicate what the change means. If it is a rate then the counter is made up of two values, and the difference between the two values, divided by the elapsed time between those two samples, gives the rate. Similarly, if the value is a fraction then two values have to be obtained, the denominator and numerator. Sometimes values are compared to another, base value, which means that the base value has to be read as well as the sample, and other times a rate is compared to a base value, which means three values must be read. Thus, some counters are provided as single numbers, some as two numbers, and some as three numbers. If you were to use the performance monitor API you would have to read the correct number of values. The .NET classes do all of this for you.
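The arithmetic behind a rate and a fraction can be sketched in a few lines. This is only an illustration of the calculations described above, not the framework's code: the Sample struct and the RateMath class are hypothetical, and expressing the fraction as a percentage is an assumption.

```csharp
using System;

// Hypothetical stand-in for the data in one sample; not the framework's CounterSample.
struct Sample
{
   public long RawValue;    // the counter's running total when the sample was taken
   public long TimeStamp;   // timer ticks when the sample was taken
   public long Frequency;   // timer ticks per second

   public Sample(long rawValue, long timeStamp, long frequency)
   {
      RawValue = rawValue;
      TimeStamp = timeStamp;
      Frequency = frequency;
   }
}

static class RateMath
{
   // Rate: the change in the value divided by the elapsed seconds between two samples.
   public static double PerSecond(Sample older, Sample newer)
   {
      double seconds = (double)(newer.TimeStamp - older.TimeStamp) / newer.Frequency;
      if (seconds <= 0) return 0;
      return (newer.RawValue - older.RawValue) / seconds;
   }

   // Fraction: numerator over denominator from a single sample, as a percentage.
   public static double Percentage(long rawValue, long baseValue)
   {
      if (baseValue == 0) return 0;
      return 100.0 * rawValue / baseValue;
   }
}

class Demo
{
   static void Main()
   {
      // Invented numbers: 500 units counted over 2000 ticks of a 1000 ticks/second timer.
      Sample first = new Sample(1000, 0, 1000);
      Sample second = new Sample(1500, 2000, 1000);
      Console.WriteLine(RateMath.PerSecond(first, second));
      Console.WriteLine(RateMath.Percentage(25, 200));
   }
}
```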

The PerformanceCounter class has one property called RawValue and two methods called NextValue and NextSample. The NextSample method is the key to the other members: this method obtains the raw data from the appropriate perfmon library in a CounterSample object, shown here:

[StructLayout(LayoutKind.Sequential)]
public struct CounterSample
{
   public static CounterSample Empty;
   public CounterSample(
      long rawValue, long baseValue, long counterFrequency, long systemFrequency,
      long timeStamp, long timeStamp100nSec, PerformanceCounterType counterType);
   public CounterSample(
      long rawValue, long baseValue, long counterFrequency, long systemFrequency,
      long timeStamp, long timeStamp100nSec, PerformanceCounterType counterType,
      long counterTimeStamp);

   public long RawValue { get; }
   public long BaseValue { get; }
   public long SystemFrequency { get; }
   public long CounterFrequency { get; }
   public long CounterTimeStamp { get; }
   public long TimeStamp { get; }
   public long TimeStamp100nSec { get; }
   public PerformanceCounterType CounterType { get; }
   public static float Calculate(CounterSample counterSample);
   public static float Calculate(CounterSample counterSample, CounterSample nextCounterSample);
   static CounterSample();
}

You do not create instances of this struct; instead you allow the performance library to do it for you. RawValue is the value of the counter and TimeStamp (and TimeStamp100nSec) indicate the time the value was taken. Rate counters aside, some counters depend on more than one value, for example they may be a ratio of two values or they may be a difference between two different counter values. In this case, one value will be in RawValue and the other will be in BaseValue. The relationship of these properties to each other is determined by the type of the counter and this is given by the CounterType property. The static overloaded method Calculate will use the CounterType property to determine how to calculate a single precision floating point number from the counter's values. The PerformanceCounter.RawValue property calls the NextSample method and returns CounterSample.RawValue. The NextValue method is particularly useful for counters that represent rates because it calls the overload of Calculate that takes two CounterSample objects; one is obtained by calling NextSample and the other is a value cached from the last time NextValue was called (at the end of NextValue the most recent sample is cached). This means that the first time NextValue is called the values held in the cached object will be zero and so you should always ignore the result from the first call to NextValue.

The following table gives the various values of the PerformanceCounterType enumeration and explains how Calculate determines the counter's value:

Value Description
AverageBase Base value. This is used as part of the calculation and so the user should not access it
AverageCount64 This represents a rate. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The base value of the sample is the time, so the time interval is taken as the difference between the base values of the two samples.
AverageTimer32 This is calculated from two samples. The calculated value is the difference between the base values of the samples divided by the time interval. The time interval is calculated from the difference between the raw values of the samples divided by the frequency.
CounterDelta32 This is the difference between the raw value of two samples
CounterDelta64 This is the difference between the raw value of two samples
CounterMultiBase A base value that indicates the number of items sampled. This is used as part of the calculation and so the user should not access it.
CounterMultiTimer A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the timestamp of two samples divided by the frequency. The time taken to perform the action is the difference between the raw values of the two samples divided by the base value.
CounterMultiTimer100Ns A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the 100ns time value of two samples. The time taken to perform the action is the difference between the raw values of the two samples divided by the base value.
CounterMultiTimer100NsInverse A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the 100ns time value of two samples. The result is the base value minus the difference between the raw values of the two samples divided by the base value.
CounterMultiTimerInverse A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the timestamp of two samples divided by the frequency. The time taken to perform the action is the base value minus the difference between the raw values of the two samples.
CounterTimer A percentage counter that gives the average time that something is active; it cannot exceed 100%. The time interval is the difference between the time stamp of the two samples divided by the frequency. The time that the item is active is taken as the difference between the raw values of the two samples. The percentage is calculated from the active time divided by the time interval multiplied by 100 and any value over 100 is capped to 100.
CounterTimerInverse A percentage counter that gives the average time that something is not active; it cannot exceed 100%. The calculation is the same as CounterTimer except that 1 minus the difference in raw times is divided by the time interval.
CountPerTimeInterval32 This represents a rate. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The base value of the sample is the time, so the time interval is taken as the difference between the base values of the two samples.
CountPerTimeInterval64 This represents a rate. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The base value of the sample is the time, so the time interval is taken as the difference between the base values of the two samples.
ElapsedTime This is taken from two samples. The value is taken by calculating the interval between the two samples and then dividing by the frequency. The frequency is taken from the first (older) sample. The time interval is the difference between the timestamp of the second (later) sample and the raw value of the first sample.
NumberOfItems32 This is the raw value of one sample
NumberOfItems64 This is the raw value of one sample
NumberOfItemsHEX32 This is the raw value of one sample
NumberOfItemsHEX64 This is the raw value of one sample
RateOfCountsPerSecond32 Represents a rate, that is the change of the value over time. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The time interval is calculated as the difference between the time stamp of each sample divided by the frequency.
RateOfCountsPerSecond64 Represents a rate, that is the change of the value over time. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The time interval is calculated as the difference between the time stamp of each sample divided by the frequency.
RawBase Base value for the RawFraction. This is used as part of the calculation and so the user should not access it
RawFraction This is calculated from a single sample. The counter value is calculated from the raw value divided by the base value.
SampleBase A base value with the number of sampling interrupts taken. This is used as part of the calculation of SampleFraction, and so the user should not access it
SampleCounter Represents a rate, that is the change of the value over time. This is calculated from two samples, first the difference between the raw values is calculated and this is divided by the time interval. The time interval is calculated as the difference between the time stamp of each sample divided by the frequency.
SampleFraction This is calculated from two samples. The value is calculated as the difference between the raw values of the samples divided by the difference between the base values of the samples. The value is then multiplied by 100.
Timer100Ns A percentage time that shows the active time of an item as a percentage of the total time. The time interval is measured in 100ns units and is the difference between the 100ns time measurement of two samples. The active period is the difference between the raw values of the two samples and the percentage is this value divided by the time interval multiplied by 100 and then capped to 100.
Timer100NsInverse A percentage time that shows the non-active time of an item as a percentage of the total time. The time interval is measured in 100ns units and is the difference between the 100ns time measurement of two samples. The non-active period is one minus the difference between the raw values of the two samples and the percentage is this value divided by the time interval multiplied by 100 and then capped to 100.
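As a worked example of one row of the table, the following sketch follows the description given for Timer100Ns: the change in the raw value divided by the change in the 100ns time, multiplied by 100 and capped at 100. It is only an illustration of that description, and the TimerMath class and its parameter names are hypothetical; it is not the framework's implementation.

```csharp
using System;

static class TimerMath
{
   // Active time as a percentage of the interval.
   // All four arguments are in 100ns units, taken from an older and a newer sample.
   public static double Timer100Ns(long olderRaw, long newerRaw, long olderTime, long newerTime)
   {
      long interval = newerTime - olderTime;
      if (interval <= 0) return 0;   // no usable interval between the samples
      double percent = 100.0 * (newerRaw - olderRaw) / interval;
      return Math.Min(percent, 100.0);
   }
}

class Demo
{
   static void Main()
   {
      // Invented numbers: one second is 10,000,000 units of 100ns,
      // and the item was active for a quarter of that second.
      long oneSecond = 10000000;
      Console.WriteLine(TimerMath.Timer100Ns(0, 2500000, 0, oneSecond));
   }
}
```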

Note that not all counter types are represented by PerformanceCounterType. The enumeration is actually a bit combination of various values that indicate: the size of the counter (32 or 64 bit), the type of data (number, counter, string), format (hex, decimal, decimal/1000), type of counter (eg rate or value), text type (if the value is text), the timer type (eg system clock frequency or 100ns time intervals), delta type and the display suffix (eg per sec or %). Some of the counters installed on XP have combinations that do not correspond to PerformanceCounterType, which means that not all counters on your system will be readable through PerformanceCounter.NextValue. In this case you have no option other than to call NextSample an appropriate number of times and use the CounterSample values (RawValue, BaseValue and one or other of the time properties) to calculate the counter value.

When reading counters (that .NET recognises) you do not have to worry too much about the type of a counter value because Calculate does the appropriate calculation for you. However, bear in mind that a rate counter will require two samples to be taken with a time interval between them, and then Calculate will determine the value of the rate from those two samples. Quite apart from the time taken to perform the calculation, there is the time taken to make two measurements and the time interval between them that must be taken into account. A counter that is an instantaneous value (eg NumberOfItems32) requires only one measurement and needs no calculation, so you can call the RawValue property rather than NextValue method.

Furthermore, you'll need to use the counter type to determine how to display the performance counter data. A percentage, for example will have a top value of 100, whereas a count could be limitless. In some cases you need to have more information when you decide how to display the data. For example, the Thread category has an instance for each thread on the machine, the name of each instance is in the form <process name>/<thread number>#<process instance>, where <process name> is the name of the process and if there is more than one instance of the process then #<process instance> is used to differentiate between them. One of these counters is Thread State which indicates whether the thread is waiting, running, terminated etc, and it makes no sense to plot this out as a graph value.

One word of warning about calculated rates. Rates are useful because they give a measure of how a system is changing and if they are calculated correctly they can remove spurious values to give you a more representative view of the system. However, the mechanism used by PerformanceCounter merely has a sampling set of two: if one or other of the samples is spurious then the rate obtained will not be representative (for example, if you monitor saving data to disk and the second sample is taken at just the point when your system starts a backup, and hence disk activity rises considerably, your sample will give a result that is a lot higher than normal). This comes from the lack of any statistical analysis of the samples (which is not possible from two samples anyway) and so if the rate is important to you then you should save the raw values (through a call to RawValue or NextSample) and analyse the data yourself.
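What "analyse the data yourself" means will depend on the counter, but as one possible approach, the sketch below takes a series of saved raw values, calculates the rate for each interval and reports the median, which is less affected by a single spurious interval than the mean. The RateAnalysis class is hypothetical, the data is invented, and the choice of the median is mine rather than anything the framework provides.

```csharp
using System;
using System.Collections.Generic;

static class RateAnalysis
{
   // Rates between each pair of consecutive samples; the times are in seconds.
   public static List<double> IntervalRates(long[] rawValues, double[] seconds)
   {
      List<double> rates = new List<double>();
      for (int i = 1; i < rawValues.Length; i++)
      {
         double elapsed = seconds[i] - seconds[i - 1];
         if (elapsed > 0) rates.Add((rawValues[i] - rawValues[i - 1]) / elapsed);
      }
      return rates;
   }

   // Middle value of the sorted list (or the mean of the two middle values).
   public static double Median(List<double> values)
   {
      if (values.Count == 0) throw new ArgumentException("no values");
      List<double> sorted = new List<double>(values);
      sorted.Sort();
      int mid = sorted.Count / 2;
      if (sorted.Count % 2 == 1) return sorted[mid];
      return (sorted[mid - 1] + sorted[mid]) / 2.0;
   }
}

class Demo
{
   static void Main()
   {
      // Invented data: a steady 100 units per second with one burst in the fourth interval.
      long[] raw = { 0, 100, 200, 300, 1300, 1400 };
      double[] times = { 0, 1, 2, 3, 4, 5 };
      List<double> rates = RateAnalysis.IntervalRates(raw, times);
      Console.WriteLine("Median rate: {0}", RateAnalysis.Median(rates));
   }
}
```

With only the third and fourth samples, the calculated rate would be ten times the typical value; over the whole series the burst has no effect on the median.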

.NET Version 3.0
The calculation of performance counters has changed in .NET 3.0/2.0 from how it was in .NET 1.1. In both versions the static CounterSample.Calculate methods delegate the work to a static class called CounterSampleCalculator. This has one public method: ComputeCounterValue. In .NET 1.1 this method has a switch based on the counter type and calls one of several private methods. In .NET 3.0/2.0 ComputeCounterValue calls a function called FormatFromRawValue exported from the perfcounter.dll unmanaged library. In other words Microsoft have taken managed code and re-written it as unmanaged code. As the .NET framework gets better I would imagine that things would be the opposite way around (unmanaged code rewritten as managed code) but this is not the case here. I do not know why this was done because calculating performance counter values from raw sample data is quite straightforward and not the kind of calculation that would benefit from being carried out in unmanaged code.

In the example you've been typing, the counter % User Time is of the type Timer100Ns, so the raw value will be the time that the process has been running in user mode. The Calculate method will determine the change in this value and divide it by the time interval to get the percentage. Add the following:

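try
{
   using (PerformanceCounter time = new PerformanceCounter("Process", "% User Time", args[0]))
   {
      while (true)
      {
         // NextValue compares a new sample with the one cached by the previous call
         float val = time.NextValue();
         // Pause so that the next call measures over a meaningful interval
         Thread.Sleep(1000);
         if (val == 0) continue;
         Console.WriteLine(val);
      }
   }
}
catch (InvalidOperationException) {}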

This calls NextValue to calculate the percentage and if this is not zero it prints the value out on the command line. Compile this code and run it to monitor notepad (display notepad). If notepad is not running, start it. You'll find that the application will find it, but it will not give any values. This is because notepad will be doing so little that it spends next to no time in user mode (and hence zero as far as this counter is concerned). To get the application to do some work move the window around a lot. You'll see that now the application spends some time executing in user mode. Close notepad. You'll find that display will end. The reason is that the call to NextValue will throw InvalidOperationException because the application no longer exists.

In general you will want to read single counters from a category, but there may be a situation when you want to read all instances and all counters for a category. To do this you can call PerformanceCounterCategory.ReadCategory, this will return an InstanceDataCollectionCollection object. As this name suggests, it is a collection of collections. Here is a schematic:

The InstanceDataCollectionCollection object is a dictionary of InstanceDataCollection objects, one for each counter; the key used by this collection is the CounterName. Each InstanceDataCollection object is a dictionary containing InstanceData objects where the key is the InstanceName. Each InstanceData object contains a CounterSample which has the counter data. This order seems a little odd to me: I would have preferred the outer collection to be a collection of instances and the inner collection to be a collection of counters, which seems more logical since the innermost data is counter data.
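The nesting is easier to see in code. The following is a minimal sketch that walks both levels of the collection and prints the raw value of every counter for every instance. The CategoryDump class is a hypothetical helper, and the Process category is used as the default only because it should exist on any Windows machine.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class CategoryDump
{
   // Writes every counter and instance in the category; returns the number of counters seen.
   public static int Print(string categoryName, TextWriter output)
   {
      PerformanceCounterCategory category = new PerformanceCounterCategory(categoryName);
      InstanceDataCollectionCollection counters = category.ReadCategory();
      int count = 0;

      // Outer dictionary: one InstanceDataCollection per counter, keyed by counter name.
      foreach (InstanceDataCollection counter in counters.Values)
      {
         output.WriteLine(counter.CounterName);

         // Inner dictionary: one InstanceData per instance, keyed by instance name.
         foreach (InstanceData instance in counter.Values)
         {
            output.WriteLine("\t{0}: {1}", instance.InstanceName, instance.RawValue);
         }
         count++;
      }
      return count;
   }
}

class Demo
{
   static void Main(string[] args)
   {
      string name = args.Length > 0 ? args[0] : "Process";
      try
      {
         CategoryDump.Print(name, Console.Out);
      }
      catch (InvalidOperationException e)
      {
         // Thrown, for example, if the category does not exist.
         Console.WriteLine("Could not read category: " + e.Message);
      }
   }
}
```

Note that this prints raw values only. For anything other than an instantaneous counter you would use the CounterSample available through InstanceData.Sample and calculate the value from two reads.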

This section has shown how to read performance data. As you can see, it is straightforward. The next section will show how you can provide your own counters.


This page is (c) 2007 Richard Grimes, all rights reserved