7. Performance Counters
Windows XP provides a mechanism to collect and display performance information generated by applications. Performance information is whatever makes sense to the application: if an application transfers data then a suitable counter could be the number of bytes moved per second, or the total number of bytes moved. The system also provides counters that indicate how the system is faring (for example, the amount of free memory) and counters for individual processes (for example, the working set, the number of threads or the number of open handles).
The operating system provides a tool called the performance monitor (perfmon) to gather and display performance counter information. Since perfmon is a separate process from the application generating the performance counter information, collecting performance data requires inter-process communication. The model used by perfmon allows it to gather a specific performance counter from your process, or to ask for all the performance counters from your process, so the mechanism is based on a query function. Furthermore, gathering performance counter data is not limited to perfmon: any application can gather performance data, and to allow this Windows provides a public API with a querying mechanism that lets your application determine which processes provide performance data and what that data is.
In unmanaged code, generating and reading performance counter data is a pain to write. It really is a pain. There have been several attempts to provide libraries that abstract away some of the more tedious parts, but they still cannot get away from the fact that the performance monitor API is not the simplest API to work with.
The real joy of .NET instrumentation is that Microsoft have done all the
hard work for you, and what's more, they have done it in
such a way that you do not lose any of the functionality that you would have
had if you had accessed the API with unmanaged code. In fact, because writing
and accessing performance counters is so easy with .NET, you can achieve more
with it than with the unmanaged API. The solution provided by Microsoft is
elegant and flexible. I wish the same developers had worked on the .NET EventLog
class because, as you'll see on the next page,
the EventLog class is inelegant and lazily written.
7.1 Performance Monitor Example
Performance monitoring uses terminology that is not immediately understandable, so in this section I will describe the terms and where they come from. I will illustrate them with an example that uses perfmon to monitor a .NET application.
Every NT machine (XP, Windows 2000, Windows Server 2003 and Vista) has performance counters, and you can view the counters of any machine that you have administrative access to. Once you have chosen the machine, you have to determine which performance object you wish to access. The term performance object is fairly abstract: it relates to the type of data you want to read. Examples of performance objects are processor, process, thread and memory. All of these are 'objects', but not in the OO sense. A better term would have been class, because they refer to a type of data. In fact, it gets a bit more complicated because some objects have instances and some do not. When you use the processor, process or thread object you have to specify which instance of the object you want to use, that is, which physical CPU, which process or which thread. Other performance objects, like the memory object, do not have more than one instance, because there is just one memory 'entity' in the machine.
Each object (or object instance) has one or more performance counters. Counters are the actual values that you will view, so for a process this includes the number of threads executing in the process and the working set (the amount of memory) used by the process. These counters can be related to counters in other objects, so you can view the amount of available memory from the memory object; or the amount of virtual memory used by a process through one of the Process instances. If an object can have instances, and there are no instances on the system, then you will not be able to access the counters for that object. However, if there are instances available, then you will see a pseudo instance called _Total, which gives access to the total of a specified counter in all the instances.
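The object, instance and counter terminology maps naturally onto a nested lookup. The sketch below is a hypothetical in-memory model. PerfObject and its methods are illustrative names, not part of any Windows or .NET API. It shows how the _Total pseudo instance can be derived by summing one counter over every real instance.

```csharp
using System.Collections.Generic;

// Hypothetical model of perfmon terminology (not a Windows or .NET API):
// a performance object such as Process owns named instances, and each
// instance holds named counter values.
class PerfObject
{
    public readonly string Name;

    // instance name -> (counter name -> value)
    readonly Dictionary<string, Dictionary<string, long>> instances =
        new Dictionary<string, Dictionary<string, long>>();

    public PerfObject(string name)
    {
        Name = name;
    }

    public void Set(string instance, string counter, long value)
    {
        Dictionary<string, long> counters;
        if (!instances.TryGetValue(instance, out counters))
        {
            counters = new Dictionary<string, long>();
            instances[instance] = counters;
        }
        counters[counter] = value;
    }

    public long Get(string instance, string counter)
    {
        if (instance == "_Total")
        {
            // Pseudo instance: the sum of this counter over all real instances.
            // (perfmon only offers _Total when at least one instance exists.)
            long total = 0;
            foreach (Dictionary<string, long> counters in instances.Values)
            {
                long value;
                if (counters.TryGetValue(counter, out value))
                    total += value;
            }
            return total;
        }
        return instances[instance][counter];
    }
}
```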
Counters can be instantaneous counts, like the number of handles used in a process; cumulative values, like the elapsed time of a process; rates, like the number of page faults per second; or even values that ordinarily cannot be graphed, like a thread's state (where the state is given a numeric value and the 'height' of this value from the graph's baseline is meaningless). As you'll see later, the type of the counter determines how a counter reader will display the counter, and whether the reader has to perform a calculation.
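To make the point about calculations concrete: an instantaneous counter such as a handle count can be displayed as it is, but a rate counter such as Page Faults/sec is typically supplied as a raw cumulative count plus a timestamp. The reader must then compute the rate from two samples. The following is a minimal sketch under that assumption. RateCounter is a hypothetical helper. The timestamps are in ticks of a clock whose frequency is supplied with the samples.

```csharp
using System;

// Hypothetical helper: turns two raw samples of a rate counter into a
// per-second value. Each sample is a cumulative count plus a timestamp in
// ticks; ticksPerSecond is the frequency of the clock that produced the ticks.
static class RateCounter
{
    public static double PerSecond(long count0, long count1,
                                   long ticks0, long ticks1, long ticksPerSecond)
    {
        if (ticks1 <= ticks0)
            throw new ArgumentException("the second sample must be later than the first");

        double elapsedSeconds = (double)(ticks1 - ticks0) / ticksPerSecond;
        return (count1 - count0) / elapsedSeconds;
    }
}
```

For example, 300 extra page faults over 2 seconds of ticks gives 150 per second. In the .NET Framework, the static CounterSample.Calculate method in System.Diagnostics performs this kind of calculation for you, taking the counter type into account.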
To test this, all you need is a really simple application. Create a file called test.cs with this code:
```csharp
using System.Windows.Forms;

class App : Form
{
    static void Main()
    {
        Application.Run(new App());
    }
}
```
Compile this as a GUI application:
This is a simple process that merely shows a form. Run the application, but do not close it, just leave it running on the desktop.
Now start the performance monitor, either from the Control Panel
(Administrative Tools, Performance) or by typing perfmon
on the command line or in the Run dialog. Ensure that System
Monitor is selected (it should be) in the left-hand tree view so that you
can see the counters graphed, and then
click on the Add button (the + button, or press Ctrl-I).

In this screen shot you'll see that perfmon has determined the
name of my machine (MARS) and will get the counters for that
machine;
this is the same as choosing Use local machine counters. You can monitor
counters on any machine that you have access to, but note that since
performance counters require registration, an application's counters can
only be accessed on a machine where they have been registered. The system
counters will be available on all machines.
The first step is to
select the appropriate object from the Performance object drop down
list. Peruse the various objects in this control and then select Process. At
this point notice that the Select instance from list list box will be
populated with all the processes on your machine. Scroll down this list and
select the test process. Now scroll through the entries in the
Select counters from list list box, select one, and then click on the
Explain button. You should see a modeless dialog with the help text for
the selected counter. For this test, make sure that Working Set counter is selected (and that
the test application is still selected) and click on the Add
button, then click on the Close button to close the dialog. (I will explain what the
working set is in a moment, but for now, a partial explanation is that it is
the amount of RAM being used by the process.)
You'll see that the performance monitor will start gathering information. On my machine I get a value of about 8.2 MB for the working set.
Now task switch to
the test application and minimise it. Take a look at the
performance monitor again. You'll find that the working set has reduced
considerably (in my case to 850 KB). Restore the application and notice that the working set rises, but not to the previous high (on my machine it rises to 2.7 MB).

So what is happening here? Well, when the application is loaded, Windows loads all of the DLLs that it needs. Since this is a .NET GUI application, its code is JIT compiled on a per-method basis as each method is first used. When a method uses a .NET assembly that has not been loaded yet, that assembly is loaded. If no native image is found for the assembly then the code called in it will be JIT compiled; if a native image is available then this extra JIT step is not needed.
But note what I said: if a method calls code in another assembly, and that
assembly has not already been loaded, then the assembly is loaded when the
method is called. For example, if you have an application with a button
that makes a call to code in, say, System.Data, then that assembly will
only be loaded the first time that a class in it is actually
used. But once an assembly has been loaded it will stay in your
process (.NET does not have a mechanism to unload an individual assembly, but
you can unload an application domain, which will unload all the assemblies it
uses). Similarly, if you call an unmanaged
DLL through interop, that DLL will only be loaded the first time that you call
the interop code, and once loaded the DLL remains loaded in the process;
even if you only use the DLL once, it will still
occupy memory.
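You can watch this lazy loading happen from inside a process. This sketch uses AppDomain.GetAssemblies to test whether an assembly is loaded, and the AppDomain.AssemblyLoad event to log each assembly at the moment it is loaded. LoadedAssemblies is a hypothetical helper name. For example, you could call LogLoads at startup and then check IsLoaded("System.Data") before and after the code that uses it runs.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for observing when assemblies are loaded into the
// current application domain.
static class LoadedAssemblies
{
    // True if an assembly with this simple name (e.g. "System.Data") is loaded.
    public static bool IsLoaded(string simpleName)
    {
        foreach (Assembly a in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (string.Equals(a.GetName().Name, simpleName, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Call once at startup: prints the name of each assembly as the runtime loads it.
    public static void LogLoads()
    {
        AppDomain.CurrentDomain.AssemblyLoad += delegate(object sender, AssemblyLoadEventArgs e)
        {
            Console.WriteLine("Loaded: " + e.LoadedAssembly.GetName().Name);
        };
    }
}
```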
In addition to this, Windows manages your memory usage through virtual memory. Physical memory is the actual RAM in your machine. However, when an application is loaded Windows assigns to it virtual memory, some of which will be physical memory and the remainder will be disk based. The pages of memory that are infrequently used are candidates to be paged out to disk. Using disk space (the paging file) like this enables you to appear to have more memory than your machine actually has, and allows you to run more applications. When an application requests a memory page that is not in physical memory a page fault occurs and the operating system loads the page from disk into physical memory. The amount of memory represented by virtual pages in physical memory is called the working set.
Accessing disk is many orders of magnitude slower than accessing RAM, and a large number of page faults is an indication that your machine has too little RAM. In fact, monitoring page faults is such an important measure that perfmon provides a performance counter called Page Faults/sec for each process and for the system in total. In addition, the Process object has a Working Set counter that gives the size of the working set for a process (i.e. the amount of physical memory being used) and a Page File Bytes counter that gives the amount of virtual memory the process has reserved for use in the paging file.
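The Process class in System.Diagnostics exposes some of the same per-process values without going through perfmon. The .NET documentation describes WorkingSet64 as equivalent to the Working Set counter and PagedMemorySize64 as equivalent to Page File Bytes. It is worth checking the documentation for your framework version. MemorySnapshot below is a hypothetical helper, and page fault counts are not available through this class.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that reads per-process memory values for the current process.
static class MemorySnapshot
{
    // Physical memory currently used by the process (the working set), in bytes.
    public static long WorkingSetBytes()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return p.WorkingSet64;
        }
    }

    // Memory the process has reserved in the paging file, in bytes.
    public static long PageFileBytes()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return p.PagedMemorySize64;
        }
    }
}
```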
When you minimise the application, the operating system deduces that you have finished using it for the time being: it takes the attitude that the top-most application is the one that you are using, and hence the one that you want to have the best performance.
The System control panel applet allows you to configure this behaviour.
From the command line you can type Rundll32.exe shell32.dll,Control_RunDLL
or you can simply go to the Control Panel, double click on the System
applet and select the Advanced tab. The Performance section has
a Settings button, and the Advanced tab of the dialog it opens lets you choose
whether Windows gives performance priority to foreground or background
applications.
Since the system assumes that the minimised application is not being used,
it can free up its physical memory by removing as many pages as it can from
the working set. (In fact, many of these pages already have a copy on disk,
either in the paging file or, for code, in the DLL or EXE file itself, so the
operating system only needs to write out the pages that have changed.) This is
known as trimming the working set and is the reason for the dramatic drop in
memory usage (8.2 MB to 850 KB in my case). You can force this to happen by
calling the Windows API function EmptyWorkingSet.
Close the test application. You should see that the
performance gathering will stop. Start a new instance of the test
application and look at the performance monitor. The performance gathering
will start again, and this time the working set will show a value of 8.2 MB
again. Close the application again.
Edit
test.cs to add a button
that calls EmptyWorkingSet:
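```csharp
using System;                          // IntPtr, EventArgs, EventHandler
using System.Diagnostics;              // Process
using System.Runtime.InteropServices;  // DllImport
using System.Windows.Forms;

class App : Form
{
    // EmptyWorkingSet is exported by psapi.dll; it returns non-zero on success.
    [DllImport("psapi")]
    static extern int EmptyWorkingSet(IntPtr hProcess);

    App()
    {
        // A single button that fills the whole form.
        Button button = new Button();
        button.Text = "Trim Working Set";
        button.Click += new EventHandler(Trim);
        button.Dock = DockStyle.Fill;
        this.Controls.Add(button);
    }

    void Trim(object o, EventArgs a)
    {
        // Ask Windows to remove as many pages as possible from this process's working set.
        using (Process current = Process.GetCurrentProcess())
        {
            EmptyWorkingSet(current.Handle);
        }
    }

    static void Main()
    {
        Application.Run(new App());
    }
}
```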
So what's happening here? Well, this example uses Windows Forms, which means
that it uses System.Windows.Forms.dll, which accesses various
Windows GDI and GDI+ DLLs. To see this, close test and run it
again, leaving it open while you run the command-line tasklist
utility:
This says "show the DLLs loaded by test.exe". The result will
look like this:
```
========================= ====== =============================================
test.exe                     844 ntdll.dll, mscoree.dll, KERNEL32.dll,
                                 ADVAPI32.dll, RPCRT4.dll, SHLWAPI.dll,
                                 GDI32.dll, USER32.dll, msvcrt.dll,
                                 IMM32.DLL, LPK.DLL, USP10.dll, mscorwks.dll,
                                 MSVCR80.dll, shell32.dll, comctl32.dll,
                                 comctl32.dll, mscorlib.ni.dll, ole32.dll,
                                 System.ni.dll, System.Drawing.ni.dll,
                                 System.Windows.Forms.ni.dll, mscorjit.dll,
                                 uxtheme.dll, MSCTF.dll, gdiplus.dll,
                                 msctfime.ime, OLEAUT32.DLL
```
Since this application does not perform any inter-process calls, it will not
need to call rpcrt4.dll; equally, this application does not
use Windows common controls, so it will not use comctl32.dll
(either version).
However, these DLLs are dependencies of one or more of the other DLLs that
the application uses, which is why they have been loaded. When the working
set is trimmed, the pages that contain these DLLs will be removed, but since
a copy of those pages is held on disk, if a function in one of those
DLLs is required the appropriate pages can be paged back into the working set.
The test application has an interop call to the psapi.dll library
but notice that from this first test it is not included in the module list. The reason is that the
DLL has not been called yet. Click on the button so that EmptyWorkingSet
is called and then run tasklist again. This time you'll see that
psapi.dll will be listed. This illustrates that platform
invoke will only load a DLL the first time it is called and after that the DLL
will remain in memory.
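If you would rather check from code than with tasklist, the Process.Modules collection lists the modules mapped into a process. The sketch below uses a hypothetical helper, ModuleCheck. You could call it with "psapi.dll" before and after clicking the button. Given the behaviour described above, you would expect false and then true, although that output is not shown here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: a managed equivalent of looking for a module in the
// tasklist module list for the current process.
static class ModuleCheck
{
    public static bool IsModuleLoaded(string moduleName)
    {
        // A fresh Process object each time, so the module list is not stale.
        using (Process p = Process.GetCurrentProcess())
        {
            foreach (ProcessModule m in p.Modules)
            {
                if (string.Equals(m.ModuleName, moduleName, StringComparison.OrdinalIgnoreCase))
                    return true;
            }
        }
        return false;
    }
}
```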
Of course, trimming the working set does more than just this: it can remove some pages of a DLL from the working set and leave other pages from the same DLL in the working set, and this can be seen in the results above. When the working set is first trimmed it reduces to 1 MB, but when the window is moved (requiring other system functions to be called) it rises to 3 MB as pages are moved back into the working set.
As you can see from this example, the performance monitor allows you to look at various system values concerning your application. This example has concentrated only on the working set, the pages of physical memory used by your application, and it has shown that by one simple function call, you can reduce the working set to about a third. However, you can see from perusing the performance monitor counters that there are many other things you can monitor.
For example, threads are quite expensive, so you really should keep the number of threads in your application to a minimum (a good rule of thumb is that normally you should not have more than four threads for each processor your machine has, but it is acceptable to create more threads for brief bursts to handle, for example, a sudden heavy workload). If your application creates threads then you should make sure that you restrict the number created. You can use perfmon to monitor the number of threads that the application creates, and if you find that the application creates a large number of threads that last for a long period of time then you should review your code.
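The rule of thumb can be turned into a simple check. This sketch compares Process.Threads.Count with Environment.ProcessorCount. ThreadCheck is a hypothetical helper, and the factor of four comes from the rule of thumb above, not from any API. Bear in mind that a managed process always contains some threads created by the runtime itself, such as the finalizer thread, so the count covers more than the threads your own code creates.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper applying the "about four threads per processor"
// rule of thumb. It is a guideline, not a hard limit.
static class ThreadCheck
{
    const int ThreadsPerProcessor = 4;

    public static bool ExceedsGuideline(int threadCount, int processorCount)
    {
        return threadCount > ThreadsPerProcessor * processorCount;
    }

    // Includes runtime-created threads as well as the application's own.
    public static bool CurrentProcessExceedsGuideline()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return ExceedsGuideline(p.Threads.Count, Environment.ProcessorCount);
        }
    }
}
```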
Another example is Windows system handles. One situation when handles are used is with files: when you open a file you will be given a handle; when you have finished using a file you must close the handle. Closing a handle allows Windows to clear the resources that were associated with it. If you do not close a handle then that will be a resource, and hence a memory, leak. The performance monitor allows you to monitor the total number of handles that a process has (Process object Handle Count counter) and you can use it to monitor an application over as long a period of time as you wish. If you find that the handle count rises over time the likelihood is that you are leaking a handle. This usually happens because you are making platform invoke calls to functions that create handles but do not release them.
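The same long-running check can be automated. In this sketch, a process's handle count is sampled at a fixed interval with Process.HandleCount. Refresh is called before each read because the Process class caches its information. The samples are flagged when the count never falls and ends higher than it started. HandleLeakCheck and this heuristic are assumptions, not a standard tool, and a real monitor would need to tolerate noise and run for much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for spotting a possible handle leak. The heuristic
// (the count never falls and ends higher than it started) is an assumption
// for illustration only.
static class HandleLeakCheck
{
    // Takes 'samples' readings of the handle count, 'interval' apart.
    public static List<int> Sample(Process process, int samples, TimeSpan interval)
    {
        List<int> counts = new List<int>();
        for (int i = 0; i < samples; i++)
        {
            process.Refresh(); // Process caches its data, so discard it before each read
            counts.Add(process.HandleCount);
            if (i < samples - 1)
                Thread.Sleep(interval);
        }
        return counts;
    }

    public static bool LooksLikeLeak(IList<int> counts)
    {
        if (counts.Count < 2)
            return false;
        for (int i = 1; i < counts.Count; i++)
        {
            if (counts[i] < counts[i - 1])
                return false; // the count dropped, so handles are being released
        }
        return counts[counts.Count - 1] > counts[0];
    }
}
```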
7.2 Performance Monitor Registration
In the last section you saw that the performance monitor architecture defines an object as a collection of associated measurements, each of which is called a counter. An object can be system wide, or there can be several instances of an object. For each object and each counter there is a help text that perfmon can display to the user. When the user selects a counter, perfmon clearly must be able to gather the values of this counter. As you have seen, many counters are provided by the operating system, but applications can also provide their own counters. This means that perfmon must be able to obtain those counters from those applications (and hence make some kind of inter-process call), and to do this it must be able to find out which application has those counters and how to access them. Since the performance monitor has been in Windows since the very first version of NT, you will understand that all of this information is held in the repository that Windows has used all of that time: the system registry.
As an example, run the performance monitor and click on the + button to add a counter. In the Performance object box scroll up to the top. There you will see objects for the .NET framework and ASP.NET:

These objects were added when you installed the .NET framework (and hence ASP.NET) on your machine. They allow you to monitor the performance of .NET and ASP.NET applications. Close down this dialog and perfmon.
The .NET objects and their counters are summarised here:
| Object | Description |
|---|---|
| .NET Data | For the SqlClient classes this gives: the current number of pools associated with the process; the current number of connections, pooled or not; the current number of connections in all pools associated with the process; the highest number of connections in all pools since the process started; the total number of command executions that have failed for any reason; and the total number of connection open attempts that have failed for any reason |
| .NET Exceptions | This gives: a count and rate of the number of exceptions thrown; the rate of exception filters executed; the rate of finally clauses executed; and the number of stack frames traversed per second from where an exception is thrown to where it is caught |
| .NET Interop | This gives: the current number of COM callable wrappers; the total number of times marshalling from managed to unmanaged code, and vice versa, has occurred; and the current number of marshalling stubs |
| .NET Jit | This gives: the total number of IL bytes JITted; the total number of IL methods JITted; the percentage of execution time spent JITting code (measured since the last JIT compilation); the rate at which IL bytes are JITted; and the number of JIT attempts that have failed |
| .NET Loading | This shows: the size of the memory committed (i.e. reserved in the paging file) by the class loader; the current number of application domains in the application; the current number of assemblies loaded; the number of classes loaded from all assemblies; the rate of loading application domains; the rate of unloading application domains; the rate of loading assemblies; the rate of failures to load classes; the number of application domains loaded since the application started; the total number of application domains unloaded; and the total number of assemblies and the total number of classes loaded since the application started |
| .NET LocksAndThreads | This gives: the current number of managed threads in the application; the number of OS threads used as the underlying threads for the managed threads created by the application; the current number of threads used by the runtime; the rate at which threads are used by the runtime for managed threads; the total number of threads used by the runtime; the rate at which threads obtain locks; the number of threads waiting for a lock; the rate of threads waiting for a lock; the total number of threads that have waited for a lock; and the rate at which threads fail to get a lock |
| .NET Memory | This gives: the number of GC handles (i.e. handles to external resources); the number of times GC.Collect was called; the number of pinned objects at the last garbage collection; the number of synchronization blocks in use (these hold weak references to managed objects; note that the counter name misspells them as Sink Blocks); the amount of committed memory used by the garbage collector; the amount of reserved memory used by the garbage collector; the percentage of time spent in garbage collection (for the last GC); the rate at which memory is allocated on the GC heap; the number of objects in the last garbage collection that survived only because they were waiting to be finalised; the size of the large object heap (objects of 85,000 bytes or more, which are not promoted through GC generations); and, for each of the three GC generations (0, 1 and 2), the number of times the generation has been collected, the size of its heap, and the amount of memory promoted to the next generation, including memory promoted only because it is waiting to be finalised |
| .NET Networking | This gives: the total number of bytes received over sockets since the process started; the total number of bytes sent over sockets since the process started; the number of socket connections established since the process started; and the number of datagram packets sent and received since the process started |
| .NET Remoting | This gives: the total number of channels registered since the application started; the total number of proxies created since the application started; the current number of context bound classes loaded; the rate at which context bound objects are allocated; the current number of contexts; the rate at which remote calls are invoked; and the total number of remote calls that have been made |
| .NET Security | This gives: the total number of link time code access security checks that have been performed; the percentage of time spent doing security checks; the depth of the stack during the last code access security check; and the total number of code access security checks that have been performed |
| .NET Data Provider for Oracle | Connection information (connections, disconnections, number of pooled and nonpooled connections) for the Oracle data provider |
| .NET Data Provider for SqlServer | Connection information (connections, disconnections, number of pooled and nonpooled connections) for the SQL Server data provider |
To get the counter descriptions you can use the Explain button on the Add Counter dialog, or you can access the source of these descriptions. Performance monitor information is stored in two places in the registry. The first place is:
Under this key is a key named after the machine locale. This is
because the key holds text data, and hence this data is locale specific.
It has two values of type REG_MULTI_SZ, that is, values that contain
multiple strings. They are called Counter and Help. In spite of the name, these
values actually contain information about performance objects as well as
performance counters. The Counter strings come in pairs (each on a separate
line): an even number, then the name of the counter (or object). For example,
here's a sample from my machine (the numbers will be different on your
machine):
```
3398
.NET CLR Interop
3400
# of CCWs
3402
# of Stubs
3404
# of marshalling
3406
# of TLB imports / sec
3408
# of TLB exports / sec
```
This shows that the .NET CLR Interop object is number 3398 and the # of marshalling counter is number 3404.
The Help
strings are in pairs too: an odd number and the help string. These are related
to the strings in the Counter value: the help string of an object (or
counter) in the Counter value has an identifying number one more
than the number of the object (or counter). For example, here are the help
strings for the counters given above:
```
3399
Stats for CLR interop.
3401
This counter displays the current number of Com-Callable-Wrappers (CCWs). A CCW is a proxy for the .NET managed object being referenced from unmanaged COM client(s). This counter was designed to indicate the number of managed objects being referenced by unmanaged COM code.
3403
This counter displays the current number of stubs created by the CLR. Stubs are responsible for marshalling arguments and return values from managed to unmanaged code and vice versa; during a COM Interop call or PInvoke call.
3405
This counter displays the total number of times arguments and return values have been marshaled from managed to unmanaged code and vice versa since the start of the application. This counter is not incremented if the stubs are inlined. (Stubs are responsible for marshalling arguments and return values). Stubs usually get inlined if the marshalling overhead is small.
3407
Reserved for future use.
3409
Reserved for future use.
```
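Reading these pairs programmatically is straightforward once the multi-string value is available as an array of strings. The sketch below builds a lookup from index to text and applies the rule that help text lives at the counter's index plus one. PerflibText is a hypothetical helper, and reading the registry value itself is left out. The test uses the numbers from the samples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for the Counter and Help values. It assumes the
// REG_MULTI_SZ value has already been read as a string array laid out as
// index, text, index, text, ...
static class PerflibText
{
    public static Dictionary<int, string> ParsePairs(string[] lines)
    {
        Dictionary<int, string> map = new Dictionary<int, string>();
        for (int i = 0; i + 1 < lines.Length; i += 2)
        {
            int index;
            if (int.TryParse(lines[i], out index))
                map[index] = lines[i + 1]; // skip any pair whose index is not a number
        }
        return map;
    }

    // The help text for object or counter N is stored under index N + 1.
    public static int HelpIndexFor(int counterIndex)
    {
        return counterIndex + 1;
    }
}
```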
To add your own objects and counters you have to append appropriate strings
to these registry values. To help you, the Perflib key also has values called
Last
Counter and Last Help. An installation program clearly has
to give the object and counter names, and object and counter help text,
numbers greater than these values and then update these values once the new
data has been added. There is also a value called Base Index below which are
the numbers assigned exclusively to Windows.
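The numbering rule can be expressed as a small calculation. This sketch assumes that names use even numbers and each help text uses the odd number that follows. It also returns the new values that would be written back to Last Counter and Last Help. IndexAllocator is hypothetical and only computes numbers; it does not write to the registry. In practice Windows supplies the lodctr tool to perform this registration. A real installer must also guard against another installer updating the same values at the same time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: computes the indexes an installer would give to new
// object and counter names. Names get even numbers and each help text gets
// the odd number that follows (index + 1). Nothing is written to the registry.
static class IndexAllocator
{
    public static List<int> Allocate(int lastCounter, int lastHelp, int count,
                                     out int newLastCounter, out int newLastHelp)
    {
        if (lastCounter % 2 != 0 || lastHelp % 2 == 0)
            throw new ArgumentException("expected an even Last Counter and an odd Last Help");

        // Start after whichever value is further along so that neither range overlaps.
        int next = Math.Max(lastCounter, lastHelp - 1) + 2;
        List<int> indexes = new List<int>();
        for (int i = 0; i < count; i++)
        {
            indexes.Add(next); // the help text for this name goes at next + 1
            next += 2;
        }

        // Values to write back to Last Counter and Last Help afterwards.
        newLastCounter = count > 0 ? indexes[count - 1] : lastCounter;
        newLastHelp = count > 0 ? newLastCounter + 1 : lastHelp;
        return indexes;
    }
}
```

With Last Counter at 3408 and Last Help at 3409, three new names would get 3410, 3412 and 3414, and the values would be updated to 3414 and 3415.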
Performance counters refer to something within an application; therefore, when the performance monitor (or any process that reads performance counters) reads a counter, it must make an inter-process call. The reader of performance counters is not responsible for performing the inter-process communication, and the Windows SDK does not mandate any particular inter-process communication mechanism. Instead (as you'll see in a moment) the reader process just reads the performance data using the performance monitor API, and the IPC is performed automatically. This means that if you add performance counters to your application using the unmanaged API, you will have to write the IPC code yourself (as you'll see later, .NET does all of this work for you).
The performance monitor API specifies that the provider of the counters writes a DLL that acts as the reader's endpoint for the IPC. This DLL is loaded into the reader and exposes functions that the reader calls. This decouples the reader process from the IPC; indeed, the reader knows nothing about the IPC, it just knows about the standard entry points of the DLL.
The DLL is registered under:
where <name> is a unique name you give your application. Under this key is
a value called Library that gives the path to your DLL. Note that
although this key appears under the Services key, the performance
monitor is not a service and your application does not have to be a service
either.
From the command line start the registry editor (regedit)
and locate the Services key mentioned above. You'll see that
there are keys that correspond to some of the .NET data that can be accessed
through perfmon:

The .NETFramework key indicates that mscoree.dll
provides the reader (i.e. perfmon) side of the IPC. As you can see, it
also has values indicating the numbers of the first and last counters it
supports.
Such a DLL must export
three functions: one to 'open' (or initialize) the performance counters, one
to 'close' (or cleanup) the counters and a final one to retrieve the counter
values.
These functions are exported from the DLL by name and the name for each is
specified in values in the application's key: Open, Close and
Collect, respectively. In the case of mscoree.dll
these three entrypoints are called OpenCtrs, CloseCtrs
and CollectCtrs, but if you were to write your own DLL then you
can use any names you like.
These are the values required by the performance counter API, but the
writer of the performance monitor DLL can store other values too. As I mentioned
earlier, when an application registers its counter information it has to use
numbers that are not used by other perfmon DLLs, and to do this its
installation program reads the last counter number and increments
it to give the number of its first counter. This means that the writer of a
perfmon DLL will not know the numbers associated with its counters at
compile time. One way to get round this is to get the application's installation program to
write the values it obtained for this application into the application's key.
This is why the key in the screenshot above has the values First Counter,
Last Counter, First Help and Last Help.
This DLL is loaded in-process into the reader
of the performance counters, and the Collect function will perform some IPC to
access the associated application and pass the request for counters. The Collect function
will then wait for the
application to gather the information and return it via the IPC. The DLL then returns the data to the caller of the Collect function. This is summarized in the following picture:
Using the values in the picture, the registry will have an entry called:
This key will have a value giving the path to MyPerfLib.dll in the Library
value, and will have a value of MyCollect for the Collect value.
Notice that perfmon is a system tool and can display system performance data. perfmon loads your DLL, which means that your code will run within a tool provided by the operating system, so perfmon is dependent upon how good your coding is. (Note again that this applies to unmanaged performance monitor code.) Because of this, perfmon takes steps to protect itself against bad code. It runs each performance monitor DLL on a separate thread, so that if the thread becomes unresponsive perfmon can kill it. It also wraps the thread function in structured exception handling, so that if your code throws an exception it will be caught and will not kill perfmon.
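The protection scheme is easier to picture in code. The sketch below shows the same idea in managed code, but it is not how perfmon itself is implemented. IsolatedCall is a hypothetical helper. The untrusted work runs on its own thread, any exception it throws is caught and reported, and the caller stops waiting after a timeout. Unlike perfmon, it abandons a hung thread rather than killing it, because forcibly terminating a managed thread is unsafe.

```csharp
using System;
using System.Threading;

// Hypothetical helper illustrating the isolation idea in managed code; it is
// not how perfmon itself is implemented.
static class IsolatedCall
{
    // Runs 'work' on a separate thread. Returns true only if it finished within
    // 'timeout' without throwing; 'error' receives any exception it threw.
    public static bool TryRun(Action work, TimeSpan timeout, out Exception error)
    {
        Exception caught = null;
        Thread worker = new Thread(delegate()
        {
            try
            {
                work();
            }
            catch (Exception e)
            {
                caught = e; // contain the failure instead of crashing the caller
            }
        });
        worker.IsBackground = true; // an abandoned, hung thread will not keep the process alive
        worker.Start();

        bool finished = worker.Join(timeout); // stop waiting after the timeout
        error = finished ? caught : null;
        return finished && caught == null;
    }
}
```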
If your code accesses performance data remotely then an RPC call will be
made from perfmon to winlogon.exe on the remote machine and the
perfmon DLL will be loaded
in that process. winlogon.exe is vital to an XP system and if it dies then the
whole system will die. Again, winlogon.exe takes steps to make sure that a
poorly written perfmon DLL will not kill it, but it cannot protect itself from
everything. For example, if the perfmon DLL makes a call to ExitProcess,
that could be disastrous for the remote machine.
Equally, if your application uses the unmanaged perfmon API to read performance data generated by another application, that application's perfmon DLL will be loaded into your application: you are inviting someone else's code into your process. Thus you have to be careful about which counters you monitor, and make sure that you only monitor counters provided by publishers that you trust. This is not a problem with the .NET performance counter classes, because the IPC code, and hence the perfmon DLL that is injected into a reader, is provided by the framework; if you trust the framework then you can trust the .NET perfmon DLLs too. An interesting corollary is that if the reader is an unmanaged application (like perfmon itself) then the framework will be hosted in it so that the framework's perfmon DLL can be called.
But what is the perfmon API to access
performance counters? Well, there is no specific unmanaged function you must call;
instead, your application reads a specific registry key. You do this by accessing the special hive,
HKEY_PERFORMANCE_DATA. In
response to this, the registry API will direct your calls to the performance
monitor API, which will then use the name of the 'key' that you want to read
to determine the perfmon DLL to load. It will then load the library, initialise it and
call the Collect function.
At this point I will not go into any more detail because it gets
complicated and the .NET API hides all of these details from you. In essence,
the performance data reader calls RegQueryValueEx and, for the 'key', passes a
string containing a list of numbers that are the IDs of the counters the reader wants. The perfmon
API calls the Collect function on the appropriate DLLs and passes this string
of numbers; the Collect function has to parse the string to determine whether it
can supply the counters. If it can, it fills a caller-allocated buffer with the
requested counter values. A performance monitor reader
can also pass the string Global to get the perfmon DLL to
return data for all of the counters it supports.
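To make the parsing step concrete, here is a minimal sketch of the kind of test a Collect function has to make on the string it is passed. It is written in C# purely for illustration (a real perfmon DLL is unmanaged code), and the `CollectRequest` class, its method and the ID values are all hypothetical. Treating an empty string like Global is an assumption of the sketch.

```csharp
using System;

// Hypothetical helper: the check a perfmon DLL's Collect function might make on
// the string it receives. The string is either "Global" (return all counters)
// or a space-separated list of ID numbers. This is not code from the Windows SDK.
static class CollectRequest
{
    // supportedIds: the IDs this (hypothetical) provider can supply.
    public static bool WantsAnyOf(string valueName, int[] supportedIds)
    {
        // Assumption of this sketch: a missing or empty name is treated like "Global".
        if (string.IsNullOrEmpty(valueName) ||
            string.Equals(valueName, "Global", StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }

        // Otherwise look for at least one requested ID that this provider supports.
        string[] tokens = valueName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            int id;
            if (int.TryParse(token, out id) && Array.IndexOf(supportedIds, id) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

A provider that supports ID 230 would answer true for "Global" and for "2 230 4", and false for "2 4", in which case the performance monitor API would move on to another DLL.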
This raises the question of how the performance monitor API knows which
perfmon DLL to call. The answer is that it does not know. Instead, it
loads all the perfmon DLLs registered with the system and asks
each one (by passing the counter's number to the Collect
function) whether it supports the counter. If a DLL does not support the
specified counter it returns an error value; when a DLL returns a non-error
status code, the API knows that it has the correct DLL.
However, because you can request more than one counter from an application, and you can request counters from one or more objects and from more than one instance, it means that a mixture of data is returned, so the perfmon API requires that the data is returned in a specific format which has multiple nested structures. Constructing these structures and extracting data from these structures is tedious and error prone.
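As an illustration of these nested structures, the sketch below reads a few fields from the outermost one, the PERF_DATA_BLOCK header, out of a byte array. The field offsets are my reading of the layout in the Windows SDK header winperf.h (a UTF-16 "PERF" signature at offset 0, then TotalByteLength at offset 20, HeaderLength at 24 and NumObjectTypes at 28), and the `PerfDataHeader` class is hypothetical. The object type structures, each with its own counter and instance definitions, follow the header at offset HeaderLength, and walking them is where most of the tedium lies.

```csharp
using System;
using System.Text;

// Hypothetical helper: reads a few fields from the PERF_DATA_BLOCK header at the
// start of the data returned from HKEY_PERFORMANCE_DATA. Offsets assume the
// winperf.h layout, and BitConverter assumes a little-endian machine (as x86 is).
static class PerfDataHeader
{
    public static bool TryRead(byte[] data, out int totalBytes, out int headerLength, out int numObjectTypes)
    {
        totalBytes = 0;
        headerLength = 0;
        numObjectTypes = 0;
        if (data == null || data.Length < 32) return false;

        // The block starts with the four UTF-16 characters "PERF".
        string signature = Encoding.Unicode.GetString(data, 0, 8);
        if (signature != "PERF") return false;

        totalBytes = BitConverter.ToInt32(data, 20);     // size of the whole block
        headerLength = BitConverter.ToInt32(data, 24);   // offset of the first object type
        numObjectTypes = BitConverter.ToInt32(data, 28); // number of object types that follow
        return true;
    }
}
```

On Windows the byte array could be obtained with `(byte[])Microsoft.Win32.Registry.PerformanceData.GetValue("Global")`, which reads the HKEY_PERFORMANCE_DATA hive described above.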
The .NET API does all of this for you (registering the perfmon DLL,
the IPC code to talk between the reader and counter provider,
generating the perfmon structures and extracting data from such
structures). This makes accessing perfmon data and providing
perfmon counters in your application incredibly easy. I've said it before,
and I'll say it again: I wish the designers of these classes had also designed
the EventLog class.
7.3 Reading .NET Performance Counters
The two most important classes in System.Diagnostics for
performance monitoring are the PerformanceCounter and
PerformanceCounterCategory classes. The former represents a performance
monitor counter and the latter represents what perfmon calls a performance
object. The name 'category' makes perfect sense because the term 'performance
monitor object' is confusing. These two classes are used both to read and to
write values, that is, they are used by perfmon readers and by applications
that supply perfmon data. In this section you'll see how to read perfmon data.
Start by creating this simple application (app.cs):
```csharp
using System;
using System.Diagnostics;

class App
{
    static void Main()
    {
        // Get every performance category (perfmon "object") on this machine
        PerformanceCounterCategory[] cats = PerformanceCounterCategory.GetCategories();
        foreach (PerformanceCounterCategory cat in cats)
        {
            Console.WriteLine(cat.CategoryName);
        }
    }
}
```
Compile this code (csc app.cs) and run it. This code reads
all of the categories and then prints out their names on the command line. It
will produce the equivalent of the contents of the Performance object
drop down list on the Add Counters dialog. Now change
the loop to the following:
```csharp
foreach (PerformanceCounterCategory cat in cats)
{
    Console.WriteLine(cat.CategoryName);
    // List the instances (if any) of this category
    string[] instances = cat.GetInstanceNames();
    foreach (string instance in instances)
    {
        Console.WriteLine("\t" + instance);
    }
}
```
Compile and run this code. This time you will get a list of each instance that is available for each category. Note that some categories do not have instances. This complicates things a little because if a category supports instances then to get access to counters you have to provide an instance name, but if the category does not support instances you have to access it without an instance name (you can, however, use an empty string as the instance name).
To get the counter names change the code like this:
```csharp
foreach (PerformanceCounterCategory cat in cats)
{
    Console.WriteLine(cat.CategoryName);
    string[] instances = cat.GetInstanceNames();
    if (instances.Length > 0)
    {
        foreach (string instance in instances)
        {
            Console.WriteLine("\t" + instance);
        }
        // All instances share the same counters, so list them once
        PerformanceCounter[] counters = cat.GetCounters(instances[0]);
        foreach (PerformanceCounter counter in counters)
        {
            Console.WriteLine("\t\t" + counter.CounterName);
            counter.Close();
        }
    }
    else
    {
        // Categories without instances are accessed without an instance name
        PerformanceCounter[] counters = cat.GetCounters();
        foreach (PerformanceCounter counter in counters)
        {
            Console.WriteLine("\t" + counter.CounterName);
            counter.Close();
        }
    }
}
```
The PerformanceCounter class derives from Component,
which means that it implements IDisposable. When you see
this interface you know that the class that implements it holds a resource
(typically an unmanaged one) that should be released as soon as
possible. A class that implements IDisposable will have a
Dispose method, and often another method like Close
to release these resources. PerformanceCounter has a
Close method, and so in this code I ensure that this method is
called when the counter object has been used. Since all instances in a category have the same counters, I have
made sure that if a category has more than one instance the counter names
are only printed once.
Compile and run this code. Now you will have the same information that is present in the Performance object, Select instances from list and Select counters from list controls on the Add Counters dialog.
You can see that the key to accessing counters is a category object.
Through a category object you can call InstanceExists to see
if a specific instance is available through a particular category and
there are two static versions of this that you can call to see if a
specific instance is available for a particular category on the current
machine or if that instance/category combination is available on another
machine. You can also call CounterExists to see if there is a
specified counter on the category. Again, there are two static versions to
test for a counter/category pair on the current machine and on another
machine. All of these methods test for the condition and return a
Boolean; to get access to the counter itself you have to pass counter,
category and instance information to the PerformanceCounter
constructor.
Create a file to test for a specific instance (display.cs):
```csharp
using System;
using System.Diagnostics;
using System.Threading;

class App
{
    static void Main(string[] args)
    {
        if (args.Length < 1) return;

        // Wait until a process with the given instance name appears
        while (!PerformanceCounterCategory.InstanceExists(args[0], "Process"))
        {
            Thread.Sleep(0);
        }
        Console.WriteLine("Found {0}", args[0]);

        try
        {
            using (PerformanceCounter time = new PerformanceCounter("Process",
                "% User Time", args[0]))
            {
                // The counter will be read here
            }
        }
        catch (InvalidOperationException) {}
    }
}
```
Notice that the loop calls Thread.Sleep(0). This is not strictly necessary in this example because
there is only one application thread, but in a multi-threaded application it
would be important. But, you may ask, why wait zero time? Well, calling
Sleep gives up the rest of the current thread's time slice, and with a value of
zero the remainder of the time slice is offered to any other thread of equal
priority that is ready to run. So if this was a multi-threaded application
another thread in the application would be allowed to perform a 'time slice' of
work, the 'time slice' being determined by the operating system. Calling
Thread.Sleep(0) says "let another thread do some work and when it has finished
let me do some work". If no other thread is ready, Sleep(0) returns
immediately, so the loop still keeps the processor busy while it waits. If the
loop finds that the instance exists the user is informed and then a counter is
obtained. Note that I use the using statement because the counter
must be closed when you have finished with it.
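If you are happy to wait a little between checks, a loop that sleeps for a non-zero interval and gives up after a timeout is kinder to the processor than spinning on Thread.Sleep(0). This is a minimal sketch; the `Poll` class and its `WaitUntil` method are hypothetical names, not part of the framework.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: poll a condition, sleeping between checks, until it
// becomes true (returns true) or the timeout expires (returns false).
static class Poll
{
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, int intervalMs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (!condition())
        {
            if (watch.Elapsed >= timeout)
            {
                return false; // gave up waiting
            }
            Thread.Sleep(intervalMs); // let other threads (and processes) run
        }
        return true;
    }
}
```

In display.cs the while loop could then be replaced with a call such as `Poll.WaitUntil(() => PerformanceCounterCategory.InstanceExists(args[0], "Process"), TimeSpan.FromMinutes(5), 250)`, at the cost of noticing the new process up to 250 ms later.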
Compile this code (csc display.cs). Make sure that there are
no instances of notepad running and then run the example to test for notepad (display
notepad). You should see that the application pauses with no
output. Now start notepad and you should find that a message is printed and
the application finishes.
While notepad is still running, run the application again. Notice that there is a perceivable delay between when the application starts and when it prints out that it has found notepad. This is because the performance monitor API takes a while to load and query the perfmon DLLs to see if they support the specified category and if so, whether they have the specified instance.
Now let's get a value from the counter. As you can see, the counter is
called % User Time, and the help text is:
% User Time is the percentage of elapsed time that the process threads spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows executive, kernel, and device drivers. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.
The only relevant part is the first sentence: the counter gives the proportion of time that the process's threads spend executing in user mode, and in effect it is a measure of how much work the process is doing. Since it is a percentage value, it is a comparison with something else; in this case, the time spent in user mode is compared to the total elapsed time. Other types of measurement are rates (how a value varies over time) and instantaneous values. A rate clearly requires that at least two values are measured (to get the change in the value), along with the time between the measurements.
The perfmon API defines that the value can be a value, a counter, a rate, or a fraction. If it is a counter then the value increases as the events it counts occur, and the counter description must indicate what the change means. If it is a rate then the counter is made up of two values: the difference between two samples, divided by the elapsed time between those samples, gives the rate. Similarly, if the value is a fraction then two values have to be obtained, the numerator and the denominator. Sometimes values are compared to another, base value, which means that the base value has to be read as well as the sample, and at other times a rate is compared to a base value, which means three values must be read. Thus some counters are provided as single numbers, some as two numbers, and some as three numbers. If you were to use the performance monitor API you would have to read the correct number of values. The .NET classes do all of this for you.
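The arithmetic for the two-value cases is simple once you have the raw numbers. The following sketch shows a rate calculated from two (value, time) pairs and a fraction expressed as a percentage. The `CounterMath` class is hypothetical, and times are assumed to be in ticks of a clock whose frequency (ticks per second) is known.

```csharp
using System;

// Hypothetical helper showing how many raw numbers each kind of value needs.
static class CounterMath
{
    // A rate needs two values and two times: change in value / elapsed seconds.
    // ticks0 and ticks1 are clock readings; frequency is ticks per second.
    public static double RatePerSecond(long value0, long ticks0, long value1, long ticks1, long frequency)
    {
        double seconds = (double)(ticks1 - ticks0) / frequency;
        return seconds > 0 ? (value1 - value0) / seconds : 0.0;
    }

    // A fraction needs a numerator and a denominator (the base value).
    public static double Percent(long numerator, long denominator)
    {
        return denominator != 0 ? 100.0 * numerator / denominator : 0.0;
    }
}
```

For example, a byte count that goes from 100 to 600 between two samples taken 2000 ticks apart on a 1000-tick-per-second clock gives `RatePerSecond(100, 0, 600, 2000, 1000)`, a rate of 250 bytes per second.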
The PerformanceCounter class has one property called
RawValue and two methods called NextValue and
NextSample. The NextSample method is the key to the other
members: it obtains the raw data from the appropriate perfmon
library in a CounterSample object, shown here:
```csharp
public struct CounterSample
{
    public static CounterSample Empty;

    public CounterSample(
        long rawValue, long baseValue, long counterFrequency, long systemFrequency,
        long timeStamp, long timeStamp100nSec, PerformanceCounterType counterType);
    public CounterSample(
        long rawValue, long baseValue, long counterFrequency, long systemFrequency,
        long timeStamp, long timeStamp100nSec, PerformanceCounterType counterType,
        long counterTimeStamp);

    public long RawValue { get; }
    public long BaseValue { get; }
    public long SystemFrequency { get; }
    public long CounterFrequency { get; }
    public long CounterTimeStamp { get; }
    public long TimeStamp { get; }
    public long TimeStamp100nSec { get; }
    public PerformanceCounterType CounterType { get; }

    public static float Calculate(CounterSample counterSample);
    public static float Calculate(CounterSample counterSample, CounterSample nextCounterSample);

    static CounterSample();
}
```
You do not create instances of this struct; instead you allow
the performance library to do it for you. RawValue is the value
of the counter and TimeStamp (and TimeStamp100nSec)
indicate the time the value was taken. Rate counters aside, some counters
depend on more than one value, for example they may be a ratio of two values
or they may be a difference between two different counter values. In this
case, one value will be in RawValue and the other will be in
BaseValue. The relationship of these properties to each other is
determined by the type of the counter and this is given by the
CounterType property. The static overloaded method Calculate
will use the CounterType property to determine how to calculate a
single precision floating point number from the counter's values. The
PerformanceCounter.RawValue property calls the NextSample
method and returns CounterSample.RawValue. The NextValue
method is particularly useful for counters that represent rates because it
calls the overload of Calculate that takes two
CounterSample objects; one is obtained by calling NextSample
and the other is a value cached from the last time NextValue was
called (at the end of NextValue the most recent sample is
cached). This means that the first time NextValue is called the
previous sample held in the cache will be empty (zero), and so you should always ignore
the result from the first call to NextValue.
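The caching behaviour of NextValue can be modelled in a few lines. This is not the framework's implementation (the real Calculate dispatches on CounterType); it is a hypothetical `CachedRateReader` that handles only a simple per-second rate, fed by a delegate that returns a (raw value, ticks) pair, and it shows why the first result is zero.

```csharp
using System;

// Hypothetical model of the NextValue pattern for a simple rate counter:
// take a new sample, calculate from it and the cached previous sample,
// then cache the new sample for next time.
class CachedRateReader
{
    private readonly Func<Tuple<long, long>> sampleSource; // returns (raw value, ticks)
    private readonly long frequency;                      // ticks per second
    private Tuple<long, long> lastSample;                  // null until the first call

    public CachedRateReader(Func<Tuple<long, long>> sampleSource, long frequency)
    {
        this.sampleSource = sampleSource;
        this.frequency = frequency;
    }

    public float NextValue()
    {
        Tuple<long, long> sample = sampleSource();
        float result = 0f; // first call: there is nothing to compare with
        if (lastSample != null && sample.Item2 > lastSample.Item2)
        {
            double seconds = (double)(sample.Item2 - lastSample.Item2) / frequency;
            result = (float)((sample.Item1 - lastSample.Item1) / seconds);
        }
        lastSample = sample; // cache the most recent sample
        return result;
    }
}
```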
The following table gives the various values of the PerformanceCounterType enumeration
and explains how Calculate determines the counter's value:
| Value | Description |
|---|---|
| AverageBase | Base value used in the calculation of AverageTimer32 and AverageCount64. This is used as part of the calculation and so the user should not access it. |
| AverageCount64 | An average: how many items are processed, on average, per operation. This is calculated from two samples: the difference between the raw values is divided by the difference between the base values, where the base value is the number of operations completed. |
| AverageTimer32 | The average time taken per operation. This is calculated from two samples: the difference between the raw values divided by the frequency gives the elapsed time, and this is divided by the difference between the base values (the number of operations completed). |
| CounterDelta32 | The difference between the raw values of two samples. |
| CounterDelta64 | The difference between the raw values of two samples. |
| CounterMultiBase | A base value that indicates the number of items sampled. This is used as part of the calculation and so the user should not access it. |
| CounterMultiTimer | A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the timestamps of two samples divided by the frequency. The time taken to perform the action is the difference between the raw values of the two samples divided by the base value. |
| CounterMultiTimer100Ns | A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the 100ns time values of two samples. The time taken to perform the action is the difference between the raw values of the two samples divided by the base value. |
| CounterMultiTimer100NsInverse | A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the 100ns time values of two samples. The result is the base value minus the ratio of the difference between the raw values to the time interval, multiplied by 100. |
| CounterMultiTimerInverse | A percentage timer that shows the time taken to do something as a percentage of the time interval; it can exceed 100%. The time interval is taken from the difference in the timestamps of two samples divided by the frequency. The result is the base value minus the ratio of the difference between the raw values to the time interval, multiplied by 100. |
| CounterTimer | A percentage counter that gives the average time that something is active; it cannot exceed 100%. The time interval is the difference between the timestamps of the two samples divided by the frequency. The time that the item is active is taken as the difference between the raw values of the samples. The percentage is the active time divided by the time interval, multiplied by 100, and any value over 100 is capped to 100. |
| CounterTimerInverse | A percentage counter that gives the average time that something is not active; it cannot exceed 100%. The calculation is the same as for CounterTimer except that the ratio of the difference in raw values to the time interval is subtracted from 1 before being multiplied by 100. |
| CountPerTimeInterval32 | This represents a rate. It is calculated from two samples: first the difference between the raw values is calculated, and this is divided by the time interval. The base value of the sample is the time, so the time interval is taken as the difference between the base values of the two samples. |
| CountPerTimeInterval64 | This represents a rate. It is calculated from two samples: first the difference between the raw values is calculated, and this is divided by the time interval. The base value of the sample is the time, so the time interval is taken as the difference between the base values of the two samples. |
| ElapsedTime | This is taken from two samples. The value is calculated by taking the interval between the two samples and dividing it by the frequency. The frequency is taken from the first (older) sample. The time interval is the difference between the timestamp of the second (later) sample and the raw value of the first sample. |
| NumberOfItems32 | The raw value of one sample. |
| NumberOfItems64 | The raw value of one sample. |
| NumberOfItemsHEX32 | The raw value of one sample (intended to be displayed in hexadecimal). |
| NumberOfItemsHEX64 | The raw value of one sample (intended to be displayed in hexadecimal). |
| RateOfCountsPerSecond32 | Represents a rate, that is, the change of the value over time. This is calculated from two samples: the difference between the raw values is divided by the time interval, which is the difference between the timestamps of the samples divided by the frequency. |
| RateOfCountsPerSecond64 | Represents a rate, that is, the change of the value over time. This is calculated from two samples: the difference between the raw values is divided by the time interval, which is the difference between the timestamps of the samples divided by the frequency. |
| RawBase | Base value for RawFraction. This is used as part of the calculation and so the user should not access it. |
| RawFraction | This is calculated from a single sample: the raw value divided by the base value, multiplied by 100. |
| SampleBase | A base value with the number of sampling interrupts taken. This is used as part of the calculation of SampleFraction, and so the user should not access it. |
| SampleCounter | Represents a rate, that is, the change of the value over time. This is calculated from two samples: the difference between the raw values is divided by the time interval, which is the difference between the timestamps of the samples divided by the frequency. |
| SampleFraction | This is calculated from two samples. The value is the difference between the raw values of the samples divided by the difference between the base values of the samples, multiplied by 100. |
| Timer100Ns | A percentage timer that shows the active time of an item as a percentage of the total time. The time interval is measured in 100ns units and is the difference between the 100ns time measurements of two samples. The active period is the difference between the raw values of the two samples, and the percentage is this value divided by the time interval, multiplied by 100 and then capped to 100. |
| Timer100NsInverse | A percentage timer that shows the non-active time of an item as a percentage of the total time. The time interval is measured in 100ns units and is the difference between the 100ns time measurements of two samples. The percentage is one minus the ratio of the difference between the raw values to the time interval, multiplied by 100 and then capped to 100. |
Note that not all counter types are represented by
PerformanceCounterType. The enumeration is actually a bit combination
of various values that indicate: the size of the counter (32 or 64 bit), the type
of data (number, counter, string), format (hex, decimal, decimal/1000), type
of counter (eg rate or value), text type (if the value is text), the timer
type (eg system clock frequency or 100ns time intervals), delta type and the
display suffix (eg per sec or %). Some of the counters installed on XP have
combinations that do not correspond to PerformanceCounterType,
which means that not all counters on your system will be readable through
PerformanceCounter.NextValue. In this case you have no option
other than to call NextSample an appropriate number of times and
use the CounterSample values (RawValue,
BaseValue and one or other of the time properties) to calculate the
counter value.
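The bit fields themselves are defined in the Windows SDK header winperf.h. The sketch below uses a few of those constants (names as in winperf.h, values as I understand them) to test a raw counter type value; the `CounterTypeBits` class is hypothetical. Note that the same bits (from bit 16 upwards) give the number format when the type field says "number" and the counter subtype when it says "counter", which is why IsRate checks the type field first.

```csharp
using System;

// Hypothetical helper that tests some of the bit fields of a counter type value.
// Constant names and values follow winperf.h.
static class CounterTypeBits
{
    public const int PERF_SIZE_LARGE      = 0x00000100; // size field (mask 0x300): 64-bit value
    public const int PERF_TYPE_COUNTER    = 0x00000400; // type field (mask 0xC00): counter
    public const int PERF_COUNTER_RATE    = 0x00010000; // counter subtype (mask 0x70000): rate
    public const int PERF_TIMER_100NS     = 0x00100000; // timer field (mask 0x300000): 100ns timer
    public const int PERF_DELTA_COUNTER   = 0x00400000; // calculate the difference of two samples
    public const int PERF_INVERSE_COUNTER = 0x01000000; // show the inverse (1 - value)
    public const int PERF_DISPLAY_PER_SEC = 0x10000000; // display suffix "/sec"
    public const int PERF_DISPLAY_PERCENT = 0x20000000; // display suffix "%"
    const int DisplayMask = unchecked((int)0xF0000000);

    public static bool Is64Bit(int t) { return (t & 0x300) == PERF_SIZE_LARGE; }
    public static bool IsCounter(int t) { return (t & 0xC00) == PERF_TYPE_COUNTER; }
    // The subtype bits only mean "rate" when the type field is COUNTER.
    public static bool IsRate(int t) { return IsCounter(t) && (t & 0x70000) == PERF_COUNTER_RATE; }
    public static bool Uses100NsTimer(int t) { return (t & 0x300000) == PERF_TIMER_100NS; }
    public static bool IsDelta(int t) { return (t & PERF_DELTA_COUNTER) != 0; }
    public static bool IsInverse(int t) { return (t & PERF_INVERSE_COUNTER) != 0; }

    public static string Suffix(int t)
    {
        switch (t & DisplayMask)
        {
            case PERF_DISPLAY_PER_SEC: return "/sec";
            case PERF_DISPLAY_PERCENT: return "%";
            default: return "";
        }
    }
}
```

On Windows, casting a PerformanceCounterType value to int, for example `(int)PerformanceCounterType.Timer100Ns`, should give one of these combinations (0x20510500 for Timer100Ns).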
When reading counters (that .NET recognises) you do not have to worry too much about the type of a counter value because
Calculate does the appropriate calculation for you. However, bear
in mind that a rate counter will require two samples to be taken with a
time interval between them, and then Calculate will determine the
value of the rate from those two samples. Quite apart from the time taken to
perform the calculation, there is the time taken to make two measurements and
the time interval between them that must be taken into account. A counter that
is an instantaneous value (eg NumberOfItems32) requires only one
measurement and needs no calculation, so you can use the RawValue
property rather than the NextValue method.
Furthermore, you'll need to use the counter type to determine how to display the
performance counter data. A percentage, for example, will usually have a top value of
100, whereas a count could be limitless. In some cases you need
to have more information when you decide how to display the data. For example,
the Thread category has an instance for each thread on the
machine, the name of each instance is in the form <process name>/<thread
number>#<process instance>, where <process name> is the
name of the process and if there is more than one instance of the process then
#<process instance> is used to differentiate between them. One of
these counters is Thread State which indicates whether the thread
is waiting, running, terminated etc, and it makes no sense to plot this out as
a graph value.
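Because so much is packed into a Thread instance name, it is worth splitting it before deciding how to display anything. This sketch assumes exactly the format described above; the `ThreadInstanceName` helper is hypothetical, and treating a missing #<process instance> suffix as 0 is my assumption.

```csharp
using System;

// Hypothetical helper: splits "<process name>/<thread number>#<process instance>".
// The "#<process instance>" part is optional; it is reported as 0 when absent.
static class ThreadInstanceName
{
    public static bool TryParse(string name, out string process, out int thread, out int processInstance)
    {
        process = null;
        thread = 0;
        processInstance = 0;

        int slash = name.LastIndexOf('/');
        if (slash <= 0) return false; // no process name

        process = name.Substring(0, slash);
        string rest = name.Substring(slash + 1);

        int hash = rest.IndexOf('#');
        string threadPart = hash < 0 ? rest : rest.Substring(0, hash);
        if (!int.TryParse(threadPart, out thread)) return false;

        // Only parse the process instance if the suffix is present
        if (hash >= 0 && !int.TryParse(rest.Substring(hash + 1), out processInstance)) return false;
        return true;
    }
}
```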
One word of warning about calculated rates. Rates are useful because they
give a measure of how a system is changing and if they are calculated
correctly they can remove spurious values to give you a more representative
view of the system. However, the mechanism used by PerformanceCounter
is merely to take a sample set of two; if one or other of the samples is
spurious then the rate obtained will not be representative (for example,
if you monitor saving data to disk and the second sample is taken at just the
point when your system starts a backup, and hence disk activity rises
considerably, your sample will give a result that is very different from
normal). This comes from the lack of any statistical analysis of the samples
(which is not possible with two samples anyway), and so if the rate is
important to you then you should save the raw values (through calls to
RawValue or NextSample) and analyse the data yourself.
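As a sketch of that kind of analysis, the following turns a series of raw samples into per-interval rates and reports the median, which a single spurious interval cannot drag far. The median is just one reasonable choice, and `RateAnalysis` is a hypothetical helper; the raw values and times would come from successive calls to NextSample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: per-interval rates from a series of raw samples,
// summarised by the median rather than by a single pair of samples.
static class RateAnalysis
{
    // rawValues[i] was read at ticks[i]; frequency is ticks per second.
    public static double[] IntervalRates(long[] rawValues, long[] ticks, long frequency)
    {
        var rates = new List<double>();
        for (int i = 1; i < rawValues.Length; i++)
        {
            long dt = ticks[i] - ticks[i - 1];
            if (dt <= 0) continue; // skip pairs without a usable interval
            rates.Add((rawValues[i] - rawValues[i - 1]) / ((double)dt / frequency));
        }
        return rates.ToArray();
    }

    public static double Median(double[] values)
    {
        if (values.Length == 0) throw new ArgumentException("no values");
        double[] sorted = (double[])values.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

With one-second samples of 0, 10, 20, 30, 130 and 140, the interval rates are 10, 10, 10, 100 and 10: the mean is 28, but the median is 10, which better reflects the usual behaviour.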
In the example you've been typing the counter % User Time is
of the type Timer100Ns, so the raw value will be the time that
the process has been running in user mode, the Calculate method
will determine the change in this value and divide it by the time interval to
get the percentage. Add the following:
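```csharp
try
{
    using (PerformanceCounter time = new PerformanceCounter("Process", "% User Time", args[0]))
    {
        while (true)
        {
            // Calculate the percentage from this sample and the previous one
            float val = time.NextValue();
            if (val != 0)
            {
                Console.WriteLine(val);
            }
            // Leave time between samples so that each calculation covers a
            // meaningful interval (and the loop does not hog the processor)
            Thread.Sleep(1000);
        }
    }
}
catch (InvalidOperationException) {}
```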
This calls NextValue to calculate the percentage and if this
is not zero it prints the value out on the command line. Compile this code and
run it to monitor notepad (display notepad). If notepad is not
running, start it. You'll find that the application will find it, but it will
not give any values. This is because notepad will be doing so little that it
spends next to no time in user mode (and hence zero as far as this counter is
concerned). To get the application to do some work move the window around a lot.
You'll see that now the application
spends some time executing in user mode. Close notepad. You'll find that
display will end. The reason is that the call to NextValue
will throw InvalidOperationException because the application no
longer exists.
In general you will want to read single counters from a category, but
there may be a situation when you want to read all instances and all
counters for a category. To do this you can call PerformanceCounterCategory.ReadCategory,
which returns an InstanceDataCollectionCollection object.
As this name suggests, it is a collection of collections, organised as follows.
The InstanceDataCollectionCollection object is
a dictionary of InstanceDataCollection objects, one for each
counter, the key used by this collection is the CounterName. Each
InstanceDataCollection object is a dictionary containing
InstanceData objects where the key is the InstanceName.
Each InstanceData object contains a CounterSample
which has the counter data. This order seems a little odd to me: I would have preferred the outer
collection to be a collection of instances and the inner collection to be a
collection of counters, which seems more logical since the innermost data
is counter data.
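If you share my preference, it is easy to pivot the data once it has been read. This sketch models the shape returned by ReadCategory as nested dictionaries (counter name, then instance name, then a raw value) and turns it inside out. `CategoryPivot` is hypothetical, and the long values stand in for InstanceData.RawValue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pivots counter name -> instance name -> value
// into instance name -> counter name -> value.
static class CategoryPivot
{
    public static Dictionary<string, Dictionary<string, long>> ByInstance(
        Dictionary<string, Dictionary<string, long>> byCounter)
    {
        var result = new Dictionary<string, Dictionary<string, long>>();
        foreach (KeyValuePair<string, Dictionary<string, long>> counter in byCounter)
        {
            foreach (KeyValuePair<string, long> instance in counter.Value)
            {
                // Create the inner dictionary the first time an instance is seen
                Dictionary<string, long> counters;
                if (!result.TryGetValue(instance.Key, out counters))
                {
                    counters = new Dictionary<string, long>();
                    result.Add(instance.Key, counters);
                }
                counters[counter.Key] = instance.Value;
            }
        }
        return result;
    }
}
```

On Windows the input dictionary could be filled by iterating over the Values of the InstanceDataCollectionCollection (each an InstanceDataCollection with a CounterName) and, within each, over its InstanceData objects (InstanceName and RawValue).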
This section has shown how to read performance data. As you can see, it is straightforward. The next section will show how you can provide your own counters.
I hope that you enjoy this tutorial and value the knowledge that you will gain from it. I am always pleased to hear from people who use this tutorial (contact me). If you find this tutorial useful then please also email your comments to mvpga@microsoft.com.
Errata
If you see an error on this page, please contact me and I will fix the problem.
