Microsoft Vista and .NET

Analysis of .NET Use in Vista RC2

Introduction
Types of Files
Vista Build 5744 (RC2)
Conclusions
Additional Comments
Performing the Tests
Downloads

 Welcome to the 4% Operating System

Introduction

Microsoft has provided a very mixed message about .NET and Vista. At first, when they released the technical preview of Longhorn at the 2003 PDC, Microsoft announced that it used .NET, and indeed it did. Everyone in the .NET world was optimistic about Longhorn. For example, Chris Sells, who is in charge of Longhorn content on Microsoft's MSDN web site (and someone who has my utmost respect), said the following in 2003:

Based On the .NET Framework

First and foremost, while Windows Server 2003 embraced managed code by being the first operating system to ship with the .NET Framework preinstalled, Longhorn is the first operating system whose major new features are actually based on the .NET Framework.

John Montgomery, Director of Product Management said the following in 2003:

Put another way, at PDC 2000, Microsoft debuted the .NET Framework, which introduced a new managed programming model on top of our existing Windows operating systems. With Longhorn and WinFX, we're keeping that managed programming model while building new core parts of the OS, such as moving to the Avalon subsystem next to GDI and User.

Michael Wallent, general manager of the Avalon team said in 2003 (on the .NET Show):

One of the most important things when you think about Longhorn is, it's not just another operating system release like Windows XP or Windows 2000. We think about it more like a wave, and the wave is really made up of more than just any one particular thing. Think about the wave being built upon a foundation of a new platform investment that we've made across the board, built on top of the .NET foundation. And then with Longhorn there'll be client releases and a server release.

All of these statements say the same thing: the new features of Longhorn will be managed code and available through the WinFX API (which was formerly called the Longhorn API, and now is confusingly called .NET 3.0). The overall theme of the 2003 PDC was that to get the full benefits of Longhorn you had to write managed code.

So many things have changed since then. First, the WinFX components were renamed the Windows Presentation Foundation (WPF) and Windows Communication Foundation (WCF), and the storage layer, WinFS, was dropped (first it was postponed to another release of Windows, then it was cancelled). Then, in a bizarre act, Microsoft took the aggregate of .NET 2.0, WCF and WPF and renamed it .NET 3.0. Bizarre!

The technical preview of Longhorn (released at the 2003 PDC) used Avalon for its user interface and WinFS to access files under Documents and Settings. The code was effectively based on the codebase of XPSP1. After the PDC Microsoft started again on Longhorn. This time they used the codebase of Windows Server 2003 SP1. They also removed WinFS from the operating system, and from the API, WinFX. Although Avalon and Indigo were part of WinFX none of the applications supplied with Longhorn used them. Finally, Microsoft gave the operating system a name: Vista.

In terms of organization, Microsoft split the development of WinFX from Vista. WinFX was no longer a Vista technology, and Microsoft announced that it would be available for earlier versions of Windows. Furthermore, separating the development of WinFX and Longhorn meant that Vista could not be based on WinFX, because to do so would mean that Vista would appear many months, perhaps years, after the release of WinFX. This was reflected in the community technical previews (CTP) and the betas of Vista that have been released. None of these used WCF or WPF; indeed, most of the builds did not even have WinFX installed and you had to download and install the components separately.

The next publicly released build, 5308 (the February 2006 CTP, a pre-Beta 2 build), had WinFX 3.0 as an additional component that you could install. In effect, the install package was copied to the hard disk when you installed Vista, but you had to take the extra step through the Control Panel to install those components. At the time of writing the latest build is 5744 (RC2); this appears to have WPF installed by Vista setup, and WCF is available if you choose to install it through the Control Panel.

On all builds a version of .NET was supplied. Early on these were beta versions of .NET 2.0, but by build 5231 .NET 2.0 had been released and so all builds from this point onwards used this version. The earlier version of this page showed an analysis of the use that Vista made of .NET. In effect I performed a recursive search through all the folders below Windows for executable files and tested each one to see if it was a managed file or hosted the runtime. I will not give details here other than to mention that .NET was used in two main areas: the Microsoft Management Console (MMC) and Media Center.

Types of Files

In a later section I'll describe how I made the measurement. In this section I will explain the types of files that I am checking for. The first thing I do is test to see if a file is an executable. The simplest way to do this is to use its extension, which has the advantage of being quick. The list of executable extensions I use is:

.exe;.dll;.scr;.cpl;.drv;.sys;.msc;.com;.msi;.vxd;.ocx;
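The extension test can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from the download; the names EXECUTABLE_EXTENSIONS and has_executable_extension are hypothetical.

```python
import os

# The extensions from the list above.
EXECUTABLE_EXTENSIONS = frozenset({
    ".exe", ".dll", ".scr", ".cpl", ".drv", ".sys",
    ".msc", ".com", ".msi", ".vxd", ".ocx",
})

def has_executable_extension(path: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    _, extension = os.path.splitext(path)
    # Windows file names are case-insensitive, so compare in lower case.
    return extension.lower() in EXECUTABLE_EXTENSIONS
```

A file that passes this test is only a candidate: its contents still have to be read to find out whether it is a PE file.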

The next thing I do is read the data in the file to see if it is a PE (portable executable). All 32-bit executable files are PE files, even .NET assemblies. If the file has an extension that says it is an executable file, but it is not a PE file, then it must be a 16-bit Windows or DOS file.

Next, I check to see if it is a .NET assembly. To do this I check to see if the file contains the CLR header. This header contains important information about the location of the .NET code in the file and the version of the framework that was used to write that code. The location of this header is given in the file's Data Directory table. If the data directory item has zero values then the file is unmanaged; if it has non-zero values then the file is a .NET assembly.

You can test this yourself using the dumpbin utility with the /headers switch. This utility will print the various headers in a file on the command line. At the end of the Optional Header Values you'll see a list of the Data Directories (there will always be 16 of them) and if the COM Descriptor Directory has a non-zero location it indicates that the file is a .NET assembly. The contents of the CLR header can also be listed using the /clrheader switch (if the file is unmanaged this will show no values). XP tests for the CLR header when it executes a file and if the CLR header is present it will initialize the runtime and pass the entry point of the assembly to the runtime, so that the file runs totally within the runtime.

It is possible for an unmanaged PE file to host the .NET runtime. Such a file will not have a CLR header because it dynamically loads the runtime when the executable runs.

If anyone tells you that Visual Studio .NET is a managed application, you instantly know that they know nothing about .NET. Simply typing dumpbin devenv.exe /headers (assuming you have devenv.exe in your path) will prove this: the location in the COM Descriptor Directory is zero.

An executable hosts the runtime by calling one of the initialization, or runtime binding, functions exported from mscoree.dll. The function that will most likely be used is CorBindToRuntimeEx. To test whether a file hosts the runtime, code needs to access the PE file's Import Address Table (you can do this with dumpbin /imports) to see if the file imports functions from mscoree.dll. If it does, the code can then check to see if the file imports one of the runtime binding functions. (devenv.exe imports CorBindToRuntimeEx from mscoree.dll.)

Vista Build 5744 (RC2)

Since Microsoft has already announced that it will release Vista to OEMs before the end of 2006 and that it will market a consumer version early in 2007, it is reasonable to think that this build (released at the beginning of October 2006) will be the final publicly available build before the release. It is reasonable to assume that the RTM version will be close to RC2, so its use of .NET will be very close to the .NET use of the final version of Vista.

Developers are more likely to choose to use an API if that API is already installed on every machine. Installing new APIs is fraught with difficulty, with versioning (which .NET, hopefully, has solved) and with the size of the API install file. One important part of this analysis is to determine which .NET APIs are installed on the machine. Well, the good news is that all the current versions of .NET (1.0, 1.1 and 2.0) are installed, and there is also the rather oddly named 3.0. Why do I say that its name is odd? Well, the reason is that 3.0 only contains WCF, WPF and WF. It requires that 2.0 is also installed. This is a new departure for Microsoft because 2.0 did not require 1.1, which did not require 1.0. I think this new versioning mechanism is misleading and may cause problems in the future.

However, I am pleased that WPF and WCF are installed because developers will be able to create applications that use these technologies and they don't have to worry about distributing a separate runtime component with their applications.

This section has the results of my attempts to find out Vista's use of .NET. I have looked in four places:

  1. The operating system's files are in \Windows, right? So that is the first place to look. Most of the operating system is in the System32 subfolder, but there are other folders too. However, \Windows also contains the managed Global Assembly Cache (GAC) which has copies of shared assemblies, and it also has the unmanaged equivalent, the side-by-side assembly cache. So I exclude them in this search. In addition, .NET cannot be used in kernel mode so I also exclude driver files.
  2. User applications are installed under \Program Files. Microsoft also uses this folder, so I must check it for managed files.
  3. I want to exclude the .NET framework itself, so that means excluding files in \Windows\Microsoft.NET; most of these assemblies will be shared and will appear in the GAC. There may be files installed in the GAC that are not part of the framework, so I have to perform a cross reference against the files that I have already identified.
  4. The side-by-side assembly cache (\Windows\WinSxS) is for unmanaged assemblies (collections of DLLs) and it is checked by LoadLibraryEx when it is searching for a DLL. .NET Fusion (the technology to locate and load managed assemblies) does not use LoadLibraryEx but even so, Vista appears to install assemblies there.

First, the simple, recursive check of \Windows. I tested all folders below \Windows excluding the subfolders Microsoft.NET, WinSxS and System32\DriverStore. These are the results:

File Type                                  Count
Total Number Of Files                       8924
Executable Files                            2811
Managed Executable Files (Assemblies)         38
Executable Files that Host the Runtime         2
Unmanaged Executable Files                  2619
Non-PE Files                                 152

Of the 38 .NET executable files, 29 of them are under the ehome folder (the two hosting executables are under this folder too), that is, they are for Media Center. The remaining nine files are:

AuthFWSnapin.dll
AuthFWSnapin.Resources.dll
AuthFWWizFwk.dll
AuthFWWizFwk.Resources.dll
MtEdit.exe
NAPCRYPT.dll
NAPHLPR.dll
Narrator.exe
Narrator.Resources.dll

If we exclude resource files (which do not have code) then it means that there are only six files. This has been drastically reduced from the number of files that were managed in Beta 1 or the builds that followed it. What has happened to these files? Microsoft appear to have removed the assembly that gave a managed API to MMC, and several of the files (for example, the event log viewer and task scheduler) that implemented managed MMC snapins. Thus all of these files would have to have their own stand-alone executable. This is true of the screen reader (Narrator.exe) and of the Group Policy Migration Tables Editor (mtedit.exe), but there does not appear to be a hosting file for the Network Access Protection assemblies (those with names starting with NAP) nor for Windows Firewall administration (the files starting with AuthFW).

This is not the full story. A recursive test of Program Files gives:

File Type                                  Count
Total Number Of Files                       1562
Executable Files                             252
Managed Executable Files (Assemblies)         28
Executable Files that Host the Runtime         0
Unmanaged Executable Files                   218
Non-PE Files                                   0

There are 3 copies of the managed API and resources for the tablet PC, but more interesting files are those in the Reference Assemblies\Microsoft\Framework\v3.0 subfolder. These are assemblies for Windows Presentation Foundation (WPF) and the Workflow Foundation (WF). Excluding copies and resource files this means that there are 25 assemblies under \Program Files.

A visual search on the GAC shows that there are non-Framework assemblies that have not been accounted for in the two searches above. So the final place to look for managed assemblies is the side-by-side cache that was excluded earlier (\windows\WinSxS). This command line:

dir \windows\winsxs\msil*

gives 138 directories (ie assemblies). This list is too long to give here but they represent the following groups of assemblies:

  • Media Center (assemblies starting with eh)
  • Event Viewer
  • Group Policy
  • Microsoft Management Console
  • Trusted Platform Module
  • Network Access Protection
  • Presentation Framework
  • Workflow Framework
  • Communications Framework
  • Task Scheduler

I estimate that there are 91 framework assemblies in WinSxS, leaving 47 in the groups above.

Thus, there are 110 managed assemblies in Vista RC2 (29 under ehome + 9 under System32 + 25 under Program Files + 47 under WinSxS). Compare this to the number of unmanaged executable files: an estimate can be obtained by adding the number of unmanaged executable files found in the search of \Windows (2619) and those under Program Files (218), to yield a figure of 2837.

Put this another way: 3.7% of the executable files (excluding the framework files) in Vista RC2 Ultimate are managed (110 out of 110 + 2837 = 2947); therefore, Vista is less than 4% managed. Note that I am using the number of executable files since it is easier to measure than the number of publicly available API methods.
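The arithmetic behind the figure can be checked with a short Python calculation. The counts are the ones given above; the variable names are only for illustration.

```python
# Managed assemblies counted in each location.
managed = {"ehome": 29, "System32": 9, "Program Files": 25, "WinSxS": 47}

# Unmanaged executable files from the two recursive searches.
unmanaged = {"Windows": 2619, "Program Files": 218}

total_managed = sum(managed.values())        # 110
total_unmanaged = sum(unmanaged.values())    # 2837

# Managed share of all executable files counted (managed + unmanaged).
share = total_managed / (total_managed + total_unmanaged)
print(f"{total_managed} of {total_managed + total_unmanaged} = {share:.1%}")
```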

I will let readers draw their own conclusions from this figure.

Conclusions

Am I happy with the situation? Well, if I apply the hopes that I had after the 2003 PDC then I would say that I am not. At the 2003 PDC we were promised an operating system and an API that was substantially based on .NET, yet what Microsoft has delivered with Vista RC2 Ultimate is an operating system where just four percent of the executable files are managed. On the positive side, Vista has WPF and WCF installed by the setup program, so these technologies will be on every Vista machine. This will encourage developers to use these technologies. Do I think that Vista uses .NET as much as it should? Well, using it for a few snapins in MMC is fairly minor compared to what it could be used for. However, I recognise that this is all that we will get.

Additional Comments

What happened to the previous version of this page? The first version of this page contained the results of the number of managed files on each version of Longhorn/Vista released by Microsoft and I compared these figures with the values I obtained from the PDC 2003 version of Longhorn. Each time I expressed my disappointment that Microsoft was not doing enough with .NET. Eventually I came to the conclusion that Microsoft had no intention of living up to the promises that they had made at the 2003 PDC.

That page got quite a lot of attention from the rumour mongers at Slashdot, and that then got the attention of various project managers at Microsoft. I started to get 'advice' from Microsoft, some of it more useful than the rest. One project manager exclaimed that he would not include me on any more betas. I thought that this was very odd and was a misplaced sentiment. My intention was always to make Vista and .NET better and I have never released NDA material. I personally know of several beta testers (and pre-beta testers) who have released NDA material, but Microsoft still continue to invite them to beta test software. The only conclusion that I could draw from those comments was that if you want to be a beta tester for Microsoft you must not be critical, however constructive, about their software.

Then I was offered a way out by another project manager. He offered a carrot-and-stick approach. The stick was clear: if I didn't take down that page, he told me, my likelihood of doing any work for Microsoft would be slim. No work giving conference talks, no consultancy, no writing; I would be unlikely to be invited to test any more betas. And, of course, it was unlikely that I would be awarded MVP in the future. The carrot was that if I took down the page I would be asked to write some white papers for Microsoft. Well, I took down the page, but I still suffered the stick. And the carrot? It did not appear; my emails to the project manager who made the offer have not been answered. So this is why I have posted this page here again.

Performing The Tests

Initially I decided to write the test application using the .NET framework. In general .NET makes this sort of thing simpler than Win32 because the .NET framework library is much higher level than the Win32 library or the C runtime library. The C# compiler is supplied as part of the .NET framework installation and this meant that I could simply copy the source files to the test machine and compile them for the build installed on that machine. However, XPSP2 does not have the .NET framework installed, and installing it could affect my results. So after getting my code to work I committed a cardinal sin and ported the code to Win32. This meant that the test application was no longer dependent upon any version of the .NET framework. In this section I will describe the managed version of the test application rather than the unmanaged version because the code is easier to understand.

Since I intended to compile this code using the tools provided as part of the .NET framework re-distributable I decided not to use Visual Studio. The .NET test application is a Windows GUI application (the unmanaged version is a command line application) but I decided to create it without the Visual Studio designer. This is not difficult to do, especially if the application uses anchored or docked controls.

The download for this article has the source files and a make file for the application. The main code is in a file called pefile.cs. This has code that loads a file and then performs the checks to see if the file has non-zero values in the COM Descriptor Directory (ie it has a CLR Header) as well as checking to see if the Import Name Table imports one of the runtime binding functions.

The following picture is a simple illustration of the format of a 32-bit Windows file:

Every 32-bit Windows executable file starts with the characters MZ (the 16-bit little-endian value 0x5a4d); after that are various values including (at offset 0x3c) the file offset of the PE identifier, PE\0\0 (the 32-bit value 0x00004550). A file that lacks these identifiers is not a PE file. Immediately after the PE identifier is the COFF header, and then comes the PE header. The important value in the COFF header relevant to this discussion is the number of sections, which we will return to in a moment. The PE header contributes the values that dumpbin lists as being the Optional Header Values; in spite of the name, these are actually vital for a PE file! The end of the PE header is an array that is called the Data Directory. There are sixteen items in the Data Directory and each one is an eight-byte value: the lower four bytes are a virtual address and the remaining bytes are the size of the item referred to. Three data directory items are relevant to this discussion: Import Table (index 1), Delay Import Directory (index 13) and the COM Descriptor Directory (index 14). These indices are zero-based.
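As a rough sketch of the first of these checks, the following Python function looks for the two identifiers and reads the number of sections from the COFF header. It is not the code in pefile.cs; the function name locate_pe_header is hypothetical, and the COFF field offsets (a 20-byte header whose second field is the 16-bit section count) are taken from the published PE/COFF layout rather than from this article.

```python
import struct

def locate_pe_header(image: bytes):
    """Return (pe_offset, number_of_sections), or None for a non-PE file."""
    # The file must start with 'MZ' and be long enough to hold offset 0x3c.
    if len(image) < 0x40 or image[:2] != b"MZ":
        return None
    # The 32-bit value at 0x3c is the file offset of the 'PE\0\0' identifier.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # The COFF header follows the identifier; it must fit in the file.
    if pe_offset + 4 + 20 > len(image):
        return None
    # NumberOfSections is the second 16-bit field of the COFF header.
    (number_of_sections,) = struct.unpack_from("<H", image, pe_offset + 6)
    return pe_offset, number_of_sections
```

A file with an executable extension for which this returns None would be counted as a non-PE file.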

The address given in the data directory is a Relative Virtual Address (RVA): the address of the item, relative to the base address of the image, once the file has been loaded into memory. However, this will not be the offset from the beginning of the disk file because the operating system will align data sections according to memory page boundaries. Since our program will manipulate disk files we need to convert an RVA to a raw disk file offset. The way to do this is to use the section header tables.

After the PE header will be one or more section headers (the number of section headers was given in the COFF header that I mentioned earlier). Each section is named, and the name can be up to 8 characters (note that the section header allocates just eight bytes, so an eight character name will not have a terminating NUL character). The section header also lists the size of the section, the raw file offset of the section and the RVA. Therefore, to convert an RVA to a raw address you need to loop through the section headers and obtain the start and end RVA for the section. When you have found the section that contains the RVA you can then find the offset from the section's start RVA and add this offset to the section's raw address to get the raw offset in the file.

For example, imagine that a file has the following sections:

Section   RVA Start   Size      Raw Address
.text     01000       44689     00400
.data     46000       01d90     44c00
.rsrc     48000       b2400     46400
.reloc    fb000       036dc     f8800

To find the raw address of an RVA of 0x46abc, you first identify which section it is in. You can see that this address is in the .data section (0x46000 to 0x47d8f) at offset 0x0abc. This section starts at the raw address 0x44c00, so you add this address to the offset to get a raw address 0x456bc (0x44c00 + 0x0abc).
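The conversion is easy to express in code. The following Python sketch uses the section table from the example; the Section type and the rva_to_offset name are hypothetical and are not taken from the download.

```python
from typing import NamedTuple, Optional, Sequence

class Section(NamedTuple):
    name: str
    rva: int    # RVA at which the section starts
    size: int   # size of the section
    raw: int    # offset of the section in the disk file

# The example section table (all values are hexadecimal in the text).
SECTIONS = [
    Section(".text", 0x01000, 0x44689, 0x00400),
    Section(".data", 0x46000, 0x01D90, 0x44C00),
    Section(".rsrc", 0x48000, 0xB2400, 0x46400),
    Section(".reloc", 0xFB000, 0x036DC, 0xF8800),
]

def rva_to_offset(sections: Sequence[Section], rva: int) -> Optional[int]:
    """Convert an RVA to a raw file offset, or None if no section holds it."""
    for section in sections:
        if section.rva <= rva < section.rva + section.size:
            # Offset within the section, added to the section's raw address.
            return section.raw + (rva - section.rva)
    return None

print(hex(rva_to_offset(SECTIONS, 0x46ABC)))  # prints 0x456bc
```

An RVA that falls in the gap between two sections (for example 0x47d90 here) belongs to no section, so the sketch returns None for it.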

From the above description you have all the knowledge you need to determine if a file is an assembly. First, you find the location of the PE header, then you access the Data Directory item at index 14 (the COM Descriptor Directory) and read the RVA and size. If they are non-zero then the file contains a CLR header and so it is a .NET assembly file.
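A minimal Python sketch of this test follows. It is an illustration rather than the code in pefile.cs, and it relies on details from the published PE/COFF layout that the article does not spell out: the Data Directory starts 96 bytes into a 32-bit (PE32) optional header, or 112 bytes into a 64-bit (PE32+) one, and the 32-bit count of directory items sits immediately before it. The name is_managed is hypothetical.

```python
import struct

COM_DESCRIPTOR_INDEX = 14                      # zero-based Data Directory index
DIRECTORY_OFFSETS = {0x10B: 96, 0x20B: 112}    # PE32 and PE32+ optional headers

def is_managed(image: bytes) -> bool:
    """True if the COM Descriptor Directory item is non-zero."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    optional = pe_offset + 4 + 20              # after the identifier and COFF header
    try:
        (magic,) = struct.unpack_from("<H", image, optional)
        directories = optional + DIRECTORY_OFFSETS[magic]
        (count,) = struct.unpack_from("<I", image, directories - 4)
        if count <= COM_DESCRIPTOR_INDEX:
            return False                       # the item is not present at all
        rva, size = struct.unpack_from(
            "<II", image, directories + 8 * COM_DESCRIPTOR_INDEX)
    except (struct.error, KeyError):
        return False                           # truncated or unrecognised header
    return rva != 0 and size != 0
```

Note that no RVA conversion is needed for this test: only the values in the Data Directory are examined, not the CLR header they point to.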

I outlined earlier that some unmanaged files may host the .NET runtime by calling one of the runtime hosting functions. To test for this you need to get access to the Import Table. This has one or more Import Descriptor structures. Each one will have the RVA to the name of the DLL that it refers to and an RVA to an array called the Import Name Table (INT). The INT is an array of 32-bit numbers (the end of the table is a zero entry). If the top bit of an entry is set then the rest of the bits give the ordinal for the function (ie a number). If the top bit is clear then the number is the RVA to another structure. This final structure has 2 bytes which contain a hint (an index into the DLL's table of exported names) and the following bytes are the name of the function terminated with a NUL.

Thus, to determine whether the file imports a .NET binding function the code loops through all of the Import Descriptors and looks for the mscoree.dll entry. If it finds this entry, the code then loops through all the entries in the INT looking for the binding functions by name or ordinal. If such a function is found the file is marked as hosting the CLR.
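The loop described above might look like the following Python sketch. It is not the code from the download and it makes several assumptions: it handles only 32-bit (PE32) files, it matches functions by name only (not by ordinal), it ignores delay imports, and the set BINDING_FUNCTIONS is my own guess at a list (the article names only CorBindToRuntimeEx). The 20-byte Import Descriptor layout and the 40-byte section header layout come from the published PE/COFF format.

```python
import struct

# Assumed list of binding exports; the article names only CorBindToRuntimeEx.
BINDING_FUNCTIONS = frozenset({b"CorBindToRuntimeEx", b"CorBindToRuntime"})

def _sections(image, pe_offset):
    """Return (start_rva, size, raw_offset) for each section header."""
    (count,) = struct.unpack_from("<H", image, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", image, pe_offset + 20)
    table = pe_offset + 24 + optional_size
    sections = []
    for i in range(count):
        vsize, rva, raw_size, raw = struct.unpack_from(
            "<IIII", image, table + 40 * i + 8)
        sections.append((rva, max(vsize, raw_size), raw))
    return sections

def _to_offset(sections, rva):
    """Convert an RVA to a raw file offset using the section table."""
    for start, size, raw in sections:
        if start <= rva < start + size:
            return raw + (rva - start)
    raise ValueError(f"RVA {rva:#x} is not inside any section")

def _c_string(image, offset):
    """Read the NUL-terminated byte string that starts at offset."""
    return image[offset:image.index(b"\0", offset)]

def hosts_runtime(image, binding_functions=BINDING_FUNCTIONS):
    """True if a PE32 image imports a binding function from mscoree.dll by name."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[:2] != b"MZ" or image[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    # Data Directory item 1 (Import Table) of a PE32 optional header.
    import_rva, _size = struct.unpack_from("<II", image, pe_offset + 24 + 96 + 8)
    if import_rva == 0:
        return False
    sections = _sections(image, pe_offset)
    descriptor = _to_offset(sections, import_rva)
    while True:
        int_rva, _, _, name_rva, iat_rva = struct.unpack_from(
            "<IIIII", image, descriptor)
        if name_rva == 0:
            return False                      # an empty descriptor ends the table
        dll = _c_string(image, _to_offset(sections, name_rva)).lower()
        if dll == b"mscoree.dll":
            entry = _to_offset(sections, int_rva or iat_rva)
            while True:
                (value,) = struct.unpack_from("<I", image, entry)
                if value == 0:
                    break                     # a zero entry ends the INT
                if not value & 0x80000000:    # top bit clear: RVA of hint + name
                    name = _c_string(image, _to_offset(sections, value) + 2)
                    if name in binding_functions:
                        return True
                entry += 4
        descriptor += 20
```

A malformed file will make this sketch raise struct.error or ValueError; a real tool would catch those and flag the file as one that could not be tested.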

It is possible that the executable could use delay loading. Normally, when Windows loads an executable (process or DLL) it accesses the INT and then loads each DLL specified. Windows then determines the address of each function that the executable requires and puts the address of the function in the executable's Import Address Table so that the function can be called by the executable. The problem with this approach is that some DLLs have initialization code and sometimes this code takes a while to execute. This makes the loading of the executable take longer.

Delay loading is not a feature of the operating system; it is a feature provided by the Visual C++ linker. In effect, the loading of the DLL is delayed until the first time that a function in the DLL is called. The Delay Import Directory is an array of ImageDelayDescr structures (see delayimp.h). This is similar to the Import Descriptor structure I mentioned earlier: there is a member that is the RVA to the name of the DLL and an RVA to an INT. My code loops through each ImageDelayDescr structure in the Delay Import Directory and tests to see if the DLL's name is mscoree.dll; if so, it accesses the INT, and if the INT contains one of the runtime binding functions the executable is marked as hosting the CLR.

All of this assumes that the code can access the file. If the file has been opened for exclusive access by another process then you will not be able to open it for these tests. One possible way to get round this issue is to make a copy of the file and test the copy. If the copy fails then my program flags an error. Note that if the file is large then making a copy will take a long time. The program maintains a hard coded list of files to ignore (you will have to change the source code to change this list) and at the moment this list only contains the single file pagefile.sys.
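The copy-and-retry idea can be sketched in Python as below. This is an assumption about how such a fallback might be written, not a transcription of my program; the names IGNORED_FILES and read_for_test are hypothetical.

```python
import os
import shutil
import tempfile

# Files that are never tested; the list described above holds only pagefile.sys.
IGNORED_FILES = frozenset({"pagefile.sys"})

def read_for_test(path):
    """Return the file's bytes, or None if the file is on the ignore list.

    If the file cannot be opened directly, a temporary copy is read instead.
    An OSError raised by the copy propagates so the caller can flag the error.
    """
    if os.path.basename(path).lower() in IGNORED_FILES:
        return None
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError:
        # Could not open the original: make a copy and test that instead.
        fd, copy_path = tempfile.mkstemp()
        os.close(fd)
        try:
            shutil.copyfile(path, copy_path)
            with open(copy_path, "rb") as handle:
                return handle.read()
        finally:
            os.remove(copy_path)               # always clean up the temporary copy
```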

The main file in the application (dotnetornot.cs) handles the user interface and collates the files that will be tested. Some folders are just copies of existing files, for example, XP maintains a folder called dllcache which is essentially a copy of the system32 folder. This is maintained by the Windows File Protection mechanism which exists to try and prevent malware from attaching itself to an operating system file: if a file in system32 changes WFP will detect this and replace the changed file with the copy in dllcache. The test program maintains a list of folders that should be ignored, and at start up this list of folders is read from a file called excluded.txt. The Exclude tab gives access to this folder list and provides a mechanism to edit it.

When you click on the Start button a new thread is created to perform the search. The reason why the search is performed on a new thread is because the search can take a long time and if the main thread is used this would block any updates to the user interface. Using a different thread means that user interface updates can occur while the search continues, but it also means that if the search routine needs to access the user interface (to update the progress bar, or to add a value to the list view) this must be done on the GUI thread by calling ISynchronizeInvoke.Invoke. Once the search has started the Start button changes to a Stop button which you can use to abort the search.
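The same division of labour can be sketched in Python. Note that this swaps in a different technique: instead of marshalling a call to the GUI thread as ISynchronizeInvoke.Invoke does, the worker posts results to a queue that the GUI thread drains whenever it is convenient. All of the names here (start_search, drain, stop_requested) are hypothetical.

```python
import queue
import threading

def start_search(paths, classify, results, stop_requested):
    """Classify each path on a worker thread, posting results to a queue."""
    def worker():
        for path in paths:
            if stop_requested.is_set():      # the Stop button was pressed
                break
            results.put((path, classify(path)))
        results.put(None)                    # sentinel: the search has finished

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def drain(results):
    """Called on the GUI thread: collect whatever the worker has posted so far."""
    items = []
    while True:
        try:
            item = results.get_nowait()
        except queue.Empty:
            return items
        items.append(item)
```

The Stop button's handler would simply call stop_requested.set() on a threading.Event shared with the worker; the worker notices it before testing the next file.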

The search mechanism needs to know the folder where the search will be started. Initially I used the .NET framework's FolderBrowserDialog to allow the user to select the start folder. However, I found that when I used this on Vista Beta 1 it took a long time to show the root of the disk where Vista was installed. The reason was that when you select a folder the browser dialog needs to determine if each subfolder itself has subfolders to determine if it should show the subfolder with a + symbol in the tree view. In Vista there are a few folders in the root that have over eleven thousand folders and so enumerating these subfolders took a long time to perform.

Part of the problem was that the folder browser dialog attempted to fill the tree view with information about all the folders on the disk. To get round this issue I decided to write my own version of the folder browser dialog (folderchooser.cs) that would read as little as possible from the disk. So if you expand a folder node with FileChooser it will already contain child folders (the dialog had to do this to make the folder have the plus symbol); however, the expand handler needs to determine if the child folders have subfolders, and so the browser dialog accesses the next level of folders (but only the next level): the children of the child folders.

The start folder selected by the user will be passed to the worker thread. The first thing that the thread does is check to see if this folder is on the exclusion list, and if it is, the method returns immediately. Then the worker thread enumerates all the files in the folder, and for each file it checks to see if the file is on the exclusion list and if it is not, it then checks to see if the file is managed. The name of the file and an icon representing the type of the file is added to a list view. Once all files have been checked the worker thread enumerates all the folders in the test folder and recursively calls the search method for each folder.
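The shape of this search routine is shown in the Python sketch below. It is an illustration under stated assumptions rather than the code in dotnetornot.cs: the exclusion list is a set of normalised paths, classify is whatever function decides the file type, and results stands in for the list view. The names search and normalize are hypothetical.

```python
import os

def normalize(path):
    """Canonical form used for exclusion-list comparisons."""
    return os.path.normcase(os.path.abspath(path))

def search(folder, excluded, classify, results):
    """Test every file in folder, then recurse into its subfolders."""
    if normalize(folder) in excluded:
        return                                # excluded folder: return immediately
    try:
        entries = sorted(os.scandir(folder), key=lambda entry: entry.name)
    except OSError:
        return                                # unreadable folder: skip it
    subfolders = []
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            subfolders.append(entry.path)
        elif (entry.is_file(follow_symlinks=False)
              and normalize(entry.path) not in excluded):
            results.append((entry.path, classify(entry.path)))
    for subfolder in subfolders:              # files first, then the folders
        search(subfolder, excluded, classify, results)
```

Symbolic links are deliberately not followed in this sketch, which avoids counting the same files twice or recursing forever through a link that points back up the tree.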

The icons in the list view indicate the type of the file: the file could not be opened, a non-PE file, an unmanaged file, a managed file or an unmanaged file that hosts the runtime.

A running count is kept of all of these identified file types and at any time you can click on the Statistics tab to get a summary of these and a list of the extensions of the files that were tested:

Downloads

There are two downloads for this article. The first is a managed project that has a GUI: the source code and a compiled executable are provided. The second download is an unmanaged project supplied as an executable only. The unmanaged project is a command line executable and a DLL; it has a file called excluded.txt that, on each line, has the name of a folder that you want excluded from the search (for example \windows\Microsoft.NET). I don't provide the source for this project because it is horribly messy and I haven't the time to make it pretty and easy to read.

Managed dotNetOrNot.exe
Unmanaged dotNetOrNot.exe

 


(c) 2008 Richard Grimes, all rights reserved