Analysis of .NET Use in Vista RC2
Microsoft has provided a very mixed message about .NET and Vista. When it released the technical preview of Longhorn at the 2003 PDC, Microsoft announced that it used .NET, and indeed it did. Everyone in the .NET world was optimistic about Longhorn. For example, Chris Sells, who is in charge of Longhorn content on Microsoft's MSDN web site (and someone who has my utmost respect), said the following in 2003:
Based On the .NET Framework
First and foremost, while Windows Server 2003™ embraced managed code by being the first operating system to ship with the .NET Framework preinstalled, Longhorn is the first operating system whose major new features are actually based on the .NET Framework.
John Montgomery, Director of Product Management said the following in 2003:
Put another way, at PDC 2000, Microsoft debuted the .NET Framework, which introduced a new managed programming model on top of our existing Windows operating systems. With Longhorn and WinFX, we're keeping that managed programming model while building new core parts of the OS, such as moving to the Avalon subsystem next to GDI and User.
Michael Wallent, general manager of the Avalon team said in 2003 (on the .NET Show):
One of the most important things when you think about Longhorn is, it's not just another operating system release like Windows XP or Windows 2000. We think about it more like a wave, and the wave is really made up of more than just any one particular thing. Think about the wave being built upon a foundation of a new platform investment that we've made across the board, built on top of the .NET foundation. And then with Longhorn there'll be client releases and a server release.
All of these statements say the same thing: the new features of Longhorn will be managed code and available through the WinFX API (which was formerly called the Longhorn API, and now is confusingly called .NET 3.0). The overall theme of the 2003 PDC was that to get the full benefits of Longhorn you had to write managed code.
So many things have changed since then. First, the WinFX components were renamed the Windows Presentation Foundation (WPF) and Windows Communication Foundation (WCF), and the storage layer, WinFS, was dropped (first it was postponed to another release of Windows, then it was cancelled). Then, in a bizarre act, Microsoft took the aggregate of .NET 2.0, WCF and WPF and renamed it .NET 3.0. Bizarre!
The technical preview of Longhorn (released at the 2003 PDC) used Avalon for
its user interface and WinFS to access files under
Documents and Settings.
The code was effectively based on the codebase of XPSP1. After the PDC Microsoft
started again on Longhorn. This time they used the codebase of Windows Server
2003 SP1. They also removed WinFS from the operating system, and from the API,
WinFX. Although Avalon and Indigo were part of WinFX none of the applications
supplied with Longhorn used them. Finally, Microsoft gave the operating system a
In terms of organization, Microsoft split the development of WinFX from Vista. WinFX was no longer a Vista technology, and Microsoft announced that it would be available for earlier versions of Windows. Furthermore, separating the development of WinFX and Longhorn meant that Vista could not be based on WinFX, because to do so would mean that Vista would appear many months, perhaps years, after the release of WinFX. This was reflected in the community technical previews (CTP) and the betas of Vista that have been released. None of these used WCF or WPF; indeed, most of the builds did not even have WinFX installed and you had to download and install the components separately. Build 5308
The next publicly released build (February 2006 CTP, a pre-Beta 2 build) had WinFX 3.0 as an additional component that you could install. In effect, the install package was copied to the hard disk when you installed Vista, but you had to take the extra step through the Control Panel to install those components. At the time of writing the latest build is 5744 (RC2); this appears to have WPF installed by Vista setup, and WCF is available if you choose to install it through the Control Panel.
On all builds a version of .NET was supplied. Early on this was beta
versions of .NET 2.0, but by 5231 .NET 2.0 was released and so all builds from
this point onwards used this version. The earlier version of this page showed an
analysis of the use that Vista made of .NET. In effect I performed a recursive
search through all the folders below
Windows for executable files
and tested each one to see if it was a managed file or hosted the runtime. I
will not give details here other than to mention that .NET was used in two main
areas: the Microsoft Management Console (MMC) and Media Center.
Types of Files
In a later section I'll describe how I made the measurement. In this section I will explain the types of files that I am checking for. The first thing I do is test to see if a file is an executable. The simplest way to do this is to use its extension, which has the advantage of being quick. The list of executable extensions I use is:
The next thing I do is read the data in the file to see if it is a PE (portable executable). All 32-bit executable files are PE files, even .NET assemblies. If the file has an extension that says it is an executable file, but it is not a PE file, then it must be a 16-bit Windows or DOS file.
Next, I check to see if it is a .NET assembly. To do this I check to see if the file contains the CLR header. This header contains important information about the location of the .NET code in the file and the version of the framework that was used to write that code. The location of this header is given in the file's Data Directory table. If the data directory item has zero values then the file is unmanaged; if it has non-zero values then the file is a .NET assembly.
You can test this
yourself using the
dumpbin utility with the
switch. This utility will print the various headers in a file on the command
line. At the end of the Optional Header Values you'll see a list of the
Data Directories (there will always be 16 of them) and if the COM
Descriptor Directory has a non-zero location it indicates that the file is a .NET assembly.
The contents of the CLR header can also be listed using the
/clrheader switch (if the
file is unmanaged this will show no values). XP tests for the CLR header when it
executes a file and if the CLR header is present it will initialize the runtime
and pass the entry point of the assembly to the runtime, so that the file runs
totally within the runtime.
It is possible for an unmanaged PE file to host the .NET runtime. Such a file will not have a CLR header because it dynamically loads the runtime when the executable runs.
If anyone tells you that Visual Studio .NET is a managed application, you
instantly know that they know nothing about .NET. Simply typing |
An executable hosts the runtime by calling one of the initialization, or
functions exported from
mscoree.dll. The function that will most likely
be used is
CorBindToRuntimeEx. To test whether a file hosts the runtime, the code needs
to access the
PE file's Import Address Table (you can do this with
/imports) to see if the file imports functions from
If it does, the code can then check to see if the file imports one of the
runtime binding functions. (
Vista Build 5744 (RC2)
Since Microsoft has already announced that it will release Vista to OEMs before the end of 2006 and that it will market a consumer version early in 2007, it is reasonable to think that this build (released at the beginning of October 2006) will be the final publicly available build before the release. It is also reasonable to assume that the RTM version will be close to RC2, so its use of .NET will be very close to that of the final version of Vista.
Developers are more likely to choose to use an API if that API is already installed on every machine. Installing new APIs is fraught with difficulty, with versioning (which .NET, hopefully, has solved) and with the size of the API install file. One important part of this analysis is to determine which .NET APIs are installed on the machine. Well, the good news is that all the current versions of .NET (1.0, 1.1 and 2.0) are installed, and there is also the rather oddly named 3.0. Why do I say that its name is odd? Well, the reason is that 3.0 only contains WCF, WPF and WF. It requires that 2.0 is also installed. This is a new departure for Microsoft because 2.0 did not require 1.1, which did not require 1.0. I think this new versioning mechanism is misleading and may cause problems in the future.
However, I am pleased that WPF and WCF are installed because developers will be able to create applications that use these technologies and they don't have to worry about distributing a separate runtime component with their applications.
This section has the results of my attempts to find out Vista's use of .NET. I have looked in four places:
- The operating system's files are in \Windows, right? So that is the first place to look. Most of the operating system is in the System32 subfolder, but there are other folders too. However, \Windows also contains the managed Global Assembly Cache (GAC), which has copies of shared assemblies, and its unmanaged equivalent, the side-by-side assembly cache, so I exclude both from this search. In addition, .NET cannot be used in kernel mode, so I also exclude driver files.
- User applications are installed under \Program Files. Microsoft also uses this folder, so I must check it for managed files.
- I want to exclude the .NET framework itself, which means excluding files under \Windows\Microsoft.NET. Most of these assemblies are shared and will appear in the GAC. There may be files installed in the GAC that are not part of the framework, so I have to cross-reference against the files that I have already identified.
- The side-by-side assembly cache (\Windows\WinSxS) is for unmanaged assemblies (collections of DLLs) and it is checked by LoadLibraryEx when it is searching for a DLL. .NET Fusion (the technology to locate and load managed assemblies) does not use LoadLibraryEx, but even so, Vista appears to install assemblies there.
First, the simple, recursive check of
\Windows. I tested all folders
\Windows excluding the subfolders
System32\DriverStore. These are the
|Total Number Of Files||8924|
|Managed Executable Files (Assemblies)||38|
|Executable Files that Host the Runtime||2|
|Unmanaged Executable Files||2619|
Of the 38 .NET executable files, 29 of them are under the
(the two hosting executables are under this folder too), that is, they are
for Media Center. The remaining nine files are:
If we exclude resource files (which do not have code) then it means
that there are only six files. This has been drastically reduced from the number
of files that were managed in Beta 1 or the builds that followed it. What has
happened to these files? Microsoft appear to have removed the assembly that gave a managed API to MMC,
and several of the files (for example, the event log viewer and task scheduler)
that implemented managed MMC snapins. Thus all of these files would have to have their own stand alone executable. This
is true of the screen
Narrator.exe) and of the Group Policy Migration Tables
mtedit.exe), but there does not appear to be a hosting
file for the Network
Access Protection assemblies (those with name starting with
nor for Windows Firewall administration (the files starting with
This is not the full story. A recursive test of
|Total Number Of Files||1562|
|Managed Executable Files (Assemblies)||28|
|Executable Files that Host the Runtime||0|
|Unmanaged Executable Files||218|
There are 3 copies of the managed API and resources for the tablet PC, but more interesting files
are those in the
Reference Assemblies\Microsoft\Framework\v3.0 subfolder.
These are assemblies for Windows Presentation Foundation (WPF) and the Workflow Foundation
(WF). Excluding copies and resource files this means that there are 25
A visual search on the GAC shows that there are non-Framework assemblies that
have not been accounted for in the two searches above. So the final place to
look for managed assemblies is the side-by-side cache that was excluded earlier
\windows\WinSxS). This command line:
gives 138 directories (ie assemblies). This list is too long to give here but they represent the following groups of assemblies:
- Media Center (assemblies starting with
- Event Viewer
- Group Policy
- Microsoft Management Console
- Trusted Platform Module
- Network Access Protection
- Presentation Framework
- Workflow Framework
- Communications Framework
- Task Scheduler
I estimate that there are 91 framework assemblies in
leaving 47 in the groups above.
Thus, there are 110 managed assemblies in Vista RC2 (29 under
+ 9 under
System32 + 25 under
Program Files + 47 under
WinSxS). Compare this to the number of unmanaged executable files,
an estimate can be determined by adding the number of unmanaged executable files
System32 (2619) and those under
(218), to yield a figure of 2837.
Put this another way: 3.7% of the executable files (excluding the framework files) in Vista RC2 Ultimate are managed; Vista is therefore less than 4% managed. Note that I am using the number of executable files since it is easier to measure than the number of publicly available API methods.
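The percentage can be reproduced from the counts given above. The short Python calculation below is only a check of the arithmetic; it assumes that the denominator is the managed and unmanaged executables added together, which is the reading that yields 3.7%.

```python
# Counts reported above for Vista RC2 (framework files excluded).
managed = 29 + 9 + 25 + 47   # Media Center + rest of \Windows + Program Files + WinSxS
unmanaged = 2619 + 218       # unmanaged executables from the two recursive searches

# Assumption: the share is managed / (managed + unmanaged).
share = managed / (managed + unmanaged)
print(f"{managed} managed out of {managed + unmanaged} executables: {share:.1%}")
```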
I will let readers draw their own conclusions from this figure.
Am I happy with the situation? Well, if I apply the hopes that I had after the 2003 PDC then I would say that I am not. At the 2003 PDC we were promised an operating system and an API that was substantially based on .NET, yet what Microsoft has delivered with Vista RC2 Ultimate is an operating system where just four percent of the executable files are managed. On the positive side, Vista has WPF and WCF installed by the setup program, so these technologies will be on every Vista machine. This will encourage developers to use these technologies. Do I think that Vista uses .NET as much as it should? Well, using it for a few snapins in MMC is fairly minor compared to what it could be used for. However, I recognise that this is all that we will get.
What happened to the previous version of this page? The first version of this page contained counts of the managed files in each version of Longhorn/Vista released by Microsoft, and I compared these figures with the values I obtained from the PDC 2003 version of Longhorn. Each time I expressed my disappointment that Microsoft was not doing enough with .NET. Eventually I came to the conclusion that Microsoft had no intention of living up to the promises that they had made at the 2003 PDC.
That page got quite a lot of attention from the rumour mongers at Slashdot, and that then got the attention of various project managers at Microsoft. I started to get 'advice' from Microsoft, some of it more useful than the rest. One project manager exclaimed that he would not include me on any more betas. I thought that this was very odd and a misplaced sentiment. My intention was always to make Vista and .NET better and I have never released NDA material. I personally know of several beta testers (and pre-beta testers) who have released NDA material, but Microsoft still continue to invite them to beta test software. The only conclusion that I could draw from those comments was that if you want to be a beta tester for Microsoft you must not be critical, however constructive, about their software.
Then I was offered a way out by another project manager. He offered a carrot-and-stick approach. The stick was clear: if I didn't take down that page, he told me, my likelihood of doing any work for Microsoft would be slim. No work giving conference talks, no consultancy, no writing; I would be unlikely to be invited to test any more betas. And, of course, it was unlikely that I would be awarded MVP in the future. The carrot was that if I took down the page I would be asked to write some white papers for Microsoft. Well, I took down the page, but I still suffered the stick. And the carrot? It did not appear; my emails to the project manager who made the offer have not been answered. So this is why I have posted this page here again.
Performing The Tests
Initially I decided to write the test application using the .NET framework. In general .NET makes this sort of thing simpler than Win32 because the .NET framework library is much higher level than the Win32 library or the C runtime library. The C# compiler is supplied as part of the .NET framework installation and this meant that I could simply copy the source files to the test machine and compile them for the build installed on that machine. However, XPSP2 does not have the .NET framework installed, and installing it could affect my results. So after getting my code to work I committed a cardinal sin and ported the code to Win32. This meant that the test application was no longer dependent upon any version of the .NET framework. In this section I will describe the managed version of the test application rather than the unmanaged version because the code is easier to understand.
Since I intended to compile this code using the tools provided as part of the .NET framework re-distributable I decided not to use Visual Studio. The .NET test application is a Windows GUI application (the unmanaged version is a command line application) but I decided to create it without the Visual Studio designer. This is not difficult to do, especially if the application uses anchored or docked controls.
The download for this article has the source files
and a make file for the application. The main code is in a file called
has code that loads a file and then performs the checks to see if the file has
non-zero values in the COM Descriptor Directory (ie it has a CLR
Header) as well as checking to see if the Import Name Table imports
one of the runtime binding functions.
The following picture is a simple illustration of the format of a 32-bit Windows file:
Every 32-bit Windows executable file starts with the characters
(or the bytes
0x5a4d), after that are various values including (at
the relative address of the PE identifier,
PE\0\0 (the bytes
0x00004550). A file that lacks these identifiers is not an executable
file. Immediately after the PE identifier is the COFF header, and then comes the PE
header. The important value in the COFF header relevant to this discussion is the number of
sections, which we will return to in a moment. The PE header contributes the
dumpbin lists as being the Optional Header Values,
in spite of the name, these are actually vital for a PE file! The end of the PE header is an array
that is called the Data Directory. There are sixteen items in the Data
Directory and each one is an eight byte value: the lower four bytes are a
virtual address and the remaining bytes are the size of the item referred to.
Three data directory items are relevant to this discussion: Import Table
(index 1), Delay Import Directory (index 13) and the COM Descriptor
Directory (index 14).
The address given in the data directory is a Relative Virtual Address (RVA): the address of the item, relative to the start of the image, once the file has been loaded into memory. However, this will not be the offset from the beginning of the disk file because the operating system will align data sections according to memory page boundaries. Since our program will manipulate disk files we need to convert an RVA to a raw disk file offset. The way to do this is to use the section header tables.
After the PE header will be one or more section headers (the number of
section headers was given in the COFF header that I mentioned earlier). Each
section is named, and the name can be up to 8 characters (note that the section
header allocates just eight bytes, so an eight character name will not have a
NUL character). The section header also lists the size
of the section, the raw file offset of the section and the RVA. Therefore, to
convert an RVA to a raw address you need to loop through the section headers
and obtain the start and end RVA for the section. When you have found the
section that contains the RVA you can then find the offset
from the section's start RVA and add this offset to the section's raw address to
get the raw offset in the file.
For example, imagine that a file has the following sections:
|Section||RVA Start||Size||Raw Address|
To find the raw address of an RVA of
0x46abc, you first identify
which section it is in. You can see that this address is in the
0x47d8f) at offset
This section starts at the raw address
0x44c00, so you add this
address to the offset to get a raw address
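The conversion can be sketched in a few lines. This is a minimal Python illustration, not the code from the download; the `Section` tuple, the function name and the section values are hypothetical and are not taken from the example above.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    rva: int    # RVA at which the section starts in memory
    size: int   # size of the section
    raw: int    # raw offset of the section in the disk file


def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Return the raw file offset for an RVA, or None if no section holds it."""
    for section in sections:
        if section.rva <= rva < section.rva + section.size:
            # Offset within the section, added to the section's raw address.
            return section.raw + (rva - section.rva)
    return None


# Hypothetical section headers, for illustration only.
SECTIONS = [
    Section(".text", 0x1000, 0x2000, 0x400),
    Section(".rdata", 0x3000, 0x1000, 0x2400),
]

print(hex(rva_to_offset(0x3010, SECTIONS)))  # 0x10 into .rdata, so 0x2410
```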
From the above description you have all the knowledge you need to determine if a file is an assembly. First, you find the location of the PE header, then you access the Data Directory entry at index 14 (the COM Descriptor Directory) and read the RVA and size. If both are non-zero then the file contains a CLR header and so it is a .NET assembly file.
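As a rough illustration of that procedure, the following Python sketch reads the headers from a file's bytes and classifies it. It is not the author's C# or Win32 code; the function names and the returned strings are made up for the example. It assumes the standard layout in which the Data Directory begins 96 bytes into a 32-bit optional header (112 bytes for 64-bit PE32+ files).

```python
import struct

COM_DESCRIPTOR = 14  # zero-based index of the COM Descriptor (CLR header) entry


def data_directory(image: bytes, index: int):
    """Return (rva, size) for one Data Directory entry, or None if not a PE file."""
    try:
        if image[:2] != b"MZ":
            return None
        # Offset 0x3C of the DOS header holds the location of the PE identifier.
        (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
        if image[pe_offset:pe_offset + 4] != b"PE\0\0":
            return None
        optional = pe_offset + 4 + 20  # skip the identifier and the COFF header
        (magic,) = struct.unpack_from("<H", image, optional)
        first = optional + (96 if magic == 0x10B else 112)
        return struct.unpack_from("<II", image, first + 8 * index)
    except struct.error:  # the file is too short to hold the headers
        return None


def classify(image: bytes) -> str:
    entry = data_directory(image, COM_DESCRIPTOR)
    if entry is None:
        return "not PE"
    rva, size = entry
    return "managed" if rva and size else "unmanaged"
```

To try it on a real file, pass the file's contents: `classify(open(path, "rb").read())`.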
I outlined earlier that some unmanaged files may host the .NET runtime by
calling one of the runtime hosting functions. To test for this you need to get
access to the Import Table. This has one or more Import Descriptor structures.
Each one will have the RVA to the name of the DLL that it refers to and an RVA
to an array called the Import Name Table (INT). The INT is an array of
32-bit numbers, ending with a zero entry. If the top bit of an
entry is set then the rest of the bits give the ordinal for the function (ie a
number). If the top bit is clear then the number is the RVA to another
structure. This final structure has 2 bytes which contain a hint (a suggested index into the DLL's export table) and the
following bytes are the name of the function terminated with a
Thus, to determine whether the file imports a .NET binding function the code loops
through all of the Import Descriptors and looks for the
entry. If it finds this entry, the code then loops through all the entries in
the INT looking for the binding functions by name or ordinal. If such a function
is found the file is marked as hosting the CLR.
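The loop just described might look like the following Python sketch. It is an illustration under stated assumptions, not the code from the download: `to_offset` is a hypothetical callable that converts an RVA to a raw file offset, `import_rva` is the RVA read from the Import Table data directory entry, and the set of binding function names is my assumption (only CorBindToRuntimeEx is named in this article). Ordinal imports are reported but not matched.

```python
import struct
from typing import Callable, Iterator, Tuple

# Assumed set of binding functions; the article names only CorBindToRuntimeEx.
BINDING_FUNCTIONS = {"CorBindToRuntimeEx", "CorBindToRuntime"}


def read_cstring(image: bytes, offset: int) -> str:
    end = image.index(b"\0", offset)
    return image[offset:end].decode("ascii", "replace")


def imported_names(image: bytes, import_rva: int,
                   to_offset: Callable[[int], int]) -> Iterator[Tuple[str, str]]:
    """Yield (dll, function) pairs; imports by ordinal are yielded as '#n'."""
    desc = to_offset(import_rva)
    while True:
        # Import Descriptor: INT RVA, timestamp, forwarder chain, name RVA, IAT RVA.
        int_rva, _, _, name_rva, iat_rva = struct.unpack_from("<5I", image, desc)
        if not (int_rva or name_rva or iat_rva):
            break  # an all-zero descriptor ends the table
        dll = read_cstring(image, to_offset(name_rva))
        thunk = to_offset(int_rva or iat_rva)
        while True:
            (entry,) = struct.unpack_from("<I", image, thunk)
            if entry == 0:
                break  # a zero entry ends the INT
            if entry & 0x80000000:
                yield dll, f"#{entry & 0xFFFF}"
            else:
                # Skip the two bytes that precede the function name.
                yield dll, read_cstring(image, to_offset(entry) + 2)
            thunk += 4
        desc += 20


def hosts_runtime(image: bytes, import_rva: int,
                  to_offset: Callable[[int], int]) -> bool:
    return any(dll.lower() == "mscoree.dll" and name in BINDING_FUNCTIONS
               for dll, name in imported_names(image, import_rva, to_offset))
```

Matching on the function name rather than on the DLL alone matters because a managed assembly also imports from mscoree.dll, so the DLL name by itself does not identify a host.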
It is possible that the executable could use delay loading. Normally, when Windows loads an executable (process or DLL) it accesses the INT and then loads each DLL specified. Windows then determines the address of each function that the executable requires and puts the address of the function in the executable's Import Address Table so that the function can be called by the executable. The problem with this approach is that some DLLs have initialization code and sometimes this code takes a while to execute. This makes the loading of the executable take longer.
Delay loading is not a feature of the operating system,
it is a feature provided by the
Visual C++ linker.
In effect, the loading of the DLL is delayed until the first time that a method
in the DLL is called. The Delay Import Directory is an array of
delayimp.h). This is similar to the Import
Descriptor structure I mentioned earlier: there is a member that is the RVA
to the name of the DLL and an RVA to an INT. My code loops through each
ImageDelayDescr structure in the Delay Import Directory, tests to see if
the DLL's name is
mscoree.dll and if so it accesses the INT and if
it contains one of the runtime binding functions the executable is marked as
hosting the CLR.
All of this assumes that the code can access the file. If the file has been
opened for exclusive access by another process then you will not be able to open
it for these tests. One possible way to get round this issue is to make a copy
of the file and test the copy. If the copy fails then my program flags an
error. Note that if the file is large then making a copy will take a long time. The
program maintains a hard coded list of files to ignore (you will have to change
the source code to change this list) and at the moment this list only contains
the single file
The main file in the application (
dotnetornot.cs) handles the user interface and collates the
files that will be tested. Some folders are just copies of existing files, for
example, XP maintains a folder called
dllcache which is essentially
a copy of the
system32 folder. This is maintained by the Windows
File Protection mechanism which exists to try and prevent malware from
attaching itself to an operating system file: if a file in
changes WFP will detect this and replace the changed file with the copy in
dllcache. The test program maintains a list of folders that should be
ignored, and at start up this list of folders is read from a file called
The Exclude tab gives access to this folder list
and provides a mechanism to edit it.
When you click on the Start button a new thread is created to perform
the search. The search runs on a new thread because it can take a long
time, and running it on the main thread would block any updates to the user
interface. Using a different thread means that user interface updates can occur
while the search continues, but it also means that if the search routine needs
to access the user interface (to update the progress bar, or to add a value to
the list view) this must be done on the GUI thread by calling
ISynchronizeInvoke.Invoke. Once the search has started the Start button changes to a Stop
button which you can use to abort the search.
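The shape of this arrangement can be shown without any GUI toolkit. In the Python sketch below, which is only an analogy for the C# application, a queue stands in for ISynchronizeInvoke.Invoke (it posts updates without waiting for them, so it is closer to an asynchronous invoke), a list stands in for the list view, and an event plays the part of the Stop button. All names are hypothetical.

```python
import queue
import threading


def search(items, updates, stop):
    """Worker: never touches the 'UI'; it only posts updates for the UI thread."""
    for item in items:
        if stop.is_set():      # the Stop button was pressed
            break
        updates.put(item)
    updates.put(None)          # sentinel: the search finished or was aborted


def run(items):
    updates = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=search, args=(items, updates, stop))
    worker.start()
    shown = []                 # stands in for the list view
    while True:
        item = updates.get()   # the "UI" thread applies each posted update
        if item is None:
            break
        shown.append(item)
    worker.join()
    return shown
```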
The search mechanism needs to know the folder where the search will be started. Initially I used the
FolderBrowserDialog to allow the user to select
the start folder. However, I found that when I used
this on Vista Beta 1 it took a long time to show the root of the disk where
Vista was installed. The reason was that when you select a folder the
browser dialog needs to determine if each subfolder itself has subfolders to
determine if it should show the subfolder with a + symbol in the tree view. In
Vista there are a few folders in the root that have over
eleven thousand folders and so enumerating these subfolders took a long time to
Part of the problem was that the folder browser dialog attempted to fill the
tree view with information about all the folders on the disk. To get round this
issue I decided to write my own version of the folder browser dialog (
that would read as little as possible from the disk. So if you expand a folder
FileChooser it will already contain child folders (the dialog
had to do this to make the folder have the plus symbol), however, the expand
handler needs to determine if the child folders have subfolders, and so the
browser dialog accesses the next level of folders (but only the next level): the
children of the child folders.
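The idea of reading only one extra level can be sketched as follows. This is a Python illustration with hypothetical function names, not the dialog's actual code. The point it shows is that deciding whether to draw the plus symbol only requires finding the first subfolder, so a folder with thousands of children costs no more than a folder with one.

```python
import os


def has_subfolder(path):
    """True as soon as one subfolder is found; the rest are never enumerated."""
    try:
        with os.scandir(path) as entries:
            return any(e.is_dir(follow_symlinks=False) for e in entries)
    except OSError:            # folder cannot be read
        return False


def expand(path):
    """Return [(child_folder, show_plus)] for the immediate child folders only."""
    children = []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    children.append((entry.path, has_subfolder(entry.path)))
    except OSError:
        pass
    return sorted(children)
```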
The start folder selected by the user will be passed to the worker thread. The first thing that the thread does is check to see if this folder is on the exclusion list, and if it is, the method returns immediately. Then the worker thread enumerates all the files in the folder, and for each file it checks to see if the file is on the exclusion list and if it is not, it then checks to see if the file is managed. The name of the file and an icon representing the type of the file is added to a list view. Once all files have been checked the worker thread enumerates all the folders in the test folder and recursively calls the search method for each folder.
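In outline, the search method behaves like the Python sketch below. It is a simplified stand-in for the real worker: `classify` is a hypothetical callable that returns the file type, `results` stands in for the list view, and the exclusion list is assumed to hold normalized full paths of both folders and files.

```python
import os


def norm(path):
    """Normalize a path so that exclusion-list lookups compare like with like."""
    return os.path.normcase(os.path.abspath(path))


def search(folder, excluded, classify, results):
    if norm(folder) in excluded:
        return                             # excluded folder: return immediately
    try:
        entries = list(os.scandir(folder))
    except OSError:
        return                             # folder cannot be read
    # Files first ...
    for entry in entries:
        if entry.is_file(follow_symlinks=False) and norm(entry.path) not in excluded:
            results.append((entry.path, classify(entry.path)))
    # ... then recurse into each subfolder.
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            search(entry.path, excluded, classify, results)
```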
The icons in the list view indicate the type of the file: the file could not be opened, non-PE file, an unmanaged file, managed file or an unmanaged file that hosts the runtime:
A running count is kept of all of these identified file types and at any time you can click on the Statistics tab to get a summary of these and a list of the extensions of the files that were tested:
There are two downloads for this article. The first is a managed project that
has a GUI: the source code and a compiled executable are provided. The second
download is an unmanaged project supplied as an executable only. The unmanaged
project is a command line executable and a DLL;
it has a file called
excluded.txt that, on each line, has the name of a folder that you want
excluded from the search (for example
I don't provide the source for this project because it is horribly messy and I
haven't the time to make it pretty and easy to read.