.NET Fusion

12. Native Images

.NET assemblies are just-in-time compiled to native machine code. This occurs on a method-by-method basis just before the method is called. This compilation takes time, but it also means that the code is writeable, and so cannot be shared between processes, which raises the memory usage of the machine. You can pre-JIT an assembly, that is, the JIT compilation can be performed over an entire assembly so that the .NET assembly loads faster and memory is used more efficiently. Such a pre-JITted assembly contains native code and is known as a native image.

12.1 Loading an Assembly

When your code uses an assembly it indicates to the runtime the name of the assembly and the method to call. As you know, Fusion will try to locate that assembly: it may already be in memory, or it may be somewhere on disk or on the network. Fusion will locate and load the assembly. If the requested method has not been called before, the runtime will just-in-time compile it, that is, it will go through the IL and compile it to native code. This native code will be cached in memory, so that next time the method is called the native code will be used.

Thus, the runtime is not an interpreter, because the JIT compilation of a method occurs just once for a particular instance of a process. Note that JIT compilation is on a method-by-method basis; it does not JIT compile the entire assembly. However, the runtime does make intelligent decisions: if the method being compiled calls other methods the JIT compiler will determine the size of those methods and if they are small it may decide to inline them, that is, copy the IL from those called methods into the method being compiled. I explained earlier in this workshop that you can tell the runtime not to inline methods; however, this is only useful for debuggers.

JIT compilation takes time, however, it is not the only time consuming activity that occurs when an assembly is loaded: there is a lot of security work being performed. I will cover this work in detail in the security workshop, but here I will outline what happens. First, if the assembly is private and has a strong name, the runtime will perform strong name validation, that is, the runtime will create a hash of the assembly and then compare this with the value obtained by decrypting the strong name signature with the embedded public key. If the two hashes are not the same, the runtime will not load the assembly. If the assembly is installed in the GAC this step is omitted because it will have been performed when the assembly was first installed into the GAC and the runtime assumes that the GAC is a secure location so nothing can happen to the assembly since installation.

The runtime will now load the assembly in memory and perform PE file validation. Typically Trojans work by changing your code to call their code, but they could also alter internal tables in the PE file to exploit vulnerabilities in the operating system to allow their code to be called. The runtime will check that the PE file containing the assembly is valid, that is, it will check things like unmanaged resources and the table of imported DLLs. These checks ensure that PE file addresses are within the correct range.

Once the PE tables have been verified, the runtime will validate the metadata that is contained within the assembly. This metadata gives information about the types implemented in the assembly, but it also gives information about external types that are used. Metadata is stored in tables, similar to a relational database and an entry in one table may reference an entry in another table. The runtime checks to make sure that such references are valid, that is, the correct table is referenced and the index of the item is within the bounds of the table. For example, the table describing methods will have a Relative Virtual Address (RVA) of the position in the file that contains the method's IL. The runtime checks to make sure that these addresses are held within the part of the file where .NET code resides. The runtime will also inspect the metadata to determine if it refers to a valid .NET type, that is, the type follows the rules of the .NET type system.

At this point the runtime knows that the metadata is correct so it uses some of the metadata that it has validated: the minimum permissions requested by the assembly. First, the runtime determines the security permissions (if any) that the assembly is granted by the machine's security policy. That is, it gathers evidence about the assembly and uses the security policy to obtain the permissions (see the security workshop for more details) that the assembly will be granted. The security policy defines code groups which are collections of permissions. The policy maps evidence to code groups, and so an assembly will be granted zero or more permissions collated from all of the code groups that policy says the assembly is a member of. Permissions allow an assembly to perform some action, and the runtime library will check that an assembly has the necessary permissions before it will execute code requested by that assembly. An assembly can indicate the permissions that it will require to have before it can perform its work, and this will be stored as metadata. So once the runtime has determined the permissions that the assembly is granted (a permission set) it will then check that the required permissions are in the permission set. If not, then the assembly clearly cannot run, so the assembly will not be loaded.
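The grant-then-check logic described above can be pictured with a small sketch. This is only an illustration under simplifying assumptions: permissions are treated as opaque names (real permissions carry state), a single policy level is modelled, and the function names, the "zone" evidence value and the permission names are all hypothetical.

```python
def granted_permissions(evidence, code_groups):
    """Union the permissions of every code group whose test matches the evidence."""
    granted = set()
    for is_member, permissions in code_groups:
        if is_member(evidence):
            granted |= permissions
    return granted


def can_load(granted, required):
    """An assembly loads only if all of its required permissions were granted."""
    return set(required) <= set(granted)


# Hypothetical policy: two code groups keyed on a made-up "zone" evidence value.
POLICY = [
    (lambda ev: True, {"Execution"}),
    (lambda ev: ev.get("zone") == "MyComputer", {"FileIO", "Reflection"}),
]

local = granted_permissions({"zone": "MyComputer"}, POLICY)
remote = granted_permissions({"zone": "Internet"}, POLICY)
print(can_load(local, {"Execution", "FileIO"}))   # True
print(can_load(remote, {"Execution", "FileIO"}))  # False
```

The second call fails because the required set is not contained in the granted permission set, which is the case where the runtime refuses to load the assembly.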

Although the runtime knows that the metadata is correct, it does not know whether the IL for the methods defined in the assembly is correct. So the next action performed by the runtime is to validate and verify the IL in the assembly. To do this, the runtime walks through all the IL in every method, following every branch of the code. The runtime does not set up a stack, so no data will be processed; instead, it follows every code path and inspects the IL opcodes in each path. IL validation involves checking that the opcodes are valid, that is, it checks that the collection of bytes for each opcode is a valid sequence. During validation the runtime will also check jump opcodes to make sure that they jump within the method. Invalid IL is a symptom of assembly corruption, or of a broken compiler.

Next, the JIT compiler verifies the IL. This is not an exact science, because the JIT compiler has to verify that the code is performing safe operations. Although it is possible to determine when code is performing unsafe operations (for example, calling native code) it is impossible to be 100% sure that code is safe. The JIT compiler takes a conservative attitude and may fail to verify code that is safe. However, an assembly that has code that has not been verified can still be run as long as the assembly has been granted permission to execute unverified code. By default, code that has been installed on the computer has this permission. If an assembly is known to have unsafe code (for example, most code generated by the Managed C++ compiler) then the assembly can have metadata that indicates that this verification step should be skipped.

Now that you know how assemblies are loaded, let's see how native images change this.

12.2 Ngened Assemblies in .NET v1.1

In the previous section you can see that a lot of steps are performed before a method can be executed. If your application has lots of assemblies then these checks are performed for each assembly. These checks are important, because security is the most important aspect of your code. If your code is not secure then there really is no point in writing it.

In version 1.0 and 1.1 of the runtime Microsoft provided a tool called ngen.exe (Native Image Generator). This tool performs the JIT compilation step on all methods in an assembly and saves the result as a native image file. This image is stored in a location called the native image cache so that the runtime can use it whenever the original IL assembly is called. The idea of the native image is to replace the JIT compilation step, but it also has some deeper ramifications. Although the native image is used as a replacement for the IL assembly, the IL assembly must still exist. When an assembly is loaded, the runtime will perform all the initialization steps as outlined above, but when it comes to JIT compiling methods in that assembly it will first check to see if there is a native image in the cache and if so, this will be loaded and the JIT compilation step will be omitted. The documentation is not clear about exactly when the runtime loads the native image. After all, when the native image is created the metadata and IL must be validated and the IL must be verified, and since the native image has neither IL nor metadata (well, it actually has a little bit of metadata) these steps cannot be performed on the native image. However, since these steps will have been performed on the original assembly before the native image was generated and the native image is stored in a secure place on the hard disk, there does not appear to be a reason for these steps to be performed on the original assembly another time. So logically, the native image cache should be checked before the metadata tables are validated. However, as I have mentioned, the documentation does not say whether this is the case.

In addition, the IL assembly must be available on the machine in case the runtime finds that the native image is invalid for some reason. JIT compilation is dependent upon many factors: the runtime version, security policy and binding policy are a few. If any of these factors change the runtime will revert to the normal JIT compilation on a method-by-method basis. For example, a developer can provide link demands which indicate that a security check is performed at JIT compile time. Clearly, a link demand will be performed when the native image generator is run and this does not necessarily represent the situation when the native image assembly is run. Indeed, if the security policy has changed when the assembly is run (and it is not a superset of the policy in force when the native image generator was run) then the runtime will ignore the native image assembly and instead it will load the IL image assembly as normal. Native images have a very small amount of metadata, but nothing for the types defined in the assembly, so if your code, or some other code, uses reflection on your types then the metadata has to be available, and this means the IL assembly. (This has changed in .NET 3.0/2.0.)

JIT compiling methods as they are used has a less than obvious downside. In unmanaged DLLs exported functions are marked in the exports table and if the DLL is loaded at the library's preferred load address then the addresses stored in this table can be used by calling modules. If the DLL is loaded at a different address then the operating system has to change - or fix up - these addresses. This fix up operation takes time and, since these tables will change it means that they have to be writable memory pages. Read-only pages are sharable between processes so once a DLL has been loaded it means that other processes that use these pages will load quicker and the memory usage of the system is reduced. Writable pages are specific to a process, so if another process uses the same DLL it will have its own writeable pages, this increases the overall memory usage of the system.

The same general issues occur with .NET assemblies. Since .NET methods are JIT compiled it means that the generated code is created at run time, which means that it must be in writable, non-shareable memory pages. This increases the memory footprint of the process. When you create a native image the code that is generated will be loaded into read-only, sharable pages. Thus another bonus of creating a native image is to reduce the memory footprint.

Note that the native image generated is essentially the output of the JIT compiler and this output is cached for later use. However, even though this cache looks like a 'central repository' it is not a mechanism to share assemblies. That is the purpose of the GAC. The JIT compiler will compile both process and library methods, and the same is true about the native image generator. So native images can be generated from process assemblies as well as library assemblies, this is in contrast to the GAC which can only contain libraries.

Another problem with native images is that they cannot be shared across application domains. This means that if your application has more than one application domain then you cannot use native images. The most obvious example of an application with multiple application domains is the ASP.NET worker process: you cannot use native images in ASP.NET 1.1. (This has changed in .NET 3.0/2.0.)

Although the native image generator, ngen, was documented by Microsoft in version 1.0 and 1.1 of the runtime, they hardly encouraged people to use it. This was deliberate on Microsoft's part. Microsoft created native images of the framework assemblies, but since they controlled the policies that would apply to their assemblies, and they controlled the issuing of service packs and hotfixes, the deficiencies of native images, just outlined, do not affect their assemblies. The problem, of course, is that you do not have such control over your assemblies, so Microsoft decided not to encourage you to use the native image generator. In .NET 3.0/2.0, Microsoft are more confident about native images and now they do encourage you to use them, or rather, they encourage you to perform performance tests to see if native images improve your application.

12.3 Contents of Native Images in .NET 1.1

When a native image is generated the ngen tool will place it in the native image cache. Information like the IL and metadata have been stripped, so there is no way that the runtime can perform validation and verification - those checks have to be performed on the IL assembly. The runtime assumes that the native image is the output of the JIT compiler, and this assumption can be accepted if the native image is stored in a secure place so that rogue code cannot alter the generated native code. The Fusion namespace extension in Windows Explorer will list native images as if they are in the GAC, and this is partially correct. The native image cache is in a folder under %windir%\assembly. But the images are not necessarily shared like GAC libraries are.

Native images are created with ngen.exe. This tool will install and uninstall assemblies. It can also show the native images that have been generated. Try this:

ngen /show

This will show all the native images in the native image cache. You will find that most of the framework libraries will have native images.

.NET Version 3.0
The new version shows two groups of assemblies, Native Images and NGEN Roots. An NGEN Root is an assembly that uses other assemblies and this can be a process or a library. A Native Image is just a generic term, and it includes native image roots.

Now take the library, process and key file from the previous pages. First, make sure that the library is not in the GAC (gacutil -u lib) and make sure that a configuration file does not exist in the folder. Now compile the library with a strong name and compile the process that uses the library. Run the application to confirm that it picks up the assembly from the application folder. Next generate a native image.

ngen lib.dll

Run the process. You'll see that the code base of the library is the same as before - the application folder. As I mentioned earlier, the native image cache is not a mechanism to share code. In this case the library is a private assembly irrespective of whether you have generated a native image. A native image can also be generated for the process, but in this case you should also generate the native images of the libraries it uses, if you don't, then those libraries will use JIT compilation and so you lose the effect of using native images. To try this out, delete the library native image and then generate the native image for the process and library:

ngen /delete lib
ngen app.exe "lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe"

It is important that you give the full name of the library so that the JIT compiler can determine if there are any publisher policy files in force, which may change the version of the library used by the process, and hence the image that is generated.

.NET Version 3.0
If you pass an assembly - a process or a library - to the new version of ngen the tool will create a native image of this assembly (the root) and all the libraries it uses.

So where is the native image cache? Well, this is the so-called Zap cache which you can enumerate with the code given in Example 10.3. Also, you can get the absolute path of this cache by calling the GetCachePath function, which is part of the unmanaged Fusion API. However, it is far simpler just to peek in the assembly cache folder. Move to this folder under %windir% (pushd %windir%\assembly), and list the folder contents. You'll find a folder with a name something like NativeImages1_v1.1.4322 (.NET version 3.0/2.0 has a folder called NativeImages_v2.0.50727_32). Change to this folder and list its contents. You'll find a list of the short names of the native image assemblies that have been generated, including the app and lib assemblies that you just added. Change directory to the lib folder. There you'll find another folder with a name that is composed of the version, culture and public key token, similar to how the GAC stores libraries. Change to this directory and here you'll find the native image of the library.

.NET Version 3.0
.NET version 3.0/2.0 has a slightly different scheme. The main folder is called NativeImages_v2.0.50727_32 and again there are folders with the short names of the assembly native images that have been cached. Under each is a folder with the name of a 16 byte hex number: a GUID that identifies the native image. Within that folder is the cached native image, but note that the runtime does not use the original name. If the original file was lib.dll the native image file is called lib.ni.dll. This means that if you obtain the list of modules loaded by an application (for example, type tasklist /m /fi "IMAGENAME eq app.exe" where app.exe is the process built with .NET version 3.0/2.0) you'll be able to identify the native image files used by the process by their name, and this includes the framework libraries (for example mscorlib.ni.dll).

Now run dumpbin on this assembly:

dumpbin /headers lib.dll

Take a look at the data directories (at the end of the Optional Header Values). There you'll see that the COM Descriptor Directory has a value. This means that the file is managed! Write down the RVA of this item (on my machine this is 0x2008), and then write down the virtual and raw addresses of the .text section (on my machine these are 0x2000 and 0x200), so that you can convert RVA's to raw addresses. 
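The conversion that these recorded values make possible is a single subtraction and addition. A minimal sketch, using only the values quoted above from my machine (the function name is my own):

```python
def rva_to_raw(rva, section_virtual_address, section_raw_address):
    """Convert an RVA inside a section to an offset in the file on disk."""
    return rva - section_virtual_address + section_raw_address


# Values recorded from dumpbin /headers for the .text section of lib.dll.
TEXT_VIRTUAL_ADDRESS = 0x2000
TEXT_RAW_ADDRESS = 0x200

# The COM Descriptor Directory (the CLR header) was at RVA 0x2008.
print(hex(rva_to_raw(0x2008, TEXT_VIRTUAL_ADDRESS, TEXT_RAW_ADDRESS)))  # 0x208
```

The formula only holds when the RVA falls inside the section whose addresses you pass in, so check the section's range first if the file has several sections.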

We need to list the contents of this file. ILDASM is an obvious choice; however, you'll find that this will only list the manifest. Further, the advanced option (/adv) allows you to view the various CLR headers in an assembly (the COR Header option on the View menu). However, if you try this option on the native image file then you'll find that ILDASM will hang. Instead, use dumpbin to list the CLR header:

dumpbin /clrheader lib.dll

On my machine I get this:

Dump of file lib.dll

File Type: DLL

clr Header:

    48 cb
  2.00 runtime version
  2210 [ 43C] RVA [size] of MetaData Directory
     6 flags
     0 entry point token
     0 [ 0] RVA [size] of Resources Directory
     0 [ 0] RVA [size] of StrongNameSignature Directory
     0 [ 0] RVA [size] of CodeManagerTable Directory
     0 [ 0] RVA [size] of VTableFixups Directory
     0 [ 0] RVA [size] of ExportAddressTableJumps Directory

Summary

   2000 .data
   2000 .reloc
   2000 .text

This does not give you much information other than there is metadata in the assembly. Now load the assembly in a hex viewer (like Visual Studio). Since Windows Explorer uses the Fusion namespace extension you'll not be able to use the Open File dialog in Visual Studio. The simplest way to change this is to temporarily change the name of the desktop.ini file in %windir%\assembly: 

pushd %windir%\assembly
attrib desktop.ini -s -h -r
rename desktop.ini desktop.ini.old

After you have loaded the file, undo the changes:

rename desktop.ini.old desktop.ini
attrib desktop.ini +s +h +r
popd

The first thing you'll need to do is convert the RVA of the metadata to a raw address. To do this, subtract the virtual address of the .text section you recorded earlier from the RVA and then add the raw address of the .text section. In my case this gives 0x410. Move to this location and investigate the values. Here is what I get:

0410 42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00 BSJB............
0420 76 31 2e 31 2e 34 33 32 32 00 00 00 00 00 04 00 v1.1.4322.......
0430 60 00 00 00 78 00 00 00 23 7e 00 00 d8 00 00 00 `...x...#~......
0440 18 00 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 ....#Strings....
0450 f0 00 00 00 10 00 00 00 23 47 55 49 44 00 00 00 ........#GUID...
0460 00 01 00 00 3c 03 00 00 23 42 6c 6f 62 00 00 00 ....<...#Blob...
0470 00 00 00 00 01 00 00 01 05 40 00 00 09 00 00 00 .........@......
0480 00 fa 01 33 00 02 00 00 01 00 00 00 01 00 00 00 ...3............
0490 01 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 ................
04a0 01 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................
04b0 01 00 01 00 0b 00 06 00 ac 00 04 80 00 00 01 00 ................
04c0 00 00 00 00 00 00 01 00 00 00 01 00 0a 00 00 00 ................
04d0 01 00 00 00 88 13 00 00 00 00 00 00 a3 00 0e 00 ................
04e0 00 00 00 00 00 00 00 00 00 3c 4d 6f 64 75 6c 65 .........<Module
04f0 3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00 >.lib.mscorlib..
0500 3a 2e 71 a6 46 08 0b 4c 80 db 00 a6 8b 9c 18 a4 :.q.F..L........

You can use the .NET ECMA spec to decode all of this. I won't go into too many details, I will just identify the basics. Location 0x410 is the start of the Metadata Root (ECMA Spec, Partition II, 23.2.1). It is followed, at location 0x430, by an array of Stream Headers (23.2.2), each header has the offset of the stream from the start of the metadata root, its size, and then the name of the stream (a variable length string): #~ has values for the metadata tables, #Strings has values of the string name of types and members, #GUID has associated GUIDs and #Blob has raw data used by the runtime. (Another stream, not present here, is #US which has user strings, that is, string literals in your code.) The #~ stream is at offset 0x60 and its size is just 0x78 bytes. This corresponds to a location of 0x470 (0x410 + 0x60). The metadata stream is described by the ECMA Spec in section 23.2.6 which shows that the eight bytes at 0x478 is a bit map where each bit corresponds to a metadata table. In this case the value indicates that the tables that are listed are the tables with the following indexes: 0x0, 0x2, 0xe, 0x20 and 0x23. These are the Module (21.27), TypeDef (21.34), DeclSecurity (21.11), Assembly (21.2) and AssemblyRef (21.5) tables. Again, I will spare you the details of decoding these tables, but basically there is just one item in each table. On initial sight these values are understandable, after all, the lib library has just one type. However, I can tell you that I have performed this analysis on some of the framework native images and find the same values: there is just one type defined in every assembly. Furthermore, the type defined in lib, LibraryCode, has two methods GetVersion and its constructor (created by the compiler). These methods should be described in a MethodDef table (table index 0x6, 21.24 of the spec) but this code has no such table.
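If you would rather not decode the stream headers and the table bitmap by hand, the following sketch does it for the bytes in the dump above. It assumes the layouts given in the spec sections just cited (metadata root, stream headers, and the #~ stream header with its 64-bit bitmap at offset 8); the function names are my own and the byte values are copied from my dump.

```python
import struct

# Bytes 0x410-0x48f of the native image, copied from the hex dump above.
METADATA = bytes.fromhex(
    "42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00 "
    "76 31 2e 31 2e 34 33 32 32 00 00 00 00 00 04 00 "
    "60 00 00 00 78 00 00 00 23 7e 00 00 d8 00 00 00 "
    "18 00 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 "
    "f0 00 00 00 10 00 00 00 23 47 55 49 44 00 00 00 "
    "00 01 00 00 3c 03 00 00 23 42 6c 6f 62 00 00 00 "
    "00 00 00 00 01 00 00 01 05 40 00 00 09 00 00 00 "
    "00 fa 01 33 00 02 00 00 01 00 00 00 01 00 00 00"
)


def parse_stream_headers(data):
    """Return (version string, {stream name: (offset, size)}) for a metadata root."""
    signature, _major, _minor, _reserved, length = struct.unpack_from("<4sHHII", data, 0)
    if signature != b"BSJB":
        raise ValueError("not a metadata root")
    version = data[16:16 + length].rstrip(b"\0").decode("ascii")
    pos = 16 + length
    _flags, count = struct.unpack_from("<HH", data, pos)
    pos += 4
    streams = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", data, pos)
        end = data.index(b"\0", pos + 8)
        streams[data[pos + 8:end].decode("ascii")] = (offset, size)
        # The name is null-terminated and padded to a 4-byte boundary.
        pos = (end + 4) & ~3
    return version, streams


def present_tables(data, tables_offset):
    """Return the indexes of the metadata tables flagged in the #~ bitmap."""
    valid, = struct.unpack_from("<Q", data, tables_offset + 8)
    return [index for index in range(64) if (valid >> index) & 1]


version, streams = parse_stream_headers(METADATA)
tables_offset, _size = streams["#~"]
print(version)
print({name: (hex(o), hex(s)) for name, (o, s) in streams.items()})
print([hex(i) for i in present_tables(METADATA, tables_offset)])
# -> ['0x0', '0x2', '0xe', '0x20', '0x23']
```

All offsets are relative to the start of the metadata root, which is why the slice begins at file offset 0x410.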

The names of the types defined in the assembly should be listed in the #Strings stream. This stream starts at 0x4e8 and its size is 0x18 bytes. Here's the relevant data:

04e0 00 00 00 00 00 00 00 00 00 3c 4d 6f 64 75 6c 65 .........<Module
04f0 3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00 >.lib.mscorlib..
0500 3a 2e 71 a6 46 08 0b 4c 80 db 00 a6 8b 9c 18 a4 :.q.F..L........

As you can see, there are just three strings, <Module>, lib and mscorlib. lib is the name of the module (and hence the only entry in the Module table), mscorlib is the name of an assembly that is referenced (and hence the only entry in the AssemblyRef table), so the remaining string is the name of the only type that is defined in the assembly. LibraryCode does not exist because it has been compiled to native code and so its metadata does not exist in this file.
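As a quick check, the 0x18 bytes of the #Strings stream can be split on their null terminators. This is just a sketch over the bytes shown above (file offsets 0x4e8 to 0x4ff); the helper name is my own.

```python
# The 0x18 bytes of the #Strings stream, starting at file offset 0x4e8.
STRINGS_HEAP = bytes.fromhex(
    "00 3c 4d 6f 64 75 6c 65 "
    "3e 00 6c 69 62 00 6d 73 63 6f 72 6c 69 62 00 00"
)


def heap_strings(heap):
    """Split a #Strings heap into its non-empty, null-terminated strings."""
    return [item.decode("utf-8") for item in heap.split(b"\0") if item]


print(heap_strings(STRINGS_HEAP))  # ['<Module>', 'lib', 'mscorlib']
```

The heap begins with a single null byte (the empty string) and is padded with nulls at the end, which is why empty items are discarded.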

Now check the CLR header (24.3.3). Dumpbin prints out some values, but not all of the values in this structure. You recorded the location of this structure earlier on, and on my machine I get a value of 0x2008 for the RVA, which corresponds to a raw address of 0x208:

0200 00 00 00 00 00 00 00 00 48 00 00 00 02 00 00 00 ........H.......
0210 10 22 00 00 3c 04 00 00 06 00 00 00 00 00 00 00 ."..<...........
0220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0240 00 00 00 00 00 00 00 00 50 20 00 00 40 00 00 00 ........P ..@...

The interesting entry is at 0x248, this is the RVA and size of the ManagedNativeHeader item (RVA 0x2050, size 0x40). The name of this data directory implies that it is the location of the native image code, but unfortunately, the ECMA spec does not document this structure so we can go no further with this analysis.
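To see every field, including the one dumpbin omits, you can unpack the 0x48 bytes yourself. The sketch below assumes the CLR header layout from the spec section cited above (a size, two 16-bit version numbers, the metadata directory, flags, the entry point token, then six more RVA/size pairs); the directory names are taken from the dumpbin listing and the function name is my own.

```python
import struct

# The 0x48 bytes of the CLR header, from file offset 0x208 in the dump above.
CLR_HEADER = bytes.fromhex(
    "48 00 00 00 02 00 00 00 "
    "10 22 00 00 3c 04 00 00 06 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 50 20 00 00 40 00 00 00"
)

# Directories that follow the entry point token, in file order.
TRAILING_DIRECTORIES = [
    "Resources", "StrongNameSignature", "CodeManagerTable",
    "VTableFixups", "ExportAddressTableJumps", "ManagedNativeHeader",
]


def parse_clr_header(data):
    """Unpack a CLR header into a dictionary of fields and (RVA, size) pairs."""
    cb, major, minor = struct.unpack_from("<IHH", data, 0)
    meta_rva, meta_size, flags, entry_point = struct.unpack_from("<4I", data, 8)
    pairs = struct.unpack_from("<12I", data, 24)
    header = {
        "cb": cb,
        "runtime": (major, minor),
        "MetaData": (meta_rva, meta_size),
        "flags": flags,
        "entry_point": entry_point,
    }
    for i, name in enumerate(TRAILING_DIRECTORIES):
        header[name] = (pairs[2 * i], pairs[2 * i + 1])
    return header


header = parse_clr_header(CLR_HEADER)
print([hex(v) for v in header["MetaData"]])             # ['0x2210', '0x43c']
print([hex(v) for v in header["ManagedNativeHeader"]])  # ['0x2050', '0x40']
```

The first values agree with the dumpbin output shown earlier, which is a useful sanity check that the offsets are right before trusting the ManagedNativeHeader entry.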

.NET Version 3.0
The version of dumpbin supplied with Visual Studio 2005 gives all of the values of the CLR Header, including ManagedNativeHeader. However, there is still no information about the data that it points to.

Finally, load the library again in ILDASM. Take a look at the manifest. This indicates that there is a code access security permission set (of the type prejitgrant) which indicates that verification should be skipped for this prejitted code. (This makes sense because the code will have been verified when the native image was generated and anyway, verification of native code will fail.) There is also the public key which you gave to the IL assembly.

Close down ILDASM, and return to the original folder (popd). Remove the prejitted application with:

ngen /delete app.exe "lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe"

12.4 CLR Optimization Service in .NET 3.0/2.0

The last section identified that there is an inherent brittleness in native images: a native image is heavily dependent upon the OS version and local settings like the security policy and binding configuration. If any of these things change the native image becomes invalid. Of course, if you know that something has changed that could affect the native image you can always rebuild the affected images. Microsoft decided to make this mechanism easier to do in .NET version 3.0/2.0.

Again, use the library, process and key file that you used in the last section. Compile the library with the .NET 3.0/2.0 compiler and then compile the process. Next, generate a native image for the process:

ngen install app.exe

Notice the new syntax: you provide a command followed by the name of the assembly. If that assembly uses other assemblies then their native images will be generated too:

Microsoft (R) CLR Native Image Generator - Version 2.0.50727.42
Copyright (C) Microsoft Corporation 1998-2002. All rights reserved.
Installing assembly C:\TestFolder\Fusion\2.0.50727\12.3\app.exe
Compiling 2 assemblies:
   Compiling assembly C:\TestFolder\Fusion\2.0.50727\12.3\app.exe ...
app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
   Compiling assembly lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe ...
lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe

Notice that ngen first determines the libraries used by this assembly and then it generates the native images. The tool determines the libraries used by the assembly by reading its metadata. ngen will not generate native images for assemblies you dynamically load using one of the Assembly methods. Now list the assemblies in the native image cache; the new syntax is this:

ngen display |more

I have piped the output through more because there is so much output. Notice that it gives two lists, NGEN Roots and Native Images. The former lists the top level assemblies in the cache, and you will find that app.exe will be listed in this group (note that the path to this assembly will be given). Native Images is a general list of the native images and in this list you'll find app and lib (in this case, the full name of the assemblies are given). Now ask ngen to display a filtered list of the cache:

ngen display app

This will show that app is a root and it will show that the root, app, depends on, erm, app. OK, so this last statement seems to be a tautology, but if you try this action on lib, you'll see that it makes more sense: lib is not a root, but one root, app, depends on it. This shows that information is stored about the roots and the libraries they depend on, a mechanism called tracking.

Now let's take a look in the native images. Move to the assembly cache (pushd %windir%\assembly) and then move to the native image cache (on my machine for .NET version 3.0 this is called NativeImages_v2.0.50727_32). Move to the folder for the library (cd lib) and there you'll find a folder that has the name of a 16 byte hex number. Change the directory to that folder. The folder will contain a single file called lib.ni.dll. Run dumpbin on this file to list the headers.

dumpbin /headers lib.ni.dll

The first difference you notice is that the file has many more sections than a normal Win32 DLL or a .NET library assembly. A library assembly will normally have three sections: .reloc (relocation information) .rsrc (unmanaged resources) and .text (the location of the IL and metadata). The native image has eight sections, the new sections are: .data (global variables), .dbgmap, .extrel, .il and .xdata (unwind information for native SEH exceptions). Three of these are new section types. Of particular interest is the .il section, this section is almost as large as the .text section for this file (with good reason, as you'll see later). The .il section is marked as containing read-only code, whereas the .text section is marked as being read/execute code. So the data in the .il section is meant to be read, but not executed. Even more interesting is the .data section which is read/write initialized data and is larger than either the .il or .text sections. I do not know what this section is used for.

Write down the virtual address and the raw address of the .text and .il sections because you'll use them later. Now use dumpbin to list the CLR header in the file:

dumpbin /clrheader lib.ni.dll

Write down the RVA of the Metadata Directory. Also, notice that dumpbin lists the undocumented ManagedNativeHeader directory. Now load the library in the hex editor of Visual Studio (you'll have to disable the namespace extension using the steps I outlined earlier). Move to the location indicated by the Metadata Directory; you'll need to convert from an RVA to the raw address. On my machine the RVA is 0x23d0 and the .text section covers the range 0x2000 to 0x28e2, so the metadata is in the .text section. Since this section starts at the raw address 0x400, it means that the metadata is at 0x7d0 (0x400 + 0x23d0 - 0x2000):

07d0 42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00 BSJB............
07e0 76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00 v2.0.50727......
07f0 6c 00 00 00 8c 00 00 00 23 7e 00 00 f8 00 00 00 l.......#~......
0800 18 00 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 ....#Strings....
0810 10 01 00 00 08 00 00 00 23 55 53 00 18 01 00 00 ........#US.....
0820 10 00 00 00 23 47 55 49 44 00 00 00 28 01 00 00 ....#GUID...(...
0830 dc 02 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00 ....#Blob.......
0840 02 00 00 01 05 40 00 00 09 00 00 00 00 fa 01 33 .....@.........3
0850 00 16 00 00 01 00 00 00 01 00 00 00 01 00 00 00 ................
0860 01 00 00 00 02 00 00 00 00 00 00 00 01 00 00 00 ................
0870 00 00 00 00 00 00 01 00 00 00 00 00 01 00 01 00 ................
0880 0b 00 06 00 b5 00 04 80 00 00 01 00 00 00 00 00 ................
0890 00 00 01 00 00 00 13 00 13 00 00 00 02 00 00 00 ................
08a0 00 00 00 00 00 00 00 00 01 00 0a 00 00 00 00 00 ................
08b0 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 13 00 ................
08c0 00 00 00 00 00 00 00 00 00 3c 4d 6f 64 75 6c 65 .........<Module
08d0 3e 00 6d 73 63 6f 72 6c 69 62 00 6c 69 62 00 00 >.mscorlib.lib..
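The conversion from an RVA to a raw address that I used to find this data is simple arithmetic, and a small sketch makes it explicit. I have written it in Python purely as a calculator; the function name rva_to_raw is my own, and the section values are the ones dumpbin reported on my machine, so yours will differ.

```python
def rva_to_raw(rva, section_va, section_raw):
    """Convert an RVA to a raw file offset, given the virtual address and
    the raw address of the section that contains the RVA."""
    if rva < section_va:
        raise ValueError("RVA lies before the start of this section")
    return section_raw + (rva - section_va)

# Values from my machine: .text starts at RVA 0x2000 and raw address 0x400,
# and the Metadata Directory is at RVA 0x23d0.
metadata_raw = rva_to_raw(0x23d0, section_va=0x2000, section_raw=0x400)
print(hex(metadata_raw))
```

The same function can be used for the .il section once you have its virtual and raw addresses from dumpbin.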

The size of the #~ stream is 0x8c bytes and it starts at offset 0x6c from the start of the metadata root. This is an address of 0x83c, and the 8 bytes at 0x844 give the bitmap of the metadata tables present. As before, these are the Module, TypeDef, DeclSecurity, Assembly and AssemblyRef tables. These tables have one entry each, except for the AssemblyRef table which has two entries (this is different to .NET 1.1, where the AssemblyRef table had one entry). Again, I don't want to go into the details of how to calculate what these entries are, but basically one of these entries is a reference to the mscorlib library and the other is to lib; in .NET 1.1 the single entry was to mscorlib. So the metadata in this native image is very similar to 1.1 native images except that there is this extra assembly reference to the assembly that was used to create the native image.
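If you want to check the table bitmap without counting bits by hand, the following Python sketch decodes it. It assumes the #~ stream layout given in the CLI metadata specification (8 bytes of header, the 8-byte bitmap of tables present, an 8-byte bitmap of sorted tables, then one 4-byte row count per table present). The function name and the abbreviated table-name dictionary are my own, and the bytes are copied from the dump above starting at 0x83c.

```python
import struct

# Table numbers from the CLI metadata specification (only the common ones).
TABLE_NAMES = {
    0x00: "Module", 0x01: "TypeRef", 0x02: "TypeDef", 0x04: "Field",
    0x06: "MethodDef", 0x08: "Param", 0x0A: "MemberRef",
    0x0C: "CustomAttribute", 0x0E: "DeclSecurity", 0x11: "StandAloneSig",
    0x20: "Assembly", 0x23: "AssemblyRef",
}

def decode_tables(stream):
    """Return [(table name, row count)] from the start of a #~ stream."""
    # Bytes 0-7 are reserved/version/heap-size fields; the bitmap of
    # tables present is the 8 bytes at offset 8.
    (valid,) = struct.unpack_from("<Q", stream, 8)
    present = [n for n in range(64) if valid & (1 << n)]
    # After the 8-byte 'sorted' bitmap there is one 4-byte row count
    # for each table present, in table-number order.
    rows = struct.unpack_from("<%dI" % len(present), stream, 24)
    return [(TABLE_NAMES.get(n, "Table0x%02X" % n), r)
            for n, r in zip(present, rows)]

# The bytes at 0x83c in my dump (the start of the #~ stream).
STREAM = bytes.fromhex(
    "00 00 00 00 02 00 00 01 05 40 00 00 09 00 00 00"
    " 00 fa 01 33 00 16 00 00 01 00 00 00 01 00 00 00"
    " 01 00 00 00 01 00 00 00 02 00 00 00"
)

for name, count in decode_tables(STREAM):
    print(name, count)
```

This only reads the bitmap and the row counts; working out what each row contains needs the column sizes for every table, which I am not going to cover here.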

Now move to the position in the file where the .il section is stored (you should have recorded this raw address from the data dumpbin gave). Here is part of the data I see:

2e00 42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00 BSJB............
2e10 76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00 v2.0.50727......
2e20 6c 00 00 00 38 01 00 00 23 7e 00 00 a4 01 00 00 l...8...#~......
2e30 58 01 00 00 23 53 74 72 69 6e 67 73 00 00 00 00 X...#Strings....
2e40 fc 02 00 00 38 00 00 00 23 55 53 00 34 03 00 00 ....8...#US.4...
2e50 10 00 00 00 23 47 55 49 44 00 00 00 44 03 00 00 ....#GUID...D...
2e60 10 01 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00 ....#Blob.......
2e70 02 00 00 01 47 14 02 00 09 00 00 00 00 fa 01 33 ....G..........3
2e80 00 16 00 00 01 00 00 00 09 00 00 00 02 00 00 00 ................
2e90 02 00 00 00 0b 00 00 00 03 00 00 00 01 00 00 00 ................
2ea0 01 00 00 00 01 00 00 00 00 00 0a 00 01 00 00 00 ................
2eb0 00 00 06 00 2e 00 27 00 06 00 58 00 46 00 06 00 ......'...X.F...
2ec0 71 00 46 00 06 00 aa 00 8a 00 06 00 ca 00 8a 00 q.F.............

This is another metadata root, but notice now that the size of the #~ stream is 0x0138 bytes compared to 0x8c bytes in the previous metadata root. Similarly, there are 0x0158 bytes of strings (compared to 0x18). This is totally different metadata, and if you inspect the strings section (in my case this is at 0x2e00 + 0x01a4 = 0x2fa4) you'll find the strings of the types and type members used in LibraryCode. In other words the metadata from the IL assembly has been embedded into the native image in the .il section.
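The stream offsets and sizes quoted here can be read out of the metadata root mechanically. The following Python sketch assumes the metadata root layout from the CLI metadata specification (the BSJB signature, a length-prefixed version string, a stream count, then a header per stream with a null-terminated name padded to four bytes). The function name is my own and the bytes are the ones from my dump at 0x2e00.

```python
import struct

def parse_stream_headers(root):
    """Return {stream name: (offset, size)} from a metadata root."""
    if root[:4] != b"BSJB":
        raise ValueError("not a metadata root")
    (version_len,) = struct.unpack_from("<I", root, 12)
    pos = 16 + version_len                    # skip the version string
    _flags, count = struct.unpack_from("<HH", root, pos)
    pos += 4
    streams = {}
    for _ in range(count):
        offset, size = struct.unpack_from("<II", root, pos)
        pos += 8
        end = root.index(b"\0", pos)          # name is null terminated
        streams[root[pos:end].decode("ascii")] = (offset, size)
        pos = (end + 1 + 3) & ~3              # names are padded to 4 bytes
    return streams

# The first 0x70 bytes of the .il section in my dump.
ROOT = bytes.fromhex(
    "42 53 4a 42 01 00 01 00 00 00 00 00 0c 00 00 00"
    " 76 32 2e 30 2e 35 30 37 32 37 00 00 00 00 05 00"
    " 6c 00 00 00 38 01 00 00 23 7e 00 00 a4 01 00 00"
    " 58 01 00 00 23 53 74 72 69 6e 67 73 00 00 00 00"
    " fc 02 00 00 38 00 00 00 23 55 53 00 34 03 00 00"
    " 10 00 00 00 23 47 55 49 44 00 00 00 44 03 00 00"
    " 10 01 00 00 23 42 6c 6f 62 00 00 00 00 00 00 00"
)

streams = parse_stream_headers(ROOT)
strings_offset, strings_size = streams["#Strings"]
print(hex(0x2e00 + strings_offset), hex(strings_size))
```

Adding the #Strings offset to the raw address of the .il section gives the place to look in the hex editor for the type and member names.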

If your 1.1 code used reflection then the 1.1 runtime would have to load the IL image of the assembly to get access to the metadata. As you can see, this is not the case for 3.0/2.0 assemblies: the metadata is in the .il section of the native image. However, it does not stop at metadata. The metadata stream indicates that there are the following tables: Module, TypeRef, TypeDef, MethodDef, MemberRef, CustomAttribute, StandAloneSig, Assembly and AssemblyRef. The MethodDef table will contain information about the methods defined in the assembly, including the RVA of the IL for the method. Analysing this metadata shows that there are two entries in the MethodDef table, and these are for GetVersion and .ctor (the constructor of LibraryCode). The interesting thing is that the RVAs for these methods are 0x04 and 0x3c respectively. Of course, these values cannot be converted to raw addresses using the normal mechanism, and so they are not really RVAs. ILDASM on an IL assembly can show the actual bytes for the IL it is displaying, so I ran ILDASM on the original assembly and obtained the sequence of bytes for GetVersion. I found that these bytes are in the native image at 0x3270. The IL for a method has a header which gives information about the method. This can be either tiny format (1 byte) or fat format (12 bytes). The bytes preceding the IL I identified at 0x3270 had the information that would be in the fat format header. Taking this into account, the method (header plus IL) starts at 0x3264. So the RVA entry in the MethodDef table (0x4) must be the offset from the address 0x3260, which itself is 0xc bytes after the end of the last stream. There is no relevant documentation about what these 0xc bytes mean; however, what is clear is that the IL for the methods is in the native image.
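The arithmetic behind these addresses can be laid out as a short Python sketch. Be aware that this encodes only what I observed in this one native image: the 0xc-byte gap is undocumented, the constant and function names are mine, and the #Blob offset and size are read from the stream headers in the dump of the .il section.

```python
IL_SECTION_RAW = 0x2e00                 # raw address of the .il section (my machine)
BLOB_OFFSET, BLOB_SIZE = 0x344, 0x110   # last stream header in the dump
UNKNOWN_GAP = 0xc                       # undocumented bytes seen after the last stream
FAT_HEADER_SIZE = 12                    # fat format method header

def method_header_address(method_rva):
    """Raw address of a method header, treating the MethodDef 'RVA' as an
    offset from the base observed in this image (an assumption)."""
    end_of_streams = IL_SECTION_RAW + BLOB_OFFSET + BLOB_SIZE
    return end_of_streams + UNKNOWN_GAP + method_rva

get_version_header = method_header_address(0x04)
get_version_il = get_version_header + FAT_HEADER_SIZE
ctor_header = method_header_address(0x3c)
print(hex(get_version_header), hex(get_version_il), hex(ctor_header))
```

If the assumption holds, the header for .ctor should be found 0x38 bytes after the header for GetVersion.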

Let me reiterate: the native image contains both the metadata and the IL from the assembly that was used to generate the image.

Now you've finished with this file, so close it in VS and at the command line you can return to the assembly application folder (popd).

Before leaving this section it is worth pointing out some other features of ngen. When you create a native image the tool will locate the libraries that it uses, which the documentation calls dependencies. (This is a confusing term: one definition for dependency is something that relies on something else. Clearly this is not the case, the libraries are not dependent upon the root, it is the root that is dependent upon them. However, there is another definition of dependency that means 'a subordinate', which I guess is the definition used here since a library can be treated as a subordinate to the assembly that loads it. I wish Microsoft had used the term subordinate rather than dependency.) The native image generator needs to have the configuration for the assembly: if you are generating a native image for a library in the GAC then it needs to have access to any publisher policy files for the library; if the assembly is a process then the tool needs to get access to the configuration file. By default ngen will use normal Fusion probing to find the libraries the assembly uses and will use the current folder as the application base folder to search for private assemblies. However, you can supply the /appbase switch to indicate another folder.

If you are creating the native image for a library then the tool will use the information in the manifest to determine the libraries it uses, however, if this library is used with a process, the process configuration may specify version redirects for the libraries it uses. You can use the /execonfig switch to give the configuration file of a process assembly that provides such redirects, but clearly, the native image library will now be tied to that process, especially if hard binding is used (see later).

If the assembly is to be debugged then you can use the /debug switch to tell the native image generator to generate additional debugging information, otherwise, if a process is run under a debugger the runtime will load the IL assembly instead. Furthermore, if you want to use the assembly under a profiler you can use the /profile switch.

12.5 Logging Binding to Native Images

At this point you know that ngen will create native images for you and install them in a folder under %windir%\assembly. You also know that the native image contains the metadata and the IL of the assembly from which it was created. But are you convinced that when you run the application the native image rather than the IL image is loaded? Well, to convince you that this is the case, we will use the new version of fuslogvw.

This tool now has an option, at the bottom right hand side, to view the log file of binds to native images. Select this option, Native Images. You'll probably see some entries already for native image binds, but ignore these. Indeed, to make sure that you only see the binds we are to perform, remove these existing entries by clicking on the Delete All button. Now click on the Settings button. I mentioned earlier that fuslogvw appears to work correctly only if you specify a custom log path; if you have not done that already, do it now. Make sure that you check Log all binds to disk and click on OK. Now run the process that you have generated a native image for and then click on the Refresh button on fuslogvw.

Notice that three binds have occurred, one is called ExplicitBind, which is for the process itself, and the other two are for the libraries that the process uses, lib and mscorlib. Take a look at these files. The contents are similar to those that you would expect for a bind to an IL assembly, but there are differences. Here is an excerpt:

LOG: Start binding of native image app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null.
LOG: IL assembly loaded from C:\TestFolder\Fusion\2.0.50727\12.3\app.exe.
LOG: Start validating native image app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null.
LOG: Start validating all the dependencies.
LOG: [Level 1]Start validating native image dependency mscorlib, Version=2.0.0.0,
      Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Dependency evaluation succeeded.
LOG: [Level 1]Start validating IL dependency lib, Version=1.0.0.0, Culture=neutral,
     PublicKeyToken=3bf941bb1f722efe.
LOG: Dependency evaluation succeeded.
LOG: Validation of dependencies succeeded.
LOG: Start loading all the dependencies into load context.
LOG: Loading of dependencies succeeded.
LOG: Bind to native image succeeded.
Native image has correct version information.
Attempting to use native image C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\app\0b1006044ca3f944ab21eb0c07f4d752\app.ni.exe.
Native image successfully used.

Note that it first loads the IL assembly and then it goes through a process of 'validating' the assembly and the libraries it uses. After it has completed this procedure it loads the native image from the cache. The logs for the libraries show that a similar procedure is performed on those, and the log gives the name of the native image that is loaded. At this point remove the native images from the cache using:

ngen uninstall app.exe
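If you have many of these log files to check, you can pick out the interesting lines with a script. The following Python sketch is my own and relies only on the wording of the log lines shown in the excerpt above; the log format is not documented, so treat the patterns as assumptions that may need adjusting for other runtime versions.

```python
import re

DEPENDENCY = re.compile(r"Start validating (native image|IL) dependency ([^,]+),")
NATIVE_PATH = re.compile(r"Attempting to use native image (.+?)\.?$")

def summarize_bind_log(text):
    """Pick out the dependencies validated and the native image used."""
    summary = {"dependencies": [], "native_image": None, "used_native": False}
    for line in text.splitlines():
        line = line.strip()
        match = DEPENDENCY.search(line)
        if match:
            summary["dependencies"].append((match.group(2), match.group(1)))
        match = NATIVE_PATH.search(line)
        if match:
            summary["native_image"] = match.group(1)
        if "Native image successfully used" in line:
            summary["used_native"] = True
    return summary

# A few lines copied from the excerpt above.
SAMPLE = r"""LOG: Start validating native image app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null.
LOG: [Level 1]Start validating native image dependency mscorlib, Version=2.0.0.0,
LOG: [Level 1]Start validating IL dependency lib, Version=1.0.0.0, Culture=neutral,
Attempting to use native image C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\app\0b1006044ca3f944ab21eb0c07f4d752\app.ni.exe.
Native image successfully used.
"""

print(summarize_bind_log(SAMPLE))
```

In practice you would read the text from the files in your custom log path rather than from a string.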

12.6 Automatic Redirect to Native Images

In this example, you will create a shared library and create two processes that use it, then you'll create native images for all of the assemblies. Next, you'll create a new version of the library, install that into the GAC and provide a publisher policy assembly to redirect the processes to use the new library.

Use the files from the previous example. First, confirm that there is no config file and that the library has not been installed in the GAC. Now confirm that the version of the library code is 1.0.0.0. Compile the library and the process.

csc /t:library /keyfile:key.snk lib.cs
csc app.cs /r:lib.dll

Next, create a second process that uses this library, you can do this by specifying a different output name:

csc /out:app2.exe app.cs /r:lib.dll

Put the library in the GAC and then create native images for both of the processes:

gacutil -i lib.dll
ngen install app.exe
ngen install app2.exe

Run both processes to convince yourself that they work and check the fusion log to see that the native images are being loaded. Now change the library source so that the version is 1.1.0.0, compile this library and then insert it into the GAC. To indicate that this new version should be used instead of the old version you can create a publisher policy file. To do this, create a configuration file for the library (I have called it lib.config) with the redirection information:

<?xml version="1.0"?>
<configuration>
   <runtime>
      <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
         <dependentAssembly>
            <assemblyIdentity name="lib" publicKeyToken="3bf941bb1f722efe" />
            <bindingRedirect oldVersion="1.0.0.0" newVersion="1.1.0.0" />
         </dependentAssembly>
      </assemblyBinding>
   </runtime>
</configuration>

Compile this to a publisher policy assembly and add it to the GAC:

al /link:lib.config /out:policy.1.0.lib.dll /keyfile:key.snk
gacutil -i policy.1.0.lib.dll
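To make clear what the bindingRedirect element in lib.config asks the binder to do, here is a Python sketch that reads the same XML and applies the redirect to a requested version. This is my own simplification and not how Fusion is implemented: it matches on the assembly name only (the real binder also takes the public key token and culture into account) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{urn:schemas-microsoft-com:asm.v1}"

CONFIG = """<?xml version="1.0"?>
<configuration>
   <runtime>
      <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
         <dependentAssembly>
            <assemblyIdentity name="lib" publicKeyToken="3bf941bb1f722efe" />
            <bindingRedirect oldVersion="1.0.0.0" newVersion="1.1.0.0" />
         </dependentAssembly>
      </assemblyBinding>
   </runtime>
</configuration>
"""

def parse_version(text):
    return tuple(int(part) for part in text.split("."))

def apply_redirect(config_xml, name, version):
    """Return the version to bind to after applying any matching redirect."""
    wanted = parse_version(version)
    for dep in ET.fromstring(config_xml).iter(NS + "dependentAssembly"):
        identity = dep.find(NS + "assemblyIdentity")
        if identity is None or identity.get("name") != name:
            continue
        for redirect in dep.findall(NS + "bindingRedirect"):
            # oldVersion is a single version or a range "low-high"
            low, _, high = redirect.get("oldVersion").partition("-")
            if parse_version(low) <= wanted <= parse_version(high or low):
                return redirect.get("newVersion")
    return version

print(apply_redirect(CONFIG, "lib", "1.0.0.0"))
```

A request for any version other than 1.0.0.0, or for any other assembly, passes through unchanged.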

Now move to fuslogvw and clear the log by clicking on Delete All. Now run app and confirm that the new version of the library is loaded. Switch to fuslogvw and look at the log entry for lib; you will find the following lines:

LOG: Start binding of native image lib, Version=1.1.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe.
WRN: No matching native image found.

The log file for the process has these lines:

LOG: [Level 1]Start validating IL dependency lib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe.
WRN: [Level 1] Dependency version mismatch.
WRN: No matching native image found.
LOG: Bind to native image assembly did not succeed. Use IL image.

As you can see, it cannot find a native image for version 1.1.0.0, so it uses the IL file instead. Switch fuslogvw to the Default view. Here, you'll find three entries, one for the process (marked WhereRefBind) and the other two for mscorlib and lib. The log for lib shows that a redirection has occurred:

LOG: This bind starts in default load context.
LOG: No application configuration file found.
LOG: Using machine configuration file from C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\config\machine.config.
LOG: Publisher policy file is found at C:\WINDOWS\assembly\GAC_MSIL\policy.1.0.lib\0.0.0.0__3bf941bb1f722efe\lib.config.
LOG: Publisher policy file redirect is found: 1.0.0.0 redirected to 1.1.0.0.
LOG: ProcessorArchitecture is locked to MSIL.
LOG: Post-policy reference: lib, Version=1.1.0.0, Culture=neutral, PublicKeyToken=3bf941bb1f722efe,
     processorArchitecture=MSIL
LOG: Found assembly by looking in the GAC.


Notice that you should use both the Default and Native Images logs when you are trying to debug binding errors with native images.

Clear both logs and repeat the test for app2. You should find that the same errors are reported. Clear the log. Now you need to tell ngen to update the native images. You have two options here. Firstly, you can call ngen update; this will go through all native images, check each one to make sure that the libraries it uses are in the native image cache, and recreate the native image if this is not the case. Since this will take a long time, we will use the second option: just install the process a second time into the native image cache. Type the following:

ngen install app.exe

Switch back to fuslogvw and refresh the display. You will find that there are four new entries from a process called mscorsvw.exe. This is the native image generator worker process. In fact this file also doubles up as the native image generator service if you use delayed updates, but in this case it is run as a process. These binds give information about loading the process before the new native image was generated. Clear the log again and run app. Switch to fuslogvw and confirm that the native image of the new version of the library is being used.

Now run app2. The results here are confusing. The log file for lib shows that redirection is used and that it successfully loads the native image. However, the log file for the process shows that there is a 'dependency version mismatch' and so the IL image is loaded. It is simple to fix this (just install the process again). However, to me this looks like a bug because the binding occurs correctly for the library but something happens after that binding that causes the version mismatch error message.

Now uninstall the second process:

ngen uninstall app2.exe

To show that this has succeeded list the contents of the native cache:

dir \Windows\assembly\NativeImages_v2.0.50727_32

You should find folders for app and lib, but not for app2. Finally, uninstall app and list the contents of the native cache. You'll find that both app and lib will have been removed. When you uninstall app2 the ngen tool senses that lib is being used by another assembly and so it does not remove it from the cache. However, when you remove the only assembly that uses this library ngen removes the native image.
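The behaviour you have just seen can be summed up with a toy model. The following Python sketch is only my model of what the cache appears to do (a library's native image stays as long as some installed root uses it); it is not based on any knowledge of how ngen tracks this internally, and the class and method names are hypothetical.

```python
class NativeImageCacheModel:
    """Toy model: roots are installed explicitly, and the libraries they
    use get native images as a side effect."""

    def __init__(self):
        self.roots = {}   # root name -> set of libraries it uses

    def install(self, root, uses):
        self.roots[root] = set(uses)

    def uninstall(self, root):
        self.roots.pop(root, None)

    def contents(self):
        """Names that would still have a folder in the cache."""
        kept = set(self.roots)
        for libraries in self.roots.values():
            kept |= libraries
        return sorted(kept)

cache = NativeImageCacheModel()
cache.install("app", uses=["lib"])
cache.install("app2", uses=["lib"])
cache.uninstall("app2")
print(cache.contents())   # lib is kept because app still uses it
cache.uninstall("app")
print(cache.contents())   # nothing uses lib now, so it goes too
```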

Finally clean up by removing the libraries and policy file from the GAC:

gacutil -u policy.1.0.lib
gacutil -u lib

So that you don't generate too many log files, switch to fuslogvw and use the Settings dialog to select Log bind failures to disk before closing this tool.

12.7 Deferred Updates

I mentioned earlier that mscorsvw.exe can be run as a service. The reason for this service is to defer updates to a time when the machine is idle. You can queue up installs and updates using ngen. Installs can be given one of three priorities: 1, 2 or 3. The first two priorities mean that the change must be performed immediately, with priority 1 changes being performed first. Priority 3 changes will be performed when the machine is idle. Microsoft do not say exactly what 'idle' means but they say that it is triggered if there has been no user input for a certain amount of time.

You can use deferred actions by using the /queue switch; this will start the ngen service to perform the native image generation. Since this is a service it means that if the machine reboots while the native image generation is being performed, the service will be started when the machine is booted so that the action is completed. However, once all actions in the queue have been completed the service stops and it will only be restarted when another action is added to the queue.

ngen also allows you to perform actions on the queue: you can pause it, continue it from pause and you can tell it to perform all items in the queue with a particular priority or higher. For example, use the examples from the last section. (Note that the processes were compiled for version 1.0.0.0 of the library so make sure that you have changed the source for the library back to this version and recompiled the library.) Now type the following:

ngen queue pause
ngen install app.exe /queue:3
ngen install app2.exe /queue:1

Note that app2 has a higher priority than app. Now start the update procedure:

ngen executequeueditems 3
ngen queue continue

For this test we want to generate native images for all the items in the queue, that is, we don't want to wait for the machine to become idle. This is why you have used executequeueditems to perform all actions of priority 3 and higher. The final action tells the service to start work on the queue. You will not be informed that the actions have completed; however, a short time after the work has completed the service will be stopped, and you can use the ngen queue status command to test for this.
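The ordering of the queued actions can be illustrated with a toy model. This Python sketch is my own and says nothing about how the ngen service is implemented; it only shows the rule described above, that executequeueditems with a given number processes every item of that priority or higher (that is, with that number or lower), highest priority first. The class and method names are hypothetical.

```python
import heapq
import itertools

class NgenQueueModel:
    """Toy model of the deferred-install queue."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # keeps FIFO order within a priority

    def install(self, assembly, priority=3):
        if priority not in (1, 2, 3):
            raise ValueError("priority must be 1, 2 or 3")
        heapq.heappush(self._heap, (priority, next(self._order), assembly))

    def execute_queued_items(self, max_priority=3):
        """Process every item whose priority number is <= max_priority."""
        done = []
        while self._heap and self._heap[0][0] <= max_priority:
            _, _, assembly = heapq.heappop(self._heap)
            done.append(assembly)
        return done

queue = NgenQueueModel()
queue.install("app.exe", priority=3)
queue.install("app2.exe", priority=1)
print(queue.execute_queued_items(3))   # app2.exe is processed before app.exe
```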

There is no queue option for removing assemblies from the native image cache, so to clean up use the following to remove the two assemblies from the native image cache:

ngen uninstall app.exe
ngen uninstall app2.exe

12.8 Hard Binding

Earlier I mentioned the issues of creating binding to functions. In normal Win32 DLLs the library provides an export table with the addresses of the functions exported by the DLL. If the DLL is loaded in its preferred address then the addresses in the export table can be called by the process. If the DLL is loaded at another address then the OS has to perform fix ups to change the address to the actual address where the function resides. When you create a native image, ngen will create something similar to an export address table in the native image file (the details of this are not documented). This means that at run time the .NET runtime must perform some fix ups to methods that are called. This clearly takes time and it makes some previously read-only pages writable, which means they cannot be shared between processes.

The solution to this issue is hard binding. When you specify that you want to use hard binding ngen will put hard bound addresses in the assembly to the methods in the native image assembly it references. This means that the code pages are read-only and shareable, with the downside that all the hard bound libraries must be loaded at the same time. You cannot require that hard binding is performed, you can merely indicate that you would like to use it. To do this you add assembly level attributes to the assembly. Again, the documentation uses the confusing term dependency, it really should be subordinate. A root assembly will use the [Dependency] attribute to indicate the subordinate assemblies it will use and to indicate how likely it is that the subordinate will be loaded. The rationale is that hard binding will lengthen initial application start up and so you should avoid it if the subordinate assembly is unlikely to be called. In this case you should use LoadHint.Sometimes. However, if the code that uses the subordinate assembly will always be called then you should use LoadHint.Always to indicate that you would like to use hard binding.

The [DefaultDependency] attribute is applied to subordinate assemblies. You use this to indicate how likely you think it is that this assembly will be loaded, and it is used by ngen when there is not a [Dependency] attribute on the root assembly (the one that uses the subordinate), or if the root has that attribute but specifies LoadHint.Default.
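The way the two attributes combine can be written down as a small decision function. The following Python sketch models only the rule described in the last two paragraphs; the function names are my own, and remember that a hint of Always is a request for hard binding, not a guarantee that ngen will use it.

```python
from enum import Enum

class LoadHint(Enum):
    DEFAULT = "Default"
    ALWAYS = "Always"
    SOMETIMES = "Sometimes"

def effective_hint(dependency_hint, default_dependency_hint):
    """dependency_hint: the hint from [Dependency] on the root assembly,
    or None if the attribute is absent.
    default_dependency_hint: the hint from [DefaultDependency] on the
    subordinate assembly, or None if the attribute is absent."""
    if dependency_hint is not None and dependency_hint is not LoadHint.DEFAULT:
        return dependency_hint            # the root has stated a preference
    if default_dependency_hint is not None:
        return default_dependency_hint    # fall back to the subordinate's hint
    return LoadHint.DEFAULT

def requests_hard_binding(hint):
    return hint is LoadHint.ALWAYS

hint = effective_hint(LoadHint.DEFAULT, LoadHint.ALWAYS)
print(hint.value, requests_hard_binding(hint))
```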

This page is (c) 2007 Richard Grimes, all rights reserved