.NET Security Workshop

11 Secret Key Cryptography

Symmetric key cryptography means that the same key is used to encrypt the data as is used to decrypt it. This means that the key must be kept secret between the users encrypting data and decrypting the data. This is why the scheme has the alternative name of secret key cryptography because the key is always kept secret. The .NET framework has several classes to give access to symmetric algorithms and it also provides classes to help you make passwords more secure. In this section I will describe how to use the .NET secret key algorithms, how to do this through streams and I will also show how to encode binary data to base64 characters.

11.1 Symmetric Key Algorithms

There are two types of symmetric algorithm, stream and block. Stream algorithms work on the data a bit (or byte) at a time, whereas block algorithms work on data a block at a time. All of the algorithms in the .NET framework library are block algorithms and all of them derive from the SymmetricAlgorithm class. As explained on the previous page, block algorithms can be run in one of several modes, and once data has been encrypted using a particular mode the cyphertext must be decrypted using the same mode. The mode of an algorithm is indicated by the SymmetricAlgorithm.Mode property, which takes one of the values of the CipherMode enumeration. All are block modes, but since CFB and OFB use a shift register for the input they behave as a stream cypher with respect to the data they consume.

The classes derived from SymmetricAlgorithm are shown in the following table.

Class | Description
DES | Data Encryption Standard, the standard ratified by the US government. Originally designed by IBM. 64-bit blocks with 56-bit keys. Now considered insecure.
RC2 | Ron Rivest algorithm, "Ron's Code". 64-bit blocks with 8- to 128-bit keys.
Rijndael | Designed by Vincent Rijmen and Joan Daemen. Adopted by NIST as the Advanced Encryption Standard (AES). 128-bit blocks with 128- to 256-bit keys.
TripleDES | DES applied three times. Designed by IBM. 64-bit blocks with 168-bit keys. Considered to be secure.

The SymmetricAlgorithm base class defines the CreateEncryptor and CreateDecryptor methods that all derived classes must implement. This means that all symmetric algorithms are intended to be accessible through the ICryptoTransform interface and hence used through a CryptoStream object. This sounds like a contradiction (block algorithms used through a stream object), but it isn't, because the stream object only provides a stream interface to that block data. Indeed, many of the framework stream implementations (e.g. FileStream) are actually based on block data because they are buffered.

Your data is rarely in a number of whole blocks, typically there will be a partial block of data and so the algorithm must treat this differently, by padding the data. The ICryptoTransform interface has two methods, TransformBlock and TransformFinalBlock. The second of these will pad the final block with values specified by the value of the SymmetricAlgorithm.Padding property. As mentioned earlier, block algorithms can be used in one of several modes. The default is CBC: cypher block chaining. This, and most of the other modes need an initialization vector, and it is important that the IV is carefully chosen to be random. By default, the SymmetricAlgorithm class will set the IV property to a random value.

To see this create the following file (rijndael.cs):

using System;
using System.Security.Cryptography;

class App
{
   static void Main()
   {
      Rijndael r = Rijndael.Create();
      Console.WriteLine("current IV is {0}", BitConverter.ToString(r.IV));
   }
}

Compile and run this code, you'll see something like the following:

current IV is EB-00-A3-5B-EE-0E-7A-4D-B4-07-2D-43-95-15-66-AA

Run this program several times to confirm that each run gives a different IV. Although the IV does not have to be secret, it must be the same for the encryption and the decryption, so if you persist encrypted data you must make sure that you also persist the IV.

Each algorithm also needs a secret key and that key should be a specific number of bits. You can determine the number of bits from the LegalKeySizes array which is an array of KeySizes objects. Add the following code:

foreach (KeySizes keySize in r.LegalKeySizes)
{
   Console.WriteLine("Key size {0} to {1} in {2} increments",
      keySize.MinSize, keySize.MaxSize, keySize.SkipSize);
}

For Rijndael I get:

Key size 128 to 256 in 64 increments

This means that the key can be 128, 192 or 256 bits.

At this point I want to caution you. There are various properties that are sizes, however, some are sizes in bits, others are sizes in bytes. You should carefully check what units the size property uses. For example, SymmetricAlgorithm.BlockSize should be the same as ICryptoTransform.InputBlockSize, but the former is given in bits and the latter is given in bytes.
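The difference in units can be checked directly. The following is a minimal sketch, assuming the Aes class (Rijndael fixed at a 128-bit block, available in later framework versions); the class name SizeUnitsDemo and the helper BitsToBytes are illustrative names, not part of the framework.

```csharp
using System;
using System.Security.Cryptography;

static class SizeUnitsDemo
{
   // Converts a size reported in bits to the equivalent number of bytes.
   public static int BitsToBytes(int bits)
   {
      return bits >> 3;
   }

   static void Main()
   {
      // Aes is an assumption here; the text uses Rijndael.Create().
      using (Aes aes = Aes.Create())
      using (ICryptoTransform encryptor = aes.CreateEncryptor())
      {
         // SymmetricAlgorithm sizes are reported in bits...
         Console.WriteLine("BlockSize      = {0} bits", aes.BlockSize);
         Console.WriteLine("KeySize        = {0} bits ({1} bytes)",
            aes.KeySize, BitsToBytes(aes.KeySize));
         // ...whereas ICryptoTransform sizes are reported in bytes.
         Console.WriteLine("InputBlockSize = {0} bytes", encryptor.InputBlockSize);
      }
   }
}
```

For a 128-bit block this should report a BlockSize of 128 and an InputBlockSize of 16, which describe the same block in different units.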

Users rarely give passwords of the required number of bits and even if they do, they will use a keyboard with printable characters, which restricts the range of each byte in their password. It is better to generate a password from a pass phrase. The framework library provides PasswordDeriveBytes to do just this. This class will combine the pass phrase with some extra bytes (called salt) and then perform a hash on this data. It will do this repeatedly for a specified number of iterations. The pass phrase cannot be derived from the result and the salt makes it harder for an attacker to perform a dictionary attack. The salt should be a randomly generated value, but again, it does not have to be kept secret. However, since the salt determines the key used in the encryption it should be available to the code decrypting the data and is typically stored with the IV and the encrypted data.

The salt and IV should be randomly generated data. SymmetricAlgorithm will generate a random IV, so that leaves the salt. The problem with most random number generators (like System.Random) is that you cannot guarantee their randomness and any repeatability in the data weakens your cryptographic security. The framework provides a class called RandomNumberGenerator that will create a more secure random number generator. In a similar way to symmetric algorithms, this class has a static method that will return an instance of a derived class (RNGCryptoServiceProvider) that is an implementation provided by the CryptoAPI.

Putting all of this together, if you want to generate a password from a pass phrase you could use this code:

void CreatePassword(string phrase, int saltSize, int passSize, out byte[] pass, out byte[] salt)
{
   RandomNumberGenerator rand = RandomNumberGenerator.Create();
   salt = new byte[saltSize];
   rand.GetBytes(salt);
   PasswordDeriveBytes pdb = new PasswordDeriveBytes(phrase, salt);
   pass = pdb.GetBytes(passSize);
}

PasswordDeriveBytes is an implementation of the abstract class DeriveBytes.

.NET Version 3.0
In version 3.0/2.0 of the framework the GetBytes method is deprecated, so you should use the CryptDeriveKey method instead. The new version of the framework has another implementation of DeriveBytes called Rfc2898DeriveBytes. This uses the password-based key derivation function (PBKDF2) described by RFC 2898, and it uses the HMACSHA1 class. You use this in much the same way as you would use PasswordDeriveBytes.
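As a rough illustration of Rfc2898DeriveBytes, here is a minimal sketch. The class name Pbkdf2Demo, the helper DeriveKey, the 16-byte salt and the iteration count of 10000 are all assumptions chosen for the example, not values taken from the framework or from this workshop. On recent runtimes this constructor overload may produce an obsolescence warning.

```csharp
using System;
using System.Security.Cryptography;

static class Pbkdf2Demo
{
   // Derives keySize bytes from the pass phrase and salt using PBKDF2 (HMAC-SHA1).
   public static byte[] DeriveKey(string phrase, byte[] salt, int iterations, int keySize)
   {
      using (Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(phrase, salt, iterations))
      {
         return kdf.GetBytes(keySize);
      }
   }

   static void Main()
   {
      // The salt must be at least 8 bytes; 16 is an illustrative choice.
      byte[] salt = new byte[16];
      using (RandomNumberGenerator rand = RandomNumberGenerator.Create())
      {
         rand.GetBytes(salt);
      }

      // 32 bytes gives a 256-bit key; the iteration count is an assumption.
      byte[] key = DeriveKey("daisy, daisy, give me your answer to", salt, 10000, 32);
      Console.WriteLine("salt: {0}", BitConverter.ToString(salt));
      Console.WriteLine("key:  {0}", BitConverter.ToString(key));
   }
}
```

The same phrase, salt and iteration count always produce the same key, which is why the salt has to be stored where the decrypting code can find it.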

Now let's do some encryption and decryption.

First I want to show you the basic use of ICryptoTransform, so for the time being ignore some of the deliberate security lapses. Create a file called encrypt.cs:

using System;
using System.Security.Cryptography;
using System.Text;

class App
{
   static void Main()
   {
      Rijndael r = Rijndael.Create();
      r.Mode = CipherMode.ECB;
      string phrase = "daisy, daisy, give me your answer to";
      PasswordDeriveBytes pdb = new PasswordDeriveBytes(phrase, new byte[0]);
      r.Key = pdb.GetBytes(r.KeySize>>3);
      string data = "The quick brown fox jumps over the lazy dog.";
   }
}

This creates an instance using ECB, that is, we won't use an IV in this example. (Note that you should not write code like this; we will fix this later.) The pass phrase and the cleartext are hard coded in the example, and a key is generated from the pass phrase without a salt. The size of the key is given in KeySize in bits, so I shift right 3 times (divide by 8) to get the number of bytes.

Next you need to transform the data. To do this obtain an encryptor from the algorithm and call a member function called CryptoTransform. This function will apply the cryptographic transform to the data and return the transformed data. We can also obtain a decryptor and apply that with the same function:

string data = "The quick brown fox jumps over the lazy dog.";

ICryptoTransform en = r.CreateEncryptor();
byte[] input = Encoding.ASCII.GetBytes(data);
input = CryptoTransform(input, en);
Console.WriteLine(BitConverter.ToString(input));
ICryptoTransform de = r.CreateDecryptor();
byte[] output = CryptoTransform(input, de);
Console.WriteLine(Encoding.ASCII.GetString(output));

This code should print out the hex values of the transformed data and the last line should print the clear text of the input data. The CryptoTransform method looks like this:

static byte[] CryptoTransform(byte[] input, ICryptoTransform en)
{
   byte[] output = new byte[input.Length*2];
   int inputOffset = 0;
   int bytesToRead = en.InputBlockSize;
   int outputOffset = 0;
   int totalBytesTransformed = 0;

   // Transform whole blocks only; whatever is left (possibly nothing)
   // is handed to TransformFinalBlock below, which also applies the padding.
   while (input.Length - inputOffset >= bytesToRead)
   {
      int numTransformed = en.TransformBlock(input, inputOffset, bytesToRead, output, outputOffset);
      inputOffset += bytesToRead;
      outputOffset += numTransformed;
      totalBytesTransformed += numTransformed;
   }
   byte[] tempBuffer = en.TransformFinalBlock(input, inputOffset, input.Length - inputOffset);
   byte[] returnedBuffer = new byte[totalBytesTransformed + tempBuffer.Length];
   Array.Copy(output, 0, returnedBuffer, 0, totalBytesTransformed);
   Array.Copy(tempBuffer, 0, returnedBuffer, totalBytesTransformed, tempBuffer.Length);
   return returnedBuffer;
}

This method transforms the input data depending on the interface passed as the parameter. Since I do not know the size of the output buffer I create one twice the size of the input buffer. This works in practice, but I will leave it up to the reader to write code that checks that the buffer is large enough before calling TransformBlock. This code will call TransformBlock with a block of data of the size given by InputBlockSize. This assumes that TransformBlock can be called multiple times with different data; this is the case with all of the framework's algorithms, which will return true for CanTransformMultipleBlocks.
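One possible way to size the buffer for encryption is sketched below. It assumes the default PKCS7 padding, which always adds between one byte and a full block; the class name BufferSizeDemo and its two helpers are illustrative names, and the Aes class stands in for Rijndael.

```csharp
using System;
using System.Security.Cryptography;

static class BufferSizeDemo
{
   // Predicted cyphertext length for PKCS7 padding: the input rounded up
   // to the next whole block, with a full extra block when it already fits.
   public static int PaddedLength(int inputLength, int blockSizeBytes)
   {
      return (inputLength / blockSizeBytes + 1) * blockSizeBytes;
   }

   // Encrypts inputLength zero bytes and reports how long the result really is.
   public static int ActualLength(int inputLength)
   {
      using (Aes aes = Aes.Create())
      {
         aes.Padding = PaddingMode.PKCS7;
         using (ICryptoTransform en = aes.CreateEncryptor())
         {
            return en.TransformFinalBlock(new byte[inputLength], 0, inputLength).Length;
         }
      }
   }

   static void Main()
   {
      foreach (int length in new int[] { 1, 13, 16, 17, 32 })
      {
         Console.WriteLine("{0} bytes in: predicted {1}, actual {2}",
            length, PaddedLength(length, 16), ActualLength(length));
      }
   }
}
```

The prediction only applies to encryption with PKCS7 padding; with other padding modes, or when decrypting, the output length follows different rules.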

Compile this code (csc encrypt.cs) and run it to confirm that the binary data created by the first call to CryptoTransform will decrypt to the original data.

As you can see, this is quite involved, so instead let's use a CryptoStream object and streams. Add a using statement for System.IO and replace the contents of CryptoTransform with this:

static byte[] CryptoTransform(byte[] input, ICryptoTransform en)
{
   MemoryStream sInput = new MemoryStream(input);
   CryptoStream cs = new CryptoStream(sInput, en, CryptoStreamMode.Read);
   MemoryStream sOutput = new MemoryStream();
   byte[] buffer = new byte[1024];
   while(true)
   {
      int read = cs.Read(buffer, 0, buffer.Length);
      if (read == 0) break;
      sOutput.Write(buffer, 0, read);
   }
   cs.Clear();
   return sOutput.ToArray();
}

Compile and run this code.

Note that this code has a call to CryptoStream.Clear. The reason is that if there is buffered data in the stream it might be possible for another process to get access to the memory used by the stream and so get access to cleartext. The stream object will live in memory until the next garbage collection and this could be a long time after the stream object has been used, so you cannot rely on the GC to remove this object for you. This is the reason for the call to the Clear method: it will clear any internal buffers so that sensitive data will not persist in memory.

As you can see this is far simpler code. A MemoryStream object gives stream access to the input array and this stream and the cryptographic object are used to create the CryptoStream. The mode is set to Read so that the data is transformed when it is read from the underlying stream. This is a purely arbitrary choice and the following code works just as well:

MemoryStream sInput = new MemoryStream(input);
MemoryStream sOutput = new MemoryStream();
CryptoStream cs = new CryptoStream(sOutput, en, CryptoStreamMode.Write);
byte[] buffer = new byte[1024];
while(true)
{
   int read = sInput.Read(buffer, 0, buffer.Length);
   if (read == 0) break;
   cs.Write(buffer, 0, read);
}
cs.FlushFinalBlock();
cs.Clear();
return sOutput.ToArray();

In both cases the input stream is read until it is empty (so that Read returns zero bytes). The main difference between these two fragments of code is that in the first case ICryptoTransform.TransformFinalBlock will be called by the call to CryptoStream.Read that returns a value of zero. The second fragment of code has to make this call explicitly.

Regardless of which version you use, it is clearly simpler code than the version that calls the ICryptoTransform methods directly. However, be aware that the CryptoStream methods allocate intermediate arrays, and if the number of bytes that you request is less than the number transformed it will cache the excess. The convenience of making your code more readable results in more memory allocations.

I mentioned earlier that the code uses Electronic Codebook mode (CipherMode.ECB). There is an inherent weakness in this mode. If there are repeated strings in the cleartext then these will be apparent in the cyphertext. Change the calling code so that it looks like this:

Rijndael r = Rijndael.Create();
r.Mode = CipherMode.ECB;
r.Padding = PaddingMode.None;
string phrase = "daisy, daisy, give me your answer to";
PasswordDeriveBytes pdb = new PasswordDeriveBytes(phrase, new byte[0]);
r.Key = pdb.GetBytes(r.KeySize>>3);

string data = "repeated.text...repeated.text...repeated.text...repeated.text...";
byte[] input = Encoding.ASCII.GetBytes(data);

ICryptoTransform en = r.CreateEncryptor();
input = CryptoTransform(input, en);
Console.WriteLine(BitConverter.ToString(input));

ICryptoTransform de = r.CreateDecryptor();
byte[] output = CryptoTransform(input, de);

Console.WriteLine(Encoding.ASCII.GetString(output));

The clear text contains a repeated phrase. Significantly, the repeated phrase is 16 characters and the algorithm converts data in 16 byte blocks. This means that each block will be the same. Notice that I have also changed the padding mode to be PaddingMode.None so no padding is used. This is possible because I have a whole number of blocks. If the input data could not fit into a whole number of blocks then the algorithm will throw a CryptographicException. There are two other values that can be used for the padding. PaddingMode.Zeros will pad the input data with zero so that it becomes a number of whole blocks, however, this action is not reversible: the decryptor will not know if the zeros it obtains are part of the data. The final type of padding is PaddingMode.PKCS7 which is the default. This will pad the final block to make a full block with a byte that has the value of the number of padding bytes used. For example, if the block is 16 bytes and the final block is 13 bytes then three bytes with the value 0x03 will be used as padding. The decryptor can remove the padding after it has done its work by reading the last byte of the last block and removing that number of bytes from the end. The only problem with this mechanism is that if the input data has a whole number of blocks (as in this example) then an additional block will be added to the end so that this padding block will be removed by the decryptor.
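The PKCS7 padding bytes can be made visible by encrypting with PaddingMode.PKCS7 and then decrypting with PaddingMode.None, so that the decryptor leaves the padding in place. This is a minimal sketch; the class name PaddingDemo and the helper PaddedPlaintext are illustrative, and the Aes class stands in for Rijndael.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class PaddingDemo
{
   // Encrypts with PKCS7 padding, then decrypts with padding removal
   // switched off so that the raw padded blocks are returned.
   public static byte[] PaddedPlaintext(byte[] data)
   {
      using (Aes aes = Aes.Create())
      {
         aes.Mode = CipherMode.ECB;
         aes.Padding = PaddingMode.PKCS7;
         byte[] cypher;
         using (ICryptoTransform en = aes.CreateEncryptor())
         {
            cypher = en.TransformFinalBlock(data, 0, data.Length);
         }

         // Same key, but the decryptor no longer strips the padding.
         aes.Padding = PaddingMode.None;
         using (ICryptoTransform de = aes.CreateDecryptor())
         {
            return de.TransformFinalBlock(cypher, 0, cypher.Length);
         }
      }
   }

   static void Main()
   {
      // 13 bytes of data: three padding bytes of value 0x03 are expected.
      byte[] partial = PaddedPlaintext(Encoding.ASCII.GetBytes("Hello, world!"));
      Console.WriteLine(BitConverter.ToString(partial));
      // prints 48-65-6C-6C-6F-2C-20-77-6F-72-6C-64-21-03-03-03

      // 16 bytes of data: a whole extra block of sixteen 0x10 bytes is expected.
      byte[] whole = PaddedPlaintext(Encoding.ASCII.GetBytes("repeated.text..."));
      Console.WriteLine("{0} bytes, last byte 0x{1:X2}", whole.Length, whole[whole.Length - 1]);
   }
}
```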

Compile and run this code and you should get a result like the following (which I have edited slightly to align the values):

EE-98-44-3F-36-6C-E1-5C-70-55-44-59-A5-F7-B2-20-
EE-98-44-3F-36-6C-E1-5C-70-55-44-59-A5-F7-B2-20-
EE-98-44-3F-36-6C-E1-5C-70-55-44-59-A5-F7-B2-20-
EE-98-44-3F-36-6C-E1-5C-70-55-44-59-A5-F7-B2-20

As you can see there are four groups of 16 bytes and each group is the same; these correspond to the four repeated phrases in the cleartext. A cryptanalyst (the polite name for a cracker) will exploit repeated values in the cyphertext. In all languages some words are more popular than others and so the cryptanalyst will use tables of the probability of words in the appropriate language to make a guess at the word that could be the repeated block. Thus repeated blocks represent a serious weakness in the algorithm. Now change the Mode to CBC:

Rijndael r = Rijndael.Create();
r.Mode = CipherMode.CBC;

In this case the algorithm will combine each block with the previous encrypted block. As I mentioned earlier, this requires an initialization vector and one will already be created for you (with random values). If you run this code and rearrange the output into blocks of 16 bytes you'll get data that does not repeat. On my machine I get the following, but you will get something different because of the random IV:

46-D3-F2-35-AA-EC-AB-FE-C0-F7-C3-EE-EB-3A-5C-57-
8A-D0-71-ED-AC-A5-9F-4E-DE-12-15-2D-A3-D4-1D-DF-
0B-E7-7A-21-6A-C0-82-F3-CD-37-E0-0E-FB-DE-C1-94-
FB-C8-B2-DB-C7-6C-9B-C1-27-D8-ED-14-7C-7F-24-B5
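The difference between the two modes can also be checked programmatically by counting the distinct cyphertext blocks. This is a minimal sketch; the class name ModeDemo and the helper DistinctBlocks are illustrative names, and the Aes class stands in for Rijndael.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class ModeDemo
{
   // Encrypts the data (a whole number of blocks, no padding) in the given
   // mode and returns how many different cyphertext blocks were produced.
   public static int DistinctBlocks(CipherMode mode, byte[] data)
   {
      using (Aes aes = Aes.Create())
      {
         aes.Mode = mode;
         aes.Padding = PaddingMode.None;
         byte[] cypher;
         using (ICryptoTransform en = aes.CreateEncryptor())
         {
            cypher = en.TransformFinalBlock(data, 0, data.Length);
         }

         int blockBytes = aes.BlockSize >> 3; // BlockSize is in bits
         HashSet<string> seen = new HashSet<string>();
         for (int offset = 0; offset < cypher.Length; offset += blockBytes)
         {
            seen.Add(BitConverter.ToString(cypher, offset, blockBytes));
         }
         return seen.Count;
      }
   }

   static void Main()
   {
      byte[] data = Encoding.ASCII.GetBytes(
         "repeated.text...repeated.text...repeated.text...repeated.text...");
      Console.WriteLine("ECB distinct blocks: {0}", DistinctBlocks(CipherMode.ECB, data));
      Console.WriteLine("CBC distinct blocks: {0}", DistinctBlocks(CipherMode.CBC, data));
   }
}
```

With four identical 16-byte cleartext blocks, ECB should report a single distinct block, while CBC should report four.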

11.2 Persisting Data

The last section showed that using the CryptoStream class simplifies the code that you need to write. In that section I ignored salt when generating the symmetric key, and I mentioned that this makes your code less secure. However, when you add salt you must use it for encryption and decryption, so the salt value must be available to both the encryption and decryption code. Furthermore, the same initialization vector must also be available to the encryption and decryption code. So if you intend to persist your encrypted data you must also persist the salt and initialization vector. In this section I will show you how to do this.

The following example is a simple command line tool that will encrypt the contents of a file and write the encrypted data to another file. The command line looks like this:

encrypt <input file> <output file> <password> [e|d]

The final parameter is optional and indicates if the action is encryption or decryption (encryption is the default). Create a file called encrypt.cs. Here is the code for the Main function:

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

class App
{
   static void Main(string[] args)
   {
      if (args.Length < 3)
      {
         Console.WriteLine("command line: <input file> <output file> <password> [e|d]");
         Console.WriteLine("\twhere e means encrypt (the default) and d means decrypt");
         return;
      }

      bool bEncrypt = true;
      if (args.Length == 4) bEncrypt = (args[3].ToLower()[0] == 'e');

      FileStream fsIn = new FileStream(args[0], FileMode.Open);
      FileStream fsOut = new FileStream(args[1], FileMode.Create);
      if (bEncrypt) Encrypt(fsIn, fsOut, args[2]);
      else Decrypt(fsIn, fsOut, args[2]);
   }
}

Most of the code is used to manipulate the command line parameters. Once the code has determined the names of the input and output files it opens them and then calls Encrypt or Decrypt depending on the last command line parameter. Both of these methods will close the files when they have completed their work, so there is no need to call Close on these FileStream objects. The Encrypt method looks like this:

static void Encrypt(Stream sIn, Stream sOut, string password)
{
   Rijndael r = Rijndael.Create();
   byte[] salt = null;
   byte[] pass = null;
   CreatePassword(password, r.KeySize >> 3, r.KeySize >> 3, out pass, ref salt);
   r.Key = pass;
   sOut.Write(salt, 0, salt.Length);
   sOut.Write(r.IV, 0, r.IV.Length);
   CryptoStream cs = new CryptoStream(sIn, r.CreateEncryptor(), CryptoStreamMode.Read);
   byte[] buffer = new byte[1024];
   while (true)
   {
      int read = cs.Read(buffer, 0, buffer.Length);
      if (read == 0) break;
      sOut.Write(buffer, 0, read);
   }
   cs.Close(); // closes sIn, no need to call Clear
   sOut.Close();
}

The CreatePassword method is similar to (but not the same as) the method given in the last section and I will show it in a moment. This method will create a key from a pass phrase and an array of salt. If you pass in a null array for the salt (as in this case) the method will generate random values and return the array. Once the key is generated, it is used to initialize the algorithm. I do not create an initialization vector; instead I allow the Rijndael constructor to do this. The code then writes the salt and the IV to the output file. It is important that the corresponding Decrypt code knows the size of these values. So the length of the IV is obtained from the algorithm, and the salt is the same length as the key (KeySize is given in bits).

Next the code creates a CryptoStream based on the input stream and the transform's encryptor. The CryptoStream has a Read mode, meaning that the data is encrypted as it is read from the input stream. Finally, the data is read (and encrypted) a kilobyte at a time and then written out to the output stream. The CryptoStream is then closed, which will also close the underlying stream.

Decrypt is similar with the difference that it must read the salt and IV from the file. Here is the code:

static void Decrypt(Stream sIn, Stream sOut, string password)
{
   Rijndael r = Rijndael.Create();
   byte[] salt = new byte[r.KeySize >> 3];
   byte[] pass = null;
   sIn.Read(salt, 0, salt.Length);
   CreatePassword(password, r.KeySize >> 3, r.KeySize >> 3, out pass, ref salt);
   r.Key = pass;
   byte[] IV = new byte[r.IV.Length];
   sIn.Read(IV, 0, IV.Length);
   r.IV = IV;
   CryptoStream cs = new CryptoStream(sOut, r.CreateDecryptor(), CryptoStreamMode.Write);
   byte[] buffer = new byte[1024];
   while (true)
   {
      int read = sIn.Read(buffer, 0, buffer.Length);
      if (read == 0) break;
      cs.Write(buffer, 0, read);
   }
   cs.FlushFinalBlock();
   sIn.Close();
   cs.Close(); // closes sOut, no need to call Clear
}

First the salt is read (and assumed to be the same size as the key) and then CreatePassword is called and the returned key is used to initialize the algorithm's key. Next, the IV is read from the file and used to initialize the algorithm. Finally, the CryptoStream is created from the output stream and the decryptor in the Write mode. The data is read a kilobyte at a time from the input stream and written through the CryptoStream which decrypts the data before writing it to the output stream. Since the read action of the input stream determines whether the while block finishes the code must call FlushFinalBlock to make sure that the final decrypted block is written to the output file.

Here is the CreatePassword method:

static void CreatePassword(string phrase, int saltSize, int passSize, out byte[] pass, ref byte[] salt)
{
   if (salt == null)
   {
      RandomNumberGenerator rand = RandomNumberGenerator.Create();
      salt = new byte[saltSize];
      rand.GetBytes(salt);
   }
   PasswordDeriveBytes pdb = new PasswordDeriveBytes(phrase, salt);
   pass = pdb.GetBytes(passSize);
}

Compile this code and test it. First, encrypt a text file (for example the source code of the utility). Open the resultant file in Notepad to convince you that the data is encrypted. Then call the utility again, to decrypt the file you have just created, and use a new name for the decrypted file. Finally, compare the two files:

encrypt encrypt.cs encrypt.enc secret e
encrypt encrypt.enc encrypt_new.cs secret d
comp encrypt.cs encrypt_new.cs /D /L

Here, encrypt.cs and encrypt_new.cs are the cleartext and decrypted cyphertext files and the /L /D switches will give the location of any differences and display the differences in decimal. The utility should indicate that the files are the same.
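The salt, IV, cyphertext layout used by the utility can also be tried out entirely in memory. The sketch below is not the utility itself: the class name PersistDemo, the helper ReadFully, the 16-byte salt and the iteration count are assumptions for illustration, and it swaps in the Aes and Rfc2898DeriveBytes classes in place of Rijndael and PasswordDeriveBytes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class PersistDemo
{
   const int SaltSize = 16;      // illustrative salt length in bytes
   const int Iterations = 10000; // illustrative iteration count

   static byte[] DeriveKey(string password, byte[] salt, int keyBytes)
   {
      using (Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(password, salt, Iterations))
      {
         return kdf.GetBytes(keyBytes);
      }
   }

   // Fills the buffer completely; a single Read may return fewer bytes.
   static void ReadFully(Stream s, byte[] buffer)
   {
      int offset = 0;
      while (offset < buffer.Length)
      {
         int read = s.Read(buffer, offset, buffer.Length - offset);
         if (read == 0) throw new EndOfStreamException("Header is truncated");
         offset += read;
      }
   }

   // Output layout: salt, then IV, then cyphertext.
   public static byte[] Encrypt(byte[] clear, string password)
   {
      using (Aes aes = Aes.Create())
      {
         byte[] salt = new byte[SaltSize];
         using (RandomNumberGenerator rand = RandomNumberGenerator.Create())
         {
            rand.GetBytes(salt);
         }
         aes.Key = DeriveKey(password, salt, aes.KeySize >> 3);

         MemoryStream output = new MemoryStream();
         output.Write(salt, 0, salt.Length);
         output.Write(aes.IV, 0, aes.IV.Length); // random IV created by the algorithm
         using (ICryptoTransform en = aes.CreateEncryptor())
         using (CryptoStream cs = new CryptoStream(output, en, CryptoStreamMode.Write))
         {
            cs.Write(clear, 0, clear.Length);
            cs.FlushFinalBlock();
         }
         return output.ToArray(); // ToArray still works once the stream is closed
      }
   }

   public static byte[] Decrypt(byte[] stored, string password)
   {
      using (Aes aes = Aes.Create())
      using (MemoryStream input = new MemoryStream(stored))
      {
         byte[] salt = new byte[SaltSize];
         byte[] iv = new byte[aes.BlockSize >> 3];
         ReadFully(input, salt);
         ReadFully(input, iv);
         aes.Key = DeriveKey(password, salt, aes.KeySize >> 3);
         aes.IV = iv;

         MemoryStream output = new MemoryStream();
         using (ICryptoTransform de = aes.CreateDecryptor())
         using (CryptoStream cs = new CryptoStream(input, de, CryptoStreamMode.Read))
         {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = cs.Read(buffer, 0, buffer.Length)) > 0)
            {
               output.Write(buffer, 0, read);
            }
         }
         return output.ToArray();
      }
   }

   static void Main()
   {
      byte[] clear = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog.");
      byte[] stored = Encrypt(clear, "secret");
      Console.WriteLine("stored {0} bytes", stored.Length);
      Console.WriteLine(Encoding.ASCII.GetString(Decrypt(stored, "secret")));
   }
}
```

Because the salt and IV have fixed, known lengths, the decrypting code can peel them off the front of the data before it creates the decryptor.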

11.3 Cryptographic Hash Algorithms

We have already seen the application of a hash function: I presented the PasswordDeriveBytes class that generates a key from a pass phrase. The idea is that a pass phrase input from a user is unlikely to be the most secure because it will only contain characters from a limited character set (those available on the keyboard). So the PasswordDeriveBytes class overcomes this by repeatedly applying a hash function on the pass phrase - feeding the result of the hash back into the hash function.

Hash functions are one way, that is, you cannot calculate the original data from the hash. This is important because the hash can be made public without disclosing the data that created it. However, it is of little use if the hash is not sufficiently unique. This property of hashes, resistance to collisions, is very important.

Hash functions will return a value of a specific size; for example, SHA1 will return a 160-bit (20-byte) value. You could naively imagine that this means that such a hash could be generated from one of 2^160 (about 1.5 x 10^48) different input data. However, when you consider all the possible combinations of input data even this value is not large enough. Thus it is possible that a specific hash value can come from at least two different input data. The important point is that given a hash and the original data it should not be possible to predict another input data that would create the same hash. Furthermore, it is vital that two similar input data cannot produce the same hash.
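The digest size is easy to confirm. This is a minimal sketch; the class name DigestSizeDemo and the helper Sha1Of are illustrative names, and "abc" is simply a convenient, well-known test input.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class DigestSizeDemo
{
   // Returns the SHA1 digest of the ASCII bytes of the text.
   public static byte[] Sha1Of(string text)
   {
      using (SHA1 sha1 = SHA1.Create())
      {
         return sha1.ComputeHash(Encoding.ASCII.GetBytes(text));
      }
   }

   static void Main()
   {
      byte[] digest = Sha1Of("abc");
      // 160 bits is 20 bytes, whatever the length of the input.
      Console.WriteLine("{0} bytes ({1} bits)", digest.Length, digest.Length * 8);
      Console.WriteLine(BitConverter.ToString(digest));
      // The number of possible 160-bit values, roughly 1.46 x 10^48.
      Console.WriteLine(Math.Pow(2, 160));
   }
}
```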

All of the framework's hash classes derive from the abstract HashAlgorithm class. These classes are shown in the table below.

Class | Description | Implementation
MD5 | Message Digest algorithm 5, designed by Ron Rivest. 128 bit digest. It is now known that it is possible to generate two byte strings that generate the same MD5 hash. MD5 should not be used for new hashes. | MD5CryptoServiceProvider
RIPEMD160 | An implementation of the RACE Integrity Primitive Evaluation project's Message Digest 160-bit algorithm. (.NET 3.0/2.0) | RIPEMD160Managed
SHA1 | Secure Hash Algorithm, designed by the NSA; it is a US government standard. 160 bit digest. It is more secure than MD5, but some attacks have been suggested, but not proven. NIST plan to phase out SHA1 by 2010. | SHA1CryptoServiceProvider
SHA256 | SHA with 256 bit digest length | SHA256Managed
SHA384 | SHA with 384 bit digest length | SHA384Managed
SHA512 | SHA with 512 bit digest length | SHA512Managed

These classes are abstract and provide a static Create method that will return an instance of a concrete subclass:

SHA1 sha1 = SHA1.Create();

This creates an instance of the default class to be used for SHA1 and this default class is defined by the CryptoConfig class. Within CryptoConfig is a hash table (a collection object, not to be confused with cryptographic hashes) that maps the name of the algorithm (for example, SHA, SHA1, System.Security.Cryptography.SHA1) to the class that implements it (SHA1CryptoServiceProvider). This mapping is hard coded into the class and since the hash table is a private member you cannot change it. Hence you cannot change the implementation. The Implementation column in the table above shows the instances that will be returned from Create. Note that only two of these are implemented by the unmanaged CryptoAPI.

There are two ways to create a hash. The first is to call the ComputeHash method and pass a byte array or a stream:

byte[] hash;
using(FileStream fs = File.OpenRead("test.dat"))
{
   hash = sha1.ComputeHash(fs);
}
Console.WriteLine(BitConverter.ToString(hash));

From an earlier page you saw that HashAlgorithm implements ICryptoTransform and this means that an instance can be used to initialize a CryptoStream. This is the second way to create a hash: you can read (or write) through the CryptoStream object and the hash will be calculated. The ICryptoTransform implementation on these hash classes does not affect the data that is read or written, so a CryptoStream object based on them essentially has pass-through methods. To obtain the hash you need to access the Hash property.
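The pass-through behaviour can be sketched as follows. The class name StreamHashDemo and the helper HashViaStream are illustrative names; the sketch reads some bytes through a CryptoStream built on SHA256, keeps a copy of what comes out, and then picks up the digest from the Hash property.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class StreamHashDemo
{
   // Reads the data through a hashing CryptoStream. Returns the hash and,
   // through passedThrough, the bytes that came out of the stream.
   public static byte[] HashViaStream(byte[] data, out byte[] passedThrough)
   {
      using (SHA256 sha = SHA256.Create())
      using (MemoryStream source = new MemoryStream(data))
      using (CryptoStream cs = new CryptoStream(source, sha, CryptoStreamMode.Read))
      {
         MemoryStream copy = new MemoryStream();
         byte[] buffer = new byte[1024];
         int read;
         // The Read that returns zero is the one that finalizes the hash.
         while ((read = cs.Read(buffer, 0, buffer.Length)) > 0)
         {
            copy.Write(buffer, 0, read);
         }
         passedThrough = copy.ToArray();
         return sha.Hash;
      }
   }

   static void Main()
   {
      byte[] data = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog.");
      byte[] unchanged;
      byte[] viaStream = HashViaStream(data, out unchanged);

      byte[] direct;
      using (SHA256 sha = SHA256.Create())
      {
         direct = sha.ComputeHash(data);
      }

      Console.WriteLine("data unchanged: {0}",
         BitConverter.ToString(unchanged) == BitConverter.ToString(data));
      Console.WriteLine("hashes agree:   {0}",
         BitConverter.ToString(viaStream) == BitConverter.ToString(direct));
   }
}
```

Both comparisons should report True: the stream hands the data on untouched, and the hash it accumulates matches the one that ComputeHash gives for the same bytes.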

The class is designed like this to allow you to create a hash while you are encrypting a file. This is a process called hash-and-encrypt and it relies on the assumption that it is not feasible to find two sets of data that have the same hash. The hash is appended to the end of the data and is encrypted with it. An attacker could change some bytes in the cyphertext that corresponds to the data. This change will be picked up when the cyphertext is decrypted because a hash taken on the decrypted data will not agree with the hash appended to it.

To see how this works use the code for symmetric key encryption shown above (enh.cs):

using System;
using System.Security.Cryptography;
using System.Text;
using System.IO;

class App
{
   static void Main()
   {
      Rijndael r = Rijndael.Create();
      string phrase = "daisy, daisy, give me your answer to";
      PasswordDeriveBytes pdb = new PasswordDeriveBytes(phrase, new byte[0]);
      r.Key = pdb.GetBytes(r.KeySize>>3);
      string data = "The quick brown fox jumps over the lazy dog.";
      ICryptoTransform en = r.CreateEncryptor();
      byte[] input = Encoding.ASCII.GetBytes(data);
      input = CryptoTransform(input, en);
      Console.WriteLine(BitConverter.ToString(input));
      ICryptoTransform de = r.CreateDecryptor();
      byte[] output = CryptoTransform(input, de);
      Console.WriteLine(Encoding.ASCII.GetString(output));
   }
   static byte[] CryptoTransform(byte[] input, ICryptoTransform en)
   {
      MemoryStream sInput = new MemoryStream(input);
      MemoryStream sOutput = new MemoryStream();
      CryptoStream cs = new CryptoStream(sOutput, en, CryptoStreamMode.Write);
      byte[] buffer = new byte[1024];
      while(true)
      {
         int read = sInput.Read(buffer, 0, buffer.Length);
         if (read == 0) break;
         cs.Write(buffer, 0, read);
      }
      cs.FlushFinalBlock();
      cs.Clear();
      return sOutput.ToArray();
   }
}

This takes a pass phrase and then creates a key. Then it uses this key to encrypt some text, dumps the cyphertext to the console before decrypting the data and printing out the result. CryptoTransform performs the encryption and decryption using the encryptor or decryptor passed as a parameter. To add hash-and-encrypt requires a few changes:

static byte[] CryptoTransform(byte[] input, ICryptoTransform en, bool bEncrypt)
{
   MemoryStream sInput = new MemoryStream(input);
   MemoryStream sOutput = new MemoryStream();
   CryptoStream cs = new CryptoStream(sOutput, en, CryptoStreamMode.Write);
   SHA256 sha256 = SHA256.Create();
   Stream data = null;
   if (bEncrypt)
      data = new CryptoStream(sInput, sha256, CryptoStreamMode.Read);
   else
      data = sInput;
   byte[] buffer = new byte[1024];
   while(true)
   {
      int read = data.Read(buffer, 0, buffer.Length);
      if (read == 0) break;
      cs.Write(buffer, 0, read);
   }
   if (bEncrypt)
      cs.Write(sha256.Hash, 0, sha256.Hash.Length);
   cs.FlushFinalBlock();
   if (!bEncrypt)
   {
      byte[] hash = new byte[sha256.HashSize >> 3];
      sOutput.Position = sOutput.Length - hash.Length;
      sOutput.Read(hash, 0, hash.Length);
      sOutput.Position = 0;
      sOutput.SetLength(sOutput.Length - hash.Length);
      byte[] newHash = sha256.ComputeHash(sOutput);
      for (int x = 0; x < hash.Length; ++x)
      {
         if (hash[x] != newHash[x])
            throw new CryptographicException("Data is corrupt!");
      }
   }
   // Clear last: on some framework versions Clear also closes sOutput,
   // after which it can no longer be repositioned or read (ToArray still works).
   cs.Clear();
   return sOutput.ToArray();
}

The first change is that there is a boolean that determines whether the code is being called to encrypt or decrypt the data. (Unfortunately, ICryptoTransform does not have a property to specify whether the object is an encryptor or a decryptor; if such a property were available then this boolean would not be needed and this code would be cleaner.) If the data is being encrypted then I create a CryptoStream object based on SHA256 and use this to read the data. The data is encrypted/decrypted as before by writing it through the CryptoStream object created on the crypto-transform interface. Once all of the data is processed I check to see if this is encryption, and if so, the hash is written through the encryption stream, that is, it is encrypted too. This means that the hash is appended to the end of the data and this new data is encrypted.

If the data is being decrypted the action is the same as previously. The difference is that the cleartext now contains the data with the hash appended. So the first thing to do is extract the hash:

byte[] hash = new byte[sha256.HashSize >> 3];
sOutput.Position = sOutput.Length - hash.Length;
sOutput.Read(hash, 0, hash.Length);

Next, I need to remove this hash from the data. This is actually very simple to do: all I need to do is tell the MemoryStream that it is a little bit shorter. After that, a hash is calculated on the data:

sOutput.Position = 0;
sOutput.SetLength(sOutput.Length - hash.Length);
byte[] newHash = sha256.ComputeHash(sOutput);

The final part of the code compares the newly calculated hash with the hash extracted from the decrypted cyphertext. If the two hashes disagree the data is corrupted. To get this code to compile you need to make a few minor adjustments to the Main method:

ICryptoTransform en = r.CreateEncryptor();
byte[] input = Encoding.ASCII.GetBytes(data);
input = CryptoTransform(input, en, true);
Console.WriteLine(BitConverter.ToString(input));
ICryptoTransform de = r.CreateDecryptor();
byte[] output = CryptoTransform(input, de, false);

Compile and run this code. You'll find that it works just the same as before. To see the hash in action introduce a deliberate corruption of the cyphertext:

input = CryptoTransform(input, en, true);
input[0] = 0;
Console.WriteLine(BitConverter.ToString(input));

Now run the code and you'll find that an exception is thrown. This mechanism has two properties. Firstly, it provides a way to check the integrity of the data: if the data is changed between encryption and decryption (for example, while the cyphertext is being transmitted) this is detected after the data is decrypted. The second property is authentication: the cyphertext is decrypted with the secret key known only to trusted personnel, and the hash proves the integrity of the encrypted data. Since the cyphertext can only be created by someone who has access to the secret key, this authenticates the sender of the data.

The framework provides classes to perform these authentication and integrity tests in one action. These are called keyed hash classes and they derive from the KeyedHashAlgorithm abstract class. The idea is that a secret key, known only to the party who sends the data and the party who receives the data, is combined with the data in such a way as to generate a message authentication code (MAC). The scheme relies on the key being kept secret. The attacker, who does not know the key, will be unable to create a valid MAC for the data she has tampered with.

Class Description
HMACMD5 Hash-based MAC based on MD5. The key can be any size and the MAC will be 128 bits. .NET 3.0/2.0
HMACRIPEMD160 Hash-based MAC based on RIPEMD160. The key can be any size and the MAC will be 160 bits. .NET 3.0/2.0
HMACSHA1 Hash-based MAC based on SHA1. The key can be any size and the MAC will be 160 bits.
HMACSHA256 Hash-based MAC based on SHA256. The key can be any size and the MAC will be 256 bits. .NET 3.0/2.0
HMACSHA384 Hash-based MAC based on SHA384. The key can be any size and the MAC will be 384 bits. .NET 3.0/2.0
HMACSHA512 Hash-based MAC based on SHA512. The key can be any size and the MAC will be 512 bits. .NET 3.0/2.0
MACTripleDES A 16- or 24-byte key is used with TripleDES (CBC mode, padding with zeros, IV of zeros) to create cyphertext. The last block (64 bits) is the MAC.

These classes create a MAC in two very different ways. MACTripleDES uses a secret key to encrypt the data and then uses part of the encrypted data (the last 8 bytes) as the MAC. The rest of the classes mix the key into the data and perform a hash; the key is then mixed into that hash and a second hash is performed to create the MAC. In this case the size of the hash determines the size of the MAC.
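To make the two-pass construction concrete, here is a minimal sketch of HMAC built by hand on top of SHA256 and compared with the library's HMACSHA256. The class and method names (HmacSketch, ManualHmacSha256) are my own and are not part of the framework, and the key and message are placeholders. This is for illustration only: in real code use the library class.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class HmacSketch
{
    // Hand-rolled HMAC-SHA256 (hypothetical helper, illustration only).
    public static byte[] ManualHmacSha256(byte[] key, byte[] message)
    {
        const int blockSize = 64; // SHA-256 consumes its input in 64-byte blocks
        using (SHA256 sha = SHA256.Create())
        {
            // Keys longer than one block are hashed first; shorter keys are zero-padded.
            if (key.Length > blockSize) key = sha.ComputeHash(key);
            byte[] inner = new byte[blockSize + message.Length];
            byte[] outer = new byte[blockSize + 32]; // 32 = SHA-256 hash size in bytes
            for (int i = 0; i < blockSize; ++i)
            {
                byte k = i < key.Length ? key[i] : (byte)0;
                inner[i] = (byte)(k ^ 0x36); // key mixed with the inner pad
                outer[i] = (byte)(k ^ 0x5c); // key mixed with the outer pad
            }
            // First hash: padded key followed by the message.
            Buffer.BlockCopy(message, 0, inner, blockSize, message.Length);
            byte[] innerHash = sha.ComputeHash(inner);
            // Second hash: padded key followed by the first hash.
            Buffer.BlockCopy(innerHash, 0, outer, blockSize, innerHash.Length);
            return sha.ComputeHash(outer);
        }
    }

    static void Main()
    {
        byte[] key = Encoding.ASCII.GetBytes("an example shared secret"); // placeholder key
        byte[] msg = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog.");
        byte[] manual = ManualHmacSha256(key, msg);
        using (HMACSHA256 hmac = new HMACSHA256(key))
        {
            byte[] library = hmac.ComputeHash(msg);
            // Compare the hand-built MAC with the library's result.
            Console.WriteLine(BitConverter.ToString(manual) == BitConverter.ToString(library));
        }
    }
}
```

If the sketch is right the two values agree, which shows that the MAC is simply a hash of a hash with the key folded in at both stages.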

The user has two choices: either call ComputeHash on the entire message in one go, or create a CryptoStream and pass the data through the keyed hash routine as it is read for another purpose. This is essentially the same as using the HashAlgorithm classes, with the additional responsibility of providing a key.
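The two choices can be sketched as follows, using HMACSHA256 as the keyed hash. The class and method names (KeyedHashDemo, MacInOneCall, MacWhileReading) and the key are hypothetical and chosen only for this example; the "other purpose" here is simply copying the data to a second stream.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class KeyedHashDemo
{
    // Choice 1: compute the MAC over the whole message in one call.
    public static byte[] MacInOneCall(byte[] key, byte[] message)
    {
        using (HMACSHA256 hmac = new HMACSHA256(key))
            return hmac.ComputeHash(message);
    }

    // Choice 2: compute the MAC as a side effect of reading the data.
    // Note that disposing the CryptoStream also closes the source stream.
    public static byte[] MacWhileReading(byte[] key, Stream source, Stream destination)
    {
        using (HMACSHA256 hmac = new HMACSHA256(key))
        using (CryptoStream cs = new CryptoStream(source, hmac, CryptoStreamMode.Read))
        {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = cs.Read(buffer, 0, buffer.Length)) > 0)
                destination.Write(buffer, 0, read); // the data passes through unchanged
            return hmac.Hash; // available once the stream has been read to the end
        }
    }

    static void Main()
    {
        byte[] key = Encoding.ASCII.GetBytes("an example shared secret"); // placeholder key
        byte[] msg = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog.");
        MemoryStream copy = new MemoryStream();
        Console.WriteLine(BitConverter.ToString(MacInOneCall(key, msg)));
        Console.WriteLine(BitConverter.ToString(MacWhileReading(key, new MemoryStream(msg), copy)));
    }
}
```

Both lines printed by Main should be identical. The receiver repeats the calculation with the same key and compares the result with the MAC that accompanied the message.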

11.4 Base64 Encoding

I mentioned earlier that two of the classes that implement ICryptoTransform are ToBase64Transform and FromBase64Transform. These classes are used to encode rather than encrypt data. Encryption uses an encryption key to transform the data into a form that is secure from unauthorised personnel. Encoding does not use a key, and as such anyone can convert the data back into the decoded form. You might ask what is the point of encoding? Well, it transforms the data into a format that is either more acceptable, or in some cases more usable, by other code.

For example, Internet news and mail are based on text (RFC822), and specifically on a limited range of characters. Binary data, like images, music or executables, will contain bytes that are outside of this range, and so binary data cannot be placed directly in news or email messages. To get round this restriction the news and mail protocols allow the user to provide binary data as MIME (Multipurpose Internet Mail Extensions, RFC2045, RFC2046, RFC2047, RFC2048 and RFC2049) attachments. The MIME RFC specifies that base64 encoding should be used to encode binary data into the 7 bit character set used by mail messages, and this ensures that mail transfer agents will not alter the data during transmission. The base64 algorithm is simple: it converts each group of 3 binary octets into a group of 4 text octets. For binary data whose length is not exactly divisible by three, base64 defines a padding scheme.
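The grouping and padding rules are easiest to see with very short inputs. The following sketch uses Convert.ToBase64String rather than the transform classes; the class and method names (Base64PaddingDemo, Encode) are hypothetical.

```csharp
using System;
using System.Text;

static class Base64PaddingDemo
{
    // Base64 encode the ASCII bytes of a string (hypothetical helper).
    public static string Encode(string asciiText)
    {
        return Convert.ToBase64String(Encoding.ASCII.GetBytes(asciiText));
    }

    static void Main()
    {
        Console.WriteLine(Encode("Man")); // 3 bytes: one full group, no padding -> TWFu
        Console.WriteLine(Encode("Ma"));  // 2 bytes: one '=' pads the group     -> TWE=
        Console.WriteLine(Encode("M"));   // 1 byte:  two '=' pad the group      -> TQ==
    }
}
```

In every case the output length is a multiple of four characters, so the encoded form is roughly a third larger than the binary data.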

Since ToBase64Transform and FromBase64Transform implement ICryptoTransform they can be used to initialise a CryptoStream object. This means that you can chain these classes to apply automatic base64 encoding and decoding as part of the encryption and decryption code. Here is an altered version of the CryptoTransform code that I showed above.

static byte[] CryptoTransform(byte[] input, ICryptoTransform en, bool bEncrypt)
{
   MemoryStream sInput = new MemoryStream(input);
   MemoryStream sOutput = new MemoryStream();
   CryptoStream cs;
   if (bEncrypt)
   {
      cs = new CryptoStream(sOutput, new ToBase64Transform(), CryptoStreamMode.Write);
      cs = new CryptoStream(cs, en, CryptoStreamMode.Write);
   }
   else
   {
      cs = new CryptoStream(sOutput, en, CryptoStreamMode.Write);
      cs = new CryptoStream(cs, new FromBase64Transform(), CryptoStreamMode.Write);
   }
   byte[] buffer = new byte[1024];
   while(true)
   {
      int read = sInput.Read(buffer, 0, buffer.Length);
      if (read == 0) break;
      cs.Write(buffer, 0, read);
   }
   cs.FlushFinalBlock();
   cs.Clear();
   return sOutput.ToArray();
}

This method takes a parameter that indicates whether the method is called to encrypt or decrypt data. If the data is being encrypted then the CryptoStream is created so that a call to Write will first encrypt the data and then base64 encode it. If the data is being decrypted the CryptoStream is created so that it is base64 decoded first and then decrypted. If you add this to the previous example you can change the calling code like this:

ICryptoTransform en = r.CreateEncryptor();
input = CryptoTransform(input, en, true);
Console.WriteLine(Encoding.ASCII.GetString(input));

ICryptoTransform de = r.CreateDecryptor();
byte[] output = CryptoTransform(input, de, false);
Console.WriteLine(Encoding.ASCII.GetString(output));

If you run this code you will get output like the following:

oFFpfcMp/ZtPUJFuYt5AB4sjszhfOodtkv9YGK9NDTQqyAJVs/KsOgAfjuwSP5xe
The quick brown fox jumps over the lazy dog.

The first line is the base64 encoded encrypted data, which is suitable to attach to an email or news message.

There are several issues surrounding these two transform classes. The first issue is that you must create a CryptoStream object to use them, yet base64 encoding and decoding is so widespread that it really warrants a standalone base64 stream class. I mentioned above that the MIME RFC indicates that base64 attachments should have lines no longer than 76 characters. However, the CryptoStream class and ToBase64Transform pay no attention to line length. So if you use ToBase64Transform you will have to write the data to some intermediate storage (like a MemoryStream) and then read through the transformed data and add newlines in appropriate places. This will involve allocating at least one more buffer.

FromBase64Transform will strip out whitespace from the data passed to it, but this is unnecessary because it is implemented with Convert.FromBase64CharArray, which ignores whitespace. This extra processing means lower performance.
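When the data fits in memory, one alternative that avoids the transform classes altogether is the Convert.ToBase64String overload that takes Base64FormattingOptions.InsertLineBreaks (available from .NET 2.0), which breaks the output after every 76 characters. This is only a sketch of that option, not a replacement for a streaming solution, and the names Base64LineDemo and EncodeWrapped are hypothetical.

```csharp
using System;

static class Base64LineDemo
{
    // Encode with a line break after every 76 characters (hypothetical helper).
    public static string EncodeWrapped(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    static void Main()
    {
        byte[] data = new byte[100];
        for (int i = 0; i < data.Length; ++i) data[i] = (byte)i;

        string wrapped = EncodeWrapped(data);
        Console.WriteLine(wrapped);

        // Decoding ignores the inserted line breaks, so the data round-trips.
        byte[] roundTrip = Convert.FromBase64String(wrapped);
        Console.WriteLine(roundTrip.Length);
    }
}
```

The drawback is the one described above: the whole encoded string is built in memory, so this does not help when the data is being produced by a chain of streams.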

Furthermore, Convert.FromBase64CharArray (used by FromBase64Transform) takes an array of Char and Convert.ToBase64CharArray (used by ToBase64Transform) returns a Char array but the methods that use them (TransformBlock and TransformFinalBlock) handle Byte arrays. This means that there will always be a call to Encoding.ASCII to convert between these two array types. This involves more array allocation and iteration through the values in the various input buffers: yet more CPU cycles are burned.

Since so many temporary buffers are used this means that a lot of copying must occur between all of these buffers. The library code does make a concession to optimisation here because instead of using the generic Array.Copy routine the library methods use the Buffer class. Array.Copy and Buffer.BlockCopy are internalcall, which means that the IL is not available, but the files in Rotor (the Shared Source CLI) show that Array.Copy performs various tests on the array type to see if the data can be copied and if so, how the data should be copied, before it copies the data. The Buffer.BlockCopy method just assumes that the data can be copied. In both cases an array of bytes copied to another array of bytes will be performed by accessing the interior pointer to the two arrays and doing (what is essentially) a call to memmove. However, although these base64 methods make this concession to performance, better performance can be obtained if the code is designed not to perform the allocations and copies.
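For reference, the two copy routines are called like this for byte arrays. The sketch only shows the calling conventions and does not attempt to measure the performance difference; the class and method names (CopyDemo, CopyWithBuffer, CopyWithArray) are hypothetical.

```csharp
using System;

static class CopyDemo
{
    // Buffer.BlockCopy: offsets and count are expressed in bytes.
    public static byte[] CopyWithBuffer(byte[] source)
    {
        byte[] dest = new byte[source.Length];
        Buffer.BlockCopy(source, 0, dest, 0, source.Length);
        return dest;
    }

    // Array.Copy: offsets and count are expressed in elements, and the
    // routine checks that the array types are compatible before copying.
    public static byte[] CopyWithArray(byte[] source)
    {
        byte[] dest = new byte[source.Length];
        Array.Copy(source, 0, dest, 0, source.Length);
        return dest;
    }

    static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5 };
        Console.WriteLine(BitConverter.ToString(CopyWithBuffer(data)));
        Console.WriteLine(BitConverter.ToString(CopyWithArray(data)));
    }
}
```

For byte arrays the byte count and the element count are the same number, which is why the two calls look identical here; for arrays of wider types the Buffer.BlockCopy arguments must be scaled by the element size.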

As you can see this is a huge catalogue of issues and I am very surprised that Microsoft have not addressed them. So I decided that I would create my own base64 stream class. The code and description can be found here.

I hope that you enjoy this tutorial and value the knowledge that you will gain from it. I am always pleased to hear from people who use this tutorial (contact me). If you find this tutorial useful then please also email your comments to mvpga@microsoft.com.

Errata

If you see an error on this page, please contact me and I will fix the problem.

This page is (c) 2007 Richard Grimes, all rights reserved