.NET Security Workshop
Home About Workshops Articles Writing Talks Books Contact

10 Cryptography

Cryptography is the general term for the mechanisms to encrypt and decrypt data and for the techniques to try and break those codes. Encryption applies some algorithm on data - the cleartext - to produce the encrypted data - the cyphertext. Typically, the algorithms used for the encryption and decryption are well known and this is important because it means that the algorithms have been tested to make sure they are secure - security by obscurity is not real security. The cyphertext is protected because it can only be decrypted using a key that is known only by authorised personnel. This means that encryption is used to ensure that data can only be viewed by authorised personnel, that is, personnel that have access to the right key. This can be utilised in other ways: if Bob has Alice's key, and he knows for sure that it is her key, if this key will decrypt message it means that the message must have come from Alice - this is an authentication mechanism. There are issues performing this with symmetric algorithms, where the same key is used to encrypt and decrypt because this key must be distributed in a secure way. However, there are asymmetric algorithms, where one key encrypts and another decrypts and so one of these can be made purposely public and the other kept extremely private. This enables digital signatures where you can authenticate a message to determine who sent it, or to ensure that the message was not altered during transmission. As you can see, cryptography offers many facilities in addition to the obvious one of keeping data secret.

10.1 Terminology

There are two forms of cryptography, each with their own advantages and disadvantages. Symmetric cryptography uses the same key to decrypt as the one that was used to encrypt the data. Typically, symmetric algorithms are fast, but they have the disadvantage that the key must be kept secret between the users. If the key is compromised then all the encrypted data is compromised. This means that key distribution becomes important - you must create a mechanism to distribute the key in a secure way so that it is delivered to the right person and the mechanism must ensure that the key holder keeps the key secure.

Asymmetric cryptography (also called public key cryptography) uses a pair of keys. One key is used to encrypt the data and the other will decrypt it. Although the keys are mathematically related, there is no way in a feasible amount of time that you can deduce one key from the other. Public key cryptography is based on the mathematics of very large prime numbers and consequently the algorithms can be quite slow. A typical cryptographic solution will use symmetric key cryptography to encrypt a large body of data, because of the speed of the algorithms, but it will use public key cryptography to encrypt the symmetric key for distribution because of the convenience of the using public key distribution.

Clearly the design of a crypto algorithm is important to the security of cyphertext, a simple Caesar Substitution algorithm is far less secure than DES (Data Encryption Standard). However, there is another factor involved, and this is the key length. The longer the key the more difficult it is for an attacker to break the encryption. The key length determines how many possible keys could be used. More keys means that it will take longer to try all of them in a brute force attack. (A brute force attack is the simplest way to break encryption, but it is only one of many.) How long an attack will take depends on the speed of the machines performing the attack, so commentators usually look at the performance of current machines and then apply Moore's Law (that the number of transistors in a processor - and presumably the performance - doubles every 18 months) to guess at the speed of future machines. The current thinking is that for symmetric algorithms 128 bit keys offers sufficient protection, that is, for the foreseeable future computers will not be able to perform a brute force attack in a feasible amount of time.

Asymmetric algorithms are a different situation because the two keys are mathematically related. RSA Security (one of the inventors of the first public key algorithm) suggest that a 3072 bit RSA key is equivalent to a 128 bit symmetric key. They suggest that a 1024 bit key is equivalent to an 80 bit symmetric key which they say will be sufficiently secure until 2010. The following table is reproduced from Bruce Schneier's Applied Cryptography (ISBN 0-471-11709-9), and this shows the symmetric and asymmetric key lengths that have equivalent resistance to brute force attack. The figures do not quite agree with the figures given by RSA, so when choosing a public key length it is prudent to be more conservative.

Symmetric Key Length Asymmetric Key Length
56 bits 384 bits
64 bits 512 bits
80 bits 768 bits
112 bits 1792 bits
128 bits 2304 bits

In addition to the key length, the actual contents of the key is also important. A typical scenario is to use a session key, that is, generate a random symmetric key to encrypt the data, and then encrypt this with the public key of the user, so that the user can use their private key to extract the session key and then decrypt the data. The generation of such session keys means that a reliable random number generator must be used. Functions like the CRT rand function do not provide a truly random number and a cracker can use this to break the code.

If you are encrypting data then you will use a symmetric algorithm. Once you have chosen the algorithm and generated a suitable key you need to decide the mode of the encryption. There are two types: block cyphers and stream cyphers. Block cyphers operate on blocks of data, often 64 bits but it could be longer; stream cyphers operates on streams of data one bit or byte at a time. With a stream cypher, since a bit or byte is so much smaller than the key, it means that when encrypted with the same key will produce a different result on different occasions. With a block cypher, the same block encrypted with the same key, will create the same cyphertext. This gives a cracker a starting point and so block cyphers use feedback and other operations to remove this predictability. This leads to several modes for block cyphers: electronic codebook (ECB), cypher block chaining (CBC), cypher feedback mode (CFB), cypher text stealing (CTS) and output feedback (OFB).

The name ECB comes from the possibility with block cyphers that it is possible to generate a codebook of cyphertext and cleartext blocks. If a cracker has access to encrypted data and cleartext then she can generate the codebook without access to the key. However, before you think that ECB is insecure, bear in mind that for 64 bit blocks the code book will have 264 entries and that there will have to be one code book for every key. However, the same feature can be used to advantage because it means that the blocks of data do not have to be encrypted in the order that they appear in the cleartext and so algorithms can use parallelization techniques. Furthermore, if the cyphertext is corrupted during transmission then the corrupted bits in the cyphertext affect only the associated clear text block: no other part of the message is affected. Since not all cleartext will be an exact number of blocks in length the data must be padde and, the padding must make it clear which bytes are padding bytes and not cleartext so that when the cyphertext is decrypted the padding bytes can be discarded.

CBC mode uses a feedback mechanism to try and get around the code book issues of block cyphers. In this case the result of the encryption of the last block is XORed to the next cleartext block before encryption. This means that if you encrypt several identical blocks the resultant cyphertext blocks will not be the same. The encryption and decryption much be performed in opposite order and the order of the blocks become important: you cannot use parallelization techniques. Furthermore, there must be some initial block that will be used to encrypt the first block and is called an initialization vector (IV). An IV must be chosen carefully because if the same IV is used for two messages encrypted with the same key, then the cyphertext block will be the same. To get round this issue the IV should be random, but note that it need not be secret, the reason is that every block in the cyphertext is effectively the IV for the next block. The security is in the key, not in the IV. With CBC mode errors in cyphertext block will translate, during decryption, to a completely garbled block, and to an error in the next block too. If a cyphertext block is lost or added then all subsequent blocks are affected, thus your application must be able to detect if a malicious attacker has added an extra block.

In CBC the encryption cannot begin until a complete block has been received: this is an inherent problem with block cyphers. However, there are mechanisms to make a block cypher act upon streamed data. One such mode is cypher feedback (CFB). With this mode, the data can be encrypted in units smaller than the block because it uses a shift register that represents the block that will be encryped. Initially this shift register contains the IV, but as data arrives it is shifted into the register. This register is XORed with the previous encrypted block before encryption. In many respects, CFB is similar to CBC, but error propagation is different because the shifting means that a garbled byte will get shifted through the register and hence affect many blocks. For a n-bit CFB mode using m-bit block size, a single error will affect m/n-1 blocks, for example, an 8-bit CFB mode with 64-bit blocks an error will affect 9 bytes of decrypted cleartext.

Another mode that uses a shift register is output feedback mode, however, it is different in how the encryption occurs. In OFB the block cypher is used to generate a pseudorandom stream of bytes (called the key stream) from the previous cyphertext and this stream is then XORed with the cleartext to generate the cyphertext. An IV is used to generate the first key stream. This scheme has the advantage that no padding is required and that a single error in the cyphertext will only result in a single error in the decrypted cleartext, so no error propagation occurs. However, lost blocks will result in gibberish from decrypting subsequent blocks. The feedback size should be the same size as the block size.

The final mode I will mention here is called cyphertext stealing. This is like CBC except that it does not need padding. If the cleartext is an exact number of blocks then CTS is the same as CBC. If the cleartext is any other size then the data is padded with zeros and CBC is applied. Then the last two blocks are exchanged and the result is truncated to the size of the cleartext.

Many cryptographic techniques depend upon one way functions. As the name suggest, the function is not reversible: if you perform a one way function upon some data it is not possible to generate the original data from the result. Such one way functions are used for hash functions where the function creates a unique result for the input data. Thus, given two different documents the hashes created will be different and it is not possible to derive the original document from either hash. The uniqueness of the hash is just as important as the one-way feature. Hashes combined with public key cryptography are used to generate digital signatures.

Public key cryptography relies on the public key being made available but verifiably authentic. If you encrypt a session key with a user's public key, then only the user's private key can decrypt the cyphertext and get the session key, but you must be confident that you are using the user's public key. Certificates are a mechanism to give you trust in a key. A certificate is essentially a public key for someone who has been verified by a trusted organization called a certificate authority (CA). The CA shows that the user is verified by signing the certificate, essentially by creating a hash of the key and signing it with the CA's private key. The CA itself can be authenticated by another authority, and its certificate will be signed by another authority. This creates a chain of trust and sometimes when you receive a certificate you need to walk this chain until you arrive at a CA that you trust.

Finally, in this round up of terms, it is important to point out that asymmetric keys can be created for either signing or for key exchange and one key cannot be used for the other.

10.2 .NET Cryptographic Classes

Almost all of .NET's cryptography classes are wrappers around Windows CryptoAPI, that is, they are managed wrappers around unmanaged classes. The exception is the Rijndael class which is implemented totally in managed code. The Rijndael algorithm was chosen in a public competition in 2001 by the National Institute of Standards and Technology (NIST) as the Advanced Encryption Standard (AES). In 2003 AES was approved by the US government for classified information. From all of this you should get the impression about the preferred algorithm to use for new projects! However, .NET provides other algorithms so that you can integrate with code that uses other algorithms.

There are several design aspects in the .NET framework library that you should be made aware of. The first point is that if you are going to apply a crypto algorithm on data that data is most likely obtained via a stream. For example, if you read data from a file then you'll have a FileStream and if you obtain the data from a socket then you'll use a NetworkStream. The data underlying those streams may not be stream based themselves (for example, FileStream is buffered) but your access to that data will be through a stream. The .NET cryptographic library provides a class called CryptoStream that derives from Stream and hence can be used polymorphically where any other stream is required. However, the most important aspect of CryptoStream is that it is based on another stream, which means that you are chaining streams together. So if you want to encrypt the contents of a file, you open the file to obtain a FileStream and then use it to create a CryptoStream. You can then use the stream API to read data from the file and encrypt it in one action.

CryptoStream is the only stream class in the framework that behaves in this way. This is a great pity, anyone who has used the standard C++ library will tell you how useful the stream operators can be. One area where a stream class would be immensely useful is in the encoding and decoding of base64 data: the code in the framework to do this is unashamedly block based. I have provided a stream class that will do this work, and this article explains what is wrong with the .NET framework code for base64 encoding and describes my class.

Of course, the CryptoStream must also be constructed using a cryptographic algorithm. So that you can use one of many algorithms this class takes a reference to an ICryptoTransform interface. The following table lists the public classes in the framework that implement this interface:

Class Description
CryptoAPITransform Represents a cryptographic algorithm that encrypts or decrypts data.
FromBase64Transform Converts data from base64
HashAlgorithm Creates a hash from the input data
RijndaelManagedTransform Transforms data using the Rijndael cryptographic algorithm
ToBase64Algorithm Converts data to base64

The ICryptoTransform interface converts one type of data into another type. This is why there are two classes for base64: one to encode to base64 and another to decode from base64. It is important to point out that the methods of ICryptoTransform are block based (TransformBlock, TransformFinalBlock) and are used to initialize a stream, by necessity this means that there must be some intermediate storage of data.

There is only one class for hash functions because hashes are one way, so you can only create a hash from the input data, you cannot create the data from the hash. Unlike the other transforms, data passed through a CryptoStream constructed from a HashAlgorithm is not changed, instead, once all the data is passed through the stream you extract the computed hash.

Notice that CryptoAPITransform is a generic class, it does not represent any specific cryptographic algorithm. Indeed, this class does not have any public constructors. To see how you can initialize a CryptoStream with one of the cryptographic algorithm we need to look a little deeper. All of the symmetric algorithms derive from a class called SymmetricAlgorithm, and hence do not derive from CryptoAPITransform nor implement ICryptoTransform. The SymmetricAlgorithm class has two interesting methods, one called CreateEncryptor and the other called CreateDecryptor. Both of these methods return an ICryptoTransform reference, DES, RC2 and TripleDES will return an instance of CryptoAPITransform (which has an internal constructor that those classes can call); the methods on the Rijndael will return an instance of RijndaelManagedTransform, which is an internal class in 1.1 (but public in .NET 3.0/2.0).

These are implementation details, of course, but as you can see the library has a far from obvious design. Indeed, there is another level of indirection that you must be made aware of. The framework provides implementations of symmetric algorithms in the classes: DES, RC2, Rijndael and TripleDES. However, these are all abstract classes, so you cannot create instances of them. Instead, each one has a static method called Create which will create an instance of another class. You'll find that the implementation of Create in Rijndael creates an instance of RijndaelManaged; DES creates DESCryptoServiceProvider; RC2 creates RC2CryptoServiceProvider and TripleDES creates TripleDESCryptoServiceProvider. All of these classes are subclasses of the class that creates them, and each one has a public default constructor. Of these, only RijndaelManaged is implemented as managed code, the others are implemented by the unmanaged Windows CryptoAPI.

The reason why Microsoft decided to use this rather odd scheme is that the abstract base class creates an instance of the default implementation. These abstract classes have an overload of Create that takes a string parameter which is the name of the implementation that you require. You are free to write your own version of, say, DES by deriving your class from the DES class and you can then register the class with the system. If you decide that your class is far better than the Microsoft implementation then you can change the type of the default class. The framework uses the CryptoConfig class to obtain the default class names from the machine.config file. Thus, you can change the machine.config file to contain the name of your new class.

Reiterating what I mentioned above, CryptoStream is based on an implementation of ICryptoTransform. The ICryptoTransform implementation provides an algorithm that encrypts or decrypts data. The CryptoStream instance can be created in either read or write mode, which indicates whether the transform is applied to data read from the underlying stream, or whether it is applies to data that is transformed and then written to the underlying stream. A CryptoStream instance can only support one mode, however, encryptors and decryptors can be used by either mode.

The framework provides four implementations of symmetric algorithms (he abstract classes DES, RC2, Rijndael and TripleDES), all of these have a CreateEncryptor and a CreateDecryptor method and so can be used to provide an ICryptoTransform to initialise a CryptoStream object. The framework also provides two implementations of asymmetric algorithms (the abstract classes are RSA and DSA) but neither of these classes provide a CreateEncryptor nor a CreateDecryptor method. This means that there are no corresponding ICryptoTransform classes associated with these algorithms. The reason is that CryptoStream is intended to be used to encrypt large blocks of data and public key (asymmetric) algorithms are not suited to this task. Public key algorithms are suited to encrypting smaller values in key exchange and creating digital signatures.

These uses of public key algorithms often require a certificate. The .NET framework library provides the X509Certificate class to allow you to get information about X509 certificates created from a file containing a DER ASN.1 formatted certificate and from files signed with such a certificate. The class will handle version 3 of X509 but the information that it returns are only the X509 version 1 fields.

.NET Version 3.0
The X509Certificate2 class extends the support to give information about the version 2 and version 3 fields of the X509 certificate. The framework also has support for accessing certificates from the system store, through code or through a dialog requiring user action. There is also extensive support for trust chains and revocation lists, and for X509 certificate extensions.

As mentioned earlier, signature generation uses asymmetric keys and the concrete asymmetric classes, RSACryptoServiceProvider and DSACryptoServiceProvider provide methods to create signatures. The framework also provides classes, based on RSA, to exchange keys. RSAPKCS1KeyExchangeFormatter class allows you to encrypt a symmetric session key with a public key whereas the RSAPKCS1KeyExchangeDeformatter class will decrypt the data with the private key to create the session key.

.NET Version 3.0
Version 3.0/2.0 of the framework has classes for Public Key Cryptography Standard (PKCS) and Cryptographic Message Standard (CMS, a superset of PKCS7). These classes provide a standard way to provide digitally signed and digitally enveloped (encrypted) messages.

Digital signatures depend on the generation of a hash and the framework provides classes derived from HashAlgorithm that implement the common hash algorithms: MD5 and SHA. In addition, the abstract class KeyedHashAlgorithm is a base class for hash algorithms that are based on a cryptographic key. In version 1.1 of the framework the only implementation of a keyed hash algorithm is based on Triple DES.

.NET Version 3.0
Version 3.0/2.0 of the framework has a hash class for RIPEMD and keyed hash algorithms based on MD5, RIPEMD and SHA.

The following pages cover in depth symmetric and asymmetric cryptography, key exchange and signatures.

I hope that you enjoy this tutorial and value the knowledge that you will gain from it. I am always pleased to hear from people who use this tutorial (contact me). If you find this tutorial useful then please also email your comments to mvpga@microsoft.com.

Errata

If you see an error on this page, please contact me and I will fix the problem.

Page Eleven

This page is (c) 2007 Richard Grimes, all rights reserved