# 10 Cryptography

Cryptography is the general term for the mechanisms to encrypt and decrypt data and for the techniques to try and break those codes. Encryption applies some algorithm on data - the cleartext - to produce the encrypted data - the cyphertext. Typically, the algorithms used for the encryption and decryption are well known and this is important because it means that the algorithms have been tested to make sure they are secure - security by obscurity is not real security. The cyphertext is protected because it can only be decrypted using a key that is known only by authorised personnel. This means that encryption is used to ensure that data can only be viewed by authorised personnel, that is, personnel that have access to the right key. This can be utilised in other ways: if Bob has Alice's key, and he knows for sure that it is her key, if this key will decrypt message it means that the message must have come from Alice - this is an authentication mechanism. There are issues performing this with symmetric algorithms, where the same key is used to encrypt and decrypt because this key must be distributed in a secure way. However, there are asymmetric algorithms, where one key encrypts and another decrypts and so one of these can be made purposely public and the other kept extremely private. This enables digital signatures where you can authenticate a message to determine who sent it, or to ensure that the message was not altered during transmission. As you can see, cryptography offers many facilities in addition to the obvious one of keeping data secret.

## 10.1 Terminology

There are two forms of cryptography, each with their own advantages and
disadvantages. *Symmetric cryptography* uses the same key to decrypt as
the one that was used to encrypt the data. Typically, symmetric algorithms are fast, but
they have the disadvantage that the key must be kept secret between the users. If the key is
compromised then all the encrypted data is compromised. This means that key
distribution becomes important - you must create a mechanism to distribute the
key in a secure way so that it is delivered to the right person and the
mechanism must ensure that the key holder keeps
the key secure.

*Asymmetric cryptography* (also called *public key cryptography*) uses a pair of keys. One key is
used to encrypt the data and the other will decrypt it. Although the keys are
mathematically related, there is no way in a feasible amount of time that you
can deduce one key from the other. Public key cryptography is based on the
mathematics of very large prime numbers and consequently the algorithms can be
quite slow. A typical cryptographic solution will use symmetric key
cryptography to encrypt a large body of data, because of the speed of the
algorithms, but it will use public key
cryptography to encrypt the symmetric key for distribution because of the
convenience of the using public key distribution.

Clearly the design of a crypto algorithm is important to the security of
cyphertext, a simple
Caesar Substitution algorithm is far less secure than DES (Data
Encryption Standard). However, there is another factor involved, and this
is the *key length*. The
longer the key the more difficult it is for an attacker to break the
encryption. The key length determines how many possible keys could be used.
More keys means that it will take longer to try all of them in a
*brute force attack*. (A brute force attack is the simplest way to break encryption, but it is only one of many.) How long an attack will
take depends on the speed of the machines performing the attack, so
commentators usually look at the performance of current machines and then
apply Moore's Law (that the number of transistors in a processor - and
presumably the performance - doubles every 18 months) to guess at the speed of
future machines. The current thinking is that for symmetric algorithms 128 bit
keys offers sufficient protection, that is, for the foreseeable future computers
will not be able to perform a brute force attack in a feasible amount of time.

Asymmetric algorithms are a different situation because the two keys are mathematically related. RSA Security (one of the inventors of the first public key algorithm) suggest that a 3072 bit RSA key is equivalent to a 128 bit symmetric key. They suggest that a 1024 bit key is equivalent to an 80 bit symmetric key which they say will be sufficiently secure until 2010. The following table is reproduced from Bruce Schneier's Applied Cryptography (ISBN 0-471-11709-9), and this shows the symmetric and asymmetric key lengths that have equivalent resistance to brute force attack. The figures do not quite agree with the figures given by RSA, so when choosing a public key length it is prudent to be more conservative.

Symmetric Key Length | Asymmetric Key Length |
---|---|

56 bits | 384 bits |

64 bits | 512 bits |

80 bits | 768 bits |

112 bits | 1792 bits |

128 bits | 2304 bits |

In addition to the key length, the actual contents of the key is also
important. A typical scenario is to use a *session key*, that is,
generate a random symmetric key to encrypt the data, and then encrypt this
with the public key of the user, so that the user can use their private key to
extract the session key and then decrypt the data. The generation of such
session keys means that a reliable *random number generator* must be
used. Functions like the CRT `rand`

function do not provide a truly
random number and a cracker can use this to break the code.

If you are encrypting data then you will use a symmetric algorithm. Once
you have chosen the algorithm and generated a suitable key you need to decide
the *mode* of the encryption. There are two types: *block cyphers*
and *stream cyphers*. Block cyphers operate on blocks of data, often 64
bits but it could be longer; stream cyphers operates on streams of data one
bit or byte at a time. With a stream cypher, since a bit or byte is so much
smaller than the key, it means that when encrypted with the same key will
produce a different result on different occasions. With a block cypher, the
same block encrypted with the same key, will create the same cyphertext. This
gives a cracker a starting point and so block cyphers use feedback and other
operations to remove this predictability. This leads to several modes for
block cyphers: *electronic codebook* (ECB), *cypher block chaining*
(CBC), *cypher feedback mode* (CFB), *cypher text stealing* (CTS)
and *output feedback* (OFB).

The name ECB comes from the possibility with block cyphers that it is
possible to generate a codebook of cyphertext and cleartext blocks. If a
cracker has access to encrypted data and cleartext then she can generate the
codebook without access to the key. However, before you think that ECB is
insecure, bear in mind that for 64 bit blocks the code book will have 2^{64}
entries and that there will have to be one code book for every key. However,
the same feature can be used to advantage because it means that the blocks of
data do not have to be encrypted in the order that they appear in the
cleartext and so algorithms can use parallelization techniques. Furthermore,
if the cyphertext is corrupted during transmission then the corrupted bits in
the cyphertext affect only the associated clear text block: no other part of
the message is affected. Since not all cleartext will be an exact number of
blocks in length the data must be padde and, the padding must make it clear
which bytes are padding bytes and not cleartext so that when the cyphertext is
decrypted the padding bytes can be discarded.

CBC mode uses a feedback mechanism to try and get around the code book
issues of block cyphers. In this case the result of the encryption of the last
block is XORed to the next cleartext block before encryption. This means that
if you encrypt several identical blocks the resultant cyphertext blocks will
not be the same. The encryption and decryption much be performed in opposite
order and the order of the blocks become important: you cannot use
parallelization techniques. Furthermore, there must be some initial block that
will be used to encrypt the first block and is called an *initialization
vector* (IV). An IV must be chosen carefully because if the same IV is used
for two messages encrypted with the same key, then the cyphertext block will
be the same. To get round this issue the IV should be random, but note that it
need not be secret, the reason is that every block in the cyphertext is
effectively the IV for the next block. The security is in the key, not in the
IV. With CBC mode errors in cyphertext block will translate, during
decryption, to a completely garbled block, and to an error in the next block
too. If a cyphertext block is lost or added then all subsequent blocks are
affected, thus your application must be able to detect if a malicious attacker
has added an extra block.

In CBC the encryption cannot begin until a complete block has been
received: this is an inherent problem with block cyphers. However, there are
mechanisms to make a block cypher act upon streamed data. One such mode is
cypher feedback (CFB). With this mode, the data can be encrypted in units
smaller than the block because it uses a shift register that represents the
block that will be encryped. Initially this shift register contains the IV,
but as data arrives it is shifted into the register. This register is XORed
with the previous encrypted block before encryption. In many respects, CFB is
similar to CBC, but error propagation is different because the shifting means
that a garbled byte will get shifted through the register and hence affect
many blocks. For a *n*-bit CFB mode using *m*-bit block size, a
single error will affect *m/n-1* blocks, for example, an 8-bit CFB mode
with 64-bit blocks an error will affect 9 bytes of decrypted cleartext.

Another mode that uses a shift register is output feedback mode, however,
it is different in how the encryption occurs. In OFB the block cypher is used
to generate a pseudorandom stream of bytes (called the *key stream*) from
the previous cyphertext and this stream is then XORed with the cleartext to
generate the cyphertext. An IV is used to generate the first key stream. This
scheme has the advantage that no padding is required and that a single error
in the cyphertext will only result in a single error in the decrypted
cleartext, so no error propagation occurs. However, lost blocks will result in
gibberish from decrypting subsequent blocks. The feedback size should be the
same size as the block size.

The final mode I will mention here is called cyphertext stealing. This is like CBC except that it does not need padding. If the cleartext is an exact number of blocks then CTS is the same as CBC. If the cleartext is any other size then the data is padded with zeros and CBC is applied. Then the last two blocks are exchanged and the result is truncated to the size of the cleartext.

Many cryptographic techniques depend upon *one way functions*. As the
name suggest, the function is not reversible: if you perform a one way
function upon some data it is not possible to generate the original data from
the result. Such one way functions are used for *hash functions* where
the function creates a unique result for the input data. Thus, given two
different documents the hashes created will be different and it is not
possible to derive the original document from either hash. The uniqueness of
the hash is just as important as the one-way feature. Hashes combined with
public key cryptography are used to generate *digital signatures*.

Public key cryptography relies on the public key being made available but
verifiably authentic. If you encrypt a session key with a user's public key,
then only the user's private key can decrypt the cyphertext and get the
session key, but you must be confident that you are using the user's public
key. Certificates are a mechanism to give you trust in a key. A certificate is
essentially a public key for someone who has been verified by a trusted
organization called a *certificate authority* (CA). The CA shows that the
user is verified by signing the certificate, essentially by creating a hash of
the key and signing it with the CA's private key. The CA itself can be
authenticated by another authority, and its certificate will be signed by
another authority. This creates a chain of trust and sometimes when you
receive a certificate you need to walk this chain until you arrive at a CA
that you trust.

Finally, in this round up of terms, it is important to point out that asymmetric keys can be created for either signing or for key exchange and one key cannot be used for the other.

## 10.2 .NET Cryptographic Classes

Almost all of .NET's cryptography classes are wrappers around Windows
CryptoAPI, that is, they are managed wrappers around unmanaged classes. The
exception is the `Rijndael`

class which is implemented totally in
managed code. The `Rijndael`

algorithm was chosen in a public
competition in 2001 by the National Institute of Standards and Technology (NIST)
as the Advanced Encryption Standard (AES). In 2003 AES was approved by the US
government for classified information. From all of this you should get the
impression about the preferred algorithm to use for new projects! However,
.NET provides other algorithms so that you can integrate with code that uses
other algorithms.

There are several design aspects in the .NET framework library that you
should be made aware of. The first point is that if you are going to apply a
crypto algorithm on data that data is most likely obtained via a stream. For
example, if you read data from a file then you'll have a `FileStream`

and if you obtain the data from a socket then you'll use a `NetworkStream`

.
The data underlying those streams may not be stream based themselves (for
example, `FileStream`

is buffered) but your access to that data
will be through a stream. The .NET cryptographic library provides a class
called `CryptoStream`

that derives from `Stream`

and
hence can be used polymorphically where any other stream is required. However,
the most important aspect of `CryptoStream`

is that it is based on
another stream, which means that you are chaining streams together. So if you want
to encrypt the contents of a file, you open the file to obtain a ```
FileStream
```

and then use it to create a `CryptoStream`

. You
can then use the stream API to read data from the file and encrypt it in one
action.

`CryptoStream` is the only stream class in the framework that behaves
in this way. This is a great pity, anyone who has used the standard C++ library
will tell you how useful the stream operators can be. One area where a stream
class would be immensely useful is in the encoding and decoding of base64 data:
the code in the framework to do this is unashamedly block based. I have provided
a stream class that will do this work, and
this article explains what is wrong
with the .NET framework code for base64 encoding and describes my class. |

Of course, the `CryptoStream`

must also be constructed using a
cryptographic algorithm. So that you can use one of many algorithms this class
takes a reference to an `ICryptoTransform`

interface. The following
table lists the public classes in the framework that implement this interface:

Class | Description |
---|---|

`CryptoAPITransform` |
Represents a cryptographic algorithm that encrypts or decrypts data. |

`FromBase64Transform` |
Converts data from base64 |

`HashAlgorithm` |
Creates a hash from the input data |

`RijndaelManagedTransform` |
Transforms data using the Rijndael cryptographic algorithm |

`ToBase64Algorithm` |
Converts data to base64 |

The `ICryptoTransform`

interface converts one type of data into
another type. This is why there are two classes for base64: one to encode to
base64 and another to decode from base64. It is important to point out that
the methods of `ICryptoTransform`

are *block* based (`TransformBlock`

,
`TransformFinalBlock`

) and are used to initialize a *stream*,
by necessity this means that there must be some intermediate storage of data.

There is only one class for hash
functions because hashes are one way, so you can only create a hash from the
input data, you cannot create the data from the hash. Unlike the other
transforms, data passed through a `CryptoStream`

constructed from a
`HashAlgorithm`

is not changed, instead, once all the data is
passed through the stream you extract the computed hash.

Notice that `CryptoAPITransform`

is a generic class, it does not
represent any specific cryptographic algorithm. Indeed, this class does not
have any public constructors. To see how you can initialize a ```
CryptoStream
```

with one of the cryptographic algorithm we need to look a
little deeper. All of the symmetric algorithms derive from a class called
`SymmetricAlgorithm`

, *and hence do not derive from
CryptoAPITransform nor implement ICryptoTransform*. The

`SymmetricAlgorithm`

class has two interesting methods, one
called `CreateEncryptor`

and the other called `CreateDecryptor`

.
Both of these methods return an `ICryptoTransform`

reference, ```
DES
```

, `RC2`

and `TripleDES`

will return an
instance of `CryptoAPITransform`

(which has an internal constructor
that those classes can call); the methods on the `Rijndael`

will
return an instance of `RijndaelManagedTransform`

, which is an
internal class in 1.1 (but public in .NET 3.0/2.0).These are implementation details, of course, but as you can see the library
has a far from obvious design. Indeed, there is another level of indirection
that you must be made aware of. The framework provides implementations of
symmetric algorithms in the classes: `DES`

, `RC2`

, ```
Rijndael
```

and `TripleDES`

. However, these are all abstract classes,
so you cannot create instances of them. Instead, each one has a static method
called `Create`

which will create an instance of another class. You'll find that the implementation of `Create`

in
`Rijndael`

creates an instance of `RijndaelManaged`

;
`DES`

creates `DESCryptoServiceProvider`

; `RC2`

creates `RC2CryptoServiceProvider`

and `TripleDES`

creates `TripleDESCryptoServiceProvider`

. All of these classes are
subclasses of the class that creates them, and each one has a public default
constructor. Of these, only `RijndaelManaged`

is implemented as
managed code, the others are implemented by the unmanaged Windows CryptoAPI.

The reason why Microsoft decided to use this rather odd scheme is that the
abstract base class creates an instance of the *default* implementation.
These abstract classes have an overload of `Create`

that takes a
string parameter which is the name of the implementation that you require. You are free to write your own version of, say, DES by deriving
your class from the
`DES`

class and you can then register the class with the system. If you decide that your class is far better than the
Microsoft implementation then you can change the type of the default class.
The framework uses the `CryptoConfig`

class to obtain the
default class names from the `machine.config`

file. Thus, you can
change the `machine.config`

file to contain the name of your new
class.

Reiterating what I mentioned above, `CryptoStream`

is based on
an implementation of `ICryptoTransform`

. The `ICryptoTransform`

implementation provides an algorithm that encrypts or decrypts data. The ```
CryptoStream
```

instance can be created in either read or write mode,
which indicates whether the transform is applied to data read from the
underlying stream, or whether it is applies to data that is transformed and then written to the
underlying stream. A `CryptoStream`

instance can only support one
mode, however, encryptors and decryptors can be used by either mode.

The framework provides four implementations of symmetric algorithms (he
abstract classes `DES`

, `RC2`

, `Rijndael`

and
`TripleDE`

S), all of these have a `CreateEncryptor`

and a `CreateDecryptor`

method and so can be used to provide an `ICryptoTransform`

to
initialise a `CryptoStream`

object. The framework also provides two implementations of asymmetric algorithms (the
abstract classes are `RSA`

and `DSA`

) but neither of these
classes provide a `CreateEncryptor`

nor a `CreateDecryptor`

method. This means that there are no corresponding `ICryptoTransform`

classes associated with these algorithms. The reason is that `CryptoStream`

is
intended to be used to encrypt large blocks of data and public key
(asymmetric) algorithms are not suited to this task. Public key algorithms are
suited to encrypting smaller values in key exchange and creating digital signatures.

These uses of public key algorithms often require a certificate. The .NET
framework library provides the `X509Certificate`

class to allow you
to get information about X509 certificates created from a file containing a
DER ASN.1 formatted certificate and from files signed with such a certificate.
The class will handle version 3 of X509 but the information that it returns
are only the X509 version 1 fields.

.NET Version 3.0The `X509Certificate2` class extends the support to give information
about the version 2 and version 3 fields of the X509 certificate. The framework
also has support for accessing certificates from the system store, through code
or through a dialog requiring user action. There is also extensive support for
trust chains and revocation lists, and for X509 certificate
extensions. |

As mentioned earlier, signature generation uses asymmetric keys and the
concrete asymmetric classes, `RSACryptoServiceProvider`

and ```
DSACryptoServiceProvider
```

provide methods to create signatures. The
framework also provides classes, based on RSA, to exchange keys. ```
RSAPKCS1KeyExchangeFormatter
```

class allows you to encrypt a symmetric
session key with a public key whereas the `RSAPKCS1KeyExchangeDeformatter`

class will decrypt the data with the private key to create the session key.

.NET Version 3.0Version 3.0/2.0 of the framework has classes for Public Key Cryptography Standard (PKCS) and Cryptographic Message Standard (CMS, a superset of PKCS7). These classes provide a standard way to provide digitally signed and digitally enveloped (encrypted) messages. |

Digital signatures depend on the generation of a hash and the framework
provides classes derived from `HashAlgorithm`

that implement the
common hash algorithms: MD5 and SHA. In addition, the abstract class ```
KeyedHashAlgorithm
```

is a base class for hash algorithms that are based
on a cryptographic key. In version 1.1 of the framework the only
implementation of a keyed hash algorithm is based on Triple DES.

.NET Version 3.0Version 3.0/2.0 of the framework has a hash class for RIPEMD and keyed hash algorithms based on MD5, RIPEMD and SHA. |

The following pages cover in depth symmetric and asymmetric cryptography, key exchange and signatures.

I hope that you enjoy this tutorial and value the knowledge that you will gain from it. I am always pleased to hear from people who use this tutorial (contact me). If you find this tutorial useful then please also email your comments to mvpga@microsoft.com. |

## Errata

If you see an error on this page, please contact me and I will fix the problem.