Introduction


Base64 encoding converts binary data into a text format so that it can be transported safely through environments that handle only text. Typical use cases are encoding UIDs for use in HTTP URLs, and encoding encryption keys and certificates so that they can be sent by e-mail, displayed in HTML pages, and copied and pasted.

Base64 is sometimes also referred to as PEM, which stands for Privacy-Enhanced Mail. There, Base64 was used to turn the binary data generated during the e-mail encryption process back into printable text.

How it works


Base64 encoding takes the original binary data and divides it into groups of three bytes. A byte consists of eight bits, so Base64 processes 24 bits at a time. These three bytes are then converted into four printable characters from the ASCII standard.

The first step is to take the three bytes (24 bits) of binary data and split them into four numbers of six bits each. ASCII defines seven bits per character, but many of its 128 codes are control or special characters. Base64 therefore uses only 6 bits (2^6 = 64 values) and maps each value to a character that is printable and safe to transport. The name Base64 comes from the use of these 64 ASCII characters: the 26 uppercase letters, the 26 lowercase letters, the digits 0-9, and the two extra characters '+' and '/'.
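The splitting step described above can be sketched in C as follows. This is a minimal illustration, not a reference implementation; the function name split_group is hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Split three input bytes into four 6-bit values, each in the range 0..63.
   split_group is a hypothetical helper name, not part of any library. */
void split_group(const uint8_t in[3], uint8_t out[4])
{
    /* Pack the three bytes into one 24-bit number, first byte on top. */
    uint32_t bits = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    out[0] = (uint8_t)((bits >> 18) & 0x3F);  /* bits 23..18 */
    out[1] = (uint8_t)((bits >> 12) & 0x3F);  /* bits 17..12 */
    out[2] = (uint8_t)((bits >>  6) & 0x3F);  /* bits 11..6  */
    out[3] = (uint8_t)( bits        & 0x3F);  /* bits  5..0  */
}
```

Packing the bytes into a single 32-bit integer first keeps the shifts uniform. Each output value is then used as an index into the 64-character table.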

Base64 Encoding/Decoding Table

Char:   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
Value:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Char:   Q  R  S  T  U  V  W  X  Y  Z  a  b  c  d  e  f
Value: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Char:   g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v
Value: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Char:   w  x  y  z  0  1  2  3  4  5  6  7  8  9  +  /
Value: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

In a program, this table can simply be defined as a character array. For example, in C:

/* ---- Base64 Encoding/Decoding Table --- */
char b64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

Technically, there is a 65th character '=' in use, but more about it further down.
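Because the table is a plain character array, it can be used in both directions. The following sketch shows one possible way to do so: indexing gives the character for a 6-bit value, and a linear search gives the value for a character. The function names b64_char and b64_value are assumptions made for this example, and the table is repeated so that the sketch compiles on its own.

```c
#include <string.h>

/* Same table as above, repeated so this sketch is self-contained. */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encoding direction: 6-bit value (0..63) -> character.
   Hypothetical helper name. */
char b64_char(unsigned value)
{
    return b64[value & 0x3F];
}

/* Decoding direction: character -> 6-bit value, or -1 if the character
   is not in the table. Hypothetical helper name. */
int b64_value(char c)
{
    const char *p;

    if (c == '\0')          /* strchr would match the string terminator */
        return -1;
    p = strchr(b64, c);
    return (p != NULL) ? (int)(p - b64) : -1;
}
```

The linear search is easy to follow but slow for large inputs. A 256-entry reverse lookup array is a common alternative when speed matters.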

This conversion of 3-byte (24-bit) groups into four ASCII characters is repeated until the whole sequence of original data bytes is encoded. So that the encoded data can be printed properly and does not exceed a mail server's line length limit, newline characters are inserted to keep each line at no more than 76 characters.
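The line-length rule can be applied as a separate pass over the already encoded text. The sketch below is one way to do this under simple assumptions: the function name wrap_lines is hypothetical, the caller supplies the line width (76 for the limit mentioned above), and a bare '\n' is used as the line break, although mail formats commonly use CR LF instead.

```c
#include <stddef.h>

/* Copy the encoded text `src` into `dst`, inserting '\n' after every
   `width` characters (width must be greater than 0).
   `dst` must have room for strlen(src) + strlen(src) / width + 1 chars.
   Returns the number of characters written, excluding the terminator.
   wrap_lines is a hypothetical helper name. */
size_t wrap_lines(const char *src, char *dst, size_t width)
{
    size_t col = 0;   /* characters on the current output line */
    size_t n = 0;     /* characters written to dst so far */

    for (; *src != '\0'; ++src) {
        if (col == width) {      /* current line is full: start a new one */
            dst[n++] = '\n';
            col = 0;
        }
        dst[n++] = *src;
        col++;
    }
    dst[n] = '\0';
    return n;
}
```

Because the break is written only when another character follows, the output never ends with a trailing newline.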

What happens when the last sequence of data bytes to encode is not exactly 3 bytes long? If the size of the original data in bytes is not a multiple of three, the final group contains only one or two (8-bit) bytes. The solution is to fill the missing bytes with zero bits to complete the final 3-byte group. One remaining byte then yields two encoded characters, and two remaining bytes yield three. The padding positions carry no data and must be distinguishable from a real encoded value of 0 (which is 'A'), so a 65th character, '=', is used to mark them. This brings the final output group back to four characters: one remaining byte ends in '==', two remaining bytes end in '='. Naturally, this character can only appear at the end of encoded data.
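Putting the table, the 6-bit split and the padding rule together, a complete encoder might look like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: the function name b64_encode and its signature are invented for this example, the caller must provide a large enough output buffer, and no line breaks are inserted.

```c
#include <stddef.h>
#include <stdint.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode `len` bytes from `in` into `out` as NUL-terminated Base64 text.
   `out` must hold at least 4 * ((len + 2) / 3) + 1 characters.
   Returns the number of characters written, excluding the terminator.
   b64_encode is a hypothetical name chosen for this sketch. */
size_t b64_encode(const uint8_t *in, size_t len, char *out)
{
    size_t i = 0, n = 0;
    uint32_t bits;

    /* Full 3-byte groups: four output characters each. */
    while (len - i >= 3) {
        bits = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8) | in[i + 2];
        out[n++] = b64[(bits >> 18) & 0x3F];
        out[n++] = b64[(bits >> 12) & 0x3F];
        out[n++] = b64[(bits >> 6) & 0x3F];
        out[n++] = b64[bits & 0x3F];
        i += 3;
    }

    if (len - i == 1) {
        /* One byte left: two data characters, then "==". */
        bits = (uint32_t)in[i] << 16;
        out[n++] = b64[(bits >> 18) & 0x3F];
        out[n++] = b64[(bits >> 12) & 0x3F];
        out[n++] = '=';
        out[n++] = '=';
    } else if (len - i == 2) {
        /* Two bytes left: three data characters, then "=". */
        bits = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8);
        out[n++] = b64[(bits >> 18) & 0x3F];
        out[n++] = b64[(bits >> 12) & 0x3F];
        out[n++] = b64[(bits >> 6) & 0x3F];
        out[n++] = '=';
    }

    out[n] = '\0';
    return n;
}
```

For instance, the single byte 'M' (77) encodes to "TQ==", the two bytes "Ma" to "TWE=", and the three bytes "Man" to "TWFu", which needs no padding.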

Example


Let's say we want to convert three bytes 155, 162 and 233. The corresponding 24-bit stream is 100110111010001011101001.

155 -> 10011011
162 -> 10100010
233 -> 11101001

Splitting these bits into four groups of 6 bits gives the following four decimal values: 38, 58, 11 and 41.

100110 -> 38
111010 -> 58
001011 -> 11
101001 -> 41

Converting these into ASCII characters using the Base64 encoding table translates them into the ASCII sequence "m6Lp".

38 -> m
58 -> 6
11 -> L
41 -> p
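The example can also be checked in the reverse direction by looking up each character's table value and reassembling the 24 bits. The following sketch assumes a single, unpadded four-character group. The function name decode_group is hypothetical, and the table is repeated so that the block compiles on its own.

```c
#include <stdint.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Decode one unpadded 4-character group into three bytes.
   Returns 0 on success, -1 if a character is not in the table.
   decode_group is a hypothetical helper name. */
int decode_group(const char in[4], uint8_t out[3])
{
    uint32_t bits = 0;
    int i;

    for (i = 0; i < 4; ++i) {
        /* Look up the 6-bit value; reject the terminator explicitly. */
        const char *p = (in[i] != '\0') ? strchr(b64, in[i]) : NULL;
        if (p == NULL)
            return -1;
        bits = (bits << 6) | (uint32_t)(p - b64);
    }

    out[0] = (uint8_t)((bits >> 16) & 0xFF);
    out[1] = (uint8_t)((bits >> 8) & 0xFF);
    out[2] = (uint8_t)(bits & 0xFF);
    return 0;
}
```

Applied to "m6Lp", this sketch returns the original bytes 155, 162 and 233.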

Further Information


Here are code examples demonstrating the algorithm in popular programming languages such as C, Perl, Shell, Java, PHP, JavaScript and Microsoft's VBS. The implementation is similar for all languages, allowing an easy comparison.
