
Total: 236


  • of Contents

    1. Introduction
    2. Summary
    3. Methodology
    4. Exceptions
    5. Implementations Tested
    5.1. GNU Coreutils base64
    6. Acknowledgements
    7. Security Considerations
    8. IANA Considerations
    9. References
    9.1. Normative References
    9.2. Informative References
    Author's Address

    Josefsson               Expires November 29, 2009               [Page 2]
    Internet-Draft       RFC 4648 Implementation Report             May 2009

    1. Introduction

    This is an implementation report of "The Base16, Base32, and Base64 Data Encodings" [RFC4648] document. It follows the outline suggested by [I-D.dusseault-impl-reports].

    2. Summary

    The base64 encoding has a long history of use in Internet protocols; the earliest use in the RFC series appears to be [RFC0989]. It has been widely implemented as part of MIME [RFC2045], which is already a Draft Standard. The base64url alphabet does not appear to be widely deployed.

    The base32 encoding is not as widely used as base64, but has applications in case-insensitive systems. The base32 encoding is used by [I-D.ietf-sasl-gs2]. The base32hex encoding is used by [RFC5155], and a restricted form is used by [RFC2938].

    The base16 encoding is usually referred to as "hexadecimal" or "hex" encoding and is used in many protocols and informally in technical documents.

    3. Methodology

    We identified the following encodings to test: base64, base64url, base32, base32hex, and base16.

    The primary test is, of course, that basic encoding and decoding work and lead to the expected results.

    Section 3 of RFC 4648 discusses some implementation discrepancies of base encodings. To find possible interoperability problems, we checked and documented these corner cases separately: how line feeds are handled during encoding and decoding (LF), whether padding is done correctly (PAD), how non-alphabet characters are handled (NONALPHA), and whether pad bits are zero or not (ZEROBITS).

    A useful test case for pad-bit handling is "YR==", which is a non-canonical encoding of ASCII 0x61 that is normally encoded as "YQ==".

    4. Exceptions

    Encoding and decoding of data interoperate well.

    Some tools accepted non-canonical encodings, but none appeared ever to generate them. This is consistent with the requirements in section 3.5 of RFC 4648.

    We acknowledge that many implementations of base64 were written for a general purpose and thus may not strictly follow some of the guidelines in RFC 4648 (e.g., those related to line feeds). However, we believe this should not be a reason against Draft Standard status, because the document is clear on the issues and the minor variations do not appear to lead to any problems.

    5. Implementations Tested

    5.1. GNU Coreutils base64

    There is a base64 command-line tool, written in C, included in GNU Coreutils [GNU Coreutils Base64]. It supports the base64 alphabet.

    LF: On encoding, it wraps output after 76 characters (same as MIME). On decoding, it accepts line-wrapped input.

    PAD: It

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc4648-impl-report-00.txt (2016-04-30)
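
    The ZEROBITS corner case mentioned in the excerpt above can be illustrated with a small check. The following is only a sketch, not code from the report: the function names b64_value and b64_pad_bits_are_zero are hypothetical, and the input is assumed to be a padded base64 string using the standard alphabet, with no line feeds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 alphabet character to its 6-bit value, or -1 if the
   character is outside the alphabet. (Hypothetical helper.) */
static int b64_value(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Hypothetical helper: return true if the padded base64 string S has all
   of its unused pad bits set to zero. Only the final quantum is
   inspected; the rest of the string is not validated. */
bool b64_pad_bits_are_zero(const char *s)
{
    size_t len = strlen(s);

    if (len == 0)
        return true;            /* empty input has no pad bits */
    if (len % 4 != 0)
        return false;           /* not a padded base64 string */
    if (s[len - 1] != '=')
        return true;            /* full quantum, no spare bits */

    if (s[len - 2] == '=') {
        /* "xx==": one octet; the low 4 bits of the 2nd character are spare. */
        int v = b64_value(s[len - 3]);
        return v >= 0 && (v & 0x0f) == 0;
    }

    /* "xxx=": two octets; the low 2 bits of the 3rd character are spare. */
    int v = b64_value(s[len - 2]);
    return v >= 0 && (v & 0x03) == 0;
}
```

    With this sketch, "YQ==" is reported as canonical, while "YR==" is flagged because the second character carries a non-zero bit that no octet uses.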



  • special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.

    The pad character "=" is typically percent-encoded when used in a URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see section 3.2.

    This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

    This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

             Table 2: The "URL and Filename safe" Base 64 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 A            17 R            34 i            51 z
             1 B            18 S            35 j            52 0
             2 C            19 T            36 k            53 1
             3 D            20 U            37 l            54 2
             4 E            21 V            38 m            55 3
             5 F            22 W            39 n            56 4
             6 G            23 X            40 o            57 5
             7 H            24 Y            41 p            58 6
             8 I            25 Z            42 q            59 7
             9 J            26 a            43 r            60 8
            10 K            27 b            44 s            61 9
            11 L            28 c            45 t            62 - (minus)
            12 M            29 d            46 u            63 _
            13 N            30 e            47 v           (underline)
            14 O            31 f            48 w
            15 P            32 g            49 x
            16 Q            33 h            50 y         (pad) =

    Josefsson               Expires November 12, 2006               [Page 9]
    Internet-Draft              Base-N encodings                    May 2006

    6. Base 32 Encoding

    The following description of base 32 is derived from [11] (with corrections). This encoding may be referred to as "base32".

    The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

    A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

    The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating five 8-bit input groups.
    These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

    Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

                          Table 3: The Base 32 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 A             9 J            18 S            27 3
             1 B            10 K            19 T            28 4
             2 C            11 L            20 U            29 5
             3 D            12 M            21 V            30 6
             4 E            13 N            22 W            31 7
             5 F            14 O            23 X
             6 G            15 P            24 Y         (pad) =
             7 H            16 Q            25 Z
             8 I            17 R            26 2

    Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

    (1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

    (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

    (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

    (4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

    (5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

    7. Base 32 Encoding With Extended Hex Alphabet

    The following description of base 32 is derived from [7]. This encoding may be referred to as "base32hex". This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32". This encoding is used by, e.g., NSEC3 [10].

    One property of this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

    This encoding is identical to the previous one, except for the alphabet. The new alphabet is found in Table 4.

                    Table 4: The "Extended Hex" Base 32 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 0             9 9            18 I            27 R
             1 1            10 A            19 J            28 S
             2 2            11 B            20 K            29 T
             3 3            12 C            21 L            30 U
             4 4            13 D            22 M            31 V
             5 5            14 E            23 N
             6 6            15 F            24 O         (pad) =
             7 7            16 G            25 P
             8 8            17 H            26 Q

    8. Base 16 Encoding

    The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding and may be referred to as "base16" or "hex".

    A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

    The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single character in the base 16 alphabet.

    Each 4-bit group is used as an index into an array of 16 printable characters. The character
    referenced by the index is placed in the output string.

                          Table 5: The Base 16 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 0             4 4             8 8            12 C
             1 1             5 5             9 9            13 D
             2 2             6 6            10 A            14 E
             3 3             7 7            11 B            15 F

    Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

    9. Illustrations And Examples

    To translate between binary and a base encoding, the input is stored in a structure, and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [5].

            +--first octet--+-second octet--+--third octet--+
            |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
            +-----------+---+-------+-------+---+-----------+
            |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
            +--1.index--+--2.index--+--3.index--+--4.index--+

    The case for base 32 is shown in the following figure, borrowed from [7]. Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence. Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

                        1          2          3
             01234567 89012345 67890123 45678901 23456789
            +--------+--------+--------+--------+--------+
            |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
            +--------+--------+--------+--------+--------+
                                                    <===> 8th character
                                              <====> 7th character
                                         <===> 6th character
                                   <====> 5th character
                             <====> 4th character
                        <===> 3rd character
                  <====> 2nd character
             <===> 1st character

    The following example of Base64 data is from [5], with corrections.

        Input data:  0x14fb9c03d97e
        Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
        8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
        6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
        Decimal: 5      15     46     28       0      61     37     62
        Output:  F      P      u      c        A      9      l      +

        Input data:  0x14fb9c03d9
        Hex:     1   4    f   b    9   c     | 0   3    d   9
        8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                        pad with 00
        6-bit:   000101 001111 101110 011100 | 000000 111101 100100
        Decimal: 5      15     46     28       0      61     36
                                                           pad with =
        Output:  F      P      u      c        A      9      k      =

        Input data:  0x14fb9c03
        Hex:     1   4    f   b    9   c     | 0   3
        8-bit:   00010100 11111011 10011100  | 00000011
                                               pad with 0000
        6-bit:   000101 001111 101110 011100 | 000000 110000
        Decimal: 5      15     46     28       0      48
                                                    pad with =      =
        Output:  F      P      u      c        A      w      =      =

    10. Test Vectors

        BASE64("") = ""
        BASE64("f") = "Zg=="
        BASE64("fo") = "Zm8="
        BASE64("foo") = "Zm9v"
        BASE64("foob") = "Zm9vYg=="
        BASE64("fooba") = "Zm9vYmE="
        BASE64("foobar") = "Zm9vYmFy"

        BASE32("") = ""
        BASE32("f") = "MY======"
        BASE32("fo") = "MZXQ===="
        BASE32("foo") = "MZXW6==="
        BASE32("foob") = "MZXW6YQ="
        BASE32("fooba") = "MZXW6YTB"
        BASE32("foobar") = "MZXW6YTBOI======"

        BASE32-HEX("") = ""
        BASE32-HEX("f") = "CO======"
        BASE32-HEX("fo") = "CPNG===="
        BASE32-HEX("foo") = "CPNMU==="
        BASE32-HEX("foob") = "CPNMUOG="
        BASE32-HEX("fooba") = "CPNMUOJ1"
        BASE32-HEX("foobar") = "CPNMUOJ1E8======"

        BASE16("") = ""
        BASE16("f") = "66"
        BASE16("fo") = "666F"
        BASE16("foo") = "666F6F"
        BASE16("foob") = "666F6F62"
        BASE16("fooba") = "666F6F6261"
        BASE16("foobar") = "666F6F626172"

    11. ISO C99 Implementation Of Base64

    Below is an ISO C99 implementation of Base64 encoding and decoding. The code assumes that the US-ASCII characters are encoded inside 'char' with values below 255, which holds for all POSIX platforms, but should otherwise be portable. This code is not intended as a normative specification of base64.

    11.1. Prototypes: base64.h

    /* base64.h -- Encode binary data using printable characters.
       Copyright (C) 2004, 2005, 2006 Free Software Foundation, Inc.
       Written by Simon Josefsson.

       This program is free software; you can redistribute it
       and/or modify it under the terms of the GNU Lesser General
       Public License as published by the Free Software
       Foundation; either version 2.1, or (at your option) any
       later version.

       This program is distributed in the hope that it will be
       useful, but WITHOUT ANY WARRANTY; without even the implied
       warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.  See the GNU Lesser General Public License for
       more details.

       You can retrieve a copy of the GNU Lesser General Public
       License from http://www.gnu.org/licenses/lgpl.txt; or by
       writing to the Free Software Foundation, Inc., 51 Franklin
       Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

    #ifndef BASE64_H
    # define BASE64_H

    /* Get size_t. */
    # include <stddef.h>

    /* Get bool. */
    # include <stdbool.h>

    /* This uses that the expression (n+(k-1))/k means the smallest
       integer >= n/k, i.e., the ceiling of n/k.  */
    # define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

    extern bool isbase64 (char ch);

    extern void base64_encode (const char *restrict in, size_t inlen,
                               char *restrict out, size_t outlen);

    extern size_t base64_encode_alloc (const char *in, size_t inlen,
                                       char **out);

    extern bool base64_decode (const char *restrict in, size_t inlen,
                               char *restrict out, size_t *outlen);

    extern bool base64_decode_alloc (const char *in, size_t inlen,
                                     char **out, size_t *outlen);

    #endif /* BASE64_H */

    11.2. Implementation: base64.c

    /* base64.c -- Encode binary data using printable characters.
       Copyright (C) 1999, 2000, 2001, 2004, 2005, 2006 Free Software
       Foundation, Inc.

       This program is free software; you can redistribute it
       and/or modify it under the terms of the GNU Lesser General
       Public License as published by the Free Software
       Foundation; either version 2.1, or (at your option) any
       later version.

       This program is distributed in the hope that it will be
       useful, but WITHOUT ANY WARRANTY; without even the implied
       warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.  See the GNU Lesser General Public License for
       more details.

       You can retrieve a copy of the GNU Lesser General Public
       License from http://www.gnu.org/licenses/lgpl.txt; or by
       writing to the Free Software Foundation, Inc., 51 Franklin
       Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

    /* Written by Simon Josefsson.  Partially adapted from GNU MailUtils
     * (mailbox/filter_trans.c, as of 2004-11-28).  Improved by review
     * from Paul Eggert, Bruno Haible, and Stepan Kasal.
     *
     * Be careful with error checking.  Here is how you would typically
     * use these functions:
     *
     * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
     * if (!ok)
     *   FAIL: input was not valid base64
     * if (out == NULL)
     *   FAIL: memory allocation error
     * OK: data in OUT/OUTLEN
     *
     * size_t outlen = base64_encode_alloc (in, inlen, &out);
     * if (out == NULL && outlen == 0 && inlen != 0)
     *   FAIL: input too long
     * if (out == NULL)
     *   FAIL: memory allocation error
     * OK: data in OUT/OUTLEN.
     *
     */

    /* Get prototype. */
include base64 h Get malloc include Get UCHAR MAX include C89 compliant way to cast char to unsigned char static inline unsigned char to uchar char ch return ch Base64 encode IN array of size INLEN into OUT array of size OUTLEN If OUTLEN is less than BASE64 LENGTH INLEN write as many bytes as possible If OUTLEN is larger than BASE64 LENGTH INLEN also zero terminate the output buffer void base64 encode const char restrict in size t inlen char restrict out size t outlen static const char b64str 64 ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 while inlen outlen out b64str to uchar in 0 2 if outlen break out b64str to uchar in 0 4 0 0x3f if outlen break out inlen b64str to uchar in 1 6 0 0x3f if outlen break out inlen b64str to uchar in 2 0x3f if outlen break if inlen inlen if inlen in 3 if outlen out 0 Allocate a buffer and store zero terminated base64 encoded data from array IN of size INLEN returning BASE64 LENGTH INLEN i e the length of the encoded data excluding the terminating zero On return the OUT variable will hold a pointer to newly allocated memory that must be deallocated by the caller If output string length would overflow 0 is returned and OUT is set to NULL If memory allocation fail OUT is set to NULL and the return value indicate length of the requested memory block i e BASE64 LENGTH inlen 1 size t base64 encode alloc const char in size t inlen char out size t outlen 1 BASE64 LENGTH inlen Check for overflow in outlen computation If there is no overflow outlen inlen If the operation inlen 2 overflows then it yields at most 1 so outlen is 0 If the multiplication overflows we lose at least half of the correct value so the result is 4 if inlen outlen out NULL Josefsson Expires November 12 2006 Page 20 Internet Draft Base N encodings May 2006 return 0 out malloc outlen if out base64 encode in inlen out outlen return outlen 1 With this approach this file works independent of the charset used think EBCDIC However it does assume that the 
characters in the Base64 alphabet A Za z0 9 are encoded in 0 255 POSIX 1003 1 2001 require that char and unsigned char are 8 bit quantities though taking care of that problem But this may be a potential problem on non POSIX C99 platforms define B64 x x A 0 x B 1 x C 2 x D 3 x E 4 x F 5 x G 6 x H 7 x I 8 x J 9 x K 10 x L 11 x M 12 x N 13 x O 14 x P 15 x Q 16 x R 17 x S 18 x T 19 x U 20 x V 21 x W 22 x X 23 x Y 24 x Z 25 x a 26 x b 27 x c 28 x d 29 Josefsson Expires November 12 2006 Page 21 Internet Draft Base N encodings May 2006 x e 30 x f 31 x g 32 x h 33 x i 34 x j 35 x k 36 x l 37 x m 38 x n 39 x o 40 x p 41 x q 42 x r 43 x s 44 x t 45 x u 46 x v 47 x w 48 x x 49 x y 50 x z 51 x 0 52 x 1 53 x 2 54 x 3 55 x 4 56 x 5 57 x 6 58 x 7 59 x 8 60 x 9 61 x 62 x 63 1 static const signed char b64 0x100 B64 0 B64 1 B64 2 B64 3 B64 4 B64 5 B64 6 B64 7 B64 8

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-04.txt (2016-04-30)
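
    The base 32 description in the excerpt above (40-bit quanta, 5-bit indexes, zero bits added on the right, "=" padding) can be sketched in C as follows. This is an illustration only and is not taken from the draft, which includes an implementation of Base64 only. The names base32_encode_sketch, BASE32_STD and BASE32_HEX are hypothetical, and the caller is assumed to provide an output buffer of at least 8 * ((inlen + 4) / 5) + 1 bytes.

```c
#include <stddef.h>

/* The two alphabets from Table 3 and Table 4 of the excerpt. */
static const char BASE32_STD[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
static const char BASE32_HEX[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Encode INLEN octets from IN into OUT using the 32-character ALPHABET,
   padding the final quantum with '='. OUT is zero terminated.
   Returns the encoded length, excluding the terminating zero. */
size_t base32_encode_sketch(const char *alphabet,
                            const unsigned char *in, size_t inlen,
                            char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < inlen; i += 5) {
        /* Number of octets available for this quantum (1..5). */
        size_t n = inlen - i < 5 ? inlen - i : 5;
        unsigned long long group = 0;

        /* Build the 40-bit group, first octet in the high-order bits;
           missing octets become zero bits on the right. */
        for (size_t k = 0; k < 5; k++)
            group = (group << 8) | (k < n ? in[i + k] : 0u);

        /* ceil(8 * n / 5) characters carry data; the rest is padding. */
        size_t chars = (8 * n + 4) / 5;

        for (size_t k = 0; k < 8; k++) {
            unsigned idx = (unsigned)((group >> (35 - 5 * k)) & 0x1f);
            out[o++] = k < chars ? alphabet[idx] : '=';
        }
    }

    out[o] = '\0';
    return o;
}
```

    Passing BASE32_HEX instead of BASE32_STD gives the "base32hex" variant, since the two encodings differ only in the alphabet. For the input "foobar" the sketch produces the padded forms of the BASE32 and BASE32-HEX test vectors, "MZXW6YTBOI======" and "CPNMUOJ1E8======".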


  • special meaning in some file system environments, the encoding described in this section is recommended instead.

    This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

    This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

             Table 2: The "URL and Filename safe" Base 64 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 A            17 R            34 i            51 z
             1 B            18 S            35 j            52 0
             2 C            19 T            36 k            53 1
             3 D            20 U            37 l            54 2
             4 E            21 V            38 m            55 3
             5 F            22 W            39 n            56 4
             6 G            23 X            40 o            57 5
             7 H            24 Y            41 p            58 6
             8 I            25 Z            42 q            59 7
             9 J            26 a            43 r            60 8
            10 K            27 b            44 s            61 9
            11 L            28 c            45 t            62 - (minus)
            12 M            29 d            46 u            63 _
            13 N            30 e            47 v           (underline)
            14 O            31 f            48 w
            15 P            32 g            49 x
            16 Q            33 h            50 y         (pad) =

    6. Base 32 Encoding

    The following description of base 32 is derived from [10] (with corrections). This encoding may be referred to as "base32".

    The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

    Josefsson               Expires November 4, 2006                [Page 8]
    Internet-Draft              Base-N encodings                    May 2006

    A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

    The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating five 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

    Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

                          Table 3: The Base 32 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 A             9 J            18 S            27 3
             1 B            10 K            19 T            28 4
             2 C            11 L            20 U            29 5
             3 D            12 M            21 V            30 6
             4 E            13 N            22 W            31 7
             5 F            14 O            23 X
             6 G            15 P            24 Y         (pad) =
             7 H            16 Q            25 Z
             8 I            17 R            26 2

    Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

    (1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

    (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

    (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

    (4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

    (5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

    7. Base 32 Encoding With Extended Hex Alphabet

    The following description of base 32 is derived from [7]. This encoding may be referred to as "base32hex". This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32". This encoding is used by, e.g., NSEC3 [9].

    One property of this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

    This encoding is identical to the previous one, except for the alphabet. The new alphabet is found in Table 4.

                    Table 4: The "Extended Hex" Base 32 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 0             9 9            18 I            27 R
             1 1            10 A            19 J            28 S
             2 2            11 B            20 K            29 T
             3 3            12 C            21 L            30 U
             4 4            13 D            22 M            31 V
             5 5            14 E            23 N
             6 6            15 F            24 O         (pad) =
             7 7            16 G            25 P
             8 8            17 H            26 Q

    8. Base 16 Encoding

    The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding and may be referred to as "base16" or "hex".

    A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

    The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single character in the base 16 alphabet.

    Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.

                          Table 5: The Base 16 Alphabet

         Value Encoding  Value Encoding  Value Encoding  Value Encoding
             0 0             4 4             8 8            12 C
             1 1             5 5             9 9            13 D
             2 2             6 6            10 A            14 E
             3 3             7 7            11 B            15 F

    Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

    9. Illustrations And Examples
To translate between binary and a base encoding the input is stored in a structure and the output is extracted The case for base 64 is displayed in the following figure borrowed from 5 first octet second octet third octet 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 1 index 2 index 3 index 4 index The case for base 32 is shown in the following figure borrowed from 7 Each successive character in a base 32 value represents 5 successive bits of the underlying octet sequence Thus each group of 8 characters represents a sequence of 5 octets 40 bits 1 2 3 01234567 89012345 67890123 45678901 23456789 8th character 7th character 6th character 5th character 4th character 3rd character 2nd character 1st character The following example of Base64 data is from 5 with corrections Josefsson Expires November 4 2006 Page 12 Internet Draft Base N encodings May 2006 Input data 0x14fb9c03d97e Hex 1 4 f b 9 c 0 3 d 9 7 e 8 bit 00010100 11111011 10011100 00000011 11011001 01111110 6 bit 000101 001111 101110 011100 000000 111101 100101 111110 Decimal 5 15 46 28 0 61 37 62 Output F P u c A 9 l Input data 0x14fb9c03d9 Hex 1 4 f b 9 c 0 3 d 9 8 bit 00010100 11111011 10011100 00000011 11011001 pad with 00 6 bit 000101 001111 101110 011100 000000 111101 100100 Decimal 5 15 46 28 0 61 36 pad with Output F P u c A 9 k Input data 0x14fb9c03 Hex 1 4 f b 9 c 0 3 8 bit 00010100 11111011 10011100 00000011 pad with 0000 6 bit 000101 001111 101110 011100 000000 110000 Decimal 5 15 46 28 0 48 pad with Output F P u c A w 10 Test Vectors BASE64 BASE64 f Zg BASE64 fo Zm8 BASE64 foo Zm9v BASE64 foob Zm9vYg BASE64 fooba Zm9vYmE BASE64 foobar Zm9vYmFy BASE32 BASE32 f MY BASE32 fo MZXQ Josefsson Expires November 4 2006 Page 13 Internet Draft Base N encodings May 2006 BASE32 foo MZXW6 BASE32 foob MZXW6YQ BASE32 fooba MZXW6YTB BASE32 foobar MZXW6YTBOI BASE32 HEX BASE32 HEX f CO BASE32 HEX fo CPNG BASE32 HEX foo CPNMU BASE32 HEX foob CPNMUOG BASE32 HEX fooba 
CPNMUOJ1 BASE32 HEX foobar CPNMUOJ1E8 BASE16 BASE16 f 66 BASE16 fo 666F BASE16 foo 666F6F BASE16 foob 666F6F62 BASE16 fooba 666F6F6261 BASE16 foobar 666F6F626172 11 ISO C99 Implementation Of Base64 Below is an ISO C99 implementation of Base64 encoding and decoding The code assume that the US ASCII characters are encoding inside char with values below 255 which holds for all POSIX platforms but should otherwise be portable This code is not intended as a normative specification of base64 11 1 Prototypes base64 h base64 h Encode binary data using printable characters Josefsson Expires November 4 2006 Page 14 Internet Draft Base N encodings May 2006 Copyright C 2004 2005 2006 Free Software Foundation Inc Written by Simon Josefsson This program is free software you can redistribute it and or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation either version 2 1 or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU Lesser General Public License for more details You should have received a copy of the GNU Lesser General Public License along with this program if not write to the Free Software Foundation Inc 51 Franklin Street Fifth Floor Boston MA 02110 1301 USA ifndef BASE64 H define BASE64 H Get size t include Get bool include This uses that the expression n k 1 k means the smallest integer n k i e the ceiling of n k define BASE64 LENGTH inlen inlen 2 3 4 extern bool isbase64 char ch extern void base64 encode const char restrict in size t inlen char restrict out size t outlen extern size t base64 encode alloc const char in size t inlen char out extern bool base64 decode const char restrict in size t inlen char restrict out size t outlen Josefsson Expires November 4 2006 Page 15 Internet Draft Base N encodings May 2006 extern bool base64 decode 
alloc const char in size t inlen char out size t outlen endif BASE64 H 11 2 Implementation base64 c base64 c Encode binary data using printable characters Copyright C 1999 2000 2001 2004 2005 2006 Free Software Foundation Inc This program is free software you can redistribute it and or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation either version 2 1 or at your option any later version This program is distributed in the hope that it will be useful but WITHOUT ANY WARRANTY without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE See the GNU Lesser General Public License for more details You should have received a copy of the GNU Lesser General Public License along with this program if not write to the Free Software Foundation Inc 51 Franklin Street Fifth Floor Boston MA 02110 1301 USA Written by Simon Josefsson Partially adapted from GNU MailUtils mailbox filter trans c as of 2004 11 28 Improved by review from Paul Eggert Bruno Haible and Stepan Kasal Be careful with error checking Here is how you would typically use these functions bool ok base64 decode alloc in inlen out outlen if ok FAIL input was not valid base64 if out NULL FAIL memory allocation error OK data in OUT OUTLEN size t outlen base64 encode alloc in inlen out if out NULL outlen 0 inlen 0 FAIL input too long Josefsson Expires November 4 2006 Page 16 Internet Draft Base N encodings May 2006 if out NULL FAIL memory allocation error OK data in OUT OUTLEN Get prototype include base64 h Get malloc include Get UCHAR MAX include C89 compliant way to cast char to unsigned char static inline unsigned char to uchar char ch return ch Base64 encode IN array of size INLEN into OUT array of size OUTLEN If OUTLEN is less than BASE64 LENGTH INLEN write as many bytes as possible If OUTLEN is larger than BASE64 LENGTH INLEN also zero terminate the output buffer void base64 encode const char restrict in size t inlen char 
restrict out size t outlen static const char b64str 64 ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 while inlen outlen out b64str to uchar in 0 2 if outlen break out b64str to uchar in 0 4 0 0x3f if outlen break out inlen b64str to uchar in 1 6 0 0x3f if outlen break out inlen b64str to uchar in 2 0x3f if outlen break if inlen inlen if inlen in 3 if outlen out 0 Allocate a buffer and store zero terminated base64 encoded data from array IN of size INLEN returning BASE64 LENGTH INLEN i e the length of the encoded data excluding the terminating zero On return the OUT variable will hold a pointer to newly allocated memory that must be deallocated by the caller If output string length would overflow 0 is returned and OUT is set to NULL If memory allocation fail OUT is set to NULL and the return value indicate length of the requested memory block i e BASE64 LENGTH inlen 1 size t base64 encode alloc const char in size t inlen char out size t outlen 1 BASE64 LENGTH inlen Check for overflow in outlen computation If there is no overflow outlen inlen If the operation inlen 2 overflows then it yields at most 1 so outlen is 0 If the multiplication overflows we lose at least half of the correct value so the result is 4 if inlen outlen out NULL Josefsson Expires November 4 2006 Page 18 Internet Draft Base N encodings May 2006 return 0 out malloc outlen if out base64 encode in inlen out outlen return outlen 1 With this approach this file works independent of the charset used think EBCDIC However it does assume that the characters in the Base64 alphabet A Za z0 9 are encoded in 0 255 POSIX 1003 1 2001 require that char and unsigned char are 8 bit quantities though taking care of that problem But this may be a potential problem on non POSIX C99 platforms define B64 x x A 0 x B 1 x C 2 x D 3 x E 4 x F 5 x G 6 x H 7 x I 8 x J 9 x K 10 x L 11 x M 12 x N 13 x O 14 x P 15 x Q 16 x R 17 x S 18 x T 19 x U 20 x V 21 x W 22 x X 23 x Y 24 x Z 25 x a 26 x b 27 x c 28 x d 29 
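   : (x) == 'e' ? 30 \
   : (x) == 'f' ? 31 \
   : (x) == 'g' ? 32 \
   : (x) == 'h' ? 33 \
   : (x) == 'i' ? 34 \
   : (x) == 'j' ? 35 \
   : (x) == 'k' ? 36 \
   : (x) == 'l' ? 37 \
   : (x) == 'm' ? 38 \
   : (x) == 'n' ? 39 \
   : (x) == 'o' ? 40 \
   : (x) == 'p' ? 41 \
   : (x) == 'q' ? 42 \
   : (x) == 'r' ? 43 \
   : (x) == 's' ? 44 \
   : (x) == 't' ? 45 \
   : (x) == 'u' ? 46 \
   : (x) == 'v' ? 47 \
   : (x) == 'w' ? 48 \
   : (x) == 'x' ? 49 \
   : (x) == 'y' ? 50 \
   : (x) == 'z' ? 51 \
   : (x) == '0' ? 52 \
   : (x) == '1' ? 53 \
   : (x) == '2' ? 54 \
   : (x) == '3' ? 55 \
   : (x) == '4' ? 56 \
   : (x) == '5' ? 57 \
   : (x) == '6' ? 58 \
   : (x) == '7' ? 59 \
   : (x) == '8' ? 60 \
   : (x) == '9' ? 61 \
   : (x) == '+' ? 62 \
   : (x) == '/' ? 63 \
   : -1)

static const signed char b64[0x100] = {
  B64 (0), B64 (1), B64 (2), B64 (3),
  B64 (4), B64 (5), B64 (6), B64 (7),
  B64 (8), B64 (9), B64 (10), B64 (11),
  B64 (12), B64 (13), B64 (14), B64 (15),
  B64 (16), B64 (17), B64 (18), B64 (19),
  B64 (20), B64 (21), B64 (22), B64 (23),
  B64 (24), B64 (25), B64 (26), B64 (27),
  B64 (28), B64 (29), B64 (30), B64 (31),
  B64 (32), B64 (33), B64 (34), B64 (35),
  B64 (36), B64 (37), B64 (38), B64 (39),
  B64 (40), B64 (41), B64 (42), B64 (43),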

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-03.txt (2016-04-30)
    Open archived version from archive
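
The B64 macro in the excerpt above builds its reverse lookup table at compile time. Purely as an illustration, the sketch below fills an equivalent table at run time and decodes one 4-character quantum with it. The names `b64_build_reverse` and `b64_decode_quantum` are hypothetical and do not appear in the draft, and rejecting non-zero pad bits is an assumption of this sketch, not behaviour the excerpt shows for the draft's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reverse table: rev[c] is the 6-bit value of alphabet
   character c, or -1 for any byte outside the Base64 alphabet. */
static signed char rev[256];

void b64_build_reverse(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    memset(rev, -1, sizeof rev);          /* every byte becomes -1 */
    for (int i = 0; i < 64; i++)
        rev[(unsigned char)alphabet[i]] = (signed char)i;
}

/* Decode one 4-character quantum into up to 3 octets.
   Returns the number of octets produced (1..3), or -1 on invalid input.
   Assumption of this sketch: non-canonical pad bits are rejected. */
int b64_decode_quantum(const char in[4], unsigned char out[3])
{
    int v[4];
    int pad = 0;

    for (int i = 0; i < 4; i++) {
        if (in[i] == '=') {
            if (i < 2)
                return -1;                /* padding cannot start this early */
            pad++;
            v[i] = 0;
        } else {
            if (pad)
                return -1;                /* data after padding */
            v[i] = rev[(unsigned char)in[i]];
            if (v[i] < 0)
                return -1;                /* non-alphabet character */
        }
    }
    if (pad == 2 && (v[1] & 0x0f))
        return -1;                        /* unused low 4 bits must be zero */
    if (pad == 1 && (v[2] & 0x03))
        return -1;                        /* unused low 2 bits must be zero */

    out[0] = (unsigned char)((v[0] << 2) | (v[1] >> 4));
    out[1] = (unsigned char)(((v[1] & 0x0f) << 4) | (v[2] >> 2));
    out[2] = (unsigned char)(((v[2] & 0x03) << 6) | v[3]);
    return 3 - pad;
}
```

A caller would invoke `b64_build_reverse()` once and then pass each group of four input characters to `b64_decode_quantum`. A run-time table is shorter to write than the macro, at the cost of an initialization step.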


  • environments, the encoding described in this section is recommended instead.

This encoding should not be regarded as the same as the "base64" encoding, and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A            17 R            34 i            51 z
        1 B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 - (minus)
       12 M            29 d            46 u            63 _ (understrike)
       13 N            30 e            47 v
       14 O            31 f            48 w
       15 P            32 g            49 x
       16 Q            33 h            50 y         (pad) =

6. Base 32 Encoding

The following description of base 32 is due to [9] (with corrections).

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

Josefsson              Expires September 25, 2006               [Page 7]
Internet-Draft              Base-N encodings                  March 2006

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single digit in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, and the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

Each 5-bit group is used as an
index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2

Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

(1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

(4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

(5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

7. Base 32 Encoding with Extended Hex Alphabet

The following description of base 32 is due to [7]. This encoding should not be regarded as the same as the "base32" encoding, and should not be
referred to as only "base32".

One property with this alphabet, that the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

This encoding is identical to the previous one, except for the alphabet. The new alphabet is found in Table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 0             9 9            18 I            27 R
        1 1            10 A            19 J            28 S
        2 2            11 B            20 K            29 T
        3 3            12 C            21 L            30 U
        4 4            13 D            22 M            31 V
        5 5            14 E            23 N
        6 6            15 F            24 O         (pad) =
        7 7            16 G            25 P
        8 8            17 H            26 Q

8. Base 16 Encoding

The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding, and may be referred to as "base16" or "hex".

A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single digit in the base 16 alphabet.

Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.

Table 5: The Base 16 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 0             4 4             8 8            12 C
        1 1             5 5             9 9            13 D
        2 2             6 6            10 A            14 E
        3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

9. Illustrations and examples

To translate between binary and a base encoding, the input is stored in a structure and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [5].

    +--first octet--+-second octet--+--third octet--+
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +-----------+---+-------+-------+---+-----------+
    |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
    +--1.index--+--2.index--+--3.index--+--4.index--+

The case for base 32 is shown in the following figure, borrowed from [7]. Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence. Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

               1          2          3
    01234567 89012345 67890123 45678901 23456789
   +--------+--------+--------+--------+--------+
   |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
   +--------+--------+--------+--------+--------+
                                           <===> 8th character
                                     <====> 7th character
                                <===> 6th character
                          <====> 5th character
                    <====> 4th character
               <===> 3rd character
         <====> 2nd character
    <===> 1st character

The following example of Base64 data is from [5], with corrections.

    Input data:  0x14fb9c03d97e
    Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
    8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
    6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
    Decimal: 5      15     46     28       0      61     37     62
    Output:  F      P      u      c        A      9      l      +

    Input data:  0x14fb9c03d9
    Hex:     1   4    f   b    9   c     | 0   3    d   9
    8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                    pad with 00
    6-bit:   000101 001111 101110 011100 | 000000 111101 100100
    Decimal: 5      15     46     28       0      61     36
                                                       pad with =
    Output:  F      P      u      c        A      9      k      =

    Input data:  0x14fb9c03
    Hex:     1   4    f   b    9   c     | 0   3
    8-bit:   00010100 11111011 10011100  | 00000011
                                           pad with 0000
    6-bit:   000101 001111 101110 011100 | 000000 110000
    Decimal: 5      15     46     28       0      48
                                                pad with = =
    Output:  F      P      u      c        A      w      =      =

10. Test vectors

    BASE64("") = ""
    BASE64("f") = "Zg=="
    BASE64("fo") = "Zm8="
    BASE64("foo") = "Zm9v"
    BASE64("foob") = "Zm9vYg=="
    BASE64("fooba") = "Zm9vYmE="
    BASE64("foobar") = "Zm9vYmFy"

    BASE32("") = ""
    BASE32("f") = "MY======"
    BASE32("fo") = "MZXQ===="
    BASE32("foo") = "MZXW6==="
    BASE32("foob") = "MZXW6YQ="
    BASE32("fooba") = "MZXW6YTB"
    BASE32("foobar") = "MZXW6YTBOI======"

    BASE32-HEX("") = ""
    BASE32-HEX("f") = "CO======"
    BASE32-HEX("fo") = "CPNG===="
    BASE32-HEX("foo") = "CPNMU==="
    BASE32-HEX("foob") = "CPNMUOG="
    BASE32-HEX("fooba") = "CPNMUOJ1"
    BASE32-HEX("foobar") = "CPNMUOJ1E8======"

    BASE16("") = ""
    BASE16("f") = "66"
    BASE16("fo") = "666F"
    BASE16("foo") = "666F6F"
    BASE16("foob") = "666F6F62"
    BASE16("fooba") = "666F6F6261"
    BASE16("foobar") = "666F6F626172"

11. ISO C99 Implementation of Base64

Below
is an ISO C99 implementation of Base64 encoding and decoding. The code assumes that the US-ASCII characters are encoded inside 'char' with values below 255, which holds for all POSIX platforms, but should otherwise be portable. This code is not intended as a normative specification of base64.

11.1. Prototypes: base64.h

/* base64.h -- Encode binary data using printable characters.
   Copyright (C) 2004, 2005, 2006 Free Software Foundation, Inc.
   Written by Simon Josefsson.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
   02110-1301, USA.  */

#ifndef BASE64_H
# define BASE64_H

/* Get size_t. */
# include <stddef.h>

/* Get bool. */
# include <stdbool.h>

/* This uses that the expression (n+(k-1))/k means the smallest
   integer >= n/k, i.e., the ceiling of n/k.  */
# define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

extern bool isbase64 (char ch);

extern void base64_encode (const char *restrict in, size_t inlen,
                           char *restrict out, size_t outlen);

extern size_t base64_encode_alloc (const char *in, size_t inlen,
                                   char **out);

extern bool base64_decode (const char *restrict in, size_t inlen,
                           char *restrict out, size_t *outlen);

extern bool base64_decode_alloc (const char *in, size_t inlen,
                                 char **out, size_t *outlen);

#endif /* BASE64_H */

11.2. Implementation: base64.c

/* base64.c -- Encode binary data using printable characters.
   Copyright (C) 1999, 2000, 2001, 2004, 2005, 2006
   Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
   02110-1301, USA.  */

/* Written by Simon Josefsson.  Partially adapted from GNU MailUtils
 * (mailbox/filter_trans.c, as of 2004-11-28).  Improved by review
 * from Paul Eggert, Bruno Haible, and Stepan Kasal.
 *
 * Be careful with error checking.  Here is how you would typically
 * use these functions:
 *
 * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
 * if (!ok)
 *   FAIL: input was not valid base64
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN
 *
 * size_t outlen = base64_encode_alloc (in, inlen, &out);
 * if (out == NULL && outlen == 0 && inlen != 0)
 *   FAIL: input too long
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN.
 *
 */

/* Get prototype. */
#include "base64.h"

/* Get malloc. */
#include <stdlib.h>

/* Get UCHAR_MAX. */
#include <limits.h>

/* C89 compliant way to cast 'char' to 'unsigned char'. */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Base64 encode IN array of size INLEN into OUT array of size OUTLEN.
   If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as
   possible.  If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
   terminate the output buffer. */
void
base64_encode (const char *restrict in, size_t inlen,
               char *restrict out, size_t outlen)
{
  static const char b64str[64] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  while (inlen && outlen)
    {
      *out++ = b64str[to_uchar (in[0]) >> 2];
      if (!--outlen)
        break;
      *out++ = b64str[((to_uchar (in[0]) << 4)
                       + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                      & 0x3f];
      if (!--outlen)
        break;
      *out++ =
        (inlen
         ? b64str[((to_uchar (in[1]) << 2)
                   + (--inlen ? to_uchar (in[2]) >> 6 : 0))
                  & 0x3f]
         : '=');
      if (!--outlen)
        break;
      *out++ = inlen ? b64str[to_uchar (in[2]) & 0x3f] : '=';
      if (!--outlen)
        break;
      if (inlen)
        inlen--;
      if (inlen)
        in += 3;
    }

  if (outlen)
    *out = '\0';
}

/* Allocate a buffer and store zero terminated base64 encoded data
   from array IN of size INLEN, returning BASE64_LENGTH(INLEN), i.e.,
   the length of the encoded data, excluding the terminating zero.  On
   return, the OUT variable will hold a pointer to newly allocated
   memory that must be deallocated by the caller.  If output string
   length would overflow, 0 is returned and OUT is set to NULL.  If
   memory allocation fails, OUT is set to NULL, and the return value
   indicates the length of the requested memory block, i.e.,
   BASE64_LENGTH(inlen) + 1. */
size_t
base64_encode_alloc (const char *in, size_t inlen, char **out)
{
  size_t outlen = 1 + BASE64_LENGTH (inlen);

  /* Check for overflow in outlen computation.
   *
   * If there is no overflow, outlen >= inlen.
   *
   * If the operation (inlen + 2) overflows then it yields at most +1, so
   * outlen is 0.
   *
   * If the multiplication overflows, we lose at least half of the
   * correct value, so the result is < ((inlen + 2) / 3) * 2, which is
   * less than (inlen + 2) * 0.66667, which is less than inlen as soon as
   * (inlen > 4).
   */
  if (inlen > outlen)
    {
      *out = NULL;
      return 0;
    }

  *out = malloc (outlen);
  if (*out)
    base64_encode (in, inlen, *out, outlen);

  return outlen - 1;
}

/* With this approach this file works independent of the charset used
   (think EBCDIC).  However, it does assume that the characters in the
   Base64 alphabet (A-Za-z0-9+/) are encoded in 0..255.  POSIX
   1003.1-2001 requires that char and unsigned char are 8-bit
   quantities, though, taking care of that problem.  But this may be a
   potential problem on non-POSIX C99 platforms.  */
#define B64(x) \
  ((x) == 'A' ? 0 \
   : (x) == 'B' ? 1 \
   : (x) == 'C' ? 2 \
   : (x) == 'D' ? 3 \
   : (x) == 'E' ? 4 \
   : (x) == 'F' ? 5 \
   : (x) == 'G' ? 6 \
   : (x) == 'H' ? 7 \
   : (x) == 'I' ? 8 \
   : (x) == 'J' ? 9 \
   : (x) == 'K' ? 10 \
   : (x) == 'L' ? 11 \
   : (x) == 'M' ? 12 \
   : (x) == 'N' ? 13 \
   : (x) == 'O' ? 14 \
   : (x) == 'P' ? 15 \
   : (x) == 'Q' ? 16 \
   : (x) == 'R' ? 17 \
   : (x) == 'S' ? 18 \
   : (x) == 'T' ? 19 \
   : (x) == 'U' ? 20 \
   : (x) == 'V' ? 21 \
   : (x) == 'W' ? 22 \
   : (x) == 'X' ? 23 \
   : (x) == 'Y' ? 24 \
   : (x) == 'Z' ? 25 \
   : (x) == 'a' ? 26 \
   : (x) == 'b' ? 27 \
   : (x) == 'c' ? 28 \
   : (x) == 'd' ? 29 \
   : (x) == 'e' ? 30 \
   : (x) == 'f' ? 31 \
   : (x) == 'g' ? 32 \
   : (x) == 'h' ? 33 \
   : (x) == 'i' ? 34 \
   : (x) == 'j' ? 35 \
   : (x) == 'k' ? 36 \
   : (x) == 'l' ? 37 \
   : (x) == 'm' ? 38 \
   : (x) == 'n' ? 39 \
   : (x) == 'o' ? 40 \
   : (x) == 'p' ? 41 \
   : (x) == 'q' ? 42 \
   : (x) == 'r' ? 43 \
   : (x) == 's' ? 44 \
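   : (x) == 't' ? 45 \
   : (x) == 'u' ? 46 \
   : (x) == 'v' ? 47 \
   : (x) == 'w' ? 48 \
   : (x) == 'x' ? 49 \
   : (x) == 'y' ? 50 \
   : (x) == 'z' ? 51 \
   : (x) == '0' ? 52 \
   : (x) == '1' ? 53 \
   : (x) == '2' ? 54 \
   : (x) == '3' ? 55 \
   : (x) == '4' ? 56 \
   : (x) == '5' ? 57 \
   : (x) == '6' ? 58 \
   : (x) == '7' ? 59 \
   : (x) == '8' ? 60 \
   : (x) == '9' ? 61 \
   : (x) == '+' ? 62 \
   : (x) == '/' ? 63 \
   : -1)

static const signed char b64[0x100] = {
  B64 (0), B64 (1), B64 (2), B64 (3),
  B64 (4), B64 (5), B64 (6), B64 (7),
  B64 (8), B64 (9), B64 (10), B64 (11),
  B64 (12), B64 (13), B64 (14), B64 (15),
  B64 (16), B64 (17), B64 (18), B64 (19),
  B64 (20), B64 (21), B64 (22), B64 (23),
  B64 (24), B64 (25), B64 (26), B64 (27),
  B64 (28), B64 (29), B64 (30), B64 (31),
  B64 (32), B64 (33), B64 (34), B64 (35),
  B64 (36), B64 (37), B64 (38), B64 (39),
  B64 (40), B64 (41), B64 (42), B64 (43),
  B64 (44), B64 (45), B64 (46), B64 (47),
  B64 (48), B64 (49), B64 (50), B64 (51),
  B64 (52), B64 (53), B64 (54), B64 (55),
  B64 (56), B64 (57), B64 (58), B64 (59),
  B64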

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-02.txt (2016-04-30)
    Open archived version from archive
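
The base 32 quantum and padding rules quoted in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not code from the draft: the function name `base32_encode_sketch` and the two alphabet constants are hypothetical, and the caller is assumed to supply an output buffer of at least 8 * ((inlen + 4) / 5) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the alphabets of Table 3 and Table 4. */
const char BASE32_STD[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
const char BASE32_HEX[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Encode INLEN octets from IN into OUT using the 32-character ALPHABET,
   padding the last quantum with '='.  OUT is zero terminated.
   Returns the number of characters written, excluding the terminator. */
size_t base32_encode_sketch(const unsigned char *in, size_t inlen,
                            char *out, const char *alphabet)
{
    size_t o = 0;

    for (size_t i = 0; i < inlen; i += 5) {
        /* Number of octets available for this 40-bit quantum (1..5). */
        size_t n = inlen - i < 5 ? inlen - i : 5;

        /* Build the 40-bit group, most significant octet first;
           missing octets contribute zero bits on the right. */
        unsigned long long group = 0;
        for (size_t k = 0; k < 5; k++)
            group = (group << 8) | (k < n ? in[i + k] : 0u);

        /* ceil(8n / 5) data characters: 2, 4, 5, 7 or 8. */
        size_t chars = (8 * n + 4) / 5;

        for (size_t k = 0; k < 8; k++) {
            unsigned idx = (unsigned)((group >> (35 - 5 * k)) & 0x1f);
            out[o++] = k < chars ? alphabet[idx] : '=';
        }
    }
    out[o] = '\0';
    return o;
}
```

Passing `BASE32_HEX` instead of `BASE32_STD` gives the extended hex variant, since the two encodings differ only in the alphabet. Encoding "foobar" with each alphabet should reproduce the corresponding test vectors from section 10 of the draft.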


  • in some file system environments, the encoding described in this section is recommended instead.

This encoding should not be regarded as the same as the "base64" encoding, and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A            17 R            34 i            51 z
        1 B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 - (minus)
       12 M            29 d            46 u            63 _ (understrike)
       13 N            30 e            47 v
       14 O            31 f            48 w
       15 P            32 g            49 x
       16 Q            33 h            50 y         (pad) =

6. Base 32 Encoding

The following description of base 32 is due to [9] (with corrections).

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

Josefsson              Expires September 23, 2006               [Page 7]
Internet-Draft              Base-N encodings                  March 2006

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single digit in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, and the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

Each 5-bit
group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2

Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

(1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

(4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

(5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

7. Base 32 Encoding with Extended Hex Alphabet

The following description of base 32 is due to [7]. This encoding should not be regarded as the same as the "base32" encoding,
and should not be referred to as only "base32".

One property with this alphabet, that the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

This encoding is identical to the previous one, except for the alphabet. The new alphabet is found in Table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 0             9 9            18 I            27 R
        1 1            10 A            19 J            28 S
        2 2            11 B            20 K            29 T
        3 3            12 C            21 L            30 U
        4 4            13 D            22 M            31 V
        5 5            14 E            23 N
        6 6            15 F            24 O         (pad) =
        7 7            16 G            25 P
        8 8            17 H            26 Q

8. Base 16 Encoding

The following description is original but analogous to previous descriptions. Essentially, Base 16 encoding is the standard case-insensitive hex encoding, and may be referred to as "base16" or "hex".

A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters. Proceeding from left to right, an 8-bit input is taken from the input data. These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single digit in the base 16 alphabet.

Each 4-bit group is used as an index into an array of 16 printable characters. The character referenced by the index is placed in the output string.

Table 5: The Base 16 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 0             4 4             8 8            12 C
        1 1             5 5             9 9            13 D
        2 2             6 6            10 A            14 E
        3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

9. Illustrations and examples

To translate between binary and a base encoding, the input is stored in a structure and the output is extracted. The case for base 64 is displayed in the following figure, borrowed from [5].

    +--first octet--+-second octet--+--third octet--+
    |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
    +-----------+---+-------+-------+---+-----------+
    |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
    +--1.index--+--2.index--+--3.index--+--4.index--+

The case for base 32 is shown in the following figure, borrowed from [7]. Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence. Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

               1          2          3
    01234567 89012345 67890123 45678901 23456789
   +--------+--------+--------+--------+--------+
   |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
   +--------+--------+--------+--------+--------+
                                           <===> 8th character
                                     <====> 7th character
                                <===> 6th character
                          <====> 5th character
                    <====> 4th character
               <===> 3rd character
         <====> 2nd character
    <===> 1st character

The following example of Base64 data is from [5], with corrections.

    Input data:  0x14fb9c03d97e
    Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
    8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
    6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
    Decimal: 5      15     46     28       0      61     37     62
    Output:  F      P      u      c        A      9      l      +

    Input data:  0x14fb9c03d9
    Hex:     1   4    f   b    9   c     | 0   3    d   9
    8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                    pad with 00
    6-bit:   000101 001111 101110 011100 | 000000 111101 100100
    Decimal: 5      15     46     28       0      61     36
                                                       pad with =
    Output:  F      P      u      c        A      9      k      =

    Input data:  0x14fb9c03
    Hex:     1   4    f   b    9   c     | 0   3
    8-bit:   00010100 11111011 10011100  | 00000011
                                           pad with 0000
    6-bit:   000101 001111 101110 011100 | 000000 110000
    Decimal: 5      15     46     28       0      48
                                                pad with = =
    Output:  F      P      u      c        A      w      =      =

10. Test vectors

    BASE64("") = ""
    BASE64("f") = "Zg=="
    BASE64("fo") = "Zm8="
    BASE64("foo") = "Zm9v"
    BASE64("foob") = "Zm9vYg=="
    BASE64("fooba") = "Zm9vYmE="
    BASE64("foobar") = "Zm9vYmFy"

    BASE32("") = ""
    BASE32("f") = "MY======"
    BASE32("fo") = "MZXQ===="
    BASE32("foo") = "MZXW6==="
    BASE32("foob") = "MZXW6YQ="
    BASE32("fooba") = "MZXW6YTB"
    BASE32("foobar") = "MZXW6YTBOI======"

    BASE32-HEX("") = ""
    BASE32-HEX("f") = "CO======"
    BASE32-HEX("fo") = "CPNG===="
    BASE32-HEX("foo") = "CPNMU==="
    BASE32-HEX("foob") = "CPNMUOG="
    BASE32-HEX("fooba") = "CPNMUOJ1"
    BASE32-HEX("foobar") = "CPNMUOJ1E8======"

    BASE16("") = ""
    BASE16("f") = "66"
    BASE16("fo") = "666F"
    BASE16("foo") = "666F6F"
    BASE16("foob") = "666F6F62"
    BASE16("fooba") = "666F6F6261"
    BASE16("foobar") = "666F6F626172"

11. ISO C99 Implementation of Base64

Below is an ISO C99 implementation of Base64 encoding and decoding. The code assumes that the US-ASCII characters are encoded inside 'char' with values below 255, which holds for all POSIX platforms, but should otherwise be portable. This code is not intended as a normative specification of base64.

11.1. Prototypes: base64.h

/* base64.h -- Encode binary data using printable characters.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
   02110-1301, USA.  */

/* Written by Simon Josefsson. */

#ifndef BASE64_H
# define BASE64_H

/* Get size_t. */
# include <stddef.h>

/* Get bool. */
# include <stdbool.h>

/* This uses that the expression (n+(k-1))/k means the smallest
   integer >= n/k, i.e., the ceiling of n/k.  */
# define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

extern bool isbase64 (char ch);

extern void base64_encode (const char *restrict in, size_t inlen,
                           char *restrict out, size_t outlen);

extern size_t base64_encode_alloc (const char *in, size_t inlen,
                                   char **out);

extern bool base64_decode (const char *restrict in, size_t inlen,
                           char *restrict out, size_t *outlen);

extern bool base64_decode_alloc (const char *in, size_t inlen,
                                 char **out, size_t *outlen);

#endif /* BASE64_H */

11.2. Implementation: base64.c

/* base64.c -- Encode binary data using printable characters.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
   02110-1301, USA.  */

/* Written by Simon Josefsson.  Partially adapted from GNU MailUtils
 * (mailbox/filter_trans.c, as of 2004-11-28).  Improved by review
 * from Paul Eggert, Bruno Haible, and Stepan Kasal.
 *
 * Be careful with error checking.  Here is how you would typically
 * use these functions:
 *
 * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
 * if (!ok)
 *   FAIL: input was not valid base64
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN
 *
 * size_t outlen = base64_encode_alloc (in, inlen, &out);
 * if (out == NULL && outlen == 0 && inlen != 0)
 *   FAIL: input too long
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN.
 *
 */

/* Get prototype. */
#include "base64.h"

/* Get malloc. */
#include <stdlib.h>

/* Get UCHAR_MAX. */
#include <limits.h>

/* C89 compliant way to cast 'char' to 'unsigned char'. */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Base64 encode IN array of size INLEN into OUT array of size OUTLEN.
   If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as
   possible.  If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
   terminate the output buffer. */
void
base64_encode (const char *restrict in, size_t inlen,
               char *restrict out, size_t outlen)
{
  static const char b64str[64] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  while (inlen && outlen)
    {
      *out++ = b64str[to_uchar (in[0]) >> 2];
      if (!--outlen)
        break;
      *out++ = b64str[((to_uchar (in[0]) << 4)
                       + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                      & 0x3f];
      if (!--outlen)
        break;
      *out++ =
        (inlen
         ? b64str[((to_uchar (in[1]) << 2)
                   + (--inlen ? to_uchar (in[2]) >> 6 : 0))
                  & 0x3f]
         : '=');
      if (!--outlen)
        break;
      *out++ = inlen ? b64str[to_uchar (in[2]) & 0x3f] : '=';
      if (!--outlen)
        break;
      if (inlen)
        inlen--;
      if (inlen)
        in += 3;
    }

  if (outlen)
    *out = '\0';
}

/* Allocate a buffer and store zero terminated base64 encoded data
   from array IN of size INLEN, returning BASE64_LENGTH(INLEN), i.e.,
   the length of the encoded data, excluding the terminating zero.  On
   return, the OUT variable will hold a pointer to newly allocated
   memory that must be deallocated by the caller.  If output string
   length would overflow, 0 is returned and OUT is set to NULL.  If
   memory allocation fails, OUT is set to NULL, and the return value
   indicates the length of the requested memory block, i.e.,
   BASE64_LENGTH(inlen) + 1. */
size_t
base64_encode_alloc (const char *in, size_t inlen, char **out)
{
  size_t outlen = 1 + BASE64_LENGTH (inlen);

  /* Check for overflow in outlen computation.
   *
   * If there is no overflow, outlen >= inlen.
   *
   * If the operation (inlen + 2) overflows then it yields at most +1, so
   * outlen is 0.
   *
   * If the multiplication overflows, we lose at least half of the
   * correct value, so the result is < ((inlen + 2) / 3) * 2, which is
   * less than (inlen + 2) * 0.66667, which is less than inlen as soon as
   * (inlen > 4).
   */
  if (inlen > outlen)
    {
      *out = NULL;
      return 0;
    }

  *out = malloc (outlen);
  if (*out)
    base64_encode (in, inlen, *out, outlen);

  return outlen - 1;
}

/* With this approach this file works independent of the charset used
   (think EBCDIC).  However, it does assume that the characters in the
   Base64 alphabet (A-Za-z0-9+/) are encoded in 0..255.  POSIX
   1003.1-2001 requires that char and unsigned char are 8-bit
   quantities, though, taking care of that problem.  But this may be a
   potential problem on non-POSIX C99 platforms.  */
#define B64(x) \
  ((x) == 'A' ? 0 \
   : (x) == 'B' ? 1 \
   : (x) == 'C' ? 2 \
   : (x) == 'D' ? 3 \
   : (x) == 'E' ? 4 \
   : (x) == 'F' ? 5 \
   : (x) == 'G' ? 6 \
   : (x) == 'H' ? 7 \
   : (x) == 'I' ? 8 \
   : (x) == 'J' ? 9 \
   : (x) == 'K' ? 10 \
   : (x) == 'L' ? 11 \
   : (x) == 'M' ? 12 \
   : (x) == 'N' ? 13 \
   : (x) == 'O' ? 14 \
   : (x) == 'P' ? 15 \
   : (x) == 'Q' ? 16 \
   : (x) == 'R' ? 17 \
   : (x) == 'S' ? 18 \
   : (x) == 'T' ? 19 \
   : (x) == 'U' ? 20 \
   : (x) == 'V' ? 21 \
   : (x) == 'W' ? 22 \
   : (x) == 'X' ? 23 \
   : (x) == 'Y' ? 24 \
   : (x) == 'Z' ? 25 \
   : (x) == 'a' ? 26 \
   : (x) == 'b' ? 27 \
   : (x) == 'c' ? 28 \
   : (x) == 'd' ? 29 \
   : (x) == 'e' ? 30 \
   : (x) == 'f' ? 31 \
   : (x) == 'g' ? 32 \
   : (x) == 'h' ? 33 \
   : (x) == 'i' ? 34 \
   : (x) == 'j' ? 35 \
   : (x) == 'k' ? 36 \
   : (x) == 'l' ? 37 \
   : (x) == 'm' ? 38 \
   : (x) == 'n' ? 39 \
   : (x) == 'o' ? 40 \
   : (x) == 'p' ? 41 \
   : (x) == 'q' ? 42 \
   : (x) == 'r' ? 43 \
   : (x) == 's' ? 44 \
   : (x) == 't' ? 45 \
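   : (x) == 'u' ? 46 \
   : (x) == 'v' ? 47 \
   : (x) == 'w' ? 48 \
   : (x) == 'x' ? 49 \
   : (x) == 'y' ? 50 \
   : (x) == 'z' ? 51 \
   : (x) == '0' ? 52 \
   : (x) == '1' ? 53 \
   : (x) == '2' ? 54 \
   : (x) == '3' ? 55 \
   : (x) == '4' ? 56 \
   : (x) == '5' ? 57 \
   : (x) == '6' ? 58 \
   : (x) == '7' ? 59 \
   : (x) == '8' ? 60 \
   : (x) == '9' ? 61 \
   : (x) == '+' ? 62 \
   : (x) == '/' ? 63 \
   : -1)

static const signed char b64[0x100] = {
  B64 (0), B64 (1), B64 (2), B64 (3),
  B64 (4), B64 (5), B64 (6), B64 (7),
  B64 (8), B64 (9), B64 (10), B64 (11),
  B64 (12), B64 (13), B64 (14), B64 (15),
  B64 (16), B64 (17), B64 (18), B64 (19),
  B64 (20), B64 (21), B64 (22), B64 (23),
  B64 (24), B64 (25), B64 (26), B64 (27),
  B64 (28), B64 (29), B64 (30), B64 (31),
  B64 (32), B64 (33), B64 (34), B64 (35),
  B64 (36), B64 (37), B64 (38), B64 (39),
  B64 (40), B64 (41), B64 (42), B64 (43),
  B64 (44), B64 (45), B64 (46), B64 (47),
  B64 (48), B64 (49), B64 (50), B64 (51),
  B64 (52), B64 (53), B64 (54), B64 (55),
  B64 (56), B64 (57), B64 (58), B64 (59),
  B64 (60), B64 (61),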

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-01.txt (2016-04-30)
    Open archived version from archive
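
The excerpt above says the URL and filename safe encoding differs from plain base64 only in the characters for values 62 and 63. A minimal sketch of that idea follows. The function name `base64_encode_sketch` and its `url_safe` flag are hypothetical and are not the draft's `base64_encode`, and the caller is assumed to supply an output buffer of at least 4 * ((inlen + 2) / 3) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode INLEN octets from IN into OUT as base 64 with '=' padding.
   If URL_SAFE is non-zero, values 62 and 63 map to '-' and '_'
   (Table 2) instead of '+' and '/'.  OUT is zero terminated.
   Returns the number of characters written, excluding the terminator. */
size_t base64_encode_sketch(const unsigned char *in, size_t inlen,
                            char *out, int url_safe)
{
    char alphabet[65] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;

    if (url_safe) {
        alphabet[62] = '-';
        alphabet[63] = '_';
    }

    for (size_t i = 0; i < inlen; i += 3) {
        /* Number of octets available for this 24-bit quantum (1..3). */
        size_t n = inlen - i < 3 ? inlen - i : 3;

        /* Build the 24-bit group; missing octets become zero bits. */
        unsigned long group = 0;
        for (size_t k = 0; k < 3; k++)
            group = (group << 8) | (k < n ? in[i + k] : 0u);

        /* n octets yield n + 1 data characters; the rest is padding. */
        for (size_t k = 0; k < 4; k++)
            out[o++] = k <= n ? alphabet[(group >> (18 - 6 * k)) & 0x3f]
                              : '=';
    }
    out[o] = '\0';
    return o;
}
```

With the draft's example input 0x14fb9c03d97e, the last 6-bit group is 62, so the two alphabets differ only in the final output character. Copying the alphabet on each call keeps the sketch short; a real implementation would more likely keep two constant tables.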


  • quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

Josefsson, Ed.            Expires May 16, 2006                  [Page 6]
Internet-Draft              Base-N encodings               November 2005

4. Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with a URL and filename safe alphabet has been used in [10].

An alternative alphabet has been suggested that used "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead.

This encoding should not be regarded as the same as the "base64" encoding, and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A            17 R            34 i            51 z
        1 B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 - (minus)
       12 M            29 d            46 u            63 _ (understrike)
       13 N            30 e            47 v
       14 O            31 f            48 w
       15 P            32 g            49 x
       16 Q            33 h            50 y         (pad) =

5. Base 32 Encoding

The following description of base 32 is due to [9] (with corrections).

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

The
encoding process represents 40 bit groups of input bits as output strings of 8 encoded characters Proceeding from left to right a 40 bit input group is formed by concatenating 5 8bit input groups These 40 bits are then treated as 8 concatenated 5 bit groups each of which is translated into a single digit in the base 32 alphabet When encoding a bit stream via the base 32 encoding the bit stream must be presumed to be ordered with the most significant bit first That is the first bit in the stream will be the high order bit in the first 8bit byte and the eighth bit will be the low order bit in the first 8bit byte and so on Each 5 bit group is used as an index into an array of 32 printable characters The character referenced by the index is placed in the output string These characters identified in Table 3 below are selected from US ASCII digits and uppercase letters Table 3 The Base 32 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding 0 A 9 J 18 S 27 3 1 B 10 K 19 T 28 4 2 C 11 L 20 U 29 5 3 D 12 M 21 V 30 6 4 E 13 N 22 W 31 7 5 F 14 O 23 X 6 G 15 P 24 Y pad 7 H 16 Q 25 Z 8 I 17 R 26 2 Special processing is performed if fewer than 40 bits are available at the end of the data being encoded A full encoding quantum is always completed at the end of a body When fewer than 40 input bits are available in an input group zero bits are added on the right to form an integral number of 5 bit groups Padding at the end of the data is performed using the character Since all base 32 input is an integral number of octets only the following cases can arise 1 the final quantum of encoding input is an integral multiple of 40 bits here the final unit of encoded output will be an integral multiple of 8 characters with no padding 2 the final quantum of encoding input is exactly 8 bits here the Josefsson Ed Expires May 16 2006 Page 8 Internet Draft Base N encodings November 2005 final unit of encoded output will be two characters followed by six padding characters 3 the 
final quantum of encoding input is exactly 16 bits here the final unit of encoded output will be four characters followed by four padding characters 4 the final quantum of encoding input is exactly 24 bits here the final unit of encoded output will be five characters followed by three padding characters or 5 the final quantum of encoding input is exactly 32 bits here the final unit of encoded output will be seven characters followed by one padding character 6 Base 32 Encoding with Extended Hex Alphabet The following description of base 32 is due to 7 This encoding should not be regarded as the same as the base32 encoding and should not be referred to as only base32 One property with this alphabet that the base64 and base32 alphabet lack is that encoded data maintain its sort order when the encoded data is compared bit wise This encoding is identical to the previous one except for the alphabet The new alphabet is found in table 4 Table 4

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-00.txt (2016-04-30)
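
The base 32 procedure quoted in the snippet above (40-bit input groups, eight 5-bit indices, zero bits added on the right, "=" padding) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the draft: the names B32_ALPHABET and b32_encode are made up here, and the alphabet string is assumed to match Table 3.

```python
# Assumed to match Table 3: A-Z for values 0-25, 2-7 for values 26-31.
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def b32_encode(data: bytes) -> str:
    """Hypothetical helper: encode octets in 40-bit groups with "=" padding."""
    out = []
    for start in range(0, len(data), 5):
        chunk = data[start:start + 5]
        nbits = len(chunk) * 8
        # Number of 5-bit groups needed to hold the input bits (rounded up).
        ngroups = -(-nbits // 5)
        # Concatenate the octets most significant bit first, then add zero
        # bits on the right to reach an integral number of 5-bit groups.
        value = int.from_bytes(chunk, "big") << (ngroups * 5 - nbits)
        for i in range(ngroups - 1, -1, -1):
            out.append(B32_ALPHABET[(value >> (5 * i)) & 0x1F])
        # Complete the 8-character quantum with "=" padding.
        out.append("=" * (8 - ngroups))
    return "".join(out)


# Padding count for a final quantum of 1 to 5 octets.
pad_counts = [b32_encode(bytes(n)).count("=") for n in range(1, 6)]
```

For final quanta of 1, 2, 3, 4 and 5 octets, pad_counts comes out as 6, 4, 3, 1 and 0, which lines up with the five cases listed in the snippet. The result can also be compared against the standard library's base64.b32encode.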


  • Value Encoding  Value Encoding  Value Encoding  Value Encoding
          B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 +
       12 M            29 d            46 u            63 /
       13 N            30 e            47 v
       14 O            31 f            48 w         (pad) =
       15 P            32 g            49 x
       16 Q            33 h            50 y

    Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the "=" character.

    Josefsson, editor    Expires August 2, 2002    [Page 6]
    Internet-Draft    Base Encoding of Data    February 2002

    Since all base 64 input is an integral number of octets, only the following cases can arise:

    (1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding,

    (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or

    (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

    4.  Base 64 Encoding with URL and Filename Safe Alphabet

    The Base 64 encoding with a URL and filename safe alphabet has been used in [8].

    An alternative alphabet has been suggested that used "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead.

    This encoding should not be regarded as the same as the "base64" encoding, and should not be referred to as only "base64". Unless made clear, "base64" refers to the base 64 in the previous section.

    This encoding is technically identical to the previous one, except for the 62nd and 63rd alphabet characters, as indicated in Table 2.

    Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A            17 R            34 i            51 z
        1 B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 - (minus)
       12 M            29 d            46 u            63 _ (understrike)
       13 N            30 e            47 v
       14 O            31 f            48 w         (pad) =
       15 P            32 g            49 x
       16 Q            33 h            50 y

    5.  Base 32 Encoding

    The following description of base 32 is due to [7] (with corrections).

    The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

    A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

    The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single digit in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, and the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

    Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

    Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2

    Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 5-bit groups. Padding at the end of

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-base-encoding-04.txt (2016-04-30)
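
The base 64 padding cases and the difference between the two base 64 alphabets quoted in the snippet above can be checked with Python's standard base64 module. The sketch below is illustrative only; the sample bytes and the helper name pad_count are choices made here, not taken from the draft.

```python
import base64

# The octets 0xFB 0xFF contain the 6-bit values 62 and 63, the only
# positions where the standard and URL-safe alphabets differ.
sample = bytes([0xFB, 0xFF])
standard = base64.b64encode(sample).decode("ascii")
url_safe = base64.urlsafe_b64encode(sample).decode("ascii")


def pad_count(n_octets: int) -> int:
    """Hypothetical helper: number of "=" characters for n_octets of input."""
    return base64.b64encode(bytes(n_octets)).count(b"=")


# Final quantum of 8, 16 and 24 bits respectively.
paddings = {n: pad_count(n) for n in (1, 2, 3)}
```

Here standard is "+/8=" and url_safe is "-_8=": the two-octet input yields three characters plus one pad, and only the characters for values 62 and 63 change. The paddings mapping gives 2, 1 and 0 pad characters for inputs of 1, 2 and 3 octets.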


  • Value Encoding  Value Encoding  Value Encoding  Value Encoding
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 +
       12 M            29 d            46 u            63 /
       13 N            30 e            47 v
       14 O            31 f            48 w         (pad) =
       15 P            32 g            49 x
       16 Q            33 h            50 y

    Josefsson, editor    Expires May 14, 2002    [Page 5]
    Internet-Draft    Base Encodings    November 2001

    Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A            17 R            34 i            51 z
        1 B            18 S            35 j            52 0
        2 C            19 T            36 k            53 1
        3 D            20 U            37 l            54 2
        4 E            21 V            38 m            55 3
        5 F            22 W            39 n            56 4
        6 G            23 X            40 o            57 5
        7 H            24 Y            41 p            58 6
        8 I            25 Z            42 q            59 7
        9 J            26 a            43 r            60 8
       10 K            27 b            44 s            61 9
       11 L            28 c            45 t            62 - (minus)
       12 M            29 d            46 u            63 _ (understrike)
       13 N            30 e            47 v
       14 O            31 f            48 w         (pad) =
       15 P            32 g            49 x
       16 Q            33 h            50 y

    Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 64 input is an integral number of octets, only the following cases can arise:

    (1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding,

    (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or

    (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

    3.  Base 32 Encoding

    The following description of base 32 is due to [7] (with corrections) and [6] (the extended hex alphabet).

    The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.

    A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character. (The extra 33rd character, "=", is used to signify a special processing function.)

    The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8-bit input groups. These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single digit in the base 32 alphabet. When encoding a bit stream via the base 32 encoding, the bit stream must be presumed to be ordered with the most significant bit first. That is, the first bit in the stream will be the high-order bit in the first 8-bit byte, and the eighth bit will be the low-order bit in the first 8-bit byte, and so on.

    Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 3 below, are selected from US-ASCII digits and uppercase letters.

    Table 3: The Canonical Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 A             9 J            18 S            27 3
        1 B            10 K            19 T            28 4
        2 C            11 L            20 U            29 5
        3 D            12 M            21 V            30 6
        4 E            13 N            22 W            31 7
        5 F            14 O            23 X
        6 G            15 P            24 Y         (pad) =
        7 H            16 Q            25 Z
        8 I            17 R            26 2

    Table 4: The "Extended Hex" Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
        0 0             9 9            18 I            27 R
        1 1            10 A            19 J            28 S
        2 2            11 B            20 K            29 T
        3 3            12 C            21 L            30 U
        4 4            13 D            22 M            31 V
        5 5            14 E            23 N
        6 6            15 F            24 O         (pad) =
        7 7            16 G            25 P
        8 8            17 H            26 Q

    Special processing is performed if fewer than 40 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 40 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 5-bit groups. Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

    (1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

    (2) the final quantum of encoding input is exactly 8 bits; here

    Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-base-encoding-03.txt (2016-04-30)
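
Since the snippet above gives two base 32 alphabets that assign different characters to the same 5-bit values, one way to illustrate their relationship is to encode with the canonical alphabet and then translate character by character. The sketch below assumes the two alphabet strings match Tables 3 and 4; the names CANONICAL, EXTENDED_HEX, TO_HEX and b32hex_encode are invented for this illustration and do not come from the draft.

```python
import base64

CANONICAL = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"     # assumed to match Table 3
EXTENDED_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # assumed to match Table 4

# Map each canonical character to the extended hex character that has the
# same 5-bit value. "=" is not in the mapping, so padding passes through.
TO_HEX = str.maketrans(CANONICAL, EXTENDED_HEX)


def b32hex_encode(data: bytes) -> str:
    """Hypothetical helper: base 32 output using the extended hex alphabet."""
    return base64.b32encode(data).decode("ascii").translate(TO_HEX)


canonical_form = base64.b32encode(b"foobar").decode("ascii")
hex_form = b32hex_encode(b"foobar")
```

With the input "foobar", canonical_form is "MZXW6YTBOI======" and hex_form is "CPNMUOJ1E8======"; the grouping and padding are the same and only the alphabet differs. Translating through the canonical encoder is a convenience for this sketch; an implementation could equally index the extended hex alphabet directly.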



  •