
Josefsson                Expires July 12, 2001                  [Page 2]

Internet-Draft    MIME registration of application/dns     January 2001

1.  Introduction

   DNS [1][2] is a widely deployed protocol used to, among other
   things, translate domain names into IP addresses.  DNS data can be
   archived in a format described in RFC 2540 [5]; this document
   describes the registration of a MIME type for this format.

2.  MIME type registration of application/dns

   To: ietf-types@iana.org
   Subject: Registration of MIME media type application/dns

   MIME media type name: application
   MIME subtype name: dns
   Required parameters: none
   Optional parameters: none
   Encoding considerations: 7bit, 8bit or binary
   Security considerations: This definition identifies the content as
      being detached DNS information [5].  This data may be security
      relevant according to [4] or secured information according to [3].
   Interoperability considerations: none
   Published specification: the format of data that could be tagged
      with this MIME type is documented in RFC 2540 [5].
   Applications which use this media type: DNS-related software,
      including software storing and using certificates stored in DNS.
   Additional information:
      Magic number(s): none
      File extension(s): none
      Macintosh File Type Code(s): unknown
   Person & email address to contact for further information:
      Simon Josefsson <sjosefsson@rsasecurity.com>
   Intended usage: COMMON
   Author/Change controller: unknown

References

   [1]  Mockapetris, P., "Domain Names - Concepts and Facilities",
        RFC 1034, November 1987.

   [2]  Mockapetris, P., "Domain Names - Implementation and
        Specification", RFC 1035, November 1987.

   [3]  Eastlake, D., "Domain Name System Security Extensions",
        RFC 2535, March 1999.

   [4]  Eastlake, D. and O. Gudmundsson, "Storing Certificates in the
        Domain Name System (DNS)", RFC 2538, March 1999.

   [5]  Eastlake, D., "Detached Domain Name System (DNS) Information",
        RFC 2540, March 1999.

Author's Address

   Simon Josefsson
   RSA Security

Original URL path: http://www.josefsson.org/dns-mime/draft-josefsson-mime-dns-00.txt (2016-04-30)


   implemented as part of MIME [RFC2045], which is already a Draft
   Standard.  The base64url alphabet is newer and is not as common as
   base64, although there are several interoperable implementations.

   The base32 encoding is not as widely used as base64, but has
   applications in case-insensitive environments.  The base32 encoding
   is used by GS2 [RFC5801].  The base32hex encoding without padding is
   used by [RFC5155], and a restricted form is used by [RFC2938].

   The base16 encoding is usually referred to as "hexadecimal" or "hex"
   encoding and is used in many protocols and technical documents in an
   informal way.

3.  Methodology

   We identified that we wanted to test at least two distinct
   implementations of the following encodings:

      base64
      base64url
      base32
      base32hex
      base16

   The primary test is of course that encoding and decoding of data
   works and generates the expected results.

   Section 3 of RFC 4648 discusses some implementation discrepancies of
   base encoding.  To iron out interoperability problems, we checked
   these corner cases separately and documented the result.  In
   particular:

      how line feeds are handled during encoding and decoding (LF),
      whether padding is done correctly (PAD),
      how non-alphabetical characters are handled (NONALPHA),
      whether pad bits are zero or not (ZEROBITS).

Josefsson              Expires September 14, 2011               [Page 3]

Internet-Draft     Implementation Report for RFC 4648        March 2011

   A useful test vector for checking that zero-bit padding is correctly
   implemented is "YR==", which is a non-canonical encoding of "a"
   (ASCII 0x61) that normally would be encoded as "YQ==".
   Implementations should normally reject the input.

4.  Exceptions

   Basic encoding and decoding of data interoperate well.  Some tools
   accepted non-canonical encodings, but none appeared to ever generate
   them.  This is consistent with the requirements in section 3.5 of
   RFC 4648.

   We acknowledge that many implementations of base64 were written for
   a general purpose and thus may not follow some of the guidelines
   (e.g., related to line feeds) in RFC 4648 strictly.  However, we
   believe this should not be a reason against Draft Standard status,
   because the document is clear on the issues and the variations in
   implementations do not lead to significant practical problems.  In
   fact, some of the variations are used to improve robustness in face
   of common problems.

5.  Implementations Tested

5.1.  GNU Coreutils base64

   There is a base64 command line tool, written in C, included in GNU
   Coreutils [GNU Coreutils Base64].  It supports the base64 alphabet.

   LF:  On encoding, it wraps output after 76 characters (same as
      MIME).  On decoding, it accepts line-wrapped input.

   PAD:  It appears to pad data properly.

   NONALPHA:  It appears to return a non-zero error code if the input
      contains non-alphabetical characters.

   ZEROBITS:  On encoding, the pad bits are zero.  On decoding, it
      accepts non-zero pad bits.

5.2.  OpenSSL base64

   There is a base64 command line tool, written in C, included in
   OpenSSL [OpenSSL Base64].  It supports the base64 alphabet.

   LF:  On encoding, it wraps

Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc4648-impl-report.txt (2016-04-30)
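The ZEROBITS corner case from the report excerpt above can be tried out with a short Python sketch. It assumes the test vector is written with its padding as "YR==" (the pad characters are missing from the archived text), and the helper name is_canonical_base64 is hypothetical, not something taken from the report. The check decodes the input and compares a fresh re-encoding with the original string, so it does not depend on how strict a particular decoder is.

```python
import base64
import binascii


def is_canonical_base64(text: str) -> bool:
    """Return True only if `text` is what a conforming encoder would emit.

    Hypothetical helper: decode, re-encode, and compare with the input.
    A non-canonical string such as "YR==" decodes to the same octet as
    "YQ==" but re-encodes differently, so it is reported as False.
    """
    try:
        raw = base64.b64decode(text, validate=True)
    except binascii.Error:
        # Non-alphabet characters or broken padding.
        return False
    return base64.b64encode(raw).decode("ascii") == text


# Python's default (lenient) decoder ignores the non-zero pad bits.
lenient = base64.b64decode("YR==")

print(is_canonical_base64("YQ=="))  # canonical encoding of "a"
print(is_canonical_base64("YR=="))  # non-zero pad bits
print(lenient)
```

A decoder that wants to follow the report's recommendation ("implementations should normally reject the input") could call a check like this before accepting data.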


                                             u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a quantity.  When fewer than 24 input
   bits are available in an input group, zero bits are added (on the
   right) to form an integral number of 6-bit groups.  Padding at the
   end of the data is performed using the '=' character.  Since all
   base 64 input is an integral number of octets, only the following
   cases can arise:

   (1) the final quantum of encoding input is an integral multiple of
   24 bits; here, the final unit of encoded output will be an integral
   multiple of 4 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by two
   "=" padding characters, or

   (3) the final quantum of encoding input is exactly 16 bits; here,
   the final unit of encoded output will be three characters followed
   by one "=" padding character.

Josefsson                    Informational                      [Page 5]

RFC 3548   The Base16, Base32, and Base64 Data Encodings      July 2003

4.  Base 64 Encoding with URL and Filename Safe Alphabet

   The Base 64 encoding with an URL and filename safe alphabet has been
   used in [8].

   An alternative alphabet has been suggested that used "~" as the 63rd
   character.  Since the "~" character has special meaning in some file
   system environments, the encoding described in this section is
   recommended instead.

   This encoding should not be regarded as the same as the "base64"
   encoding, and should not be referred to as only "base64".  Unless
   made clear, "base64" refer to the base 64 in the previous section.

   This encoding is technically identical to the previous one, except
   for the 62:nd and 63:rd alphabet character, as indicated in table 2.

         Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _ (understrike)
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

5.  Base 32 Encoding

   The following description of base 32 is due to [7] (with
   corrections).

   The Base 32 encoding is designed to represent arbitrary sequences of
   octets in a form that needs to be case insensitive but need not be
   humanly readable.

   A 33-character subset of US-ASCII is used, enabling 5 bits to be
   represented per printable character.  (The extra 33rd character,
   "=", is used to signify a special processing function.)

   The encoding process represents 40-bit groups of input bits as
   output strings of 8 encoded characters.  Proceeding from left to
   right, a 40-bit input group is formed by concatenating 5 8bit input
   groups.  These 40 bits are then treated as 8 concatenated 5-bit
   groups, each of which is translated into a single digit in the base
   32 alphabet.  When encoding a bit stream via the base 32 encoding,
   the bit stream must be presumed to be ordered with the most-
   significant-bit first.  That is, the first bit in the stream will be
   the high-order bit in the first 8bit byte, and the eighth bit will
   be the low-order bit in the first 8bit byte, and so on.

   Each 5-bit group is used as an index into an array of 32 printable
   characters.  The character referenced by the index is placed in the
   output string.  These characters, identified in Table 2, below, are
   selected from US-ASCII digits and uppercase letters.

                     Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

   Special processing is performed if fewer than 40 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.  When fewer than 40 input
   bits are available in an input group, zero bits are added (on the
   right) to form an integral number of 5-bit groups.  Padding at the
   end of the data is performed using the "=" character.  Since all
   base 32 input is an integral number of octets, only the following
   cases can arise:

   (1) the final quantum of encoding input is an integral multiple of
   40 bits; here, the final unit of encoded output will be an integral
   multiple of 8 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by six
   "=" padding characters,

   (3) the final quantum of encoding input is exactly 16 bits; here,
   the final unit of encoded output will be four

Original URL path: http://www.josefsson.org/base-encoding/rfc3548.txt (2016-04-30)
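The padding cases listed in the RFC 3548 excerpt above follow directly from the quantum sizes (24 bits for base 64, 40 bits for base 32). The Python sketch below derives the character and pad counts arithmetically and compares them with the standard library encoders. The function name final_quantum is a hypothetical helper for illustration, and "=" is assumed as the pad character.

```python
import base64
import math


def final_quantum(n_octets: int, bits_per_char: int) -> tuple[int, int]:
    """Return (data_chars, pad_chars) for the last, partial quantum.

    bits_per_char is 6 for base 64 (24-bit quantum, 4 characters) and
    5 for base 32 (40-bit quantum, 8 characters).  (0, 0) means the
    input ended exactly on a quantum boundary, so no padding is needed.
    """
    quantum_bits = 24 if bits_per_char == 6 else 40
    quantum_chars = quantum_bits // bits_per_char
    tail_bits = (n_octets * 8) % quantum_bits
    if tail_bits == 0:
        return (0, 0)
    # Zero bits are added on the right to fill the last character.
    data_chars = math.ceil(tail_bits / bits_per_char)
    return (data_chars, quantum_chars - data_chars)


for n in range(1, 6):
    b64 = base64.b64encode(b"x" * n).decode("ascii")
    b32 = base64.b32encode(b"x" * n).decode("ascii")
    print(n, final_quantum(n, 6), b64, final_quantum(n, 5), b32)
```

For base 64 this gives two pad characters after one trailing octet and one pad character after two; for base 32 it gives six, four, three and one pad characters after one, two, three and four trailing octets.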

Diff: rfc3548.txt - rfc4648.txt

then treated as 4 concatenated 6 bit groups each of which is translated into a single digit in the base 64 alphabet of which is translated into a single character in the base 64 alphabet Each 6 bit group is used as an index into an array of 64 printable Each 6 bit group is used as an index into an array of 64 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string output string Table 1 The Base 64 Alphabet Table 1 The Base 64 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 1 B 18 S 35 j 52 0 skipping to change at page 5 line 29 skipping to change at page 6 line 44 11 L 28 c 45 t 62 11 L 28 c 45 t 62 12 M 29 d 46 u 63 12 M 29 d 46 u 63 13 N 30 e 47 v 13 N 30 e 47 v 14 O 31 f 48 w pad 14 O 31 f 48 w pad 15 P 32 g 49 x 15 P 32 g 49 x 16 Q 33 h 50 y 16 Q 33 h 50 y Special processing is performed if fewer than 24 bits are available Special processing is performed if fewer than 24 bits are available at the end of the data being encoded A full encoding quantum is at the end of the data being encoded A full encoding quantum is always completed at the end of a quantity When fewer than 24 input always completed at the end of a quantity When fewer than 24 input bits are available in an input group zero bits are added on the bits are available in an input group bits with value zero are added right to form an integral number of 6 bit groups Padding at the on the right to form an integral number of 6 bit groups Padding end of the data is performed using the character Since all base at the end of the data is performed using the character Since 64 input is an integral number of octets only the following cases all base 64 input is an integral number of octets only the following can arise cases can arise 1 t he final quantum of encoding input is an 
integral multiple of 24 1 T he final quantum of encoding input is an integral multiple of 24 bits here the final unit of encoded output will be an integral bits here the final unit of encoded output will be an integral multiple of 4 characters with no padding multiple of 4 characters with no padding 2 the final quantum of encoding input is exactly 8 bits here the 2 The final quantum of encoding input is exactly 8 bits here the final unit of encoded output will be two characters followed by two final unit of encoded output will be two characters followed by padding characters or two padding characters 3 the final quantum of encoding input is exactly 16 bits here the 3 The final quantum of encoding input is exactly 16 bits here the final unit of encoded output will be three characters followed by one final unit of encoded output will be three characters followed by padding character one padding character 4 Base 64 Encoding with URL and Filename Safe Alphabet 5 Base 64 Encoding with URL and Filename Safe Alphabet The Base 64 encoding with an URL and filename safe alphabet has been The Base 64 encoding with an URL and filename safe alphabet has been used in 8 used in 12 An alternative alphabet has been suggested that used as the 63rd An alternative alphabet has been suggested that would use as the character Since the character has special meaning in some file 63rd character Since the character has special meaning in some system environments the encoding described in this section is file system environments the encoding described in this section is recommended instead recommended instead The remaining unreserved URI character is but some file system environments do not permit multiple in a filename thus making the character unattractive as well This encoding should not be regarded as the same as the base64 The pad character is typically percent encoded when used in an encoding and should not be referred to as only base64 Unless URI 9 but if the data length is known 
implicitly this can be made clear base64 refer to the base 64 in the previous section avoided by skipping the padding see section 3 2 This encoding may be referred to as base64url This encoding should not be regarded as the same as the base64 encoding and should not be referred to as only base64 Unless clarified otherwise base64 refers to the base 64 in the previous section This encoding is technically identical to the previous one except This encoding is technically identical to the previous one except for the 62 nd and 63 rd alphabet character as indicated in t able 2 for the 62 nd and 63 rd alphabet character as indicated in T able 2 Table 2 The URL and Filename safe Base 64 Alphabet Table 2 The URL and Filename safe Base 64 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 3 D 20 U 37 l 54 2 4 E 21 V 38 m 55 3 4 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 6 G 23 X 40 o 57 5 7 H 24 Y 41 p 58 6 7 H 24 Y 41 p 58 6 8 I 25 Z 42 q 59 7 8 I 25 Z 42 q 59 7 9 J 26 a 43 r 60 8 9 J 26 a 43 r 60 8 10 K 27 b 44 s 61 9 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 minus 11 L 28 c 45 t 62 minus 12 M 29 d 46 u 63 understrike 12 M 29 d 46 u 63 13 N 30 e 47 v 13 N 30 e 47 v underline 14 O 31 f 48 w pad 14 O 31 f 48 w 15 P 32 g 49 x 15 P 32 g 49 x 16 Q 33 h 50 y 16 Q 33 h 50 y pad 5 Base 32 Encoding 6 Base 32 Encoding The following description of base 32 is due to 7 with The following description of base 32 is derived from 11 with corrections corrections This encoding may be referred to as base32 The Base 32 encoding is designed to represent arbitrary sequences of The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be octets in a form that needs to be case insensitive 
but that need not humanly readable be human readable A 33 character subset of US ASCII is used enabling 5 bits to be A 33 character subset of US ASCII is used enabling 5 bits to be represented per printable character The extra 33rd character represented per printable character The extra 33rd character is used to signify a special processing function is used to signify a special processing function The encoding process represents 40 bit groups of input bits as output The encoding process represents 40 bit groups of input bits as output strings of 8 encoded characters Proceeding from left to right a strings of 8 encoded characters Proceeding from left to right a 40 bit input group is formed by concatenating 5 8bit input groups 40 bit input group is formed by concatenating 5 8bit input groups These 40 bits are then treated as 8 concatenated 5 bit groups each These 40 bits are then treated as 8 concatenated 5 bit groups each of which is translated into a single digit in the base 32 alphabet of which is translated into a single character in the base 32 When encoding a bit stream via the base 32 encoding the bit stream alphabet When a bit stream is encoded via the base 32 encoding the must be presumed to be ordered with the most significant bit first bit stream must be presumed to be ordered with the most significant That is the first bit in the stream will be the high order bit in bit first That is the first bit in the stream will be the high the first 8bit byte and the eighth bit will be the low order bit in order bit in the first 8bit byte the eighth bit will be the low the first 8bit byte and so on order bit in the first 8bit byte and so on Each 5 bit group is used as an index into an array of 32 printable Each 5 bit group is used as an index into an array of 32 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string These characters identified in Table 2 below are 
output string These characters identified in Table 3 below are selected from US ASCII digits and uppercase letters selected from US ASCII digits and uppercase letters Table 3 The Base 32 Alphabet Table 3 The Base 32 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 9 J 18 S 27 3 0 A 9 J 18 S 27 3 1 B 10 K 19 T 28 4 1 B 10 K 19 T 28 4 2 C 11 L 20 U 29 5 2 C 11 L 20 U 29 5 3 D 12 M 21 V 30 6 3 D 12 M 21 V 30 6 4 E 13 N 22 W 31 7 4 E 13 N 22 W 31 7 5 F 14 O 23 X 5 F 14 O 23 X 6 G 15 P 24 Y pad 6 G 15 P 24 Y pad 7 H 16 Q 25 Z 7 H 16 Q 25 Z 8 I 17 R 26 2 8 I 17 R 26 2 Special processing is performed if fewer than 40 bits are available Special processing is performed if fewer than 40 bits are available at the end of the data being encoded A full encoding quantum is at the end of the data being encoded A full encoding quantum is always completed at the end of a body When fewer than 40 input bits always completed at the end of a body When fewer than 40 input bits are available in an input group zero bits are added on the right are available in an input group bits with value zero are added on to form an integral number of 5 bit groups Padding at the end of the right to form an integral number of 5 bit groups Padding at the data is performed using the character Since all base 32 the end of the data is performed using the character Since all input is an integral number of octets only the following cases can base 32 input is an integral number of octets only the following arise cases can arise 1 t he final quantum of encoding input is an integral multiple of 40 1 T he final quantum of encoding input is an integral multiple of 40 bits here the final unit of encoded output will be an integral bits here the final unit of encoded output will be an integral multiple of 8 characters with no padding multiple of 8 characters with no padding 2 the final quantum of encoding input is exactly 8 bits here 
the final unit of encoded output will be two characters followed by six padding characters 3 the final quantum of encoding input is exactly 16 bits here the 2 The final quantum of encoding input is exactly 8 bits here the final unit of encoded output will be four characters followed by four final unit of encoded output will be two characters followed by padding characters six padding characters 4 the final quantum of encoding input is exactly 24 bits here the 3 The final quantum of encoding input is exactly 16 bits here the final unit of encoded output will be four characters followed by four padding characters 4 The final quantum of encoding input is exactly 24 bits here the final unit of encoded output will be five characters followed by final unit of encoded output will be five characters followed by three padding characters or three padding characters 5 the final quantum of encoding input is exactly 32 bits here the 5 The final quantum of encoding input is exactly 32 bits here the final unit of encoded output will be seven characters followed by one final unit of encoded output will be seven characters followed by padding character one padding character 6 Base 16 Encoding 7 Base 32 Encoding with Extended Hex Alphabet The following description of base 32 is derived from 7 This encoding may be referred to as base32hex This encoding should not be regarded as the same as the base32 encoding and should not be referred to as only base32 This encoding is used by e g NextSECure3 NSEC3 10 One property with this alphabet which the base64 and base32 alphabets lack is that encoded data maintains its sort order when the encoded data is compared bit wise This encoding is identical to the previous one except for the alphabet The new alphabet is found in Table 4 Table 4 The Extended Hex Base 32 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding 0 0 9 9 18 I 27 R 1 1 10 A 19 J 28 S 2 2 11 B 20 K 29 T 3 3 12 C 21 L 30 U 4 4 13 D 22 M 31 V 5 5 14 E 23 N 6 6 15 F 
24 O pad 7 7 16 G 25 P 8 8 17 H 26 Q 8 Base 16 Encoding The following description is original but analogous to previous The following description is original but analogous to previous descriptions Essentially Base 16 encoding is the standard standard descriptions Essentially Base 16 encoding is the standard case case insensitive hex encoding and may be referred to as base16 or insensitive hex encoding and may be referred to as base16 or hex hex A 16 character subset of US ASCII is used enabling 4 bits to be A 16 character subset of US ASCII is used enabling 4 bits to be represented per printable character represented per printable character The encoding process represents 8 bit groups octets of input bits The encoding process represents 8 bit groups octets of input bits as output strings of 2 encoded characters Proceeding from left to as output strings of 2 encoded characters Proceeding from left to right a 8 bit input is taken from the input data These 8 bits are right a n 8 bit input is taken from the input data These 8 bits are then treated as 2 concatenated 4 bit groups each of which is then treated as 2 concatenated 4 bit groups each of which is translated into a single digit in the base 16 alphabet translated into a single character in the base 16 alphabet Each 4 bit group is used as an index into an array of 16 printable Each 4 bit group is used as an index into an array of 16 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string output string Table 5 The Base 16 Alphabet Table 5 The Base 16 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 0 4 4 8 8 12 C 0 0 4 4 8 8 12 C 1 1 5 5 9 9 13 D 1 1 5 5 9 9 13 D 2 2 6 6 10 A 14 E 2 2 6 6 10 A 14 E 3 3 7 7 11 B 15 F 3 3 7 7 11 B 15 F Unlike base 32 and base 64 no special padding is necessary since a Unlike base 32 and base 64 no special 
padding is necessary since a full code word is always available full code word is always available 7 Illustrations and e xamples 9 Illustrations and E xamples To translate between binary and a base encoding the input is stored To translate between binary and a base encoding the input is stored in a structure and the output is extracted The case for base 64 is in a structure and the output is extracted The case for base 64 is displayed in the following figure borrowed from 4 displayed in the following figure borrowed from 5 first octet second octet third octet first octet second octet third octet 7 6 5 4 3 2

Original URL path: http://www.josefsson.org/base-encoding/rfc46-from-35.diff.html (2016-04-30)
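Two of the additions visible in the diff above are the "base64url" name for the URL and filename safe alphabet and the new "base32hex" section with its sort-order property. The following Python sketch illustrates both. It assumes the base64url alphabet replaces "+" and "/" with "-" and "_", and it builds base32hex by translating the standard base 32 output character by character; b32hex_encode is a hypothetical helper. The sort-order check is restricted to equal-length inputs, since the pad character is not part of the ordered alphabet.

```python
import base64

B32_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # Table 3 alphabet
B32_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"   # Table 4 alphabet
_STD_TO_HEX = str.maketrans(B32_STD, B32_HEX)


def b32_encode(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii")


def b32hex_encode(data: bytes) -> str:
    """Encode with the extended hex alphabet by remapping base32 output."""
    return b32_encode(data).translate(_STD_TO_HEX)


# base64 versus base64url: only the 62nd and 63rd characters differ.
sample = b"\xfb\xff"
std = base64.b64encode(sample).decode("ascii")
url = base64.urlsafe_b64encode(sample).decode("ascii")
print(std, url)

# Sort order: base32hex keeps the order of the raw octets, base32 does not,
# because its digits "2".."7" sort before its letters in ASCII.
octets = [b"\xff", b"\x00", b"\x80", b"\x10"]
by_raw = sorted(octets)
by_hex = sorted(octets, key=b32hex_encode)
by_std = sorted(octets, key=b32_encode)
print(by_hex == by_raw, by_std == by_raw)
```

The translation-table approach is a design shortcut for illustration; a production encoder would index the extended hex alphabet directly.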


   system environments, the encoding described in this section is
   recommended instead.  The remaining unreserved URI character is ".",
   but some file system environments does not permit multiple "." in a
   filename, thus making the "." character unattractive as well.

   The pad character "=" is typically percent-encoded when used in an
   URI [RFC3986], but if the data length is known implicitly, this can
   be avoided by skipping the padding; see section 3.2.

   This encoding may be referred to as "base64url".  This encoding
   should not be regarded as the same as the "base64" encoding, and
   should not be referred to as only "base64".  Unless made clear,
   "base64" refer to the base 64 in the previous section.

   This encoding is technically identical to the previous one, except
   for the 62:nd and 63:rd alphabet character, as indicated in table 2.

         Table 2: The "URL and Filename safe" Base 64 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _
        13 N            30 e            47 v           (underline)
        14 O            31 f            48 w
        15 P            32 g            49 x
        16 Q            33 h            50 y         (pad) =

Josefsson               Expires November 2, 2006                [Page 9]

Internet-Draft              Base-N encodings                    May 2006

6.  Base 32 Encoding

   The following description of base 32 is derived from
   [draft-ietf-cat-sasl-gssapi-01] (with corrections).  This encoding
   may be referred to as "base32".

   The Base 32 encoding is designed to represent arbitrary sequences of
   octets in a form that needs to be case insensitive but need not be
   humanly readable.

   A 33-character subset of US-ASCII is used, enabling 5 bits to be
   represented per printable character.  (The extra 33rd character,
   "=", is used to signify a special processing function.)

   The encoding process represents 40-bit groups of input bits as
   output strings of 8 encoded characters.  Proceeding from left to
   right, a 40-bit input group is formed by concatenating 5 8bit input
   groups.  These 40 bits are then treated as 8 concatenated 5-bit
   groups, each of which is translated into a single character in the
   base 32 alphabet.  When encoding a bit stream via the base 32
   encoding, the bit stream must be presumed to be ordered with the
   most-significant-bit first.  That is, the first bit in the stream
   will be the high-order bit in the first 8bit byte, and the eighth
   bit will be the low-order bit in the first 8bit byte, and so on.

   Each 5-bit group is used as an index into an array of 32 printable
   characters.  The character referenced by the index is placed in the
   output string.  These characters, identified in Table 3, below, are
   selected from US-ASCII digits and uppercase letters.

                     Table 3: The Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

   Special processing is performed if fewer than 40 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.  When fewer than 40 input
   bits are available in an input group, bits with value zero are added
   (on the right) to form an integral number of 5-bit groups.  Padding
   at the end of the data is performed using the "=" character.  Since
   all base 32 input is an integral number of octets, only the
   following cases can arise:

   (1) the final quantum of encoding input is an integral multiple of
   40 bits; here, the final unit of encoded output will be an integral
   multiple of 8 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by six
   "=" padding characters,

   (3) the final quantum of encoding input is exactly 16 bits; here,
   the final unit of encoded output will be four characters followed by
   four "=" padding characters,

   (4) the final quantum of encoding input is exactly 24 bits; here,
   the final unit of encoded output will be five characters followed by
   three "=" padding characters, or

   (5) the final quantum of encoding input is exactly 32 bits; here,
   the final unit of encoded output will be seven characters followed
   by one "=" padding character.

7.  Base 32 Encoding With Extended Hex Alphabet

   The following description of base 32 is derived from [RFC2938].
   This encoding may be referred to as "base32hex".  This encoding
   should not be regarded as the same as the "base32" encoding, and
   should not be referred to as only "base32".  This encoding is used
   by, e.g., NSEC3 [I-D.ietf-dnsext-nsec3].

   One property with this alphabet, that the base64 and base32 alphabet
   lack, is that encoded data maintain its sort order when the encoded
   data is compared bit-wise.

   This encoding is identical to the previous one, except for the
   alphabet.  The new alphabet is found in table 4.

              Table 4: The "Extended Hex" Base 32 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             9 9            18 I            27 R
         1 1            10 A            19 J            28 S
         2 2            11 B            20 K            29 T
         3 3            12 C            21 L            30 U
         4 4            13 D            22 M            31 V
         5 5            14 E            23 N
         6 6            15 F            24 O         (pad) =
         7 7            16 G            25 P
         8 8            17 H            26 Q

8.  Base 16 Encoding

   The following description is original but analogous to previous
   descriptions.  Essentially, Base 16 encoding is the standard case
   insensitive hex encoding, and may be referred to as "base16" or
   "hex".

   A 16-character subset of US-ASCII is used, enabling 4 bits to be
   represented per printable character.

   The encoding process represents 8-bit groups (octets) of input bits
   as output strings of 2 encoded characters.  Proceeding from left to
   right, a 8-bit input is taken from the input data.  These 8 bits are
   then treated as 2 concatenated 4-bit groups, each of which is
   translated into a single character in the base 16 alphabet.

   Each 4-bit group is used as an index into an array of 16 printable
   characters.  The character referenced by the index is placed in the
   output string.

                     Table 5: The Base 16 Alphabet

    Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             4 4             8 8            12 C
         1 1             5 5             9 9            13 D
         2 2             6 6            10 A            14 E
         3 3             7 7            11 B            15 F

   Unlike base 32 and base 64, no special padding is necessary since a
   full code word is always available.

9.  Illustrations And Examples

   To translate between binary and a base encoding, the input is stored
   in a structure and the output is extracted.  The case for base 64 is
   displayed in the following figure, borrowed from [RFC2440].

   +--first octet--+-second octet--+--third octet--+
   |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
   +-----------+---+-------+-------+---+-----------+
   |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
   +--1.index--+--2.index--+--3.index--+--4.index--+

   The case for base 32 is shown in the following figure, borrowed from
   [RFC2938].  Each successive character in a base-32 value represents
   5 successive bits of the underlying octet sequence.  Thus, each
   group of 8 characters represents a sequence of 5 octets (40 bits).

              1          2          3
   01234567 89012345 67890123 45678901 23456789
  +--------+--------+--------+--------+--------+
                                          <===> 8th character
                                    <====> 7th character
                               <===> 6th character
                         <====> 5th character
                   <====> 4th character
              <===> 3rd character
        <====> 2nd character
   <===> 1st character

   The following example of Base64 data is from [RFC2440], with
   corrections.

   Input data:  0x14fb9c03d97e
   Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
   8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
   6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
   Decimal: 5      15     46     28       0      61     37     62
   Output:  F      P      u      c        A      9      l      +

   Input data:  0x14fb9c03d9
   Hex:     1   4    f   b    9   c     | 0   3    d   9
   8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                   pad with 00
   6-bit:   000101 001111 101110 011100 | 000000 111101 100100
   Decimal: 5      15     46     28       0      61     36
                                                      pad with =
   Output:  F      P      u      c        A      9      k      =

   Input data:  0x14fb9c03
   Hex:     1   4    f   b    9   c     | 0   3
   8-bit:   00010100 11111011 10011100  | 00000011
                                          pad with 0000
   6-bit:   000101 001111 101110 011100 | 000000 110000
   Decimal: 5      15     46     28       0      48
                                               pad with =      =
   Output:  F      P      u      c        A      w      =      =

10.  Test Vectors

   BASE64("") = ""
   BASE64("f") = "Zg=="
   BASE64("fo") = "Zm8="
   BASE64("foo") = "Zm9v"
   BASE64("foob") = "Zm9vYg=="
   BASE64("fooba") = "Zm9vYmE="
   BASE64("foobar") = "Zm9vYmFy"

   BASE32("") = ""
   BASE32("f") = "MY======"
   BASE32("fo") = "MZXQ===="
   BASE32("foo") = "MZXW6==="
   BASE32("foob") = "MZXW6YQ="
   BASE32("fooba") = "MZXW6YTB"
   BASE32("foobar") = "MZXW6YTBOI======"

   BASE32-HEX("") = ""
   BASE32-HEX("f") = "CO======"
   BASE32-HEX("fo") = "CPNG===="
   BASE32-HEX("foo") = "CPNMU==="
   BASE32-HEX("foob") = "CPNMUOG="
   BASE32-HEX("fooba") = "CPNMUOJ1"
   BASE32-HEX("foobar") = "CPNMUOJ1E8======"

   BASE16("") = ""
   BASE16("f") = "66"
   BASE16("fo") = "666F"
   BASE16("foo") = "666F6F"
   BASE16("foob") = "666F6F62"
   BASE16("fooba") = "666F6F6261"
   BASE16("foobar") = "666F6F626172"

11.  ISO C99 Implementation Of Base64

   Below is an ISO C99 implementation of Base64 encoding and decoding.
   The code assume that the US-ASCII characters are encoding inside
   'char' with values below 255, which holds for all POSIX platforms,
   but should otherwise be portable.  This code is not intended as a
   normative specification of base64.

11.1.  Prototypes: base64.h

   /* base64.h -- Encode binary data using printable characters.
      Copyright (C) 2004, 2005, 2006 Free Software Foundation, Inc.
      Written by Simon Josefsson.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser General
      Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the implied
      warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
      PURPOSE.  See the GNU Lesser General Public License for
      more details.

      You can retrieve a copy of the GNU Lesser General Public
      License from http://www.gnu.org/licenses/lgpl.txt; or by
      writing to the Free Software Foundation, Inc., 51 Franklin
      Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

   #ifndef BASE64_H
   # define BASE64_H

   /* Get size_t. */
   # include <stddef.h>

   /* Get bool. */
   # include <stdbool.h>

   /* This uses that the expression (n+(k-1))/k means the
      smallest integer >= n/k, i.e., the ceiling of n/k.  */
   # define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

   extern bool isbase64 (char ch);

   extern void base64_encode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t outlen);

   extern size_t base64_encode_alloc (const char *in,
                                      size_t inlen,
                                      char **out);

   extern bool base64_decode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t *outlen);

   extern bool base64_decode_alloc (const char *in,
                                    size_t inlen,
                                    char **out,
                                    size_t *outlen);

   #endif /* BASE64_H */

11.2.  Implementation: base64.c

   /* base64.c -- Encode binary data using printable characters.
      Copyright (C) 1999, 2000, 2001, 2004, 2005, 2006 Free Software
      Foundation, Inc.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser General
      Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the implied
      warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
      PURPOSE.  See the GNU Lesser General Public License for
      more details.

      You can retrieve a copy of the GNU Lesser General Public
      License from http://www.gnu.org/licenses/lgpl.txt; or by
      writing to the Free Software Foundation, Inc., 51 Franklin
      Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

   /* Written by Simon Josefsson.  Partially adapted from GNU
    * MailUtils (mailbox/filter_trans.c, as of 2004-11-28).
    * Improved by review from Paul Eggert, Bruno Haible, and
    * Stepan Kasal.
    *
    * Be careful with error checking.  Here is how you would
    * typically use these functions:
    *
    * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
    * if (!ok)
    *   FAIL: input was not valid base64
    * if (out == NULL)
    *   FAIL: memory allocation error
    * OK: data in OUT/OUTLEN
    *
    * size_t outlen = base64_encode_alloc (in, inlen, &out);
    * if (out == NULL && outlen == 0 && inlen != 0)
    *   FAIL: input too long
    * if (out == NULL)
    *   FAIL: memory allocation
error OK data in OUT OUTLEN Get prototype include base64 h Get malloc include Get UCHAR MAX include C89 compliant way to cast char to unsigned char static inline unsigned char to uchar char ch return ch Base64 encode IN array of size INLEN into OUT array of size OUTLEN If OUTLEN is less than BASE64 LENGTH INLEN write as many bytes as possible If OUTLEN is larger than BASE64 LENGTH INLEN also zero terminate the output buffer void base64 encode const char restrict in size t inlen char restrict out size t outlen static const char b64str 64 ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 while inlen outlen out b64str to uchar in 0 2 if outlen break out b64str to uchar in 0 4 0 0x3f if outlen break out inlen b64str to uchar in 1 6 0 0x3f if outlen break out inlen b64str to uchar in 2 0x3f if outlen break if inlen inlen if inlen in 3 if outlen out 0 Allocate a buffer and store zero terminated base64 encoded data from array IN of size INLEN returning BASE64 LENGTH INLEN i e the length of the encoded data excluding the terminating zero On return the OUT variable will hold a pointer to newly allocated memory that must be deallocated by the caller If output string length would overflow 0 is returned and OUT is set to NULL If memory allocation fail OUT is set to NULL and the return value indicate length of the requested memory block i e BASE64 LENGTH inlen 1 size t base64 encode alloc const char in size t inlen char out size t outlen 1 BASE64 LENGTH inlen Check for overflow in outlen computation If there is no overflow outlen inlen If the operation inlen 2 overflows then it yields at most 1 so outlen is 0 If the multiplication overflows we lose at least half of the correct value so the result is 4 if inlen outlen out NULL Josefsson Expires November 2 2006 Page 20 Internet Draft Base N encodings May 2006 return 0 out malloc outlen if out base64 encode in inlen out outlen return outlen 1 With this approach this file works independent of the charset used think 
EBCDIC However it does assume that the characters in the Base64 alphabet A Za z0 9 are encoded in 0 255 POSIX 1003 1 2001 require that char and unsigned char are 8 bit quantities though taking care of that problem But this may be a potential problem on non POSIX C99 platforms define B64 x x A 0 x B 1 x C 2 x D 3 x E 4 x F 5 x G 6 x H 7 x I 8 x J 9 x K 10 x L 11 x M 12 x N 13 x O 14 x P 15 x Q 16 x R 17 x S 18 x T 19 x U 20 x V 21 x W 22 x X 23 x Y 24 x Z 25 x a 26 x b 27 x c 28 x d 29 Josefsson Expires November 2 2006 Page 21 Internet Draft Base N encodings May 2006 x e 30 x f 31 x g 32 x h 33 x i 34 x j 35 x k 36 x l 37 x m 38 x n 39 x o 40 x p 41 x q 42 x r 43 x s 44 x t 45 x u 46 x v 47 x w 48 x x 49 x y 50 x z 51 x 0 52 x 1 53 x 2 54 x 3 55 x 4 56 x 5 57 x 6 58 x 7 59 x 8 60 x 9 61 x 62 x 63 1 static const signed char b64 0x100 B64 0 B64 1 B64 2 B64 3 B64 4 B64 5 B64 6
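The draft's reference code covers only base 64. As a complement, the base 32 procedure of sections 6 and 7 (40-bit input groups, 5-bit indexes, most significant bit first, "=" padding to a multiple of 8 characters) can be sketched in the same language. This is a minimal, non-normative sketch and not part of the draft: the names BASE32_LENGTH and base32_encode_sketch are hypothetical, and it assumes the caller supplies an output buffer of at least BASE32_LENGTH(inlen) + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not from the draft: number of output characters
   (excluding the terminating zero) for INLEN octets, i.e. 8 characters
   per started group of 5 octets. */
#define BASE32_LENGTH(inlen) ((((inlen) + 4) / 5) * 8)

/* Hypothetical sketch: encode INLEN octets from IN into OUT, using the
   base 32 alphabet (A-Z, 2-7) or, if HEX is nonzero, the extended hex
   alphabet (0-9, A-V).  OUT must hold BASE32_LENGTH(inlen) + 1 bytes. */
static void
base32_encode_sketch (const unsigned char *in, size_t inlen,
                      char *out, int hex)
{
  static const char std_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  static const char hex_alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
  const char *alphabet = hex ? hex_alphabet : std_alphabet;
  unsigned long buffer = 0;     /* bit accumulator, oldest bits on top */
  int bits = 0;                 /* number of unconsumed bits in buffer */
  size_t n = 0;                 /* characters written so far */

  for (size_t i = 0; i < inlen; i++)
    {
      buffer = (buffer << 8) | in[i];
      bits += 8;
      while (bits >= 5)
        {
          /* Take the 5 most significant unconsumed bits as an index. */
          out[n++] = alphabet[(buffer >> (bits - 5)) & 0x1f];
          bits -= 5;
        }
      buffer &= (1ul << bits) - 1;      /* drop the consumed bits */
    }

  /* A final partial group is padded on the right with zero bits. */
  if (bits > 0)
    out[n++] = alphabet[(buffer << (5 - bits)) & 0x1f];

  /* Pad the last quantum to 8 characters with '='. */
  while (n % 8 != 0)
    out[n++] = '=';
  out[n] = '\0';
}
```

Calling base32_encode_sketch with the input "fo" and hex set to 0 should yield "MZXQ====", matching the section 10 test vector; with hex nonzero the same input should yield "CPNG====". The bit accumulator is used instead of indexing octets directly so that one loop handles both full and partial 40-bit groups.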

Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis.txt (2016-04-30)

Diff: draft-josefsson-rfc3548bis-02.txt - draft-josefsson-rfc3548bis-03.txt

this means that any adjacent carriage return characters and are ignored Furthermore such specifications may line feed CRLF characters constitute non alphabet characters and consider the pad character as not part of the base alphabet are ignored Furthermore such specifications MAY ignore the pad until the end of the string If more than the allowed number of pad character treating it as non alphabet data if it is present characters are found at the end of the string e g a base 64 string before the end of the encoded data If more than the allowed number terminated with the excess pad characters could be ignored of pad characters are found at the end of the string e g a base 64 string terminated with the excess pad characters MAY also be ignored 3 4 Choosing the a lphabet 3 4 Choosing The A lphabet Different applications have different requirements on the characters Different applications have different requirements on the characters in the alphabet Here are a few requirements that determine which in the alphabet Here are a few requirements that determine which alphabet should be used alphabet should be used o Handled by humans Characters 0 O are easily interchanged o Handled by humans Characters 0 O are easily confused as as well 1 l and I In the base32 alphabet below where 0 well as 1 l and I In the base32 alphabet below where 0 zero and 1 one is not present a decoder may interpret 0 as O zero and 1 one are not present a decoder may interpret 0 as and 1 as I or L depending on case However by default it should O and 1 as I or L depending on case However by default it not see previous section should not see previous section o Encoded into structures that place other requirements For base o Encoded into structures that mandate other requirements For base 16 and base 32 this determines the use of upper or lowercase 16 and base 32 this determines the use of upper or lowercase alphabets For base 64 the non alphanumeric characters in alphabets For base 64 the non 
alphanumeric characters in particular may be problematic in file names and URLs particular may be problematic in file names and URLs o Used as identifiers Certain characters notably and in o Used as identifiers Certain characters notably and in the base 64 alphabet are treated as word breaks by legacy text the base 64 alphabet are treated as word breaks by legacy text search index tools search index tools There is no universally accepted alphabet that fulfills all the There is no universally accepted alphabet that fulfills all the requirements For an example of a highly specialized variant see requirements For an example of a highly specialized variant see IMAP 8 In this document we document and name some currently used IMAP 8 In this document we document and name some currently used alphabets alphabets 4 Base 64 Encoding 4 Base 64 Encoding The following description of base 64 is due to 3 4 5 and 6 The following description of base 64 is derived from 3 4 5 and 6 This encoding may be referred to as base64 The Base 64 encoding is designed to represent arbitrary sequences of The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that requires case sensitivity but need not be octets in a form that allows the use of both upper and lowercase humanly readable letters but need not be humanly readable A 65 character subset of US ASCII is used enabling 6 bits to be A 65 character subset of US ASCII is used enabling 6 bits to be represented per printable character The extra 65th character represented per printable character The extra 65th character is used to signify a special processing function is used to signify a special processing function The encoding process represents 24 bit groups of input bits as output The encoding process represents 24 bit groups of input bits as output strings of 4 encoded characters Proceeding from left to right a strings of 4 encoded characters Proceeding from left to right a 24 bit input group is formed by 
concatenating 3 8 bit input groups 24 bit input group is formed by concatenating 3 8 bit input groups These 24 bits are then treated as 4 concatenated 6 bit groups each These 24 bits are then treated as 4 concatenated 6 bit groups each of which is translated into a single digit in the base 64 alphabet of which is translated into a single character in the base 64 alphabet Each 6 bit group is used as an index into an array of 64 printable Each 6 bit group is used as an index into an array of 64 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string output string Table 1 The Base 64 Alphabet Table 1 The Base 64 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 1 B 18 S 35 j 52 0 skipping to change at page 7 line 5 skipping to change at page 8 line 5 multiple of 4 characters with no padding multiple of 4 characters with no padding 2 the final quantum of encoding input is exactly 8 bits here the 2 the final quantum of encoding input is exactly 8 bits here the final unit of encoded output will be two characters followed by two final unit of encoded output will be two characters followed by two padding characters or padding characters or 3 the final quantum of encoding input is exactly 16 bits here the 3 the final quantum of encoding input is exactly 16 bits here the final unit of encoded output will be three characters followed by one final unit of encoded output will be three characters followed by one padding character padding character 5 Base 64 Encoding with URL a nd Filename Safe Alphabet 5 Base 64 Encoding With URL A nd Filename Safe Alphabet The Base 64 encoding with an URL and filename safe alphabet has been The Base 64 encoding with an URL and filename safe alphabet has been used in 1 0 used in 1 1 An alternative alphabet has 
been suggested that used as the 63rd An alternative alphabet has been suggested that used as the 63rd character Since the character has special meaning in some file character Since the character has special meaning in some file system environments the encoding described in this section is system environments the encoding described in this section is recommended instead recommended instead This encoding should not be regarded as the same as the base64 This encoding may be referred to as base64url This encoding encoding and should not be referred to as only base64 Unless should not be regarded as the same as the base64 encoding and made clear base64 refer to the base 64 in the previous section should not be referred to as only base64 Unless made clear base64 refer to the base 64 in the previous section This encoding is technically identical to the previous one except This encoding is technically identical to the previous one except for the 62 nd and 63 rd alphabet character as indicated in table 2 for the 62 nd and 63 rd alphabet character as indicated in table 2 Table 2 The URL and Filename safe Base 64 Alphabet Table 2 The URL and Filename safe Base 64 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 17 R 34 i 51 z 0 A 17 R 34 i 51 z 1 B 18 S 35 j 52 0 1 B 18 S 35 j 52 0 2 C 19 T 36 k 53 1 2 C 19 T 36 k 53 1 3 D 20 U 37 l 54 2 3 D 20 U 37 l 54 2 4 E 21 V 38 m 55 3 4 E 21 V 38 m 55 3 5 F 22 W 39 n 56 4 5 F 22 W 39 n 56 4 6 G 23 X 40 o 57 5 6 G 23 X 40 o 57 5 7 H 24 Y 41 p 58 6 7 H 24 Y 41 p 58 6 8 I 25 Z 42 q 59 7 8 I 25 Z 42 q 59 7 9 J 26 a 43 r 60 8 9 J 26 a 43 r 60 8 10 K 27 b 44 s 61 9 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 minus 11 L 28 c 45 t 62 minus 12 M 29 d 46 u 63 12 M 29 d 46 u 63 13 N 30 e 47 v under strik e 13 N 30 e 47 v under lin e 14 O 31 f 48 w 14 O 31 f 48 w 15 P 32 g 49 x 15 P 32 g 49 x 16 Q 33 h 50 y pad 16 Q 33 h 50 y pad 6 Base 32 Encoding 6 Base 32 Encoding The 
following description of base 32 is due to 9 with The following description of base 32 is derived from 10 with corrections corrections This encoding may be referred to as base32 The Base 32 encoding is designed to represent arbitrary sequences of The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be octets in a form that needs to be case insensitive but need not be humanly readable humanly readable A 33 character subset of US ASCII is used enabling 5 bits to be A 33 character subset of US ASCII is used enabling 5 bits to be represented per printable character The extra 33rd character represented per printable character The extra 33rd character is used to signify a special processing function is used to signify a special processing function The encoding process represents 40 bit groups of input bits as output The encoding process represents 40 bit groups of input bits as output strings of 8 encoded characters Proceeding from left to right a strings of 8 encoded characters Proceeding from left to right a 40 bit input group is formed by concatenating 5 8bit input groups 40 bit input group is formed by concatenating 5 8bit input groups These 40 bits are then treated as 8 concatenated 5 bit groups each These 40 bits are then treated as 8 concatenated 5 bit groups each of which is translated into a single digit in the base 32 alphabet of which is translated into a single character in the base 32 When encoding a bit stream via the base 32 encoding the bit stream alphabet When encoding a bit stream via the base 32 encoding the must be presumed to be ordered with the most significant bit first bit stream must be presumed to be ordered with the most significant That is the first bit in the stream will be the high order bit in bit first That is the first bit in the stream will be the high the first 8bit byte and the eighth bit will be the low order bit in order bit in the first 8bit byte and the 
eighth bit will be the low the first 8bit byte and so on order bit in the first 8bit byte and so on Each 5 bit group is used as an index into an array of 32 printable Each 5 bit group is used as an index into an array of 32 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string These characters identified in Table 3 below are output string These characters identified in Table 3 below are selected from US ASCII digits and uppercase letters selected from US ASCII digits and uppercase letters Table 3 The Base 32 Alphabet Table 3 The Base 32 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 A 9 J 18 S 27 3 0 A 9 J 18 S 27 3 skipping to change at page 9 line 18 skipping to change at page 10 line 18 padding characters padding characters 4 the final quantum of encoding input is exactly 24 bits here the 4 the final quantum of encoding input is exactly 24 bits here the final unit of encoded output will be five characters followed by final unit of encoded output will be five characters followed by three padding characters or three padding characters or 5 the final quantum of encoding input is exactly 32 bits here the 5 the final quantum of encoding input is exactly 32 bits here the final unit of encoded output will be seven characters followed by one final unit of encoded output will be seven characters followed by one padding character padding character 7 Base 32 Encoding w ith Extended Hex Alphabet 7 Base 32 Encoding W ith Extended Hex Alphabet The following description of base 32 is due to 7 This encoding The following description of base 32 is derived from 7 This should not be regarded as the same as the base32 encoding and encoding may be referred to as base32hex This encoding should not should not be referred to as only base32 be regarded as the same as the base32 encoding and should not be 
referred to as only base32 This encoding is used by e g NSEC3 9 One property with this alphabet that the base64 and base32 alphabet One property with this alphabet that the base64 and base32 alphabet lack is that encoded data maintain its sort order when the encoded lack is that encoded data maintain its sort order when the encoded data is compared bit wise data is compared bit wise This encoding is identical to the previous one except for the This encoding is identical to the previous one except for the alphabet The new alphabet is found in table 4 alphabet The new alphabet is found in table 4 Table 4 The Extended Hex Base 32 Alphabet Table 4 The Extended Hex Base 32 Alphabet skipping to change at page 10 line 19 skipping to change at page 11 line 19 insensitive hex encoding and may be referred to as base16 or insensitive hex encoding and may be referred to as base16 or hex hex A 16 character subset of US ASCII is used enabling 4 bits to be A 16 character subset of US ASCII is used enabling 4 bits to be represented per printable character represented per printable character The encoding process represents 8 bit groups octets of input bits The encoding process represents 8 bit groups octets of input bits as output strings of 2 encoded characters Proceeding from left to as output strings of 2 encoded characters Proceeding from left to right a 8 bit input is taken from the input data These 8 bits are right a 8 bit input is taken from the input data These 8 bits are then treated as 2 concatenated 4 bit groups each of which is then treated as 2 concatenated 4 bit groups each of which is translated into a single digit in the base 16 alphabet translated into a single character in the base 16 alphabet Each 4 bit group is used as an index into an array of 16 printable Each 4 bit group is used as an index into an array of 16 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output 
string output string Table 5 The Base 16 Alphabet Table 5 The Base 16 Alphabet Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding Value Encoding 0 0 4 4 8 8 12 C 0 0 4 4 8 8 12 C 1 1 5 5 9 9 13 D 1 1 5 5 9 9 13 D 2 2 6 6 10 A 14 E 2 2 6 6 10 A 14 E 3 3 7 7 11 B 15 F 3 3 7 7 11 B 15 F Unlike base 32 and base 64 no special padding is necessary since a Unlike base 32 and base 64 no special padding is necessary since a full code word is always available full code word is always available 9 Illustrations and e xamples 9 Illustrations And E xamples To translate between binary and a base encoding the input is stored To translate between binary and a base encoding the input is stored in a structure and the output is extracted The case for base 64 is in a structure and the output is extracted The case for base 64 is displayed in the following figure borrowed from 5 displayed in the following figure borrowed from 5 first octet second octet third octet first octet second octet third octet 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1
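Section 5 in the diff above states that the URL and filename safe encoding is technically identical to base64 except for the characters at values 62 and 63 (minus and underline in Table 2). Under that statement, already-encoded base64 text can be converted by substituting those two characters. The following is a minimal sketch, not part of the draft; the function name base64_to_base64url_sketch is hypothetical, and the sketch leaves any "=" padding untouched.

```c
/* Hypothetical sketch: rewrite a zero terminated base64 string in place
   into the URL and filename safe alphabet of Table 2.  '+' (value 62)
   becomes '-' and '/' (value 63) becomes '_'; every other character,
   including the '=' pad, is left as it is. */
static void
base64_to_base64url_sketch (char *s)
{
  for (; *s != '\0'; s++)
    {
      if (*s == '+')
        *s = '-';
      else if (*s == '/')
        *s = '_';
    }
}
```

Applied to the section 9 example output "FPucA9l+", the sketch yields "FPucA9l-". Converting after encoding keeps a single encoder implementation; an encoder with a second 64-character alphabet string would be an equally valid design.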

Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-03-from-2.diff.html (2016-04-30)

Diff: draft-josefsson-rfc3548bis-03.txt - draft-josefsson-rfc3548bis.txt

first symbol are used but only the first two bits of the next symbol are used These pad bits MUST be set to zero by conforming encoders which is described in the descriptions on padding below If this property do not hold there is no canonical representation of base encoded data and multiple base encoded strings can be decoded to the same binary data If this property and others discussed in this document holds a canonical encoding is guaranteed In some environments the alteration is critical and therefor decoders MAY chose to reject an encoding if the pad bits have not been set to zero The specification referring to this may mandate a specific behaviour 4 Base 64 Encoding 4 Base 64 Encoding The following description of base 64 is derived from 3 4 5 The following description of base 64 is derived from RFC1421 and 6 This encoding may be referred to as base64 RFC2045 RFC2440 and RFC2535 This encoding may be referred to as base64 The Base 64 encoding is designed to represent arbitrary sequences of The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper and lowercase octets in a form that allows the use of both upper and lowercase letters but need not be humanly readable letters but need not be humanly readable A 65 character subset of US ASCII is used enabling 6 bits to be A 65 character subset of US ASCII is used enabling 6 bits to be represented per printable character The extra 65th character represented per printable character The extra 65th character is used to signify a special processing function is used to signify a special processing function The encoding process represents 24 bit groups of input bits as output The encoding process represents 24 bit groups of input bits as output strings of 4 encoded characters Proceeding from left to right a strings of 4 encoded characters Proceeding from left to right a 24 bit input group is formed by concatenating 3 8 bit input groups 24 bit input group is 
formed by concatenating 3 8 bit input groups These 24 bits are then treated as 4 concatenated 6 bit groups each These 24 bits are then treated as 4 concatenated 6 bit groups each of which is translated into a single character in the base 64 of which is translated into a single character in the base 64 alphabet alphabet Each 6 bit group is used as an index into an array of 64 printable Each 6 bit group is used as an index into an array of 64 printable characters The character referenced by the index is placed in the characters The character referenced by the index is placed in the output string output string skipping to change at page 7 line 42 skipping to change at page 8 line 4 all base 64 input is an integral number of octets only the following all base 64 input is an integral number of octets only the following cases can arise cases can arise 1 the final quantum of encoding input is an integral multiple of 24 1 the final quantum of encoding input is an integral multiple of 24 bits here the final unit of encoded output will be an integral bits here the final unit of encoded output will be an integral multiple of 4 characters with no padding multiple of 4 characters with no padding 2 the final quantum of encoding input is exactly 8 bits here the 2 the final quantum of encoding input is exactly 8 bits here the final unit of encoded output will be two characters followed by two final unit of encoded output will be two characters followed by two padding characters or padding characters or 3 the final quantum of encoding input is exactly 16 bits here the 3 the final quantum of encoding input is exactly 16 bits here the final unit of encoded output will be three characters followed by one final unit of encoded output will be three characters followed by one padding character padding character 5 Base 64 Encoding With URL And Filename Safe Alphabet 5 Base 64 Encoding With URL And Filename Safe Alphabet The Base 64 encoding with an URL and filename safe alphabet has been 
The Base 64 encoding with an URL and filename safe alphabet has been used in 11 used in mojonation An alternative alphabet has been suggested that used as the 63rd An alternative alphabet has been suggested that used as the 63rd character Since the character has special meaning in some file character Since the character has special meaning in some file system environments the encoding described in this section is system environments the encoding described in this section is recommended instead recommended instead The remaining unreserved URI character is but some file system environments does not permit multiple in a filename thus making the character unattractive as well The pad character is typically percent encoded when used in an URI RFC3986 but if the data length is known implicitly this can be avoided by skipping the padding see section 3 2 This encoding may be referred to as base64url This encoding This encoding may be referred to as base64url This encoding should not be regarded as the same as the base64 encoding and should not be regarded as the same as the base64 encoding and should not be referred to as only base64 Unless made clear should not be referred to as only base64 Unless made clear base64 refer to the base 64 in the previous section base64 refer to the base 64 in the previous section This encoding is technically identical to the previous one except This encoding is technically identical to the previous one except for the 62 nd and 63 rd alphabet character as indicated in table 2 for the 62 nd and 63 rd alphabet character as indicated in table 2 Table 2 The URL and Filename safe Base 64 Alphabet Table 2 The URL and Filename safe Base 64 Alphabet skipping to change at page 8 line 46 skipping to change at page 10 line 7 10 K 27 b 44 s 61 9 10 K 27 b 44 s 61 9 11 L 28 c 45 t 62 minus 11 L 28 c 45 t 62 minus 12 M 29 d 46 u 63 12 M 29 d 46 u 63 13 N 30 e 47 v underline 13 N 30 e 47 v underline 14 O 31 f 48 w 14 O 31 f 48 w 15 P 32 g 49 x 15 P 32 g 49 
x 16 Q 33 h 50 y pad 16 Q 33 h 50 y pad 6 Base 32 Encoding 6 Base 32 Encoding The following description of base 32 is derived from 10 with The following description of base 32 is derived from corrections This encoding may be referred to as base32 draft ietf cat sasl gssapi 01 with corrections This encoding may be referred to as base32 The Base 32 encoding is designed to represent arbitrary sequences of The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be octets in a form that needs to be case insensitive but need not be humanly readable humanly readable A 33 character subset of US ASCII is used enabling 5 bits to be A 33 character subset of US ASCII is used enabling 5 bits to be represented per printable character The extra 33rd character represented per printable character The extra 33rd character is used to signify a special processing function is used to signify a special processing function The encoding process represents 40 bit groups of input bits as output The encoding process represents 40 bit groups of input bits as output skipping to change at page 10 line 20 skipping to change at page 11 line 30 4 the final quantum of encoding input is exactly 24 bits here the 4 the final quantum of encoding input is exactly 24 bits here the final unit of encoded output will be five characters followed by final unit of encoded output will be five characters followed by three padding characters or three padding characters or 5 the final quantum of encoding input is exactly 32 bits here the 5 the final quantum of encoding input is exactly 32 bits here the final unit of encoded output will be seven characters followed by one final unit of encoded output will be seven characters followed by one padding character padding character 7 Base 32 Encoding With Extended Hex Alphabet 7 Base 32 Encoding With Extended Hex Alphabet The following description of base 32 is derived from 7 This The following 
description of base 32 is derived from [RFC2938].  This encoding may be referred to as "base32hex".  This encoding should not be regarded as the same as the "base32" encoding, and should not be referred to as only "base32".  This encoding is used by, e.g., NSEC3 [I-D.ietf-dnsext-nsec3].

One property with this alphabet, that the base64 and base32 alphabet lack, is that encoded data maintain its sort order when the encoded data is compared bit-wise.

This encoding is identical to the previous one, except for the alphabet.  The new alphabet is found in table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

(skipping to change at page 12, line 9; skipping to change at page 14, line 9)

     2 2             6 6            10 A            14 E
     3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

9.  Illustrations And Examples

To translate between binary and a base encoding, the input is stored in a structure and the output is extracted.  The case for base 64 is displayed in the following figure, borrowed from [RFC2440].

         +--first octet--+-second octet--+--third octet--+
         |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
         +-----------+---+-------+-------+---+-----------+
         |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
         +--1.index--+--2.index--+--3.index--+--4.index--+

The case for base 32 is shown in the following figure, borrowed from [RFC2938].  Each successive character in a base-32 value represents 5 successive bits of the underlying octet sequence.  Thus, each group of 8 characters represents a sequence of 5 octets (40 bits).

                   1          2          3
         01234567 89012345 67890123 45678901 23456789
        +--------+--------+--------+--------+--------+
        |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
        +--------+--------+--------+--------+--------+
                                                <===> 8th character
                                          <====> 7th character
                                     <===> 6th character
                               <====> 5th character
                         <====> 4th character
                    <===> 3rd character
              <====> 2nd character
         <===> 1st character

The following example of Base64 data is from [RFC2440], with corrections.

   Input data:  0x14fb9c03d97e
   Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
   8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
   6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
   Decimal: 5      15     46     28       0      61     37     62
   Output:  F      P      u      c        A      9      l      +

   Input data:  0x14fb9c03d9
   Hex:     1   4    f   b    9   c     | 0   3    d   9

(skipping to change at page 15, line 20; skipping to change at page 17, line 20)

   General Public License as published by the Free Software
   Foundation; either version 2.1, or (at your option) any
   later version.

   This program is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the
   implied warranty of MERCHANTABILITY or FITNESS FOR A
   PARTICULAR PURPOSE.  See the GNU Lesser General Public
   License for more details.

   You can retrieve a copy of the GNU Lesser General Public
   License from http://www.gnu.org/licenses/lgpl.txt; or by
   writing to the Free Software Foundation, Inc., 51
   Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */

#ifndef BASE64_H
# define BASE64_H

/* Get size_t. */
# include <stddef.h>

/* Get bool. */
# include <stdbool.h>

(skipping to change at page 16, line 29; skipping to change at page 18, line 29)

   General Public License as published by the Free Software
   Foundation; either version 2.1, or (at your option) any
   later version.

   This program is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the
   implied warranty of MERCHANTABILITY or FITNESS FOR A
   PARTICULAR PURPOSE.  See the GNU Lesser General Public
   License for more details.

   You can
   retrieve a copy of the GNU Lesser General Public
   License from http://www.gnu.org/licenses/lgpl.txt; or by
   writing to the Free Software Foundation, Inc., 51
   Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */

/* Written by Simon Josefsson.  Partially adapted from GNU
 * MailUtils (mailbox/filter_trans.c, as of 2004-11-28).
 * Improved by review from Paul Eggert, Bruno Haible, and
 * Stepan Kasal.
 *
 * Be careful with error checking.  Here is how you would
 * typically use these functions:
 *
 * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);

(skipping to change at page 25, line 16; skipping to change at page 27, line 16)

12.  Security Considerations

When implementing Base encoding and decoding, care should be taken not to introduce vulnerabilities to buffer overflow attacks, or other attacks on the implementation.  A decoder should not break on invalid input including, e.g., embedded NUL characters (ASCII 0).

If non-alphabet characters are ignored, instead of causing rejection of the entire encoding (as recommended), a covert channel that can be used to leak information is made possible.  The implications of this should
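
The recommendation above (reject the entire encoding when a non-alphabet character appears, and do not break on embedded NUL characters) can be sketched in C as follows. This is only a minimal illustration and not code from base64.c: the function name strict_base64_ok is hypothetical, and the check covers the shape of the input only (length a multiple of four, alphabet characters, at most two trailing "=" characters). It does not decode the data or inspect the pad bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of base64.c.  Return true only if
   the INLEN bytes at IN look like padded base64: the length is a
   multiple of 4, every character before the padding is in the
   base 64 alphabet, and at most two '=' appear, only at the end.
   The length is explicit, so an embedded NUL is simply treated as
   one more non-alphabet character and causes rejection.  */
bool
strict_base64_ok (const char *in, size_t inlen)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t pad = 0;
  size_t i;

  assert (in != NULL || inlen == 0);

  if (inlen % 4 != 0)
    return false;

  /* Count trailing pad characters; more than two is never valid,
     so a third '=' is left for the alphabet check to reject.  */
  while (pad < 2 && pad < inlen && in[inlen - 1 - pad] == '=')
    pad++;

  for (i = 0; i < inlen - pad; i++)
    {
      /* Search exactly the 64 alphabet bytes, so the string's own
         terminating NUL can never match a NUL in the input.  */
      if (memchr (alphabet, in[i], 64) == NULL)
        return false;
    }

  return true;
}
```

A caller would run such a check (or rely on the decoder returning false) before using any decoded output, rather than skipping unknown characters silently.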

Original URL path: http://www.josefsson.org/base-encoding/draft-josefsson-rfc3548bis-from--03.diff.html (2016-04-30)

Diff: draft-josefsson-rfc3548bis-04.txt - rfc4648.txt

alphanumeric characters (in particular, "/") may be problematic in file names and URLs.

o  Used as identifiers.  Certain characters, notably "+" and "/" in the base 64 alphabet, are treated as word-breaks by legacy text search/index tools.

There is no universally accepted alphabet that fulfills all the requirements.  For an example of a highly specialized variant, see IMAP [8].  In this document, we document and name some currently used alphabets.

3.5.  Canonical Encoding

The padding step in base 64 and base 32 encoding can, if improperly implemented, lead to non-significant alterations of the encoded data.  For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used.  These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below.  If this property do not hold, there is no canonical representation of base-encoded data, and multiple base-encoded strings can be
decoded to the same binary data.  If this property (and others discussed in this document) holds, a canonical encoding is guaranteed.

In some environments, the alteration is critical and therefore decoders MAY chose to reject an encoding if the pad bits have not been set to zero.  The specification referring to this may mandate a specific behaviour.

4.  Base 64 Encoding

The following description of base 64 is derived from [3], [4], [5], and [6].  This encoding may be referred to as "base64".

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human readable.

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character.  (The extra 65th character, "=", is used to signify a special processing function.)

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters.  Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input
groups.  These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64

(skipping to change at page 7, line 44; skipping to change at page 6, line 50)

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded.  A full encoding quantum is always completed at the end of a quantity.  When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups.  Padding at the end of the data is performed using the "=" character.  Since all base 64 input is an integral number of octets, only the following cases can arise:

(1)  The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding.

(2)  The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters.

(3)  The final quantum of encoding input is exactly 16
bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

5.  Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with an URL and filename safe alphabet has been used in [12].

An alternative alphabet has been suggested that would use "~" as the 63rd character.  Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead.  The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.

The pad character "=" is typically percent-encoded when used in an URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see section 3.2.

This encoding may be referred to as "base64url".  This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64".  Unless clarified otherwise, "base64" refers to the
base 64 in the previous section.

This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in Table 2.

Table 2: The "URL and Filename safe" Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4

(skipping to change at page 10, line 11; skipping to change at page 8, line 32)

    14 O            31 f            48 w
    15 P            32 g            49 x
    16 Q            33 h            50 y         (pad) =

6.  Base 32 Encoding

The following description of base 32 is derived from [11] (with corrections).  This encoding may be referred to as "base32".

The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not be human readable.

A 33-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character.  (The extra 33rd character, "=", is used to signify a special processing function.)

The encoding
process represents 40-bit groups of input bits as output strings of 8 encoded characters.  Proceeding from left to right, a 40-bit input group is formed by concatenating 5 8bit input groups.  These 40 bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the base 32 alphabet.  When a bit stream is encoded via the base 32 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first.  That is, the first bit in the stream will be the high-order bit in the first 8bit byte, the eighth bit will be the low-order bit in the first 8bit byte, and so on.

Each 5-bit group is used as an index into an array of 32 printable characters.  The character referenced by the index is placed in the output string.  These characters, identified in Table 3, below, are selected from US-ASCII digits and uppercase letters.

Table 3: The Base 32 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding

(skipping to change at page 11, line 7; skipping to change at page 9, line 32)

Special processing is performed if fewer than 40 bits are
available at the end of the data being encoded.  A full encoding quantum is always completed at the end of a body.  When fewer than 40 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 5-bit groups.  Padding at the end of the data is performed using the "=" character.  Since all base 32 input is an integral number of octets, only the following cases can arise:

(1)  The final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding.

(2)  The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters.

(3)  The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters.

(4)  The final quantum of encoding input is exactly 24 bits; here, the final unit of
encoded output will be five characters followed by three "=" padding characters.

(5)  The final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

7.  Base 32 Encoding with Extended Hex Alphabet

The following description of base 32 is derived from [7].  This encoding may be referred to as "base32hex".  This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32".  This encoding is used by, e.g., NextSECure3 (NSEC3) [10].

One property with this alphabet, which the base64 and base32 alphabets lack, is that encoded data maintains its sort order when the encoded data is compared bit-wise.

This encoding is identical to the previous one, except for the alphabet.  The new alphabet is found in Table 4.

Table 4: The "Extended Hex" Base 32 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 0             9 9            18 I            27 R
     1 1            10 A            19 J            28 S
     2 2            11 B            20 K            29 T
     3 3            12 C            21 L            30 U
     4 4            13 D            22 M            31 V
     5 5            14 E            23 N
     6 6            15 F            24 O         (pad) =
     7 7            16 G            25 P
     8 8            17 H            26 Q

8.  Base 16 Encoding

The following description is original but analogous to previous descriptions.  Essentially, Base 16 encoding is the standard case-insensitive hex encoding and may be referred to as "base16" or "hex".

A 16-character subset of US-ASCII is used, enabling 4 bits to be represented per printable character.

The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters.  Proceeding from left to right, an 8-bit input is taken from the input data.  These 8 bits are then treated as 2 concatenated 4-bit groups, each of which is translated into a single character in the base 16 alphabet.

Each 4-bit group is used as an index into an array of 16 printable characters.  The character referenced by the index is placed in the output string.

Table 5: The Base 16 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 0             4 4             8 8            12 C
     1 1             5 5             9 9            13 D
     2 2             6 6            10 A            14 E
     3 3             7 7            11 B            15 F

Unlike base 32 and base 64, no special padding is necessary since a full code word is always available.

9.  Illustrations and Examples

To translate between binary and a base encoding, the input is stored in a structure, and the output is extracted.  The case for base 64 is displayed in the following figure, borrowed from [5].

         +--first octet--+-second octet--+--third octet--+
         |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
         +-----------+---+-------+-------+---+-----------+
         |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
         +--1.index--+--2.index--+--3.index--+--4.index--+

The case for base 32 is shown in the following figure, borrowed from [7].  Each successive character in a base-32 value represents 5

(skipping to change at page 16, line 40; skipping to change at page 14, line 5)

   BASE16("fo") = "666F"
   BASE16("foo") = "666F6F"
   BASE16("foob") = "666F6F62"
   BASE16("fooba") = "666F6F6261"
   BASE16("foobar") = "666F6F626172"

11.  ISO C99 Implementation of Base64

Below is an ISO C99 implementation of Base64 encoding and decoding.  The code assume that the US-ASCII characters are encoding inside 'char' with values below 255, which holds for all POSIX platforms, but should otherwise be portable.  This code is not intended as a normative specification of base64.

11.1.  Prototypes: base64.h

/* base64.h -- Encode binary data using printable characters.
   Copyright (C) 2004, 2005, 2006
   Free Software Foundation, Inc.
   Written by Simon Josefsson.

   This program is free software; you can redistribute it
   and/or modify it under the terms of the GNU Lesser
   General Public License as published by the Free Software
   Foundation; either version 2.1, or (at your option) any
   later version.

   This program is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the
   implied warranty of MERCHANTABILITY or FITNESS FOR A
   PARTICULAR PURPOSE.  See the GNU Lesser General Public
   License for more details.

   You can retrieve a copy of the GNU Lesser General Public
   License from http://www.gnu.org/licenses/lgpl.txt; or by
   writing to the Free Software Foundation, Inc., 51
   Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */

#ifndef BASE64_H
# define BASE64_H

/* Get size_t. */
# include <stddef.h>

/* Get bool. */
# include <stdbool.h>

/* This uses that the expression (n+(k-1))/k means the
   smallest integer >= n/k, i.e., the ceiling of n/k.  */
# define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

extern bool isbase64 (char ch);

extern void base64_encode (const char *restrict in,
                           size_t inlen,
                           char *restrict out,
                           size_t outlen);

extern size_t base64_encode_alloc (const char *in,
                                   size_t inlen,
                                   char **out);

extern bool base64_decode (const char *restrict in,
                           size_t inlen,
                           char *restrict out,
                           size_t *outlen);

extern bool base64_decode_alloc (const char *in,
                                 size_t inlen,
                                 char **out,
                                 size_t *outlen);

#endif /* BASE64_H */

11.2.  Implementation: base64.c

/* base64.c -- Encode binary data using printable characters.
   Copyright (C) 1999, 2000, 2001, 2004, 2005, 2006 Free
   Software Foundation, Inc.

   This program is free software; you can redistribute it
   and/or modify it under the terms of the GNU Lesser
   General Public License as published by the Free Software
   Foundation; either version 2.1, or (at your option) any
   later version.

   This program is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the
   implied warranty of MERCHANTABILITY or FITNESS FOR A
   PARTICULAR PURPOSE.  See the GNU Lesser General Public
   License for more details.

   You can retrieve a copy of the GNU Lesser
   General Public License from
   http://www.gnu.org/licenses/lgpl.txt; or by writing to
   the Free Software Foundation, Inc., 51 Franklin Street,
   Fifth Floor, Boston, MA 02110-1301, USA. */

/* Written by Simon Josefsson.  Partially adapted from GNU
 * MailUtils (mailbox/filter_trans.c, as of 2004-11-28).
 * Improved by review from Paul Eggert, Bruno Haible, and
 * Stepan Kasal.
 *
 * Be careful with error checking.  Here is how you would
 * typically use these functions:
 *
 * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
 * if (!ok)
 *   FAIL: input was not valid base64
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN
 *
 * size_t outlen = base64_encode_alloc (in, inlen, &out);
 * if (out == NULL && outlen == 0 && inlen != 0)
 *   FAIL: input too long
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN.
 *
 */

/* Get prototype. */
#include "base64.h"

/* Get malloc. */
#include <stdlib.h>

/* Get UCHAR_MAX. */
#include <limits.h>

/* C89 compliant way to cast 'char' to 'unsigned char'. */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Base64 encode IN array of size INLEN into OUT array of
   size OUTLEN.  If OUTLEN is less than
   BASE64_LENGTH(INLEN), write as many bytes as possible.
   If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
   terminate the output buffer. */
void
base64_encode (const char *restrict in, size_t inlen,
               char *restrict out, size_t outlen)
{
  static const char b64str[64] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz0123456789+/";

  while (inlen && outlen)
    {
      *out++ = b64str[to_uchar (in[0]) >> 2];
      if (!--outlen)
        break;
      *out++ = b64str[((to_uchar (in[0]) << 4)
                       + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                      & 0x3f];
      if (!--outlen)
        break;
      *out++ =
        (inlen
         ? b64str[((to_uchar (in[1]) << 2)
                   + (--inlen ? to_uchar (in[2]) >> 6 : 0))
                  & 0x3f]
         : '=');
      if (!--outlen)
        break;
      *out++ = inlen ? b64str[to_uchar (in[2]) & 0x3f] : '=';
      if (!--outlen)
        break;
      if (inlen)
        inlen--;
      if (inlen)
        in += 3;
    }

  if (outlen)
    *out = '\0';
}

/* Allocate a buffer and store zero terminated base64
   encoded data from array IN of size INLEN, returning
   BASE64_LENGTH(INLEN), i.e., the length of the encoded
   data, excluding the terminating zero.  On return, the
   OUT variable will hold a pointer to newly allocated
   memory that must be deallocated by the caller.  If
   output string length would overflow, 0 is returned and
   OUT is
   set to NULL.  If memory allocation fail, OUT is set to
   NULL, and the return value indicate length of the
   requested memory block, i.e., BASE64_LENGTH(inlen) + 1. */
size_t
base64_encode_alloc (const char *in, size_t inlen, char **out)
{
  size_t outlen = 1 + BASE64_LENGTH (inlen);

  /* Check for overflow in outlen computation.
   *
   * If there is no overflow, outlen >= inlen.
   *
   * If the operation (inlen + 2) overflows then it yields
   * at most +1, so outlen is 0.
   *
   * If the multiplication overflows, we lose at least half
   * of the correct value, so the result is < ((inlen + 2) /
   * 3) * 2, which is less than (inlen + 2) * 0.66667,
   * which is less than inlen as soon as (inlen > 4).
   */
  if (inlen > outlen)
    {
      *out = NULL;
      return 0;
    }

  *out = malloc (outlen);
  if (*out)
    base64_encode (in, inlen, *out, outlen);

  return outlen - 1;
}

/* With this approach this file works independent of the
   charset used (think EBCDIC).  However, it does assume
   that the characters in the Base64 alphabet (A-Za-z0-9+/)
   are encoded in 0..255.  POSIX 1003.1-2001 require that
   char and unsigned char are 8-bit quantities, though,
   taking care of that problem.  But this may be a
   potential problem on non-POSIX C99 platforms.  */
#define B64(x) \
  ((x) == 'A' ? 0 \
   : (x) == 'B' ? 1 \
   : (x) == 'C' ? 2 \
   : (x) == 'D' ? 3 \
   : (x) == 'E' ? 4 \
   : (x) == 'F' ? 5 \
   : (x) == 'G' ? 6 \
   : (x) == 'H' ? 7 \
   : (x) == 'I' ? 8 \
   : (x) == 'J' ? 9 \
   : (x) == 'K' ? 10 \
   : (x) == 'L' ? 11 \
   : (x) == 'M' ? 12 \
   : (x) == 'N' ? 13 \
   : (x) == 'O' ? 14 \
   : (x) == 'P' ? 15 \
   : (x) == 'Q' ? 16 \
   : (x) == 'R' ? 17 \
   : (x) == 'S' ? 18 \
   : (x) == 'T' ? 19 \
   : (x) == 'U' ? 20 \
   : (x) == 'V' ? 21 \
   : (x) == 'W' ? 22 \
   : (x) == 'X' ? 23 \
   : (x) == 'Y' ? 24 \
   : (x) == 'Z' ? 25 \
   : (x) == 'a' ? 26 \
   : (x) == 'b' ? 27 \
   : (x) == 'c' ? 28 \
   : (x) == 'd' ? 29 \
   : (x) == 'e' ? 30 \
   : (x) == 'f' ? 31 \
   : (x) == 'g' ? 32 \
   : (x) == 'h' ? 33 \
   : (x) == 'i' ? 34 \
   : (x) == 'j' ? 35 \
   : (x) == 'k' ? 36 \
   : (x) == 'l' ? 37 \
   : (x) == 'm' ? 38 \
   : (x) == 'n' ? 39 \
   : (x) == 'o' ? 40 \
   : (x) == 'p' ? 41 \
   : (x) == 'q' ? 42 \
   : (x) == 'r' ? 43 \
   : (x) == 's' ? 44 \
   : (x) == 't' ? 45 \
   : (x) == 'u' ? 46 \
   : (x) == 'v' ? 47 \
   : (x) == 'w' ? 48 \
   : (x) == 'x' ? 49 \
   : (x) == 'y' ? 50 \
   : (x) == 'z' ? 51 \
   : (x) == '0' ? 52 \
   : (x) == '1' ? 53 \
   : (x) == '2' ? 54 \
   : (x) == '3' ? 55 \
   : (x) == '4' ? 56 \
   : (x) == '5' ? 57 \
   : (x) == '6' ? 58 \
   : (x) == '7' ? 59 \
   : (x) == '8' ? 60 \
   : (x) == '9' ? 61 \
   : (x) == '+' ? 62 \
   : (x) == '/' ? 63 \
   : -1)

static const signed char b64[0x100] = {
  B64 (0), B64 (1), B64 (2), B64 (3),
  B64 (4), B64 (5), B64 (6), B64 (7),
  B64 (8), B64 (9), B64 (10), B64 (11),
  B64 (12), B64 (13), B64 (14), B64 (15),
  B64 (16), B64 (17), B64 (18), B64 (19),
  B64 (20), B64 (21), B64 (22), B64 (23),
  B64 (24), B64 (25), B64 (26), B64 (27),
  B64 (28), B64 (29), B64 (30), B64 (31),
  B64 (32), B64 (33), B64 (34), B64 (35),
  B64 (36), B64 (37), B64 (38), B64 (39),
  B64 (40), B64 (41), B64 (42), B64 (43),
  B64 (44), B64 (45), B64 (46), B64 (47),
  B64 (48), B64 (49), B64 (50), B64 (51),
  B64 (52), B64 (53), B64 (54), B64 (55),
  B64 (56), B64 (57), B64 (58), B64 (59),
  B64 (60), B64 (61), B64 (62), B64 (63),
  B64 (64), B64 (65), B64 (66), B64 (67),
  B64 (68), B64 (69), B64 (70), B64 (71),
  B64 (72), B64 (73), B64 (74), B64 (75),
  B64 (76), B64 (77), B64 (78), B64 (79),
  B64 (80), B64 (81), B64 (82), B64 (83),
  B64 (84), B64 (85), B64 (86), B64 (87),
  B64 (88), B64 (89), B64 (90), B64 (91),
  B64 (92), B64 (93), B64 (94), B64 (95),
  B64 (96), B64 (97), B64 (98), B64 (99),
  B64 (100), B64 (101), B64 (102), B64 (103),
  B64 (104), B64 (105), B64 (106), B64 (107),
  B64 (108), B64 (109), B64 (110), B64 (111),
  B64 (112), B64 (113), B64 (114), B64 (115),
  B64 (116), B64 (117), B64 (118), B64 (119),
  B64 (120), B64 (121), B64 (122), B64 (123),
  B64 (124), B64 (125), B64 (126), B64 (127),
  B64 (128), B64 (129), B64 (130), B64 (131),
  B64 (132), B64 (133), B64 (134), B64 (135),
  B64 (136), B64 (137), B64 (138), B64 (139),
  B64 (140), B64 (141), B64 (142), B64 (143),
  B64 (144), B64 (145), B64 (146), B64 (147),
  B64 (148), B64 (149), B64 (150), B64 (151),
  B64 (152), B64 (153), B64 (154), B64 (155),
  B64 (156), B64 (157), B64 (158), B64 (159),
  B64 (160), B64 (161), B64 (162), B64 (163),
  B64 (164), B64 (165), B64 (166), B64 (167),
  B64 (168), B64 (169), B64 (170), B64 (171),
  B64 (172), B64 (173), B64 (174), B64 (175),
  B64 (176), B64 (177), B64 (178), B64 (179),
  B64 (180), B64 (181), B64 (182), B64 (183),
  B64 (184), B64 (185), B64 (186), B64 (187),
  B64 (188), B64 (189), B64 (190), B64 (191),
  B64 (192), B64 (193), B64 (194), B64 (195),
  B64 (196), B64 (197), B64 (198), B64 (199),
  B64 (200), B64 (201), B64 (202), B64 (203),
  B64 (204), B64 (205), B64 (206), B64 (207),
  B64 (208), B64 (209), B64 (210), B64 (211),
  B64 (212), B64 (213), B64 (214), B64 (215),
  B64 (216), B64 (217), B64 (218), B64 (219),
  B64 (220), B64 (221), B64 (222), B64 (223),
  B64 (224), B64 (225), B64 (226), B64 (227),
  B64 (228), B64 (229), B64 (230), B64 (231),
  B64 (232), B64 (233), B64 (234), B64 (235),
  B64 (236), B64 (237), B64 (238), B64 (239),
  B64 (240), B64 (241), B64 (242), B64 (243),
  B64 (244), B64 (245), B64 (246), B64 (247),
  B64 (248), B64 (249), B64 (250), B64 (251),
  B64 (252), B64 (253), B64 (254), B64 (255)
};

#if UCHAR_MAX == 255
# define uchar_in_range(c) true
#else
# define uchar_in_range(c) ((c) <= 255)
#endif

bool
isbase64 (char ch)
{
  return uchar_in_range (to_uchar (ch)) && 0 <= b64[to_uchar (ch)];
}

/* Decode base64 encoded input array IN of length INLEN to
   output array OUT that can hold *OUTLEN bytes.  Return
   true if decoding was successful, i.e., if the input was
   valid base64 data, false otherwise.  If *OUTLEN is too
   small, as many bytes as possible will be written to OUT.
   On return, *OUTLEN holds the length of decoded bytes in
   OUT.  Note that as soon as any non-alphabet
characters are encountered decoding is stopped and false is returned This means that when applicable you must remove any line terminators that is part of the data stream before calling this function bool base64 decode const char restrict in size t inlen char restrict out size t outlen size t outleft outlen while inlen 2 if isbase64 in 0 isbase64 in 1 break if outleft out b64 to uchar in 0 2 b64 to uchar in 1 4 outleft if inlen 2 break if in 2 if inlen 4 break if in 3 break else if isbase64 in 2 break if outleft out b64 to uchar in 1 4 0xf0 b64 to uchar in 2 2 outleft if inlen 3 break if in 3 if inlen 4 break else if isbase64 in 3 break if outleft out b64 to uchar in 2 6 0xc0 b64 to uchar in 3 outleft in 4 inlen 4 outlen outleft if inlen 0 return false return true Allocate an output buffer in OUT and decode the base64 An ISO C99 implementation of Base64 encoding and decoding that is encoded data stored in IN of size INLEN to the OUT believed to follow all recommendations in this RFC is available from buffer On return the size of the decoded data is stored in OUTLEN OUTLEN may be NULL if the caller is not interested in the decoded length OUT may be NULL to indicate an out of memory error in which case OUTLEN contain the size of the memory block needed The function return true on successful decoding and memory allocation errors Use the OUT and OUTLEN parameters to differentiate between successful decoding and memory error The function return false if the input was invalid in which case OUT is NULL and OUTLEN is undefined bool base64 decode alloc const char in size t inlen char out size t outlen This may allocate a few bytes too much depending on input but it s not worth the extra CPU time to compute the exact amount The exact amount is 3 inlen 4 minus 1 if the input ends with and minus another 1 if the input ends with Dividing before multiplying avoids the possibility of overflow size t needlen 3 inlen 4 2 out malloc needlen http josefsson org base encoding if out 
return true if base64 decode in inlen out needlen
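The 3-byte to 4-character grouping that the encoder above relies on can be illustrated with a minimal, self-contained sketch. It is not the code from base64.c: the names `sketch_b64_encode` and `SKETCH_B64_LENGTH` are hypothetical, and the sketch assumes the caller supplies a buffer large enough for the whole result, so it has none of the partial-output handling of `base64_encode`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from base64.c): encoded length for INLEN
   input bytes, excluding the terminating zero. */
#define SKETCH_B64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

/* Hypothetical sketch: encode INLEN bytes from IN into OUT, which must
   hold SKETCH_B64_LENGTH (INLEN) + 1 bytes.  Returns the number of
   characters written, excluding the terminating zero. */
static size_t
sketch_b64_encode (const unsigned char *in, size_t inlen, char *out)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t o = 0;

  assert (in != NULL && out != NULL);

  while (inlen > 0)
    {
      /* Take up to three input bytes; missing bytes count as zero. */
      unsigned b0 = in[0];
      unsigned b1 = inlen > 1 ? in[1] : 0;
      unsigned b2 = inlen > 2 ? in[2] : 0;

      /* Split the 24 bits into four 6-bit indexes; pad with '='. */
      out[o++] = alphabet[b0 >> 2];
      out[o++] = alphabet[((b0 << 4) | (b1 >> 4)) & 0x3f];
      out[o++] = inlen > 1 ? alphabet[((b1 << 2) | (b2 >> 6)) & 0x3f] : '=';
      out[o++] = inlen > 2 ? alphabet[b2 & 0x3f] : '=';

      if (inlen <= 3)
        break;
      in += 3;
      inlen -= 3;
    }

  out[o] = '\0';
  return o;
}
```

Called on the six bytes of "foobar" with a 9-byte buffer, the sketch writes "Zm9vYmFy", which agrees with the test vectors in RFC 4648.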

Original URL path: http://www.josefsson.org/base-encoding/rfc4648-from-draft-josefsson-rfc3548bis-04.diff.html (2016-04-30)
