Comments on Probably Done Before: What the Heck is Base64 Encoding really?
Daniel Eklund (http://www.blogger.com/profile/11570452431861145598)
Updated 2017-02-28T12:23:49.489-08:00

Posted 2015-12-21T16:24:44.674-08:00:

I find it shocking that all of these library authors would find it intuitive for hex (base16) and base encoders to have an API like encode([octets]) but for base32 a much more limiting encode(number) would make more sense.

I guess this leaves us with fundamentally two different Crockford base32 standards, streaming and numeric? Maybe someone feels compelled to advertise a clarified spec with both of these as sub-codecs. Crockford fixing his web page would be nice but maybe enough harm has been done that a better way would be to accept that both are reasonable and already implemented viewpoints. We might be better off just documenting them both instead of figuring out which one is "correct".

Thanks for the article!
At least that cleared up why my work-in-progress (stream) implementation doesn't match the output of some other libraries for even a single low number/byte.

Comment by Jakob Petsovits (https://github.com/jpetso)

Posted 2015-11-28T19:32:35.829-08:00:

Great article.

Comment by Anonymous

Posted 2015-09-23T01:17:10.643-07:00:

You can try this free online service to convert string to base64 online (http://www.online-code.net/base64-string.html).

Comment by buyi wen (http://www.blogger.com/profile/04998362585798811698)

Posted 2014-04-28T15:14:08.069-07:00:

"You gave me the number: 1100001011000010110000101100001
It is 31 digits long. Its decimal representation is 1633771873. I feel no ambiguity in understanding what number it represents -- and yet it is not a multiple of 5-bits. Why do I need to pad four extra zeros to the left (35 bits now), to make it a multiple of 5? Why is Crockford-32 telling me I should zero-pad?"

Your process in obtaining 1633771873 was the following:

representation in base 2 "1100001011000010110000101100001"
->
abstract mathematical integer
->
representation in base 10 "1633771873"

You're using algebra (multiplications, divisions) to convert back and forth to the abstract mathematical integer.
It's the only way to do it, because 10 and 2 are not powers of some common radix, that is, log 10 / log 2 is irrational.

But, if you want to convert it to base 16 or 32, there is another way:
representation in base 2 "1100001011000010110000101100001"
->
padded representation in base 2 "00001 10000 10110 00010 11000 01011 00001" (the spaces are here just for clarity)
->
representation in base 32 "1GP2RB1"

Here, each step is simple, local, and involves no algebra: the first step pads to a multiple of 5 bits, and the second step is just a lookup of the symbol corresponding to each group of 5 binary digits. The reason you need to pad to a multiple of 5 bits is simply that you couldn't do the lookup if you couldn't split the binary string into groups of exactly 5 bits!

I can't read minds, but I think Crockford simply wrote the specification with the mental model of the second process. He just didn't write down the lookup step.

Comment by Anonymous

Posted 2014-04-28T12:16:51.042-07:00:

Respectfully, I am not understanding your point here.

It seems you are agreeing that padding to the left does not change the number (and very importantly here, we are talking about bit representations of integers in an unsigned schema, since nowhere have we opened the can of worms that is 1's-complement or 2's-complement for negative numbers).

I completely agree with your statement, "Padding the shortest binary representation of a number to the left [with zeros] introduces no ambiguities" as that was exactly my point, but I take it further and say, "zero padding to the left introduces no ambiguities, but there were no ambiguities to be introduced in the first place".

You gave me the number: 1100001011000010110000101100001
It is 31 digits long.
Its decimal representation is 1633771873. I feel no ambiguity in understanding what number it represents -- and yet it is not a multiple of 5-bits. Why do I need to pad four extra zeros to the left (35 bits now), to make it a multiple of 5? Why is Crockford-32 telling me I should zero-pad?

As such, I am still left with this question about the Crockford 32 specification:
---------------------------------
If left-padding was implied, why does the sentence,

“If the bit-length of the number to be encoded is not a multiple of 5 bits, then zero-extend the number to make its bit-length a multiple of 5”

have to exist at all? What does it add to the discussion regarding implementation?
---------------------------------

If it adds nothing under a left-zero-padding assumption, then it is irrelevant and should be removed. Would you agree with that?

But since it is present, I take its very existence to mean something. Especially since in _most_ base encodings, the implementation of padding is meant to line up the N-bit-space with the 8-bit (octet) unit of data storage in all files, and is always right-padded.

I believe that the very presence of this sentence is probative of a "Concatenative Iterative Encoding" interpretation.

I would prefer the specification be re-issued with these ambiguities removed.

I see no reason to not have both: 1) a specification for encoding a single number, and 2) a specification for a data encoding that operates on octets and must be zero-padded to the right if the final bit count is not a multiple of 5 and 8 -- like regular base64, base32, or the Zooko-base32 encoding.

Likewise, it would be neat if people acknowledged the differences when they were teaching base64 as ONLY "Concatenative Iterative Encoding".
An example of that is this blog post
http://code.tutsplus.com/tutorials/base-what-a-practical-introduction-to-base-encoding--net-27590
where the author starts with a "Place-Based Single Number Encoding" interpretation until he gets to base32 and base64 at which point he switches to a "Concatenative Iterative Encoding" interpretation.

Comment by Daniel Eklund (http://www.blogger.com/profile/11570452431861145598)

Posted 2014-04-28T10:32:04.301-07:00:

Sorry for the strong words, I saw it on Hacker News where the title was "Why most Crockford32 Implementations are Wrong". The hubris of this statement made me overlook the more tentative wording of the post.

"Left-based zero padding, DOES NOT CHANGE THE VALUE OF THE NUMBER at all."

And it's exactly the reason why it's chosen! Padding the shortest binary representation of a number to the left introduces no ambiguities, since the first digit is always a 1 (for non-zero numbers), whereas padding to the right would make decoding ambiguous ("10000" could correspond to 1 or 10 or 100...).

Of course padding is only necessary if you start from a binary representation of the number and want to transform it into a base 32 representation. You don't need it if you compute the base 32 representation from scratch:

1100001011000010110000101100001 (base 2)
= 1GP2RB1 (base 32)
= 00001 10000 10110 00010 11000 01011 00001 (base 2)

where you see that zero-extension to a multiple of 5 bits has happened naturally.
I'm guessing that's why you find this mention of padding unnecessary: it's just that it was written with binary representation as a starting point.

And that kind of padding is completely different from base64's padding with "=" symbols, which aims to let you concatenate different blocks of base64-encoded data. It doesn't make sense to concatenate numbers, that's why Crockford32 goes as far as using "=" for a completely different purpose. If it was an alternative to base64, it would have been wise to at least keep the option of base64-style padding open!

Comment by Anonymous

Posted 2014-04-28T07:05:48.510-07:00:

A "horrible theory", like my use of the word "incorrect", might be a bit harsh, but I do appreciate your passion.

I don't advance my interpretation with 100% certainty, and I feel like I was honest enough to present a table showing evidence for both sides. The ambiguity was the frustrating part, not any belief that implementations were "incorrect".

Your point about base10 is well-taken and does give me pause.

However, I want to push back about the padding argument. Since neither left nor right padding was mentioned, the ambiguity remained. I reasoned thusly:

    Why would you need to pad non 5-bit multiples to the left?

This argument is key. The quote from the page is this: “If the bit-length of the number to be encoded is not a multiple of 5 bits, then zero-extend the number to make its bit-length a multiple of 5.”

As an example, take a 32-bit (4-octet) number, like 01100001011000010110000101100001 (which, in decimal, is 1633771873). This is NOT a multiple of 5-bits.

Why, just because it is not a multiple of 5-bits, should I zero-pad it to the left with 3 more bits, or 8 more bits?

Left-based zero padding, DOES NOT CHANGE THE VALUE OF THE NUMBER at all.
crockford32NumberEncoding(01100001011000010110000101100001) --> "1GP2RB1"

crockford32NumberEncoding(0000000001100001011000010110000101100001) --> "1GP2RB1"

The statement from the page,
“If the bit-length of the number to be encoded is not a multiple of 5 bits, then zero-extend the number to make its bit-length a multiple of 5”

is quite a strong statement of the form: "If X then Y". I can't believe it was made with the knowledge that the action has no impact whatsoever, otherwise it is completely irrelevant and can and should be removed.

If padding to the left does not matter, then all that remains is padding to the right, which DOES matter. It was actually this sentence in the specification that gave me pause as I was familiar with how base64 encoding right-pads and inserts "==" in the output string as indication of the padding.

If we right-pad, we get
crockford32NumberEncoding(0110000101100001011000010110000100000000) --> "C5GP2R80"
and "C5GP2R80" != "1GP2RB1"

I don't know what Douglas Crockford meant the specification to REALLY be. I would love him to find this page and answer. Most people interpreted it as a base32-radix encoding with a single number. Me too.

Comment by Daniel Eklund (http://www.blogger.com/profile/11570452431861145598)

Posted 2014-04-28T05:49:05.201-07:00:
What a horrible theory!

Your evidence for Crockford32 being a Concatenative Iterative Encoding System is completely unconvincing.

When Crockford talks about base64, he explicitly refers only to the "symbol set" it "uses". No comparison is made with the standard as a whole.

"Base 10 is well known and well accepted" is also obvious proof that this is a scheme to encode integers and not arbitrary data. There is no well-known standard for transmission of binary data in base 10, as it would be computationally very expensive. However there is no doubt that people use base 10 all the time to encode integers!

Your interpretation of "zero extension" to mean padding the rightmost digits is also flawed. It's the unsigned counterpart to "sign extension", and it is extremely common: for example the x86 has a dedicated MOVZX instruction to perform Zero eXtension (by padding the leftmost digits).

If you want to contradict the clear and unambiguous statement that "This document describes a 32-symbol notation for expressing numbers", you should have better evidence than that. You're the only one to think that the standard is ambiguous in that regard. But I do agree that test vectors would be nice, and I would add that endianness should have been explicitly mentioned.

Comment by Anonymous

Posted 2014-04-27T17:40:51.260-07:00:

Nice write-up.

Comment by David Karapetyan (http://www.blogger.com/profile/11608989663308434850)
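
The two readings argued over in this thread (one number in radix 32, versus a stream of octets cut into 5-bit groups) can be sketched in Python. This is only a minimal sketch under stated assumptions, not Crockford's reference code: the function names `encode_number` and `encode_stream` are made up for illustration, and the stream variant assumes most-significant-bit-first order with zero bits appended on the right up to the next multiple of 5.

```python
# Crockford's 32-symbol alphabet: digits plus letters, without I, L, O, U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_number(n: int) -> str:
    """Numeric reading: write one non-negative integer in radix 32.

    Repeated division yields the digits; no explicit padding is needed,
    which is equivalent to zero-extending the binary form on the LEFT.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 32)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def encode_stream(data: bytes) -> str:
    """Stream reading (assumption): concatenate the octets' bits,
    pad with zeros on the RIGHT to a multiple of 5, then look up each
    5-bit group in the alphabet.
    """
    bits = "".join(f"{octet:08b}" for octet in data)
    bits += "0" * (-len(bits) % 5)  # right-pad to a multiple of 5 bits
    return "".join(
        ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )


# The same four octets (0x61 0x61 0x61 0x61) under both readings:
print(encode_number(0x61616161))                 # 1GP2RB1
print(encode_stream(bytes.fromhex("61616161")))  # C5GP2R8
```

The numeric result matches the "1GP2RB1" quoted in the comments. The stream result here is "C5GP2R8" because the sketch pads 32 bits only up to 35; Daniel's example pads on the right to 40 bits, which adds one more all-zero group and gives "C5GP2R80". Either way, the two readings disagree on the very first symbol.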