

SHOUT For Smaller QR Codes


QR codes have been with us for a long time now, and after passing through their Gartner-esque hype cycle of inappropriate usage, they have settled down to be an important and ubiquitous part of life. If you have ever made a QR code, you’ll know all about trying to generate the most compact and easily scannable one you can, and for that [Terence Eden] is here with an interesting quirk: upper-case text produces smaller codes than lower-case.

His post takes us on a journey into the encoding of QR codes, not in terms of their optical pattern generation, but of the bit stream they contain. There are different modes to denote different types of payload, and in his two examples of the same URL in upper and lower case, the modes are different. Upper-case is encoded as alphanumeric, while lower-case, despite also appearing to contain only alphanumeric characters, is encoded as bytes.

To understand why, it’s necessary to consider the QR code’s need for efficiency, which led its designers to reduce the character set as far as possible and define only upper-case letters in the alphanumeric set. The upper-case payload is thus encoded using fewer bits per character than the lower-case one, which is encoded as 8-bit bytes. A satisfying explanation for a puzzle in plain sight.

Hungry for more QR hackery? This one contains more than one payload!


hackaday.com/2025/02/26/shout-…


Why are QR Codes with capital letters smaller than QR codes with lower-case letters?


shkspr.mobi/blog/2025/02/why-a…

Take a look at these two QR codes. Scan them if you like, I promise there's nothing dodgy in them.


[Image: two QR codes.]


Left is upper-case HTTPS://EDENT.TEL/ and right is lower-case https://edent.tel/

You can clearly see that the one on the left is a "smaller" QR as it has fewer bits of data in it. Both go to the same URL; the only difference is the casing.

What's going on?

Your first thought might be that there's a different level of error-correction. QR codes can have increasing levels of redundancy in order to make sure they can be scanned when damaged. But, in this case, they both have Low error correction.

The smaller code is "Type 1" (the specification calls this Version 1) - it is 21 * 21 modules. The larger is "Type 2" (Version 2) with 25 * 25 modules.

The official specification describes the versions in more detail. The smaller code should be able to hold 25 alphanumeric characters. But https://edent.tel/ is only 18 characters long. So why is it bumped into a larger code?

Using a decoder like ZXing, it is possible to see the raw bytes of each code.

UPPER:

20 93 1a a6 54 63 dd 28 35 1b 50 e9 3b dc 00 ec 11 ec 11

lower:

41 26 87 47 47 07 33 a2 f2 f6 56 46 56 e7 42 e7 46 56 c2 f0 ec 11 ec 11 ec 11 ec 11 ec 11 ec 11 ec 11

You might have noticed that they both end with the same sequence: ec 11. Those are "padding bytes", because the data needs to completely fill the QR code. But - hang on! - not only does the UPPER one safely contain the text, it also has some spare padding?

The answer lies in the first couple of bytes.

Once the raw bytes have been read, a QR scanner needs to know exactly what sort of code it is dealing with. The first four bits tell it the mode. Let's convert the hex to binary and then split after the first four bits:

Type   HEX    BIN                Split
UPPER  20 93  00100000 10010011  0010 000010010011
lower  41 26  01000001 00100110  0100 000100100110

The UPPER code is 0010 which indicates it is Alphanumeric - the standard says the next 9 bits show the length of data.

The lower code is 0100 which indicates it is Byte mode - the standard says the next 8 bits show the length of data.

Type   HEX    BIN                Split
UPPER  20 93  00100000 10010011  0010 0000 10010
lower  41 26  01000001 00100110  0100 000 10010

Look at that! They both have a length of 10010 which, converted from binary to decimal, is 18 - the exact length of the text.
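If you'd rather not split the bits by hand, here is a minimal Python sketch of the same header read. The function name read_header is made up for illustration, and the length-field widths (9 bits for Alphanumeric, 8 for Byte) are the ones quoted above, which only hold for the smaller QR versions; bigger codes use wider fields.

```python
def read_header(raw_hex):
    """Return (mode name, character count) from the first bytes of a QR bit stream."""
    data = bytes.fromhex(raw_hex)  # whitespace between hex pairs is ignored
    bits = "".join(f"{byte:08b}" for byte in data)

    # Mode indicator -> (name, width of the length field in bits).
    # Assumption: widths as quoted in the post, valid for small codes only.
    modes = {"0010": ("Alphanumeric", 9), "0100": ("Byte", 8)}

    name, width = modes[bits[:4]]
    count = int(bits[4:4 + width], 2)
    return name, count


print(read_header("20 93"))  # ('Alphanumeric', 18)
print(read_header("41 26"))  # ('Byte', 18)
```

Only the two modes seen in this post are handled; any other mode indicator would raise a KeyError.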

Alphanumeric uses 11 bits for every two characters, Byte mode uses (you guessed it!) 8 bits per single character.
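That arithmetic is enough to show why the lower-case URL gets bumped up a size. The sketch below is a rough bit budget, not a full encoder: the helper names are hypothetical, the terminator and padding are ignored, and the figure of 19 data bytes for the smallest code at Low error correction comes from the specification (it also matches the 19 raw bytes in the UPPER dump above).

```python
import math

# Assumption: data capacity of the smallest code at Low error correction, per the spec.
SMALLEST_CODE_DATA_BYTES = 19


def alphanumeric_bits(n):
    # 4-bit mode + 9-bit length + 11 bits per pair (+ 6 bits for a leftover character)
    return 4 + 9 + 11 * (n // 2) + 6 * (n % 2)


def byte_bits(n):
    # 4-bit mode + 8-bit length + 8 bits per character
    return 4 + 8 + 8 * n


for label, bits in (("UPPER", alphanumeric_bits(18)), ("lower", byte_bits(18))):
    needed = math.ceil(bits / 8)
    print(label, bits, "bits ->", needed, "bytes; fits:", needed <= SMALLEST_CODE_DATA_BYTES)
```

For the 18-character URL this gives 112 bits (14 bytes) in Alphanumeric mode and 156 bits (20 bytes) in Byte mode. The first fits in 19 bytes with room for padding; the second is one byte too big.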

But why is the lower-case code pushed into Byte mode? Isn't it using letters and numbers?

Well, yes. But in order to store data efficiently, Alphanumeric mode only has a limited subset of characters available: digits, upper-case letters, and a handful of punctuation symbols: space $ % * + - . / :

Luckily, that's enough for a protocol, domain, and path. Sadly, no GET parameters.
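To check whether your own URL will squeeze into Alphanumeric mode, something like this sketch would do. The names are hypothetical; the 45-character table (digits, then upper-case letters, then the nine symbols) follows the ordering in the specification, and each pair of characters is packed as first * 45 + second into 11 bits.

```python
# Digits, upper-case letters, then: space $ % * + - . / :
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def fits_alphanumeric(text):
    """True if every character is in the Alphanumeric mode character set."""
    return all(ch in ALPHANUMERIC for ch in text)


def encode_pair(pair):
    """Pack two Alphanumeric characters into an 11-bit binary string."""
    first, second = (ALPHANUMERIC.index(ch) for ch in pair)
    return f"{first * 45 + second:011b}"


print(fits_alphanumeric("HTTPS://EDENT.TEL/"))      # True
print(fits_alphanumeric("https://edent.tel/"))      # False
print(fits_alphanumeric("HTTPS://EDENT.TEL/?A=1"))  # False: ? and = are not in the set
print(encode_pair("HT"))                            # 01100011010
```

Those 11 bits for "HT" are exactly what follows the mode and length in the UPPER dump: the last three bits of 93 (011) followed by 1a (00011010).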

So, there you have it. If you want the smallest possible physical size for a QR code which contains a URL, make sure the text is all in capital letters.

#qr #QRCodes