


Why “caffè” may not be “caffè”


Every time I think I finally understand Unicode, it surprises me again. This time, it was a file full of coffee orders that wouldn’t grep for “caffè” - even though the word was clearly there. The culprit? Unicode normalization. Characters like “è” can be

Every time I think I've finally “got” Unicode, this rabbit hole kicks me in the back again. 😆 Still, IMHO it is important to recognise that when you move data and files between operating systems and programs, you’re better off knowing some of the pitfalls. So I’m sharing something I experienced when I transferred a file to my FreeBSD play-around notebook. Let’s assume a little story…

It’s late afternoon and you and some friends sit together playing around with BSD. A friend using another operating system collects coffee orders in a little text file to not forget anyone when going to the barista on the other side of the street. He sends the file to you, so at the next meeting you already know the preferences of your friends. You take a look at who wants a caffè:
armin@freebsd:/tmp $ cat orders2.txt
Mauro: cappuccino
Armin: caffè doppio
Anna: caffè shakerato
Stefano: caffè
Franz: latte macchiato
Francesca: cappuccino
Carla: latte macchiato
So you do a quick grep just to be very surprised!
armin@freebsd:/tmp $ grep -i caffè orders2.txt
armin@freebsd:/tmp $
Wait, WAT? Why is there no output? We have more than one line with caffè in the file? Well, you just met one of the many aspects of Unicode. This time it’s called “normalization”. 😎

Many characters can be represented in more than one form. Take the innocent “è” from the example above. Unicode has a precomposed accented character called LATIN SMALL LETTER E WITH GRAVE. But you could also use a regular LATIN SMALL LETTER E and combine it with the character COMBINING GRAVE ACCENT. Both result in the same character and “look” identical, but aren’t.

Let’s see a line with the word “caffè” as hex dump using the first approach (LATIN SMALL LETTER E WITH GRAVE):
\u0063\u0061\u0066\u0066\u00E8\u000A
c a f f è (LF)
Now let’s do the same for the same line using the second approach:
\u0063\u0061\u0066\u0066\u0065\u0300\u000A
c a f f è (LF)
And there you have it: the second form is one code point longer (and, encoded as UTF-8, one byte longer), so the two lines do not match even though both are valid UTF-8 and the character looks the same!
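To make the difference tangible, here is a minimal Python sketch that compares the two spellings. The variable names and the `describe` helper are made up for illustration; the strings use explicit escapes so the two forms survive copy and paste.

```python
# Two spellings of the same visible word.
composed = "caff\u00e8"      # LATIN SMALL LETTER E WITH GRAVE
decomposed = "caffe\u0300"   # LATIN SMALL LETTER E + COMBINING GRAVE ACCENT


def describe(text):
    """Return (number of code points, UTF-8 bytes as spaced hex)."""
    return len(text), text.encode("utf-8").hex(" ")


print(composed == decomposed)   # False: the code point sequences differ
print(describe(composed))       # (5, '63 61 66 66 c3 a8')
print(describe(decomposed))     # (6, '63 61 66 66 65 cc 80')
```

A byte-oriented comparison, which is what grep did in the story, therefore has no reason to treat the two as equal.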

So obviously just using UTF-8 is not enough, and you might encounter files using the second approach. Just to make matters more complicated, there are actually four forms of Unicode normalization out there. 😆

  • NFD: canonical decomposition
  • NFC: canonical decomposition followed by canonical composition
  • NFKD: compatibility decomposition
  • NFKC: compatibility decomposition followed by canonical composition
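As a rough illustration of how the four forms differ, the following Python sketch runs one sample string through each of them with the standard `unicodedata.normalize` function. The sample (the “ﬁ” ligature followed by “è”) and the `code_points` helper are my own choices for the demo, not something from the ICU tooling.

```python
import unicodedata


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# LATIN SMALL LIGATURE FI followed by LATIN SMALL LETTER E WITH GRAVE
sample = "\ufb01\u00e8"

for form in ("NFD", "NFC", "NFKD", "NFKC"):
    print(form, code_points(unicodedata.normalize(form, sample)))

# NFD  U+FB01 U+0065 U+0300
# NFC  U+FB01 U+00E8
# NFKD U+0066 U+0069 U+0065 U+0300
# NFKC U+0066 U+0069 U+00E8
```

The canonical forms leave the ligature alone and only differ in how the accent is stored, while the compatibility forms additionally replace the ligature with a plain “f” and “i”.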

For the sake of brevity of this post and your nerves we’ll just deal with the first two and I refer you to this Wikipedia article for the rest.

Normal form C (NFC) is the most widely used normal form and is also the one recommended by the W3C for HTML, XML, and JavaScript. Technically speaking, text encoded in Latin1 (or Windows Codepage 1252), for example, is in normal form C, since an “à” or the umlaut “Ö” is a single character there and is not composed of combining characters. Windows and the .NET framework also store Unicode strings in normal form C. This does not mean that NFD can be ignored. For example, the Mac OS X file system works with a variant of NFD, as the Unicode standard was only finalized when OS X was designed. When two applications share Unicode data but normalize it differently, errors and data loss can result.

So how do we get from one form to another in one of the BSD operating systems (or in Linux)? Well, the Unicode Consortium provides a toolset called ICU (International Components for Unicode). The documentation URL is unicode-org.github.io/icu/ and you can install ICU on FreeBSD using the command
pkg install icu
Once the installation has completed, you have a new command line tool called uconv (not to be confused with iconv, which serves a similar purpose). Using uconv you can convert the normal forms into each other as well as do a lot of other encoding stuff (this tool is a rabbit hole in itself 😎).

Similar to iconv you can specify a “from” and a “to” encoding for input. But you can also specify so-called “transliterations” that will be applied to the input. In its simplest form such a transliteration is something in the form SOURCE-TARGET that specifies the operation. The "any" stands for any input character. This is the way I got the hexdump from above by using the transliteration 'any-hex':
armin@freebsd:/tmp$ echo caffè | uconv -x 'any-hex'
\u0063\u0061\u0066\u0066\u00E8\u000A
Instead of hex codes you can also output the Unicode code point names to see the difference between the two forms:
armin@freebsd:/tmp$ echo Caffè | uconv -f utf-8 -t utf-8 -x 'any-nfd' | uconv -f utf-8 -x 'any-name'
\N{LATIN CAPITAL LETTER C}\N{LATIN SMALL LETTER A}\N{LATIN SMALL LETTER F}\N{LATIN SMALL LETTER F}\N{LATIN SMALL LETTER E}\N{COMBINING GRAVE ACCENT}\N{<control-000A>}
Now let’s try this for the NFC form:
armin@freebsd:/tmp$ echo Caffè | uconv -f utf-8 -t utf-8 -x 'any-nfc' | uconv -f utf-8 -x 'any-name'
\N{LATIN CAPITAL LETTER C}\N{LATIN SMALL LETTER A}\N{LATIN SMALL LETTER F}\N{LATIN SMALL LETTER F}\N{LATIN SMALL LETTER E WITH GRAVE}\N{<control-000A>}
As these examples show, you convert from one normal form to another by using a transliteration like 'any-nfd' to bring the input into normal form D (for decomposed, e.g. LATIN SMALL LETTER E + COMBINING GRAVE ACCENT) or 'any-nfc' for normal form C (for composed, e.g. LATIN SMALL LETTER E WITH GRAVE).
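The same idea rescues the failed search from the coffee story: bring both the search term and the text into one normal form before comparing. Below is a minimal Python sketch of that approach. The `grep_normalized` function and the shortened order list are hypothetical stand-ins for the real file, and NFC is an arbitrary choice (NFD would work just as well, as long as both sides use the same form).

```python
import unicodedata


def nfc(text):
    """Bring text into normal form C."""
    return unicodedata.normalize("NFC", text)


def grep_normalized(pattern, lines):
    """Return the lines containing pattern, ignoring case and normal form."""
    needle = nfc(pattern).casefold()
    return [line for line in lines if needle in nfc(line).casefold()]


# Shortened stand-in for orders2.txt; the accents are stored decomposed (NFD).
orders = [
    "Mauro: cappuccino",
    "Armin: caffe\u0300 doppio",
    "Stefano: caffe\u0300",
    "Franz: latte macchiato",
]

# The search term is typed in composed form (NFC), as most keyboards produce it.
for line in grep_normalized("caff\u00e8", orders):
    print(line)
```

On the command line, piping the file through uconv -x 'any-nfc' before grep should have a similar effect, assuming your terminal sends the search term in NFC.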

If you want to learn about building your own transliterations, there’s a tutorial at unicode-org.github.io/icu/user… that shows the enormous capabilities of uconv.

Using the 'any-name' transliteration you can easily discern the various Sigmas here (I’m using sed to split the output into multiple lines):
armin@freebsd:/tmp $ echo '∑𝛴Σ' | uconv -x 'any-name' | sed -e 's/\\N/\n/g'
{N-ARY SUMMATION}
{MATHEMATICAL ITALIC CAPITAL SIGMA}
{GREEK CAPITAL LETTER SIGMA}
{<control-000A>}
If you want to get the Unicode character from its name, there are several ways, depending on the programming language you prefer. Here is an example using Python that prints the German umlaut "Ö":
python -c 'import unicodedata; print(unicodedata.lookup(u"LATIN CAPITAL LETTER O WITH DIAERESIS"))'
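Python can also go in the other direction and tell you the name of each character, much like the 'any-name' transliteration does. This is a small sketch using `unicodedata.name`; the `names` helper and its fallback label for unnamed characters (such as control characters) are my own additions.

```python
import unicodedata


def names(text):
    """Return the Unicode name of every character in text."""
    # Characters without a name (e.g. control characters) get a U+XXXX label.
    return [unicodedata.name(ch, f"<U+{ord(ch):04X}>") for ch in text]


# The three Sigma look-alikes, written as escapes to avoid font confusion.
sigmas = "\u2211\U0001D6F4\u03A3"

for name in names(sigmas):
    print(name)

# N-ARY SUMMATION
# MATHEMATICAL ITALIC CAPITAL SIGMA
# GREEK CAPITAL LETTER SIGMA
```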
The uconv utility is a very powerful tool, and every modern programming language (see the Python example above) also has libraries and modules for handling Unicode data. The world gets connected, but not in ASCII. 😎




US poll finds 60 percent of Gen Z voters back Hamas over Israel in Gaza war


in reply to Whostosay

Israel managed to figure out to turn a vicious terrorist group into the good guys.
in reply to BarneyPiccolo

Turns out when people break out of a literal concentration camp to fight back against their oppressor they're ~~freedom fighters~~ terrorists


Chatbots can be manipulated through flattery and peer pressure


Researchers convinced ChatGPT to do things it normally wouldn’t with basic psychology.






US reportedly suspends visa approvals for nearly all Palestinian passport holders


Restrictions to prevent travel for healthcare and college and come after denying visas to Palestinian Authority leaders


Archived version: archive.is/20250901035359/theg…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Flotilla with Greta Thunberg on board sets sail for Gaza


Hundreds of activists are aboard the Global Sumud Flotilla with the intention to bring humanitarian aid to Gaza.


Archived version: archive.is/newest/dw.com/en/fl…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.






Trust Issues


ComiCSS#206


[Opinion] How the IDF Central Command chief enables war crimes in the West Bank


If the situation were normal, someone appointed as head of the Israeli army's Central Command – which includes occupied territory in which 3.5 million Palestinians and 520,000 Israeli Jews live – would presumably have begun his term by meeting with the mayors of Palestinian cities and villages.


Archived version: archive.is/20250901044105/haar…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Poland, Baltic, Nordic States urge new EU funds for border security


Facing escalating drone incursions and hybrid threats, five EU border states are demanding fresh Commission funding to boost aerial defences and protect civilians


Archived version: archive.is/newest/euractiv.com…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



At least 250 killed in 6.0-magnitude earthquake in Afghanistan


Hundreds of other people were injured in the quake, which struck Jalalabad near the border with Pakistan.


Archived version: archive.is/newest/nbcnews.com/…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Australian report raises concerns over age-verification software ahead of teen social ban


There was high accuracy for those over 19, but not for those up to three years on either side of the limit.


The deadly toll on journalists in the Gaza war


With foreign media barred, Palestinians have reported alone, facing the ‘most deliberate effort to kill and silence’ them ever


Archived version: archive.is/20250901045416/theg…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Australia | Man arrested after allegedly ramming car through front gates of Russian consulate in Sydney


Police called to Woollahra following reports of ‘unauthorised vehicle’ parked in driveway, with 39-year-old then crashing through gates, police allege


Archived version: archive.is/newest/theguardian.…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Steam users in the UK will need a credit card to access “mature content” games


cross-posted from: piefed.social/post/1204179

Steam is now complying with the Online Safety Act
in reply to BrikoX

That is actually a relatively mild solution that makes the age verification almost bearable...

...which is terrible, because age verification as a whole is a terrible concept. Would have been nice if a big player like Steam is working against it, but I also never expected they would, they are commercial after all.

in reply to ook

I don't agree that it's mild. They explicitly will require credit cards as per the law, debit cards won't work for it, which is a huge deal.



[Patch Notes] 0.3.0 Hotfix 12


0.3.0 Hotfix 12


  • Made adjustments to the Azmadi, the Faridun Prince fight:
    • Slightly increased cooldown of most of his skills.
    • Decreased the damage of most of his skills.
    • Fixed not being able to Dodge Roll through his lacerate attacks.


  • Fixed a bug where Skeletal Warriors could incorrectly be created without reserving Spirit.


[Patch Notes] 0.3.0 Hotfix 11


0.3.0 Hotfix 11


  • Killing the bosses in The Khari Crossing no longer turns off the entrance to Skullmaw Stairway.
  • Fixed the Derelict Mansion bosses sometimes failing to drop loot and open the arena doors.
  • Fixed achievement count icons incorrectly displaying in the chat, since there's currently no achievement list.
  • Fixed a client crash that could occur when using the Tempest Bell Skill.
  • Fixed 2 instance crashes.
in reply to BrikoX

This has to be a record for hotfixes without a version bump.


[Patch Notes] 0.3.0 Hotfix 10


0.3.0 Hotfix 10


  • Fixed a bug where Zolin would grant you Breach Atlas Passive Skill Books for defeating the King of the Mists, instead of from defeating Xesht.
  • Fixed a client crash when displaying a character with the Merit of Service Unique Shield equipped.
  • Fixed 3 instance crashes.

This patch has been deployed without restarting the servers, you will need to restart your client to receive the client fixes.



[Announcement] Introducing the Third Edict Mystery Box


We've just released the Third Edict Mystery Box, a new mystery box without duplicate microtransactions! Many of these microtransactions have a varying number of thematic variations. Check out the contents of the box below!

Video: What's in the The Third Edict Mystery Box?

Please note that currently the Third Edict Mystery Box is available only in Path of Exile 2.


There are 56 microtransactions (including variations) that you can find in The Third Edict Mystery Box. Many of these microtransactions have a varying number of thematic variations, and you'll never receive a duplicate copy of a variation that you have already opened. For example, if you open one variant of a microtransaction, you may later receive another variant, but you'll never get a duplicate! Check out the full details at pathofexile2.com/mysterybox



[Patch Notes] 0.3.0 Hotfix 9


0.3.0 Hotfix 9


  • Fixed a bug where Siphon Elements had an incorrect starting gem level in the Gemcutting menu.
  • Fixed a bug where Lightning Warp was missing the Remnant tag.
  • Fixed a bug where Grim Feast could not be engraved in the Gemcutting menu.
  • Fixed a bug where interacting with the Wardrobe Decoration would open the cosmetics panel for all players in the hideout.
  • Fixed a bug where the Earnings Tab was hidden if you had the 'Hide remove only tabs' option enabled.
  • Fixed a client crash that could occur with The Devouring Diadem Unique Helmet.
  • Fixed a client crash that could sometimes occur when viewing a Gem Stash Tab.
  • Fixed 5 other client crashes.
  • Fixed 3 instance crashes.

This patch has been deployed without restarting the servers, you will need to restart your client to receive the client fixes in this patch. The update will be made available on PlayStation and Xbox as soon as the respective clients have passed certification.



its all about perspective ❤


transcription: your problematic behavior isnt a problem if i like it


[Patch Notes] 0.3.0 Hotfix 8


0.3.0 Hotfix 8


  • Temporarily disabled the Unique Item Fairgraves' Curse from dropping until we can fix a client crash caused by it.


[Patch Notes] 0.3.0 Hotfix 7


0.3.0 Hotfix 7


  • Fixed a bug where the Grip of Kulemak Unique Item was dropping outside of the associated Abyss content.
  • Azmadi, the Faridun Prince now deals approximately 33% less Damage with most of his abilities. Some of his more notable abilities (red flash ones) have less of a reduction, and are thus still mostly lethal.


[Patch Notes] 0.3.0 Hotfix 6


0.3.0 Hotfix 6


  • The Boss of The Excavation area in Act Four no longer just randomly instantly kills you after doing his Wall Skill sometimes.


[Patch Notes] 0.3.0 Hotfix 5


0.3.0 Hotfix 5


  • Fixed a bug where characters in Standard were unable to access The Ziggurat Refuge if they gained access to it before 0.3.0.
  • Fixed an instance crash.


[Patch Notes] 0.3.0 Hotfix 4


0.3.0 Hotfix 4


  • Fixed a bug where Abysses in areas that spawned 2 or more Pits required you to kill too many monsters before the Pits opened.


[Patch Notes] 0.3.0 Hotfix 3


0.3.0 Hotfix 3


  • Fixed a bug where Raging Spirits was not being triggered by Elemental Weakness.
  • Fixed a bug where a checkpoint in the Hunting Grounds could sometimes trap players in blocking environmental doodads.
  • Fixed 2 instance crashes.


[Patch Notes] 0.3.0 Hotfix 2


0.3.0 Hotfix 2


  • Temporarily disabled the Demonfire Ember Fusillade Skill Effect microtransaction due to a crash that is occurring. We will re-enable this microtransaction once we're able to patch in a fix for this crash.
  • Fixed a bug where the Wrath Aura was not granting additional Lightning Damage.
  • Fixed a bug where the slowing debuff from the water in Solitary Confinement was not affected by Slow Potency.
  • Fixed a bug where the Minions from Skittering Stone Support were not visually moving.
  • Fixed 3 instance crashes.


[Patch Notes] 0.3.0 Hotfix


0.3.0 Hotfix


  • Fixed 9 instance crashes.

in reply to BrikoX

Really love PoE, really wanted to love Poe2. Doesn't feel like an upgrade to me.
in reply to Linktank

To me it feels like the game for completely different type of player than Path of Exile.



[Patch Notes] 3.26.0h Patch Notes


3.26.0h Patch Notes


  • Added support for Path of Exile 2: The Third Edict Twitch drops and Free Weekend microtransactions.


Visited YushaKobo in Akihabara today


It's amazing being able to see and try out so many crazy and interesting keyboards in one place. Stuff that even bigger places like Bic Camera don't have, and that you could never imagine seeing in a shop back in the UK. I picked up a Rainy75 and a set of WS BigLucky Tactiles to go with it. I'm looking forward to getting home to try it all out for real and get familiar with the setup. I'm also kinda weirded out to not hate the linear switches that the Rainy comes with. Shop guy Olodeh (hope I got your name right bro!) was super helpful, and overall a really positive experience. My favourite thing was the switch testers that show you on screen what each switch is, and some details about it. Some minor disappointments: no Boba U4T in stock today - I was looking forward to trying those; and no Wooting keebs for my gamer son to try out, although they did have another HE board that was above our price range, so at least we got a feel for that one.
in reply to not_woody_shaw

Thanks to the fact my hotel was pretty close, past two years, I've been there many times.
Thanks to a clerk I spoke with, and his suggestions, I've bought (not there, though) a wonderful ErgoDash I'm very happy with.



QAA Podcast with Cory Doctorow as guest


QAA Podcast: Cory Doctorow DESTROYS Enshitification (E338)

Episode webpage: soundcloud.com/qanonanonymous/…

Media file: chtbl.com/track/7791D/http://f…


in reply to jungle

Skeptic. They have for years discussed various qanon happenings in great detail.
in reply to sexy_peach

Ah, good to know. I was wondering why Cory would be giving conspiracy theorists any attention. 😅