

sed doesn't accept poop emoji as a delimiter

Cowards


in reply to Jo Shields

$ echo "apple" > test.txt
$ sed -i 's💩pp💩gg💩' test.txt
sed: -e expression #1, char 2: delimiter character is not a single-byte character


in reply to Jo Shields

SCNR: as it happens, @CoolSWEng has recently reimplemented sed in Rust for the uutils project: github.com/uutils/sed. But does it support poop-emoji-as-separator as its killer feature?!? (And, more importantly, should it?) Cc: @sylvestre
in reply to Stefano Zacchiroli

@zacchiro @CoolSWEng @sylvestre this might seem like a joke, but like most of my horrible engineering jokes, it's actually pretty sound.

Like, .csv isn't a serious file format because the risk of comma being in a field in a .csv is too high (hence most csv consumers let you set the delimiter, and tab is a common choice).

The risk of a "normal" sed delimiter like `/` being part of the pattern is very high, especially if UNIX paths are involved.

But a poop emoji? Almost certainly a safe choice! Who's modifying patterns with emoji in them?! Exactly!
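
To make the delimiter-collision point concrete, here is a minimal sketch. The path and the replacement are made up for illustration. It runs the same substitution twice, once with `/` as the delimiter (every slash in the pattern has to be escaped) and once with `|`, which any POSIX sed should accept:

```shell
# Hypothetical path rewrite: replace the /usr/local prefix with /opt.
input="/usr/local/bin"

# With '/' as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$input" | sed 's/\/usr\/local/\/opt/')

# With '|' as the delimiter, the same substitution needs no escaping.
piped=$(printf '%s\n' "$input" | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$escaped" "$piped"
```

Both commands give the same result; the second is just easier to read. Of course `|` can also turn up in a pattern, which is the whole argument for picking something far less likely.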

in reply to Jo Shields

@CoolSWEng @sylvestre oh, I didn't think you were joking. I was sold already!

My only doubt is whether it's worth breaking backward compatibility on this, and I'm inclined to think "yes", because it'd make Rust sed strictly *more* capable.

in reply to Stefano Zacchiroli

@zacchiro @CoolSWEng Arguably this is a bug in GNU sed: gnu.org/software/sed/manual/ht… only talks about “a single character”, not a single-byte character. POSIX also only talks about characters (pubs.opengroup.org/onlinepubs/…), but then POSIX only cares about the C locale anyway; anything else is implementation-defined.
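
For the character-versus-byte distinction, a quick sketch: 💩 is a single code point (U+1F4A9) but takes four bytes in UTF-8, which is presumably what trips the "single-byte character" check in the error message above. The variable names here are just for illustration:

```shell
# U+1F4A9 written as its four UTF-8 bytes in octal (F0 9F 92 A9),
# so the script does not depend on the terminal's encoding.
poop=$(printf '\360\237\222\251')

# wc -c counts bytes regardless of locale; tr strips any padding wc adds.
bytes=$(printf '%s' "$poop" | wc -c | tr -d ' ')

printf 'bytes: %s\n' "$bytes"
```

By contrast, `wc -m` counts characters and depends on the active locale, so its result is not shown here.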
in reply to Stefano Zacchiroli

Yes, the uutils Rust implementation, in contrast to GNU sed, supports 💩 as a delimiter. It also has (overridable) support for UTF-8 I/O and regular expressions.

@zacchiro @directhex @sylvestre


in reply to Stefano Zacchiroli

@zacchiro @CoolSWEng

$ echo "💐💩🌷" | cut -d"💩" -f1

With the Rust coreutils:
💐
With GNU:
cut: the delimiter must be a single character
