Stefano Zacchiroli
SCNR: as it happens, @CoolSWEng has recently reimplemented sed in Rust for the uutils project: github.com/uutils/sed. But does it support poop-emoji-as-separator as its killer feature?!? (And, more importantly, should it?) Cc: @sylvestre
Jo Shields
@zacchiro @CoolSWEng @sylvestre this might seem like a joke, but like most of my horrible engineering jokes, it's actually pretty sound.
Like, .csv isn't a serious file format because the risk of a comma appearing in a field is too high (hence most csv consumers let you set the delimiter, and tab is a common choice).
The risk of a "normal" sed delimiter like `/` being part of the pattern is very high, especially if UNIX paths are involved.
But a poop emoji? Almost certainly a safe choice! Who's modifying patterns with emoji in them?! Exactly!
Stefano Zacchiroli
@CoolSWEng @sylvestre oh, I didn't think you were joking. I was sold already!
My only doubt is whether it's worth breaking backward compatibility on this --- and I'm inclined to think "yes", because it'd make Rust sed strictly *more* capable.
Stephen Kitt
@zacchiro @CoolSWEng Arguably this is a bug in GNU sed: gnu.org/software/sed/manual/ht… only talks about “a single character”, not a single-byte character. POSIX also only talks about characters (pubs.opengroup.org/onlinepubs/…), but then POSIX only cares about the C locale anyway; anything else is implementation-defined.
Diomidis Spinellis
Yes, the uutils Rust implementation, in contrast to GNU sed, supports 💩 as a delimiter. It also has (overridable) support for UTF-8 I/O and regular expressions.
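The point made above about `/` colliding with UNIX paths can be sketched in shell. This is a minimal illustration rather than anything posted in the thread: the path `/usr/local/bin` and the replacement `/opt` are made-up values, and `|` is just one common choice of alternate single-byte delimiter.

```shell
# Hypothetical input path, chosen only for illustration.
path='/usr/local/bin'

# With '/' as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With '|' as the delimiter, the same substitution needs no escaping.
piped=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$escaped" "$piped"
```

Both commands perform the same substitution; the second only stays readable as long as `|` itself never shows up in the pattern, which is the collision risk the emoji proposal is poking at.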
Jo Shields
$ sed -i 's💩pp💩gg💩' test.txt
sed: -e expression #1, char 2: delimiter character is not a single-byte character
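The error turns on encoded size rather than on what a reader would call a character. As a rough sketch, assuming UTF-8 encoding, the byte counts of the two delimiters can be compared with `wc -c`; the variable names here are illustrative only.

```shell
# UTF-8 encoding of U+1F4A9 (the poop emoji), written as octal byte escapes
# so the script does not depend on the encoding of the file it is saved in.
poop=$(printf '\360\237\222\251')

# wc -c counts bytes, not characters; tr strips the padding some wc builds add.
bytes=$(printf '%s' "$poop" | wc -c | tr -d ' ')
slash_bytes=$(printf '/' | wc -c | tr -d ' ')

printf 'poop emoji: %s bytes, slash: %s byte\n' "$bytes" "$slash_bytes"
```

A sed that requires a single-byte delimiter will accept `/` and reject the emoji, even though both are a single character to the person typing them.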
Sylvestre
@zacchiro @CoolSWEng echo "💐💩🌷" | cut -d"💩" -f1
with the Rust coreutils:
💐
with GNU:
cut: the delimiter must be a single character
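Where only a byte-oriented `cut` is at hand, one possible workaround, not proposed in the thread and offered only as a sketch, is `awk`, whose field separator is not limited to a single byte. This assumes an `awk` that passes bytes through unchanged (gawk and mawk are expected to), and the variable name `first` is made up for the example.

```shell
# Same input as the cut example above; awk splits on the whole separator,
# however many bytes it occupies.
first=$(printf '%s\n' '💐💩🌷' | awk -F'💩' '{ print $1 }')

printf '%s\n' "$first"
```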