#BabelOfCode 2024
Week 3
Language: x86_64 assembly [AMD64] (macroassembler: GNU as/gas)
PREV WEEK: mastodon.social/@mcc/113783248…
NEXT WEEK: mastodon.social/@mcc/113906616…
RULES: mastodon.social/@mcc/113676228…
I planned ASM for today and when I saw the challenge *almost* bounced to TCL, because I *don't* wanna write a parser in ASM. But the language here is exceedingly regular, so probs a state machine is enough.
Successfully ran this hello world cs.lmu.edu/~ray/notes/gasexamp… which I think should be all I need to start
mcc
in reply to mcc • • •Macro (Using as)
sourceware.org
mcc
in reply to mcc • • •I've decided to add a new rule to my challenge: in addition to doing a different language every week, I'm going to try to use exclusively *languages I haven't programmed in before*.
If that's the rule, x86_64 is a stretch, as I've *written* x86_64, but I count it as valid because I've never written a whole AMD64 *program*, only snippets embedded in a C file or OllyDbg-injected into an exe at runtime. The only ASMs I've written whole programs in are MIPS and LLVM in-memory representation.
mcc
in reply to Munyoki Kilyungi 🇰🇪 • • •@saitama x86_64 assembly is rough because they've been designing CPUs for compilers for a while, so the asm is HUGE! lots of 12-character opcodes. They simply weren't designing it to be written by humans. But there's still a small, manageable ASM living inside of the monster if you focus on the opcodes inherited from old x86 (in fact, the stuff brought *over* from x86 got simplified in the process of 64-bit-ization).
Just remember: 0x90 for NOP!
mcc
in reply to mcc • • •Finding many things that are just sort of It's Assumed You Know This but may or may not be written anywhere. Like, there's a "movb" instruction which is not in the instruction reference I'm using and is not recognized by my syntax highlighter, but gas accepts it.
Question. I do
mov %al, %esi
It says operand type mismatch. OK. I think I can simulate this with
mov $0, %rax
mov %eax, %esi
because the non-al bits of rax get cleared in instruction 1.
…But what's the "widening"/truncating version of mov?
Paul Khuong
in reply to mcc • • •movzx/sx for widening. regular mov for narrowing, but be mindful of partial dependencies on 8 and 16 bit destinations (16b in particular).
The size suffix is an at&t special and not part of the ISA (which also has movd/movq…)
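A minimal sketch of the widening moves described above, in AT&T syntax for gas. The file name `widen.s`, the gcc build line on x86_64 Linux, and returning the value from `main` so it becomes the exit status are all assumptions for illustration, not something from the thread. `movzbl` and `movsbl` are the AT&T spellings of `movzx` and `movsx` for a byte source and a 32-bit destination.

```asm
# widen.s (hypothetical file name); assumed build: gcc widen.s -o widen
        .text
        .globl  main
main:
        mov     $0xF0, %al      # writes only the low byte of rax
        movzbl  %al, %esi       # zero-extend 8 -> 32 bits: esi = 0x000000F0
        movsbl  %al, %edi       # sign-extend 8 -> 32 bits: edi = 0xFFFFFFF0
        mov     %esi, %eax      # return the zero-extended value (240) from main
        ret
        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

Writing a 32-bit register such as `%esi` also clears the upper 32 bits of `%rsi`, so no separate zeroing step is needed before a 64-bit use.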
Ryan C. Gordon
in reply to mcc • • •felixcloutier.com/x86/movsx:mo…
Truncating is just using half the register.
MOVSX/MOVSXD — Move With Sign-Extension
www.felixcloutier.com
Ryan C. Gordon
in reply to mcc • • •Just move the value to the bottom of a larger register.
xor %eax,%eax # zero it out
mov $3, %al # put 3 in al
# %eax is now a 32-bit int set to 3.
mcc
in reply to mcc • • •So it took longer than I'd hoped, but I now have a working first-pass AMD64 ASM program that can decode an ASCII number in the .data segment and print it out again.
github.com/mcclure/aoc2024/blo…
Build instructions in adjacent run.txt.
I have some questions.
(1 of 2). I think I don't like GNU/AT&T assembly format and would like to switch to Intel assembly format. Is Intel format… documented… somewhere? This is the closest I found. sourceware.org/binutils/docs/a…
aoc2024/03-01-multiply/src/number-echo.s at ad0a3f670a4ed6d403f81863977175315284d220 · mcclure/aoc2024
GitHub
mcc
in reply to mcc • • •2. At a certain point in my code, I wanted to load a pointer to the .data segment variable "input" into my %r10. The way to do this turned out to be
lea input(%rip), %r10
rip is… the instruction pointer?? what the devil is the instruction pointer doing there? `input` is at a fixed location, surely it's not loading it from an address relative to the fricking instruction pointer.
mcc
in reply to Joe Groff • • •@joe okay. so it's not literally relative to the instruction pointer but relative to the segment of the instruction pointer, or something?
i don't think my program is going to be remotely PIC-compatible lol
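A small sketch of what `lea input(%rip), %r10` does, as far as I understand it: the address really is computed from the instruction pointer, because the assembler and linker store the distance from the end of the `lea` instruction to `input` as a 32-bit displacement. The file name, the string contents, and the gcc build line are assumptions for illustration.

```asm
# rip.s (hypothetical file name); assumed build: gcc rip.s -o rip
        .data
input:  .asciz  "731\n"             # hypothetical data, not the puzzle input

        .text
        .globl  main
main:
        lea     input(%rip), %r10   # r10 = rip + (input - address of the next instruction)
        movzbl  (%r10), %eax        # load the first byte, '7' = 0x37 = 55
        ret                         # main returns 55
        .section .note.GNU-stack,"",@progbits
```

The distance between code and data is fixed at link time even when the load address is not, which is why this form also works in position-independent executables. An absolute form such as `mov $input, %r10` typically only links when PIE is disabled.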
mcc
in reply to mcc • • •Expanding on my question re: "where is intel assembly format actually documented?"
mov rax, 60
This is pretty simple, right? I want the number 60 in rax. This says: ambiguous operand size for mov. Oh, there was something about that in the gas manual. Okay, I say:
mov rax, dword 60
It says: junk 60 after expression
What the heck do I do now? Do I just come back to mastodon for help every time I want to type a number? All the StackOverflow examples are in AT&T format.
Christina Jennifer
in reply to mcc • • •The Intel Architecture Reference Manual is what you need. It explains instruction encodings as well as what every instruction does.
Intel syntax is very different to gnu syntax. Argument ordering is reversed. Instruction variants are marked by type-tagging the registers. It’s all very strange.
mcc
in reply to Christina Jennifer • • •@criffer Thanks… however, I am looking at this, and I cannot find where in the table of contents to find the definition of the assembly language?
For example, I look under "notation" and "operands", and it explains that it is *not* describing the assembly language, but rather a modification of the assembly language designed for representing the manual.
slembcke
in reply to mcc • • •Intel® 64 and IA-32 Architectures Software Developer Manuals
Intel
mcc
in reply to slembcke • • •@slembcke I don't mind going all the way to the reference manual, but I don't see where the *assembly language* is described in this document and it's hard to CTRL-F a 5,000 page manual… mastodon.social/@mcc/113868620…
mcc
2025-01-21 22:07:34
mcc
in reply to mcc • • •Findings so far:
- If you put ".intel_syntax" at the top of a gas file, it does *not* give you intel syntax *or* AT&T syntax but a secret third thing. The way to get the real intel syntax is ".intel_syntax noprefix"
- It didn't accept the 0(reg) syntax to dereference. By experimentation, I found I could do 0[reg]. That is terrifying. Guessing, I mean.
- No one I have spoken to has learned intel syntax by anything other than oral tradition. Also, no one uses intel with gas (they all use nasm?)
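A hedged sketch of the findings above in gas with `.intel_syntax noprefix`. The file name, the label `counter`, and the gcc build line are assumptions. The earlier "ambiguous operand size" error is probably explained by the first finding: without `noprefix`, a bare `rax` is read as a symbol (a memory operand), so `mov rax, 60` becomes a store of unknown size.

```asm
# intel.s (hypothetical file name); assumed build: gcc intel.s -o intel
        .intel_syntax noprefix          # Intel operand order, no % on registers
        .data
counter: .quad  42                      # hypothetical variable
        .text
        .globl  main
main:
        mov     rax, 60                 # immediate to register: size comes from rax
        lea     r10, [rip + counter]    # RIP-relative address of counter
        mov     byte ptr [r10], 7       # memory + immediate: the size must be spelled out
        mov     eax, dword ptr [r10]    # load the low 32 bits, now 7 ("dword ptr" is optional here)
        ret                             # main returns 7
        .section .note.GNU-stack,"",@progbits
```

Comments still use `#` here: switching syntax changes how operands are written, not gas's comment character or directives.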
William D. Jones
in reply to mcc • • •Intel's syntax goes back to the 8086/8 datasheet. You can see it in the IBM PC BIOS listings.
From there, Microsoft made their own assembler (MASM) which extends Intel's original syntax (along with all the segment shit no one cares about).
NASM is "well, it's dest then source operand", like MASM, but isn't really Intel/MASM syntax either. Code written for MASM will not compile for NASM for several syntactical reasons.
And GAS/AT&T syntax is the x86 Unix world. It's ass.
mcc
in reply to William D. Jones • • •@cr1901 @joe Say I'm not programming for the 8086/8, masm, or AT&T syntax. I'm programming for x86_64 and I want to use Intel's syntax.
I go to intel.com/content/www/us/en/de… . There's a 5,000 page manual there. If the old 8086/8 datasheet defines the syntax, I'd expect the 5,000 page 2024 version to as well.
I don't find it. The conventions section mastodon.social/@mcc/113868620… says it describes "a subset of" the assembly language.
Is the syntax hiding somewhere else in these 5,000 pages?
William D. Jones
in reply to mcc • • •@joe The extent of what I know about Intel assembly directives and syntax is based upon reading the old 8086/8 datasheet, and MASM/NASM manuals. I'm not aware of any actual up-to-date docs of the syntaxes involved.
Learning what I know has been a trial and error process over the past decade-and-a-half I'm afraid. And sadly, I don't know why your original code doesn't compile (I thought ".intel_syntax" alone obviated the need for "%" before registers, but apparently not).
Graham Sutherland 🎃 Polynomial
in reply to mcc • • •the "normal" way to express things like that in Intel syntax is [reg+disp] or [reg*scale+disp] where disp/scale are constants.
so for example if I wanted to read a dword from the address at eax multiplied by four plus 8, I'd do:
mov eax, [eax*4+8]
Graham Sutherland 🎃 Polynomial
in reply to Graham Sutherland 🎃 Polynomial • • •@cr1901 @joe the "dword", "qword", etc. prefixes are generally only used for clarifying load sizes on sign extensions. for example with a regular mov, these two are equivalent:
mov eax, [eax]
mov eax, dword ptr [eax]
but if you do movsx, the load size is ambiguous, so you clarify it:
movsx rax, [eax] ; ambiguous (error)
movsx rax, word ptr [eax] ; load 16-bit
movsx rax, dword ptr [eax] ; load 32-bit
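The addressing forms above can be sketched as a small gas program using `.intel_syntax noprefix`. The file name, the `table` data, and the gcc build line are assumptions for illustration. It uses 64-bit base and index registers, and `#` comments because gas treats `;` as a statement separator.

```asm
# addr.s (hypothetical file name); assumed build: gcc addr.s -o addr
        .intel_syntax noprefix
        .data
table:  .long   10, 20, 30, 40              # hypothetical 32-bit values
        .text
        .globl  main
main:
        lea     r10, [rip + table]          # r10 = base address of table
        mov     ecx, 2                      # index; writing ecx clears the top of rcx
        mov     eax, [r10 + rcx*4]          # base + index*scale: table[2] = 30
        add     eax, [r10 + rcx*4 + 4]      # same plus a displacement: table[3] = 40
        movsx   rdx, word ptr [r10]         # size is required: sign-extend the low 16 bits of table[0]
        add     eax, edx                    # 30 + 40 + 10 = 80
        ret                                 # main returns 80
        .section .note.GNU-stack,"",@progbits
```

The scale can only be 1, 2, 4, or 8, and the destination register fixes the operand size for the plain `mov` and `add` loads, which is why only `movsx` needs `word ptr`.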
slembcke
in reply to mcc • • •@joe Well, so that's the thing. I don't think there is THE "Intel syntax". I mostly remember it being described as intel or AT&T "styled" syntaxes. (Wikipedia uses the term "branches of syntax")
I've always taken it to mean more like how certain languages are "C-like", but that doesn't mean they are even remotely compatible with C, just that they use curly braces and types go before variable names.
Meg
in reply to mcc • • •?? I learned Intel syntax from books... The same way I learned c and c++?
Intel at&t syntax is a weirdness of the gnu world.
Meg
in reply to Meg • • •That said I think the canonical reference for how intel mnemonics work is really the intel blue books, which they nicely provide pdfs of for free now (used to be you'd have to get them to mail them to you and I still have some of those from the ia-32 days): intel.com/content/www/us/en/de…
It doesn't really explain syntax though, just a *lot* of detail on how they work and tables of the mnemonics and what operands they take.
Dima Pasechnik 🇺🇦 🇳🇱
in reply to mcc • • •GMP (gmplib.org) does use gas, as far as I can tell, and it has a lot of number theory related assembly code.
You might need to get through the rather convoluted IMHO build system...
The GNU MP Bignum Library
gmplib.org
ShadSterling
in reply to mcc • • •back when the actual 386 was relevant, I recall Intel’s own programmers guides being quite good. Have you looked at their current documentation? I think it’s at intel.com/content/www/us/en/de…
I think I used MASM at the time, and it accepted the same syntax as the Intel books. I’m pretty sure the data names got expanded as absolute addresses, not relative addresses, tho I wouldn’t swear by it without doublechecking
mcc
in reply to ShadSterling • • •@ShadSterling Yes, I've downloaded it, but I cannot find where the assembly syntax is defined, I only find things like opcodes and registers defined.
mastodon.social/@mcc/113868620…
And it is a 5,000 page document so I probably can't read the entire thing hoping to stumble across a BNF.
mcc
in reply to mcc • • •My extremely normal compilation/invocation line for this program:
(cat data/sample.161.txt | perl tools/convert.pl > src/sample.s) && gcc -g src/sample.s -o run && ./run
("convert.pl" takes the input from stdin, converts it to a single-line string with escaped backquotes, quotes and newlines, then loads "src/base.s" and replaces the line "# !!!!!!!!!!" with an asciz declaration of the input string. This is because gas does not appear to support multiline strings in either AT&T or Intel syntax)
mcc
in reply to mcc • • •… what the hecking hell, I wrote a 100-line hand-rolled regular expression tester in *fricking assembly* and it worked on the *first fricking try*? on both the sample and real puzzle?!
i … … **what**??
How… what???
github.com/mcclure/aoc2024/blo…
…well, second try. the first time i ran it it found 0 matches, and then i double-checked the problem statement and realized they'd said "mul(" not "MUL(", but then I changed those three bytes in match_prefix and it worked.
…What?
aoc2024/03-01-multiply/src/sample.s at 914be9a0398f9b573b3c6c9c0d96d589024e26e0 · mcclure/aoc2024
GitHub
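For readers who don't want to open the linked file, here is a hedged sketch of the prefix-matching idea only, written as a tiny state machine in gas Intel syntax. It is not mcc's code: the labels, the input string, and the choice to count matches of `mul(` and return the count from `main` are all assumptions, and the real puzzle also has to parse the numbers that follow.

```asm
# prefix.s (hypothetical file name); assumed build: gcc prefix.s -o prefix
        .intel_syntax noprefix
        .data
haystack: .asciz "xmul(2,4)%mul[3,7]mmul(5,5)"  # hypothetical input
pat:      .ascii "mul("                         # the 4-byte prefix to find
        .text
        .globl  main
main:
        lea     rsi, [rip + haystack]   # rsi = cursor into the input
        lea     rdi, [rip + pat]        # rdi = pattern base
        xor     eax, eax                # eax = number of matches found
        xor     ecx, ecx                # ecx = state: pattern bytes matched so far
scan:
        movzx   edx, byte ptr [rsi]     # dl = current input byte
        test    edx, edx
        jz      finished                # NUL terminator ends the scan
        inc     rsi
        cmp     dl, byte ptr [rdi + rcx]
        jne     no_match
        inc     ecx                     # advance the state
        cmp     ecx, 4
        jne     scan
        inc     eax                     # all 4 bytes matched
        xor     ecx, ecx
        jmp     scan
no_match:
        xor     ecx, ecx                # fall back to state 0 ...
        cmp     dl, 0x6d                # ... unless this byte is 'm',
        jne     scan
        mov     ecx, 1                  # which starts a new match
        jmp     scan
finished:
        ret                             # main returns the match count (2 for this input)
        .section .note.GNU-stack,"",@progbits
```

The fallback in `no_match` is enough only because `m` appears nowhere in the pattern except at its start. A pattern that overlaps itself would need a proper failure table.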
mcc
in reply to mcc • • •That was… disturbingly easy? Part 2 I did not in fact get working on the first try, but at the same time it didn't take so long and I was able to do it while tired¹.
¹ Sign of this: It took me like 5 loops of run it, doesn't work, look closer, find an error, doesn't work, look closer, find an error, eventually I don't see any more errors, so I ran it in gdb and realized I'd been running the wrong file in every test so far. Like. Compiling a.s & editing b.s.
…b.s still needed a 5 changed to a 6
mcc
in reply to mcc • • •Anyway final code, I guess I finished the whole week in a single day.
github.com/mcclure/aoc2024/blo…
I don't know who if anyone is reading these, and this is NOT especially readable to start with, but if you do read you may notice the part 2 patch is pretty ugly lol. I just sorta wedged the part 2 requirements in there sideways.
That's my week 3; next week TCL, I think.
BTW here's Andrzej's week 3, and some of my followers may be interested to learn this week he's doing UXN. mastodon.gamedev.place/@unjell…
aoc2024/03-02-multiply/src/sample2.s at 666d7ec77669d943ffb8bc0bc08be4085f7a9bbd · mcclure/aoc2024
GitHub
Göran Roseen
in reply to mcc • • •Preprocessor I can understand, but build scripting? More surprising...
BTW, I remember that when I learnt Tcl in the early 90's, it was the first time I saw that you could have different languages for different tasks in your toolbox.
I used it a lot for text mangling, while my main work had to be done in Fortran and C.
And using Tk, I could give my old Fortran programs a GUI...
mcc
in reply to mcc • • •Based on this, in my opinion, GCC and Clang should for clarity stop referring to "Intel syntax" and, taking a cue from ARC, refer to "Alleged Intel syntax", or perhaps "Intel folk syntax".
However, I'm also perplexed, because if there's no source of truth for "Intel syntax", then how did clang and gcc know what to implement? Or rather, how do clang and gcc know their "Intel syntax"es are compatible with each *other*? (2/2)
Tom Forsyth
in reply to mcc • • •I added a whole bunch of instructions to x86 (what became AVX512), including new syntax for the mask registers. I remember trying to find out who "owned" that, and whether we should use v0(k1), v0[k1] or v0{k1} or some other syntax.
Sadly I don't have my notes from that time, but my vague recollection is that the answer was "nobody cares - pick one". Which was very alarming! I did have some feedback from our internal assembler team, but they stressed that they were NOT a public authority.
Tom Forsyth
in reply to Tom Forsyth • • •The official tools Intel provides are the C intrinsics - and they are of course C syntax, so have no bearing on the assembly.
So yeah, my recollection is we picked what seemed sensible and went with it. BUT - that was just for the purposes of ISA documentation - there was no hard link to the actual syntax accepted by the assemblers (dramatically so in the case of AT&T syntax).
So it really does seem like a thing nobody owns, except for each specific tool vendor!
Erin 💽✨
in reply to Tom Forsyth • • •Intel provided an assembler at one time (ASM86), maybe they still do as part of ICC? And basically “intel syntax” is a descendant of that per oral tradition. It’s Intel syntax because it’s the syntax that Intel’s assembler used, and that the Intel datasheets use; as opposed to AT&T syntax, the syntax that AT&T’s assembler for Unix used.
When Microsoft made MASM it copied the syntax. Borland’s Turbo Assembler (TASM) copied that. Everything else “intel syntax” is a descendant of those two.
In ASM86 and MASM, what `mov eax, foo` does is not immediately obvious. If “foo” is defined as constant (`label EQU 0xf00`), it’ll set EAX to `0xf00`. If “foo” is defined as a variable, it’ll load the contents of that variable.
TASM added “Ideal Mode”, in which this is always consistent: `mov eax, foo` always sets EAX to the address of the foo label; `mov eax, [foo]` loads from that address.
Most other assemblers implementing Intel syntax (NASM, FASM, YASM, GAS w/ `.intel_syntax noprefix`) are broadly copying Ideal Mode.
But it’s all kind of vibes.
mattpd
in reply to mcc • • •@TomF FWIW, assemblers generally have syntactic diffs, sometimes documented, like Section 2.2 Quick Start for MASM Users: nasm.us/doc/nasmdoc2.html#sect…
"Modern X86 Assembly Language Programming by Daniel Kusswurm" does have both MASM and NASM, github.com/Apress/modern-x86-a…, although the diffs b/w these may be relatively minor (compared to AT&T).
See also (for more refs):
- books: github.com/MattPD/cpplinks/blo…
- tutorials: github.com/MattPD/cpplinks/blo… (speakers usually mention which asm they're using)
GitHub - Apress/modern-x86-assembly-language-programming-3e: Source Code for 'Modern X86 Assembly Language Programming' by Daniel Kusswurm
GitHub
wilkie
in reply to mcc • • •the reason the terms Intel and gas "syntax" exist is to make sure you never know what the hell you're reading. these syntaxes seem to stretch and bend depending on their context... and to accommodate inline assembly via archaic rune magic and other mechanisms lost to time.
and that third point is spot on... I learned Intel syntax from trial and error like my ancestors before me... and yes... I was writing x86 assembly in nasm just yesterday. 😅🥲
schrotthaufen
in reply to mcc • • •Try “.intel_syntax noprefix”
Edit: I’m late to the party
Spoofer3
in reply to mcc • • •NASM
www.nasm.us
slembcke
in reply to mcc • • •IIRC abusing LEA is the usual way to multiply by an immediate value. I forget the limitations of that though.
The AMD64 multiply instruction can still do all sorts of "multiply RAX(?) by one of these 12 memory addressing modes", so it's still kinda CISC-y I suppose.
Jordan
in reply to mcc • • •`imul` is the one that supports immediates (but not fullwidth)
Somewhat Complicated Instruction Set Computer
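A short sketch of the two ways to multiply by a constant mentioned above, in gas Intel syntax. The file name, the constants, and the gcc build line are assumptions. The three-operand `imul` takes an 8-bit or 32-bit sign-extended immediate (so no full 64-bit constant), while `lea` can only express what the addressing mode allows: a base plus an index scaled by 1, 2, 4, or 8.

```asm
# mul.s (hypothetical file name); assumed build: gcc mul.s -o mul
        .intel_syntax noprefix
        .text
        .globl  main
main:
        mov     edi, 7                  # x = 7
        imul    eax, edi, 6             # eax = x * 6 = 42 (immediate form)
        lea     ecx, [rdi + rdi*4]      # ecx = x + x*4 = x * 5 = 35, flags untouched
        add     eax, ecx                # 42 + 35 = 77
        ret                             # main returns 77
        .section .note.GNU-stack,"",@progbits
```

With `lea`, multipliers of 2, 3, 4, 5, 8, and 9 fit in one instruction. Anything else needs a chain of them or an `imul`.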
mega
in reply to mcc • • •the AT&T style assemblers have their own mnemonics for some stuff (where the Intel manual only uses a single instruction).
Maaaaybe you can use godbolt.org to use an Intel-style assembler (like nasm) interactively and see the AT&T style disassembly on the side 🤔
mcc
in reply to Amber • • •@puppygirlhornypost2 @nabijaczleweli i meant C Pre-Processor
However I also do not want to write C++
Tutiluren
Unknown parent • • •r10 looks like the content out of an ascii string to me. Are you accidentally passing its data instead of a pointer to it?
r10 contents as ascii: \n731\n
mcc
Unknown parent • • •@dysfun @slembcke This is the reference I've been using, but I don't know how to turn lines in an instruction page into a thing I type in the window?
felixcloutier.com/x86/mov
This for example does not clarify which things should literally be present and which are notation of the reference itself. Some of the lines contain non-typeable typesetting like superscripts.
MOV — Move
www.felixcloutier.com
mcc
Unknown parent • • •@MonniauxD the manual specifically says not to do this (it doesn't say what to do instead, but it says not to do it) and if i try to compile it, it says
src/number-echo-intel.s: Assembler messages:
src/number-echo-intel.s:21: Warning: mnemonic suffix used with `mov'
src/number-echo-intel.s:21: Warning: NOTE: Such forms are deprecated and will be rejected by a future version of the assembler
mcc
Unknown parent • • •@dysfun @slembcke There is a line
MOV r32, imm32
How do I write a 32-bit immediate? That is what I have been trying to figure out. If I write "mov eax, 60" gas prints "Error: ambiguous operand size for `mov'"
unnick
in reply to mcc • • •`.intel_syntax noprefix` somewhere at the top? if not, does it fix it?
David Monniaux
Unknown parent • • •nasm accepts
bits 64
toto:
mov rax, 60
mcc
Unknown parent • • •@tomjennings @TomF Clang was supposed to have been about cleaning up the by-convention morass that GNU had fallen into. I expect GNU to accept this state of affairs on the world's most popular PC platform and not try to change it for 25 years, but Clang I expect better!!
This is two weeks after I discover Clang's libunwind library has literally no api documentation at all.
tom jennings
Unknown parent • • •Assembler syntaxes are peculiar to each assembler. There never was any standardization there. It's still the 1950s in there!
tom jennings
Unknown parent • • •@TomF
I had all of those books, yards of them, as they came out. But no one but board designers used the data books, they were tediously hardware oriented.
What we all used were the cheatcharts. Foldout instruction set summaries.
Oh how I wish I had my collection! I had probably 25, chips I'd written code for. Weird stuff like 8x300. Cosmac. Weird little Intel and moto.
The cheatcharts are what you want. All dog-eared torn and coffee stained from use.
But the databook will have the official instruction descriptions and assembler mnemonics in excruciating detail.
@mcc
🇺🇦 haxadecimal
in reply to mcc • • •API documentation? Let me think... Oh, yes, I vaguely recall, from back in the early days, that such things existed. How quaint!
🙁