mcc

9 months ago

#BabelOfCode 2024
Week 3
Language: x86_64 assembly [AMD64] (macroassembler: GNU as/gas)

PREV WEEK: mastodon.social/@mcc/113783248…
NEXT WEEK: mastodon.social/@mcc/113906616…
RULES: mastodon.social/@mcc/113676228…

I planned ASM for today and when I saw the challenge *almost* bounced to TCL, because I *don't* wanna write a parser in ASM. But the language here is exceedingly regular, so probs a state machine is enough.

Successfully ran this hello world cs.lmu.edu/~ray/notes/gasexamp… which I think should be all I need to start

gasexamples

^cs.lmu.edu

mcc

2025-01-06 20:16:27

#BabelOfCode 2024
Week 2
Language: Forth
Confidence level: Low
PREV WEEK: mastodon.social/@mcc/113743302…
NEXT WEEK: mastodon.social/@mcc/113867584…
RULES: mastodon.social/@mcc/113676228…
So today's challenge looks *absurdly* easy, to the point I'm mostly just suspicious that part 2 will get hard. I figure this is an okay time to burn Forth.
I'm wanting to save Fortran for a week I can use the matrix ops. This puzzle looks suspiciously like part 2 will turn into a 2-dimensional array problem.

#babelofcode

This entry was edited (9 months ago)

in reply to mcc

mcc

in reply to mcc • 9 months ago

My language "confidence level" for this week is high, but down to medium-high for step 2 (because obvs I don't know WHAT they'll throw at me at step 2). I'm kinda unenthused about the gas macro language. The macro language documentation ( sourceware.org/binutils/docs/a… + sourceware.org/binutils/docs/a… , I think that's literally all they wrote ) is sketchy and unclear. Can macros take a macro name as argument and invoke the passed-in macro? I literally can't tell. I'm going to uncover syntax by trial and error

Macro (Using as)

^{sourceware.org}

in reply to mcc

mcc

in reply to mcc • 9 months ago

I've decided to add a new rule to my challenge, which is in addition to doing a different language every week I'm going to try to use exclusively *languages I haven't programmed in before*.

If that's the rule, x86_64 is a stretch as I've *written* x86_64— but I count it as valid, because I've never written a whole AMD64 *program*, only snippets embedded in a C file or OllyDbg-injected into an exe at runtime. Only ASMs I've written whole programs in are MIPS and LLVM in-memory representation.

in reply to mcc

Alaric Snell-Pym

in reply to mcc • 9 months ago

if you need ideas, I'm currently learning Idris. Do a challenge in Idris with full correctness proof of your solution in the function type if you'd like to exercise all the formal logic you forgot years ago! 😅

in reply to Alaric Snell-Pym

mcc

in reply to Alaric Snell-Pym • 9 months ago

@kitten_tech Haskell/Idris are on the list. I don't think I will be writing proofs.

@Alaric Snell-Pym

in reply to mcc

Munyoki Kilyungi 🇰🇪

in reply to mcc • 9 months ago

When I grow up I want to be like you lolz. I'm taking baby steps at learning x86_64. For very small toy programs, I appreciate assembly, and by extension C. Still a noob. Hopefully by the close of the year, I should be able to work on more meaningful programs ;)

in reply to Munyoki Kilyungi 🇰🇪

mcc

in reply to Munyoki Kilyungi 🇰🇪 • 9 months ago

@saitama x86_64 assembly is rough because they've been designing CPUs for compilers for a while, so the asm is HUGE! lots of 12-character opcodes. They simply weren't designing it to be written by humans. But there's still a small, manageable ASM living inside of the monster if you focus on the opcodes inherited from old x86 (in fact, the stuff brought *over* from x86 got simplified in the process of 64-bit-ization).

Just remember: 0x90 for NOP!

@Munyoki Kilyungi 🇰🇪

in reply to mcc

mcc

in reply to mcc • 9 months ago

Finding many things that are just sort of It's Assumed You Know This but may or may not be written anywhere. Like, there's a "movb" instruction which is not in the instruction reference I'm using and is not recognized by my syntax highlighter, but gas accepts it.

Question. I do

mov %al, %esi

It says operand type mismatch. OK. I think I can simulate this with

mov $0, %rax
mov %eax, %esi

BC non-al bits of rax get cleared in instruction 1.

…But what's the "widening"/truncating version of mov?

in reply to mcc

Paul Khuong

in reply to mcc • 9 months ago

movzx/sx for widening. regular mov for narrowing, but be mindful of partial dependencies on 8 and 16 bit destinations (16b in particular).

The size suffix is an at&t special and not part of the ISA (which also has movd/movq…)
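
For readers who think more easily in C than in mnemonics, the two widening moves and the truncating move can be modelled as below. This is only a sketch: the function names (`widen_zero`, `widen_sign`, `narrow`, `demo_widening`) are made up for illustration, and the comments name the instructions each one stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension, as done by movzx (movzbl in AT&T spelling):
   the upper 24 bits of the result are always 0. */
uint32_t widen_zero(uint8_t low_byte) {
    return (uint32_t)low_byte;
}

/* Sign-extension, as done by movsx (movsbl in AT&T spelling):
   the upper 24 bits are copies of bit 7 of the source. */
uint32_t widen_sign(uint8_t low_byte) {
    uint32_t wide = low_byte;
    return (low_byte & 0x80u) ? (wide | 0xFFFFFF00u) : wide;
}

/* Narrowing: reading the low sub-register just drops the upper bits. */
uint8_t narrow(uint32_t value) {
    return (uint8_t)(value & 0xFFu);
}

/* Usage: 0x80 widens to 0x00000080 or 0xFFFFFF80 depending on the flavour. */
void demo_widening(void) {
    assert(widen_zero(0x80) == 0x00000080u);
    assert(widen_sign(0x80) == 0xFFFFFF80u);
    assert(narrow(0x12345678u) == 0x78);
}
```

The only difference between the two widening forms is what fills the new upper bits, which is why the choice only matters for source bytes of 0x80 and above.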

in reply to mcc

Ryan C. Gordon

in reply to mcc • 9 months ago

felixcloutier.com/x86/movsx:mo…

Truncating is just using half the register.

MOVSX/MOVSXD — Move With Sign-Extension

^{www.felixcloutier.com}

in reply to Ryan C. Gordon

mcc

in reply to Ryan C. Gordon • 9 months ago

@icculus That's great. Thank you.

@Ryan C. Gordon

in reply to mcc

mcc

in reply to mcc • 9 months ago

@icculus … … although uh… out of curiosity, is there a way to do it without sign extension?

@Ryan C. Gordon

in reply to mcc

Ryan C. Gordon

in reply to mcc • 9 months ago

Just move the value to the bottom of a larger register.

xor %eax,%eax # zero it out
mov $3, %al # put 3 in al
# %eax is now a 32-bit int set to 3.

in reply to Ryan C. Gordon

mcc

in reply to Ryan C. Gordon • 9 months ago

@icculus okay, but i actually do have to zero it before moving it in this case

@Ryan C. Gordon

in reply to mcc

Ryan C. Gordon

in reply to mcc • 9 months ago

yeah, because the top bits of %eax are whatever was already in there.
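
Why the zeroing step is needed can be shown with a small C model of partial register writes. It is a sketch under the usual x86-64 rules: an 8-bit write leaves the rest of the register alone, while a 32-bit write clears bits 32 to 63. The helper names (`write_low8`, `write_low32`, `demo_partial_writes`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of writing %al: only bits 0-7 of the 64-bit register change. */
uint64_t write_low8(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xFF) | value;
}

/* Model of writing %eax: the old contents are discarded and
   bits 32-63 of the full register become 0. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
    (void)reg;
    return (uint64_t)value;
}

/* Usage: storing 3 in %al with and without clearing the register first. */
void demo_partial_writes(void) {
    uint64_t rax = 0x1122334455667788ULL;

    /* Without zeroing, stale upper bits survive the 8-bit write. */
    assert(write_low8(rax, 3) == 0x1122334455667703ULL);

    /* xor %eax,%eax behaves like a 32-bit write of 0, so all of rax is 0. */
    rax = write_low32(rax, 0);
    assert(write_low8(rax, 3) == 3);
}
```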

in reply to mcc

Oblomov

in reply to mcc • 9 months ago

there isn't, but IIRC (been a while since I last did asm) there's a cbw instruction that sign-extends AL to AX

in reply to mcc

mcc

in reply to mcc • 9 months ago

Ah… well… that did not take long

Terminal screenshot saying:

Segmentation fault (core dumped)

This entry was edited (9 months ago)

in reply to mcc

Owen Nelson

in reply to mcc • 9 months ago

try again but with extra pie

in reply to mcc

Amir Livne Bar-on

in reply to mcc • 9 months ago

there's some post that was deleted that showed the contents of the registers, I don't know how to interpret it but this observation might be a hint: the value in r10 read from right to left are the 7 bytes \n137\n

This entry was edited (9 months ago)

in reply to mcc

mcc

in reply to mcc • 9 months ago

I'm sorry the x86_64 multiply instruction works fucking *how*. What fucking century is it

This entry was edited (9 months ago)

in reply to mcc

mcc

in reply to mcc • 9 months ago

So it took longer than I'd hoped, but I now have a working first-pass AMD64 ASM program that can decode an ASCII number in the .data segment and print it out again.

github.com/mcclure/aoc2024/blo…

Build instructions in adjacent run.txt.

I have some questions.

(1 of 2). I think I don't like GNU/AT&T assembly format and would like to switch to Intel assembly format. Is Intel format… documented… somewhere? This is the closest I found. sourceware.org/binutils/docs/a…

aoc2024/03-01-multiply/src/number-echo.s at ad0a3f670a4ed6d403f81863977175315284d220 · mcclure/aoc2024

Advent of Code 2024 challenge (laid-back/"babel" version) - mcclure/aoc2024

^GitHub

in reply to mcc

mcc

in reply to mcc • 9 months ago

2. At a certain point in my code, I wanted to load a pointer to the .data segment variable "input" into my %r10. The way to do this turned out to be

lea input(%rip), %r10

rip is… the instruction pointer?? what the devil is the instruction pointer doing there? `input` is at a fixed location, surely it's not loading it from an address relative to the fricking instruction pointer.

in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

that's the notation for PC-relative addressing. if you're in a dynamic library or ASLR-supporting executable, then `input` is not at a fixed address, but it is at a fixed address relative to the code in the same image. also x86-64 doesn't have 64-bit absolute addressing modes so PC-relative addressing is more compact even if you don't need position independence

in reply to Joe Groff

mcc

in reply to Joe Groff • 9 months ago

@joe okay. so it's not literally relative to the instruction pointer but relative to the segment of the instruction pointer, or something?

i don't think my program is going to be remotely PIC-compatible lol

@Joe Groff

in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

sorry if i wasn't clear, it's literally relative to the instruction pointer. when you write `foo(%rip)` it really means (roughly) `(foo - .)(%rip)` and assembles the offset from the instruction to the data. as long as you're not referring to data symbols from other dynamic libraries you shouldn't need to think about it much though

in reply to Joe Groff

Joe Groff

in reply to Joe Groff • 9 months ago

you can't write e.g. `mov foo, %rax` because there isn't an instruction encoding for an absolute 64-bit address, which is what you'd need to load from `foo` directly. you'd have to load an absolute address with two instructions like `movabs $foo, %rdi; mov (%rdi), %rax` since `movabs` is the only 64-bit immediate instruction. that's why code uses PC-relative addressing even if you don't care about PIC
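
The arithmetic behind `foo(%rip)` can be sketched in C. The assumption here is the standard x86-64 rule that the stored 32-bit displacement is measured from the end of the instruction (the address of the next one). The function names and every address in the demo are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement the assembler stores for `sym(%rip)`:
   the distance from the end of the instruction to the symbol. */
int32_t rip_disp(uint64_t insn_addr, uint64_t insn_len, uint64_t sym_addr) {
    int64_t delta = (int64_t)(sym_addr - (insn_addr + insn_len));
    assert(delta >= INT32_MIN && delta <= INT32_MAX); /* must fit in 32 bits */
    return (int32_t)delta;
}

/* Address the CPU computes at run time: next-instruction RIP + disp32. */
uint64_t rip_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Usage: a 7-byte lea (REX prefix, opcode, ModRM, disp32) at a made-up
   address, before and after the whole image is moved by the loader. */
void demo_rip_relative(void) {
    const uint64_t slide = 0x10000000ULL; /* hypothetical load offset */
    int32_t disp = rip_disp(0x401000, 7, 0x404010);

    assert(disp == 0x3009);
    assert(rip_disp(0x401000 + slide, 7, 0x404010 + slide) == disp);
    assert(rip_target(0x401000 + slide, 7, disp) == 0x404010 + slide);
}
```

Because code and data move together, the stored displacement stays valid wherever the image is loaded, which is the property the replies above describe.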

in reply to mcc

mcc

in reply to mcc • 9 months ago

Expanding on my question re: "where is intel assembly format actually documented?"

mov rax, 60

This is pretty simple, right? I want the number 60 in rax. This says: ambiguous operand size for mov. Oh, there was something about that in the gas manual. Okay, I say:

mov rax, dword 60

It says: junk 60 after expression

What the heck do I do now? Do I just come back to mastodon for help every time I want to type a number? All the StackOverflow examples are in AT&T format.

in reply to mcc

Christina Jennifer

in reply to mcc • 9 months ago

The Intel Architecture Reference Manual is what you need. It explains instruction encodings as well as what every instruction does.

Intel syntax is very different to gnu syntax. Argument ordering is reversed. Instruction variants are marked by type-tagging the registers. It’s all very strange.

in reply to Christina Jennifer

mcc

in reply to Christina Jennifer • 9 months ago

@criffer Thanks… however, I am looking at this, and I cannot find where in the table of contents to find the definition of the assembly language?

For example, I look under "notation" and "operands", and it explains that it is *not* describing the assembly language, but rather a modification of the assembly language designed for representing the manual.

@Christina Jennifer

in reply to mcc

Paul Khuong

in reply to mcc • 9 months ago

you can ask gcc to emit masm (intel) style assembly. godbolt has a checkbox for that.

in reply to mcc

Nicole

in reply to mcc • 9 months ago

I believe in this case you need to use `mov eax, 60`?

in reply to Nicole

mcc

in reply to Nicole • 9 months ago

@streganil okay, i guess the idea is there's no such thing as a 64-bit literal?

@Nicole

in reply to mcc

Nicole

in reply to mcc • 9 months ago

right; if you want a 64-bit literal, maybe use movabs? but also I'm not super comfy with gas so I don't know if that's supported
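
As a rough guide to which immediates need the 64-bit form, here is a C sketch of the size checks an assembler has to make. It assumes the usual x86-64 encodings: `mov` to a 64-bit register takes a sign-extended 32-bit immediate, `mov` to a 32-bit register zero-extends into the full register, and only the `movabs` form carries a full 64-bit immediate. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `v` survives sign-extension from 32 bits, i.e. it fits the
   ordinary 64-bit mov with a 32-bit immediate. */
bool fits_sign_extended_imm32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* True if `v` can be produced by a 32-bit register write, which
   zero-extends into the full 64-bit register. */
bool fits_zero_extended_imm32(int64_t v) {
    return v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* Anything else needs the 64-bit immediate form (movabs in gas). */
bool needs_imm64(int64_t v) {
    return !fits_sign_extended_imm32(v) && !fits_zero_extended_imm32(v);
}

/* Usage: 60 fits everywhere; 2^32 is the first positive value that does not. */
void demo_immediates(void) {
    assert(fits_sign_extended_imm32(60) && fits_zero_extended_imm32(60));
    assert(fits_sign_extended_imm32(-1) && !fits_zero_extended_imm32(-1));
    assert(!fits_sign_extended_imm32(0xFFFFFFFFLL));
    assert(fits_zero_extended_imm32(0xFFFFFFFFLL));
    assert(needs_imm64(0x100000000LL));
}
```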

in reply to mcc

mcc

in reply to mcc • 9 months ago

@streganil mov eax, 60 gives me "ambiguous operand size for mov"

@Nicole

in reply to mcc

Nicole

in reply to mcc • 9 months ago

... maybe don't use gas, I guess. I can't figure this assembler out. clang and nasm both work with the original code...

in reply to Nicole

mcc

in reply to Nicole • 9 months ago

@streganil okay, but does clang have a manual for its assembler *at all*?

@Nicole

in reply to mcc

Nicole

in reply to mcc • 9 months ago

that one I don't know, sorry. I've only ever seriously written assembler for nasm (and a little bit of masm), unfortunately

in reply to mcc

slembcke

in reply to mcc • 9 months ago

Uh, I used these, but they are gigantic and they make for a terrible quick reference. Surely there has to be a better indexed version out there: intel.com/content/www/us/en/de…

Intel® 64 and IA-32 Architectures Software Developer Manuals

These manuals describe the architecture and programming environment of the Intel® 64 and IA-32 architectures.

^Intel

in reply to slembcke

mcc

in reply to slembcke • 9 months ago

@slembcke I don't mind going all the way to the reference manual, but I don't see where the *assembly language* is described in this document and it's hard to CTRL-F a 5,000 page manual… mastodon.social/@mcc/113868620…

@slembcke

in reply to mcc

slembcke

in reply to mcc • 9 months ago

I mean they all seem to be in there, but once you find them in the giant doc you still have to figure out how to decipher them. Surely that's in the giant doc too, but ugh... Again, I'm just saying I found success with it, but in retrospect it feels like there *must* be a better option.

in reply to slembcke

mcc

in reply to slembcke • 9 months ago

@slembcke What I'm trying to figure out is not "what is the format of the opcodes" but "what is the syntax of the assembly language". How do I format a literal. How do I format a string. Etc.

@slembcke

in reply to mcc

Unus Nemo

in reply to mcc • 9 months ago

Here are a few screen shots from Introduction to 64 bit Assembly Programming for Linux by Ray Seyfarth.

in reply to mcc

mcc

in reply to mcc • 9 months ago

Findings so far:

- If you put ".intel_syntax" at the top of a gas file, it does *not* give you intel syntax *or* AT&T syntax but a secret third thing. The way to get the real intel syntax is ".intel_syntax noprefix"

- It didn't accept the 0(reg) syntax to dereference. By experimentation, I found I could do 0[reg]. That is terrifying. Guessing, I mean.

- No one I have spoken to has learned intel syntax by anything other than oral tradition. Also, no one uses intel with gas (they all use nasm?)

in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

it's been a while, but from my recollection nasm did fairly reasonably document its syntax when i first used it (though a lot of its documentation does describe things relative to masm/tasm syntax)

in reply to Joe Groff

mcc

in reply to Joe Groff • 9 months ago

@joe okay, but is nasm documenting Intel format or is it documenting nasm's flavor of Intel format…?

@Joe Groff

in reply to mcc

William D. Jones

in reply to mcc • 9 months ago

Intel's syntax goes back to the 8086/8 datasheet. You can see it in the IBM PC BIOS listings.

From there, Microsoft made their own assembler (MASM) which extends Intel's original syntax (along with all the segment shit no one cares about).

NASM is "well, it's dest then source operand", like MASM, but isn't really Intel/MASM syntax either. Code written for MASM will not compile for NASM for several syntactical reasons.

And GAS/AT&T syntax is the x86 Unix world. It's ass.

This entry was edited (9 months ago)

in reply to William D. Jones

mcc

in reply to William D. Jones • 9 months ago

@cr1901 @joe Say I'm not programming for the 8086/8, masm, or AT&T syntax. I'm programming for x86_64 and I want to use Intel's syntax.

I go to intel.com/content/www/us/en/de… . There's a 5,000 page manual there. If the old 8086/8 datasheet defines the syntax, I'd expect the 5,000 page 2024 version to as well.

I don't find it. The conventions section mastodon.social/@mcc/113868620… says it describes "a subset of" the assembly language.

Is the syntax hiding somewhere else in these 5,000 pages?

@William D. Jones @Joe Groff

in reply to mcc

mcc

in reply to mcc • 9 months ago

@cr1901 @joe By "syntax" I mean things like: How do I represent a literal? It appears I can type 4[r10] to mean "memory address in r10 plus four bytes". Why square brackets? Where does it say you use square brackets and not some other type of brackets? Assembly language is a simple language, but it has a syntax, so I expect that syntax to be documented somewhere. If it's Intel's format I expect Intel to be the one documenting it.

@William D. Jones @Joe Groff

in reply to mcc

William D. Jones

in reply to mcc • 9 months ago

@joe The extent of what I know about Intel assembly directives and syntax is based upon reading the old 8086/8 datasheet, and MASM/NASM manuals. I'm not aware of an actual up-to-date docs of the syntaxes involved.

Learning what I know has been a trial and error process over the past decade-and-a-half I'm afraid. And sadly, I don't know why your original code doesn't compile (I thought ".intel_syntax" alone obviated the need for "%" before registers, but apparently not).

@Joe Groff

in reply to William D. Jones

mcc

in reply to William D. Jones • 9 months ago

@cr1901 ".intel_syntax" is a lie. ".intel_syntax noprefix" is intel syntax. The manual explains this, but very, very elliptically.

@William D. Jones

in reply to mcc

mcc

in reply to mcc • 9 months ago

@cr1901 After experimenting, I am relatively certain it is impossible to write a program against ".intel_syntax" (the fake intel syntax) which gas will not issue deprecation warnings on

@William D. Jones

in reply to mcc

Graham Sutherland 🎃 Polynomial

in reply to mcc • 9 months ago

the "normal" way to express things like that in Intel syntax is [reg+disp] or [reg*scale+disp] where disp/scale are constants.

so for example if I wanted to read a dword from the address at eax multiplied by four plus 8, I'd do:

mov eax, [eax*4+8]
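
The address expression inside the square brackets can be modelled in C as follows. This is a sketch of the general base + index*scale + displacement form, which covers both shapes mentioned above. `effective_address` and its parameter names are made up for illustration, and the scale check reflects the hardware limit of 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stdint.h>

/* Address denoted by [base + index*scale + disp] in Intel-style syntax
   (disp(base,index,scale) in AT&T syntax). */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* Usage: [eax*4+8] with eax = 2, then [reg+4] with reg = 0x1000. */
void demo_effective_address(void) {
    assert(effective_address(0, 2, 4, 8) == 16);
    assert(effective_address(0x1000, 0, 1, 4) == 0x1004);
    assert(effective_address(0x1000, 0, 1, -8) == 0xFF8);
}
```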

This entry was edited (9 months ago)

in reply to Graham Sutherland 🎃 Polynomial

Graham Sutherland 🎃 Polynomial

in reply to Graham Sutherland 🎃 Polynomial • 9 months ago

@cr1901 @joe the "dword", "qword", etc. prefixes are generally only used for clarifying load sizes on sign extensions. for example with a regular mov, these two are equivalent:

mov eax, [eax]
mov eax, dword ptr [eax]

but if you do movsx, the load size is ambiguous, so you clarify it:

movsx rax, [eax] ; ambiguous (error)
movsx rax, word ptr [eax] ; load 16-bit
movsx rax, dword ptr [eax] ; load 32-bit

@William D. Jones @Joe Groff

This entry was edited (9 months ago)

in reply to Graham Sutherland 🎃 Polynomial

Graham Sutherland 🎃 Polynomial

in reply to Graham Sutherland 🎃 Polynomial • 9 months ago

@cr1901 @joe the square brackets represent an address expression, so if you do `mov rax, rcx` that's just a register move, but `mov rax, [rcx]` is a memory load from the address in rcx. in the vast majority of cases a square bracket expression is used to signify a memory operation, with lea (load effective address) being the notable exception, which loads the result of the address expression itself into the target register.

@William D. Jones @Joe Groff

in reply to Graham Sutherland 🎃 Polynomial

✧✦Catherine✦✧

in reply to Graham Sutherland 🎃 Polynomial • 9 months ago

@gsuberland @cr1901 @joe (Andi has been asking for a comprehensive manual, not an explanation of the individual fact she listed as an example)

@William D. Jones @Graham Sutherland 🎃 Polynomial @Joe Groff

in reply to ✧✦Catherine✦✧

Graham Sutherland 🎃 Polynomial

in reply to ✧✦Catherine✦✧ • 9 months ago

@whitequark @cr1901 @joe ah, my bad, I missed the full context there.

@William D. Jones @✧✦Catherine✦✧ @Joe Groff

in reply to Graham Sutherland 🎃 Polynomial

mcc

in reply to Graham Sutherland 🎃 Polynomial • 9 months ago

@gsuberland @whitequark @cr1901 @joe (there are some other parts of this conversation where I was asking for help but at a certain point I switched to "is there a source of ground truth here or just memories of things that previously worked in human neurons?")

@William D. Jones @Graham Sutherland 🎃 Polynomial @✧✦Catherine✦✧ @Joe Groff

in reply to mcc

Graham Sutherland 🎃 Polynomial

in reply to mcc • 9 months ago

@whitequark @cr1901 @joe unfortunately this is one of those cases where the syntax was never rigidly defined by any kind of specification in the early days, leading to a variety of interpretations and colloquialisms between assembler and disassembler implementations, with most things defined by convention. modern instances of Intel's own instruction set documentation tend to be pretty consistent in terms of instruction formatting, but everything beyond that is implementation-specific.

@William D. Jones @✧✦Catherine✦✧ @Joe Groff

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@joe Isn't that the problem with asm in general? Even for the same ISA different assemblers have slightly different and incompatible syntaxes anyway. 🙁

@Joe Groff

in reply to slembcke

mcc

in reply to slembcke • 9 months ago

@slembcke @joe This is very true, but GCC has a mode which *claims* to be Intel syntax compatible. You'd think that in order for GCC to make a claim of Intel syntax compliance, they'd need some canonical source of truth for the Intel syntax. Perhaps I think too highly of the GNU project in saying this.

@slembcke @Joe Groff

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@joe Well, so that's the thing. I don't think there is THE "Intel syntax". I mostly remember it being described as Intel or AT&T "styled" syntaxes. (Wikipedia uses the term "branches of syntax")

I've always taken it to mean more like how certain languages are "C-like", but that doesn't mean they are even remotely compatible with C, just that they use curly braces and types go before variable names.

@Joe Groff

in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

nasm's flavor of intel syntax. alas, i don't think there is a singular "intel syntax" anymore; that might've referred to MASM as the de facto standard back in the day but i don't think anyone besides MSVC still uses it literally

in reply to mcc

Meg

in reply to mcc • 9 months ago

?? I learned Intel syntax from books... The same way I learned c and c++?

Intel at&t syntax is a weirdness of the gnu world.

in reply to Meg

mcc

in reply to Meg • 9 months ago

@megmac … what book

@Meg

in reply to mcc

Meg

in reply to mcc • 9 months ago

I mean it wouldn't be useful now, because it's ancient, but I mentioned it in the other reply chain. It's uh.. this one

Cover of an old computer book called "Peter Norton's Assembly Language Book for the IBM PC" by Peter Norton and John Socha.

in reply to Meg

Meg

in reply to Meg • 9 months ago

That said I think the canonical reference for how intel mnemonics work is really the intel blue books, which they nicely provide pdfs of for free now (used to be you'd have to get them to mail them to you and I still have some of those from the ia-32 days): intel.com/content/www/us/en/de…

It doesn't really explain syntax though, just a *lot* of detail on how they work and tables of the mnemonics and what operands they take.

in reply to Meg

gaytabase

in reply to Meg • 9 months ago

@megmac i managed to dig up an old reference for x86 assembly including syntax. i think that's as close as we're going to find.

@Meg

in reply to gaytabase

mcc

in reply to gaytabase • 9 months ago

@dysfun @megmac actually, the pdf you found appears to be at&t syntax not intel, because it specifically says registers must be prefixed by a %. which would be unsurprising, because it was a sun manual and sun was a unix shop. i appreciate the effort tho D:

@gaytabase @Meg

in reply to mcc

наб

in reply to mcc • 9 months ago

you may be interested in clang -masm=intel. this may be a secret fourth thing but it's probably the most long-term normal i think

This entry was edited (9 months ago)

in reply to mcc

✧✦Catherine✦✧

in reply to mcc • 9 months ago

i used to use intel+gas for a while

in reply to mcc

Dima Pasechnik 🇺🇦 🇳🇱

in reply to mcc • 9 months ago

GMP (gmplib.org) does use gas, as far as I can tell, and it has a lot of number theory related assembly code.

You might need to get through the rather convoluted IMHO build system...

The GNU MP Bignum Library

^gmplib.org

in reply to mcc

ShadSterling

in reply to mcc • 9 months ago

back when the actual 386 was relevant, I recall Intel’s own programmers guides being quite good. Have you looked at their current documentation? I think it’s at intel.com/content/www/us/en/de…

I think I used MASM at the time, and it accepted the same syntax as the Intel books. I’m pretty sure the data names got expanded as absolute addresses, not relative addresses, tho I wouldn’t swear by it without doublechecking

in reply to ShadSterling

mcc

in reply to ShadSterling • 9 months ago

@ShadSterling Yes, I've downloaded it, but I cannot find where the assembly syntax is defined, I only find things like opcodes and registers defined.

mastodon.social/@mcc/113868620…

And it is a 5,000 page document so I probably can't read the entire thing hoping to stumble across a BNF.

@ShadSterling

in reply to mcc

ShadSterling

in reply to mcc • 9 months ago

I wonder if I mostly learned it from the examples they included of each instruction…. But things like how to use an address in the data segment might not show up there. Well, maybe I did learn some key things from the oral tradition :/

in reply to mcc

mcc

in reply to mcc • 9 months ago

My extremely normal compilation/invocation line for this program:

(cat data/sample.161.txt | perl tools/convert.pl > src/sample.s) && gcc -g src/sample.s -o run && ./run

("convert.pl" takes the input from stdin, converts it to a single-line string with escaped backquotes, quotes and newlines, then loads "src/base.s" and replaces the line "# !!!!!!!!!!" with an asciz declaration of the input string. This is because gas does not appear to support multiline strings in either AT&T or Intel syntax)
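
The escaping step described above might look roughly like this in C. It is not the actual `convert.pl`, which is Perl and is not shown in the thread. The sketch assumes the three characters that need escaping inside a gas string are the backslash, the double quote and the newline, and `escape_for_asciz` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Writes an escaped copy of `in` to `out` (capacity `cap` bytes, including
   the terminating NUL). Returns the escaped length, or (size_t)-1 if `out`
   is too small. */
size_t escape_for_asciz(const char *in, char *out, size_t cap) {
    size_t n = 0;
    for (; *in != '\0'; ++in) {
        const char *rep = NULL;
        switch (*in) {
        case '\\': rep = "\\\\"; break; /* backslash -> two backslashes */
        case '"':  rep = "\\\""; break; /* quote     -> backslash, quote */
        case '\n': rep = "\\n";  break; /* newline   -> backslash, 'n' */
        default: break;
        }
        size_t need = rep ? 2 : 1;
        if (n + need + 1 > cap) return (size_t)-1;
        if (rep) { out[n++] = rep[0]; out[n++] = rep[1]; }
        else     { out[n++] = *in; }
    }
    if (cap == 0) return (size_t)-1;
    out[n] = '\0';
    return n;
}

/* Usage: a quote and a newline each grow from one byte to two. */
void demo_escape(void) {
    char buf[32];
    assert(escape_for_asciz("a\"b\n", buf, sizeof buf) == 6);
    assert(buf[1] == '\\' && buf[2] == '"');
    assert(buf[4] == '\\' && buf[5] == 'n');
}
```

The result can then be placed between double quotes on a single `.asciz` line.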

in reply to mcc

mcc

in reply to mcc • 9 months ago

… what the hecking hell, I wrote a 100-line hand-rolled regular expression tester in *fricking assembly* and it worked on the *first fricking try*? on both the sample and real puzzle?!

i … … **what**??

How… what???

github.com/mcclure/aoc2024/blo…

…well, second try. the first time i ran it it found 0 matches, and then i double-checked the problem statement and realized they'd said "mul(" not "MUL(", but then I changed those three bytes in match_prefix and it worked.

…What?
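
For comparison, the same kind of hand-rolled matcher can be sketched in C. This is not a translation of the linked assembly. It assumes the puzzle pattern is `mul(X,Y)` with X and Y of one to three decimal digits, and that the answer is the sum of the products. All function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parses 1 to 3 decimal digits at s[*pos]; advances *pos past them. */
static int parse_number(const char *s, size_t *pos, uint32_t *out) {
    uint32_t value = 0;
    size_t digits = 0;
    while (digits < 3 && s[*pos] >= '0' && s[*pos] <= '9') {
        value = value * 10 + (uint32_t)(s[*pos] - '0');
        ++*pos;
        ++digits;
    }
    *out = value;
    return digits > 0;
}

/* Tries to match "mul(X,Y)" starting at s[*pos]. On success stores X*Y in
   *product, moves *pos past ')', and returns 1; otherwise returns 0. */
static int match_mul(const char *s, size_t *pos, uint64_t *product) {
    static const char prefix[] = "mul(";
    size_t p = *pos;
    uint32_t x, y;
    for (size_t k = 0; k < 4; ++k, ++p)
        if (s[p] != prefix[k]) return 0;
    if (!parse_number(s, &p, &x)) return 0;
    if (s[p] != ',') return 0;
    ++p;
    if (!parse_number(s, &p, &y)) return 0;
    if (s[p] != ')') return 0;
    *product = (uint64_t)x * y;
    *pos = p + 1;
    return 1;
}

/* Sums X*Y over every well-formed "mul(X,Y)" in the NUL-terminated input. */
uint64_t sum_muls(const char *s) {
    uint64_t total = 0, product;
    size_t i = 0;
    while (s[i] != '\0') {
        if (match_mul(s, &i, &product)) total += product;
        else ++i; /* no match here: retry one byte later */
    }
    return total;
}

/* Usage: the prefix is case-sensitive, as the MUL( mix-up above shows. */
void demo_sum_muls(void) {
    assert(sum_muls("xmul(2,4)%mul[3,7]") == 8);
    assert(sum_muls("MUL(2,4)") == 0);
}
```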

"That's the right answer! You are one gold star closer." "mcclure" has 5 stars.

aoc2024/03-01-multiply/src/sample.s at 914be9a0398f9b573b3c6c9c0d96d589024e26e0 · mcclure/aoc2024

Advent of Code 2024 challenge (laid-back/"babel" version) - mcclure/aoc2024

^GitHub

This entry was edited (9 months ago)

in reply to mcc

Bill, organizer of stuff

in reply to mcc • 9 months ago

Time to buy a lottery ticket!

in reply to mcc

demofox

in reply to mcc • 9 months ago

I feel cooler for knowing you 😀

in reply to mcc

mcc

in reply to mcc • 9 months ago

That was… disturbingly easy? Part 2 I did not in fact get working on the first try, but at the same time it didn't take so long and I was able to do it while tired¹.

¹ Sign of this: It took me like 5 loops of run it, doesn't work, look closer, find an error, doesn't work, look closer, find an error, eventually I don't see any more errors, so I ran it in gdb and realized I'd been running the wrong file in every test so far. Like. Compiling a.s & editing b.s.

…b.s still needed a 5 changed to a 6

This entry was edited (9 months ago)

in reply to mcc

mcc

in reply to mcc • 9 months ago

Anyway final code, I guess I finished the whole week in a single day.

github.com/mcclure/aoc2024/blo…

I don't know who if anyone is reading these, and this is NOT especially readable to start with, but if you do read you may notice the part 2 patch is pretty ugly lol. I just sorta wedged the part 2 requirements in there sideways.

That's my week 3; next week TCL, I think.

BTW here's Andrzej's week 3, and some of my followers may be interested to learn this week he's doing UXN. mastodon.gamedev.place/@unjell…

aoc2024/03-02-multiply/src/sample2.s at 666d7ec77669d943ffb8bc0bc08be4085f7a9bbd · mcclure/aoc2024

Advent of Code 2024 challenge (laid-back/"babel" version) - mcclure/aoc2024

^GitHub

in reply to mcc

наб

in reply to mcc • 9 months ago

damn... you really invented unportable C

in reply to mcc

Göran Roseen

in reply to mcc • 9 months ago

Tcl...!

in reply to Göran Roseen

mcc

in reply to Göran Roseen • 9 months ago

@roseen oddly, it's very popular in FPGAs both as a preprocessor for Verilog and as a build scripting language.

@Göran Roseen

in reply to mcc

mcc

in reply to mcc • 9 months ago

@roseen like… now, in 2024.

@Göran Roseen

in reply to mcc

Göran Roseen

in reply to mcc • 9 months ago

Preprocessor I can understand, but build scripting? More surprising...

BTW, I remember that when I learnt Tcl in the early 90's, it was the first time I saw that you could have different languages for different tasks in your toolbox.

I used it a lot for text mangling, while my main work had to be done in Fortran and C.
And using Tk, I could give my old Fortran programs a GUI...

in reply to mcc

Tom Forsyth

in reply to mcc • 9 months ago

IIRC a lot of the extended syntax was defined by MASM, sometimes to make life simpler for their extremely powerful macros. I don't know how much of that was also adopted by NASM. So well worth checking with both MASM and NASM. But yes - lots of oral tradition here...

in reply to Tom Forsyth

mcc

in reply to Tom Forsyth • 9 months ago

So what I am looking for is neither nasm nor masm, but rather "Intel syntax". Clang and GCC both have modes in which they purport to follow "Intel syntax". To me, this is like Clang and GCC promising that an "Intel syntax" exists. From my research, unless there's a BNF I haven't found hiding in this 5000 page Intel x86_64 manual ( intel.com/content/www/us/en/de… ) , Intel has never defined such a thing. It was apparently only *implied* by examples in 8086 datasheets. (1/2)

This entry was edited (9 months ago)

in reply to mcc

mcc

in reply to mcc • 9 months ago

Based on this, in my opinion, GCC and Clang should for clarity stop referring to "Intel syntax" and, taking a cue from ARC, refer to "Alleged Intel syntax", or perhaps "Intel folk syntax".

However, I'm also perplexed, because if there's no source of truth for "Intel syntax", then how did clang and gcc know what to implement? Or rather, how do clang and gcc know their "Intel syntax"es are compatible with each *other*? (2/2)

This entry was edited (9 months ago)


in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

@TomF this all makes me think about how the Intel 8080 and Zilog Z80 have almost the same machine code level ISA, but are culturally separated by entirely different assembly language conventions

[Image: screenshot of a comparison table showing the same machine code instructions as written in conventional Intel 8080 and Zilog Z80 assembly language]

@Tom Forsyth

in reply to mcc

Tom Forsyth

in reply to mcc • 9 months ago

I added a whole bunch of instructions to x86 (what became AVX512), including new syntax for the mask registers. I remember trying to find out who "owned" that, and whether we should use v0(k1), v0[k1] or v0{k1} or some other syntax.

Sadly I don't have my notes from that time, but my vague recollection is that the answer was "nobody cares - pick one". Which was very alarming! I did have some feedback from our internal assembler team, but they stressed that they were NOT a public authority.

This entry was edited (9 months ago)

in reply to Tom Forsyth

Tom Forsyth

in reply to Tom Forsyth • 9 months ago

The official tools Intel provides are the C intrinsics - and they are of course C syntax, so have no bearing on the assembly.

So yeah, my recollection is we picked what seemed sensible and went with it. BUT - that was just for the purposes of ISA documentation - there was no hard link to the actual syntax accepted by the assemblers (dramatically so in the case of AT&T syntax).

So it really does seem like a thing nobody owns, except for each specific tool vendor!

in reply to Tom Forsyth

Erin 💽✨

in reply to Tom Forsyth • 9 months ago

Intel provided an assembler at one time (ASM86), maybe they still do as part of ICC? And basically “intel syntax” is a descendant of that per oral tradition. It’s Intel syntax because it’s the syntax that Intel’s assembler used, and that the Intel datasheets use; as opposed to AT&T syntax, the syntax that AT&T’s assembler for Unix used.

When Microsoft made MASM it copied the syntax. Borland’s Turbo Assembler (TASM) copied that. Everything else “intel syntax” is a descendant of those two.

In ASM86 and MASM, what mov eax, foo does is not immediately obvious. If “foo” is defined as a constant (label EQU 0xf00), it’ll set EAX to 0xf00. If “foo” is defined as a variable, it’ll load the contents of that variable.

TASM added “Ideal Mode”, in which this is always consistent: mov eax, foo always sets EAX to the address of the foo label; mov eax, [foo] loads from that address.

Most other assemblers implementing Intel syntax (NASM, FASM, YASM, GAS w/ .intel_syntax noprefix) are broadly copying Ideal Mode.

But it’s all kind of vibes.


in reply to Erin 💽✨

mcc

in reply to Erin 💽✨ • 9 months ago

@erincandescent Then I still assert we shouldn't be calling it "intel syntax" if it's "vaguely intel inspired syntax"!

@Erin 💽✨

in reply to mcc

Erin 💽✨

in reply to mcc • 9 months ago

we shouldn't really be calling it AT&T syntax either (they have no opinions on anything added after about 1990, it would be more accurate to call it GNU Syntax these days) and yet here we are

in reply to Erin 💽✨

mcc

in reply to Erin 💽✨ • 9 months ago

@erincandescent I reckon by AT&T syntax we really mean UNIX Syntax (and after all GNU is UNIX)

@Erin 💽✨

in reply to mcc

Cassandrich

in reply to mcc • 9 months ago

@erincandescent I don't care what we call it as long as I don't have to read it. 🤪

@Erin 💽✨

in reply to Cassandrich

mcc

in reply to Cassandrich • 9 months ago

@dalias @erincandescent for the record the gnu syntax may have an authoritative documentation source but there are significant gaps in that documentation

@Cassandrich @Erin 💽✨

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@TomF I’m quite certain they aren’t! Clang calls itself GCC compatible for C code too, but that’s only 99% true. It is, on the other hand, pretty easy to write code that works in both.

@Tom Forsyth


in reply to mcc

Tom Forsyth

in reply to mcc • 9 months ago

Ah, yes, my recollection is that this is true. Intel builds and specifies hardware. This squishy hazy "software" thing is just... like... your opinion, man!

in reply to mcc

mattpd

in reply to mcc • 9 months ago

@TomF FWIW, assemblers generally have syntactic diffs, sometimes documented, like Section 2.2 Quick Start for MASM Users: nasm.us/doc/nasmdoc2.html#sect…

"Modern X86 Assembly Language Programming by Daniel Kusswurm" does have both MASM and NASM, github.com/Apress/modern-x86-a…, although the diffs b/w these may be relatively minor (compared to AT&T).

See also (for more refs):
- books: github.com/MattPD/cpplinks/blo…
- tutorials: github.com/MattPD/cpplinks/blo… (speakers usually mention which asm they're using)

The source code examples published in this book can be executed on either Windows or Linux. The Windows version of each source code example was developed using Visual C++ and MASM, while GNU C++ and NASM were used for the Linux version. MASM and NASM were chosen as the assemblers for this book’s assembly language source code for a variety of reasons. MASM is included with Visual Studio, and NASM can be easily installed on a Linux computer. There is sufficient similarity between these two assemblers, which simplifies the source code explanations. Whenever possible, the source code examples avoid using features unique to MASM or NASM. Most importantly, both MASM and NASM use the same instruction mnemonics and operand orderings as those published in the AMD and Intel reference manuals.

GitHub - Apress/modern-x86-assembly-language-programming-3e: Source Code for 'Modern X86 Assembly Language Programming' by Daniel Kusswurm


^GitHub

@Tom Forsyth

in reply to mcc

wilkie

in reply to mcc • 9 months ago

the reason the terms Intel and gas "syntax" exist is to make sure you never know what the hell you're reading. these syntaxes seem to stretch and bend depending on their context... and to accommodate inline assembly via archaic rune magic and other mechanisms lost to time.

and that third point is spot on... I learned Intel syntax from trial and error like my ancestors before me... and yes... I was writing x86 assembly in nasm just yesterday. 😅🥲

in reply to mcc

schrotthaufen

in reply to mcc • 9 months ago

Try “.intel_syntax noprefix”

Edit: I’m late to the party

This entry was edited (9 months ago)

in reply to mcc

Dan Cassidy 🦌

in reply to mcc • 9 months ago

In this instance %rip is to signify your acknowledgement that you are digging yourself an early grave

in reply to mcc

Spoofer3

in reply to mcc • 9 months ago

been a long time since I did anything with nasm, but I used to like it. If you are already aware, sorry for noise. nasm.us/

NASM

The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture, used by developers worldwide

^www.nasm.us

in reply to mcc

Jon A. Cruz

in reply to mcc • 9 months ago

having cut my teeth on Motorola assembly back in the day, it hurts every time I have to delve into Intel.

in reply to mcc

slembcke

in reply to mcc • 9 months ago

Oh? I remember it being pretty boring. Pulls from a pair of registers for the multiplicands and outputs the high/low parts to a pair of registers for the result. Is the frustration that you don't get to pick the registers?

in reply to slembcke

mcc

in reply to slembcke • 9 months ago

@slembcke That last thing yes, also that there's no way to multiply by an immediate? Isn't the point of CISC that you don't have to do register juggling like this

@slembcke

in reply to mcc

slembcke

in reply to mcc • 9 months ago

IIRC abusing LEA is the usual way to multiply by an immediate value. I forget the limitations of that though.

The AMD64 multiply instruction can still do all sorts of "multiply RAX(?) by one of these 12 memory addressing modes", so it's still kinda CISC-y I suppose.
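
On the LEA limitation mentioned above: an x86 effective address has the form base + index*scale + displacement, and the scale can only be 1, 2, 4 or 8. So a single LEA fed from one source register can only multiply by a handful of constants. The Python sketch below just enumerates them; it is a model, not assembler output, and the names `LEA_SCALES` and `lea_multipliers` are made up for illustration.

```python
# Model of one x86 effective address: base + index*scale (+ displacement).
# The scale field can only encode 1, 2, 4 or 8.
LEA_SCALES = (1, 2, 4, 8)


def lea_multipliers():
    """Constants k such that one LEA can compute k*x from a single register x."""
    ks = set()
    for scale in LEA_SCALES:
        ks.add(scale)      # lea dst, [x*scale]      (index only, no base)
        ks.add(scale + 1)  # lea dst, [x + x*scale]  (x as both base and index)
    return sorted(ks)


print(lea_multipliers())  # [1, 2, 3, 4, 5, 8, 9]
```

Anything outside that list needs a chain of LEAs, shifts and adds, or an actual multiply.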

in reply to mcc

Jordan

in reply to mcc • 9 months ago

imul is the one that supports immediates (but not fullwidth)

Somewhat Complicated Instruction Set Computer
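
To make "not fullwidth" concrete: in the 64-bit form of IMUL with an immediate, the immediate is at most 32 bits and is sign-extended to 64 bits before the multiply. The Python sketch below models only that encoding rule; `sign_extend` and `fits_imul_imm32` are hypothetical helper names, not anything an assembler provides.

```python
def sign_extend(value, from_bits, to_bits=64):
    """Sign-extend a from_bits-wide encoding to to_bits, returned as an unsigned pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # top bit set means the encoding is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def fits_imul_imm32(constant):
    """True if a 64-bit constant survives a round trip through a sign-extended imm32."""
    constant &= (1 << 64) - 1
    return sign_extend(constant, 32) == constant


print(fits_imul_imm32(60))          # True
print(fits_imul_imm32(-1))          # True
print(fits_imul_imm32(0x80000000))  # False: would sign-extend to 0xFFFFFFFF80000000
```

A constant that fails this check has to be loaded into a register first and multiplied from there.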

in reply to Jordan

mcc

in reply to Jordan • 9 months ago

@jrose @slembcke Isn't that signed, though?

@slembcke @Jordan

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@jrose I forget... signed and unsigned multiplication are only different if you don't sign extend them right? Does mulq work around that by simply computing the full width result and letting the programmer decide how to use it?
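
A quick way to check the intuition in this question: for two's-complement operands, the low 64 bits of a 64x64 product are the same whether the operands are read as signed or unsigned; only the high half of the full 128-bit result differs. The one-operand MUL and IMUL forms return that full result split across RDX:RAX. The Python sketch below models this with plain integers; `as_signed` and `widening_mul` are illustrative names, and nothing here is assembler output.

```python
BITS = 64
MASK = (1 << BITS) - 1


def as_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def widening_mul(a, b, signed):
    """Return the (high, low) 64-bit halves of the 128-bit product of two 64-bit patterns."""
    a &= MASK
    b &= MASK
    if signed:
        product = as_signed(a) * as_signed(b)
    else:
        product = a * b
    product &= (1 << (2 * BITS)) - 1  # wrap to 128 bits, two's complement
    return product >> BITS, product & MASK


# The pattern 0xFFFF...FFFF is -1 when signed and 2**64 - 1 when unsigned.
unsigned_result = widening_mul(MASK, 2, signed=False)
signed_result = widening_mul(MASK, 2, signed=True)
print([hex(h) for h in (unsigned_result[0], signed_result[0])])  # high halves differ
print(unsigned_result[1] == signed_result[1])                    # True: low halves match
```

So if only the low 64 bits are kept, a signed multiply gives the right bits for unsigned operands as well.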

@Jordan

in reply to mcc

Glyph

in reply to mcc • 9 months ago

the century of arm64 on the desktop

in reply to mcc

schrotthaufen

in reply to mcc • 9 months ago

You probably want movzx to move the 16bit from al into esi, and zero extend the upper (I think; didn’t do asm in about a decade now…) half of esi.

This entry was edited (9 months ago)

in reply to mcc

mega

in reply to mcc • 9 months ago

the AT&T style assemblers have their own mnemonics for some stuff (where the Intel manual uses only a single instruction).

Maaaaybe you can use godbolt.org to use an Intel-style assembler (like nasm) interactively and see the AT&T style disassembly on the side 🤔

in reply to mcc

наб

in reply to mcc • 9 months ago

i'll be honest with you, I don't think i've ever seen anything that isn't basically compiler guts use .macro, while the C preprocessor remains very popular for assembly files

in reply to наб

mcc

in reply to наб • 9 months ago

@nabijaczleweli urrg don't waaaaaaanna write CPP

@наб

in reply to mcc

Amber

in reply to mcc • 9 months ago

C++ is fun... if you're a sadomasochist

in reply to Amber

mcc

in reply to Amber • 9 months ago

@puppygirlhornypost2 @nabijaczleweli i meant C Pre-Processor

However I also do not want to write C++

@наб @Amber

in reply to mcc

Steve Leach

in reply to mcc • 9 months ago

You should totally write your own 64-bit Forth as a project 😉.

Unknown parent

Tutiluren

Unknown parent • 9 months ago

r10 looks like the content out of an ascii string to me. Are you accidentally passing its data instead of a pointer to it?

r10 contents as ascii: \n731\n

in reply to Tutiluren

mcc

in reply to Tutiluren • 9 months ago

@n @mkj thanks

@Tutiluren @mkj

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun Can I still use gas macros?

@gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@joe @dysfun yes, it has something called .macro

@gaytabase @Joe Groff

in reply to mcc

Joe Groff

in reply to mcc • 9 months ago

@dysfun what a time to be alive

@gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun i'm reasonably certain that's at&t format, the thing i'm trying to switch away from.

@gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@MonniauxD intel format

@David Monniaux

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@dysfun I've only used Intel syntax and some of my instructions have the "size" tag on them. It's been a while, but IIRC it sometimes infers wrong and you have to make it explicit.

@gaytabase

in reply to slembcke

mcc

in reply to slembcke • 9 months ago

@slembcke @dysfun but how do i know how to make it explicit if the assembly format is not documented…?

@slembcke @gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun @slembcke This is the reference I've been using, but I don't know how to turn lines in an instruction page into a thing I type in the window?

felixcloutier.com/x86/mov

This for example does not clarify which things should literally be present and which are notation of the reference itself. Some of the lines contain non-typeable typesetting like superscripts.

MOV — Move

^{www.felixcloutier.com}

@slembcke @gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@MonniauxD the manual specifically says not to do this (it doesn't say what to do instead, but it says not to do it) and if i try to compile it, it says

src/number-echo-intel.s: Assembler messages:
src/number-echo-intel.s:21: Warning: mnemonic suffix used with `mov'
src/number-echo-intel.s:21: Warning: NOTE: Such forms are deprecated and will be rejected by a future version of the assembler

@David Monniaux

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun @slembcke There is a line

MOV r32, imm32

How do I write a 32-bit immediate? That is what I have been trying to figure out. If I write "mov eax, 60" gas prints "Error: ambiguous operand size for `mov'"

@slembcke @gaytabase

in reply to mcc

azul

in reply to mcc • 9 months ago

@dysfun @slembcke would you consider switching to NASM? it is widely available, it uses intel syntax, and it infers argument sizes (like it's supposed to).

@slembcke @gaytabase

in reply to azul

mcc

in reply to azul • 9 months ago

I'd consider switching to clang but I would not consider this for this particular project, no.

This entry was edited (9 months ago)

in reply to mcc

slembcke

in reply to mcc • 9 months ago

@dysfun I suspect you are missing a special character? In 6502 asm "lda 60" means load A from memory address 60. "lda #60" means load 60 into register A. I don't remember how that works, and the code I have open in front of me doesn't use any immediate values to check. 🙁

@gaytabase

in reply to mcc

azul

in reply to mcc • 9 months ago

@dysfun @slembcke nasm is not clang but ok. anyway i bring this up because for intel syntax the only time you should have to specify an operand size is when you are doing an operation with only memory & immediate operands. otherwise it should be able to infer the size from the register names. it sounds like gas just doesn't support intel syntax properly...

@slembcke @gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@unnick @dysfun @slembcke i had .intel_syntax at the top but had not included noprefix. this fixes several things. thanks.

@slembcke @gaytabase @unnick

in reply to azul

mcc

in reply to azul • 9 months ago

@typeswitch It turns out that if you type ".intel_syntax" at the top of a gas file, gas will not give you intel syntax OR AT&T syntax but some secret third thing they do not document. apparently i was supposed to say ".intel_syntax noprefix" to get the real intel syntax.

@azul

Unknown parent

Joe Groff

Unknown parent • 9 months ago

@dysfun does gas have macros other than running the C preprocessor? preprocessing should work regardless of the syntax mode

@gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun knowing these facts is entirely useless unless i know what document canonicalizes them

@gaytabase

Unknown parent

David Monniaux

Unknown parent • 9 months ago

movq rax, 60 works with gas

in reply to mcc

unnick

in reply to mcc • 9 months ago

have you put .intel_syntax noprefix somewhere at the top? if not, does it fix it?

Unknown parent

David Monniaux

Unknown parent • 9 months ago

nasm accepts

bits 64
toto:
mov rax, 60

Unknown parent

mcc

Unknown parent • 9 months ago

@dysfun This is helpful, thanks. Should I be worried however about the fact I am not writing x86, but rather x86_64?

@gaytabase

Unknown parent

mcc

Unknown parent • 9 months ago

@tomjennings @TomF Clang was supposed to have been about cleaning up the by-convention morass that GNU had fallen into. I expect GNU to accept this state of affairs on the world's most popular PC platform and not try to change it for 25 years, but from Clang I expect better!!

This is two weeks after I discover Clang's libunwind library has literally no api documentation at all.

@Tom Forsyth @tom jennings

Unknown parent

mcc

Unknown parent • 9 months ago

@erincandescent @TomF @tomjennings That's why Apple funded it and why Google jumped on board, but in my opinion it wouldn't have taken over the market *just* because corporations like it. I use clang/lldb preferentially even when GCC/gdb is an option because it's just cleaner, more intentional software.

@Tom Forsyth @tom jennings @Erin 💽✨


Unknown parent

tom jennings

Unknown parent • 9 months ago

@TomF
Assembler syntaxes are peculiar to each assembler. There never was any standardization there. It's still the 1950s in there!

@Tom Forsyth

in reply to mcc

Erin 💽✨

in reply to mcc • 9 months ago

it was always a goal of clang to accept all of the same input that GCC (and Gas) did...

in reply to Erin 💽✨

Erin 💽✨

in reply to Erin 💽✨ • 9 months ago

I mean, clang became an actual thing because Apple needed a replacement for GCC 4.2 after FSF changed their license

Unknown parent

Tom Forsyth

Unknown parent • 9 months ago

@tomjennings That's the whole discussion. They do not have the *assembler syntax* which is a different thing.

@tom jennings

Unknown parent

tom jennings

Unknown parent • 9 months ago

@TomF

I had all of those books, yards of them, as they came out. But no one but board designers used the data books, they were tediously hardware oriented.

What we all used were the cheatcharts. Foldout instruction set summaries.
Oh how I wish I had my collection! I had probably 25, chips I'd written code for. Weird stuff like 8x300. Cosmac. Weird little Intel and moto.

The cheatcharts are what you want. All dog-eared torn and coffee stained from use.

But the data book will have the official instruction descriptions and assembler mnemonics in excruciating detail.

@mcc

@mcc @Tom Forsyth