#BabelOfCode 2024
Week 3
Language: x86_64 assembly [AMD64] (macroassembler: GNU as/gas)

PREV WEEK: mastodon.social/@mcc/113783248…
NEXT WEEK: mastodon.social/@mcc/113906616…
RULES: mastodon.social/@mcc/113676228…

I planned ASM for today and when I saw the challenge *almost* bounced to TCL, because I *don't* wanna write a parser in ASM. But the language here is exceedingly regular, so probs a state machine is enough.

Successfully ran this hello world cs.lmu.edu/~ray/notes/gasexamp… which I think should be all I need to start


#BabelOfCode 2024
Week 2
Language: Forth

Confidence level: Low

PREV WEEK: mastodon.social/@mcc/113743302…
NEXT WEEK: mastodon.social/@mcc/113867584…
RULES: mastodon.social/@mcc/113676228…

So today's challenge looks *absurdly* easy, to the point I'm mostly just suspicious that part 2 will get hard. I figure this is an okay time to burn Forth.

I'm wanting to save Fortran for a week I can use the matrix ops. This puzzle looks suspiciously like part 2 will turn into a 2-dimensional array problem.


in reply to mcc

My language "confidence level" for this week is high, but down to medium-high for step 2 (because obvs I don't know WHAT they'll throw at me at step 2). I'm kinda unenthused about the gas macro language. The macro language documentation ( sourceware.org/binutils/docs/a… + sourceware.org/binutils/docs/a… , I think that's literally all they wrote ) is sketchy and unclear. Can macros take a macro name as argument and invoke the passed-in macro? I literally can't tell. I'm going to uncover syntax by trial and error
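On the macro-name-as-argument question: as far as I understand gas, macro arguments are substituted as plain text before the expanded line is parsed, so a macro name passed in as an argument should end up being invoked. A minimal sketch, assuming Linux x86-64 and building with `gcc apply.s -o apply`; the file name and the macro names `add_two` and `apply_twice` are made up for illustration:

```asm
# apply.s (hypothetical file name)
    .macro  add_two reg
    add     $2, \reg
    .endm

    .macro  apply_twice op, reg     # op is expected to be the name of another macro
    \op     \reg                    # textual substitution: becomes "add_two %eax"
    \op     \reg
    .endm

    .text
    .globl  main
main:
    mov     $38, %eax
    apply_twice add_two, %eax       # expands to two add_two invocations
    ret                             # main returns %eax, so the exit status is 42

    .section .note.GNU-stack,"",@progbits
```

Running `./apply; echo $?` is a quick way to check the result without writing any printing code.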
in reply to mcc

I've decided to add a new rule to my challenge, which is in addition to doing a different language every week I'm going to try to use exclusively *languages I haven't programmed in before*.

If that's the rule, x86_64 is a stretch as I've *written* x86_64— but I count it as valid, because I've never written a whole AMD64 *program*, only snippets embedded in a C file or OllyDbg-injected into an exe at runtime. Only ASMs I've written whole programs in are MIPS and LLVM in-memory representation.

in reply to mcc

if you need ideas, I'm currently learning Idris. Do a challenge in Idris with full correctness proof of your solution in the function type if you'd like to exercise all the formal logic you forgot years ago! 😅
in reply to mcc

When I grow up I want to be like you lolz. I'm taking baby steps at learning x86_64. For very small toy programs, I appreciate assembly, and by extension C. Still a noob. Hopefully by the close of the year, I should be able to work on more meaningful programs ;)
in reply to Munyoki Kilyungi 🇰🇪

@saitama x86_64 assembly is rough because they've been designing CPUs for compilers for a while, so the asm is HUGE! lots of 12-character opcodes. They simply weren't designing it to be written by humans. But there's still a small, manageable ASM living inside of the monster if you focus on the opcodes inherited from old x86 (in fact, the stuff brought *over* from x86 got simplified in the process of 64-bit-ization).

Just remember: 0x90 for NOP!

in reply to mcc

Finding many things that are just sort of It's Assumed You Know This but may or may not be written anywhere. Like, there's a "movb" instruction which is not in the instruction reference I'm using and is not recognized by my syntax highlighter, but gas accepts it.

Question. I do

mov %al, %esi

It says operand type mismatch. OK. I think I can simulate this with

mov $0, %rax
mov %eax, %esi

BC non-al bits of rax get cleared in instruction 1.

…But what's the "widening"/truncating version of mov?

in reply to mcc

movzx/sx for widening. regular mov for narrowing, but be mindful of partial dependencies on 8 and 16 bit destinations (16b in particular).

The size suffix is an at&t special and not part of the ISA (which also has movd/movq…)
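In AT&T syntax the widening moves carry both sizes in the mnemonic: `movzbl` is movzx from a byte to a long (32 bits), and `movsbl` is the sign-extending counterpart. A minimal sketch, assuming Linux x86-64 and `gcc widen.s -o widen`; the file name is hypothetical:

```asm
# widen.s (hypothetical file name)
    .text
    .globl  main
main:
    mov     $0x1234ABF0, %eax   # low byte (%al) is 0xF0, the rest stands in for stale data
    movzbl  %al, %esi           # zero-extend: %esi = 0x000000F0
    movsbl  %al, %edi           # sign-extend: %edi = 0xFFFFFFF0
    mov     %esi, %eax          # return the zero-extended value
    ret                         # exit status 240 (0xF0)

    .section .note.GNU-stack,"",@progbits
```

Because `movzbl` writes the whole destination, no separate zeroing instruction is needed beforehand.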

in reply to mcc

@icculus … … although uh… out of curiosity, is there a way to do it without sign extension?
in reply to mcc

Just move the value to the bottom of a larger register.

xor %eax,%eax # zero it out
mov $3, %al # put 3 in al
# %eax is now a 32-bit int set to 3.

in reply to Ryan C. Gordon

@icculus okay, but i actually do have to zero it before moving it in this case
in reply to mcc

yeah, because the top bits of %eax are whatever was already in there.
in reply to mcc

there isn't, but IIRC (been a while since I last did asm) there's a cbw instruction that sign-extends AL to AX
in reply to mcc

Ah… well… that did not take long
in reply to mcc

there's some post that was deleted that showed the contents of the registers, I don't know how to interpret it but this observation might be a hint: the value in r10 read from right to left are the 7 bytes \n137\n
in reply to mcc

I'm sorry the x86_64 multiply instruction works fucking *how*. What fucking century is it
in reply to mcc

So it took longer than I'd hoped, but I now have a working first-pass AMD64 ASM program that can decode an ASCII number in the .data segment and print it out again.

github.com/mcclure/aoc2024/blo…

Build instructions in adjacent run.txt.

I have some questions.

(1 of 2). I think I don't like GNU/AT&T assembly format and would like to switch to Intel assembly format. Is Intel format… documented… somewhere? This is the closest I found. sourceware.org/binutils/docs/a…

in reply to mcc

2. At a certain point in my code, I wanted to load a pointer to the .data segment variable "input" into my %r10. The way to do this turned out to be

lea input(%rip), %r10

rip is… the instruction pointer?? what the devil is the instruction pointer doing there? `input` is at a fixed location, surely it's not loading it from an address relative to the fricking instruction pointer.

in reply to mcc

that's the notation for PC-relative addressing. if you're in a dynamic library or ASLR-supporting executable, then `input` is not at a fixed address, but it is at a fixed address relative to the code in the same image. also x86-64 doesn't have 64-bit absolute addressing modes so PC-relative addressing is more compact even if you don't need position independence
in reply to Joe Groff

@joe okay. so it's not literally relative to the instruction pointer but relative to the segment of the instruction pointer, or something?

i don't think my program is going to be remotely PIC-compatible lol

in reply to mcc

sorry if i wasn't clear, it's literally relative to the instruction pointer. when you write `foo(%rip)` it really means (roughly) `(foo - .)(%rip)` and assembles the offset from the instruction to the data. as long as you're not referring to data symbols from other dynamic libraries you shouldn't need to think about it much though
in reply to Joe Groff

you can't write e.g. `mov foo, %rax` because there isn't an instruction encoding for an absolute 64-bit address, which is what you'd need to load from `foo` directly. you'd have to load an absolute address with two instructions like `movabs $foo, %rdi; mov (%rdi), %rax` since `movabs` is the only 64-bit immediate instruction. that's why code uses PC-relative addressing even if you don't care about PIC
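To make the RIP-relative form concrete, here is a minimal sketch, assuming Linux x86-64 and `gcc riprel.s -o riprel`; the file name and the contents of `input` are placeholders:

```asm
# riprel.s (hypothetical file name)
    .data
input:
    .asciz  "137\n"             # placeholder data

    .text
    .globl  main
main:
    lea     input(%rip), %r10   # %r10 = address of input, encoded as an offset from %rip
    movzbl  (%r10), %eax        # load the first byte ('1' = 49) and zero-extend it
    ret                         # exit status 49

    .section .note.GNU-stack,"",@progbits
```

The assembler and linker compute the offset, so the source only ever names `input`; the same code works whether or not the executable is position-independent.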
in reply to mcc

Expanding on my question re: "where is intel assembly format actually documented?"

mov rax, 60

This is pretty simple, right? I want the number 60 in rax. This says: ambiguous operand size for mov. Oh, there was something about that in the gas manual. Okay, I say:

mov rax, dword 60

It says: junk 60 after expression

What the heck do I do now? Do I just come back to mastodon for help every time I want to type a number? All the StackOverflow examples are AT&T format.

in reply to mcc

The Intel Architecture Reference Manual is what you need. It explains instruction encodings as well as what every instruction does.

Intel syntax is very different to gnu syntax. Argument ordering is reversed. Instruction variants are marked by type-tagging the registers. It’s all very strange.

in reply to Christina Jennifer

@criffer Thanks… however, I am looking at this, and I cannot find where in the table of contents to find the definition of the assembly language?

For example, I look under "notation" and "operands", and it explains that it is *not* describing the assembly language, but rather a modification of the assembly language designed for representing the manual.

in reply to mcc

you can ask gcc to emit masm (intel) style assembly. godbolt has a checkbox for that.
in reply to mcc

I believe in this case you need to use `mov eax, 60`?
in reply to Nicole

@streganil okay, i guess the idea is there's no such thing as a 64-bit literal?
in reply to mcc

right; if you want a 64-bit literal, maybe use movabs? but also I'm not super comfy with gas so I don't know if that's supported
in reply to mcc

@streganil mov eax, 60 gives me "ambiguous operand size for mov"
in reply to mcc

... maybe don't use gas, I guess. I can't figure this assembler out. clang and nasm both work with the original code...
in reply to Nicole

@streganil okay, but does clang have a manual for its assembler *at all*?
in reply to mcc

that one I don't know, sorry. I've only ever seriously written assembler for nasm (and a little bit of masm), unfortunately
in reply to mcc

Uh, I used these, but they are gigantic and they make for a terrible quick reference. Surely there has to be a better indexed version out there: intel.com/content/www/us/en/de…
in reply to slembcke

@slembcke I don't mind going all the way to the reference manual, but I don't see where the *assembly language* is described in this document and it's hard to CTRL-F a 5,000 page manual… mastodon.social/@mcc/113868620…




in reply to mcc

I mean they all seem to be in there, but once you find them in the giant doc you still have to figure out how to decipher them. Surely that's in the giant doc too, but ugh... Again, I'm just saying I found success with it, but in retrospect it feels like there *must* be a better option.
in reply to slembcke

@slembcke What I'm trying to figure out is not "what is the format of the opcodes" but "what is the syntax of the assembly language". How do I format a literal. How do I format a string. Etc.
in reply to mcc

Here are a few screen shots from Introduction to 64 bit Assembly Programming for Linux by Ray Seyfarth.
in reply to mcc

Findings so far:

- If you put ".intel_syntax" at the top of a gas file, it does *not* give you intel syntax *or* AT&T syntax but a secret third thing. The way to get the real intel syntax is ".intel_syntax noprefix"

- It didn't accept the 0(reg) syntax to dereference. By experimentation, I found I could do 0[reg]. That is terrifying. Guessing, I mean.

- No one I have spoken to has learned intel syntax by anything other than oral tradition. Also, no one uses intel with gas (they all use nasm?)
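A minimal sketch of what the findings above look like in a whole file, assuming Linux x86-64 and `gcc intel.s -o intel`; the file name and the `answer` label are made up, and gas still uses `#` for comments in this mode:

```asm
# intel.s (hypothetical file name)
    .intel_syntax noprefix
    .data
answer:
    .quad   42

    .text
    .globl  main
main:
    mov     rax, 60                 # immediate into a register: no % and no $
    lea     r10, [rip + answer]     # RIP-relative address of answer
    mov     rax, [r10]              # load; the size is inferred from rax
    mov     eax, dword ptr 0[r10]   # displacement[reg] form, same as [r10 + 0]
    ret                             # exit status 42

    .section .note.GNU-stack,"",@progbits
```

With plain `.intel_syntax` (no `noprefix`), `rax` in the first instruction would be read as a symbol name rather than a register, which is one plausible source of the "ambiguous operand size" error.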

in reply to mcc

it's been a while, but from my recollection nasm did fairly reasonably document its syntax when i first used it (though a lot of its documentation does describe things relative to masm/tasm syntax)
in reply to Joe Groff

@joe okay, but is nasm documenting Intel format or is it documenting nasm's flavor of Intel format…?
in reply to mcc

Intel's syntax goes back to the 8086/8 datasheet. You can see it in the IBM PC BIOS listings.

From there, Microsoft made their own assembler (MASM) which extends Intel's original syntax (along with all the segment shit no one cares about).

NASM is "well, it's dest then source operand", like MASM, but isn't really Intel/MASM syntax either. Code written for MASM will not compile for NASM for several syntactical reasons.

And GAS/AT&T syntax is the x86 Unix world. It's ass.

in reply to William D. Jones

@cr1901 @joe Say I'm not programming for the 8086/8, masm, or AT&T syntax. I'm programming for x86_64 and I want to use Intel's syntax.

I go to intel.com/content/www/us/en/de… . There's a 5,000 page manual there. If the old 8086/8 datasheet defines the syntax, I'd expect the 5,000 page 2024 version to as well.

I don't find it. The conventions section mastodon.social/@mcc/113868620… says it describes "a subset of" the assembly language.

Is the syntax hiding somewhere else in these 5,000 pages?




in reply to mcc

@cr1901 @joe By "syntax" I mean things like: How do I represent a literal? It appears I can type 4[r10] to mean "memory address in r10 plus four bytes". Why square brackets? Where does it say you use square brackets and not some other type of brackets? Assembly language is a simple language, but it has a syntax, so I expect that syntax to be documented somewhere. If it's Intel's format I expect Intel to be the one documenting it.
in reply to mcc

@joe The extent of what I know about Intel assembly directives and syntax is based upon reading the old 8086/8 datasheet, and MASM/NASM manuals. I'm not aware of an actual up-to-date docs of the syntaxes involved.

Learning what I know has been a trial and error process over the past decade-and-a-half I'm afraid. And sadly, I don't know why your original code doesn't compile (I thought ".intel_syntax" alone obviated the need for "%" before registers, but apparently not).

in reply to William D. Jones

@cr1901 ".intel_syntax" is a lie. ".intel_syntax noprefix" is intel syntax. The manual explains this, but very, very elliptically.
in reply to mcc

@cr1901 After experimenting, I am relatively certain it is impossible to write a program against ".intel_syntax" (the fake intel syntax) which gas will not issue deprecation warnings on
in reply to mcc

the "normal" way to express things like that in Intel syntax is [reg+disp] or [reg*scale+disp] where disp/scale are constants.

so for example if I wanted to read a dword from the address at eax multiplied by four plus 8, I'd do:

mov eax, [eax*4+8]

in reply to Graham Sutherland 🎃 Polynomial

@cr1901 @joe the "dword", "qword", etc. prefixes are generally only used for clarifying load sizes on sign extensions. for example with a regular mov, these two are equivalent:

mov eax, [eax]
mov eax, dword ptr [eax]

but if you do movsx, the load size is ambiguous, so you clarify it:

movsx rax, [eax] ; ambiguous (error)
movsx rax, word ptr [eax] ; load 16-bit
movsx rax, dword ptr [eax] ; load 32-bit

in reply to Graham Sutherland 🎃 Polynomial

@cr1901 @joe the square brackets represent an address expression, so if you do `mov rax, rcx` that's just a register move, but `mov rax, [rcx]` is a memory load from the address in rcx. in the vast majority of cases a square bracket expression is used to signify a memory operation, with lea (load effective address) being the notable exception, which loads the result of the address expression itself into the target register.
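The mov/lea distinction can be sketched like this, assuming gas with `.intel_syntax noprefix`, Linux x86-64, and `gcc brackets.s -o brackets`; the file name and the `cell` label are hypothetical:

```asm
# brackets.s (hypothetical file name)
    .intel_syntax noprefix
    .data
cell:
    .quad   5

    .text
    .globl  main
main:
    lea     rcx, [rip + cell]       # lea: rcx = the address itself, no memory access
    mov     rax, [rcx]              # mov with brackets: load from that address, rax = 5
    lea     rax, [rax + rax*4 + 2]  # lea as arithmetic: rax = 5 + 5*4 + 2 = 27
    ret                             # exit status 27

    .section .note.GNU-stack,"",@progbits
```

The last instruction never touches memory; it only reuses the address-expression hardware, which is why lea is often used for small multiply-and-add computations.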
in reply to Graham Sutherland 🎃 Polynomial

@gsuberland @whitequark @cr1901 @joe (there are some other parts of this conversation where I was asking for help but at a certain point I switched to "is there a source of ground truth here or just memories of things that previously worked in human neurons?")
in reply to mcc

@whitequark @cr1901 @joe unfortunately this is one of those cases where the syntax was never rigidly defined by any kind of specification in the early days, leading to a variety of interpretations and colloquialisms between assembler and disassembler implementations, with most things defined by convention. modern instances of Intel's own instruction set documentation tend to be pretty consistent in terms of instruction formatting, but everything beyond that is implementation-specific.
in reply to mcc

@joe Isn't that the problem with asm in general? Even for the same ISA different assemblers have slightly different and incompatible syntaxes anyway. 🙁
in reply to slembcke

@slembcke @joe This is very true, but GCC has a mode which *claims* to be Intel syntax compatible. You'd think that in order for GCC to make a claim of Intel syntax compliance, they'd need some canonical source of truth for the Intel syntax. Perhaps I think too highly of the GNU project in saying this.
in reply to mcc

@joe Well, so that's the thing. I don't think there is THE "Intel syntax". I mostly remember it being described as intel or AT&T "styled" syntaxes. (Wikipedia uses the term "branches of syntax")

I've always taken it to mean more like how certain languages are "C-like", but that doesn't mean they are even remotely compatible with C, just that they use curly braces and types go before variable names.

in reply to mcc

nasm's flavor of intel syntax. alas, i don't think there is a singular "intel syntax" anymore; that might've referred to MASM as the de facto standard back in the day but i don't think anyone besides MSVC still uses it literally
in reply to mcc

?? I learned Intel syntax from books... The same way I learned c and c++?

Intel at&t syntax is a weirdness of the gnu world.

in reply to mcc

I mean it wouldn't be useful now, because it's ancient, but I mentioned it in the other reply chain. It's uh.. this one
in reply to Meg

That said I think the canonical reference for how intel mnemonics work is really the intel blue books, which they nicely provide pdfs of for free now (used to be you'd have to get them to mail them to you and I still have some of those from the ia-32 days): intel.com/content/www/us/en/de…

It doesn't really explain syntax though, just a *lot* of detail on how they work and tables of the mnemonics and what operands they take.

in reply to Meg

@megmac i managed to dig up an old reference for x86 assembly including syntax. i think that's as close as we're going to find.
@Meg
in reply to gaytabase

@dysfun @megmac actually, the pdf you found appears to be at&t syntax not intel, because it specifically says registers must be prefixed by a %. which would be unsurprising, because it was a sun manual and sun was a unix shop. i appreciate the effort tho D:
in reply to mcc

you may be interested in clang -masm=intel. this may be a secret fourth thing but it's probably the most long-term normal i think
in reply to mcc

GMP (gmplib.org) does use gas, as far as I can tell, and it has a lot of number theory related assembly code.

You might need to get through the rather convoluted IMHO build system...

in reply to mcc

back when the actual 386 was relevant, I recall Intel’s own programmers guides being quite good. Have you looked at their current documentation? I think it’s at intel.com/content/www/us/en/de…

I think I used MASM at the time, and it accepted the same syntax as the Intel books. I’m pretty sure the data names got expanded as absolute addresses, not relative addresses, tho I wouldn’t swear by it without doublechecking

in reply to ShadSterling

@ShadSterling Yes, I've downloaded it, but I cannot find where the assembly syntax is defined, I only find things like opcodes and registers defined.

mastodon.social/@mcc/113868620…

And it is a 5,000 page document so I probably can't read the entire thing hoping to stumble across a BNF.




in reply to mcc

I wonder if I mostly learned it from the examples they included of each instruction…. But things like how to use an address in the data segment might not show up there. Well, maybe I did learn some key things from the oral tradition :/
in reply to mcc

My extremely normal compilation/invocation line for this program:

(cat data/sample.161.txt | perl tools/convert.pl > src/sample.s) && gcc -g src/sample.s -o run && ./run

("convert.pl" takes the input from stdin, converts it to a single-line string with escaped backquotes, quotes and newlines, then loads "src/base.s" and replaces the line "# !!!!!!!!!!" with an asciz declaration of the input string. This is because gas does not appear to support multiline strings in either AT&T or Intel syntax)

in reply to mcc

… what the hecking hell, I wrote a 100-line hand-rolled regular expression tester in *fricking assembly* and it worked on the *first fricking try*? on both the sample and real puzzle?!

i … … **what**??

How… what???

github.com/mcclure/aoc2024/blo…

…well, second try. the first time i ran it it found 0 matches, and then i double-checked the problem statement and realized they'd said "mul(" not "MUL(", but then I changed those three bytes in match_prefix and it worked.

…What?


in reply to mcc

That was… disturbingly easy? Part 2 I did not in fact get working on the first try, but at the same time it didn't take so long and I was able to do it while tired¹.

¹ Sign of this: It took me like 5 loops of run it, doesn't work, look closer, find an error, doesn't work, look closer, find an error, eventually I don't see any more errors, so I ran it in gdb and realized I'd been running the wrong file in every test so far. Like. Compiling a.s & editing b.s.

…b.s still needed a 5 changed to a 6

in reply to mcc

Anyway final code, I guess I finished the whole week in a single day.

github.com/mcclure/aoc2024/blo…

I don't know who if anyone is reading these, and this is NOT especially readable to start with, but if you do read you may notice the part 2 patch is pretty ugly lol. I just sorta wedged the part 2 requirements in there sideways.

That's my week 3; next week TCL, I think.

BTW here's Andrzej's week 3, and some of my followers may be interested to learn this week he's doing UXN. mastodon.gamedev.place/@unjell…


in reply to Göran Roseen

@roseen oddly, it's very popular in FPGAs both as a preprocessor for Verilog and as a build scripting language.
in reply to mcc

Preprocessor I can understand, but build scripting? More surprising...

BTW, I remember that when I learnt Tcl in the early 90's, it was the first time I saw that you could have different languages for different tasks in your toolbox.

I used it a lot for text mangling, while my main work had to be done in Fortran and C.
And using Tk, I could give my old Fortran programs a GUI...

in reply to mcc

IIRC a lot of the extended syntax was defined by MASM, sometimes to make life simpler for their extremely powerful macros. I don't know how much of that was also adopted by NASM. So well worth checking with both MASM and NASM. But yes - lots of oral tradition here...


in reply to Tom Forsyth

So what I am looking for is neither nasm nor masm, but rather "Intel syntax". Clang and GCC both have modes in which they purport to follow "Intel syntax". To me, this is like Clang and GCC promising that an "Intel syntax" exists. From my research, unless there's a BNF I haven't found hiding in this 5000 page Intel x86_64 manual ( intel.com/content/www/us/en/de… ) , Intel has never defined such a thing. It was apparently only *implied* by examples in 8086 datasheets. (1/2)

in reply to mcc

Based on this, in my opinion, GCC and Clang should for clarity stop referring to "Intel syntax" and, taking a cue from ARC, refer to "Alleged Intel syntax", or perhaps "Intel folk syntax".

However, I'm also perplexed, because if there's no source of truth for "Intel syntax", then how did clang and gcc know what to implement? Or rather, how do clang and gcc know their "Intel syntax"es are compatible with each *other*? (2/2)


in reply to mcc

@TomF this all makes me think about how the Intel 8080 and Zilog Z80 have almost the same machine code level ISA, but are culturally separated by entirely different assembly language conventions
in reply to mcc

I added a whole bunch of instructions to x86 (what became AVX512), including new syntax for the mask registers. I remember trying to find out who "owned" that, and whether we should use v0(k1), v0[k1] or v0{k1} or some other syntax.

Sadly I don't have my notes from that time, but my vague recollection is that the answer was "nobody cares - pick one". Which was very alarming! I did have some feedback from our internal assembler team, but they stressed that they were NOT a public authority.

in reply to Tom Forsyth

The official tools Intel provides are the C intrinsics - and they are of course C syntax, so have no bearing on the assembly.

So yeah, my recollection is we picked what seemed sensible and went with it. BUT - that was just for the purposes of ISA documentation - there was no hard link to the actual syntax accepted by the assemblers (dramatically so in the case of AT&T syntax).

So it really does seem like a thing nobody owns, except for each specific tool vendor!

in reply to Tom Forsyth

Intel provided an assembler at one time (ASM86), maybe they still do as part of ICC? And basically “intel syntax” is a descendent of that per oral tradition. It’s Intel syntax because it’s the syntax that Intel’s assembler used, and that the Intel datasheets use; as opposed to AT&T syntax, the syntax that AT&T’s assembler for Unix used.

When Microsoft made MASM it copied the syntax. Borland’s Turbo Assembler (TASM) copied that. Everything else “intel syntax” is a descendent of those two

In ASM86 and MASM, what mov eax, foo does is not immediately obvious. If “foo” is defined as constant (label EQU 0xf00), it’ll set EAX to 0xf00. If “foo” is defined as a variable, it’ll load the contents of that variable.

TASM added “Ideal Mode”, in which this is always consistent: mov eax, foo always sets EAX to the address of the foo label; mov eax, [foo] loads from that address.

Most other assemblers implementing Intel syntax (NASM, FASM, YASM, GAS w/ .intel_syntax noprefix) are broadly copying Ideal Mode

But it’s all kind of vibes.


in reply to Erin 💽✨

@erincandescent Then I still assert we shouldn't be calling it "intel syntax" if it's "vaguely intel inspired syntax"!
in reply to mcc

we shouldn't really be calling it AT&T syntax either (they have no opinions on anything added after about 1990, it would be more accurate to call it GNU Syntax these days) and yet here we are
in reply to Erin 💽✨

@erincandescent I reckon by AT&T syntax we really mean UNIX Syntax (and after all GNU is UNIX)
in reply to mcc

@erincandescent I don't care what we call it as long as I don't have to read it. 🤪
in reply to Cassandrich

@dalias @erincandescent for the record the gnu syntax may have an authoritative documentation source but there are significant gaps in that documentation
in reply to mcc

@TomF I’m quite certain they aren’t! Clang calls itself GCC compatible for C code too, but that’s only 99% true. It is on the other hand, pretty easy to write code that works in both.
in reply to mcc

Ah, yes, my recollection is that this is true. Intel builds and specifies hardware. This squishy hazy "software" thing is just... like... your opinion, man!
in reply to mcc

@TomF FWIW, assemblers generally have syntactic diffs, sometimes documented, like Section 2.2 Quick Start for MASM Users: nasm.us/doc/nasmdoc2.html#sect…

"Modern X86 Assembly Language Programming by Daniel Kusswurm" does have both MASM and NASM, github.com/Apress/modern-x86-a…, although the diffs b/w these may be relatively minor (compared to AT&T).

See also (for more refs):
- books: github.com/MattPD/cpplinks/blo…
- tutorials: github.com/MattPD/cpplinks/blo… (speakers usually mention which asm they're using)

in reply to mcc

the reason the terms Intel and gas "syntax" exist is to make sure you never know what the hell you're reading. these syntaxes seem to stretch and bend depending on their context... and to accommodate inline assembly via archaic rune magic and other mechanisms lost to time.

and that third point is spot on... I learned Intel syntax from trial and error like my ancestors before me... and yes... I was writing x86 assembly in nasm just yesterday. 😅🥲

in reply to mcc

Try “.intel_syntax noprefix”

Edit: I’m late to the party

in reply to mcc

In this instance %rip is to signify your acknowledgement that you are digging yourself an early grave
in reply to mcc

been a long time since I did anything with nasm, but I used to like it. If you are already aware, sorry for noise. nasm.us/
in reply to mcc

having cut my teeth on Motorola assembly back in the day, it hurts every time I have to delve into Intel.
in reply to mcc

Oh? I remember it being pretty boring. Pulls from a pair of registers for the multiplicands and outputs the high/low parts to a pair of registers for the result. Is the frustration that you don't get to pick the registers?
in reply to slembcke

@slembcke That last thing yes, also that there's no way to multiply by an immediate? Isn't the point of CISC that you don't have to do register juggling like this
in reply to mcc

IIRC abusing LEA is the usual way to multiply by an immediate value. I forget the limitations of that though.

The AMD64 multiply instruction can still do all sorts of "multiply RAX(?) by one of these 12 memory addressing modes", so it's still kinda CISC-y I suppose.

in reply to mcc

imul is the one that supports immediates (but not fullwidth)

Somewhat Complicated Instruction Set Computer
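For reference, the multiply forms side by side in AT&T syntax. A minimal sketch, assuming Linux x86-64 and `gcc mul.s -o mul`; the file name is hypothetical:

```asm
# mul.s (hypothetical file name)
    .text
    .globl  main
main:
    mov     $6, %eax            # one-operand mul implicitly reads %rax
    mov     $7, %ecx
    mul     %rcx                # unsigned: %rdx:%rax = %rax * %rcx, so %rax = 42 and %rdx = 0
    imul    $10, %rcx, %rsi     # three-operand imul: %rsi = %rcx * 10 = 70 (immediate is at most 32 bits)
    imul    %rcx, %rsi          # two-operand imul: %rsi = %rsi * %rcx = 490, %rdx untouched
    ret                         # exit status 42 (still in %rax)

    .section .note.GNU-stack,"",@progbits
```

The two- and three-operand imul forms keep only the low half of the product, which is the same for signed and unsigned inputs, so they are usable for unsigned values when the result fits.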

in reply to mcc

@jrose I forget... signed and unsigned multiplication are only different if you don't sign extend them right? Does mulq work around that by simply computing the full width result and letting the programmer decide how to use it?
in reply to mcc

You probably want movzx to move the 8 bits from al into esi, and zero extend the upper (I think; didn’t do asm in about a decade now…) bits of esi.
in reply to mcc

the AT&T style assemblers have their own mnemonics for some stuff (that the Intel manual only uses a single instruction).

Maaaaybe you can use godbolt.org to use an Intel-style assembler (like nasm) interactively and see the AT&T style disassembly on the side 🤔

in reply to mcc

i'll be honest with you, I don't think i've ever seen anything that isn't basically compiler guts use .macro, while the C preprocessor remains very popular for assembly files
in reply to mcc

You should totally write your own 64-bit Forth as a project 😉.
Unknown parent

Tutiluren

r10 looks like the content out of an ascii string to me. Are you accidentally passing its data instead of a pointer to it?

r10 contents as ascii: \n731\n

Unknown parent

mcc
@dysfun Can I still use gas macros?
Unknown parent

mcc
@joe @dysfun yes, it has something called .macro
Unknown parent

mcc
@dysfun i'm reasonably certain that's at&t format, the thing i'm trying to switch away from.
Unknown parent

in reply to mcc

@dysfun I've only used Intel syntax, and some of my instructions have the "size" tag on them. It's been a while, but IIRC it sometimes infers wrong and you have to make it explicit.
in reply to slembcke

@slembcke @dysfun but how do i know how to make it explicit if the assembly format is not documented…?
Unknown parent

mcc

@dysfun @slembcke This is the reference I've been using, but I don't know how to turn lines in an instruction page into a thing I type in the window?

felixcloutier.com/x86/mov

This for example does not clarify which things should literally be present and which are notation of the reference itself. Some of the lines contain non-typeable typesetting like superscripts.

Unknown parent

mcc

@MonniauxD the manual specifically says not to do this (it doesn't say what to do instead, but it says not to do it) and if i try to compile it, it says

src/number-echo-intel.s: Assembler messages:
src/number-echo-intel.s:21: Warning: mnemonic suffix used with `mov'
src/number-echo-intel.s:21: Warning: NOTE: Such forms are deprecated and will be rejected by a future version of the assembler

Unknown parent

mcc

@dysfun @slembcke There is a line

MOV r32, imm32

How do I write a 32-bit immediate? That is what I have been trying to figure out. If I write "mov eax, 60" gas prints "Error: ambiguous operand size for `mov'"

in reply to mcc

@dysfun @slembcke would you consider switching to NASM? it is widely available, it uses intel syntax, and it infers argument sizes (like it's supposed to).
in reply to azul

I'd consider switching to clang but I would not consider this for this particular project, no.
in reply to mcc

@dysfun I suspect you are missing a special character? In 6502 asm "lda 60" means load A from memory address 60. "lda #60" means load 60 into register A. I don't remember how that works, and the code I have open in front of me doesn't use any immediate values to check. 🙁
in reply to mcc

@dysfun @slembcke nasm is not clang but ok. anyway i bring this up because for intel syntax the only time you should have to specify an operand size is when you are doing an operation with only memory & immediate operands. otherwise it should be able to infer the size from the register names. it sounds like gas just doesn't support intel syntax properly...
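
A sketch of the one case where the size has to be spelled out, assuming gas with `.intel_syntax noprefix` (the function name `store60` is hypothetical, and rdi is assumed to hold a valid pointer). When neither operand is a register, nothing tells the assembler how many bytes to write, so gas expects a size keyword such as `byte ptr`, `word ptr`, `dword ptr`, or `qword ptr`:

```asm
    .intel_syntax noprefix
    .text
    .globl store60
store60:                         # hypothetical helper: rdi points at writable memory
    mov dword ptr [rdi], 60      # 32-bit store; without "dword ptr" the size is ambiguous
    mov byte ptr [rdi + 4], 60   # 8-bit store to the following byte
    mov eax, 60                  # no keyword needed: eax implies 32 bits
    ret
```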
Unknown parent

mcc
@unnick @dysfun @slembcke i had .intel_syntax at the top but had not included noprefix. this fixes several things. thanks.
in reply to azul

@typeswitch It turns out that if you type ".intel_syntax" at the top of a gas file, gas will not give you intel syntax OR AT&T syntax but some secret third thing they do not document. apparently i was supposed to say ".intel_syntax noprefix" to get the real intel syntax.
@azul
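
A minimal sketch of a complete gas file using the directive described above. It assumes x86_64 Linux, where syscall number 60 is exit; the file name in the build note is hypothetical. My understanding is that plain `.intel_syntax` on ELF targets still expects `%`-prefixed register names, so a bare `eax` is read as a symbol (a memory operand), which would explain the "ambiguous operand size" error reported in the thread.

```asm
    .intel_syntax noprefix       # Intel operand order, register names without %
    .text
    .globl _start
_start:
    mov eax, 60                  # syscall number for exit (x86_64 Linux assumed)
    xor edi, edi                 # exit status 0
    syscall
```

Saved as `exit.s`, this should assemble and link with `as -o exit.o exit.s` followed by `ld -o exit exit.o`.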
Unknown parent

Joe Groff
@dysfun does gas have macros other than running the C preprocessor? preprocessing should work regardless of the syntax mode
Unknown parent

mcc
@dysfun knowing these facts is entirely useless unless i know what document canonicalizes them
Unknown parent

David Monniaux
movq rax, 60 works with gas
in reply to mcc

have you put .intel_syntax noprefix somewhere at the top? if not, does it fix it?
Unknown parent

David Monniaux

nasm accepts

bits 64
toto:
mov rax, 60

Unknown parent

mcc
@dysfun This is helpful, thanks. Should I be worried however about the fact I am not writing x86, but rather x86_64?
Unknown parent

mcc

@tomjennings @TomF Clang was supposed to have been about cleaning up the by-convention morass that GNU had fallen into. I expect GNU to accept this state of affairs on the world's most popular PC platform and not try to change it for 25 years, but Clang I expect better!!

This is two weeks after I discover Clang's libunwind library has literally no api documentation at all.

Unknown parent

mcc
@erincandescent @TomF @tomjennings That's why Apple funded it and why Google jumped on board, but in my opinion it wouldn't have taken over the market *just* because corporations like it. I use clang/lldb preferentially even when GCC/gdb is an option because it's just cleaner, more intentional software.

Unknown parent

tom jennings
@TomF
Assembler syntaxes are peculiar to each assembler. There never was any standardization there. It's still the 1950s in there!
in reply to mcc

it was always a goal of clang to accept all of the same input that GCC (and Gas) did...
in reply to Erin 💽✨

I mean, clang became an actual thing because Apple needed a replacement for GCC 4.2 after FSF changed their license
Unknown parent

Tom Forsyth
@tomjennings That's the whole discussion. They do not have the *assembler syntax* which is a different thing.
Unknown parent

tom jennings

@TomF

I had all of those books, yards of them, as they came out. But no one but board designers used the data books; they were tediously hardware-oriented.

What we all used were the cheatcharts. Foldout instruction set summaries.
Oh how I wish I had my collection! I had probably 25, for chips I'd written code for. Weird stuff like 8x300. Cosmac. Weird little Intel and moto.

The cheatcharts are what you want. All dog-eared torn and coffee stained from use.

But the data book will have the official instruction descriptions and assembler mnemonics in excruciating detail.

@mcc

Unknown parent

Tom Forsyth
@tomjennings Let us know if you find it.
in reply to Tom Forsyth

@TomF @tomjennings I assert in this case we'd be transitioning from 0 standards to 1!!!
in reply to mcc

@tomjennings @TomF
API documentation? Let me think... Oh, yes, I vaguely recall, from back in the early days, that such things existed. How quaint!
🙁