The Exact Behavior of MOV

The intention of this long post is to summarize the behavior of mov-related instructions I have observed. This can be helpful for other students as a note as well as for me to check whether my notes are indeed correct. So, do not assume my notes are 100% correct and feel free to comment if you want to add something or fix errors in the post.

Constant -> Register

# %rax = 0x0011223344556677
movb $-1, %al   # %rax = 0x00112233445566FF
movw $-1, %ax   # %rax = 0x001122334455FFFF
movl $-1, %eax  # %rax = 0x00000000FFFFFFFF
movq $-1, %rax  # %rax = 0xFFFFFFFFFFFFFFFF

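The suffixes (b = byte = 8 bits, w = word = 16 bits, l = long = 32 bits, q = quad = 64 bits) indicate the operand size. Notice that movl, unlike the other sizes, zero-fills the upper 32 bits of %rax: on x86-64, any instruction that writes a 32-bit register clears bits 32-63 of the full 64-bit register. Otherwise, mov without z (zero-extend) or s (sign-extend) only overwrites the bytes it names and keeps the remaining bits of the register. I guess the movl exception is a design choice made when x86 moved from 32 to 64 bits.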
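To check the table above, here is a tiny Python model of the register-write rule, treating %rax as a 64-bit integer. It is only an illustration, not real assembly; the name `write_reg` is made up, and writes to %ah and friends are not modeled.

```python
MASK64 = (1 << 64) - 1  # keep results inside 64 bits


def write_reg(rax: int, value: int, bits: int) -> int:
    """Return the new %rax after writing `value` into its low `bits` bits."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # x86-64 rule: writing a 32-bit register zeroes bits 32..63
        return value & 0xFFFFFFFF
    # 8- and 16-bit writes keep the untouched upper bits
    mask = (1 << bits) - 1
    return (rax & ~mask & MASK64) | (value & mask)


rax = 0x0011223344556677
for suffix, bits in [("b", 8), ("w", 16), ("l", 32), ("q", 64)]:
    # -1 is masked down to the written width, like $-1 in the instructions
    print(f"mov{suffix} $-1 -> %rax = {write_reg(rax, -1, bits):#018x}")
```

Only the 32-bit case ignores the old value of %rax, which is exactly the movl special case.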

# %rax = 0x0011223344556677
mov $-1, %al   # %rax = 0x00112233445566FF
mov $-1, %ax   # %rax = 0x001122334455FFFF
mov $-1, %eax  # %rax = 0x00000000FFFFFFFF
mov $-1, %rax  # %rax = 0xFFFFFFFFFFFFFFFF

Notice that when mov has no suffix, the assembler infers the size from a register operand: from the destination if it is a register, otherwise (when the destination is memory) from the source register. If neither operand is a register (e.g., a constant moved to memory), the size cannot be inferred and the assembler rejects the instruction, so an explicit suffix is required.
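The inference rule can be written as a short decision procedure. This is a hypothetical sketch of the rule as described here, not how GNU as is actually implemented; the function name and the operand encoding are made up for illustration.

```python
# Hypothetical sketch of the suffix-inference rule; NOT the actual GNU as code.
# Each operand is ("reg", bits), ("mem", None) or ("imm", None).


def infer_mov_size(src, dst):
    """Return the operand size in bits, or raise ValueError if it is ambiguous."""
    src_kind, src_bits = src
    dst_kind, dst_bits = dst
    if dst_kind == "imm":
        raise ValueError("destination cannot be a constant")
    if src_kind == "mem" and dst_kind == "mem":
        raise ValueError("memory -> memory is not allowed")
    if dst_kind == "reg":
        # A register destination fixes the size; a register source must agree.
        if src_kind == "reg" and src_bits != dst_bits:
            raise ValueError("register sizes do not match")
        return dst_bits
    if src_kind == "reg":
        # Memory destination: take the size from the source register.
        return src_bits
    # Constant -> memory: nothing tells us the size.
    raise ValueError("no register operand; add a b/w/l/q suffix")


print(infer_mov_size(("imm", None), ("reg", 8)))   # mov $-1, %al
print(infer_mov_size(("reg", 32), ("mem", None)))  # mov %eax, (%rdi)
```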

Memory -> Register

Effect of Endianness

# Little Endian Machine (x86-64 is always little endian)
# Memory at (%rdi), lowest address first: CD AB FF EE DD CC BB AA
# %rax = 0x0011223344556677
movb (%rdi), %al   # %rax = 0x00112233445566CD
movw (%rdi), %ax   # %rax = 0x001122334455ABCD
movl (%rdi), %eax  # %rax = 0x00000000EEFFABCD
movq (%rdi), %rax  # %rax = 0xAABBCCDDEEFFABCD

# Big Endian Machine (hypothetical, for comparison)
# Memory at (%rdi), lowest address first: AA BB CC DD EE FF AB CD
# %rax = 0x0011223344556677
movb (%rdi), %al   # %rax = 0x00112233445566AA
movw (%rdi), %ax   # %rax = 0x001122334455AABB
movl (%rdi), %eax  # %rax = 0x00000000AABBCCDD
movq (%rdi), %rax  # %rax = 0xAABBCCDDEEFFABCD

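Notice that the suffix always tells the machine exactly how many bytes to read, starting at the lowest address. What endianness changes is which part of the full 8-byte value those bytes are: on a little-endian machine such as x86-64, a narrower load reads the least significant bytes, while on a big-endian machine it would read the most significant ones. The mov[z/s][src][dst] instructions solve a different problem: they move a smaller value into a larger destination and explicitly fill the extra bits with zeros (z) or with copies of the sign bit (s).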
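Here is a small Python model of these loads. It assumes the 8 bytes at (%rdi) are CD AB FF EE DD CC BB AA (lowest address first) for the little-endian case, and AA BB CC DD EE FF AB CD for a hypothetical big-endian machine that follows the same register-write rules. The function `load` is made up for illustration.

```python
MASK64 = (1 << 64) - 1


def load(memory: bytes, rax: int, nbytes: int, byteorder: str) -> int:
    """Model mov{b,w,l,q} (%rdi), <reg>: read `nbytes` bytes from the lowest
    address, interpret them in `byteorder`, and write them into %rax."""
    value = int.from_bytes(memory[:nbytes], byteorder)
    if nbytes >= 4:
        # 32- and 64-bit writes replace all 64 bits of %rax
        return value
    # 8- and 16-bit writes keep the upper bits of %rax
    mask = (1 << (8 * nbytes)) - 1
    return (rax & ~mask & MASK64) | value


little = bytes.fromhex("CDABFFEEDDCCBBAA")  # x86-64 layout of 0xAABBCCDDEEFFABCD
big = bytes.fromhex("AABBCCDDEEFFABCD")     # hypothetical big-endian layout
for n in (1, 2, 4, 8):
    print(n, f"{load(little, 0x0011223344556677, n, 'little'):#018x}",
          f"{load(big, 0x0011223344556677, n, 'big'):#018x}")
```

Both layouts give the same value for movq; only the narrower loads differ, because they pick up different ends of that value.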

Solution by Adding Suffixes

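There are a total of 5 + 6 such instructions:

Zero Extend

movzbw, movzbl, movzwl, movzbq, movzwq

Sign Extend

movsbw, movsbl, movswl, movsbq, movswq, movslq (plus cltq, which sign-extends %eax into %rax)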

(from Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective, 3rd ed., Pearson, 2016)

Exact Behavior of Suffix

(There is no movzlq instruction, and none is needed: movl with a 32-bit register destination already zeroes the upper 32 bits.)

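The zero-extension or sign-extension fills exactly the destination size, and the usual register-write rules then apply to the rest of %rax: a 16-bit destination (movzbw, movsbw) keeps the upper 48 bits, while any 32-bit destination (movzbl, movsbl, movzwl, movswl) zeroes the upper 32 bits. So movsbl also affects all 64 bits, just like movzbl: sign-extending the byte 0xFF into %eax leaves %rax = 0x00000000FFFFFFFF.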
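A minimal Python model of the extension instructions, assuming %rax starts as 0x0011223344556677 as in the earlier examples. The names `extend` and `movx` are hypothetical; the model covers register destinations only.

```python
MASK64 = (1 << 64) - 1


def extend(value: int, src_bits: int, signed: bool) -> int:
    """Zero- or sign-extend the low `src_bits` of `value` to a Python int."""
    value &= (1 << src_bits) - 1
    if signed and value >> (src_bits - 1):  # sign bit is set
        value -= 1 << src_bits
    return value


def movx(rax: int, value: int, src_bits: int, dst_bits: int, signed: bool) -> int:
    """Model movz/movs into a 16-, 32- or 64-bit register; return the new %rax."""
    ext = extend(value, src_bits, signed) & ((1 << dst_bits) - 1)
    if dst_bits == 16:
        # 16-bit destination: the upper 48 bits of %rax are preserved
        return (rax & ~0xFFFF & MASK64) | ext
    # 32-bit destination zeroes bits 32..63; 64-bit destination writes all bits
    return ext


rax = 0x0011223344556677
print(f"{movx(rax, 0xFF, 8, 32, signed=True):#018x}")   # movsbl
print(f"{movx(rax, 0xFF, 8, 32, signed=False):#018x}")  # movzbl
print(f"{movx(rax, 0xFF, 8, 16, signed=True):#018x}")   # movsbw
```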

Zero / Signed Variations

[Figure from USC Lectures]

Register -> Register

TODO: for mov, I guess it works like Constant -> Register, except that I am not sure about moving a register of size l into a register of size q. I am also not sure how mov[z/s] behave in this case. Good comments will be helpful.

Constant -> Memory

TODO: I have no idea how it works (i.e., if I put -1 there, will -1 be treated as a 64-bit constant or a 32-bit constant?). Good comments will be helpful.
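A possible answer, sketched as a Python model: with an explicit suffix, the suffix decides how many bytes are written, and for movq the encoded immediate is 32 bits wide and sign-extended to 64 bits. The function `store_imm` is hypothetical, assumes little-endian x86-64, and omits the range checks the assembler does for b/w/l.

```python
def store_imm(memory: bytearray, imm: int, suffix: str) -> None:
    """Model mov{b,w,l,q} $imm, (%rdi) on little-endian x86-64."""
    nbytes = {"b": 1, "w": 2, "l": 4, "q": 8}[suffix]
    if suffix == "q" and not -(1 << 31) <= imm < (1 << 31):
        # movq to memory can only encode a sign-extended 32-bit immediate
        raise ValueError("immediate does not fit in a sign-extended 32-bit field")
    # Write exactly nbytes bytes; bytes beyond them stay untouched
    memory[:nbytes] = (imm & ((1 << (8 * nbytes)) - 1)).to_bytes(nbytes, "little")


for s in "bwlq":
    mem = bytearray(b"\x11" * 8)
    store_imm(mem, -1, s)
    print(f"mov{s} $-1, (%rdi) ->", mem.hex())
```

So -1 becomes 0xFF, 0xFFFF, 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF depending on the suffix, and without a suffix the assembler cannot choose.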

Register -> Memory

It writes exactly as many bytes as the source register holds (1, 2, 4 or 8), and the memory beyond them is left untouched. There is no ambiguity.
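A small Python model of a register store, assuming little-endian x86-64 and %rax = 0x0011223344556677. The name `store_reg` is made up for illustration.

```python
def store_reg(memory: bytearray, rax: int, suffix: str) -> None:
    """Model mov{b,w,l,q} %al/%ax/%eax/%rax, (%rdi) on little-endian x86-64."""
    nbytes = {"b": 1, "w": 2, "l": 4, "q": 8}[suffix]
    low = rax & ((1 << (8 * nbytes)) - 1)  # only the named sub-register is stored
    memory[:nbytes] = low.to_bytes(nbytes, "little")  # later bytes untouched


rax = 0x0011223344556677
for s in "bwlq":
    mem = bytearray(8)
    store_reg(mem, rax, s)
    print(f"mov{s} -> memory bytes:", mem.hex())
```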

Memory -> Memory

Not allowed: mov cannot take two memory operands, so the value has to go through a register.
