Posted by TheGun ( | 202.12.144.19) on June 25, 1999 at 02:33:21:
In Reply to: asm3.htm posted by TheGun on June 25, 1999 at 02:31:19:
This document covers the opcodes native to the 65816 cpu. If you think you can skim over this section once or twice, then come back for reference when you need it, you won't get very far at all.
Even though some opcodes are more important (read common) than others, you are advised to study this entire listing many times over. Not having a thorough understanding of opcodes when you are reading or writing code will cause problems. Granted there's a lot of information here, but assembly is as much the ability to memorize and regurgitate (think typical high school busy-work) as any other talent.
Until now, the words opcode and instruction have been used fairly interchangeably. From now on, the correct terminology will be used - an instruction is a category of opcodes (such as LDA), but an opcode is a single instruction bound to one addressing mode (LDA Absolute). In other words, ADC is an instruction as the addressing mode is not specified, but ADC #$1234 is an opcode with operand. No two opcodes are the same - each addressing mode of each instruction has its own hex code specifying such. For example:
LDA #$1234 ; in a hex editor, this would appear as A9 34 12
LDA $1234 ; in a hex editor, this would appear as AD 34 12
As you should see, the two operations above use the same instruction (LDA), but appear differently in a hex editor - LDA Immediate being A9h and LDA Absolute being ADh. A9h and ADh are two different opcodes, but share the same instruction.
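To make the instruction/opcode split concrete, here is a rough Python sketch of how an assembler might look up the opcode byte. The table name OPCODES and the function encode are made up for illustration, and only the two opcodes quoted above are included.

```python
# Hypothetical lookup table: (instruction, addressing mode) -> opcode byte.
# Only the two opcodes quoted in the text are listed here.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("LDA", "absolute"): 0xAD,
}

def encode(instruction, mode, operand):
    """Return the opcode byte followed by a 2 byte operand, low byte first."""
    opcode = OPCODES[(instruction, mode)]
    return bytes([opcode, operand & 0xFF, (operand >> 8) & 0xFF])

def as_hex(data):
    """Format bytes the way a hex editor would show them."""
    return " ".join(f"{b:02X}" for b in data)

print(as_hex(encode("LDA", "immediate", 0x1234)))  # A9 34 12
print(as_hex(encode("LDA", "absolute", 0x1234)))   # AD 34 12
```

Same instruction, same operand bytes, different opcode byte - the addressing mode is the only thing that changed.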
This section will have the following format:
Instruction | Description | Examples | Flags Affected
* Exceptions
This borrows greatly from other documents, residing at 6502.txt and *****, whose authors I am unsure of. Whoever created them, I would like to say now that their efforts are greatly appreciated.
The instructions are grouped so that similar opcodes follow each other. It's a bit more logical than alphabetical sorting.
Of great importance is the Addressing Mode table. The addressing modes listed there are the only possible functions of that instruction. If you wrote LDX $800000 and tried to assemble it, your assembler would (hopefully) give you an invalid operand error. That is because LDX Absolute Long is just not possible - that instruction was not allocated a hex code when the official spec was written, so it didn't make its way into the chip. Basically, if you want to use a certain instruction in your code, make sure the addressing mode you're trying to use is valid.
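The check an assembler does here can be sketched in a few lines of Python. This is only an illustration: VALID_MODES and is_valid are hypothetical names, and the mode sets are copied from the LDA and LDX tables in this document.

```python
# Mode sets copied from the LDA and LDX addressing mode tables in this document.
VALID_MODES = {
    "LDA": {"#", "ab", "ab,x", "ab,y", "abl", "abl,x", "d", "d,x",
            "(d)", "(d),y", "[d]", "[d],y", "(d,x)", "d,s", "(d,s),y"},
    "LDX": {"#", "ab", "ab,y", "d", "d,y"},
}

def is_valid(instruction, mode):
    """True if the instruction was given an opcode for this addressing mode."""
    return mode in VALID_MODES[instruction]

print(is_valid("LDA", "abl"))  # True  - LDA $C01000 assembles
print(is_valid("LDX", "abl"))  # False - LDX $800000 is an invalid operand
```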
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax LDA #$50 LDA $8000 LDA $8000,X LDA $8000,Y LDA $C01000 LDA $C01000,X LDA $01 LDA $01,X LDA ($50) LDA ($50),Y LDA [$03] LDA [$03],Y LDA ($80,X) LDA $03,S LDA ($03,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
Addressing Mode # ab ab,y d d,y | Syntax LDX #$80 LDX $8000 LDX $8000,Y LDX $04 LDX $04,Y | Bytes 2* 3 3 2 2 |
Addressing Mode # ab ab,x d d,x | Syntax LDY #$80 LDY $8000 LDY $8000,X LDY $04 LDY $04,X | Bytes 2* 3 3 2 2 |
Addressing Mode ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax STA $1000 STA $1000,X STA $1000,Y STA $7E0000 STA $7E0000,X STA $03 STA $03,X STA ($06) STA ($06),Y STA [$10] STA [$10],Y STA ($10,X) STA $01,S STA ($01,S),Y | Bytes 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
Addressing Mode ab d d,y | Syntax STX $2000 STX $97 STX $97,Y | Bytes 3 2 2 |
Addressing Mode ab d d,x | Syntax STY $F000 STY $0A STY $0A,X | Bytes 3 2 2 |
ADC - Add with Carry
This instruction adds the operand onto the value in A, and also adds the Carry flag (hence Add with Carry). You may remember that the carry flag is set (amongst other circumstances) when an addition results in a number larger than the A register can hold. This quality can be used to obtain addition results larger than 2 bytes - after adding 2 values, if the carry flag is set you know the answer's greater than FFFFh. As always, though, the size of the numbers added depends on the M flag - if it's set to 1, you can only add 1 byte from the operand onto A's lower byte, giving addition results from 0 to FFh (1FEh using the carry bit). When M=0, addition can give answers from 0 to FFFFh (1FFFEh using a carry).
Since the carry bit is always added, it is customary (and strongly advised) that this flag is cleared before using ADC. This is done with the CLC (Clear Carry) opcode.
DB = C0h
S = 01FFh
M flag = 0
X flag = 0
PHX ; push 2 bytes from X onto the stack (at locations $0001FF and $0001FE)
CLC ; make sure the Carry flag is clear (0)
ADC $01,s ; add the A register, the carry flag and the 2 bytes at $0001FE
PLX ; pull X back off the stack
LDA #$0100 ; A = 0100h
CLC ; Carry flag = 0
ADC $8000 ; A now equals 8100h
Flags:
N (Negative)
LDA #$7000
CLC
ADC #$8000 ; the high bit of A will become set after this operation
V (Overflow)
LDA #$7000
CLC
ADC #$7000 ; A and the operand are positive but the result's negative - a signed overflow is triggered
Z (Zero)
LDA #$8000
CLC
ADC #$8000 ; since A will now be zeroed, the Zero flag becomes set
; (the carry and overflow flags would also be set here)
C (Carry)
LDA #$F000
CLC
ADC #$2000 ; doing this sum on paper would give you a carry after the highest bit (10000h)
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax ADC #$80 ADC $1000 ADC $1000,X ADC $1000,Y ADC $C11000 ADC $C11000,X ADC $09 ADC $09,X ADC ($0B) ADC ($0B),Y ADC [$0D] ADC [$0D],Y ADC ($0B,X) ADC $01,S ADC ($01,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
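If you want to check the flag examples for ADC by hand, a small Python model helps. This is a sketch of binary-mode ADC only (decimal mode is ignored), and the function name adc and its arguments are made up for this example.

```python
def adc(a, operand, carry, m_flag=0):
    """Model of ADC with the Decimal flag clear. Returns (new A, flags)."""
    bits = 8 if m_flag else 16
    mask = (1 << bits) - 1          # FFh or FFFFh
    sign = 1 << (bits - 1)          # 80h or 8000h
    a &= mask
    operand &= mask
    total = a + operand + carry
    result = total & mask
    flags = {
        "N": int(bool(result & sign)),
        # signed overflow: both inputs share a sign that the result does not
        "V": int(bool(~(a ^ operand) & (a ^ result) & sign)),
        "Z": int(result == 0),
        "C": int(total > mask),
    }
    return result, flags

# The four flag examples for ADC, each preceded by CLC (carry = 0)
for a, operand in [(0x7000, 0x8000), (0x7000, 0x7000),
                   (0x8000, 0x8000), (0xF000, 0x2000)]:
    result, flags = adc(a, operand, carry=0)
    print(f"{a:04X} + {operand:04X} = {result:04X} {flags}")
```

Passing carry=1 instead shows why the CLC matters: every result comes out one higher than you expected.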
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax SBC #$80 SBC $0100 SBC $0100,X SBC $0100,Y SBC $808100 SBC $808100,X SBC $77 SBC $77,X SBC ($88) SBC ($88),Y SBC [$99] SBC [$99],Y SBC ($A0,X) SBC $01,S SBC ($01,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
Addressing Mode A ab ab,x d d,x | Syntax ASL A ASL $8000 ASL $8000,X ASL $90 ASL $90,X | Bytes 1 3 3 2 2 |
LSR - Logical Shift Right
Similar to ASL, this instruction shifts the operand right by one bit, effectively halving and rounding down the source. The lowest bit of the operand is shifted into the Carry flag, giving similar bitplane uses to ASL. The highest bit of the operand is always made zero after LSR is executed. It is useful that the Carry flag is altered by this instruction, as it allows you to divide by powers of 2 with a remainder. For example:
D = 0000h
$000080 = 00h
$000081 = FFh
M flag = 0
LDA $80 ; A = %1111111100000000 = FF00h, Carry flag unknown
LSR A ; A = %0111111110000000 = 7F80h, Carry flag is cleared
LDA #$0100 ; A = %0000000100000000 = 0100h, Carry flag unknown
LSR A ; A = %0000000010000000 = 0080h, Carry flag is cleared
The actual code involved in this division/remainder use for LSR is a bit too complex for this section, but will be covered in a later section.
Flags:
N (Negative)
Since the highest bit is cleared, the N flag is -always- set to 0 by LSR
Z (Zero)
LDA #$0001
LSR A ; the only set bit will be shifted into the Carry flag, leaving A as 0000h
C (Carry)
LDA #$FFFF
LSR A ; the lowest bit (which is set) is moved into the Carry flag
Addressing Mode A ab ab,x d d,x | Syntax LSR A LSR $1000 LSR $1000,X LSR $05 LSR $05,X | Bytes 1 3 3 2 2 |
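The divide-with-remainder idea behind LSR can be modelled in Python. This is not the 65816 routine the later section promises, just a sketch of the arithmetic; lsr and divide_by_power_of_two are hypothetical names.

```python
def lsr(value, m_flag=0):
    """Shift right one bit. Returns (result, carry) - carry is the bit shifted out."""
    mask = 0xFF if m_flag else 0xFFFF
    value &= mask
    return value >> 1, value & 1

def divide_by_power_of_two(value, shifts):
    """Divide by 2**shifts, collecting each shifted-out bit as the remainder."""
    remainder = 0
    for i in range(shifts):
        value, carry = lsr(value)
        remainder |= carry << i     # the first bit out is the lowest remainder bit
    return value, remainder

print(lsr(0xFF00))                      # (32640, 0) -> 7F80h, carry clear
print(divide_by_power_of_two(29, 2))    # (7, 1)     -> 29 / 4 = 7 remainder 1
```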
Addressing Mode A ab ab,x d d,x | Syntax ROL A ROL $1200 ROL $1200,X ROL $03 ROL $03,X | Bytes 1 3 3 2 2 |
Addressing Mode A ab ab,x d d,x | Syntax ROR A ROR $1000 ROR $1000,X ROR $09 ROR $09,X | Bytes 1 3 3 2 2 |
PHA - Push A
PHA, and the next 6 push instructions, are all extremely similar. PHA stores the contents of the A register (1 or 2 bytes, depending on the M flag) at the memory location pointed to by the S register. This action changes the value of the S register to point to the next free byte on the stack.
S = 01FFh
M flag = 0
LDA #$7700
PHA ; 77h is stored at $0001FF, 00h at $0001FE (the high byte is pushed first)
; S register now contains 01FDh ( S = S - # of bytes pushed )
No flags are affected by the Push instructions.
Since all the other push instructions work the same as PHA, here's a brief listing of what they push and how many bytes end up on the stack:
PHB - DB register, 1 byte
PHD - D register, 2 bytes
PHK - PB register, 1 byte
PHP - P register, 1 byte
PHX - X register, 1 or 2 bytes (depending on the X flag)
PHY - Y register, 1 or 2 bytes (depending on the X flag)
The reason for pushing the PB register may not be obvious, as PB can only be modified with a jump-style instruction. It's most often used to make sure the DB register points to the same bank as PB, so any addressing modes that are DB sensitive will load from the bank whose code is currently being run.
PLA - Pull A
PLA, and the next 5 pull instructions, are also extremely similar to one another (surprised?). PLA loads into A either 1 or 2 bytes from the S register + 1 (+1 because S points to the next free byte). The S register is then incremented by the number of bytes pulled.
S = 01FFh
M flag = 1
PHB ; The DB register is stored at $0001FF, the Stack register is decremented by 1
PLA ; load a byte from S+1, which is the value of DB we just pushed
That bit of code allows us to transfer DB to A, something not possible with the set of transfer instructions.
You should remember that the push instructions don't alter any flags - however the pull instructions do. For most of these instructions, the flags altered are the same as those for a typical LDA instruction, since after all a PLA is effectively doing LDA (S+1).
Flags affected by PLA, PLB, PLD, PLX and PLY:
N (Negative)
LDA #$80 ; Negative flag is set
PHA ; flags unchanged
LDA #$10 ; Negative flag is cleared
PLB ; Negative flag is set again - DB now contains #$80
Z (Zero)
LDA #$00 ; Zero flag is set
PHA ; Zero flag unchanged
INC A ; Zero flag cleared (A = 01h)
PLA ; Zero flag set again (A = 00h)
The PLP instruction can alter any and all of the flags in the P register - since you're simply pulling a byte off the stack and sticking it in P:
M flag = 1
LDA #$20
PHA
PLP ; bit 5 of P is now set, all other flags are cleared.
PLP cannot alter the emulation bit, however, as that bit is only changeable with the XCE instruction.
BCC - Branch on Carry Clear
The branch instructions are (almost all) conditional operators - they alter which course your code takes depending on conditions specified by the P register. These instructions are immensely important to assembly on any cpu - think how limited code would be that couldn't say "If this, run code x, if that, run code y". Understanding these instructions is vital to any 65816 assembly work you wish to do.
DB = 00h
$000800 = 40h
M flag = 1
LDA $0800 ; A = 40h
ASL A ; Carry flag becomes clear (high bit moved into carry)
BCC SomeCode ; if the Carry flag is clear, branch to the code label SomeCode
LDA #$05 ; if the Carry flag was set, this code would be executed as the above branch would fail
SomeCode ; this is a code label - the assembler uses these to figure out where branches go
STA $0800 ; Store A at $000800. The value stored depends on whether the BCC was successful.
In this example, the ASL A affected the carry flag. If that flag became clear, the BCC SomeCode would make the CPU jump to SomeCode. If the flag became set, the CPU would look at the BCC instruction and completely ignore it, going straight on to the next instruction. Here we see a basic 'if' statement - if the highest bit of $000800 was set, store #$05 at $000800. If the highest bit was clear, store the shifted value there instead.
No flags are affected by the branch instructions, whether they succeed or fail.
BCS - Branch on Carry Set
BCS is extremely similar to BCC, only the branching condition is reversed. This is helpful for similar conditions to BCC, such as bitplanes, as well as extended addition. By extended addition, I mean calculating sums that would otherwise be too large for the A register to express:
$000080 = 00h
$000081 = 80h
D = 0000h
M flag = 0
LDA $80 ; A = 8000h
CLC
ADC #$8000 ; A = 0000h, Carry flag becomes set
STA $80
BCS Carry ; if the carry flag became set, jump to the code label Carry
RTS ; if the carry flag was clear, this code would have been executed
Carry
INC $82 ; if a carry occurred, increment the 2 bytes above $80
RTS ; RTS causes a subroutine to return to the code that called it
This code isn't the most efficient that could have been written, but it serves the purpose. Here, a number has 8000h added to it, then if the result is greater than FFFFh (carry flag set, in other words), the 2 bytes at $82 are incremented. Remember the reverse byte ordering - $80 is the lowest byte, $81 the high byte. This can be extended to say $82 is an even higher byte, and $83 is higher still. This code actually allows 32 bit addition - though some extra code would be needed for it to be actually useful.
BEQ - Branch if Equal
BEQ, along with its sister function BNE, branches depending on the status of the Zero flag. If the zero flag becomes set by some action, BEQ will succeed (jump to new code, in other words). If the zero flag is clear, the branch will fail and the cpu will continue processing like the opcode wasn't there. This instruction is useful for seeing if a variable is zero or not (duh). It's useful for things like joypad testing, where values of zero mean nothing is happening:
$004218 = 00h ; these memory locations are SNES registers - not like normal memory
$004219 = 00h
M flag = 0
LDA $004218 ; $004218 returns the status of player 1's joypad
BEQ NoAction ; if no buttons have been pressed, don't do any joypad processing
; insert joypad response code here ;
NoAction ; code label
RTS ; return to calling code
If any of the bits in A had become set, the BEQ would have failed and joypad processing would have occurred. As it happens, no buttons had been pressed so the joypad processing was skipped altogether.
BNE - Branch if Not Equal
The companion to BEQ, this opcode will branch if the zero flag is cleared. This is useful for the same reasons as BEQ - it's pretty much a personal decision which to use (though sometimes one makes more sense than the other does). BNE is especially useful for loops, where a value is continuously being counted down.
DB = 7Eh
M flag = 1
X flag = 1
LDA #$00 ; set up the A, X and Y registers
LDX #$08
LDY #$00
Repeat ; another label - all these do to your code is make it more readable
STA $8000,y ; store 00h at $7E8000+Y
INY ; add 1 to Y
DEX ; subtract 1 from X
BNE Repeat ; if X hasn't reached 0 yet, loop back to Repeat
The loop at Repeat will cycle through 8 times, storing 8 copies of 00h at $7E8000. It is a good example of what the index registers are designed for - Y is indexing the storing of values, and X is counting down the loop.
BMI - Branch if Minus
As this instruction tests the setting of the N flag, it is useful for both detecting negative numbers and quickly testing the high bit of a variable. If you decide to use 7 bit values for your text, with the sign bit denoting a special action or substring, you could have code like the following:
M flag = 1
LDA $118400,x ; load a byte from $118400+X - N flag is set if it's negative
BMI Special ; if the highest bit is set, jump to the label Special
; normal text code ;
Special ; code label
; special char code ;
Hopefully you understand the concept of branching now - if the condition for the branch is met (in this case, if the N flag is set), the cpu jumps to wherever the branch points. If the condition fails, the cpu continues on to the next instruction following the branch.
BPL - Branch if Plus
This instruction is the opposite of BMI - it branches if the N flag is clear (the last action gave a positive result). This is useful for seeing if the high bit is clear, so it lends itself to waiting for the snes to reach its VBlank. The VBlank is the period when you can safely update the on-screen graphics, as the snes has finished drawing a frame and is waiting for the electron gun to get back to the top of the screen.
M flag = 1
TestVBL
LDA $004210 ; the high bit of this register is set if the VBlank period has been reached
BPL TestVBL ; keep loading $004210 until the high bit is set
In this very common loop, the PPU register $4210 is continuously tested to see if its high bit is set - at which point you can safely update vram/sprites.
BRA - Branch Always
Contrary to the previous conditional branch operations, this opcode forces the cpu to jump without testing any of the P register's flags - hence the name. This is useful for cleaning up after other branches, as sometimes you want your code to continue past another, conditional section:
M flag = 1
LDA $7E9011 ; load a variable
BMI SomeCode ; if the high bit's set, jump to SomeCode
; if the high bit was clear, this code is executed:
; insert unimportant code ;
BRA CleanUp ; now we jump to CleanUp
SomeCode
; more unimportant code ;
CleanUp ; whether or not the BMI was successful, this code is run
; code that was required either way ;
RTS
If the BRA statement wasn't in there, as soon as the code following the BMI was run, the cpu would have continued on to run whatever is at SomeCode, which is often not desirable. The BRA statement lets us bypass SomeCode and go straight to CleanUp.
BVC - Branch on Overflow Clear
I've never actually had to use this instruction, or its mirror image BVS, so it's not too easy to think up an example. Basically it just branches if the V flag is clear, which can be done by a myriad of actions.
BVS - Branch on Overflow Set
See BVC.
BRL - Branch Long
This instruction is exactly the same as BRA, only you can branch further. If you remember the addressing modes (as you surely do :) all the branch instructions have a 1 byte signed operand, letting them jump a maximum of 128 bytes backwards or 127 bytes forwards in your code. The BRL (and PER) instruction allows a 2 byte operand letting you jump 32768 bytes backwards or 32767 bytes forwards. Although that makes it almost identical in functionality to the JMP instruction, remember the operand is relative to the current location, so there's nothing stopping you copying your code in a hex editor, pasting it somewhere else and still having it run properly - something a JMP instruction would merrily crash.
Whether you use BRA or BRL in your code pretty much depends on what kind of errors you get - if you assemble your code and get "error - branch out of range" all over the place, you'll need to either optimize your code or stick in a few BRL's here and there.
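The "branch out of range" error comes down to whether the distance fits in the signed operand. Here's a rough Python sketch of that calculation; branch_operand is a hypothetical name, the offset is assumed to be measured from the instruction after the branch, and bank wrapping is ignored.

```python
def branch_operand(branch_address, target_address, long_branch=False):
    """Return the operand bytes for a branch at branch_address to target_address."""
    size = 3 if long_branch else 2                 # opcode byte + operand bytes
    offset = target_address - (branch_address + size)
    low, high = (-32768, 32767) if long_branch else (-128, 127)
    if not low <= offset <= high:
        raise ValueError("error - branch out of range")
    width = 2 if long_branch else 1
    return offset.to_bytes(width, "little", signed=True)

print(branch_operand(0x8000, 0x8081).hex())                    # 7f   (127 forwards)
print(branch_operand(0x8000, 0x7F82).hex())                    # 80   (128 backwards)
print(branch_operand(0x8000, 0x9000, long_branch=True).hex())  # fd0f (needs BRL)
```

Because only the distance is stored, the same operand bytes still work if the whole routine is moved somewhere else - which is the point made above about copying code around in a hex editor.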
Addressing Mode axy | Syntax MVN $7E,$7F | Bytes 3 |
Addressing Mode axy | Syntax MVP $90,$7E | Bytes 3 |
Addressing Mode ab ab,x d d,x | Syntax STZ $8000 STZ $1000,X STZ $80 STZ $50,X | Bytes 3 3 2 2 |
XBA - Exchange B with A
This instruction doesn't do anything to the DB register, it's just an operand-free way to swap the high and low bytes of A. The need to swap the low and high bytes of A (known as A and B for this instruction alone) pops up every now and then, so it's worth knowing about. When the M flag is set to 1, it's useful to store a temporary byte variable with XBA (the high byte of A will be otherwise untouchable, much like pushing it onto the stack). Will wonders never cease.
Another name for the A register stems from this instruction - 'C' denotes A as being 2 bytes (A = 1 byte, B = 1 byte, C = 2 bytes). A being called C rears its ugly head in the register transfer instructions (TCD instead of TAD).
Flags:
N (Negative)
LDA #$8000 ; let's assume the M flag is 0 for now
XBA ; A is now 0080h - the N flag is set from the high bit of A's new low byte
Z (Zero)
LDA #$00FF
XBA ; even if the M flag is set to 0, the zero flag is set if A's low byte becomes zero
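A quick Python model of the byte swap, for checking results by hand. The name xba is borrowed from the instruction; the model assumes, as the Zero flag example says, that the flags are taken from A's new low byte whatever the M flag is.

```python
def xba(c):
    """Swap the high and low bytes of the 2 byte accumulator. Returns (new C, flags)."""
    c &= 0xFFFF
    swapped = ((c & 0xFF) << 8) | (c >> 8)
    low = swapped & 0xFF
    # Assumption: N and Z always reflect the new low byte only
    flags = {"N": int(low >= 0x80), "Z": int(low == 0)}
    return swapped, flags

result, flags = xba(0x00FF)
print(f"{result:04X}", flags)   # FF00 {'N': 0, 'Z': 1}
```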
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax CMP #$87 CMP $8000 CMP $8000,X CMP $8000,Y CMP $7E3000 CMP $7E3000,X CMP $03 CMP $03,X CMP ($06) CMP ($06),Y CMP [$09] CMP [$09],Y CMP ($0C,X) CMP $01,S CMP ($01,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
Addressing Mode # ab d | Syntax CPX #$89 CPX $1200 CPX $12 | Bytes 2* 3 2 |
CPY - Compare with Y
This instruction is the same as CPX in every way - even addressing modes - except for the fact that it focuses on the Y register. The number of bytes Y is compared against (and taken into account by the subtraction) is governed by the X flag. For examples/flags/addressing modes, see CPX.
NOP - No Operation
This single-byte instruction simply eats up clock cycles - 2 to be exact. It doesn't alter any flags or any registers at all - just takes 2 clock cycles to run. NOP is most useful for time-sensitive hardware-related issues, such as multiplication. In the world of snes hardware multiplication/division (there is none built into the 65816, so nintendo added several registers capable of these functions), you have to wait 15 or so clock cycles after you store the values to be computed, so a common way to waste that time is with NOP. In terms of hacking, though, NOP is a useful way to clear out unwanted checksum calculation, copy-protection routines or other unwanted code.
BRK, COP and STP
These 3 instructions, BRK (Break), COP (Coprocessor) and STP (Stop) are completely and utterly useless in the SNES universe - they are simply remnants carried over from the fact that the 65816 was actually used in real computers, computers that needed these extra interrupts. If you really want to learn about these instructions, consult the all-knowing, all-seeing EPR.
CLC - Clear Carry Flag
This handy little instruction clears the Carry flag of the P register. Useful for setting up addition, and not a heck of a lot else.
CLD - Clear Decimal Flag
This clears the Decimal flag, thus leaving the snes in the good wholesome state of hexadecimal arithmetic.
CLI - Clear Interrupt Disable Flag
By clearing the Interrupt Disable flag in the P register, you allow interrupts to take control of the CPU when they are triggered. More specifically, you allow the cpu to jump to the IRQ vector if you enabled the Horizontal or Vertical Interrupts. The NMI that fires when you reach the Vertical Blank (scanline 224 in NTSC mode) is non-maskable, so this flag has no say over it - it is switched on and off through register $4200 instead. The actual usage of interrupts is a bit complex to explain here, and will be covered later.
CLV - Clear Overflow Flag
This clears the Overflow flag, which is only ever much use if you're attempting signed addition/subtraction (remember you trigger signed overflows when adds/subs overflow the high bit of A).
SEC - Set Carry Flag
Setting the Carry flag is always advised before a SBC instruction, and apart from that it's also useful for the ROR/ROL instructions to move a 1 into a variable's top or bottom bit.
SED - Set Decimal Flag
Setting the Decimal Flag to 1 invokes the 65816's decimal mode, where ADC and SBC treat their operands as the bastard child of decimal and hex (binary coded decimal - each nibble holds one decimal digit, so 0099h + 0001h = 0100h in decimal mode). Loads and stores are not affected, and nothing is converted for you. There are some remote uses for decimal mode, such as keeping a score that gets printed on the screen (each nibble is already a decimal digit), but not much else.
AND is a useful way to isolate certain parts of a value - AND #$0F will leave the low nibble of a variable in A for example.
And, of course, the number of bytes you AND with depends on the M flag.
$000080 = 45h
M flag = 1
D = 0000h
LDA $80 ; A = 45h
AND #$40 ; A = 40h (01000101b & 01000000b)
BNE Bit6Set ; if a bit remains set, branch (zero flag clear)
Here, we AND a variable in A with 40h, which will leave either bit 6 set or all bits clear. Testing individual bits is a typical use for AND.
Flags:
Z (Zero)
LDA #$FF ; all bits set
AND #$00 ; all bits clear -> Z flag set (11111111b & 00000000b)
N (Negative)
LDA #$FF ; all bits set -> N flag set
AND #$7F ; high bit cleared -> N flag cleared
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax AND #$0F AND $9000 AND $9000,X AND $9000,Y AND $819000 AND $819000,X AND $03 AND $03,X AND ($06) AND ($06),Y AND [$F0] AND [$F0],Y AND ($70,X) AND $03,S AND ($03,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
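The bit-testing use of AND is easy to try out in Python. The function and_a below is a made-up name for a model that returns the new A plus the two flags AND touches.

```python
def and_a(a, operand, m_flag=1):
    """Model of AND: returns (new A, flags). m_flag=1 means A is 1 byte wide."""
    mask = 0xFF if m_flag else 0xFFFF
    sign = 0x80 if m_flag else 0x8000
    result = a & operand & mask
    return result, {"N": int(bool(result & sign)), "Z": int(result == 0)}

a, flags = and_a(0x45, 0x40)
print(f"{a:02X}", flags)          # 40 {'N': 0, 'Z': 0} -> BNE Bit6Set would branch
a, flags = and_a(0x45, 0x0F)
print(f"{a:02X}", flags)          # 05 {'N': 0, 'Z': 0} -> low nibble isolated
```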
ORA can be used to combine two variables (read joypad 1 and 2 at the same time, for example), as well as more advanced functions such as overlapping font tiles (variable width fonts, in other words). Also, the number of bytes computed depends on the M flag.
M flag = 0
LDA $004218 ; load player 1's joypad information (2 bytes)
ORA $00421A ; combine with player 2's joypad information (2 bytes)
That piece of code will let the game read joypad information whether the player is using joypad 1, 2 or both at once. Final Fantasy II is an example of this. ORA is also useful for setting individual bits in A, as ORA #$80 will set the negative bit of A, for instance.
Flags:
Z (Zero)
LDA #$00 ; zero flag set
ORA #$FF ; zero flag cleared - A = FFh
N (Negative)
LDA #$01
ORA #$80 ; sets high bit in A -> N flag is set
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax ORA #$80 ORA $1000 ORA $1000,X ORA $1000,Y ORA $7E9000 ORA $7E9000,X ORA $43 ORA $43,X ORA ($46) ORA ($46),Y ORA [$90] ORA [$90],Y ORA ($00,X) ORA $01,S ORA ($01,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
This instruction is mostly used to get the twos-complement of a variable for the addition of negative values. The number of bytes EOR affects depends on the M flag.
M flag = 0
LDA $80 ; contains number of bytes to go backwards
EOR #$FFFF ; this and the INC perform 2's complement
INC A
CLC
ADC $82 ; subtract offset from $82
That somewhat cryptic code will make sense to people familiar with binary math, but not many others. EOR does have its uses, though since this is an assembly document, not a dissection of algorithms, it won't be discussed here.
Flags:
Z (Zero)
LDA #$FF ; zero flag cleared
EOR #$FF ; all bits in A are flipped -> A = 00h and Z flag is set
N (Negative)
LDA #$01 ; N flag is cleared
EOR #$80 ; flips high bit in A -> N flag is set in this case
Addressing Mode # ab ab,x ab,y abl abl,x d d,x (d) (d),y [d] [d],y (d,x) d,s (d,s),y | Syntax EOR #$FF EOR $1E00 EOR $1E00,X EOR $1E00,Y EOR $C01000 EOR $C01000,X EOR $04 EOR $04,X EOR ($07) EOR ($07),Y EOR [$09] EOR [$09],Y EOR ($0A,X) EOR $01,S EOR ($01,S),Y | Bytes 2* 3 3 3 4 4 2 2 2 2 2 2 2 2 2 |
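For anyone not familiar with the binary math behind the EOR / INC trick, here it is in Python. The names negate16 and subtract_by_adding are hypothetical; the model assumes 2 byte values (M flag = 0).

```python
def negate16(value):
    """Two's complement of a 2 byte value: flip every bit (EOR #$FFFF), then add 1 (INC A)."""
    return ((value ^ 0xFFFF) + 1) & 0xFFFF

def subtract_by_adding(base, offset):
    """Adding the negated offset (CLC / ADC) gives base - offset, wrapped to 2 bytes."""
    return (negate16(offset) + base) & 0xFFFF

print(f"{negate16(0x0010):04X}")                     # FFF0
print(f"{subtract_by_adding(0x1000, 0x0010):04X}")   # 0FF0
```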
Addressing Mode # ab ab,x d d,x | Syntax BIT #$C0 BIT $1000 BIT $1000,X BIT $04 BIT $04,X | Bytes 2* 3 3 2 2 |
Addressing Mode ab d | Syntax TSB $C000 TSB $04 | Bytes 3 2 |
Addressing Mode ab d | Syntax TRB $C000 TRB $04 | Bytes 3 2 |
Addressing Mode A ab ab,x d d,x | Syntax INC A INC $1100 INC $1100,X INC $20 INC $20,x | Bytes 1 3 3 2 2 |
INX - Increment X
INX is ridiculously simple - it adds 1 to the X register. That's about it, really. Whether 1 or 2 bytes in X are affected depends on the X flag's setting.
M flag = 1
X flag = 0
LDX #$0000
LDY #$1000
Repeat
STZ $2000,x ; zero 1000h bytes at $2000
INX
DEY
BNE Repeat
Flags:
Z (Zero)
LDX #$FFFF ; zero flag cleared
INX ; X = 0000h, zero flag set
N (Negative)
LDX #$7FFF ; N flag is clear
INX ; X = 8000h, N flag is set
INY - Increment Y
See INX - this instruction works exactly the same.
Addressing Mode A ab ab,x d d,x | Syntax DEC A DEC $1000 DEC $1000,X DEC $00 DEC $00,X | Bytes 1 3 3 2 2 |
DEX - Decrement X
Decrement X does exactly what you think it should - subtracts 1 from X. The number of bytes in X affected depends on the X flag.
M flag = 0
X flag = 0
DB = 00h
LDX #$1000
ZeroVRAM
STZ $2118 ; write 0000h into the SNES video ram ($2118 is another register)
DEX
BNE ZeroVRAM
Flags:
Z (Zero)
LDX #$0001 ; zero flag clear
DEX ; X = 0000h, zero flag set
N (Negative)
LDX #$0000 ; N flag is clear
DEX ; X = FFFFh, N flag is set
DEY - Decrement Y
See DEX.
TAX - Transfer A to X
TAX is the first of the register transfer instructions - operand free ways to copy one register's contents to another without resorting to push/pull instructions. There are a number of reasons you'd want to use these instructions - they're fast (2 cycles versus 7 for push/pull), can help avoid messy use of the stack, and can supplement instructions with few addressing modes. By supplementing, I mean you could use LDA's absolute long addressing to fetch a value, then TAX it to X, getting around the fact that LDX can't use absolute long addressing.
One interesting quirk about the transfer instructions is how many bytes they copy - what if the M flag is 1 but the X flag is 0? What about the other way around? To deal with this, each of the transfer functions has a rule governing how many bytes to copy. In the case of TAX:
The number of bytes transferred is the current width of X, as set by the X flag
So, whenever the X flag is 0, the full 2 bytes of A are transferred across. When X = 1, only the low byte of A is transferred to the low byte of X.
M flag = 0
X flag = 0
LDA [$03],y ; LDX doesn't have the [d],y addressing mode
TAX ; transfer the 2 bytes loaded to X
Flags:
Z (Zero)
LDA #$0000 ; zero flag set
LDX #$0001 ; zero flag clear
TAX ; zero flag set
N (Negative)
LDA #$8000 ; N flag set
LDX #$0000 ; N flag clear
TAX ; N flag set
TAY - Transfer A to Y
Works exactly the same as TAX (follows the same rule and all), but copies A's contents to Y.
The number of bytes transferred is controlled by the setting of the X flag
TCD - Transfer C (A & B) to D
This instruction calls the A register C, and for good reason. As 'C' denotes the high and low bytes of A, it means that 2 bytes are always transferred, regardless of the M flag's setting. TCD is useful for setting up a new direct page somewhere other than 0000h. This has been used in commercial games to allocate temporary memory and allow a low-level implementation of threads. For example, most games set their D register to something different when talking to the SPC, so 1-byte operands can be used where 2 would be required normally.
2 bytes are always transferred by TCD
Flags are affected in the same way as TAX
TCS - Transfer C to S
This instruction allows you to change where the stack register points. This is helpful for initializing the snes as the stack defaults to the position 01FFh, which doesn't leave much room for allocating memory and such. As 'C' is used in the instruction mnemonic, 2 bytes are always transferred.
2 bytes are always transferred by TCS
No flags are affected by TCS
TDC - Transfer D to C
Again, since C is in the mnemonic, it means 2 bytes are always transferred. TDC is a useful way to zero A (LDA #$0000), as the D register is normally set to 0000h. There are times it isn't, though, which can cause huge problems if you're expecting 0000h instead of 1E00h.
2 bytes are always transferred by TDC
Flags are affected in the same way as TAX.
TSC - Transfer Stack to C
As the name implies, this instruction transfers the 2 bytes in the stack register to the A register, useful for allocating memory in front of the stack. Flags are affected in the same way as TAX.
2 bytes are always transferred by TSC
M flag = 0
TSC ; A = S
SEC
SBC #$0F ; A holds the address 10h bytes in front of the stack
TCD ; $00 would now access the memory 0Fh bytes in front of the stack
Allocating memory in this fashion can get both extremely messy and complex, but as it is extremely useful in 65816 coding it will be covered in the next section.
TSX - Transfer S to X
This instruction is almost the same as TSC, but in this case the transfer isn't always 2 bytes in size. If the X flag is set to 1, only the stack's low byte is moved to X's low byte. Otherwise, 2 bytes are transferred. Apart from that, the P register's flags are affected identically to TAX.
The number of bytes transferred is governed by the X flag
TXA - Transfer X to A
The opposite to TAX, this transfers the contents of X to A. Flags affected are the same as TAX.
The M flag governs the number of bytes transferred. If the X flag = 1, the high byte of the transfer will be 00h
TXS - Transfer X to S
TXS is used for the same reasons as TCS - to set the stack to wherever you want it. As 2 bytes are always transferred, if X is 1 byte wide the high byte transferred will be 00h. As with TCS, no flags are affected.
2 bytes are always transferred. If the X flag = 1, the high byte of the transfer will be 00h
TXY - Transfer X to Y
TXY is useful for addressing modes where only Y can be used, such as (d),y and [d],y. The number of bytes transferred depends on the X flag. Flags are affected identically to TAX.
The number of bytes transferred is dictated by the X flag
TYA - Transfer Y to A
This instruction is identical to TXA in every way, except Y is transferred instead of X.
TYX - Transfer Y to X
See TXY, but with the two index registers reversed.
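The width rules for the transfers are easier to see side by side. Below is a rough Python model of TAX and TXA only; the function names and argument order are made up. One thing that goes beyond what's stated above: the model assumes that when the M flag is 1, TXA leaves the hidden high byte of A alone.

```python
def tax(a, x_flag):
    """TAX: the width of X (the X flag) decides how much of A is copied."""
    return a & (0xFF if x_flag else 0xFFFF)

def txa(a, x, m_flag, x_flag):
    """TXA: the M flag decides the width; a 1 byte X supplies 00h as its high byte."""
    if x_flag:
        x &= 0xFF
    if m_flag:
        # Assumption: with a 1 byte A, only the low byte is replaced
        return (a & 0xFF00) | (x & 0xFF)
    return x & 0xFFFF

print(f"{tax(0x1234, x_flag=0):04X}")                   # 1234
print(f"{tax(0x1234, x_flag=1):04X}")                   # 0034
print(f"{txa(0xABCD, 0x34, m_flag=0, x_flag=1):04X}")   # 0034
```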
Addressing Mode ab (ab) (ab,x) | Syntax JMP $99A0 JMP ($1000) JMP ($1000,X) | Bytes 3 3 3 |
Addressing Mode abl (ab) | Syntax JML $C00000 JML ($0200) | Bytes 4 3 |
Addressing Mode ab (ab,x) | Syntax JSR $9000 JSR ($9000,X) | Bytes 3 3 |
Addressing Mode abl | Syntax JSL $ED4000 | Bytes 4 |
RTS - Return from Subroutine
This is the companion instruction to JSR, allowing you to return to whatever code called it. Though it sounds fun and wholesome, all RTS does is pull two bytes off the stack and put them into the PC register. Unfortunately, if you've been pushing and pulling a lot of values in your subroutine, a misplaced RTS will simply pull off of the stack whatever the last thing was that you pushed on. Whenever you're using subroutines, you have to make sure all your push actions have been pulled off at some point, or a RTS will direct the cpu to who-knows-where.
RTL - Return from Subroutine Long
As was the case with RTS/JSR, RTL allows you to return from a JSL instruction back to the original code. RTL pulls 2 bytes off the stack and into PC, then a third byte into PB. As always, caution must be taken to make sure the stack has had everything pulled back off that was pushed after the JSL, or horrid things will happen.
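Here's what "who-knows-where" looks like in practice, as a small Python model of the stack. Everything in it (the Stack class, run_subroutine, the addresses) is made up for illustration. It also assumes two details not spelled out above: JSR pushes the address of its own last byte, and RTS adds 1 to the address it pulls.

```python
class Stack:
    """Tiny model of the 65816 stack: S points at the next free byte."""
    def __init__(self, s=0x01FF):
        self.s = s
        self.memory = {}

    def push_byte(self, value):
        self.memory[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull_byte(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.memory[self.s]

    def push_word(self, value):
        self.push_byte(value >> 8)      # high byte goes on first
        self.push_byte(value)

    def pull_word(self):
        low = self.pull_byte()
        return low | (self.pull_byte() << 8)

def run_subroutine(forget_pull):
    """Return the address RTS would jump to after a JSR at $8000."""
    stack = Stack()
    stack.push_word(0x8002)             # JSR $9000 sitting at $8000 (assumed address)
    stack.push_word(0x1234)             # PHA inside the subroutine
    if not forget_pull:
        stack.pull_word()               # the matching PLA
    return (stack.pull_word() + 1) & 0xFFFF   # RTS

print(f"{run_subroutine(forget_pull=False):04X}")   # 8003 - back after the JSR
print(f"{run_subroutine(forget_pull=True):04X}")    # 1235 - who-knows-where
```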
Addressing Mode # | Syntax PEA $5050 | Bytes 3 |
Addressing Mode (d) | Syntax PEI ($01) | Bytes 2 |
PER - Push Effective Relative
PER is another of the load-directly-into-the-stack instructions, this time pushing the 2 byte result of (PC + Operand + 2). I can think of few uses for this off the top of my head.
SEP - Set Bits in P
REP - Reset Bits in P
XCE - Exchange Carry with Emulation Bit
As is implied by the name, this instruction allows you to set/clear the hidden emulation flag of the 65816. At some point, all snes games are going to execute the famous CLC, XCE sequence to get the cpu out of emulation mode, which it thoughtfully starts up in. After the XCE has been executed, the Carry flag is assigned the previous value of the Emulation bit. In the case of a CLC, XCE at startup, the Carry flag would be set to 1 afterwards. This has all kinds of uses if you're writing an NES emulator and want to know what mode you're in (?).
WAI - Wait for Interrupt
As demonstrated earlier, executing this instruction makes the 65816 idle until an interrupt hits. If interrupts are disabled (I flag = 1), the instruction simulates a NOP and continues to the next instruction.
Congrats!
Now that you're completely fluent with the opcodes (hohoho), it's time to use them in some meaningful code. The next section covers assembly techniques and common, useful bits of code for any hacking work - Coding in 65816 Assemblers.