6502 Assembly Code – Analysis

Analyzing code objectives, run-time and memory complexity.

Introduction

A few weeks ago, I enrolled in the Software Portability & Optimization course at my college. We got familiar with the structures and components of a CPU (Status Register, Accumulator, Program Counter, Stack…) in order to understand how assembly code works, and what it means. Modern CPU’s are incredibly complex, so throughout the course we will focus on the 6502 microprocessor, as it is relatively simple, and well documented.

Code Functionallity

For this lab, we were provided with the following lines of code, which we used to run in the online 6502 emulator.

lda #$00     ; set ptr in adrs $40, point to $0200
sta $40	     ; ... low byte ($00) goes in adrs $40
lda #$02	
sta $41	     ; ... high byte ($02) goes in adrs $41
lda #$07     ; colour number
ldy #$00     ; set index to 0

loop: sta ($40),y ; set pixel colour at the adrs (ptr)+Y
      iny         ; increment index
      bne loop    ; continue until done page (256 px)

      inc $41	  ; increment the page
      ldx $41	  ; get the current page number
      cpx #$06	  ; compare with 6
      bne loop	  ; continue until done all pages
A looping gif of the pixels on the emulator bitmap display, filling up in yellow, page by page.
Loop of the program output
taken from the 6502 emulator.

Although this code seems confusing, after some analysis of the instructions, I have formed some conclusions:

  • The objective is to color all pixels on the screen, into a single color.
  • There are 256 pixels on each page.
  • There are 4 pages of pixels on the screen (each page is a horizontal slice).
  • There is an inner loop, coloring each pixel on the page.
  • There is an outer loop, coloring each page.

Performance Analysis

To analyze the run-time and memory usage, I used the 6502 Reference to derive the number of cycles, or operations, that each instruction “takes”.

Alternative Cycles & Counts

The Alternative Cycles and Counts, are specifically relevant when using branching, or conditional actions. What it essentially means, is that instructions like BNE (Branch if not zero), have different performance when their condition evaluates to True, or False.

For example, if our loop executes a total of n=4 times, 3 times of those 4 (n-1), the BNE will evaluate to True (when result is not zero), and jump back to the beginning of the loop. In situations like these, the documentation states 3 cycles (the additional jump adds a cycle). However, in cases where the end of the loop is reached, (when the result evaluates to 0), which executes once in the loop, the instruction only takes 2 cycles. Therefore we get the formula:

# Main (when loop exits, branch is not taken): 
Cycles = 2 
Count = 1 (exits once).
=> 2 x 1 = 2

# Alt (when loop continues, branch is taken):
Cycles = 2 + 1 = 3 ( +1 for branch is taken)
Count = n - 1 (where n is the number of times the loop executes)
=> 3 x (n-1)
CyclesCountAlt-CyclesAlt-CountTotal
LDA #$00212
STA $40313
LDA #$02212
STA $41313
LDA #$07212
LDY #$00212
LOOP STA ($40), Y6256(4)6144
INY2256(4)2048
BNE LOOP21(4)3255(4)3068
INC $415420
LDX $413412
CPX #$06248
BNE LOOP213311

Total Cycles: 11325
Clock Speed: 1 MHz
Cycle Time: 0.000001 Seconds
Execution Time: 0.011325 Seconds

The code takes 11.325 mS to execute, at 1 MHz clock speed.

Memory Usage Analysis

To calculate the “total memory usage for the program code plus any pointers or variables”, I will first analyze the code to see how many, and what kind of variables and pointers were used, and then use the 6502 Reference to determine the memory usage of each instruction.

In lines 2 and 4, we are using the STA command, to load the low and high byte of the address of the first page to print on (#$0200). This 1 pointer takes up 2 bytes:

  • Low Byte, $#00, is stored in address $40.
  • High Byte, $#02, is stored in address $41
InstructionMemory (Bytes)
LDA #$002
STA $402
LDA #$022
STA $412
LDA #$072
LDY #$002
LOOP STA ($40), Y2
INY1
BNE LOOP2
INC $412
LDX $412
CPX #$062
BNE LOOP2

The total memory usage, including the instructions and the pointers, is 27 bytes.

Conclusion

In conclusion, the code that was provided above is used to fill the bitmap screen to the color yellow, iterating pixel by pixel, page by page, until completion. To achieve this, it uses a nested loop, which is quite costly, both in terms of memory and in terms of run-time. Overall, it runs 11.325 mS at 1MHz clock speed, and takes up approximately 27 bytes in memory.

Next, I will be trying to optimize this code, and play around with it, to learn more about how assembly code works, and what different functionalities exist.

Comments

One response to “6502 Assembly Code – Analysis”

  1. […] 6502 Assembly Code – Analysis […]

Leave a Reply to 6502 Assembly Code – Optimization – Blog Cancel reply

Your email address will not be published. Required fields are marked *