6502 Assembly Code – Optimization

Optimizing Performance and Memory Usage of Assembly Code.

Introduction

In my previous post, I covered how to analyze the performance and memory usage of assembly code. Here, I will start looking into how to optimize that code, using the 6502 reference and some programming logic.

Then, I will modify the code to achieve different results: a different fill color, a different color per page, and random colors.

Optimization

The goal is to optimize the old “color the entire screen yellow” code to improve the runtime and, hopefully, the memory usage. A solution is known to exist that achieves the same result at almost half the runtime cost, with a smaller memory footprint as well.

Previous Code
lda #$00     ; set ptr in adrs $40, point to $0200
sta $40	     ; ... low byte ($00) goes in adrs $40
lda #$02	
sta $41	     ; ... high byte ($02) goes in adrs $41
lda #$07     ; colour number
ldy #$00     ; set index to 0

loop: sta ($40),y ; set pixel colour at the adrs (ptr)+Y
      iny         ; increment index
      bne loop    ; continue until done page (256 px)

      inc $41	  ; increment the page
      ldx $41	  ; get the current page number
      cpx #$06	  ; compare with 6
      bne loop	  ; continue until done all pages

Previous implementation: 11.325 ms, 27 bytes
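As a sanity check, the 11,325-cycle figure can be reproduced by tallying the standard 6502 cycle counts for each instruction in the listing above. The Python sketch below is only that arithmetic, not emulator output; the helper name `nested_loop_cycles` is hypothetical, and it assumes no branch crosses a page boundary (which would add a cycle).

```python
# Standard 6502 cycle counts for the instructions used in the listing
# (assumption: no branch crosses a page boundary).
LDA_IMM, LDY_IMM, CPX_IMM = 2, 2, 2
STA_ZP, LDX_ZP = 3, 3
INC_ZP = 5
STA_IND_Y = 6          # sta ($40),y
INY = 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2


def nested_loop_cycles(pages=4, pixels_per_page=256):
    """Tally cycles for the pointer-based fill (hypothetical helper)."""
    # Three lda #, two sta zp and one ldy # before the loop starts.
    setup = 3 * LDA_IMM + 2 * STA_ZP + LDY_IMM
    # Inner bne is taken on every pass except the last one of each page.
    inner_bne = BNE_TAKEN * (pixels_per_page - 1) + BNE_NOT_TAKEN
    inner = pixels_per_page * (STA_IND_Y + INY) + inner_bne
    # inc $41, ldx $41, cpx #$06 run once per page.
    outer_body = INC_ZP + LDX_ZP + CPX_IMM
    # Outer bne is taken after every page except the last.
    outer_bne = BNE_TAKEN * (pages - 1) + BNE_NOT_TAKEN
    return setup + pages * (inner + outer_body) + outer_bne


print(nested_loop_cycles())  # 11325
```

At a 1 MHz clock, one cycle takes one microsecond, so 11,325 cycles correspond to 11.325 ms.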

I noticed that the inner loop, which colors each pixel in the page, doesn’t look like it can be optimized much further. However, the outer loop, which executes 4 times (once for each page of the bitmap screen), seems like it can be optimized in some way.

Attempt 1: Expand Outer Loop

The first approach I can see here is to “expand” the outer loop and repeat the inner loop 4 times, once for each page. This might improve the runtime a little bit, but will definitely increase the memory usage of the program.

lda #$07	; set color

ldy #$00	; reset index
PAGE1: 
   sta $0200,y	; set pixel color
   iny		; increment index
   bne PAGE1    ; continue until done page (256 px)

ldy #$00
PAGE2: sta $0300,y	
   iny
   bne PAGE2

ldy #$00
PAGE3: sta $0400,y
   iny
   bne PAGE3

ldy #$00
PAGE4: sta $0500,y
   iny
   bne PAGE4

Now, let’s analyze the runtime and memory usage of this approach. Since the per-page loop repeats 4 times, there’s no need to analyze each copy separately; the repeated rows are marked (x 4).

| Instruction | Cycles | Alt Cycles | Total Cycles | Memory |
|---|---|---|---|---|
| lda #$07 | 2 x 1 | | 2 | 2 |
| ldy #$00 | 2 x 1 | | 2 (x 4) | 2 (x 4) |
| loop: sta $0200,y | 5 x 256 | | 1280 (x 4) | 3 (x 4) |
| iny | 2 x 256 | | 512 (x 4) | 1 (x 4) |
| bne loop | 2 x 1 | 3 x 255 | 767 (x 4) | 2 (x 4) |

Total Cycles = 10,246 cycles
Execution Time: 10.246 ms (1 MHz Clock Speed)

Total Memory: 34 bytes
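The totals above follow directly from the table rows. Here is a small Python sketch of that arithmetic; the helper names `bne_cycles` and `unrolled_cost` are hypothetical, and the cycle and byte counts are the standard 6502 values used in the table.

```python
def bne_cycles(iterations):
    """Cycles spent on the closing bne: taken (3) on every pass but the last (2)."""
    return 3 * (iterations - 1) + 2


def unrolled_cost(pages=4, pixels=256):
    """Return (cycles, bytes) for the unrolled fill (hypothetical helper)."""
    # One page loop: ldy #$00, then 256 x (sta abs,y + iny), plus the bne.
    loop_cycles = 2 + pixels * (5 + 2) + bne_cycles(pixels)
    loop_bytes = 2 + 3 + 1 + 2              # ldy, sta abs,y, iny, bne
    total_cycles = 2 + pages * loop_cycles  # leading lda #$07 costs 2 cycles
    total_bytes = 2 + pages * loop_bytes    # leading lda #$07 is 2 bytes
    return total_cycles, total_bytes


print(unrolled_cost())  # (10246, 34)
```

Each page loop costs 2,561 cycles and 8 bytes on its own, so both totals grow linearly with the number of pages.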

Well, this barely improved the runtime (10,246 cycles versus 11,325, roughly 10% faster), and it increased the memory usage from 27 to 34 bytes. So this isn’t the way to go.

But let’s look closer at the code. Each page loop runs exactly the same number of iterations, colors the same location relative to its page, and increments Y by 1. What if we combine all the loops into 1?

Attempt 2: Combine into Single Loop

lda #$07  ; set color
ldy #$00  ; reset index

loop: 
   sta $0200,y
   sta $0300,y	
   sta $0400,y	
   sta $0500,y	
   iny		      
   bne loop
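
| Instruction | Cycles & Cnt | Alt Cycles & Cnt | Total Cycles | Memory |
|---|---|---|---|---|
| lda #$07 | 2 x 1 | | 2 | 2 |
| ldy #$00 | 2 x 1 | | 2 | 2 |
| loop: sta $0200,y | 5 x 256 | | 1280 | 3 |
| sta $0300,y | 5 x 256 | | 1280 | 3 |
| sta $0400,y | 5 x 256 | | 1280 | 3 |
| sta $0500,y | 5 x 256 | | 1280 | 3 |
| iny | 2 x 256 | | 512 | 1 |
| bne loop | 2 x 1 | 3 x 255 | 767 | 2 |

Total Cycles = 6403 cycles
Execution Time: 6.403 ms (1 MHz Clock Speed)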

Total Memory: 19 bytes

Okay! That looks a lot better. As you can see from the little GIF, each iteration of the loop colors 1 pixel in each of the 4 pages. But most importantly, it does so in only 6403 cycles, which is a significant improvement over the previous 11325 cycles! Similarly, the 27 bytes used by the old implementation are reduced to only 19 bytes. This is because we use fewer instructions and completely eliminate the pointer: the loop overhead (iny and bne) is now paid 256 times instead of 1024, and sta with absolute,Y addressing takes 5 cycles versus 6 for sta ($40),y.
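To double-check both the behavior and the cycle count of the combined loop, here is a rough Python model of it. This is a sketch under assumptions, not an emulator: `fill_combined` is a hypothetical name, memory is modeled as a plain dictionary, and the cycle costs are the standard 6502 values from the table (no page-crossing branch).

```python
def fill_combined(color=0x07):
    """Model the single combined loop; return (memory, cycles)."""
    memory = {}
    cycles = 2 + 2                      # lda #$07, ldy #$00
    y = 0
    while True:
        for base in (0x0200, 0x0300, 0x0400, 0x0500):
            memory[base + y] = color    # sta base,y
            cycles += 5
        y = (y + 1) & 0xFF              # iny wraps from $ff back to $00
        cycles += 2
        if y != 0:
            cycles += 3                 # bne taken, loop again
        else:
            cycles += 2                 # bne not taken, fall through
            break
    return memory, cycles


memory, cycles = fill_combined()
print(len(memory), cycles)  # 1024 6403
```

Every address from $0200 to $05ff is written exactly once, and 6403 / 11325 is about 0.565, so the combined loop needs roughly 43% fewer cycles than the pointer-based version.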

Modifications

Now that we have such a short, maintainable and optimized version of our code, we can easily make modifications to it.

Modify Fill Color

The next task is simply to change the color that the pixels are set to, from yellow to light blue. Using this 6502 Emulator reference, we see that $e corresponds to light blue.

lda #$0e ; set color (light blue)
ldy #$00 ; reset index

loop: 
   sta $0200,y
   sta $0300,y	
   sta $0400,y	
   sta $0500,y	
   iny		      
   bne loop

Differently-Colored Pages

The goal is to color each page in the display in a different color. Given we already have an optimized loop that colors each page, the only component left is changing the color from page to page.

The selected approach below keeps the same loop structure, and holds the current color in the accumulator (A). In each iteration, the color is incremented before each of the next three pages is written, and then reset back to the starting color to prepare for the next iteration.

define ST_CLR $02 ; define macro for starting color

ldy #$00     ; set y index
lda #ST_CLR  ; load starting color

loop:
    clc               ; clear carry for addition
    sta $0200,y       ; color pixel on 1st page
    adc #$01          ; increment color
    sta $0300,y       ; color pixel on 2nd page
    adc #$01          ; increment color
    sta $0400,y       ; color 3rd page
    adc #$01          ; increment color
    sta $0500,y       ; color 4th page
    lda #ST_CLR       ; Reset color to starting value
    iny               ; increment y index
    bne loop
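
| Instruction | Cycles & Cnt | Alt Cycles & Cnt | Total Cycles | Memory |
|---|---|---|---|---|
| define ST_CLR $02 | | | | |
| ldy #$00 | 2 x 1 | | 2 | 2 |
| lda #ST_CLR | 2 x 1 | | 2 | 2 |
| loop: clc | 2 x 256 | | 512 | 1 |
| sta $0200,y | 5 x 256 | | 1280 | 3 |
| adc #$01 | 2 x 256 | | 512 | 2 |
| sta $0300,y | 5 x 256 | | 1280 | 3 |
| adc #$01 | 2 x 256 | | 512 | 2 |
| sta $0400,y | 5 x 256 | | 1280 | 3 |
| adc #$01 | 2 x 256 | | 512 | 2 |
| sta $0500,y | 5 x 256 | | 1280 | 3 |
| lda #ST_CLR | 2 x 256 | | 512 | 2 |
| iny | 2 x 256 | | 512 | 1 |
| bne loop | 2 x 1 | 3 x 255 | 767 | 2 |

Total Cycles = 8963 cycles
Execution Time: 8.963 ms (1 MHz Clock Speed)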

Total Memory: 28 bytes

This seems to be a correct and reasonably efficient solution to the required modification. The runtime increased from 6403 to 8963 cycles (about 40%), and the memory usage grew from 19 to 28 bytes. However, other implementations, like coloring each page in a separate loop, would have had an even greater impact on memory.
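To convince myself that the accumulator really cycles through four colors and resets correctly, the loop can be modeled in a few lines of Python. This is only a sketch: `fill_pages` is a hypothetical name, memory is a plain dictionary, and the carry handling mirrors how clc and adc behave on the 6502.

```python
ST_CLR = 0x02  # starting color, same value as the define in the listing


def fill_pages(start=ST_CLR):
    """Model the per-page color loop; return the resulting memory."""
    memory = {}
    a = start                              # lda #ST_CLR
    for y in range(256):                   # ldy #$00 ... iny / bne loop
        carry = 0                          # clc
        memory[0x0200 + y] = a             # sta $0200,y
        for base in (0x0300, 0x0400, 0x0500):
            total = a + 1 + carry          # adc #$01 adds the carry too
            a, carry = total & 0xFF, total >> 8
            memory[base + y] = a           # sta base,y
        a = start                          # lda #ST_CLR resets the color
    return memory


memory = fill_pages()
print([memory[base] for base in (0x0200, 0x0300, 0x0400, 0x0500)])  # [2, 3, 4, 5]
```

With a starting color of $02 the additions never overflow, so the carry stays clear and the four pages end up with colors $02, $03, $04 and $05.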

Random Pixel Color

Random generator in assembly??? Well, our 6502 emulator actually exposes a pseudo-random number generator at address $fe. Reading this memory location returns a new random byte each time.

The code below loads a fresh random byte into the accumulator before each store, and then writes it to the pixel. The result is a completely random image. Technically, we could load just a single random value per iteration, but then all four pages would receive the same values, producing a repetitive, somewhat symmetrical image.

ldy #$00          ; set y index

loop:
    lda $fe       ; load random byte
    sta $0200,y   ; color pixel on page
    lda $fe
    sta $0300,y       
    lda $fe          
    sta $0400,y    
    lda $fe        
    sta $0500,y    
    iny           ; increment Y
    bne loop
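
| Instruction | Cycles & Cnt | Alt Cycles & Cnt | Total Cycles | Memory |
|---|---|---|---|---|
| ldy #$00 | 2 x 1 | | 2 | 2 |
| loop: lda $fe | 3 x 256 | | 768 | 2 |
| sta $0200,y | 5 x 256 | | 1280 | 3 |
| lda $fe | 3 x 256 | | 768 | 2 |
| sta $0300,y | 5 x 256 | | 1280 | 3 |
| lda $fe | 3 x 256 | | 768 | 2 |
| sta $0400,y | 5 x 256 | | 1280 | 3 |
| lda $fe | 3 x 256 | | 768 | 2 |
| sta $0500,y | 5 x 256 | | 1280 | 3 |
| iny | 2 x 256 | | 512 | 1 |
| bne loop | 2 x 1 | 3 x 255 | 767 | 2 |

Total Cycles = 9473 cycles
Execution Time: 9.473 ms (1 MHz Clock Speed)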

Total Memory: 25 bytes
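The point about reading $fe once versus four times per iteration can be illustrated with a small Python model. This is a sketch under assumptions: `fill_random` is a hypothetical name, and Python's seeded `random.Random` merely stands in for the emulator's generator at $fe, whose actual algorithm is not modeled here.

```python
import random


def fill_random(fresh_read_per_page, seed=0):
    """Model the random fill; return four lists of 256 pixel values."""
    rng = random.Random(seed)              # stand-in for reads of $fe
    pages = [[0] * 256 for _ in range(4)]
    for y in range(256):
        if fresh_read_per_page:
            # lda $fe before every sta: four independent bytes
            values = [rng.randrange(256) for _ in pages]
        else:
            # a single lda $fe per iteration, reused for all four stores
            values = [rng.randrange(256)] * len(pages)
        for page, value in zip(pages, values):
            page[y] = value
    return pages


shared = fill_random(fresh_read_per_page=False)
print(shared[0] == shared[1] == shared[2] == shared[3])  # True
```

With one read per iteration, every page gets the same sequence of values, so the screen shows the same 8-row band repeated four times. Reading $fe before each store costs 3 extra cycles and 2 extra bytes per additional read, but gives each page its own values.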

Conclusion

Now that I’ve optimized my code, and played around with some simple modifications, it’s time to move on to experimenting!
