ARM code for beginners

Part 4 — stand-alone code

So far we have relied on the BASIC assembler to allocate memory for our code fragments, and run them with the CALL statement. This is an inefficient way of working with ARM code, since once a piece of code is finished, we only need to assemble it once, then (theoretically) forget about the program that assembled it. So this month, rather than reserve some memory, assemble the code, then CALL it from BASIC, we're going to save the code after assembly and run it as a separate program. This is where ARM code becomes most useful, since stand-alone code can be used in any number of different contexts.

This month, I'm concentrating on writing an 'Absolute' piece of code, whose filetype is &FF8. These are the files inside most applications, produced by C compilers and, just as easily, by hand. When the user runs such a file, RISC OS loads it at address &8000 and runs it until you execute either MOV PC,R14 or the SWI OS_Exit. Either of these will kill the running task and return you to the last environment you were using, which is most commonly the desktop. The amount of memory the task is given is controlled by the 'Next' slot in the task manager; any attempt to access memory above this will result in an 'Abort on data transfer' error.

The first thing we need to do when running such a program, then, is to check we have enough memory before doing anything else. OS_GetEnv is the SWI to use; amongst other things, it returns the highest RAM address available in R1. If we subtract &8000 (our base address) from this, we can work out how much memory is available. As a rough estimate of what it needs, the program will refuse to run unless 32K is allocated.
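As a minimal sketch of that check, the start of the program could look something like the fragment below. The label no_memory, the error number 0 and the wording of the message are my own illustrative choices and may not match Stars_src.

```bbcbasic
        SWI "OS_GetEnv"           ; R1 = highest RAM address available to us
        SUB R1,R1,#&8000          ; R1 = size of our slot in bytes
        CMP R1,#32*1024           ; have we been given at least 32K?
        ADRLT R0,no_memory        ; if not, point R0 at an error block...
        SWILT "OS_GenerateError"  ; ...and raise it (this call does not return)
        ; the rest of the program goes here and must finish with OS_Exit
        ; so that execution never runs on into the data below

.no_memory
        EQUD 0                    ; error number (arbitrary for this sketch)
        EQUS "Not enough memory: please set the Next slot to at least 32K"
        EQUB 0                    ; the message is zero-terminated
        ALIGN
```

Note that OS_GetEnv also returns values in R0 and R2, so nothing precious should be kept in those registers across the call.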

Tiny lites

This month I've written a common demo effect of yesteryear: a layered, horizontally scrolling starfield. In terms of assembling our program, we need to take our new environment into account. What code there is will be loaded at &8000, and any memory after this can be used to store data in. In order to assemble the program, I've used what's called offset assembly. This means that you assemble the code in one part of the memory, but the assembler makes sure the code will run when loaded in another part. Confused? Look at this contrived code fragment which should return the number 256 when called:

Offset  Instruction
+0      LDR R0,points_to_256
+4      LDR R0,[R0]
+8      MOV PC,R14
+12     .points_to_256 EQUD real_256
+16     .real_256      EQUD 256

With our usual method of assembly (i.e. code assembled from BASIC, then run), this code will work, since points_to_256 will indeed point to the address containing the number 256. However, think how .points_to_256 will be assembled. Say that when we reserve memory for our code with DIM, the memory address returned is &8F10. When we specify real_256 as an absolute address to be assembled, the word stored is &8F20 (that is, &8F10+16), so it will only point to .real_256 when the code is in that particular location in memory. If we want the code to work at &8000, as we do now, the number 256 will actually live at &8010 and the code will fail when relocated.

So BASIC provides us with this method of offset assembly, which will assemble the code at any address, though it will only run at the address we specify. To use it, we need to set bit 2 of the assembly OPTions (i.e. add 4 to what we normally use), then set P% to the address we want the code to work in and O% to the address we want the code assembled at. Also this month, I've used BASIC's range checking, which will ensure that, when assembling, we don't go over our memory allocated with DIM. To use this, set bit 3 of the assembly options, and L% to the highest address reserved in memory. This explanation, as always, will become crystal clear by looking at the Stars_src program on the cover disc. Once we've assembled the code, we can save the block of memory it was assembled in as an 'Absolute' file and run that directly; this is done as the last thing in the program. In this case, it goes to the currently selected directory (probably $).
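To make the roles of P%, O% and L% concrete, here is a small sketch that assembles the contrived 256 fragment for &8000 and saves it. It is not taken from Stars_src; the buffer size, the variable names and the filename "Example" are assumptions chosen for illustration.

```bbcbasic
REM Sketch only. Assembles the contrived 256 example to run at &8000
size%=256
DIM code% size%
FOR pass%=12 TO 14 STEP 2  :REM 12 = offset + range check, 14 adds error reporting
P%=&8000                   :REM address the code will run at
O%=code%                   :REM address the code is really assembled at
L%=code%+size%             :REM range checking stops assembly beyond here
[OPT pass%
        LDR R0,points_to_256
        LDR R0,[R0]
        MOV PC,R14
.points_to_256
        EQUD real_256      ; assembled as &8010, wherever code% happens to be
.real_256
        EQUD 256
]
NEXT
REM O% now points just past the last byte assembled
SYS "OS_File",10,"Example",&FF8,,code%,O%
```

OS_File 10 saves the block between the two addresses given and stamps it with the filetype in one go, which is why &FF8 appears in the call. Run as an Absolute file this fragment produces no visible result, since the 256 is simply left in R0 when the task exits.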

Smooth operator

I ought to cover the topic of animation quickly; last month we were quite happily scribbling on the screen without a thought for how smoothly the patterns changed when we pressed the keys. This month, smooth animation is essential, so I've used bank switching. To explain: monitors update the picture so many times every second, from top to bottom, and to achieve smooth animation, we should only change the display on the screen when the monitor is about to update the screen, not when it is half-way through an update. So to do this, we allocate twice the amount of screen memory we need (add 128 to the mode number when changing mode with VDU 22), then use each half alternately. This means displaying one screen bank constantly until the other is ready to be shown, then instantly switching the screen bank being displayed and the screen bank being drawn on for smooth animation.

In terms of our code, we can use OS_ReadVduVariables to read the base addresses of the 'shadow' (i.e. the one we're drawing on) and 'display' banks, as they're known. OS_Byte 112 and 113 (i.e. R0=112/113) switch the banks over, the former selecting which one VDU commands should work on, and the latter selecting which one to make the VIDC chip display. OS_Byte 19 waits for the monitor to finish its redraw, so we can time the bank switching accurately.
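Reading the two base addresses might be sketched as the small routine below. The labels read_banks, vdu_in and vdu_out and the choice of R6 and R7 are mine, not necessarily those used in Stars_src; 148 and 149 are the VDU variable numbers for ScreenStart and DisplayStart.

```bbcbasic
.read_banks
        ADR R0,vdu_in             ; list of VDU variable numbers to read
        ADR R1,vdu_out            ; where the answers are to be written
        SWI "OS_ReadVduVariables" ; R0 and R1 are preserved by this call
        LDMIA R1,{R6,R7}          ; R6 = bank we draw on, R7 = bank on display
        MOV PC,R14

.vdu_in
        EQUD 148                  ; ScreenStart, used by the VDU drivers
        EQUD 149                  ; DisplayStart, used by the display hardware
        EQUD -1                   ; end of list
.vdu_out
        EQUD 0                    ; ScreenStart is returned here
        EQUD 0                    ; DisplayStart is returned here
```

Because the addresses change whenever the banks are switched, a routine like this would need calling again after each OS_Byte 112 or 113.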

Start me up

Besides all the fixed variables in the program which have labels on them and can be LDRed and STRed safely, we have to deal with the table of stars tagged onto the end of the code. Each star is represented in memory by a single word; its position is denoted by its offset from the screen base, so that can vary from 0 to screen_size (80K in the case of mode 13). To scroll the stars left, we just need to subtract a constant value from the star's offset until it reaches zero, then wrap it around to screen_size to bring it on at the bottom-right again. The star's colour and speed are denoted by the top 2 bits, allowing for four different levels of stars moving at speeds corresponding to their colour. The actual colour plotted for each level is looked up in a four-byte table, since colours 0-3 are all exciting shades of dark grey. Note that this month there is a random number routine to generate the table; its workings are lifted from some example code fragments provided by Acorn, and it returns a pseudo-random bit pattern in R0 when called.

Each star is created by choosing a random word with the .random function, then removing the top two bits to represent the colour and repeatedly subtracting the screen length from what's left of the random number until it is in range. The top 12 bits are removed with two BIC (bit clear) instructions:

BIC R0,R0,#&FF000000
BIC R0,R0,#&00F00000

BIC clears any bits of op1 that are specified in op2, and puts the result into the destination register. Note the use of two BIC instructions, since &FFF00000 cannot be represented as an immediate constant. When the number is in range (see the .in_range loop for a simple MOD operation), the colour is recombined with the star offset (i.e. so they're one word again) with the ORR (logical OR) instruction:

ORR R0,R0,R1,LSL#30

Put simply, ORR has the effect of 'merging bits from two words'. Here, the ORR instruction takes R1 (the star colour previously chosen at random), shifts it back up to the top two bits of the word, and merges it with our offset ready for storage in the star table. The start of our star table is marked by the end of the code (.end_code), and the address of the 'next star location' is kept in R3 while the stars are being created, hence the (hopefully) familiar instruction:

STR R0,[R3],#4

to put the star in place and move the 'next star' counter on.
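Pulling those instructions together, the star-making loop might be sketched as follows. The register choices (R9 as a counter, R10 holding screen_size) and the label make_star are assumptions for illustration, and I have assumed that .random changes no register other than R0.

```bbcbasic
        ; On entry R3 = next free slot in the star table,
        ; R9 = number of stars still to make, R10 = screen_size
.make_star
        BL random                 ; R0 = pseudo-random word
        MOV R1,R0,LSR #30         ; R1 = top two bits, the colour (0-3)
        BIC R0,R0,#&FF000000
        BIC R0,R0,#&00F00000      ; R0 = bottom 20 bits only
.in_range
        CMP R0,R10                ; is the offset still off the screen?
        SUBGE R0,R0,R10           ; if so, take off one screen length...
        BGE in_range              ; ...and test again (flags are from the CMP)
        ORR R0,R0,R1,LSL #30      ; put the colour back in the top two bits
        STR R0,[R3],#4            ; store the star and step R3 on
        SUBS R9,R9,#1             ; one fewer star to make
        BNE make_star
```

The repeated subtraction is crude but perfectly adequate here: a 20-bit number is at most twelve or so screen lengths in mode 13, so the .in_range loop never runs for long.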

Looping stars

Once the table is set up, we can start on the main loop. Follow the code as you read this, to help to understand the loop structure:

Select the 'shadow' bank to plot on. (SWI OS_Byte, r0=112, r1=shadow bank (1 or 2))
Clear the screen. (done with a VDU 12 command for simplicity)
Pick the next star from the table, looking up its colour. (LDRB byte_to_plot,[star_colours,colour_number])
Plot the star with an STRB instruction. (STRB byte_to_plot,[screen_base,star_offset])
Move the star on according to its colour (i.e. darker ones move slower because they're 'further away'), looping it around to the bottom if necessary.
Put the colour number back with the new offset and store the star back in its place in the table.
Continue until all stars are plotted.
Wait for monitor to finish refreshing one frame. (SWI OS_Byte, r0=19)
Change over the roles of 'shadow' and 'display' banks. (Note how OS_Byte 112/113 return the previous bank number in R1 when they exit, so it's easy to swap the banks over.)
Carry on until escape is pressed.
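The plan above can be sketched in ARM code along the following lines. This is my own simplified outline rather than the listing from Stars_src. The register usage is assumed (R5 = address of the four-byte colour table, R8 = bank to draw on, R10 = screen_size, R11 = start of the star table, R12 = number of stars), read_banks stands for a hypothetical routine that returns the shadow bank's base address in R6, and the speed of colour+1 bytes per frame is a guess that assumes colour 0 is the darkest.

```bbcbasic
.frame
        MOV R0,#112
        MOV R1,R8
        SWI "OS_Byte"             ; VDU drivers now use the shadow bank
        SWI 256+12                ; OS_WriteI+12, which is VDU 12 (clear screen)
        BL read_banks             ; hypothetical routine, R6 = shadow bank base
        MOV R3,R11                ; R3 = first star in the table
        MOV R9,R12                ; R9 = stars left to plot
.next_star
        LDR R0,[R3]               ; fetch the packed star
        MOV R1,R0,LSR #30         ; R1 = colour number (0-3)
        BIC R0,R0,#&C0000000      ; R0 = offset from the screen base
        LDRB R4,[R5,R1]           ; look up the real colour byte
        STRB R4,[R6,R0]           ; plot the star
        ADD R4,R1,#1              ; assumed speed is colour+1
        SUBS R0,R0,R4             ; move the star left
        ADDMI R0,R0,R10           ; gone past zero, so wrap to the far end
        ORR R0,R0,R1,LSL #30      ; recombine colour and offset
        STR R0,[R3],#4            ; store it back and move to the next star
        SUBS R9,R9,#1
        BNE next_star
        MOV R0,#19
        SWI "OS_Byte"             ; wait for the frame to finish
        MOV R0,#113
        MOV R1,R8
        SWI "OS_Byte"             ; display the bank we have just drawn
        RSB R8,R8,#3              ; swap roles, 1 becomes 2 and 2 becomes 1
        SWI "OS_ReadEscapeState"  ; carry is set if escape was pressed
        BCC frame
        MOV R0,#126
        SWI "OS_Byte"             ; acknowledge the escape condition
        SWI "OS_Exit"
```

This sketch swaps the banks by keeping its own bank number in R8 rather than using the value OS_Byte hands back in R1; either approach works. OS_Byte returns results in R1 and R2, which is why nothing long-lived is kept in those registers.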

Note again how BIC and ORR are used to separate, manipulate and re-join the colour and offset into a single 32-bit word. I haven't had space to go into as much detail as I'd like with the code, but I hope you will see the connection between the plan above and the instructions. If you are unsure as to a register's usage in a particular part of the program, look backwards to find where that register was last used as a destination register. This technique works particularly well in such a linear program. Next month, we're doing a screen-saver...

Matthew Bloch

There are a few hundred SWI calls under RISC OS, with many more provided by third-party modules. These articles will only document SWIs briefly, if at all. For full documentation you will have to get the Programmer's Reference Manuals, costing £100 from Acorn. However, nearly all of the common SWIs are documented in the StrongHelp SWI manual, which is freeware and available from http://www.soup-kitchen.demon.co.uk/arm.html, or the Datafile PD library (disc AW001) for £1. The StrongHelp SWI manual also comes with a brief assembler guide, containing a complete list of ARM2 instructions and condition codes, and is an essential reference if you're just starting out.

The BASIC assembler OPTions

A brief summary of what we can do with OPT:

bit 0 (+1): Listing produced; this is printed on screen, looks crude and is generally useless unless you are very curious.
bit 1 (+2): Errors reported; we usually turn this off for the first pass.
bit 2 (+4): Offset assembly; if this is set, P% should be the 'pretend' address the code will eventually run at and O% should point to the 'real' address to assemble at.
bit 3 (+8): Range checking; if this is set, L% should point to the highest address the assembler should assemble to. If assembly goes over this, the assembler will produce an error.
