ARM code

Part 2 of the course — in which we meet SWIs





    This month's example ('FProcess' on the cover disc) is a program which takes a text file and changes all the upper case letters into lower case.  Useless in itself, but I hope you'll find a way of adapting it, and it does demonstrate some more ARM code fundamentals.  You'll notice that there's no BASIC equivalent this time; I hope the commented code will speak for itself.  For this reason, a printout of the program would be tremendously useful to annotate as you read this article.  Because the program is relatively complex, Fig 1 shows the program flow and the various decisions the program has to make.

SWIs: a helping hand

    If you've programmed in much depth with BASIC V, you can't have failed to come across SWIs.  For those who haven't, SWIs are useful routines provided by RISC OS which can be called by name from BASIC.  Each routine has a unique name and number; the name consists of a group name, then a routine name.  The group name will give you an idea of where the routine is handled.  So OS_File is a general call for doing things to files, Sound_Control makes noises, Font_Paint draws fonts onto the screen, and so on.  SWI stands for '(S)oft(W)are (I)nterrupt'; a call to each SWI is actually a unique ARM instruction, assembled as four bytes like any other.  SWIs can take parameters, passed in the processor registers, which the programmer should set up beforehand.  For instance, OS_WriteC is the SWI to write a single character to the screen.  It takes one parameter in R0, which is the ASCII code that it should write.  So if we were to execute this code fragment:

MOV   R0,#65
SWI   "OS_WriteC"

    we'll see an 'A' printed on the screen.  Remember that what BASIC does is to convert the SWI name in quotes into a number (in this case, OS_WriteC = SWI 0), and then assemble a 'SWI 0' instruction after the MOV instruction.  When the SWI comes to be executed, the ARM chip hands control to RISC OS, which looks up the routine associated with this SWI number (in the RISC OS ROMs) and calls it without the user needing to know or care how the routine works.

Strings

    Some SWIs will need to act on areas of memory; this is particularly so when we're dealing with strings.  When programming in ARM code, we have none of the niceties of BASIC's string handling.  Strings consist of a sequence of bytes in memory, terminated with a zero-byte, and sometimes we need to be able to set registers to hold the addresses of these strings (or simply 'point to' them) from ARM code.  We can assemble strings directly into the code by using EQUS: this writes bytes to wherever the BASIC assembler usually assembles code, and you specify the string after the EQUS directive.  See the end of the program for an example of this.  Following each string, you should put an EQUB 0 directive, to add a terminating byte.

    To display strings in ARM code, it is easiest to use a SWI.  There are two suited for the purpose, and both are demonstrated in this program.  The simplest is OS_WriteS, which you'll find in the .file_error routine.  It takes no parameters; to use it, we just write the string into the code, directly after the SWI instruction, and when the SWI routine has finished, execution resumes at the instruction following the end of the string.  The only thing to remember is that we must put an ALIGN directive at the end of the string, because ARM code instructions must be assembled on word boundaries (addresses in memory which are multiples of four), and writing a series of bytes may disrupt the word-alignment.  ALIGN simply ensures that the assembly address is on a word boundary, or else moves it onto the next one.  OS_WriteS is inflexible and makes the program flow look strange, but is easy to use.
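
    As a rough sketch of how this looks in practice, the fragment below prints a message with OS_WriteS and then returns.  The message text is made up for illustration, and the real .file_error routine on the cover disc may well differ.  It belongs between the assembler's square brackets like any other code:

SWI   "OS_WriteS"        ; prints the string that follows
EQUS  "File not found"   ; made-up message text
EQUB  0                  ; zero byte marks the end of the string
ALIGN                    ; get back onto a word boundary
MOV   PC,R14             ; execution resumes here after the SWI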

    The other string displaying SWI is OS_Write0.  This, more conventionally, takes one parameter, which is the address of the string to display.  This is passed in R0.  To do this from BASIC, we need to use the ADR directive.  Imagine this section of code:

         ADD R0,PC,#4  ; R0 = PC + 4
         SWI "OS_Write0"
         MOV PC,R14
         :
.string  EQUS "Hello world!":EQUB 0

    Because of the way the ARM chip works, the program counter always points two instructions ahead of the one that the processor is actually executing.  So in this example, the ADD points R0 to the start of the string (three instructions, or twelve bytes, after the ADD itself), and R0 is then passed as a parameter to OS_Write0.  The BASIC assembler (like all others) provides an instruction which works out these ADD instructions (or SUB instructions, if .string came before the code in memory) automatically.  'ADR R0,string' will do exactly this, and has the advantage that if .string moves in the code, the offset assembled will change too.
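
    For comparison, here is a sketch of the same fragment written with ADR.  The label is the same illustrative one as before; because .string is defined after the point where it is used, the code needs two assembly passes (see 'Doing it twice?' below):

         ADR R0,string    ; the assembler works out the offset from PC
         SWI "OS_Write0"
         MOV PC,R14
         :
.string  EQUS "Hello world!":EQUB 0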

    One limitation which I've deliberately ignored in this month's example is the problem of long ADR instructions.  Going back, remember that each instruction you see is assembled into a 32-bit code, and if we specify an 'immediate constant' in an instruction, such as #4, this number has to fit into those 32 bits, along with the nature of the instruction (ADD/SUB/MOV...), and the other registers involved.  So we can't represent every possible constant in an ARM instruction.  In fact, only 12 bits are allocated for representing an immediate constant.  These are divided into an eight bit data value, and a four bit shift value.  The shift value determines how the bits of the data value are arranged in the 32 bit constant we're trying to represent.  The various patterns are shown in Fig 2.  So when we ask for a constant to be assembled, the assembler will try to find a way of representing it in 8 bits with a shift value: if it can't, it returns the error 'bad immediate constant'.  In practice, this limitation is not such a problem, and we'll return to it in detail next month.
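
    To give a feel for the rule, here are a few constants chosen purely as illustrations.  The first three each fit an eight bit value moved into position by the shift; the last has set bits nine places apart, so the assembler should reject it:

MOV   R0,#255         ; &FF, eight bits with no shift needed
MOV   R0,#&FF00       ; &FF moved up into bits 8 to 15
MOV   R0,#&3FC        ; &FF moved up into bits 2 to 9
MOV   R0,#&101        ; bits 0 and 8 set, bad immediate constant

    A constant like the last one can usually be built in two steps instead:

MOV   R0,#&100        ; a single bit, which fits easily
ADD   R0,R0,#1        ; R0 now holds &101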

Doing it twice?

    One thing you'll notice about the code is that it is surrounded by an odd-looking FOR-NEXT loop, controlling the assembly options.  The trouble is, for this program, we need to make a reference to some code which hasn't been assembled by BASIC yet.  Imagine, if you were the BASIC assembler, trying to assemble this:

B routine
... more code ...
.routine

    The first time the BASIC assembler sees the 'B routine' instruction, it has not seen the .routine label later on in the program, and under normal circumstances would report an 'Unknown or missing variable' error.  OPT 0, conveniently, stops the BASIC assembler reporting errors while assembling: so if we assemble the code once with OPT 0, then a second time with the usual OPT 2 (hence the STEP value), it won't complain about missing labels, and will still report any 'real' errors on the second pass.
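
    A minimal sketch of the whole arrangement is shown below.  The names code% and pass% and the buffer size are my own choices for illustration, not necessarily those used in FProcess.  Note that P% has to be reset inside the loop, so that both passes assemble to the same place:

DIM code% 64
FOR pass%=0 TO 2 STEP 2
  P%=code%
  [OPT pass%
  B     routine       ; label not yet known on the first pass
  MOV   PC,R14        ; stands in for 'more code', never reached
  .routine
  MOV   R0,#65        ; ASCII code for 'A'
  SWI   "OS_WriteC"
  MOV   PC,R14        ; return to BASIC
  ]
NEXT pass%
CALL code%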

Subroutines

    Another important feature introduced this month is the BL instruction, short for (B)ranch and (L)ink.  This is identical to a normal branch, except that before the ARM chip jumps to the new address, it stores the address of the following instruction in R14.  This means that we can resume where we left off (after the BL instruction) with the familiar MOV PC,R14.  However, our BASIC return address is held in R14 as well, so it will be overwritten if we're not careful.  You'll notice at the start of the program, we store the BASIC return address in R12, so to return to BASIC at the end we can use MOV PC,R12 instead.  The upshot of this is that we can implement PROCedures easily: sections of code which are re-used in different contexts.
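
    The idea can be sketched with a tiny subroutine which is called twice.  The .newline label is invented for this illustration, and since it is used before it is defined the fragment needs two assembly passes:

MOV   R12,R14         ; keep the return address to BASIC safe
BL    newline         ; R14 now holds the address of the next line
BL    newline         ; and the same code is used again
MOV   PC,R12          ; back to BASIC
.newline
SWI   "OS_NewLine"
MOV   PC,R14          ; back to just after whichever BL called us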

    In this program, we need to get two filenames from the user, but both use an identical section of code, which displays a prompt then gets a string from the keyboard into an area of memory (buffer).  We can write the code once, and make it adaptable to use any prompt string and any buffer.  So our inputs for this procedure are two pointers: one for the prompt string, and one for the buffer into which we're going to read the filename.  SWI "OS_Write0" takes only one parameter in R0, as described above, and since it's the first thing the routine needs to do, we might as well pass the string pointer in R0.  The next SWI we use, OS_ReadLine, takes four parameters, in R0-R3: R0 should point to the buffer into which to get the text, R1 contains the size of the buffer, R2 and R3 contain the lowest and highest ASCII characters which are allowed to be typed; in our case anything between 33 and 126 is acceptable (i.e. no spaces).
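
    Put together, such a procedure might look something like the sketch below.  Passing the buffer pointer in R1, the label .get_name and the buffer size of 255 are all assumptions of mine; the version on the cover disc may be arranged differently:

.get_name
SWI   "OS_Write0"     ; on entry R0 = prompt, so display it
MOV   R0,R1           ; OS_ReadLine wants the buffer pointer in R0
MOV   R1,#255         ; size of the buffer (an assumed figure)
MOV   R2,#33          ; lowest character allowed
MOV   R3,#126         ; highest character allowed
SWI   "OS_ReadLine"
MOV   PC,R14          ; return to the caller

    Calling it is then a matter of setting up the two pointers (the labels here are again hypothetical) and using BL:

ADR   R0,prompt1      ; address of the prompt string
ADR   R1,buffer1      ; address of the buffer to fill
BL    get_name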

The processor flags

    During the course of executing instructions, we've used the occasional condition code.  In our program for instance, there's an OS_Find call which opens files; if it fails, it returns with R0 = zero.  To check for this, we use a CMP (compare) instruction, as last month, and tag the EQ condition code onto the (B)ranch instruction that follows.  This means that the branch is only followed if the two things just compared were equal.
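
    In outline, the test looks something like this.  The label in_name and the use of R4 to hold the handle are assumptions made for the sketch; &40 is the OS_Find reason code for opening an existing file for reading:

MOV   R0,#&40         ; reason code, open a file for reading
ADR   R1,in_name      ; address of the filename
SWI   "OS_Find"       ; the file handle comes back in R0
CMP   R0,#0           ; zero means the file was not opened
BEQ   file_error      ; so go and complain
MOV   R4,R0           ; otherwise keep the handle somewhere safe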

    CMP is one of a few instructions that specifically set the processor's status flags.  There are four: carry, zero, negative and overflow, referred to as C, Z, N, and V respectively.  Every instruction has a condition code attached.  This determines the state the flags need to be in for the instruction to be executed by the ARM chip: if the relevant flags are not in that state, the instruction will be ignored.  The two 'extreme' condition codes are AL and NV.  AL means always; instructions with this code attached will execute regardless of the status flags.  NV means never; so all NV instructions are skipped.  Because AL is the most common condition code, it is taken as the one to use if no other is specified.  The CMP instruction sets and clears various combinations of flags depending on the two values which it compares; a summary of these can be found in the StrongHelp SWI reference.
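
    Condition codes are not restricted to branches.  As a sketch of how the upper-to-lower case conversion could be done (FProcess itself may go about it differently, and the label is invented), the ADD below only happens when the second comparison comes out 'less than or equal':

CMP   R0,#65          ; compare the character with 'A'
BLT   not_upper       ; lower than 'A', so leave it alone
CMP   R0,#90          ; compare the character with 'Z'
ADDLE R0,R0,#32       ; 'A' to 'Z' becomes 'a' to 'z'
.not_upper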

    This is relevant for the file processing program, since the SWI for reading a byte from a file sets the (C)arry flag if it reaches the end, and we can use a BCS (branch if carry set) instruction to jump out of the main loop when the program is finished.
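
    The shape of such a loop is sketched below, using OS_BGet to read a byte and OS_BPut to write one.  Keeping the input and output file handles in R4 and R5 is an assumption for the example, as are the label names:

.loop
MOV   R1,R4           ; R4 = input file handle (assumed)
SWI   "OS_BGet"       ; the next byte arrives in R0
BCS   finished        ; carry set, so we have hit the end
MOV   R1,R5           ; R5 = output file handle (assumed)
SWI   "OS_BPut"       ; write out the byte in R0
B     loop
.finished
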

    Next month, we'll introduce LDR and STR in detail, and turn the screen strange colours.

There are a few hundred SWI calls under RISC OS, with many more provided by third-party modules.  These articles will only document SWIs briefly, if at all.  For full documentation you will have to get the Programmer's Reference Manuals, costing £100 from Acorn.  However, nearly all of the common SWIs are documented in the StrongHelp SWI manual, which is freeware and available from http://homepages.enterprise.net/mattbee/arm.html, or the Datafile PD library (disc AW001) for £1.  The StrongHelp SWI manual also comes with a brief assembler guide, containing a complete list of ARM2 instructions and condition codes, and is an essential reference if you're just starting out.