Instruction Set Overview
Cortex-M3 instruction set
Conventional Arm processors employ two instruction sets: the Arm instruction set (32-bit) and the Thumb instruction set (16-bit). The Cortex-M3, however, does not adopt the Arm instruction set. Instead it uses a newer version of the Thumb instruction set, the Thumb-2 instruction set.
The Thumb-2 instruction set is binary compatible with the traditional Thumb instruction set, so traditional Thumb programs run as is, without recompiling.
Thumb-2 mixes 16-bit and 32-bit instructions and adds instructions that were not available in the earlier Thumb instruction set. As a result, operations that previously could only be written with Arm instructions, such as privileged instructions, can now be written in Thumb-2, and all processing can be written without the Arm instruction set.
Also added are bit manipulation instructions, division instructions and table branch instructions.
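To illustrate the kind of operation these additions cover, the sketch below models two of them in Python: an unsigned bit-field extract (what the Thumb-2 `UBFX` instruction computes) and a signed divide that rounds toward zero (as `SDIV` does). The function names are hypothetical, and the model ignores edge cases such as division by zero; it is meant only to show the arithmetic, not the full hardware behaviour.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def ubfx(value, lsb, width):
    """Model of an unsigned bit-field extract: `width` bits starting at bit `lsb`."""
    return ((value & MASK32) >> lsb) & ((1 << width) - 1)

def sdiv(dividend, divisor):
    """Model of a signed divide that rounds toward zero (assumes divisor != 0)."""
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient

# Extract bits [11:4] of a 32-bit value.
field = ubfx(0x12345678, 4, 8)

# Python's // operator floors (-7 // 2 is -4); the model rounds toward zero instead.
q = sdiv(-7, 2)
```

Without such instructions, each of these operations would need a short sequence of shifts and masks, or a software division routine.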
Table branch instruction
Using tables to control program flow based on the value of a variable is a common feature of high-level languages. Both the Arm and Thumb instruction sets can implement this.
Arm is often used for high-performance code, so Arm compilers tend to choose code sequences that prioritize speed at the expense of size. Conversely, Thumb compilers tend to use packed data tables, in code sequences that minimize the memory used by the combined code and tables.
Thumb-2 core technology adds table branch instructions that combine the best of both approaches. They minimize the number of instructions needed to use packed data tables, which maximizes performance with a very small code and data footprint.
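As a rough illustration, the following Python sketch models how the byte form of the table branch (`TBB [PC, Rm]`) could resolve its target when the table sits directly after the instruction. The addresses, table contents, and the helper name `tbb_target` are hypothetical; the model assumes the usual Thumb rule that PC reads as the instruction address plus 4, and that each table entry is an unsigned halfword offset.

```python
def tbb_target(tbb_address, table, index):
    """Model the branch target of TBB [PC, Rm] with the table right after the instruction."""
    pc = tbb_address + 4      # in Thumb state, PC reads as instruction address + 4
    offset = table[index]     # one unsigned byte is fetched from the table
    return pc + 2 * offset    # the byte counts halfwords, so it is doubled

# Hypothetical three-way dispatch: one byte per case.
table = bytes([2, 5, 9])
targets = [tbb_target(0x1000, table, i) for i in range(len(table))]

# Data footprint: packed byte table versus a table of 32-bit addresses.
packed_size = len(table) * 1
address_table_size = len(table) * 4
```

With these assumed values the three cases land at 0x1008, 0x100E, and 0x1016, and the packed table takes 3 bytes where a table of full addresses would take 12. A single instruction performs both the table lookup and the branch.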
IT (If-Then) instruction
The Arm instruction set can execute every instruction conditionally. This feature is useful when the compiler generates code for short conditional clauses. However, the 16-bit Thumb instruction encoding does not have enough space to support it, so the feature is not available in Thumb code.
Thumb-2 core technology, however, provides an instruction with a similar mechanism. The IT instruction makes up to four following Thumb instructions conditional on a single condition code or its inverse, where condition codes are evaluated from one or more condition flags in the status register. This gives Thumb code a level of performance close to that of Arm code.
|Arm||Thumb||Thumb-2|
|LDREQ r0, [r1]||BNE L1||ITETE EQ (If-Then instruction)|
|LDRNE r0, [r2]||LDR r0, [r1]||LDR r0, [r1]|
|ADDEQ r0, r3, r0||ADD r0, r3, r0||LDR r0, [r2]|
|ADDNE r0, r4, r0||B L2||ADD r0, r3, r0|
|-||L1:||ADD r0, r4, r0|
|-||LDR r0, [r2]||-|
|-||ADD r0, r4, r0||-|
|-||L2:||-|
The example above takes 16 bytes of Arm code, 12 bytes of Thumb code, and 10 bytes of Thumb-2 core technology code. The Arm code takes 4 cycles to execute, the Thumb code takes 4 to 20 cycles, and the Thumb-2 core technology code takes 4 or 5 cycles. The cycle count for Thumb depends on whether the branches are predicted correctly. For Thumb-2 core technology, the cycle count drops from 5 to 4 when the IT instruction is folded, in the same way that branch instructions can be.
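To make the `ITETE EQ` pattern and the byte counts concrete, here is a minimal Python sketch. It expands an IT mnemonic into the condition applied to each following instruction, then recomputes the code sizes from the instruction counts in the Arm, Thumb, and Thumb-2 listings above. The helper name `expand_it` is hypothetical, and the sketch assumes every Thumb and Thumb-2 instruction in the example uses a 16-bit encoding.

```python
# Each condition code and its inverse, as selected by "E" (else) in an IT block.
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition applied to each instruction in the IT block."""
    pattern = mnemonic.upper()
    if not pattern.startswith("IT") or len(pattern) > 5:
        raise ValueError("an IT block covers one to four instructions")
    conditions = [cond]                      # the first instruction always uses cond
    for letter in pattern[2:]:
        if letter == "T":
            conditions.append(cond)          # "then": same condition
        elif letter == "E":
            conditions.append(INVERSE[cond]) # "else": inverse condition
        else:
            raise ValueError("only T and E may follow IT")
    return conditions

block = expand_it("ITETE", "EQ")

# Code size = instruction count x instruction width in bytes.
arm_bytes = 4 * 4      # four 32-bit Arm instructions
thumb_bytes = 6 * 2    # six 16-bit Thumb instructions (two of them branches)
thumb2_bytes = 5 * 2   # IT plus four 16-bit instructions
```

Here `block` pairs the four instructions after `ITETE EQ` with EQ, NE, EQ, NE in turn, which matches the conditions on the four Arm instructions, and the three sizes reproduce the 16, 12, and 10 bytes quoted in the text.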