The main Thumb-2 instructions
Bit-field insert and extract instructions have been added to both the Thumb-2 and 32-bit Arm instruction sets to improve the handling of packed data structures. They reduce the number of instructions needed to insert bits into, or extract bits from, a register, which increases the benefit of using packed data structures and reduces the amount of data memory required.
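As an illustration of what a bit-field extract computes, here is a minimal C sketch. The function names `extract_unsigned` and `extract_signed` are hypothetical and chosen for this example; they model the zero-extending and sign-extending forms of extraction (comparable to the UBFX and SBFX instructions) and assume that `width` is at least 1 and that `lsb + width` does not exceed 32.

```c
#include <stdint.h>

/* Unsigned extract: return `width` bits of `src` starting at bit `lsb`,
   zero-extended. Assumes 1 <= width and lsb + width <= 32. */
uint32_t extract_unsigned(uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return (src >> lsb) & mask;
}

/* Signed extract: the top bit of the field is copied into the upper bits.
   The final conversion relies on two's complement representation. */
int32_t extract_signed(uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t field = extract_unsigned(src, lsb, width);
    uint32_t sign  = (uint32_t)1 << (width - 1);
    return (int32_t)((field ^ sign) - sign);
}
```

Without a dedicated instruction, each of these typically costs a shift plus a mask (or two shifts), which is the overhead the bit-field instructions are meant to remove.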
The bit reversal instruction copies each bit [n] of the source register to bit [31-n] of the destination register. The same result can be achieved without a dedicated instruction, for example by repeatedly swapping groups of bits until the whole word is reversed, but that takes a sequence of 15 instructions and needs a working register. The bit reversal instruction removes this tedious work and saves a substantial number of instructions. Bit reversal is used in DSP algorithms such as the FFT.
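The following C sketch shows two software ways of producing the same result as a bit reversal. The function names are hypothetical. The first follows the definition directly (bit n goes to bit 31-n); the second uses the well-known swap technique, exchanging neighbouring bits, then pairs, nibbles, bytes, and halfwords. It is meant only to illustrate the kind of work a single instruction replaces, not the exact sequence a compiler would emit.

```c
#include <stdint.h>

/* Reverse one bit at a time: bit n of x ends up in bit 31-n. */
uint32_t reverse_bits_loop(uint32_t x)
{
    uint32_t r = 0;
    for (int n = 0; n < 32; n++) {
        r |= ((x >> n) & 1u) << (31 - n);
    }
    return r;
}

/* Swap-based reversal: neighbouring bits, pairs, nibbles, bytes, halfwords. */
uint32_t reverse_bits_swap(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    x = (x >> 16) | (x << 16);
    return x;
}
```

Each line of the swap version needs shifts, masks, and a temporary value, which is why the software approach costs many instructions and a working register.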
Zero Comparison & Branching (CZB)
This instruction replaces the common sequence of a compare with zero followed by a branch instruction. It is typically used to test an address pointer.
The new instructions include program flow control, data processing, and load/store instructions, as well as coprocessor access instructions.
The coprocessor access instructions allow Thumb code to be written for coprocessors such as the Vector Floating Point (VFP) unit for the first time. Together with the instructions for accessing system registers, they allow an entire application to be written in Thumb state, eliminating the need to switch to Arm state to access special functions.
Two new instructions have been added to both the 32-bit Arm instruction set and the Thumb-2 instruction set to provide more flexibility in handling program constants. One is the MOVW instruction, which loads a 16-bit constant into a register and zero-extends the result. The other is the MOVT instruction, which loads a 16-bit constant into the upper half of a register. To load a 32-bit constant into a register, use the two in combination.
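The effect of the two instructions on a register can be modelled in C as a rough sketch. The helper names `movw`, `movt`, and `load_const32` are hypothetical functions written for this illustration, not compiler intrinsics; the model assumes that MOVT leaves the lower halfword of the register unchanged.

```c
#include <stdint.h>

/* MOVW-like step: write a 16-bit constant, zero-extended to 32 bits. */
uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* MOVT-like step: replace the upper halfword, keep the lower halfword. */
uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Build a full 32-bit constant from its two 16-bit halves. */
uint32_t load_const32(uint32_t value)
{
    uint32_t reg = movw((uint16_t)(value & 0xFFFFu));
    return movt(reg, (uint16_t)(value >> 16));
}
```

For example, `load_const32(0x40021000u)` (an arbitrary address picked for illustration) first produces 0x00001000 and then fills in 0x4002 as the upper half.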
This is often needed when loading the address of a peripheral before accessing one or more of its registers, a task currently handled with literal pools. A literal pool is a set of 32-bit constants embedded in the instruction stream and accessed relative to the program counter.
A literal pool is useful for storing constants and reducing the size of the code needed to access them. However, it carries an overhead on cores that implement a Harvard architecture. This overhead is the number of cycles required to make a constant in the instruction stream available on the core's data port. The constant must either be loaded into the data cache or be accessed from program memory through the processor's data port.
If the constant is split in half and embedded in two instructions, it is already in the instruction stream and no data access is required. This has the effect of shrinking the literal pool. Because fewer cycles are spent accessing constants, performance improves and the power consumed by those accesses can be reduced.
| Arm (v6 and earlier) | Thumb-2 |
|---|---|
| AND r2, r1, #bitmask | BFI r0, r1, #bitpos, #bitwidth |
| BIC r0, r0, #bitmask << bitpos | |
| ORR r0, r0, r2, LSL #bitpos | |
The comparison above shows that the Arm code needs three instructions to insert a bit field, even in the simple case where the mask and the shifted mask fit within the immediate field of an Arm instruction. The wider the field, the more instructions are needed. Thumb-2 core technology does not have this limitation. In addition, the Arm code requires one extra register for the intermediate value.
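To make the three-instruction Arm sequence in the comparison above concrete, here is a minimal C sketch of the same computation. The function name `insert_three_step` is hypothetical, and the parameter names simply mirror the operands in the listing; the sketch assumes that `bitwidth` is at least 1 and that `bitpos + bitwidth` does not exceed 32.

```c
#include <stdint.h>

/* Insert the low `bitwidth` bits of `src` into `dst` at `bitpos`,
   following the AND / BIC / ORR steps of the Arm sequence. */
uint32_t insert_three_step(uint32_t dst, uint32_t src,
                           unsigned bitpos, unsigned bitwidth)
{
    uint32_t bitmask = (bitwidth == 32) ? 0xFFFFFFFFu
                                        : (((uint32_t)1 << bitwidth) - 1u);
    uint32_t tmp = src & bitmask;   /* AND: intermediate value (extra register) */
    dst &= ~(bitmask << bitpos);    /* BIC: clear the destination field */
    dst |= tmp << bitpos;           /* ORR: merge the shifted field */
    return dst;
}
```

The local variable `tmp` plays the role of the extra register r2, and the whole function body corresponds to what the single BFI instruction performs.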