Macro for generating delays in machine cycles
Proposal to add a very useful macro (not a function) that generates a delay of a specified number of CPU machine cycles.
It is present in some other compilers under the name __delay_cycles().
The compiler should translate it to assembler code as follows:
__delay_cycles(1) -> NOP
__delay_cycles(2) -> 2xNOP
__delay_cycles(..) -> calculated assembler loop
__delay_cycles(...) -> calculated assembler loop in loop
I know that it isn't accurate while interrupts are active, but the delay is never shorter than requested.
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: asier
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
Whiteboard
Let me propose my solution:
#include <stdint.h>
static __inline__ __attribute_
{
#if ARCH_PIPELINE_
# define EXTRA_NOP_CYCLES "nop"
#else
# define EXTRA_NOP_CYCLES ""
#endif
__asm__ __volatile__
(
".syntax unified" "\n\t" // prevents non-unified syntax on CM0, CM1
"loop%=:" "\n\t"
" subs %[cnt],#1" "\n\t"
" bne loop%=" "\n\t"
: [cnt]"+r"(cy) // output: +r means input+output
: // input:
: "cc" // clobbers:
);
}
static __inline__ __attribute_
{
#define MAXNOPS 4
if (x<=MAXNOPS)
{
if (x==1) {nop();}
else if (x==2) {nop(); nop();}
else if (x==3) {nop(); nop(); nop();}
else if (x==4) {nop(); nop(); nop(); nop();}
}
else // because of +1 cycle inside delay_4cycles
{
uint32_t rem = (x-1)%MAXNOPS;
if (rem==1) {nop();}
else if (rem==2) {nop(); nop();}
else if (rem==3) {nop(); nop(); nop();}
if ((x=(x-1)/MAXNOPS)) delay_4cycles(x); // for more than 4 NOPs a loop is more efficient
}
}
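The arithmetic behind the split above can be checked on a host machine. The following is only a sketch of the bookkeeping, not the target code: plan_delay and planned_cycles are hypothetical helper names, and the cost model (1 cycle per NOP, 4 cycles per loop iteration, 1 extra cycle when the loop is used) is an assumption taken from the "+1 cycle" comment, not a measurement.

```c
#include <stdint.h>

#define MAXNOPS 4u  /* cycles assumed per loop iteration, as in the code above */

/* Hypothetical helper type: how a request for x cycles is split up. */
typedef struct {
    uint32_t nops;   /* single NOPs emitted inline */
    uint32_t loops;  /* iteration count passed to delay_4cycles */
} delay_plan;

static delay_plan plan_delay(uint32_t x)
{
    delay_plan p = {0u, 0u};
    if (x <= MAXNOPS) {
        p.nops = x;                      /* short delays: NOPs only */
    } else {
        p.nops  = (x - 1u) % MAXNOPS;    /* remainder after the +1 overhead */
        p.loops = (x - 1u) / MAXNOPS;    /* always >= 1 in this branch */
    }
    return p;
}

/* Total cycles under the assumed cost model (not measured on hardware). */
static uint32_t planned_cycles(delay_plan p)
{
    uint32_t overhead = (p.loops > 0u) ? 1u : 0u;
    return p.nops + MAXNOPS * p.loops + overhead;
}
```

For example, a request for 10 cycles becomes 1 NOP plus 2 loop iterations: 1 + 2*4 + 1 = 10. Under this model the plan adds up to exactly x for every x; whether real hardware matches depends on the actual branch cost of the core.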
By @Traumflug
For a calibrated delay loop with microseconds as parameter, see https:/
Besides interrupts, the prefetch engine is another source of unexpected additional delays. To deal with this, one can add an __ASM (".balign 16"). The assembler then inserts NOPs so that the code always starts at a 16-byte boundary, which gives consistent timing in the loop. Moving such a loop to a place where it crosses a 16-byte boundary makes it slower by a few clocks without executing any additional instructions; the CPU simply stalls for a clock tick or two.
Another thing to consider is that more feature-rich Cortex cores may simply ignore NOPs: they enter the CPU pipeline but are discarded before they consume any time. This information comes from one of the Cortex-M user manuals.
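Whether a given loop straddles a 16-byte boundary is simple address arithmetic. A minimal host-side sketch follows; crosses_boundary is a hypothetical helper, and the 4-byte loop size in the note below assumes two 16-bit Thumb instructions (subs and bne), which is an assumption and not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [start, start + len)
   spans more than one align-sized block. align must be non-zero. */
static bool crosses_boundary(uint32_t start, uint32_t len, uint32_t align)
{
    if (len == 0u) {
        return false;                    /* an empty range crosses nothing */
    }
    return (start / align) != ((start + len - 1u) / align);
}
```

With an assumed 4-byte loop, a start address of 0x10C stays inside one 16-byte block (0x10C to 0x10F), while 0x10E spills into the next block. Aligning the loop start to 16 bytes rules the second case out.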
By David Brown
The delay_cycles function should check that the parameter x is constant:
static __inline__ __attribute_
{
if (__builtin_
... // same as above
} else {
delay_4cycles(x / 4);
}
}
It would be a bit fiddly to make the dynamic version cycle-perfect rather than rounded to a multiple of 4 cycles, and I doubt it is worth the effort. But this version would still be more accurate than the first version.
And could another instruction be used instead of NOP? For example:
asm volatile(" add %[x], #0 " : [x] "+r" (x) : );
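One detail of the dynamic branch may be worth a look: x / 4 rounds down, so the delay can come out up to 3 cycles short, and for x < 4 it passes 0 to delay_4cycles, where subs wraps the counter to 0xFFFFFFFF and the loop runs about 2^32 times. A possible alternative is to round up, sketched here as plain host-side arithmetic; loops_round_down and loops_round_up are hypothetical names, not part of the proposal.

```c
#include <stdint.h>

/* Iteration count as in the snippet above: rounds down. */
static uint32_t loops_round_down(uint32_t x)
{
    return x / 4u;
}

/* Hypothetical alternative: rounds up, so the requested delay is
   never undercut. Written without x + 3 to avoid overflow near
   UINT32_MAX. */
static uint32_t loops_round_up(uint32_t x)
{
    return x / 4u + ((x % 4u) != 0u ? 1u : 0u);
}
```

For x = 7, rounding down gives 1 iteration (4 cycles) and rounding up gives 2 (8 cycles). Rounding up still returns 0 for x = 0, so the caller would need to skip the call to delay_4cycles in that case.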