Fix Hanging Assembly Delay In Cortex-M0 & Keil

Aug 4, 2025 by Sebastian Müller

Why Your Assembly Delay Function Hangs: A Cortex-M0 & Keil Guide

Hey everyone! Ever written an assembly delay function that just… hangs? Frustrating, right? Especially when you're working with Cortex-M0 and Keil uVision. Today, we're diving deep into why this happens and how to fix it. We'll break down a common scenario, analyze the problem, and provide a solid solution. So, grab your coffee, and let's get started!

The Case of the Hanging Delay Function

Let's set the stage. Imagine you're working on a project that requires precise timing. You decide to write a delay function in assembly for your Cortex-M0 microcontroller using Keil uVision 5.38 with ARM Compiler 5 (armcc), whose embedded assembler accepts the __ASM function syntax below (ARM Compiler 6 does not). You whip up some code that looks something like this:

static __INLINE __ASM void _asm_delay10us(unsigned int num)
{
 /* R0 contains the number of 10us delays */
  PUSH {r1-r3}
loop
 SUBS R0, R0, #1
  BNE loop
 POP {r1-r3}
 BX LR
}

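Sounds simple enough, right? You pass in a number (num), and the function loops, decrementing the counter until it hits zero. The intent is for each pass to take 10 microseconds, but nothing in the loop scales it that way: a pass is just one SUBS and one BNE, about 4 clock cycles on a Cortex-M0, which is under a tenth of a microsecond at 48 MHz. Then you run your code and find that the program gets stuck in the delay function, never returning. What gives? One cause is easy to miss: the loop decrements before it tests, so if num is 0, SUBS R0, R0, #1 turns it into 0xFFFFFFFF and BNE keeps looping for 2^32 passes, which at 48 MHz takes about six minutes.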
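To see why a zero argument is so costly, here is a host-side C model of the loop in _asm_delay10us. It is a sketch that runs on a PC, not on the target: the function names are hypothetical, and the 4-cycles-per-pass figure assumes Cortex-M0 timings with zero-wait-state memory.

```c
#include <stdint.h>

/* Hypothetical helper: what SUBS R0, R0, #1 computes.
   32-bit unsigned subtraction, so 0 - 1 wraps to 0xFFFFFFFF. */
uint32_t subs_by_one(uint32_t r0)
{
    return r0 - 1u;
}

/* Hypothetical helper: passes through the SUBS/BNE loop for a starting R0.
   The decrement runs before BNE tests the result, so the body always executes
   at least once: a nonzero R0 makes exactly R0 passes, while 0 first wraps to
   0xFFFFFFFF and needs 2^32 passes in total to come back to zero. */
uint64_t delay_loop_passes(uint32_t r0)
{
    return (r0 == 0u) ? (UINT64_C(1) << 32) : (uint64_t)r0;
}

/* Rough duration in seconds, assuming about 4 cycles per pass on a Cortex-M0
   (SUBS 1 cycle + taken BNE 3 cycles) and zero-wait-state memory. */
double delay_loop_seconds(uint32_t r0, double core_hz)
{
    return (double)delay_loop_passes(r0) * 4.0 / core_hz;
}
```

At 48 MHz, delay_loop_seconds(0, 48e6) comes out at roughly 358 seconds, so a call with num equal to 0 looks exactly like a hang. A CMP R0, #0 / BEQ guard before the loop, or a check in the C caller, avoids it.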

Diving Deep: Why the Hang?

Whether a delay function behaves as intended depends on a few key areas, so let's go through them one by one.

First, let's talk about the clock frequency. Your delay loop's timing is tied directly to the microcontroller's clock speed: the loop burns a fixed number of cycles, so if the clock isn't what you expect, the delay is off by the same factor. Imagine you think your CPU is running at 48 MHz, but it's actually at 4 MHz. Your delay loop will take twelve times longer than you planned, which can easily look like a hang.
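A quick way to see the clock dependence is to convert cycles to time. The helpers below are a host-side C sketch with hypothetical names; cycles_per_pass is an assumption (about 4 for the SUBS/BNE loop on a Cortex-M0).

```c
#include <stdint.h>

/* Hypothetical helper: microseconds taken by a number of CPU cycles at a given
   core clock. A busy-wait loop burns a fixed number of cycles, so a slower
   clock stretches it proportionally. */
double cycles_to_us(uint64_t cycles, uint32_t core_hz)
{
    return (double)cycles * 1e6 / (double)core_hz;
}

/* Hypothetical helper: loop passes needed for a target delay, assuming each
   pass costs cycles_per_pass cycles (must be nonzero). Rounds down. */
uint32_t passes_for_us(uint32_t us, uint32_t core_hz, uint32_t cycles_per_pass)
{
    return (uint32_t)(((uint64_t)us * core_hz) / (1000000ull * cycles_per_pass));
}
```

For example, passes_for_us(10, 48000000, 4) gives 120 passes for a 10 µs delay. If the chip is really running at 4 MHz, those same 480 cycles take cycles_to_us(480, 4000000) = 120 µs, twelve times the intended delay.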

Second, the loop overhead plays a crucial role. The SUBS (subtract) and BNE (branch if not equal) instructions aren't free: on a Cortex-M0, SUBS takes 1 cycle and a taken BNE takes 3, so every pass costs about 4 cycles with zero-wait-state memory, and flash wait states add more. The call, the return, and any PUSH/POP add a few cycles on top. If you don't account for this overhead, your delay will be inaccurate, and the error matters most for short delays.
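The per-pass cost can be turned into a cycle count. This host-side C sketch (hypothetical function name) applies the Cortex-M0 Technical Reference Manual timings to the two-instruction loop; flash wait states, which vary by device, are not modeled.

```c
#include <stdint.h>

/* Hypothetical helper: estimated cycles for `passes` iterations of
       loop  SUBS R0, R0, #1
             BNE  loop
   on a Cortex-M0, using SUBS = 1 cycle and BNE = 3 cycles when taken,
   1 cycle when it falls through. Assumes zero-wait-state memory. */
uint64_t subs_bne_cycles_m0(uint64_t passes)
{
    if (passes == 0u)
        return 0u;                /* nothing executes in this model */
    return passes * 1u            /* SUBS on every pass */
         + (passes - 1u) * 3u     /* BNE taken on every pass except the last */
         + 1u;                    /* final BNE falls through */
}
```

For 120 passes it gives 478 cycles, just under 10 µs at 48 MHz; the call and return add a few more cycles.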

Third, the optimization level set in Keil can matter, depending on how the loop is written. The compiler optimizes C code, and armcc also optimizes inline assembler (__asm { } blocks inside C functions), so at higher optimization levels an empty C loop may be removed entirely and an inline-assembler loop may be rearranged. Embedded assembler functions such as _asm_delay10us are passed to the assembler as written and are not optimized, so for them the optimization level mainly affects the C code around the call.
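If a delay loop has to be written in C instead, the usual guard against the optimizer is a volatile loop counter. A minimal sketch (the function name is hypothetical):

```c
#include <stdint.h>

/* Hypothetical C busy-wait. Without volatile, an optimizing compiler may delete
   an empty loop because it has no observable effect. The volatile counter forces
   every read and write of i to happen, so the loop survives optimization. */
uint32_t c_delay_loop(uint32_t n)
{
    volatile uint32_t i;
    for (i = 0; i < n; i++) {
        /* intentionally empty */
    }
    return i;  /* equals n; returned only so the effect is easy to check */
}
```

Even with volatile, the cycles per pass depend on the compiler and optimization level, so a C loop like this still needs calibrating against a measurement.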

Fourth, register usage and preservation is key. Under the ARM Architecture Procedure Call Standard (AAPCS), r0-r3 and r12 are scratch registers: a called function may overwrite them, and the caller does not expect them to survive. Registers r4-r11 are the opposite: a function that modifies them must restore them before returning. The PUSH {r1-r3} / POP {r1-r3} pair in the snippet is therefore harmless but unnecessary, since the function only changes r0. The real danger is using r4-r11 without saving them, which silently corrupts the caller's state and can cause crashes far away from the delay function.
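The AAPCS rule can be expressed as a bit mask. This host-side C sketch (hypothetical names) takes the set of registers a routine writes and returns the ones it must save; bit k stands for register rk. A routine that calls other functions must also save lr, which this sketch does not cover.

```c
#include <stdint.h>

/* Bit k of a mask stands for register rk.
   AAPCS: r0-r3 and r12 are caller-saved scratch registers; r4-r11 are
   callee-saved, so a routine must save and restore any of them it writes. */
#define AAPCS_CALLEE_SAVED_MASK 0x0FF0u   /* r4..r11 */

/* Hypothetical helper: registers a routine has to PUSH on entry (and POP on
   exit), given the mask of registers it writes. */
uint32_t regs_to_save(uint32_t written_mask)
{
    return written_mask & AAPCS_CALLEE_SAVED_MASK;
}
```

_asm_delay10us writes only r0, so regs_to_save(1u << 0) is 0 and its PUSH {r1-r3} buys nothing; a routine that uses r4 and r5 gets 0x30 back and must save both.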

Fifth, consider interrupts. If interrupts are enabled, they can preempt your delay loop: the processor jumps to the interrupt handler, runs it, and only then returns to your loop, so every interrupt that fires during the delay makes it longer than expected. If precise delays are crucial, you can mask interrupts for the duration (for example with the CMSIS intrinsics __disable_irq() and __enable_irq()), at the cost of delaying interrupt handling by up to the length of the delay.
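How much an interrupt stretches a delay can be estimated when the interrupt is periodic, such as a SysTick tick. The delay loop only makes progress while the CPU is outside the handler, so the wall-clock delay grows by the ratio of the period to the time left for the loop. A host-side C sketch of that steady-state estimate (hypothetical name; assumes the handler time is shorter than the period):

```c
/* Hypothetical helper: steady-state estimate of a busy-wait delay under a
   periodic interrupt. The loop advances only during (period - isr) out of every
   period, so wall-clock time = loop_us * period_us / (period_us - isr_us).
   Requires isr_us < period_us. */
double stretched_delay_us(double loop_us, double period_us, double isr_us)
{
    return loop_us * period_us / (period_us - isr_us);
}
```

With a 1 ms tick whose handler takes 50 µs, a 1000 µs busy-wait stretches to about 1053 µs.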

Pinpointing the Culprit

So, how do you figure out which of these factors is causing your delay function to hang? Here's a troubleshooting strategy:

Verify Clock Frequency: Double-check your system clock configuration. Use the Keil debugger to inspect your device's clock-control registers and the SystemCoreClock variable (after calling SystemCoreClockUpdate(), so it reflects the current settings). A wrong clock configuration is a common culprit, so always start here.
Calculate Loop Overhead: Look up each instruction's cycle count in the Cortex-M0 Technical Reference Manual (SUBS: 1 cycle; BNE: 3 cycles when taken, 1 when not) and add them up for one pass. The uVision simulator reports elapsed cycles (the States counter in the Registers window), so you can check your arithmetic between two breakpoints. The Cortex-M0 has no DWT cycle counter, so on real hardware it is often easier to toggle a GPIO pin around the delay and measure it with a scope or logic analyzer.
Adjust Optimization Level: Try compiling with different optimization levels (-O0 to -O3 in armcc) and see whether the delay changes. If it does, the loop is probably written in C or inline assembler, where the optimizer can shorten, reorder, or remove it; an embedded assembler routine should behave the same at every level.
Inspect Register Usage: Check that every register from r4 to r11 that your routine writes is pushed on entry and popped before returning. Use the debugger to watch register values across the call and look for unexpected changes.
Consider Interrupts: If your delay function needs to be highly accurate, consider disabling interrupts temporarily. Remember to re-enable them after the delay. Interrupt handling can introduce variations in timing, especially if the interrupt service routines are lengthy.
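Measuring the delay on real hardware also lets you back-compute the clock. Toggle a GPIO pin before and after a loop with a known pass count, time the pulse with a scope or logic analyzer, and apply the formula below. This is a host-side C sketch; the function name is hypothetical and cycles_per_pass is an assumption (about 4 for SUBS plus a taken BNE on a Cortex-M0).

```c
#include <stdint.h>

/* Hypothetical helper: core clock implied by a measured busy-wait.
   passes          : loop passes executed between the two GPIO toggles
   cycles_per_pass : assumed cost of one pass (about 4 on Cortex-M0)
   measured_us     : pulse width read from the scope, in microseconds */
double implied_core_hz(uint64_t passes, double cycles_per_pass, double measured_us)
{
    return (double)passes * cycles_per_pass * 1e6 / measured_us;
}
```

If 12000 passes take 1000 µs, the core is running at about 48 MHz; if 1000 passes already take 1000 µs, it is closer to 4 MHz, which points straight at the clock configuration.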

A Robust Solution: Crafting a Precise Delay Function

Okay, enough about the problems. Let's talk solutions. Here's a more robust assembly delay function for Cortex-M0 in Keil, along with explanations:

#include <stdint.h>

extern uint32_t SystemCoreClock;   /* CMSIS global, defined in the device's system_<device>.c */

/* r0 = delay in microseconds, r1 = inner-loop passes per microsecond (AAPCS argument registers) */
static __INLINE __ASM void _asm_delay_loops(unsigned int us, unsigned int loops_per_us)
{
        PUSH    {r4-r5}         ; save the callee-saved registers used below
        MOVS    r4, r0          ; r4 = microsecond counter; sets Z if us == 0
        BEQ     done            ; us == 0: return now instead of wrapping to 0xFFFFFFFF
        CMP     r1, #0
        BEQ     done            ; loops_per_us == 0 would wrap the inner counter too
loop
        MOVS    r5, r1          ; reload the inner counter at the start of every microsecond
loop_inner
        SUBS    r5, r5, #1      ; 1 cycle
        BNE     loop_inner      ; 3 cycles when taken on Cortex-M0
        SUBS    r4, r4, #1      ; one microsecond done
        BNE     loop            ; more microseconds to go
done
        POP     {r4-r5}         ; restore registers
        BX      lr              ; return
}

/* Each inner pass costs about 4 cycles (SUBS + taken BNE) on Cortex-M0, assuming zero wait states */
static __INLINE void _asm_delay_us(unsigned int us)
{
    _asm_delay_loops(us, SystemCoreClock / 4000000u);
}

Let's break this down step by step:

C Wrapper: _asm_delay_us is an ordinary C function. SystemCoreClock is a CMSIS global variable (defined in the device's system_<device>.c and refreshed by SystemCoreClockUpdate()), not an assembler constant, and the Cortex-M0 has no hardware divide instruction. So the wrapper computes SystemCoreClock / 4000000u in C, which is the number of inner-loop passes per microsecond given that each pass takes about 4 cycles, and passes it to the assembly routine. The division runs in software on every call, which adds some overhead to very short delays.
Arguments in Registers: Embedded assembler cannot refer to C parameter names such as us. Under the AAPCS, the first two arguments arrive in r0 and r1, so the routine reads the microsecond count from r0 and the passes-per-microsecond value from r1.
Register Preservation: PUSH {r4-r5} saves r4 and r5 on the stack, and POP {r4-r5} restores them before returning. The AAPCS requires this for r4-r11; r0-r3 are scratch and could have been used without saving.
Zero Checks: MOVS r4, r0 copies the delay into r4 and sets the Z flag, so BEQ done returns at once when us is 0. CMP r1, #0 does the same for a zero pass count, which the wrapper produces on clocks below 4 MHz. Without these checks, the first SUBS would turn 0 into 0xFFFFFFFF and the delay would run for minutes.
The Outer Loop: At loop, MOVS r5, r1 reloads the inner counter, the inner loop runs, and then SUBS r4, r4, #1 and BNE loop repeat until r4 reaches zero. Because the test follows the decrement, a request for us microseconds makes exactly us passes, with no off-by-one.
The Inner Loop: At loop_inner, SUBS r5, r5, #1 and BNE loop_inner count r5 down to zero. On a Cortex-M0 with zero-wait-state memory, each pass takes about 4 cycles (SUBS 1, taken BNE 3), which is why the wrapper divides by 4,000,000 rather than 1,000,000.
Reloading the Inner Counter: Loading r5 only once, before the outer loop, is a classic bug: after the first microsecond r5 is 0, the next SUBS wraps it to 0xFFFFFFFF, and the second "microsecond" lasts about 2^32 passes. Reloading r5 at the top of every outer pass avoids this.
Labels and Returning: BX lr returns to the address in the link register. In armasm syntax, which embedded assembler uses, labels such as loop, loop_inner, and done must start in column 1 and instructions must be indented; an indented label is treated as an instruction and the code won't assemble.

Key Improvements

Cycles per Microsecond: The inner-loop count is derived from SystemCoreClock, so the delay follows the actual clock frequency instead of a hard-coded guess, as long as SystemCoreClock is up to date.
Nested Loops: The outer loop counts microseconds while the inner loop burns a fixed number of short passes within each one, which keeps both counters small.
Register Preservation: r4 and r5 are saved and restored, as the AAPCS requires of any function that modifies them.
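To check how close a nested delay of this kind gets to the requested time, the cycle cost can be modeled on the host. This is a C sketch with hypothetical names, for a Cortex-M0 busy-wait whose outer loop runs once per microsecond and reloads the inner counter on every pass; the instruction mix in the comment is an assumption, so adjust it to match your own code.

```c
#include <stdint.h>

/* Hypothetical model of a nested Cortex-M0 busy-wait.
   Assumed instruction mix per outer pass (L = loops_per_us):
     inner loop : SUBS + BNE                    -> 4*L - 2 cycles
     outer loop : MOVS reload + SUBS + BNE       -> 5 cycles when BNE is taken
   The last outer BNE falls through (1 cycle instead of 3). Call, return,
   PUSH and POP are ignored; zero-wait-state memory is assumed. */
uint64_t nested_delay_cycles(uint32_t us, uint32_t loops_per_us)
{
    if (us == 0u || loops_per_us == 0u)
        return 0u;  /* a real routine must skip the loop for these inputs */
    uint64_t per_us = 4u * (uint64_t)loops_per_us - 2u + 5u;
    return per_us * us - 2u;
}

/* Same estimate converted to microseconds at a given core clock */
double nested_delay_us(uint32_t us, uint32_t loops_per_us, uint32_t core_hz)
{
    return (double)nested_delay_cycles(us, loops_per_us) * 1e6 / (double)core_hz;
}
```

At 48 MHz, SystemCoreClock / 4000000 gives 12 passes, so each requested microsecond costs 51 cycles (about 1.06 µs) and a 100 µs request lands near 106 µs. If that matters, one option is to subtract the per-microsecond overhead before dividing: (48 - 3) / 4 = 11 passes, or 47 cycles, about 0.98 µs.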

Putting It All Together

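To use this function, call it with the desired delay in microseconds. If your code changes the clock at run time, call SystemCoreClockUpdate() afterwards so that SystemCoreClock matches the real frequency: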

_asm_delay_us(100); // Delay for 100 microseconds

Final Thoughts

Writing accurate delay functions in assembly can be tricky, but understanding the factors that affect timing is crucial. By guarding against a zero count and considering clock frequency, loop overhead, optimization levels, register usage, and interrupts, you can craft reliable delay functions for your Cortex-M0 projects. Remember to test your delay functions thoroughly, ideally by measuring them on real hardware, to ensure they meet your application's requirements. And hey, don't be afraid to use the debugger: it's your best friend in these situations! Happy coding, guys!