Cortex-M debugging: Performance optimization using SWV statistical profiling

Posted by Magnus Unemyr on Apr 8, 2015 8:00:00 AM

So, where do you hang out? Or rather, where do you spend your time? While this question may appear a bit personal, it is a valid question in terms of embedded software. Knowing where your application spends most of its execution time is the first step toward efficient speed optimization. Profiling your application shows you where that time goes, so you can get the largest performance gain for the least effort.

Say, for example, that your application spends 95% of the time in 3 C-functions, and the remaining 5% in another 125 C-functions. Then it is quite obvious that you should focus your optimization efforts on the 3 functions that use most of the CPU time. While this split of 95% in 3 functions and 5% in 125 functions may appear to be a rigged and fairly extreme example, it really isn’t that uncommon. Usually, only a few C-functions use most of the CPU cycles. By analyzing which C-functions steal most of the CPU cycles, you know where to optimize for the best results.
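To put numbers on that, here is a minimal sketch of the arithmetic (Amdahl's law) applied to the 95%/5% split. The function name `overall_speedup` and the 2x improvement used in the note below are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Overall speedup when `fraction` of the run time (0..1) is made
 * `local_speedup` times faster and the rest is left unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

Making the 3 hot functions twice as fast, `overall_speedup(0.95, 2.0)`, gives roughly a 1.9x faster application. Making the other 125 functions twice as fast, `overall_speedup(0.05, 2.0)`, gives only about 1.03x.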


Traditionally, profiling for speed optimization has been implemented either by adding intrusive instrumentation to the machine code (thus modifying the real-time and timing behavior), or by using very expensive hardware emulators, which are now less commonly used.

In the Cortex-M core used in popular microcontroller families like STM32, Kinetis or LPC, ARM added a very interesting, useful and low-cost alternative: statistical PC sampling. The Cortex-M core can periodically send the Program Counter (PC, the address of the machine code instruction currently being executed) to the debugger. This is done non-intrusively by the CPU core hardware itself, with no need for software instrumentation that modifies the timing behavior. It also removes the need for very expensive emulator hardware, as low-cost debugger probes like SEGGER J-Link and ST-LINK support this technology.

The Cortex-M CPU core uploads the current PC address to the debugger at regular intervals, using the Serial Wire Viewer (SWV) event tracing capability and the Serial Wire Output (SWO) pin supported by modern debugger probes, such as SEGGER J-Link and ST-LINK from STMicroelectronics.
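As a rough sketch of what "at regular intervals" means in hardware terms: on ARMv7-M cores (Cortex-M3/M4/M7), PC sampling is controlled by the control register of the DWT unit. The IDE and debugger probe normally set this up for you, so the code below only illustrates the bit fields and the resulting sample interval. The helper and macro names are hypothetical (they are not the CMSIS names), and the interval formula is an approximation based on my reading of the architecture manual.

```c
#include <assert.h>
#include <stdint.h>

/* Bit fields of the DWT_CTRL register (illustrative macro names). */
#define CTRL_CYCCNTENA      (1u << 0)              /* cycle counter enabled */
#define CTRL_POSTPRESET(n)  (((uint32_t)(n)) << 1) /* 4-bit reload value */
#define CTRL_CYCTAP         (1u << 9)              /* 0: CYCCNT bit 6, 1: bit 10 */
#define CTRL_PCSAMPLENA     (1u << 12)             /* emit periodic PC samples */

/* Control bits that would enable periodic PC sampling. */
uint32_t pc_sampling_ctrl_bits(unsigned postpreset, int slow_tap)
{
    assert(postpreset <= 15u);
    uint32_t bits = CTRL_CYCCNTENA | CTRL_PCSAMPLENA | CTRL_POSTPRESET(postpreset);
    if (slow_tap) {
        bits |= CTRL_CYCTAP;
    }
    return bits;
}

/* Approximate number of CPU cycles between two PC samples. */
uint32_t pc_sample_interval_cycles(unsigned postpreset, int slow_tap)
{
    assert(postpreset <= 15u);
    uint32_t tap_cycles = slow_tap ? 1024u : 64u;
    return tap_cycles * (postpreset + 1u);
}
```

With the slowest setting, `pc_sample_interval_cycles(15, 1)`, a sample is taken about every 16384 cycles. On a hypothetical 72 MHz core that is still roughly 4,400 PC samples per second, and the faster settings generate more data than a slow SWO link may be able to carry.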

The debugger can then analyze how many of these PC samples belong to each C-function in the application. The more samples that come from a particular C-function, the more time the CPU is likely spending in that function. Please note that the CPU only sends the PC address at regular intervals, so it is perfectly possible that the CPU executes other code between the PC samples. That code goes undetected if no PC sample ever happens to be taken while it is executing.
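The attribution step can be sketched as follows. This is not how any particular debugger implements it; it is a minimal illustration that assumes a function table sorted by start address with non-overlapping ranges (in practice this would come from the ELF symbol table). The type `func_entry` and both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char   *name;
    uint32_t      start;  /* address of the first instruction */
    uint32_t      size;   /* size of the function in bytes */
    unsigned long hits;   /* number of PC samples inside the function */
} func_entry;

/* Binary search: index of the function containing pc, or -1 if none. */
int find_function(const func_entry *table, size_t count, uint32_t pc)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].start) {
            hi = mid;                 /* pc lies before this function */
        } else if (pc - table[mid].start >= table[mid].size) {
            lo = mid + 1;             /* pc lies after this function */
        } else {
            return (int)mid;          /* pc lies inside this function */
        }
    }
    return -1;
}

/* Count each sample against its function; returns the number of
 * samples that did not fall inside any known function. */
unsigned long attribute_samples(func_entry *table, size_t count,
                                const uint32_t *samples, size_t n)
{
    unsigned long unmatched = 0;
    for (size_t i = 0; i < n; i++) {
        int idx = find_function(table, count, samples[i]);
        if (idx < 0) {
            unmatched++;
        } else {
            table[idx].hits++;
        }
    }
    return unmatched;
}
```

Dividing each function's `hits` by the total number of samples gives the estimated share of CPU time, which is what the bar chart in a profiling view would typically show.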

However, in practice this effect balances out even if you run the application for just a short while, because the number of PC samples taken by the Cortex-M core and sent to the debugger is massive. This means that over time, the inaccuracies introduced by taking only periodic PC samples, instead of doing full profiling, are reduced. Unless you are very unlucky, statistical profiling with periodic PC samples is quite accurate and very useful in real life. Additionally, the method needs neither software instrumentation that modifies the timing behavior nor very expensive hardware emulators.
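A back-of-the-envelope way to see why more samples help, under the simplifying assumption that the samples behave like independent random draws (this is my assumption, not something the hardware guarantees): if k of N samples fall in a function, the estimated share of CPU time and its standard error are approximately

```latex
\hat{p} = \frac{k}{N}, \qquad
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}
```

For a function that receives 30% of the samples, N = 100 gives an uncertainty of about 4.6 percentage points, while N = 10,000 brings it down to about 0.46. The "very unlucky" case is when the sampling interval happens to line up with a periodic pattern in the code, which breaks the independence assumption.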

And so, if you want to improve the speed of your Cortex-M application, I recommend you look into the statistical profiling capability, using periodic PC samples done with the Serial Wire Viewer (SWV) technology and the Serial Wire Output (SWO) pin. Advanced modern ARM Cortex debuggers like Atollic TrueSTUDIO can analyze the PC samples automatically, and present you with nice graphical bar charts outlining how much time is spent in each C-function.

Serial Wire Viewer tracing is not supported by the standard ECLIPSE and GNU gcc/gdb tools (and thus neither is statistical performance profiling), but the Atollic TrueSTUDIO IDE, which is based on Eclipse/GNU, does support SWV fully using proprietary extensions. Atollic TrueSTUDIO supports statistical profiling and other SWV event and data tracing capabilities using both the SEGGER J-Link and ST-LINK debugger probes.

For more information on Serial Wire Viewer event and data tracing, read our SWV event and data tracing white paper.


Topics: ECLIPSE, ARM Cortex, GNU tools (GCC/GDB), Debugging, Atollic TrueSTUDIO, SEGGER J-Link, ST-LINK