Visualization of stateless UART/SPI exported call sequences


Serial sequence decode

This article introduces a generic and light-weight debugging aid / event tracing mechanism that is both feasible for targets without a proper debugger connection and visual, providing visibility into both high-level and low-level events.

“What would an ocean be without a monster lurking in the dark? It would be like sleep without dreams.” ― Werner Herzog

“The harder you work… and visualize something, the luckier you get.” ― Seal

Background (or: for every complex problem, there is an answer that is clear, simple, and wrong)

For the past few weeks, I’ve been busy banging my head against the wall with a system that contains a complex SoC, whose vendor SDK is also a mess (to put it politely). In the specific subsystem I’ve been working with, the SDK code structure is such that it’s pretty much impossible to know who does what, in what order, and when.

In a normal world, one could use a debugger and various hardware-assisted code tracing tools, but the debugger connection to this system is somewhat unstable to begin with, and CoreSight TRACE is not available. Utilizing SRAM-buffered message streams (e.g. Segger RTT) is also too unstable, since they rely on the debugger connection.

In similar situations I’ve gotten away with using multiple GPIOs, toggling them from strategic places and then using a logic analyzer to see what’s what. The technique has many good qualities and is quite general, but requires pins to be available for software toggling. Sadly, this is not the case on the system in question.

There is a UART that is used for text-based debugging messages, but using printf() from within highly asynchronous code (and ISRs) is akin to digging dragons out of one’s nose. On the other hand, the UART is the only thing not dedicated to core system functionality, so I ended up using it.

In the end, I instrumented the code with a lot of UART writes (single bytes from different parts), indicating entry and return from functions (including ISRs) in a race-free manner. While I’ve used this technique with great success in the past, this time the complexity of the system was such that manually “decoding” the events from the logic analyzer’s protocol decoder was pretty slow and tedious and involved a lot of head scratching and crying. I wanted a way to visualize the data so that I could have both the long-term view as well as the short-term one.

As a side note, I actually managed to solve my acute problem without these tools before I wrote them, but these are the tools that would’ve aided my adventures quite a bit, and I plan to use them in the future.

What failed

Armed with my trusty logic analyzer, I thought that since it already has a rather nice UART decoding support, I’ll just export the stuff from it into a simple to parse file, and then do a small program to draw pretty things from that.

Saleae Logic UART decoder

Except that the export options in Saleae Logic do not support exporting the protocol analyzer results (it actually does; please read on). I still love the analyzer and the software, but I’d have to leave it behind. At least I could export the actual digital trace data out of the program.

I’ve been waiting for a good problem to try sigrok with. I’d been reading about it for a while, but didn’t have a problem to solve yet. sigrok has a capable UART decoder, and supports command line exporting and whatnot, so it should’ve been trivial to achieve what I wanted. So, I downloaded the latest releases of sigrok-cli and PulseView and victory! Except that I failed completely to load either of the exported formats (CSV and VCD) into PulseView. After figuring out where the code lives, it seems that the CSV decoder is incapable of handling data that is change-based. The sequential CSV format would’ve worked, but I like working at high capture rates, so the file sizes would’ve been prohibitive. The VCD import didn’t work either, producing very wrong results. So, damn you sigrok, I wanted to like you! (I might come back to it eventually.)

So, I was left without any sane manner of getting at the decoded data unless I used the graphical Logic program manually. In the past, I’ve implemented various tools that did quite complex analyses based on the change-based CSV that Saleae Logic exports, so I knew that writing a custom UART decoder was doable. This is how decode-serial was born. It reads the change-based CSV in, recognizes UART frames and emits another CSV with the timestamped recognized frames (aka “bytes”, or “octets” if using 8 data bits per frame).

From this (decode-serial-csv/testdata/input/2017-09-23-serial-3000000-at-500Msps.csv.gz):

Time[s], DTRACE/SPI_MOSI, RX, UARTX
0.000000000000000, 0, 0, 1
0.362525714000000, 0, 0, 0
0.362526030000000, 0, 0, 1
0.362526330000000, 0, 0, 0
0.362526684000000, 0, 0, 1
0.362527020000000, 0, 0, 0
...

To this:

timestamp(s),byte,isFrameValid,subascii
0.362525714000000,69,1,E
0.370543750000000,87,1,W
0.370661094000000,119,1,w
0.370664710000000,101,1,e
0.370762978000000,66,1,B
...

Like this (you can use the tool without the pipeline as well, but I tend to keep CSV files in compressed format):

zcat testdata/input/2017-09-23-serial-3000000-at-500Msps.csv.gz | \
./decode-serial.py UARTX 3000000 | \
gzip -9 > testdata/output/2017-09-23-serial-3000000-transitions.csv.gz
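At its core, recognizing UART frames from change-based data boils down to finding falling edges (start bits) and sampling the reconstructed line level at each bit’s mid-point. Below is a minimal sketch of the idea in Python; the function name and input format are hypothetical simplifications, not the actual decode-serial.py implementation:

```python
import bisect

def decode_uart(transitions, baudrate, bits=8):
    # transitions: list of (time_s, level) pairs, where level is the line
    # state after each change; the line is assumed to be idle-high before
    # the first entry (hypothetical input format, not decode-serial's)
    times = [t for t, _ in transitions]
    bit = 1.0 / baudrate

    def level_at(t):
        # line level in effect at time t
        i = bisect.bisect_right(times, t) - 1
        return transitions[i][1] if i >= 0 else 1

    frames = []
    i = 0
    while i < len(transitions):
        t, level = transitions[i]
        if level == 0:
            # falling edge starts the start bit; sample each data bit at
            # its mid-point, LSB first (8N1 framing)
            value = 0
            for b in range(bits):
                if level_at(t + (1.5 + b) * bit):
                    value |= 1 << b
            stop_ok = level_at(t + (1.5 + bits) * bit) == 1
            frames.append((t, value, stop_ok))
            # skip the transitions belonging to this frame
            end = t + (2 + bits) * bit
            while i < len(transitions) and transitions[i][0] < end:
                i += 1
        else:
            i += 1
    return frames

# e.g. a single 'E' (0x45) frame captured at 1 Mbaud:
# decode_uart([(0.0, 0), (1e-06, 1), (2e-06, 0), (3e-06, 1), (4e-06, 0),
#              (7e-06, 1), (8e-06, 0), (9e-06, 1)], 1000000)
```

The real tool additionally validates frames against glitches and emits the sub-ASCII representation seen in the output CSV above.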

What didn’t actually fail

While writing this article I decided to do some “Deep Machine Learning Analysis” and google whether it was at least somehow possible to export the protocol analyzer data from Logic, and sure enough, it is. It’s actually quite easy; the feature just wasn’t located where I looked for it. Ugh. Coincidentally, now that I’ve tried the export, the output format is compatible with generate-call-sequence (covered below) once the display radix is first set to Dec. So, ROTFL. On the other hand, it’s been a while since I had a chance to write Python properly, and I have some future plans for the decoder, so it wasn’t a completely useless exercise.

Leading Advanced Deep Machine Learning Expert

Instrumenting and writing the event specification

Once we have the individual frames recognized, the next issue is using them to reconstruct “what happened”. Due to the nature of the tracing mechanism, it’s best to use a single frame of output to indicate when a specific function is entered or left (a separate code must be emitted for each). Sticking to such an arrangement allows for fairly light-weight operation, and if the hardware peripheral has any kind of built-in queue available, pushing data out can be made to happen without blocking the rest of the code. Although in my case the underlying output mechanism was strongly synchronized, so I might as well have emitted more than a single byte at a time.

So, the code would have instrumentation akin to this:

void execute(void) {
    emitTraceByte('E');
    /* ... */
    write(/* .. */);
    /* ... */
    emitTraceByte('e');
}

void write(/* .. */) {
    emitTraceByte('W');
    /* ... */
    /* We've reached an important point within write, mark it (see below).
       Unlike the other instrumentation here, this code does not have an
       "ending" pair. */
    emitTraceByte('D');
    /* ... */
    emitTraceByte('w');
}

/* the "complete" ISR handler */
void complete_ISR(void) {
    emitTraceByte('B');
    /* ... */
    execute();
    /* ... */
    emitTraceByte('b');
}

Given a list of single-byte codes and the timestamps when they were emitted, we can then reconstruct a call graph of sorts (or rather, a call sequence). Assuming we always instrument the code in pairs (enter and leave), this is sufficient to reconstruct a call “stack”. If the output mechanism works safely from within ISRs, additional context information can be implicitly formed.
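With the enter/leave pairing in place, the reconstruction itself is a straightforward stack walk over the decoded frames. A minimal sketch in Python (the names are hypothetical; the real generate-call-sequence additionally handles contexts and error reporting):

```python
def reconstruct(events, starts, ends, marks):
    # events: iterable of (timestamp, code)
    # starts/ends/marks: dicts mapping a code to its label
    stack = []   # (label, enter_timestamp) of currently "open" functions
    out = []     # (label, start_ts, end_ts, nesting_depth)
    for ts, code in events:
        if code in starts:
            stack.append((starts[code], ts))
        elif code in ends:
            label, t0 = stack.pop()
            # enter/leave codes must pair up, or the trace is unbalanced
            assert label == ends[code], "unbalanced enter/leave"
            out.append((label, t0, ts, len(stack)))
        elif code in marks:
            # marks are point events: zero duration at current depth
            out.append((marks[code], ts, ts, len(stack)))
    return out
```

Running it over a trace like E, W, D, w, e yields the completed write and execute spans plus the sync mark nested inside them.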

Next, we tell the generate-call-sequence tool how to translate the codes into events (by placing these rules into a specification file):

execute: E, e
write: W, w
complete: B, b, ISR1
callback: F, f
powerdown: C, c
sync: D

Above:

  • Each line starts with a label (e.g. execute). This label will be emitted into the visualization and should describe the function entered (e.g. the function name might be a good choice here)
  • Each label is followed by 1 to 3 additional parameters.
  • The 2-parameter form describes something with a start and an end. Start and end codes can be given as ASCII characters (if in [a-zA-Z]), hex (0x-prefixed) or unsigned decimal numbers. Internally the sequence generator uses Python ints, so the available code space is fairly large.
  • The 1 parameter form describes that we have an event “mark”. These are meant to mark some important “thing” which does not represent a function call per se. Marks will have a distinct visual representation in the event viewer.
  • The 3 parameter form is similar to the 2 parameter form, but also includes the name of the context that will be “entered” when the start code is encountered. It is currently assumed that the context will be left simultaneously to leaving the function that created the context, so expressing that separately is not required (nor supported).
  • The convention of using capital letters as start codes and lowercase letters as end codes makes it slightly easier to read the instrumented code, but this is not a technical requirement. Any codes are allowed as long as they’re unique in the specification and the parser does not reject them.
  • Acceptable characters for both labels and contexts are from the set [a-zA-Z0-9_].
  • File format is ASCII, with support for comments (# starts a comment). For file suffix, I’ve used .gcs, but you can use whatever you like.
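Parsing the format above is deliberately simple. As an illustration (a sketch, not the actual generate-call-sequence parser), the whole specification boils down to a label, one or two codes, and an optional context name per line:

```python
def parse_code(tok):
    # single ASCII letter, 0x-prefixed hex, or unsigned decimal
    tok = tok.strip()
    if len(tok) == 1 and tok.isalpha():
        return ord(tok)
    return int(tok, 0)   # base 0 handles both 0x... and decimal

def parse_spec(lines):
    # returns {label: (start_code, end_code, context)}; one-code "mark"
    # lines get end_code and context set to None
    rules = {}
    for line in lines:
        line = line.split('#', 1)[0].strip()   # '#' starts a comment
        if not line:
            continue
        label, rest = line.split(':', 1)
        params = [p.strip() for p in rest.split(',')]
        start = parse_code(params[0])
        end = parse_code(params[1]) if len(params) > 1 else None
        context = params[2] if len(params) > 2 else None
        rules[label.strip()] = (start, end, context)
    return rules
```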

Visualization

I don’t particularly like writing desktop GUI programs. I’ve yet to see a toolkit / GUI-library that would fit my square hole I guess. So in general, whenever user representation on desktop is required, I try to stick with either CLI or HTML. However, some time back I ran across ninjatracing, which is a tool that reads build logs generated by the quite awesome ninja build tool and emits “events” in JSON format. These events are in a format that the Chrome’s built-in Trace Event profiling tool (aka about:tracing) will read and lay out graphically. While the event viewer is geared towards Chrome developers (and perhaps developers of highly-interactive web pages), it can be “re-purposed” for my needs in a similar manner to what ninjatracing does to represent the complete build history across elapsed time.

Once generate-call-sequence converts the input CSV event list (with the specification) into an ordered list of JSON events, the result can be opened by anyone in their Chrome. And no need to build or install anything, which is perhaps the best bit.
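For reference, the Trace Event format that about:tracing consumes is plain JSON: duration events use phases "B"/"E" with timestamps in microseconds, and instant events (our marks) use phase "i". A sketch of how the decoded frames could be turned into such events (a hypothetical helper, not the tool’s exact output):

```python
import json

def to_trace_events(frames, starts, ends, marks):
    # frames: iterable of (timestamp_seconds, code);
    # starts/ends/marks map codes to labels as in the specification
    events = []
    for ts, code in frames:
        us = ts * 1e6   # Trace Event timestamps are in microseconds
        if code in starts:
            events.append({"name": starts[code], "ph": "B",
                           "ts": us, "pid": 1, "tid": 1})
        elif code in ends:
            events.append({"name": ends[code], "ph": "E",
                           "ts": us, "pid": 1, "tid": 1})
        elif code in marks:
            # "i" is an instant event; "s" sets its scope (thread here)
            events.append({"name": marks[code], "ph": "i",
                           "ts": us, "pid": 1, "tid": 1, "s": "t"})
    return json.dumps(events)
```

The viewer pairs up the "B"/"E" events by name and nesting order, which is what makes the call “stack” render as stacked slices.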

The tool can also be used as a part of a pipeline:

zcat testdata/input/test-async.csv.gz | \
./generate-call-sequence.py testdata/input/test-async.gcs | \
gzip -9 > testdata/output/test-async.json.gz

And once opened in about:tracing, you should see something like this:

Trace event viewer with custom generated data

The image above demonstrates the support for nested contexts (ISR2 pre-empts ISR1, which continues after ISR2 ends) as well as the two “marks” (vertical lines). Unfortunately I failed to find a way to use custom colors for the events (help is appreciated) which explains the relative proximity of the colors in the events (same event always uses the same color, but still).

Preparation

  1. Prepare your target UART so that nothing else is emitting to it. Currently generate-call-sequence does not tolerate codes that are not present in the input specification and will end with an error if they’re encountered.
  2. Implement a suitable method of emitting your codes into the UART. If you need the emitter to work from multiple threads or from ISRs, you’ll have to think a moment about synchronization here. This becomes an interesting problem once you try to do this lock-free and support nested ISRs.
  3. Instrument your code as you see best. If you’re dealing with a monster (that you didn’t create yourself), a good starting point is instrumenting the ISRs and the main “user-facing” functions that make things happen. Instrumenting everything probably doesn’t make sense, but YMMV.
  4. While instrumenting, also write the decoding spec. The syntax is light-weight on purpose.

Use flow

  1. While capturing your UART output, run your program. For capturing, I find the most flexible mechanism is to use a modern logic analyzer, since then you don’t have to care about getting baudrates exactly correct and can also inspect the data for glitches and other fishiness.
  2. Use the UART decoder in your logic analyzer to decode the UART traffic and export it in compatible CSV, or store the digital trace into change based CSV format and use decode-serial to emit the event CSV.
  3. Run generate-call-sequence with the specification file and the event CSV and store the JSON results.
  4. Open Chrome and enter about:tracing in the address bar. Then press the Load button on the top-left part of the page and select the JSON file to import.
  5. Enjoy the zoomable and scrollable visualization! The format of the events from generate-call-sequence has been carefully selected so that the trace viewer “selection” and “aggregation” features work correctly. So feel free to select a slice of time in the graph and check out the total time spent in specific functions, the number of marks, etc. (I don’t really know how to use the trace viewer properly, btw.) To get started, try the WASD keyboard shortcuts, although mouse navigation also works once you find the correct toolbox.

I have no spare UARTs, but I have a spare SPI

If you cannot use a UART, perhaps you have a spare SPI (and a single pin that can be muxed to the MOSI of the SPI block). Arrange the emitter so that MOSI will be “high” at rest, and the first bit sent out will be low. This will then correspond to a valid “START” condition for UART. Your SPI hardware will likely return MOSI back to “rest” for a while between SPI frames, and this gap is likely to be long enough to represent a valid “STOP” condition. If not, you might get away with using a protocol analyzer with a “7 data bits” setting and keeping your last bit high when emitting. You can also use decode-serial for this.
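To illustrate the framing trick: with an idle-high MOSI, a UART-compatible 8N1 frame (start bit, data bits LSB-first, high stop bit) can be pre-packed into a single SPI word so that it appears on the wire in UART order. The sketch below assumes a 16-bit, MSB-first shifting SPI block; check what your hardware actually does:

```python
def uart_frame_for_spi(byte, spi_bits=16):
    # wire order: start (low), d0..d7 LSB-first, stop (high),
    # then idle-high padding to fill the SPI word
    wire = [0] + [(byte >> b) & 1 for b in range(8)] + [1]
    wire += [1] * (spi_bits - len(wire))
    word = 0
    for bit in wire:
        # first bit on the wire becomes the MSB of the MSB-first SPI word
        word = (word << 1) | bit
    return word
```

Feeding the returned word to the SPI transmit register (instead of the raw trace byte) is then all the emitter needs to change.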

All the hints and tips below relating to using a UART are equally applicable to the SPI case.

Note that you don’t need to bring out any chip-select signals, the SPI CLK signal nor the MISO through the pin muxes for this to work. Just the MOSI will be sufficient.

A lot of the newer SoCs also contain various generic(ish) “serial shifter” logic blocks that can be utilized equally well here. If you have a choice, go with whatever has the highest bitrate (barring signal integrity issues), and then with whatever has the deepest hardware transmission queue.

Hints & tips

When to use:

  • If JTAG/SWD is unsuitable for you (unstable, unavailable or deep sleep power down modes cause instability and you want to run your software with “production” power management settings).
  • If you cannot use SRAM event buffering (this will require timestamping and extra SRAM) or you cannot reach useful buffer sizes.
  • You want to specifically control what to trace (you’ll have to).
  • You have a single pin available for this and it can be muxed to a UART TX. Extending the technique to SPI and generic serial shifters is also possible, see above.
  • You can arrange for the UART emitting path to be race free (important if you want to trace from ISRs, or if you have multi-threaded environment). If all else fails or if you’re in a hurry, you may opt to use the RTOS provided mutexes (just take care that they will work for you in all contexts that you want to instrument). You might even get away by using putchar() if your stdlib is integrated properly with your RTOS.
  • You have specialized visualization needs that go beyond anything that is provided by existing commercial software. Please feel free to modify both decode-serial and generate-call-sequence as you see fit. GPLv3 only restricts your rights at the point when you distribute your potentially changed software, so you can modify the tools for your internal use as much as you like.

When not to use:

  • You’re writing desktop software. There are better tools for code flow analysis for this environment. However, nothing technically stops you from emitting event based CSV directly from your program if you want (emitting the JSON events might make more sense though, since the trace viewer supports multiple threads and processes too).
  • You have a working hardware trace setup and supporting software for that (see links below).
  • You have ample number of GPIO pins to toggle from within your code path and the GPIO toggling code is race-free (some vendor SDKs include GPIO-code that is not, but most hardware is actually capable of race-free GPIO toggles, if different pins are toggled from different contexts).

Other tips:

  • If you don’t have a signal analyzer available, you might get away with using a serial terminal emulator (assuming it can store incoming binary data with timestamps that you can then convert into change based CSV) or write your own little program for this. Please be careful with USB/serial bridges (“cables”): they often add their own latency and buffering into the mix which is hard or impossible to control. Since the emitted trace data does not contain timestamps, all of the extra latency (also from your desktop operating system) will skew the results.
  • Change based CSV can be visualized using “X-Y Scatter plot”-type graphs, but not all spreadsheet software deals with even medium sized datasets well. YMMV.
  • You’ll want to use the highest possible bitrate for your UART (smallest possible baudrate divisor). This is so that the instrumentation code will have the lowest possible impact on the operation of the rest of your code. If you have a choice between UARTs, pick the one that has the deepest hardware FIFO. Arrange your emitting code to return immediately when the emitted byte is in the queue (do not wait for the end of transmission). This will allow for good decoupling of the transmission from your code flow.
  • You’ll probably want to avoid using DMA for the emitting code. A lot of the vendor SDKs are quite heavy in DMA setup compared to a single byte emit, and some DMA hardware is equally heavy by itself. Using DMA will also involve triggering ISRs (most likely), and you’ll probably want to avoid that so that your regular code flow won’t be disturbed much by your instrumentation.
  • If you’re using a logic analyzer to capture data, your emitting baudrate can be anything (no need to stick to “standard” baudrates).
  • When using very high bitrates, take care of your signal path (quality connections, “clean” probes, etc). If in doubt, first try using slower speeds (sub MHz) and take it up from there. If your logic analyzer acquisition rate is high enough, it can be used to visually inspect for signal integrity as well (or just use an oscilloscope).
  • If you’re using low power modes of the SoC that will remove/stop/gate the clock from the emitting hardware, just before entering the problematic low power mode, you’ll need to add a small loop that will only continue when all of the transmissions complete and are seen “on the wire”. If this is not possible due to non-deterministic latency effects or no visibility into the power management functionality, you’ll have to modify your emitting code so that it will wait for transmissions to finish before returning. This will then “spread” around the latency and make the tracing more deterministic, although it will also introduce additional (fixed) latency into your system (since the hardware FIFO can no longer hide the transmission latencies).
  • There are no guarantees that the trace event viewer will remain an integral part of Chrome in the future, or that the event format will stay compatible.
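A quick way to quantify the bitrate advice above: one 8N1 trace byte occupies ten bit times on the wire, so the per-byte overhead is easy to compute for a candidate baudrate:

```python
def trace_byte_wire_time_us(baudrate, frame_bits=10):
    # 8N1: 1 start + 8 data + 1 stop = 10 bits per trace byte
    return frame_bits / float(baudrate) * 1e6

# e.g. at 3 Mbaud each trace byte occupies roughly 3.33 us on the wire
```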

Source code, limitations and future work

Both of the tools live at https://github.com/majava3000/slib-tools . Please use that repo to submit issues and contributions. The tools are written in Python (they should run equally well under versions 2 and 3) and have no dependencies outside the Python stdlib.

All of the tools currently in the repository are licensed under the GPLv3.

Each tool has its own README.md in the repository, which contains the current limitations and possible future work I’ve been thinking about (although I have no immediate plans to continue the work).

The testdata/ directories contain ready-to-use examples (and outputs). If you just want to try the trace viewer, just load the compressed .json files and you’re golden.

Additional resources

Alternative tools:

Additional debugging techniques:

Chrome trace viewer:

Credits:

  • I had trouble finding the proper attribution information for the science dog picture. If anyone can help with this, I’d appreciate it. Here’s the starting point that I used.
Written by Aleksandr Koltsoff