From unixtime into Gregorian calendar and back


Conversion from linear time (elapsed seconds since some point in time) into calendar representation and back often crops up when doing work in the embedded space. While the commonly accepted truth is that one should always try to use the historical “standard library” facilities, there are several cases where this becomes problematic. This article is a story about util/calendar, a small collection of functions suitable for resource-constrained systems and for cases where correct operation and repeatability across different environments are important.

“The calendar of the Theocracy of Muntab counts down, not up. No-one knows why, but it might not be a good idea to hang around and find out.” ― Terry Pratchett, Wyrd Sisters

“The calendar was a mathematical progression with arbitrary surprises.” ― Paul Scott, The Towers of Silence

Time is the enemy ― Quantic

Terminology

Linear time: Number of seconds since a specific starting point in time. Makes it trivial to calculate and express durations between two events in computing. Compact and easy to serialize. While linear time expressed in units other than the SI second is certainly possible, the SI second is the most established unit in modern computing and “wall clock” sources.

Epoch: Name for the “anchor” from which a specific linear time starts. Normally expressed in terms of a specific calendar.

Unixtime: Number of seconds since the “UNIX Epoch” (midnight on the 1st of January, 1970, in GMT) without counting any added leap seconds in UTC. As such, each day is defined as exactly 86400 seconds, leading to a linear relationship between elapsed seconds and elapsed days. Some systems support or may be configured to support leap second operation, but this is rarely done due to associated difficulties when communicating such times between systems. Since unixtime does not normally express leap seconds, this also leads to additional complexities and partial solutions.
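The linear relationship between elapsed seconds and elapsed days can be illustrated with a minimal C sketch. The names `split_unixtime`, `weekday_from_days` and `SECONDS_PER_DAY` are made up for this illustration and are not part of util/calendar. A 32-bit unsigned count of seconds without leap seconds is assumed.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Splits unixtime (no leap seconds) into whole days since the UNIX epoch
 * and the number of seconds elapsed within that day. */
void split_unixtime(uint32_t unixtime, uint32_t *days, uint32_t *second_of_day)
{
    *days = unixtime / SECONDS_PER_DAY;
    *second_of_day = unixtime % SECONDS_PER_DAY;
}

/* 1970-01-01 was a Thursday, so the weekday follows directly from the
 * day count. Returns 0 = Sunday ... 6 = Saturday. */
unsigned weekday_from_days(uint32_t days)
{
    return (unsigned)((days + 4u) % 7u);
}
```

For example, 951786061 splits into day 11016 and second-of-day 3661, which is 01:01:01 on 2000-02-29, a Tuesday.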

Calendar: A system in which linear time is decomposed (broken down) into components of differing “magnitude” using specific rules. Common calendars use historically established rules which have either religious or astrological “reasoning”. The core structure is commonly established by the empirically observed motions of the planetary bodies and the Sun. Conversion from or into linear time is rarely trivial.

Gregorian calendar: The most common calendar currently used in the “western” sociological sphere. History and “specification” on Wikipedia. Basic structure can be traced back to the Babylonian calendar (as can so many other things related to the expression of time).

Time zones/daylight savings time: Additional (modern) rules placed on top of the established calendar system. Not covered in this article since util/calendar implements neither.

General problems

While dealing explicitly with linear time would lead to simpler computer systems, there are many cases where conversion from linear into calendar time and back is required. Some common interfaces where this is required:

  • Representing creation and modification event timestamps to a human operator (e.g. desktop file managers / browsers)
  • Expressing programmable “calendar”/“timer” functionality to a human operator (e.g. planning and “calendaring” software, email software)
  • Specific components in the system that express and communicate time in calendar format (e.g. calendar-style real-time-clock devices, global positioning systems)
  • Specific software components whose APIs and interfaces use calendar time to provide “convenience” for human programmers. Unfortunately this aim is somewhat misguided once we move past the “beginner-level” in programming, but such interfaces are rarely controllable by the firmware engineer. Examples include the format of timestamps stored on the venerable FAT filesystem and timestamps expressed in text-line based communication protocols (email, “web”, etc.), and unfortunately there are many others as well
  • Additional sources for confusion (for the user as well as the programmer) are sometimes misguided design decisions in storing timestamp events in the user’s timezone (aka “localtime”). While establishing what the “user timezone” was relatively easy in the 1980s, it is now a complex problem especially if the stored event timestamp does not contain the timezone information (PC BIOS RTC conventions established by DOS and used also by Windows, and FAT, again). Since this article does not deal with timezones, we’ll try to push this issue out of our minds for now

In a perfect world, only the interface between systems and the human operator would be the point where conversions would be required, but reality unfortunately does not follow my wishes. In many ways, most of the issues in the list above are good examples of general API-failures in the “time” and “timestamp”-problem domains.

Problems specific to embedded and portability (testability)

Once we leave the general problem domain and approach embedded use and repeatability (testing, simulation), we have additional issues to deal with:

  1. Historically, the conversion facilities that exist in computers either directly implement the C standard library conventions, or extend the conventions. However, due to the point in time where the conversion facilities were included in the stdlib, a lot of the interfaces are under-specified and leave a lot of implementation freedom to the library, toolchain or OS vendor.
  2. The standard library was only later extended to cater for multi-threaded (or multi-context) use. This unfortunately means that it will depend a lot on the specific environment where code is compiled, whether the thread-safe (“re-entrant”) functions are implemented or not. The original functions use hidden internal shared state which makes thread-safe usage only possible by arranging for strong mutual exclusion outside the functions. This is sometimes problematic due to functions being used before the actual mutex use is possible in a safe manner (e.g., before RTOS scheduler has been started).
  3. Similar to the previous problem, timezone support was added in later stages and there is considerable variability between environments on how to use it correctly. The most common way to implement this is “hiding” the implementation in the standard library, without clean portable APIs. This translates to behavioral differences between systems even if the user code is identical in each case.
  4. Historically, the most common underlying type to represent unixtime was the venerable time_t, which is normally implemented as a signed integer type with at least 32 bits allocated for storage. This leaves 31 bits to represent time after the UNIX epoch (leading us to the Year 2038 problem). While using a signed type to express unixtime allows times before the epoch to be represented, this is not part of standardized behavior and cannot be relied on. Using a 64-bit underlying type seems like a good solution, but means that more resources are required for storing, communicating and processing such unixtimes. Relying on a specific size of time_t also leads to non-portable code, not to mention additional issues when time_t is serialized/communicated.
  5. There is a lot of variance in the operation of the composition interface (mktime). Some implementations will automatically “fix” (normalize) invalid calendar dates and times (e.g., January 32nd is converted to February 1st transparently and without signaling that such conversion was done). Some systems do not normalize at all, assuming that the input data is always valid.
  6. There is no standardized way of using composition in UTC/GMT (mktime operates in the local timezone, whatever that means in different systems). Some systems implement additional functions to do the conversion in UTC/GMT, but these are not generally portable, and the portable workarounds are relatively slow to execute.
  7. Non-orthogonal interfaces and representation of the fields in struct tm. Some fields are expressed as zero-based offsets, some are expressed as 1-based offsets and don’t get me started on tm_isdst. To add insult to injury, while some language runtimes emulate the C standard library structures, they have their own offsets (1-based offsets for date parts, except for year), adding to the confusion when a “beginner-level” programmer searches the web for code to copy-paste.
  8. In general, the standard conversion related structures and functions are not designed for resource constrained systems and while more compact solutions are possible, they’re not possible while keeping the same API.
  9. Some target environments may not have a C standard library at all, or using it would increase the resulting target binary size beyond acceptable limits.
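The mixed field conventions mentioned in item 7 are easy to get wrong. The following small sketch shows the offsets that the C standard defines for struct tm. The helper name `tm_to_human` is hypothetical and exists only for this illustration.

```c
#include <time.h>

/* Converts the date fields of struct tm into "human" numbering.
 * Other fields follow yet more conventions: tm_wday is 0 = Sunday,
 * tm_yday is zero-based (0 = January 1st). */
void tm_to_human(const struct tm *tm, int *year, int *month, int *day)
{
    *year = tm->tm_year + 1900; /* tm_year counts years since 1900 */
    *month = tm->tm_mon + 1;    /* tm_mon is zero-based: 0 = January */
    *day = tm->tm_mday;         /* tm_mday is one-based: 1..31 */
}
```

A struct tm holding tm_year = 100, tm_mon = 1 and tm_mday = 29 therefore describes the 29th of February, 2000.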

All of the above reasons lead to a situation where, if the same user code is to execute identically on a large number of different platforms, all of the idiosyncrasies must be handled in the user software “portability layer”. Since some of the behavioral differences are not even documented, this becomes a can of worms for testability. One cannot rely on software that executes on a non-hardware simulator executing correctly, or in any portable way, on the actual hardware target or even on top of different operating systems. It is also quite easy to assume thread-safety in some environments which do not have such guarantees, leading to difficult-to-reproduce bugs (“but it works on the simulator running on platform X!”).

Enter util/calendar!

util/calendar was born out of my general frustration with the embedded related issues above. Additionally it became clear that a lot of people have issues with using the standard library interface correctly in different environments which drove the API design into a direction which (hopefully) makes doing conversions correctly easier.

Prominent features/limitations:

  • Valid range of expression 1970 - 2099 (GMT/UTC)
  • No input validation (not suitable for end user provided input)
  • Light on resources (all divisions converted to multiplications with gcc for armv7-m, compact structures, light stack usage, reduces to a small amount of code)
  • Fully re-entrant (no shared state)
  • Only precise types used (works identically in all target environments)
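One way a 1970 - 2099 range can keep the code small: within it every fourth year is a leap year, because the only century year in range (2000) is divisible by 400. The sketch below puts the reduced rule next to the full Gregorian rule, using only fixed-width types. It illustrates the idea and is not the util/calendar source; all function names are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full Gregorian rule, valid for any year in that calendar. */
bool is_leap_gregorian(uint16_t year)
{
    return (year % 4u == 0u) && ((year % 100u != 0u) || (year % 400u == 0u));
}

/* Reduced rule: agrees with the full rule only for 1901..2099. */
bool is_leap_reduced(uint16_t year)
{
    return (year & 3u) == 0u;
}

/* Days from 1970-01-01 to January 1st of `year`, valid for 1970..2099.
 * The first leap year after the epoch is 1972, hence the (y + 1) / 4 term
 * counting the leap days of all years strictly before `year`. */
uint32_t days_before_year(uint16_t year)
{
    uint32_t y = (uint32_t)year - 1970u;
    return y * 365u + (y + 1u) / 4u;
}
```

The two rules disagree for 2100, which is one reason a hard upper limit of 2099 is convenient.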

The process of development started by generic googling around for solutions. While one might think that this is yet another invention of a wheel, it turned out that my wishes did not match reality. While there are several (re)implementations of the standard library, I couldn’t see myself using them in my own projects (mostly due to resource reasons). Implementing my own wheel also was fun (who doesn’t like algebra and numbers?!)

The actual decomposition problem is interesting in many ways, since there are so many different ways of solving it. I did dabble with the idea of an O(1) solution for the daysSinceNewYear -> (monthNumber, remainingDays) problem but couldn’t find one that wouldn’t increase code size more than would be saved by having the linearly reduced table of days-per-month. Having an O(1) implementation would result in a stable execution time. To be fair, most of the other implementations’ execution time also varies with input.
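A table-driven solution to the daysSinceNewYear -> (monthNumber, remainingDays) problem might look like the following sketch. This is an assumption about the general shape of such a solution, not the code in util/calendar; the function name, parameter names and table layout are made up for illustration. The loop runs at most eleven times, so execution time varies with the input.

```c
#include <stdint.h>

static const uint8_t days_per_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* day_of_year is zero-based (0 = January 1st), leap is non-zero for leap
 * years. Outputs a 1-based month and a 1-based day of month.
 * No input validation: day_of_year is assumed to be within the year. */
void split_day_of_year(uint16_t day_of_year, int leap,
                       uint8_t *month, uint8_t *day)
{
    uint8_t m;

    /* Subtract whole months until the remainder fits into month m. */
    for (m = 0; m < 11; m++) {
        uint16_t length = days_per_month[m];
        if (m == 1 && leap) {
            length++; /* February has 29 days in leap years */
        }
        if (day_of_year < length) {
            break;
        }
        day_of_year -= length;
    }
    *month = (uint8_t)(m + 1u);
    *day = (uint8_t)(day_of_year + 1u);
}
```

With this numbering, day 59 maps to March 1st in a normal year and to February 29th in a leap year.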

Tools utilized in development and order of work:

  1. Reviewing existing usage of historical stdlib interface in a largish and complex embedded project
  2. Reviewing existing implementations
  3. Reviewing various historic implementations using Wikipedia (this is hardly a “new” problem after all)
  4. Writing prototype algorithms in Python (once my first implementation in C didn’t pass tests and I couldn’t figure out why not). Initial versions of the tests were also written in this Python environment (comparisons against Python’s library functions, which mostly follow the C standard library implementation)
  5. Spreadsheets to validate calculation and to detect possible reductions
  6. Implementations in C (and another tester, written in C)

Full disclosure may perhaps be required at this point. I have dabbled with the subject matter before and have also done implementations that handle leap seconds and timezones in a correct way, but that code unfortunately is locked forever in proprietary codebases. However, it is a fun problem to solve and I also need a proper solution to deal with pesky calendar-style RTCs embedded in some SoCs (argh, second-counter-style RTCs are so much superior!).

Downloads / browse source

util/calendar is released under a very permissive “MIT-style” license, in the hope that I can easily embed it into non-permissive projects. You may do so as well (contributions back are obviously welcome).

Please copy it into your own project from the slib project on GitHub (please note that the rest of slib is not licensed under “MIT”).

Usage

For features, limitations and the API, please see the header file of util/calendar. Perhaps some day I’ll also write a “converting from stdlib to util/calendar” guide, but until then, I’m hopeful the header contains all the necessary information to start. The source code is documented too!

Future work

I might return to the problem space later, to add a separate validation function and to deal with timezones and DST. Most likely I will not implement a leap-second-capable version, since its utility would be quite low, and solving the leap-second issue in the data-storage systems to which embedded units might send data becomes rather tricky. I will not add support for events before the UNIX epoch, or after year 2099 GMT/UTC.

An interesting experiment would be using structures as return values directly. This would allow the functions to be marked as “pure” and “const”, although the pattern is somewhat uncommon in C. The impact on code size is unknown.
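A minimal sketch of what such an interface could look like, assuming a GCC-compatible compiler for the attribute. The struct layout, the `ATTR_CONST` macro and the function name are hypothetical and are not taken from util/calendar.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST
#endif

/* Compact result type, returned by value instead of via an out-pointer. */
struct time_of_day {
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
};

/* The result depends only on the argument and no memory is read or
 * written through pointers, so the "const" attribute is applicable. */
struct time_of_day time_of_day_from_unixtime(uint32_t unixtime) ATTR_CONST;

struct time_of_day time_of_day_from_unixtime(uint32_t unixtime)
{
    uint32_t s = unixtime % 86400u; /* seconds elapsed within the day */
    struct time_of_day result;

    result.hour = (uint8_t)(s / 3600u);
    result.minute = (uint8_t)((s % 3600u) / 60u);
    result.second = (uint8_t)(s % 60u);
    return result;
}
```

Whether the compiler passes such a small structure in registers or through hidden memory depends on the target ABI, which is where the open code-size question comes from.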

Additional resources

Credits:

  • Wyatt915 for the picture of 50 years perpetual calendar (PD) (seen in unfurled links)
Written by Aleksandr Koltsoff