The surprising struggle to get a Unix Epoch time from a UTC string in C or C++ (berthub.eu)

131 points by PascalW 1 year ago | 96 comments

chikere232 1 year ago |

Is it a struggle though?

They needed to have a locale matching the language of the localised time string they wanted to parse, they needed to use strptime to parse the string, they needed to use timegm() to convert the result to seconds when seen as UTC. The man pages pretty much describe these things.

The interface of these things could certainly be nicer, but most of the things they bring up as issues aren't even relevant for the task they're trying to do. Why do they talk about daylight saving time being confusing when they're only trying to deal with UTC, which doesn't have it?

johnisgood 1 year ago | |

It is not.

  #define _GNU_SOURCE  /* exposes strptime() and timegm() on glibc */
  #include <stdio.h>
  #include <time.h>

  int main(void) {
      struct tm tm = {0};
      const char *time_str = "Mon, 20 Jan 2025 06:07:07 GMT";
      const char *fmt = "%a, %d %b %Y %H:%M:%S GMT";

      // Parse the time string into broken-down fields
      if (strptime(time_str, fmt, &tm) == NULL) {
          fprintf(stderr, "Error parsing time\n");
          return 1;
      }

      // Convert to Unix timestamp, treating the fields as UTC
      time_t timestamp = timegm(&tm);
      if (timestamp == -1) {
          fprintf(stderr, "Error converting to timestamp\n");
          return 1;
      }

      // time_t has no portable printf format, so cast explicitly
      printf("Unix timestamp: %lld\n", (long long)timestamp);
      return 0;
  }

It is a C99 code snippet that parses the UTC time string and safely converts it to a Unix timestamp. It follows best practices from the SEI CERT C standard, avoiding locale and timezone issues by using UTC and timegm().

You can avoid the pitfalls of mktime() by using timegm(), which works directly with UTC time.

Where is the struggle? Am I misunderstanding it?

Oh by the way, must read: https://www.catb.org/esr/time-programming/ (Time, Clock, and Calendar Programming In C by Eric S. Raymond)

1vuio0pswjnm7 1 year ago | | |

"Mon, 20 Jan 2025 06:07:07 GMT"

I thought the default output of date(1), with TZ unset, is something like

   Mon Jan 20 06:07:07 UTC 2025

That's the busybox default anyway

paxcoder 1 year ago | | |

I can't find `timegm` in either the C99 standard draft or POSIX.1-2024.

The first sentence of your link reads:

>The C/Unix time- and date-handling API is a confusing jungle full of the corpses of failed experiments and various other traps for the unwary, many of them resulting from design decisions that may have been defensible when the originals were written but appear at best puzzling today.

michaelt 1 year ago | |

> Is it a struggle though?

It’s twelve lines or more, if you include the imports and error handling.

Spreadsheets and SQL will coerce a string to a date without even being asked to. You might want something more structured than that, but you should be able to do it in far less than 12 lines.

C has many clunky elements like this, which makes working with it like pulling teeth.

Suppafly 1 year ago | | |

>Spreadsheets and SQL will coerce a string to a date without even being asked to.

But only when you don't want them to; when you do want them to, it's still a pain.

stonogo 1 year ago | | |

Spreadsheets and SQL will coerce a string to a date because someone programmed them to in C or C++.

sitzkrieg 1 year ago | | |

almost like C is logically operating at a lower level than spreadsheets or SQL or something

oguz-ismail 1 year ago | | |

> you should be able to do it in far less than 12 lines

In C++, maybe. In C, not necessarily. If you're not willing to reinvent the wheel why'd you choose C anyway?

pif 1 year ago | |

What's a man page? [cit]

johnisgood 1 year ago | | |

"manual pages", type "man man" in your terminal.

https://man7.org/linux/man-pages/man1/man.1.html

amelius 1 year ago | | |

It's where people went for programming information before ChatGPT and even before StackOverflow.

pif 1 year ago | | |

I'm sorry the sarcasm was not evident. I learnt to program when men were men, and man was man.

d_burfoot 1 year ago |

My personal rule for time processing: use the language-provided libraries for ONLY 2 operations: converting back and forth between a formatted time string with a time zone, and a Unix epoch timestamp. Perform all other time processing in your own code based on those 2 operations, and whenever you start with a new language or framework, just learn those 2.

I've wasted so many dreary hours trying to figure out crappy time processing APIs and libraries. Never again!

avalys 1 year ago | |

Starting from timestamp A, how do I find the Unix timestamp B corresponding to exactly 6 months in the future from timestamp A?

cryptonector 1 year ago | | |

Adding or subtracting "months" is inherently difficult because months don't have set lengths, varying from 28 through 31 days. Thus adding one month to May 31 is weird: should that be June 30 or July 1 or some other date?

Try not to have to do this sort of thing. You might have to though, and then you'll have to figure out what adding months means for your app.
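
To make the ambiguity concrete, here is a minimal C sketch of one possible policy: clamp to the last day of the target month, so that May 31 plus one month gives June 30. The names `struct ymd`, `days_in_month`, and `add_months_clamped` are made up for illustration, and rolling over to July 1 instead would be an equally defensible choice, depending on what the app needs.

```c
#include <assert.h>

/* Hypothetical calendar-date type for this sketch (Gregorian calendar). */
struct ymd {
    int year;
    int month; /* 1..12 */
    int day;   /* 1..31 */
};

/* Number of days in the given month, honoring Gregorian leap years. */
int days_in_month(int year, int month)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);

    assert(month >= 1 && month <= 12);
    return (month == 2 && leap) ? 29 : days[month - 1];
}

/* Add n months (n may be negative) and clamp the day to the end of the
 * target month. This is one policy among several, not "the" answer. */
struct ymd add_months_clamped(struct ymd d, int n)
{
    /* Count months since year 0 so the carry into the year is plain division. */
    int total = d.year * 12 + (d.month - 1) + n;
    struct ymd r;
    int last;

    assert(total >= 0); /* this sketch does not handle dates before year 0 */
    r.year = total / 12;
    r.month = total % 12 + 1;
    last = days_in_month(r.year, r.month);
    r.day = d.day < last ? d.day : last;
    return r;
}
```

Under this policy, 2024-08-31 plus six months lands on 2025-02-28. To apply it to Unix timestamps, one would first convert to calendar fields (for example with gmtime()) and convert back afterwards (for example with timegm()).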

Spivak 1 year ago | | |

I think the parent is describing a "bring your own library" approach where a set of algorithms known to the author will be used for those calculations and the only thing the host language will be used for is the parse/convert.

It does remove a lot of the ambiguity of "I wonder what this stdlib's quirks are in their date calculations" but it also seems like a non-trivial amount of effort to port every time.

d_burfoot 1 year ago | | |

The difficulty of this problem rests on the ambiguity of the phrase "exactly 6 months", which is going to depend totally on the precise business logic. But there's no reason to suppose that the requirements of the business logic will agree with the concepts implemented by the datetime library.

layer8 1 year ago | | |

"Exactly 6 months in the future" from an arbitrary timestamp is not well-defined, even when assuming a fixed time zone. What is it supposed to mean?

1970-01-01 1 year ago |

13 more years to go until the 2038 problem.

Surely we'll have everything patched up by then..

ahubert 1 year ago | |

wow that is dedication 1970-01-01! :-)

xnorswap 1 year ago | |

It worries me how blasé we seem to be to the 2038 problem.

I wonder if people will still be repeating the "Y2k myth" myth as things start to fail.

robertlagrant 1 year ago | | |

People are doing things[0]. We'll see closer to the date what's left, I suppose.

[0] https://en.wikipedia.org/wiki/Year_2038_problem#Implemented_...

TZubiri 1 year ago | | |

https://xkcd.com/795/

quesera 1 year ago | |

Almost exactly 13 years, in fact!

The overflow happens at 2038-01-19T03:14:08Z.
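
For anyone who wants to see where that instant comes from, here is a small C sketch. It assumes a POSIX system (for gmtime_r()) with a 64-bit time_t; the helper names `wrap_to_int32` and `utc_fields` are hypothetical. The first function simulates what a signed 32-bit seconds counter would hold, and the second breaks a timestamp down into UTC calendar fields.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Simulate storing a seconds count in a signed 32-bit integer:
 * keep the low 32 bits and reinterpret them as two's complement. */
int64_t wrap_to_int32(int64_t seconds)
{
    uint32_t low = (uint32_t)seconds; /* well-defined: reduction modulo 2^32 */
    return low >= 0x80000000u ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* Break a Unix timestamp into UTC calendar fields.
 * Returns 1 on success, 0 if the conversion fails. */
int utc_fields(int64_t seconds, struct tm *out)
{
    time_t t = (time_t)seconds;

    assert(out != NULL);
    return gmtime_r(&t, out) != NULL;
}
```

INT32_MAX (2147483647) breaks down to 2038-01-19 03:14:07 UTC. One second later a 32-bit counter wraps to INT32_MIN, which breaks down to 1901-12-13 20:45:52 UTC.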

account42 1 year ago |

The concept of a process-wide locale was a mistake. All locale-dependent functions should be explicit. Yes, that means some programs won't respect your locale because the author didn't care to add support, but at least they won't break in unexpected ways because some functions magically work differently between the user's and the developer's systems.

robertlagrant 1 year ago | |

Totally agree. Python's gettext() API feels so ancient because it can only cope with one locale at a time, and it would love to get that locale from an environment variable. Not ideal for writing an HTTP service that sends text based on the Accept-Language header.

layer8 1 year ago | |

It was a very reasonable design when most programs were local-only.

account42 1 year ago | | |

It really wasn't. Even local-only programs need to process data that isn't formatted in the user's locale.

jonstewart 1 year ago |

The headline doesn’t match the article. As it points out, C++20 has a very nice, and portable, time library. I quibble with the article here, though: in 2025, C++20 is widely available.

jeffbee 1 year ago | |

Indeed. The article should be retitled "C still useless in 2025, including time handling".

chikere232 1 year ago | | |

It would be incorrect, but it's already incorrect as what they're doing isn't really a struggle, so I guess the net result is neutral?

spacechild1 1 year ago | |

Damn, I didn't notice that C++20 added a whole bunch of new features to the std::chrono library! Nice!

zX41ZdbW 1 year ago |

The first rule of thumb is to never use functions from glibc (gmtime, localtime, mktime, etc) because half of them are non-thread-safe, and another half use a global mutex, and they are unreasonably slow. The second rule of thumb is to never use functions from C++, because iostreams are slow, and a stringstream can lead to a silent data loss if an exception is thrown during memory allocation.

ClickHouse has the "parseDateTimeBestEffort" function: https://clickhouse.com/docs/en/sql-reference/functions/type-... and here is its source code: https://github.com/ClickHouse/ClickHouse/blob/74d8551dadf735...

bagels 1 year ago | |

I came to make the thread-safety comment. I got bitten by that myself when formatting ISO 8601: I would get wrong output... sometimes.

I won't believe anyone who tells me that handling time in C/C++ isn't perilous.
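
A minimal sketch of the reentrant route, assuming a POSIX system: gmtime() returns a pointer to shared static storage, so two threads can overwrite each other's result, whereas gmtime_r() writes into a struct tm supplied by the caller. The helper name `format_iso8601_utc` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp as ISO 8601 in UTC, e.g. 1970-01-01T00:00:00Z.
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the conversion fails or the buffer is too small. */
size_t format_iso8601_utc(time_t t, char *buf, size_t len)
{
    struct tm tm; /* caller-owned storage, nothing shared between threads */

    assert(buf != NULL);
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", &tm);
}
```

A buffer of 21 bytes is enough for four-digit years. This only removes the shared static buffer; it says nothing about how fast the underlying library call is.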

p0w3n3d 1 year ago |

I think that time handling is the hardest thing in the world of programming.

Explanation: you can learn heap sort or FFT or whatever algorithm there is and implement it. But writing your own calendar from scratch, one that will, for example, run a cron job at 3 am on the day of a DST transition and that works in every TZ, is work for many people over many months, if not years...

timewizard 1 year ago | |

Time handling is exceptionally easy. Time zone handling is hard. It doesn't help that the timezone database isn't actually designed to make this any easier.

p0w3n3d 1 year ago | | |

Meanwhile I edited my comment but we're still agreeing. And adding them for example to embedded systems is additional pain. Example: tram or train electronic boards / screens

DougN7 1 year ago | | |

I don’t know. I’ve written what seemed like obviously simple code that got tripped up by the 25-hour day on a DST transition. That’s when I learned to stick to UTC.

wang_li 1 year ago | |

Assuming the unstated requirement that you want your cron job to run only once per day, scheduling for 3 am is not a software problem. It's a lack-of-understanding-by-the-person problem. By definition, times around the time change can occur twice or not at all. Also, in the US 3 am would never be a problem, as the time changes at 2 am.

Also, naming things, cache coherency, and off by one errors are the two hardest problems in computer science.

blindriver 1 year ago |

I used the ICU packages when I needed to do something like this but it's been a decade since I coded in C++.

https://unicode-org.github.io/icu/userguide/datetime/

havermeyer 1 year ago |

The Abseil time library makes time and date parsing and manipulation a lot nicer in C++: https://abseil.io/docs/cpp/guides/time

rstuart4133 1 year ago |

For those skimming: the problem is that mktime() interprets its input as local time, and they want it treated as UTC. So you need to subtract the timezone offset used, but the offset varies with the date you feed mktime() and there is no easy way to determine it.

If you are happy for the time to perhaps be wrong around the hours when the timezone offset changes, this is an easy hack:

    import time
    def time_mktime_utc(_tuple):
        result = time.mktime(_tuple[:-1] + (0,))
        return result * 2 - time.mktime(time.gmtime(result))

If you are just using it for display this is usually fine as time zone changes are usually timed to happen when nobody is looking.

TZubiri 1 year ago |

Fun fact, http 1 used to pass expirations and dates in string format.

[Missing scene]

" We are releasing Http1.1 specifications whereby expirations are passed as seconds to expire instead of dates as strings."

richrichie 1 year ago |

> give us some truly excellent code that we really don’t deserve

Why such self-flagellation?

TZubiri 1 year ago |

This comment section is so nerdy I love it.

udidnmem 1 year ago |

You cannot, since it's missing a time zone.

RHSeeger 1 year ago | |

UTC is a timezone, though. Or am I misunderstanding what you're saying?

cvadict 1 year ago | | |

That is fine as long as the input / output is always in UTC... but at the end of the day you often want to communicate that timepoint to a human user (e.g. an appointment time, the time at which some event happened, etc.), which is when our stupid monkey brains expect the ASCII string you are showing us to actually make sense in our specific locale (including all of the warts each of those particular timezones has, such as leap seconds, DST, etc.)

jxjsndbxbd 1 year ago | | |

UTC would be marked as +Z

Without any marking, it could be anything

sylware 1 year ago |

Once you understand that the core of Unix time is the "day", in the end you only need to know the first leap year (if I recall correctly, it is 1972); then you have to handle the "rules" of leap years, and you will be OK (Wikipedia, I think; I don't use Google anymore since they now force JavaScript upon new web engines).

I did write such code in RISC-V assembly (for a custom command-line tool on Linux that prints the output of the statx syscall). So don't be scared: with a bit of motivation, you'll figure it out.

DamonHD 1 year ago | |

The core of the UNIX time is seconds since epoch, nothing else. 'Day' has no special place at all. There are calendars for converting to and from dates, including Western-style, but the days in those calendars vary in length because of daylight saving switches and leap seconds for example.

Kwpolska 1 year ago | | |

UNIX time ignores leap seconds, so every day is exactly 86400 seconds, and every year is either 365*86400 or 366*86400 seconds. This makes converting from yyyy-mm-dd to UNIX time quite easy, as you can just do `365*86400*(yyyy-1970) + leap_years*86400` to get to yyyy-01-01.
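
A rough C sketch of that formula, extended from yyyy-01-01 to a full date and time. It assumes year >= 1970, already-validated UTC fields, and the Gregorian leap-year rules; the function names `leap_years_before` and `unix_time_from_utc` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leap years in [1970, year) under the Gregorian rules. */
int leap_years_before(int year)
{
    int y = year - 1;

    /* Leap years in [1, y] minus leap years in [1, 1969]. */
    return (y / 4 - y / 100 + y / 400) - (1969 / 4 - 1969 / 100 + 1969 / 400);
}

/* Convert UTC calendar fields to seconds since 1970-01-01T00:00:00Z.
 * Like Unix time itself, this treats every day as exactly 86400 seconds. */
int64_t unix_time_from_utc(int year, int month, int day,
                           int hour, int min, int sec)
{
    /* Days elapsed before the first of each month in a non-leap year. */
    static const int days_before_month[12] =
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
    int is_leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    int64_t days;

    assert(year >= 1970 && month >= 1 && month <= 12);

    /* 365*(yyyy-1970) + leap_years gets us to yyyy-01-01, as in the formula. */
    days = (int64_t)365 * (year - 1970) + leap_years_before(year);
    /* Then add the days within the year; Feb 29 counts only after February. */
    days += days_before_month[month - 1] + (month > 2 && is_leap) + (day - 1);

    return days * 86400 + hour * 3600 + min * 60 + sec;
}
```

For the string from the example earlier in the thread, unix_time_from_utc(2025, 1, 20, 6, 7, 7) gives 1737353227.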

sylware 1 year ago | | |

You are perfectly wrong, the day is the main calendar object related to the epoch seconds.

I wrote conversion code, I know what I am talking about.