Show HN: High-precision date/time in SQLite(antonz.org) |
Show HN: High-precision date/time in SQLite(antonz.org) |
https://stackoverflow.com/questions/6841333/why-is-subtracti...
And computerphile explains so well in their 10-min video:
https://www.youtube.com/watch?v=-5wpm-gesOY
---
I've long ago learned to never build my own Date/Time nor Encryption libraries. There's endless edge cases that can bite you hard.
(Which is also why I'm skeptical when I encounter new such libraries)
I do think the documentation could be a little clearer. The author talks about “time zones” but the library only deals with time zone offsets. (A time zone is something like America/New_York, while a time zone offset is the difference to UTC time, which is -14400 seconds for New York today, but will be -18000 in a few months due to daylight saving time changes.)
Not even that. UTC has leap seconds, which this code doesn’t handle (FTA: “The calendrical calculations always assume a Gregorian calendar, with no leap seconds”)
It copies that from the golang time package, which makes the same claim (https://pkg.go.dev/time)
That makes life a lot simpler for the implementer, but doesn’t that mean you can only reliably use these two libraries for computing with durations, not with moments in time or vice versa? The moment you start mapping these times to real world clocks and adding durations to them, you run the risk of getting small (up to about half a minute, at the moment) inconsistencies.
It's US/Eastern. Paul Eggert can call this a "deprecated compatibility time" all he wants, but "Eastern Time Zone" is the official name of the time zone as maintained by the civil time keeping authority.
Once you've decided you're using nanosecond precision, a 64-bit representation can only cover 584 years which ain't enough. You really want at least 2 more bits, so you can represent 2024 years.
But once you're adding on 2 bits, why not just add on 16 or even 32? Then your library can cover the needs of everyone from people calculating how it takes light to travel 30cm, to people calculating the age of the universe.
That's how I imagine the design decisions went, anyway :)
Of course you can't really provide sub-second accuracy without leapsecond support and what does pre-human-civilisation leapsecond support even mean?
SELECT * FROM my_table WHERE duration_s >= 2h
and have the database DWIM, converting "2h" to 7200.0 seconds and comparing like-for-like during the table scan.Years ago, I wrote a special-purpose SQL database that had this kind of native unit handling, but I've seen nothing before or since, and it seems like a gap in the UI ecosystem.
And it shouldn't be for time. We should have the whole inventory of units --- mass, volume, information, temperature, and so on. Why not? We can also teach the database to reject mathematical nonsense, e.g.
SELECT 2h + 15kg -- type error!
Doing so would go a long way towards catching analysis errors early.But bit pattern is an issue internal to the library. If you can find a bug in the code, certainly point it out and offer a fix if it’s in your skillset.
However, at the same time it seems to indicate that it stores data using sqlites built in number type, which to my understanding does not support unsigned? Secondly, the docs mention you can store with a range of 290 years and the precision is nanoseconds, which if you calculate it out works out to about 63 bits of information, suggesting a signed implementation.
(uint8)(-3) is 253, for example, and (uint8)5-(uint8)253 = (uint8)8, corresponding to 5 - (-3)
How so?
Here is an example for signed integers.
These represent zero time but have different representations in memory:
Seconds: 2 Nanoseconds: -2,000,000,000 (fits in a 32 bit number) Time: zero seconds
Seconds: -2 Nanoseconds: 2,000,000,000 Time: zero seconds
Here is an example for unsigned: Seconds: 1 Nanoseconds: 0 Time: 1 second
Seconds: 0 Nanoseconds: 1,000,000,000 Time 1 second
Extensible type system is a worst thing that could happend with database end-user performance. Then one may not short-cut no single thing in query parsing and optimization: you must check type of any single operand, find correct operator implemenation, find correct index operator family/class and many more all through querying system catalog. And input/output of values are also goes through the functions, stored in system catalog. You may not even answer to "select 1" without consulting with system catalog.
There should be sane set of builtin types + struct/json like way of composition. That is like most DBs do except PostgreSQL. And I strongly believe it is right way.
Not with static typing.
The problem with PG is that it's not fully statically typed internally. SQLite3 is worse still, naturally. But a statically typed SQL RDBMS should be possible.
I don't do "proper" science so the value of nanoseconds seems limited to very clever experiments (or some financial trade tracking that is probalby even more limited in scope).
But being able to represent historical dates seems more likely to come up?
Thoughts?
Simply reducing the precision to 10ns will provide enough range in practice.
Overall, this means if you're representing historical dates I would question whether a seconds-since-epoch timestamp representation is what you want at all, regardless of range and precision.
Edit: yes, you can kinda handle this as part of handling timezones, but still, it's complicated enough that you may want to retain more or different information if you're displaying or letting users enter historical dates.
With nanosecond precision? Just decide what you want to do beforehand, i bet even datetime don't make much sense for that time period, bare date would suffice. also, you'll likely need location, calendar system etc since real dates were not that standardized back then
select time_to_nano(time_now());
-- 1722979335431295000I wonder what does
select time_sub(time_date(2011, 11, 19), time_date(1311, 11, 18));
return?I can think of a few plausible reasons, but the only one that is really significant is "what epoch"? In the case of UNIX-based systems and systems that try to mimic that behaviour, that is well defined. But as you haven't said what your complaints are, it's hard to provide any counterpoint or justification for why things are as they are.
> time_date(1311, 11, 18)
That isn't defined in the epoch used by most computer systems, so all bets are off. Perhaps it'll return MAX_INT, MIN_INT, 0, something that's plausible but doesn't take into calendar reforms that have no bearing on the epoch being used, or perhaps it translates into a different epoch and calculates the exact number of seconds, or anything else. One could even argue that there are no valid epochs before GMT/UTC because it was all just local time before then.
But of course, you can argue either way whether -ve values should be supported. Exactly 24 hours before 1970-1-1 0:00:00 UTC could be reasonably expected to be -86400, on the other hand "since" strongly implies positive only.
Other people might have entirely different epochs for different reasons, again within the domain it's being used, that's fine as long as everyone agrees.
Or did you have some other objection?
If we add a leap second at the end of 2025, nothing in 2024 gets rewritten. Only the future meaning of pointer expressions like "12 pm on January 2nd 2025" change their value. When I want exactly 48 hours after 12 pm Dec 31, I use a leap second independent time representation. But since usually I want the same thing everyone calls 12 pm Jan 2, I usually use a representation that gives me that.
And I, among many, take meticulous care to do my date math (for a bank core system) only in ways that naturally support leap seconds.
Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
# some omitted...
INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.Do you plan to address the use cases in the SO post, or asked differently - what is the intended use case of this library?
I tried to recreate it on your site (which is very cool btw in allowing the code to run in browser) and it seems to fail and give the wrong time difference.
select time_compare(time_date(1927, 12, 31, 23, 58, 08, 0, 28800000), time_date(1927, 12, 31, 23, 58, 09, 0, 28800000));
Results in an answer of '1', which is incorrect.Please don't take my comments as being negative or unappreciated, this is super difficult stuff and anyone who tries to make the world an easier place should be thanked for that. So thank you.
----
EDIT: this post explains why the answer isn't "1"
https://stackoverflow.com/questions/6841333/why-is-subtracti...
This query returns -1 (minus one, not one), which seems correct to me. The first date is before the second:
select time_compare(
time_date(1927, 12, 31, 23, 58, 08, 0, 28800000),
time_date(1927, 12, 31, 23, 58, 09, 0, 28800000)
);
-1That "unix-style monstrosity" is literally seconds since the UNIX time epoch, which is unambiguously defined as starting on 1970-1-1 0:00:00 UTC.
Or it would have been, had leap seconds not been forced upon the world in 1972, at which point yes, arguably it's no longer "physical earth seconds" since the epoch but "UNIX seconds" where a day is defined as exactly 86400 UNIX seconds.
In retrospect, it'd have been better if UNIX time was exactly a second, and the leap seconds accounted for by the tz database, but that didn't exist until over a decade after the first leap seconds were added, so probably everybody thought it was easier just to take the pragmatic option to skip the missing seconds, exactly the same way that the rest of the world was doing.
I'm still not sure if that's what your complaint is about, as I don't know of time systems defined any other way handle this correctly if you were to ask for the time difference in seconds between a time before and after a leap second.
Maybe a better question would be: what do you think would be a better way of defining a representation of a date and time, and that would allow for easy calculations and also easy transformations into how it's presented for users?
That could drift a bit if you can represent a 580 million year interval (with millisecond precision), wouldn't it?
Luckily, leap seconds are on the way out (https://en.wikipedia.org/wiki/Leap_second#International_prop...)
> And it's annoying that you need to read some footnote to figure out what exactly it means; it's annoying that it is basically a code-phrase that you just need to know that it's not supposed to be taken literally.
Even to point of deferring the real explanation to a secondary paragraph.
The problem is that we have a very precise definition of a second in terms of decay of atoms (maybe precise is the wrong word, as it's the statistical likelihood of x atoms decaying given certain conditions, but whatever). The problem is arguably that this is over-defined.
There's a good case that a second is actually how it's always been defined historically up until 1967 - as 1/60th of a minute, which is 1/60th of an hour, which is 1/24th of a day. That's what UNIX seconds are. 86400 of them in one day. And we have a pretty good idea what a solar day is, and have been doing calendars based on them for thousands of years.
But if you want to base your times on the decay of caesium, then you can do that, but you have to accept that it no longer corresponds neatly to a solar day any more. The length of a day fluctuates by a couple of seconds a day in either direction, which we largely just ignore because over time, that mostly cancels out. Personally, I don't think leap seconds should ever have been introduced - over the last century or so, the earth has been rotating "faster" than our idealised second based on radioactive decay, so we've added leap seconds. But more recently, it's been rotating "slower", and we're at the situation where we need negative leap seconds. Maybe really, we should have just left it alone and over a longer period it'd all have averaged out anyway.
But what's interesting is that apart from the meddling with leap seconds, we've decided that a "typical day" has exactly 86400 seconds where a second is some constant time, even though that isn't true of the reality of our planet. Some days are too short when defined this way, some days are too long. But on average, this 86400 seconds is pretty much right.
And arguably, any day that needs a leap second isn't "wrong", the problem is actually that we over defined a second before we realised that the periodicity of the solar day wasn't a constant. I wouldn't advocate trying to redefine what a second is again, because actually having a constant time second is incredibly useful for defining all the other derived SI units. But with that usefulness, you also need to be aware that it's not the same as the traditional timekeeping second.
But in any case, except for leap seconds, all the world's time systems agree on 84000 seconds per day. So, can you make the case for why you think UNIX time in particular is a problem? And what would you rather have instead?
Quite often, we want to use something originally recorded using the former to calculate the latter. It is most convenient to have a second that is of fixed duration. Which is kind of exactly why the tz database exists in the first place.
Except it's made more complicated because most systems that use unixtime also use NTP, and that means they employ smearing because essentially nothing in computing supports 23:59:60 or 23:59:61 or repeated seconds of 23:59:59. So on the day of the leap, the recorded time for events doesn't match standard time. Which is why the unsmear library exists (among others, probably).
Note that TAI (international atomic time) truly is the number of actual seconds since it was first synchronized in 1958. That is what is used to define UTC, and it's about 30 seconds ahead of UTC currently.
All that is to say... Calling unixtime "seconds since epoch" is a forgivable sin in terms of the practicalities of communication, but it's not really defensible as a matter of being a factual description of reality. The truth is that the new definition of a second was agreed upon decades before Unix came along, and when we're measuring time in seconds we don't typically care about the solar day or sidereal day. Further, there is no practical way to construct a computer or clock (barring a sundial) so that supports the original dynamic definition of time divisions. I can't even imagine how relativistic times with GPS satellites would have to work. It would be the longitude problem all over again.
Is it a factual description of reality? "Seconds since epoch" is an almost entirely abstract idea, given that neither seconds nor epochs exist in the universe. The only way it's connected to reality is that time moves forwards, so "since" has meaning. So it seems to me that someone who says "seconds since epoch" can choose to give the words any meaning they like as long as everyone understands what is meant.
So to me this is as relevant a complaint as saying that the special case of graphs shouldn't be called "trees" because branches in real trees sometimes rejoin. It's a metaphor. We're dealing with entirely human-made concepts with barely any input from the real world, we can use any words we like.
That's also why we don't use the kings body dimensions as the standard of measurement anymore. We don't need to recalibrate the entire nation's standard of "foot," "inch," "cubit," or whatever when the king grows. That's not useful.
Just because the inch is the width of an adult man's thumb doesn't mean the purpose of the inch is to define thumbs and should always follow from that. That is not the goal of wanting the measuring unit. It's the same for seconds.
That's why I said: Sometimes you want to know what time of day it is, and sometimes you want to know how long something took to occur. That is to say, a datetime and a timespan. Now, you could use a different unit of measurement for those to, but in practical terms it's stupid.
In the end, all you're doing is finding different ways to dissect physical constants. But you're still using those constant. The constant itself doesn't change because the Earth revolves slower today than it did yesterday, yet the planet certainly does revolve the same distance about its axis in more time.
Which is why I say distance and time are not really variable or imprecise (better terms than abstract, which simply has to many nebulous meanings) until you start to introduce relativity.
In fact, time is a fundamental element of many physical constants. The Planck constant, the gravitational constant, and the speed of light are all fundamentally based on time and distance. And since the speed of light relates to mass and energy, whatever units you pick need to be consistent. If we say that 1 second is always 1/84,000th of a solar Earth day (instantaneously I suppose?) then you've immediately made 3 of the known physical constants depending on how quickly the Earth spins. And you could do that and recalculate all of physics every day if you wanted to. But it's kind of stupid. It would be like re-graduating a tape measure every day you build a house. In theory it's just fine. Practically it's really not.