Show HN: High-precision date/time in SQLite (antonz.org)

274 points by nalgeon 1 year ago | 68 comments

alberth 1 year ago |

Does this handle the special case of timezone changes (and local time discontinuity) that Jon Skeet famously documented?

https://stackoverflow.com/questions/6841333/why-is-subtracti...

And computerphile explains so well in their 10-min video:

https://www.youtube.com/watch?v=-5wpm-gesOY

---

I learned long ago never to build my own Date/Time or Encryption libraries. There are endless edge cases that can bite you hard.

(Which is also why I'm skeptical when I encounter new such libraries)

sltkr 1 year ago | |

This library doesn't deal with the notion of local time at all. It's all UTC-based times, possibly with a user-supplied timezone offset, but then the hard part of calculating the timezone offset must be done by the caller.

I do think the documentation could be a little clearer. The author talks about “time zones” but the library only deals with time zone offsets. (A time zone is something like America/New_York, while a time zone offset is the difference to UTC time, which is -14400 seconds for New York today, but will be -18000 in a few months due to daylight saving time changes.)
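To make the offset-versus-zone distinction concrete, here is a minimal Python sketch using only fixed offsets from the standard library (it is not this extension's API). The two offsets are the New York values quoted above; deciding which one applies on a given date is exactly the part left to the caller.

```python
from datetime import datetime, timedelta, timezone

# One instant, expressed in UTC.
instant = datetime(2024, 8, 6, 12, 0, 0, tzinfo=timezone.utc)

# Fixed offsets carry no DST rules; the caller must know which one applies.
edt = timezone(timedelta(seconds=-14400))  # New York in summer
est = timezone(timedelta(seconds=-18000))  # New York in winter

summer_view = instant.astimezone(edt)
winter_view = instant.astimezone(est)

print(summer_view.isoformat())  # 2024-08-06T08:00:00-04:00
print(winter_view.isoformat())  # 2024-08-06T07:00:00-05:00
```

Both values denote the same instant; only the wall-clock rendering differs.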

Someone 1 year ago | | |

> It's all UTC-based times

Not even that. UTC has leap seconds, which this code doesn’t handle (FTA: “The calendrical calculations always assume a Gregorian calendar, with no leap seconds”).

It copies that from the golang time package, which makes the same claim (https://pkg.go.dev/time).

That makes life a lot simpler for the implementer, but doesn’t that mean you can only reliably use these two libraries for computing with durations, not with moments in time or vice versa? The moment you start mapping these times to real world clocks and adding durations to them, you run the risk of getting small (up to about half a minute, at the moment) inconsistencies.
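A small illustration of that inconsistency, sketched with Python's standard `datetime` (which, like the Go package, assumes no leap seconds) rather than with the extension itself. A leap second was inserted at the end of 2016-12-31, yet leap-second-free arithmetic does not see it.

```python
from datetime import datetime, timezone

# A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31.
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# Leap-second-free arithmetic reports 1 second, although 2 SI seconds elapsed.
elapsed = (after - before).total_seconds()
print(elapsed)  # 1.0

# The leap second itself cannot be represented at all.
try:
    datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    representable = True
except ValueError:
    representable = False
print(representable)  # False
```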

nalgeon 1 year ago | | |

Thanks for the suggestion! True, only fixed offsets are supported, not timezone names.

akira2501 1 year ago | | |

> A time zone is something like America/New_York

It's US/Eastern. Paul Eggert can call this a "deprecated compatibility time" all he wants, but "Eastern Time Zone" is the official name of the time zone as maintained by the civil time keeping authority.

mynameisash 1 year ago |

I find the three different time representations/sizes curious (eg, what possible use case would need nanosecond precision over a span of billions of years?). More confusing is that there's pretty extreme time granularity, but only ±290 years range with nanosecond precision for time durations?

michaelt 1 year ago | |

> what possible use case would need nanosecond precision over a span of billions of years?

Once you've decided you're using nanosecond precision, a 64-bit representation can only cover 584 years which ain't enough. You really want at least 2 more bits, so you can represent 2024 years.

But once you're adding on 2 bits, why not just add on 16 or even 32? Then your library can cover the needs of everyone from people calculating how long it takes light to travel 30cm, to people calculating the age of the universe.

That's how I imagine the design decisions went, anyway :)

Of course, you can't really provide sub-second accuracy without leap-second support, and what does pre-human-civilisation leap-second support even mean?
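The range figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes an unsigned nanosecond counter and an average Gregorian year of 365.2425 days.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year (365.2425 days)


def years_covered(bits: int) -> float:
    """Span in years of an unsigned nanosecond counter of the given width."""
    return (2**bits) / NS_PER_SECOND / SECONDS_PER_YEAR


span_64 = years_covered(64)
span_66 = years_covered(66)
print(int(span_64), int(span_66))  # 584 2338
```

Two extra bits quadruple the span, which is enough to reach back past year 1.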

nalgeon 1 year ago | |

It works very well for me and thousands of other Go developers. That's why I chose this approach.

g15jv2dp 1 year ago | | |

There's no reason it wouldn't "work", the question is "why". Having such precise dates obviously comes with some compromises (e.g., the representation is larger, or it's variable depending on the value which comes with additional complexity, etc.). So surely there must be some pros to counterbalance the cons. "Because it's what Go does" is an answer, but I don't know if it's a convincing one.

bongodongobob 1 year ago | | |

Nice. Smoking cigarettes works for me and millions of others, but it's still stupid and will take years or decades off your life.

quotemstr 1 year ago |

Related tangent: databases should track units. If I have a time column, I should be able to say a column represents, say, durations in float64 seconds. Then I should be able to write

    SELECT * FROM my_table WHERE duration_s >= 2h

and have the database DWIM, converting "2h" to 7200.0 seconds and comparing like-for-like during the table scan.
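As a rough sketch of the conversion step only (not of any real database feature), a duration literal could be turned into float seconds like this. The unit table and the function name `duration_to_seconds` are made up for illustration.

```python
import re

# Hypothetical unit table; a real system would support many more units.
SECONDS_PER_UNIT = {"ms": 0.001, "s": 1.0, "min": 60.0, "h": 3600.0, "d": 86400.0}


def duration_to_seconds(literal: str) -> float:
    """Convert a literal such as '2h' or '90min' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", literal.strip())
    if match is None or match.group(2) not in SECONDS_PER_UNIT:
        raise ValueError(f"not a duration literal: {literal!r}")
    return float(match.group(1)) * SECONDS_PER_UNIT[match.group(2)]


print(duration_to_seconds("2h"))  # 7200.0
```

A query planner could apply this once to the literal, then compare plain floats during the scan.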

Years ago, I wrote a special-purpose SQL database that had this kind of native unit handling, but I've seen nothing before or since, and it seems like a gap in the UI ecosystem.

And it shouldn't be just for time. We should have the whole inventory of units --- mass, volume, information, temperature, and so on. Why not? We can also teach the database to reject mathematical nonsense, e.g.

    SELECT 2h + 15kg -- type error!

Doing so would go a long way towards catching analysis errors early.
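The kind of check described here can be sketched in a few lines. The `Quantity` class below is hypothetical and tracks only a dimension name, which is far less than a real unit system would need, but it shows how `2h + 15kg` becomes a type error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float    # magnitude in the base unit of the dimension
    dimension: str  # e.g. "time" (seconds) or "mass" (kilograms)

    def __add__(self, other: "Quantity") -> "Quantity":
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {self.dimension} to {other.dimension}")
        return Quantity(self.value + other.value, self.dimension)


two_hours = Quantity(7200.0, "time")
half_hour = Quantity(1800.0, "time")
fifteen_kg = Quantity(15.0, "mass")

total = two_hours + half_hour  # same dimension, so this is allowed
print(total.value)  # 9000.0
# two_hours + fifteen_kg would raise TypeError
```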

zokier 1 year ago | |

PostgreSQL interval units already allow querying with natural-language-like expressions: https://www.postgresql.org/docs/current/datatype-datetime.ht...

n_plus_1_acc 1 year ago | |

What about leap seconds?

quotemstr 1 year ago | | |

The leap second mechanism amounts to a collective agreement to rewrite chronological history. It's like a git rebase for your clock. Everyone (almost) in practice does math as if leap seconds never happened, and the consequent divergence from physical time ends up not mattering.

davidhyde 1 year ago |

I think it’s important to be explicit about whether or not signed integers are used. From reading the document it seems that they may be signed, but they might not be. If they are signed then you could have multiple bit strings that represent the same date and time, which is not great.

jagged-chisel 1 year ago | |

Definitely signed - “use negative duration to subtract”

But the bit pattern is an issue internal to the library. If you can find a bug in the code, certainly point it out and offer a fix if it’s in your skillset.

sigseg1v 1 year ago | | |

I think the negative number here refers to the amount of days/etc to subtract (eg. add negative days to subtract, not supply a negative date).

However, at the same time it seems to indicate that it stores data using SQLite's built-in number type, which to my understanding does not support unsigned values. Secondly, the docs mention you can store a range of 290 years with nanosecond precision, which if you calculate it out works out to about 63 bits of information, suggesting a signed implementation.

gcr 1 year ago | | |

Subtraction of unsigned negative values still works just fine because of two’s complement.

(uint8)(-3) is 253, for example, and (uint8)5-(uint8)253 = (uint8)8, corresponding to 5 - (-3)
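The same arithmetic, checked in Python. Python integers are unbounded, so the helper `to_uint8` (a name chosen for this sketch) emulates the cast by masking to 8 bits.

```python
def to_uint8(value: int) -> int:
    """Reduce an integer modulo 2**8, as a cast to uint8 would."""
    return value & 0xFF


minus_three = to_uint8(-3)
difference = to_uint8(5 - minus_three)
print(minus_three, difference)  # 253 8
```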

kaoD 1 year ago | |

> multiple bit strings that represent the same date and time

How so?

davidhyde 1 year ago | | |

You’re right, whether or not the integers are signed has nothing to do with the issue above. Unsigned integers have the same issue.

Here is an example for signed integers.

These represent zero time but have different representations in memory:

Seconds: 2; Nanoseconds: -2,000,000,000 (fits in a 32-bit number); Time: zero seconds

Seconds: -2; Nanoseconds: 2,000,000,000; Time: zero seconds

Here is an example for unsigned:

Seconds: 1; Nanoseconds: 0; Time: 1 second

Seconds: 0; Nanoseconds: 1,000,000,000; Time: 1 second
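One common way around this is to normalize every (seconds, nanoseconds) pair so that the nanosecond part always lies in [0, 1,000,000,000). The sketch below shows that idea in Python; it is an assumption about how a library might do it, not a description of this extension's internals.

```python
NS_PER_SECOND = 1_000_000_000


def normalize(seconds: int, nanos: int) -> tuple[int, int]:
    """Return the canonical pair with 0 <= nanos < NS_PER_SECOND."""
    total_ns = seconds * NS_PER_SECOND + nanos
    return divmod(total_ns, NS_PER_SECOND)


print(normalize(2, -2_000_000_000))  # (0, 0)
print(normalize(-2, 2_000_000_000))  # (0, 0)
print(normalize(0, 1_000_000_000))   # (1, 0)
```

After normalization, equal times have equal representations, so they can be compared field by field.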

simontheowl 1 year ago |

Very cool - definitely an important missing feature in SQLite.

cryptonector 1 year ago |

I so wish that SQLite3 had an extensible type system.

funny_falcon 1 year ago | |

As a smallish PostgreSQL contributor, I can only say: NO, DON'T DO THIS!!!!

An extensible type system is the worst thing that could happen to database end-user performance. You can no longer short-cut a single thing in query parsing and optimization: you must check the type of every single operand, find the correct operator implementation, find the correct index operator family/class, and much more, all by querying the system catalog. Input and output of values also go through functions stored in the system catalog. You cannot even answer "select 1" without consulting the system catalog.

There should be a sane set of builtin types plus a struct/json-like way of composition. That is what most DBs do, except PostgreSQL. And I strongly believe it is the right way.

cryptonector 1 year ago | | |

> you must check the type of every single operand, find the correct operator implementation, find the correct index operator family/class, and much more, all by querying the system catalog.

Not with static typing.

The problem with PG is that it's not fully statically typed internally. SQLite3 is worse still, naturally. But a statically typed SQL RDBMS should be possible.

lifeisstillgood 1 year ago |

This is a sort of lazy Ask HN, but in your experience, what is more useful / valuable: nanosecond representation, or years outside the nano range of something like 1678-2200?

I don't do "proper" science, so the value of nanoseconds seems limited to very clever experiments (or some financial trade tracking that is probably even more limited in scope).

But being able to represent historical dates seems more likely to come up?

Thoughts?

cyberax 1 year ago | |

Historical dates, for sure.

Simply reducing the precision to 10ns will provide enough range in practice.

rokkamokka 1 year ago | |

A bit like asking if a hammer or a screwdriver is more useful. It depends on the work

out_of_protocol 1 year ago |

Why not go golang style: a Unix timestamp in nanoseconds, stored as a signed int64? Maybe you can't cover millions of years with nanosecond precision, but do you really need to?

commodoreboxer 1 year ago | |

With that precision and size, you can only cover the years from 1678 to 2262, which strongly limits your ability to represent historical dates and times.
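The bounds can be reproduced with Python's standard library. This sketch rounds to microseconds because `datetime` has no nanosecond field; the helper name `from_unix_nanos` is made up here.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1)


def from_unix_nanos(nanos: int) -> datetime:
    # datetime only has microsecond resolution, so round down to microseconds.
    return EPOCH + timedelta(microseconds=nanos // 1000)


earliest = from_unix_nanos(INT64_MIN)
latest = from_unix_nanos(INT64_MAX)
print(earliest.date(), latest.date())  # 1677-09-21 2262-04-11
```

The range actually starts in late 1677, so 1678 is the first year covered in full.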

azornathogron 1 year ago | | |

If you're representing dates back into the 1600s you need to keep in mind that calendar maths and things like "was this year a leap year" become more complicated. The Gregorian calendar was introduced in the 1500s but worldwide adoption took a long time - for example, the UK didn't adopt it until the 1700s. So you've got more than a century where just having "a date" isn't really sufficient information to know when something happened, you'll need to also know what calendar system that date is in.

Overall, this means if you're representing historical dates I would question whether a seconds-since-epoch timestamp representation is what you want at all, regardless of range and precision.

Edit: yes, you can kinda handle this as part of handling timezones, but still, it's complicated enough that you may want to retain more or different information if you're displaying or letting users enter historical dates.
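To illustrate the leap-year point, here are the two rules side by side in Python. This is only a sketch of the rules themselves; which one applies to a given historical date depends on the place, which the code does not model.

```python
def is_leap_julian(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_gregorian(year: int) -> bool:
    """Gregorian rule: century years are leap only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Years after the 1582 reform where the two calendars disagree.
disagreements = [
    year for year in range(1583, 1801)
    if is_leap_julian(year) != is_leap_gregorian(year)
]
print(disagreements)  # [1700, 1800]
```

So 1700 had a February 29 in places still on the Julian calendar, such as Britain, but not in places that had already switched.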

out_of_protocol 1 year ago | | |

> represent historical dates and times.

With nanosecond precision? Just decide what you want to do beforehand. I bet even a datetime doesn't make much sense for that time period; a bare date would suffice. Also, you'll likely need location, calendar system, etc., since real dates were not that standardized back then.

nalgeon 1 year ago | |

Storing a Unix timestamp as nanoseconds is not Go's style, but you can do just that with this extension.

    select time_to_nano(time_now());
    -- 1722979335431295000

zokier 1 year ago |

I just wish people would stop using the phrase "seconds since epoch" (or equivalent) unless that is exactly what they mean.

I wonder what

    select time_sub(time_date(2011, 11, 19), time_date(1311, 11, 18));

returns.

ralferoo 1 year ago | |

Why do you wish that?

I can think of a few plausible reasons, but the only one that is really significant is "what epoch"? In the case of UNIX-based systems and systems that try to mimic that behaviour, that is well defined. But as you haven't said what your complaints are, it's hard to provide any counterpoint or justification for why things are as they are.

> time_date(1311, 11, 18)

That isn't defined in the epoch used by most computer systems, so all bets are off. Perhaps it'll return MAX_INT, MIN_INT, 0, something that's plausible but doesn't take into account calendar reforms that have no bearing on the epoch being used, or perhaps it translates into a different epoch and calculates the exact number of seconds, or anything else. One could even argue that there are no valid epochs before GMT/UTC because it was all just local time before then.

But of course, you can argue either way whether negative values should be supported. Exactly 24 hours before 1970-1-1 0:00:00 UTC could be reasonably expected to be -86400; on the other hand, "since" strongly implies positive only.
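For what it's worth, Python's standard library takes the first view and returns a negative value for instants before the epoch:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Exactly 24 hours before the Unix epoch.
day_before = EPOCH - timedelta(hours=24)
print(day_before.timestamp())  # -86400.0
```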

Other people might have entirely different epochs for different reasons. Again, within the domain where it's being used, that's fine as long as everyone agrees.

Or did you have some other objection?

zokier 1 year ago | | |

The problem with "seconds since epoch" expression is that almost always it doesn't mean literally seconds since epoch, but instead some unix-style monstrosity. And it's annoying that you need to read some footnote to figure out what exactly it means; it's annoying that it is basically a code-phrase that you just need to know that it's not supposed to be taken literally.

nalgeon 1 year ago | |

> If the result exceeds the maximum value that can be stored in a Duration, the maximum duration will be returned.
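The quoted rule describes saturation: a difference that does not fit in a signed 64-bit nanosecond count is clamped to the maximum. The Python sketch below mimics that rule for the query asked about above. It assumes int64 nanoseconds and Python's proleptic Gregorian `date`, and it is not the extension's actual code.

```python
from datetime import date

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
NS_PER_DAY = 86_400 * 10**9


def saturating_sub_ns(later: date, earlier: date) -> int:
    """Difference in nanoseconds, clamped to the signed 64-bit range."""
    exact = (later - earlier).days * NS_PER_DAY
    return max(INT64_MIN, min(INT64_MAX, exact))


clamped = saturating_sub_ns(date(2011, 11, 19), date(1311, 11, 18))
print(clamped == INT64_MAX)  # True: 700 years does not fit in about 292
```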