Making every (leap) second count with our new public NTP servers (cloudplatform.googleblog.com)
Having every second increased by a non-trivial amount (~0.001%) on some days, and not on others, will produce subtly wrong results in all kinds of fields, from manufacturing to astronomy.
This was a bad, bad choice by Google.
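To put numbers on the ~0.001% figure, here is a minimal first-order sketch. It assumes a linear smear that spreads one leap second evenly over a 24-hour window centred on the leap second. It models how far a smeared clock sits from true UTC at each point in that window. The names are illustrative.

```python
WINDOW_S = 86_400.0  # assumed smear window: 24 h, centred on the leap second
LEAP_S = 1.0         # one positive (inserted) leap second


def absorbed(t: float) -> float:
    """Part of the leap second already absorbed t seconds after the window opens."""
    return LEAP_S * min(max(t / WINDOW_S, 0.0), 1.0)


def smeared_minus_utc(t: float) -> float:
    """Smeared clock minus true UTC, t seconds after the window opens (first-order model)."""
    if t < WINDOW_S / 2:
        # UTC has not inserted 23:59:60 yet, but the smeared clock already lags.
        return -absorbed(t)
    # UTC has now inserted the whole second, so the smeared clock is ahead.
    return LEAP_S - absorbed(t)


# Every smeared second is longer by this fraction for the whole window.
rate_percent = LEAP_S / WINDOW_S * 100  # roughly 0.00116 %

if __name__ == "__main__":
    print(f"frequency offset: {rate_percent:.5f} %")
    for hours in (0, 6, 11.99, 12, 18, 24):
        print(f"{hours:5} h: {smeared_minus_utc(hours * 3600):+.3f} s")
```

The same model shows both complaints in the thread. There is a constant frequency error of about 11.6 ppm for a full day, and a phase error that peaks at half a second right at the leap.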
https://developers.google.com/time/
"We recommend that you don’t configure Google Public NTP together with non-leap-smearing NTP servers."
If you point your NTP clients to time1.google.com through time4.google.com, then don't point them to anything else.
That's all.
If you use Google's time servers, you are using them either to be fully in sync with Google or because you like the time-smearing feature. In both cases, just use them; don't mix. Think of it as a non-standard service that, for convenience, is API-compatible with "standard" NTP.
As magicalist pointed out, there are already other smearing algorithms online:
https://developers.google.com/time/smear#othersmears
and Google plans to switch to the new algorithm soon. If all those who need smear standardize around one algorithm, it's going to be even better: there will be one more standard, with the new name, but then it will be even more obvious to everybody what's going on. Obviously both approaches are needed, depending on the usage scenario.
Network Time Protocol Best Current Practices
draft-ietf-ntp-bcp-02
1. Introduction
NTP Version 4 (NTPv4) has been widely used since its publication as RFC 5905 [RFC5905]. This documentation is a collection of Best Practices from across the NTP community.
Holy leaping second, batman! Unilaterally being off by up to a half second from the rest of the world's clocks is a pretty aggressive step. I think I would have preferred to see a resolution made by an independent body on something this drastic.
Smear is a workaround for those who care about phase alignment but don't care about frequency error. ... and who don't need to exchange times with anyone else. This last point reduces the set to no one, since it can't extend to everyone (some parties care a lot more about frequency error than phase error!).
This circus is enhanced by NTP's inability to tell you what timebase it's using (or, god forbid, offsets between what it's giving you and other timebases...)
It's going to be especially awesome when NTP daemons with both smear and non-smear peers get both the smear frequency error AND a leap second.
I for one welcome this great opportunity for an enhanced trash fire to help convince the world that we need to stop issuing leap seconds. (It's absurd: leap seconds easily cause tens of millions in disruption, and it would take 4000 years to even drift an hour off solar time, at which point time zones could be rotated if anyone really cared.)
I don't quite understand that point. E.g. the typical web server doesn't have much of a need to exchange precise time with others. HTTP, TLS, ... require timestamps, and timestamps are shown to users occasionally, but as long as they are roughly right that is enough. As long as all internal systems work off the same standard, it is fine. That seems to be the reasoning under which Google chose to use it, even though one might argue that with their cloud offerings they are not as insular.
http://arstechnica.com/science/2016/04/the-leap-second-becau...
Edit: the worst thing is that it would make those calculations harder to do correctly, while making it too tempting not to care about leap seconds at all. After all, what's a few seconds' error, really? This leaves you in a state where, guaranteed, 99% of time software will not be correct and the error will compound over time.
Or... recognize that super rare bugs are inevitable and create a higher-level way to avoid them entirely. I vote for option 2.
Now if I want to write software that uses precise TAI, I can't, because the time servers hand out broken (smeared) UTC and TAI is computed on my side as UTC + tai_offset.
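A minimal sketch of the "UTC + tai_offset" approach shows why the smear leaks straight through. The table below is hypothetical and deliberately abbreviated to the 2015 and 2017 entries. A real program would load the full IERS leap-second list.

```python
import bisect

# Hypothetical, abbreviated table: (POSIX time the offset takes effect, TAI - UTC in s).
# Only two entries are listed here; load the full IERS list in real code.
LEAP_TABLE = [
    (1_435_708_800, 36),  # 2015-07-01 00:00:00 UTC
    (1_483_228_800, 37),  # 2017-01-01 00:00:00 UTC
]
_STARTS = [start for start, _ in LEAP_TABLE]


def tai_minus_utc(posix_t: float) -> int:
    """TAI - UTC in effect at posix_t (this sketch only covers mid-2015 onwards)."""
    i = bisect.bisect_right(_STARTS, posix_t) - 1
    if i < 0:
        raise ValueError("timestamp predates this abbreviated table")
    return LEAP_TABLE[i][1]


def posix_to_tai(posix_t: float) -> float:
    """TAI as seconds from the POSIX epoch: the UTC timestamp plus the table offset."""
    return posix_t + tai_minus_utc(posix_t)
```

The offset lookup is exact, but it cannot repair its input. If `posix_t` came from a smeared server, it is off by up to half a second during the smear window, and `posix_to_tai` carries that error straight into the TAI value. The POSIX timestamp is also ambiguous during the inserted second itself, because that second has no value of its own.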
Here are two Red Hat articles on how to deal with the leap second, from 2016 and 2015:
https://access.redhat.com/articles/15145
http://developers.redhat.com/blog/2015/06/01/five-different-...
https://developers.google.com/time/smear#othersmears
> preferred to see a resolution made by an independent body
Independent bodies have spent the last decade debating if leap seconds should even exist. Agreeing on how to treat them if we keep them is way down the priority list.
Hilarious. Was this intentional?
[1] for good reasons
Google doesn't think so: "No commonly used operating system is able to handle a minute with 61 seconds"
> All Google services, including all APIs, will be synchronized on smeared time, as described above. You’ll also get smeared time for virtual machines on Compute Engine if you follow our recommended settings.
Yup. Even if your NTP UTC source is perfect (lol), there aren't any cryptographically authenticated sources for the offset as far as I'm aware (and NTP doesn't carry even an unauthenticated one).
GPS carries an offset between UTC and the (leapsecondless) GPS timescale... but the GPS signal is a bit of a pain to get to, and also unauthenticated...
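Applying that broadcast offset is simple arithmetic. This sketch assumes a receiver that already hands you a full GPS week number (with any 1024-week rollover resolved), the time of week, and the current GPS-UTC offset. The function name and parameters are illustrative.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # GPS time started at 1980-01-06 00:00:00 UTC


def gps_to_utc(week: int, tow: float, gps_minus_utc: int) -> datetime:
    """Convert GPS week + time of week (seconds) to a naive UTC datetime.

    Assumes `week` is the full week count (rollover already resolved) and that
    `gps_minus_utc` is the offset from the navigation message (18 s from 2017-01-01).
    """
    gps_elapsed = timedelta(weeks=week, seconds=tow)
    # The GPS timescale has no leap seconds, so subtract the accumulated offset.
    return GPS_EPOCH + gps_elapsed - timedelta(seconds=gps_minus_utc)
```

For example, `gps_to_utc(1930, 18, 18)` is 2017-01-01 00:00:00. As the comment notes, the result is only as trustworthy as the unauthenticated signal that supplied the offset.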
Leap seconds exist only in real time, not in historic recorded time.
There are in fact 86400 "calendar seconds" in a day, exactly.
Essentially, when a day is done, we call it 86400 seconds, even though it's actually 86400 + epsilon.
Only special applications need to know the exact physical number of seconds between two calendar times, rather than the calendar seconds.
Leap seconds basically add a corrective jump to physical time (what is measured by our super accurate clocks that use physical seconds and not calendar seconds) to match calendar time.
Leap seconds matter if you're doing some scientific or engineering calculation (astronomy, aerospace or whatever) and you need an exact physical time down the fraction of a second between two events that are far apart in the calendar.
They do not enter into everyday calculations, like using time_t seconds to calculate the number of days between two dates.
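For example, counting days across the 2016-12-31 leap second with plain POSIX arithmetic works without any leap-second table, because POSIX `time_t` treats every day as exactly 86400 seconds. This is a minimal Python sketch using `calendar.timegm`, which follows that rule. The helper name is illustrative.

```python
import calendar


def days_between(d1: tuple[int, int, int], d2: tuple[int, int, int]) -> int:
    """Whole days between two (year, month, day) dates via POSIX time_t arithmetic."""
    t1 = calendar.timegm((*d1, 0, 0, 0))
    t2 = calendar.timegm((*d2, 0, 0, 0))
    # POSIX days are exactly 86400 s, so no leap-second correction is needed.
    return (t2 - t1) // 86_400
```

`days_between((2016, 12, 31), (2017, 1, 2))` gives 2, even though 2016-12-31 physically lasted 86401 SI seconds.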
IMHO that's rather short notice, publishing something like this on Nov. 30th, just when they had to start advertising the leap second in NTP (or not advertise it, when smearing). I'm not sure, but draft-ietf-ntp-bcp-02 makes it sound like clients may need attention:
Clients that are connected to leap smearing servers must not apply the "standard" NTP leap second handling. So if they are using ntpd, these clients must not have a leap second file loaded, and the smearing servers must not advertise that a leap second is pending.
Remember when leap seconds caused ALL JVMs to lock up until restarted? Or kernel bugs? ick!
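For context, the "leap second is pending" advertisement in the quoted BCP text is the 2-bit Leap Indicator (LI) at the top of every NTP header, as defined in RFC 5905. Per the quote, a smearing server should keep it at 0, while a standard server typically sets it to 1 ahead of an inserted second. Here is a minimal parser for that first header byte. The helper name is illustrative.

```python
# Leap Indicator meanings from RFC 5905.
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def parse_first_byte(packet: bytes) -> tuple[int, int, int]:
    """Split the first byte of an NTP header into (LI, VN, Mode)."""
    if len(packet) < 48:
        raise ValueError("an NTP header is at least 48 bytes")
    b = packet[0]
    # LI is the top 2 bits, the version the next 3, the mode the low 3.
    return b >> 6, (b >> 3) & 0b111, b & 0b111
```

A version 4 server reply (mode 4) with LI = 1 starts with byte `0x64`. The same reply from a smearing server during the smear would start with `0x24`.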
No. Leap seconds were a rationalization of a prior system that really was ugly for computers. The conversion between TAI and UT(n) that was used before UTC involved table-driven algorithms with multiple rules and microsecond adjustments.
If you thought that six months' notice to add a leap second to /etc/leapsecs.dat is a huge imposition, then you should try creating a computer system that can cope with rules like "for the next three months, from January the 1st to March the 31st 1964, you must add 0.001296 of a second for each day since 38761 and then add a further 3.240130 seconds".
Ironically, UTC and the leap second system are geared towards the same sort of timekeeping that computers do and away from the civil timekeeping that preceded it: a constant-length second that can be measured with oscillators and electronic counters, being the basis for civil time, rather than astronomical calculation.
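To make the 1964 rule concrete: the "day since 38761" count is a Modified Julian Date (MJD), so the rule is a linear drift term. This sketch evaluates it, assuming the input is a naive datetime in UTC. The helper names are illustrative.

```python
from datetime import datetime

MJD_EPOCH = datetime(1858, 11, 17)  # Modified Julian Date 0


def mjd(dt: datetime) -> float:
    """Modified Julian Date (days, with fraction) for a naive UTC datetime."""
    return (dt - MJD_EPOCH).total_seconds() / 86_400


def tai_minus_utc_1964q1(dt: datetime) -> float:
    """TAI - UTC in seconds under the quoted rule (1964-01-01 to 1964-03-31)."""
    if not datetime(1964, 1, 1) <= dt < datetime(1964, 4, 1):
        raise ValueError("this rule only applies in the first quarter of 1964")
    return 3.240130 + (mjd(dt) - 38_761) * 0.001296
```

On 1964-01-01 (MJD 38395), the rule gives about 2.765794 s, and the value changes continuously through the quarter. Compare that with today's single integer looked up from a table.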
In the spirit of the DevOps maxim that small, frequent changes are better than big, infrequent ones, we have leap seconds instead of leap minutes, hours, days, etc. That way noon is still when the sun is at its highest (+/- 0.5 relative earth surface seconds).
It is better to be deliberately wrong in a controlled fashion than to be accidentally wrong because you never expected your clock to be non-monotonic. You seem to be arguing for the status quo; are you aware of just how deeply broken the status quo is?
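On the application side, the cheapest defence against a wall clock that steps backwards is not to use it for durations at all. Here is a small Python sketch. The `timed` helper is illustrative.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) measured on a monotonic clock.

    time.time() follows the wall clock, which can step backwards (for example
    when a leap second is applied as a step), so durations computed from it
    can come out negative. time.monotonic() cannot go backwards.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start
```

Wall-clock time is still what you log and show to users. Only the interval arithmetic moves to the monotonic clock.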
Windows, for example. That's another pretty big example. It just ignores the leap second bit and goes backwards at the next synchronization.
I'm not even talking about bugs here—these are straight up design flaws.
That should most definitely be in the standard, along with communicating to the client full details about how smearing is configured.
I don't, however, think it makes sense to unilaterally change this, without (any obvious signs of) coordination with the timekeeping maintainers and the maintainers of major NTP servers.
Never mind the theory, the practice is a clusterfuck.
gettimeofday doesn't return hour/minute/second divisions; it just returns seconds/microseconds since the epoch. Functions like strftime and gmtime handle the components of time. And leap seconds don't make applications see 59 twice; they make them see 60 once (58, 59, 60, 0, 1, ...).
Quoting the manpages for gmtime and strftime:
> tm_sec The number of seconds after the minute, normally in the range 0 to 59, but can be up to 60 to allow for leap seconds.
> %S The second as a decimal number (range 00 to 60). (The range is up to 60 to allow for occasional leap seconds.) (Calculated from tm_sec.)
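The two quotes describe the broken-down `struct tm` side. This quick illustration uses only the Python standard library. It shows how that interacts with a seconds-since-epoch counter under POSIX semantics, where every day is exactly 86400 s.

```python
import calendar
from datetime import datetime

# struct tm-style tuples may carry tm_sec == 60 for the inserted second...
leap = (2016, 12, 31, 23, 59, 60)
next_midnight = (2017, 1, 1, 0, 0, 0)

# ...but POSIX time has no distinct value for it: calendar.timegm just does the
# arithmetic, so 23:59:60 lands on the same count as the following 00:00:00.
assert calendar.timegm(leap) == calendar.timegm(next_midnight) == 1_483_228_800

# Python's datetime refuses the leap second outright.
try:
    datetime(*leap)
except ValueError as err:
    print("datetime rejects second=60:", err)
```

So the "60" exists in the formatted representation but not in the counter that `gettimeofday`-style APIs return.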
Google have never been in the NTP business - there's no reason for them to have worked towards a consensus on this. But when a company their size makes their approach publicly available to all, it starts to pave the way for a consistent standard for everyone.
Another example: Suppose that a portion of an industrial monitoring system processes remote sensor data in a cloud datacenter with smeared time, while the sensor nodes keep strict UTC time. Your SCADA system had better not have any hard-baked assumptions like "messages cannot come from the future", or you're going to have a hard time, too.
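One defensive pattern is to compare against a skew budget instead of a hard "not from the future" rule. This sketch uses hypothetical names and assumes a one-second budget is acceptable for the application.

```python
# Hypothetical budget: covers up to 0.5 s of smear plus ordinary NTP error.
MAX_SKEW_S = 1.0


def accept_timestamp(msg_ts: float, now: float, max_skew: float = MAX_SKEW_S) -> bool:
    """Accept a message unless it claims to be further in the future than max_skew."""
    return msg_ts - now <= max_skew
```

Choose the budget so it exceeds the largest smear offset you expect, which is half a second under a linear 24-hour smear centred on the leap, plus normal synchronization error.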
Let's say that a company's internal NTP servers include several sources for reliability and redundancy. Much like with Google DNS, perhaps one of the sources is Google NTP, while another is derived from the NTP pool. How do you expect the NTP daemon to behave in this situation? It will certainly be able to observe a 500 ms difference between its source time servers.
I can't think of anyone who cares that much about timekeeping who isn't running their own internal NTP infrastructure.
Google's Spanner requires accurate global time, so they deployed GPS and atomic clocks. Same for CDMA. There are some applications for high-resolution time (e.g. finance), so protocols like PTP exist.
A smeared NTP source in an otherwise normal list of time sources doesn't seem like that big of a deal either - eventually the daemon is just going to mark it as a falseticker and life goes on.
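The falseticker step can be illustrated with Marzullo's algorithm, the interval-intersection idea that NTP's selection algorithm is derived from. This is a simplified sketch. Real ntpd or chrony selection also weighs stratum, root distance, and more. The offsets below are made-up values in seconds, with one source mid-smear.

```python
def marzullo(intervals):
    """Return (count, lo, hi): the sub-interval covered by the most input intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval opens
        edges.append((hi, +1))  # interval closes
    # At equal points an opening edge sorts before a closing one,
    # so intervals that merely touch still count as overlapping.
    edges.sort()
    best = count = 0
    best_lo = best_hi = None
    for i, (x, kind) in enumerate(edges):
        count -= kind
        if count > best:
            best, best_lo, best_hi = count, x, edges[i + 1][0]
    return best, best_lo, best_hi


def falsetickers(sources):
    """Names whose (offset +/- error) interval misses the best overlap."""
    if not sources:
        return []
    names = list(sources)
    intervals = [(sources[n][0] - sources[n][1], sources[n][0] + sources[n][1]) for n in names]
    _, lo, hi = marzullo(intervals)
    return [n for n, (a, b) in zip(names, intervals) if b < lo or a > hi]


if __name__ == "__main__":
    srcs = {  # name: (measured offset, error bound), hypothetical values
        "pool-a": (0.000, 0.05),
        "pool-b": (0.010, 0.05),
        "pool-c": (-0.005, 0.05),
        "smeared": (0.500, 0.05),
    }
    print(falsetickers(srcs))
```

The outcome depends on timing. Early in the smear, the smeared source's offset is still small and overlaps the others, so it is only rejected once its drift exceeds the error bounds. With too few other sources, there may be no majority to reject it at all.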
If you run it on a VM it's IMHO your responsibility to make sure your time sensitive database nodes have shared time.
Same for the second example. It's an interesting point for the SaaS scenario, although it seems like that could already break through normal clock deviations.
EDIT: ok, the blog post actually mentions "local clocks in sync with VM instances running on Google Compute Engine", my bad. Not sure what to think about that. In comparison, Amazon recommends running NTP on your VMs and their Linux AMIs come with pool.ntp.org configured as default. </edit>
Third: It's going to figure out some solution (if Google is only one source, it's probably going to drop it as faulty), but you probably should not have added a time source that's officially documented to not strictly follow standards. It's not like Google offered an NTP service for years and now suddenly switched how it works.
I guess I underestimate the amount of trust people put into random time sources: practice is probably messier than theory.
From their FAQ:
> We recommend that you do not mix smeared and non-smeared NTP servers. The results during a leap second may be unpredictable.
I read that as a soft SHOULD NOT, not MUST NOT. Would be a fun exercise to try doing it intentionally with common NTP implementations and see what happens.
* http://www.madore.org/~david/computers/unix-leap-seconds.htm...
And if you replace a simple API with one that requires distributing leap-second tables…
Not much worse than the distributed time zone tables we already need to update thrice a year. At least leap seconds aren't decided on by politicians.