Making every (leap) second count with our new public NTP servers

(cloudplatform.googleblog.com)

110 points by scommab 9 years ago | 69 comments

liotier 9 years ago |

“Leap Smearing must not be used for public-facing NTP servers” - https://tools.ietf.org/html/draft-ietf-ntp-bcp-02

etatoby 9 years ago | |

There's good reason it's in the standard. Many applications of computer science rely on being able to accurately measure time.

Having every second increased by a non-trivial amount (~0.001%) on some days, and not on others, will produce subtly wrong results in all kinds of fields, from manufacturing to astronomy.

This was a bad, bad choice by Google.

acqq 9 years ago | |

I don't see any possibility for problems, as long as you just do what makes sense.

https://developers.google.com/time/

"We recommend that you don’t configure Google Public NTP together with non-leap-smearing NTP servers."

If you point your NTP clients to time1.google.com to time4.google.com then don't point them to anything else.

That's all.

If you use Google's time servers you are using them either to be fully in sync with Google or because you like the time-smearing feature. In both cases, just use them, don't mix. Think of it as a non-standard service that is, for convenience, API-compatible with "standard" NTP.

As magicalist pointed out, there are already other smearing algorithms online:

https://developers.google.com/time/smear#othersmears

and Google plans to switch to the new algorithm soon. If all those who need smear standardize around one algorithm, it's going to be even better: there will be one more standard, with the new name, but then it will be even more obvious to everybody what's going on. Obviously both approaches are needed, depending on the usage scenario.

klodolph 9 years ago | |

Wow, that's a really boneheaded thing to put in a standard. I think we can all agree that it's important to make leap smearing available for those who want to use it, especially considering the bugs in leap second handling for common NTP clients.

paulajohnson 9 years ago | | |

I disagree. The point of NTP, and of time services in general, is that everyone agrees about the time. If an organisation wants to use non-standard time it can, but public-facing NTP servers should all agree and all provide the standard time. Google, for whatever reasons, is making its NTP servers deliberately wrong, and there is no mechanism in NTP for a server to say "I'm using time-smearing". So they shouldn't be doing this on public-facing NTP.
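
To make the "no mechanism" point concrete, here is a minimal sketch of the only leap-related signal in the NTP packet header, the 2-bit Leap Indicator from RFC 5905. The function name and the sample packet are made up for illustration; the point is only that none of the four values can express "this server is smearing".

```python
# Leap Indicator (LI) values defined for the NTP packet header (RFC 5905).
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def parse_first_byte(packet: bytes) -> dict:
    """Split the first header byte into LI (2 bits), version (3 bits), mode (3 bits)."""
    first = packet[0]
    li = first >> 6
    return {
        "li": li,
        "li_meaning": LEAP_INDICATOR[li],
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
    }


# Hypothetical 48-byte server reply: LI=0, version 4, mode 4 (server), rest zeroed.
sample = bytes([(0 << 6) | (4 << 3) | 4]) + bytes(47)
print(parse_first_byte(sample))
```

A smearing server would presumably report LI=0 throughout, so from the header alone a client cannot tell it apart from a server that is simply not announcing a leap second.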

jve 9 years ago | | |

Is that a standard?

              Network Time Protocol Best Current Practices
                         draft-ietf-ntp-bcp-02

1. Introduction

   NTP Version 4 (NTPv4) has been widely used since its publication as
   RFC 5905 [RFC5905].  This documentation is a collection of Best
   Practices from across the NTP community.

agwa 9 years ago | |

That's an Internet-Draft, which is a work-in-progress of the IETF. It's not a formal specification. https://www.ietf.org/id-info/

brandmeyer 9 years ago |

> Instead of adding a single extra second to the end of the day, we'll run the clocks 0.0014% slower across the ten hours before and ten hours after the leap second, and “smear” the extra second across these twenty hours.

Holy leaping second, batman! Unilaterally being off by up to a half second from the rest of the world's clocks is a pretty aggressive step. I think I would have preferred to see a resolution made by an independent body on something this drastic.
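
For anyone wanting to check the numbers, here is a rough sketch of a linear smear over the twenty-hour window quoted above. It is a simplified model and not Google's implementation: time `t` is measured in seconds relative to the leap second, and the function names are hypothetical.

```python
WINDOW = 20 * 3600  # smear window in seconds: ten hours either side of the leap second
HALF = WINDOW / 2


def absorbed(t: float) -> float:
    """Portion of the extra second (0..1) already absorbed at t seconds from the leap."""
    return min(1.0, max(0.0, (t + HALF) / WINDOW))


def smeared_minus_utc(t: float) -> float:
    """Approximate difference in seconds between a smeared clock and stepped UTC.

    Before the leap second the slowed clock lags behind UTC; once UTC has
    inserted its extra second, the smeared clock is ahead until the window closes.
    """
    lag = absorbed(t)
    return -lag if t < 0 else 1.0 - lag


# One second spread over 72,000 seconds is the quoted rate change of about 0.0014%.
print(f"rate change: {1 / WINDOW:.4%}")
for hours in (-10, -5, -0.001, 0.001, 5, 10):
    print(f"{hours:+7.3f} h: {smeared_minus_utc(hours * 3600):+.3f} s")
```

In this model the disagreement with stepped UTC peaks at about half a second on either side of the leap and is zero at both ends of the window.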

nullc 9 years ago |

I predicted this for leap smear a while back-- we have time sync because having systems with different times is a source of problems... logical fix: get them onto the same time.

Smear is a workaround for those who care about phase alignment but don't care about frequency error. ... and who don't need to exchange times with anyone else. This last point reduces the set to no one, since it can't extend to everyone (some parties care a lot more about frequency error than phase error!).

This circus is enhanced by NTP's inability to tell you what timebase it's using (or, god forbid, offsets between what it's giving you and other timebases...)

It's going to be especially awesome when NTP daemons with both smear and non-smear peers get both the smear frequency error AND get a leap second.

I for one welcome this great opportunity for an enhanced trash fire to help convince the world that we need to stop issuing leap seconds. (It's absurd: it easily causes tens of millions in disruption, and it would take 4000 years to drift even an hour off solar time, at which point timezones could be rotated if anyone really cared.)

detaro 9 years ago | |

> Smear is a workaround for those who care about phase alignment but don't care about frequency error. ... and who don't need to exchange times with anyone else. This last point reduces the set to no one, since it can't extend to everyone (some parties care a lot more about frequency error than phase error!).

I don't quite understand that point. E.g. the typical web server doesn't have much of a need to exchange precise time with others. HTTP, TLS, ... require timestamps, timestamps are shown to users occasionally, but as long as they are roughly right that is enough. As long as all internal systems work off the same standard it is fine. Which seems to be the reasoning under which Google chose to use it, even though one might argue that with their cloud offerings they are not as insular.

leephillips 9 years ago |

A lot of interesting geophysics in the unpredictable need for leap seconds. I mention Google's "smearing" approach here:

http://arstechnica.com/science/2016/04/the-leap-second-becau...

leni536 9 years ago |

Why the hell aren't time servers and clients synced to TAI instead? Dealing with leap seconds should be a client-side problem.

Unklejoe 9 years ago | |

At least the IEEE 1588-2008 (PTPv2) protocol uses TAI time (with the POSIX epoch). The current UTC offset is then passed in the Announce messages (as well as some flags for indicating an upcoming leapsecond) which allows the slave to derive UTC if it wants to.
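
A rough sketch of what that derivation looks like on the receiving side. The `AnnounceInfo` class and `ptp_to_utc` function are hypothetical stand-ins for whatever a real PTP stack exposes; only the arithmetic (POSIX-style UTC seconds = PTP seconds minus the announced offset) is the point here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AnnounceInfo:
    """Hypothetical container for the Announce fields mentioned above (not a real PTP API)."""
    current_utc_offset: int  # TAI - UTC, in whole seconds
    leap61: bool = False     # flag: a positive leap second is upcoming
    leap59: bool = False     # flag: a negative leap second is upcoming


def ptp_to_utc(ptp_seconds: float, info: AnnounceInfo) -> datetime:
    """Derive UTC from a TAI-based PTP timestamp using the announced offset."""
    utc_seconds = ptp_seconds - info.current_utc_offset
    return datetime.fromtimestamp(utc_seconds, tz=timezone.utc)


# Example values chosen for illustration: offset 37 s, PTP time 1483228837 s.
print(ptp_to_utc(1483228837, AnnounceInfo(current_utc_offset=37)))
```

The timescale on the wire stays continuous, and the leap second handling is confined to this one subtraction on the client.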

nothrabannosir 9 years ago | |

Because leap seconds are not deterministic, kind of like time zone changes. It would make timestamp <--> date calculations A) harder and B) need constant updates to work. Hell for firmware and embedded code.

Edit: the worst thing is: it would make those calculations harder to do correctly, but it would be too seductive to not care for leap seconds. After all, what's a few seconds error, really? This leaves you in a state where, guaranteed, 99% of time software will not be correct and the error will compound over time.
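
To illustrate the "needs constant updates" part, here is a minimal sketch of a TAI-UTC lookup. The table is deliberately partial (only the five most recent published entries) and the function name is hypothetical; any date past the last row is answered with a guess that stays right only until the next leap second is announced.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Partial table: (UTC instant from which the offset applies, TAI - UTC in seconds).
# Earlier entries are omitted; a deployed copy would need a new row per leap second.
LEAP_TABLE = [
    (datetime(2006, 1, 1, tzinfo=timezone.utc), 33),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 34),
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 35),
    (datetime(2015, 7, 1, tzinfo=timezone.utc), 36),
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 37),
]
_STARTS = [start for start, _ in LEAP_TABLE]


def tai_minus_utc(when: datetime) -> int:
    """Return TAI - UTC in seconds for a UTC datetime covered by the table."""
    index = bisect_right(_STARTS, when) - 1
    if index < 0:
        raise ValueError("date precedes the first table entry")
    return LEAP_TABLE[index][1]


print(tai_minus_utc(datetime(2016, 12, 31, tzinfo=timezone.utc)))
print(tai_minus_utc(datetime(2017, 1, 1, tzinfo=timezone.utc)))
```

Firmware that ships with a frozen copy of such a table has no way to learn about later rows, which is the embedded-code problem described above.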

russdill 9 years ago | |

Yup, it seems awesome, but software needs to be written to handle it properly. I think it'd work no problem with software already written against monotonic clocks, but everything else would probably need some fixing.

paulajohnson 9 years ago | | |

The problem is that time_t (seconds since 1970) implicitly assumes 86400 seconds per day. You would have to redefine time_t and rewrite every piece of code that uses it.
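
A quick way to see the 86400 assumption, sketched with Python's `calendar.timegm`, which does the same seconds-since-1970 arithmetic. The date used is the leap second at the end of 2016-12-31.

```python
import calendar

# Seconds-since-1970 values for the day that contained a leap second.
prev_midnight = calendar.timegm((2016, 12, 31, 0, 0, 0))
leap_second = calendar.timegm((2016, 12, 31, 23, 59, 60))
next_midnight = calendar.timegm((2017, 1, 1, 0, 0, 0))

# The day spans exactly 86400 counts even though it lasted 86401 SI seconds.
print(next_midnight - prev_midnight)

# 23:59:60 and the following 00:00:00 collapse onto the same value.
print(leap_second == next_midnight)
```

Both the leap second and the following midnight map to the same count, so the extra second has no representation of its own in this scheme.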

azinman2 9 years ago | | |

Right... Because leap seconds are high on everyone's priority list. It's the programmer's fault!

Or... recognize that super rare bugs are inevitable and create a higher level way to avoid them entirely. I vote for option 2.

detaro 9 years ago | |

Google introduced this exactly to not have to deal with it client-side, while still having UTC timestamps that match the rest of the world most of the time.

leni536 9 years ago | | |

You have to deal with it at client side. It doesn't make sense to fix broken software by introducing broken time servers. It could make sense to provide a wrapper library that provides smeared leap seconds and use that for broken software only (like libfaketime).

Now if I want to write software that uses precise TAI, I can't do that because of broken UTC from time servers and TAI is defined as UTC+tai_offset on my side.

antoncohen 9 years ago |

For people talking about Google unilaterally doing this, it has been common to smear the leap second for the last couple years. Usually companies do it internally by having their NTP servers skew time, either with Chrony or `ntpd -x`. Standards bodies have not been able to react quickly enough to the need to smear the leap second in a consistent way. I'm thankful that Google has decided to run public NTP servers with consistently smeared leap seconds.

Here are two Red Hat articles on how to deal with the leap second, from 2016 and 2015:

https://access.redhat.com/articles/15145

http://developers.redhat.com/blog/2015/06/01/five-different-...

JdeBP 9 years ago | |

I hope that there are people on standards bodies who remember or learned what it was like before UTC when civil time seconds were not one SI second long, and in effect "smearing" happened all the time.

newman314 9 years ago |

Does anyone know if Google has open sourced the time smearing algorithm?

detaro 9 years ago | |

They discuss various ways of smearing here, but I haven't seen their actual implementation code: https://developers.google.com/time/smear