Leap smear

For reasons I have discussed before, it is occasionally necessary to add* a leap second to UTC in order to keep clock time in line with Earth’s inconsistent rotation.

Many systems require accurate time to function correctly, and the addition of a leap second can cause them to malfunction. In June 2012 the addition of a leap second caused a number of major websites, including Reddit, Foursquare, Yelp, LinkedIn, Gawker and StumbleUpon, to malfunction and crash. Google came up with a unique workaround, the Leap Smear, that prevented this from happening to its own systems.

Google, like many others, uses the Network Time Protocol (NTP) to synchronise time across its network.† To cope with the leap second problem they configured their NTP servers to add the extra second gradually, a small fraction at a time, over a long period (in this case one day), so that by the end of this period their NTP servers’ time had caught up with the adjusted time.

Google used the following formula:

t \left(\textnormal{Google}\right) = t + \frac{\textnormal{gain}}{2} \left( 1 - \cos \left( \pi \, \frac{t}{\textnormal{window}} \right) \right)

Here t(Google) is the time according to Google’s NTP servers; t is the actual UTC time (inside the cosine it is counted from the start of the window); gain is the total amount of time to be added (in this case one second); and window is the period over which this gain is spread (in this case twenty-four hours).
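To make the formula concrete, here is a minimal Python sketch of it. The function names `smear_offset` and `smeared_time` and the `window_start` parameter are my own, not Google's, and the sketch assumes the half-amplitude form of the cosine, (gain / 2) × (1 − cos(π t / window)), so that the offset ends at exactly gain when the window closes.

```python
import math


def smear_offset(elapsed: float, gain: float = 1.0, window: float = 86_400.0) -> float:
    """Offset in seconds added to UTC, `elapsed` seconds into the smear window.

    Assumption: half-amplitude cosine, so the offset runs from 0 to `gain`.
    """
    # Clamp so the offset is 0 before the window starts and `gain` after it ends.
    x = min(max(elapsed, 0.0), window)
    return gain / 2.0 * (1.0 - math.cos(math.pi * x / window))


def smeared_time(utc: float, window_start: float,
                 gain: float = 1.0, window: float = 86_400.0) -> float:
    """Time a smearing server would report for the true time `utc` (seconds)."""
    return utc + smear_offset(utc - window_start, gain, window)


if __name__ == "__main__":
    # Print the offset at a few points in a twenty-four hour window.
    for hours in (0, 1, 6, 12, 18, 24):
        print(f"{hours:2d} h: {smear_offset(hours * 3600.0) * 1000.0:8.2f} ms")
```

Clamping the elapsed time is a design choice of this sketch: it lets the same function be called before and after the window without special cases.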

Because of the cosine function, the offset grows slowly at first (only four milliseconds are added in the first hour), speeds up to a peak of about sixty-five milliseconds per hour in the middle of the window, and then slows down again towards the end of the window.
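Those two figures can be checked by hand. This is a sketch that assumes the offset is (gain / 2) × (1 − cos(π t / window)), with t in hours from the start of the window, gain = 1 s and window = 24 h; the symbol names offset and rate are mine.

```latex
\textnormal{offset}(t) = \frac{1}{2} \left( 1 - \cos \left( \frac{\pi t}{24} \right) \right) \ \textnormal{s}

\textnormal{offset}(1) = \frac{1}{2} \left( 1 - \cos \left( \frac{\pi}{24} \right) \right)
  \approx \frac{1}{2} \left( 1 - 0.99144 \right) \approx 0.0043 \ \textnormal{s}

\textnormal{rate}(t) = \frac{d}{dt} \, \textnormal{offset}(t)
  = \frac{\pi}{48} \sin \left( \frac{\pi t}{24} \right) \ \textnormal{s/h}

\textnormal{rate}(12) = \frac{\pi}{48} \approx 0.0654 \ \textnormal{s/h}
```

So roughly four milliseconds are added in the first hour, and the rate peaks at about sixty-five milliseconds per hour halfway through the window.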

[Figure: leap smear offset]

This prevented servers and devices connected to Google’s NTP servers from “noticing” that something was wrong and applying their own corrections.
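One way to see why nothing looks wrong to a client is to measure how far the smeared clock's rate strays from one second per second. The sketch below is my own illustration, not Google's code: `smeared` and `max_rate_error_ppm` are hypothetical names, and it assumes a one-second gain spread over twenty-four hours with the half-amplitude cosine (gain / 2) × (1 − cos(π t / window)).

```python
import math

WINDOW = 86_400.0  # smear window in seconds (twenty-four hours, as in the text)
GAIN = 1.0         # total offset in seconds


def smeared(t: float) -> float:
    """Smeared clock reading t seconds into the window (hypothetical helper)."""
    return t + GAIN / 2.0 * (1.0 - math.cos(math.pi * t / WINDOW))


def max_rate_error_ppm(step: float = 1.0) -> float:
    """Largest deviation of the smeared clock's rate from 1 s/s, in parts per million."""
    worst = 0.0
    t = 0.0
    while t < WINDOW:
        # Length of one smeared step divided by the length of one true step.
        rate = (smeared(t + step) - smeared(t)) / step
        worst = max(worst, abs(rate - 1.0))
        t += step
    return worst * 1e6


if __name__ == "__main__":
    print(f"peak rate error: {max_rate_error_ppm():.1f} ppm")
```

Under these assumptions the peak works out to π / (2 × 86,400), about 18 parts per million, so each smeared second differs from a true second by at most around 18 microseconds, and the smeared clock never jumps or runs backwards.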

As they say in their blog post,

The leap smear is talked about internally in the Site Reliability Engineering group as one of our coolest workarounds, that took a lot of experimentation and verification, but paid off by ultimately saving us massive amounts of time and energy in inspecting and refactoring code. It meant that we didn’t have to sweep our entire (large) codebase, and Google engineers developing code don’t have to worry about leap seconds.

I wouldn’t be at all surprised to see others employing Google’s Leap Smear technique in the future.

* There are also provisions to subtract a leap second, but this has never yet happened.

† NTP does include a “leap indicator”, but Google chose to configure their NTP servers not to use it.
