What is “Five Sigma” Data?

or “Why do some experiments take such a long time to run?”

Before you go any further, watch the first minute of this video of Professor Andrei Linde learning from Assistant Professor Chao-Lin Kuo of the BICEP2 collaboration that his life’s work on inflationary theory appeared to have been confirmed by experiment.

The line we’re interested in is this one from Professor Kuo:

“It’s five sigma at point-two … Five sigma, clear as day, r of point-two”

You can see, from Linde’s reaction and the reaction of his wife, that this is good news.

The “r of point-two” (i.e. r = 0.2) bit is not the important thing here. It refers to the tensor-to-scalar ratio, r, which compares the strength of the primordial fluctuations caused by gravitational waves (the tensor component) with those caused by density waves (the scalar component). BICEP2 inferred it from patterns in the polarisation of the cosmic microwave background radiation.

The bit we’re interested in is the “five sigma” part. Scientific data, particularly in physics, and especially in particle physics and astronomy, is often described as being “five sigma”, but what does this mean?

Imagine that we threw two fair (unbiased) six-sided dice twenty thousand times, adding the two scores together each time. We would expect seven to be the most common total, coming up one-sixth of the time (about 3333 times), and two and twelve to be the least common, each coming up one thirty-sixth of the time (about 556 times each). The average total would be 7.00, and the standard deviation (roughly, the typical distance of each total from the average; strictly, the square root of the average squared distance) would be 2.42.
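These expected values can be checked exactly by listing all 36 equally likely outcomes of two dice. A minimal sketch in Python (the post itself used Excel), using exact fractions so the mean and variance come out without rounding; the function name is only illustrative:

```python
from fractions import Fraction
from itertools import product
import math

def two_dice_distribution():
    """Exact probability of each total (2 to 12) when two fair six-sided dice are added."""
    ways = {}
    for a, b in product(range(1, 7), repeat=2):   # all 36 equally likely outcomes
        ways[a + b] = ways.get(a + b, 0) + 1
    return {total: Fraction(n, 36) for total, n in sorted(ways.items())}

THROWS = 20_000
dist = two_dice_distribution()

mean = sum(total * p for total, p in dist.items())                    # exactly 7
variance = sum((total - mean) ** 2 * p for total, p in dist.items())  # exactly 35/6
sigma = math.sqrt(float(variance))   # root-mean-square distance from the mean, about 2.415

for total, p in dist.items():
    print(f"{total:2d}: probability {str(p):>4}, expected count {float(p) * THROWS:7.1f}")
print(f"mean = {float(mean):.2f}, standard deviation = {sigma:.2f}")
```

The variance of one fair die is 35/12, and because the two dice are independent their variances add, giving 35/6 and a standard deviation of about 2.42.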

I ran this simulation in Microsoft Excel and obtained the data below. The average was 6.996 and the standard deviation (referred to as sigma, or σ) was 2.42. This suggests that there is nothing wrong with my dice: the difference between my average and the expected average was only 0.004, or less than 0.002 of a standard deviation, and random variation alone would produce a difference at least this large about 99.9% of the time. In other words, the result is exactly what we would expect from fair dice.

20,000 throws of two fair dice (Excel simulation)
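The post's figures came from Excel. As a swapped-in alternative, the sketch below repeats the experiment with Python's standard random module; the function name and the seed are arbitrary illustrative choices, and the numbers will differ slightly from the Excel run. Like the text, it measures the shift of the average in units of the spread of single throws:

```python
import random
import statistics

def simulate_two_dice(throws, seed=None):
    """Return the totals from `throws` throws of two fair six-sided dice."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(throws)]

totals = simulate_two_dice(20_000, seed=2014)   # arbitrary seed, only for repeatability
mean = statistics.fmean(totals)
sigma = statistics.pstdev(totals)               # standard deviation of the 20,000 totals
shift = (mean - 7) / sigma                      # distance from the expected mean, in sigmas

print(f"mean = {mean:.3f}, sigma = {sigma:.2f}, shift = {shift:+.4f} sigma")
```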

Now imagine that we think our dice are “loaded”: they always come up showing a six. If we repeated our 20000 throws with these dice the average value would obviously be 12.0, which is 5.00 away from our expected average, or 2.07 standard deviations (2.07σ). This would seem to be very good evidence that there is something very seriously wrong with our dice, but a 2.07σ result isn’t good enough for physicists. At 2.07σ there is still a 1.92%, or 1 in 52, chance that fair dice would produce a result this far above the average purely by chance, i.e. that our result is a fluke.
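The step from “2.07σ” to “1 in 52” can be reproduced in a few lines. This sketch assumes, as sigma figures conventionally do, a normal (Gaussian) distribution and a one-sided tail (only results above the average count); the helper name `upper_tail` is illustrative:

```python
import math

def upper_tail(z):
    """One-sided probability that a normal variable lands more than z sigma above its mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

SIGMA_TWO_DICE = math.sqrt(35 / 6)   # spread of the total of two fair dice, about 2.42
FAIR_AVERAGE = 7.0
LOADED_AVERAGE = 12.0                # both dice always show six

z = (LOADED_AVERAGE - FAIR_AVERAGE) / SIGMA_TWO_DICE
p = upper_tail(z)
print(f"{z:.2f} sigma: p = {p:.4f}, about 1 in {1 / p:.0f}")
```

This should print about 2.07 sigma and roughly 1 in 52, matching the figures above.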

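In order to show that our result is definitely not a fluke, we need to collect more data. In this simplified picture, throwing the same pair of dice more times won’t help, because each throw is independent of the previous one and the spread of a single throw’s total stays at 2.42. Throwing more dice at once will help: the loaded dice’s excess over the fair average grows in proportion to the number of dice, while the standard deviation grows only with its square root.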

If we threw twenty dice the same 20000 times, the expected average total score would be 70 and the standard deviation would be 7.64. If the dice were loaded the actual average score would be 120, putting our result 6.55σ away from the expected value. That is equivalent to a chance of only 1 in 33.9 billion that fair dice would give a result this extreme, i.e. that our result was a fluke and our dice are fair after all. Another way of thinking about this is that we’d have to carry out our experiment 33.9 billion times for the data we’ve obtained to show up just once by chance.
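Why more dice help can be made concrete: with n loaded dice the excess over the fair total is 2.5n, while the standard deviation is the square root of 35n/12, so the number of sigmas grows like the square root of n. A minimal sketch under the same assumptions as the text (each die always shows six, normal one-sided tail); all names are illustrative:

```python
import math

FAIR_FACE_MEAN = 3.5          # mean score of one fair die
FAIR_FACE_VARIANCE = 35 / 12  # variance of one fair die
LOADED_FACE = 6               # the hypothetical loaded die always shows six

def loaded_dice_sigma(n_dice):
    """Sigmas by which n loaded dice beat the fair expectation, in the simplified picture."""
    excess = n_dice * (LOADED_FACE - FAIR_FACE_MEAN)        # grows in proportion to n
    spread = math.sqrt(n_dice * FAIR_FACE_VARIANCE)         # grows only like sqrt(n)
    return excess / spread

def upper_tail(z):
    """One-sided normal tail probability beyond z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for n in (2, 20):
    z = loaded_dice_sigma(n)
    print(f"{n:2d} dice: {z:.2f} sigma, about 1 in {1 / upper_tail(z):,.0f}")

# Smallest number of dice that pushes the loaded result past the 5 sigma threshold
min_dice = next(n for n in range(1, 100) if loaded_dice_sigma(n) >= 5)
print("dice needed for 5 sigma:", min_dice)
```

In this toy model twelve dice are the minimum needed to pass 5σ; twenty dice give about 6.55σ, in line with the roughly 1 in 34 billion quoted above.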

This is why it takes a very long time to carry out some experiments, like the search for the Higgs boson or the recent BICEP2 experiment referenced above. When you’re dealing with something far more complex than a loaded die, where the “edge” is very small (BICEP2 looked for fluctuations of the order of one part in one hundred thousand) and there are many, many other variables to consider, it takes a very long time to collect enough data to show that your results are not a fluke.

The “gold standard” in physics for declaring a discovery is 5σ, or a 1 in 3.5 million chance of a fluke (which is why Linde’s wife in the video above blurts out “Discovery?” when hearing the news from Professor Kuo). In the case of the Higgs boson there were “tantalising hints around 2- to 3-sigma” in November of 2011, but it wasn’t until July 2012 that the ATLAS and CMS experiments broke through the 5σ barrier, thus “officially” discovering the Higgs boson.
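The “chance of a fluke” attached to a given number of sigma is conventionally the one-sided tail probability of a normal distribution, p(z) below. Evaluating it reproduces the 1 in 3.5 million figure for 5σ and gives the corresponding odds for 2σ and 3σ hints:

```latex
\begin{aligned}
p(z) &= \tfrac{1}{2}\,\operatorname{erfc}\!\left(z/\sqrt{2}\right) \\
p(2) &\approx 2.28\times10^{-2} \approx 1/44 \\
p(3) &\approx 1.35\times10^{-3} \approx 1/741 \\
p(5) &\approx 2.87\times10^{-7} \approx 1/(3.5\times10^{6})
\end{aligned}
```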

dBA and Grey Noise

The human ear doesn’t hear equally well at all frequencies. It is much less sensitive to low frequencies (below about 1000 Hz) and to high frequencies (above about 6000 Hz), and its sensitivity peaks at around 2500 Hz.

A microphone doesn’t have the same issue: a good measurement microphone responds roughly equally across the audible range. To make a measurement reflect what a human ear would have heard, a filter is therefore applied to the recorded sound. This filter is called A-weighting, and sound levels measured with it are quoted in dB(A).

dB(A) weighting (linear frequency scale)

dB(A) weighting (logarithmic frequency scale)
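The A-weighting curve has a standard analytic form. As a minimal sketch, the following evaluates the formula given in IEC 61672-1 at a handful of frequencies; the pole frequencies 20.6, 107.7, 737.9 and 12194 Hz and the +2.00 dB offset come from that standard, while the function name is only illustrative. It computes the curve only; a real sound level meter applies it as a filter to the signal:

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), from the analytic curve in IEC 61672-1."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00   # the +2.00 dB offset makes the gain 0 dB at 1 kHz

for f in (50, 100, 500, 1000, 2500, 6000, 10000, 16000):
    print(f"{f:6d} Hz: {a_weighting_db(f):+6.1f} dB")
```

The gain is about 0 dB at 1 kHz by construction, strongly negative at low frequencies (roughly -19 dB at 100 Hz) and slightly positive near 2.5 kHz, which is the shape shown in the figures.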

White noise is often described as equally loud at all frequencies, but this is not quite right: although it contains equal power at all frequencies, this is not what the ear hears. Grey noise is white noise that has been shaped by an inverted A-weighting curve (or a similar equal-loudness curve), boosting the frequencies the ear is less sensitive to, so that it is heard as equally loud at all frequencies.
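One simple way to produce grey noise is sketched below, assuming a pure-Python setup with no audio libraries. It adds up sinusoids with random phases, one per frequency bin, and scales each by the inverse of the A-weighting gain. The function names, the 40 dB cap on the boost and the other defaults are hypothetical choices for illustration, not a description of how any particular grey-noise recording is made:

```python
import math
import random

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), from the analytic curve in IEC 61672-1."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00

def grey_noise(n_samples=1024, sample_rate=44100, f_min=20.0, f_max=20000.0,
               max_boost_db=40.0, seed=None):
    """Sum random-phase sinusoids, one per frequency bin, with inverse A-weighted amplitudes.

    Equal amplitudes in every bin would give band-limited white noise; scaling each bin by
    10**(-A(f)/20) boosts the frequencies the ear hears poorly. max_boost_db is an arbitrary
    cap, because the inverse curve demands enormous gain at the lowest frequencies.
    """
    rng = random.Random(seed)
    bin_width = sample_rate / n_samples
    components = []
    for k in range(1, n_samples // 2 + 1):
        f = k * bin_width
        if f_min <= f <= f_max:
            boost_db = min(-a_weighting_db(f), max_boost_db)
            components.append((f, 10 ** (boost_db / 20), rng.uniform(0, 2 * math.pi)))
    signal = [
        sum(amp * math.sin(2 * math.pi * f * t / sample_rate + phase)
            for f, amp, phase in components)
        for t in range(n_samples)
    ]
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal]   # normalise so the loudest sample is at +/-1
```

The samples are floats in the range -1 to 1; to listen, they could be scaled to 16-bit integers and written out with Python's standard `wave` module. A practical generator would more likely shape white noise with an FFT or a digital filter; additive synthesis is used here only to keep the sketch dependency-free.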



The Fibonacci Sequence and Converting from Miles to Kilometres

The Fibonacci sequence is a very famous sequence of numbers, named after the Italian mathematician Leonardo Fibonacci (though the sequence had already been described by Indian mathematicians). Each term in the sequence is the sum of the previous two terms:

F = 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 and so on …

As the sequence grows longer, the ratio of each term to the previous term gets closer and closer to the golden ratio, 1.618 (to four significant figures). This is helpful because there are 1.609 kilometres in a mile (1.609344 exactly), which differs from the golden ratio by only about 0.54 percent.
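The convergence of the ratios can be checked directly. A minimal sketch in Python, assuming the sequence starts 1, 1 as above and using the exact definition of the international mile (1 mile = 1.609344 km); the function name is illustrative:

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio, 1.6180339887...
KM_PER_MILE = 1.609344         # exact definition of the international mile

def fibonacci(n_terms):
    """Return the first n_terms of the Fibonacci sequence, starting 1, 1."""
    terms = [1, 1]
    while len(terms) < n_terms:
        terms.append(terms[-1] + terms[-2])
    return terms[:n_terms]

fib = fibonacci(12)
for prev, curr in zip(fib, fib[1:]):
    print(f"{curr:4d} / {prev:<4d} = {curr / prev:.4f}")   # ratio of each term to the one before

gap = 100 * (PHI - KM_PER_MILE) / KM_PER_MILE
print(f"golden ratio {PHI:.4f} vs {KM_PER_MILE} km per mile: {gap:.2f}% apart")
```

By the twelfth term (144/89) the ratio already agrees with the golden ratio to four significant figures.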

Therefore, to quickly convert from miles to kilometres, find the distance in miles in the Fibonacci sequence and take the next number in the sequence: 21 miles is about 34 kilometres, 34 miles is about 55 kilometres, and so on. To convert from kilometres to miles, take the previous term instead.
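The rule of thumb can be tabulated against the exact conversion. In this sketch, `fibonacci_miles_to_km` is a hypothetical helper that pairs each Fibonacci number (read as miles) with the term after it (read as kilometres):

```python
KM_PER_MILE = 1.609344   # exact definition of the international mile

def fibonacci_miles_to_km(limit):
    """Pair each Fibonacci number up to `limit` (miles) with the next term (approximate km)."""
    a, b = 1, 2
    pairs = []
    while a <= limit:
        pairs.append((a, b))
        a, b = b, a + b
    return pairs

for miles, km_estimate in fibonacci_miles_to_km(100):
    exact = miles * KM_PER_MILE
    error = 100 * (km_estimate - exact) / exact
    print(f"{miles:3d} miles -> {km_estimate:3d} km (exact {exact:6.2f} km, error {error:+.1f}%)")
```

For very small distances the shortcut is rough (it turns 1 mile into 2 km), but from 5 miles upwards the estimates stay within about 1 percent of the exact value. Reading each pair from right to left gives the kilometres-to-miles direction.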

Bent Spears, Broken Arrows and Empty Quivers

In my research for a previous post I came across the US’s official list of nuclear weapons-related codewords, and they are some of my favourite codewords ever.

PINNACLE is a codeword (technically a flagword) that indicates that a message is of interest to the major command units of the military. It’s mentioned here because, whilst it can be used on its own, it is often (and sometimes must be) used in combination with the codewords listed below.

BENT SPEAR is used to report incidents involving nuclear weapons that are “of significant interest” but which are not categorised as NUCFLASH or BROKEN ARROW. The incident in which six AGM-129 cruise missiles armed with live 150-kiloton W80-1 nuclear warheads (which were supposed to have been removed) were loaded onto a B-52 Stratofortress bomber at Minot Air Force Base, flown to Barksdale Air Force Base and left unguarded at both bases was classified as a BENT SPEAR.

NUCFLASH is used to report incidents that could create a risk of nuclear war. This includes any incident involving the actual or possible detonation of a nuclear weapon, or any incident in which a nuclear-armed or nuclear-capable aircraft deviates from its approved flight plan. It also covers incidents with the possibility of, or the appearance of, a nuclear detonation or attack, such as a ballistic missile launch, the presence of cruise missiles on non-friendly aircraft that are not on an approved flight path, or objects from space reentering Earth’s atmosphere. A PINNACLE NUCFLASH report has the highest priority of any report in the US military.

BROKEN ARROW is used to report incidents involving US nuclear weapons that do not create the risk of nuclear war. This includes the nuclear or non-nuclear detonation of a US nuclear weapon, the burning or jettisoning of a nuclear weapon or radioactive contamination or other hazard from a US nuclear weapon. The incident in which a nuclear-armed Titan-II missile caught fire and exploded in its silo was classified as a BROKEN ARROW, as were a number of incidents in which B-52 bombers carrying nuclear weapons crashed. (The incident in the 1996 movie Broken Arrow would actually have been classified as EMPTY QUIVER.)

EMPTY QUIVER is used to report the seizure, theft or loss of a nuclear weapon. The sinking of the submarine USS Scorpion with two eleven-kiloton Mark 45 nuclear torpedoes aboard, and the incident in which an A-4E Skyhawk aircraft carrying a one-megaton B43 bomb fell over the side of the aircraft carrier USS Ticonderoga, would probably both be classified as EMPTY QUIVER events.

DULL SWORD is used to report minor incidents involving nuclear weapons or systems which could impair their ability to be deployed. This includes damage to systems capable of carrying or deploying nuclear weapons but which are not carrying nuclear weapons at the time. FADED GIANT is used to report incidents involving military nuclear reactors, or any other military radiological incident that does not involve nuclear weapons.

Two less cool-sounding codewords are EMERGENCY EVACUATION and EMERGENCY DISABLEMENT. The EMERGENCY EVACUATION codeword is used when nuclear weapons have to be removed from their approved location at short notice, without advance planning, e.g. if an Air Force base or silo holding nuclear weapons was being overrun by enemy forces. EMERGENCY DISABLEMENT refers to the use of the weapon’s command disable system, in which a warhead is deliberately made inoperable, preventing its use by enemy forces. The method by which this is achieved is not publicly known, but it is thought to work by destroying the warhead’s power supply, the sensitive electronic components within the warhead, or another part of the warhead’s triggering system.