In the International System of Units there are standard prefixes, based on powers of ten, used to indicate multiplication or division: kilo- to indicate multiplication by one thousand (10³), mega- to indicate multiplication by one million (10⁶), giga- to indicate multiplication by one billion (10⁹), and so on.
But computer scientists don’t like powers of ten. The most basic unit of digital storage, the bit, is represented either as a one or a zero (with eight bits making a byte) and thus computer scientists are much happier working in binary, with powers of two rather than powers of ten. Standard binary prefixes do exist: kibi- for 2¹⁰, mebi- for 2²⁰, gibi- for 2³⁰, etc.
SI Unit | Size /B | Binary Unit | Size /B |
kilobyte (kB) | 1000 | kibibyte (KiB) | 1024 |
megabyte (MB) | 1 000 000 | mebibyte (MiB) | 1 048 576 |
gigabyte (GB) | 1 000 000 000 | gibibyte (GiB) | 1 073 741 824 |
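The values in the table follow directly from the two definitions: the nth SI prefix multiplies by 1000ⁿ and the nth binary prefix by 1024ⁿ. As a minimal sketch (the function name `prefix_sizes` and the list names are made up for illustration), the same rows, extended to tera- and peta-, can be generated like this:

```python
# Prefix symbols in ascending order; illustrative names, not from any library.
SI_PREFIXES = ["k", "M", "G", "T", "P"]
BINARY_PREFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi"]


def prefix_sizes():
    """Return (SI unit, size in bytes, binary unit, size in bytes) rows."""
    rows = []
    for n, (si, binary) in enumerate(zip(SI_PREFIXES, BINARY_PREFIXES), start=1):
        # The nth SI prefix is 1000**n bytes; the nth binary prefix is 1024**n.
        rows.append((si + "B", 1000 ** n, binary + "B", 1024 ** n))
    return rows


for si_unit, si_size, bin_unit, bin_size in prefix_sizes():
    print(f"{si_unit:>3} {si_size:>25,} | {bin_unit:>3} {bin_size:>25,}")
```

Python integers are arbitrary precision, so the larger prefixes come out exactly rather than as rounded floats.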
The problem is that barely anyone uses the standard binary prefixes. During the “kilobyte era”, because 1000 and 1024 aren’t much different (2.4%), the difference was mostly ignored. But as file and hard disk drive (HDD) sizes have increased, the gap between the two definitions has become more noticeable.
HDD manufacturers have stuck with SI (10ⁿ) sizes, whilst operating systems calculate sizes in binary but incorrectly label them with SI prefixes. A 256 gigabyte hard drive (i.e. one containing 256 billion bytes) will be reported by an operating system as being only 238 GB in size (strictly, 238 GiB), a 6.9% difference. As HDDs become ever larger the problem will get worse: at the terabyte level the difference is 9.1% and at the petabyte level it is 11.2%.
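The percentages quoted here can be checked with a few lines of arithmetic. This is only a sketch: `reported_size` and `shortfall_percent` are hypothetical helper names, and the shortfall is assumed to be measured relative to the advertised (SI) size.

```python
def reported_size(advertised_bytes, power):
    """Size as shown by a system that divides by 1024**power."""
    return advertised_bytes / 1024 ** power


def shortfall_percent(power):
    """Percentage by which the binary figure falls short of the SI figure."""
    return (1 - 1000 ** power / 1024 ** power) * 100


# A drive sold as 256 GB holds 256 * 1000**3 bytes.
print(round(reported_size(256 * 1000 ** 3, 3)))  # 238

for name, power in [("giga", 3), ("tera", 4), ("peta", 5)]:
    print(f"{name}: {shortfall_percent(power):.1f}%")
# giga: 6.9%
# tera: 9.1%
# peta: 11.2%
```

Measured the same way, the kilobyte shortfall is about 2.3%; the 2.4% quoted for the kilobyte era is the excess of 1024 over 1000, which is the same gap viewed from the other side.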
Persuading operating systems to alter the way they report file sizes, confusing users in the process, is unlikely to be a successful approach. A far better approach would be to persuade HDD manufacturers to change their marketing so that users purchasing an HDD receive the size they are expecting.*
* Though obviously, as a physicist, it causes me great mental anguish to abuse SI units in this fashion!