
Soundex

Soundex is a system for indexing names by sound. It was designed so that homophones (words that sound the same but are spelt differently) resolve to the same encoding. For example, the names Reid and Reed are both encoded as R300, and McDonald and Macdonald are both encoded as M235.

To create a Soundex:

  1. The first letter of the Soundex is the first letter of the name.
  2. Then remove all vowels, and all occurrences of y, h and w, from the rest of the name. (Where they stood still matters for step 4.)
  3. The remaining letters are encoded one-by-one according to their place of articulation, i.e. where in the mouth or throat the sound is formed.
    1. The labial consonants b, f, p and v, which are formed by the lips, are coded as a one.
    2. The guttural and sibilant consonants, c, g, j, k, q, s, x and z, which are formed at the back of the throat and with the tongue close to the roof of the mouth respectively, are coded as a two.
    3. The dental consonants, d and t, which are formed by the tongue against the teeth, are coded as a three.
    4. The long liquid consonant l is encoded as a four.
    5. The nasal consonants, m and n, in which air escapes through the nose, are encoded as a five.
    6. The short liquid consonant r is encoded as a six.
  4. If two letters that are encoded as the same number are next to each other (e.g. the d and t in Schmidt) then the encoding is used only once.
    1. If two letters that are encoded as the same number are separated by a y, h or w then the encoding is used only once.
    2. If two letters that are encoded as the same number are separated by a vowel then the encoding is used twice.
  5. The letters are encoded one-by-one until three digits are produced; any remaining letters are ignored. If the name is too short, the Soundex is padded with zeroes.
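
The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation. The names `CODES`, `VOWELS` and `soundex` are made up for this example. It also makes one assumption the steps do not spell out: the first letter's own number counts for rule 4, so the c in Schmidt is absorbed by the S.

```python
# Number for each consonant, grouped as in step 3.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        CODES[letter] = digit

VOWELS = set("aeiou")


def soundex(name: str) -> str:
    """Return a four-character Soundex code for an unaccented name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    # Assumption: the first letter's number also counts as "previous".
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in VOWELS:
            # A vowel separates equal numbers, so the next one is kept.
            previous = ""
            continue
        digit = CODES.get(ch)
        if digit is None:
            # y, h and w are dropped and do not separate equal numbers.
            continue
        if digit != previous:
            result += digit
            if len(result) == 4:
                break
        previous = digit
    # Pad short codes with zeroes.
    return result.ljust(4, "0")


print(soundex("Reid"), soundex("Reed"))           # R300 R300
print(soundex("McDonald"), soundex("Macdonald"))  # M235 M235
```

Rather than deleting the vowels up front, the sketch walks through the name once and uses `previous` to remember whether a vowel has come between two letters with the same number, which is what rule 4 needs.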

If we use the example of Macdonald from above:

  1. First letter is M.
  2. Removing the vowels leaves us with Mcdnld.
  3. c is encoded as two, giving us M2.
  4. d is encoded as three, giving us M23.
  5. n is encoded as five, giving us M235. We now have three digits, so the remaining l and d are ignored.

Adjective Order in English

Adjectives in English follow a certain order. This is why “That’s a beautiful white house” sounds correct, but “That’s a white beautiful house” does not. In each pair of example sentences below, the first follows the order and the second breaks it.

The order of adjectives begins with opinions: “beautiful”, “nice”, “great”, etc.

It’s a great car.

After opinions comes size: “big”, “small”, “long”, etc.

It’s a great small car.

It’s a small great car.

After opinions and size comes age: “new”, “old”, “ancient”, etc.

It’s a great small old car.

It’s an old great small car.

(Apologies for how clunky the sentences get beyond here. In English you don’t normally describe objects with quite so many adjectives!)

After opinions, size and age comes shape: “rectangular”, “circular”, “boxy”, etc.

It’s a great small old curvy car.

It’s a curvy great small old car.

After opinions, size, age and shape comes colour: “red”, “blue”, “green”, etc.

It’s a great small old curvy blue car.

It’s a blue great small old curvy car.

After opinions, size, age, shape and colour comes material: “leather”, “brick”, “wood”, etc.

It’s a great small old curvy blue metal car.

It’s a metal great small old curvy blue car.

After opinions, size, age, shape, colour and material comes (geographical) origin: “British”, “Spanish”, “Roman”, etc.

It’s a great small old curvy blue metal British car.

It’s a British great small old curvy blue metal car.

Finally, after opinions, size, age, shape, colour, material and origin comes purpose, i.e. what the thing is for: “racing”, “sleeping”, “cooking”, etc.

It’s a great small old curvy blue metal British racing car.

It’s a racing great small old curvy blue metal British car.

Any combination that doesn’t have the adjectives in the correct order ends up looking weird. Of the sentences below, only the first is in the correct order.

It’s a fantastic big new red American house.

It’s a fantastic American big new red house.

It’s a big new American red fantastic house.

It’s a red fantastic American big new house.

It’s an American new red big fantastic house.
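
The ordering described above can be sketched as a sort over category ranks. This is only an illustration. The category order is the one given in this post, while `ORDER`, `CATEGORY` and `order_adjectives` are hypothetical names, and the tiny lexicon covers only words from the example sentences.

```python
# Category order as described in this post, earliest first.
ORDER = ["opinion", "size", "age", "shape",
         "colour", "material", "origin", "purpose"]

# Hypothetical mini-lexicon built from the example sentences.
CATEGORY = {
    "great": "opinion", "fantastic": "opinion", "beautiful": "opinion",
    "small": "size", "big": "size",
    "old": "age", "new": "age",
    "curvy": "shape",
    "blue": "colour", "red": "colour", "white": "colour",
    "metal": "material",
    "British": "origin", "American": "origin",
    "racing": "purpose",
}


def order_adjectives(adjectives):
    """Sort adjectives by the rank of their category.

    Raises KeyError for any word missing from CATEGORY.
    """
    return sorted(adjectives, key=lambda adj: ORDER.index(CATEGORY[adj]))


words = ["red", "fantastic", "American", "big", "new"]
print("It's a " + " ".join(order_adjectives(words)) + " house.")
# It's a fantastic big new red American house.
```

A real system would need a far larger lexicon and some way of handling words that belong to more than one category, so treat this as a toy.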

Not all languages impose a fixed order on adjectives. For example, in Polish it doesn’t matter what order the adjectives are in: the Polish equivalents of “What a wonderful small blue bag!” and “What a blue small wonderful bag!” would sound just as “correct” as each other.