Add new comment


It might be expected, prima facie, that roughly the same number of surnames in a sample would begin with each letter of the Roman alphabet and that the proportions of surnames categorised by their initial letters would be approximately uniform and equal to 1/26.
However, for many kinds of alphabetic data, the distribution of initials is skewed. A mathematical relationship (known as Benford's law for numeric data) seems to hold when adapted to model alphabetic data.
See regarding numeric data.
Using logs with base 27, the expected proportion (P) of surnames beginning with any letter is P = log[(n+1)/n], where 0 < n < 27 is the alphabetic rank of the letter and the cumulative function of P = log[(n+1)/n] is Sum(P) = log(n+1).
This model indicates a probability that 33% of a sample of surnames will begin with either A or B and that 67% of the surnames in that sample can be expected to begin with one of the eight letters from A to H.
A generalised version of this law would not work for truly random sets of data. It would work best for data that are neither completely random nor overly constrained, but rather lie somewhere in between. These data could be wide ranging and would typically result from several processes with many influences.
Michael Mernagh, Cork, Ireland. February 17, 2011.

Filtered HTML

  • Web page addresses and email addresses turn into links automatically.
  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • Want facts and want them fast? Our Maths in a minute series explores key mathematical concepts in just a few words.

  • What do chocolate and mayonnaise have in common? It's maths! Find out how in this podcast featuring engineer Valerie Pinfield.

  • Is it possible to write unique music with the limited quantity of notes and chords available? We ask musician Oli Freke!

  • How can maths help to understand the Southern Ocean, a vital component of the Earth's climate system?

  • Was the mathematical modelling projecting the course of the pandemic too pessimistic, or were the projections justified? Matt Keeling tells our colleagues from SBIDER about the COVID models that fed into public policy.

  • PhD student Daniel Kreuter tells us about his work on the BloodCounts! project, which uses maths to make optimal use of the billions of blood tests performed every year around the globe.