News from the world of maths: The mystery of Zipf

Thursday, August 21, 2008

Zipf's law arose out of an analysis of language by linguist George Kingsley Zipf, who theorised that given a large body of language (that is, a long book — or every word uttered by Plus employees during the day), the frequency of each word is close to inversely proportional to its rank in the frequency table. We thought we would test this out on Plus. What does this imply about how we use language and how it evolved?
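To make the claim concrete: if frequency is inversely proportional to rank, then rank multiplied by count should stay roughly constant down the frequency table. The following is a minimal sketch of how one might check that on any piece of text. The function name `rank_frequency`, the simple tokenising rule and the sample sentence are illustrative assumptions, not the method used for the Plus test.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count, rank * count) rows, most frequent word first.

    Hypothetical helper: words are taken to be runs of letters and
    apostrophes, lower-cased. Real texts may need better tokenising.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        # Under Zipf's law, rank * count is roughly the same for every row.
        rows.append((rank, word, count, rank * count))
    return rows

# A toy sample; a meaningful test needs a much larger body of text.
sample = "the cat and the dog and the bird saw the fish"
for row in rank_frequency(sample)[:3]:
    print(row)
```

On a short sample the products will be noisy; the pattern is only expected to show up in a large body of text.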

At 3:41 PM, Blogger Cassandra said...

Is it really a mystery? I have at least one idea off the top of my head.

One of the ways you can construct power law distributed networks (competitive scale networks) is through growth/decay rules: for example, the next added link will have the highest probability of connecting to the node with the highest degree, that is, the most existing connections. Thinking a little about how language evolves by adopting and abandoning words, it seems likely that word frequency could follow a power law because words are added to and removed from the language over time under a similar set of rules (at some level).

The only question is what exactly such network nodes and their degrees map to.

Nodes seem to map to words, or perhaps to the idea represented by the word, or to word-sounds or word-ideas. If the nodes map to ideas, then there is also a link to memes and various mind-external scale-free structures.

Nodal degree seems to be related to usage of the word: either simply the frequency of usage or something deeper that results in that frequency.
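The growth rule described in this comment can be tried out in a few lines. The sketch below takes the simplest mapping suggested above: each node is a word and its degree is how often the word has been used so far. It uses a Simon-style "rich get richer" process, which is a swapped-in stand-in rather than the commenter's exact rule: it models only growth, not the abandoning of words, and the function name `simulate_word_growth` and the parameter `p_new` are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_word_growth(n_tokens, p_new=0.1, seed=0):
    """Generate n_tokens word uses and return a Counter of uses per word.

    Assumed rule: with probability p_new a brand-new word is coined;
    otherwise an earlier token is copied at random, so a word is reused
    with probability proportional to how often it has already been used.
    """
    rng = random.Random(seed)
    tokens = [0]              # the "text" so far, as a list of word ids
    next_word_id = 1
    for _ in range(n_tokens - 1):
        if rng.random() < p_new:
            tokens.append(next_word_id)   # adopt a new word
            next_word_id += 1
        else:
            # Picking a random earlier token favours frequent words.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)

counts = simulate_word_growth(20000)
ranked = sorted(counts.values(), reverse=True)
for rank in (1, 10, 100):
    if rank <= len(ranked):
        print(rank, ranked[rank - 1])
```

If the idea in the comment is right, the printed counts should fall off steeply with rank, much as word counts do in real text. Whether the fall-off matches Zipf's inverse proportionality depends on `p_new` and on the rules for removing words, which this sketch leaves out.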



