Think you need a lab to do population genetics research? Chances are, you already have everything you need for basic population structure analysis right at home. All you really need is a phonebook, a set of equations, and either a pencil, paper and a calculator or, preferably, a computer with a good spreadsheet program. What follows is a very short introduction to using surnames as genetic markers in population and kinship studies.
In our culture, as in other European-derived societies, surnames are inherited from the father. As such, they behave as a system analogous to Y-chromosome inheritance, which has been shown to reflect actual genetic variation within and between populations. Biological anthropologists have used surnames for decades as measures of genetic relationship, and medical geneticists have used them to study genetic links to various diseases, such as certain cancers.
There are two basic types of isonymy (literally "same name") studies: random and non-random. Non-random studies are generally geared toward analyzing inbreeding levels in a population and usually center on how often surnames repeat among married couples. This was the technique George Darwin (son of Charles Darwin) employed in his landmark study of inbreeding in first-cousin marriages in England (Darwin, 1875).
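To make the non-random approach a little more concrete, here is a minimal sketch that measures surname repetition among married couples as the fraction of couples whose two birth surnames match. It assumes you have a list of (husband's surname, wife's maiden surname) pairs, for example transcribed from marriage records. The function name marital_isonymy and the sample records are hypothetical, and this raw proportion is only the starting point for an inbreeding analysis, not a reproduction of Darwin's full method.

```python
from typing import Iterable, Tuple


def marital_isonymy(couples: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical helper: fraction of couples whose birth surnames match.

    Each couple is a (husband_surname, wife_maiden_surname) pair. Names are
    compared after trimming whitespace and ignoring case; this is an
    assumption, and real studies may also merge spelling variants.
    """
    pairs = list(couples)
    if not pairs:
        raise ValueError("need at least one couple")
    same = sum(1 for a, b in pairs if a.strip().lower() == b.strip().lower())
    return same / len(pairs)


# Hypothetical marriage records: 1 of the 4 couples is isonymous.
records = [
    ("Anderson", "Anderson"),
    ("Smith", "Jones"),
    ("Brown", "Anderson"),
    ("Taylor", "Smith"),
]
print(marital_isonymy(records))  # 0.25
```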
Random isonymy, by contrast, involves estimating kinship within a population from the repetition of surnames throughout that population. Calculating the random isonymy of a single population, or between two populations, is the first step in any kinship study. Unfortunately, there is no single standard formula, because many researchers have worked out their own methods of calculating random isonymy. One common method for calculating isonymy within a single population is:
$$I_{ii} = \frac{\sum_k n_{ik}\,(n_{ik} - 1)}{N_i\,(N_i - 1)}$$
Here I_ii is the random isonymy, and n_ik is the number of occurrences of a particular surname k within population i (say, Anderson). Multiply that count by the same count minus 1: if Anderson occurs 6 times in the population, you compute 6 × 5 = 30. Repeat this for every surname in the population. Surnames that occur only once drop out of the equation, since 1 × 0 is always 0. Then add up the results of all these calculations, which is what the summation sign (∑) indicates. Moving to the bottom of the equation, N_i is the total number of people in the population, that is, the total count of surname entries with repeats included, not the number of distinct surnames. Multiply N_i by N_i − 1, then divide the numerator by the denominator, and you have the random isonymy for that population. The result is the probability that two people drawn at random from the population share the same surname.
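If you would rather not do the tallying by hand or in a spreadsheet, the calculation described above fits in a few lines of Python. This is a minimal sketch, assuming you have already transcribed one surname per person (for example, from a phonebook) into a list; the function name random_isonymy and the ten-person population are hypothetical.

```python
from collections import Counter
from typing import Iterable


def random_isonymy(surnames: Iterable[str]) -> float:
    """Hypothetical helper: I_ii = sum_k n_ik (n_ik - 1) / (N_i (N_i - 1)).

    `surnames` lists one surname per person, so repeats are expected.
    N_i is the total number of entries, not the number of distinct names.
    """
    counts = Counter(surnames)    # n_ik for each surname k
    total = sum(counts.values())  # N_i
    if total < 2:
        raise ValueError("need at least two people")
    # Surnames seen only once contribute 1 * 0 = 0 to the numerator.
    numerator = sum(n * (n - 1) for n in counts.values())
    return numerator / (total * (total - 1))


# Hypothetical population of 10 people:
# Anderson x6, Brown x2, Clark x1, Davis x1.
population = ["Anderson"] * 6 + ["Brown"] * 2 + ["Clark", "Davis"]
print(random_isonymy(population))  # (30 + 2) / (10 * 9) = 32 / 90, about 0.356
```

Counter does the per-surname tally, so the only real work is the sum of n × (n − 1) products and the final division, exactly as in the hand calculation.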
From the random isonymy score, you can calculate other measures of kinship within a population. The simplest is Lasker's coefficient of relationship (which corresponds to Wright's coefficient of relationship), obtained by dividing random isonymy by 2 (Lasker, 1977). Divide random isonymy by 4 instead and you have the coefficient of inbreeding. These are the very basic elements of surname genetics, which I hope to explore in greater depth over the course of this blog.
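Both conversions in the paragraph above are simple divisions of the random isonymy score. Here is a minimal sketch, using hypothetical function names and an assumed random isonymy value of 0.02 chosen purely for illustration.

```python
def lasker_relationship(isonymy: float) -> float:
    """Hypothetical helper: Lasker's coefficient of relationship, I / 2."""
    return isonymy / 2


def inbreeding_from_isonymy(isonymy: float) -> float:
    """Hypothetical helper: coefficient of inbreeding, I / 4."""
    return isonymy / 4


isonymy_value = 0.02  # assumed random isonymy, not real data
print(lasker_relationship(isonymy_value))      # 0.01
print(inbreeding_from_isonymy(isonymy_value))  # 0.005
```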
Darwin, G. H. 1875. Marriages between first cousins in England and their effects. Journal of the Statistical Society of London, 38: 153–184.
Lasker, G. W. 1977. A coefficient of relationship by isonymy: a method for estimating the genetic relationship between human populations. Human Biology, 49: 489–493.