Mitochondrial DNA

SMGF has built the world’s foremost collection of mitochondrial DNA data and corresponding genealogies. Currently, it contains data from more than 75,000 people throughout the world.

Both men and women have mtDNA, which they inherit from their mothers, who in turn inherited it from their own mothers. This makes mtDNA an excellent tool for maternal-line research.

Searching by mitochondrial DNA allows us to find cousins who may be connected across generations and around the world.

Individuals whose mtDNA matches exactly may share a common maternal ancestor, though without the genealogical information found in pedigree charts it can be difficult to determine how long ago that ancestor lived.
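As a minimal sketch of what an exact match means, the snippet below treats each person's mtDNA result as a set of differences from a reference sequence and reports the pairs whose sets are identical. The sample names, the mutation lists, and the shorthand notation (`263G`, `315.1C`) are illustrative assumptions; this is not a description of how the SMGF search is actually implemented.

```python
from itertools import combinations

# Hypothetical profiles: each person is described by the set of differences
# between their mtDNA and a reference sequence.
profiles = {
    "person_a": {"73G", "263G", "315.1C", "16519C"},
    "person_b": {"263G", "315.1C", "16519C", "73G"},
    "person_c": {"263G", "315.1C"},
}


def exact_matches(profiles):
    """Return the sorted name pairs whose mutation sets are identical."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if profiles[a] == profiles[b]
    ]


def differences(profile_x, profile_y):
    """Return the mutations present in only one of the two profiles."""
    return profile_x ^ profile_y


print(exact_matches(profiles))
print(sorted(differences(profiles["person_a"], profiles["person_c"])))
```

An exact match only says that two maternal lines carry the same mutations; the pedigree charts are still needed to place the shared ancestor in time.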

Searching for Mitochondrial DNA

Mitochondrial DNA test results help you identify the haplogroup to which your direct maternal ancestors belonged many generations ago. For example, a family from Southern Germany who do not know who their earliest female ancestor was could use the test to determine whether she belonged to one of the mitochondrial haplogroups that are common in Southern Germany.

Based on where the family is from, this test will help determine whether the earliest female ancestor belonged to one of these haplogroups.

A mitochondrial haplogroup is a large group of people who have inherited the same genetic mutation on their mitochondrial DNA from a common female ancestor. The point where all members of a haplogroup descend from one woman is called the root; the most recent woman from whom all living people descend through the maternal line is called Mitochondrial Eve. It’s important to note that mtDNA changes only rarely, through occasional mutations. It is passed from a mother to all of her children, but only daughters pass it on, and thus it can be used to trace direct maternal lines. It can also be used to estimate approximately when and where Mitochondrial Eve lived.

Lineage Map

Because mitochondria are inherited maternally, this test traces only your direct maternal line. This also means that you will be able to determine only your own maternal haplogroup, not the haplogroups of any other lines in your genealogy. Nor will this test tell you where your family is from; it only helps you find out (roughly) when and where Mitochondrial Eve existed.
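To make the idea of a direct maternal line concrete, here is a small sketch that follows only the mother links in a pedigree. The pedigree, the names in it, and the helper `maternal_line` are hypothetical and exist only for illustration.

```python
# Hypothetical pedigree: each person maps to (mother, father);
# None marks a parent who is not known.
pedigree = {
    "you": ("mother", "father"),
    "mother": ("maternal_grandmother", "maternal_grandfather"),
    "father": ("paternal_grandmother", "paternal_grandfather"),
    "maternal_grandmother": ("great_grandmother", None),
}


def maternal_line(person, pedigree):
    """Follow mother links from person up to the earliest known maternal ancestor."""
    line = [person]
    while person in pedigree and pedigree[person][0] is not None:
        person = pedigree[person][0]
        line.append(person)
    return line


print(maternal_line("you", pedigree))
```

Every other ancestor in the pedigree, including the father and both grandfathers, is skipped, which is why the test says nothing about their haplogroups.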

Lineages of mtDNA are important for genealogical applications, but they are difficult to identify because the hypervariable D-loop region makes up only ~1% of the mitochondrial genome. Fortunately, there are sequence features in the coding regions that can be used to assign lineages. The presence or absence of 939R and 10398H becomes fixed in descendants, allowing these mutations to be used as lineage markers. When combined with coding-region SNPs, they can also give clues about the ethnic origin of an individual.

Clinical observations are consistent with the existence of subpopulations of mtDNA molecules carrying either 939R or 10398H, and it is likely that there were at least two independent mutational events. The first mutation, 939R, must have occurred before the division of the A and B subpopulations, which is estimated at ~200 000 years ago (Krings et al., 1999). The observation of 939R-negative humans implies that it postdates the divergence of Homo and Pan and gives it a firm lower bound of 120 000 years. The second mutation, 10398H, is harder to date, but we can be certain that it postdates the separation of European and Near Eastern lineages because this mutation occurs in all European mtDNAs.

To obtain a better estimate of the age of 10398H, we developed a Bayesian phylogeographic analysis that incorporates information derived from coding-region SNPs. The method uses a coalescent model of sequence evolution to estimate both population parameters, such as effective population size and divergence times, and geographic parameters, such as ancestral location. We used the sequences obtained in this study together with HVS1 sequences from the literature to infer the parameters of the model. Taking the divergence of the Near Eastern and European lineages at around 45 000 years ago gave an estimate of ~10 400 years for the age of 10398H, in good agreement with an independent study that used HVS1 diversity to conclude that this mutation occurred ~10 000 years ago (Torroni et al., 1998).

Maternal Surnames

Written records of maternal surnames began in the 1600s and are most often found in church records. These documents played a crucial role in forming this type of genealogy.

Children were often given their father’s surname. When this was not possible, they took their mother’s surname instead.

The use of maternal surnames is extremely common in Portuguese genealogy, where the patronymic naming system adds another layer to a person’s name. People are identified by three parts: first name, middle name or patronymic, and last name. For example, João Carlos Pereira is a man whose first name is João, whose patronymic is Carlos, and whose last name is Pereira.
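The three-part structure can be sketched as a small parser. The function `parse_three_part_name` and the `ParsedName` type are hypothetical helpers, and the sketch assumes a name of exactly three words; real records often contain longer names or particles such as "de", which it does not handle.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    first_name: str
    middle_or_patronymic: str
    last_name: str


def parse_three_part_name(full_name: str) -> ParsedName:
    """Split a name of exactly three words into first, middle/patronymic, last."""
    parts = full_name.split()
    if len(parts) != 3:
        raise ValueError(f"expected three name parts, got {len(parts)}: {full_name!r}")
    first, middle, last = parts
    return ParsedName(first, middle, last)


print(parse_three_part_name("João Carlos Pereira"))
```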

During the Early Modern period in Portugal, it became common practice to use the mother’s maiden name as a middle name in an individual’s official documents. This use of the mother’s name is a distinctive characteristic of Portuguese naming.

A maternal surname can be a huge aid when attempting to link families together in this period because these surnames continue from generation to generation. There are many examples of individuals who share a last name but have different middle names because they had different mothers with different surnames.

In addition, the maternal surname is commonly used as a way to describe the relationship of one individual to another in official documents. As an example, if there is a man named José Maria and he has a son named João Maria, José Maria’s mother may be Maria Pereira and his son’s mother may be Maria Gonçalves. It is very common to see documents where people are referred to as “filho de/son of” their mother’s mother or father. This creates a step-by-step guide for genealogists in finding the family members they are looking for.

Mitochondrial DNA Demographics

We studied the phylogeography of 636 complete mitochondrial genomes from Africa, Asia, Europe and Oceania. We find haplogroup M to be among the oldest in Eurasia, with its highest frequency in South Asian populations that appear to have expanded out of an area roughly corresponding to Iran and the Himalayas.

A phylogeographically well-supported route of migration, running from India through Sri Lanka and further into Southeast Asia, is proposed. Our results also show a significant genetic connection between populations along the fringes of the Indian Ocean with a trend towards decreasing genetic distance from East Africa to South Asia.

In this study, we investigate the population genetic structure of mtDNA haplogroup M, which constitutes nearly 30% of maternal lineages outside Africa. It is likely that M first appeared in South or Southeast Asia because all other non-African haplogroups descend from L3, whose geographic distribution does not predominate in South Asia. L3’s southern reach includes the Arabian Peninsula, so M’s presence there cannot be attributed to back-migration from Africa. Nor can it be explained by a northern route out of Africa, which would have brought L3 through the Near East and into Central Asia. Thus, based on its present-day geographic distribution, M likely arose in South or Southeast Asia.

Haplogroup M was subdivided into six major clades (M1–M6) by Bayesian analysis of phylogeographic data for this haplogroup following the recent classification scheme, which proposes that M1 and M3 originated in Eastern Asia, whereas M2, M4 and M6 evolved from a common ancestor that likely arose somewhere within Western or Central Asia.

Mitochondrial Statistics

The following chart shows the number of sequences versus the percentage of completion for each sequence.

[Chart: number of sequences versus the percentage of completion for each sequence]
Sequence Statistics              Number
Total sequences                  75,406
Total unique mutations            2,127
Average mutations per sequence    10.54
Total insertions                144,982
Total substitutions             603,075
Total deletions                  42,196

Genealogy Statistics                Number
Total unique maternal ancestors    303,171
Total maternal ancestors           322,465
Total unique ancestors           2,694,224
Total ancestors                  8,923,025

Maternal Generation Interval

Generation Statistics            Years
Mean generation interval          27.9
Median generation interval          27
Modal generation interval           22

Number of mother/child pairs: 192,394
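The source does not say how these figures were computed. A minimal sketch, assuming the generation interval is the child's birth year minus the mother's birth year, could derive the mean, median, and mode as follows. The birth-year pairs are hypothetical and are not taken from the database.

```python
from statistics import mean, median, mode

# Hypothetical (mother_birth_year, child_birth_year) pairs.
pairs = [
    (1800, 1822),
    (1822, 1849),
    (1849, 1871),
    (1871, 1905),
    (1905, 1935),
]


def generation_intervals(pairs):
    """Return the mother's age, in years, at each child's birth."""
    return [child - mother for mother, child in pairs]


intervals = generation_intervals(pairs)
print("mean:", mean(intervals))
print("median:", median(intervals))
print("mode:", mode(intervals))
```

In the reported statistics the mode (22) lies below the median (27), which lies below the mean (27.9), the pattern expected when a minority of long intervals pulls the average upward.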

Top 50 Mutations

These are the 50 mutations that are found most frequently.

Location  Mutation  Type           Count
00315.1   C         Insertion     61,482
00263     A to G    Substitution  61,380
00073     A to G    Substitution  40,800
16519     T to C    Substitution  39,815
00309.1   C         Insertion     31,394
16223     C to T    Substitution  16,225
00152     T to C    Substitution  15,622
00195     T to C    Substitution  12,871
00523     A         Deletion      12,074
00524     C         Deletion      12,074
16189     T to C    Substitution  11,259
16311     T to C    Substitution  11,127
16126     T to C    Substitution   9,927
00146     T to C    Substitution   9,806
16362     T to C    Substitution   8,619
00489     T to C    Substitution   8,474
00309.2   C         Insertion      7,607
16294     C to T    Substitution   7,538
00150     C to T    Substitution   7,387
16278     C to T    Substitution   6,291
16129     G to A    Substitution   5,265
16183     A to C    Substitution   5,187
16270     C to T    Substitution   4,932
16304     T to C    Substitution   4,762
00295     C to T    Substitution   4,416
16069     C to T    Substitution   4,372
16193.1   C         Insertion      4,329
16224     T to C    Substitution   3,940
16298     T to C    Substitution   3,852
00524.2   C         Insertion      3,755
00524.1   A         Insertion      3,750
16093     T to C    Substitution   3,539
00204     T to C    Substitution   3,500
16172     T to C    Substitution   3,432
00462     C to T    Substitution   3,406
00185     G to A    Substitution   3,346
16256     C to T    Substitution   3,304
16192     C to T    Substitution   3,303
16390     G to A    Substitution   3,257
00499     G to A    Substitution   3,219
16319     G to A    Substitution   3,113
00189     A to G    Substitution   3,099
16261     C to T    Substitution   2,893
16217     T to C    Substitution   2,834
00153     A to G    Substitution   2,798
00228     G to A    Substitution   2,781
16325     T to C    Substitution   2,537
16296     C to T    Substitution   2,489
16290     C to T    Substitution   2,374
00207     G to A    Substitution   2,355
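To put these counts in proportion, the sketch below expresses a few of them as a percentage of the 75,406 sequences in the collection. It assumes that each count is the number of sequences carrying the mutation, which the source does not state explicitly; the helper `share_of_sequences` is hypothetical.

```python
TOTAL_SEQUENCES = 75_406  # "Total sequences" from the sequence statistics

# A few rows copied from the table: (location, mutation, type, count).
rows = [
    ("00315.1", "C", "Insertion", 61_482),
    ("00263", "A to G", "Substitution", 61_380),
    ("00073", "A to G", "Substitution", 40_800),
    ("00523", "A", "Deletion", 12_074),
]


def share_of_sequences(count, total=TOTAL_SEQUENCES):
    """Return the percentage of sequences that carry a mutation."""
    return 100 * count / total


for location, mutation, kind, count in rows:
    share = share_of_sequences(count)
    print(f"{location:<9} {mutation:<7} {kind:<13} {share:5.1f}%")
```

Under that assumption, the two most frequent entries each appear in roughly four out of five sequences.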