Molecular Genealogy (also called “Genetic Genealogy”) is the application of DNA to traditional genealogical research.
By using simple DNA tests and comparing the results, we can determine whether individuals may be genetic cousins who share a common ancestor.
Thousands of people have been able to extend their family trees across generations, continents and cultures through the power of Molecular Genealogy and through tools developed by SMGF.
DNA encodes the complete genetic blueprint of human beings. It’s what makes the billions of people on the planet unique, yet at the same time, genetically similar to their parents and ancestors. DNA is found in most cells in the human body, and can be classified into three types that are useful in genealogy:
- Y-Chromosome DNA (Y-DNA)
- Mitochondrial DNA (mtDNA)
- Autosomal DNA
Y-DNA is a type of DNA that is only carried by men, who inherit it from their fathers. This means that males with a common paternal ancestor have similar Y-DNA.
Y-DNA is particularly useful for tracing one’s direct paternal line (father, paternal grandfather, etc.) because it changes slowly from generation to generation, and in most societies, the surname of the father is also inherited by his sons.
SMGF is actively engaged in research on Y-DNA, Y-DNA results and genealogies. Go to Y-Chromosome DNA to learn more about Y-DNA.
mtDNA is a type of DNA that is carried by both males and females but is inherited only from the mother. This makes mtDNA useful for tracing one’s direct maternal line (mother, maternal grandmother, maternal great-grandmother, etc.).
SMGF is actively engaged in mtDNA research. Go to Mitochondrial DNA to learn more about mtDNA.
Autosomal DNA is the type of DNA responsible for most physical characteristics, such as height and eye color. Sons and daughters inherit autosomal DNA from both parents (and, through them, from all four grandparents and earlier ancestors).
SMGF can currently link autosomal DNA to genealogical information. It will give the world a new perspective on genetic genealogy.
Y-chromosome DNA (Y-DNA) is a type of DNA that is only carried by men and is only inherited from their fathers.
Men who share a common paternal ancestor will have virtually the same Y-DNA, even if that male ancestor lived many generations ago.
Women, on the other hand, do not have Y-DNA. They neither inherit it from their fathers nor pass it down to their sons. In other words, a grandson does not inherit Y-DNA from his mother’s father.
Women may still benefit from Y-DNA in their paternal line research when the DNA results of their father or brothers are compared with those of other males.
Y-Chromosome DNA Markers
The Y-chromosome contains about 59 million “base pairs,” each of which encodes one unit of genetic information. Looking at all of these base pairs is impractical, so geneticists have identified a number of specific chromosome locations that can be used for analysis and comparison.
These unique locations are generally called “markers” and when they occur on the Y-chromosome, they are typically given names starting with “DYS”.
At some Y-chromosome locations, there are small segments of base pairs that are repeated in the DNA. Markers with these types of repetitions are called “STR markers,” where STR means “Short Tandem Repeat.”
For instance, a particular genetic sequence at marker location DYS391 might be:
TCTA TCTA TCTA TCTA TCTA TCTA TCTA TCTA TCTA TCTA
Note there are ten repeats of the segment TCTA. The number of repeats is the “value” that is shown on a Y-DNA test report for the marker.
In this example, a lab report would show DYS391=10 for this marker.
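As a rough illustration of how a marker value relates to the underlying sequence, the sketch below counts the longest uninterrupted run of a repeated segment. The function name `count_tandem_repeats` and the flanking bases in the example are made up for illustration; actual laboratory analysis is considerably more involved.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping one motif length at a time while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical example: ten copies of TCTA between made-up flanking bases.
example_sequence = "GGA" + "TCTA" * 10 + "CCT"
dys391_value = count_tandem_repeats(example_sequence, "TCTA")
print(f"DYS391={dys391_value}")  # prints DYS391=10
```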
Markers DYS385, DYS459, and YCAII are often called “duplicate markers” since they have two values for each marker location. SMGF considers these locations as a single marker with two values, rather than two separate markers (e.g. DYS385a and DYS385b).
Y-Chromosome DNA Haplotypes
Y-DNA test reports typically show a series of markers and their corresponding values. These results are referred to as a “haplotype.” For instance, the sequence:
12 15 13 12 29 22 10 11 12 16 11 15 would be called a 12-marker haplotype.
In some ways, DNA marker values are like telephone numbers.
Consider that the same seven-digit telephone number, 428-1040, might appear in both Boston and Miami. However, adding more numbers (“area codes”) allows us to distinguish between regions. The same thing is true about DNA results.
If we compare a limited number of DNA markers (for example, 12), then it’s possible for two individuals to have the same marker values, yet not be closely related.
Testing for more markers helps avoid this possible ambiguity. In general, the more markers tested, the easier it is to distinguish individuals and family tree branches.
SMGF considers 36 markers to be a sufficient number of Y-chromosome markers for most genealogical research.
The order of Y-DNA markers may vary from one testing company to another, and slight differences in standards can appear confusing.
Males who share a common paternal ancestor will have virtually the same Y-chromosome DNA. We use the word “virtually” since occasionally there are small changes or “copy errors” that might occur with each descendant.
Those copy errors are called “mutations.” They are generally harmless, and they are useful for tracing one’s direct paternal line.
For example, let’s look at three first cousins who have the following haplotypes (where we show 22 out of 36 marker values):
11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 30 …
11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 30 …
11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 31 …
Note that the first two cousins have the same haplotype, but the third cousin has a difference of one marker value (31 instead of 30).
That difference would have been due to a mutation that occurred in his Y-DNA (or his father’s), but not in the other cousins.
In general, the greater the number of mutations we find between two males, the further in the past their common paternal ancestor lived.
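The comparison of the three cousins can be sketched in a few lines of code. This is only an illustration: the function name `differing_positions` is hypothetical, and it assumes both haplotypes list the same markers in the same order.

```python
def differing_positions(haplotype_a, haplotype_b):
    """Return the 1-based positions at which two haplotypes have different values."""
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must list the same markers in the same order")
    return [
        position
        for position, (a, b) in enumerate(zip(haplotype_a, haplotype_b), start=1)
        if a != b
    ]


# The 22 marker values shown for the three cousins in the text.
cousin_1 = [11, 14, 12, 13, 29, 24, 10, 13, 13, 14, 12, 15,
            12, 12, 13, 12, 12, 12, 14, 25, 19, 30]
cousin_2 = list(cousin_1)
cousin_3 = cousin_1[:-1] + [31]

print(differing_positions(cousin_1, cousin_2))  # prints []
print(differing_positions(cousin_1, cousin_3))  # prints [22]
```

The sketch counts how many markers differ, not how large each difference is, which matches the way matches are counted in this article.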
Matching Y-Chromosome DNA Results
Comparing Y-DNA results is similar to comparing telephone numbers.
For instance, if we look at the following two North American business telephone numbers:
… we immediately notice that 12 out of 13 numbers are identical, and would conclude that the two numbers are likely from the same business.
Likewise, if we look at two Y-DNA haplotypes (two sets of marker results), such as:
11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 30
11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 31
… we also notice that all the values are identical except for one, and would conclude the two participants might be related.
In general, the higher the percentage of matching markers, the closer two participants are likely to be related.
Matches of 34/36, 35/36, and 36/36 — and a common surname — generally indicate a common ancestor in the time that public records have been in existence.
Family trees often date back this far, and individuals with matches of 34/36 and higher may have intersecting ancestries.
Matches of 34/36 and higher — but a different surname — may indicate a surname change in one line or may be coincidence.
Matches of 32/36 or less generally indicate a connection before the widespread use of public records or even the advent of hereditary surnames in most countries.
Matches of 33/36 can be ambiguous and should generally be interpreted in the context of individual family history.
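The rules of thumb above can be written down as a small lookup. This is a sketch only: the function name is hypothetical, the thresholds are simply the ones quoted in this section for 36-marker comparisons, and the returned text is a summary rather than a conclusion about any particular family.

```python
def interpret_36_marker_match(matching_markers: int, same_surname: bool) -> str:
    """Summarize the article's rule of thumb for a 36-marker Y-DNA comparison."""
    if not 0 <= matching_markers <= 36:
        raise ValueError("matching_markers must be between 0 and 36")
    if matching_markers >= 34:
        if same_surname:
            return "common ancestor likely within the era of public records"
        return "possible surname change in one line, or coincidence"
    if matching_markers == 33:
        return "ambiguous; interpret in the context of individual family history"
    return "connection likely predates public records or hereditary surnames"


print(interpret_36_marker_match(35, same_surname=True))
print(interpret_36_marker_match(33, same_surname=True))
```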
Molecular genealogy is based on probabilities and, like weather forecasting, is not an exact science. It can provide important clues for family history research, but traditional genealogy methods remain an important part of molecular genealogy.
Time to Most Recent Common Ancestor (TMRCA)
For Y-DNA, a Most Recent Common Ancestor is defined as the closest direct paternal ancestor that two males have in common (such as a grandfather or g-g-g-grandfather).
In general, the closer the match in haplotypes between two individuals, the shorter the time back to a most recent common ancestor.
For instance, if two individuals share 35 out of 36 markers, they almost certainly share a more recent common ancestor than two individuals who share 32 out of 36 markers.
Calculating the Time to Most Recent Common Ancestor is based on probability and is not an exact science. We can identify the most likely time that a common ancestor might have lived, but there will always be a degree of uncertainty. For this reason, it is better to think of TMRCA as a range of time rather than a point in time.
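To make the idea of a range concrete, here is a simplified sketch of one way such an estimate could be computed. It is not SMGF's method. It assumes a hypothetical mutation rate of 0.002 per marker per generation, assumes every marker mutates independently at that same rate, ignores the chance that a later mutation hides an earlier one, and weighs every number of generations equally before looking at the data. All function and parameter names are illustrative.

```python
from math import comb


def mismatch_likelihood(generations, mismatches, markers=36, mutation_rate=0.002):
    """Probability of seeing `mismatches` differing markers if each person
    is `generations` generations removed from the common ancestor."""
    # A marker still matches only if it never mutated on either line of descent.
    p_same = (1.0 - mutation_rate) ** (2 * generations)
    p_diff = 1.0 - p_same
    return comb(markers, mismatches) * p_diff**mismatches * p_same ** (markers - mismatches)


def tmrca_range(mismatches, markers=36, mutation_rate=0.002,
                max_generations=200, coverage=0.9):
    """Return (low, most_likely, high) generations under the stated assumptions."""
    weights = [
        mismatch_likelihood(g, mismatches, markers, mutation_rate)
        for g in range(1, max_generations + 1)
    ]
    total = sum(weights)
    most_likely = max(range(1, max_generations + 1), key=lambda g: weights[g - 1])
    tail = (1.0 - coverage) / 2.0
    cumulative = 0.0
    low = high = None
    for g, weight in enumerate(weights, start=1):
        cumulative += weight / total
        if low is None and cumulative >= tail:
            low = g
        if cumulative >= 1.0 - tail:
            high = g
            break
    if high is None:  # guard against floating-point rounding at the upper end
        high = max_generations
    return low, most_likely, high


for k in (1, 4):
    low, best, high = tmrca_range(k)
    print(f"{36 - k}/36 match: most likely {best} generations, range {low} to {high}")
```

Under these assumptions a 35/36 match yields a more recent estimate than a 32/36 match, and in both cases the range is much wider than the single most likely value.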
We have learned that the term “haplotype” refers to a set of Y-chromosome STR marker values. For instance, 11 14 12 13 29 24 10 13 13 14 12 15 12 12 13 12 12 12 14 25 19 30 is a haplotype.
We have also learned that a mutation can occur at any STR marker location. For instance, a change from 30 to 31 in the above haplotype would be a mutation.
There is another type of DNA mutation which is called a Single Nucleotide Polymorphism (SNP). A SNP is a change in a single base pair on the DNA molecule.
SNPs on the Y-chromosome are very rare, but when they occur, they are passed down unchanged from father to son for literally hundreds of generations. SNPs can, in fact, be used to define entire populations of men.
Populations that have the same Y-SNPs are said to belong to the same haplogroup. Y-chromosome haplogroups are listed by letter – Haplogroup J or Haplogroup Q, for instance.
Most haplogroups can be subdivided into smaller groups. For example, Haplogroup R can be further divided into “R1a” and “R1b”.
Haplogroups are useful for population studies, but have somewhat limited application in genealogy. Since SMGF is focused primarily on the application of DNA testing to traditional genealogy, SMGF does not display haplogroups along with haplotypes.
Mitochondrial DNA (mtDNA) is a type of DNA that is carried by both men and women but is only inherited from their mother. Mothers, in turn, inherit their DNA from their mothers … and so on back in time along one’s maternal line.
Note that there is no contribution from your paternal line, or from any female ancestor other than those in your direct maternal line.
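The direct maternal line can be pictured as repeatedly following the "mother" link in a pedigree. The sketch below uses a made-up pedigree stored as a dictionary; the names and the `direct_maternal_line` function are hypothetical.

```python
def direct_maternal_line(person, pedigree):
    """Return the ancestors from whom `person` inherits mtDNA, nearest first."""
    line = []
    seen = {person}  # guards against accidental loops in the data
    current = pedigree.get(person, {}).get("mother")
    while current is not None and current not in seen:
        line.append(current)
        seen.add(current)
        current = pedigree.get(current, {}).get("mother")
    return line


# Made-up pedigree: each entry records a person's father and mother.
pedigree = {
    "you": {"father": "father", "mother": "mother"},
    "father": {"father": "paternal grandfather", "mother": "paternal grandmother"},
    "mother": {"father": "maternal grandfather", "mother": "maternal grandmother"},
    "maternal grandmother": {"father": None, "mother": "maternal great-grandmother"},
}

print(direct_maternal_line("you", pedigree))
# prints ['mother', 'maternal grandmother', 'maternal great-grandmother']
```

Nobody on the father's side, and no maternal grandfather, appears in the result, which is why mtDNA traces only one line of the family tree.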
Locations and Regions
Mitochondrial DNA (mtDNA) is a continuous circle of 16569 genetic bases, each appearing at a distinct marker location. Locations are designated by number from 00001 to 16569.
Locations on the mtDNA circle from 16001 to 00579 are most useful for recent genealogy research. This range of locations is called the “D-loop” or the “control region”. The D-loop can be divided into three regions called HVR1 (16024 to 16365), HVR2 (00073 to 00340), and HVR3 (00438 to 00574).
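Because mtDNA is a circle, the control region wraps around from the highest location numbers back to the lowest. A minimal sketch of looking up which region a location falls in, using the boundaries quoted above, might look like this. The function names are illustrative.

```python
MTDNA_LENGTH = 16569

# Region boundaries as quoted in the text (inclusive).
HV_REGIONS = {
    "HVR1": (16024, 16365),
    "HVR2": (73, 340),
    "HVR3": (438, 574),
}


def _check_location(location):
    if not 1 <= location <= MTDNA_LENGTH:
        raise ValueError(f"location must be between 1 and {MTDNA_LENGTH}")


def in_control_region(location):
    """True if the location lies in the D-loop, which wraps from 16001 round to 579."""
    _check_location(location)
    return location >= 16001 or location <= 579


def hv_region(location):
    """Return 'HVR1', 'HVR2' or 'HVR3', or None if the location is in none of them."""
    _check_location(location)
    for name, (first, last) in HV_REGIONS.items():
        if first <= location <= last:
            return name
    return None


print(hv_region(16184))          # prints HVR1
print(hv_region(73))             # prints HVR2
print(in_control_region(16400))  # prints True
print(hv_region(16400))          # prints None
```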
The table below displays the range of locations reported by various companies for each HV region. Please note that these ranges don’t always correspond to the technical HV Region definitions.
Mutations and Results
Each mtDNA location consists of a single base value of: A, C, T, or G. For instance, location 16073 might have a base value of “C”.
Bases at different locations can randomly change from one value to another. For instance, a C might change to a T. This type of change is called a mutation.
mtDNA mutations are very rare. For this reason, we don’t often see mtDNA mutations between maternal-line cousins in genealogical time (the last 500 years).
Most mtDNA tests cover a minimum of 400 locations, and SMGF tests over a thousand locations. Displaying base values at this many locations is impractical, so SMGF and other companies report base values based on the “CRS”.
The CRS (Cambridge Reference Sequence) is a set of mtDNA locations and base values that is universally used as a reference. Test reports then show the differences with the CRS. Locations with no differences to the CRS are not listed.
For example, an mtDNA report might list your mutations as 16184T and 16399G. This means that location 16184 in your mtDNA has the base value “T” and location 16399 has the base value “G”, and that both values differ from the CRS at those locations.
Some mtDNA reports will abbreviate the locations. For instance, 16184T may simply be reported as 184T. This will be the case when the test company only reports results from region HVR1.
For companies that also test HVR2, the second set of results will always be in the 00000 series, but may also be abbreviated. For instance, 00073G may simply be reported as 073G.
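The reporting conventions above can be sketched in code. The example below reads entries such as 16184T, expands the abbreviated HVR1 form, and lists only the locations that differ from a reference. The function names and the `hvr1_abbreviated` flag are hypothetical, and the tiny reference dictionary holds placeholder values for illustration, not the actual CRS.

```python
import re

_MUTATION = re.compile(r"^(\d{1,5})([ACGT])$")


def parse_mutation(text, hvr1_abbreviated=False):
    """Split an entry such as '16184T' into (location, base).

    Set `hvr1_abbreviated` only when the report is known to cover HVR1 alone,
    so that a three-digit entry such as '184T' is read as location 16184.
    """
    match = _MUTATION.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation entry: {text!r}")
    digits, base = match.groups()
    location = int(digits)  # leading zeros are dropped, so '073G' gives 73
    if hvr1_abbreviated and len(digits) == 3:
        location += 16000
    return location, base


def differences_from_reference(sample, reference):
    """List entries where the sample differs from the reference; matches are omitted."""
    return [
        f"{location}{sample[location]}"
        for location in sorted(sample)
        if location in reference and sample[location] != reference[location]
    ]


# Placeholder values for illustration only, not the actual CRS.
toy_reference = {16073: "C", 16184: "C", 16399: "A"}
toy_sample = {16073: "C", 16184: "T", 16399: "G"}

print(parse_mutation("184T", hvr1_abbreviated=True))          # prints (16184, 'T')
print(parse_mutation("073G"))                                 # prints (73, 'G')
print(differences_from_reference(toy_sample, toy_reference))  # prints ['16184T', '16399G']
```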
SMGF tests mtDNA at over 1000 locations. The base value for each location is compared with the CRS (Cambridge Reference Sequence). Locations whose base values differ from the CRS (the “mutations”) are stored in the mtDatabase.
The mtResults screen shows participants whose mtDNA matches the mutations you entered on the search page. Results are also filtered based on your input parameters.
Exact Matches: An exact match indicates another participant has the same mtDNA values that you entered. An exact match may mean that you share a common maternal ancestor in genealogical time (the last 500 years).
You should always view the pedigrees of exact matches to see if you share a common ancestor in the maternal line.
However, an exact match may also mean you share a common ancestor before genealogical time. These distant matches may be of “deep ancestry” interest, but may not help extend your genealogy in more recent time.
One Difference: These are matches with another participant who has a difference of one mutation from your entry on the search page. Matches with one difference may on rare occasions mean that you share a common maternal ancestor in genealogical time.
You may wish to look at matches with one difference, especially if your pedigree extends over many generations. You may on rare occasions find a common ancestor in the maternal line.
Two Differences: These are matches with another participant who has a difference of two mutations from your entry. Matches with two differences almost never indicate a common maternal ancestor within genealogical time, but may be of “deep ancestry” interest.
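The three match categories above come down to counting the locations at which two sets of results disagree. The sketch below is one way to express that. It is not a description of how the mtDatabase search is implemented, and it assumes both participants were tested over the same range of locations. The function names and the label for more than two differences are illustrative.

```python
def count_mtdna_differences(mutations_a, mutations_b):
    """Count locations where two participants differ.

    Each argument maps a location to the base reported there. Locations that
    are absent are taken to match the reference sequence.
    """
    locations = set(mutations_a) | set(mutations_b)
    return sum(
        1 for location in locations
        if mutations_a.get(location) != mutations_b.get(location)
    )


def match_category(mutations_a, mutations_b):
    """Map a difference count to the categories described in the text."""
    labels = {0: "exact match", 1: "one difference", 2: "two differences"}
    differences = count_mtdna_differences(mutations_a, mutations_b)
    return labels.get(differences, "more than two differences")


# The entry from the search page, using the example mutations from the text.
your_entry = {16184: "T", 16399: "G"}
participant_1 = {16184: "T", 16399: "G"}
participant_2 = {16184: "T"}

print(match_category(your_entry, participant_1))  # prints exact match
print(match_category(your_entry, participant_2))  # prints one difference
```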
There are other mutations that occur in mitochondrial DNA that are outside the regions of use for genealogy studies. Some of those mutations occur very slowly and can be grouped to define “haplogroups”.
People who share the same haplogroup also share a common maternal ancestor who lived many thousands of years ago. Haplogroups are given designators, such as A, B, H, J, V, or X.
The letters are often referred to by name. For instance, haplogroup “H” is called “Helena” and haplogroup “X” is called “Xenia”.
Haplogroups are useful for population studies, but have somewhat limited application in genealogy. Since SMGF is focused primarily on the application of DNA testing to traditional genealogy, SMGF does not currently display haplogroups.
What is Genealogy?
Genealogy is the study of your unique family history. It is a personal record of your ancestors — when they were born and where they lived, who their children were and who they married, and where you belong in your extended family tree.
Learning about your family history usually starts at home by talking with relatives and friends, and recording information about your ancestors.
You may have useful sources at home such as birth certificates, obituaries, wedding announcements, a family Bible, etc.
You may find that others in your family have already done genealogy work, so don’t hesitate to ask close or distant relatives if they have already started researching your family.
However, there generally comes a point in family history research when you have exhausted the resources at home and need to broaden your search with sources beyond your immediate circle.
Local public libraries are an excellent place to start. Most will have books on genealogy, and some will have genealogy departments.
Most state and provincial capitals worldwide have archives or libraries of public records that may be invaluable in genealogical research.
County courthouses, city halls and other government centers also are good sources of primary records for genealogists.
Most major cities have genealogical or historical societies with archived resources for family researchers.
Many organizations offer courses on genealogy, and local societies are a great place to make new connections to help in your family history.
Genealogy on the Internet
The Internet provides a wide assortment of genealogy tutorials, databases of records, collections of family trees, and general information.
Cyndi’s List and ThoughtCo provide comprehensive lists of free genealogy resources, and are great places to start for beginners and seasoned researchers alike.
You may also find that others have done much of your work for you and have contributed information to websites such as familysearch.org or the Rootsweb World Connect Project.
Message boards and family forums can be a great way to contact others who are researching the same family lines you are.
Genealogical research can help you extend your family tree. It’s also fun and challenging, and can be a great way to meet new friends who share your interests.
Putting it All Together
People across the world are bridging the genealogical gap through the application of molecular genealogy. For instance, many men with similar Y-DNA and the same surname have learned they share a common paternal ancestor in the recent past.
In general, the closer the match in Y-DNA marker values, the more recently your common ancestor may have lived. By comparing pedigrees and locations, individuals may be able to identify additional common ancestors and family relationships – including living relatives.
Even if a connection is not obvious, the ancestral data of matching individuals may provide important information about your own ancestry.