All About the Y Chromosome

Only males have Y-chromosome DNA, which is passed down virtually unchanged from father to son. This means that males with a common paternal ancestor will have almost identical Y-DNA. In addition to their Y-DNA, men generally inherit their surnames from their fathers. This makes Y-DNA an excellent tool for surname research.

Searching by surname and/or by genetic marker values allows us to find cousins who may be connected across generations and around the world. Men who have similar genetic marker values and the same surname likely share a common ancestor. In general, the more marker values in common, the closer the relationship.

Women may also search for the surname of their father or brothers.

Searching for Y-Chromosome DNA

The Y chromosome is passed from father to son, so it is found only in males.

Y-DNA can be used as a genealogical tool.

Surnames are usually inherited from a person’s father, although a small number of surnames are inherited from the mother.

A man’s Y-DNA is passed down virtually unchanged from father to son, so men with a common paternal ancestor will have almost identical genetic markers on their Y chromosome.

In general, the more marker values in common, the closer the relationship. Two men who share all 25 marker values are likely to have a relatively recent common ancestor.

Place/Time Analysis

The Y chromosome is found in almost all human males. In general, it contains the genetic information that triggers the development of male characteristics. This includes sex determination and, through its downstream effects, other aspects of male sexual differentiation such as facial hair, voice change and adult height. A human male has one pair of sex chromosomes in each cell: one X chromosome and one Y chromosome.

Females also have one pair of sex chromosomes, but both are X chromosomes (XX). Males inherit just one X chromosome together with a Y chromosome (XY), and it is the presence of the Y chromosome that leads to male development.

The typical male chromosome complement is written 46,XY and the female’s is 46,XX. In a typical human female, one of the two X chromosomes in each cell is randomly inactivated, and all cells descended from that cell inherit the same silenced X chromosome. This process happens early in development.

Males have only one X chromosome, and it remains active. Therefore, the genes present on a male’s single X chromosome are expressed in him. The Y chromosome carries male-specific genes, many of which are present in only one copy per cell. A male also cannot compensate for detrimental mutations on his single X chromosome, because he has no second X to supply a working copy.

Most of the Y chromosome does not recombine with the X chromosome during meiosis. This is because the majority of the Y chromosome has no matching (homologous) sequence on the X chromosome to pair with, unlike the X chromosome, which can recombine with a second X chromosome in females.

The pseudoautosomal regions, the parts of the X and Y chromosomes that can cross over with each other during meiosis, are very short. This means that most of the Y chromosome is passed on as a single unrecombined block (a haplotype) and is shared only by males in the same paternal line.

Lineage Map

The Y chromosome spans more than 58 million base pairs (the paired nucleotide building blocks of DNA) and represents approximately 1% of the total DNA in a man’s cells. Scientists have been able to map more than 90% of the Y chromosome, and its functional genes offer important information for studying male-specific traits, including sex determination.

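The autosomes (non-sex chromosomes) are numbered 1 through 22, roughly in order of decreasing size. The Y chromosome is not numbered; together with the X chromosome it makes up the 23rd pair, the sex chromosomes. Positions on the Y chromosome are written in the same band notation used for the other chromosomes, with p for the short arm and q for the long arm (for example, the sex-determining gene SRY lies on the short arm at Yp11).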

Currently, scientists have identified three regions of interest on the long arm of the human Y chromosome that contain genes essential to sperm production: the AZFa, AZFb and AZFc regions (AZF stands for azoospermia factor). The AZFa region contains only a handful of genes, while the AZFb and AZFc regions contain several gene families, many of them present in multiple copies.

AZFa is the smallest of the three regions, under 1 Mb in size, and includes a gene called DDX3Y, which is believed to be involved in sperm production. AZFb is considerably larger and partly overlaps AZFc. Scientists believe that these regions contain genes that code for functions essential to the production of sperm. The most common genetic disorder associated with these regions is spermatogenic failure due to the deletion of part or all of an AZF region.

AZFc, sometimes described as the most variable region of the Y chromosome, contains several multi-copy gene families that are important for sperm production. AZFc is composed of clusters of testis-specific repetitive elements (Y-repetitive element – YRep), which are scattered over the entire region. These sequences are more abundant on the long arm of the chromosome. The Y-repetitive element regions make up approximately 12% of the AZFc region, and each cluster is about 60 kb long.

Paternal Surnames

The term ‘paternal surname’ is used here to refer specifically to the last name of a child’s father, e.g., if the father’s last name is Smith, then that child will have Smith as his or her paternal surname. The term ‘surname’, however, has several meanings in genealogy.

SMGF addresses the most common sense of the term—as the last name of a person, regardless of whether it is from the mother’s family or from the father’s family. For example, in some cultures the children may have a different surname than their father when they are from an earlier marriage of their mother. In this case, no reference to ‘paternal surname’ is made.

In genealogy, it is often of interest to know which family a person’s last name came from. One might also like to know which countries use a certain surname or whether a certain first name is used more in some families than others.

In the past, people often had more than one name: a first name, perhaps with a nickname to distinguish them from others with the same first name, and a last name. They would often name themselves after their father’s family or use their father’s occupation as a surname, e.g., if their father was called John Smith, then they may call themselves Jack Smith.

In some cultures this was not the custom and the children might take a different name, e.g., if a father named John Smith had a son called Tom, then that son may become Thomas Johnson (a patronymic meaning “son of John”), or he may take a name from his mother’s family or from an occupation (e.g., he could be called Tom Carpenter).

Y-Chromosome Demographics

For many men, finding information on their paternal side is relatively easy because surnames often follow patrilineal lines. Of course, there are exceptions to that rule, including men who do not know who their fathers were, for example because they were born out of wedlock. It’s also possible for a man to learn that his father was not the man who raised him.

But the chances are pretty good that you can trace your grandfather’s or great-grandfather’s paternal line just by looking through family records and information passed down orally. While it is possible for females to discover their paternal line, it’s much more difficult than researching their mother’s side. For one, women do not carry a Y chromosome, so they must rely on the Y-DNA of a father, brother or other male relative from the same paternal line.

Another reason why it is harder to find information on one’s paternal line is the fact that surnames are not always handed down. This happens for many different reasons, including cases where a man marries into his wife’s family and changes his name, or where the surname was originally based on geographic location rather than taken from a male ancestor.

In other cases, surnames are given to children based on personal or physical attributes such as eye color, hair color and so on. This can also be the case if a family traditionally has a certain first name they pass down to each generation.

Y-Chromosome Statistics

The following chart shows the number of haplotypes versus the number of markers. Note that more than 80% of all haplotypes have at least 34 markers. We are working to extend all haplotypes to 34 markers or more.

Haplotype Statistics	Number
Total haplotypes with more than 7 markers:	23,403
Total haplotypes with more than 34 markers:	16,032
Total marker values (haplotypes × markers):	754,880
Total allele values (haplotypes × markers × alleles):	857,894

Genealogy Statistics	Number
Total unique paternal ancestors:	115,131
Total paternal ancestors:	124,320
Total unique ancestors:	1,887,626
Total ancestors:	5,985,659

Paternal Generation Interval

This chart shows the average interval in years between father and son pairs. The table shows three methods of determining the generation interval. The TMRCA page uses the median value for converting the number of generations to years.

Generation Statistics	Years
Mean generation interval:	32.6
Median generation interval:	31
Modal generation interval:	26
Number of father/son pairs:	92,309
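
As a rough illustration of how a generation count can be turned into years, the sketch below multiplies by each of the three intervals in the table. The function and dictionary names are made up for this example and are not part of any SMGF tool; only the interval values come from the table above.

```python
# Generation intervals in years, copied from the table above.
GENERATION_INTERVAL_YEARS = {"mean": 32.6, "median": 31, "modal": 26}


def generations_to_years(generations, method="median"):
    """Convert a number of generations to an approximate number of years.

    The default is the median, since the text says the TMRCA page uses
    the median value for this conversion.
    """
    return generations * GENERATION_INTERVAL_YEARS[method]


if __name__ == "__main__":
    # Compare the three intervals for an ancestor 10 generations back.
    for method in ("mean", "median", "modal"):
        print(method, generations_to_years(10, method))
```

With the median interval, ten generations correspond to roughly 310 years; the modal interval gives a noticeably shorter estimate of 260 years, which is why the choice of method matters.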

Y-Chromosome Marker Details

The values of certain genetic markers on the Y chromosome can be informative for genealogical purposes. For example, if two men have different values for a marker known to mutate very slowly, they are unlikely to share a common ancestor within the last few hundred years.

Conversely, if two men have identical Y-chromosome haplotypes for a number of markers, it suggests that they share a recent common ancestor.
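
The comparison described above can be sketched as a simple count of identical marker values. This is only a minimal illustration: the function name and the sample marker values are invented for the example, and real matching tools may weight markers differently.

```python
def count_matches(haplotype_a, haplotype_b):
    """Return (matching, compared) over the markers both haplotypes report.

    Each haplotype is a dict mapping a marker name to its value.
    Markers present in only one haplotype are ignored.
    """
    shared = haplotype_a.keys() & haplotype_b.keys()
    matching = sum(1 for m in shared if haplotype_a[m] == haplotype_b[m])
    return matching, len(shared)


# Illustrative values only, not taken from any database.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 11}

if __name__ == "__main__":
    matching, compared = count_matches(man_1, man_2)
    print(f"{matching}/{compared} markers match")
```

The more markers two men have tested in common, the more meaningful a high match count becomes.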

Marker Standards

The following Y-chromosome markers are generally considered most useful for genealogical purposes. However, commercial testing companies follow different standards when determining marker values.

DYS # 393 390 19 391 385a 385b 426 388 439 389-1 392 389-2 Alleles 13 24 14 11 12 12 12 13 13 29 17 9 10 11 14 15

DYS # 458 459a 459b 455 454 447 437 448 449 464 a b c d Alleles 7 6 16 8 5 5 6 5 7 9 6 3 3 6

This haplogroup typically has its origin in Eastern Europe or the Mediterranean, although it’s also found in North Africa and the Middle East. About 25% of Ashkenazi Jews belong to this haplogroup.

DYS # 577 576 570 CDY a b c d e f g Alleles 12 23 14 11 17 19 15 16 16 16 36 22 10 10 15 9 8 11 24 21 15 10 12 12 15

DYS # 390 391 385a 385b 426 388 439 389-1 392

This haplogroup has its origin in central Asia, and is found throughout Europe, the Middle East, India and Pakistan. It was formerly known as “R1”.

DYS # 458 459a 459b 455 454 447 437 448 449

This haplogroup has its origin in central Asia, and is found throughout Europe, the Middle East, India and Pakistan. It was formerly known as “R2”.

DYS # 577 576 570 CDY a b c d e f g

This haplogroup has its origins in Italy or western Asia, and is found throughout Europe, particularly in the north. It was formerly known as “I1b”.

DYS # 389-2 392 441 437 448 444 446 449 455 GATA H4 YCA II a b c d e f g Alleles 11 10 13 14 16 10 11 21 23 15 9 9 11 11 25 16 19 29 13 15 16 17

This haplogroup appears to have arisen in Central Asia about 16 thousand years ago. It spread into Europe perhaps 8-10,000 years ago and has since spread all over the world. It is found at its highest frequency among people of northern European extraction, particularly those of Celtic background, but is also common among people of Jewish and African ancestry.

DYS # YCA II a b c d e f g 391 392 439 390 19 458 459 435 Alleles 18 19 20 23 16 26 27 29 15 9 10

This haplogroup appears to have arisen about 41,000 years ago in Europe, where it is found at its highest frequency. It is also common in the Middle East, having spread there about 19-23 thousand years ago, and has since spread all over the world. It is found at its highest frequency among people of central Asian extraction, particularly Finns and Ashkenazi Jews.

DYS # YGATAA 20 391 392 439 390 19 391 385a 385b 426 388 439 389-1 392 Alleles 18 9 15 10 14 16 26 30 13 11 12

This haplogroup has its origin in the Middle East, and is found throughout Europe, North Africa and central Asia. It was formerly known as “J2”.

DYS # YGATAH 14 DYS # 439 392 Alleles 12 10

This haplogroup has its origin in the Middle East, and is found throughout Europe. It was formerly known as “J1”.

DYS # 393 390 19 391 385a 385b 426 388 439 389-1 392 389-2 YCA II a b c d e f g Alleles 13 24 14 11 12 12 12 13 13 29 17 9 10 11 14 15

This haplogroup appears to have arisen in Europe about 16,000 years ago. It spread all over the world and is very common among Europeans, east Indians and central Asians. However, it is also common among other ethnic groups, apparently due to intermarriage. It appears to have several subgroups with different histories, some of which are found almost exclusively in Europe and others which are found across a wide geographic range.

This haplogroup is most closely associated with people of Celtic background, but it is also common among western Asians (Turks, Kurds, etc.) and north Indians. About 10% of Ashkenazi Jews belong to this haplogroup.

This haplogroup is most closely associated with people of Germanic background; however, it is also common among Englishmen, Scandinavians and Russians. About 30-35% of Ashkenazi Jews belong to this haplogroup.

DYS # YCA II a b c d e f g 439 389-1 392 458 459a 459b 455 Alleles 13 23 15 10 14 18 20 27 13 11 12

This haplogroup has its origin in Asia, and is found throughout the world, particularly among Asians and Native Americans. It was formerly known as “C”.

DYS # YCA II a b c d e f g 456 607 576 570 CDY Alleles 13 22 14 10 11 14 12 12 13 13 29 21 9 9 11 11 25 15 19 30 15 16 17 18

This haplogroup appears to have arisen in Asia over 50,000 years ago, perhaps among the original group of modern humans who emerged from Africa. This haplogroup is now found throughout the world, but its frequency drops drastically beyond Central Asia. It represents about 20% of all American Indians and 10% of Europeans.

DYS # YGATAA 27 DYS # 458 459a 459b 455 454 447 Alleles 17 17 18 13 12 22 15

This haplogroup has its origin in Asia and has spread to Europe, where it represents about 1.5% of the population. It is also common among Native Americans (about 2-6%), but rare outside those two groups.

DYS # YGATAH 11 DYS # 454 447 Alleles 15 13

This haplogroup has its origin in Asia and is common among Asians, particularly Southeast Asians (about 7-10%). It is also common among Polynesians (about 10-20%), but rare outside those two groups.

DYS # YGATAH 26 DYS # 447 Alleles 19

This haplogroup has its origin in Asia and is found mostly among Asians, with a minority representation among Native Americans. It represents about 1-3% of the population.

DYS # YGATAH 12 DYS # 458 459a 459b 455 Alleles 18 17 17

This haplogroup has its origin in Asia and is found among Asians, especially Southeast Asians. It represents about 10% of Native Americans, but only 2% of Europeans. This group may have been more common in North America before the immigration by Europeans which began in the 17th century.

DYS # YGATAH 23 DYS # 459 Alleles 15

This haplogroup has its origin in Asia and is found among Asians. It represents about 2% of Native Americans, but only 0.5-1% of Europeans.

DYS # YGATAH 10 DYS # 493 531 578 570 CDY Alleles 12 25 15 11 13 14 12 12 13 14 29 17 9 10 11 11 25 15 19 30 15 16 17 18

This haplogroup is found almost exclusively in Europe, where it represents about 40% of the population. It is also found among Native Americans (about 5-10%) and among Asians. This group was very common during the last ice age.

DYS # YGATAH 9 DYS # 590 537 Alleles 26 11 17 11 13 11 12 10 14 13 31 19 8 10 11 11 24 15 20 29 15 16 16 18

This haplogroup is found almost exclusively in Europe, where it represents about 95% of the population. It is also found among Asians (about 5%) and Native Americans (about 2-10%).

DYS # YGATAH 8 DYS # 635 Alleles 11

This haplogroup has its origin in Europe. It represents about 3% of the population.

DYS # YGATAH 14 DYS # 438 Alleles 11

This haplogroup has its origin in Europe and is found mostly among Europeans (about 85%). It also occurs at a very low frequency among other groups such as Native Americans, Asians and Africans.

Duplicated Markers

DYS385, DYS459, and YCAII are three markers where genetic material is typically duplicated at two locations.

For duplicated markers, two values can be determined for each marker, but not the locations at which they occur. Since the order of the locations is unknown, SMGF will locate all of the values that match your input values, regardless of the order.

For example, all of the following result in a partial match.

Input Values	Database Values	Result
15 15	15 17	ab
15 16	15 17	ab
14 15	15 17	ba

If there is a mismatch at one or more of the values (i.e. ab, ba, aa), then the number of matches shown on the results screen decreases by one. For matching purposes, both together count as one marker mismatch (36/37 for an otherwise exact match).
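
One way to picture this rule is to compare the two values of a duplicated marker with their order ignored, and to count the marker once regardless of how many of its values differ. The sketch below is an illustration under those assumptions, not SMGF's actual code; the function names and the split of 37 markers into 34 single-copy and 3 duplicated markers are hypothetical.

```python
def duplicated_marker_matches(input_values, database_values):
    """Order-independent comparison of a duplicated marker's values."""
    return sorted(input_values) == sorted(database_values)


def match_score(single_markers_matching, duplicated_pairs):
    """Add one point per duplicated marker whose values all agree.

    `duplicated_pairs` is a list of (input_values, database_values)
    tuples, one per duplicated marker. A duplicated marker counts as a
    single marker, however many of its values mismatch.
    """
    score = single_markers_matching
    for input_values, database_values in duplicated_pairs:
        if duplicated_marker_matches(input_values, database_values):
            score += 1
    return score


# Hypothetical 37-marker comparison: 34 single-copy markers all match,
# and one of the three duplicated markers differs.
pairs = [
    ((15, 15), (15, 17)),  # mismatch
    ((9, 10), (10, 9)),    # same values in a different order: match
    ((19, 23), (19, 23)),  # match
]
score = match_score(34, pairs)

if __name__ == "__main__":
    print(f"{score}/37")
```

Sorting both sides before comparing is what makes the order of the two locations irrelevant.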

The TMRCA (Time to Most Recent Common Ancestor) calculations are also based on a single match or mismatch for each duplicated marker.

Y-Chromosome Marker DYS389

DYS389 is a special marker that has two different values at the same location.

The two values are designated “DYS389I” and “DYS389B”. Typical values would be “DYS389I=13” and “DYS389B=16”.

However, DYS389 is different from other markers in that the sum of the values for DYS389I and DYS389B is also designated “DYS389II”.

For example, if DYS389I=13 and DYS389B=16, then DYS389II=29.

SMGF maintains that using DYS389I and DYS389B will give more accurate matches and TMRCA calculations. Consequently, if you choose SMGF (or Genographic), you may enter values for these markers directly.

If you choose any other lab standard, you may enter the values for DYS389I and DYS389II. However, since DYS389II is actually dependent on DYS389I, please enter the value for DYS389I first, and then DYS389II.
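
The relationship between the three DYS389 values is simple addition, so converting between the two reporting styles can be sketched as follows. The function names are made up for this illustration.

```python
def to_dys389_ii(dys389_i, dys389_b):
    """DYS389II is the sum of DYS389I and DYS389B."""
    return dys389_i + dys389_b


def to_dys389_b(dys389_i, dys389_ii):
    """Recover DYS389B from a lab report that lists DYS389I and DYS389II."""
    if dys389_ii < dys389_i:
        raise ValueError("DYS389II cannot be smaller than DYS389I")
    return dys389_ii - dys389_i


if __name__ == "__main__":
    print(to_dys389_ii(13, 16))
    print(to_dys389_b(13, 29))
```

Because DYS389II already contains DYS389I, comparing DYS389I and DYS389II directly would count a change in DYS389I twice; subtracting first avoids that.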

Y-Chromosome Marker DYS464

DYS464 is a marker where genetic material is typically duplicated at four locations.

SMGF permits separate entry of the four allele values (“a”, “b”, “c”, and “d”). Matches are displayed in light blue while a mismatch is displayed in dark blue in the appropriate quadrant.

For DYS464, four values can be determined, but not the locations at which they occur. Since the order of the locations is unknown, SMGF will locate all of the values that match your input values, regardless of the order.

For example, all of the following result in a partial match.

Input Values	Database Values	Result
13 15 17 17	14 15 17 17	abcd
15 17 17 17	14 15 17 17	abdc
15 16 17 17	14 15 17 17	acdb

If there is a mismatch of one or more of the values (i.e. abcd, abdc, acdb), then the number of matches decreases by one. For matching purposes, all four together count as one marker mismatch (36/37 for an otherwise exact match).
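
Order-independent comparison of the four DYS464 values can be sketched with a multiset (a collection that tracks how many times each value occurs). This is an illustration only, not SMGF's implementation; the function names are hypothetical, and the sample values are the rows of the table above.

```python
from collections import Counter


def shared_values(input_values, database_values):
    """Number of allele values in common, ignoring order.

    Counter & Counter keeps, for each value, the smaller of the two
    counts, which is the multiset intersection.
    """
    overlap = Counter(input_values) & Counter(database_values)
    return sum(overlap.values())


def dys464_is_exact_match(input_values, database_values):
    """True only when both sides hold the same four values in any order."""
    return Counter(input_values) == Counter(database_values)


database = (14, 15, 17, 17)
rows = [(13, 15, 17, 17), (15, 17, 17, 17), (15, 16, 17, 17)]

if __name__ == "__main__":
    for row in rows:
        print(row, shared_values(row, database), dys464_is_exact_match(row, database))
```

Each of the three table rows shares three of its four values with the database entry, so each is a partial match and counts as a single marker mismatch.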

Since the order of the DYS464 values is unknown, we do not include DYS464 in the TMRCA (Time to Most Recent Common Ancestor) calculation at this time.