Medicine LibreTexts

14.3: Linkage analysis and genome-wide association studies (GWAS)


    There is tremendous interest in finding specific genes that predispose individuals to common disease traits, most of which follow complex inheritance patterns rather than Mendelian (single gene) patterns. Physicians will find frequent references in the medical literature related to the search for genes with high predictive value in common disorders.

    While we know the DNA sequence of the vast majority of the coding regions of the genome, we still do not understand the full function of the majority of genes or how they are involved in human health conditions. There are two major approaches to identifying genetic loci that contribute to disease presentation: linkage analysis and genome-wide association studies.

    Linkage analysis

    Linkage analysis relies on the fact that disease-causing mutations are inherited jointly (linked) with genetic markers located in their immediate vicinity. In order for a gene and a genetic marker to be linked, they must be syntenic, meaning they must be located on the same chromosome. Most genes or markers within the human genome are inherited independently of one another, and therefore are transmitted together 50 percent of the time.

    Linkage between two genes means that they tend to be inherited together more often than expected by chance.

    For linkage to occur, two conditions must be met:

    • First, the genes must be syntenic.
    • Second, they need to reside relatively close to one another.

    Syntenic genes may become detached from one another through crossing over (or recombination). For large chromosomes, crossing over is so common that genes at opposite ends of the chromosome are inherited together no more often than if they resided on entirely different chromosomes.

    When markers are close enough together on the same chromosome, crossing over fails to separate them frequently enough for them to be inherited independently of one another. This is evidenced by coinheritance of greater than 50 percent.

    The unit of measure in linkage studies is the centimorgan (cM). This concept can be confusing because we refer to the “distance” between two traits, but what is measured experimentally is the frequency of coinheritance, not physical distance.

    A very small linkage distance means the traits are rarely separated during meiosis. A distance of 0 cM means two traits always stay together, implying that they are extremely close to one another on the same chromosome. If the two traits separate from one another 1 percent of the time during meiosis, they are described as being 1 cM apart; if the two traits separate from one another 5 percent of the time during meiosis, they are described as being 5 cM apart (figure 14.6).
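As a minimal sketch of the arithmetic above, the following Python function converts counts of recombinant offspring into a map distance. The function name `map_distance_cm` and the example counts are illustrative assumptions, not taken from the source. The one-to-one relationship between percent recombination and cM is the approximation used in this section and is best suited to short distances.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Return the map distance in centimorgans (cM).

    The distance is taken to be the percentage of offspring in which
    the two traits were separated during meiosis (assumed model:
    1 percent recombination = 1 cM).
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical counts: traits separated in 1 of 100 and in 5 of 100 offspring.
print(map_distance_cm(1, 100))  # 1.0
print(map_distance_cm(5, 100))  # 5.0
```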

    Chromosome with genes b, cn, and vg from left to right. From gene b to gene vg there are 20 centimorgans (cM), or map units. The recombination frequency between b and cn is 9%, and between cn and vg it is 11%.

    Figure 14.6: Relationship between centimorgans and recombination frequency.

    The farther apart two genes or markers are on the same chromosome, the greater the probability of a crossover occurring between them. Studies to determine linkage require the careful study of large numbers of parents and their offspring. Careful study of the family relationships across three generations allows linkage phases to be determined. In this case, the grandparents' information is required to determine how the genes are initially linked in the parents, and the grandchildren are studied to determine recombination events (crossing over) that separate the genes or markers during meiosis in the parents.

    Distance can be expressed in cM as described previously, or in terms of theta (θ), the recombination fraction, which is a proportion. Remember, both are measures of probability, not physical distance. Linkage determinations are based on the fundamental rules of probability and binomial mathematics. As with any odds ratio, a value greater than one reflects odds in favor (of linkage), and a value less than one reflects odds against.

    For linkage studies, each family represents an independent estimate of the odds in favor of (or against) linkage. The relevant property within standard probability laws is joint probability. To determine joint probability, meaning the chance that BOTH of two events will happen, we use what is often called the “AND rule”: the individual probabilities are multiplied together. The AND rule applies whenever the probabilities under study are independent of one another.

    Multiplying the results of many families is challenging, and was particularly so before computer resources became readily available. It is simpler mathematically to add numbers. We can move from multiplication to addition if we simply use the log of the probability instead of the probability number itself. Remember that the log of a number that is less than one is a negative number, and for a number greater than one, it is a positive number. Using a log conversion makes it simple to see if the ratio of the odds is favorable (positive) or unfavorable. The term “LOD score” refers to the log (base 10) of the odds of linkage, looking across a series of independent families.
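The move from multiplication to addition can be illustrated with a short Python sketch. The odds values below are hypothetical numbers chosen only for illustration, and the function name `combined_lod` is an assumption rather than an established API.

```python
import math


def combined_lod(family_odds):
    """Sum the log10 odds from independent families.

    Adding the logs is equivalent to multiplying the odds themselves
    (the AND rule) and then taking log10 of the product.
    """
    return sum(math.log10(odds) for odds in family_odds)


# Hypothetical odds of linkage from four independent families.
# A value below 1 (here 0.8) counts against linkage.
family_odds = [4.0, 2.5, 0.8, 10.0]

product_of_odds = math.prod(family_odds)   # multiplication route
lod_total = combined_lod(family_odds)      # addition route

print(math.log10(product_of_odds), lod_total)  # the two routes agree
print(math.log10(0.8) < 0)                     # True: odds below 1 give a negative log
```

Because odds below one contribute negative terms and odds above one contribute positive terms, the sign of the total shows at a glance whether the combined evidence favors linkage.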

    There really are just two things to remember about LOD scores:

    • First, it is a convenient system for combining the observations across a large number of families to describe the odds of linkage.
    • Second, the values of LOD scores define “proof” that two genes or markers are linked or not linked.
    • When the LOD score reaches 3 (odds of 1,000 to 1 in favor of linkage), the two markers are considered proven to be linked.
    • When the LOD score reaches -2 (odds of 100 to 1 against linkage), this is taken as conclusive evidence that the two genes or markers are not linked.

    LOD scores appear in a great deal of medical literature where the identification or location of disease-related genes is being considered.
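The following Python sketch shows one way a LOD score could be computed and read against the thresholds of 3 and -2. It assumes a simplified binomial model in which every meiosis is fully informative and the linkage phase is known, so the score compares the likelihood of the observed recombinants at a chosen θ with the likelihood under independent inheritance (θ = 0.5). The function names and the family counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD for one family under an assumed phase-known binomial model.

    LOD = log10( theta^k * (1 - theta)^(n - k) / 0.5^n ),
    where k is the number of recombinants among n informative meioses.
    The binomial coefficient cancels in the ratio.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (recombinants * math.log10(theta)
                             + non_recombinants * math.log10(1 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


def interpret(total_lod: float) -> str:
    """Apply the conventional thresholds described in the text."""
    if total_lod >= 3:
        return "linked"
    if total_lod <= -2:
        return "not linked"
    return "inconclusive"


# Hypothetical data: (recombinants, informative meioses) for three families.
families = [(1, 10), (0, 8), (2, 12)]
total = sum(lod_score(k, n, theta=0.1) for k, n in families)
print(round(total, 2), interpret(total))
```

In practice the score is usually evaluated over a range of θ values and the maximum is reported; the single θ used here is only to keep the sketch short.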

    Genome-wide association studies (GWAS)

    Population association is easily confused with the concepts surrounding linkage. Association studies look for a statistical association between a marker (often a single nucleotide polymorphism, or SNP) and a specific trait in a population. The concept of population association can be exploited to simultaneously study a very large number of detectable genetic markers (SNPs) in patient populations with common disorders.

    Much of the power of personalized medicine is derived from such associations. Numerous GWAS appear in the medical literature. A GWAS is a highly sophisticated type of case-control study, for which careful study design is required to avoid spurious findings. These studies provide information related to common genetic traits but do not help address genetic manifestations of rare traits in a population (figure 14.7).

    • Large cohort of cases and controls (n > 1,000): Matched for confounding variables, such as race, ethnicity, and sex. Stratified in order to maximize signals.
    • Microarray-based SNP genotyping: 1 million random marker SNPs or 25,000 risk-enhancing SNPs (i.e., nsSNPs).
    • Derivation of haplotypes: Predicted on international HapMap.
    • Detection of association signals: χ² or similar test. Uncorrected P < 10⁻⁷ or false discovery rate-like correction.
    • Fine mapping of association signal: Directed genotyping of additional SNPs in region. Fine mapping of LD in region of association. Empirical derivation of haplotypes. Examination of effect of stratification, if available.
    • Replication of association: Large independent cohort of cases and controls (n > 1,000). Genotyping of nominated candidate SNPs (< 20). χ² or similar test; replication of initial signal.
    • Biological validation of association: Identification of risk-enhancing variant. Examination of functional consequences of variant. Determination of mechanism of risk-enhancement.

    Figure 14.7: Schematic of GWAS.
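The association-detection step in the schematic can be sketched for a single SNP as a Pearson χ² test on a 2×2 table of allele counts. This is a minimal illustration in plain Python, not the pipeline of any particular study: the function name, the allele counts, and the choice of an uncorrected allelic test with one degree of freedom are assumptions. The 10⁻⁷ threshold is the one quoted in the figure.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    n = sum(row_totals)
    if min(row_totals + col_totals) == 0:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles each.
chi2, p = allelic_chi_square(case_alt=900, case_ref=1100,
                             control_alt=700, control_ref=1300)
THRESHOLD = 1e-7  # uncorrected significance level quoted in the figure
print(round(chi2, 2), p < THRESHOLD)
```

A real GWAS repeats a test of this kind for each genotyped SNP, which is why the significance threshold is set so far below the usual 0.05.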


    References and resources


    Clark, M. A. Biology, 2nd ed. Houston, TX: OpenStax College, Rice University, 2018, Chapter 10: Cell Reproduction, Chapter 12: Mendel's Experiments and Heredity, Chapter 13: Modern Understandings of Inheritance.

    Le, T., and V. Bhushan. First Aid for the USMLE Step 1, 29th ed. New York: McGraw Hill Education, 2018, 55–59.

    Nussbaum, R. L., R. R. McInnes, H. F. Willard, A. Hamosh, and M. W. Thompson. Thompson & Thompson Genetics in Medicine, 8th ed. Philadelphia: Saunders/Elsevier, 2016, Chapter 7: Patterns of Single Gene Inheritance, Chapter 9: Genetic Variations in Populations, Chapter 10: Identifying the Genetic Basis for Human Disease.


    Grey, Kindred, Figure 14.6 Relationship between centimorgans and recombination frequency. 2021. CC BY 4.0.

    Tam, V., Patel, N., Turcotte, M. et al. Figure 14.7 Schematic of GWAS study. Adapted under Fair Use from Benefits and limitations of genome-wide association studies. Nat Rev Genet 20, 467–484 (2019). Fig. 1: GWAS study design. Added Mitochondrial inheritance by Domaina, Angelito7 and SUM1. CC BY-SA 4.0. From Wikimedia Commons. Added Genetic similarities between 51 worldwide human populations (Euclidean genetic distance using 289,160 SNPs) by Tiago R. Magalhães, Jillian P. Casey, Judith Conroy, Regina Regan, Darren J. Fitzpatrick, Naisha Shah, João Sobral, Sean Ennis. CC BY 2.5. From Wikimedia Commons. Added Histopathology of adenosquamous carcinoma of the pancreas by Yeung, Vincent; Palmer, Joshua D.; Williams, Noelle; Weinstein, Jonathan C.; Fortuna, Danielle; Sama, Ashwin; Winter, Jordan; Bar-Ad, Voichita. CC BY 4.0. From Wikimedia Commons.
