Introduction

In this vignette, you will calculate basic population genetic statistics from microsatellite data using R packages.

  1. Genetic diversity,
  2. Test Hardy Weinberg
  3. \(F_{is}\) and global \(F_{st}\)

Assumptions

Data

The data used for these analyses are contained in an R dataset: nancycats, a “genind” object, i.e. an object of the package adegenet. It contains microsatellite genotypes of 237 cats from 17 colonies of Nancy (France). Each individuals of the 17 colonies are genotyped for 9 microsatellite loci.

Resources/Packages required

library("adegenet")
library("pegas")
library("hierfstat")

Analysis

Section 1: Load the data

data("nancycats", package = "adegenet")  
summary(nancycats)
## 
## // Number of individuals: 237
## // Group sizes: 10 22 12 23 15 11 14 10 9 11 20 14 13 17 11 12 13
## // Number of alleles per locus: 16 11 10 9 12 8 12 12 18
## // Number of alleles per group: 36 53 50 67 48 56 42 54 43 46 70 52 44 61 42 40 35
## // Percentage of missing data: 2.34 %
## // Observed heterozygosity: 0.67 0.67 0.68 0.71 0.63 0.57 0.65 0.62 0.45
## // Expected heterozygosity: 0.87 0.79 0.8 0.76 0.87 0.69 0.82 0.76 0.61

Section 2: Genetic diversity (observed and expected heterozygosity)

nancycats  
## /// GENIND OBJECT /////////
## 
##  // 237 individuals; 9 loci; 108 alleles; size: 145.3 Kb
## 
##  // Basic content
##    @tab:  237 x 108 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 8-18)
##    @loc.fac: locus factor for the 108 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: genind(tab = truenames(nancycats)$tab, pop = truenames(nancycats)$pop)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 9-23)
##    @other: a list containing: xy
div <- summary(nancycats)
div
## 
## // Number of individuals: 237
## // Group sizes: 10 22 12 23 15 11 14 10 9 11 20 14 13 17 11 12 13
## // Number of alleles per locus: 16 11 10 9 12 8 12 12 18
## // Number of alleles per group: 36 53 50 67 48 56 42 54 43 46 70 52 44 61 42 40 35
## // Percentage of missing data: 2.34 %
## // Observed heterozygosity: 0.67 0.67 0.68 0.71 0.63 0.57 0.65 0.62 0.45
## // Expected heterozygosity: 0.87 0.79 0.8 0.76 0.87 0.69 0.82 0.76 0.61
plot(div$Hobs, xlab="Loci number", ylab="Observed Heterozygosity", 
     main="Observed heterozygosity per locus")

plot(div$Hobs, div$Hexp, xlab="Observed Heterozygosity", ylab="Expected Heterozygosity", 
     main="Expected heterozygosity as a function of observed heterozygosity per locus")

bartlett.test(list(div$Hexp, div$Hobs)) # difference of means 
## 
##  Bartlett test of homogeneity of variances
## 
## data:  list(div$Hexp, div$Hobs)
## Bartlett's K-squared = 0.046962, df = 1, p-value = 0.8284

We get various information from the command summary. The one that interest us is the observed and expected heterozygosity per locus. We observed that heterozygosity varies among loci.

The results from the Bartlett test indicates that we have no difference between the mean observed and expected heterozygosity

Section 3: Basic statistics with hierfstat

The function basic.stats() provides the observed heterozygosity (\(H_o\)), mean gene diversities within population (\(H_s\)), \(F_{is}\), and \(F_{st}\). The function boot.ppfis() provides confidence interval for \(F_{is}\). The function indpca() does a PCA on the centered matrix of individuals’ allele frequencies.

nancycats.hfstat <- genind2hierfstat(nancycats)
basicstat <- basic.stats(nancycats, diploid = TRUE, digits = 2) 
names(basicstat)
## [1] "n.ind.samp" "pop.freq"   "Ho"         "Hs"         "Fis"       
## [6] "perloc"     "overall"
boot.ppfis(nancycats.hfstat) 
## $call
## boot.ppfis(dat = nancycats.hfstat)
## 
## $fis.ci
##         ll     hl
## 1  -0.0063 0.2637
## 2   0.1278 0.2641
## 3   0.0607 0.2533
## 4   0.0841 0.2332
## 5   0.0410 0.2168
## 6   0.0419 0.2950
## 7   0.0425 0.2991
## 8   0.0695 0.3235
## 9  -0.2449 0.0708
## 10 -0.0684 0.2318
## 11  0.0796 0.2289
## 12  0.0615 0.2543
## 13 -0.0898 0.1260
## 14  0.0383 0.2365
## 15 -0.1633 0.1154
## 16 -0.0510 0.1533
## 17 -0.1361 0.1494
x <- indpca(nancycats.hfstat) 
plot(x, cex = 0.7)

Section 4: Testing for Hardy-Weinberg Equilibrium

hw.test(nancycats, B = 1000)
##           chi^2  df  Pr(chi^2 >) Pr.exact
## fca8  395.80006 120 0.000000e+00        0
## fca23 239.34221  55 0.000000e+00        0
## fca43 434.33397  45 0.000000e+00        0
## fca45  66.11849  36 1.622163e-03        0
## fca77 270.52066  66 0.000000e+00        0
## fca78 402.80002  28 0.000000e+00        0
## fca90 217.19836  66 0.000000e+00        0
## fca96 193.36764  66 1.965095e-14        0
## fca37 291.00731 153 1.209777e-10        0

We get for each locus a test of significance of the null hypothesis: \(H_0\) the locus is in HW equilibrium in the population/ \(H_1\). The locus is not in HW equilibrium.

We can conclude from the p-values of each test that any locus is in HW equilibrium.

Conclusions

What did we learn today?

In this vignette, we learned how to explore the patterns of genetic diversity in one population. Also, you have an idea of potential violations of the dataset to the null Wright-Fischer model.

What is next?

You may now want to move on to looking into population differentiation.

Contributors

Session Information

This shows us useful information for reproducibility. Of particular importance are the versions of R and the packages used to create this workflow. It is considered good practice to record this information with every analysis.

options(width = 100)
devtools::session_info()
## Session info ---------------------------------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.2 (2016-10-31)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       <NA>                        
##  date     2017-01-04
## Packages -------------------------------------------------------------------------------------------
##  package    * version date       source        
##  ade4       * 1.7-5   2016-12-13 CRAN (R 3.3.2)
##  adegenet   * 2.0.1   2016-02-15 CRAN (R 3.3.2)
##  ape        * 4.0     2016-12-01 CRAN (R 3.3.2)
##  assertthat   0.1     2013-12-06 CRAN (R 3.3.2)
##  backports    1.0.4   2016-10-24 CRAN (R 3.3.2)
##  boot         1.3-18  2016-02-23 CRAN (R 3.3.2)
##  cluster      2.0.5   2016-10-08 CRAN (R 3.3.2)
##  coda         0.19-1  2016-12-08 CRAN (R 3.3.2)
##  colorspace   1.3-2   2016-12-14 CRAN (R 3.3.2)
##  DBI          0.5-1   2016-09-10 CRAN (R 3.3.2)
##  deldir       0.1-12  2016-03-06 CRAN (R 3.3.2)
##  devtools     1.12.0  2016-12-05 CRAN (R 3.3.2)
##  digest       0.6.10  2016-08-02 CRAN (R 3.3.2)
##  dplyr        0.5.0   2016-06-24 CRAN (R 3.3.2)
##  evaluate     0.10    2016-10-11 CRAN (R 3.3.2)
##  gdata        2.17.0  2015-07-04 CRAN (R 3.3.2)
##  ggplot2      2.2.0   2016-11-11 CRAN (R 3.3.2)
##  gmodels      2.16.2  2015-07-22 CRAN (R 3.3.2)
##  gtable       0.2.0   2016-02-26 CRAN (R 3.3.2)
##  gtools       3.5.0   2015-05-29 CRAN (R 3.3.2)
##  hierfstat  * 0.04-22 2015-12-04 CRAN (R 3.3.2)
##  htmltools    0.3.5   2016-03-21 CRAN (R 3.3.2)
##  httpuv       1.3.3   2015-08-04 CRAN (R 3.3.2)
##  igraph       1.0.1   2015-06-26 CRAN (R 3.3.2)
##  knitr        1.15.1  2016-11-22 CRAN (R 3.3.2)
##  lattice      0.20-34 2016-09-06 CRAN (R 3.3.2)
##  lazyeval     0.2.0   2016-06-12 CRAN (R 3.3.2)
##  LearnBayes   2.15    2014-05-29 CRAN (R 3.3.2)
##  magrittr     1.5     2014-11-22 CRAN (R 3.3.2)
##  MASS         7.3-45  2016-04-21 CRAN (R 3.3.2)
##  Matrix       1.2-7.1 2016-09-01 CRAN (R 3.3.2)
##  memoise      1.0.0   2016-01-29 CRAN (R 3.3.2)
##  mgcv         1.8-16  2016-11-07 CRAN (R 3.3.2)
##  mime         0.5     2016-07-07 CRAN (R 3.3.2)
##  munsell      0.4.3   2016-02-13 CRAN (R 3.3.2)
##  nlme         3.1-128 2016-05-10 CRAN (R 3.3.2)
##  pegas      * 0.9     2016-04-16 CRAN (R 3.3.2)
##  permute      0.9-4   2016-09-09 CRAN (R 3.3.2)
##  plyr         1.8.4   2016-06-08 CRAN (R 3.3.2)
##  R6           2.2.0   2016-10-05 CRAN (R 3.3.2)
##  Rcpp         0.12.8  2016-11-17 CRAN (R 3.3.2)
##  reshape2     1.4.2   2016-10-22 CRAN (R 3.3.2)
##  rmarkdown    1.3     2016-12-21 CRAN (R 3.3.2)
##  rprojroot    1.1     2016-10-29 CRAN (R 3.3.2)
##  scales       0.4.1   2016-11-09 CRAN (R 3.3.2)
##  seqinr       3.3-3   2016-10-13 CRAN (R 3.3.2)
##  shiny        0.14.2  2016-11-01 CRAN (R 3.3.2)
##  sp           1.2-4   2016-12-22 CRAN (R 3.3.2)
##  spdep        0.6-8   2016-09-21 CRAN (R 3.3.2)
##  stringi      1.1.2   2016-10-01 CRAN (R 3.3.2)
##  stringr      1.1.0   2016-08-19 CRAN (R 3.3.2)
##  tibble       1.2     2016-08-26 CRAN (R 3.3.2)
##  vegan        2.4-1   2016-09-07 CRAN (R 3.3.2)
##  withr        1.0.2   2016-06-20 CRAN (R 3.3.2)
##  xtable       1.8-2   2016-02-05 CRAN (R 3.3.2)
##  yaml         2.1.14  2016-11-12 CRAN (R 3.3.2)