The following table describes the specialized classes that population genetics packages use to store data. Conversion between all types is possible.
Anyone developing a package for population genetic analysis is encouraged to use or build upon these data structures. If a new data structure is needed, please provide a conversion method to one or more of the classes listed below.

| Class {type} (package) | Strengths | Weaknesses |
|---|---|---|
| DNAbin {S3} (ape) | stores all sets of sequences (aligned or not) | less compact than 2-bit coding (but by a factor of 4 at most) |
| | uses matrices (aligned) or lists, so R’s usual commands (`names`, `rownames`, `[`, `[[`, `$`) can be used | |
| | many `as.DNAbin` methods in ape (incl. from Bioconductor) | |
| | efficient functions in ape (`dist.dna`, `seg.sites`, `base.freq`, `read.FASTA`) and in pegas (`haplotype`) | |
| loci {S3} (pegas) | low memory usage | not really appropriate for some analyses (e.g., multivariate analyses) |
| | all levels of ploidy and any number of alleles | treatment of NAs needs improvement (especially when data are read with `read.vcf()`) |
| | genotypes can be phased | |
| | any kind of individual data can be associated in the data frame | |
| | efficient to compute genotype and allele frequencies | |
| genind {S4} (adegenet) | stores allelic counts; ideal for multivariate analyses | requires more memory |
| | additional slots for individual data | less efficient to compute frequencies |
| | additional slot for population strata | |
| | all levels of ploidy | |
| genpop {S4} (adegenet) | equivalent to genind at group level; ideal for multivariate analysis | requires more memory |
| genlight {S4} (adegenet) | stores binary SNPs using bit-level coding; very memory efficient | more computationally intensive to handle; fewer functionalities |
| | additional slots for individual data and population strata | |
| | all levels of ploidy | assumes bi-allelic loci |
| genclone {S4} (poppr) | inherits from genind; gains all its advantages | all the same weaknesses, plus slightly more memory |
| | stores multilocus genotype/lineage definitions (`@mlg` slot) for clonal populations | |
| snpclone {S4} (poppr) | inherits from genlight; gains all its advantages | all the same weaknesses, plus slightly more memory |
| | stores multilocus genotype/lineage definitions (`@mlg` slot) for clonal populations | |
| genambig {S4} (polysat) | stores microsatellite data with ambiguous ploidy | does not handle any other data type |
| | exports to genpop objects | cannot easily be transferred to any other object |
| phyDat {S3} (phangorn) | very general, inspired by R `data.frame`, `factor` and `contrasts`; can contain any discrete data type; nucleotides, amino acids and codons have some more support | designed with phylogenetic analysis in mind; requires alignments in which all sequences have the same length |
| | can be converted to and from DNAbin objects (`as.DNAbin` / `as.phyDat`) | |
| | a few generic functions work on it (`c`, `unique`, `subset`), as do utility functions such as `baseFreq` and `allSitePattern` | data are not necessarily very memory efficient (stored as integer + contrast matrix), but only unique site patterns and their weights (as double) are stored |
| | “efficient” maximum likelihood, maximum parsimony and distance functions in phangorn | |
| gtype {S3} (strataG) | a simple R `list` containing a `matrix` where the first column is a stratification scheme and the following columns are either haplotypes or diploid loci; for haploid data, the `gtype` object can also contain a list of DNA sequences | can likely be made more efficient in terms of storage and preprocessing for other analytical routines in the package |
| | can be converted to `data.frame` or `matrix` with the appropriate `as.` functions | |
| | has manipulation functions such as `subset`, which selects certain strata and/or loci, `merge` to combine multiple `gtype` objects, and `summary` | |
| | can create input files for Genepop, STRUCTURE, fastsimcoal, Arlequin, MEGA, and PHASE | |
| multiDNA {S4} (apex) | stores multiple DNAbin objects from ape | |
| multiPhyDat {S4} (apex) | stores multiple phyDat objects from phangorn | |
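
As a rough illustration of the conversions mentioned above, the sketch below moves one data set between three of the listed classes. It assumes that adegenet and pegas are installed and uses `nancycats`, an example genind object shipped with adegenet. The variable names (`gi`, `gp`, `lo`, `gi2`) are illustrative only, and this is one possible route rather than the recommended one.

```r
# Assumption: the adegenet and pegas packages are installed.
library(adegenet)
library(pegas)

# nancycats: example microsatellite data stored as a genind object.
data(nancycats)
gi <- nancycats

# genind -> genpop: allele counts aggregated at the population level.
gp <- genind2genpop(gi, quiet = TRUE)

# genind -> loci (pegas), then loci -> genind again.
lo  <- genind2loci(gi)
gi2 <- loci2genind(lo)

# Inspect the classes obtained at each step.
class(gp)
class(lo)
class(gi2)
```

A round trip like this may not be lossless (for example, missing data can be handled differently by each class), so it is worth checking the result before running an analysis on the converted object.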