Mcheza

User guide

Mcheza shares most of its interface with LOSITAN, its counterpart for co-dominant markers. We keep both interfaces as similar as possible to make life easier for users who have to deal with both dominant (Mcheza) and co-dominant (LOSITAN) markers.

As such, the manual for LOSITAN is a good starting point for MCHEZA. LOSITAN is presented in two videos shown below.

Important note: There are some important differences between LOSITAN and MCHEZA. Please read the section on fundamental differences below the videos.

Common procedures

Advanced options

Mcheza specifics

Important differences! (READ!)

If you are going to read just a single paragraph, read this: In your genepop file, your recessive (null) allele has to be coded as 01 (or 001) and the other allele as 02 (or 002). Mcheza assumes that the recessive allele goes first in the coding. If you do not do this, your results will be wrong!

Before you load the data, you should specify the Theta, the Zhivotovsky (beta) priors and the critical frequency. Most probably the default values will suit you, but please be aware of these parameters (a description is found below). Please note that the critical frequency is used only to filter the simulated data. All markers will be shown in the final chart.

Differences

Mcheza offers functionality that is specific to dominant markers when compared with Lositan. New generic functionality is also included.

In theory you will provide a file with haploid data (presence of the dominant allele), but as the DFDIST procedure is only concerned with allele counts per population, Mcheza can also read "diploid" files, where each position accounts for a different individual. Feel free to use the representation that is most convenient to you. To make this clear, the following two cases are equal from Mcheza's point of view:

ind1, 01 02 01
ind2, 01 01 01
ind3, 02 01 01
and
ind1-2, 0101 0201 0101
ind3-0, 0200 0100 0100

Use the one that is easier for you.
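The equivalence of the two layouts can be checked by counting alleles per locus. The following is only a minimal sketch, not Mcheza's own parser: the function name allele_counts is hypothetical, two-digit allele codes are assumed, and 00 is assumed to mark missing data (the usual genepop convention).

```python
from collections import Counter

def allele_counts(lines, digits=2):
    """Count alleles per locus over all individuals of one population.

    Hypothetical helper for illustration; assumes `digits` characters per
    allele code and that an all-zero code (e.g. 00) means missing data.
    """
    counts = None
    for line in lines:
        _, genotypes = line.split(",", 1)
        fields = genotypes.split()
        if counts is None:
            counts = [Counter() for _ in fields]
        for locus, field in enumerate(fields):
            # Split each field into allele codes of `digits` characters.
            for i in range(0, len(field), digits):
                allele = field[i:i + digits]
                if int(allele) != 0:  # skip missing data
                    counts[locus][allele] += 1
    return counts

haploid = ["ind1, 01 02 01", "ind2, 01 01 01", "ind3, 02 01 01"]
diploid = ["ind1-2, 0101 0201 0101", "ind3-0, 0200 0100 0100"]

same = allele_counts(haploid) == allele_counts(diploid)
```

Both inputs yield the same per-locus counts, so `same` is True; the padding 00 in the last "diploid" line is simply ignored.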

Mcheza is also very slow when simulating negative average Fsts. In such cases, automated discovery of the neutral Fst is turned off (as it is too expensive). Manual approximation is, of course, still possible.

Mcheza's console is similar to Lositan's.

The main differences can be seen on the bottom left and center input consoles. They include:

  • The False Discovery Rate. This lets you set the false discovery rate as specified in Benjamini, Y., Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing, J R Statist Soc B, 57, 289-300.
  • Estimation of allele frequencies. This is not trivial with dominant markers. We use the approach in Zhivotovsky,L.A. (1999) Estimating population structure in diploids with multilocus dominant DNA markers, Molecular Ecology, 8, 907-913. This approach requires the parameters of the beta distribution. Both at 0.25 seem to be reasonable values. We also include a critical frequency, i.e. a frequency for the most common allele above which the locus is ignored. 0.99 seems a good value.
  • Theta. Theta is two times the mutation rate (per site per generation) times the number of heritable units in the population.
  • Sample size. Here you have 3 choices: (i) Supply a value, which will be assumed equal in all loci. (ii) Use the supplied default. (iii) Enter 0 and Mcheza will estimate a value per locus. You will probably want to do (iii) (or simply accept the default).
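To illustrate what the False Discovery Rate setting controls, here is a minimal sketch of the Benjamini-Hochberg step-up procedure from the paper cited above. It is not taken from Mcheza's code: the function name is hypothetical, and how Mcheza derives a p-value for each locus is not covered here.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return one boolean per test: True where the test is rejected.

    Illustrative implementation of the Benjamini-Hochberg (1995)
    step-up procedure; the name and signature are assumptions.
    """
    m = len(p_values)
    # Indices of the tests, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank
    # Reject every test ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

flags = benjamini_hochberg([0.04, 0.001, 0.03, 0.5], fdr=0.05)
```

With these four hypothetical p-values and an FDR of 0.05, only the second test (p = 0.001) is rejected. A lower FDR flags fewer loci as candidates for selection, while a higher one flags more at the cost of more false positives.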

Algorithm details

Force mean Fst

Mcheza allows approximating average simulated Fst to the average value found in the real dataset even when the experimental conditions are far from the ones where the theoretical formula

Fst = 1/(4Nm + 1)

holds (e.g. low number of demes or the usage of the stepwise mutation model, common in microsatellite markers).
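As a quick numerical illustration of the formula above, the sketch below converts between Nm and the theoretical Fst. The function names are hypothetical and the code is only a calculator for the formula, not part of Mcheza.

```python
def fst_from_nm(nm):
    """Theoretical Fst for a given Nm, using Fst = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)

def nm_from_fst(fst):
    """Invert the same formula: Nm = (1 / Fst - 1) / 4, for 0 < Fst <= 1."""
    return (1.0 / fst - 1.0) / 4.0

nm_for_target = nm_from_fst(0.08)  # Nm implied by a target Fst of 0.08
```

For a target Fst of 0.08 the formula gives Nm = 2.875. This is only the theoretical starting point; when the conditions listed above do not hold, the simulated average Fst will differ from the target.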
To approximate the average Fst in conditions far from the theoretical optimum, Mcheza starts by running dfdist for 10,000 realizations using the theoretical value and calculating the average simulated Fst. If that value is too far from the real average Fst, Mcheza uses a bisection approximation algorithm, running 10,000 realizations for every tentative bisection point. The algorithm works by iteratively slicing the interval of possible Fst values (i.e., between 0 and 1) in half and trying the mean of the current bounds at each iteration (with the exception of the first iteration, where one of the extremes is chosen). An example makes the approach clearer:
In a certain demographic scenario we want to simulate a neutral Fst of 0.08. The algorithm starts by trying 0.08. If the result is higher than desired, then 0.0 will be tried (creating an absolute lower bound). After that, 0.04 (i.e. (0.0 + 0.08)/2) will be tried. If the result is too low, 0.06 will be used next (i.e. (0.04 + 0.08)/2). The process repeats until the error margin is acceptable.
In practical terms the method was able to converge to the desired value in all cases tested (a completely trivial bisection approach is not possible as the method for computing Fst is stochastic and results might vary for the same input conditions).
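The search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Mcheza's implementation: simulate is a hypothetical stand-in for a batch of dfdist realizations that returns the average simulated Fst, the tolerance and iteration limit are invented values, and the sketch takes 0 and 1 as the initial bounds without simulating the extreme itself.

```python
import random

def force_mean_fst(target, simulate, tolerance=0.005, max_iterations=50):
    """Search for an input Fst whose simulated mean Fst is near `target`.

    `simulate(fst)` is a hypothetical callable standing in for a batch of
    dfdist realizations. Because it is stochastic, the iteration count is
    capped instead of relying on guaranteed convergence.
    """
    low, high = 0.0, 1.0
    guess = target  # the first try is the theoretical value
    for _ in range(max_iterations):
        simulated = simulate(guess)
        if abs(simulated - target) <= tolerance:
            return guess
        if simulated > target:
            high = guess  # result too high: search below this input
        else:
            low = guess   # result too low: search above this input
        guess = (low + high) / 2
    return guess

# Toy stand-in: the simulated mean is 25% above the input, plus small noise.
rng = random.Random(42)

def toy_simulation(fst):
    return 1.25 * fst + rng.gauss(0.0, 0.0002)

estimate = force_mean_fst(0.08, toy_simulation, tolerance=0.002)
```

With a target of 0.08 and a simulation that overshoots, the inputs tried are 0.08, 0.04, 0.06 and so on, matching the worked example. The iteration cap is one simple way to cope with the stochastic results mentioned above.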

Neutral mean Fst

The initial mean dataset Fst is often not neutral in the sense that (initially unknown) selected loci are often included in the computation. Mcheza can optionally be run once to determine a first candidate subset of selected loci in order to remove them from the computation of the neutral Fst. This value will be, in most cases, a better approximation of the neutral Fst (Beaumont & Nichols 1996). The procedure works as follows: Mcheza is run a first time, using all loci to estimate the mean neutral Fst. After the first run, all loci that are outside the desired confidence interval (e.g. 99% CI) are removed and the mean neutral Fst is computed again using only the putatively neutral loci that remain. A second and final run of Mcheza, using all loci, is then conducted using the last computed mean. This procedure lowers the bias in the estimate of the mean neutral Fst by removing the most extreme loci from the estimation. Naturally, all loci are present in the last run and have their estimated selection status reported.
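The two-pass procedure can be outlined in a few lines. The sketch below makes several assumptions that are not in the text: the mean Fst is taken as a plain average of per-locus values, find_outliers is a hypothetical callable standing in for a full simulation run with confidence intervals, and the toy outlier rule is invented purely so the example runs.

```python
def neutral_mean_fst(locus_fst, find_outliers):
    """Two-pass estimate of the neutral mean Fst.

    `locus_fst` maps locus names to Fst values. `find_outliers(locus_fst,
    mean_fst)` is a hypothetical stand-in for one run that returns the set
    of loci falling outside the confidence interval.
    """
    # First run: mean over all loci, then flag candidate selected loci.
    first_mean = sum(locus_fst.values()) / len(locus_fst)
    candidates = find_outliers(locus_fst, first_mean)

    # Recompute the mean using only the putatively neutral loci.
    neutral = [v for k, v in locus_fst.items() if k not in candidates]
    if not neutral:
        raise ValueError("no putatively neutral loci left")
    neutral_mean = sum(neutral) / len(neutral)

    # Final run: all loci are tested again against the neutral mean.
    final_outliers = find_outliers(locus_fst, neutral_mean)
    return neutral_mean, final_outliers

def toy_outliers(locus_fst, mean_fst):
    # Invented rule for illustration only: flag loci far above the mean.
    return {k for k, v in locus_fst.items() if v > 2.5 * mean_fst}

data = {"L1": 0.05, "L2": 0.07, "L3": 0.06, "L4": 0.42}
mean, outliers = neutral_mean_fst(data, toy_outliers)
```

In this toy dataset the first mean (0.15) is inflated by locus L4; once L4 is set aside the neutral mean drops to about 0.06, and L4 is still reported in the final pass because all loci are tested again.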

Combined algorithms

Both algorithms can be combined. There will be a first pass to determine the first neutral set, and that first pass will try to force the mean Fst. After the initial neutral set is determined, the Fst from this set is computed and, again, the mean Fst will be forced.

Mcheza means dance in Swahili.

Questions? Contact us!