## Mcheza
## User guide

Mcheza shares most of its interface with LOSITAN, its counterpart for co-dominant markers. We have tried to keep the two interfaces as similar as possible, to make life easier for users who work with both dominant (Mcheza) and co-dominant (LOSITAN) markers. As such, the LOSITAN manual is a good starting point for Mcheza. LOSITAN is also presented in two videos.
## Common procedures

## Advanced options

## Mcheza specifics

## Important differences! (READ!)

If you are going to read just a single paragraph, read this: in your genepop file, the recessive (null) allele has to be coded as 01 (or 001) and the other allele as 02 (or 002). Mcheza assumes that the recessive allele comes first in the coding. If you do not follow this convention, your results will be wrong!

Before you load the data, you should specify Theta, the Zhivotovsky (beta) priors and the critical frequency. The default values will most probably suit you, but please be aware of these parameters (they are described below).

## Differences

Mcheza offers functionality that is specific to dominant markers when compared with LOSITAN, as well as some new generic functionality. In principle you will provide a file with haploid data (presence of the dominant allele), but as the DFDIST procedure is only concerned with allele counts per population, Mcheza can also read "diploid" files, where each of the two positions in a genotype stands for a different individual. Use whichever representation is most convenient for you. To make this clear, the following two cases are equivalent from Mcheza's point of view:

```
ind1, 01 02 01
ind2, 01 01 01
ind3, 02 01 01
```

and

```
ind1-2, 0101 0201 0101
ind3-0, 0200 0100 0100
```

Here `ind1-2` packs individuals 1 and 2 into one "diploid" genotype, and `ind3-0` pairs individual 3 with a missing (`00`) allele.

Mcheza is also very slow when simulating negative average Fst values. In such cases the automated discovery of the neutral Fst is turned off, as it is too expensive; manual approximation is, of course, still possible.

Mcheza's console is similar to LOSITAN's. The main differences are in the input panels on the bottom left and center, and include:

- The False Discovery Rate. This sets the false discovery rate used to correct for multiple testing, as specified in Benjamini, Y., Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing, J R Statist Soc B, 57, 289-300.
- Estimation of allele frequencies. This is not trivial with dominant markers. We use the approach of Zhivotovsky, L.A. (1999) Estimating population structure in diploids with multilocus dominant DNA markers, Molecular Ecology, 8, 907-913. This approach requires the two parameters of the beta prior distribution; setting both to 0.25 seems reasonable. We also include a critical frequency, i.e. a frequency of the most common allele above which the locus is ignored; 0.99 seems a good value.
- Theta. Theta is two times the mutation rate μ (per site per generation) times the number N of heritable units in the population, i.e. θ = 2Nμ.
- Sample size. Here you have three choices: (i) supply a value, which will be assumed equal for all loci; (ii) use the supplied default; (iii) enter 0, and Mcheza will estimate a value per locus. You will probably want option (iii), or simply accept the default.
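The claim that the haploid and "diploid" layouts shown earlier are interchangeable can be checked by counting alleles per locus, which is all DFDIST uses. The sketch below is only an illustration, not Mcheza's parser: it assumes two-digit allele codes, a single population, `00` as missing data and lines of the form `label, genotype genotype ...`; the function names are hypothetical.

```python
from collections import Counter


def parse_line(line):
    """Split 'label, g1 g2 ...' into (label, [g1, g2, ...])."""
    label, genotypes = line.split(",", 1)
    return label.strip(), genotypes.split()


def allele_counts(lines, width=2):
    """Count allele codes per locus for one population.

    Each genotype string is cut into `width`-digit allele codes, so a
    haploid entry ('01') contributes one allele and a "diploid" entry
    ('0201') contributes two. The all-zero code (missing) is skipped.
    """
    counts = []
    for line in lines:
        _label, genotypes = parse_line(line)
        if not counts:
            counts = [Counter() for _ in genotypes]
        for locus, genotype in enumerate(genotypes):
            for i in range(0, len(genotype), width):
                allele = genotype[i:i + width]
                if allele != "0" * width:
                    counts[locus][allele] += 1
    return counts


haploid = ["ind1, 01 02 01", "ind2, 01 01 01", "ind3, 02 01 01"]
diploid = ["ind1-2, 0101 0201 0101", "ind3-0, 0200 0100 0100"]

print(allele_counts(haploid) == allele_counts(diploid))  # True
```

With the recessive-first convention, the `01` count at each locus is the number of individuals showing the null phenotype.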
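How Mcheza applies the false discovery rate to its per-locus tests is not detailed here. As a reference point, the standard Benjamini-Hochberg step-up procedure cited in the list above can be sketched as follows; the function name and the assumption that it works on a plain list of per-locus p-values are ours.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Step-up procedure: sort the m p-values, find the largest rank k with
    p_(k) <= k * fdr / m, and reject the k smallest p-values (even if some
    of them individually exceed their own threshold).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.001, 0.02, 0.04, 0.3], fdr=0.05))
# [True, True, False, False]
```

The step-up character matters: for `[0.01, 0.03, 0.035, 0.9]`, the second p-value fails its own threshold (0.025) but is still rejected, because the third passes its threshold (0.0375).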
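A Bayesian allele-frequency estimate for dominant markers, in the spirit of the Zhivotovsky (1999) approach listed above, can be illustrated with the sketch below. It is our own illustration, not Mcheza's code: it assumes a Beta(alpha, beta) prior on the null-allele frequency q and Hardy-Weinberg proportions, and returns the exact posterior mean (Mcheza may instead use Zhivotovsky's approximate formulas or a different parameterization). The names `null_allele_frequency` and `keep_locus` are hypothetical, and the critical-frequency check assumes it is applied to a single frequency estimate per locus.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def null_allele_frequency(x, n, alpha=0.25, beta=0.25):
    """Posterior mean of the null (recessive) allele frequency q.

    x: individuals showing the null phenotype (coded 01) in one population
    n: individuals scored at the locus in that population (x <= n)
    Assumed model: q ~ Beta(alpha, beta) a priori and Hardy-Weinberg
    proportions, so the likelihood is (q**2)**x * (1 - q**2)**(n - x).
    Since 1 - q**2 = (1 - q)(1 + q), expanding (1 + q)**(n - x) with the
    binomial theorem turns both posterior integrals into sums of Beta
    functions, which are evaluated in log space to avoid overflow.
    """
    m = n - x
    log_norm, log_moment = [], []
    for k in range(m + 1):
        log_binom = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
        log_norm.append(log_binom + log_beta(2 * x + alpha + k, m + beta))
        log_moment.append(log_binom + log_beta(2 * x + alpha + k + 1, m + beta))
    return math.exp(log_sum_exp(log_moment) - log_sum_exp(log_norm))


def keep_locus(q, critical=0.99):
    """Ignore a locus whose most common allele exceeds the critical frequency."""
    return max(q, 1.0 - q) <= critical


# With no data the estimate falls back to the prior mean alpha / (alpha + beta).
print(null_allele_frequency(0, 0))
# Hypothetical counts: 25 of 100 individuals lack the band, so q is near 0.5.
q = null_allele_frequency(25, 100)
print(round(q, 3), keep_locus(q))
```

Unlike the naive estimate sqrt(x / n), the posterior mean stays well defined when no individual (or every individual) shows the null phenotype.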
## Algorithm details
Fst = 1/(4Nm + 1) holds (e.g. a low number of demes, or the stepwise mutation model common for microsatellite markers). To approximate the average Fst in conditions far from the theoretical optimum, Mcheza starts by running DFDIST for 10,000 realizations using the theoretical value and computing the average simulated Fst. If this value is too far from the real average Fst, Mcheza uses a bisection approximation algorithm, running 10,000 realizations for every tentative bisection point. The algorithm iteratively halves the interval of possible Fst values (i.e. between 0 and 1), trying the mean of the current bounds at each iteration, except for the first iteration, where one of the extremes is tried. An example makes the approach clearer: in a certain demographic scenario we want to simulate a neutral Fst of 0.08. The algorithm starts by trying 0.08. If the result is higher than desired, 0.0 is tried next (establishing an absolute lower bound), followed by 0.04, i.e. (0.0 + 0.08)/2. If that result is too low, 0.06, i.e. (0.04 + 0.08)/2, is tried next, and the process repeats until the error is within an acceptable margin. In practice, the method converged to the desired value in all cases tested. (A completely trivial bisection approach is not possible, as Fst is computed by stochastic simulation and results may vary for the same input conditions.)
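The bisection search described above can be sketched as follows. This is an illustration under stated assumptions, not Mcheza's source: `simulate` is a hypothetical stand-in for one batch of DFDIST realizations (10,000 in Mcheza) that returns the average simulated Fst, and `tolerance` and `max_iterations` are assumed parameters, since the acceptable error margin is not specified here. The guard for negative targets mirrors the statement that automated discovery is turned off in that case.

```python
def calibrate_fst(target, simulate, tolerance=0.001, max_iterations=50):
    """Search for the simulation Fst whose simulated average is near `target`.

    `simulate(fst)` is a hypothetical callable: one batch of simulations
    run at `fst`, returning the average simulated Fst.
    """
    if target < 0:
        # Automated discovery is not attempted for negative average Fst.
        raise ValueError("negative target Fst: approximate it manually")

    def close_enough(value):
        return abs(value - target) <= tolerance

    # First iteration: try the target (theoretical) value itself.
    candidate = target
    simulated = simulate(candidate)
    if close_enough(simulated):
        return candidate
    # Then try the extreme on the side the result points to (0.0 or 1.0).
    if simulated > target:
        low, high = 0.0, candidate
        candidate = low
    else:
        low, high = candidate, 1.0
        candidate = high
    if close_enough(simulate(candidate)):
        return candidate
    # Afterwards, halve the interval at each step by trying its midpoint.
    for _ in range(max_iterations):
        candidate = (low + high) / 2
        simulated = simulate(candidate)
        if close_enough(simulated):
            return candidate
        if simulated > target:
            high = candidate
        else:
            low = candidate
    return candidate  # give up and return the last point tried


# Toy deterministic simulator that overshoots by 25%; it reproduces the
# sequence from the example: 0.08, 0.0, 0.04, 0.06, ...
result = calibrate_fst(0.08, lambda fst: 1.25 * fst)
print(result)
```

With a real, stochastic simulator, repeated runs at the same Fst give slightly different averages, which is why the search stops at a tolerance instead of requiring an exact match; the sketch also assumes the simulated average grows with the input Fst.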
Mcheza means dance in Swahili. Questions? Contact us!