Simple user guide
OgaraK is a population genetics simulator: it can be used to generate synthetic datasets but not to analyse data (though we supply a large number of sample analysis scripts). The data generated are the frequencies of genotypes in the parasite population. OgaraK facilitates data analysis by generating output in the Genepop format, which can be read by most population genetics data analysis programs, such as Arlequin. Because ogaraK in fact simulates an infinite population, the Genepop format (being based on a finite sample) cannot capture the genotype frequencies with full precision; ogaraK therefore also exports its results in a simple format with precise genotype frequencies over time. Here we describe the parameters for ogaraK and the output format, along with example scenarios.
The console is laid out as follows:
The left part is the most important, as it is where the user inputs the model parameters. The top right shows the expected relative cost of the simulations, plus status. The bottom right shows the result of the last execution in terms of frequency (black: sensitive, through to light red: multiple resistant); this is only indicative, as the generated data files are the fundamental artifact. The chart can be zoomed, saved, etc. (most operations are available by right-clicking on the chart).
Many example scenarios are available (please see the File/Open menu). They are named according to the following convention: EpistasisMOI or
FullEpistasis2 contains an example for full epistasis (see definition below) with an MOI of 2.
Full+Assym4 is Full Epistasis and Asymmetry (SP-based) with an MOI of 4.
2MFTFull+DGF3 is 2 drugs, multiple first-line therapies, Full Epistasis and Asymmetry with an MOI of 3.
2RotFull+DGF3 is 2 drugs, rotation, Full Epistasis and Asymmetry with an MOI of 3.
OgaraK provides an easy-to-use interface. The following parameters are available:
Both fitness penalty and drug usage can be specified as a single value or as a range. If a range is specified, more than one simulation will be run. For example, if fitness is specified as a range between 0 and 1 with a step of 0.1 and drug usage as a range between 0.1 and 0.8 with a step of 0.05, then 11x15 simulations will be run, with fitness being 0.0, 0.1, ..., 1.0 and drug usage being 0.1, 0.15, ..., 0.8. If single values are specified for both parameters, only one simulation will be run.
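The way two ranges expand into a batch of simulations can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example above, not ogaraK's own code; the helper name `value_range` is hypothetical, and exact decimal arithmetic is used so that the end points of each range are included.

```python
from decimal import Decimal
from itertools import product

def value_range(start, stop, step):
    """Return start, start+step, ..., stop (inclusive) without float drift."""
    current, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    values = []
    while current <= stop:
        values.append(float(current))
        current += step
    return values

# Ranges taken from the example in the text.
fitness_values = value_range("0", "1", "0.1")      # 0.0, 0.1, ..., 1.0
usage_values = value_range("0.1", "0.8", "0.05")   # 0.1, 0.15, ..., 0.8

# One simulation per (fitness penalty, drug usage) pair.
grid = list(product(fitness_values, usage_values))
print(len(fitness_values), len(usage_values), len(grid))  # 11 15 165
```

Counting the pairs before launching a batch gives a quick idea of how many simulations a given pair of ranges implies.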
It should be noted that the computational cost of running ogaraK simulations can vary widely. Users are advised to read the section below on computational cost before starting to simulate with ogaraK.
The configuration has to be saved to a file before the simulator can be run. Configuration files can easily be edited by hand, which is especially useful for batch-mode runs.
When running, the simulator writes all the simulations to disk in an internal format (in addition to the output formats). This makes it possible to reuse previous simulation results or to restart an interrupted batch of simulations. All data files are written to the same directory as the configuration file.
For every simulation, three files are written: one with a sample of individuals in Genepop format (each population representing a different generation), one with the frequency of all genotypes per generation, and one noting when drugs are rotated. The last file is only created when drug rotation policies are simulated. The files are named with the following prefix:
For example, freqMFT-21-005-03 is the prefix of files containing the results from an MFT policy with 2 drugs, 1 locus per drug, a fitness penalty of 0.05 and a drug usage of 0.3. The file name suffixes are .txt for Genepop files, .og for the frequencies and .sw for the swaps.
While the Genepop file adheres to a widely used standard and thus needs no description, we describe here the content of the frequency and swap files.
In some specific cases, a fourth file is written with LD (r). This only applies to scenarios with one drug and two loci. For other cases, this computation can easily be done with external population genetics analysis tools.
In the frequency file, each line holds the frequency of a certain genotype configuration. It consists of two columns: the genotype and the frequency. All genotypes are enumerated, and the process repeats for all generations simulated. Here is an example where a genome for 2 drugs and 1 locus per drug is simulated:
0 0.9
1 0.05
2 0.04
3 0.01
0 0.8
1 0.083
2 0.073
3 0.043
In this case 2 generations are shown: in the first, the fully sensitive genotype has a frequency of 90%, the one resistant to drug 1 has 5%, the one resistant to drug 2 has 4% and the multiple-resistant one has 1%.
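A frequency file of this kind can be read back with a short script. The following Python sketch assumes one whitespace-separated genotype/frequency pair per line and a known number of genotypes per generation (4 for 2 drugs with 1 locus each); the function name `parse_frequencies` is hypothetical and not part of ogaraK.

```python
def parse_frequencies(text, n_genotypes):
    """Return one {genotype: frequency} dict per generation."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), float(parts[1])))
    # Every block of n_genotypes consecutive rows is one generation.
    return [dict(rows[i:i + n_genotypes])
            for i in range(0, len(rows), n_genotypes)]

# The two generations from the example in the text.
sample = """0 0.9
1 0.05
2 0.04
3 0.01
0 0.8
1 0.083
2 0.073
3 0.043
"""
generations = parse_frequencies(sample, n_genotypes=4)
print(len(generations))    # 2
print(generations[0][0])   # 0.9 (fully sensitive, first generation)
```

To read a real .og file, pass the file's content (for example from `open(path).read()`) instead of `sample`.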
Swap files record when drugs are rotated. Their format is very simple: each line records when a drug is swapped, with the first column indicating the generation and the second the drug introduced. For example:
0 0
7 1
14 0
Drug 1 (coded as 0) is introduced in generation 0; rotation first happens at generation 7, and again at generation 14.
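As an illustration, a swap file can be turned into a lookup of which drug is in use at a given generation. This Python sketch assumes one whitespace-separated generation/drug pair per line, sorted by generation; `parse_swaps` and `drug_in_use` are hypothetical helper names, not ogaraK functions.

```python
def parse_swaps(text):
    """Return a list of (generation, drug) pairs from swap-file text."""
    swaps = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        swaps.append((int(parts[0]), int(parts[1])))
    return swaps

def drug_in_use(swaps, generation):
    """Return the drug code active at `generation`, or None before the first swap."""
    current = None
    for start, drug in swaps:  # assumes swaps are sorted by generation
        if start > generation:
            break
        current = drug
    return current

# The example from the text: drug 0 at generation 0, drug 1 at 7, drug 0 at 14.
swaps = parse_swaps("0 0\n7 1\n14 0\n")
print(drug_in_use(swaps, 6))    # 0
print(drug_in_use(swaps, 10))   # 1
print(drug_in_use(swaps, 14))   # 0
```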
Command-line (batch) mode
OgaraK can be run in command-line mode; please see the separate guide.
Complexity and computational cost
The number of different genotypes that must be tracked is 2^l, where l is the number of drugs times the number of loci per drug (or the number of drugs plus 1 in the case of the Artemisinin model). As per the formula above, all possible combinations of genotypes for each MOI have to be considered. The number of permutations is then (2^l)^MOI. This makes the calculation computationally very intensive: for example, studying a clonal multiplicity of 4 with 64 different genotypes (3 drugs with 2 loci per drug) requires considering about 16.8 million cases (64^4). The most extreme case theoretically allowed, with an MOI of 7, would require dealing with 4398046511104 cases (64^7), which is not feasible in practice. Furthermore, this computation has to be done for each environment (untreated and each epistasis mode per drug), for every generation and for every simulation (the number of simulations depending on the ranges of both fitness penalty and drug usage).
Therefore, the user will have to decide on the best compromise between the available computing resources and the choice of parameters (noting that different parameters have very different impacts on the computational time).
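To get a feel for how quickly the number of cases grows, the counts quoted above can be reproduced with a small Python sketch. It only restates the arithmetic from the text; the function names are hypothetical, and the Artemisinin model (where l is the number of drugs plus 1) is not covered.

```python
def genotype_count(n_drugs, loci_per_drug):
    """Number of genotypes: 2^l, with l = drugs x loci per drug."""
    return 2 ** (n_drugs * loci_per_drug)

def case_count(n_genotypes, moi):
    """Number of genotype combinations to consider for a given MOI."""
    return n_genotypes ** moi

genotypes = genotype_count(n_drugs=3, loci_per_drug=2)  # 64
for moi in range(1, 8):
    print(moi, case_count(genotypes, moi))
# MOI 4 gives 16777216 cases and MOI 7 gives 4398046511104.
```

Each extra unit of MOI multiplies the count by the number of genotypes, which is why MOI and the number of loci tend to dominate the cost.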
OgaraK tries, as far as possible, to facilitate running computationally intensive simulations by doing the following: