Simple user guide

OgaraK is a population genetics simulator: it can be used to generate synthetic datasets but not to analyse data (though we supply a large number of sample analysis scripts). The data generated are the frequencies of genotypes in the parasite population. OgaraK facilitates data analysis by generating output in the Genepop format, which can be read by most population genetics analysis programs, e.g. Arlequin. Because ogaraK in fact simulates an infinite population, the Genepop format (being based on a finite sample) cannot capture the genotype frequencies with full precision; ogaraK therefore also exports its results in a simple format with precise genotype frequencies over time. Here we describe the parameters of ogaraK and the output formats, along with example scenarios.

The console looks like this:

The left part is the most important, as it is where the user inputs the model parameters. The top right shows the expected relative cost of the simulations, plus status. The bottom right shows the result of the last execution in terms of frequency (from black, sensitive, to light red, multiple resistant); this is only indicative, as the generated data files are the fundamental artifact. The chart can be zoomed, saved, etc. (most operations are available by right-clicking on the chart).

Example scenarios

Many example scenarios are available (please see the File/Open menu). They are named according to the following convention:

EpistasisMOI or
NumberOfDrugsPolicyEpistasisMOI

Examples:

FullEpistasis2 contains an example for full epistasis (see definition below) with a MOI of 2.

Full+Assym4 is Full Epistasis and Asymmetry (SP-based) with MOI 4.

2MFTFull+DGF3 is 2 drugs, multiple first line therapies, Full Epistasis plus DGF with MOI of 3.

2RotFull+DGF3 is 2 drugs, rotation, Full Epistasis plus DGF with MOI of 3.

Parameters

OgaraK provides an easy-to-use interface. The following parameters are available:
Number of drugs: The number of drugs used.
Drug type: Either "Standard" or "Artemisinin". In standard drugs, loci involved in drug resistance are independent from drug to drug. In artemisinin drugs, there are always two loci per drug and one locus is shared among all drugs.
Loci per drug: Number of loci per drug. Always two if the drug type is Artemisinin.
Epistasis: Either "Full epistasis", "Duplicate gene function" (DGF), "Full epistasis plus DGF" or "Full epistasis plus asymmetry". For Artemisinin-based drugs only "Full epistasis" is expected to be used, but the user can test alternative models.
Policy: Either "Rotation", "Multiple first line" or "Combination".
Rotation: If the policy is rotation, the user can specify the type of rotation: either "Fixed", i.e., every n generations, n being user specified, or "Dynamic", where a drug is rotated after a certain resistance threshold for the drug being used is reached.
Infectivity: Either competitive release or independent transmission.
Fitness penalty: The type of penalty that will be computed, the most commonly used being "Erythrocytic". Here any fitness penalty is modelled as competition in the blood stages, which means that the penalty only has an effect when the MOI is bigger than 1 (i.e., there is more than one infection competing in the blood). The "Exo-erythrocytic" penalty exists mainly to model a penalty at the transmission and liver stages, i.e., it is MOI independent.
Penalty value: The fitness penalty incurred per mutation (multiplicative). Specified as a range (see below).
Drug usage: The percentage of infected humans that are treated. Untreated hosts function as sensitive reservoirs. Also specified as a range.
MOI: The Multiplicities of Infection in the simulation. The user can specify from a MOI of 1 (selfing) to 7. The proportions assigned to the MOIs must add up to 1. E.g. 0.25 of MOI 1 and 0.75 of MOI 2 means that 25% of all humans will have only one infection at a certain point in time, and the other 75% will have two.
Max generations: The maximum number of generations for which each simulation can be run. A simulation can stop for two reasons: either at the limit specified here or if the threshold of resistance is reached (see below).
Threshold: The value of spread of resistance after which the simulation stops. It is normally not interesting to run the simulation above certain levels of resistance, as beyond those levels the drugs are not used anymore (they have long lost efficacy).
Initial frequency: The initial frequency of the resistant forms.
Sample size: The sample size for the Genepop file. While the model is not individual based, a sample of every generation simulated can be produced. The sample size is approximate (it can be slightly lower than the value requested) in order to ensure a rounded approximation of the existing population.
Resistance introduction: Activates mutation.
Prob. introduction: The probability of introduction of a resistant form per generation. Each locus will have the same probability, specified here.
Percent replaced: The percentage of sensitive alleles that will be replaced with a resistant version.
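
As an illustration of the MOI parameter, the following sketch checks that a set of MOI proportions is valid (MOIs between 1 and 7, proportions adding up to 1) and computes the resulting mean MOI. The function name and the dictionary layout are hypothetical and are not part of ogaraK; they only restate the rule given above.

```python
import math


def check_moi_distribution(proportions):
    """Validate a mapping {MOI: proportion of hosts} and return the mean MOI.

    Hypothetical helper: ogaraK itself does not expose this function.
    """
    for moi, share in proportions.items():
        if not 1 <= moi <= 7:
            raise ValueError(f"MOI {moi} is outside the supported range 1..7")
        if share < 0:
            raise ValueError(f"negative proportion for MOI {moi}")
    total = sum(proportions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"MOI proportions add up to {total}, not 1")
    return sum(moi * share for moi, share in proportions.items())


# The example from the text: 25% of hosts with MOI 1, 75% with MOI 2.
print(check_moi_distribution({1: 0.25, 2: 0.75}))  # 1.75
```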

Both fitness penalty and drug usage can be specified as a single value or as a range. If specified as a range, then more than one simulation will be run. Example: fitness penalty is specified as a range between 0 and 1 with a step of 0.1, and drug usage is specified as a range between 0.1 and 0.8 with a step of 0.05. This means that 11x15 = 165 simulations will be done, with fitness penalty being 0.0, 0.1, ..., 1.0 and drug usage being 0.1, 0.15, ..., 0.8. If single values are specified for both parameters then only one simulation will be made.
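
The size of such a parameter grid can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the code ogaraK uses; the helper name value_range is made up for the example, and Decimal is used so that the steps are not affected by binary floating point rounding.

```python
from decimal import Decimal
from itertools import product


def value_range(start, stop, step):
    """Inclusive list of Decimal values from start to stop (hypothetical helper)."""
    start, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current += step
    return values


penalties = value_range("0", "1", "0.1")     # 0.0, 0.1, ..., 1.0
usages = value_range("0.1", "0.8", "0.05")   # 0.1, 0.15, ..., 0.8
runs = list(product(penalties, usages))      # one (penalty, usage) pair per simulation
print(len(penalties), len(usages), len(runs))  # 11 15 165
```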

It should be noted that the computational cost of running ogaraK simulations can vary widely. Users are advised to read the section below on computational cost before starting to simulate with ogaraK.

The configuration has to be saved to a file before the simulator can be run. Configuration files can easily be edited by hand, which is especially useful for batch mode runs.

When running, the simulator writes all the simulations to disk in an internal format (in addition to the output formats). This makes it possible to reuse previous simulation results or to restart a batch of simulations that was interrupted. All data files are written to the same directory as the configuration file.

Output

For every simulation, three files are written: a file with a sample of individuals in Genepop format (each population representing a different generation), a file with the frequency of all genotypes per generation, and a file noting when drugs are rotated. The last file is only created when drug rotation policies are simulated. The files are named with the following prefix:

freqPOLICY-NUMDRUGSLOCIDRUG-PENALTY-DRUGUSAGE

E.g. freqMFT-21-005-03 is the prefix of files containing the results from an MFT policy with 2 drugs, 1 locus per drug, a fitness penalty of 0.05 and a drug usage of 0.3. The file name suffixes are .txt for Genepop files, .og for the frequencies and .sw for the swaps.
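
When post-processing many result files, it can be handy to recover the parameters from the prefix. The sketch below is based only on the single example above and makes two assumptions that are not confirmed by this guide: the number of drugs and the loci per drug are one digit each, and decimal values are written without the decimal point after the first digit (005 for 0.05, 03 for 0.3). The function names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed layout: freq<POLICY>-<drugs digit><loci digit>-<penalty>-<usage>
PREFIX = re.compile(
    r"^freq(?P<policy>[A-Za-z]+)-(?P<drugs>\d)(?P<loci>\d)"
    r"-(?P<penalty>\d+)-(?P<usage>\d+)$"
)


def decode_fraction(digits):
    """Assumption: '005' stands for 0.05, i.e. the point follows the first digit."""
    return Decimal(digits[0] + "." + (digits[1:] or "0"))


def parse_prefix(prefix):
    """Split an output file prefix into its parameters (hypothetical helper)."""
    match = PREFIX.match(prefix)
    if match is None:
        raise ValueError(f"unrecognised prefix: {prefix}")
    return {
        "policy": match["policy"],
        "num_drugs": int(match["drugs"]),
        "loci_per_drug": int(match["loci"]),
        "penalty": decode_fraction(match["penalty"]),
        "drug_usage": decode_fraction(match["usage"]),
    }


info = parse_prefix("freqMFT-21-005-03")
print(info["policy"], info["num_drugs"], info["loci_per_drug"])  # MFT 2 1
print(info["penalty"], info["drug_usage"])  # 0.05 0.3
```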

While the Genepop file adheres to a widely used standard and thus needs no description, we describe here the content of the frequency and swap files.

In some specific cases, a fourth file is written with LD (r). This only applies to scenarios with one drug and two loci. For other cases, this computation can easily be done with external population genetics analysis tools.

Frequency files

Each line has the frequency of a certain genotype configuration and consists of two columns: the genotype and the frequency. All genotypes are enumerated, and the process repeats for every generation simulated. Here is an example where a genome for 2 drugs and 1 locus per drug is simulated:

0 0.9
1 0.05
2 0.04
3 0.01
0 0.8
1 0.083
2 0.073
3 0.043

In this case 2 generations are shown: in the first, the fully sensitive genotype has a frequency of 90%, the one resistant to drug 1 has 5%, the one resistant to drug 2 has 4% and the multiple-resistant one has 1%.
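
A frequency file with this layout can be split back into generations with a short script. The sketch below assumes that every generation lists all 2^l genotypes (l being the total number of loci) and, based on the example above, that the genotype number is a bit mask in which bit i marks resistance at locus i. Both function names are hypothetical and the bit mask reading is an inference from the two-locus example, not a documented guarantee.

```python
def read_frequencies(lines, num_loci):
    """Return one {genotype: frequency} dict per generation."""
    per_generation = 2 ** num_loci
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % per_generation:
        raise ValueError("file does not contain whole generations")
    generations = []
    for start in range(0, len(rows), per_generation):
        chunk = rows[start:start + per_generation]
        generations.append({int(g): float(f) for g, f in chunk})
    return generations


def resistant_loci(genotype, num_loci):
    """Assumed encoding: bit i of the genotype number is set if locus i is resistant."""
    return [i for i in range(num_loci) if genotype >> i & 1]


example = """0 0.9
1 0.05
2 0.04
3 0.01
0 0.8
1 0.083
2 0.073
3 0.043""".splitlines()

generations = read_frequencies(example, num_loci=2)
print(len(generations))                      # 2
print(generations[0][0], generations[1][3])  # 0.9 0.043
print(resistant_loci(3, 2))                  # [0, 1]
```

To read a real file, pass the lines of the .og file, e.g. open(path).read().splitlines().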

Swap files

Swap files record when drugs are rotated. Their format is very simple: each line records one swap, with the first column indicating the generation and the second the drug introduced. Example:

0 0
7 1
14 0

Drug 1 (coded as 0) is introduced in generation 0; rotation first happens at generation 7, and again at generation 14.
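
The swap file can be read in the same spirit. The following sketch parses the two columns and looks up which drug is in use at a given generation; it assumes that lines are ordered by generation, as in the example, and the function names are made up for illustration.

```python
def read_swaps(lines):
    """Return a list of (generation, drug code) pairs from swap file lines."""
    swaps = []
    for line in lines:
        if line.strip():
            generation, drug = line.split()
            swaps.append((int(generation), int(drug)))
    return swaps


def drug_in_use(swaps, generation):
    """Drug code active at a generation; assumes swaps are sorted by generation."""
    current = None
    for start, drug in swaps:
        if start <= generation:
            current = drug
    return current


swaps = read_swaps(["0 0", "7 1", "14 0"])
print(drug_in_use(swaps, 3))   # 0
print(drug_in_use(swaps, 10))  # 1
print(drug_in_use(swaps, 14))  # 0
```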

Command-line (batch) mode

OgaraK can be run in command-line mode; please see the separate guide.

Software issues

Complexity and computational cost

The number of different genotypes that must be tracked is equal to 2^l, where l is the number of drugs times the number of loci per drug (or the number of drugs plus 1 in the case of the Artemisinin model). As per the formula above, all possible combinations of genotypes for each MOI have to be considered. The number of permutations is then (2^l)^MOI. This makes the calculation computationally very intensive: as an example, studying a clonal multiplicity of 4 with 64 different genotypes (3 drugs with 2 loci per drug) requires considering more than 16 million cases (64^4 = 16777216). The most extreme case theoretically allowed, with a MOI of 7, would require dealing with 4398046511104 cases (which is not feasible in practice). Furthermore, this computation has to be done for each environment (untreated and each epistasis mode per drug), for every generation and for every simulation (the number of simulations being dependent on the ranges of both fitness penalty and drug usage).
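
The figures in this paragraph can be reproduced with a short calculation. The sketch reads the number of cases as (2^l)^MOI, which matches both numbers quoted above; the function names are hypothetical, and this is not the cost estimate shown in the ogaraK console.

```python
def genotype_count(num_drugs, loci_per_drug, artemisinin=False):
    """Number of genotypes, 2^l, with l defined as in the text."""
    loci = num_drugs + 1 if artemisinin else num_drugs * loci_per_drug
    return 2 ** loci


def case_count(genotypes, moi):
    """Number of genotype combinations to consider for one MOI."""
    return genotypes ** moi


genotypes = genotype_count(num_drugs=3, loci_per_drug=2)
print(genotypes)                 # 64
print(case_count(genotypes, 4))  # 16777216
print(case_count(genotypes, 7))  # 4398046511104
```

Because the MOI is an exponent, going from a MOI of 4 to a MOI of 7 with 64 genotypes multiplies the number of cases by 64^3 = 262144.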

Therefore, the user will have to decide on the best compromise between available computing resources and the parameter choice (noting that different parameters have a completely different impact on the computational time).

OgaraK tries as much as possible to facilitate running computationally intensive simulations by doing the following:

  1. A relative value of computational cost is presented to the user. Whenever the user changes parameters, an estimate of the relative cost can be made. While not being related to any physical measure of time, it gives a good relative comparison of time among different parameter choices.
  2. Simulations are independent and can be run concurrently over a certain range of fitness and drug usage. OgaraK will generate different file names based on the parameters chosen, facilitating concurrent runs.
  3. OgaraK can be run in batch mode (allowing multiple runs to be started automatically by a script on a single computer or on a grid), using a configuration file generated by the UI. This can simply be done by calling java malaria.Article path_to_database path_to_config_file (see the command line guide for more information).
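
A batch of such runs could be driven by a small script like the one below. Only the command java malaria.Article path_to_database path_to_config_file comes from this guide; everything else is an assumption made for the sketch: that the ogaraK classes are on the Java classpath, that several processes may safely share one database path, and the helper names build_command and run_batch. Please check the command line guide before relying on it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(database, config):
    """Command line for one batch run, as given in this guide."""
    return ["java", "malaria.Article", str(database), str(config)]


def run_batch(database, configs, workers=2):
    """Run one ogaraK process per configuration file, a few at a time.

    Returns the exit code of each process, in the order of configs.
    Assumes java and the ogaraK classes are available on this machine.
    """
    commands = [build_command(database, config) for config in configs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(subprocess.run, commands))
    return [result.returncode for result in results]


# Placeholder paths, shown without launching anything.
print(build_command("path_to_database", "path_to_config_file"))
```

Calling run_batch("path_to_database", ["a.conf", "b.conf"]) with real paths would start the simulations; the configuration file names here are placeholders.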
