Extension manual

Important note: the extension system is targeted at computer-savvy users. If you just want to use m4s2, please check the User guide.

Introduction

Here we explain how to extend m4s2 demography models. OgaraK supports a set of demographies which, although they cover most of the demographies we have found in publications, are far from covering all possible cases (which are, obviously, limited only by one's imagination). We therefore provide an extension system that allows new models to be incorporated into m4s2.

The extension system was developed with the following in mind: although m4s2 can be used by people with little computer experience, the extension system is targeted at more computer-savvy users. The idea is that such users create new models and make them publicly available on the web for everyone to use. m4s2 is able to import models directly from the web (we provide one example here). A user extending m4s2 needs to know how to code Simcoal2 models, how to use a simple embedded language described below and, in extreme cases, how to code in Python. We expect that, in practice, most extension models will be very easy to code, but the full expressiveness of Python is available for the most complex cases.

This text is organized around examples. We present 3 examples, in increasing order of difficulty, introducing concepts, small domain-specific languages and Python extensibility. The last part of the document concerns deploying models on the web. If you like to learn by looking at a working case, you can look at our own extensions, available here.

Simple example

We start with the simplest example possible: a single, constant-size population. The parameters for this model are simply the population size and the sample size.

Three artifacts have to be provided: an image, a model template and a parameter file. An optional fourth artifact is a Python extension file, which is only used in the hard example.

The image

The image has to be called image.png. Any size is acceptable, as m4s2 resizes it. Here is an example (most real cases are bigger and more complex):

The properties file

Here is our parameter file, called properties, followed by an explanation:

id=simple
name=A simple model
numParameters=2
1.name=Ne
1.desc=Effective population size
1.pythonName=pop_size
1.value=100
1.type=int
2.name=sample_size
2.desc=Sample size per population
2.pythonName=sample_size
2.value=60
2.type=int

id is an internal id. It should be unique (among all models) and composed of lowercase letters, underscores and numbers.

name is simply the name that is presented to the user.

numParameters is the number of parameters that the user has to set, in our case just the population size and the sample size.

We can then describe the parameters. Each parameter occupies a slot (identified by a number) and has a name and a description (both shown to the user), a type (either float or int), a default value and a pythonName, which is simply the name that the parameter will have in the template file (see next subsection).
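To make the slot layout concrete, here is a minimal Python sketch that reads the properties text above into a list of parameter records. It is only an illustration of the file format: the function names parse_properties and read_parameters are hypothetical, and m4s2's own loader may work differently.

```python
import re

def parse_properties(text):
    """Parse 'key=value' lines into a dict of raw strings (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split('=', 1)
        props[key.strip()] = value.strip()
    return props

def read_parameters(props):
    """Collect the numbered parameter slots into a list of dicts."""
    casts = {'int': int, 'float': float}
    params = []
    for slot in range(1, int(props['numParameters']) + 1):
        prefix = str(slot) + '.'
        ptype = props[prefix + 'type']
        params.append({
            'name': props[prefix + 'name'],
            'desc': props[prefix + 'desc'],
            'pythonName': props[prefix + 'pythonName'],
            'type': ptype,
            # The default value is converted according to the declared type.
            'value': casts[ptype](props[prefix + 'value']),
        })
    return params

SIMPLE = """id=simple
name=A simple model
numParameters=2
1.name=Ne
1.desc=Effective population size
1.pythonName=pop_size
1.value=100
1.type=int
2.name=sample_size
2.desc=Sample size per population
2.pythonName=sample_size
2.value=60
2.type=int
"""

props = parse_properties(SIMPLE)
params = read_parameters(props)
# The id should only use lowercase letters, underscores and numbers.
id_ok = re.match(r'^[a-z0-9_]+$', props['id']) is not None
```

Running a check like this before publishing a model is a cheap way to catch a numParameters value that does not match the slots actually defined.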

There should also be a constraint between sample size and population size (namely that the population size has to be bigger than the sample size). m4s2 allows this but, for the sake of simplicity, we will only present constraints in the next example.

The template/model file

The template file (model.par) is a SimCoal2 file with added macro features, the simplest of which are shown in this first example:

Important notes

  1. An explanation of SimCoal2's format can be found on the SimCoal2 web site. You need to understand it in order to understand our macro language, which is built on top of it.
  2. The SimCoal2 parser is very sensitive and counter-intuitive, so most problems come from getting the final result (the expanded file, before SimCoal2 processes it) correct. As an example, comment lines are important: in most cases they have to be in precise positions and no extra lines can be added. This is a SimCoal2 issue.
//Parameters for the coalescence simulation program : simcoal.exe
1 samples
//Population effective sizes (number of genes 2*diploids)
?pop_size
//Samples sizes (number of genes 2*diploids)
?sample_size
//Growth rates	: negative growth implies population expansion
0
//Number of migration matrices : 0 implies no migration between demes
0
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index
0 historical events

This is a simple SimCoal2 file where the population and sample sizes are replaced by ?pop_size and ?sample_size. Whenever '?' followed by a pythonName (see the parameter file explanation) is found, it is replaced by the value chosen by the user.
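The substitution rule can be sketched in a few lines of Python. This is an illustration under assumptions, not m4s2's actual code: the function name substitute_params is hypothetical, and we assume that a pythonName is an ordinary identifier (letters, digits and underscores).

```python
import re

def substitute_params(template, values):
    """Replace each ?name in the template with the user-chosen value."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError('unknown parameter: ' + name)
        return str(values[name])
    # The pattern consumes the whole identifier, so a longer name is never
    # mistaken for a shorter name that happens to be its prefix.
    return re.sub(r'\?([A-Za-z_][A-Za-z0-9_]*)', lookup, template)

template = (
    "//Population effective sizes (number of genes 2*diploids)\n"
    "?pop_size\n"
    "//Samples sizes (number of genes 2*diploids)\n"
    "?sample_size\n"
)
result = substitute_params(template, {'pop_size': 100, 'sample_size': 60})
```

With the default values from the properties file, the two placeholder lines become 100 and 60, and the comment lines are left untouched.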

Intermediate example

In this example, we introduce constraints and in-model code execution. We will not discuss image issues any further, but we still present the image in order to facilitate the discussion.

This is a decline followed by a split.

The properties file

The fundamental part is at the end, where constraints are introduced.

id=decline_split
name=Decline followed by split
numParameters=7
1.name=Ne1
1.desc=Effective population size before decline
1.pythonName=ne1
1.value=100
1.type=int
2.name=Ne2
2.desc=Effective population size after decline
2.pythonName=ne2
2.value=50
2.type=int
3.name=Ne3
3.desc=Effective population size of each split branch
3.pythonName=pop_size
3.value=100
3.type=int
4.name=T2
4.desc=Generation of decline
4.pythonName=contract_gen
4.value=50
4.type=int
5.name=T1
5.desc=Generation of split
5.pythonName=split_gen
5.value=25
5.type=int
6.name=mig
6.desc=Migration rate
6.pythonName=mig
6.value=0.01
6.type=float
7.name=sample_size
7.desc=Sample size per population
7.pythonName=sample_size
7.value=60
7.type=int
numConstraints=3
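1.constraint=split_gen<contract_gen
1.message=Generation of split has to be smaller than generation of decline
2.constraint=ne1>=ne2
2.message=Population size before decline has to be bigger or equal to size after decline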
3.constraint=ne2<=pop_size
3.message=Population size before expansion has to be smaller or equal to size after expansion

numConstraints gives the number of constraints, in our case 3.

Each constraint then has a condition (in the format of a Python conditional) and a message, which is reported to the user if the condition is violated. In that case the system does not proceed until the constraint is satisfied.
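Since each condition is a Python conditional over pythonNames, checking constraints amounts to evaluating the condition with the current parameter values. The sketch below illustrates the idea with the constraint ne2<=pop_size from the file above; the function name check_constraints is hypothetical and the use of eval is our assumption about how such a check could be written, not a description of m4s2 internals.

```python
def check_constraints(constraints, values):
    """Return the messages of all constraints that evaluate to false."""
    failures = []
    for condition, message in constraints:
        # Evaluate the condition with the parameter values as the only
        # visible names (built-in functions are deliberately hidden).
        if not eval(condition, {'__builtins__': {}}, dict(values)):
            failures.append(message)
    return failures

constraints = [
    ('ne2<=pop_size',
     'Population size before expansion has to be smaller or equal to '
     'size after expansion'),
]

# Default values satisfy the constraint, so nothing is reported.
ok = check_constraints(constraints, {'ne2': 50, 'pop_size': 100})

# A user choosing ne2=200 violates it and would see the message.
bad = check_constraints(constraints, {'ne2': 200, 'pop_size': 100})
```

Because conditions are executed as code, a model downloaded from the web should come from a source you trust.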

The template/model file

This template file introduces embedded executable code. The fundamental part is at the bottom.

//Parameters for the coalescence simulation program : simcoal.exe
2 samples
//Population effective sizes (number of genes 2*diploids)
?pop_size
?pop_size
//Samples sizes (number of genes 2*diploids)
?sample_size
?sample_size
//Growth rates	: negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
2
//mig
0 ?mig
?mig 0
//nothing
0 0
0 0
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index
2 historical events
?contract_gen 0 0 1 !!!1.0*?ne1/?ne2!!! 0 1
?split_gen 1 0 1 !!!1.0*?ne2/(2*?pop_size)!!! 0 1

All the text between a pair of !!! markers is executed by the Jython interpreter, and its result replaces the code in the template. Arbitrary functions can be called (see next example).
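As a worked illustration of the two historical event lines, the following Python sketch expands them with the default values from the properties file. It assumes that ?name placeholders are substituted first and that each !!! block is then evaluated as a single expression; the function name expand_template is hypothetical. Note that Jython follows Python 2, where / between two integers truncates, which is presumably why the template expressions start with 1.0*.

```python
import re

def expand_template(template, values, functions=None):
    """Substitute ?name placeholders, then evaluate each !!!code!!! block."""
    # Step 1 (assumed order): replace every ?name with its value.
    text = re.sub(r'\?([A-Za-z_][A-Za-z0-9_]*)',
                  lambda m: str(values[m.group(1)]), template)
    # Step 2: evaluate the expression between each pair of !!! markers.
    namespace = dict(functions or {})
    def run(match):
        return str(eval(match.group(1), namespace))
    return re.sub(r'!!!(.+?)!!!', run, text)

values = {'contract_gen': 50, 'split_gen': 25,
          'ne1': 100, 'ne2': 50, 'pop_size': 100}

# 1.0*100/50 = 2.0
line1 = expand_template('?contract_gen 0 0 1 !!!1.0*?ne1/?ne2!!! 0 1', values)
# 1.0*50/(2*100) = 0.25
line2 = expand_template(
    '?split_gen 1 0 1 !!!1.0*?ne2/(2*?pop_size)!!! 0 1', values)
```

With these values the lines passed on to SimCoal2 would read "50 0 0 1 2.0 0 1" and "25 1 0 1 0.25 0 1".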

Hard example

The next example is an (excessively) complicated one, so brace yourself. It was developed as a model for the evolution of domesticated species. The properties file is not shown, as it introduces no new concepts.

The template/model file

The template file is presented below. It introduces function calls, a simple concept. Some functions are not available in the core of m4s2, so they have to be provided (see next subsection).

//Parameters for the coalescence simulation program : simcoal.exe
!!!?pops_per_group*3!!! samples
//Population effective sizes (number of genes 2*diploids)
!!!generate_pop_sizes(?pops_per_group, ?pop_size)!!!
//Samples sizes (number of genes 2*diploids)
!!!dupe(str(?sample_size), ?pops_per_group*3)!!!
//Growth rates	: negative growth implies population expansion
!!!dupe('0', ?pops_per_group*3)!!!
//Number of migration matrices : 0 implies no migration between demes
4
//multimig
!!!generate_goatsie_all_mat(?pops_per_group,?pop_size,?minternal,?me,?mimi)!!!
//mig3
!!!generate_goatsie_three_mat(?pops_per_group,?me,?mimi)!!!
//mig2
!!!generate_goatsie_two_mat(?pops_per_group,?me)!!!
//nothing
!!!generate_null_mat(?pops_per_group*3)!!!
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index
!!!?pops_per_group*3 - 3 + 5!!! historical events
4000 0 0 1 50 0 3
?expand_gen 0 0 1 0.2 0 3
?split_gen 1 0 1 2 0 3
?split_europe_gen 1 1 1 !!!3.0*?neb/(2*?pop_size)!!! 0 2
!!!?split_europe_gen-1!!! 2 1 1 1 0 2
!!!generate_goatsie_sub_pop_events(?pops_per_group,?hier_split_gen)!!!

Some of the functions called are built into m4s2. The most useful one is dupe, which replicates a string a given number of times (this is quite useful when the number of populations is a parameter, so that the number of lines is unknown when the template is written).

Others, like generate_goatsie_all_mat, have to be supplied by the model maker; please see the subsection below.
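To give an idea of what a function like dupe does, here is a minimal Python sketch. The exact behavior of the built-in is not specified in this text, so the name dupe_sketch and the choice of joining the copies with line breaks (one copy per line, no trailing line break) are assumptions made for illustration only.

```python
def dupe_sketch(text, count, separator='\n'):
    """Repeat text count times, one copy per line (assumed layout)."""
    return separator.join([text] * count)

# With 2 populations per group there are 2*3 = 6 demes, so the growth
# rate section of the template would need six lines containing 0.
pops_per_group = 2
growth_rates = dupe_sketch('0', pops_per_group * 3)
```

Check the output of the real built-in against what the SimCoal2 parser accepts before relying on any particular separator.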

The Jython extension file

If needed, an extension file (extension.py) can be created. This file can include any Jython code and do literally anything. Any code called from the template is expected to return a string (which replaces the call in the model).

Here is one of the functions used in this example:

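def generate_goatsie_three_mat(pops_per_group, me, mi):
    # Builds a square migration matrix as text, one row per line.
    # Rows and columns are numbered from 1. Entries (1,2) and (2,1) get
    # rate me, entries (2,3) and (3,2) get rate mi, all others are 0.
    goatsie_mat = ''
    total_size = pops_per_group*3
    for x in range(1, total_size + 1):
        for y in range(1, total_size + 1):
            if x==1 and y==2:
                goatsie_mat += str(me) + ' '
            elif x==2 and y==1:
                goatsie_mat += str(me) + ' '
            elif x==2 and y==3:
                goatsie_mat += str(mi) + ' '
            elif x==3 and y==2:
                goatsie_mat += str(mi) + ' '
            else:
                goatsie_mat += '0   '
        # Each matrix row ends with a Windows-style line break.
        goatsie_mat += '\r\n'
    # The returned string replaces the !!!...!!! call in the template.
    return goatsie_mat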
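The template above also calls generate_null_mat, whose source is not shown in this text (it may be built in or part of the extension file). As a further illustration of the "return a string" convention, here is a guess at how such a function could look, assuming it produces a square matrix of zeros with the same row layout as the function above. This is a hypothetical reimplementation, not the original code.

```python
def generate_null_mat(total_size):
    """Hypothetical sketch: a total_size x total_size matrix of zeros."""
    rows = []
    for _ in range(total_size):
        # One row: total_size zeros separated by spaces, then a
        # Windows-style line break, as in generate_goatsie_three_mat.
        rows.append('0 ' * total_size + '\r\n')
    return ''.join(rows)

null_mat = generate_null_mat(3)
```

Building the rows in a list and joining them once avoids repeated string concatenation, which matters little here but keeps the function easy to read.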

Existing support functions

The following functions are built in (more may be added in the future if they seem to be widely useful)

Packaging

After creating the models, they have to be packaged.

The first file to create is the list of models. Here is an example:

Model list file
Domestication and humans
http://popgen.eu/soft/m4s2/ext_models/goat4/

The first line has to be Model list file.

The second line is a human-readable identifier.

Each extra line points to a URL where a model can be found. At that URL, the files properties, model.par, image.png and extension.py (the last one being optional) have to be found.

This list file has to be put somewhere on a web server. The URL of the list file is the one to be supplied to m4s2.
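Before publishing, it can help to check the list file and to see which file URLs each model entry implies. The Python sketch below only does string processing on the example list (no network access); the function names parse_model_list and model_file_urls are hypothetical and are not part of m4s2.

```python
def parse_model_list(text):
    """Split a model list file into (identifier, list of model URLs)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != 'Model list file':
        raise ValueError("the first line has to be 'Model list file'")
    if len(lines) < 2:
        raise ValueError('the identifier line is missing')
    return lines[1], lines[2:]

REQUIRED_FILES = ('properties', 'model.par', 'image.png')
OPTIONAL_FILES = ('extension.py',)

def model_file_urls(model_url):
    """List the URLs of the files expected under one model URL."""
    base = model_url if model_url.endswith('/') else model_url + '/'
    return [base + name for name in REQUIRED_FILES + OPTIONAL_FILES]

listing = """Model list file
Domestication and humans
http://popgen.eu/soft/m4s2/ext_models/goat4/
"""

title, model_urls = parse_model_list(listing)
files = model_file_urls(model_urls[0])
```

For the example list this yields one model whose properties file is expected at http://popgen.eu/soft/m4s2/ext_models/goat4/properties, with model.par, image.png and (optionally) extension.py next to it.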

An example can be found here