Model selection from the command line with discriminatEM
Installation
discriminatEM is assumed to run in a Linux environment with Python 3.6 or later installed. The Anaconda Python 3.6 distribution is one option; it can be installed via:
wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
bash Anaconda3-4.4.0-Linux-x86_64.sh
following the guided installation process.
Note
The Anaconda installer asks at the end of the installation whether to use Anaconda Python as the default Python:
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/username/.bashrc ? [yes|no]
[no] >>>
If the answer is yes, the path to the Anaconda installation is prepended to the PATH environment variable, and subsequent calls to pip (see below) use the Anaconda Python pip (check with the command which pip).
If the answer is no, you have to ensure manually that the correct Python installation is used.
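The check suggested in the note can also be done from Python. The following is a minimal sketch, not part of discriminatEM; the helper name same_install is hypothetical, and it only compares the directories in which pip and the running interpreter are found.

```python
import os
import shutil
import sys


def same_install(pip_path, python_path):
    """Return True if pip and python are located in the same directory.

    For an Anaconda installation both are expected in .../anaconda3/bin.
    """
    if pip_path is None:  # no pip found on PATH
        return False
    pip_dir = os.path.dirname(os.path.abspath(pip_path))
    python_dir = os.path.dirname(os.path.abspath(python_path))
    return pip_dir == python_dir


if __name__ == "__main__":
    pip_path = shutil.which("pip")  # same lookup as the command: which pip
    print("python:", sys.executable)
    print("pip:   ", pip_path)
    print("same installation directory:", same_install(pip_path, sys.executable))
```

If the last line reports False, pip most likely belongs to a different Python installation than the one running the script.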
Then, discriminatEM can be installed from the provided .tar.gz file:
pip install discriminatEM-0.1.3.tar.gz
Optional: configuration of the parallel environment
SGE/UGE-like environments can be used by discriminatEM, but they need to be configured properly. The following information is required:
the path to an existing directory for temporary files
SGE/UGE queue name
SGE/UGE parallel environment name
IP address of a redis-server (installation instructions are provided below)
This information is then assembled in a configuration file ~/.parallel residing within the home directory.
The content should be similar to the following:
[DIRECTORIES]
TMP=/path/to/my/tmp
[SGE]
QUEUE=p.openmp
PARALLEL_ENVIRONMENT=openmp
PRIORITY=-500
[BROKER]
TYPE=REDIS
[REDIS]
HOST=WWW.XXX.YYY.ZZZ
The TMP directory is used to store temporary job files and has to be replaced with an appropriate path.
The values for the SGE QUEUE and PARALLEL_ENVIRONMENT also have to be replaced.
Running:
qconf -sql
yields a list of all defined queues, from which one can be chosen for the QUEUE.
Running:
qconf -spl
yields a list of all defined parallel environments, from which one can be chosen for the PARALLEL_ENVIRONMENT.
Redis can be installed via:
conda install redis
and started with:
redis-server --protected-mode no
The HOST value WWW.XXX.YYY.ZZZ has to be replaced by the IP address of the host on which the redis server is running.
The IP address can be retrieved with the ifconfig command.
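Before starting a run, it can help to check that the placeholders in ~/.parallel have actually been replaced. The following is a minimal sketch using Python's standard configparser module. It is not part of discriminatEM: the function name check_parallel_config is hypothetical, and the set of required keys is an assumption derived from the list of required information above (PRIORITY is treated as optional here).

```python
import configparser
import ipaddress
import os

# Assumption: these are the entries that must be present, based on the
# list of required information in this section.
REQUIRED = {
    "DIRECTORIES": ["TMP"],
    "SGE": ["QUEUE", "PARALLEL_ENVIRONMENT"],
    "BROKER": ["TYPE"],
    "REDIS": ["HOST"],
}


def check_parallel_config(text):
    """Return a list of human-readable problems found in the config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            problems.append("missing section [%s]" % section)
            continue
        for key in keys:
            if not parser.has_option(section, key):
                problems.append("missing %s in [%s]" % (key, section))
    # TMP must point to an existing directory.
    if parser.has_option("DIRECTORIES", "TMP"):
        tmp = parser.get("DIRECTORIES", "TMP")
        if not os.path.isdir(tmp):
            problems.append("TMP is not an existing directory: %s" % tmp)
    # HOST must be an IP address, not the WWW.XXX.YYY.ZZZ placeholder.
    if parser.has_option("REDIS", "HOST"):
        host = parser.get("REDIS", "HOST")
        try:
            ipaddress.ip_address(host)
        except ValueError:
            problems.append("HOST is not an IP address: %s" % host)
    return problems
```

A possible use is to read the file with open(os.path.expanduser("~/.parallel")).read() and pass its content to check_parallel_config; an empty list means that none of these checks failed.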
Important
The redis-server has to be running throughout the complete ABC run. It manages the communication between the discriminatEM main process and the jobs started on the SGE/UGE cluster.
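Whether the redis-server is reachable from another machine can be checked with a plain TCP connection. The following is a minimal sketch with a hypothetical helper redis_reachable; it assumes that redis listens on its default port 6379, since the configuration above specifies only the HOST. It tests only that the port accepts connections, not that the cluster jobs can use the server.

```python
import socket


def redis_reachable(host, port=6379, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False
```

For example, redis_reachable("WWW.XXX.YYY.ZZZ") with the placeholder replaced by the configured HOST value should return True while the server is running.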
Running model selection
For example:
discriminatEM --noise-prior="beta(2, 10)" --noise="[0.2]" \
    --subsampling="[0.9]" abcsmc.db
executes an ABCSMC model selection run with a Beta(2, 10) noise prior on connectome samples perturbed with noise=0.2 and subsampling (fractional measurement) 0.9.
The results are stored in abcsmc.db.
Additionally, a folder abcsmc.db.results is created with confusion matrix plots.
The syntax for the noise prior follows the scipy.stats distributions.
However, a delta point prior can also be used:
discriminatEM --noise-prior=0 --noise="[0.2]" \
    --subsampling="[0.9]" abcsmc.db
starts a run with no noise in the prior, while noise of 0.2 is still applied to the connectome sample.
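To get a feeling for the Beta(2, 10) prior used in the first example, its summary statistics can be computed from the standard closed-form expressions for a Beta(a, b) distribution. The sketch below uses only the Python standard library; the helper name beta_summary is hypothetical, and how discriminatEM itself evaluates the prior string is not shown here.

```python
import math


def beta_summary(a, b):
    """Mean, mode and standard deviation of a Beta(a, b) distribution (a, b > 1)."""
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, math.sqrt(variance)


mean, mode, std = beta_summary(2, 10)
print("mean=%.3f mode=%.3f std=%.3f" % (mean, mode, std))
# prints: mean=0.167 mode=0.100 std=0.103
```

The same mean and standard deviation are returned by scipy.stats.beta(2, 10).mean() and scipy.stats.beta(2, 10).std(). By contrast, the delta point prior --noise-prior=0 puts all prior mass on zero noise.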
Note
The arguments --noise and --subsampling are lists.
Several values can be provided here. The full cross product of the provided noise and subsampling values is executed.
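What the full cross product means can be illustrated with itertools.product. This is a sketch of the combinatorics only: the helper name run_grid and the example values are hypothetical, and the order in which discriminatEM actually executes the combinations is not specified here.

```python
from itertools import product


def run_grid(noise, subsampling):
    """All (noise, subsampling) pairs of the full cross product."""
    return list(product(noise, subsampling))


# Two noise values and two subsampling values give 2 * 2 = 4 combinations.
for n, s in run_grid([0.0, 0.2], [0.9, 1.0]):
    print("noise=%s subsampling=%s" % (n, s))
```

The number of combinations is the product of the two list lengths, so the total run time grows accordingly.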
Examination of the results
Plots and text files are generated in the directory abcsmc.db.results (assuming that the chosen database name was abcsmc.db; in general, the path is <database>.results).
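The naming rule <database>.results can be used to locate the generated files from a script. The following is a minimal sketch with hypothetical helper names; it makes no assumption about which files discriminatEM writes and simply lists whatever is in the folder.

```python
from pathlib import Path


def results_dir(database):
    """Path of the results folder belonging to a database file: <database>.results."""
    return Path(str(database) + ".results")


def list_results(database):
    """Sorted names of the files in the results folder (empty if it does not exist)."""
    folder = results_dir(database)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir())
```

For example, list_results("abcsmc.db") lists the contents of abcsmc.db.results relative to the current working directory.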
Reproduction of Figures 4a, 4b and 4c of the manuscript
Figure 4a: noise-free:
discriminatEM --noise-prior=0 --noise="[0]" fig_4a.db
Figure 4b: noise of intensity 0.15 on the samples, but not in the prior:
discriminatEM --noise-prior=0 --noise="[0.15]" fig_4b.db
Figure 4c: Beta(2,10) prior and noise of intensity 0.15 on the samples:
discriminatEM --noise-prior="beta(2,10)" --noise="[0.15]" fig_4c.db
(The --subsampling argument can be omitted since --subsampling=1 is the default value and this is the value used for Figure 4.)
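The three runs can also be started one after another from a small script. The following is a sketch, not part of discriminatEM: it uses only the options and database names of the three commands above, the helper names build_command and run_all are hypothetical, and it assumes that the discriminatEM executable is on the PATH. With dry_run=True the commands are only printed.

```python
import shlex
import subprocess

# (noise prior, noise list, database file) for Figures 4a, 4b and 4c.
FIG4_RUNS = [
    ("0", "[0]", "fig_4a.db"),
    ("0", "[0.15]", "fig_4b.db"),
    ("beta(2,10)", "[0.15]", "fig_4c.db"),
]


def build_command(noise_prior, noise, database):
    """Argument list for one discriminatEM call."""
    return ["discriminatEM", "--noise-prior=" + noise_prior, "--noise=" + noise, database]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it sequentially."""
    for noise_prior, noise, database in FIG4_RUNS:
        cmd = build_command(noise_prior, noise, database)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing run


run_all(dry_run=True)
```

Passing the arguments as a list avoids shell quoting problems with the brackets and parentheses in the option values.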
Note
The model selection runs are stochastic; the results obtained may therefore differ from those in Fig. 4a-c. This is expected especially for the case of a noisy connectome under a noise-free prior (Fig. 4b).