Process Mappers in batch

We often want to run Mapper with multiple configurations and compare the final results. The tool described here is a helper for running a batch of Mappers on a dataset.

These are the main parts of processing one Mapper for one data item (subject):

1. Load the data item from the cohort file into a matrix.
2. Preprocess the matrix (see run_preprocess).
3. Run a Mapper configuration on the matrix.
4. Run all analysis steps (see run_analysis).

The run_main() script runs the steps above in parallel for all data items (subjects) and all Mapper configurations.
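
As a rough illustration of this batch structure, here is a minimal sketch in Python. It is not the MATLAB implementation of run_main. The names process_one and run_batch, and the step callables passed to them, are hypothetical stand-ins for the four steps listed above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def process_one(item, mapper_config, load, preprocess, run_mapper, analyses):
    """Run the four steps for one (item, Mapper configuration) pair."""
    matrix = load(item)                         # 1. load the data item
    matrix = preprocess(matrix)                 # 2. preprocess the matrix
    result = run_mapper(matrix, mapper_config)  # 3. run one Mapper configuration
    return [analysis(result) for analysis in analyses]  # 4. run every analysis


def run_batch(items, mapper_configs, poolsize=2, **steps):
    """Process every (item, configuration) pair with `poolsize` workers."""
    pairs = list(product(items, mapper_configs))
    with ThreadPoolExecutor(max_workers=poolsize) as pool:
        futures = [pool.submit(process_one, item, cfg, **steps) for item, cfg in pairs]
        # Collect results in submission order so the output is deterministic.
        return [(item, cfg, f.result()) for (item, cfg), f in zip(pairs, futures)]
```

The point of the sketch is that the unit of work is one (item, configuration) pair, so the total number of jobs is the number of items times the number of expanded Mapper configurations.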

Usage Example

Outside of tests and small experiments, the process is run from the command line, with the inputs set as MATLAB variables:

Example command line call to run_main

matlab -r "cohort_csv='...'; config_path='...'; run('/.../demapper/code/analysis/run_main.m'); "

Below, I present a more detailed example with all the required arguments set:

Example script (BASH / SBATCH)

PROJECT_ROOT=/Users/daniel/project
CONFNAME=mappers_v0.1.json

MATLAB_ARGS=""
MATLAB_ARGS="${MATLAB_ARGS} poolsize=8;"
MATLAB_ARGS="${MATLAB_ARGS} cohort_csv='${PROJECT_ROOT}/cohort_mapper.csv';"
MATLAB_ARGS="${MATLAB_ARGS} config_path='${PROJECT_ROOT}/${CONFNAME}';"
MATLAB_ARGS="${MATLAB_ARGS} data_root='${PROJECT_ROOT}/data/';"
MATLAB_ARGS="${MATLAB_ARGS} output_dir='${PROJECT_ROOT}/results/${CONFNAME}';"

DEMAPPER_MAIN="${PROJECT_ROOT}/demapper/code/analysis/run_main.m"

# write command, submit, wait
CMD="matlab -r \"${MATLAB_ARGS} run('${DEMAPPER_MAIN}')\""
echo "$CMD"
eval "$CMD"
wait

where we have the following files:

cohort_mapper.csv

id0,id1,id2,path,TR
SBJ99,,,SBJ99_BOLD.npy,1.5

mappers_v0.1.json

{
  "preprocess": [
    { "type": "zscore" }
  ],
  "mappers": [{
    "type": "BDLMapper",
    "k": 32,
    "resolution": [20, 40],
    "gain": 60
  },
  {
    "type": "NeuMapper",
    "k": 16,
    "resolution": 100,
    "gain": [70, 80, 90]
  }],
  "analyses": [
    { "type": "plot_graph" },
    { "type": "compute_stats","args": { "HRF_threshold": 11 } },
    { "type": "compute_temp" }
  ]
}

The JSON configuration above generates five Mapper results for each item in the cohort file: two BDLMapper results, one per resolution (20 and 40), named BDLMapper_32_20_60 and BDLMapper_32_40_60, and three NeuMapper results, one per gain value (70, 80, and 90), named similarly.
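
The expansion can be pictured with the following Python sketch. It is an illustration only, not the tool's own code. It assumes that list-valued parameters are swept as a Cartesian product and that result names follow the pattern type_k_resolution_gain, which is inferred from the two BDLMapper names above. The function expand_mappers and the constant PARAM_ORDER are hypothetical.

```python
from itertools import product

# Assumed naming order, inferred from names such as BDLMapper_32_20_60.
PARAM_ORDER = ("k", "resolution", "gain")


def expand_mappers(mapper_entries):
    """Expand list-valued parameters into one concrete configuration each."""
    expanded = []
    for entry in mapper_entries:
        # Wrap scalars in a list so every parameter can be treated as a sweep.
        choices = [
            entry[p] if isinstance(entry[p], list) else [entry[p]]
            for p in PARAM_ORDER
        ]
        for values in product(*choices):
            name = "_".join([entry["type"], *map(str, values)])
            expanded.append({"name": name, "type": entry["type"],
                             **dict(zip(PARAM_ORDER, values))})
    return expanded


mappers = [
    {"type": "BDLMapper", "k": 32, "resolution": [20, 40], "gain": 60},
    {"type": "NeuMapper", "k": 16, "resolution": 100, "gain": [70, 80, 90]},
]
for m in expand_mappers(mappers):
    print(m["name"])
```

Under these assumptions the sketch yields five names, two for BDLMapper and three for NeuMapper, matching the count described above.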

This configuration runs one preprocessing step (zscore) and three analysis steps: plot_graph, compute_stats, and compute_temp. See the run_preprocess and run_analysis functions for detailed descriptions of the available preprocessing and analysis steps.

Usage

code.analysis.run_main()

RUN_MAIN runs mapper on a large cohort of items based on the config. The mapper config is defined at config_path.

The items (cohort) to be processed are defined by the lines of the CSV file at cohort_csv, one item per line. Each item’s path is relative to data_root.

Finally, the output is saved at output_dir.

Parameters:

cohort_csv – the path to a CSV file with the header: id0,id1,id2,path. The ids can be any string, and path is the location of the 1D file used for processing Mapper, relative to data_root.
data_root – the absolute root directory to the paths defined in the CSV file: cohort_csv
config_path – the path to the config file that defines which mappers and analyses to use. This configuration also defines the preprocessing steps. Some example configurations are at tests/fixtures/config*.json or in the Usage Example section.
output_dir – the absolute root directory of where to save the results of the mappers and the analysis.
poolsize – (optional) number of threads to use for parallelizing the processing and analysis. This is highly recommended for big jobs. It is usually set to the number of CPU cores.
rerun_analysis – (optional) if this is set, the process uses pre-computed mapper results from a previous run and reruns only the specified analysis. Possible values are: “plot_graph”, “plot_task”, “compute_stats”, “compute_temp”; check the run_analysis function for the latest list.
rerun_uncomputed – (optional) if this boolean option is set, the process reruns all items of the cohort that failed or did not finish successfully, based on the status.csv file.
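
To make the relationship between cohort_csv and data_root concrete, here is a small Python sketch of reading a cohort file and resolving each item's path. It is only an illustration of the convention described above, not the MATLAB code. The function read_cohort is hypothetical, and extra columns such as TR are simply ignored here.

```python
import csv
import io
from pathlib import PurePosixPath


def read_cohort(csv_text, data_root):
    """Return one record per cohort row, with path resolved against data_root."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids = [row[key] for key in ("id0", "id1", "id2")]
        items.append({"ids": ids, "path": PurePosixPath(data_root) / row["path"]})
    return items


cohort = "id0,id1,id2,path,TR\nSBJ99,,,SBJ99_BOLD.npy,1.5\n"
for item in read_cohort(cohort, "/Users/daniel/project/data/"):
    print(item["ids"], item["path"])
```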

Output structure:

The result of an individual item will be saved at: id0/id1/id2/MapperID/, where:

Each id (id0, id1, id2) is taken from the cohort_csv for the item

The id MapperID is generated from the specific mapper from config_path

In addition, a status file is saved at config_name/status.csv; it records the status of the process across all mappers.
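
The directory layout can be sketched in Python as follows. This is an illustration under two labeled assumptions: that results live under output_dir, and that empty ids (as in the SBJ99 row of the example cohort) are skipped rather than producing empty path segments. The source does not say how empty ids are handled, and the function result_dir is hypothetical.

```python
from pathlib import PurePosixPath


def result_dir(output_dir, ids, mapper_id):
    """Build output_dir/id0/id1/id2/MapperID for one item and one mapper."""
    # Assumption: empty ids are dropped instead of creating empty segments.
    parts = [i for i in ids if i]
    return PurePosixPath(output_dir, *parts, mapper_id)


print(result_dir("/results", ["SBJ99", "", ""], "BDLMapper_32_20_60"))
```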
