Run Mapper as a library

Mapper Definition

A single Mapper function runs all types of configurations. The library operates on a 2-dimensional matrix, and the configuration is determined by the options struct passed in. See the documentation below for all options accepted by the Mapper function.

For simpler usage, you can use one of the preset configuration options.

Check Tutorial 1 for a step-by-step guide on how to use the library.

code.mapper.mapper()

Main function to run the Mapper Algorithm on (time x features) data

Mapper runs multiple steps, named and defined below as transformations on the data:

input data X

-> [1] (optional) preprocessing

-> [2] distance: pair-wise distance calculation

-> [3] rknn: Reciprocal k-Nearest Neighbors

-> [4] embed: Embedding into a smaller space

-> [5] binning: Segmenting the space into bins

-> [6] clustering: Partial Clustering to group data points into nodes

-> [7] finalgraph: Create a final graph by adding edges between nodes

Each step is parameterized and configurable through the input options argument. A step is activated when its <step>_type option has a value other than “none”. The options and their possible values are defined in the parameter list below.
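The activation rule can be illustrated with a minimal sketch. It is not the library's own code: the options struct is a hypothetical example that uses documented option names, and treating a missing field as inactive is an assumption of this sketch.

```matlab
% Hypothetical options struct using documented option names.
options = struct;
options.preprocess_type = 'none';
options.embed_type = 'CMDS';
options.clustering_type = 'DBSCAN';

% Steps whose <step>_type option accepts the value 'none'.
steps = {'preprocess', 'embed', 'clustering'};
is_active = false(1, numel(steps));
for i = 1:numel(steps)
    field = [steps{i} '_type'];
    % Assumption of this sketch: a missing field means the step is inactive.
    is_active(i) = isfield(options, field) && ~strcmp(options.(field), 'none');
    fprintf('%-12s active: %d\n', steps{i}, is_active(i));
end
```

With these values, preprocessing is skipped while the embedding and clustering steps are active.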

Parameters:
  • X (2D matrix) – the input data. Rows are the observations that Mapper groups, based on the features in the columns (e.g., rows are TRs, columns are ROIs).

  • options (struct) – the struct defining the parameters to run Mapper. Defined below:

  • options.preprocess_type ("none" | "PCA" | "custom") – defines the preprocessing type. Activated if preprocess_type ~= “none”.

  • options.preprocess_varexpl (number between 50.0 and 100.0) – defines the target percentage of variance to be explained, which determines the number of dimensions to keep. Used if preprocess_type == “PCA”.

  • options.preprocess_func (function) – custom function to run on data. Used if preprocess_type == “custom”.

  • options.dist_type (distance type) – defines the distance type to be used to compute the pair-wise distances between datapoints. Common examples: “cityblock”, “euclidean”, “correlation”. For all examples, check MATLAB documentation of “pdist2”.

  • options.dist_func (function) – custom function to use as pair-wise distance function. Used if dist_type == “custom”.

  • options.prelens_type ("dist" | "original" | "preprocessed" | "bin-pen" | "wtd-pen") – defines how the input to the lens function is computed. Either use the reciprocal k-Nearest Neighbor algorithm, or use the distances, the original data, or the preprocessed data. A reciprocal kNN is used if prelens_type is in {“wtd-pen”, “bin-pen”}.

  • options.prelens_rknnparam (number) – defines the number of neighbors (k) used by the reciprocal kNN algorithm (PKNNG). Used if the reciprocal kNN is either the weighted or the binarized penalized algorithm: prelens_type in {“wtd-pen”, “bin-pen”}.

  • options.prelens_rknn_directed (boolean) – defines whether the reciprocal kNN constructs a directed (time-based) graph. Used if the reciprocal kNN is the weighted or binarized penalized algorithm: prelens_type in {“wtd-pen”, “bin-pen”}. [Currently not implemented]

  • options.embed_type ("none" | "custom" | "MDS" | "CMDS" | "PCA" | "DiffusionMaps" | "ManifoldChart" | "Isomap" | "LLE" | "HessianLLE" | "Laplacian" | "LTSA" | "SNE" | "tSNE" | "UMAP") – defines the embedding algorithm used to reduce the datapoint space to fewer dimensions. Activated if embed_type ~= “none”.

  • options.embed_dim (number) – defines the number of dimensions in the final embedding space. Used if embed_type ~= “none” AND embed_type ~= “custom”.

  • options.embed_kparam (number) – defines the k parameter of the embedding algorithm. Used if embed_type is one of “Isomap” | “LLE” | “HessianLLE” | “Laplacian” | “LTSA” | “UMAP”.

  • options.embed_perplexity (number) – defines the perplexity needed for SNE embedding algorithms. Used if embed_type is one of “SNE” | “tSNE”.

  • options.embed_func (function) – custom function to use as embedding algorithm. Used if embed_type == “custom”.

  • options.embed_postproc ("identity" | "time") – defines the post-processing type. If embed_postproc == “time”, then time is added as a dimension for each datapoint.

  • options.binning_type ("Nd" | "cball") – defines the binning strategy. If binning_type == “Nd”, then the N-dimensional space is segmented into polygons or hypercubes. This option requires that the embedding step was run and that the space is reduced to N < 9 dimensions. If binning_type == “cball”, then the reciprocal kNN graph is segmented into landmarks to construct a cover of the space. This option requires embed_type == “none”.

  • options.binning_resolution (number) – defines the resolution of the binning algorithm. For binning_type == “Nd”, the resolution is the number of polygons or hypercubes in one dimension. For binning_type == “cball”, the resolution is the number of landmarks.

  • options.binning_gain (number between 0.0 and 100.0) – defines the percentage of overlap of the bins.

  • options.binning_nsides (number) – defines the number of sides of the polygon used for binning. If binning_nsides ~= 4, then the input embedded space needs to have exactly 2 dimensions.

  • options.clustering_type ("none" | "custom" | "DBSCAN" | "linkage") – defines the clustering algorithm. This step is activated if clustering_type ~= “none”.

  • options.clustering_linkage ("average" | "complete" | "single") – defines the linkage used by the clustering algorithm. Used if clustering_type == “linkage”.

  • options.clustering_histo_bins (number) – defines the number of histogram bins to be used by the clustering algorithm. Used if clustering_type == “linkage”.

  • options.clustering_eps_type (string) – defines how to compute the epsilon used by the DBSCAN clustering algorithm. Used if clustering_type == “DBSCAN”. Options: ‘fixed’ for a fixed value; ‘elbow’ for an epsilon based on the elbow method of kNN distances; ‘median’ for an epsilon based on the median of kNN distances.

  • options.clustering_eps_arg (number) – defines the argument used to obtain the epsilon used by the DBSCAN clustering algorithm. Used if clustering_type == “DBSCAN”. If clustering_eps_type == ‘fixed’, then it is the epsilon value to be used. If clustering_eps_type == ‘elbow’ or ‘median’, then it is the k value used to compute the kNN distances.

  • options.clustering_minpts (number) – defines the minimum number of points required to form a cluster. Used if clustering_type == “DBSCAN”.

  • options.clustering_func (function) – custom function to use as clustering algorithm. Used if clustering_type == “custom”.

  • options.finalgraph_type ("full" | "neighbors") – defines the construction of the final output graph. If finalgraph_type == “neighbors”, then the edges added are restricted to neighbors.

  • options.verbose (boolean) – print runtime debugging comments if verbose == true. Defaults to false.

  • options.low_mem (boolean) – if low_mem == false, all intermediary steps are returned as part of the output. Defaults to true.
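Several options in the list above constrain one another. The following sketch collects those documented constraints into a simple check. It is a hypothetical helper for illustration, not part of the library, and the example struct is deliberately inconsistent (it requests “Nd” binning without an embedding step).

```matlab
% Hypothetical options struct; field names and values come from the list above.
options = struct;
options.embed_type = 'none';
options.embed_dim = 2;
options.binning_type = 'Nd';
options.binning_gain = 70;
options.binning_nsides = 4;

problems = {};
has_embedding = ~strcmp(options.embed_type, 'none');

if strcmp(options.binning_type, 'Nd')
    if ~has_embedding
        problems{end+1} = 'binning_type "Nd" requires an embedding step';
    elseif options.embed_dim >= 9
        problems{end+1} = 'binning_type "Nd" requires fewer than 9 dimensions';
    end
    if options.binning_nsides ~= 4 && options.embed_dim ~= 2
        problems{end+1} = 'binning_nsides ~= 4 requires exactly 2 dimensions';
    end
elseif strcmp(options.binning_type, 'cball') && has_embedding
    problems{end+1} = 'binning_type "cball" requires embed_type == "none"';
end

if options.binning_gain < 0 || options.binning_gain > 100
    problems{end+1} = 'binning_gain must be between 0 and 100';
end

% Report every constraint that the struct violates.
for i = 1:numel(problems)
    fprintf('%s\n', problems{i});
end
```

Setting embed_type to an embedding algorithm such as 'CMDS' makes the check pass for this struct.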

Returns res:

a struct containing the resulting Mapper Graph

  • adjacencyMat: (nodes x nodes) links between Mapper nodes

  • memberMat: (nodes x TRs) logical matrix indicating whether each node contains each TR

  • nodeMembers: (nodes x list<TRs>) cell list for the TRs of each node

  • options: a struct of options used to generate the results

  • Samir Chowdhury

  • Caleb Geniesse

  • Manish Saggar
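As an illustration of how the fields of res relate to each other, the sketch below builds a toy result struct by hand and reads a few quantities from it. The values are made up, and storing nodeMembers as a cell array of index vectors is an assumption based on the field description, not a confirmed detail of the library.

```matlab
% Toy result struct with 3 nodes and 5 TRs; all values are made up.
res = struct;
res.memberMat = logical([1 1 0 0 0; 0 1 1 1 0; 0 0 0 1 1]);
res.adjacencyMat = [0 1 0; 1 0 1; 0 1 0];
res.nodeMembers = {[1 2]; [2 3 4]; [4 5]};

node_sizes = sum(res.memberMat, 2);          % number of TRs in each node
node_degree = sum(res.adjacencyMat > 0, 2);  % number of links per node
nodes_of_tr4 = find(res.memberMat(:, 4))';   % nodes that contain TR 4

% nodeMembers should list the same TRs that memberMat marks as true.
same = true;
for i = 1:numel(res.nodeMembers)
    listed = res.nodeMembers{i}(:)';
    same = same && isequal(find(res.memberMat(i, :)), listed);
end
```

In this toy graph, TR 4 belongs to nodes 2 and 3, which is consistent with the link between those two nodes in adjacencyMat.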

Preset configurations

Preset configurations set all required Mapper parameters to default values. This is the easiest way to use the library.

For example, once you have a 2-dimensional matrix data, you can run a simple Mapper algorithm as follows:

opts = BDLMapperOpts(32, 20, 70);
res = mapper(data, opts);

BDLMapper

function options = BDLMapperOpts(k, resolution, gain, verbose)

if nargin < 4
    verbose = false;
end

options = struct;
options.verbose = verbose;
options.preprocess_type = 'none';
options.dist_type = 'euclidean';
options.prelens_type = 'wtd-pen';
options.prelens_rknnparam = k;
options.embed_type = 'CMDS';
options.embed_dim = 2;
options.binning_type = 'Nd';
options.binning_resolution = resolution;
options.binning_gain = gain;
options.binning_nsides = 4;
options.clustering_type = 'linkage_histo';
options.clustering_histo_bins = 10;
options.finalgraph_type = 'full';

end
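The k argument of this preset is the number of neighbors for the reciprocal kNN step. The basic idea of a reciprocal (mutual) kNN graph can be sketched as below, on made-up data. This is only the plain mutual-neighbor rule; it does not reproduce the penalized weighting of the library's “wtd-pen” or “bin-pen” variants, and the variable names are illustrative.

```matlab
% Toy data: 5 observations (rows) with 2 features (columns).
X = [0 0; 0 1; 1 0; 5 5; 5 6];
k = 2;
n = size(X, 1);

% Pair-wise Euclidean distances (implicit expansion, MATLAB R2016b or later).
sq = sum(X.^2, 2);
D = sqrt(max(sq + sq' - 2 * (X * X'), 0));
D(1:n+1:end) = Inf;  % ignore self-distances

% Directed kNN: knn(i, j) is true if j is among the k nearest neighbors of i.
[~, order] = sort(D, 2);
knn = false(n);
for i = 1:n
    knn(i, order(i, 1:k)) = true;
end

% Reciprocal kNN: keep an edge only if both points select each other.
rknn = knn & knn';
```

Here points 1 to 3 form one mutually connected group and points 4 and 5 another; the one-way links from points 4 and 5 to point 2 are dropped because point 2 does not select them back.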

NeuMapper

function options = NeuMapperOpts(k, resolution, gain, verbose)

if nargin < 4
    verbose = false;
end

options = struct;
options.verbose = verbose;
options.preprocess_type = 'none';
options.dist_type = 'cityblock';
options.prelens_type = 'wtd-pen';
options.prelens_rknnparam = k;
options.embed_type = 'none';
options.binning_type = 'cball';
options.binning_resolution = resolution;
options.binning_gain = gain;
options.clustering_type = 'linkage_histo';
options.clustering_histo_bins = 10;
options.finalgraph_type = 'full';


end

KeplerMapper

function options = KeplerMapperOpts(resolution, gain, dist_type, eps_kvalue, verbose)

if nargin < 3
    dist_type = 'euclidean';
end

if nargin < 4
    eps_kvalue = 50;
end

if nargin < 5
    verbose = false;
end

options = struct;
options.verbose = verbose;
options.preprocess_type = 'none';
options.dist_type = dist_type;
options.prelens_type = 'preprocessed';
options.embed_type = 'tSNE';
options.embed_dim = 2;
options.embed_perplexity = 50;
options.binning_type = 'Nd';
options.binning_resolution = resolution;
options.binning_gain = gain;
options.binning_nsides = 4;
options.clustering_type = 'DBSCAN';
options.clustering_eps_type = 'median';
options.clustering_eps_arg = eps_kvalue;
options.clustering_minpts = 2;
options.finalgraph_type = 'neighbors';

end
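This preset uses the ‘median’ epsilon type with eps_kvalue as the k value. One plausible reading of “median of kNN distances” is sketched below on made-up data: take each point's distance to its k-th nearest neighbor and use the median of those distances as epsilon. Whether the library excludes the point itself, or uses exactly this definition, is an assumption of this sketch.

```matlab
% Toy data: four corners of a unit square plus one distant outlier.
X = [0 0; 0 1; 1 0; 1 1; 10 10];
k = 2;
n = size(X, 1);

% Pair-wise Euclidean distances (implicit expansion, MATLAB R2016b or later).
sq = sum(X.^2, 2);
D = sqrt(max(sq + sq' - 2 * (X * X'), 0));
D(1:n+1:end) = Inf;  % assumption: a point is not its own neighbor

% Distance from each point to its k-th nearest neighbor.
sorted_D = sort(D, 2);
kth_dist = sorted_D(:, k);

% Epsilon as the median of those distances, robust to the outlier.
eps_value = median(kth_dist);
```

The median keeps epsilon at the scale of the dense group (1 in this toy example), even though the outlier's k-th neighbor distance is much larger.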