Multivariate Analysis Toolbox for Matlab®

We have tried to break down the typical process of multivariate data analysis in order to identify its key components. We then built a fully object-oriented toolbox, with an object for each of these key components.

Data objects. We have identified three entities that are the building blocks of any multivariate data process. The sampleset object carries information about the different samples, also called observations, conditions, or experiments. The grouping object carries information about the labeling of the samples, i.e., their association with specific clusters. The measurements themselves are stored in a data matrix. The datamatrix object is the general framework for a data matrix, from which more specialized data matrices are derived by object-oriented inheritance. These specialized data matrices cover most of the data organization forms one may encounter. The vsmatrix object describes a rectangular two-way matrix of variables-by-samples. For example, the result of a gene array experiment in the form of genes-by-conditions is represented in our toolbox by a vsmatrix object. The ssmatrix object describes relationships between samples. For example, a distance matrix is represented in our toolbox as an ssmatrix. The vvmatrix object describes relationships between variables. For example, a correlation matrix is represented in our toolbox as a vvmatrix.
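To make the three matrix shapes concrete, here is a small illustrative Python sketch (the toolbox itself is written in MATLAB; the function names below are hypothetical and are not part of its API). It starts from a variables-by-samples matrix, the layout a vsmatrix holds, and derives a samples-by-samples Euclidean distance matrix (an ssmatrix-style result) and a variables-by-variables Pearson correlation matrix (a vvmatrix-style result).

```python
import math

def sample_distances(X):
    """Samples-by-samples Euclidean distances from a variables-by-samples matrix X.

    X[i][k] is the value of variable i in sample k (hypothetical helper).
    """
    n = len(X[0])
    return [[math.sqrt(sum((row[a] - row[b]) ** 2 for row in X)) for b in range(n)]
            for a in range(n)]

def variable_correlations(X):
    """Variables-by-variables Pearson correlations from the same matrix X.

    Assumes no variable is constant (a zero-variance row would divide by zero).
    """
    n = len(X[0])
    means = [sum(row) / n for row in X]
    centered = [[v - m for v in row] for row, m in zip(X, means)]
    norms = [math.sqrt(sum(v * v for v in row)) for row in centered]
    return [[sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
             for rj, nj in zip(centered, norms)]
            for ri, ni in zip(centered, norms)]

# Two variables (rows) measured in three samples (columns).
X = [[1, 2, 3],
     [2, 4, 6]]
D = sample_distances(X)        # 3-by-3, an ssmatrix-like result
C = variable_correlations(X)   # 2-by-2, a vvmatrix-like result
```

Here D[0][1] is the square root of 5 (the distance between samples (1, 2) and (2, 4)), and every entry of C is 1 up to rounding, since the second variable is exactly twice the first.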

Graph theory. The graph object describes a general mathematical graph. This toolbox includes more specific graphs (such as digraphs and trees) that are derived from this general object.

Pairwise data objects. These objects describe specific forms of pairwise data, and are all derived from either ssmatrix or vvmatrix (see above). The covmatrix object describes a covariance or a correlation matrix. The distmatrix object describes a distance matrix. The dissimatrix and simatrix objects describe pairwise dissimilarity or similarity information, respectively.

Dimensionality reduction algorithms. Each object in this group stands for a particular dimensionality reduction technique. Currently available are pcatrans, which performs principal component analysis (PCA); wpcatrans, which performs weighted principal component analysis (wPCA); and fishtrans, which identifies discriminant directions according to Fisher linear discriminant analysis.
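As an illustration of what a PCA object computes, the following Python sketch finds the first principal component of a variables-by-samples matrix by power iteration on the sample covariance matrix. Power iteration is one standard way to obtain the leading eigenvector; it is an assumption made for illustration, not a description of how pcatrans is implemented, and the function names are hypothetical.

```python
import math

def covariance_matrix(X):
    """Sample covariance between the rows (variables) of a variables-by-samples matrix X."""
    n = len(X[0])
    means = [sum(row) / n for row in X]
    centered = [[v - m for v in row] for row, m in zip(X, means)]
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
            for ri in centered]

def first_principal_component(X, iterations=200):
    """Unit leading eigenvector of the covariance matrix, found by power iteration."""
    c = covariance_matrix(X)
    d = len(c)
    # Hypothetical starting vector, chosen so it is unlikely to be
    # orthogonal to the leading eigenvector.
    v = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance is zero along the current direction")
        v = [x / norm for x in w]
    return v

# Four samples lying on the line y = x: the first component is (1, 1)/sqrt(2).
pc = first_principal_component([[1, 2, 3, 4], [1, 2, 3, 4]])
```

Power iteration converges at a rate set by the ratio of the two largest eigenvalues, so it can be slow, or fail to single out one direction, when those eigenvalues are nearly equal.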

Statistics. This portion of the toolbox includes general statistical functions, mainly various hypothesis testing procedures, as well as the ctable object, which describes a contingency table.

Core objects:

grouping: labeling of the data according to a classification scheme
sampleset: information about samples (observations, conditions, experiments)
variable: information about variables (features, coordinates)

Core data objects:

datamatrix: a two-way matrix object

ssmatrix: samples-by-samples two-way datamatrix (e.g., distance matrix)
vsmatrix: variables-by-samples two-way datamatrix (typical data matrix)
vvmatrix: variables-by-variables two-way datamatrix (e.g., correlation matrix)

dataset: repository of information regarding a certain dataset

graph: general undirected graph

digraph: general directed graph

tree: binary tree

bintree: binary tree

Pairwise data objects:

covmatrix: covariance matrix (inherits vvmatrix)
dissimatrix: dissimilarity matrix (inherits ssmatrix)
distmatrix: distance matrix (inherits ssmatrix)
simatrix: similarity matrix (inherits ssmatrix)

Dimensionality reduction algorithms:

lintrans: general dimensionality reduction by linear transformation

fishtrans: Fisher linear discriminant analysis
pcatrans: principal component analysis (PCA)
wpcatrans: weighted principal component analysis (wPCA)

Statistics:

ctable: contingency table
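A contingency table is the natural input for the chi-square test of independence (testchi2independence in the function list). The Python sketch below is an illustration, not the toolbox's code: it computes Pearson's chi-square statistic and its degrees of freedom from a table of observed counts. Turning the statistic into a p-value additionally requires the chi-square distribution function, which is omitted here.

```python
def chi2_independence_statistic(table):
    """Pearson's chi-square statistic and degrees of freedom for an r-by-c table.

    table[i][j] is the observed count for row category i and column category j.
    Assumes every row and column total is positive (otherwise an expected count is zero).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For the 2-by-2 table [[10, 20], [20, 10]] every expected count is 15, giving a statistic of 20/3 (about 6.67) with 1 degree of freedom.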

Combinatorics:

multinom: computes the multinomial coefficient.
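The multinomial coefficient counts the ways to split n items into groups of sizes k1, ..., km, and equals n!/(k1! k2! ... km!) with n = k1 + ... + km. A short Python sketch, assuming the input is the list of group sizes (the toolbox's multinom may take its arguments differently; the function name here is hypothetical):

```python
from math import factorial

def multinomial_coefficient(counts):
    """n! / (k1! * k2! * ... * km!) where n = k1 + ... + km.

    counts is a list of non-negative integer group sizes.
    """
    result = factorial(sum(counts))
    for k in counts:
        # Each intermediate quotient is itself an integer, so // is exact.
        result //= factorial(k)
    return result
```

For example, multinomial_coefficient([2, 1, 1]) returns 12, the number of distinct arrangements of the letters in AABC.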

Data manipulations:

lineup: ranks a vector in increasing order.
majority: finds the most frequent entry.
subs_incomp_data: substitutes given data into an incomplete data array.
subsample: picks a random subsample of a vector.
substitute: substitutes values in a list with a different set of values.
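As a sketch of the behavior described for majority, the Python function below returns the most frequent entry of a list. It is illustrative only, and how the toolbox breaks ties is not stated here; this version assumes ties go to the value encountered first, which is the ordering collections.Counter.most_common documents for equal counts.

```python
from collections import Counter

def most_frequent(values):
    """Most frequent entry of a non-empty sequence of hashable values.

    Ties are resolved in favor of the value seen first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not values:
        raise ValueError("empty input has no most frequent entry")
    return Counter(values).most_common(1)[0][0]
```

For instance, most_frequent([3, 1, 3, 2]) returns 3, and most_frequent(["a", "b"]) returns "a" under the first-seen tie rule.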

Graph Theory:

chowliu: applies the Chow-Liu algorithm.
code2dag: finds the DAG associated with a DAG-code.
code2rank: finds the rank of a DAG-code.
dispdagcode: displays a DAG code to the screen.
enumdagcodes: enumerates all DAG codes for a fixed number of nodes.
enummarkovclasses: enumerates all Markov equivalence classes of DAGs for a fixed number of nodes.
nodags: computes the number of DAGs with a fixed number of nodes.
rank2code: finds the DAG code of a given rank.
thd2wgt: computes, given THD, a default weight matrix.
wgt2thd: computes, given weights, a default THD matrix.
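Assuming nodes are labeled, the number of DAGs on n nodes (the quantity nodags is described as computing) satisfies Robinson's recurrence a(n) = sum over k = 1..n of (-1)^(k+1) C(n, k) 2^(k(n-k)) a(n-k), with a(0) = 1. Below is a memoized Python sketch of that recurrence; it is an illustration with a hypothetical function name, and the toolbox's own method is not documented here.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over sets of k nodes with no incoming edges:
    # each of them may point to any of the other n - k nodes.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))
```

The first values, for n = 0 through 4, are 1, 1, 3, 25, and 543. The counts grow super-exponentially, which is why exhaustive enumeration of DAG codes is only practical for small numbers of nodes.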

Grouping:

group: turns a list into an assignment vector and a naming cell array.
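A Python sketch of the idea behind group, splitting a list of labels into numeric assignments plus the list of names. It assumes the names are sorted (as MATLAB's unique returns them) and that assignments are 1-based, in keeping with MATLAB indexing; both are assumptions rather than documented behavior, and the function name is hypothetical.

```python
def to_assignment(labels):
    """Split labels into 1-based group indices and the sorted group names."""
    names = sorted(set(labels))
    index = {name: i + 1 for i, name in enumerate(names)}
    return [index[label] for label in labels], names
```

For example, to_assignment(["b", "a", "b"]) returns ([2, 1, 2], ["a", "b"]).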

Hypothesis testing:

testbinom: computes the p-value of testing a binomial parameter.
testchi2hist: uses the chi-square test to compare a histogram to a standard.
testchi2hists: uses the chi-square test to compare two histograms.
testchi2independence: computes the p-value of the independence hypothesis.
testfisheromnibus: computes the p-value of Fisher's omnibus test.
testkshist: uses the Kolmogorov-Smirnov (KS) test to compare a histogram to a standard.
testkshists: uses the KS test to compare two histograms.
testmultinom: computes the p-value of testing multinomial parameters.
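For testbinom, one common form of the exact binomial test is the one-sided p-value P(X >= k) for X ~ Binomial(n, p). The Python sketch below computes that tail probability directly. Whether the toolbox uses a one-sided or a two-sided version is not stated here, so treat that choice, and the function name, as assumptions.

```python
from math import comb

def binom_upper_pvalue(k, n, p):
    """One-sided exact p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

Observing 8 heads in 10 fair coin flips gives (45 + 10 + 1)/1024 = 56/1024, about 0.055.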

Information Theory:

centropy: computes the conditional entropy of one variable given another.
emutualentropy: estimates the pairwise mutual entropy.
entropy: computes the entropy of a distribution.
kdiv: computes the K-divergence between distributions p and q.
kl: computes the relative entropy (Kullback-Leibler divergence).
ldiv: computes the L-divergence between distributions p and q.
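The entropy and relative entropy of discrete distributions follow directly from their definitions, H(p) = -sum p_i log p_i and D(p || q) = sum p_i log(p_i / q_i). The Python sketch below uses base-2 logarithms, so results are in bits; the toolbox's choice of logarithm base is an assumption not confirmed here, and the function names are hypothetical.

```python
import math

def shannon_entropy(p):
    """Entropy of a discrete distribution p in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; infinite if q is 0 where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total
```

A fair coin has an entropy of exactly 1 bit, and the relative entropy is zero when p and q coincide. Note that D(p || q) is not symmetric in p and q.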

Linear transformations:

fa_engin: performs factor analysis on the data.
factorscores: estimates the factor scores after factor analysis.
fish_engin: performs the Fisher transformation of a grouped dataset.
pca_engin: performs PCA on the data.
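For two groups, the Fisher discriminant direction is proportional to S_W^(-1) (m_B - m_A), where m_A and m_B are the group means and S_W is the pooled within-group scatter matrix. The Python sketch below computes it for two-dimensional points only, so that the 2-by-2 inverse can be written out by hand. It illustrates the technique and is not the interface of fish_engin or fishtrans; the function name is hypothetical.

```python
import math

def fisher_direction_2d(group_a, group_b):
    """Unit Fisher discriminant direction for two groups of 2-D points.

    Computes w proportional to S_W^{-1} (m_B - m_A), where S_W is the sum of the
    two within-group scatter matrices. Raises ValueError if S_W is singular.
    """
    def mean(points):
        return [sum(p[d] for p in points) / len(points) for d in range(2)]

    def scatter(points, m):
        return [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) for j in range(2)]
                for i in range(2)]

    m_a, m_b = mean(group_a), mean(group_b)
    s_a, s_b = scatter(group_a, m_a), scatter(group_b, m_b)
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    # Closed-form 2x2 inverse of S_W applied to the mean difference.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]

group_a = [(-2, 0), (2, 0), (0, 1), (0, -1)]
group_b = [(x + 1, y + 1) for x, y in group_a]
w = fisher_direction_2d(group_a, group_b)
```

In this example the groups are spread widely along x and narrowly along y, so S_W = diag(16, 4) and the returned direction weights y four times as heavily as x, even though the means differ equally in both coordinates.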

Pairwise Relationships:

distmat: calculates a distance matrix.

Regression analysis:

regress1d: linearly regresses one variable on another.
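Regressing one variable on another by ordinary least squares gives the slope S_xy / S_xx and the intercept mean(y) - slope * mean(x). A Python sketch with a hypothetical name; what regress1d actually returns is not documented here.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For example, least_squares_line([0, 1, 2], [1, 3, 5]) returns (2.0, 1.0), recovering the line y = 2x + 1 exactly.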

Statistics:

fdr: calculates the false discovery rate (FDR).
kendall: computes the Kendall rank correlation matrix.
pearson: computes the Pearson (linear) correlation matrix.
spearman: computes the Spearman rank correlation matrix.
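Spearman's rank correlation is the Pearson correlation of the ranks, with tied values given the average of the ranks they span. The Python sketch below follows that definition for a single pair of variables. It is illustrative only (hypothetical names), whereas the toolbox's spearman is described as returning a full correlation matrix.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only ranks enter, any strictly increasing relationship, such as y = x^2 for positive x, gives rho = 1 even though the Pearson correlation of the raw values is below 1.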

Visualization:

scatter2d_engin: the engine used for 2D scatter plots.
scatter3d_engin: the engine used for 3D scatter plots.

written by: Liran Carmel

Last modified: 13:14, Mon 13-Sep-2010
