We broke down a typical multivariate data analysis process into its key
components, and then built a fully object-oriented toolbox
with one object for each of those components.

**Data objects**. We have identified three entities that are the building
blocks of any multivariate data process. The *sampleset* object
carries information about the samples, also called observations, conditions, or experiments. The
*grouping* object carries information about the labeling
of the samples, i.e., their association with specific clusters. The measurements themselves are stored in
a *data* matrix. The *datamatrix* object is the general
framework of a data matrix, from which more specialized data matrices are derived by object-oriented
inheritance. These more specialized data matrices encompass most of the data organization forms
one may encounter. The *vsmatrix* object describes
a rectangular two-way matrix of variables-by-samples. For example, a result of a gene
array experiment in the form of genes-by-conditions will be represented in our toolbox by a
*vsmatrix* object. The *ssmatrix* object describes
relationships between samples. For example, a distance matrix will be represented in our toolbox
as an *ssmatrix*. The *vvmatrix* object describes
relationships between variables. For example, a correlation matrix will be represented in our
toolbox as a *vvmatrix*.
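
The inheritance relationship described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the toolbox's own code: the class names mirror the object names in the text, while the constructor arguments, the `shape` property, and the `PairwiseMatrix` helper base are assumptions made for this example.

```python
class DataMatrix:
    """General two-way matrix stored as a list of equal-length rows."""

    def __init__(self, values):
        rows = [list(r) for r in values]
        if any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("all rows must have the same length")
        self.values = rows

    @property
    def shape(self):
        return (len(self.values), len(self.values[0]) if self.values else 0)


class VSMatrix(DataMatrix):
    """Variables-by-samples matrix, e.g. genes-by-conditions."""

    def __init__(self, values, variables, samples):
        super().__init__(values)
        if self.shape != (len(variables), len(samples)):
            raise ValueError("labels do not match the matrix shape")
        self.variables = list(variables)
        self.samples = list(samples)


class PairwiseMatrix(DataMatrix):
    """Hypothetical helper base: a square matrix indexed by one label set."""

    def __init__(self, values, labels):
        super().__init__(values)
        if self.shape != (len(labels), len(labels)):
            raise ValueError("a pairwise matrix must be square")
        self.labels = list(labels)


class SSMatrix(PairwiseMatrix):
    """Relationships between samples, e.g. a distance matrix."""


class VVMatrix(PairwiseMatrix):
    """Relationships between variables, e.g. a correlation matrix."""


expression = VSMatrix([[1, 2, 3], [4, 5, 6]],
                      variables=["gene1", "gene2"],
                      samples=["cond1", "cond2", "cond3"])
```

Code written against `DataMatrix` accepts any of the specialized matrices, which is the benefit the inheritance design is meant to provide.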

**Graph theory**. The *graph* object
describes a general mathematical graph. This toolbox includes more specific graphs
(such as *digraphs* and *trees*) that are derived from this general object.

**Pairwise data objects**. These objects describe specific
forms of pairwise data, and are all derived from either *ssmatrix* or
*vvmatrix* (see above). The *covmatrix*
object describes a covariance or a correlation matrix. The
*distmatrix* object describes a distance matrix.
The *dissimatrix* and
*simatrix* objects describe pairwise dissimilarity
or similarity information, respectively.
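
As an illustration of how a sample-by-sample distance matrix relates to a variables-by-samples matrix, here is a minimal Python sketch. It assumes Euclidean distance and a plain list-of-lists layout; the function name `distance_matrix` is hypothetical, and the metrics actually supported by *distmatrix* are not specified here.

```python
import math


def distance_matrix(vs):
    """Euclidean distances between the samples (columns) of a
    variables-by-samples matrix given as a list of rows."""
    samples = list(zip(*vs))  # one tuple of variable values per sample
    return [[math.dist(a, b) for b in samples] for a in samples]


# Two variables measured on two samples: sample 1 = (0, 0), sample 2 = (3, 4).
d = distance_matrix([[0, 3],
                     [0, 4]])
```

The result is square and symmetric with a zero diagonal, which is the structure a sample-by-sample matrix is expected to have.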

**Dimensionality reduction algorithms**. Each object in this group
implements a particular dimensionality reduction technique. Currently available are
*pcatrans*, which performs principal component
analysis (PCA), *wpcatrans*, which performs
weighted principal component analysis (wPCA), and
*fishtrans*, which identifies discriminant
directions according to Fisher linear discriminant analysis.
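
To make the idea behind PCA concrete, the following Python sketch finds the first principal direction of a small dataset by power iteration on the sample covariance matrix. This is one simple way to obtain the leading component, chosen for brevity; it is not claimed to be how *pcatrans* works, and the function name and arguments are assumptions.

```python
import math


def first_principal_component(samples, iterations=200):
    """Leading PCA direction of a list of samples (each a list of variables)."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, provided the
    # start vector is not orthogonal to it (a caveat of this simple sketch).
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Points lying exactly on the line y = 2x.
direction = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

For these points the returned unit vector is proportional to (1, 2), the direction along which the data vary.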

**Statistics**. This portion of the toolbox includes
general statistical functions, mainly various hypothesis testing procedures, as
well as the object *ctable* that describes
a contingency table.
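
A contingency table and the Pearson chi-square statistic computed from it can be sketched as follows. The Python below is illustrative only: the function names and the list-based representation are assumptions, not the interface of *ctable*, and the sketch stops at the statistic rather than computing a p-value.

```python
from collections import Counter


def contingency_table(a, b):
    """Count co-occurrences of the labels in two equally long sequences."""
    rows, cols = sorted(set(a)), sorted(set(b))
    counts = Counter(zip(a, b))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def chi2_statistic(table):
    """Pearson chi-square statistic for independence of rows and columns.
    Assumes every row and column total is positive."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


rows, cols, table = contingency_table(["x", "x", "y", "y"],
                                      ["p", "q", "p", "q"])
```

Here every cell equals its expected count, so the statistic is zero; a table such as `[[10, 0], [0, 10]]` gives a large value instead.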

- grouping
- labeling of the data according to a classification scheme
- sampleset
- information about samples (observations, conditions, experiments)
- variable
- information about variables (features, coordinates)

- datamatrix
- a two-way matrix object
- dataset
- repository of information regarding a certain dataset
- graph
- general undirected graph

- covmatrix
- covariance matrix (inherits *vvmatrix*)
- dissimatrix
- dissimilarity matrix (inherits *ssmatrix*)
- distmatrix
- distance matrix (inherits *ssmatrix*)
- simatrix
- similarity matrix (inherits *ssmatrix*)

- lintrans
- general dimensionality reduction by linear transformation

- ctable
- contingency table

- multinom
- computes the multinomial coefficient.
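
For reference, the multinomial coefficient is $(k_1 + \dots + k_m)! / (k_1! \cdots k_m!)$. A minimal Python sketch of the computation follows; the function name and its single list argument are assumptions, not the signature of *multinom*.

```python
from math import comb


def multinomial(counts):
    """Multinomial coefficient (k1 + ... + km)! / (k1! * ... * km!),
    built as a product of binomial coefficients to avoid large factorials."""
    total, result = 0, 1
    for k in counts:
        total += k
        result *= comb(total, k)
    return result


ways = multinomial([2, 1, 1])  # arrangements of the letters A, A, B, C
```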

- lineup
- ranks a vector in increasing order.
- majority
- finds the most frequent entry.
- subs_incomp_data
- substitutes given data into an incomplete data array
- subsample
- picks a random subsample of a vector.
- substitute
- substitutes values in a list with a different set of values.

- chowliu
- applies the Chow-Liu algorithm.
- code2dag
- finds the DAG associated with a DAG-code.
- code2rank
- finds the rank of a DAG-code.
- dispdagcode
- displays a DAG code to the screen.
- enumdagcodes
- enumerates all DAG codes for a fixed number of nodes.
- enummarkovclasses
- enumerates all Markov equivalence classes of DAGs for a fixed number of nodes.
- nodags
- computes the number of DAGs with fixed number of nodes.
- rank2code
- finds the DAG-code whose rank is r.
- thd2wgt
- computes, given THD, a default weight matrix.
- wgt2thd
- computes, given weights, a default THD matrix.
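
Counting DAGs on a fixed number of nodes has a classical closed recurrence (Robinson's formula). The Python sketch below assumes that the count is over labeled DAGs; whether *nodags* uses this recurrence or this convention is not stated here, and the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


counts = [count_dags(n) for n in range(6)]
```

The count grows very quickly (1, 1, 3, 25, 543, 29281 for zero to five nodes), which is why exhaustive enumeration is only practical for small graphs.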

- group
- turns a list into an assignment vector and a naming cell array.

- testbinom
- computes the p-value of testing a binomial parameter.
- testchi2hist
- uses the chi2 test to compare a histogram to a standard.
- testchi2hists
- uses the chi2 test to compare two histograms.
- testchi2independence
- computes the p-value of independence hypothesis.
- testfisherexact
- computes the p-value of Fisher's exact test (© A. Trujillo-Ortiz et al.).
- testfisheromnibus
- computes p-value for the Fisher Omnibus test.
- testkshist
- uses the KS test to compare a histogram to a standard.
- testkshists
- uses KS test to compare two histograms.
- testmultinom
- computes the p-value of testing multinomial parameters.

- centropy
- computes the conditional entropy between two variables.
- emutualentropy
- estimates pairwise mutual entropy
- entropy
- computes the entropy of a distribution.
- kdiv
- computes the K-divergence between distributions p and q.
- kl
- computes the relative entropy.
- ldiv
- computes the L-divergence between distributions p and q.
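
The two most basic quantities in this group, entropy and relative entropy (Kullback-Leibler divergence), can be sketched in a few lines of Python. The sketch assumes discrete distributions given as lists of probabilities and uses base-2 logarithms (bits); the toolbox functions may use a different base or input format.

```python
import math


def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl(p, q):
    """Relative entropy D(p || q) in bits; infinite when q assigns zero
    probability to an outcome that p can produce."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return math.inf
            total += x * math.log2(x / y)
    return total


fair_coin = entropy([0.5, 0.5])        # one bit of uncertainty
gap = kl([1.0, 0.0], [0.5, 0.5])       # a certain outcome vs. a fair coin
```

Note that relative entropy is not symmetric: swapping the two arguments in the last line gives an infinite value.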

- fa_engin
- performs factor analysis on the data.
- factorscores
- estimates the scores after factor analysis.
- fish_engin
- performs Fisher transformation of a grouped dataset.
- pca_engin
- performs PCA on the data.

- distmat
- calculates distance matrix.

- regress1d
- linearly regresses one variable on another.

- allstats
- computes all common statistics (© D.C. Hanselman).
- fdr
- calculates
- kendall
- computes the Kendall rank correlation matrix.
- pearson
- computes the Pearson (linear) correlation matrix.
- spearman
- computes the Spearman rank correlation matrix.
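
The difference between linear and rank correlation can be shown with a short Python sketch. It computes a single coefficient for one pair of variables rather than a full correlation matrix, and the function names and tie handling (average ranks) are assumptions for illustration, not the toolbox's implementation.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # monotone but not linear in x
```

For this pair the Spearman coefficient is 1 because the relationship is monotone, while the Pearson coefficient is slightly below 1 because it is not linear.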

- scatter2d_engin
- the engine used for 2D scatter plots.
- scatter3d_engin
- the engine used for 3D scatter plots.

The toolbox is freely available from this site. The latest release is MVA_13Sep2010.

**Prerequisite**: the General Utilities toolbox.

