Grouping object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

Last modified: 14:53, Mon 6-Sep-2010

General Description
Holds information about different clusters (classes, groups, labels) associated with a sampleset object. Each cluster is identified by a unique positive integer, called the group identification number (GID). For example, suppose that we have a sampleset of six human subjects, where numbers 1 and 4 are seniors, number 2 is an infant, and numbers 3, 5 and 6 are adults. Then, their assignment vector may look like
     [9 1 5 9 5 5]
where 1 is the GID of the group 'infant', 5 is the GID of the group 'adult', and 9 is the GID of the group 'senior'. Another important number that we associate with an assignment vector is the group consecutive number (GCN), which is just an integer that determines the location of each group in a list of groups sorted by their GID. In our example, the GCN vector would be [1 2 3] for the three groups 'infant', 'adult' and 'senior', in that order, corresponding to their sorted GIDs [1 5 9].
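
As a minimal sketch in base MATLAB (not the toolbox's own code), the sorted GIDs and the GCN of each sample in this example can be recovered with `unique`. The variable names are illustrative only.

```matlab
% Assignment vector from the example: one GID per sample.
assignment = [9 1 5 9 5 5];

% unique returns the sorted distinct GIDs. Its third output gives, for each
% sample, the position of its GID in that sorted list, i.e. its GCN.
[gids, ~, gcn_of_sample] = unique(assignment);
gcn_of_sample = gcn_of_sample(:).';   % force a row vector

gcn = 1:numel(gids);                  % GCN vector of the groups

disp(gids)            % 1 5 9
disp(gcn)             % 1 2 3
disp(gcn_of_sample)   % 3 1 2 3 2 2
```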

If the GCN vector is identical to the GID vector, the grouping is called consistent. Otherwise, it is inconsistent. The transformation between the two vectors is achieved by the two vectors GID2GCN and GCN2GID. GID2GCN is a vector of length max(GID), where GID2GCN(ii) is the GCN of GID ii. NaN in entry ii means that GID ii is nonexistent. In the above example the vector GID2GCN is
     [1 NaN NaN NaN 2 NaN NaN NaN 3].
GCN2GID is the inverse transformation, a vector of length max(GCN), where GCN2GID(ii) is the GID of the ii'th group. In the above example the vector GCN2GID is [1 5 9]. Actually, the two vectors are related by GCN2GID = find(~isnan(GID2GCN)).
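
The two mapping vectors can be built from an assignment vector in a few lines of base MATLAB. This is a sketch of the definitions above, assuming an assignment without NaNs; it is not the toolbox's implementation.

```matlab
assignment = [9 1 5 9 5 5];

% GCN2GID: the sorted list of existing GIDs.
gcn2gid = unique(assignment);                 % [1 5 9]

% GID2GCN: vector of length max(GID), NaN where a GID does not exist.
gid2gcn = nan(1, max(gcn2gid));
gid2gcn(gcn2gid) = 1:numel(gcn2gid);          % [1 NaN NaN NaN 2 NaN NaN NaN 3]

% The grouping is consistent only if the GIDs are exactly 1..number of groups.
is_consistent = isequal(gcn2gid, 1:numel(gcn2gid));   % false here

% The relation stated above recovers gcn2gid from gid2gcn.
assert(isequal(find(~isnan(gid2gcn)), gcn2gid));
```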

A grouping may have several hierarchies, each being a division of the same samples into groups at a different level of detail. In our example, a finer hierarchy may be
     [10 1 5 10 7 5]
where 1 is the GID of 'infant', 5 is the GID of 'male adult', 7 is the GID of 'female adult', and 10 is the GID of 'senior'. Hierarchy A is called coarser than hierarchy B (and hierarchy B is called finer than hierarchy A) if B is a subdivision of A (B includes more groups than A). All hierarchies are assumed to be compatible with the first, coarsest, one (but not with each other), and it is the sole responsibility of the user to verify that. An incompatible hierarchy in our example might be [6 1 5 9 5 6], as GID 6 is assigned to samples 1 and 6, which belong to different groups in the coarsest hierarchy (senior and adult, respectively).
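
Because compatibility is left to the user, a quick check may be useful. The sketch below uses base MATLAB only; `is_compatible` is a hypothetical helper name, not a toolbox function, and the check assumes no NaN entries. It declares a hierarchy compatible when every one of its groups lies entirely inside a single group of the coarsest hierarchy.

```matlab
coarse   = [9 1 5 9 5 5];     % coarsest hierarchy from the example
fine_ok  = [10 1 5 10 7 5];   % compatible finer hierarchy
fine_bad = [6 1 5 9 5 6];     % incompatible hierarchy

% For each fine GID, collect the coarse GIDs of its samples; there must be
% exactly one distinct coarse GID per fine group.
is_compatible = @(coarse, fine) all(arrayfun( ...
    @(g) numel(unique(coarse(fine == g))) == 1, unique(fine)));

disp(is_compatible(coarse, fine_ok))    % 1
disp(is_compatible(coarse, fine_bad))   % 0
```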


Class Structure
Each field can be accessed by the dot (.) operation, or by the GET function. The GET function can work on multiple instances simultaneously. Most fields, except for those that are Dependent, can be modified using the dot (.) operation, or by the SET function.
| Field | Description | Type | Default | Dedicated Get/Set Function |
|---|---|---|---|---|
| name | name of object, should be short and used as identifier. This field will never be empty. | string | 'unnamed' | |
| description | verbal description of the class content. | string | '' | |
| source | verbal description of the source of information. | string | '' | |
| assignment | a mapping between samples to their associated group labeling, in a form of h-by-n matrix. assignment(hh,ii) is the hh-hierarchy group associated with the ii'th sample. | double (positive integers, NaNs) | [] | |
| naming | group names, naming{hh}{gg} being the name of the gg'th group (GCN) in the hh'th hierarchy. | double cell of strings | {} | |
| is_consistent | indicator whether the grouping is consistent or not. Logical vector of length h. | logical vector | [] | |
| gid2gcn | a mapping between the GID and the GCN, with gid2gcn{hh} being the vector mapping the GIDs of the groups in the hh'th hierarchy to their corresponding GCNs. | cell of vectors of integers and NaNs | [] | |
| gcn2gid | a mapping between the GCN and the GID, with gcn2gid{hh} being the vector mapping the GCNs of the groups in the hh'th hierarchy to their corresponding GIDs. | cell of vectors of integers | [] | |
| no_samples (Dependent) | number of samples, n. Set to zero for void objects. | integer scalar | 0 | nosamples |
| no_hierarchies (Dependent) | number of hierarchies, h. Set to zero for void objects. | integer scalar | 0 | nohierarchies |
| no_groups (Dependent) | the number of groups, g_h, in each hierarchy h. Set to [] for void objects. | integer vector | [] | nogroups |
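
To illustrate how the dependent fields relate to the assignment matrix, here is a base-MATLAB sketch that derives them for the two-hierarchy example. It mirrors the field definitions rather than the toolbox's code, and it assumes that NaN entries are not counted as a group.

```matlab
assignment = [ 9 1 5  9 5 5;    % hierarchy 1 (coarsest)
              10 1 5 10 7 5];   % hierarchy 2 (finer)

no_hierarchies = size(assignment, 1);   % h
no_samples     = size(assignment, 2);   % n

no_groups     = zeros(1, no_hierarchies);
is_consistent = false(1, no_hierarchies);
for hh = 1:no_hierarchies
    row  = assignment(hh, :);
    gids = unique(row(~isnan(row)));    % distinct GIDs, NaNs ignored (assumption)
    no_groups(hh)     = numel(gids);
    is_consistent(hh) = isequal(gids, 1:numel(gids));
end

disp(no_groups)       % 3 4
disp(is_consistent)   % 0 0
```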

Class Construction
Empty instance (scalar)
an empty grouping instance, with all fields initialized to their default values.
syntax: gr = grouping;
Empty instance (matrix)
a vector of empty grouping instances.
syntax: gr = grouping(size);
Copy constructor
one grouping instance is copied into another.
syntax: gr_destination = grouping(gr_origin);
Construction by field names
an instance is formed by directly providing field values. Any field which is not dependent is permitted.
syntax: gr = grouping(field_name, field_value, ...);
example: gr = grouping('name','demo group','description','no data inside');
Construction by assignment/naming pairs
an instance is formed by providing pairs of an assignment matrix (of size h-by-n) and the corresponding naming.
syntax: gr = grouping(assignment, naming, ...);
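
As a sketch of what such a pair might look like for the running example, the snippet below prepares an assignment matrix and a matching naming in base MATLAB. The nested-cell layout follows the description of the naming field, with names ordered by GCN (sorted GID); that the constructor accepts exactly this layout for several hierarchies is an assumption. The constructor call itself is left as a comment because it requires the toolbox on the MATLAB path.

```matlab
assignment = [ 9 1 5  9 5 5;    % hierarchy 1
              10 1 5 10 7 5];   % hierarchy 2

% naming{hh}{gg} is the name of the gg'th group (by GCN) in hierarchy hh.
naming = {{'infant', 'adult', 'senior'}, ...
          {'infant', 'male adult', 'female adult', 'senior'}};

% Sanity check: one name per distinct GID in each hierarchy.
for hh = 1:size(assignment, 1)
    assert(numel(naming{hh}) == numel(unique(assignment(hh, :))));
end

% With the toolbox available, the documented syntax would then be:
% gr = grouping(assignment, naming);
```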

List of Functions

Coloring and colormaps:

generates group-dependent color indices.


displays class content.
displays basic information on the groups.

Group-wise computations:

computes intra-group covariances.
computes the inter-group covariance matrix.
computes the average intra-group covariance matrix.
computes intra-group means.
computes intra-group medians.
computes intra-group standard deviations.
computes intra-group variances.
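
For orientation, an intra-group mean can be computed directly from an assignment vector in base MATLAB. This is only a sketch of the idea, not the toolbox function: the data values are made up, and storing samples as columns is an assumption chosen to match the h-by-n layout of the assignment field.

```matlab
data       = [1 2 3 4  5  6;     % 2 variables by 6 samples (illustrative values)
              2 4 6 8 10 12];
assignment = [9 1 5 9 5 5];

[gids, ~, gcn] = unique(assignment);   % gcn(ii) is the GCN of sample ii
no_groups   = numel(gids);
group_means = zeros(size(data, 1), no_groups);
for gg = 1:no_groups
    % Average the columns (samples) that belong to group gg.
    group_means(:, gg) = mean(data(:, gcn == gg), 2);
end

disp(group_means)   % columns ordered by GCN: 'infant', 'adult', 'senior'
```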


tests whether grouping distributions are different.

Information extraction:

entropy of a GROUPING object.
finds the GCN of a group by its name.
finds the GID of a group by its name.
retrieves the group names of desired samples.
finds the size of each group.
extracts sample numbers of specific groups.
information content of a grouping.
lists the indices of those samples whose grouping is known.
finds the number of samples whose grouping is known.
finds the number of samples whose grouping is unknown.
lists the indices of those samples whose grouping is unknown.
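
Several of these queries reduce to simple operations on the assignment vector. The sketch below uses base MATLAB and assumes that a NaN entry marks a sample whose grouping is unknown, which is suggested by the assignment field's type but not stated explicitly; the example vector is illustrative.

```matlab
assignment = [9 NaN 5 9 NaN 5];   % illustrative: samples 2 and 5 are unknown

known_idx   = find(~isnan(assignment));   % [1 3 4 6]
unknown_idx = find(isnan(assignment));    % [2 5]
no_known    = numel(known_idx);           % 4
no_unknown  = numel(unknown_idx);         % 2

% Size of each group among the known samples, ordered by GCN.
gids        = unique(assignment(known_idx));              % [5 9]
group_sizes = arrayfun(@(g) sum(assignment == g), gids);  % [2 2]
```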


checks the consistency of hierarchies.


forms a "super-grouping" out of two groupings.
joins two GROUPING objects.

SET/GET functions:

get method
query about the consistency of the GROUPING instance.
finds number of groups.
returns the number of hierarchies in a GROUPING instance.
finds the number of samples in a GROUPING instance.
set method


turns inconsistent assignments into consistent ones.
eliminates samples from a GROUPING instance.
turns known groups into unknowns.
extracts a single 1-level grouping.
merges two or more groups in a grouping.
assigns numerical values to categories (labeling).
reorders the assignment vector(s).
indexes the unknown class.
builds a grouping with a subset of the hierarchies.
substitutes a subset of GIDs by another subset.
makes redundant groups in a test set unknowns.
keeps only a portion of the groupings, merging all the others.
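
Substituting GIDs, merging groups, and making an assignment consistent are all relabelling operations on the assignment vector. A base-MATLAB sketch of the underlying idea, using `ismember` and illustrative variable names (this is not the toolbox's code):

```matlab
assignment = [9 1 5 9 5 5];

% Substitute the GIDs listed in old_gids by the matching entries of new_gids.
% Mapping the sorted GIDs to 1..3 yields a consistent assignment.
old_gids = [1 5 9];
new_gids = [1 2 3];
[tf, loc]      = ismember(assignment, old_gids);
relabelled     = assignment;
relabelled(tf) = new_gids(loc(tf));      % [3 1 2 3 2 2]

% Mapping two old GIDs to the same new GID merges the groups,
% here 'senior' (9) into 'adult' (5).
merge_to   = [1 5 5];
merged     = assignment;
merged(tf) = merge_to(loc(tf));          % [5 1 5 5 5 5]
```

Entries that do not appear in old_gids (including NaNs) are left untouched by this approach.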


pie plot of the group sizes.
shows one grouping spliced by another.