Grouping object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

Last modified: 14:53, Mon 6-Sep-2010

General Description
Holds information about different clusters (classes, groups, labels) associated with a sampleset object. Each cluster is identified by a unique positive integer, called the group identification number (GID). For example, suppose that we have a sampleset of six human subjects, where numbers 1 and 4 are seniors, number 2 is an infant, and numbers 3, 5 and 6 are adults. Then, their assignment vector may look like
     [9 1 5 9 5 5]
where 1 is the GID of the group 'infant', 5 is the GID of the group 'adult', and 9 is the GID of the group 'senior'. Another important number that we associate with an assignment vector is the group consecutive number (GCN), which is just an integer that determines the location of each group in a list of groups sorted by their GID. In our example, the GCN vector would be [1 2 3] for the three groups 'infant', 'adult' and 'senior', in that order, corresponding to their sorted GIDs [1 5 9].
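The GCN of a group is simply its 1-based rank among the sorted distinct GIDs. The following Python sketch (not the toolbox's MATLAB code; the function name gcn_of_groups is hypothetical, and it assumes every sample has a known label) reproduces the example above:

```python
def gcn_of_groups(assignment):
    """Map each GID to its GCN: its 1-based rank among the sorted distinct GIDs.

    Assumes all labels are known (no NaNs); hypothetical helper, not toolbox API.
    """
    gids = sorted(set(assignment))
    return {gid: rank for rank, gid in enumerate(gids, start=1)}

assignment = [9, 1, 5, 9, 5, 5]   # senior, infant, adult, senior, adult, adult
print(gcn_of_groups(assignment))  # {1: 1, 5: 2, 9: 3}
```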

If the GCN vector is identical to the GID vector (that is, the GIDs of the k groups are exactly 1, 2, ..., k), the grouping is called consistent. Otherwise, it is inconsistent. Conversion between GIDs and GCNs is done with two vectors, GID2GCN and GCN2GID. GID2GCN is a vector of length max(GID), where GID2GCN(ii) is the GCN of GID ii; a NaN in entry ii means that GID ii does not occur. In the above example the vector GID2GCN is
     [1 NaN NaN NaN 2 NaN NaN NaN 3].
GCN2GID is the inverse transformation, a vector of length max(GCN), where GCN2GID(ii) is the GID of the ii'th group. In the above example the vector GCN2GID is [1 5 9]. Actually, the two vectors are related by GCN2GID = find(~isnan(GID2GCN)).
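A minimal Python sketch of the two mapping vectors and of the consistency test, assuming all labels are known. MATLAB's 1-based vectors are emulated with 0-based Python lists (entry ii of GID2GCN is stored at index ii - 1); build_maps, find_not_nan and is_consistent are hypothetical names, not toolbox functions.

```python
import math

def build_maps(assignment):
    """Return (gid2gcn, gcn2gid) as MATLAB-style vectors held in 0-based lists.

    gid2gcn[ii - 1] is the GCN of GID ii, or NaN if GID ii does not occur.
    gcn2gid[gg - 1] is the GID of the gg'th group.
    """
    gcn2gid = sorted(set(assignment))
    gid2gcn = [math.nan] * max(gcn2gid)
    for gcn, gid in enumerate(gcn2gid, start=1):
        gid2gcn[gid - 1] = gcn
    return gid2gcn, gcn2gid

def find_not_nan(vec):
    """Analogue of MATLAB find(~isnan(vec)): 1-based indices of non-NaN entries."""
    return [ii for ii, v in enumerate(vec, start=1) if not math.isnan(v)]

def is_consistent(assignment):
    """Consistent means the sorted GIDs are exactly 1..k, so every GID equals its GCN."""
    _, gcn2gid = build_maps(assignment)
    return gcn2gid == list(range(1, len(gcn2gid) + 1))

gid2gcn, gcn2gid = build_maps([9, 1, 5, 9, 5, 5])
print(gid2gcn)                            # [1, nan, nan, nan, 2, nan, nan, nan, 3]
print(gcn2gid)                            # [1, 5, 9]
print(find_not_nan(gid2gcn) == gcn2gid)   # True
```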

A grouping may have several hierarchies. A hierarchy is a further (finer) division of a grouping. In our example, a finer hierarchy may be
     [10 1 5 10 7 5]
where 1 is the GID of 'infant', 5 is the GID of 'male adult', 7 is the GID of 'female adult', and 10 is the GID of 'senior'. Hierarchy A is called coarser than hierarchy B (and hierarchy B is called finer than hierarchy A) if B is a subdivision of A (B includes more groups than A). All hierarchies are assumed to be compatible with the first, coarsest one (but not necessarily with each other), and it is the sole responsibility of the user to verify this. An incompatible hierarchy in our example might be [6 1 5 9 5 6], because GID 6 is assigned to samples 1 and 6, which belong to different groups ('senior' and 'adult') in the coarsest hierarchy.
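Compatibility can be checked by requiring that all samples sharing a fine label also share a coarse label. The Python sketch below illustrates this rule on the two examples above; it is not the toolbox's checkhierarchies function, the name is_compatible is hypothetical, and it assumes all labels are known.

```python
def is_compatible(coarse, fine):
    """Return True if every fine GID falls inside a single coarse group."""
    parent = {}  # fine GID -> the coarse GID of the first sample seen with it
    for c, f in zip(coarse, fine):
        if parent.setdefault(f, c) != c:
            return False  # the same fine GID appears under two coarse groups
    return True

coarse = [9, 1, 5, 9, 5, 5]
print(is_compatible(coarse, [10, 1, 5, 10, 7, 5]))  # True
print(is_compatible(coarse, [6, 1, 5, 9, 5, 6]))    # False: GID 6 spans 'senior' and 'adult'
```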


Class Structure
Each field can be accessed with the dot (.) operator or with the GET function, which can also operate on several instances at once. All fields except the Dependent ones can be modified with the dot (.) operator or with the SET function.

| Field | Description | Type | Default | Dedicated Get/Set Function |
|---|---|---|---|---|
| name | name of the object; should be short, as it is used as an identifier. This field is never empty. | string | 'unnamed' | |
| description | verbal description of the class content. | string | '' | |
| source | verbal description of the source of information. | string | '' | |
| assignment | mapping of samples to their group labels, in the form of an h-by-n matrix. assignment(hh,ii) is the GID of the group to which the ii'th sample belongs in the hh'th hierarchy. | double (positive integers and NaNs) | [] | |
| naming | group names; naming{hh}{gg} is the name of the gg'th group (by GCN) in the hh'th hierarchy. | cell of cells of strings | {} | |
| is_consistent | indicates, for each hierarchy, whether it is consistent; a logical vector of length h. | logical vector | [] | |
| gid2gcn | mapping from GIDs to GCNs; gid2gcn{hh} is the vector mapping the GIDs of the groups in the hh'th hierarchy to their corresponding GCNs. | cell of vectors of integers and NaNs | [] | |
| gcn2gid | mapping from GCNs to GIDs; gcn2gid{hh} is the vector mapping the GCNs of the groups in the hh'th hierarchy to their corresponding GIDs. | cell of vectors of integers | [] | |
| no_samples (Dependent) | number of samples, n. Zero for void objects. | integer scalar | 0 | nosamples |
| no_hierarchies (Dependent) | number of hierarchies, h. Zero for void objects. | integer scalar | 0 | nohierarchies |
| no_groups (Dependent) | number of groups in each hierarchy; a vector of length h. [] for void objects. | integer vector | [] | nogroups |
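The three Dependent fields can all be derived from the h-by-n assignment matrix. The Python sketch below shows one way to compute them, with the matrix held as a list of h rows. It assumes (this is not stated above) that NaN entries are not counted as groups; dependent_fields is a hypothetical name, not part of the toolbox.

```python
import math

def dependent_fields(assignment):
    """Derive (no_samples, no_hierarchies, no_groups) from an h-by-n assignment.

    assignment: list of h rows (one per hierarchy), each of length n.
    NaN labels are not counted as groups (an assumption of this sketch).
    """
    no_hierarchies = len(assignment)
    no_samples = len(assignment[0]) if assignment else 0  # void object -> 0
    no_groups = [len({g for g in row if not math.isnan(g)}) for row in assignment]
    return no_samples, no_hierarchies, no_groups

print(dependent_fields([[9, 1, 5, 9, 5, 5], [10, 1, 5, 10, 7, 5]]))  # (6, 2, [3, 4])
print(dependent_fields([]))                                           # (0, 0, [])
```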

Class Construction
Empty instance (scalar)
an empty grouping instance, with all fields initialized to their default values.
syntax: gr = grouping;
Empty instance (matrix)
a vector of empty grouping instances.
syntax: gr = grouping(size);
Copy constructor
one grouping instance is copied into another.
syntax: gr_destination = grouping(gr_origin);
Construction by field names
an instance is formed by directly providing field values. Any field that is not Dependent may be specified.
syntax: gr = grouping(field_name, field_value, ...);
example: gr = grouping('name','demo group','description','no data inside');
Construction by assignment/naming pairs
an instance is formed by providing one or more pairs of an assignment matrix (of size h-by-n) and its corresponding naming.
syntax: gr = grouping(assignment, naming, ...);
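Because naming{hh}{gg} is indexed by GCN rather than by GID, the names must be listed in order of increasing GID. The Python sketch below illustrates how a sample's name follows from its GID via its GCN, for a single hierarchy of the running example. It is only an illustration of the naming convention, not the toolbox's groupname function; group_names is a hypothetical name and all labels are assumed known.

```python
def group_names(assignment, naming):
    """Return the group name of each sample in one hierarchy.

    naming[gg - 1] names the group whose GCN is gg (the naming{hh}{gg} convention).
    """
    gcn2gid = sorted(set(assignment))
    gid2name = {gid: naming[gcn - 1] for gcn, gid in enumerate(gcn2gid, start=1)}
    return [gid2name[g] for g in assignment]

print(group_names([9, 1, 5, 9, 5, 5], ['infant', 'adult', 'senior']))
# ['senior', 'infant', 'adult', 'senior', 'adult', 'adult']
```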

List of Functions

Coloring and colormaps:

grp2col
generates group-dependent color indices.

Display:

show
displays class content.
showsummary
displays basic information on the groups.

Group-wise computations:

gcov
computes intra-group covariances.
gcovinter
computes the inter-group covariance matrix.
gcovintra
computes the average intra-group covariance matrix.
gmean
computes intra-group means.
gmedian
computes intra-group medians.
gstd
computes intra-group standard deviations.
gvar
computes intra-group variances.
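The group-wise functions apply a statistic separately to the samples of each group. As a rough illustration of the idea behind gmean only (not its actual implementation or signature), this Python sketch computes per-group means of a samples-by-variables data set. It assumes NaN labels mark unknown samples, which are skipped; group_means is a hypothetical name and the data in the example are toy values.

```python
import math

def group_means(data, assignment):
    """Per-group mean of every variable, keyed by GID in ascending (GCN) order.

    data: n samples, each a list of p numbers; assignment: n GIDs.
    Samples labelled NaN are treated as unknown and skipped (an assumption).
    """
    rows_by_gid = {}
    for row, gid in zip(data, assignment):
        if math.isnan(gid):
            continue
        rows_by_gid.setdefault(gid, []).append(row)
    # zip(*rows) turns the group's samples into per-variable columns.
    return {gid: [sum(col) / len(rows) for col in zip(*rows)]
            for gid, rows in sorted(rows_by_gid.items())}

toy = [[1, 10], [2, 20], [3, 30], [5, 50], [4, 40], [8, 80]]
print(group_means(toy, [9, 1, 5, 9, 5, 5]))
# {1: [2.0, 20.0], 5: [5.0, 50.0], 9: [3.0, 30.0]}
```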

Inference:

testdistributions
tests whether grouping distributions are different.

Information extraction:

entropy
entropy of a GROUPING object.
gname2gcn
finds the GCN of a group by its name.
gname2gid
finds the GID of a group by its name.
groupname
retrieves the group names of desired samples.
groupsize
finds the size of each group.
grp2samp
extracts sample numbers of specific groups.
infocontent
information content of a grouping.
knowns
lists the indices of those samples whose grouping is known.
noknowns
finds the number of samples whose grouping is known.
nounknowns
finds the number of samples whose grouping is unknown.
unknowns
lists the indices of those samples whose grouping is unknown.

Maintenance:

checkhierarchies
checks the consistency of hierarchies.

Operators:

mtimes
forms a "super-grouping" out of two groupings.
plus
joins two GROUPING objects.

SET/GET functions:

get
get method
isconsistent
query about the consistency of the GROUPING instance.
nogroups
finds number of groups.
nohierarchies
returns the number of hierarchies in a GROUPING instance.
nosamples
finds the number of samples in a GROUPING instance.
set
set method

Transformations:

consistent
turns inconsistent assignments into consistent ones.
deletesamples
eliminates samples from a GROUPING instance.
grp2unknowns
turns known groups into unknowns.
isolate
extracts a single 1-level grouping.
mergegroups
merges two or more groups in a grouping.
nom2num
assigns numerical values to categories (labeling).
shuffle
reorders the assignment vector(s).
specifyunknowns
indexes the unknown class.
subgrouping
builds a grouping with a subset of the hierarchies.
substitute
substitutes a subset of GIDs by another subset.
testgrouping
makes redundant groups in a test set unknowns.
zoom
keeps only a portion of the groupings, merging all the others.
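One natural way to make an inconsistent assignment consistent is to replace every GID by its GCN, so that the GIDs become exactly 1..k. The Python sketch below does this for one hierarchy. It is an assumption about what consistent does, not its actual implementation; make_consistent is a hypothetical name, and NaN labels (assumed to mean unknown) are left untouched.

```python
import math

def make_consistent(assignment):
    """Relabel each GID by its GCN so that the GIDs become 1..k; NaNs stay NaN."""
    gids = sorted({g for g in assignment if not math.isnan(g)})
    gid2gcn = {gid: gcn for gcn, gid in enumerate(gids, start=1)}
    return [g if math.isnan(g) else gid2gcn[g] for g in assignment]

print(make_consistent([9, 1, 5, 9, 5, 5]))  # [3, 1, 2, 3, 2, 2]
```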

Visualization:

pie
pie plot of the group sizes.
slice
shows one grouping sliced by another.