Grouping object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

Last modified: 14:53, Mon 6-Sep-2010

General Description
Holds information about the different clusters (classes, groups, labels) associated with a sampleset object. Each cluster is identified by a unique positive integer, called the group identification number (GID). For example, suppose that we have a sampleset of six human subjects, where subjects 1 and 4 are seniors, subject 2 is an infant, and subjects 3, 5 and 6 are adults. Then their assignment vector may look like
     [9 1 5 9 5 5]
where 1 is the GID of the group 'infant', 5 is the GID of the group 'adult', and 9 is the GID of the group 'senior'. Another important number that we associate with an assignment vector is the group consecutive number (GCN), which is the position of a group in the list of groups sorted by their GID. In our example, the GCN vector would be [1 2 3] for the three groups 'infant', 'adult' and 'senior', in that order, corresponding to their sorted GIDs [1 5 9].
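
As a minimal sketch in plain Matlab (not a toolbox call), the sorted GIDs and the GCN of each sample in this example can be recovered with the built-in unique function. The variable names below are illustrative assumptions and are not part of the toolbox:

```matlab
% Assignment vector from the example: one GID per sample.
assignment = [9 1 5 9 5 5];

% unique returns the sorted GIDs; its third output gives, for every
% sample, the position of its GID in that sorted list, i.e. its GCN.
[gids, ~, sample_gcn] = unique(assignment);
sample_gcn = sample_gcn(:).';   % force a row vector

% GCN vector of the groups themselves: 1, 2, ..., number of groups.
gcn = 1:numel(gids);
```

Here gids holds the sorted GIDs of 'infant', 'adult' and 'senior', and sample_gcn lists the GCN of each of the six subjects.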

If the GCN vector is identical to the sorted GID vector, that is, if the GIDs are exactly 1, 2, ..., g for g groups, the grouping is called consistent. Otherwise, it is inconsistent. Conversion between GIDs and GCNs is done through two lookup vectors, GID2GCN and GCN2GID. GID2GCN is a vector of length max(GID), where GID2GCN(ii) is the GCN of GID ii. A NaN in entry ii means that no group has GID ii. In the above example the vector GID2GCN is
     [1 NaN NaN NaN 2 NaN NaN NaN 3].
GCN2GID is the inverse transformation, a vector of length max(GCN), where GCN2GID(ii) is the GID of the ii'th group. In the above example the vector GCN2GID is [1 5 9]. The two vectors are related by GCN2GID = find(~isnan(GID2GCN)).
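
The two lookup vectors can be built from an assignment vector with a few lines of plain Matlab. This is a sketch under the assumption that the assignment vector contains no NaNs. It illustrates the definitions above and is not the toolbox's own code:

```matlab
assignment = [9 1 5 9 5 5];

% GCN2GID: sorted list of the GIDs that actually occur.
gcn2gid = unique(assignment);

% GID2GCN: NaN everywhere except at the existing GIDs.
gid2gcn = nan(1, max(gcn2gid));
gid2gcn(gcn2gid) = 1:numel(gcn2gid);

% Consistent only if the GIDs are exactly 1, 2, ..., number of groups.
is_consistent = isequal(gcn2gid, 1:numel(gcn2gid));

% Check the relation GCN2GID = find(~isnan(GID2GCN)).
relation_holds = isequal(gcn2gid, find(~isnan(gid2gcn)));
```

For this example the grouping is inconsistent, because the GIDs [1 5 9] differ from the GCNs [1 2 3].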

A grouping may have several hierarchies. Each hierarchy is a division of the same samples into groups, at a coarser or finer level. In our example, a finer hierarchy may be
     [10 1 5 10 7 5]
where 1 is the GID of 'infant', 5 is the GID of 'male adult', 7 is the GID of 'female adult', and 10 is the GID of 'senior'. Hierarchy A is called coarser than hierarchy B (and hierarchy B is called finer than hierarchy A) if B is a subdivision of A (B includes more groups than A). All hierarchies are assumed to be compatible with the first, coarsest, one (but not with each other), and it is the sole responsibility of the user to verify that. An incompatible hierarchy in our example might be [6 1 5 9 5 6], as GID 6 is assigned to two samples that belong to different groups in the coarser hierarchy (sample 1 is a senior and sample 6 is an adult).
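
Since verifying compatibility is left to the user, a check along the following lines may be helpful. It is a sketch in plain Matlab, and is_compatible is a hypothetical helper defined here, not a toolbox function. It assumes both hierarchies are NaN-free row vectors of equal length:

```matlab
coarse   = [9 1 5 9 5 5];     % first (coarsest) hierarchy
fine_ok  = [10 1 5 10 7 5];   % compatible finer hierarchy
fine_bad = [6 1 5 9 5 6];     % incompatible: GID 6 spans two coarse groups

% A finer hierarchy is compatible when every one of its groups lies
% entirely inside a single group of the coarser hierarchy.
is_compatible = @(fine, coarse) all(arrayfun( ...
    @(g) numel(unique(coarse(fine == g))) == 1, unique(fine)));

ok  = is_compatible(fine_ok, coarse);    % true
bad = is_compatible(fine_bad, coarse);   % false
```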


Class Structure
Each field can be accessed by the dot (.) operation, or by the GET function. The GET function can work on multiple instances simultaneously. Most fields, except for those that are Dependent, can be modified using the dot (.) operation, or by the SET function.
|           | Field          | Description | Type | Default | Dedicated Get/Set Function |
|-----------|----------------|-------------|------|---------|----------------------------|
|           | name           | name of object, should be short and used as identifier. This field will never be empty. | string | 'unnamed' | |
|           | description    | verbal description of the class content. | string | '' | |
|           | source         | verbal description of the source of information. | string | '' | |
|           | assignment     | a mapping between samples to their associated group labeling, in a form of h-by-n matrix. assignment(hh,ii) is the hh-hierarchy group associated with the ii'th sample. | double (positive integers, NaNs) | [] | |
|           | naming         | group names, naming{hh}{gg} being the name of the gg'th group (GCN) in the hh'th hierarchy. | double cell of strings | {} | |
|           | is_consistent  | indicator whether the grouping is consistent or not. Logical vector of length h. | logical vector | [] | |
|           | gid2gcn        | a mapping between the GID and the GCN, with gid2gcn{hh} being the vector mapping the GIDs of the groups in the hh'th hierarchy to their corresponding GCNs. | cell of vectors of integers and NaNs | [] | |
|           | gcn2gid        | a mapping between the GCN and the GID, with gcn2gid{hh} being the vector mapping the GCNs of the groups in the hh'th hierarchy to their corresponding GIDs. | cell of vectors of integers | [] | |
| Dependent | no_samples     | number of samples, n. Set to zero for void objects. | integer scalar | 0 | nosamples |
| Dependent | no_hierarchies | number of hierarchies, h. Set to zero for void objects. | integer scalar | 0 | nohierarchies |
| Dependent | no_groups      | the number of groups, g_h, in each hierarchy h. Set to [] for void objects. | integer vector | [] | nogroups |
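
To make the relation between the fields concrete, the following plain-Matlab sketch derives the dependent quantities and the lookup cells from an assignment matrix, and then looks up a group name. It mirrors the field descriptions above but is only an illustration. The toolbox may compute these values differently, and the example assumes an assignment matrix without NaNs:

```matlab
% Two hierarchies (rows) over six samples (columns), as in the examples.
assignment = [ 9 1 5  9 5 5
              10 1 5 10 7 5];
naming = {{'infant', 'adult', 'senior'}, ...
          {'infant', 'male adult', 'female adult', 'senior'}};

no_hierarchies = size(assignment, 1);   % h
no_samples     = size(assignment, 2);   % n

no_groups = zeros(1, no_hierarchies);
gid2gcn   = cell(1, no_hierarchies);
gcn2gid   = cell(1, no_hierarchies);
for hh = 1:no_hierarchies
    gcn2gid{hh}   = unique(assignment(hh, :));   % sorted GIDs
    no_groups(hh) = numel(gcn2gid{hh});
    gid2gcn{hh}   = nan(1, max(gcn2gid{hh}));
    gid2gcn{hh}(gcn2gid{hh}) = 1:no_groups(hh);
end

% Name of the group of sample 5 in hierarchy 2: GID 7 maps to GCN 3.
hh = 2; ii = 5;
group_name = naming{hh}{gid2gcn{hh}(assignment(hh, ii))};
```

Because naming is indexed by GCN rather than by GID, the GID stored in assignment has to be passed through gid2gcn before it can be used to index naming.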

Class Construction
Empty instance (scalar)
an empty grouping instance, with all fields initialized to their default values.
syntax: gr = grouping;
Empty instance (matrix)
a vector of empty grouping instances.
syntax: gr = grouping(size);
Copy constructor
one grouping instance is copied into another.
syntax: gr_destination = grouping(gr_origin);
Construction by field names
an instance is formed by directly providing field values. Any field which is not dependent is permitted.
syntax: gr = grouping(field_name, field_value, ...);
example: gr = grouping('name','demo group','description','no data inside');
Construction by assignment/naming pairs
an instance is formed by providing pairs of an assignment matrix (of size h-by-n) and the corresponding naming.
syntax: gr = grouping(assignment, naming, ...);
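
As an illustration of the construction syntaxes, the sketch below prepares an assignment matrix and a naming cell for the six-subject example and passes them to the constructor by field names. It assumes that the constructor accepts 'assignment' and 'naming' in the same form as the fields described in Class Structure. The call is guarded so that the snippet also runs when the toolbox is not on the Matlab path:

```matlab
% Inputs for the six-subject example: two hierarchies, six samples.
assignment = [9 1 5 9 5 5; 10 1 5 10 7 5];
naming = {{'infant', 'adult', 'senior'}, ...
          {'infant', 'male adult', 'female adult', 'senior'}};

% Call the constructor only if the toolbox is available on the path.
if ~isempty(which('grouping'))
    % Construction by field names (assumed input form, see above).
    gr = grouping('name', 'age groups', ...
                  'assignment', assignment, 'naming', naming);
end
```

With the assignment/naming pair syntax, the corresponding call would presumably be gr = grouping(assignment, naming).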

List of Functions

I/O functions:

bundle
converts p_CpG structure into amSample object
dump
writes class content to a text file

Information extraction:

regionmethylation
computes methylation in a region

Maintenance:

indexofchr
gets the index of chromosomes

Processing:

determinewinsize
estimates the window size.
diagnose
computes basic statistics and recommends threshold values.
estimatedrate
estimates deamination rate.
filter
removes unreliable CpG positions.
reconstructmethylation
computes methylation from tOct data
simulate
simulates DNA degradation
smooth
provides smoothed CT and T vectors

SET/GET functions:

effectivecoverage
gets the effective coverage of a chromosome
get
get method
getaoga
gets the aOga vector for a specific chromosome
getmethylation
gets the methylation vector for a specific chromosome
getno_as
gets the No_As vector for a specific chromosome
getno_cs
gets the No_Cs vector for a specific chromosome
getno_gs
gets the No_Gs vector for a specific chromosome
getno_ts
gets the No_Ts vector for a specific chromosome
isfiltered
reports whether AMSAMPLE is filtered.
set
set method