Dataset object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

last modified:

General Description
A dataset typically comprises a collection of many different core objects (variables, groupings, and samplesets). For example, during the initial phase of a project, many versions of the data may be kept simultaneously. The dataset serves as a repository for the raw data. Typically, before an actual analysis begins, a more structured object, such as a datamatrix, is extracted from it.


Class Structure
Each field can be accessed with the dot (.) operator or with the GET function. Most fields can be modified with the SET function; the exceptions are fields designated READ-ONLY, which are computed automatically by the class methods and cannot be set by the user. Some of the more commonly accessed fields have dedicated SET/GET functions, which are listed in the table.
| Read-only | Field | Description | Type | Default | Dedicated Get/Set Function |
|---|---|---|---|---|---|
| | name | name of object, should be short and used as identifier. This field can never be empty. | string | 'unnamed' | |
| | description | verbal description of the class content. | string | '' | |
| | source | verbal description of the source of information. | string | '' | |
| | variables | a vector of length nv, holding the variables aggregated in the dataset. | vector of variables | [] | variables |
| read only | no_variables | a vector [nnom nord nnum nunk nv], indicating the number of nominal variables, ordinal variables, numerical variables, variables of unknown level, and the total number of variables, respectively. | 5-vector of nonnegative integers | [0 0 0 0 0] | novariables |
| | groupings | a vector of length ng, holding the groupings aggregated in the dataset. | vector of groupings | [] | |
| read only | no_groupings | number of groupings, ng. Set to zero for void objects. | integer scalar | 0 | nogroupings |
| | samplesets | a vector of length ns, holding the samplesets aggregated in the dataset. | vector of samplesets | [] | samplesets |
| read only | no_samplesets | number of samplesets, ns. Set to zero for void objects. | integer scalar | 0 | nosamplesets |
| | matrix | a vector of length nm, holding the datamatrices aggregated in the dataset. | vector of datamatrices | [] | |
| read only | no_matrices | number of datamatrices, nm. Set to zero for void objects. | integer scalar | 0 | |
| | var2sampset | a vector of length nv, holding for each variable the ID of its corresponding sampleset. | vector of nonnegative integers | [] | |
| | grp2sampset | a vector of length ng, holding for each grouping the ID of its corresponding sampleset. | vector of nonnegative integers | [] | |
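
The bookkeeping fields can be pictured with plain MATLAB arrays. The sketch below does not call the toolbox; the values are made up for illustration. It assumes that a sampleset ID is the position of that sampleset in the samplesets vector, and that the total nv equals the sum of the four per-level counts. Neither assumption is stated explicitly on this page.

```matlab
% Plain-array illustration only; no toolbox classes are used here.
% Hypothetical dataset: 3 variables and 2 groupings spread over 2 samplesets.
no_variables = [1 0 2 0 3];   % layout: [nnom nord nnum nunk nv]
var2sampset  = [1 1 2];       % variable k is attached to sampleset var2sampset(k)
grp2sampset  = [2 1];         % grouping k is attached to sampleset grp2sampset(k)

% Assumption: the last entry (nv) is the sum of the four per-level counts.
nv = no_variables(5);
assert(sum(no_variables(1:4)) == nv);

% var2sampset holds one sampleset ID per variable.
assert(numel(var2sampset) == nv);

% Indices of the variables attached to sampleset 1 (assumed ID = position).
vars_in_ss1 = find(var2sampset == 1);
disp(vars_in_ss1)
```

In the real object these quantities live in the fields of the same names, and the read-only counts are maintained by the class methods rather than set by hand.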

Class Construction
Empty instance (scalar)
An empty dataset instance, with all fields initialized to their default values.
syntax: ds = dataset;
Empty instance (vector)
A vector of empty dataset instances.
syntax: ds = dataset(no_instances);
Copy constructor
A dataset instance is copied into another.
syntax: ds_destination = dataset(ds_origin);
Construction by field names
An instance is formed by directly providing field values. Any field that is not read-only may be specified.
syntax: ds = dataset(field_name, field_value, ...);
example: ds = dataset('name','demo dataset', 'description','no data inside');
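
A short session combining these constructor forms might look as follows. This is a sketch that uses only the syntax documented on this page and has not been checked against a particular toolbox release. It assumes the toolbox is on the MATLAB path and that no other class named dataset takes precedence over it.

```matlab
% Sketch only: requires the Multivariate Analysis Toolbox on the MATLAB path.

% Empty instance: every field takes its documented default.
ds_empty = dataset;
disp(ds_empty.name)            % dot access to the name field

% Construction by field names (the example given above).
ds = dataset('name', 'demo dataset', 'description', 'no data inside');

% Copy constructor: ds_copy is a separate instance with the same field values.
ds_copy = dataset(ds);

% Dot access works for ordinary and for read-only fields alike.
disp(ds_copy.name)
disp(ds_copy.no_variables)     % read-only 5-vector [nnom nord nnum nunk nv]
```

Read-only fields such as no_variables can be read this way but, per the field table, cannot be passed to the constructor or modified with SET.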

List of Functions


computes covariance/correlation matrix


constructor method

Display functions:

display method

Housekeeping functions:

index of groupings in dataset by their name
guess the mapping of groupings to their sampleset
guess the mapping of variables to their sampleset
index of samplesets in dataset by their name
index of variables in dataset by their name


end keyword
extracts a specific instance from a dataset array
basic indexing method


eliminate samples from a dataset instance
eliminate variables from a dataset instance
+ (plus): adds samplesets, variables and groupings to a dataset

SET/GET functions:

get method
number of groupings in dataset(s)
number of samples in the different variables
number of samplesets in dataset(s)
number of variables of a specific type
sample names of a specific sampleset
samplesets of dataset(s)
set method
variables of dataset(s)