Dataset object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

last modified:

General Description
A dataset typically comprises a collection of many different core objects (variables, groupings, and samplesets). For example, during the initial phase of a project, many versions of the data may be kept simultaneously. The dataset serves as a repository that keeps the raw data. Typically, before an actual analysis begins, a more structured object, such as a datamatrix, is extracted from it.

Class Structure
Each field can be accessed with the dot (.) operator or with the GET function. Most fields can be modified with the SET function; the exceptions are those designated READ-ONLY, which are computed automatically by the class methods and cannot be set by the user. Some of the more commonly accessed fields have dedicated SET/GET functions, which are listed in the table.
| Field | Read-only | Description | Type | Default | Dedicated Get/Set Function |
|---|---|---|---|---|---|
| name | no | name of object, should be short and used as identifier. This field can never be empty. | string | 'unnamed' | |
| description | no | verbal description of the class content. | string | '' | |
| source | no | verbal description of the source of information. | string | '' | |
| variables | no | a vector of length nv, holding the variables aggregated in the dataset. | vector of variables | [] | variables |
| no_variables | yes | a vector [nnom nord nnum nunk nv], indicating the number of nominal variables, ordinal variables, numerical variables, variables of unknown level, and the number of total variables, respectively. | 5-vector of nonnegative integers | [0 0 0 0 0] | novariables |
| groupings | no | a vector of length ng, holding the groupings aggregated in the dataset. | vector of groupings | [] | |
| no_groupings | yes | number of groupings, ng. Set to zero for void objects. | integer scalar | 0 | nogroupings |
| samplesets | no | a vector of length ns, holding the samplesets aggregated in the dataset. | vector of samplesets | [] | samplesets |
| no_samplesets | yes | number of samplesets, ns. Set to zero for void objects. | integer scalar | 0 | nosamplesets |
| matrix | no | a vector of length nm, holding the datamatrices aggregated in the dataset. | vector of datamatrices | [] | |
| no_matrices | yes | number of datamatrices, nm. Set to zero for void objects. | integer scalar | 0 | |
| var2sampset | no | a vector of length nv, holding for each variable the ID of its corresponding sampleset. | vector of nonnegative integers | [] | |
| grp2sampset | no | a vector of length ng, holding for each grouping the ID of its corresponding sampleset. | vector of nonnegative integers | [] | |

Class Construction
Empty instance (scalar)
an empty dataset instance, with all fields initialized to their default values.
syntax: ds = dataset;
Empty instance (vector)
a vector of empty dataset instances.
syntax: ds = dataset(no_instances);
Copy constructor
a dataset instance is copied into another.
syntax: ds_destination = dataset(ds_origin);
Construction by field names
an instance is formed by directly providing field values. Any field which is not read-only is permitted.
syntax: ds = dataset(field_name, field_value, ...);
example: ds = dataset('name','demo dataset', 'description','no data inside');
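
The four construction forms listed above can be tried side by side. The following is a short sketch that uses only the syntaxes given in this section; the variable names are illustrative, and it assumes the toolbox's dataset class takes precedence over any other class named dataset on the Matlab path.

```matlab
% Scalar empty instance: every field holds its default value.
ds = dataset;

% Vector of three empty instances.
ds_vec = dataset(3);

% Construction by field names (only fields that are not read-only).
ds_demo = dataset('name', 'demo dataset', 'description', 'no data inside');

% Copy constructor: ds_copy starts with the same field values as ds_demo.
ds_copy = dataset(ds_demo);

disp(ds.name)        % default name, 'unnamed' according to the field table
disp(ds_copy.name)   % name copied from ds_demo
```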

List of Functions

Computations:

correlate
computes covariance/correlation matrix

Constructors:

dataset
constructor method

Display functions:

display
display method

Housekeeping functions:

grpname2grpidx
index of groupings in dataset by their name
guessgrp2sampset
guess the mapping of groupings to their sampleset
guessvar2sampset
guess the mapping of variables to their sampleset
ssname2ssidx
index of samplesets in dataset by their name
varname2varidx
index of variables in dataset by their name

Indexing:

end
end keyword
instance
extracts a specific instance from a dataset array
subsref
basic indexing method

Operators:

deletesamples
eliminate samples from a dataset instance
deletevariables
eliminate variables from a dataset instance
+ (plus)
adds samplesets, variables and groupings to a dataset

SET/GET functions:

get
get method
nogroupings
number of groupings in dataset(s)
nosamples
number of samples in the different variables
nosamplesets
number of samplesets in dataset(s)
novariables
number of variables of a specific type
samplenames
sample names of a specific sampleset
samplesets
samplesets of dataset(s)
set
set method
variables
variables of dataset(s)
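
As an illustration of the dedicated GET functions, the sketch below queries the counts of an empty dataset. The single-argument call form is an assumption, since this page lists the functions without their signatures.

```matlab
ds = dataset;            % empty instance, nothing aggregated yet

ng = nogroupings(ds);    % assumed call form; mirrors the no_groupings field
ns = nosamplesets(ds);   % assumed call form; mirrors the no_samplesets field
```

For an empty instance, both counts are expected to equal the defaults of the corresponding read-only fields.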