Variable object (Multivariate Analysis Toolbox for Matlab®)

written by: Liran Carmel

last modified:

General Description
Holds information about the values of a measurement of some scalar magnitude (namely, a feature or a coordinate) over samples. Each variable has a range of possible values, which can be numerical or verbal descriptions. Variables are classified by their level, according to the relationships between their possible values. A variable is nominal if there are no relationships between its possible values, which are typically non-numerical. For example, a wine's primary taste can be described as citrus, woody, or fruity. A variable is ordinal if its values have an order. For example, a person can be described as thin, normal, or fat. Finally, a variable is numerical if its values are numerical. We do not distinguish between numerical values that are on a ratio scale (e.g., a person's height) and those that are not (e.g., temperature in Celsius).

If the values are verbal descriptions, they are represented as integers, and the mapping between these integers and the verbal descriptions is kept in the lookup table lut. The lut field is therefore always defined for nominal variables, never defined for numerical variables, and sometimes defined for ordinal variables.
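As an illustration of this encoding, the following plain-MATLAB sketch stores a nominal variable as integer codes plus a lookup table. It does not call the toolbox; the names tastes, lut, and codes are hypothetical, and the alphabetical ordering of the lookup table is an assumption made for the example rather than documented toolbox behavior.

```matlab
% Verbal observations of a nominal variable (a wine's primary taste).
tastes = {'citrus', 'woody', 'citrus', 'fruity', 'woody'};

% Build the lookup table from the unique descriptions. UNIQUE sorts them
% alphabetically; this ordering is an assumption of the sketch.
[lut, ~, codes] = unique(tastes);
lut   = lut(:)';      % lookup table as a row cell array of strings
codes = codes(:)';    % integer code of each sample, as a row vector

% Decoding maps the integers back through the lookup table.
decoded = lut(codes);
```

Only the integer codes need to be stored as data; the verbal descriptions are recovered on demand by indexing into the lookup table.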

If the variable is known to be distributed according to some probability distribution, this distribution is designated in the field distribution. This field is a structure with two fields: the name of the distribution (e.g., 'normal') and its parameters (e.g., mean and variance). Statistical measures of the variable, namely mean, variance and minmax, are also kept. Each of these is a structure with a 'population' field and a 'sample' field, holding the population and sample measures, respectively. Unknown population measures are designated by NaNs.
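The layout described above can be sketched with ordinary MATLAB structures. This is an illustration under assumptions, not the toolbox's implementation: the variable names (mean_info, variance_info, minmax_info), the parameter values, and the choice to leave the population range unknown are all hypothetical.

```matlab
% Measured data with one missing value (NaN).
data  = [1.1 1.2 NaN 1.3];
valid = data(~isnan(data));   % nonmissing samples only

% Distribution: a name plus its parameters (here: mean and variance).
distribution = struct('name', 'normal', 'parameters', [1.2 0.01]);

% Each statistic keeps a population value and a sample value.
% VAR normalizes by n-1 by default; whether the toolbox does the same
% is not stated in this document.
mean_info     = struct('population', 1.2,  'sample', mean(valid));
variance_info = struct('population', 0.01, 'sample', var(valid));

% The population range is treated as unknown here, so it is NaN.
minmax_info   = struct('population', [NaN NaN], ...
                       'sample', [min(valid) max(valid)]);
```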

Sometimes, a transformation is applied to a variable (e.g., log-transform, centering, standardizing), yielding a related, but new, variable. In order to track the series of transformations performed, these are recorded in the transformations vector. Each element is a structure with a name field (e.g., 'center') and a parameters field (e.g., one scalar for centering).
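A minimal plain-MATLAB sketch of this bookkeeping follows. It does not use the toolbox's transform function; the transformation name 'log' and the convention of storing the subtracted mean as the centering parameter are assumptions made for illustration.

```matlab
data = [2 4 6 8];

% Start with an empty vector of transformation structures.
transformations = struct('name', {}, 'parameters', {});

% Step 1: log-transform (no parameter needed).
data = log(data);
transformations(end+1) = struct('name', 'log', 'parameters', []);

% Step 2: centering; record the mean that was subtracted.
mu   = mean(data);
data = data - mu;
transformations(end+1) = struct('name', 'center', 'parameters', mu);

% The record is enough to undo the steps in reverse order.
restored = exp(data + transformations(2).parameters);
```

Because each step stores its own parameter, the original values can be recovered from the transformed data and the record alone.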


Class Structure
Each field can be accessed with the dot (.) operator or with the GET function. Most fields can be modified using the SET function, except for those designated as READ-ONLY, which are computed automatically by the class methods and cannot be set by the user. Some of the more commonly accessed fields have dedicated SET/GET functions, which are listed in the table.
| Access | Field | Description | Type | Default | Dedicated Get/Set Function |
|---|---|---|---|---|---|
| | name | name of object, should be short and used as identifier. This field can never be empty. | string | 'unnamed' | |
| | description | verbal description of the class content. | string | '' | |
| | source | verbal description of the source of information. | string | '' | |
| read only | no_samples | number of samples, n. Set to zero for void objects. | integer scalar | 0 | nosamples |
| | data | measured value of the variable for each of the n samples. Missing values are denoted by NaNs. | numerical vector | [] | () indexing (all values), or {} indexing (nonmissing values only) |
| | units | units of data. | string | '' | |
| | level | level of the variable. | unknown/nominal/ordinal/numerical | 'unknown' | |
| | lut | lookup table mapping integers to descriptions. Used for nominal variables only. | cell array of strings | {} | can be accessed using () indexing |
| read only | transformations | keeps track of the transformations that were applied to the variable. Each transformation is a structure with fields 'name' and 'parameters'. Later, I plan to create a transformation object. | vector of transformation structures | [] | |
| | minmax | allowed range of the variable. It is a structure with a field 'population' that holds the population range, and a field 'sample' that holds the sample range and is read-only. If the population minimum and maximum are unknown, NaN is substituted. | a structure with two 2-vectors | [NaN NaN] for both population and sample minmax | min, max, minmax read sample values; pminmax reads population values |
| | mean | mean of the variable. It is a structure with a field 'population' that holds the population mean, and a field 'sample' that holds the sample mean and is read-only. If the population mean is unknown, NaN is substituted. | a structure with two scalars | NaN for both population and sample mean | mean for sample value, pmean for population value |
| | variance | variance of the variable. It is a structure with a field 'population' that holds the population variance, and a field 'sample' that holds the sample variance and is read-only. If the population variance is unknown, NaN is substituted. | a structure with two scalars | NaN for both population and sample variance | var for sample value |
| | distribution | the distribution associated with the variable. This is a structure with fields 'name' and 'parameters'. Later, I plan to create a distribution object. | vector of distribution structures | name='unknown'; parameters=[] | |
| read only | no_missing | number of missing samples. Set to zero for void objects. | integer scalar | 0 | nomissing |
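
To make the relation between data and the read-only counters concrete, here is a small sketch in plain MATLAB. It mimics what the table describes and does not call the toolbox; the variable names all_values and nonmissing are hypothetical stand-ins for the results of () and {} indexing.

```matlab
% Data vector with two missing samples, denoted by NaN.
data = [0.5 NaN 2.0 1.5 NaN];

% Counters corresponding to the read-only fields.
no_samples = numel(data);          % total number of samples
no_missing = sum(isnan(data));     % number of missing samples

% The two views of the data described in the table.
all_values = data;                 % every sample, NaNs included
nonmissing = data(~isnan(data));   % complete samples only
```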

Class Construction
Empty instance (scalar)
an empty variable instance, with all fields initialized to their default values.
syntax: vr = variable;
Empty instance (vector)
a vector of empty variable instances.
syntax: vr = variable(no_instances);
Copy constructor
a variable instance is copied into another.
syntax: vr_destination = variable(vr_origin);
Construction by field names
an instance is formed by directly providing field values. Any field which is not read-only is permitted.
syntax: vr = variable(field_name, field_value, ...);
example: vr = variable('name','demo variable', 'data',[1.1 1.2 1.3]);
Casting from a matrix
an instance is formed by providing a variable-by-sample matrix.
syntax: vr = variable(data);

List of Functions

Analysis:

detectoutliers
detects outliers

Characteristics of variable:

entropy
entropy of the variable
hist
computes and plots the histogram

Computations:

emutualentropy
estimates pairwise mutual entropy

Constructors:

loadobj
basic load function
variable
constructor method

Display functions:

display
display method

Indexing:

end
end keyword
instance
extracts a specific instance from a variable array
subsref
basic indexing method

Operators:

deletesamples
eliminates samples from a variable instance
== (eq)
element-wise logical operator
>= (ge)
element-wise logical operator
> (gt)
element-wise logical operator
<= (le)
element-wise logical operator
< (lt)
element-wise logical operator
- (minus)
subtracts two variables
/ (mrdivide)
divides a variable by a scalar
~= (ne)
element-wise logical operator
+ (plus)
adds two variables
./ (rdivide)
divides two variables
.* (times)
multiplies two variables

Queries:

iscomplete
checks for complete variable(s)
isnominal
checks for nominal variable(s)
isnumeric
checks for numeric variable(s)
isordinal
checks for ordinal variable(s)
isunknown
checks for unknown variable(s)

SET/GET functions:

get
get method
iqr
computes the inter-quartile range of variable(s)
max
retrieves the sample max of variable(s)
mean
retrieves the sample mean of variable(s)
median
retrieves the sample median of variable(s)
min
retrieves the sample min of variable(s)
minmax
retrieves the sample minmax of variable(s)
name
retrieves the name of variable(s)
nanless
extracts only complete data (without missing values)
nomissing
retrieves the number of missing values
nosamples
retrieves the number of samples in variable
pmean
retrieves the population mean of variable(s)
pminmax
retrieves the population minmax values of a variable
quantile
computes the quantiles of variable(s)
set
set method
std
retrieves the sample standard deviation of variable(s)
var
retrieves the sample variance of variable(s)

Transformations:

fillin
fills missing data
quantize
turns a numeric variable into a nominal one
transform
transforms variable data

Update functions:

computemean
calculates sample mean
computeminmax
calculates sample min and max values
computemissing
finds the number of missing values
computevariance
calculates sample variance