PURPOSE
Loads data from a dataset. The supported dataset types are CSV, Excel (xls, xlsx), HDF5, GAUSS Matrix (fmt), GAUSS Dataset (dat), Stata (dta), and SAS (sas7bdat, sas7bcat). Existing dataframes are also supported.
FORMAT
Parameters:
dataset (string or existing dataframe) –
Filepath to the dataset on disk, a URL, or an existing dataframe. If a URL is provided (with an http or https scheme), the dataset will be downloaded first. Since libcurl is used for all web operations, proxy settings can be set using the relevant libcurl environment variables (see https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html).
varnames (string) –
Formula string indicating which variable names to load from the dataset. E.g.:
"." includes all variables;
"Income + Limit" includes "Income" and "Limit";
". - Cards" includes all variables except for "Cards".
Returns:
y (NxK matrix) – data.
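The three formula-string forms described above can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: the file name mydata.dat is hypothetical, and the variable names "Income", "Limit" and "Cards" are assumed to exist in that file.

```gauss
// Hypothetical dataset file; replace with the path to your own data
dataset = "mydata.dat";

// "." loads every variable in the dataset
y_all = loadd(dataset, ".");

// "Income + Limit" loads only the two named variables
y_two = loadd(dataset, "Income + Limit");

// ". - Cards" loads every variable except "Cards"
y_rest = loadd(dataset, ". - Cards");
```

Since varnames is optional, omitting it is expected to behave like ".", loading all variables.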
EXAMPLES
Load all contents of a GAUSS dataset
After the above code, the following output should be printed to the Command window.
Load specified variables from a dataset
After the above code,
All variables:
14.891 3606.00 283.00 2.0000 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.03 6645.00 483.00 3.0000 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.59 7075.00 514.00 4.0000 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000

Balance and Limit:
333.000 3606.00
903.000 6645.00
580.000 7075.00

All except Cards:
14.8910 3606.00 283.00 34.000 11.000 1.0000 1.0000 2.0000 3.0000 333.000
106.025 6645.00 483.00 82.000 15.000 2.0000 2.0000 2.0000 2.0000 903.000
104.593 7075.00 514.00 71.000 11.000 1.0000 1.0000 1.0000 2.0000 580.000
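Output of the shape shown above (the first three rows under each label) could be produced by a script along these lines. This is a sketch under assumptions: the dataset location, examples/credit.dat under the GAUSS home directory, is assumed rather than taken from this page, and the variable names "Balance", "Limit" and "Cards" are taken from the labels and formula strings above.

```gauss
// Assumed location of the example dataset; adjust for your installation
dataset = getGAUSSHome() $+ "examples/credit.dat";

// Load all variables and print the first three rows
y = loadd(dataset);
print "All variables:";
print y[1:3,.];

// Load two named variables, in the order given in the formula string
y = loadd(dataset, "Balance + Limit");
print "Balance and Limit:";
print y[1:3,.];

// Load every variable except 'Cards'
y = loadd(dataset, ". - Cards");
print "All except Cards:";
print y[1:3,.];
```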
Load all columns of a GAUSS matrix file, .fmt
No variable names are stored in .fmt files. GAUSS allows the use of X1, X2, X3...XP to reference variables in a .fmt file.
Load specified columns of a GAUSS matrix file, .fmt.
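Both .fmt cases can be sketched as below. The file name myfile.fmt is hypothetical, and the sketch assumes the stored matrix has at least three columns so that X1 and X3 both exist.

```gauss
// Hypothetical matrix file; replace with the path to your own .fmt file
fname = "myfile.fmt";

// Load every column of the stored matrix
x_all = loadd(fname);

// Load only the first and third columns, using the X1...XP placeholder names
x_sub = loadd(fname, "X1 + X3");
```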
Load three specified variables from a SAS dataset, .sas7bdat.
After the above code,
Load a string date from a .csv file and automatically convert it to a POSIX date/time (seconds since Jan 1, 1970).
After the above code,
REMARKS
- Since loadd() will load the entire dataset at once, the dataset must be small enough to fit in memory. To read chunks of a dataset in an iterative manner, use dataopen() and readr().
- If dataset is a null string or 0, the dataset temp.dat will be loaded.
- To load a matrix file, use an .fmt extension on dataset.
- The supported dataset types are CSV, Excel (XLS, XLSX), HDF5, GAUSS Matrix (FMT), GAUSS Dataset (DAT), Stata (DTA) and SAS (SAS7BDAT, SAS7BCAT).
- For an HDF5 file, the dataset string must include the schema, and both the file name and the dataset name must be provided, e.g.
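Returning to the first remark, on datasets too large to load at once: the chunked alternative with dataopen() and readr() might look like the sketch below. It is not an HDF5 example. The file name bigdata.dat and the chunk size of 100 rows are illustrative assumptions, and the loop only counts rows where real code would process each chunk.

```gauss
// Hypothetical GAUSS dataset; replace with the path to your own file
dataset = "bigdata.dat";

// Open the dataset for reading and obtain a file handle
fh = dataopen(dataset, "read");

// Rows to request per iteration (arbitrary choice)
chunk_size = 100;

// Running count of rows processed so far
nrows = 0;

do until eof(fh);
    // Read the next block of rows; the final block may hold fewer rows
    x = readr(fh, chunk_size);

    // Process the chunk here; this sketch only counts its rows
    nrows = nrows + rows(x);
endo;

// Close the file handle when finished
fh = close(fh);

print "Rows read: " nrows;
```

Only one chunk is held in memory at a time, which is the reason to prefer this pattern over loadd() for large files.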