The opendf
package is specifically designed to
facilitate the seamless utilization of the Open Data Format in Stata. It
offers functionality to import data from the Open Data Format into a
Stata data frame, as well as export data from a Stata data frame (.dta)
to the Open Data Format. Additionally, you can easily access
comprehensive information about the dataset and variables. This
user-friendly approach ensures convenient exploration and utilization of
dataset information within your preferred environment.
The Stata package can be installed from SSC using the command
ssc install
.
* Install the package from ssc:
ssc install opendf
Alternatively the development version can be installed from GitHub.
* Install the development version from GitHub:
net install opendf, from (https://opendataformat.github.io/stata-package-opendf/) replace
The opendataformat package consists of three main functions:
opendf read
to read an Open Data Format file in
Stata. This function takes an input parameter, which is the path to the
Open Data Format ZIP file.
opendf docu
to explore the dataset information. You
can display metadata for the whole dataset using
opendf docu
or an selected variable
opendf docu variable
.
opendf write
to write the Stata data frame to an
Open Data Format ZIP file. By specifying the dataframe input and
providing the output directory path the function will generate a ZIP
file containing the dataset stored in a “data.csv” and the metadata
stored in a “metadata.xml”.
* You can load a dataset in opendf-format with opendf_read by specifying the path to the zip-folder, (in this case, the testdataset from github):
opendf read "https://opendataformat.github.io/stata-package-opendf/example_data/soep_data.zip", clear verbose
The opendataformat package provides further functions that might be helpful:
opendf installpython
will install a python version
that enables the opendf-package to work. Some functions
(opendf read
and opendf write
) need a working
python version on your PC. If Stata is not able to find a working python
version, you can use this command to copy a python (default v3.12) to
the Stata ado/plus folder (where Stata packages are stored). You can
also specify the python version you want to install and specify a
location where the folder with the python version should be deployed.
But than you have to tell Stata manually where the python.exe is located
using “set python_exec”.
opendf removepython
will remove python versions
installed with opendf installpython
. By default the the
command will delete all python3 folders/installations from the Stata ado
folder (the default folder where opendf installpython
deploys the python installation). With the version()
-option
can also specify which python version to remove. With the
location()
-option you can indicate where to delete the
python folders from.
opendf_csv2dta
to read data from four csvs
containing data and metadata into Stata.
opendf_csv2zip
to save data from four csvs
containing data and metadata to a opendf-format (zip-folder containing
data.csv and metadata.xml). The csvs consist of data.csv (raw data),
dataset.csv (dataset metadata), varaibles.csv (variables metadata) and
categories.csv (value labels for variables).
For data providers the opendf_csv2dta
and
opendf_csv2dta
functions might be interesting to convert
their data to the opendf format.
When working with a multilingual dataset, the
opendataformat
package provides the option to specify the
language you want to work with for the main functions:
opendf read
, opendf docu
,
opendf write
.
You can achieve this by using the languages
argument and
setting it to either all
to include all languages, or by
specifying the language code such as de
for German or
en
for English. This allows you to easily select the
desired language for your dataset operations. Metadata without a
language tag will be assigned to the language default
. The
language codes are defined by the ISO
639-1.
For more comprehensive insights into the Open Data Format specification, read the Open Data Format Specification. This resource provides detailed documentation and profiles illustrating the storage locations of attributes within the Open Data Format, as well as within the native formats to which they will be converted.
Additionally, you can download a practical example of a data file in the Open Data Format (example_dataset.zip).