opendataformat package
Submodules
opendataformat.docu_odf module
Created on Thu Nov 14 17:12:37 2024
@author: thartl
- opendataformat.docu_odf.docu_odf(x, metadata='all', languages='all')
Extract and display metadata from a pandas DataFrame or pandas.Series.
This function processes the metadata stored in the attrs attribute of a pandas object, allowing for selective extraction by metadata type or language. Metadata includes fields such as labels, descriptions, and URLs.
- Parameters:
x (pandas.DataFrame or pandas.Series (single variable metadata)) – The input pandas object from which metadata will be extracted.
metadata (str, default "all") – The type of metadata to extract. Options include:
  - "all": Display all available metadata.
  - "label", "labels": Display and return dataset or variable labels.
  - "description": Display and return descriptions.
  - "type": Display and return types.
  - "url": Display and return URLs.
  - "valuelabels": Display and return value labels.
  Aliases for these options are supported (e.g., "Value labels" for "valuelabels").
languages (str or list of str, default "all") – The language(s) to filter metadata by. Options include:
  - "all": Process metadata for all languages.
  - A single language code (e.g., "en").
  - A list of language codes (e.g., ["en", "de"]).
  Edge cases like empty strings or None are handled gracefully.
- Returns:
Extracted metadata as a dictionary. If only a single metadata field is found, returns the metadata as a string instead.
- Return type:
dict or str
- Raises:
TypeError – If x is not a pandas DataFrame or Series.
ValueError – If metadata or languages contain invalid values.
Notes
Metadata is stored in the attrs attribute of pandas objects.
This function supports multilingual metadata if provided in the input.
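The selection by metadata type and language described above can be pictured with a plain dictionary. The following is only a minimal sketch, not the package's implementation: the helper name filter_attrs is hypothetical, and it assumes that language-specific keys follow the "field_language" pattern seen in the examples (e.g. label_en), while keys without a suffix (e.g. url) are language-independent.

```python
def filter_attrs(attrs, metadata="all", languages="all"):
    """Hypothetical helper: pick entries of an attrs-style dict by type and language."""
    if isinstance(languages, str):
        languages = [languages]
    selected = {}
    for key, value in attrs.items():
        # Assumption: "label_en" splits into field "label" and language "en";
        # a key without an underscore (e.g. "url") carries no language.
        field, _, lang = key.partition("_")
        if metadata != "all" and field != metadata:
            continue
        if lang and "all" not in languages and lang not in languages:
            continue
        selected[key] = value
    return selected


attrs = {
    "label_en": "English Label",
    "label_fr": "French Label",
    "url": "https://example.com",
}
labels = filter_attrs(attrs, metadata="label")
english = filter_attrs(attrs, metadata="label", languages="en")
print(labels)
print(english)
```

The sketch always returns a dictionary; the documented function additionally returns a plain string when only a single field is found.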
Examples
Extract all metadata from a DataFrame:
>>> import pandas as pd
>>> import opendataformat as odf
>>> df = pd.DataFrame()
>>> df.attrs = {"label_en": "English Label", "label_fr": "French Label", "url": "https://example.com"}
>>> odf.docu_odf(df)
label_en: English Label
label_fr: French Label
url: https://example.com
Extract specific metadata type:
>>> odf.docu_odf(df, metadata="label")
label_en: English Label
label_fr: French Label
Extract metadata filtered by language:
>>> label = odf.docu_odf(df, metadata="label", languages="en")
label_en: English Label
>>> print(label)
English Label
Extract dataset level metadata from a DataFrame:
>>> df = odf.read_odf("example_dataset.zip")
>>> df.attrs = {'study': 'study name', 'dataset': 'dataset name', 'label_en': 'label in english', 'label_de': 'label in german', 'description_en': 'details in english', 'description_de': 'details in german', 'url': 'https://example.url'}
>>> odf.docu_odf(df)
study: study name
dataset: dataset name
label_en: label in english
label_de: label in german
description_en: details in english
description_de: details in german
url: https://example.url
Extract specific variable metadata:
>>> odf.docu_odf(df['variable_name'])
name:variable
label_en: english label
label_de: german label
url: https://example.url
Extract specific metadata type:
>>> odf.docu_odf(df, metadata="label")
label_en: label in english
label_de: label in german
Extract metadata filtered by language:
>>> label = odf.docu_odf(df, metadata="label", languages="en")
label_en: label in english
>>> print(label)
label in english
opendataformat.read_odf module
Created on Mon Oct 21 12:24:04 2024
@author: xhan
- opendataformat.read_odf.read_odf(path, languages='all', usecols=None, skiprows=None, nrows=None, na_values=None)
Read an Open Data Format (ODF) file into a Pandas DataFrame.
This function reads data from an ODF zipfile (containing data.csv and metadata.xml) and converts it into a pandas DataFrame. It supports language selection, optional filtering of columns, skipping rows, and replacing specific values with NaN.
- Parameters:
path (str) – The file path to the ODF file to be read.
languages (str or list of str, default "all") – Specifies the language(s) to extract from the file. Use "all" to read all available languages, or pass a single language code (e.g., "en") or a list of language codes (e.g., ["en", "de"]).
usecols (list of int or str, optional) – Specifies the columns to be read from the file. If None, all columns are read. Column selection can be by index or name.
skiprows (int or list of int, optional) – Lines to skip at the start of the file: either a number of lines (int) or a list of 0-indexed line numbers. Can be used to skip metadata or headers.
nrows (int, optional) – The number of rows to read from the file. If None, all rows are read.
na_values (scalar, str, list-like, or dict, optional) – Additional values to consider as NaN. If dict, applies per column.
- Returns:
A pandas DataFrame containing the data and metadata from the ODF file.
- Return type:
DataFrame
Notes
The languages parameter allows for selecting specific localized data if the ODF file supports it.
Metadata is stored in the attrs attribute of pandas objects. You can access it with df.attrs (dataset level) or df['variable_name'].attrs (variable level).
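To make the file layout concrete: an ODF file is a zip archive containing data.csv and metadata.xml. The following sketch uses only the Python standard library to build such an archive in memory and look inside it. It is an illustration, not how read_odf is implemented, and the metadata.xml content is a placeholder rather than the real ODF metadata schema.

```python
import csv
import io
import zipfile

# Build a tiny stand-in archive in memory (contents are illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "A,B\n1,3\n2,4\n")
    # Placeholder only: this is NOT the real ODF metadata schema.
    archive.writestr("metadata.xml", "<placeholder/>")

# Reopen the archive, list its members, and parse the CSV part.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = sorted(archive.namelist())
    with archive.open("data.csv") as handle:
        text = io.TextIOWrapper(handle, encoding="utf-8")
        rows = list(csv.DictReader(text))

print(members)
print(rows)
```

read_odf performs the equivalent steps for a real file and additionally attaches the parsed metadata to the attrs of the returned DataFrame and its columns.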
Examples
Read an ODF file and load all columns:
>>> import opendataformat as odf
>>> df = odf.read_odf("example_dataset.zip")
Read an ODF zipfile, selecting specific language:
>>> df = odf.read_odf("example.zip", languages="en")
opendataformat.write_odf module
Created on Thu Oct 17 12:25:16 2024
@author: thartl
- opendataformat.write_odf.write_odf(x, path, languages='all')
Write a pandas DataFrame or Series to an Open Data Format (ODF) file.
This function saves the provided pandas object (x) to an ODF file, including metadata stored in its attrs attribute. Metadata can optionally be filtered by language.
- Parameters:
x (pandas.DataFrame or pandas.Series) – The pandas object to be saved to the ODF file. It should have metadata stored in the attrs attribute for inclusion in the output file metadata.xml.
path (str) – The file path (including filename) where the ODF file will be saved. Ensure the path ends with .zip to specify the correct file format.
languages (str or list of str, default "all") – Specifies which language(s) of metadata to include in the ODF file. Options include:
  - "all": Include metadata for all available languages.
  - A single language code (e.g., "en").
  - A list of language codes (e.g., ["en", "de"]).
  Edge cases like empty strings or None in the language list are handled gracefully.
- Returns:
The function writes the file to the specified path and does not return a value.
- Return type:
None
- Raises:
TypeError – If x is not a pandas DataFrame or Series.
ValueError – If languages contains invalid values.
Notes
Metadata from the attributes (attrs) of x is included in the file.
Multilingual metadata, if present, is processed according to the languages parameter.
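The documentation does not spell out what "handled gracefully" means for empty strings or None in the languages argument. One plausible reading is sketched below, purely as an assumption: blank entries are dropped, duplicates are removed, and an empty result falls back to "all". The helper name normalize_languages is hypothetical and is not part of the package.

```python
def normalize_languages(languages="all"):
    """Hypothetical helper: turn a languages argument into "all" or a clean list."""
    # Assumption: None and "all" both mean "no language filter".
    if languages is None or languages == "all":
        return "all"
    if isinstance(languages, str):
        languages = [languages]
    cleaned = []
    for code in languages:
        # Drop None and empty or whitespace-only entries.
        if code is None or not str(code).strip():
            continue
        code = str(code).strip()
        if code not in cleaned:
            cleaned.append(code)
    # Assumption: nothing usable left, or "all" in the list, means no filter.
    if not cleaned or "all" in cleaned:
        return "all"
    return cleaned


print(normalize_languages("en"))
print(normalize_languages(["en", "", None, "de"]))
print(normalize_languages(""))
```

Under these assumptions, a call such as write_odf(df, "output.zip", languages=["en", "", None]) would behave like languages="en"; the actual behaviour should be checked against the package source.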
Examples
Write a DataFrame to an ODF file, including all metadata:
>>> import pandas as pd
>>> import opendataformat as odf
>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
>>> df.attrs = {"label_en": "English Label", "label_de": "German Label", "description_en": "Example dataset"}
>>> odf.write_odf(df, "output.zip")
Write a DataFrame to an ODF file, filtering metadata by language:
>>> odf.write_odf(df, "output.zip", languages="en")
Write a DataFrame to an ODF file, including metadata for multiple languages:
>>> odf.write_odf(df, "output.zip", languages="all")