opendataformat package

Submodules

opendataformat.docu_odf module

Created on Thu Nov 14 17:12:37 2024

@author: thartl

opendataformat.docu_odf.docu_odf(x, metadata='all', languages='all')

Extract and display metadata from a pandas DataFrame or pandas.Series.

This function processes the metadata stored in the attrs attribute of a pandas object, allowing for selective extraction by metadata type or language. Metadata includes fields such as labels, descriptions, and URLs.

Parameters:
  • x (pandas.DataFrame or pandas.Series (single variable metadata)) – The input pandas object from which metadata will be extracted.

  • metadata (str, default "all") – The type of metadata to extract. Options include:
      - "all": Display all available metadata.
      - "label", "labels": Display and return dataset or variable labels.
      - "description": Display and return descriptions.
      - "type": Display and return types.
      - "url": Display and return URLs.
      - "valuelabels": Display and return value labels.
    Aliases for these options are supported (e.g., "Value labels" for "valuelabels").

  • languages (str or list of str, default "all") – The language(s) to filter metadata by. Options include:
      - "all": Process metadata for all languages.
      - A single language code (e.g., "en").
      - A list of language codes (e.g., ["en", "de"]).
    Edge cases like empty strings or None are handled gracefully.
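
The alias handling for the metadata option can be pictured as a small normalization step. The sketch below is not the package's implementation: normalize_metadata_option and its alias table are hypothetical, and it only assumes that aliases differ from the canonical names by case, spacing, or plural form.

```python
# Hypothetical alias table; the real package may accept other spellings.
_ALIASES = {
    "all": "all",
    "label": "label",
    "labels": "label",
    "description": "description",
    "type": "type",
    "url": "url",
    "valuelabels": "valuelabels",
}


def normalize_metadata_option(option):
    """Map a user-supplied option to a canonical metadata name."""
    if not isinstance(option, str):
        raise ValueError(f"metadata must be a string, got {type(option).__name__}")
    # Ignore case and whitespace, so "Value labels" matches "valuelabels".
    key = "".join(option.lower().split())
    if key not in _ALIASES:
        raise ValueError(f"unknown metadata option: {option!r}")
    return _ALIASES[key]


print(normalize_metadata_option("Value labels"))
```

Folding case and whitespace before the lookup keeps the alias table short while still rejecting unknown options with a ValueError.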

Returns:

Extracted metadata as a dictionary. If only a single metadata field is found, returns the metadata as a string instead.

Return type:

dict or str
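
Because the return type depends on how many fields matched, calling code may want to normalize the result before using it. The helper below is a hypothetical convenience, not part of opendataformat; it uses plain values as stand-ins for the two documented return shapes.

```python
def as_metadata_dict(result, key="value"):
    """Return `result` as a dict, wrapping a bare string under `key`."""
    if isinstance(result, dict):
        return result
    if isinstance(result, str):
        return {key: result}
    raise TypeError(f"expected dict or str, got {type(result).__name__}")


# Stand-ins for the two shapes docu_odf is documented to return.
several_fields = {"label_en": "English Label", "label_fr": "French Label"}
single_field = "English Label"

print(as_metadata_dict(several_fields))
print(as_metadata_dict(single_field, key="label_en"))
```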

Raises:
  • TypeError – If x is not a pandas DataFrame or Series.

  • ValueError – If metadata or languages contain invalid values.
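
One way to picture the validation behind the ValueError for languages is shown below. This is a sketch under assumptions: normalize_languages is a hypothetical helper, and it passes None and empty strings through unchanged because this page does not say how the package interprets them.

```python
def normalize_languages(languages):
    """Return "all" or a list of language entries, rejecting invalid types."""
    if languages == "all":
        return "all"
    # A single entry (a code, an empty string, or None) becomes a one-item list.
    if languages is None or isinstance(languages, str):
        return [languages]
    if isinstance(languages, (list, tuple)):
        for code in languages:
            if code is not None and not isinstance(code, str):
                raise ValueError(f"invalid language entry: {code!r}")
        return list(languages)
    raise ValueError("languages must be a string or a list of strings")


print(normalize_languages("en"))
print(normalize_languages(["en", "de"]))
```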

Notes

  • Metadata is stored in the attrs attribute of pandas objects.

  • This function supports multilingual metadata if provided in the input.
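
The multilingual keys in the examples on this page follow a field_code pattern such as label_en or description_de. The sketch below filters a plain dict shaped like df.attrs by language. filter_by_language is hypothetical, and both the list of language-specific fields and the choice to keep language-neutral keys such as url are assumptions rather than documented behavior.

```python
def filter_by_language(attrs, languages="all"):
    """Keep language-neutral keys plus keys whose suffix is in `languages`."""
    if languages == "all":
        return dict(attrs)
    if isinstance(languages, str):
        languages = [languages]
    language_fields = ("label", "description")  # assumed language-specific fields
    kept = {}
    for key, value in attrs.items():
        field, sep, code = key.rpartition("_")
        if sep and field in language_fields:
            if code in languages:
                kept[key] = value
        else:
            kept[key] = value  # language-neutral, e.g. "url" or "study"
    return kept


attrs = {
    "study": "study name",
    "label_en": "label in english",
    "label_de": "label in german",
    "url": "https://example.url",
}
print(filter_by_language(attrs, "en"))
```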

Examples

Extract all metadata from a DataFrame:

>>> import pandas as pd
>>> import opendataformat as odf
>>> df = pd.DataFrame()
>>> df.attrs = {"label_en": "English Label", "label_fr": "French Label", "url": "https://example.com"}
>>> odf.docu_odf(df)
label_en: English Label
label_fr: French Label
url: https://example.com

Extract specific metadata type:

>>> odf.docu_odf(df, metadata="label")
label_en: English Label
label_fr: French Label

Extract metadata filtered by language:

>>> label = odf.docu_odf(df, metadata="label", languages="en")
label_en: English Label
>>> print(label)
English Label

Extract dataset level metadata from a DataFrame:

>>> df = odf.read_odf("example_dataset.zip")
>>> df.attrs = {'study': 'study name',
...             'dataset': 'dataset name',
...             'label_en': 'label in english',
...             'label_de': 'label in german',
...             'description_en': 'details in english',
...             'description_de': 'details in german',
...             'url': 'https://example.url'}
>>> odf.docu_odf(df)
study: study name
dataset: dataset name
label_en: label in english
label_de: label in german
description_en: details in english
description_de: details in german
url: https://example.url

Extract specific variable metadata:

>>> odf.docu_odf(df['variable_name'])
name:variable
label_en: english label
label_de: german label
url: https://example.url

Extract specific metadata type:

>>> odf.docu_odf(df, metadata="label")
label_en: label in english
label_de: label in german

Extract metadata filtered by language:

>>> label = odf.docu_odf(df, metadata="label", languages="en")
label_en: label in english
>>> print(label)
label in english

opendataformat.read_odf module

Created on Mon Oct 21 12:24:04 2024

@author: xhan

opendataformat.read_odf.read_odf(path, languages='all', usecols=None, skiprows=None, nrows=None, na_values=None)

Read an Open Data Format (ODF) file into a Pandas DataFrame.

This function reads data from an ODF zipfile (containing data.csv and metadata.xml) and converts it into a pandas DataFrame. It supports language selection, optional filtering of columns, skipping rows, and replacing specific values with NaN.
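
The archive layout described above can be inspected with the standard library alone. The sketch below builds a tiny in-memory zipfile with the two documented members and reads it back. The metadata.xml body is a placeholder and does not reflect the real ODF schema, and the CSV columns are invented for illustration.

```python
import io
import zipfile

# Build a tiny archive in memory with the two documented members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "id,age\n1,34\n2,51\n")
    # Placeholder only: the real metadata.xml schema is defined by the ODF specification.
    archive.writestr("metadata.xml", "<placeholder/>")

# Reopen the archive and inspect it the way a reader would.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = sorted(archive.namelist())
    csv_text = archive.read("data.csv").decode("utf-8")

print(members)
print(csv_text.splitlines()[0])
```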

Parameters:
  • path (str) – The file path to the ODF file to be read.

  • languages (str or list of str, default "all") – Specifies the language(s) to extract from the file. Use "all" to read all available languages, or pass a single language code (e.g., "en") or a list of codes (e.g., ["en", "de"]).

  • usecols (list of int or str, optional) – Specifies the columns to be read from the file. If None, all columns are read. Column selection can be by index or name.

  • skiprows (int or list of int, optional) – The number of lines to skip at the start of the file (int), or the 0-indexed line numbers to skip (list of int). Can be used to skip metadata or headers.

  • nrows (int, optional) – The number of rows to read from the file. If None, all rows are read.

  • na_values (scalar, str, list-like, or dict, optional) – Additional values to consider as NaN. If dict, applies per column.
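
The column and row options above share their names with options of pandas.read_csv. The standard-library sketch below illustrates what usecols, nrows, and na_values mean on a small CSV. read_rows is hypothetical, uses None as a stand-in for NaN, and says nothing about how read_odf is implemented.

```python
import csv
import io
from itertools import islice


def read_rows(text, usecols=None, nrows=None, na_values=()):
    """Parse CSV text, keep `usecols`, read at most `nrows` rows, map `na_values` to None."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in islice(reader, nrows):  # nrows=None reads every row
        if usecols is not None:
            row = {name: row[name] for name in usecols}
        rows.append({k: (None if v in na_values else v) for k, v in row.items()})
    return rows


sample = "id,age,city\n1,34,Berlin\n2,-99,Hamburg\n3,51,Bonn\n"
print(read_rows(sample, usecols=["id", "age"], nrows=2, na_values=("-99",)))
```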

Returns:

A pandas DataFrame containing the data and metadata from the ODF file.

Return type:

DataFrame

Notes

  • The languages parameter allows for selecting specific localized data if the ODF file supports it.

  • Metadata is stored in the attrs attribute of pandas objects. Access it with df.attrs (dataset level) or df['variable_name'].attrs (variable level).

Examples

Read an ODF file and load all columns:

>>> import opendataformat as odf
>>> df = odf.read_odf("example_dataset.zip")

Read an ODF zipfile, selecting a specific language:

>>> df = odf.read_odf("example.zip", languages="en")

opendataformat.write_odf module

Created on Thu Oct 17 12:25:16 2024

@author: thartl

opendataformat.write_odf.write_odf(x, path, languages='all')

Write a pandas DataFrame or Series to an Open Data Format (ODF) file.

This function saves the provided pandas object (x) to an ODF file, including the metadata stored in its attrs attribute. Metadata can optionally be filtered by language.

Parameters:
  • x (pandas.DataFrame or pandas.Series) – The pandas object to be saved to the ODF file. It should have metadata stored in the attrs attribute for inclusion in the output file metadata.xml.

  • path (str) – The file path (including filename) where the ODF file will be saved. Ensure the path ends with .zip to specify the correct file format.

  • languages (str or list of str, default "all") – Specifies which language(s) of metadata to include in the ODF file. Options include:
      - "all": Include metadata for all available languages.
      - A single language code (e.g., "en").
      - A list of language codes (e.g., ["en", "de"]).
    Edge cases like empty strings or None in the language list are handled gracefully.

Returns:

The function writes the file to the specified path and does not return a value.

Return type:

None

Raises:
  • TypeError – If x is not a pandas DataFrame or Series.

  • ValueError – If languages contains invalid values.

Notes

  • Metadata from the attributes (attrs) of x is included in the file.

  • Multilingual metadata, if present, is processed according to the languages parameter.
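
Since the path is expected to end with .zip, a caller can check it before writing. The helper below is a hypothetical pre-check built on pathlib, not part of opendataformat, and this page does not state whether write_odf performs a similar check itself.

```python
from pathlib import Path


def check_odf_path(path):
    """Return `path` as a Path, raising ValueError unless it ends in .zip."""
    target = Path(path)
    if target.suffix.lower() != ".zip":
        raise ValueError(f"ODF output path should end with .zip, got {target.name!r}")
    return target


print(check_odf_path("output.zip"))
```

Running the check first turns a misnamed output file into an immediate, readable error instead of a surprise when the file is opened later.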

Examples

Write a DataFrame to an ODF file, including all metadata:

>>> import pandas as pd
>>> import opendataformat as odf
>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
>>> df.attrs = {"label_en": "English Label", "label_de": "German Label", "description_en": "Example dataset"}
>>> odf.write_odf(df, "output.zip")

Write a DataFrame to an ODF file, filtering metadata by language:

>>> odf.write_odf(df, "output.zip", languages="en")

Write a DataFrame to an ODF file, including metadata for multiple languages:

>>> odf.write_odf(df, "output.zip", languages="all")