scandeval package

Submodules

scandeval.benchmark module

Fetches an updated list of all Scandinavian models on the HuggingFace Hub

class scandeval.benchmark.Benchmark(progress_bar: bool = True, save_results: bool = False, language: Union[str, List[str]] = ['da', 'sv', 'no', 'nb', 'nn', 'is', 'fo'], task: Union[str, List[str]] = 'all', evaluate_train: bool = False, verbose: bool = False)

Bases: object

Benchmarking all the Scandinavian language models.

Parameters
  • progress_bar (bool, optional) – Whether progress bars should be shown. Defaults to True.

  • save_results (bool, optional) – Whether to save the benchmark results to ‘scandeval_benchmark_results.json’. Defaults to False.

  • language (str or list of str, optional) – The language codes of the languages to include in the list. Set this to ‘all’ if all languages (also non-Scandinavian) should be considered. Defaults to [‘da’, ‘sv’, ‘no’, ‘nb’, ‘nn’, ‘is’, ‘fo’].

  • task (str or list of str, optional) – The tasks to consider in the list. Set this to ‘all’ if all tasks should be considered. Defaults to ‘all’.

  • evaluate_train (bool, optional) – Whether to evaluate the training set as well. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output. Defaults to False.
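As a minimal usage sketch, the constructor arguments documented above can be collected in a dictionary and passed to Benchmark. The import sits inside the function so that the snippet can be loaded without scandeval installed. The helper name build_benchmark and the chosen argument values are illustrative assumptions, not part of the package.

```python
# Constructor arguments, using the parameter names documented above.
# The language subset below is an illustrative choice, not a recommendation.
benchmark_kwargs = {
    "progress_bar": True,
    "save_results": False,
    "language": ["da", "sv"],
    "task": "all",
    "evaluate_train": False,
    "verbose": False,
}


def build_benchmark(**kwargs):
    """Hypothetical helper: create a Benchmark from keyword arguments."""
    # Imported lazily so this snippet loads even if scandeval is missing.
    from scandeval.benchmark import Benchmark

    return Benchmark(**kwargs)
```

Calling build_benchmark(**benchmark_kwargs) requires scandeval to be installed.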

progress_bar

Whether progress bars should be shown.

Type

bool

save_results

Whether to save the benchmark results.

Type

bool

language

The languages to include in the list.

Type

str or list of str

task

The tasks to consider in the list.

Type

str or list of str

evaluate_train

Whether to evaluate the training set as well.

Type

bool

verbose

Whether to print additional output.

Type

bool

benchmark_results

The benchmark results.

Type

dict

benchmark(model_id: Optional[Union[str, List[str]]] = None, dataset: Optional[Union[str, List[str]]] = None, progress_bar: Optional[bool] = None, save_results: Optional[bool] = None, language: Optional[Union[str, List[str]]] = None, task: Optional[Union[str, List[str]]] = None, evaluate_train: Optional[bool] = None, verbose: Optional[bool] = None) Dict[str, Dict[str, dict]]

Benchmarks models on datasets.

Parameters
  • model_id (str, list of str or None, optional) – The model ID(s) of the models to benchmark. If None then all relevant model IDs will be benchmarked. Defaults to None.

  • dataset (str, list of str or None, optional) – The datasets to benchmark on. If None then all datasets will be benchmarked. Defaults to None.

  • progress_bar (bool or None, optional) – Whether progress bars should be shown. If None then the default value from the constructor will be used. Defaults to None.

  • save_results (bool or None, optional) – Whether to save the benchmark results to ‘scandeval_benchmark_results.json’. If None then the default value from the constructor will be used. Defaults to None.

  • language (str, list of str or None, optional) – The language codes of the languages to include in the list. Set this to ‘all’ if all languages (also non-Scandinavian) should be considered. If None then the default value from the constructor will be used. Defaults to None.

  • task (str, list of str or None, optional) – The tasks to consider in the list. Set this to ‘all’ if all tasks should be considered. If None then the default value from the constructor will be used. Defaults to None.

  • evaluate_train (bool or None, optional) – Whether to evaluate the training set as well. If None then the default value from the constructor will be used. Defaults to None.

  • verbose (bool or None, optional) – Whether to print additional output. If None then the default value from the constructor will be used. Defaults to None.
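The None defaults above mean "use the value given to the constructor". One common way to implement that kind of fallback is sketched below. The class Settings and the method resolve are hypothetical names used for illustration, and the code is not taken from the scandeval source.

```python
from typing import Optional


class Settings:
    """Hypothetical stand-in holding constructor defaults."""

    def __init__(self, verbose: bool = False, save_results: bool = False):
        self.verbose = verbose
        self.save_results = save_results

    def resolve(
        self,
        verbose: Optional[bool] = None,
        save_results: Optional[bool] = None,
    ) -> dict:
        """Return per-call values, falling back to constructor defaults."""
        return {
            # Compare against None explicitly so that False is a valid override.
            "verbose": self.verbose if verbose is None else verbose,
            "save_results": (
                self.save_results if save_results is None else save_results
            ),
        }
```

Testing against None rather than truthiness matters here, because an explicit False passed to the method should still override a True set in the constructor.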

Returns

A nested dictionary of the benchmark results. The keys are the names of the datasets, with values being new dictionaries having the model IDs as keys.

Return type

dict
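The nested return value described above (dataset name, then model ID, then a result dictionary) can be flattened with two loops. The sketch below uses made-up placeholder keys; the contents of the innermost dictionary are not specified in this reference, so it is left empty.

```python
from typing import Dict, Iterator, Tuple


def iter_results(
    results: Dict[str, Dict[str, dict]]
) -> Iterator[Tuple[str, str, dict]]:
    """Yield (dataset, model_id, scores) triples from the nested results."""
    for dataset_name, per_model in results.items():
        for model_id, scores in per_model.items():
            yield dataset_name, model_id, scores


# Made-up placeholder data with the documented nesting.
example_results = {
    "some-dataset": {"some-model": {}, "another-model": {}},
}

for dataset_name, model_id, scores in iter_results(example_results):
    print(dataset_name, model_id, scores)
```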

scandeval.cli module

Command-line interface for benchmarking

scandeval.datasets module

Functions that load datasets

scandeval.datasets.load_absabank_imm() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the ABSAbank-Imm dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple
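Every loader in this module returns the same four-element tuple, so a single helper can unpack any of them. The sketch below assumes only that order (X_train, X_test, y_train, y_test) and that each element supports len. The name describe_splits is hypothetical.

```python
from typing import Callable, Dict, Tuple


def describe_splits(loader: Callable[[], tuple]) -> Dict[str, Tuple[int, int]]:
    """Return the number of rows in each feature/target pair."""
    # Unpack in the documented order: X_train, X_test, y_train, y_test.
    X_train, X_test, y_train, y_test = loader()
    return {
        "train": (len(X_train), len(y_train)),
        "test": (len(X_test), len(y_test)),
    }
```

With scandeval installed, describe_splits(load_absabank_imm) would apply this to the loader documented above, since len on a pandas DataFrame returns its number of rows.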

scandeval.datasets.load_angry_tweets() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the AngryTweets dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_dalaj() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the DaLaJ dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_dane() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the DaNE dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_dataset(name: str) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load a benchmark dataset.

Parameters

name (str) – Name of the dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

Raises

RuntimeError – If name is not a valid dataset name.
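Because load_dataset raises RuntimeError for an unknown name, callers that accept user-supplied names may want to catch it. The wrapper below is a minimal sketch; try_load is a hypothetical helper, and the loader is passed in as an argument so the snippet does not depend on scandeval being importable.

```python
from typing import Callable, Optional


def try_load(load_fn: Callable[[str], tuple], name: str) -> Optional[tuple]:
    """Call a loader and return None if the dataset name is rejected."""
    try:
        return load_fn(name)
    except RuntimeError as error:
        # Documented above: RuntimeError signals an invalid dataset name.
        print(f"Could not load {name!r}: {error}")
        return None
```

In practice this would be called as try_load(load_dataset, name) after importing load_dataset from scandeval.datasets.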

scandeval.datasets.load_ddt_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the dependency parsing part of the DDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_ddt_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the POS part of the DDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_dkhate() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the DKHate dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_europarl() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Europarl dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_fdt_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the dependency parsing part of the FDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_fdt_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the POS part of the FDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_idt_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the dependency parsing part of the IDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_idt_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the POS part of the IDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_lcc() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the LCC dataset.

This dataset is the concatenation of the LCC1 and LCC2 datasets.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_mim_gold_ner() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the MIM-GOLD-NER dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_ndt_nb_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Bokmål dependency parsing part of the NDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_ndt_nb_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Bokmål POS part of the NDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_ndt_nn_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Nynorsk dependency parsing part of the NDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_ndt_nn_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Nynorsk POS part of the NDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_nordial() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Bokmål/Nynorsk part of the NorDial dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_norec() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the NoReC dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_norec_fo() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the NoReC-FO dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_norec_is() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the NoReC-IS dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_norne_nb() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Bokmål part of the NorNE dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_norne_nn() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Nynorsk part of the NorNE dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_sdt_dep() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the dependency parsing part of the SDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_sdt_pos() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the POS part of the SDT dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_suc3() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the SUC 3.0 dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_twitter_sent() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the TwitterSent dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.datasets.load_wikiann_fo() Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the Faroese WikiANN dataset.

Returns

Four dataframes, X_train, X_test, y_train and y_test, where X_train and X_test correspond to the feature matrices for the training and test splits, respectively, and y_train and y_test contain the target vectors.

Return type

tuple

scandeval.utils module

Utility functions to be used in other scripts

class scandeval.utils.DocInherit(mthd: Callable)

Bases: object

Docstring inheriting method descriptor.

The class itself is also used as a decorator.
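To illustrate the general idea of docstring inheritance, the sketch below uses a different and simpler technique than the method descriptor documented here: a class decorator that copies missing docstrings from base classes. The name inherit_docstrings is hypothetical and the code is not the DocInherit implementation.

```python
import inspect


def inherit_docstrings(cls):
    """Class decorator: copy missing method docstrings from base classes."""
    for name, attr in vars(cls).items():
        # Only plain functions defined on this class, and only if undocumented.
        if inspect.isfunction(attr) and not attr.__doc__:
            for base in cls.__mro__[1:]:
                parent = getattr(base, name, None)
                if parent is not None and parent.__doc__:
                    attr.__doc__ = parent.__doc__
                    break
    return cls


class Base:
    def run(self):
        """Run the thing."""


@inherit_docstrings
class Child(Base):
    def run(self):
        pass
```

After decoration, Child.run carries the docstring written on Base.run.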

get_no_inst(cls)

get_with_inst(obj, cls)

use_parent_doc(func, source)

exception scandeval.utils.InvalidBenchmark(message: str = 'This model cannot be benchmarked on the given dataset.')

Bases: Exception

class scandeval.utils.NeverLeaveProgressCallback

Bases: transformers.trainer_callback.ProgressCallback

Progress callback which never leaves the progress bar

on_prediction_step(args, state, control, eval_dataloader=None, **kwargs)

Event called after a prediction step.

on_train_begin(args, state, control, **kwargs)

Event called at the beginning of training.

class scandeval.utils.TwolabelTrainer(split_point: int, **kwargs)

Bases: transformers.trainer.Trainer

Trainer class which deals with two labels.

compute_loss(model, inputs, return_outputs=False)

How the loss is computed by Trainer. By default, all models return the loss in the first element.

Subclass and override for custom behavior.

scandeval.utils.block_terminal_output()

Blocks libraries from writing output to the terminal

scandeval.utils.doc_inherit

alias of scandeval.utils.DocInherit

scandeval.utils.get_all_datasets() list

Load a list of all datasets.

Returns

A list of tuples, one per dataset. The first entry in each tuple is the short name of the dataset, the second the long name, the third the benchmark class and the fourth the loading function.

Return type

list of tuples
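Given the four-entry tuples described above, a lookup table from short name to loading function is a one-line comprehension. The sketch below assumes only the documented tuple order; loaders_by_short_name is a hypothetical helper, and the entries in the usage example are placeholders rather than real datasets.

```python
from typing import Any, Callable, Dict, List, Tuple

# (short name, long name, benchmark class, loading function)
DatasetEntry = Tuple[str, str, Any, Callable]


def loaders_by_short_name(entries: List[DatasetEntry]) -> Dict[str, Callable]:
    """Map each dataset's short name to its loading function."""
    return {
        short_name: loader
        for short_name, _long_name, _benchmark_cls, loader in entries
    }


# Placeholder entries; built-in functions stand in for real loaders.
placeholder_entries = [
    ("a", "Dataset A", object, len),
    ("b", "Dataset B", object, str),
]
lookup = loaders_by_short_name(placeholder_entries)
```

With scandeval installed, the same helper could be applied to the list returned by get_all_datasets().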

scandeval.utils.is_module_installed(module: str) bool

Check if a module is installed.

Parameters

module (str) – The name of the module.

Returns

Whether the module is installed or not.

Return type

bool
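A check of this kind can be written with the standard library's importlib.util.find_spec. The sketch below is one possible approach under that assumption, not the scandeval implementation; module_is_available is a hypothetical name.

```python
import importlib.util


def module_is_available(module: str) -> bool:
    """Return True if the import system can find `module`."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError for a dotted name whose parent
        # package is missing, and ValueError if an already-imported module
        # has its __spec__ set to None.
        return False
```

Note that for a dotted name such as "package.submodule", find_spec imports the parent package as a side effect.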

Module contents