scandeval.benchmarks package

Submodules

scandeval.benchmarks.absabank_imm module

Immigration sentiment classification on the ABSAbank-Imm dataset

class scandeval.benchmarks.absabank_imm.AbsabankImmBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the ABSAbank-Imm dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.
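The label-mapping attributes above (id2label, label2id, num_labels, label_synonyms) can be sketched for a sentiment task like ABSAbank-Imm as follows. This is a minimal illustration, not the class's actual values: the concrete label names are assumptions.

```python
# Hypothetical label set for a three-way sentiment task; the actual
# ABSAbank-Imm labels may differ.
id2label = {0: "negative", 1: "neutral", 2: "positive"}

# label2id is simply the inverse mapping of id2label.
label2id = {label: idx for idx, label in id2label.items()}

# num_labels is the number of distinct labels.
num_labels = len(id2label)

# label_synonyms groups alternative spellings of each label, so models
# trained with e.g. "NEGATIVE" rather than "negative" can still be scored.
label_synonyms = [
    ["negative", "NEGATIVE", "neg"],
    ["neutral", "NEUTRAL"],
    ["positive", "POSITIVE", "pos"],
]

print(num_labels)            # 3
print(label2id["positive"])  # 2
```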

scandeval.benchmarks.angry_tweets module

Sentiment evaluation of a language model on the AngryTweets dataset

class scandeval.benchmarks.angry_tweets.AngryTweetsBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the AngryTweets dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.dalaj module

Correct spelling classification of a language model on the DaLaJ dataset

class scandeval.benchmarks.dalaj.DalajBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the DaLaJ dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.dane module

NER evaluation of a language model on the DaNE dataset

class scandeval.benchmarks.dane.DaneBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the DaNE dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.
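For a NER benchmark such as DaNE, id2label typically holds BIO-style tags rather than sentence-level classes. The sketch below assumes a standard PER/LOC/ORG/MISC tag set; the exact tags used by the dataset are an assumption, not taken from this documentation.

```python
# Hypothetical BIO tag set for a NER benchmark; "B-" marks the beginning
# of an entity span and "I-" its continuation, with "O" for non-entities.
ner_tags = [
    "O",
    "B-PER", "I-PER",
    "B-LOC", "I-LOC",
    "B-ORG", "I-ORG",
    "B-MISC", "I-MISC",
]

# Build the id2label / label2id pair the attributes above describe.
id2label = dict(enumerate(ner_tags))
label2id = {tag: idx for idx, tag in id2label.items()}
num_labels = len(id2label)
```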

scandeval.benchmarks.ddt_dep module

Dependency parsing evaluation of a language model on the DDT dataset

class scandeval.benchmarks.ddt_dep.DdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of the DDT.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.
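Dependency parsing predicts two things per token (a head position and a dependency relation), which is presumably where the two_labels and split_point attributes come in: with two_labels set, split_point marks where one label set ends and the other begins inside id2label. The sketch below illustrates that reading; the interpretation, the names, and the label values are all assumptions.

```python
# Hypothetical combined label list for dependency parsing: head positions
# first, then relation labels, with split_point dividing the two sets.
heads = [str(i) for i in range(10)]                     # head positions 0-9
relations = ["acl", "advmod", "amod", "nsubj", "obj", "root"]

id2label = heads + relations        # both label sets, back to back
two_labels = True                   # two predictions per token
split_point = len(heads)            # index where the relation labels start

# Recover the two label sets from the combined list.
head_labels = id2label[:split_point]
relation_labels = id2label[split_point:]
```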

scandeval.benchmarks.ddt_pos module

POS evaluation of a language model on the DDT dataset

class scandeval.benchmarks.ddt_pos.DdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the POS part of the DDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.
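For a POS benchmark the label_synonyms attribute matters in practice: different models ship with different names for the same tag, and the synonym groups let all of them be scored against the dataset's canonical tags. A small sketch of that idea, with hypothetical tag names:

```python
# Hypothetical synonym groups mapping variant POS tag names onto a
# canonical (UPOS-style) tag; the actual groups used are an assumption.
label_synonyms = [
    ["NOUN", "N", "SUBST"],
    ["VERB", "V"],
    ["ADJ", "A"],
]

# Build a lookup from any synonym to the canonical (first) form, so a
# model predicting "SUBST" is scored as if it had predicted "NOUN".
canonical = {syn: group[0] for group in label_synonyms for syn in group}

print(canonical["SUBST"])  # NOUN
```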

scandeval.benchmarks.dkhate module

Hate speech classification of a language model on the DKHate dataset

class scandeval.benchmarks.dkhate.DkHateBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the DKHate dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.europarl module

Sentiment evaluation of a language model on the Europarl dataset

class scandeval.benchmarks.europarl.EuroparlBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the Europarl dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.fdt_dep module

Dependency parsing evaluation of a language model on the FDT dataset

class scandeval.benchmarks.fdt_dep.FdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of the FDT.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.fdt_pos module

POS evaluation of a language model on the FDT dataset

class scandeval.benchmarks.fdt_pos.FdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the POS part of the FDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.idt_dep module

Dependency parsing evaluation of a language model on the IDT dataset

class scandeval.benchmarks.idt_dep.IdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of the IDT.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.idt_pos module

POS evaluation of a language model on the IDT dataset

class scandeval.benchmarks.idt_pos.IdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the POS part of the IDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.lcc module

Sentiment evaluation of a language model on the LCC dataset

class scandeval.benchmarks.lcc.LccBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the LCC dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.mim_gold_ner module

NER evaluation of a language model on the MIM-GOLD-NER dataset

class scandeval.benchmarks.mim_gold_ner.MimGoldNerBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the MIM-GOLD-NER dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.ndt_nb_dep module

Dependency parsing evaluation of a language model on the NDT-NB dataset

class scandeval.benchmarks.ndt_nb_dep.NdtNBDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of NDT-NB.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.ndt_nb_pos module

POS evaluation of a language model on the Bokmål part of the NDT dataset

class scandeval.benchmarks.ndt_nb_pos.NdtNBPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the Bokmål POS part of the NDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.ndt_nn_dep module

Dependency parsing evaluation of a language model on the NDT-NN dataset

class scandeval.benchmarks.ndt_nn_dep.NdtNNDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of NDT-NN.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.ndt_nn_pos module

POS evaluation of a language model on the Nynorsk part of the NDT dataset

class scandeval.benchmarks.ndt_nn_pos.NdtNNPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the Nynorsk POS part of the NDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.nordial module

Dialect classification of a language model on the NorDial dataset

class scandeval.benchmarks.nordial.NorDialBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the NorDial dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.norec module

Sentiment evaluation of a language model on the NoReC dataset

class scandeval.benchmarks.norec.NorecBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the NoReC dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.norec_fo module

Sentiment evaluation of a language model on the NoReC-FO dataset

class scandeval.benchmarks.norec_fo.NorecFOBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the NoReC-FO dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

Attributes
  • name (str) – The name of the dataset.

  • task (str) – The type of task to be benchmarked.

  • metric_names (dict) – The names of the metrics.

  • id2label (dict or None) – A dictionary converting indices to labels.

  • label2id (dict or None) – A dictionary converting labels to indices.

  • num_labels (int or None) – The number of labels in the dataset.

  • label_synonyms (list of lists of str) – Synonyms of the dataset labels.

  • evaluate_train (bool) – Whether the training set should be evaluated.

  • cache_dir (str) – Directory where models are cached.

  • two_labels (bool) – Whether two labels should be predicted.

  • split_point (int or None) – Splitting point of id2label into labels.

  • verbose (bool) – Whether to print additional output.

scandeval.benchmarks.norec_is module

Sentiment evaluation of a language model on the NoReC-IS dataset

class scandeval.benchmarks.norec_is.NorecISBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the NoReC-IS dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool

scandeval.benchmarks.norne_nb module

NER evaluation of a language model on the Bokmål part of NorNE

class scandeval.benchmarks.norne_nb.NorneNBBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the Bokmål part of the NorNE dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool
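For the NER benchmarks, the same id2label and label2id attributes hold token-level BIO tags rather than sentence labels. The tag inventory below is a hypothetical illustration, not necessarily NorNE's exact tag set.

```python
# Hypothetical BIO tag inventory for a NER benchmark; illustrative only.
ner_tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

# id2label maps tag indices to tag strings; label2id is its inverse
id2label = dict(enumerate(ner_tags))
label2id = {label: idx for idx, label in id2label.items()}
num_labels = len(id2label)

print(id2label[0])  # O
print(num_labels)   # 7
```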

scandeval.benchmarks.norne_nn module

NER evaluation of a language model on the Nynorsk part of NorNE

class scandeval.benchmarks.norne_nn.NorneNNBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the Nynorsk part of the NorNE dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool

scandeval.benchmarks.sdt_dep module

Dependency parsing evaluation of a language model on the SDT dataset

class scandeval.benchmarks.sdt_dep.SdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.dep.DepBenchmark

Benchmark of language models on the dependency parsing part of the SDT.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool
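The two_labels and split_point attributes are most relevant here, since dependency parsing predicts two things per token: a head position and a dependency relation. The sketch below shows one plausible reading of split_point, namely the index at which a combined id2label splits into the two label groups. The relation names and the exact split semantics are assumptions for illustration, not taken from the source.

```python
# Hypothetical combined label space for dependency parsing: head positions
# followed by dependency relations. Illustrative only.
combined_labels = ["0", "1", "2", "3", "nsubj", "obj", "root", "advmod"]
id2label = dict(enumerate(combined_labels))

# Assumed reading of split_point: indices below it are head labels,
# indices at or above it are relation labels.
split_point = 4
head_labels = [id2label[i] for i in range(split_point)]
deprel_labels = [id2label[i] for i in range(split_point, len(id2label))]

print(head_labels)    # ['0', '1', '2', '3']
print(deprel_labels)  # ['nsubj', 'obj', 'root', 'advmod']
```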

scandeval.benchmarks.sdt_pos module

POS evaluation of a language model on the SDT dataset

class scandeval.benchmarks.sdt_pos.SdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.pos.PosBenchmark

Benchmark of language models on the POS part of the SDT dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool

scandeval.benchmarks.suc3 module

NER evaluation of a language model on the SUC 3.0 dataset

class scandeval.benchmarks.suc3.Suc3Benchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the NER part of the SUC 3.0 dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool

scandeval.benchmarks.twitter_sent module

Sentiment evaluation of a language model on the TwitterSent dataset

class scandeval.benchmarks.twitter_sent.TwitterSentBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark

Benchmark of language models on the TwitterSent dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

verbose

Whether to print additional output.

Type

bool

scandeval.benchmarks.wikiann_fo module

NER evaluation of a language model on the Faroese WikiANN dataset

class scandeval.benchmarks.wikiann_fo.WikiannFoBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)

Bases: scandeval.benchmarks.abstract.ner.NerBenchmark

Benchmark of language models on the Faroese WikiANN dataset.

Parameters
  • cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.

  • evaluate_train (bool, optional) – Whether the models should be evaluated on the training split. Defaults to False.

  • verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.

name

The name of the dataset.

Type

str

task

The type of task to be benchmarked.

Type

str

metric_names

The names of the metrics.

Type

dict

id2label

A dictionary converting indices to labels.

Type

dict or None

label2id

A dictionary converting labels to indices.

Type

dict or None

num_labels

The number of labels in the dataset.

Type

int or None

label_synonyms

Synonyms of the dataset labels.

Type

list of lists of str

evaluate_train

Whether the training set should be evaluated.

Type

bool

cache_dir

Directory where models are cached.

Type

str

two_labels

Whether two labels should be predicted.

Type

bool

split_point

Splitting point of id2label into labels.

Type

int or None

Module contents