scandeval.benchmarks package¶
Subpackages¶
- scandeval.benchmarks.abstract package
- Submodules
- scandeval.benchmarks.abstract.base module
- scandeval.benchmarks.abstract.dep module
- scandeval.benchmarks.abstract.ner module
- scandeval.benchmarks.abstract.pos module
- scandeval.benchmarks.abstract.text_classification module
- scandeval.benchmarks.abstract.token_classification module
- Module contents
Submodules¶
scandeval.benchmarks.absabank_imm module¶
Immigration sentiment classification on the ABSAbank-Imm dataset
- class scandeval.benchmarks.absabank_imm.AbsabankImmBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the ABSAbank-Imm dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
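The constructor arguments above are the only configuration these classes take. A minimal usage sketch (not part of the generated reference), assuming the benchmark() entry point is inherited from the abstract base class in scandeval.benchmarks.abstract.base and using an illustrative Hugging Face model ID:

    from scandeval.benchmarks.absabank_imm import AbsabankImmBenchmark

    # Instantiate the benchmark; the arguments mirror the constructor documented above.
    benchmark = AbsabankImmBenchmark(
        cache_dir=".benchmark_models",  # where downloaded models are stored
        evaluate_train=False,           # skip the extra evaluation on the training set
        verbose=True,                   # print progress during evaluation
    )

    # Assumed entry point inherited from the abstract base benchmark; check
    # scandeval.benchmarks.abstract.base for the exact method name and signature.
    results = benchmark.benchmark("KB/bert-base-swedish-cased")
    print(results)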
scandeval.benchmarks.angry_tweets module¶
Sentiment evaluation of a language model on the AngryTweets dataset
- class scandeval.benchmarks.angry_tweets.AngryTweetsBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the AngryTweets dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.dalaj module¶
Correct spelling classification of a language model on the DaLaJ dataset
- class scandeval.benchmarks.dalaj.DalajBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the DaLaJ dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.dane module¶
NER evaluation of a language model on the DaNE dataset
- class scandeval.benchmarks.dane.DaneBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the DaNE dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
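A hedged sketch of evaluating a Danish model on DaNE, again assuming a benchmark() method inherited from the abstract NER benchmark (verify the exact name in scandeval.benchmarks.abstract); the model ID below is only illustrative:

    from scandeval.benchmarks.dane import DaneBenchmark

    # evaluate_train=True additionally reports scores on the DaNE training split.
    ner_benchmark = DaneBenchmark(
        cache_dir=".benchmark_models",
        evaluate_train=True,
        verbose=False,
    )

    # Assumed call inherited from the abstract base class; the model ID is an
    # illustrative Danish model from the Hugging Face Hub.
    scores = ner_benchmark.benchmark("Maltehb/danish-bert-botxo")
    print(scores)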
scandeval.benchmarks.ddt_dep module¶
Dependency parsing evaluation of a language model on the DDT dataset
- class scandeval.benchmarks.ddt_dep.DdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of the DDT.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.ddt_pos module¶
POS evaluation of a language model on the DDT dataset
- class scandeval.benchmarks.ddt_pos.DdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the POS part of the DDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
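Since every benchmark class in this package shares the same constructor signature, the DDT dependency parsing and POS benchmarks can be driven uniformly; a sketch under the same assumption about an inherited benchmark() method, with an illustrative model ID:

    from scandeval.benchmarks.ddt_dep import DdtDepBenchmark
    from scandeval.benchmarks.ddt_pos import DdtPosBenchmark

    model_id = "Maltehb/danish-bert-botxo"  # illustrative Hugging Face model ID

    # Run both DDT benchmarks on the same Danish model, sharing one model cache.
    for benchmark_cls in (DdtDepBenchmark, DdtPosBenchmark):
        benchmark = benchmark_cls(cache_dir=".benchmark_models", verbose=False)
        # benchmark() is assumed to be inherited from the abstract base class.
        print(benchmark_cls.__name__, benchmark.benchmark(model_id))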
scandeval.benchmarks.dkhate module¶
Hate speech classification of a language model on the DKHate dataset
- class scandeval.benchmarks.dkhate.DkHateBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the DKHate dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.europarl module¶
Sentiment evaluation of a language model on the Europarl dataset
- class scandeval.benchmarks.europarl.EuroparlBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the Europarl dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.fdt_dep module¶
Dependency parsing evaluation of a language model on the FDT dataset
- class scandeval.benchmarks.fdt_dep.FdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of the FDT.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.fdt_pos module¶
POS evaluation of a language model on the FDT dataset
- class scandeval.benchmarks.fdt_pos.FdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the POS part of the FDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.idt_dep module¶
Dependency parsing evaluation of a language model on the IDT dataset
- class scandeval.benchmarks.idt_dep.IdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of the IDT.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.idt_pos module¶
POS evaluation of a language model on the IDT dataset
- class scandeval.benchmarks.idt_pos.IdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the POS part of the IDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.lcc module¶
Sentiment evaluation of a language model on the LCC dataset
- class scandeval.benchmarks.lcc.LccBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the LCC dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.mim_gold_ner module¶
NER evaluation of a language model on the MIM-GOLD-NER dataset
- class scandeval.benchmarks.mim_gold_ner.MimGoldNerBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the MIM-GOLD-NER dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.ndt_nb_dep module¶
Dependency parsing evaluation of a language model on the NDT-NB dataset
- class scandeval.benchmarks.ndt_nb_dep.NdtNBDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of NDT-NB.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.ndt_nb_pos module¶
POS evaluation of a language model on the Bokmål part of the NDT dataset
- class scandeval.benchmarks.ndt_nb_pos.NdtNBPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the Bokmål POS part of the NDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.ndt_nn_dep module¶
Dependency parsing evaluation of a language model on the NDT-NN dataset
- class scandeval.benchmarks.ndt_nn_dep.NdtNNDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of NDT-NN.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.ndt_nn_pos module¶
POS evaluation of a language model on the Nynorsk part of the NDT dataset
- class scandeval.benchmarks.ndt_nn_pos.NdtNNPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the Nynorsk POS part of the NDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.nordial module¶
Dialect classification of a language model on the NorDial dataset
- class scandeval.benchmarks.nordial.NorDialBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the NorDial dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.norec module¶
Sentiment evaluation of a language model on the NoReC dataset
- class scandeval.benchmarks.norec.NorecBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the NoReC dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.norec_fo module¶
Sentiment evaluation of a language model on the NoReC-FO dataset
- class scandeval.benchmarks.norec_fo.NorecFOBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the NoReC-FO dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.norec_is module¶
Sentiment evaluation of a language model on the NoReC-IS dataset
- class scandeval.benchmarks.norec_is.NorecISBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the NoReC-IS dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.norne_nb module¶
NER evaluation of a language model on the Bokmål part of NorNE
- class scandeval.benchmarks.norne_nb.NorneNBBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the Bokmål part of the NorNE dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.norne_nn module¶
NER evaluation of a language model on the Nynorsk part of NorNE
- class scandeval.benchmarks.norne_nn.NorneNNBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the Nynorsk part of the NorNE dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.sdt_dep module¶
Dependency parsing evaluation of a language model on the SDT dataset
- class scandeval.benchmarks.sdt_dep.SdtDepBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.dep.DepBenchmark
Benchmark of language models on the dependency parsing part of the SDT.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.sdt_pos module¶
POS evaluation of a language model on the SDT dataset
- class scandeval.benchmarks.sdt_pos.SdtPosBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.pos.PosBenchmark
Benchmark of language models on the POS part of the SDT dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.suc3 module¶
NER evaluation of a language model on the SUC 3.0 dataset
- class scandeval.benchmarks.suc3.Suc3Benchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the NER part of the SUC 3.0 dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.twitter_sent module¶
Sentiment evaluation of a language model on the TwitterSent dataset
- class scandeval.benchmarks.twitter_sent.TwitterSentBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.text_classification.TextClassificationBenchmark
Benchmark of language models on the TwitterSent dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None
- verbose¶
Whether to print additional output.
- Type
bool
scandeval.benchmarks.wikiann_fo module¶
NER evaluation of a language model on the Faroese WikiANN dataset
- class scandeval.benchmarks.wikiann_fo.WikiannFoBenchmark(cache_dir: str = '.benchmark_models', evaluate_train: bool = False, verbose: bool = False)¶
Bases:
scandeval.benchmarks.abstract.ner.NerBenchmark
Benchmark of language models on the Faroese WikiANN dataset.
- Parameters
cache_dir (str, optional) – Where the downloaded models will be stored. Defaults to ‘.benchmark_models’.
evaluate_train (bool, optional) – Whether the models should also be evaluated on the training set. Defaults to False.
verbose (bool, optional) – Whether to print additional output during evaluation. Defaults to False.
- name¶
The name of the dataset.
- Type
str
- task¶
The type of task to be benchmarked.
- Type
str
- metric_names¶
The names of the metrics.
- Type
dict
- id2label¶
A dictionary converting indices to labels.
- Type
dict or None
- label2id¶
A dictionary converting labels to indices.
- Type
dict or None
- num_labels¶
The number of labels in the dataset.
- Type
int or None
- label_synonyms¶
Synonyms of the dataset labels.
- Type
list of lists of str
- evaluate_train¶
Whether the training set should be evaluated.
- Type
bool
- cache_dir¶
Directory where models are cached.
- Type
str
- two_labels¶
Whether two labels should be predicted.
- Type
bool
- split_point¶
Splitting point of id2label into labels.
- Type
int or None