Merge pull request #121 from KennethEnevoldsen/update-tables
Updated docs
KennethEnevoldsen authored Feb 4, 2024
2 parents 4e1cd36 + db3ae97 commit 02b5993
Showing 45 changed files with 253 additions and 77 deletions.
51 changes: 1 addition & 50 deletions docs/datasets.md
@@ -1,53 +1,4 @@

## Coverage
This section gives you an overview of what the benchmark covers.
Note that some segments are not covered by the benchmark at all. Showing these gaps makes it clear which types of tasks have not been evaluated and which domains the
results might not generalize to.


### Tasks
The following table gives an overview of the task coverage in the Scandinavian Embedding Benchmark.

| | | Danish | Norwegian Bokmål | Norwegian Nynorsk | Swedish |
| :----------------------- | :------------------ | :--------------------------------: | :--------------------------------: | :--------------------------------: | :--------------------------------: |
| **Task** | **Formalization** | | | | |
| Question Answering | Retrieval | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Bitext Mining | Bitext Mining | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | |
| Political | Classification | | <span style="color:green">✓</span> | <span style="color:green">✓</span> | |
| Language Identification | Classification | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Linguistic Acceptability | Classification | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Sentiment/Hate Speech | Classification | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Dialog Systems | Classification | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Thematic Clustering | Clustering | | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| | Reranking | | | | |
| | Pair Classification | | | | |
| | STS | | | | |



### Domains
The following table shows the coverage per language. The domains follow the categories used in the [Universal Dependencies project](https://universaldependencies.org).

| | Danish | Norwegian Bokmål | Norwegian Nynorsk | Swedish |
| ----------- | :----------------------------------: | :--------------------------------: | :--------------------------------: | :--------------------------------: |
| **Domain** | | | | |
| Academic | (<span style="color:green">✓</span>) | | | |
| Bible | | | | |
| Blog | | | | |
| Fiction | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Government | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Legal | (<span style="color:green">✓</span>) | <span style="color:green">✓</span> | <span style="color:green">✓</span> | |
| Medical | | | | |
| News | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Non-Fiction | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Poetry | (<span style="color:green">✓</span>) | | | |
| Reviews | | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Social | <span style="color:green">✓</span> | | | <span style="color:green">✓</span> |
| Spoken | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Wiki | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Web | <span style="color:green">✓</span> | | | <span style="color:green">✓</span> |


# Datasets


## Descriptions
37 changes: 37 additions & 0 deletions docs/domains.md
@@ -0,0 +1,37 @@

# Domains
This section examines coverage and performance across domains in SEB. The domains follow the categories used in the [Universal Dependencies project](https://universaldependencies.org).

## Performance across Domains
The table shows the performance across domains in the Scandinavian Embedding Benchmark.

<iframe title="Domains SEB" aria-label="Table" id="datawrapper-chart-F00q5" src="https://datawrapper.dwcdn.net/F00q5/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important; border: none;" height="1043" data-external="1"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(a){if(void 0!==a.data["datawrapper-height"]){var e=document.querySelectorAll("iframe");for(var t in a.data["datawrapper-height"])for(var r=0;r<e.length;r++)if(e[r].contentWindow===a.source){var i=a.data["datawrapper-height"][t]+"px";e[r].style.height=i}}}))}();
</script>


## Coverage
The following table shows the coverage per language. Note that some domains are only partially included (shown as a check mark in parentheses). This happens when a dataset contains some text from the domain, but that domain does not make up the majority of the data.

| | Danish | Norwegian Bokmål | Norwegian Nynorsk | Swedish |
| ----------- | :----------------------------------: | :--------------------------------: | :--------------------------------: | :--------------------------------: |
| **Domain** | | | | |
| Academic | (<span style="color:green">✓</span>) | | | |
| Bible | | | | |
| Blog | | | | |
| Fiction | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Government | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Legal | (<span style="color:green">✓</span>) | <span style="color:green">✓</span> | <span style="color:green">✓</span> | |
| Medical | | | | |
| News | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Non-Fiction | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Poetry | (<span style="color:green">✓</span>) | | | |
| Reviews | | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Social | <span style="color:green">✓</span> | | | <span style="color:green">✓</span> |
| Spoken | <span style="color:green">✓</span> | <span style="color:green">✓</span> | | <span style="color:green">✓</span> |
| Wiki | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> | <span style="color:green">✓</span> |
| Web | <span style="color:green">✓</span> | | | <span style="color:green">✓</span> |
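
As a rough illustration (not part of SEB itself), the coverage table above can be transcribed into a small Python dictionary to list the gaps per language. The names `COVERAGE`, `uncovered_domains`, and `fully_uncovered` are hypothetical helpers introduced only for this sketch; `"partial"` stands for a check mark in parentheses.

```python
# Domain coverage transcribed by hand from the table above.
# "full" = check mark, "partial" = check mark in parentheses,
# a missing language key = the domain is not covered for that language.
LANGUAGES = ["Danish", "Norwegian Bokmål", "Norwegian Nynorsk", "Swedish"]

COVERAGE = {
    "Academic": {"Danish": "partial"},
    "Bible": {},
    "Blog": {},
    "Fiction": dict.fromkeys(LANGUAGES, "full"),
    "Government": dict.fromkeys(LANGUAGES, "full"),
    "Legal": {
        "Danish": "partial",
        "Norwegian Bokmål": "full",
        "Norwegian Nynorsk": "full",
    },
    "Medical": {},
    "News": {"Danish": "full", "Norwegian Bokmål": "full", "Swedish": "full"},
    "Non-Fiction": {"Danish": "full", "Norwegian Bokmål": "full", "Swedish": "full"},
    "Poetry": {"Danish": "partial"},
    "Reviews": {"Norwegian Bokmål": "full", "Swedish": "full"},
    "Social": {"Danish": "full", "Swedish": "full"},
    "Spoken": {"Danish": "full", "Norwegian Bokmål": "full", "Swedish": "full"},
    "Wiki": dict.fromkeys(LANGUAGES, "full"),
    "Web": {"Danish": "full", "Swedish": "full"},
}


def uncovered_domains(language):
    """Return the domains with no coverage (full or partial) for a language."""
    return [domain for domain, langs in COVERAGE.items() if language not in langs]


def fully_uncovered():
    """Return the domains that no language covers at all."""
    return [domain for domain, langs in COVERAGE.items() if not langs]


if __name__ == "__main__":
    print("Not covered in any language:", fully_uncovered())
    for language in LANGUAGES:
        print(language, "is missing:", uncovered_domains(language))
```

Results for a model on a language should be read with these gaps in mind: a domain that is listed as missing has not been evaluated for that language, so scores may not generalize to it.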





7 changes: 5 additions & 2 deletions docs/index.md
@@ -1,12 +1,12 @@

# Scandinavian Embedding Benchmark

This is the documentation for the Scandinavian Embedding Benchmark. This benchmark is intended to evaluate the sentence/document embeddings of large language models.
This is the documentation for the Scandinavian Embedding Benchmark. This benchmark is intended to evaluate the sentence/document embeddings of language models for mainland Scandinavian languages.

Intended uses for this benchmark:

- Evaluating document embeddings of Scandinavian language models
- Evaluating document embeddings for multilingual models in Scandinavian languages
- Evaluating document embeddings of multilingual models for Scandinavian languages
- Ranking competing Scandinavian and multilingual models using no more compute than a consumer laptop can provide


@@ -15,6 +15,7 @@ Intended uses for this benchmark:
<iframe title="Scandinavian Sentence Embedding Benchmark" aria-label="Table" id="datawrapper-chart-7Nwjx" src="https://datawrapper.dwcdn.net/7Nwjx/16/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important; border: none;" height="910" data-external="1"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(a){if(void 0!==a.data["datawrapper-height"]){var e=document.querySelectorAll("iframe");for(var t in a.data["datawrapper-height"])for(var r=0;r<e.length;r++)if(e[r].contentWindow===a.source){var i=a.data["datawrapper-height"][t]+"px";e[r].style.height=i}}}))}();
</script>


=== "Danish"

<iframe title="Danish Sentence Embedding Benchmark" aria-label="Table" id="datawrapper-chart-us1YK" src="https://datawrapper.dwcdn.net/us1YK/12/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important; border: none;" height="910" data-external="1"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(a){if(void 0!==a.data["datawrapper-height"]){var e=document.querySelectorAll("iframe");for(var t in a.data["datawrapper-height"])for(var r=0;r<e.length;r++)if(e[r].contentWindow===a.source){var i=a.data["datawrapper-height"][t]+"px";e[r].style.height=i}}}))}();
@@ -31,6 +32,8 @@ Intended uses for this benchmark:
</script>




## Comparison to other benchmarks

If you want a relative ranking of language models that you plan to fine-tune, I would recommend looking at [ScandEval](https://scandeval.github.io), which benchmarks models using cross-validated fine-tuning. It also includes structured prediction tasks such as named entity recognition. Many of the tasks in this embedding benchmark are also included in ScandEval, and an attempt has been made to use the same versions. A few tasks (ScandiQA) are included in ScandEval but not in this benchmark, as they are human translations of an English dataset.