What is MultiSpanQA?
MultiSpanQA is a question answering dataset that focuses on questions with multi-span answers.
For more details about MultiSpanQA, please refer to our paper.
Getting started
MultiSpanQA is distributed under a CC BY-SA 4.0 License.
The datasets can be downloaded below.
To reproduce the baseline results reported in our paper, visit our GitHub repo.
Evaluation
You can evaluate your model with our evaluation script by running:
python eval_script.py --pred_file <path_to_prediction> --gold_file <path_to_gold>
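The script reports precision (P), recall (R), and F1 under both exact match (a predicted span must equal a gold span) and partial match (overlapping spans receive partial credit). The following is a minimal sketch of exact-match scoring only, assuming predictions and golds are dicts mapping question IDs to lists of answer strings; eval_script.py is the authoritative implementation, and the names below are illustrative, not the script's API.

```python
# Minimal sketch of exact-match span scoring (illustrative only;
# the official eval_script.py is the reference implementation).
# Assumes `preds` and `golds` map question IDs to lists of answer spans.

def exact_match_prf1(preds: dict, golds: dict):
    """Precision/recall/F1 over exactly matching answer spans."""
    n_pred = n_gold = n_correct = 0
    for qid, gold_spans in golds.items():
        pred_spans = preds.get(qid, [])
        gold_set = {s.strip().lower() for s in gold_spans}
        pred_set = {s.strip().lower() for s in pred_spans}
        n_pred += len(pred_set)
        n_gold += len(gold_set)
        n_correct += len(pred_set & gold_set)
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

preds = {"q1": ["Paris", "Lyon"]}
golds = {"q1": ["Paris", "Marseille", "Lyon"]}
print(exact_match_prf1(preds, golds))  # (1.0, 0.666..., 0.8)
```

Partial match additionally credits predicted spans that overlap a gold span; see eval_script.py for the exact definition.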
You can submit your predictions by emailing nathan.8270.n@gmail.com with the subject "MultiSpanQA submission (multi/expand/both) (your model name)". Predictions must follow the same format as the sample prediction file; a hypothetical illustration follows.
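The sample prediction file in the GitHub repo defines the authoritative submission format. As a hypothetical illustration (the keys, file name, and structure below are ours, not a confirmed specification), a prediction file mapping question IDs to lists of predicted answer spans could be written like this:

```python
import json

# Hypothetical prediction structure: question ID -> list of answer spans.
# The sample prediction file in the GitHub repo is the authoritative format.
predictions = {
    "question_id_1": ["answer span 1", "answer span 2"],
    "question_id_2": [],  # no answer (the expanded setting allows unanswerable questions)
}

with open("pred.json", "w") as f:
    json.dump(predictions, f, indent=2)
```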
Leaderboard (Multi-Span QA only)
In this setting, at least two answers exist for each question-document pair.
P, R, and F1 denote precision, recall, and F1 under Exact Match (EM) and Partial Match (PM).

| Rank | Date | Model | EM P | EM R | EM F1 | PM P | PM R | PM F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | Oct 14, 2022 | LIQUID (RoBERTa-large Ensemble), Korea University (AAAI 2023) | 74.34 | 71.96 | 73.13 | 85.57 | 81.27 | 83.36 |
| 2 | Dec 28, 2022 | BigBERTa-large Tagger, Anonymous | 70.82 | 74.95 | 72.83 | 86.32 | 87.68 | 86.99 |
| 3 | Oct 11, 2022 | SpanQualifier (RoBERTa-large), Nanjing University (CIKM 2023) | 72.42 | 73.08 | 72.75 | 85.34 | 85.48 | 85.41 |
| 4 | Oct 14, 2022 | LIQUID (RoBERTa-large), Korea University (AAAI 2023) | 74.99 | 68.22 | 71.44 | 85.29 | 77.00 | 80.93 |
| 5 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large), Anonymous (under review) | 74.33 | 68.38 | 71.23 | 86.40 | 79.17 | 82.62 |
| 6 | Jun 23, 2023 | Iterative Extractor (RoBERTa-large), Peking University (ACL 2023) | 69.39 | 72.06 | 70.70 | 85.09 | 84.64 | 84.87 |
| 7 | Sep 06, 2022 | RoBERTa-large Tagger, Illuin Technology - Research Team | 68.12 | 70.30 | 69.19 | 85.56 | 82.22 | 83.86 |
| 8 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base), Anonymous (under review) | 69.76 | 68.27 | 69.01 | 83.03 | 78.37 | 80.63 |
| 9 | Aug 09, 2022 | LIQUID (RoBERTa-base), Korea University (AAAI 2023) | 65.70 | 69.18 | 67.40 | 80.86 | 81.45 | 81.16 |
| 10 | Oct 11, 2022 | SpanQualifier (BERT-base), Nanjing University (CIKM 2023) | 63.56 | 66.24 | 64.87 | 77.96 | 79.63 | 78.79 |
| 11 | Jun 16, 2023 | Contrastive Span Selector (BERT-base), Anonymous (under review) | 63.90 | 62.50 | 63.19 | 77.52 | 74.60 | 76.04 |
| 12 | Apr 25, 2022 | BERT-base Tagger (multi-task), The University of Melbourne (NAACL 2022) | 58.12 | 60.50 | 59.28 | 79.56 | 73.23 | 76.26 |
| 13 | Apr 25, 2022 | BERT-base Tagger, The University of Melbourne (NAACL 2022) | 52.45 | 61.11 | 56.45 | 75.91 | 74.53 | 75.22 |
| 14 | Apr 25, 2022 | BERT-base Single-span, The University of Melbourne (NAACL 2022) | 16.20 | 12.98 | 14.41 | 60.31 | 76.78 | 67.56 |
Leaderboard (Expanded)
This setting additionally includes single-span and unanswerable questions.
| Rank | Date | Model | EM P | EM R | EM F1 | PM P | PM R | PM F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large), Anonymous (under review) | 74.22 | 68.75 | 71.38 | 83.33 | 77.09 | 80.09 |
| 2 | Oct 11, 2022 | SpanQualifier (RoBERTa-large), Nanjing University (CIKM 2023) | 69.16 | 72.62 | 70.85 | 80.75 | 82.55 | 81.64 |
| 3 | Sep 06, 2022 | RoBERTa-large Tagger, Illuin Technology - Research Team | 70.14 | 68.00 | 69.05 | 81.62 | 76.48 | 78.97 |
| 4 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base), Anonymous (under review) | 71.07 | 63.94 | 67.32 | 80.37 | 71.73 | 75.81 |
| 5 | Oct 11, 2022 | SpanQualifier (BERT-base), Nanjing University (CIKM 2023) | 64.31 | 64.47 | 64.39 | 75.10 | 73.37 | 74.22 |
| 6 | Jun 16, 2023 | Contrastive Span Selector (BERT-base), Anonymous (under review) | 65.64 | 61.74 | 63.63 | 75.15 | 69.71 | 72.33 |
| 7 | Apr 25, 2022 | BERT-base Tagger (multi-task), The University of Melbourne (NAACL 2022) | 42.74 | 41.81 | 42.26 | 74.05 | 68.06 | 70.47 |
| 8 | Apr 25, 2022 | BERT-base Tagger, The University of Melbourne (NAACL 2022) | 39.43 | 43.54 | 41.38 | 70.79 | 69.42 | 70.10 |
| 9 | Apr 25, 2022 | BERT-base Single-span, The University of Melbourne (NAACL 2022) | 13.36 | 12.05 | 12.66 | 63.01 | 73.09 | 67.73 |