MultiSpanQA

A Dataset for Multi-span Question Answering

What is MultiSpanQA?

MultiSpanQA is a question answering dataset that focuses on questions with multi-span answers.

For more details about MultiSpanQA, please refer to our paper.

Getting started

MultiSpanQA is distributed under a CC BY-SA 4.0 License.

Datasets can be downloaded below.

To reproduce the baseline results reported in our paper, visit our GitHub repo.

Evaluation

You can evaluate your model with our evaluation script by running:

    python eval_script.py --pred_file <path_to_prediction> --gold_file <path_to_gold>
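
For intuition about the exact-match columns in the leaderboards below, here is a minimal sketch of how precision, recall, and F1 could be computed over sets of predicted and gold answer spans. This is not the official eval_script.py (which also reports partial-match scores), and the file layout it assumes, a JSON object mapping question IDs to lists of answer spans, is illustrative only.

    # Minimal sketch of set-level exact-match metrics; NOT the official eval_script.py.
    # Assumed (not official) file layout: {"question_id": ["answer span", ...], ...}
    import json
    import sys

    def exact_match_scores(pred_file, gold_file):
        with open(pred_file) as f:
            preds = json.load(f)
        with open(gold_file) as f:
            golds = json.load(f)

        num_pred = num_gold = num_correct = 0
        for qid, gold_spans in golds.items():
            # Normalize spans lightly before comparing predicted and gold sets.
            pred_set = {s.strip().lower() for s in preds.get(qid, [])}
            gold_set = {s.strip().lower() for s in gold_spans}
            num_pred += len(pred_set)
            num_gold += len(gold_set)
            num_correct += len(pred_set & gold_set)

        precision = num_correct / num_pred if num_pred else 0.0
        recall = num_correct / num_gold if num_gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    if __name__ == "__main__":
        p, r, f1 = exact_match_scores(sys.argv[1], sys.argv[2])
        print(f"Exact Match  P: {p:.4f}  R: {r:.4f}  F1: {f1:.4f}")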

You can submit your predictions by emailing nathan.8270.n@gmail.com with the subject "MultiSpanQA submission (multi/expand/both) (your model name)" and a prediction file in the same format as the sample prediction file.
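
As a rough illustration of preparing a submission, the snippet below writes predictions to prediction.json; the layout used here (question ID mapped to a list of predicted answer spans) is an assumption and should be checked against the released sample prediction file before submitting.

    # Hypothetical prediction-file writer; the JSON layout is an assumption,
    # not the official sample format - verify against the released sample file.
    import json

    predictions = {
        "example_question_id": ["first predicted span", "second predicted span"],
    }

    with open("prediction.json", "w") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)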

Leaderboard (Multi-Span QA only)
In this setting, at least two answers exist for each question-document pair.
Rank | Date | Model | Affiliation | Venue | Exact Match (P / R / F1) | Partial Match (P / R / F1)
1 | Oct 14, 2022 | LIQUID (RoBERTa-large Ensemble) | Korea University | AAAI 2023 | 74.34 / 71.96 / 73.13 | 85.57 / 81.27 / 83.36
2 | Dec 28, 2022 | BigBERTa-large Tagger | Anonymous | - | 70.82 / 74.95 / 72.83 | 86.32 / 87.68 / 86.99
3 | Oct 11, 2022 | SpanQualifier (RoBERTa-large) | Nanjing University | CIKM 2023 | 72.42 / 73.08 / 72.75 | 85.34 / 85.48 / 85.41
4 | Oct 14, 2022 | LIQUID (RoBERTa-large) | Korea University | AAAI 2023 | 74.99 / 68.22 / 71.44 | 85.29 / 77.0 / 80.93
5 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large) | Anonymous | under review | 74.33 / 68.38 / 71.23 | 86.4 / 79.17 / 82.62
6 | Jun 23, 2023 | Iterative Extractor (RoBERTa-large) | Peking University | ACL 2023 | 69.39 / 72.06 / 70.7 | 85.09 / 84.64 / 84.87
7 | Sep 06, 2022 | RoBERTa-large Tagger | Illuin Technology - Research Team | - | 68.12 / 70.3 / 69.19 | 85.56 / 82.22 / 83.86
8 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base) | Anonymous | under review | 69.76 / 68.27 / 69.01 | 83.03 / 78.37 / 80.63
9 | Aug 9, 2022 | LIQUID (RoBERTa-base) | Korea University | AAAI 2023 | 65.7 / 69.18 / 67.4 | 80.86 / 81.45 / 81.16
10 | Oct 11, 2022 | SpanQualifier (BERT-base) | Nanjing University | CIKM 2023 | 63.56 / 66.24 / 64.87 | 77.96 / 79.63 / 78.79
11 | Jun 16, 2023 | Contrastive Span Selector (BERT-base) | Anonymous | under review | 63.9 / 62.5 / 63.19 | 77.52 / 74.6 / 76.04
12 | Apr 25, 2022 | BERT-base Tagger (multi-task) | The University of Melbourne | NAACL 2022 | 58.12 / 60.5 / 59.28 | 79.56 / 73.23 / 76.26
13 | Apr 25, 2022 | BERT-base Tagger | The University of Melbourne | NAACL 2022 | 52.45 / 61.11 / 56.45 | 75.91 / 74.53 / 75.22
14 | Apr 25, 2022 | BERT-base Single-span | The University of Melbourne | NAACL 2022 | 16.2 / 12.98 / 14.41 | 60.31 / 76.78 / 67.56
Leaderboard (Expanded)
In this setting, single-span questions and unanswerable questions were added.
Rank | Date | Model | Affiliation | Venue | Exact Match (P / R / F1) | Partial Match (P / R / F1)
1 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large) | Anonymous | under review | 74.22 / 68.75 / 71.38 | 83.33 / 77.09 / 80.09
2 | Oct 11, 2022 | SpanQualifier (RoBERTa-large) | Nanjing University | CIKM 2023 | 69.16 / 72.62 / 70.85 | 80.75 / 82.55 / 81.64
3 | Sep 06, 2022 | RoBERTa-large Tagger | Illuin Technology - Research Team | - | 70.14 / 68.0 / 69.05 | 81.62 / 76.48 / 78.97
4 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base) | Anonymous | under review | 71.07 / 63.94 / 67.32 | 80.37 / 71.73 / 75.81
5 | Oct 11, 2022 | SpanQualifier (BERT-base) | Nanjing University | CIKM 2023 | 64.31 / 64.47 / 64.39 | 75.1 / 73.37 / 74.22
6 | Jun 16, 2023 | Contrastive Span Selector (BERT-base) | Anonymous | under review | 65.64 / 61.74 / 63.63 | 75.15 / 69.71 / 72.33
7 | Apr 25, 2022 | BERT-base Tagger (multi-task) | The University of Melbourne | NAACL 2022 | 42.74 / 41.81 / 42.26 | 74.05 / 68.06 / 70.47
8 | Apr 25, 2022 | BERT-base Tagger | The University of Melbourne | NAACL 2022 | 39.43 / 43.54 / 41.38 | 70.79 / 69.42 / 70.1
9 | Apr 25, 2022 | BERT-base Single-span | The University of Melbourne | NAACL 2022 | 13.36 / 12.05 / 12.66 | 63.01 / 73.09 / 67.73