What is MultiSpanQA?
MultiSpanQA is a question answering dataset that focuses on questions with multi-span answers.
For more details about MultiSpanQA, please refer to our paper.
Getting started
MultiSpanQA is distributed under a CC BY-SA 4.0 License.
The datasets can be downloaded below.
To reproduce the baseline results reported in our paper, visit our GitHub repo.
Evaluation
You can evaluate your model with our evaluation script by running:
python eval_script.py --pred_file <path_to_prediction> --gold_file <path_to_gold>
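The script reports precision (P), recall (R), and F1 under both exact match (a predicted span must equal a gold span) and partial match (overlapping spans receive partial credit). The following is a minimal sketch of exact-match scoring only, assuming predictions and golds are dicts mapping question IDs to lists of answer strings; eval_script.py is the authoritative implementation, and the names below are illustrative, not the script's API.

```python
# Minimal sketch of exact-match span scoring (illustrative only;
# the official eval_script.py is the reference implementation).
# Assumes `preds` and `golds` map question IDs to lists of answer spans.

def exact_match_prf1(preds: dict, golds: dict):
    """Precision/recall/F1 over exactly matching answer spans."""
    n_pred = n_gold = n_correct = 0
    for qid, gold_spans in golds.items():
        pred_spans = preds.get(qid, [])
        gold_set = {s.strip().lower() for s in gold_spans}
        pred_set = {s.strip().lower() for s in pred_spans}
        n_pred += len(pred_set)
        n_gold += len(gold_set)
        n_correct += len(pred_set & gold_set)
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

preds = {"q1": ["Paris", "Lyon"]}
golds = {"q1": ["Paris", "Marseille", "Lyon"]}
print(exact_match_prf1(preds, golds))  # (1.0, 0.666..., 0.8)
```

Partial match additionally credits predicted spans that overlap a gold span; see eval_script.py for the exact definition.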
You can submit your predictions by emailing nathan.8270.n@gmail.com with the subject "MultiSpanQA submission (multi/expand/both) (your model name)". Predictions must follow the same format as the sample prediction file; a hypothetical illustration follows.
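The sample prediction file in the GitHub repo defines the authoritative submission format. As a hypothetical illustration (the keys, file name, and structure below are ours, not a confirmed specification), a prediction file mapping question IDs to lists of predicted answer spans could be written like this:

```python
import json

# Hypothetical prediction structure: question ID -> list of answer spans.
# The sample prediction file in the GitHub repo is the authoritative format.
predictions = {
    "question_id_1": ["answer span 1", "answer span 2"],
    "question_id_2": [],  # no answer (the expanded setting allows unanswerable questions)
}

with open("pred.json", "w") as f:
    json.dump(predictions, f, indent=2)
```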
Leaderboard (Multi-Span QA only)
In this setting, at least two answers exist for each question-document pair.
P, R, and F1 denote precision, recall, and F1 under Exact Match (EM) and Partial Match (PM).

| Rank | Date | Model | EM P | EM R | EM F1 | PM P | PM R | PM F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | Oct 14, 2022 | LIQUID (RoBERTa-large Ensemble), Korea University (AAAI 2023) | 74.34 | 71.96 | 73.13 | 85.57 | 81.27 | 83.36 |
| 2 | Dec 28, 2022 | BigBERTa-large Tagger, Anonymous | 70.82 | 74.95 | 72.83 | 86.32 | 87.68 | 86.99 |
| 3 | Oct 11, 2022 | SpanQualifier (RoBERTa-large), Nanjing University (CIKM 2023) | 72.42 | 73.08 | 72.75 | 85.34 | 85.48 | 85.41 |
| 4 | Oct 14, 2022 | LIQUID (RoBERTa-large), Korea University (AAAI 2023) | 74.99 | 68.22 | 71.44 | 85.29 | 77.00 | 80.93 |
| 5 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large), Anonymous (under review) | 74.33 | 68.38 | 71.23 | 86.40 | 79.17 | 82.62 |
| 6 | Jun 23, 2023 | Iterative Extractor (RoBERTa-large), Peking University (ACL 2023) | 69.39 | 72.06 | 70.70 | 85.09 | 84.64 | 84.87 |
| 7 | Sep 06, 2022 | RoBERTa-large Tagger, Illuin Technology - Research Team | 68.12 | 70.30 | 69.19 | 85.56 | 82.22 | 83.86 |
| 8 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base), Anonymous (under review) | 69.76 | 68.27 | 69.01 | 83.03 | 78.37 | 80.63 |
| 9 | Aug 09, 2022 | LIQUID (RoBERTa-base), Korea University (AAAI 2023) | 65.70 | 69.18 | 67.40 | 80.86 | 81.45 | 81.16 |
| 10 | Oct 11, 2022 | SpanQualifier (BERT-base), Nanjing University (CIKM 2023) | 63.56 | 66.24 | 64.87 | 77.96 | 79.63 | 78.79 |
| 11 | Jun 16, 2023 | Contrastive Span Selector (BERT-base), Anonymous (under review) | 63.90 | 62.50 | 63.19 | 77.52 | 74.60 | 76.04 |
| 12 | Apr 25, 2022 | BERT-base Tagger (multi-task), The University of Melbourne (NAACL 2022) | 58.12 | 60.50 | 59.28 | 79.56 | 73.23 | 76.26 |
| 13 | Apr 25, 2022 | BERT-base Tagger, The University of Melbourne (NAACL 2022) | 52.45 | 61.11 | 56.45 | 75.91 | 74.53 | 75.22 |
| 14 | Apr 25, 2022 | BERT-base Single-span, The University of Melbourne (NAACL 2022) | 16.20 | 12.98 | 14.41 | 60.31 | 76.78 | 67.56 |
Leaderboard (Expanded)
This setting additionally includes single-span and unanswerable questions.
| Rank | Date | Model | EM P | EM R | EM F1 | PM P | PM R | PM F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-large), Anonymous (under review) | 74.22 | 68.75 | 71.38 | 83.33 | 77.09 | 80.09 |
| 2 | Oct 11, 2022 | SpanQualifier (RoBERTa-large), Nanjing University (CIKM 2023) | 69.16 | 72.62 | 70.85 | 80.75 | 82.55 | 81.64 |
| 3 | Sep 06, 2022 | RoBERTa-large Tagger, Illuin Technology - Research Team | 70.14 | 68.00 | 69.05 | 81.62 | 76.48 | 78.97 |
| 4 | Jun 16, 2023 | Contrastive Span Selector (RoBERTa-base), Anonymous (under review) | 71.07 | 63.94 | 67.32 | 80.37 | 71.73 | 75.81 |
| 5 | Oct 11, 2022 | SpanQualifier (BERT-base), Nanjing University (CIKM 2023) | 64.31 | 64.47 | 64.39 | 75.10 | 73.37 | 74.22 |
| 6 | Jun 16, 2023 | Contrastive Span Selector (BERT-base), Anonymous (under review) | 65.64 | 61.74 | 63.63 | 75.15 | 69.71 | 72.33 |
| 7 | Apr 25, 2022 | BERT-base Tagger (multi-task), The University of Melbourne (NAACL 2022) | 42.74 | 41.81 | 42.26 | 74.05 | 68.06 | 70.47 |
| 8 | Apr 25, 2022 | BERT-base Tagger, The University of Melbourne (NAACL 2022) | 39.43 | 43.54 | 41.38 | 70.79 | 69.42 | 70.10 |
| 9 | Apr 25, 2022 | BERT-base Single-span, The University of Melbourne (NAACL 2022) | 13.36 | 12.05 | 12.66 | 63.01 | 73.09 | 67.73 |