Long-Context LLM Benchmarks

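| Dataset | Release Date | Type | Domain | Token Length | Language | Data Released? | Answer Released? |
|---|---|---|---|---|---|---|---|
| ZeroSCROLLS | 2023-05 | Realistic | Novel, Report, Meetings, TV, Wikipedia | Avg. ~15k | EN | | |
| L-Eval (ACL'24 Outstanding) | 2023-07 | Realistic | Math, Code, Paper, etc. | Avg. ~15k | ZH | | |
| LongBench | 2023-08 | Realistic | Code, Meeting, Wiki, Novel | Avg. ~13k | ZH, EN | | |
| BAMBOO | 2023-09 | Realistic | Paper, TV shows, GovReport, Code, Meeting | 4k and 16k only | EN | | |
| LooGLE | 2023-11 | Realistic | Paper, Wikipedia, TV & Movie | Avg. ~24k | EN | | |
| LV-Eval | 2024-02 | Realistic | Mixed | 16k, 32k, 64k, 128k, 256k | ZH, EN | | |
| InfiniteBench | 2024-02 | Realistic | Code, Novel, Math, Dialogue | > 100k | ZH, EN | | |
| DocFinQA | 2024-02 | Realistic | Finance | > 100k | EN | | |
| Counting-Stars | 2024-03 | Needle | Essay, Novel | Any | ZH, EN | | |
| CLongEval | 2024-03 | Realistic | Story, News, Conversation | < 100k | ZH | | |
| NovelQA | 2024-03 | Realistic | Novel | > 100k | EN | | |
| RULER | 2024-04 | Needle | Essays | Any | EN | | |
| XL2Bench | 2024-04 | Realistic | Novel, Paper, Law | > 100k | ZH, EN | | |
| BABILong | 2024-06 | Needle | Books | Any | EN | | |
| MedOdyssey | 2024-06 | Realistic, Needle | Medical | 40k-180k | ZH, EN | | |
| Loong | 2024-06 | Realistic | Papers, Legal, Finance | 40k-230k | ZH, EN | | |
| LongIns | 2024-06 | Other | Multiple QA | 256-16k | EN | | |
| NOCHA | 2024-07 | Realistic | Novel | > 100k | EN | | |
| SummaryStack | 2024-07 | Other | News, Conversations | Avg. ~92k | EN | | |
| NeedleBench | 2024-07 | Needle | Essays | Any | ZH, EN | | |
| ML-Needle | 2024-08 | Needle | Wikipedia | 4k-32k | ZH, EN, SP, GR, AR, VT | | |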