
Jiuxiang Gu Xiangxi Shi Jason Kuen Lu Qi Ruiyi Zhang Anqi Liu Ani Nenkova Tong Sun
ICLR 2024 (Poster)
Research in document image understanding is hindered by limited high-quality document data. ADOPD introduces a large-scale dataset for document page decomposition with a data-driven document taxonomy discovery process during collection, and dense annotations. It supports four tasks: Doc2Mask, Doc2Box, Doc2Tag, and Doc2Seq.
- Data-driven taxonomy discovery during collection to improve diversity and balance.
- Dense annotations including entity masks and text bounding boxes; plus tags/captions cleaned with human involvement.
- Four benchmark tasks: Doc2Mask / Doc2Box / Doc2Tag / Doc2Seq.
Paper (OpenReview PDF) | OpenReview | ICLR Poster Page | Slides | Project Page | Code | Data (Annotations Release)
BibTeX
@inproceedings{
gu2024adopd,
title={{AD}o{PD}: A Large-Scale Document Page Decomposition Dataset},
author={Jiuxiang Gu and Xiangxi Shi and Jason Kuen and Lu Qi and Ruiyi Zhang and Anqi Liu and Ani Nenkova and Tong Sun},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=x1ptaXpOYa}
}