Empower Your Donut Model for Receipts with Self-Annotated Data

In this article, I will show you how to fine-tune the Donut model on your own receipt data. Further fine-tuning Donut for your specific use case can substantially improve its performance on that task. I will start from a Donut model that has already been fine-tuned on the CORD dataset, annotate some receipts, and then use those annotations to fine-tune the model further.

Create your own dataset and fine-tune your Donut model on custom data with this tutorial. Photo by Isabella Fischer on Unsplash

Story overview

  • Finding an annotation tool and annotating
  • Converting data to the correct format
  • Training with your annotated data

Finding an annotation tool and annotating

To create your own dataset, you need an annotation tool. Luckily, plenty are available online. For this tutorial, I will use the Sparrow annotation tool from this GitHub repository. Note that it is a fork of another GitHub repository, with a few changes for my specific needs, which are described later in the article. The repository explains how to annotate, but for simplicity I will also list the steps below.
