Extracting invoice data with Azure Form Recognizer and Python

In the business world, automating the reading of documents such as invoices can save a lot of time and prevent many errors. In this workshop, we'll see how easy it is to use Azure Form Recognizer to parse PDF invoices and save their structured data to a JSON file.

What is Azure Form Recognizer?

Form Recognizer is a Microsoft Azure artificial intelligence service that extracts structured information from unstructured documents such as invoices, receipts, forms, etc. We will use the pre-trained model for invoices, ideal for getting started without having to train your own model.

The code step by step

1. Imports and credentials

We import the necessary Azure libraries and the json module.

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
import json
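The step title mentions credentials, but only the imports are shown. One common way to supply them, sketched here under assumptions, is to read the endpoint and key from environment variables instead of hard-coding them. The variable names FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY and the helper load_credentials are hypothetical; the SDK does not require them.

```python
import os

# Hypothetical variable names chosen for this sketch; the SDK does not mandate them.
ENDPOINT_VAR = "FORM_RECOGNIZER_ENDPOINT"
KEY_VAR = "FORM_RECOGNIZER_KEY"


def load_credentials(env=os.environ):
    """Return (endpoint, key) read from the environment, or raise if either is missing."""
    missing = [name for name in (ENDPOINT_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[ENDPOINT_VAR], env[KEY_VAR]
```

The returned pair can then be used to build the client, for example DocumentAnalysisClient(endpoint, AzureKeyCredential(key)).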


2. Function to extract information through the Azure service

This function receives a PDF file, sends it to the Azure service, and returns the extraction result.

  • DocumentAnalysisClient: Azure client used to communicate with the service.
  • begin_analyze_document: Starts the analysis using the "prebuilt-invoice" model, which is specialized in invoices.
  • cls=...: Defines how the response should be processed. We use json.loads(...) to convert the HTTP response body into a Python dictionary.
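The function itself is not reproduced in the text. A minimal sketch consistent with the three bullets might look like the following. The names raw_json_callback, build_client and analyze_invoice are hypothetical. The sketch also assumes the cls callback receives the pipeline response as its first argument, and that its http_response.text() returns the JSON body.

```python
import json


def raw_json_callback(pipeline_response, _deserialized, _headers):
    """cls callback: parse the raw HTTP body into a Python dictionary."""
    return json.loads(pipeline_response.http_response.text())


def build_client(endpoint, key):
    """Create the Azure client. Imports are local so the other helpers load without the SDK."""
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentAnalysisClient(endpoint, AzureKeyCredential(key))


def analyze_invoice(client, pdf_stream):
    """Send an open binary PDF stream to the prebuilt invoice model and wait for the result."""
    poller = client.begin_analyze_document(
        "prebuilt-invoice", pdf_stream, cls=raw_json_callback
    )
    # result() blocks until the long-running operation finishes.
    return poller.result()
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object, without calling the real service.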

3. Run the analysis and save the result

  • The PDF file to be read is opened with the first with open block, which ensures the stream is closed as soon as the block ends. The mode "rb" is used because the file is read in binary mode.
  • We then assign to result the value returned by analyzing the PDF with the function described above.
  • A second with open block opens the output file, this time in write mode "w". The result is written with the dump method of the json module, which serializes the Python dictionary to JSON.
  • The resulting JSON file contains the raw result of the call.
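These steps can be sketched as a small helper. The name save_invoice_json and its analyze parameter are hypothetical. analyze stands for any callable that takes an open binary stream and returns a dictionary, such as the analysis function from step 2. The indent and ensure_ascii arguments are optional readability choices rather than requirements.

```python
import json


def save_invoice_json(pdf_path, json_path, analyze):
    """Analyze the PDF at pdf_path and write the resulting dictionary to json_path."""
    # "rb": the PDF is read as raw bytes; the stream closes when the block ends.
    with open(pdf_path, "rb") as pdf_file:
        result = analyze(pdf_file)

    # "w": text mode for writing the serialized JSON.
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(result, json_file, indent=2, ensure_ascii=False)
    return result
```

The analysis runs inside the first block because the stream must still be open while the service reads it.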

✅ What does Azure Form Recognizer return?

In the case of invoices, the pre-trained model can return:

  • Invoice number
  • Issue date
  • Total
  • Subtotal, taxes, discounts
  • Supplier and customer name
  • Product or service lines
  • Payment methods

All of this comes back as structured JSON, ready to integrate with other systems.
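As an illustration of that integration, here is a hedged sketch that pulls a few values out of the saved dictionary. It assumes the raw payload nests the extracted fields under analyzeResult, then documents, then fields, and that each field carries a content entry with the text as printed on the invoice. The helper name summarize_invoice is hypothetical, and the payload layout should be checked against the API version in use.

```python
DEFAULT_FIELDS = ("InvoiceId", "InvoiceDate", "InvoiceTotal", "VendorName", "CustomerName")


def summarize_invoice(raw_result, field_names=DEFAULT_FIELDS):
    """Return {field name: text as printed on the invoice} for the first document found."""
    documents = raw_result.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return {}
    fields = documents[0].get("fields", {})
    # Fields the model did not detect are simply left out of the summary.
    return {name: fields[name].get("content") for name in field_names if name in fields}
```

Calling summarize_invoice(result) on the dictionary from step 3 would give a flat mapping that is easier to pass to a database or spreadsheet than the full payload.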

Requirements to run this code

Before using this script, make sure you:

  • Have an Azure account and have created a Form Recognizer resource.
  • Have obtained your API key and endpoint from the Azure portal.
  • Have installed the necessary packages:
           pip install azure-ai-formrecognizer

 

Conclusion

This small script automates PDF invoice reading using AI with just a few lines of code. With Azure Form Recognizer, you can easily scale to thousands of documents, saving time and avoiding human error.

And you? Are you ready to stop reading invoices by hand? Contact us if you found this interesting.
