
Re-Extraction


The Re-Extraction feature lets you quickly apply adjustments to a document's extraction. It re-extracts values from a document using the Patterns of your choice and takes into account any adjustments you have made to the OCR-obtained tables and text structure.

Pages processed during a re-extraction do not count toward your organization's pages-processed total, so re-running a document after an adjustment does not incur the costs normally associated with processing it.

Re-Extraction is most useful in two scenarios:

  1. Minor OCR Adjustments:
    • On rare occasions, with misaligned or small text, the OCR model may make a minor mistake, such as splitting one line of text into two separate boxes or missing a final, sparsely populated row in a table
    • The Document Viewer lets you make quick, manual adjustments to ensure your extracted values are precise
    • After making an adjustment, you can run Re-Extract so that your Pattern extracts updated values from the newly drawn boxes
  2. Automation Modifications:
    • Sometimes your extraction automation needs to be modified to extract values more accurately or to stay up to date, either through a machine learning model update or a Pattern change
    • With Re-Extraction, you can run the updated automation on existing documents so that your extracted data reflects the latest automation configuration
      • You can also run the document against Patterns that aren't currently associated with it. This is helpful when:
        • You uploaded your document directly to a Pattern but want another Pattern to extract data from it
        • You enabled or created a new Pattern after the document was uploaded
        • A previous version of a Pattern didn't detect data on your document but has since been updated



In the following example, a text box on a Brokerage Statement has been modified, for demonstration purposes, so that the Pattern cannot locate the "Broker Address":

Document image


To fix this, we remove the text box and draw a new one:

Document image

Document image


Next, we click the "Re-Extract" button at the top right of the Document Viewer. This opens a prompt for selecting the Patterns to re-extract with:

Document image


After we confirm, the Pattern begins re-extracting. While new values are being produced, the Pattern-extracted values are locked for editing and the document cannot be Reviewed:

Document image


When re-extraction completes, the new values are displayed. Data validation is enforced on re-extracted data to confirm that the values are present and correctly formatted. If validation detects any issues with the extracted values, Review is locked.

The "Broker Address" has now been extracted, and its value is linked to the area of the document where it was found:

Document image
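The locking rules described in this article (values are locked while re-extraction runs, and Review is blocked both during re-extraction and whenever validation finds an issue) can be summarized as a small state model. The following is a minimal Python sketch of those rules only. It is not the product's implementation or API: `ExtractionStatus`, `DocumentState`, and `validation_issues` are hypothetical names chosen for illustration, and the sketch assumes that values become editable again once re-extraction completes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ExtractionStatus(Enum):
    # Hypothetical states; the product's real status names are not documented here.
    RE_EXTRACTING = auto()
    COMPLETE = auto()


@dataclass
class DocumentState:
    status: ExtractionStatus
    # Hypothetical list of problems reported by data validation
    # on the re-extracted values (empty means validation passed).
    validation_issues: list[str] = field(default_factory=list)

    def values_editable(self) -> bool:
        # Pattern-extracted values are locked while new values are being produced.
        # Assumption: they are editable again once re-extraction completes.
        return self.status is ExtractionStatus.COMPLETE

    def review_allowed(self) -> bool:
        # Review is blocked during re-extraction, and afterwards
        # whenever validation reports an issue with the extracted values.
        return (
            self.status is ExtractionStatus.COMPLETE
            and not self.validation_issues
        )


running = DocumentState(ExtractionStatus.RE_EXTRACTING)
finished = DocumentState(ExtractionStatus.COMPLETE)
flagged = DocumentState(ExtractionStatus.COMPLETE, ["Broker Address: wrong format"])
print(running.review_allowed(), finished.review_allowed(), flagged.review_allowed())
```

In this model, a document is reviewable only when both conditions hold: re-extraction has finished and validation reported no issues.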