Form Recognizer Studio
Last updated
Form Recognizer Studio is an online tool for visually exploring, understanding, and integrating features from the Form Recognizer service into your applications. More information about Form Recognizer Studio, and about how to get started analyzing documents with pre-trained and custom models, can be found here.
Navigate to the following URL https://formrecognizer.appliedai.azure.com/studio.
Scroll down to the bottom of the page and click on "Custom Extraction Model".
In this screen you will see all existing projects and the options to create, share, import and delete projects.
When you have a new customer who doesn't have a project or trained model yet, you will need to create a project and go through the setup process.
Click on "Create a project".
Enter project details according to the prompt - click "Continue".
Configure service resources according to the prompt - click "Continue".
"Subscription", "Resource Group", "Form Recognizer, or Cognitive Resource" - these are defined under the installation documentation of this component and should be available in the drop-down list.
Connect training data source according to prompt - click "Continue".
"Subscription", "Resource Group", "Storage Account", "Blob container" - these are defined under the installation documentation of this component.
Set Folder Path.
It should have the format {CustomerId}_{AddressId}.
It's important that the Folder Path has the correct format, or else the solution will not work.
Review and create - click "Create project".
Go through the summary.
You have now created a new project for your customer and you can now train a model.
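Because the solution depends on the Folder Path following the {CustomerId}_{AddressId} format, it can help to build and check the value in code rather than typing it by hand. The following is a minimal sketch, not part of the solution itself: the function names are hypothetical, and it assumes that both IDs consist only of letters and digits, which may not match your own ID scheme.

```python
import re

# Assumption: IDs contain only letters and digits. Adjust to your own ID scheme.
_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def build_folder_path(customer_id: str, address_id: str) -> str:
    """Return the folder path in the form {CustomerId}_{AddressId}."""
    for label, value in (("CustomerId", customer_id), ("AddressId", address_id)):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"{label} {value!r} is empty or contains unexpected characters")
    return f"{customer_id}_{address_id}"


def is_valid_folder_path(folder_path: str) -> bool:
    """Check that the path is exactly two valid IDs joined by a single underscore."""
    parts = folder_path.split("_")
    return len(parts) == 2 and all(_ID_PATTERN.fullmatch(part) for part in parts)
```

For example, build_folder_path("10042", "7") returns "10042_7", while is_valid_folder_path("10042-7") returns False because the separator is not an underscore.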
To get started you need at least five documents for training your model, or else you will be shown the following prompt.
These documents should cover the range of variations and layouts you expect in your target documents. Make sure your documents are of sufficient quality and in a supported format.
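Before uploading, you may want to confirm that a customer's document set meets the five-document minimum. The sketch below is only an illustration with hypothetical names. The list of file extensions is an assumption about the formats the service accepts and should be checked against the Form Recognizer documentation. It does not check document quality.

```python
from pathlib import PurePath
from typing import Iterable, List

MIN_TRAINING_DOCUMENTS = 5

# Assumption: extensions commonly accepted for custom models; verify against the service docs.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def supported_documents(file_names: Iterable[str]) -> List[str]:
    """Keep only the file names whose extension is in the supported set (case-insensitive)."""
    return [
        name for name in file_names
        if PurePath(name).suffix.lower() in SUPPORTED_EXTENSIONS
    ]


def ready_for_training(file_names: Iterable[str]) -> bool:
    """Return True when at least five supported documents are present."""
    return len(supported_documents(file_names)) >= MIN_TRAINING_DOCUMENTS
```

The functions take plain file names, so you could pass them the result of os.listdir() for a local folder or a list of blob names from the training container.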
Label your training data
In Form Recognizer Studio, you'll need to label the key fields of the information you want the model to extract from the documents. This labeling process involves manually identifying and highlighting the relevant fields in each document. You can use the labeling tools provided in the studio to annotate your data accurately. After you have uploaded your first document to Form Recognizer Studio, you can start labeling it.
Click on "Run Layout" - the Form Recognizer Studio will try to map objects in your document.
To add fields to your document, click on the "Draw region" button.
You are now able to highlight text and objects and map these labels to different fields.
Highlight the text/object where you want to create a field.
Give the field a name.
Set the type of the field.
When creating a field you need to make sure that the Subtype of the field is correct.
If the Subtype is not correct, you may not get all the data when using the REST connector.
For example, if you have used "Integer" instead of "Number" for the field "Price" then you will miss the decimal values for that field.
You have now created a field and it is now visible on the right side in the Form Recognizer Studio.
Repeat for all relevant data in your document.
Exit by clicking the "End drawing" button.
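The difference between the "Integer" and "Number" subtypes mentioned above can be illustrated with a few lines of Python. This is only an analogy to show why an integer-typed field cannot carry decimals. It is not the service's own parsing logic, and the function names and the sample price are made up for the example.

```python
from decimal import Decimal


def as_integer(text: str) -> int:
    """Interpret the text as a whole number, discarding any fractional part."""
    return int(Decimal(text))


def as_number(text: str) -> Decimal:
    """Interpret the text as a decimal number, keeping the fractional part."""
    return Decimal(text)


price_text = "199.95"  # made-up sample value for a "Price" field
print(as_integer(price_text))  # 199
print(as_number(price_text))   # 199.95
```

A field such as "Price" should therefore use "Number", while "Integer" is better kept for values that are always whole, such as a quantity.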
Upload and train your model
Once you've labeled at least five documents, you can upload your training data to Form Recognizer Studio. The studio will analyze your data to learn the layout and structure of the documents, then initiate the training process. Form Recognizer Studio will use the labeled data to train a custom model tailored to your specific document types.
Click the "Train" button.
Evaluate and iterate
After training, evaluate the performance of your model using a separate set of test documents. Review the extracted information and identify any areas where the model may need improvement.
Navigate to Test, upload a similar document, and then click on the "Run analysis" button.
Form Recognizer Studio will run an analysis of this document using the trained model.
Depending on the result of the analysis you may or may not need to relabel some fields and retrain your model.
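If you also look at the analysis result as JSON, one way to decide which fields to relabel is to list the fields the model was least confident about. The sketch below assumes the result follows the shape analyzeResult → documents → fields, with a confidence value per field. Treat that shape, the 0.8 threshold, and the function name as assumptions to verify against your own results. The sample data is invented.

```python
from typing import Any, Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off; tune it to your own documents


def low_confidence_fields(
    result: Dict[str, Any], threshold: float = CONFIDENCE_THRESHOLD
) -> List[Tuple[str, float]]:
    """Return (field name, confidence) pairs below the threshold, lowest first."""
    flagged = []
    for document in result.get("analyzeResult", {}).get("documents", []):
        for name, field in document.get("fields", {}).items():
            # A field without a confidence value is treated as not confident at all.
            confidence = field.get("confidence", 0.0)
            if confidence < threshold:
                flagged.append((name, confidence))
    return sorted(flagged, key=lambda item: item[1])


# Invented sample result, shaped according to the assumption described above.
sample_result = {
    "analyzeResult": {
        "documents": [
            {
                "fields": {
                    "Price": {"type": "number", "content": "199.95", "confidence": 0.97},
                    "OrderDate": {"type": "date", "content": "2023-01-15", "confidence": 0.41},
                    "Reference": {"type": "string", "content": "A-77", "confidence": 0.62},
                }
            }
        ]
    }
}

print(low_confidence_fields(sample_result))
```

Fields that appear in this list for several test documents are reasonable candidates for relabeling before you retrain.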
Iterate and improve
Based on the evaluation results, iterate on the training process. You can add more labeled data, refine the labels, or adjust parameters to improve the model's performance. Repeat the training process until you achieve satisfactory results.