
Text Tools


Text Tools are ideal for automating the extraction of single text strings. At a high level, a Text Tool looks for a string of text that matches the pattern you specify. When it finds a match, it extracts that text and assigns it to a Field.
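Conceptually, a Text Tool pairs a pattern with a Field. The Python sketch below models that flow with the standard `re` module: search the text for the first string that matches, then assign the matched text to a Field. The `extract_to_field` helper, the `fields` dictionary, and the sample text are hypothetical illustrations, not part of Data Inbox.

```python
import re

def extract_to_field(text: str, pattern: str, field: str, fields: dict) -> dict:
    """Hypothetical model of a Text Tool: find the first match and store it in a Field."""
    match = re.search(pattern, text)
    if match:
        # Store the full matched string under the Field's name.
        fields[field] = match.group(0)
    return fields

document = "Performance Report\nClient: Jane Doe\nReturn: 4.2%"
fields = {}
extract_to_field(document, r"\d+\.\d+%", "Return", fields)
print(fields)  # {'Return': '4.2%'}
```

If nothing matches, the Field is simply left unset in this sketch.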

There are two types of Text Tools that help you extract text: Simple Mode and Advanced Mode.

  • Simple Mode locates a known value on the page, with or without the help of surrounding context.
  • Advanced Mode uses regular expressions (regex) to identify text flexibly, based on its format and, optionally, its surrounding context.
    • Data Inbox uses Python-based regex. With regex, you describe commonly seen surrounding text and symbols (e.g. “Name: ” or “Date: ”) and common formatting (e.g. “___@___.___” or “Commission File No. __-____”) to identify the value you want to extract.
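To illustrate surrounding text versus formatting, the sketch below uses Python's standard `re` module on made-up sample text. It assumes Data Inbox's Python-based regex behaves like `re`. The email pattern is deliberately simplified and is not a complete email validator.

```python
import re

text = (
    "Name: Jane Doe\n"
    "Email: jane.doe@example.com\n"
    "Commission File No. 12-3456\n"
)

# Surrounding text ("Name: ") anchors the match; (.*) captures the rest of that line.
name = re.search(r"Name: (.*)", text).group(1)

# Format only: a simplified ___@___.___ shape.
email = re.search(r"[\w.]+@\w+\.\w+", text).group(0)

# Context plus format: the literal prefix, then two digits, a hyphen, and four digits.
file_no = re.search(r"Commission File No\. (\d{2}-\d{4})", text).group(1)

print(name, email, file_no)  # Jane Doe jane.doe@example.com 12-3456
```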

If you are new to regex or want more practice, we recommend building and testing your expression in an external regular expression builder (ideally one set to Python's regex flavor), then copying and pasting it into the query field.

To create a Text Tool:

  • Within the Kit associated with the Schema you'd like to work with, locate the Field you want to place data into. In this example, we are working with our Primary Schema, Performance Report (refer to Pattern Creation and Editing for a refresher).
  • Click "Add Tool" and select "Text Tool".
  • Select whether you would like to use a Simple or Advanced Text Tool to extract your data.

Simple Mode

  • Enter the value that corresponds to your target. Note that values are matched exactly as entered; to locate a value by its format, use the Advanced Text Tool instead.
  • Choose whether to locate your target by the value alone or with the help of additional values that precede or follow it.
Target Search Menu
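As a rough mental model of Simple Mode, the Python sketch below treats the entered value as literal text and optionally requires a preceding or following value next to it. The `simple_find` helper and the sample page are hypothetical; this approximates the described behavior and is not Data Inbox's implementation.

```python
import re

def simple_find(text: str, value: str, before: str = "", after: str = "") -> bool:
    """Hypothetical model of Simple Mode: match the value literally, optionally with context."""
    # re.escape makes characters such as "$" or "." match themselves, so the value is found as-is.
    pattern = re.escape(before) + re.escape(value) + re.escape(after)
    return re.search(pattern, text) is not None

page = "Account Total: $1,250.00\nPrior Total: $980.00"
print(simple_find(page, "$1,250.00"))                          # True: the value alone
print(simple_find(page, "$1,250.00", before="Prior Total: "))  # False: wrong preceding text
print(simple_find(page, "1250"))                               # False: no format matching
```

The last call shows why format-based searches belong in Advanced Mode. The literal "1250" never appears on the page, even though "1,250" does.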


Advanced Mode

  • In the Text Tool, enter a regex that locates the target text.
    • Specify any preceding text that helps identify the value you’d like to extract (e.g. “Office Phone: ” or “Statement Period: ”).
    • Use the following to designate the target text or value you’d like to extract:
Text Pattern Format

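Using the example prefixes from the steps above, here is a Python sketch that combines preceding text with a format pattern for the target. It assumes the target is marked with a capture group `( )`, which is standard Python regex. The Text Pattern Format reference in Data Inbox is the authority on the exact syntax. The phone number and dates are invented sample data.

```python
import re

statement = (
    "Office Phone: (555) 010-2030\n"
    "Statement Period: 01/01/2023 - 01/31/2023\n"
)

# Preceding text identifies where the value is; the capture group marks the value itself.
phone = re.search(r"Office Phone: (\(\d{3}\) \d{3}-\d{4})", statement)
period = re.search(r"Statement Period: (\d{2}/\d{2}/\d{4} - \d{2}/\d{2}/\d{4})", statement)

print(phone.group(1))   # (555) 010-2030
print(period.group(1))  # 01/01/2023 - 01/31/2023
```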

  • For our example, we combine the regex “wildcard”, the dot ( . ) symbol, which matches any single character (except a newline, by default), with the regex symbol for repetition, the asterisk ( * ) symbol, which means “repeat any number of times, including zero.” Together, .* matches any run of characters on a line.
    • This flexibility allows us to capture any name, date, or performance number that follows the preceding text.
    • WARNING: Do not use the wildcard dot ( . ) on its own, without any surrounding context. A bare wildcard matches almost everything on the page, creating so many records that you will not be able to navigate your values. This also applies to other wildcard-only queries, such as ".", ".*", ".+", or ".?".
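The contrast in the warning can be reproduced with Python's `re.findall` on a short, made-up sample. With context, you get one value. Without context, you get a match for every line plus empty matches, and on a full document that list grows with every line.

```python
import re

text = "Client: Jane Doe\nStatement Period: Q4 2022\n"

# With surrounding context, .* captures only the rest of the matching line.
context_matches = re.findall(r"Statement Period: (.*)", text)
print(context_matches)  # ['Q4 2022']

# On its own, .* matches every line and also empty strings (for example, at the end of the text).
bare_matches = re.findall(r".*", text)
print(len(bare_matches), bare_matches)
```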
  • Repeat this process for as many Fields as you need to extract with Text Tools.
  • You now have automated text extraction!

Updated 23 Feb 2023