Text Parsing

Parse any text or document with a few lines of code.

Text Parsing is a general-purpose parsing engine. It extracts meaningful semantic entities from text inputs so that you can define, design, and customize your own parsed document object.

The most common use case is standardizing and enriching job offers from your databases. By extracting semantic entities such as companies, locations, tasks, and skills from an unstructured job offer, you can also build efficient dashboards and reports with ease.

📘

Prerequisites

  1. ✨ Create a Workspace
  2. πŸ”‘ Get your API Key
  3. 🧠 Activate Text Parsing API
  4. Download HrFlow.ai's Postman

📘

API Endpoint

For more information about the endpoint, see 🧠 Parse a raw Text.

Step 1: Configure your Postman Environment

After following the steps in the HrFlow.ai Postman publication, you will land on this page:


First, click on the "Environments" tab on the left side of your Postman window. Then, fill in the Empty - Environment template with the correct values. The compulsory variables for Text Parsing are:

Finally, save the environment and make sure that Empty - Environment is selected as your current environment.


Step 2: Get your First Text Parsing Results

Fill in the request body in raw format. The body contains a single key, text, whose value is the text you want to parse.
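The same body can also be built programmatically. A minimal sketch in Python, assuming the X-USER-EMAIL and X-API-KEY authentication headers used in the Python example at the end of this page; build_request is a hypothetical helper name, not part of any HrFlow.ai SDK:

```python
import json


def build_request(text: str, user_email: str, api_key: str) -> tuple[dict, str]:
    """Hypothetical helper: return (headers, body) for a Text Parsing call."""
    headers = {
        "X-USER-EMAIL": user_email,
        "X-API-KEY": api_key,
        "Content-Type": "application/json",
    }
    # The body holds a single key, "text", with the raw text to parse
    body = json.dumps({"text": text})
    return headers, body


headers, body = build_request("Assistant Professor", "YOUR_USER_EMAIL", "YOUR_SECRET_KEY")
print(body)  # {"text": "Assistant Professor"}
```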


The parsing result is returned in the data field and contains:

  • text: the text sent to the API
  • ents: the list of parsing entities detected by our API
  • parsing: the lists of entities grouped by their type
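Once the JSON response is decoded into a Python dictionary, these three fields sit under data. A minimal sketch, using a hand-written sample trimmed from the example response on this page; summarize is a hypothetical helper that counts what was found:

```python
def summarize(response: dict) -> dict:
    """Hypothetical helper: count entities in a decoded Text Parsing response."""
    data = response["data"]
    return {
        "text_length": len(data["text"]),
        "ents": len(data["ents"]),
        # Keep only the entity types for which something was found
        "non_empty_types": sorted(k for k, v in data["parsing"].items() if v),
    }


sample = {
    "code": 200,
    "data": {
        "text": "Assistant Professor",
        "ents": [{"start": 0, "end": 19, "label": "job_title"}],
        "parsing": {"job_titles": ["Assistant Professor"], "companies": []},
    },
}
print(summarize(sample))
# {'text_length': 19, 'ents': 1, 'non_empty_types': ['job_titles']}
```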

📘

Data Fields

  • The parsing field gives the lists of entities extracted from your text, grouped by type.
  • The ents field targets more advanced applications that require the position of entities within your text.
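As an illustration of a position-based application, the start and end offsets of each entity in ents can be used to mark entities inline. A minimal sketch, assuming end is exclusive (as in the example on this page) and that entities do not overlap; highlight is a hypothetical helper:

```python
def highlight(text: str, ents: list) -> str:
    """Wrap each entity span as [label: ...]; assumes non-overlapping spans."""
    out = text
    # Rewrite from the last entity to the first so earlier offsets stay valid
    for ent in sorted(ents, key=lambda e: e["start"], reverse=True):
        start, end, label = ent["start"], ent["end"], ent["label"]
        out = out[:start] + f"[{label}: {out[start:end]}]" + out[end:]
    return out


text = "Assistant Professor\nStanford University"
ents = [
    {"start": 0, "end": 19, "label": "job_title"},
    {"start": 20, "end": 39, "label": "company"},
]
print(highlight(text, ents))
```

Processing entities in reverse order is the design choice that matters here: inserting markers at the end of the text first never shifts the offsets of entities that come earlier.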

Each parsing entity in the ents field is composed of three fields:

  • start: the beginning of the entity in the text (character offset, included)
  • end: the end of the entity in the text (character offset, excluded)
  • label: the type of entity (e.g. job_title, company, location, etc.)
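Because end is excluded, an entity's text is exactly the Python slice text[start:end]. A minimal sketch; entity_text is a hypothetical helper:

```python
def entity_text(text: str, ent: dict) -> str:
    """Return the substring covered by one entity; end is exclusive."""
    return text[ent["start"]:ent["end"]]


text = "Assistant Professor\nStanford University"
company = {"start": 20, "end": 39, "label": "company"}
print(entity_text(text, company))  # Stanford University
```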

For example, consider the following response:

{
    "code": 200,
    "message": "Text parsing results",
    "data": {
        "ents": [
            {
                "end": 19,
                "label": "job_title",
                "start": 0
            },
            {
                "end": 118,
                "label": "company",
                "start": 99
            },
            {
                "end": 171,
                "label": "location",
                "start": 120
            },
            {
                "end": 190,
                "label": "phone",
                "start": 177
            },
            {
                "end": 209,
                "label": "phone",
                "start": 196
            },
            {
                "end": 236,
                "label": "email",
                "start": 217
            },
            {
                "end": 273,
                "label": "skill_hard",
                "start": 257
            }
        ],
        "parsing": {
            "certifications": [],
            "companies": [
                "Stanford University"
            ],
            "courses": [],
            "dates": [],
            "durations": [],
            "education_titles": [],
            "emails": [
                "[email protected]"
            ],
            "first_names": [],
            "interests": [],
            "job_titles": [
                "Assistant Professor"
            ],
            "languages": [],
            "last_names": [],
            "locations": [
                "Room 156, Gates Building 1A Stanford, CA 94305-9010"
            ],
            "phones": [
                "(650)725-2593",
                "(650)725-1449"
            ],
            "schools": [],
            "skills_hard": [
                "Machine learning"
            ],
            "skills_soft": [],
            "tasks": []
        },
        "text": "Assistant Professor\nComputer Science Department Department of Electrical Engineering (by courtesy)\nStanford University.\nRoom 156, Gates Building 1A Stanford, CA 94305-9010\nTel: (650)725-2593\nFAX: (650)725-1449\nemail: [email protected]\nResearch interests: Machine learning, broad competence artificial intelligence, reinforcement learning and robotic control, algorithms for text and web data processing."
    }
}
This response was returned for the following request body:

{
    "text": "Assistant Professor\nComputer Science Department Department of Electrical Engineering (by courtesy)\nStanford University.\nRoom 156, Gates Building 1A Stanford, CA 94305-9010\nTel: (650)725-2593\nFAX: (650)725-1449\nemail: [email protected]\nResearch interests: Machine learning, broad competence artificial intelligence, reinforcement learning and robotic control, algorithms for text and web data processing."
}

The first entity is a job_title that starts at character 0 and ends at character 19 (excluded) of the following text:

Assistant Professor
Computer Science Department Department of Electrical Engineering (by courtesy)
Stanford University.
Room 156, Gates Building 1A Stanford, CA 94305-9010
Tel: (650)725-2593
FAX: (650)725-1449
email: [email protected]
Research interests: Machine learning, broad competence artificial intelligence, reinforcement learning and robotic control, algorithms for text and web data processing.

Thus, the first parsed element is the job_title "Assistant Professor", which spans characters 0 to 18 of the text.
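The same reasoning applies to the other entities of the example response. This sketch keeps only the first lines of the example text (up to the fax number) and the entities from the response that fall inside them:

```python
# First lines of the example text, up to the fax number
text = (
    "Assistant Professor\n"
    "Computer Science Department Department of Electrical Engineering (by courtesy)\n"
    "Stanford University.\n"
    "Room 156, Gates Building 1A Stanford, CA 94305-9010\n"
    "Tel: (650)725-2593\n"
    "FAX: (650)725-1449"
)

# Entities copied from the example response that fall within this text
ents = [
    {"start": 0, "end": 19, "label": "job_title"},
    {"start": 99, "end": 118, "label": "company"},
    {"start": 120, "end": 171, "label": "location"},
    {"start": 177, "end": 190, "label": "phone"},
    {"start": 196, "end": 209, "label": "phone"},
]

for ent in ents:
    # end is exclusive, so the slice stops right before text[end]
    print(ent["label"], repr(text[ent["start"]:ent["end"]]))
```

Each printed slice matches the corresponding value in the parsing field, for instance company 'Stanford University' without the trailing period at offset 118.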


You can then build a structured object by iterating through all the ents returned by the Text Parsing API.
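A minimal sketch of that iteration, grouping entity texts by label. The result resembles the parsing field but is keyed by the singular labels found in ents (job_title, phone, ...) rather than the plural keys of parsing; group_entities is a hypothetical helper:

```python
from collections import defaultdict


def group_entities(text: str, ents: list) -> dict:
    """Map each label to the list of entity texts, in order of appearance."""
    grouped = defaultdict(list)
    for ent in sorted(ents, key=lambda e: e["start"]):
        grouped[ent["label"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Assistant Professor\nTel: (650)725-2593\nFAX: (650)725-1449"
ents = [
    {"start": 0, "end": 19, "label": "job_title"},
    {"start": 25, "end": 38, "label": "phone"},
    {"start": 44, "end": 57, "label": "phone"},
]
print(group_entities(text, ents))
# {'job_title': ['Assistant Professor'], 'phone': ['(650)725-2593', '(650)725-1449']}
```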

Advanced Topics

1. Try Text Parsing in your Favorite Programming Language

Postman can generate a code snippet for your request in your favorite programming language. Here is an example in Python.

import requests

# Text Parsing endpoint
url = "https://api.hrflow.ai/v1/text/parsing"

# The body contains a single key, "text", holding the text to parse
payload = {
  "text": "Assistant Professor\nComputer Science Department Department of Electrical Engineering (by courtesy)\nStanford University.\nRoom 156, Gates Building 1A Stanford, CA 94305-9010\nTel: (650)725-2593\nFAX: (650)725-1449\nemail: [email protected]\nResearch interests: Machine learning, broad competence artificial intelligence, reinforcement learning and robotic control, algorithms for text and web data processing."
}

# Authentication headers: replace the placeholders with your own credentials
headers = {
  'X-USER-EMAIL': 'YOUR_USER_EMAIL',
  'X-API-KEY': 'YOUR_SECRET_KEY',
}

# json= serializes the payload and sets the Content-Type: application/json header
response = requests.post(url, headers=headers, json=payload)

print(response.text)
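Instead of printing the raw body, you will usually decode it and work with data. A minimal sketch, assuming a successful call returns a body shaped like the example response above (code, message, data); extract_parsing is a hypothetical helper, shown here on a hand-written dictionary rather than a live call:

```python
def extract_parsing(body: dict) -> dict:
    """Return data["parsing"] from a decoded response, or raise on a non-200 code."""
    if body.get("code") != 200:
        raise RuntimeError(f"Text Parsing failed: {body.get('code')} {body.get('message')}")
    return body["data"]["parsing"]


# With the requests example above: parsing = extract_parsing(response.json())
sample = {
    "code": 200,
    "message": "Text parsing results",
    "data": {"parsing": {"phones": ["(650)725-2593", "(650)725-1449"]}},
}
print(extract_parsing(sample)["phones"])  # ['(650)725-2593', '(650)725-1449']
```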
