📁 Parse a Folder of Resumes
Learn how to efficiently parse a folder of resumes in HrFlow.ai using the add_folder function.
HrFlow.ai's Python SDK provides the add_folder function, which parses a folder of resumes and stores the extracted profiles in a specified source. This guide shows how to use it effectively.
A. Why Use the add_folder Function?
The add_folder function streamlines the parsing of multiple resumes stored in a directory. It automates the extraction of profile information and stores it in HrFlow.ai, enabling efficient data management and analysis.
B. Step-by-Step Guide
B.1. Initialize the HrFlow Client
Prerequisites
- ✨ Create a Workspace
- 🔑 Get your API Key
- 🧠 Activate Profile Parsing API
- 🔌 Create a Source
- HrFlow.ai Python SDK version 4.0.0 or above: Install it via pip with pip install -U "hrflow>=4.0.0" or via conda with conda install "hrflow>=4.0.0" -c conda-forge. The quotes keep the shell from treating >= as an output redirection.
First, initialize the HrFlow client with your API credentials.
from hrflow import Hrflow
from hrflow.utils import generate_parsing_evaluation_report
client = Hrflow(api_secret="your_api_secret", api_user="your_api_user")
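To avoid hardcoding credentials in a script, one option is to read them from environment variables before creating the client. The sketch below is an illustration only: the helper load_hrflow_credentials and the variable names HRFLOW_API_SECRET and HRFLOW_API_USER are hypothetical choices, not part of the SDK.

```python
import os


def load_hrflow_credentials(env=None):
    """Return keyword arguments for Hrflow(...) read from environment variables.

    The variable names below are an assumption made for this example.
    """
    env = os.environ if env is None else env
    required = ("HRFLOW_API_SECRET", "HRFLOW_API_USER")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_secret": env["HRFLOW_API_SECRET"],
        "api_user": env["HRFLOW_API_USER"],
    }


# Usage (requires the hrflow package to be installed):
# from hrflow import Hrflow
# client = Hrflow(**load_hrflow_credentials())
```

Failing early with a clear message when a variable is missing tends to be easier to debug than an authentication error returned later by the API.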
B.2. Ensure Your Data is Ready
Before parsing the folder of resumes, make sure that you have placed the resumes in a directory that is accessible by the code.
For example, you can create a folder called "resumes" in the same directory as your Python script and place the resumes inside it.
.
└── project/
    ├── script.py
    ├── failures/
    └── resumes/
        ├── john_doe.docx
        ├── jane-doe.pdf
        └── ...
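Before calling the SDK, it can help to confirm that the layout above exists and to see which files would be picked up. The following is a minimal sketch using only the standard library; the helper prepare_directories and the RESUME_EXTENSIONS set are hypothetical, and the extension list is an assumption based on the example files shown here rather than a statement of what the Profile Parsing API supports.

```python
from pathlib import Path

# Assumed extensions, taken from the example files in this guide.
RESUME_EXTENSIONS = {".pdf", ".docx"}


def prepare_directories(resumes_dir, failures_dir, is_recursive=True):
    """Check the resume folder, create the failures folder, and list resume files."""
    resumes_path = Path(resumes_dir)
    if not resumes_path.is_dir():
        raise FileNotFoundError(f"Resume directory not found: {resumes_path}")

    # Create the failures folder up front so failed files have somewhere to go.
    Path(failures_dir).mkdir(parents=True, exist_ok=True)

    # "**/*" walks subfolders as well; "*" only looks at the top level.
    pattern = "**/*" if is_recursive else "*"
    return sorted(
        path
        for path in resumes_path.glob(pattern)
        if path.is_file() and path.suffix.lower() in RESUME_EXTENSIONS
    )


# Usage:
# files = prepare_directories("./resumes", "./failures")
# print(f"{len(files)} resume(s) found")
```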
B.3. Parse the Folder of Resumes
Use the add_folder function to parse the resumes and store the profiles in a specified source.
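# Paths matching the project layout from B.2
STORAGE_DIRECTORY_PATH = "./resumes"
FAILURES_DIRECTORY_PATH = "./failures"

results = client.profile.parsing.add_folder(
    source_key="YOUR_SOURCE_KEY",
    dir_path=STORAGE_DIRECTORY_PATH,
    is_recursive=True,  # also parse files in subfolders
    move_failure_to=FAILURES_DIRECTORY_PATH,  # failed files are moved here
    show_progress=True,
    max_requests_per_minute=30,
    min_sleep_per_request=1,
)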
Field Explanations:
Parameter | Type | Example | Description |
---|---|---|---|
source_key | str | "YOUR_SOURCE_KEY" | The key identifying the source where your profiles (CVs) will be stored. This key is unique to the source and is required to specify the destination for the parsed profiles. |
dir_path | str | "./resumes" | The directory path where the resumes to be parsed are stored. This is the path to the folder containing the profile resumes. |
is_recursive | bool | True | Indicates whether to parse files in subfolders as well. If set to True, the function will parse files in the specified directory and all its subdirectories. |
created_at | str (optional) | "2021-05-01T00:00:00Z" | The original date of the application of the profile in ISO format. This field is optional and can be used to set a specific application date for the profiles. |
sync_parsing | bool | 0 | Indicates whether to perform synchronous parsing. If set to 1, parsing is performed synchronously. If set to 0, parsing is performed asynchronously. |
move_failure_to | str or None | "./failures" | The directory path to move the failed files. If set to None, the failed files will not be moved. |
show_progress | bool | True | A flag to indicate whether a progress bar should be displayed during the parsing process. This can be useful for monitoring the progress, especially when processing a large number of resumes. |
max_requests_per_minute | int | 30 | The maximum number of requests that can be made per minute. This is used to rate limit the parsing requests to avoid overloading the server. |
min_sleep_per_request | float | 1 | The minimum time to wait between requests, in seconds. This helps to space out the requests and comply with rate limiting. |
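To get a rough sense of how the two rate-limiting parameters affect run time, the sketch below estimates a lower bound on the duration of a batch. It assumes the two limits combine as "wait at least the larger of 60 / max_requests_per_minute and min_sleep_per_request between requests"; this is an assumption for illustration, not a description of the SDK's internals, and the helper estimate_min_duration_seconds is hypothetical. Actual run time also depends on upload and parsing latency.

```python
def estimate_min_duration_seconds(
    n_files, max_requests_per_minute=30, min_sleep_per_request=1.0
):
    """Estimate a lower bound, in seconds, for sending n_files requests.

    Assumption: requests are spaced by the larger of the per-minute
    interval and the minimum sleep.
    """
    if max_requests_per_minute <= 0:
        raise ValueError("max_requests_per_minute must be positive")
    if n_files <= 0:
        return 0.0
    interval = max(60.0 / max_requests_per_minute, float(min_sleep_per_request))
    return n_files * interval


# Usage: with the values from this guide, 600 resumes need at least 20 minutes.
# estimate_min_duration_seconds(600) / 60
```

Under this assumption, min_sleep_per_request only matters when it exceeds 60 / max_requests_per_minute; with the values used in this guide (30 and 1), the per-minute cap is the binding limit.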
C. Additional Resources
- HrFlow.ai Python SDK on PyPI
- HrFlow Cookbook: Repository containing helpful notebooks by the HrFlow.ai team
- Connectors Source Documentation