CVPARSER DOCUMENTS

INTRODUCTION

STS Software is a leading Vietnam-based Agile software outsourcing company with offices in Ho Chi Minh City and Da Nang, established in 2012, with more than 350 software engineers and a mature process. Each month we receive a large number of resumes from potential employees, which means we have to sort through a mountain of CVs.

In addition, when looking for a good applicant, we consult online tools and other sources such as LinkedIn. In the standard procedure, our Talent Acquisition (TA) team manually checks each CV file to extract the information, then passes it to the Tech Lead and Project Manager for review and interview. Finally, the CV goes to the Human Resources (HR) team, who process it, prepare the contract, update the candidate's personal information in our system, and so on.

STS Software also has a strong AI team with extensive experience in developing AI software solutions. The team has worked on many similar projects, providing customers with AI solutions that process large datasets and power high-performance systems. We have applied these technologies to build an end-to-end system that processes CV data automatically: the CV Parser system.

OUR APPROACHES

Many tools, PDF reader modules, and libraries can read the text layer from a .pdf file. However, their output is only text arranged line by line, so the extracted information is messy and carries little meaning on its own. To extract the necessary information from a .pdf CV file, we have to face the following problems:

  • CV files vary widely in structure and do not share a common format.
  • It is difficult to cluster all related sections together.
  • It is hard for a machine to know the meaning of each piece of text.
  • Many rules are needed to clean the extracted text, …
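
To illustrate why a purely rule-based approach becomes hard to maintain, here is a minimal Python sketch that pulls an email address and a phone number out of line-by-line text using regular expressions. The sample text, the two patterns, and the function name `extract_contacts` are assumptions made for illustration; they are not part of CV Parser.

```python
import re

# Hypothetical line-by-line text, similar to what a PDF text-layer reader returns.
RAW_TEXT = """NGUYEN VAN A
Software Engineer
Email: a.nguyen@example.com | Phone: +84 912 345 678
EDUCATION
2014 - 2018 University of Science
"""

# Simple illustrative patterns (assumptions); real CVs need many more variants.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text: str) -> dict:
    """Return the first email-like and phone-like strings found in the text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "Email": email.group(0) if email else None,
        "Phone": phone.group(0) if phone else None,
    }


print(extract_contacts(RAW_TEXT))

# Every phone-like match in the text, not just the first one.
print(PHONE_RE.findall(RAW_TEXT))
```

In this sample the first match for each field is the intended one, but the second `print` shows that the date range `2014 - 2018` also satisfies the phone pattern. Each such false match calls for another rule, which is the maintenance problem described in the list above.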

However, some state-of-the-art AI technologies can deal with the above issues, so we have built an end-to-end system, CV Parser, that automatically parses all meaningful information from a .pdf file. The system architecture is divided into 4 main parts:

Part 1: Input Pre-processing

  • Input: .pdf CV file
  • Output: Cleaned image layers

Part 2: Detect Block Text Region

  • Input: Image
  • Output: Block text locations

Part 3: Extract Necessary Information

  • Input: Text region
  • Output: Text, important information

Part 4:

    {
    Name:...,
    Email:...,
    Phone:...,
    Work:...,
    Edu:...,
    ...
    }
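
The four parts above can be read as a pipeline in which each stage consumes the output of the previous one. The following minimal Python sketch shows that data flow only. All names (`preprocess`, `detect_text_blocks`, `extract_information`, `build_record`, `parse_cv`, `TextBlock`) are hypothetical, and the stage bodies are placeholders, not the actual models. Treating Part 4 as the step that assembles the final record is an assumption based on the output shown above.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TextBlock:
    """Hypothetical block text location: page index and bounding box in pixels."""
    page: int
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def preprocess(pdf_path: str) -> List[str]:
    """Part 1 placeholder: .pdf CV file -> cleaned image layers (fake page ids here)."""
    return [f"{pdf_path}#page0"]


def detect_text_blocks(images: List[str]) -> List[TextBlock]:
    """Part 2 placeholder: image -> block text locations."""
    return [TextBlock(page=i, box=(0, 0, 100, 20)) for i, _ in enumerate(images)]


def extract_information(blocks: List[TextBlock]) -> Dict[str, str]:
    """Part 3 placeholder: text regions -> text and important information."""
    # A real system would run text recognition and field classification here.
    return {"Name": "<name>", "Email": "<email>", "Phone": "<phone>"}


def build_record(fields: Dict[str, Optional[str]]) -> str:
    """Part 4 (assumed): assemble the structured output as JSON."""
    keys = ["Name", "Email", "Phone", "Work", "Edu"]
    # Fields that were not extracted are emitted as null.
    return json.dumps({key: fields.get(key) for key in keys}, indent=2)


def parse_cv(pdf_path: str) -> str:
    """Run the four stages in order and return the JSON record."""
    images = preprocess(pdf_path)
    blocks = detect_text_blocks(images)
    fields = extract_information(blocks)
    return build_record(fields)


print(parse_cv("sample_cv.pdf"))
```

Keeping each stage behind a small function with explicit input and output types mirrors the Input/Output pairs listed for each part, so that one stage could be replaced without touching the others.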

USAGE

Step 01

Go to the CV Parser site: https://experiment.saigontechnology.vn/cvparser. Alternatively, open the main Saigon Technology AI Research Lab page at https://experiment.saigontechnology.vn/, select the CV Parser section, and click the Try our demo button.

Step 02

On the CVParser page, click the SELECT A File button.

Step 03

Choose the .pdf CV file you want to run.

Step 04

The extracted information is then displayed on the page.

As you can see, the entire manual process requires a significant amount of work to obtain the required information from CV data, and the effort scales up with the amount of CV data we have to deal with. So we plan to use Artificial Intelligence solutions to automatically pull all of the required information from CV data, such as name, contact information, job experience, education, and so on. With this information, we can categorize applicants to identify the top prospects, or quickly get an overview of a candidate.
