2 V-T Tran et al.
with intermediate and college degrees who found jobs was 79.1% and 72.9%,
respectively; meanwhile, only 55.6% of university graduates had jobs.
In addition, Vietnamese job portals are regarded as an important bridge
between recruitment managers and job seekers. Over the years, these portals
have accumulated a growing amount of digital labor market data, such as
job listings and applicants’ resumes. However, exploitation of these data is
limited because the portals provide only job categories and keyword-based
search functionality.
To enable advanced analysis, a model that can automatically detect skills in
labor market data is needed. Such a model can support labor market analysis
and ultimately help orient workforce training and reskilling programs.
Various approaches [18,13,11,17] treat skill detection as a Named Entity
Recognition (NER) task in natural language processing. They share a common
drawback: a large number of labeled sentences is needed to train the NER
models in a supervised setting. Other approaches detect skills in a given
document by directly matching n-gram sequences against terms in a target
taxonomy [9,2,7]. These approaches, however, do not work for the Vietnamese
language, as no such taxonomy exists yet.
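To make the taxonomy-matching idea concrete, the following is a minimal sketch of direct n-gram matching. It is not the implementation used in [9,2,7]; the function names and the three-term toy taxonomy are assumptions introduced here for illustration only.

```python
def ngrams(tokens: list[str], max_n: int) -> list[str]:
    """Return every contiguous n-gram of length 1..max_n as a string."""
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return phrases


def match_taxonomy(text: str, taxonomy: set[str], max_n: int = 3) -> list[str]:
    """Return taxonomy terms that occur verbatim as n-grams of the text."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    found: list[str] = []
    for phrase in ngrams(tokens, max_n):
        if phrase in taxonomy and phrase not in found:
            found.append(phrase)
    return found


# Hypothetical toy taxonomy; a real one would hold many curated skill terms.
TOY_TAXONOMY = {"python", "machine learning", "project management"}

skills = match_taxonomy(
    "Experience with Python and machine learning is required", TOY_TAXONOMY
)
```

The sketch also shows the limitation noted above: the matcher can only return terms that are already in the taxonomy, so without a Vietnamese skill taxonomy it returns nothing for Vietnamese text.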
On Vietnamese job listing websites, job openings usually share a common
semi-structured format. Each job opening has the following sections:
– Title A short, one-sentence highlight of the job, intended to attract job
seekers. The title often mentions the job position, job level, and salary
range.
– Description One paragraph or a list that describes the job
characteristics: what the work is and how it will be carried out.
– Compensation One paragraph or a list that shows the salary range and
benefits paid to employees in exchange for the services they provide.
– Requirements One paragraph or a list that contains the experience,
qualifications, and skills candidates need to be considered for the role.
– About the company A brief introduction to the company and its
environment.
– Contact point An email address and a phone number for submitting an
application and asking questions about it.
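As an illustration, the semi-structured format above could be represented by a simple record type. This is only a sketch: the class name, field names, and the choice to treat every section except the title as optional are assumptions made here, not part of the described method.

```python
from dataclasses import asdict, dataclass


@dataclass
class JobOpening:
    """One job opening, with one text field per section (hypothetical names)."""
    title: str
    description: str = ""
    compensation: str = ""
    requirements: str = ""
    about_company: str = ""
    contact_point: str = ""

    def sections(self) -> dict[str, str]:
        """Return only the sections that are present, in declaration order."""
        return {name: text for name, text in asdict(self).items() if text}


posting = JobOpening(
    title="Senior Data Engineer",
    requirements="At least two years of experience with Python and SQL",
)
present = list(posting.sections())
```

Keeping the sections as separate fields makes it straightforward to single out the requirements text, regardless of the order in which the sections appeared on the page.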
The order of these sections may vary; however, most skill mentions appear in
the Requirements section. In this paper, we present a practical approach to
skill detection in Vietnamese job listings. Rather than viewing the task as an
NER task, we model it as a ranking problem. Our approach exploits a structural
property of a job opening: any skill mention found in the Requirements section
will have a high semantic similarity score with the section itself.
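The ranking view can be sketched as follows: score each candidate phrase by its similarity to the Requirements section and sort by that score. This is a minimal sketch under stated assumptions, not the method's actual implementation. In particular, `embed` is a toy bag-of-words stand-in for a real embedding model, and the candidate phrases are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(candidates: list[str], requirements: str) -> list[tuple[str, float]]:
    """Sort candidate phrases by similarity to the requirements text."""
    section_vec = embed(requirements)
    scored = [(c, cosine(embed(c), section_vec)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


requirements = "at least two years of experience with python and sql"
candidates = ["competitive salary", "python", "sql experience"]  # hypothetical
ranking = rank_candidates(candidates, requirements)
```

With a learned embedding in place of the bag-of-words vector, phrases that are semantically close to the requirements text would score high even without exact word overlap, which is the property the ranking formulation relies on.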
The rest of this paper is organized as follows: we start in Section 2 by outlining
the main steps of the proposed method. In Section 3, we describe in detail the
implementation of the tasks in the previous section: embedding, phrase mining,
term ranking, and term classification. In Section 4, we carry out a comprehensive
experimental study to validate the proposed method. We conclude with a summary
of results and future work in Section 5.
2 Methodology
Our method is depicted in Figure 1. Compared with the traditional NER approach,
our methodology is more practical and requires less manual effort.
It is a pipeline composed of four layers: