Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/2728
Title: Information Technology Skills Extractor for Job Descriptions in vku-ITSkills Dataset Using Natural Language Processing
Authors: Nguyen, Huu Nhat Minh
Nguyen, Ket Doan
Pham, Quoc Huy
Kieu, Xuan Loc
Hoang, Nguyen Vu
Nguyen, Huy
Huynh, Cong Phap
Keywords: IT Skills Dataset
Named Entity Recognition
Natural Language Processing
Issue Date: Jul-2023
Publisher: Springer Nature
Abstract: An IT skills extractor helps job recommendation systems and job seekers find suitable jobs conveniently and efficiently. In this paper, we design an efficient SpaCy pipeline that extracts IT skills from job descriptions using Natural Language Processing (NLP) and Named Entity Recognition (NER) methods. The proposed method extracts potential hard and soft skills, which can later be provided to job recommender systems and job seekers. Building on SpaCy, a state-of-the-art open-source NLP framework, we first construct a new IT skills dictionary based on ChatGPT and use it to automatically label a scraped job description dataset, named the vku-ITSkills dataset. The quality of the labels in this dataset can then be improved with the Part-of-Speech (POS) function and additional rules. We then fine-tune the pre-trained RoBERTa-base model to provide Transformer-based word embeddings for training the NER model that extracts skills. Thereafter, we define additional logical rules that enhance the extracted results by finding further skills based on syntactic cues, such as the comma rule. In this language pipeline, the RoBERTa embedding, NER, and additional rules play important roles in handling unseen and new IT skills that are absent from the vku-ITSkills dataset and are missed by NER. In the evaluation, we test the proposed pipeline on 200 job descriptions manually labeled by our team and demonstrate the efficiency of each step in the pipeline.
Description: Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp. 250-261.
URI: https://link.springer.com/chapter/10.1007/978-3-031-36886-8_21
http://elib.vku.udn.vn/handle/123456789/2728
ISBN: 978-3-031-36886-8
Appears in Collections: CITA 2023 (International)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.