Please use this identifier to cite or link to this item:
https://elib.vku.udn.vn/handle/123456789/2728
Title: | Information Technology Skills Extractor for Job Descriptions in vku-ITSkills Dataset Using Natural Language Processing |
Authors: | Nguyen, Huu Nhat Minh; Nguyen, Ket Doan; Pham, Quoc Huy; Kieu, Xuan Loc; Hoang, Nguyen Vu; Nguyen, Huy; Huynh, Cong Phap |
Keywords: | IT Skills Dataset; Named Entity Recognition; Natural Language Processing |
Publication year: | 2023 |
Publisher: | Springer Nature |
Abstract: | An IT skills extractor is a convenient and efficient tool for job recommendation systems and for job seekers looking for suitable jobs. In this paper, we design an efficient SpaCy pipeline that extracts IT skills from job descriptions using Natural Language Processing (NLP) and Named Entity Recognition (NER) methods. The proposed method extracts potential hard and soft skills, which can later be provided to job recommenders and job seekers. Building on this state-of-the-art open-source NLP framework, we first construct a new IT skills dictionary based on ChatGPT and use it to automatically label a scraped job description dataset, named the vku-ITSkills dataset. The quality of the labels in this dataset can then be improved with the Part-of-Speech (POS) function and additional rules. We then fine-tune the pre-trained RoBERTa-base model to provide Transformer-based word embeddings for training the NER model that extracts skills. Thereafter, we define additional logical rules, such as the comma rule, that use syntax to find further skills in the extracted results. In this language pipeline, the RoBERTa embedding, NER, and additional rules play important roles in handling unseen and new IT skills that are absent from the vku-ITSkills dataset or missed by NER. In the evaluation, we test the proposed pipeline on 200 job descriptions manually labeled by our team and demonstrate the efficiency of each step in the pipeline. |
Description: | Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp: 250-261. |
Identifiers: | https://link.springer.com/chapter/10.1007/978-3-031-36886-8_21; http://elib.vku.udn.vn/handle/123456789/2728 |
ISBN: | 978-3-031-36886-8 |
Collection: | CITA 2023 (International) |
Use of materials in the Digital Library must comply with copyright law.
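
The abstract mentions two steps that a small example may make more concrete: automatic labeling from a skills dictionary, and a "comma rule" that finds further skills from syntax. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It uses the Python standard library instead of SpaCy and RoBERTa, the dictionary `SKILL_DICTIONARY` is a hypothetical stand-in for the ChatGPT-based dictionary, and the comma rule is interpreted here as "in a comma-separated list that contains a known skill, the short items between commas are candidate skills", which the record itself does not spell out.

```python
import re

# Hypothetical seed dictionary; the paper's ChatGPT-based dictionary is not shown in this record.
SKILL_DICTIONARY = {"python", "java", "sql", "docker"}


def dictionary_label(text, dictionary=SKILL_DICTIONARY):
    """Return (start, end, surface) spans for tokens that appear in the dictionary."""
    spans = []
    for match in re.finditer(r"[A-Za-z][A-Za-z0-9+#.]*", text):
        token = match.group().rstrip(".")  # drop a sentence-final period
        if token.lower() in dictionary:
            spans.append((match.start(), match.start() + len(token), token))
    return spans


def comma_rule(text, dictionary=SKILL_DICTIONARY):
    """Assumed comma rule: propose unknown items that sit between commas
    in a list that already contains at least one known skill."""
    candidates = []
    # Split on semicolons, newlines, or a period followed by whitespace,
    # so names such as "Node.js" are not cut in half.
    for sentence in re.split(r"[;\n]|\.\s", text):
        items = [part.strip() for part in sentence.split(",")]
        if len(items) < 3:
            continue  # fewer than two commas: no interior items
        has_known = any(
            word.lower() in dictionary for item in items for word in item.split()
        )
        if not has_known:
            continue
        # Only interior items are taken, since the first and last items
        # usually carry surrounding sentence words.
        for item in items[1:-1]:
            if item and len(item.split()) <= 3 and item.lower() not in dictionary:
                candidates.append(item)
    return candidates


if __name__ == "__main__":
    sample = "Experience with Python, Kubernetes, Terraform and SQL is required."
    print(dictionary_label(sample))
    print(comma_rule(sample))
```

In this sketch the dictionary pass finds only the terms it already knows, while the comma rule proposes a neighbouring list item as a new candidate. Restricting the rule to interior items keeps it conservative, at the cost of missing skills in the first or last position of a list.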