Keyword Extraction from Kazakh News Dataset with BERT

El-Cezeri
Vol: 9 Issue: 4 (Special Issue)

Authors : Aiman Abibullayeva, Aydın Çetin

Pages : 1193-1200

DOI : 10.31202/ecjse.1131826

Publication Date : 2022-12-31

Article Type : Research

Abstract : Keywords provide a concise and precise description of a document's content. Because keywords are important and manual annotation is difficult, automatic keyword extraction makes the process easier and faster. This paper presents keyword extraction from a Kazakh news dataset. Model performance was measured with the pre-trained language models BERT-base-uncased and BERT-base-multilingual-uncased on the newly compiled Kazakh News Dataset (KND). The compiled dataset consists of 7060 records. Data were collected from the web pages anatili.kazgazeta.kz, Bilimdinews.kz, and zhasalash.kz using the BeautifulSoup and Requests libraries. These web pages mostly contain news, history, and literary texts. The dataset includes the publication name or news title, the author of the publication or news subject, and the URL of the Kazakh news site. In the evaluation of the training results, BERT-base-multilingual-uncased achieved a higher F-score than BERT-base-uncased.
Keywords : Kazakh language, keyword extraction, natural language processing, BERT
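
The abstract compares the two models by F-score but does not state how that score was computed. As an illustration only, the sketch below computes precision, recall, and F1 for one document by exact matching between predicted and reference keyword sets. The function name keyword_prf, the case-insensitive exact-match rule, and the sample Kazakh keywords are assumptions made for this example, not details taken from the paper.

```python
def keyword_prf(predicted, gold):
    """Return (precision, recall, F1) for one document's keywords.

    Assumption: keywords are compared as case-insensitive exact matches
    after trimming whitespace; the paper may use a different matching rule.
    """
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}
    if not pred or not ref:
        return 0.0, 0.0, 0.0

    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0, 0.0, 0.0

    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample: three predicted keywords, four reference keywords.
predicted = ["тіл", "білім", "тарих"]
gold = ["Тіл", "тарих", "әдебиет", "мәдениет"]

precision, recall, f1 = keyword_prf(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this sample two of the three predicted keywords match the four reference keywords, so precision is 2/3, recall is 1/2, and F1 is 4/7. A corpus-level score would typically average these values over all documents, which is again an assumption about the evaluation setup.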