Keyword Extraction from Kazakh News Dataset with BERT
Authors : Aiman Abibullayeva, Aydın Çetin
Pages : 1193-1200
DOI : 10.31202/ecjse.1131826
Publication Date : 2022-12-31
Article Type : Research
Abstract : Keywords provide a concise and precise description of a document's content. Because keywords are important and manual annotation is laborious, automatic keyword extraction makes the process easier and faster. This paper presents keyword extraction from a Kazakh news dataset. Model performance was evaluated with the pre-trained BERT-base-uncased and BERT-base-multilingual-uncased language models on the newly compiled Kazakh News Dataset (KND). The dataset consists of 7060 entries, collected from the websites anatili.kazgazeta.kz, Bilimdinews.kz, and zhasalash.kz using the BeautifulSoup and Requests libraries. These sites mostly contain news, history, and literary texts. Each entry includes the publication name or news title, the author of the publication or news subject, and the URL of the Kazakh news site. In the evaluation, the BERT-base-multilingual-uncased model achieved a higher F-score than the BERT-base-uncased model.
Keywords : Kazakh language, keyword extraction, natural language processing, BERT
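The abstract compares models by F-score but does not state how the score is computed. As an illustration only, the following minimal sketch shows one common way to score extracted keywords against reference keywords for a single document: exact match on lowercased keyword sets. The function name keyword_f1 and the matching rule are assumptions made here, not the authors' evaluation procedure, which may instead be token-level.

```python
def keyword_f1(predicted, gold):
    """Return (precision, recall, F1) for one document.

    Hypothetical helper: compares keywords by exact match after
    stripping whitespace and lowercasing. This matching rule is an
    assumption, not the method described in the paper.
    """
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}

    # With an empty side or no overlap, all three scores are zero.
    true_positives = len(pred & ref)
    if not pred or not ref or true_positives == 0:
        return 0.0, 0.0, 0.0

    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder keywords, not taken from the dataset.
precision, recall, f1 = keyword_f1(
    ["language", "history"],
    ["history", "literature", "news"],
)
```

Averaging this per-document F1 over a test set gives a macro-averaged score; pooling the counts over all documents before dividing gives a micro-averaged one. The abstract does not say which variant was used.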