- Turkish Journal of Electrical Engineering and Computer Science
- Vol: 27 Issue: 5
Sentence similarity using weighted path and similarity matrices
Authors : Reza Javadzadeh, Morteza Zahedi, Marziea Rahimi
Pages : 3779-3790
Article Type : Articles
Abstract : Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection, and related tasks. Because a sentence offers only a small vocabulary to compare, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two questions: (1) which similarity technique to use for word pairs and (2) how to generalize word-pair similarity to sentence pairs. We used the weighted path, a WordNet-based similarity measure, together with the paraphrase database to obtain word-pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and computed a similarity value for the sentence pair. We also incorporated a vector space model technique to form a robust similarity measure. On the STSS65 test dataset, our method outperformed state-of-the-art methods, reaching a Pearson's correlation of 87% with human similarity scores. On the STSS131 test data, our approach performed on par with other methods under the same measure. Our approach also outperforms all the other WordNet-based methods compared on both datasets.
Keywords : Sentence similarity, plagiarism detection, text mining, vector space model, paraphrase database
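The matrix step described in the abstract (build a pairwise word similarity matrix, take maximum values, reduce them to one sentence score) can be illustrated with a minimal sketch. The abstract does not state how the maxima are combined, so averaging the row and column maxima is an assumption made here for illustration. The names `sentence_similarity`, `toy_word_sim`, and `TOY_SCORES` are hypothetical, and the toy scores stand in for the weighted path and paraphrase database values used in the article.

```python
from typing import Callable, Sequence

# Hypothetical word-pair scores; placeholders for WordNet/paraphrase-database values.
TOY_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def toy_word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if known, otherwise 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def sentence_similarity(
    s1: Sequence[str],
    s2: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Score two tokenized sentences from their pairwise similarity matrix.

    Assumption: the score is the mean of all row maxima and column maxima.
    """
    if not s1 or not s2:
        return 0.0
    # matrix[i][j] holds the similarity of word i in s1 and word j in s2.
    matrix = [[word_sim(a, b) for b in s2] for a in s1]
    # Best match in s2 for every word of s1.
    row_max = [max(row) for row in matrix]
    # Best match in s1 for every word of s2.
    col_max = [max(matrix[i][j] for i in range(len(s1))) for j in range(len(s2))]
    return (sum(row_max) + sum(col_max)) / (len(s1) + len(s2))


if __name__ == "__main__":
    print(sentence_similarity(["fast", "car"], ["quick", "automobile"], toy_word_sim))
```

Taking maxima in both directions keeps the score symmetric in the two sentences. Any word-pair measure with the same call signature could replace `toy_word_sim`.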