- International Journal of Applied Mathematics Electronics and Computers
- Vol: 6 Issue: 1
Big Data: Controlling Fraud by Using Machine Learning Libraries on Spark
Authors : Ferhat Karataş, Sevcan Aytaç Korkmaz
Pages : 1-5
DOI : 10.18100/ijamec.2018138629
Publication Date : 2018-03-31
Article Type : Research
Abstract : Continuous change and the high computational volume of network data have made it more difficult to analyze the data and to detect abnormal behavior within it. For this reason, big data solutions have gained importance. With the advancement of internet technologies and the digital age, cyber-attacks have increased steadily. The k-Means clustering algorithm is one of the most widely used algorithms in data mining. Clustering algorithms automatically divide data into smaller clusters or sub-clusters, placing statistically similar records in the same group. In this article, we used the k-Means method from the Machine Learning libraries on Spark to determine whether incoming network values represent normal behavior. We used 400 thousand network records obtained from the KDD Cup 1999 data set. With the k-Means method, we detected 10 abnormal behaviors among these 400 thousand records.
Keywords : k-Means, Spark, Machine Learning, Anomaly Detection
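The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Spark implementation: it is plain Python on a toy data set, and every name in it (`nearest`, `kmeans`, `top_anomalies`, the sample points) is hypothetical. It assumes that anomalies are the records farthest from their nearest cluster center, which is one common way to use k-Means for anomaly detection; the abstract does not state how the 10 abnormal records were selected. For reproducibility, the sketch seeds the centroids with the first k points rather than a random choice.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=lambda i: distances[i])
    return index, distances[index]


def kmeans(points, k, iterations=20):
    """Basic k-Means (Lloyd's algorithm), seeded with the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids


def top_anomalies(points, centroids, n):
    """Return the n points farthest from their nearest centroid."""
    return sorted(points, key=lambda p: nearest(p, centroids)[1], reverse=True)[:n]


# Toy data: two tight clusters plus one record that fits neither.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (5, 30)]
centroids = kmeans(points, k=2)
outliers = top_anomalies(points, centroids, n=1)
print(outliers)
```

On a data set the size of KDD Cup 1999, the same two steps (assign each record to its nearest center, then rank records by that distance) would be run by a distributed library such as Spark's rather than by in-memory loops like these.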