UM E-Theses Collection (澳門大學電子學位論文庫)

Title

Improving the outlier detection algorithm from multivariate data stream

English Abstract

IMPROVING THE OUTLIER DETECTION ALGORITHM FROM MULTIVARIATE DATA STREAM, by HAN DONG. Thesis Supervisor: Dr. Simon Fong, Department of Computer and Information Science. Master of Science in E-Commerce Technology.

Outlier detection is a preprocessing technique that is effective in reducing irrelevant instances in machine learning. Many outlier detection algorithms have already been proposed for the purpose of forming datasets with fewer outliers. In this study, we propose an outlier detection method named lightweight analysis. Most of the time, the full dataset is used to calculate the outlier indicator. This motivates us to consider using a fixed number of instances as the reference for calculating the outlier indicator, as in cumulative analysis or lightweight analysis with a sliding window, rather than global analysis only. We then combine these three mechanisms with existing outlier detection algorithms, namely Mahalanobis distance, local outlier factor, and interquartile range. The experiments yield encouraging results: with the reduced dataset, classification accuracy is equal or better when the proposed classifier-based outlier detection (COD) method is used.

Key words: Outlier detection, COD, data mining
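The three reference-set mechanisms named in the abstract (global, cumulative, and sliding window) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `iqr_bounds` and `flag_outliers`, the univariate input, the 1.5 × IQR fences, the `window` and `min_ref` parameters, and the choice to exclude the current value from the cumulative and sliding references are all assumptions made for illustration. Only the interquartile range detector is shown; Mahalanobis distance or local outlier factor would slot in the same way.

```python
from statistics import quantiles


def iqr_bounds(reference, k=1.5):
    """Return (low, high) fences computed from the quartiles of `reference`."""
    q1, _, q3 = quantiles(reference, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(stream, mode="global", window=5, min_ref=4):
    """Flag each value against a reference set chosen by `mode` (assumed semantics).

    "global":     the full dataset
    "cumulative": every value that arrived before the current one
    "sliding":    the last `window` values before the current one
    """
    flags = []
    for i, x in enumerate(stream):
        if mode == "global":
            ref = stream
        elif mode == "cumulative":
            ref = stream[:i]
        elif mode == "sliding":
            ref = stream[max(0, i - window):i]
        else:
            raise ValueError(f"unknown mode: {mode}")
        if len(ref) < min_ref:
            flags.append(False)  # too little history to judge this value
            continue
        low, high = iqr_bounds(ref)
        flags.append(x < low or x > high)
    return flags


if __name__ == "__main__":
    data = [10, 11, 10, 12, 11, 50, 10, 11, 12, 10]
    for mode in ("global", "cumulative", "sliding"):
        print(mode, flag_outliers(data, mode=mode))
```

In this sketch the global mode needs the whole dataset up front, while the cumulative and sliding modes only look at values already seen, which is what makes them usable on a stream. The sliding mode additionally bounds the amount of history that has to be kept.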

Issue date

2015.

Author

Han, Dong

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

M.Sc.

Subject

Outliers (Statistics)

Data mining

Supervisor

Fong, Chi Chiu

Files In This Item

Full-text (Intranet only)

Location

1/F Zone C

Library URL

991000732809706306