University of Macau Library

UM E-Theses Collection (澳門大學電子學位論文庫)

Title

Novel data mining methodologies for medical data processing and application on i+diagnostic workbench

English Abstract

University of Macau
Abstract
NOVEL DATA MINING METHODOLOGIES FOR MEDICAL DATA PROCESSING AND APPLICATION ON i+DIAGNOSTIC WORKBENCH
by Chao Sam, Lidia, Y-A1-7402-0
Dissertation Supervisor: Full Professor Li Yiping, Department of Computer and Information Science
Dissertation Co-Supervisor: Full Professor Ding Qiulin, Nanjing University of Aeronautics and Astronautics
Modern society is entering a data-rich but knowledge-poor age. Because large information systems make collecting and storing data cheap and easy, a tremendous amount of data has been generated and accumulated rapidly, such as ECG, EEG and SPECT data in the medical field. Data mining is a powerful tool that can automatically transform these mountains of data into useful knowledge. It is a burgeoning technology with a wide range of applications, and it is well suited to medical diagnosis and data analysis. Nevertheless, we found that two fundamental aspects influence the efficiency of a data mining tool. The first is using the right data, as the quality of the data strongly affects the result of a learning problem. Data preparation is therefore as challenging as any other part of the data mining process, and it is also time-consuming. The second is the learning capability, which is the kernel of a data mining process. Traditional learning algorithms focus on a batch, static mode, in which any new feature or instance forces the algorithm to relearn all the data from scratch. An intelligent learning scheme may therefore greatly improve the overall mining performance.
In this dissertation, three hypotheses form the basis of our research to tackle these issues:
1. An irrelevant attribute should be one that provides neither explicit information nor supportive or implicit information for the learning concept.
2. The discretized intervals of a continuous-valued attribute should be meaningful and realistic with regard to not only the learning concept but also the attribute itself.
3. An intelligent data mining learning process should mimic learning in the real world, which is on-line, interactive, incremental and dynamic in multiple dimensions.
These hypotheses led us to propose several innovative algorithms for medical diagnosis: LUIFS (Latent Utility of Irrelevant Feature Selection), MIDCA (Multivariate Interdependent Discretization for Continuous Attributes), and i⁺Learning (Intelligent and Incremental Learning). The first two algorithms form the so-called MIA-Preprocessing (Multivariate Interdependent Attribute Preprocessing) method, which focuses on discovering hidden relevant attributes or latent supportive attributes during feature selection (FS) and continuous feature discretization (CFD). MIA-Preprocessing thereby minimizes uncertainty and loss of information while maximizing learning accuracy. The i⁺Learning theory, in contrast, mimics the human learning style. It enables the learning algorithm to run on-line, interactively, dynamically and incrementally, modifying discovered knowledge by amending or strengthening the current knowledge rather than re-training on the growing data from scratch. This learning strategy matches the reality of the medical world, where new cases, new symptoms, or even new diseases appear or vanish unexpectedly.
In the experiments, a number of real-life datasets drawn from the UCI repository were evaluated under four learning algorithms (ID3, J4.8, IB and Naïve Bayes) as well as the i⁺Learning method, with or without our proposed algorithms, and with other comparable methods. The empirical results show that the proposed algorithms outperform the others in most cases, whether applied individually or together.
The i⁺Learning method can intelligently and appropriately handle any new case, whether a feature or an instance, without repeating the entire data mining process, while MIA-Preprocessing sifts the most suitable data from the raw data for the subsequent processes. As a result, the overall learning capacity improves significantly.

Issue date

2008.

Author

Chao, Sam

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

Ph.D.

Subject

Data mining

Medical informatics

Medicine -- Data processing

Supervisor

Li, Yi Ping

Files In This Item

TOC & Abstract

Location

1/F Zone C

Library URL

991002578119706306