DATA MINING METHODS FOR MEDICAL DIAGNOSIS

PECULIARITY OF MEDICAL DATA

This chapter describes the unique nature of medical data. Medical records often contain information about a patient’s age, current and past diseases, smoking status, and so on. These values have to be known before a medical examination begins [12]. Information about blood pressure fluctuations, pain, fever, etc. is referred to as symptoms in this thesis. Symptoms are the basis for a diagnosis. Often a patient must undergo additional tests before all the symptoms can be discovered.
The following sections describe the nature of medical data in more detail.

Different types of medical data

Medical information may come from various sources, including interviews with patients, medical images, ECG, EEG and RTG (X-ray) recordings and other screening results. The symptoms gathered are used to produce diagnoses, which are also stored in patients’ files. The constant progress of medicine entails growth in the size of medical databases [12]. Some types of medical data are presented in Table 3.1.
In the past, paper-based files were the dominant form of storing medical data. Such a medium would no longer be sufficient for the increasing amount of data. That is why digital databases have been introduced and are still being improved [12]. Their benefits include storing the data in a more structured way. This increases the quality of the data [12], as digital systems can check the data while it is being entered into a database. Both symptoms and diagnoses are entered in a concise form, which may turn out to be crucial during data mining.

Doctor’s interpretations

An important aspect of medicine is the physician’s interpretation of screening results [12]. The same set of symptoms and diagnoses may be interpreted differently by different doctors. Furthermore, physicians often use different words and expressions to describe the same thing. It is essential to highlight this problem because it may distort the outcomes of data mining algorithms. The problem is discussed in detail by K. J. Cios and G. W. Moore (2002) [12]. They write about machine translation from natural language into a structured, canonical form, and they make an interesting observation: such translation is feasible for sentences no longer than 10 words.
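The wording problem described above can be illustrated with a minimal sketch that maps different expressions for the same finding onto one canonical term. It is not the translation method of [12]: the synonym table, the term names and the `canonicalize` function are assumptions made for illustration, and only the 10-word limit is taken from the text.

```python
import re
from typing import List, Optional

# Hypothetical synonym table: each phrase on the left maps to one canonical term.
CANONICAL_TERMS = {
    "high temperature": "fever",
    "pyrexia": "fever",
    "fever": "fever",
    "raised blood pressure": "high blood pressure",
    "high blood pressure": "high blood pressure",
}

MAX_WORDS = 10  # sentence-length limit mentioned in the text above


def canonicalize(sentence: str) -> Optional[List[str]]:
    """Return the sorted canonical terms found in a sentence, or None if it is too long."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if len(words) > MAX_WORDS:
        return None  # too long for this simple phrase lookup
    padded = " " + " ".join(words) + " "
    found = {
        term
        for phrase, term in CANONICAL_TERMS.items()
        if f" {phrase} " in padded  # match whole words only
    }
    return sorted(found)
```

With such a table, two notes that use different words ("pyrexia" and "high temperature") yield the same canonical value, so a data mining algorithm would see one symptom instead of two.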

Nature of medical data

Medical data is very specific [12]. To mine medical data, all information should be converted into numeric values. The methods for this task are described in medical textbooks [1] and are beyond the scope of this work. The specificity of medical data lies in the fact that the values of the attributes (symptoms) usually come from certain ranges.
Information about a medical appointment is usually gathered in decision tables. An example of a decision table applicable in medical treatment support is presented in Table 3.2 [2]. There are two types of attributes: conditional and decisional. The conditional attributes, the set $S = \{s_1, s_2, \ldots, s_I\}$, represent symptoms; the decisional attributes, the set $D = \{d_1, d_2, \ldots, d_K\}$, represent diseases. Let $P = \{p_1, p_2, \ldots, p_N\}$ be the set of patients. Then the decision table is defined as the following quadruple:
$$T = (P, S, D, \rho) \tag{4.1}$$
where $\rho$ is a function
$$\rho : P \times (S \cup D) \to \{w_{d_k}\}$$
The values of symptoms are denoted by $v_{i,n}$ (Table 3.2), the value of the $i$-th symptom for the $n$-th patient. The values of diseases are denoted by $w_{k,d_k}$, for the $k$-th disease and its $d_k$-th value. The values $v_{i,n}$ are usually binary [2], where 1 denotes occurrence of the symptom and 0 its absence. Very often medical data is positive-valued [2]; symptom values are seldom negative (although this may happen with medical signals such as ECG). Furthermore, symptom values usually belong to a definite range (for instance, resting blood pressure is no lower than 30 and no higher than 300 [53]).
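The decision table defined above can be sketched as a small data structure in which the function ρ is a lookup keyed by a (patient, attribute) pair. This is only an illustration under stated assumptions: the class name `DecisionTable`, its methods and the toy patients, symptoms and disease are hypothetical, and only the blood pressure range of 30 to 300 comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionTable:
    """T = (P, S, D, rho): patients, symptoms, diseases and the value function."""
    patients: list
    symptoms: list
    diseases: list
    ranges: dict = field(default_factory=dict)  # attribute -> (low, high), optional
    values: dict = field(default_factory=dict)  # (patient, attribute) -> value

    def assign(self, patient, attribute, value):
        """Store rho(patient, attribute) = value after checking domain and range."""
        if patient not in self.patients:
            raise KeyError(f"unknown patient: {patient}")
        if attribute not in self.symptoms and attribute not in self.diseases:
            raise KeyError(f"unknown attribute: {attribute}")
        low, high = self.ranges.get(attribute, (float("-inf"), float("inf")))
        if not low <= value <= high:
            raise ValueError(f"{attribute}={value} outside [{low}, {high}]")
        self.values[(patient, attribute)] = value

    def rho(self, patient, attribute):
        """Look up the value of a symptom or disease attribute for a patient."""
        return self.values[(patient, attribute)]


# Toy data: patient, symptom and disease names are made up for illustration.
table = DecisionTable(
    patients=["p1", "p2"],
    symptoms=["fever", "resting_blood_pressure"],
    diseases=["d1"],
    ranges={"fever": (0, 1), "resting_blood_pressure": (30, 300)},
)
table.assign("p1", "fever", 1)                     # binary symptom: 1 = present
table.assign("p1", "resting_blood_pressure", 120)  # must lie within 30..300
table.assign("p1", "d1", 0)                        # decision attribute
```

Rejecting out-of-range values at assignment time reflects the observation that symptom values belong to a definite range; a value such as 500 for resting blood pressure raises an error instead of entering the table.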

MEDICAL DECISION SUPPORT SYSTEMS

Diagnosing process vs. decision making

The structure of the diagnosing process may seem intuitive and easy to understand. However, it is the diagnosis itself that may make it complex. The inputs to a medical diagnosis are symptoms [71]. After processing them, the process produces an output that classifies a patient either as having a disease or as belonging to a certain risk group.
During an appointment a physician decides on the patient’s treatment. The decision-making process is presented by the authors of [44] (Figure 4.2). They claim that this process is continuous and strongly connected with the following phases: intelligence, design, choice and implementation. The reality in which the process takes place changes constantly, and a decision maker should take this into consideration. This is why the decisions have to be reviewed after each phase. The authors propose a consensus when developing different methodologies for designing a decision support system.

Description of Decision Support Systems

The scientific literature [3], [66] and [31] gives several definitions of a Decision Support System (DSS). All of them, however, emphasize that a DSS is a computerized system which assists in a decision-making process. A decision, in turn, is a choice between several alternatives, made after estimating the value of each of them. Such a system supports the human by automatically generating alternatives and suggesting the best choice. The support is strongly connected with three parts of the support process:
• alternatives estimation
• alternatives evaluation
• alternatives comparison
All these steps are carried out with the help of computer applications. In [66] the role of a DSS is specified as follows: “an interactive, flexible, and adaptable computer-based information system, especially developed for supporting the solution of a non-structured management problem for improved decision making. It utilizes data, provides an easy-to-use interface, and allows for the decision maker’s own insights.” DSS’s are classified in a variety of ways. This thesis uses Power’s taxonomy [66], which divides DSS’s into five groups:
• communication-driven DSS – helps in a group task by supporting communication among workers;
• data-driven DSS – concentrates on the access and manipulation of both the internal and external data;
• document-driven DSS – manages, retrieves and manipulates unstructured information in a variety of electronic formats;
• knowledge-driven DSS – specializes in solving problems based on facts, rules or similar constructions;
• model-driven DSS – puts emphasis on simulation, financial support and optimization tools that are based on statistical solutions.
All the decision support systems mentioned above may be helpful in various fields of everyday life. In recent years there has been increased interest in the use of knowledge-driven DSS’s in medicine [25].
The aim of a knowledge-driven DSS is to facilitate the structuring of a problem, its evaluation and finally the selection of a decision from among various alternatives. Such systems specialize in uncovering nontrivial relationships in data from large long-term databases. To understand what processes are performed in such a system, it is essential to know its architecture (Figure 4.3). It consists of three main parts: an input component, a processing component and an output component [44]. The vital element of the system is the Decision Maker, which uses computer technology to access domain knowledge. The user has control over the outcomes of the system, which is useful in preparing decision alternatives. These are then evaluated and the most preferable one is chosen. In this way new knowledge is created, which can be used as an additional input to the system in the future.
The figure is based on: Mora M., Forgionne G.A. and Gupta J., Decision Making Support Systems: Achievements, Trends and Challenges for the Next Decade. Idea Group: Hershey, PA, 2002.
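The three support steps named earlier in this section (alternatives estimation, evaluation and comparison) can be illustrated with a minimal sketch. It assumes a simple weighted-sum scoring rule, which is only one of many possible evaluation rules and is not taken from [44] or [66]; the criteria, weights and option names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    estimates: dict  # criterion name -> estimated value in [0, 1], higher is better


def evaluate(alternative, weights):
    """Evaluation step: weighted sum of the estimates (one simple scoring rule)."""
    return sum(w * alternative.estimates.get(c, 0.0) for c, w in weights.items())


def rank(alternatives, weights):
    """Comparison step: order the alternatives by score, best first."""
    return sorted(alternatives, key=lambda a: evaluate(a, weights), reverse=True)


# Estimation step: in a real system these numbers would come from data or models.
weights = {"expected_benefit": 0.7, "low_cost": 0.3}
alternatives = [
    Alternative("option_b", {"expected_benefit": 0.5, "low_cost": 0.9}),
    Alternative("option_a", {"expected_benefit": 0.9, "low_cost": 0.2}),
    Alternative("option_c", {"expected_benefit": 0.3, "low_cost": 1.0}),
]

ranked = rank(alternatives, weights)
suggestion = ranked[0]  # the system only suggests; the decision maker chooses
```

The sketch keeps the final choice outside the ranking function, which matches the description of a DSS as a system that suggests the best alternative while leaving control with the user.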

Characteristics of Medical Decision Support Systems

Medical DSS’s are computer programs that assist physicians and medical staff in medical decision-making tasks [13]. Most medical decision support systems (MDSS’s) are equipped with a diagnostic assistance module, a therapy critiquing and planning module, a medication prescribing module (checking for drug-drug interactions, dosage errors, allergies, etc.), an information retrieval subsystem (for instance, for formulating accurate clinical questions) and an image recognition and interpretation section (X-rays, CT, MRI scans) [13].
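The kind of checks performed by a medication prescribing module can be sketched as a few rule lookups. This is a simplified illustration and not the logic of any real MDSS: the drug names, the interacting pair and the dose limits are placeholders with no clinical meaning, and `check_prescription` is a hypothetical function.

```python
# Hypothetical knowledge base: all names and numbers below are placeholders.
INTERACTING_PAIRS = {frozenset({"drug_a", "drug_b"})}
MAX_DAILY_DOSE_MG = {"drug_a": 100, "drug_b": 40, "drug_c": 500}


def check_prescription(new_drug, daily_dose_mg, current_drugs, allergies):
    """Return a list of warnings for a proposed prescription (empty if none)."""
    warnings = []

    # Allergy check
    if new_drug in allergies:
        warnings.append(f"allergy: patient is allergic to {new_drug}")

    # Dosage check
    limit = MAX_DAILY_DOSE_MG.get(new_drug)
    if limit is None:
        warnings.append(f"dosage: no dose limit known for {new_drug}")
    elif daily_dose_mg > limit:
        warnings.append(f"dosage: {daily_dose_mg} mg exceeds limit of {limit} mg")

    # Drug-drug interaction check against the drugs the patient already takes
    for other in sorted(current_drugs):
        if frozenset({new_drug, other}) in INTERACTING_PAIRS:
            warnings.append(f"interaction: {new_drug} with {other}")

    return warnings
```

The function returns warnings instead of blocking the order, since such a module assists the clinician, who remains responsible for the decision.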
Interesting examples of MDSS’s are machine learning systems, which are capable of creating new clinical knowledge. Intensive research on such systems has resulted in a set of techniques that are successfully applied to the creation of medical knowledge [10]. Machine learning systems look for relationships in raw data [13]. They utilize various data mining and machine learning algorithms, such as neural networks or decision trees. Machine learning systems are used to build knowledge bases, which are then used by various expert systems. By analyzing clinical cases, a Medical Decision Support System can produce a detailed description of input features together with a unique characterization of clinical conditions. This support may be invaluable when looking for changes in a patient’s health condition.
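As an illustration of how a decision tree learner looks for relationships in raw data, the sketch below computes information gain, the splitting criterion used by ID3-style algorithms, and picks the most informative symptom. It is a minimal example on made-up binary data, not the implementation of any particular MDSS; the attribute names `s1` and `s2` and the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the data on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [label for row, label in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain: the root of a one-level tree."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Toy data: binary symptoms (1 = present, 0 = absent) and a binary diagnosis.
rows = [
    {"s1": 1, "s2": 0},
    {"s1": 1, "s2": 1},
    {"s1": 0, "s2": 0},
    {"s1": 0, "s2": 1},
]
labels = [1, 1, 0, 0]

root = best_attribute(rows, labels)
```

In this toy data the diagnosis coincides with `s1`, so splitting on it removes all uncertainty, while `s2` carries no information about the label. A full tree learner would repeat the same selection recursively on each subset.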
The benefits of MDSS’s are described in the scientific literature [20]. Such systems may improve patient safety by reducing diagnostic errors. They may also improve medication and test ordering. Furthermore, the quality of care improves because clinicians can spend more time with a patient. This may be an effect of applying proper guidelines, up-to-date clinical evidence and improved documentation. Moreover, the efficiency of health care delivery is improved by reducing costs through faster order processing and the elimination of duplicate tests.

Examples of Medical Decision Support Systems

Several Medical Decision Support Systems (MDSS’s) exist. They help in the early detection of diseases. A few of the most important systems, which are used in hospitals, are presented in this thesis. Table 4.1 presents the MDSS’s that are currently in use; most of them have been in use for years.
Medical Decision Support Systems are mostly commercial applications. Often complete documentation of a system is not publicly available, and the available sources may not describe the system in sufficient detail. A system that is still in a test phase may also have limited functionality. Thus it may be difficult to identify the data mining algorithms implemented in it.
To present the idea of Medical Decision Support Systems, three sample systems are described: HELP, DXplain and ERA. They were selected because of the great influence they have had on research into decision support and information technologies in medicine [28], [3], [20] and [24].

HELP

One of the most popular and advanced Medical Decision Support Systems is HELP [28]. It is a knowledge-based hospital information system equipped with a decision support component. It helps clinicians in interpreting medical information, diagnosing patients, maintaining clinical protocols and other tasks. The evolution of medical information systems and computing technology has led to improvements in the system. In 2003 a new version, called HELP II, was released.

Description of the system [28]

The structure of the HELP II system is presented in Figure 4.4. The previous version of the HELP system had a separate module dedicated to data management. In the new version this module has been integrated into the system. The system implements data mining solutions.

Table of contents:

1 INTRODUCTION
1.1 FOCUS AREA AND MOTIVATION
1.2 AIM AND OBJECTIVES OF THE MASTER’S THESIS
1.3 RESEARCH QUESTIONS
1.4 METHODOLOGY
1.5 DEFINITIONS
1.5.1 Data Mining Process
1.5.2 Diagnosing vs. Data Mining
1.6 OUTLINE
2 RELATED WORK
2.1 APPLICATIONS OF DATA MINING METHODS FOR MEDICAL DIAGNOSIS
2.2 METHODS OF EVALUATION OF EFFECTIVENESS AND ACCURACY OF DATA MINING METHODS
3 PECULIARITY OF MEDICAL DATA
3.1 DIFFERENT TYPES OF MEDICAL DATA
3.2 DOCTOR’S INTERPRETATIONS
3.3 NATURE OF MEDICAL DATA
4 MEDICAL DECISION SUPPORT SYSTEMS
4.1 DIAGNOSING PROCESS VS. DECISION MAKING
4.2 DESCRIPTION OF DECISION SUPPORT SYSTEMS
4.3 CHARACTERISTICS OF MEDICAL DECISION SUPPORT SYSTEMS
4.4 EXAMPLES OF MEDICAL DECISION SUPPORT SYSTEMS
4.4.1 HELP
4.4.2 DXplain
4.4.3 ERA
5 DATA MINING ALGORITHMS
5.1 DECISION TREES
5.2 NAÏVE BAYES
5.3 NEURAL NETWORKS
5.4 SAMPLE ALGORITHMS
5.4.1 ID3
5.4.2 C4.5
6 DESCRIPTION OF DATA SETS USED IN EXPERIMENTS
6.1 SOURCE OF DATA
6.2 DATABASES DETAILS DESCRIPTION
6.2.1 Heart disease database
6.2.2 Hepatitis database
6.2.3 Diabetes database
6.2.4 Dermatology database
6.2.5 Breast cancer database
7 METHODS OF EVALUATION OF DATA MINING ALGORITHMS
7.1 ESTIMATING HYPOTHESIS ACCURACY
7.1.1 Sample error and true error
7.1.2 Difference in error of two hypothesis
7.2 COMPARING LEARNING ALGORITHMS
7.2.1 Difference in algorithms’ errors
7.2.2 Counting the costs
7.2.3 ROC curves
7.2.4 Recall, precision and F-measure
7.3 ALGORITHMS’ PERFORMANCE EVALUATION MEASURES USED IN THE THESIS
8 DATA MINING IN WAIKATO ENVIRONMENT FOR KNOWLEDGE ANALYSIS
8.1 WAIKATO ENVIRONMENT FOR KNOWLEDGE ANALYSIS (WEKA)
8.2 SELECTED WEKA’S DATA MINING ALGORITHMS
8.2.1 C4.5 algorithm
8.2.2 Naïve Bayes
8.2.3 Multilayer Perceptron
8.3 ROC CURVE AND AUC
9 EXPERIMENTS’ RESULTS AND DISCUSSION
9.1 ALGORITHMS CALIBRATION
9.1.1 Diabetes database
9.1.2 Breast cancer database
9.1.3 Hepatitis database
9.1.4 Heart diseases database
9.1.5 Dermatology diseases database
9.2 EVALUATION AND COMPARISON OF THE DATA MINING ALGORITHMS
10 CONCLUSIONS
11 REFERENCES