Clustering and classification : describing the study population


Survival analysis and multivariate regression

For the survival analysis, time had to be taken into account because follow-up duration differed between patients: they did not all decompensate at the same moment. A regression based on the Cox proportional hazards model was the most suitable for this analysis. It assessed the additional risk brought by exposure to a risk factor while considering the probability of decompensation occurring at any time during follow-up.
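For reference, the Cox model is usually written as follows. The notation is not taken from the thesis and is introduced here only for illustration: h_0(t) is an unspecified baseline hazard, x_1, …, x_p are the covariates of a patient and β_1, …, β_p the coefficients estimated from the data.

```latex
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Under this form, the hazard ratio for covariate j does not depend on t, which is the proportional hazards assumption that has to be checked on the final model.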
The Kaplan-Meier method, a non-parametric estimator of the survival function from the data, gave the survival curves of the “D” and “NTR” groups. To find out whether the two groups were significantly different, a univariate analysis consisting of a log-rank test with a significance level of 0.05 was performed. Then, a semi-parametric assessment using the Cox model gave crude Hazard Ratios (HR) and their 95% confidence intervals (CI) (univariate analysis). They corresponded to the association between a variable and the risk of decompensation, taking time into account. Next, a second Cox model was applied to obtain the adjusted HR (multivariate analysis) and their 95% CI. The adjusted HR corresponded to the association between a variable and the risk of decompensation after adjusting for other variables (age, sex…). The aim was to avoid confounding factors. Finally, the final model was checked with a chi-squared test to verify the proportional hazards assumption made by the Cox model.
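The Kaplan-Meier estimate and the log-rank comparison described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the study: the function names and the follow-up data are hypothetical, and times are assumed to be in months with an event indicator of 1 for decompensation and 0 for censoring.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test: return (chi-squared statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)              # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)         # events at time t
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up data (months), NOT the study data
times_d = [2, 4, 4, 7, 10, 12, 18, 18]
events_d = [1, 1, 0, 1, 0, 1, 0, 0]
times_ntr = [6, 9, 14, 18, 18, 18, 18, 18]
events_ntr = [1, 0, 1, 0, 0, 0, 0, 0]

curve_d = kaplan_meier(times_d, events_d)
chi2, p_value = logrank_test(times_d, events_d, times_ntr, events_ntr)
print(curve_d)
print(chi2, p_value)
```

The two groups would be declared different at the 0.05 level when the returned p-value falls below 0.05. In practice a dedicated survival analysis package would normally be used for these estimates as well as for the Cox models.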

Study population

The first inclusion period ran from July to December 2014. 64 patients were enrolled out of the 127 expected. None of the patients was lost to follow-up, so the status at eighteen months could be collected for every patient. A second inclusion period ran from July to December 2015 and enrolled 54 additional patients, bringing the cohort to 118 persons. The follow-up of these new patients was too short to include them in any statistical analysis. For this reason, another thesis will study the data after eighteen months of follow-up for the 118 patients.

Status at eighteen months

Eighteen months after inclusion, 31 patients belonged to the “D” group and 33 to the “NTR” group. In the “D” group, 13 had been hospitalized for seven days or more and 18 had died. The characteristics of the two groups are shown in Table 2. The analysis did not bring out any statistically significant variable, either qualitative or quantitative, i.e. with a p-value < 0.05.

Cleaning and recoding data

The results of the recoding work were reported in the dictionary (Appendix 2). For several variables, the answer was the same for all 64 patients. These variables were therefore not discriminant and were removed. This concerned: the presence of pharmacological treatment (yes), material available to the patient (yes), human aid available to the patient (yes), life in an institution (yes), unemployment (no), suicide risk (no), stress at work (no), family history of cardiovascular diseases (no). Variables that were of no use for the analysis, namely patient identification number and date of inclusion, were also eliminated. Date of birth was transformed into age.
The goal was to compare the groups according to their characteristics. For this reason, the variables relating to the quality of care of multimorbidity and to the general practitioner’s self-evaluation (intuition, good communication with other carers, coordination procedures, etc.) were also removed from the analysis (questions 26, 29, 30, 34, 45 to 52).
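The cleaning steps described in this section (dropping variables with the same answer for every patient, removing identifiers, converting date of birth into age) can be sketched as follows. This is an illustration only: the column names and records are hypothetical, and age is assumed to be computed at the date of inclusion, which the text does not specify.

```python
from datetime import date


def age_in_years(birth, reference):
    """Completed years between a birth date and a reference date."""
    years = reference.year - birth.year
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def clean_records(records):
    """Return (cleaned records, names of the constant variables removed)."""
    cleaned = []
    for row in records:
        row = dict(row)  # work on a copy, leave the input untouched
        # Assumption: age is taken at inclusion
        row["age"] = age_in_years(row.pop("birth_date"), row["inclusion_date"])
        for name in ("patient_id", "inclusion_date"):
            row.pop(name)  # not useful for comparing the groups
        cleaned.append(row)
    # A variable with a single distinct value cannot discriminate patients
    constant = [k for k in cleaned[0] if len({r[k] for r in cleaned}) == 1]
    for row in cleaned:
        for k in constant:
            del row[k]
    return cleaned, constant


# Hypothetical records, NOT the study data
records = [
    {"patient_id": 1, "inclusion_date": date(2014, 7, 1),
     "birth_date": date(1940, 9, 15), "sex": "F", "unemployment": "no"},
    {"patient_id": 2, "inclusion_date": date(2014, 10, 10),
     "birth_date": date(1935, 3, 2), "sex": "M", "unemployment": "no"},
    {"patient_id": 3, "inclusion_date": date(2014, 12, 1),
     "birth_date": date(1950, 12, 1), "sex": "F", "unemployment": "no"},
]

cleaned, constant = clean_records(records)
print(constant)
print(cleaned)
```

Here the hypothetical `unemployment` column is dropped because every record carries the same answer, mirroring the variables listed above.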

Clustering and classification : describing the study population

The multidimensional analysis started with an agglomerative (ascending) hierarchical classification. Every patient was initially taken individually. Some criteria were selected to group the patients. Then, the patients who were the closest according to the Euclidean distance were gathered. The process continued iteratively: at every step, individuals and then groups were merged until the full hierarchy was obtained. The hierarchy was represented on a cluster dendrogram (Figure 1). A multiple correspondence analysis (MCA) was then performed to point out the discriminant variables, in general and specifically in the different groups of patients. Finally, the hierarchical clustering on principal components (HCPC) was carried out from the previous results and gave the clustering of the population. To constitute the clusters, a 20% departure from the cohort population was fixed. Three clusters appeared. They are shown in Figure 2.
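The iterative merging step can be illustrated with a small agglomerative clustering sketch in Python. It is not the procedure used in the study: the points are hypothetical numeric coordinates (in the study, patients would first be projected by the MCA), and average linkage is an assumption, since the text only states that Euclidean distances were used.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(points, c1, c2):
    """Mean pairwise distance between two clusters (assumed linkage rule)."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history [(cluster_a, cluster_b, height)], which is the information
    a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]  # every individual starts alone
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                points, clusters[pair[0]], clusters[pair[1]]
            ),
        )
        height = average_linkage(points, clusters[a], clusters[b])
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical coordinates, NOT the study data
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
clusters, merges = agglomerate(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
for left, right, height in merges:
    print(left, right, height)
```

Cutting the hierarchy at three clusters is done here by stopping the merges early. A statistical package would instead build the full tree and choose the cut afterwards.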

Table of contents :

INTRODUCTION
METHODS
1. Study design
2. Study population
3. Study conduct
4. Judgment criteria
5. Data cleaning
6. Statistical analysis
a. Clustering and classification
b. Survival analysis and multivariate regression
7. Ethics
RESULTS
1. Study population
2. Status at eighteen months
3. Cleaning and recoding data
4. Clustering and classification : describing the study population
a. Cluster 1
b. Cluster 2
c. Cluster 3
5. Survival analysis : investigate the decompensation risk factors
a. Univariate analysis : the Kaplan-Meier method
b. Multivariate analysis : Cox’s models
c. Checking of the final model
6. Experienced difficulties
DISCUSSION
1. Main findings
2. Analysis of experienced difficulties
a. Small sample size
b. Pertinence of some questions
c. Subjectivity of the « Expertise of the GP » assessment
d. Choice of chronic conditions
e. Statistical analysis
3. Study limitations
a. Selection bias
b. Information bias
c. Confusion bias
4. Future prospects
CONCLUSION
BIBLIOGRAPHY