PURPOSE: To integrate any combination and any number of prognostic factors into the TNM staging system using machine learning methods that preserve the TNM.
METHODS: Lung cancer cases for 1990-2000 were obtained from the NCI’s Surveillance, Epidemiology, and End Results (SEER) Program; the dataset contained 13,923 cases. Disease-specific survival and hazard rates were calculated with a novel unsupervised machine learning algorithm based on cluster analysis, using combinations of prognostic and demographic factors such as tumor size, grade, histological type, and age. An ensemble clustering algorithm stratified patients according to survival and prognostic factors. Kaplan-Meier survival rates were compared by the log-rank test. Patients lost to follow-up were censored. An arbitrary minimum of 50 patients was required for each combination.
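As an illustration of the Kaplan-Meier estimate with censoring named in METHODS, a minimal Python sketch is given here. It is not the authors' clustering algorithm; the function name `kaplan_meier` and the input layout (parallel lists of follow-up times and event flags) are assumptions made for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient (for example, in months)
    events : 1 if a disease-specific death was observed, 0 if censored
             (for example, lost to follow-up)
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data only, not SEER data: five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients contribute to the number at risk until their last follow-up time but never count as deaths, which is how a product-limit estimate uses incomplete follow-up.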
RESULTS: There were 216 combinations of 3 grades, 3 size categories, 4 nodal categories, 2 histological types, and 3 age groups. Of these, 158 contained fewer than 50 patients and were excluded. For the 58 remaining combinations, 5-year survival and hazard curves were plotted and compared. Adding variables changed the estimated survival. For instance, for T1/N1 patients, the 5-year survival rate was 69%, but when the 3 grades were added the rates were 80%, 70%, and 65%. Similar 5-year curves (p < 0.05) were also noted between combinations with different variables, for instance, between G2/SCC/T1/N1/age-76+ and G3/SCC/T2/N1/age-20-59. In addition, the algorithm generated a stratified dendrogram that showed the relation between prognostic factors and survival visually.
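The combination count in RESULTS (3 x 3 x 4 x 2 x 3 = 216) and the 50-patient threshold can be sketched in Python as follows. Only the number of categories per factor comes from the abstract; most category labels, the `FACTORS` mapping, the function names, and the patient-record layout are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Labels are placeholders for illustration; the abstract gives only the
# number of categories per factor and a few example labels.
FACTORS = {
    "grade": ["G1", "G2", "G3"],
    "size": ["T1", "T2", "T3"],
    "nodes": ["N0", "N1", "N2", "N3"],
    "histology": ["SCC", "other"],
    "age": ["20-59", "60-75", "76+"],
}


def all_combinations(factors):
    """Enumerate every possible combination of factor categories."""
    return list(product(*factors.values()))


def eligible_combinations(patients, factors, min_patients=50):
    """Count patients per combination and keep those meeting the minimum.

    patients : iterable of dicts keyed by factor name (assumed layout)
    """
    keys = list(factors)
    counts = Counter(tuple(p[k] for k in keys) for p in patients)
    return {combo: n for combo, n in counts.items() if n >= min_patients}


n_combinations = len(all_combinations(FACTORS))  # 3 * 3 * 4 * 2 * 3
```

Applied to patient records, `eligible_combinations` would return only the combinations with at least 50 patients, which corresponds to the 58 of 216 retained in the abstract.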
CONCLUSION: This new algorithm provides survival curves for any number and combination of prognostic factors without changing the TNM. The shape of the survival curves depends on the number of factors. Machine learning can be used for analysis of survival and hazard.
CLINICAL IMPLICATIONS: The new algorithm enhances traditional TNM staging by including additional prognostic factors in any number or combination. It represents a step toward personalized medicine and individualized decision analysis because treatment can be guided by the survival of patients with similar combinations of prognostic factors.
DISCLOSURE: Arnold Schwartz, No Financial Disclosure Information; No Product/Research Disclosure Information