Machine learning methods for cancer risk stratification

Project Leader: Jan Komorowski

PhD Student: Nicholas Baltzer

Successful treatment of cancer depends, among other factors, on early diagnosis. Current screening approaches could likely be improved through innovative use of the rich resources held by cancer registries in Sweden and Norway, especially patient records and biobanks, to construct models that associate potential early manifestations of cervical and breast cancer with diagnostic outcomes. These are difficult problems that require new computational approaches.
Machine learning methods that produce legible models in the form of rough-set if-then rules, combined with Monte Carlo-based feature selection, make it possible to develop new types of models. Such models may help discover complex associations between cancer outcomes and different kinds of markers, especially those used in screening (i.e. a comprehensive review of previous cytological screening history and/or non-attendance at screening) and patient record data. Karolinska Institutet will provide initial datasets on cervical cancer for piloting the models.
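As an illustration of the idea behind Monte Carlo-based feature selection, the following Python sketch scores features by repeatedly drawing random feature subsets, fitting a very simple classifier on each, and crediting every feature in a subset with the resulting test accuracy. It is a simplified stand-in, not the procedure used in the project: the function names, the majority-vote lookup-table classifier, the feature names and the synthetic data are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def table_accuracy(train, test, features):
    """Test accuracy of a majority-vote lookup table keyed on the chosen features."""
    votes = defaultdict(Counter)
    for row, label in train:
        votes[tuple(row[f] for f in features)][label] += 1
    # Fall back to the overall majority class for unseen value combinations.
    default = Counter(label for _, label in train).most_common(1)[0][0]
    hits = 0
    for row, label in test:
        key = tuple(row[f] for f in features)
        predicted = votes[key].most_common(1)[0][0] if key in votes else default
        hits += predicted == label
    return hits / len(test)


def monte_carlo_feature_scores(data, feature_names, subset_size=2,
                               n_subsets=200, seed=0):
    """Average test accuracy of the random subsets each feature took part in."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_subsets):
        subset = rng.sample(feature_names, subset_size)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = (2 * len(shuffled)) // 3  # two thirds for training, the rest for testing
        accuracy = table_accuracy(shuffled[:cut], shuffled[cut:], subset)
        for f in subset:
            totals[f] += accuracy
            counts[f] += 1
    return {f: totals[f] / counts[f] for f in feature_names if counts[f]}


# Synthetic data only (hypothetical feature names): the outcome copies
# "abnormal_smear", while the other three features are random noise.
features = ["abnormal_smear", "missed_screening", "noise_a", "noise_b"]
data_rng = random.Random(1)
data = []
for _ in range(300):
    row = {f: data_rng.randint(0, 1) for f in features}
    data.append((row, row["abnormal_smear"]))

scores = monte_carlo_feature_scores(data, features)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this toy setting the informative feature receives the highest average score, because every subset containing it classifies well regardless of which other feature it is paired with. Published Monte Carlo feature selection methods typically use decision trees and a more refined importance measure.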
We propose to develop new models for improved detection of cervical and breast cancer using the data available through Karolinska’s databases and the collection at Kreftregisteret in Norway. One major advantage of the rough set approach is that no assumptions about the underlying data distributions need to be made. Another is that even when the global classifier is of poor quality, its constituents, i.e. the individual rules, may be strong and can explain associations that hold for small, definable subgroups. We shall identify markers and time profiles that are significant for early detection through a comprehensive assessment of past screening history. The models are to be cross-validated, provided that sufficient patient profile data are available.
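To make the notion of a strong individual rule concrete, the short Python sketch below computes the support, accuracy and coverage of a single if-then rule on a small table. The records, attribute names and the rule itself are invented for illustration and carry no clinical meaning. The function `rule_quality` is a hypothetical helper, not part of any rough set toolkit.

```python
def rule_quality(records, conditions, outcome_key, outcome_value):
    """Return (support, accuracy, coverage) for the rule IF conditions THEN outcome."""
    matching = [r for r in records
                if all(r[k] == v for k, v in conditions.items())]
    support = len(matching)  # records that satisfy the IF part
    correct = sum(r[outcome_key] == outcome_value for r in matching)
    with_outcome = sum(r[outcome_key] == outcome_value for r in records)
    accuracy = correct / support if support else 0.0
    coverage = correct / with_outcome if with_outcome else 0.0
    return support, accuracy, coverage


# Invented toy records, for illustration only.
records = [
    {"attended": "no", "prior_smear": "abnormal", "outcome": "cancer"},
    {"attended": "no", "prior_smear": "abnormal", "outcome": "cancer"},
    {"attended": "no", "prior_smear": "abnormal", "outcome": "healthy"},
    {"attended": "yes", "prior_smear": "abnormal", "outcome": "healthy"},
    {"attended": "yes", "prior_smear": "normal", "outcome": "healthy"},
    {"attended": "yes", "prior_smear": "normal", "outcome": "healthy"},
    {"attended": "no", "prior_smear": "normal", "outcome": "healthy"},
    {"attended": "yes", "prior_smear": "normal", "outcome": "cancer"},
]

# IF attended = no AND prior_smear = abnormal THEN outcome = cancer
rule = {"attended": "no", "prior_smear": "abnormal"}
support, accuracy, coverage = rule_quality(records, rule, "outcome", "cancer")
print(f"support={support}, accuracy={accuracy:.2f}, coverage={coverage:.2f}")
```

Here the rule applies to only three of the eight records, which is the sense in which a rule can describe a small, definable subgroup independently of how well the full rule set classifies everyone else.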
In the longer term, we shall extend our approach to data collections available in other participating countries. The NIASC Working Group for cervical cancer has agreed to collaborate on a joint Nordic cervical screening database. Karolinska Institutet, the Norwegian Cancer Register, the Finnish Mass Screening Register and the Icelandic Cancer Society have all agreed to contribute data to the database, which will be used for this project and others. We will therefore first check whether the Swedish and Norwegian models are applicable to data from Finland, Estonia and possibly Poland. We will then construct models for each of the countries and compare them, both to identify regional epidemiological differences and to assess how well each predicts outcomes.