Prediction of Prostate Cancer Risk to Improve PSA Screening: A Statistical Modeling Approach

Project Leader: Mark Clements

PhD Student: Thorgerdur Palsdottir

Aims: Develop methods and tools to predict and simulate prostate cancer incidence and mortality, and apply those tools to reduce the harms from prostate cancer testing in the Nordic countries.
Background: Prostate cancer is the most common cancer diagnosis and the second most common cause of cancer death in the Nordic countries, constituting 32% of male cancer diagnoses and 18% of male cancer deaths in 2009. As prostate cancer accounts for 40% of prevalent male cancers in the region, the health burden and the costs to the health care system are substantial. Prostate cancer mortality rates are very high in the Nordic countries, and this pattern is unlikely to be explained by poor health care. Mortality rates have begun to decline over the last 5-10 years, which may be due to improvements in care and/or the introduction of prostate-specific antigen (PSA) testing. PSA test uptake is now high in some Nordic populations; in Stockholm, for example, 50-75% of men aged 50 years and over have had a PSA test (Nordström et al., 2013).
There is uncertainty about how best to use the PSA test. In many men it may extend life through early detection and radical treatment, but it may also lead to over-diagnosis. There is also uncertainty about the appropriate management of prostate cancer, as radical treatment has common side effects. Recent analyses of the costs and effectiveness of PSA screening for prostate cancer suggest that current PSA screening is not effective, with the harms outweighing any mortality benefit from early detection, although these conclusions are sensitive to the choice of health utilities (Chilcott, Hummel, & Mildred, 2010; Pataky et al., 2014). There is increasing evidence that combining biomarkers can improve the prediction of prostate cancer (Landers et al., 2005). The research question is whether improved biomarkers and organised prostate cancer screening can reduce the harms from screening while maintaining any mortality benefit in the Nordic countries.
To address this research question from an epidemiological perspective, the Cancer Risk Prediction (CRisP) Center in Stockholm is undertaking the STHLM3 diagnostic trial during 2013-2014. The STHLM3 study uses a paired design: men with a PSA above 1 ng/ml have a panel of biomarkers measured, and if their PSA exceeds 3 ng/ml or their biomarker panel prediction exceeds a threshold c, they are referred to biopsy to determine their cancer status. Men not referred to biopsy will then be randomised to one of two re-screening protocols. The aim of the study is to maintain comparable sensitivity for advanced prostate cancer while improving specificity, so that fewer men are referred to biopsy. CRisP has also linked all PSA tests and prostate biopsies in Stockholm for the period 2003-2012 to the population and health registers, which allows a detailed description of prostate cancer testing in a Nordic city.
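The referral rule of the paired design can be sketched as a small function. This is a minimal illustration that assumes the thresholds exactly as stated above (1 ng/ml for measuring the panel, 3 ng/ml for referral on PSA alone); the value of the panel threshold c, the `Participant` record and all function names are hypothetical, and the sketch is not the trial's actual implementation or risk model. Because every man above 1 ng/ml has both PSA and the panel measured, referrals under each criterion can be counted in the same men, which is what makes the design paired.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PANEL_PSA_THRESHOLD = 1.0   # ng/ml: above this, the biomarker panel is measured
BIOPSY_PSA_THRESHOLD = 3.0  # ng/ml: above this, referral follows from PSA alone


@dataclass
class Participant:
    psa: float                    # PSA in ng/ml
    panel_score: Optional[float]  # hypothetical panel prediction; None if not measured


def panel_positive(p: Participant, c: float) -> bool:
    """Panel criterion: only men above the panel PSA threshold have a score."""
    return p.psa > PANEL_PSA_THRESHOLD and p.panel_score is not None and p.panel_score > c


def refer_to_biopsy(p: Participant, c: float) -> bool:
    """Refer if PSA exceeds 3 ng/ml or the panel prediction exceeds c."""
    return p.psa > BIOPSY_PSA_THRESHOLD or panel_positive(p, c)


def referral_counts(men: List[Participant], c: float) -> Tuple[int, int, int]:
    """Referrals by PSA alone, by the panel alone, and by either criterion."""
    by_psa = sum(m.psa > BIOPSY_PSA_THRESHOLD for m in men)
    by_panel = sum(panel_positive(m, c) for m in men)
    by_either = sum(refer_to_biopsy(m, c) for m in men)
    return by_psa, by_panel, by_either


# Hypothetical example cohort and threshold (invented values).
cohort = [
    Participant(0.8, None),   # below 1 ng/ml: no panel, no referral
    Participant(2.0, 0.05),   # panel below c: no referral
    Participant(2.5, 0.30),   # panel above c: referred
    Participant(4.0, 0.02),   # PSA above 3 ng/ml: referred
    Participant(5.0, 0.40),   # both criteria met: referred
]
print(referral_counts(cohort, c=0.10))  # (2, 2, 3)
```

Comparing the first two counts across many men is what would show whether the panel can replace the PSA cut-off with fewer referrals.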
To address the research question from an eScience perspective, we aim to improve the tools used for predicting prostate cancer and to develop tools for microsimulation of prostate cancer testing, incidence, mortality and effectiveness in the Nordic countries. In particular, we want to predict the effectiveness of different screening scenarios without performing a separate randomised controlled trial for each, by evaluating the scenarios using in silico simulations.
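To illustrate what an in silico comparison of screening scenarios involves, the following is a toy individual-level microsimulation. It is a sketch under strong simplifying assumptions, not the model to be developed: each simulated man has an exponentially distributed age at onset of screen-detectable cancer and an exponentially distributed sojourn time before clinical diagnosis, screening tests have a fixed sensitivity and no false positives, follow-up ends at a fixed age, and mortality is not modelled. All parameter values and names (`simulate`, `onset_rate`, `mean_sojourn` and so on) are hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class Outcome:
    screen_detected: int = 0  # cancers found by a screening test
    clinical: int = 0         # cancers diagnosed after symptoms
    overdiagnosed: int = 0    # screen-detected cancers that would not surface by max_age
    tests: int = 0            # screening tests performed


def simulate(screen_ages, n=10_000, seed=1, onset_rate=0.01,
             mean_sojourn=5.0, sensitivity=0.8, max_age=85.0):
    """Toy natural-history microsimulation of one screening scenario.
    All parameter values are hypothetical, not calibrated estimates."""
    life_rng = random.Random(seed)      # natural history: identical across scenarios
    test_rng = random.Random(seed + 1)  # test outcomes: separate random stream
    out = Outcome()
    for _ in range(n):
        onset = 40.0 + life_rng.expovariate(onset_rate)  # age at preclinical onset
        clinical_age = onset + life_rng.expovariate(1.0 / mean_sojourn)  # symptomatic age
        detected = False
        for age in sorted(screen_ages):
            if age >= clinical_age or age > max_age:
                break  # already diagnosed clinically, or follow-up has ended
            out.tests += 1
            if onset <= age and test_rng.random() < sensitivity:
                detected = True
                out.screen_detected += 1
                if clinical_age > max_age:
                    out.overdiagnosed += 1
                break
        if not detected and clinical_age <= max_age:
            out.clinical += 1
    return out


scenarios = {
    "no screening": [],
    "biennial 50-68": list(range(50, 70, 2)),
    "quadrennial 50-66": list(range(50, 70, 4)),
}
for name, ages in scenarios.items():
    print(name, simulate(ages))
```

The natural history is drawn from its own random stream, so every scenario is applied to the same simulated men (common random numbers), which reduces sampling noise in between-scenario differences. Screen-detected cancers that would not have surfaced clinically before the end of follow-up are counted as over-diagnosed.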

Data and text mining of cancer symptoms and comorbidities in electronic patient records in the Nordic languages, MINECAN

Project Leader: Hercules Dalianis

PhD Student: Rebecka Weegar

Several eScience tools have been proposed within the NIASC work plan, such as C1 (Patient record text mining applied for research on screening) and B1 (Nordic biobank registry). Our proposal addresses these, using text mining to convert unstructured information into structured information. Our results will also address the requirements for open eScience tools. In our proposal, we have connected data and text mining techniques from Sweden and Denmark to stakeholders in Sweden, Denmark and Norway.
The PhD project is split into several subprojects. The first subproject concerns cervical cancer, while the second targets prostate cancer; these two subprojects constitute the main focus of the PhD project.
Cervical cancer (ICD-10 code C53) develops in the tissues of the cervix and is, in the majority of cases, caused by human papillomavirus (HPV) infection. Cervical cancer is difficult to diagnose early because its symptoms are vague, and by the time women notice symptoms the cancer has often already spread. One goal of this subproject is to apply text mining methods to identify early symptoms of cervical cancer, whether already known or previously unknown. With the help of such an established symptom spectrum, cytology screening could be complemented by questionnaires to strengthen diagnosis.
Prostate cancer (ICD-10 code C61) is the most common cancer diagnosis in the Nordic countries, and in 2009 it was the second most common cause of cancer death there. As it accounts for 40% of prevalent male cancers in the region, the health burden and costs are substantial. Prostate biopsies, in which samples of tissue are removed from the prostate, are associated with adverse outcomes and substantial costs. The goal of this subproject is to identify comorbidities associated with biopsies, i.e., additional disorders or diseases that co-occur with a primary disease or disorder.
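Lexicon lookup combined with a simple negation check is one common baseline for finding symptom mentions in clinical free text. The sketch below, loosely in the style of NegEx, assumes a small hypothetical Swedish lexicon, a hypothetical list of negation cues and a fixed window of three preceding tokens; none of these come from the project. It can only find symptoms already in the lexicon, so discovering previously unknown symptoms would require other methods.

```python
import re

# Hypothetical mini-lexicon mapping Swedish surface forms to symptom concepts.
SYMPTOM_LEXICON = {
    "blödning": "bleeding",
    "blödningar": "bleeding",
    "smärta": "pain",
    "smärtor": "pain",
    "flytningar": "discharge",
}
# Hypothetical negation cues (a small subset; real systems use curated lists).
NEGATION_CUES = {"ingen", "inga", "inte", "ej", "utan"}
WINDOW = 3  # number of preceding tokens in which a negation cue applies


def extract_symptoms(note: str):
    """Return (concept, negated) pairs for lexicon terms found in a note."""
    tokens = re.findall(r"\w+", note.lower())  # \w matches å, ä, ö in Python 3
    found = []
    for i, token in enumerate(tokens):
        concept = SYMPTOM_LEXICON.get(token)
        if concept is None:
            continue
        preceding = tokens[max(0, i - WINDOW):i]
        negated = any(t in NEGATION_CUES for t in preceding)
        found.append((concept, negated))
    return found


note = "Patienten har haft blödning efter samlag men ingen smärta."
print(extract_symptoms(note))  # [('bleeding', False), ('pain', True)]
```

A fixed window is the simplest scoping choice. It ignores clause boundaries such as the conjunction "men" ("but"), which practical systems handle with explicit termination cues.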
The third subproject targets the conversion of free-text pathology referrals into a structured format for registers, using text mining and machine learning techniques.

Machine learning methods for cancer risk stratification

Project Leader: Jan Komorowski

PhD Student: Nicholas Baltzer

Successful treatment of cancer depends, among other things, on early recognition and diagnosis. An improvement over current screening approaches is likely to be achieved through innovative use of the rich resources provided by cancer registries in Sweden and Norway (especially patient records and biobanks) to construct models that associate potential early manifestations of cervical and breast cancer with diagnostic outcomes. These are thorny issues that require new computational approaches.
Machine learning methods that produce legible models in the form of rough-set if-then rules, combined with Monte Carlo-based feature selection techniques, make it possible to develop new types of models that may help discover complex associations linking cancer outcomes to different sorts of markers, in particular screening markers (e.g. a comprehensive review of previous cytological screening history and/or non-attendance at screening) and patient record data. Karolinska Institutet will provide initial datasets on cervical cancer to pilot the models.
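Monte Carlo feature selection repeatedly draws random feature subsets and random training/test splits, fits a classifier to each, and credits the features that take part in accurate classifiers. The sketch below is a much-simplified Monte Carlo feature ranking in that spirit, not the MCFS algorithm itself, which builds decision trees and derives a relative-importance score from them. Here each subset is scored by a hypothetical lookup-table classifier, and a feature's score is the mean hold-out accuracy of the subsets it appeared in. The data, feature names and parameter values are invented.

```python
import random
from collections import Counter, defaultdict


def lookup_accuracy(train, test, features, label):
    """Hold-out accuracy of a lookup-table classifier on `features`: predict the
    majority training label among rows with identical values, falling back to
    the overall training majority for unseen value combinations."""
    table = defaultdict(Counter)
    for row in train:
        table[tuple(row[f] for f in features)][row[label]] += 1
    fallback = Counter(row[label] for row in train).most_common(1)[0][0]
    correct = 0
    for row in test:
        key = tuple(row[f] for f in features)
        pred = table[key].most_common(1)[0][0] if key in table else fallback
        correct += pred == row[label]
    return correct / len(test)


def monte_carlo_feature_scores(rows, features, label, n_subsets=100,
                               subset_size=2, n_splits=5, train_frac=0.66, seed=0):
    """Mean hold-out accuracy of the random feature subsets each feature was in."""
    rng = random.Random(seed)
    total, seen = Counter(), Counter()
    for _ in range(n_subsets):
        subset = rng.sample(features, subset_size)
        for _ in range(n_splits):
            shuffled = rows[:]
            rng.shuffle(shuffled)
            cut = int(train_frac * len(shuffled))
            acc = lookup_accuracy(shuffled[:cut], shuffled[cut:], subset, label)
            for f in subset:
                total[f] += acc
                seen[f] += 1
    return {f: total[f] / seen[f] for f in features if seen[f]}


# Invented toy data: one informative feature and three noise features.
data_rng = random.Random(42)
rows = []
for _ in range(90):
    signal = data_rng.randint(0, 1)
    rows.append({
        "signal": signal,                   # determines the outcome
        "noise_a": data_rng.randint(0, 2),  # unrelated to the outcome
        "noise_b": data_rng.randint(0, 2),
        "noise_c": data_rng.randint(0, 2),
        "outcome": signal,
    })
FEATURES = ["signal", "noise_a", "noise_b", "noise_c"]
scores = monte_carlo_feature_scores(rows, FEATURES, "outcome")
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

The informative feature should rank first, because every subset containing it classifies the toy outcome almost perfectly, whereas noise-only subsets perform near chance.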
We propose to develop new models for improved detection of cervical and breast cancer using the data available through Karolinska’s databases and the collection at Kreftregistret in Norway. One of the major advantages of the rough set approach is that no assumptions about the distribution of the data need to be made. Another advantage is that even when the global classifier is of poor quality, its constituents, i.e. individual rules, may be strong and can explain associations that hold for small, definable subgroups. We shall identify markers and time profiles that are significant in early detection through a comprehensive assessment of past screening history. The models are to be cross-validated, provided that sufficiently large patient profile datasets are available.
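The core rough set notions behind such rules can be shown on a toy decision table. The sketch below assumes two hypothetical binary screening-history attributes and a binary outcome (invented data, not from any register). It computes the indiscernibility classes, the lower and upper approximations of the outcome set, and one if-then rule per class with its support and accuracy. It does not compute reducts or shorten rules, as full rough set toolkits do, and ties between outcomes are broken by first occurrence.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attrs):
    """Group object indices by their values on the condition attributes."""
    classes = defaultdict(list)
    for i, obj in enumerate(objects):
        classes[tuple(obj[a] for a in attrs)].append(i)
    return classes


def approximations(objects, attrs, decision, value):
    """Lower and upper approximations of the set of objects with decision == value."""
    target = {i for i, obj in enumerate(objects) if obj[decision] == value}
    lower, upper = set(), set()
    for members in indiscernibility_classes(objects, attrs).values():
        block = set(members)
        if block <= target:  # class lies entirely inside the target: certain members
            lower |= block
        if block & target:   # class overlaps the target: possible members
            upper |= block
    return lower, upper


def class_rules(objects, attrs, decision):
    """One rule 'IF attrs == values THEN decision' per class, with support and accuracy."""
    rules = []
    for key, members in indiscernibility_classes(objects, attrs).items():
        counts = defaultdict(int)
        for i in members:
            counts[objects[i][decision]] += 1
        outcome, hits = max(counts.items(), key=lambda kv: kv[1])
        rules.append((dict(zip(attrs, key)), outcome, len(members), hits / len(members)))
    return rules


# Invented toy decision table with hypothetical screening-history attributes.
records = [
    {"prior_abnormal": "yes", "missed_screen": "yes", "cancer": "yes"},
    {"prior_abnormal": "yes", "missed_screen": "yes", "cancer": "yes"},
    {"prior_abnormal": "yes", "missed_screen": "no", "cancer": "yes"},
    {"prior_abnormal": "yes", "missed_screen": "no", "cancer": "no"},
    {"prior_abnormal": "no", "missed_screen": "yes", "cancer": "no"},
    {"prior_abnormal": "no", "missed_screen": "no", "cancer": "no"},
    {"prior_abnormal": "no", "missed_screen": "no", "cancer": "no"},
]
ATTRS = ["prior_abnormal", "missed_screen"]

lower, upper = approximations(records, ATTRS, "cancer", "yes")
print(sorted(lower), sorted(upper))  # [0, 1] [0, 1, 2, 3]
for conditions, outcome, support, accuracy in class_rules(records, ATTRS, "cancer"):
    print(f"IF {conditions} THEN cancer={outcome} (support={support}, accuracy={accuracy:.2f})")
```

Rules with accuracy 1.0 come from classes inside a lower approximation and are certain; the class (prior_abnormal=yes, missed_screen=no) lies in the boundary region, so its rule is only possible. No distributional assumption enters at any step, only set operations on the observed table.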
In the longer term, we shall extend our approach to data collections available in other participating countries. The NIASC Working Group for cervical cancer has agreed to collaborate on a joint Nordic cervical screening database. Karolinska Institutet, the Norwegian Cancer Register, the Finnish Mass Screening Register and the Icelandic Cancer Society have all agreed to contribute data to the database, which will be exploited for this project and others. We will first check whether the Swedish and Norwegian models are applicable to data from Finland, Estonia and possibly Poland. We will then construct models for each country and compare them, both to find regional epidemiological differences and with respect to how well they predict outcomes.