Computational methods for genetic cancer susceptibility analysis

Project Leader: Lauri Aaltonen

PostDoc: Kimmo Palin

The project aims to develop methods (i) for computational annotation and visualisation of DNA variants found in Whole-Genome Sequencing (WGS) studies, (ii) for genetic association and linkage studies in structured populations, (iii) for subclassification of cancer patients based on their constitutional genetic and environmental attributes and the attributes of their tumors, and (iv) for detection of gene-environment and gene-gene interactions for rare cancer risk variants. These aims are closely aligned with the methodological needs of the two host groups.

Both host laboratories are currently undertaking large-scale sequencing and genotyping projects. The Helsinki (UH) group is focusing on detailed genome sequencing of ~250 individual Finnish colorectal cancer patients and their tumors, whereas the Trondheim (NTNU) group is sequencing several thousand Norwegian individuals from the HUNT cohort in less detail but with broader representation of the general population. In addition, both groups are genotyping significantly larger sets of individuals from the same cohorts. These large data production projects have already required substantial methods development (e.g. RikuRator, SLRP: Systematic Long Range Phasing).

The host groups are well prepared to provide mentoring for leading-edge methods development. The UH group employs three computer science PhDs and maintains close collaboration with the UH Computer Science department, which is also part of the Finnish Centre of Excellence in Cancer Genetic Research. The project has access to a substantial high-performance computing and data storage environment provided by CSC – IT Center for Science Ltd, enabling the use of very large datasets and resource-intensive computation. The current setup includes 1277 CPU cores and 405 terabytes of storage.

The combination of the two genetic and epidemiological data sources from separate but closely related populations provides great potential for detecting cancer-relevant genetic variants, but it also requires novel methods development to be leveraged fully. The differing structure of the two populations helps separate causative variants from bystanders, while their close relationship makes it more likely that the same causative variant is present in both. The detailed clinical information available for the UH samples provides an opportunity to discover subclasses of patients with potentially altered disease etiology, and the practical relevance of the discoveries can be rapidly tested in the NTNU-HUNT set.