Nordic Exome Variant Catalogue

Project leader: Aarno Palotie

Postdoc: Priit Palta

The Exome Aggregation Consortium (ExAC) coding variant dataset has proven to be an extremely useful resource and tool for research in human medical and population genetics. We will create a similar data resource for the Nordic countries, including Estonia. By providing population-specific (and, through requested look-ups, also individual-level) allele frequencies for human coding variants, this eScience data resource would greatly benefit the scientific community in human genetics and genomics, and would thereby increase the competitive advantage of genetic research aimed at improving human health in all Nordic countries. Our aim is to create the Nordic Exome Variant Catalogue by collecting available human exome variant data from all participating countries (initially data for 15 000, 5000 and 5000 whole-exome sequenced individuals from Finland, Estonia and Norway, respectively). The collected data will be aggregated by population (country) while maintaining the cohort-specific information. These data will be stored on a secure database server at CSC (Finnish IT Center for Science).
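As a minimal sketch of the aggregation step described above, the following Python snippet pools per-cohort allele counts into population-specific allele frequencies while keeping the cohort-level frequencies. The record layout, the function name `aggregate_allele_frequencies`, the variant identifier and all counts are illustrative assumptions, not the project's actual schema or data.

```python
from collections import defaultdict

def aggregate_allele_frequencies(records):
    """Pool per-cohort allele counts into per-population frequencies.

    Each record is a dict with hypothetical keys: variant, population,
    cohort, alt_count (alternate allele count) and allele_number
    (total number of called alleles).
    """
    totals = defaultdict(lambda: {"ac": 0, "an": 0, "cohorts": {}})
    for r in records:
        if r["allele_number"] == 0:
            continue  # no called genotypes in this cohort, nothing to add
        entry = totals[(r["variant"], r["population"])]
        entry["ac"] += r["alt_count"]
        entry["an"] += r["allele_number"]
        # Keep the cohort-specific frequency alongside the pooled one.
        entry["cohorts"][r["cohort"]] = r["alt_count"] / r["allele_number"]
    return {
        key: {"af": e["ac"] / e["an"], "cohorts": e["cohorts"]}
        for key, e in totals.items()
    }

# Made-up example counts for a single variant in three cohorts.
records = [
    {"variant": "1:12345:A:G", "population": "Finland", "cohort": "cohort_A",
     "alt_count": 30, "allele_number": 2000},
    {"variant": "1:12345:A:G", "population": "Finland", "cohort": "cohort_B",
     "alt_count": 10, "allele_number": 2000},
    {"variant": "1:12345:A:G", "population": "Estonia", "cohort": "cohort_C",
     "alt_count": 5, "allele_number": 1000},
]
result = aggregate_allele_frequencies(records)
print(result[("1:12345:A:G", "Finland")])
```

Pooling counts rather than averaging cohort frequencies weights each cohort by its sample size, which is one reasonable choice among several.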

Outcome 1: Allele frequency database publicly available to the scientific community
For this, we plan to update our existing web-based user interface (the SISu variant browser) so that it allows searching and browsing the collected data as population-specific summary data: allelic distribution information for all coding variants together with corresponding functional and variant effect/consequence annotations.

Outcome 2: Access routine for individual-level WES data
We will implement an integrated inquiry system that, for a small fee (to ensure sustainability), would facilitate requesting access to the cohort-specific and individual-level genotype and phenotype data directly from the population/cohort ‘owner’ and the corresponding data access committee (DAC).

How individual consumer patterns could uncover lifestyle determinants of health and disease

Project leader: Mads Melbye

Postdoc: Frederik Trier Møller

The health impact of most products we buy daily is currently unknown, owing to a lack of access to continuously collected, detailed lifestyle data. In this project we will use continuously collected but currently unused electronic customer receipts to model lifestyle determinants of disease onset and behavior, and to improve lifestyle advice. Initially, we aim to identify lifestyle determinants of flares in complex inflammatory diseases, such as multiple sclerosis and inflammatory bowel disease. Approximately 5% of the Danish population suffer from chronic autoimmune diseases affecting many young people, with the majority being female. Individuals affected by autoimmune diseases have an unmet need for lifestyle advice that could improve the disease course. To clarify the underlying causes of complex autoimmune disease, it is crucial to collect continuous, highly detailed lifestyle data to assess exposure to nutrients, chemicals and additives in daily life. Combined with an assessment of disease activity, these data allow the lifestyle determinants of disease flares to be assessed. This project leverages the existing Danish framework for conducting population-based studies of disease exposures and outcomes, together with a highly digitalized retail sector.

Data from customer receipts are, with the consumer's consent, already continuously collected and used in targeted marketing. Recently published research suggests that customer receipt data reflect individual lifestyle. To date, more than 100 000 products in Denmark have a digital ID, which promotes upstream traceability. Each receipt is highly individual, as the consumer chooses a subset of products out of all products available, in effect answering an intentional yes/no questionnaire of more than 1000 items. This multiple-item answer is repeated continuously throughout life, underlining the potential depth of knowledge to be extracted from consumer receipts. Furthermore, each product may in time be characterized in detail, including unintended contents. As such, the ratio of potential public health knowledge to cost seems favorable for consumer receipts compared with traditional means of collecting lifestyle information to study medium- to long-term outcomes. By analyzing customer receipt data at the household level and subdividing each product into its constituents, we can achieve unparalleled detail on individual exposures. In addition, Denmark offers an excellent research framework for assessing key health-related outcomes such as death, medication, and surgery. We will monitor disease outcomes of Danish participants affected by either of the aforementioned diseases, using information from national registries, patient files and routine samples. Digital informed consent secures ethical, safe and transparent implementation of the project.
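The two ideas in the paragraph above, a receipt read as a yes/no questionnaire and products subdivided into constituents at household level, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the product names, and the constituent amounts are all hypothetical and do not come from the project.

```python
from collections import Counter

def receipt_to_answers(receipt_items, catalogue):
    """Encode one receipt as yes/no answers over the product catalogue."""
    bought = set(receipt_items)
    return {product: product in bought for product in catalogue}

def household_exposure(receipts, constituents):
    """Sum constituent amounts over all receipts of one household.

    receipts: list of dicts mapping product -> units bought.
    constituents: dict mapping product -> {constituent: amount per unit}.
    Products without constituent data are skipped.
    """
    exposure = Counter()
    for receipt in receipts:
        for product, units in receipt.items():
            for name, amount in constituents.get(product, {}).items():
                exposure[name] += units * amount
    return dict(exposure)

# Hypothetical catalogue and made-up constituent amounts (grams per unit).
catalogue = ["bread", "soda", "apples"]
constituents = {
    "soda": {"sugar_g": 35.0},
    "bread": {"sugar_g": 5.0, "salt_g": 2.0},
}
answers = receipt_to_answers(["soda"], catalogue)
exposure = household_exposure([{"soda": 2, "bread": 1}, {"soda": 1}], constituents)
print(answers)
print(exposure)
```

A real pipeline would also need to handle product identifiers, package sizes and household composition, none of which are modeled here.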

Combined, these data sources enable participating researchers to reach the initial main objective: to provide targeted prevention of disease flares in complex autoimmune disease.


NORDSCREEN

Project leader: Ahti Anttila

Postdoc: Veli-Matti Partanen

NORDSCREEN (Interactive joint NORDic database on performance and outcome indicators of cervical cancer SCREENing) aims to develop a publicly available web-based interactive tool/application to access standardised and, as far as possible, evidence-based performance and outcome indicators of cervical cancer screening, based upon up-to-date Nordic cancer screening register data. The project will develop and implement a standardised set of readily available key performance and outcome indicators, and a set of scripts for standardised retrieval of the required data items from the different collaborating screening registers. This will greatly ease the provision of relevant and tailored data to decision-makers, the press, and screening providers (as exemplified by NORDCAN and GLOBOCAN in the field of cancer burden).

SOIGNONS: Societal Games to Nudge People into Attending Cervical Cancer Screening

Project Leader: Mari Nygård

PostDoc: Tomás Ruiz-López

SOIGNONS is an eScience initiative to virally communicate health information concerning cervical cancer via gamification in mobile games. Games with a purpose will present information as thought-provoking puzzles and incentives to a large number of smartphone users. These games will create awareness among individuals, including sub-optimally screened groups, via puzzles encompassing popularized scientific evidence about the prevention of cervical cancer (and eventually other cancers). Incentives in the game, based on an individual’s performance, will empower him/her to share electronic invitations to cancer screening via social platforms such as Facebook and Google+, and even basic SMS, for maximum outreach. We expect gamification to create an evidence-based medium for mobile users to nudge women in their families and friend circles, and to serve as a general societal empowerment tool to reach out to people who have not been screened. Social nudging lies in the very nature of human behavior: we are actively involved in the lives of our friends and relatives, discussing lifestyle choices. SOIGNONS will boost nudging based on accurate scientific evidence, which will lead to improvements in overall societal health, increase necessary visits to doctors, and encourage better lifestyle choices. About 50% of cervical cancers in Norway arise among the 20% of the population that is sub-optimally screened [4]. Games developed in SOIGNONS will present users with evidence and incentives to target this population. Multilingual support for English, Icelandic, and Norwegian in the games will help reach the largest possible number of users, and the game can be downloaded for free from the App Store. SOIGNONS has the following specific aims:
Specific Aim 1: Societal game design and content creation: Health-related scientific evidence will be mapped onto puzzles and incentives in a game suite called Solve Cervical Cancer (SCC). SCC will have variants based on familiar classic games such as trivia, hangman and detective games, which are played widely by people of different ages on mobile phones. Scientific experts in public health will develop content for the game puzzles from scientific evidence, using an online web application. Users will play games to gain specific knowledge and will be empowered with invites to nudge their own social circle to attend screening (including by self-sampling).
Specific Aim 2: Incentives and dissemination of game for social nudging: The game will be distributed through app stores for both iPhone and Android and will spread virally as users invite others. Incentives in the game will be designed to prompt the player to encourage screening in his/her own family/friend circle, reaching those who do not use modern mobile phones via SMS. Those who are in the target group for cervical cancer screening will be able to access their own data in the screening program database (login through Bank ID).
Specific Aim 3: Evaluation: Attitude, knowledge and user friendliness will be evaluated by analyzing the large amount of user data generated and stored on a cloud server. The pattern of spread of the app in the population will be evaluated using information on how many invites were sent out and accepted, and we will collect user feedback on whether the user has been able to nudge a friend or relative to go to screening. Using the screening registry, we will evaluate how many users queried their own screening history. Subsequent screening attendance will be observed via registry linkage, allowing us to address aspects of the effect of the game on sub-optimally screened groups.

Nordic Biobank Registers

Project Leader: Mads Melbye

PostDoc: Xueping Liu

We will focus on the existing and fully working Danish Biobank Register system. This system, available online since 2012, supports register-based health-related research with a very flexible and quick search functionality to view biological materials stored in various biobanks in Denmark, Greenland and the Genetic Biobank in the Faroe Islands. The Biobank Register integrates data from, for example, the Danish Patient Registry (including the Danish Cancer Registry), the Danish Pathology Registry, and the Danish Civil Registration Registry, and automatically matches them with the biological material from individuals, including blood, tissue and other sample types. Presently the Biobank Register allows searches among 15.5 million samples from over 5.1 million Danish individuals. The register points to biological specimens from, for example, 568 000 persons who have been diagnosed with cancer. A large collection of genotyped samples exists here.
The Janus Serum Bank is a population-based biobank reserved for cancer research. The specimens were collected from 1972 to 2004 and are stored at −25 °C. The samples originate from 317 000 persons in Norway who have participated in health studies, and also from blood donors in and around Oslo. Today, samples are only collected from earlier donors in the Janus Serum Bank who have developed cancer. The Bank is internationally unique regarding size and number of cancer cases. Annual linkage to the Cancer Registry shows that 61 000 donors had been diagnosed with cancer as of December 31, 2011.
HUNT Biobank is the biobank for the comprehensive and longitudinal HUNT study as well as a national biobank for Cohorts of Norway (CONOR), with DNA samples from 250 000 participants from the large Norwegian Health Surveys gathered at one physical site. In total, more than 107 000 unique participants have contributed biosamples, many with multiple samples from different time points stored in the biobank. As of 2010, approximately 8000 HUNT participants and 15 000 participants from the CONOR studies had developed cancer.
In HUNT, an interactive single-nucleotide polymorphism (SNP) database has recently been established where researchers can look up specific SNPs available across the different genotyping efforts based on sample collections (studies) at the HUNT biobank. This solution dynamically connects all aspects of genotype data, including study characteristics, genotyping technologies, and minor allele frequencies of relevant SNPs.
The existing systems on which we would like to focus are the Danish Biobank Register, the HUNT Study, the Norwegian Cancer Registry and the Janus Biobank. However, during the study period we will also seek to start including other Nordic biobanks in the programme.

Developing an efficient imputation pipeline to construct near-complete genome variant data in GWAS datasets

Project Leader: Aarno Palotie

PostDoc: Priit Palta

The project aims to use population-specific whole-genome and whole-exome sequence data as a backbone for imputing low-frequency variants in Estonian and Finnish population cohort GWAS data, and to use the imputed data to study register-based diagnostic outcomes such as cancers and comorbidities.
Over the past eight years, genome-wide variant data have been accumulated from large sample collections at all NIASC sites. Currently the two performance sites, Tartu and Helsinki, have accumulated GWAS data from 70 000 Finnish and 20 000 Estonian individuals and whole-exome or whole-genome sequence data from 16 000 Finnish and 1800 Estonian individuals. These large datasets provide a substantial resource for association studies. To use all genome-wide variant data efficiently, we would also like to include low-frequency variants in the outcome association analysis. To achieve this, we have to impute the non-genotyped variants in the GWAS data. Although the HapMap and 1000 Genomes data provide a foundation and a standardized imputation backbone, there is increasing evidence that these panels are not sufficient for low-frequency variants. Population-specific sequence data substantially improve the imputation accuracy for variants that have a population frequency under 5%. Estonia and Finland are historically and linguistically closely related. Comparing low-frequency variant association data between these two populations is thus especially interesting and potentially beneficial. As replication is challenging for low-frequency variants, we hypothesize that similarly imputed datasets from two ethnically related countries would be helpful, since shared haplotypes are more likely.
The Haplotype Reference Consortium, led by Goncalo Abecasis, Jonathan Marchini and Richard Durbin, is currently constructing a haplotype catalogue based on available whole-genome data. This will further improve our imputation accuracy. However, as most of the haplotype project uses low-coverage sequence data (2-6X), the calling accuracy for rare variants will still be limited. This is especially challenging for variants that are rare in the general European population but are enriched through bottleneck effects in either the Finnish or Estonian populations. As is well documented, the Finnish bottleneck effects are strong, resulting in enrichment of some low-frequency variants that are very rare elsewhere. Some of these variants contribute to disease phenotypes but are so rare in most populations that they are not within reach of disease association studies. However, when enriched in an isolate like Finland, their frequency might be boosted to 0.5-5%, as demonstrated in Figure 2 below, so that they become analyzable disease association targets. Of special interest is that within the range of 0.5-5% population frequency in Finland there is an excess of loss-of-function (LoF) variants. LoFs are of special interest in association studies as they represent human knockouts. In our recent study by Lim et al. (PLoS Genetics 2014 Jul 31;10(7):e1004494) we analyzed 83 LoF variants enriched in Finland and linked them to National Health Record data. We identified several disease associations, including a LoF in the LPA gene that is protective for coronary heart disease. Protective LoFs are interesting potential drug targets and thus of special value.
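To make the notion of enrichment concrete, the sketch below flags variants whose frequency in an isolate falls within the 0.5-5% range mentioned above and exceeds the frequency in a reference population by some factor. It is an illustration only: the function name `enriched_variants`, the enrichment threshold `min_ratio`, and the example frequencies are assumptions and are not taken from the cited study.

```python
def enriched_variants(frequencies, min_af=0.005, max_af=0.05, min_ratio=2.0):
    """Return variants in the min_af..max_af range that are enriched.

    frequencies maps a variant id to a pair
    (allele frequency in the isolate, allele frequency in the reference).
    min_ratio is a hypothetical enrichment threshold.
    """
    hits = {}
    for variant, (af_isolate, af_reference) in frequencies.items():
        if not (min_af <= af_isolate <= max_af):
            continue  # outside the analyzable frequency window
        # A variant absent from the reference counts as infinitely enriched.
        ratio = float("inf") if af_reference == 0 else af_isolate / af_reference
        if ratio >= min_ratio:
            hits[variant] = ratio
    return hits

# Made-up frequencies: (isolate, reference).
freqs = {
    "var_a": (0.02, 0.001),      # in range and strongly enriched
    "var_b": (0.03, 0.03),       # in range but not enriched
    "var_c": (0.0001, 0.00001),  # enriched but still too rare
}
hits = enriched_variants(freqs)
print(sorted(hits))
```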

Computational methods for genetic cancer susceptibility analysis

Project Leader: Lauri Aaltonen

PostDoc: Kimmo Palin

The aims of the project are to develop methods (i) for computational annotation and visualisation of DNA variants found in Whole-Genome Sequencing (WGS) studies, (ii) for genetic association and linkage studies in structured populations, (iii) for subclassification of cancer patients based on their constitutional genetic and environmental attributes and the attributes of their tumor and (iv) for detection of gene-environment and gene-gene interactions for rare cancer risk variants. These aims are tightly aligned with the methodological needs of the two host groups.
Both of the host laboratories are currently undertaking large-scale sequencing and genotyping projects. The Helsinki group is focusing on detailed genome sequencing of ~250 individual Finnish colorectal cancer patients and their tumors, whereas the Trondheim group is sequencing several thousand Norwegian individuals from the HUNT cohort in lower detail but with wider representation of the general population. In addition, both groups are genotyping significantly larger sets of individuals from the same cohorts. These large data production projects have already required substantial methods development (e.g. RikuRator, SLRP: Systematic Long Range Phasing). The host groups are well prepared to provide mentoring for leading-edge methods development. The UH group employs three computer science PhDs and maintains close collaboration with the UH Computer Science department, also part of the Finnish Centre of Excellence in Cancer Genetic Research. The project has access to a substantial high-performance computing and data storage environment provided by CSC – IT Center for Science Ltd, enabling the use of very large datasets and resource-intensive computation. The current setup includes 1277 CPU cores and 405 terabytes of storage. The combination of the two genetic and epidemiological data sources from separate but closely related populations provides great potential for detecting cancer-relevant genetic variants, but simultaneously requires novel methods development to be leveraged fully. The differential structure between the populations enables teasing apart the causative variants from the bystanders, while the close relationship makes it more likely that the same causative variant is present in both populations. The detailed clinical information available for the UH samples provides an opportunity to discover subclasses of patients with potentially altered disease etiology, and the practicality of the discoveries can be rapidly tested in the NTNU-HUNT set.