Background: Tobacco smoking is the main risk factor for lung cancer. Yet some heavy smokers remain free of lung cancer into old age, while others develop it at a young age. Here, we assess for the first time the genetic background of these clinically relevant extreme phenotypes using whole exome sequencing (WES).
Methods: We performed WES of germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases, n=50) or remained free of lung adenocarcinoma and other tumors at an advanced age (extreme controls, n=50). We selected non-synonymous variants in exonic regions and consensus splice sites whose allele frequencies differed significantly between the two cohorts. We validated our results in all additional extreme cases (i.e., heavy smokers who developed lung adenocarcinoma at an early age) available from The Cancer Genome Atlas (TCGA).
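The comparison of allele frequencies between cohorts can be illustrated with an allelic association test. The abstract does not state which test was used, so the sketch below assumes a two-sided Fisher's exact test on a 2x2 table of allele counts, with each of the 50 individuals per cohort contributing two alleles (100 alleles per cohort). The function name fisher_exact_two_sided and the example counts are hypothetical and are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      row 1 = extreme cases:    a alternate alleles, b reference alleles
      row 2 = extreme controls: c alternate alleles, d reference alleles
    """
    row1 = a + b          # total alleles in cases
    row2 = c + d          # total alleles in controls
    col1 = a + c          # total alternate alleles
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the case row holds x alternate alleles
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical variant: 30/100 alternate alleles in cases vs 10/100 in controls
    print(fisher_exact_two_sided(30, 70, 10, 90))
```

In practice such a test would be run once per candidate variant, producing one raw p-value per variant for the subsequent multiple-testing step.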
Results: The mean ages of the extreme cases and controls were 49.7 and 77.5 years, respectively, and their mean tobacco consumption was 43.6 and 56.8 pack-years, respectively. We identified 619 variants with significantly different frequencies between the two cohorts and validated 108 of them in extreme cases selected from TCGA. Nine validated variants, located in relevant cancer-related genes such as PARP4, HLA-A and NQO1, remained statistically significant after False Discovery Rate (FDR) correction. The most significant validated variant (P = 4.48 × 10⁻⁵) was located in the tumor-suppressor gene ALPK2.
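The abstract reports an FDR-based significance threshold but does not name the procedure. As an assumption, the sketch below implements the Benjamini-Hochberg step-up procedure, a common way to control the false discovery rate: it converts raw p-values into adjusted values (q-values) that are then compared with a chosen FDR level such as 0.05. The function name benjamini_hochberg and the example p-values are illustrative only.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here as a representative FDR procedure; the
    study's exact FDR method is not specified in the abstract.
    """
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    q_values = benjamini_hochberg(raw)
    significant = [q <= 0.05 for q in q_values]
    print(q_values, significant)
```

Applied to a validation set, the raw p-values of all tested variants would be passed in together, and only variants whose adjusted value falls at or below the chosen FDR level would be reported as significant.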
Conclusions: We describe genetic variants associated with extreme phenotypes of high and low risk for tobacco-induced lung adenocarcinoma. Our results and approach may help identify high-risk individuals and guide the development of new therapeutic strategies.