Journals
Journal:
PLOS COMPUTATIONAL BIOLOGY
ISSN:
1553-7358
Year:
2022
Vol.:
18
No.:
5
Pages:
e1010180
With the frenetic growth of high-dimensional datasets in different biomedical domains, there is an urgent need to develop predictive methods able to deal with this complexity. Feature selection is a relevant strategy in machine learning to address this challenge. We introduce a novel feature selection algorithm for linear regression called BOSO (Bilevel Optimization Selector Operator). We conducted a benchmark of BOSO with key algorithms in the literature, finding a superior accuracy for feature selection in high-dimensional datasets. Proof-of-concept of BOSO for predicting drug sensitivity in cancer is presented. A detailed analysis is carried out for methotrexate, a well-studied drug targeting cancer metabolism.
Journal:
NAR GENOMICS AND BIOINFORMATICS
ISSN:
2631-9268
Year:
2022
Vol.:
4
No.:
3
Pages:
lqac067
Alternative splicing (AS) plays a key role in cancer: all its hallmarks have been associated with different mechanisms of abnormal AS. The improvement of the human transcriptome annotation and the availability of fast and accurate software to estimate isoform concentrations have boosted the analysis of transcriptome profiling from RNA-seq. The statistical analysis of AS is a challenging problem not yet fully solved. We have included in EventPointer (EP), a Bioconductor package, a novel statistical method that can use the bootstrap of the pseudoaligners. We compared it with other state-of-the-art algorithms to analyze AS. Its performance is outstanding for shallow sequencing conditions. The statistical framework is very flexible since it is based on design and contrast matrices. EP now includes a convenient tool to find the primers to validate the discoveries using PCR. We also added a statistical module to study alterations in protein domains related to AS. Applying it to 9514 patients from TCGA and TARGET in 19 different tumor types resulted in two conclusions: i) aberrant alternative splicing alters the relative presence of protein domains and ii) the number of enriched domains is strongly correlated with the age of the patients.
Journal:
FRONTIERS IN IMMUNOLOGY
ISSN:
1664-3224
Year:
2022
Vol.:
13
Pages:
977358
Artificial intelligence (AI) can unveil novel personalized treatments based on drug screening and whole-exome sequencing (WES) experiments. However, the "black box" nature of AI limits the potential of this approach to be translated into clinical practice. In contrast, explainable AI (XAI) focuses on making AI results understandable to humans. Here, we present a novel XAI method, called multi-dimensional module optimization (MOM), that associates drug screening with genetic events, while guaranteeing that predictions are interpretable and robust. We applied MOM to an acute myeloid leukemia (AML) cohort of 319 ex-vivo tumor samples with 122 screened drugs and WES. MOM returned a therapeutic strategy based on the FLT3, CBF beta-MYH11, and NRAS status, which predicted AML patient response to Quizartinib, Trametinib, Selumetinib, and Crizotinib. We successfully validated the results in three different large-scale screening experiments. We believe that XAI will help healthcare providers and drug regulators better understand AI medical decisions.
Journal:
CANCERS
ISSN:
2072-6694
Year:
2022
Vol.:
14
No.:
13
Pages:
3251
Simple Summary: This work shows that the predictions of lethal dependencies (LEDs) between genes can be dramatically improved by incorporating the "HUb effect in Genetic Essentiality" (HUGE) of gene alterations. In three genome-wide loss-of-function screens (Project Score, CERES score and DEMETER score), LEDs are identified with 75 times larger statistical power than with state-of-the-art methods. In AML, we identified LEDs not recalled by previous pipelines, including FLT3-mutant genotypes sensitive to FLT3 inhibitors. Interestingly, in-vitro validations confirm lethal dependencies of either NRAS or PTPN11 depending on the NRAS mutational status. Recent functional genomic screens, such as CRISPR-Cas9 or RNAi screening, have fostered a new wave of targeted treatments based on the concept of synthetic lethality. These approaches identified LEthal Dependencies (LEDs) by estimating the effect of genetic events on cell viability. The multiple-hypothesis testing problem that arises from the large number of gene knockouts limits the statistical power of these studies. Here, we show that predictions of LEDs from functional screens can be dramatically improved by incorporating the "HUb effect in Genetic Essentiality" (HUGE) of gene alterations. We analyze three recent genome-wide loss-of-function screens (Project Score, CERES score and DEMETER score), identifying LEDs with 75 times larger statistical power than with state-of-the-art methods. Using acute myeloid leukemia, breast cancer, lung adenocarcinoma and colon adenocarcinoma as disease models, we validate that our predictions are enriched in a recent harmonized knowledge base of clinical interpretations of somatic genomic variants in cancer (AUROC > 0.87). Our approach is effective even in tumors with large genetic heterogeneity such as acute myeloid leukemia, where we identified LEDs not recalled by previous pipelines, including FLT3-mutant genotypes sensitive to FLT3 inhibitors.
Interestingly, in-vitro validations confirm lethal dependencies of either NRAS or PTPN11 depending on the NRAS mutational status. HUGE will hopefully help discover novel genetic dependencies amenable for precision-targeted therapies in cancer. All the graphs showing lethal dependencies for the 19 tumor types analyzed can be visualized in an interactive tool.
Journal:
BIOINFORMATICS
ISSN:
1367-4803
Year:
2022
Vol.:
38
No.:
6
Pages:
1491 - 1496
Motivation: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. Results: A gene is considered to be identifiable if it is possible to obtain both the structure and the concentration of its transcripts unambiguously. Here, we present a method to determine the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability.
Journal:
BIOINFORMATICS
ISSN:
1367-4803
Year:
2022
Vol.:
38
No.:
3
Pages:
844 - 845
Motivation: Discover is an algorithm developed to identify mutually exclusive genomic events. Its main contribution is a statistical analysis based on the Poisson-Binomial (PB) distribution to take into account the mutation rate of genes and samples. Discover is very effective for identifying mutually exclusive mutations at the expense of speed in large datasets: the PB is computationally costly to estimate, and checking all the potential mutually exclusive alterations requires millions of tests. Results: We have implemented a new version of the package called Rediscover that implements exact and approximate computations of the PB. The exact implementation in Rediscover is slightly faster than Discover for large and medium-sized datasets. The approximation is 100-1000 times faster for these datasets, making it possible to get results in less than a minute on a standard desktop. The memory footprint is also smaller in Rediscover. The new package is available at CRAN and provides some functions to integrate its usage with other R packages such as maftools and TCGAbiolinks. Availability and implementation: Rediscover is available at CRAN (https://cran.r-project.org/web/packages/Rediscover/index.html).
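To make the cost trade-off in this abstract concrete, the following Python sketch contrasts an exact Poisson-Binomial lower-tail probability with a fast approximation. It is an illustration only, not the code of Discover or Rediscover: the function names are hypothetical, the per-sample co-occurrence probabilities are assumed to be given, and the normal approximation with continuity correction is a textbook stand-in, since the abstract does not say which approximation Rediscover uses.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes among independent Bernoulli
    trials with (possibly different) success probabilities `probs`.
    Dynamic programming, O(n^2) time for n trials."""
    pmf = [1.0]  # with zero trials, P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def lower_tail_exact(probs, observed):
    """P(X <= observed) under the exact Poisson-Binomial distribution."""
    return sum(poisson_binomial_pmf(probs)[: observed + 1])


def lower_tail_normal(probs, observed):
    """P(X <= observed) via a normal approximation with continuity
    correction (assumed stand-in; requires a non-zero variance)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    z = (observed + 0.5 - mu) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

In a mutual-exclusivity setting, `probs` would hold, for each sample, the assumed probability that two genes are altered together, and a small lower-tail probability for the observed number of co-altered samples would suggest mutual exclusivity. The exact routine grows quadratically with the number of samples, whereas the approximation is linear, which is the kind of gap that matters when millions of gene pairs are tested.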
Authors:
Carrasco-García, E. (Corresponding author); López, L.; Moncho-Amor, V.; et al.
Journal:
CANCERS
ISSN:
2072-6694
Year:
2022
Vol.:
14
No.:
4
Pages:
916
Simple Summary Pancreatic cancers are lethal types of cancer. A majority of patients progress to an advanced and metastatic disease, which remains a major clinical problem. Therefore, it is crucial to identify critical regulators to help predict the disease progression and to develop more efficacious therapeutic approaches. In this work we found that an increased expression of the developmental factor SOX9 is associated with metastasis, a poor prognosis and resistance to therapy in pancreatic ductal adenocarcinoma patients and in cell cultures. We also found that this effect is at least in part due to the ability of SOX9 to regulate the activity of stem cell factors, such as BMI1, in addition to those involved in EMT and metastasis. Background: Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers mainly due to spatial obstacles to complete resection, early metastasis and therapy resistance. The molecular events accompanying PDAC progression remain poorly understood. SOX9 is required for maintaining the pancreatic ductal identity and it is involved in the initiation of pancreatic cancer. In addition, SOX9 is a transcription factor linked to stem cell activity and is commonly overexpressed in solid cancers. It cooperates with Snail/Slug to induce epithelial-mesenchymal transition (EMT) during neural development and in diseases such as organ fibrosis or different types of cancer. Methods: We investigated the roles of SOX9 in pancreatic tumor cell plasticity, metastatic dissemination and chemoresistance using pancreatic cancer cell lines as well as mouse embryo fibroblasts. In addition, we characterized the clinical relevance of SOX9 in pancreatic cancer using human biopsies. 
Results: Gain- and loss-of-function of SOX9 in PDAC cells revealed that high levels of SOX9 increased migration and invasion, and promoted EMT and metastatic dissemination, whilst SOX9 silencing resulted in metastasis inhibition, along with a phenotypic reversion to epithelial features and loss of stemness potential. In both contexts, EMT factors were not altered. Moreover, high levels of SOX9 promoted resistance to gemcitabine. In contrast, overexpression of SOX9 was sufficient to promote metastatic potential in K-Ras transformed MEFs, triggering EMT associated with Snail/Slug activity. In clinical samples, SOX9 expression was analyzed in 198 PDAC cases by immunohistochemistry and in 53 patient derived xenografts (PDXs). SOX9 was overexpressed in primary adenocarcinomas and particularly in metastases. Notably, SOX9 expression correlated with high vimentin and low E-cadherin expression. Conclusions: Our results indicate that SOX9 facilitates PDAC progression and metastasis by triggering stemness and EMT.
Authors:
Blasco, T.; Pérez-Burillo, S.; Balzerani, F.; et al.
Journal:
NATURE COMMUNICATIONS
ISSN:
2041-1723
Year:
2021
Vol.:
12
No.:
1
Pages:
4728
Understanding how diet and gut microbiota interact in the context of human health is a key question in personalized nutrition. Genome-scale metabolic networks and constraint-based modeling approaches are promising to systematically address this complex problem. However, when applied to nutritional questions, a major issue in existing reconstructions is the limited information about compounds in the diet that are metabolized by the gut microbiota. Here, we present AGREDA, an extended reconstruction of diet metabolism in the human gut microbiota. AGREDA adds the degradation pathways of 209 compounds present in the human diet, mainly phenolic compounds, a family of metabolites highly relevant for human health and nutrition. We show that AGREDA outperforms existing reconstructions in predicting diet-specific output metabolites from the gut microbiota. Using 16S rRNA gene sequencing data of faecal samples from Spanish children representing different clinical conditions, we illustrate the potential of AGREDA to establish relevant metabolic interactions between diet and gut microbiota. The interplay between human diet and the gut microbiome is complex. Here, the authors present a model of human-microbiome interaction that can predict how phenolic compounds are metabolized by the human gut microbiome, identifying diet-specific metabolites in children of varied clinical conditions.
Journal:
VEHICLES
ISSN:
2624-8921
Year:
2021
Vol.:
3
No.:
1
Pages:
127 - 144
Direct Yaw Moment Control (DYC) is an effective way to alter the behaviour of electric cars with independent drives. Controlling the torque applied to each wheel can improve the handling performance of a vehicle making it safer and faster on a race track. The state-of-the-art literature covers the comparison of various controllers (PID, LPV, LQR, SMC, etc.) using ISO manoeuvres. However, a more advanced comparison of the important characteristics of the controllers' performance is lacking, such as the robustness of the controllers under changes in the vehicle model, steering behaviour, use of the friction circle, and, ultimately, lap time on a track. In this study, we have compared the controllers according to some of the aforementioned parameters on a modelled race car. Interestingly, best lap times are not provided by perfect neutral or close-to-neutral behaviour of the vehicle, but rather by allowing certain deviations from the target yaw rate. In addition, a modified Proportional Integral Derivative (PID) controller showed that its performance is comparable to other more complex control techniques such as Model Predictive Control (MPC).
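As a rough illustration of the control loop this abstract describes, the sketch below implements a plain textbook PID on yaw-rate error and converts the requested yaw moment into a left/right torque split. It is not the modified PID from the paper, whose details the abstract does not give; the class and function names, the two-wheel simplification and the sign convention (positive yaw moment adds torque to the right wheel) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class YawRatePID:
    """Plain PID on yaw-rate error (hypothetical names and gains)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target_yaw_rate, measured_yaw_rate, dt):
        """Return the requested corrective yaw moment for one time step."""
        error = target_yaw_rate - measured_yaw_rate
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def split_torque(base_torque, yaw_moment, track_width, wheel_radius):
    """Split torque between the left and right wheels of one axle.

    Assumes yaw_moment = (F_right - F_left) * track_width / 2 with
    F = torque / wheel_radius, so each side is shifted by
    yaw_moment * wheel_radius / track_width around the base torque.
    """
    delta = yaw_moment * wheel_radius / track_width
    return base_torque - delta, base_torque + delta
```

Allowing a tolerance band around the target yaw rate, as the abstract suggests is beneficial for lap time, could be approximated here by zeroing small errors before they reach the controller, but that choice is not specified in the source.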
Journal:
CANCERS
ISSN:
2072-6694
The development of predictive biomarkers of response to targeted therapies is an unmet clinical need for many antitumoral agents. Recent genome-wide loss-of-function screens, such as RNA interference (RNAi) and CRISPR-Cas9 libraries, are an unprecedented resource to identify novel drug targets, reposition drugs and associate predictive biomarkers in the context of precision oncology. In this work, we have developed and validated a large-scale bioinformatics tool named DrugSniper, which exploits loss-of-function experiments to model the sensitivity of 6237 inhibitors and predict their corresponding biomarkers of sensitivity in 30 tumor types. Applying DrugSniper to small cell lung cancer (SCLC), we identified genes extensively explored in SCLC, such as Aurora kinases or epigenetic agents. Interestingly, the analysis suggested a remarkable vulnerability to polo-like kinase 1 (PLK1) inhibition in CREBBP-mutant SCLC cells. We validated this association in vitro using four mutated and four wild-type SCLC cell lines and two PLK1 inhibitors (Volasertib and BI2536), confirming that the effect of PLK1 inhibitors depended on the mutational status of CREBBP. Besides, DrugSniper was validated in-silico with several known clinically-used treatments, including the sensitivity of Tyrosine Kinase Inhibitors (TKIs) and Vemurafenib to FLT3 and BRAF mutant cells, respectively. These findings show the potential of genome-wide loss-of-function screens to identify new personalized therapeutic hypotheses in SCLC and potentially in other tumors, which is a valuable starting point for further drug development and drug repositioning projects.
Authors:
Ferrer-Bonsoms, J. A.; Cassol, I.; Fernandez-Acin, P.; et al.
Journal:
SCIENTIFIC REPORTS
ISSN:
2045-2322
The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times larger than previous attempts and the median AUROC of assigned functions to genes is 0.82. We tested ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1) and they were coherent with the literature. Besides, we examined whether the main isoform of each gene, as predicted by APPRIS, was the most likely to have the annotated gene functions, and this occurs in 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge and the results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). Initial data, website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo.
Journal:
EBIOMEDICINE
ISSN:
2352-3964
Year:
2020
Vol.:
51
Pages:
UNSP 102577
Journal:
BMC GENOMICS
ISSN:
1471-2164
Year:
2019
Vol.:
20
No.:
Art. 521
Background: Splicing is a genetic process that has important implications in several diseases including cancer. Deciphering the complex rules of splicing regulation is crucial to understand and treat splicing-related diseases. Splicing factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have a priori hypotheses, as a single CLIP experiment targets a single protein. Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows the strong coherence between the binding sites of RBPs of similar families. Augmenting this information with expression changes, we are able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific splicing factors are knocked down. Conclusions: The methodology presented in this study allows the prediction of active splicing factors in either cancer or any other condition by only using the information of transcript expression. This approach opens a wide range of possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and databases is available at https://gitlab.com/fcarazo.m/sfprediction.
Journal:
GIGASCIENCE
ISSN:
2047-217X
BACKGROUND:
Aberrant alternative splicing plays a key role in cancer development. In recent years, alternative splicing has been used as a prognosis biomarker, a therapy response biomarker, and even as a therapeutic target. Next-generation RNA sequencing has an unprecedented potential to measure the transcriptome. However, due to the complexity of dealing with isoforms, the scientific community has not sufficiently exploited this valuable resource in precision medicine.
FINDINGS:
We present TranscriptAchilles, the first large-scale tool to predict transcript biomarkers associated with gene essentiality in cancer. This application integrates 412 loss-of-function RNA interference screens of >17,000 genes, together with their corresponding whole-transcriptome expression profiling. Using this tool, we have studied which are the cancer subtypes for which alternative splicing plays a significant role to state gene essentiality. In addition, we include a case study of renal cell carcinoma that shows the biological soundness of the results. The databases, the source code, and a guide to build the platform within a Docker container are available at GitLab. The application is also available online.
CONCLUSIONS:
TranscriptAchilles provides a user-friendly web interface to identify transcript or gene biomarkers of gene essentiality, which could be used as a starting point for a drug development project. This approach opens a wide range of translational applications in cancer.
Journal:
BMC GENOMICS
ISSN:
1471-2164
Year:
2018
Vol.:
19
No.:
703
Background: RNA-seq is a reference technology for determining alternative splicing at genome-wide level. Exon arrays remain widely used for the analysis of gene expression, but show poor validation rate with regard to splicing events. Commercial arrays that include probes within exon junctions have been developed in order to overcome this problem. We compare the performance of RNA-seq (Illumina HiSeq) and junction arrays (Affymetrix Human Transcriptome array) for the analysis of transcript splicing events. Three different breast cancer cell lines were treated with CX-4945, a drug that severely affects splicing. To enable a direct comparison of the two platforms, we adapted EventPointer, an algorithm that detects and labels alternative splicing events using junction arrays, to work also on RNA-seq data. Common results and discrepancies between the technologies were validated and/or resolved by over 200 PCR experiments. Results: As might be expected, RNA-seq appears superior in cases where the technologies disagree and is able to discover novel splicing events beyond the limitations of physical probe-sets. We observe a high degree of coherence between the two technologies, however, with correlation of EventPointer results over 0.90. Through decimation, the detection power of the junction arrays is equivalent to RNA-seq with up to 60 million reads. Conclusions: Our results suggest, therefore, that exon-junction arrays are a viable alternative to RNA-seq for detection of alternative splicing events when focusing on well-described transcriptional regions.
Journal:
NEURO-ONCOLOGY
ISSN:
1522-8517
Year:
2018
Vol.:
20
No.:
7
Pages:
930 - 941
Background: Glioblastoma, the most aggressive primary brain tumor, is genetically heterogeneous. Alternative splicing (AS) plays a key role in numerous pathologies, including cancer. The objectives of our study were to determine whether aberrant AS could play a role in the malignant phenotype of glioma and to understand the mechanism underlying its aberrant regulation. Methods: We obtained surgical samples from patients with glioblastoma who underwent 5-aminolevulinic fluorescence-guided surgery. Biopsies were taken from the tumor center as well as from adjacent normal-appearing tissue. We used a global splicing array to identify candidate genes aberrantly spliced in these glioblastoma samples. Mechanistic and functional studies were performed to elucidate the role of our top candidate splice variant, BAF45d, in glioblastoma. Results: BAF45d is part of the switch/sucrose nonfermentable complex and plays a key role in the development of the CNS. The BAF45d/6A isoform is present in 85% of over 200 glioma samples that have been analyzed and contributes to the malignant glioma phenotype through the maintenance of an undifferentiated cellular state. We demonstrate that BAF45d splicing is mediated by polypyrimidine tract-binding protein 1 (PTBP1) and that BAF45d regulates PTBP1, uncovering a reciprocal interplay between RNA splicing regulation and transcription. Conclusions: Our data indicate that AS is a mechanism that contributes to the malignant phenotype of glioblastoma. Understanding the consequences of this biological process will uncover new therapeutic targets for this devastating disease.
Journal:
SCIENTIFIC REPORTS
ISSN:
2045-2322
Constraint-based modeling for genome-scale metabolic networks has emerged in the last years as a promising approach to elucidate drug targets in cancer. Beyond the canonical biosynthetic routes to produce biomass, it is of key importance to focus on metabolic routes that sustain the proliferative capacity through the regulation of other biological means in order to improve in-silico gene essentiality analyses. Polyamines are polycations with central roles in cancer cell proliferation, through the regulation of transcription and translation among other things, but are typically neglected in in silico cancer metabolic models. In this study, we analysed essential genes for the biosynthesis of polyamines. Our analysis corroborates the importance of previously known regulators of the pathway, such as Adenosylmethionine Decarboxylase 1 (AMD1) and uncovers novel enzymes predicted to be relevant for polyamine homeostasis. We focused on Adenine Phosphoribosyltransferase (APRT) and demonstrated the detrimental consequence of APRT gene silencing on different leukaemia cell lines. Our results highlight the importance of revisiting the metabolic models used for in-silico gene essentiality analyses in order to maximize the potential for drug target identification in cancer.
Authors:
Garcia, I.; Aldaregia, J.; Vicentic, J. M.; et al.
Journal:
SCIENTIFIC REPORTS
ISSN:
2045-2322
Glioblastoma remains the most common and deadliest type of brain tumor and contains a population of self-renewing, highly tumorigenic glioma stem cells (GSCs), which contributes to tumor initiation and treatment resistance. Developmental programs participating in tissue development and homeostasis re-emerge in GSCs, supporting the development and progression of glioblastoma. SOX1 plays an important role in neural development and neural progenitor pool maintenance. Its impact on glioblastoma remains largely unknown. In this study, we have found that high levels of SOX1 observed in a subset of patients correlate with lower overall survival. At the cellular level, SOX1 expression is elevated in patient-derived GSCs and it is also higher in oncosphere culture compared to differentiation conditions in conventional glioblastoma cell lines. Moreover, genetic inhibition of SOX1 in patient-derived GSCs and conventional cell lines decreases self-renewal and proliferative capacity in vitro and tumor initiation and growth in vivo. Contrarily, SOX1 over-expression moderately promotes self-renewal and proliferation in GSCs. These functions seem to be independent of its activity as Wnt/beta-catenin signaling regulator. In summary, these results identify a functional role for SOX1 in regulating glioma cell heterogeneity and plasticity, and suggest SOX1 as a potential target in the GSC population in glioblastoma.
Journal:
MOLECULAR ONCOLOGY
ISSN:
1574-7891
Year:
2016
Vol.:
10
No.:
9
Pages:
1437 - 1449
Increasing interest has been devoted in recent years to the understanding of alternative splicing in cancer. In this study, we performed a genome-wide analysis to identify cancer-associated splice variants in non-small cell lung cancer. We discovered and validated novel differences in the splicing of genes known to be relevant to lung cancer biology, such as NFIB, ENAH or SPAG9. Gene enrichment analyses revealed an important contribution of alternative splicing to cancer-related molecular functions, especially those involved in cytoskeletal dynamics. Interestingly, a substantial fraction of the altered genes found in our analysis were targets of the protein quaking (QKI), pointing to this factor as one of the most relevant regulators of alternative splicing in non-small cell lung cancer. We also found that ESYT2, one of the QKI targets, is involved in cytoskeletal organization. ESYT2-short variant inhibition in lung cancer cells resulted in a cortical distribution of actin whereas inhibition of the long variant caused an increase of endocytosis, suggesting that the cancer-associated splicing pattern of ESYT2 has a profound impact in the biology of cancer cells. Finally, we show that low nuclear QKI expression in non-small cell lung cancer is an independent prognostic factor for disease-free survival (HR = 2.47; 95% CI = 1.11-5.46, P = 0.026). In conclusion, we identified several splicing variants with functional relevance in lung cancer largely regulated by the splicing factor QKI, a tumor suppressor associated with prognosis in lung cancer.
Journal:
BMC GENOMICS
ISSN:
1471-2164
Year:
2016
Vol.:
17
Pages:
467
Background: Alternative splicing (AS) is a major source of variability in the transcriptome of eukaryotes. There is an increasing interest in its role in different pathologies. Before sequencing technology appeared, AS was measured with specific arrays. However, these arrays did not perform well in the detection of AS events and provided very large false discovery rates (FDR). Recently the Human Transcriptome Array 2.0 (HTA 2.0) has been deployed. It includes junction probes. However, the interpretation software provided by its vendor (TAC 3.0) does not fully exploit its potential (does not study jointly the exons and junctions involved in a splicing event) and can only be applied to case-control studies. New statistical algorithms and software must be developed in order to exploit the HTA 2.0 array for event detection. Results: We have developed EventPointer, an R package (built under the aroma.affymetrix framework) to search and analyze Alternative Splicing events using HTA 2.0 arrays. This software uses a linear model that broadens its application from plain case-control studies to complex experimental designs. Given the CEL files and the design and contrast matrices, the software retrieves a list of all the detected events indicating: 1) the type of event (exon cassette, alternative 3', etc.), 2) its fold change and its statistical significance, and 3) the potential protein domains affected by the AS events and the statistical significance of the possible enrichment. Our tests have shown that EventPointer has an extremely low FDR value (only 1 false positive within the tested top-200 events). This software is publicly available and it has been uploaded to GitHub. Conclusions: This software empowers the HTA 2.0 arrays for AS event detection as an alternative to RNA-seq: simplifying considerably the required analysis, speeding it up and reducing the required computational power.
Journal:
THE R JOURNAL
ISSN:
2073-4859
Year:
2015
Vol.:
7
No.:
2
Pages:
275 - 287
Code analysis tools are crucial to understand program behavior. Profiling tools use time measurements taken during the execution of a program to gain this understanding and thus help in the optimization of the code. In this paper, we review the different available packages to profile R code and show the advantages and disadvantages of each of them. In addition, we present GUIProfiler, a package that fulfills some unmet needs. Package GUIProfiler generates an HTML report with the timing for each code line and the relationships between different functions. This package mimics the behavior of the MATLAB profiler. The HTML report includes information on the time spent on each of the lines of the profiled code (the slowest code is highlighted). If the package is used within the RStudio environment, the user can navigate across the bottlenecks in the code and open the editor to modify the lines of code where more time is spent. It is also possible to edit the code using Notepad++ (a free editor for Windows) by simply clicking on the corresponding line. The graphical user interface makes it easy to identify the specific lines which slow down the code. The integration in RStudio and the generation of an HTML report make GUIProfiler a very convenient tool to perform code optimization.
Revista:
BMC GENOMICS
ISSN:
1471-2164
Año:
2015
Vol.:
16
Págs.:
752
Background: The development of a more refined prognostic methodology for early non-small cell lung cancer (NSCLC) is an unmet clinical need. An accurate prognostic tool might help to select patients at early stages for adjuvant therapies.
Results: A new integrated bioinformatics searching strategy, that combines gene copy number alterations and expression, together with clinical parameters was applied to derive two prognostic genomic signatures. The proposed methodology combines data from patients with and without clinical data with a priori information on the ability of a gene to be a prognostic marker. Two initial candidate sets of 513 and 150 genes for lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC), respectively, were generated by identifying genes which have both: a) significant correlation between copy number and gene expression, and b) significant prognostic value at the gene expression level in external databases. From these candidates, two panels of 7 (ADC) and 5 (SCC) genes were further identified via semi-supervised learning. These panels, together with clinical data (stage, age and sex), were used to construct the ADC and SCC hazard scores combining clinical and genomic data. The signatures were validated in two independent datasets (n = 73 for ADC, n = 97 for SCC), confirming that the prognostic value of both clinical-genomic models is robust, statistically significant (P = 0.008 for ADC and P = 0.019 for SCC) and outperforms both the clinical models (P = 0.060 for ADC and P = 0.121 for SCC) and the genomic models applied separately (P = 0.350 for ADC and P = 0.269 for SCC).
Conclusion: The present work provides a methodology to generate a robust signature using copy number data that can potentially be applied to any cancer. Using it, we found new prognostic scores based on tumor DNA that, jointly with clinical information, are able to predict overall survival (OS) in patients with early-stage ADC and SCC.
Revista:
EXPERT SYSTEMS WITH APPLICATIONS
ISSN:
0957-4174
Año:
2014
Vol.:
41
N°:
11
Págs.:
5190 - 5200
The objective of this research is to select a reduced group of surface electromyographic (sEMG) channels and signal-features that is able to provide an accurate classification rate in a myoelectric control system for any user. To that end, the location of 32 sEMG electrodes placed around-along the forearm and 86 signal-features are evaluated simultaneously in a static-hand gesture classification task (14 different gestures). A novel multivariate variable selection filter method named mRMR-FCO is presented as part of the selection process. This process finds the most informative and least redundant combination of sEMG channels and signal-features among all the possible ones. The performance of the selected set of channels and signal-features is evaluated with a Support Vector Machine classifier. (C) 2014 Elsevier Ltd. All rights reserved.
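The idea of picking the most informative and least redundant variables can be sketched with a plain greedy mRMR loop. This is not the mRMR-FCO method of the paper: absolute Pearson correlation is swapped in for mutual information, and the feature names and toy values are hypothetical.

```python
from math import sqrt


def abs_corr(a, b):
    """Absolute Pearson correlation, used here as a stand-in for mutual information."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov) / sqrt(var_a * var_b)


def mrmr(features, target, k):
    """Greedily select k features maximising relevance minus mean redundancy."""
    relevance = {name: abs_corr(col, target) for name, col in features.items()}
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs_corr(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical channel/feature pairs; the two "ch1" columns are nearly redundant.
features = {
    "ch1_rms": [0, 0, 1, 2],
    "ch1_mav": [0, 0, 1, 1],
    "ch2_rms": [0, 1, 0, 1],
}
target = [0, 1, 2, 3]  # toy numeric response

chosen = mrmr(features, target, k=2)
```

Here the second pick is the less relevant but non-redundant channel, because the near-duplicate of the first pick is penalised by its high correlation with it.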
Revista:
CANCER RESEARCH
ISSN:
0008-5472
Año:
2014
Vol.:
74
N°:
4
Págs.:
1105 - 1115
Abnormal alternative splicing has been associated with cancer. Genome-wide microarrays can be used to detect differential splicing events. In this study, we have developed ExonPointer, an algorithm that uses data from exon and junction probes to identify annotated cassette exons. We used the algorithm to profile differential splicing events in lung adenocarcinoma A549 cells after downregulation of the oncogenic serine/arginine-rich splicing factor 1 (SRSF1). Data were generated using two different microarray platforms. The PCR-based validation rate of the top 20 ranked genes was 60% and 100%. Functional enrichment analyses found a substantial number of splicing events in genes related to RNA metabolism. These analyses also identified genes associated with cancer and developmental and hereditary disorders, as well as biologic processes such as cell division, apoptosis, and proliferation. Most of the top 20 ranked genes were validated in other adenocarcinoma and squamous cell lung cancer cells, with validation rates of 80% to 95% and 70% to 75%, respectively. Moreover, the analysis allowed us to identify four genes, ATP11C, IQCB1, TUBD1, and proline-rich coiled-coil 2C (PRRC2C), with a significantly different pattern of alternative splicing in primary non-small cell lung tumors compared with normal lung tissue. In the case of PRRC2C, SRSF1 downregulation led to the skipping of an exon overexpressed in primary lung tumors. Specific siRNA downregulation of the exon-containing var
Revista:
BMC GENOMICS
ISSN:
1471-2164
Año:
2014
Vol.:
15
N°:
Suppl10:S2
Background: MicroRNAs are short RNA molecules that post-transcriptionally regulate gene expression. Today, microRNA target prediction remains challenging since very few targets have been experimentally validated and sequence-based predictions have large numbers of false positives. Furthermore, due to the different scoring rules used in each database of predicted interactions, the selection of the most reliable ones requires extensive knowledge about each algorithm. Results: Here we propose two methods to measure the confidence of predicted interactions based on experimentally validated information. The output of the methods is a combined database where new scores and statistical confidences are re-assigned to each predicted interaction. The new scores allow the robust combination of several databases without the effect of low-performing algorithms dragging down good-performing ones. The combined databases obtained using both algorithms described in this paper outperform each of the existing predictive algorithms that were considered for the combination. Conclusions: Our approaches are a useful way to integrate predicted interactions from different databases. They reduce the selection of interactions to a unique database based on an intuitive score and allow databases to be compared with one another.
Revista:
PLOS ONE
ISSN:
1932-6203
Metabolism expresses the phenotype of living cells and understanding it is crucial for different applications in biotechnology and health. With the increasing availability of metabolomic, proteomic and, to a larger extent, transcriptomic data, the elucidation of specific metabolic properties in different scenarios and cell types is a key topic in systems biology. Despite the potential of the elementary flux mode (EFM) concept for this purpose, its use has been limited so far, mainly because the computation of EFMs has been infeasible for genome-scale metabolic networks. In a recent work, we determined a subset of EFMs in human metabolism and proposed a new protocol to integrate gene expression data, spotting key 'characteristic EFMs' in different scenarios. Our approach was successfully applied to identify metabolic differences among several human healthy tissues. In this article, we evaluated the performance of our approach in a clinically interesting situation. In particular, we identified key EFMs and metabolites in adenocarcinoma and squamous-cell carcinoma subtypes of non-small cell lung cancers. Results are consistent with previous knowledge of these major subtypes of lung cancer in the medical literature. Therefore, this work constitutes the starting point to establish a new methodology that could help to distinguish key metabolic processes among different clinical outcomes.
Revista:
BMC SYSTEMS BIOLOGY
ISSN:
1752-0509
Año:
2013
Vol.:
7
N°:
134
Background: The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue with this progress, it is essential to efficiently integrate experimental data into metabolic modeling. Results: We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and influence obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influences the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and real scenario. Finally, we apply this approach to find novel pathways relevant in the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets which could be relevant for better understanding of the phenomenon leading to impaired acetate overflow are proposed. Conclusions: A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
Revista:
BIOINFORMATICS
ISSN:
1367-4803
Año:
2013
Vol.:
29
N°:
16
Págs.:
2009 - 2016
Motivation: The analysis of high-throughput molecular data in the context of metabolic pathways is essential to uncover their underlying functional structure. Among different metabolic pathway concepts in systems biology, elementary flux modes (EFMs) hold a predominant place, as they naturally capture the complexity and plasticity of cellular metabolism and go beyond predefined metabolic maps. However, their use to interpret high-throughput data has been limited so far, mainly because their computation in genome-scale metabolic networks has been unfeasible. To face this issue, different optimization-based techniques have been recently introduced and their application to human metabolism is promising. Results: In this article, we exploit and generalize the K-shortest EFM algorithm to determine a subset of EFMs in a human genome-scale metabolic network. This subset of EFMs involves a wide number of reported human metabolic pathways, as well as potential novel routes, and constitutes a valuable database where high-throughput data can be mapped and contextualized from a metabolic perspective. To illustrate this, we took expression data of 10 healthy human tissues from a previous study and predicted their characteristic EFMs based on enrichment analysis. We used a multivariate hypergeometric test and showed that it leads to more biologically meaningful results than standard hypergeometric. Finally, a biological discussion on the characteristic EFMs obtained in liver is conducted, finding a high level of agreement when compared with the literature.
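For reference, the standard hypergeometric enrichment test that the abstract uses as its baseline can be sketched as follows. The multivariate variant proposed in the paper is not reproduced here, and the mapping of arguments to genes and EFMs in the comments is an assumption made for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """Upper-tail probability P(X >= observed) for a hypergeometric variable.

    population: all genes considered (assumed mapping)
    successes:  genes flagged as highly expressed in the tissue
    draws:      genes associated with the pathway (e.g. one EFM)
    observed:   flagged genes found among the pathway genes
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 flagged, a pathway of 3 genes with 2 flagged.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

A small p-value suggests that the pathway contains more flagged genes than expected by chance; in practice the values would be corrected for multiple testing across pathways.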
Revista:
BIOINFORMATICS
ISSN:
1367-4803
Año:
2012
Vol.:
28
N°:
13
Págs.:
1793 - 1794
CalMaTe calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g., Affymetrix and Illumina.
Revista:
ALGORITHMS FOR MOLECULAR BIOLOGY
ISSN:
1748-7188
Background: The selection of the reference to scale the data in a copy number analysis has paramount importance to achieve accurate estimates. Usually this reference is generated using control samples included in the study. However, these control samples are not always available and in these cases, an artificial reference must be created. A proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples minimizing possible batch effects. Results: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) have been analyzed. It is shown that using only tumoral samples, NSA is able to remove the bias in the copy number estimation, to reduce the noise and therefore, to increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state-of-the-art methods. Conclusions: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps to detect recurrent CNAs better than current methods. The automatic selection of references makes it useful to perform bulk analysis of many GEO or ArrayExpress experiments without the need to develop a parser to find the normal samples or possible batches within the data.
The method is available in the open-source R package NSA, which is an add-on to the aroma.cn framework. http://www.aroma-project.org/addons.
Revista:
METABOLIC ENGINEERING
ISSN:
1096-7176
Año:
2012
Vol.:
14
N°:
4
Págs.:
344 - 353
Constraints-based modeling is an emergent area in Systems Biology that includes an increasing set of methods for the analysis of metabolic networks. In order to refine its predictions, the development of novel methods integrating high-throughput experimental data is currently a key challenge in the field. In this paper, we present a novel set of constraints that integrate tracer-based metabolomics data from Isotope Labeling Experiments and metabolic fluxes in a linear fashion. These constraints are based on Elementary Carbon Modes (ECMs), a recently developed concept that generalizes Elementary Flux Modes at the carbon level. To illustrate the effect of our ECMs-based constraints, a Flux Variability Analysis approach was applied to a previously published metabolic network involving the main pathways in the metabolism of glucose. The addition of our ECMs-based constraints substantially reduced the under-determination resulting from a standard application of Flux Variability Analysis, which shows a clear progress over the state of the art. In addition, our approach is adjusted to deal with combinatorial explosion of ECMs in genome-scale metabolic networks. This extension was applied to infer the maximum biosynthetic capacity of non-essential amino acids in human metabolism. Finally, as linearity is the hallmark of our approach, its importance is discussed at a methodological, computational and theoretical level and illustrated with a practical application in the field of Isotope Labeling Experiments. (C) 2012 Elsevier Inc. All rights reserved.
Revista:
PLOS ONE
ISSN:
1932-6203
Año:
2012
Vol.:
7
N°:
2
Págs.:
e30766
miRNAs are small RNA molecules (~22 nt) that interact with their corresponding target mRNAs inhibiting the translation of the mRNA into proteins and cleaving the target mRNA. This second effect diminishes the overall expression of the target mRNA. Several miRNA-mRNA relationship databases have been deployed, most of them based on sequence complementarities. However, the number of false positives in these databases is large and they do not overlap completely. Recently, it has been proposed to combine expression measurement from both miRNA and mRNA and sequence based predictions to achieve more accurate relationships. In our work, we use LASSO regression with non-positive constraints to integrate both sources of information. LASSO enforces the sparseness of the solution and the non-positive constraints restrict the search of miRNA targets to those with down-regulation effects on the mRNA expression. We named this method TaLasso (miRNA-Target LASSO). We used TaLasso on two public datasets that have paired expression levels of human miRNAs and mRNAs. The top ranked interactions recovered by TaLasso are especially enriched (more than using any other algorithm) in experimentally validated targets. The functions of the genes with mRNA transcripts in the top-ranked interactions are meaningful. This is not the case using other algorithms. TaLasso is available as Matlab or R code. There is also a web-based tool for human miRNAs at http://talasso.cnb.csic.es/.
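The combination of an L1 penalty with non-positive coefficients can be illustrated with a small coordinate-descent solver. This is a minimal sketch under stated assumptions, not the TaLasso implementation: it assumes centred data with no intercept, minimises (1/2n)·||y − Xb||² + λ·Σ|b_j| subject to b_j ≤ 0, and uses a hypothetical function name and toy data.

```python
def nonpositive_lasso(x, y, lam, n_iter=200):
    """Coordinate descent for LASSO with every coefficient constrained to be <= 0.

    x: list of samples, each a list of miRNA expression values (centred)
    y: mRNA expression values (centred)
    lam: L1 penalty weight
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta starting at zero
    col_scale = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j excluded).
            rho = sum(row[j] * (r + row[j] * beta[j])
                      for row, r in zip(x, residual)) / n
            # For b <= 0 the penalty is -lam * b, so the unconstrained optimum is
            # (rho + lam) / scale; clipping at zero enforces the sign constraint.
            new_value = min(0.0, (rho + lam) / col_scale[j])
            delta = new_value - beta[j]
            if delta != 0.0:
                residual = [r - row[j] * delta for row, r in zip(x, residual)]
                beta[j] = new_value
    return beta


# Toy data: the first miRNA represses the mRNA, the second co-varies positively.
mirna = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
mrna = [-1.0, 3.0, -3.0, 1.0]  # equals -2 * first column + 1 * second column

coefficients = nonpositive_lasso(mirna, mrna, lam=0.5)
```

In this toy case the positively associated miRNA receives a coefficient of exactly zero, while the repressing one keeps a negative coefficient shrunk towards zero by the penalty.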
Revista:
BIOSYSTEMS
ISSN:
0303-2647
Año:
2011
Vol.:
105
N°:
2
Págs.:
140 - 146
The elementary flux modes (EFMs) approach is an efficient computational tool to predict novel metabolic pathways. Elucidating the physiological relevance of EFMs in a particular cellular state is still an open challenge. Different methods have been presented to carry out this task. However, these methods typically use little experimental data, exploiting methodologies where an a priori optimization function is used to deal with the indetermination underlying metabolic networks. Available "omics" data represent an opportunity to refine current methods. In this article we discuss whether (or not) metabolomics data from isotope labeling experiments (ILEs) and EFMs can be integrated into a linear system of equations. Aside from refining current approaches to infer the physiological relevance of EFMs, this question is important for the integration of metabolomics data from ILEs into metabolic networks, which generally involve non-linear relationships. As a result of our analysis, we concluded that in general the concept of EFMs needs to be redefined at the atomic level for the modeling of ILEs. For this purpose, the concept of Elementary Carbon Modes (ECMs) is introduced. (C) 2011 Elsevier Ireland Ltd. All rights reserved.
Revista:
PLoS One
ISSN:
1932-6203
Año:
2011
Vol.:
6
N°:
11
Págs.:
-
We undertook this study to understand how the transcription factor Sox2 contributes to the malignant phenotype of glioblastoma multiforme (GBM), the most aggressive primary brain tumor. We initially looked for unbalanced genomic rearrangements in the Sox2 locus in 42 GBM samples and found that Sox2 was amplified in 11.5% and overexpressed in all the samples. These results prompted us to further investigate the mechanisms involved in Sox2 overexpression in GBM. We analyzed the methylation status of the Sox2 promoter because high CpG density promoters are associated with key developmental genes. The Sox2 promoter presented a CpG island that was hypomethylated in all the patient samples when compared to normal cell lines. Treatment of Sox2-negative glioma cell lines with 5-azacitidine resulted in the re-expression of Sox2 and in a change in the methylation status of the Sox2 promoter. We further confirmed these results by analyzing data from GBM cases generated by The Cancer Genome Atlas project. We observed Sox2 overexpression (86%; N = 414), Sox2 gene amplification (8.5%; N = 492), and Sox2 promoter hypomethylation (100%; N = 258), suggesting the relevance of this factor in the malignant phenotype of GBMs. To further explore the role of Sox2, we performed in vitro analysis with brain tumor stem cells (BTSCs) and established glioma cell lines. Downmodulation of Sox2 in BTSCs resulted in the loss of their self-renewal properties. Surprisingly, ectopic expression of Sox2 in esta
Revista:
GENOMICS
ISSN:
0888-7543
Año:
2011
Vol.:
97
N°:
2
Págs.:
86 - 93
DNA copy number aberrations (CNAs) are genetic alterations common in cancer cells. Their transcriptional consequences are still poorly understood. Based on the fact that DNA copy number (CN) is highly correlated with the genomic position, we have applied a segmentation algorithm to gene expression (GE) to explore its relation with CN. We have found a strong correlation between segmented CN (sCN) and segmented GE (sGE), corroborating that CNAs have clear effects on genome-wide expression. We have found that most of the recurrent regions of sGE are common to those obtained from sCN analysis. Results for two cancer datasets confirm the known targets of aberrations and provide new candidates to study. The suggested methodology makes it possible to find recurrent aberrations specific to sGE, revealing loci where the expression of the genes is independent of their CNs. R code and additional files are available as supplementary material. (C) 2010 Elsevier Inc. All rights reserved.
Revista:
BIOINFORMATICS
ISSN:
1367-4803
Año:
2010
Vol.:
26
N°:
15
Págs.:
1827 - 1833
Motivation: Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods are focused on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs. Results: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays dealing directly with the cross hybridization between probes within SNP probesets. This algorithm outperforms (or at least it performs as well as) other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and it also gives more precise allele-specific CNs.
Revista:
BMC GENOMICS
ISSN:
1471-2164
Año:
2010
Vol.:
11
Págs.:
352
Background: Microarray strategies, which allow for the characterization of thousands of alternative splice forms in a single test, can be applied to identify differential alternative splicing events. In this study, a novel splice array approach was developed, including the design of a high-density oligonucleotide array, a labeling procedure, and an algorithm to identify splice events.
Results: The array consisted of exon probes and thermodynamically balanced junction probes. Suboptimal probes were tagged and considered in the final analysis. An unbiased labeling protocol was developed using random primers. The algorithm used to distinguish changes in expression from changes in splicing was calibrated using internal non-spliced control sequences. The performance of this splice array was validated with artificial constructs for CDC6, VEGF, and PCBP4 isoforms. The platform was then applied to the analysis of differential splice forms in lung cancer samples compared to matched normal lung tissue. Overexpression of splice isoforms was identified for genes encoding CEACAM1, FHL-1, MLPH, and SUSD2. None of these splicing isoforms had been previously associated with lung cancer.
Conclusions: This methodology enables the detection of alternative splicing events in complex biological samples, providing a powerful tool to identify novel diagnostic and prognostic biomarkers for cancer and other pathologies.
Revista:
METHODS IN MOLECULAR BIOLOGY
ISSN:
1064-3745
Año:
2010
Vol.:
593
Págs.:
157 - 174
High-throughput gene expression technologies based on DNA microarrays allow the examination of biological systems. However, the interpretation of the complex molecular descriptions generated by these approaches is still challenging. The development of new methodologies to identify common regulatory mechanisms involved in the control of the expression of a set of co-expressed genes might enhance our capacity to extract functional information from genomic data sets. In this chapter, we describe a method that integrates different sources of information: gene expression data, genome sequence information, described transcription factor binding sites (TFBSs), functional information, and bibliographic data. The starting point of the analysis is the extraction of promoter sequences from a whole genome and the detection of TFBSs in each gene promoter. This information allows the identification of enriched TFBSs in the proximal promoter of differentially expressed genes. The functional and bibliographic interpretation of the results improves our biological insight into the regulatory mechanisms involved in a microarray experiment.
Revista:
BMC BIOINFORMATICS
ISSN:
1471-2105
Background: Exon arrays provide a way to measure the expression of different isoforms of genes in an organism. Most of the procedures to deal with these arrays are focused on gene expression or on exon expression. Although the only biological analytes that can be properly assigned a concentration are transcripts, there are very few algorithms that focus on them. The reason is that previously developed summarization methods do not work well if applied to transcripts. In addition, gene structure prediction, i.e., the correspondence between probes and novel isoforms, is a field which is still unexplored. Results: We have modified and adapted a previous algorithm to take advantage of the special characteristics of the Affymetrix exon arrays. The structure and concentration of transcripts (some of them possibly unknown) in microarray experiments were predicted using this algorithm. Simulations showed that the suggested modifications improved both specificity (SP) and sensitivity (ST) of the predictions. The algorithm was also applied to different real datasets showing its effectiveness and the concordance with PCR validated results. Conclusions: The proposed algorithm shows a substantial improvement in the performance over the previous version. This improvement is mainly due to the exploitation of the redundancy of the Affymetrix exon arrays. An R package of SPACE with the updated algorithms has been developed and is freely available.