Our researchers

Ángel Rubio Díaz-Cordoves

Department of Biomedical Engineering and Sciences
School of Engineering (TECNUN). University of Navarra
H-index
19 (WoS, 23/09/2019)

Most recent scientific publications (since 2010)

Authors: Rubio Díaz-Cordoves, Ángel (Corresponding author)
ISSN 2352-3964  Vol. 51  2020  pp. UNSP 102577
Authors: Ferrer-Bonsoms, J. A.; Cassol, I.; Fernandez-Acin, P.; et al.
ISSN 2045-2322  Vol. 10  No. 1  2020
The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times larger than previous attempts and the median AUROC of assigned functions to genes is 0.82. We tested ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1) and they were coherent with the literature. In addition, we examined whether the main isoform of each gene (as predicted by APPRIS) was the most likely to have the annotated gene functions, which was the case in 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge and results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions. Initial data, website link, isoform-specific GO function predictions and R code are available at
Authors: Carazo Melo, Fernando; Bértolo Martín de Rosales, Cristina María; Castilla Ruíz, Carlos; et al.
Journal: CANCERS
ISSN 2072-6694  Vol. 12  No. 7  2020
The development of predictive biomarkers of response to targeted therapies is an unmet clinical need for many antitumoral agents. Recent genome-wide loss-of-function screens, such as RNA interference (RNAi) and CRISPR-Cas9 libraries, are an unprecedented resource to identify novel drug targets, reposition drugs and associate predictive biomarkers in the context of precision oncology. In this work, we have developed and validated a large-scale bioinformatics tool named DrugSniper, which exploits loss-of-function experiments to model the sensitivity of 6237 inhibitors and predict their corresponding biomarkers of sensitivity in 30 tumor types. Applying DrugSniper to small cell lung cancer (SCLC), we identified genes extensively explored in SCLC, such as Aurora kinases or epigenetic agents. Interestingly, the analysis suggested a remarkable vulnerability to polo-like kinase 1 (PLK1) inhibition in CREBBP-mutant SCLC cells. We validated this association in vitro using four mutated and four wild-type SCLC cell lines and two PLK1 inhibitors (Volasertib and BI2536), confirming that the effect of PLK1 inhibitors depended on the mutational status of CREBBP. In addition, DrugSniper was validated in silico with several known clinically used treatments, including the sensitivity of Tyrosine Kinase Inhibitors (TKIs) and Vemurafenib to FLT3 and BRAF mutant cells, respectively. These findings show the potential of genome-wide loss-of-function screens to identify new personalized therapeutic hypotheses in SCLC and potentially in other tumors, which is a valuable starting point for further drug development and drug repositioning projects.
Authors: Carazo Melo, Fernando; Romero Riojas, Juan Pablo; Rubio Díaz-Cordoves, Ángel (Corresponding author)
ISSN 1467-5463  Vol. 20  No. 4  2019  pp. 1358 - 1375
Alternative splicing (AS) has been shown to play a pivotal role in the development of diseases, including cancer. Specifically, all the hallmarks of cancer (angiogenesis, cell immortality, avoiding immune system response, etc.) are found to have a counterpart in aberrant splicing of key genes. Identifying the context-specific regulators of splicing provides valuable information to find new biomarkers, as well as to define alternative therapeutic strategies. The computational models to identify these regulators are not trivial and require three conceptual steps: the detection of AS events, the identification of splicing factors that potentially regulate these events and the contextualization of these pieces of information for a specific experiment. In this work, we review the different algorithmic methodologies developed for each of these tasks. The main weaknesses and strengths of the different steps of the pipeline are discussed. Finally, a case study is detailed to make the reader aware of the potential and limitations of this computational approach.
Authors: Carazo Melo, Fernando; Gimeno, M.; Ferrer-Bonsoms, J. A.; et al.
ISSN 1471-2164  Vol. 20  Art. No. 521  2019
Background: Splicing is a genetic process that has important implications in several diseases including cancer. Deciphering the complex rules of splicing regulation is crucial to understand and treat splicing-related diseases. Splicing factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have a priori hypotheses, as a single CLIP experiment targets a single protein. Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows the strong coherence between the binding sites of RBPs of similar families. Augmenting this information with expression changes, we are able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific splicing factors are knocked down. Conclusions: The methodology presented in this study allows the prediction of active splicing factors in either cancer or any other condition by only using the information of transcript expression. This approach opens a wide range of possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and databases is available at
Authors: Carazo Melo, Fernando; Campuzano, L.; Cendoya Garmendia, Xabier; et al.
ISSN 2047-217X  Vol. 8  No. 4  2019
BACKGROUND: Aberrant alternative splicing plays a key role in cancer development. In recent years, alternative splicing has been used as a prognosis biomarker, a therapy response biomarker, and even as a therapeutic target. Next-generation RNA sequencing has an unprecedented potential to measure the transcriptome. However, due to the complexity of dealing with isoforms, the scientific community has not sufficiently exploited this valuable resource in precision medicine. FINDINGS: We present TranscriptAchilles, the first large-scale tool to predict transcript biomarkers associated with gene essentiality in cancer. This application integrates 412 loss-of-function RNA interference screens of >17,000 genes, together with their corresponding whole-transcriptome expression profiling. Using this tool, we have studied which are the cancer subtypes for which alternative splicing plays a significant role to state gene essentiality. In addition, we include a case study of renal cell carcinoma that shows the biological soundness of the results. The databases, the source code, and a guide to build the platform within a Docker container are available at GitLab. The application is also available online. CONCLUSIONS: TranscriptAchilles provides a user-friendly web interface to identify transcript or gene biomarkers of gene essentiality, which could be used as a starting point for a drug development project. This approach opens a wide range of translational applications in cancer.
Authors: Carazo Melo, Fernando; San José Enériz, Edurne; Garate Iturriagagoitia, Leire; et al.
Journal: BLOOD
ISSN 0006-4971  Vol. 134  No. Suppl. 1  2019
Authors: Romero Riojas, Juan Pablo; Ortiz-Estevez, M.; Muniategui Merino, Ander; et al.
ISSN 1471-2164  Vol. 19  No. 703  2018
Background: RNA-seq is a reference technology for determining alternative splicing at genome-wide level. Exon arrays remain widely used for the analysis of gene expression, but show poor validation rate with regard to splicing events. Commercial arrays that include probes within exon junctions have been developed in order to overcome this problem. We compare the performance of RNA-seq (Illumina HiSeq) and junction arrays (Affymetrix Human Transcriptome array) for the analysis of transcript splicing events. Three different breast cancer cell lines were treated with CX-4945, a drug that severely affects splicing. To enable a direct comparison of the two platforms, we adapted EventPointer, an algorithm that detects and labels alternative splicing events using junction arrays, to work also on RNA-seq data. Common results and discrepancies between the technologies were validated and/or resolved by over 200 PCR experiments. Results: As might be expected, RNA-seq appears superior in cases where the technologies disagree and is able to discover novel splicing events beyond the limitations of physical probe-sets. We observe a high degree of coherence between the two technologies, however, with correlation of EventPointer results over 0.90. Through decimation, the detection power of the junction arrays is equivalent to RNA-seq with up to 60 million reads. Conclusions: Our results suggest, therefore, that exon-junction arrays are a viable alternative to RNA-seq for detection of alternative splicing events when focusing on well-described transcriptional regions.
Authors: Aldave, G.; González Huarriz, María Soledad; Rubio Díaz-Cordoves, Ángel; et al.
ISSN 1522-8517  Vol. 20  No. 7  2018  pp. 930 - 941
Background: Glioblastoma, the most aggressive primary brain tumor, is genetically heterogeneous. Alternative splicing (AS) plays a key role in numerous pathologies, including cancer. The objectives of our study were to determine whether aberrant AS could play a role in the malignant phenotype of glioma and to understand the mechanism underlying its aberrant regulation. Methods: We obtained surgical samples from patients with glioblastoma who underwent 5-aminolevulinic fluorescence-guided surgery. Biopsies were taken from the tumor center as well as from adjacent normal-appearing tissue. We used a global splicing array to identify candidate genes aberrantly spliced in these glioblastoma samples. Mechanistic and functional studies were performed to elucidate the role of our top candidate splice variant, BAF45d, in glioblastoma. Results: BAF45d is part of the switch/sucrose nonfermentable complex and plays a key role in the development of the CNS. The BAF45d/6A isoform is present in 85% of over 200 glioma samples that have been analyzed and contributes to the malignant glioma phenotype through the maintenance of an undifferentiated cellular state. We demonstrate that BAF45d splicing is mediated by polypyrimidine tract-binding protein 1 (PTBP1) and that BAF45d regulates PTBP1, uncovering a reciprocal interplay between RNA splicing regulation and transcription. Conclusions: Our data indicate that AS is a mechanism that contributes to the malignant phenotype of glioblastoma. Understanding the consequences of this biological process will uncover new therapeutic targets for this devastating disease.
Authors: Carazo Melo, Fernando; Romero Riojas, Juan Pablo; Rubio Díaz-Cordoves, Ángel (Corresponding author)
ISSN 1477-4054  2018
Alternative splicing (AS) has been shown to play a pivotal role in the development of diseases, including cancer. Specifically, all the hallmarks of cancer (angiogenesis, cell immortality, avoiding immune system response, etc.) are found to have a counterpart in aberrant splicing of key genes. Identifying the context-specific regulators of splicing provides valuable information to find new biomarkers, as well as to define alternative therapeutic strategies. The computational models to identify these regulators are not trivial and require three conceptual steps: the detection of AS events, the identification of splicing factors that potentially regulate these events and the contextualization of these pieces of information for a specific experiment. In this work, we review the different algorithmic methodologies developed for each of these tasks. The main weaknesses and strengths of the different steps of the pipeline are discussed. Finally, a case study is detailed to make the reader aware of the potential and limitations of this computational approach.
Authors: Pey Pérez, Jon; San José Enériz, Edurne; Ochoa Nieto, Maria del Carmen; et al.
ISSN 2045-2322  Vol. 7  2017 
Constraint-based modeling for genome-scale metabolic networks has emerged in the last years as a promising approach to elucidate drug targets in cancer. Beyond the canonical biosynthetic routes to produce biomass, it is of key importance to focus on metabolic routes that sustain the proliferative capacity through the regulation of other biological means in order to improve in-silico gene essentiality analyses. Polyamines are polycations with central roles in cancer cell proliferation, through the regulation of transcription and translation among other things, but are typically neglected in in silico cancer metabolic models. In this study, we analysed essential genes for the biosynthesis of polyamines. Our analysis corroborates the importance of previously known regulators of the pathway, such as Adenosylmethionine Decarboxylase 1 (AMD1) and uncovers novel enzymes predicted to be relevant for polyamine homeostasis. We focused on Adenine Phosphoribosyltransferase (APRT) and demonstrated the detrimental consequence of APRT gene silencing on different leukaemia cell lines. Our results highlight the importance of revisiting the metabolic models used for in-silico gene essentiality analyses in order to maximize the potential for drug target identification in cancer.
Authors: Garcia, I.; Aldaregia, J.; Vicentic, J. M.; et al.
ISSN 2045-2322  Vol. 7  2017 
Glioblastoma remains the most common and deadliest type of brain tumor and contains a population of self-renewing, highly tumorigenic glioma stem cells (GSCs), which contributes to tumor initiation and treatment resistance. Developmental programs participating in tissue development and homeostasis re-emerge in GSCs, supporting the development and progression of glioblastoma. SOX1 plays an important role in neural development and neural progenitor pool maintenance. Its impact on glioblastoma remains largely unknown. In this study, we have found that high levels of SOX1 observed in a subset of patients correlate with lower overall survival. At the cellular level, SOX1 expression is elevated in patient-derived GSCs and it is also higher in oncosphere culture compared to differentiation conditions in conventional glioblastoma cell lines. Moreover, genetic inhibition of SOX1 in patient-derived GSCs and conventional cell lines decreases self-renewal and proliferative capacity in vitro and tumor initiation and growth in vivo. Contrarily, SOX1 over-expression moderately promotes self-renewal and proliferation in GSCs. These functions seem to be independent of its activity as Wnt/beta-catenin signaling regulator. In summary, these results identify a functional role for SOX1 in regulating glioma cell heterogeneity and plasticity, and suggest SOX1 as a potential target in the GSC population in glioblastoma.
Authors: de Miguel Sánchez de Puerta, Fernando; Pajares Villandiego, María José; Martínez Terroba, Elena; et al.
ISSN 1574-7891  Vol. 10  No. 9  2016  pp. 1437 - 1449
Increasing interest has been devoted in recent years to the understanding of alternative splicing in cancer. In this study, we performed a genome-wide analysis to identify cancer-associated splice variants in non-small cell lung cancer. We discovered and validated novel differences in the splicing of genes known to be relevant to lung cancer biology, such as NFIB, ENAH or SPAG9. Gene enrichment analyses revealed an important contribution of alternative splicing to cancer-related molecular functions, especially those involved in cytoskeletal dynamics. Interestingly, a substantial fraction of the altered genes found in our analysis were targets of the protein quaking (QKI), pointing to this factor as one of the most relevant regulators of alternative splicing in non-small cell lung cancer. We also found that ESYT2, one of the QKI targets, is involved in cytoskeletal organization. ESYT2-short variant inhibition in lung cancer cells resulted in a cortical distribution of actin whereas inhibition of the long variant caused an increase of endocytosis, suggesting that the cancer-associated splicing pattern of ESYT2 has a profound impact in the biology of cancer cells. Finally, we show that low nuclear QKI expression in non-small cell lung cancer is an independent prognostic factor for disease-free survival (HR = 2.47; 95% CI = 1.11-5.46, P = 0.026). In conclusion, we identified several splicing variants with functional relevance in lung cancer largely regulated by the splicing factor QKI, a tumor suppressor associated with prognosis in lung cancer.
Authors: Romero, J. P.; Muniategui Merino, Ander; de Miguel Sánchez de Puerta, Fernando; et al.
ISSN 1471-2164  Vol. 17  2016  pp. 467
Background: Alternative splicing (AS) is a major source of variability in the transcriptome of eukaryotes. There is an increasing interest in its role in different pathologies. Before sequencing technology appeared, AS was measured with specific arrays. However, these arrays did not perform well in the detection of AS events and yielded very large false discovery rates (FDR). Recently the Human Transcriptome Array 2.0 (HTA 2.0) has been deployed. It includes junction probes. However, the interpretation software provided by its vendor (TAC 3.0) does not fully exploit its potential (it does not jointly study the exons and junctions involved in a splicing event) and can only be applied to case-control studies. New statistical algorithms and software must be developed in order to exploit the HTA 2.0 array for event detection. Results: We have developed EventPointer, an R package (built under the aroma.affymetrix framework) to search and analyze alternative splicing events using HTA 2.0 arrays. This software uses a linear model that broadens its application from plain case-control studies to complex experimental designs. Given the CEL files and the design and contrast matrices, the software retrieves a list of all the detected events indicating: 1) the type of event (exon cassette, alternative 3', etc.), 2) its fold change and its statistical significance, and 3) the potential protein domains affected by the AS events and the statistical significance of the possible enrichment. Our tests have shown that EventPointer has an extremely low FDR value (only 1 false positive within the tested top-200 events). This software is publicly available and it has been uploaded to GitHub. Conclusions: This software empowers the HTA 2.0 arrays for AS event detection as an alternative to RNA-seq: simplifying considerably the required analysis, speeding it up and reducing the required computational power.
Authors: Rezola Urquía, Alberto; Pey Pérez, Jon; Tobalina, L.; et al.
ISSN 1467-5463  Vol. 16  No. 2  2015  pp. 265 - 279
With the emergence of metabolic networks, novel mathematical pathway concepts were introduced in the past decade, aiming to go beyond canonical maps. However, the use of network-based pathways to interpret 'omics' data has been limited owing to the fact that their computation has, until very recently, been infeasible in large (genome-scale) metabolic networks. In this review article, we describe the progress made in the past few years in the field of network-based metabolic pathway analysis. In particular, we review in detail novel optimization techniques to compute elementary flux modes, an important pathway concept in this field. In addition, we summarize approaches for the integration of metabolic pathways with gene expression data, discussing recent advances using network-based pathway concepts.
Authors: Aramburu, A.; Zudaire Ripa, María Isabel; Pajares Villandiego, María José; et al.
ISSN 1471-2164  Vol. 16  2015  pp. 752
Background: The development of a more refined prognostic methodology for early non-small cell lung cancer (NSCLC) is an unmet clinical need. An accurate prognostic tool might help to select patients at early stages for adjuvant therapies. Results: A new integrated bioinformatics searching strategy, that combines gene copy number alterations and expression, together with clinical parameters was applied to derive two prognostic genomic signatures. The proposed methodology combines data from patients with and without clinical data with a priori information on the ability of a gene to be a prognostic marker. Two initial candidate sets of 513 and 150 genes for lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC), respectively, were generated by identifying genes which have both: a) significant correlation between copy number and gene expression, and b) significant prognostic value at the gene expression level in external databases. From these candidates, two panels of 7 (ADC) and 5 (SCC) genes were further identified via semi-supervised learning. These panels, together with clinical data (stage, age and sex), were used to construct the ADC and SCC hazard scores combining clinical and genomic data. The signatures were validated in two independent datasets (n = 73 for ADC, n = 97 for SCC), confirming that the prognostic value of both clinical-genomic models is robust, statistically significant (P = 0.008 for ADC and P = 0.019 for SCC) and outperforms both the clinical models (P = 0.060 for ADC and P = 0.121 for SCC) and the genomic models applied separately (P = 0.350 for ADC and P = 0.269 for SCC). Conclusion: The present work provides a methodology to generate a robust signature using copy number data that can be potentially used to any cancer. Using it, we found new prognostic scores based on tumor DNA that, jointly with clinical information, are able to predict overall survival (OS) in patients with early-stage ADC and SCC.
Authors: Rubio Díaz-Cordoves, Ángel; de Villar, F.
ISSN 2073-4859  Vol. 7  No. 2  2015  pp. 275 - 287
Code analysis tools are crucial to understand program behavior. Profile tools use the results of time measurements in the execution of a program to gain this understanding and thus help in the optimization of the code. In this paper, we review the different available packages to profile R code and show the advantages and disadvantages of each of them. In addition, we present GUIProfiler, a package that fulfills some unmet needs. Package GUIProfiler generates an HTML report with the timing for each code line and the relationships between different functions. This package mimics the behavior of the MATLAB profiler. The HTML report includes information on the time spent on each of the lines of the profiled code (the slowest code is highlighted). If the package is used within the RStudio environment, the user can navigate across the bottlenecks in the code and open the editor to modify the lines of code where more time is spent. It is also possible to edit the code using Notepad++ (a free editor for Windows) by simply clicking on the corresponding line. The graphical user interface makes it easy to identify the specific lines which slow down the code. The integration in RStudio and the generation of an HTML report make GUIProfiler a very convenient tool to perform code optimization.
Authors: Rezola Urquía, Alberto; Pey Pérez, Jon; Rubio Díaz-Cordoves, Ángel; et al.
Journal: PLOS ONE
ISSN 1932-6203  Vol. 9  No. 8  2014
Metabolism expresses the phenotype of living cells and understanding it is crucial for different applications in biotechnology and health. With the increasing availability of metabolomic, proteomic and, to a larger extent, transcriptomic data, the elucidation of specific metabolic properties in different scenarios and cell types is a key topic in systems biology. Despite the potential of the elementary flux mode (EFM) concept for this purpose, its use has been limited so far, mainly because the computation of EFMs has been infeasible for genome-scale metabolic networks. In a recent work, we determined a subset of EFMs in human metabolism and proposed a new protocol to integrate gene expression data, spotting key 'characteristic EFMs' in different scenarios. Our approach was successfully applied to identify metabolic differences among several healthy human tissues. In this article, we evaluated the performance of our approach in a clinically interesting situation. In particular, we identified key EFMs and metabolites in adenocarcinoma and squamous-cell carcinoma subtypes of non-small cell lung cancers. Results are consistent with previous knowledge of these major subtypes of lung cancer in the medical literature. Therefore, this work constitutes the starting point to establish a new methodology that could help distinguish key metabolic processes among different clinical outcomes.
Authors: Mesa Helguera, Iker; Rubio Díaz-Cordoves, Ángel; de Nó Lengaran, Joaquín; et al.
ISSN 0957-4174  Vol. 41  No. 11  2014  pp. 5190 - 5200
The objective of this research is to select a reduced group of surface electromyographic (sEMG) channels and signal-features that is able to provide an accurate classification rate in a myoelectric control system for any user. To that end, the location of 32 sEMG electrodes placed around-along the forearm and 86 signal-features are evaluated simultaneously in a static-hand gesture classification task (14 different gestures). A novel multivariate variable selection filter method named mRMR-FCO is presented as part of the selection process. This process finds the most informative and least redundant combination of sEMG channels and signal-features among all the possible ones. The performance of the selected set of channels and signal-features is evaluated with a Support Vector Machine classifier. (C) 2014 Elsevier Ltd. All rights reserved.
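The selection idea in the abstract above (most informative, least redundant variables) can be illustrated with plain greedy mRMR. This is a minimal sketch, not the mRMR-FCO method of the paper, whose details the abstract does not give. It assumes already discretized features, and the names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR (generic variant, assumed here for illustration).

    At each step, pick the feature whose relevance to the labels minus its
    mean redundancy with the already selected features is highest.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:

        def score(i):
            if not selected:
                return relevance[i]
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy

        best = max(remaining, key=score)  # ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is complementary.
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
labels = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr_select(features, labels, 2))  # → [0, 2]
```

In the toy data the duplicated feature is skipped because its redundancy with the first pick cancels its relevance. Real sEMG signal-features are continuous, so a binning step or a continuous mutual information estimator would be needed before a sketch like this applies.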
Authors: de Miguel Sánchez de Puerta, Fernando; Sharma, Ravi Datta; Pajares Villandiego, María José; et al.
ISSN 0008-5472  Vol. 74  No. 4  2014  pp. 1105 - 1115
Abnormal alternative splicing has been associated with cancer. Genome-wide microarrays can be used to detect differential splicing events. In this study, we have developed ExonPointer, an algorithm that uses data from exon and junction probes to identify annotated cassette exons. We used the algorithm to profile differential splicing events in lung adenocarcinoma A549 cells after downregulation of the oncogenic serine/arginine-rich splicing factor 1 (SRSF1). Data were generated using two different microarray platforms. The PCR-based validation rate of the top 20 ranked genes was 60% and 100%. Functional enrichment analyses found a substantial number of splicing events in genes related to RNA metabolism. These analyses also identified genes associated with cancer and developmental and hereditary disorders, as well as biologic processes such as cell division, apoptosis, and proliferation. Most of the top 20 ranked genes were validated in other adenocarcinoma and squamous cell lung cancer cells, with validation rates of 80% to 95% and 70% to 75%, respectively. Moreover, the analysis allowed us to identify four genes, ATP11C, IQCB1, TUBD1, and proline-rich coiled-coil 2C (PRRC2C), with a significantly different pattern of alternative splicing in primary non-small cell lung tumors compared with normal lung tissue. In the case of PRRC2C, SRSF1 downregulation led to the skipping of an exon overexpressed in primary lung tumors. Specific siRNA downregulation of the exon-containing var
Authors: Tabas-Madrid, D.; Muniategui Merino, Ander; Sanchez-Caballero, I.; et al.
ISSN 1471-2164  Vol. 15  No. Suppl 10:S2  2014
Background: MicroRNAs are short RNA molecules that post-transcriptionally regulate gene expression. Today, microRNA target prediction remains challenging since very few have been experimentally validated and sequence-based predictions have large numbers of false positives. Furthermore, due to the different measuring rules used in each database of predicted interactions, the selection of the most reliable ones requires extensive knowledge about each algorithm. Results: Here we propose two methods to measure the confidence of predicted interactions based on experimentally validated information. The output of the methods is a combined database where new scores and statistical confidences are re-assigned to each predicted interaction. The new scores allow the robust combination of several databases without the effect of low-performing algorithms dragging down good-performing ones. The combined databases obtained using both algorithms described in this paper outperform each of the existing predictive algorithms that were considered for the combination. Conclusions: Our approaches are a useful way to integrate predicted interactions from different databases. They reduce the selection of interactions to a unique database based on an intuitive score and allow comparing databases between them.
Authors: Muniategui Merino, Ander; Pey Pérez, Jon; Planes Pedreño, Francisco Javier; et al.
ISSN 1467-5463  Vol. 14  No. 3  2013  pp. 263 - 278
miRNAs are small RNA molecules (~22 nt) that interact with their target mRNAs, inhibiting translation and/or cleaving the target mRNA. This interaction is guided by sequence complementarity and results in the reduction of mRNA and/or protein levels. miRNAs are involved in key biological processes and different diseases. Therefore, deciphering miRNA targets is crucial for diagnostics and therapeutics. However, miRNA regulatory mechanisms are complex and there is still no high-throughput and low-cost miRNA target screening technique. In recent years, several computational methods based on sequence complementarity of the miRNA and the mRNAs have been developed. However, the interactions predicted by these computational methods are inconsistent and the expected false positive rates are still large. Recently, it has been proposed to use the expression values of miRNAs and mRNAs (and/or proteins) to refine the results of sequence-based putative targets for a particular experiment. These methods have been shown to be effective at identifying the most prominent interactions from the databases of putative targets. Here, we review these methods that combine both expression and sequence-based putative targets to predict miRNA targets.
Authors: Rezola Urquía, Alberto; Pey Pérez, Jon; de Figueiredo, L.F.; et al.
ISSN 1367-4803  Vol. 29  No. 16  2013  pp. 2009 - 2016
Motivation: The analysis of high-throughput molecular data in the context of metabolic pathways is essential to uncover their underlying functional structure. Among different metabolic pathway concepts in systems biology, elementary flux modes (EFMs) hold a predominant place, as they naturally capture the complexity and plasticity of cellular metabolism and go beyond predefined metabolic maps. However, their use to interpret high-throughput data has been limited so far, mainly because their computation in genome-scale metabolic networks has been unfeasible. To face this issue, different optimization-based techniques have been recently introduced and their application to human metabolism is promising. Results: In this article, we exploit and generalize the K-shortest EFM algorithm to determine a subset of EFMs in a human genome-scale metabolic network. This subset of EFMs involves a wide number of reported human metabolic pathways, as well as potential novel routes, and constitutes a valuable database where high-throughput data can be mapped and contextualized from a metabolic perspective. To illustrate this, we took expression data of 10 healthy human tissues from a previous study and predicted their characteristic EFMs based on enrichment analysis. We used a multivariate hypergeometric test and showed that it leads to more biologically meaningful results than standard hypergeometric. Finally, a biological discussion on the characteristic EFMs obtained in liver is conducted, finding a high level of agreement when compared with the literature.
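For context on the enrichment analysis mentioned in the abstract above, the following is a minimal sketch of the standard one-sided hypergeometric test that serves as the baseline there. It is not the multivariate hypergeometric test the article proposes. The function name and the mapping of its parameters to genes and EFMs are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """Return P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    Assumed mapping (hypothetical):
      population: number of genes in the metabolic network
      successes:  number of those genes that are upregulated in the tissue
      draws:      number of genes involved in one EFM
      observed:   number of upregulated genes found in that EFM
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes, 4 upregulated, an EFM with 3 genes, 2 of them upregulated.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
print(p_value)
```

With these toy numbers the tail probability is 40/120, i.e. one third. A small p-value would suggest that the EFM contains more upregulated genes than expected by chance.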
Autores: Pey Pérez, Jon; Valgepea, K.; Rubio Díaz-Cordoves, Ángel; et al.
ISSN 1752-0509  Vol. 7  Nº 134  2013 
Background: The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue with this progress, it is essential to efficiently integrate experimental data into metabolic modeling. Results: We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and influence obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influences the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and real scenario. Finally, we apply this approach to find novel pathways relevant in the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets which could be relevant for better understanding of the phenomenon leading to impaired acetate overflow are proposed. Conclusions: A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
Autores: Pey Pérez, Jon; Rubio Díaz-Cordoves, Ángel; Theodoropoulos, C.; et al.
ISSN 1096-7176  Vol. 14  Nº 4  2012  págs. 344 - 353
Constraints-based modeling is an emergent area in Systems Biology that includes an increasing set of methods for the analysis of metabolic networks. In order to refine its predictions, the development of novel methods integrating high-throughput experimental data is currently a key challenge in the field. In this paper, we present a novel set of constraints that integrate tracer-based metabolomics data from Isotope Labeling Experiments and metabolic fluxes in a linear fashion. These constraints are based on Elementary Carbon Modes (ECMs), a recently developed concept that generalizes Elementary Flux Modes at the carbon level. To illustrate the effect of our ECMs-based constraints, a Flux Variability Analysis approach was applied to a previously published metabolic network involving the main pathways in the metabolism of glucose. The addition of our ECMs-based constraints substantially reduced the under-determination resulting from a standard application of Flux Variability Analysis, which shows a clear progress over the state of the art. In addition, our approach is adjusted to deal with combinatorial explosion of ECMs in genome-scale metabolic networks. This extension was applied to infer the maximum biosynthetic capacity of non-essential amino acids in human metabolism. Finally, as linearity is the hallmark of our approach, its importance is discussed at a methodological, computational and theoretical level and illustrated with a practical application in the field of Isotope Labeling Experiments. (C) 2012 Elsevier Inc. All rights reserved.
Autores: Ortiz Estévez, María; Aramburu, A.; Bengtsson, H.; et al.
ISSN 1367-4803  Vol. 28  Nº 13  2012  págs. 1793 - 1794
CalMaTe calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g., Affymetrix and Illumina.
Autores: Ortiz Estévez, María; Aramburu, A.; Rubio Díaz-Cordoves, Ángel (Autor de correspondencia)
ISSN 1748-7188  Vol. 7  Nº 19  2012 
Background: The selection of the reference to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated using control samples included in the study. However, these control samples are not always available and, in these cases, an artificial reference must be created. A proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples, minimizing possible batch effects. Results: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) have been analyzed. It is shown that, using only tumoral samples, NSA is able to remove the bias in the copy number estimation, to reduce the noise and, therefore, to increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state-of-the-art methods. Conclusions: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps to detect recurrent CNAs better than current methods. The automatic selection of references makes it useful for performing bulk analysis of many GEO or ArrayExpress experiments without the need to develop a parser to find the normal samples or possible batches within the data.
The method is available in the open-source R package NSA, which is an add-on to the framework.
Autores: Muniategui Merino, Ander; Nogales-Cadenas, R.; Vazquez, M.; et al.
Revista: PLOS ONE
ISSN 1932-6203  Vol. 7  Nº 2  2012  págs. e30766
miRNAs are small RNA molecules (~22 nt) that interact with their corresponding target mRNAs, inhibiting the translation of the mRNA into proteins and cleaving the target mRNA. This second effect diminishes the overall expression of the target mRNA. Several miRNA-mRNA relationship databases have been deployed, most of them based on sequence complementarity. However, the number of false positives in these databases is large and they do not overlap completely. Recently, it has been proposed to combine expression measurements from both miRNA and mRNA with sequence-based predictions to achieve more accurate relationships. In our work, we use LASSO regression with non-positive constraints to integrate both sources of information. LASSO enforces the sparseness of the solution and the non-positive constraints restrict the search of miRNA targets to those with down-regulation effects on the mRNA expression. We named this method TaLasso (miRNA-Target LASSO). We used TaLasso on two public datasets that have paired expression levels of human miRNAs and mRNAs. The top-ranked interactions recovered by TaLasso are especially enriched (more than using any other algorithm) in experimentally validated targets. The functions of the genes with mRNA transcripts in the top-ranked interactions are meaningful. This is not the case using other algorithms. TaLasso is available as Matlab or R code. There is also a web-based tool for human miRNAs at
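To make the idea of "LASSO regression with non-positive constraints" concrete, here is a minimal coordinate-descent sketch for a single mRNA. It is not the TaLasso implementation: the function name, the plain-list data layout, the fixed number of sweeps and the toy data are all assumptions for illustration, and the restriction to sequence-based putative targets is omitted. The sketch minimizes 0.5 * ||y - X b||^2 + lam * sum(|b_j|) subject to b_j <= 0.

```python
def nonpositive_lasso(X, y, lam, n_sweeps=100):
    """Coordinate descent for LASSO with all coefficients constrained to be <= 0.

    X:   list of rows, one per sample; one column per miRNA (hypothetical layout)
    y:   expression of one mRNA across the same samples
    lam: L1 penalty weight (larger values give sparser solutions)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual that ignores beta[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            # For beta_j <= 0 the penalty is -lam * beta_j, so the unconstrained
            # optimum is (rho + lam) / col_sq; clip it at zero from above.
            new_beta = min(0.0, (rho + lam) / col_sq[j])
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Toy data: the mRNA goes down with miRNA 1 and up with miRNA 2.
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [-1.0, 3.0, -3.0, 1.0]
    print(nonpositive_lasso(X, y, lam=0.4))
```

In the toy data the second miRNA is positively associated with the mRNA, so the constraint sets its coefficient to zero, while the first keeps a negative coefficient that the penalty shrinks slightly toward zero.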
Autores: Alonso Roldán, Marta María (Autor de correspondencia); Diez Valle, Ricardo; Manterola Careaga, Lorea; et al.
Revista: PLoS One
ISSN 1932-6203  Vol. 6  Nº 11  2011
We undertook this study to understand how the transcription factor Sox2 contributes to the malignant phenotype of glioblastoma multiforme (GBM), the most aggressive primary brain tumor. We initially looked for unbalanced genomic rearrangements in the Sox2 locus in 42 GBM samples and found that Sox2 was amplified in 11.5% and overexpressed in all the samples. These results prompted us to further investigate the mechanisms involved in Sox2 overexpression in GBM. We analyzed the methylation status of the Sox2 promoter because high CpG density promoters are associated with key developmental genes. The Sox2 promoter presented a CpG island that was hypomethylated in all the patient samples when compared to normal cell lines. Treatment of Sox2-negative glioma cell lines with 5-azacitidine resulted in the re-expression of Sox2 and in a change in the methylation status of the Sox2 promoter. We further confirmed these results by analyzing data from GBM cases generated by The Cancer Genome Atlas project. We observed Sox2 overexpression (86%; N = 414), Sox2 gene amplification (8.5%; N = 492), and Sox2 promoter hypomethylation (100%; N = 258), suggesting the relevance of this factor in the malignant phenotype of GBMs. To further explore the role of Sox2, we performed in vitro analysis with brain tumor stem cells (BTSCs) and established glioma cell lines. Downmodulation of Sox2 in BTSCs resulted in the loss of their self-renewal properties. Surprisingly, ectopic expression of Sox2 in esta
Autores: Ortiz Estévez, María; De las Rivas, J.; Fontanillo, C.; et al.
ISSN 0888-7543  Vol. 97  Nº 2  2011  págs. 86 - 93
DNA copy number aberrations (CNAs) are genetic alterations common in cancer cells. Their transcriptional consequences are still poorly understood. Based on the fact that DNA copy number (CN) is highly correlated with the genomic position, we have applied a segmentation algorithm to gene expression (GE) to explore its relation with CN. We have found a strong correlation between segmented CN (sCN) and segmented GE (sGE), corroborating that CNAs have clear effects on genome-wide expression. We have found that most of the recurrent regions of sGE coincide with those obtained from sCN analysis. Results for two cancer datasets confirm the known targets of aberrations and provide new candidates to study. The suggested methodology makes it possible to find recurrent aberrations specific to sGE, revealing loci where the expression of the genes is independent of their CNs. R code and additional files are available as supplementary material. (C) 2010 Elsevier Inc. All rights reserved.
Autores: Pey Pérez, Jon; Theodoropoulos, C.; Rezola Urquía, Alberto; et al.
ISSN 0303-2647  Vol. 105  Nº 2  2011  págs. 140 - 146
The elementary flux modes (EFMs) approach is an efficient computational tool to predict novel metabolic pathways. Elucidating the physiological relevance of EFMs in a particular cellular state is still an open challenge. Different methods have been presented to carry out this task. However, these methods typically use little experimental data, exploiting methodologies where an a priori optimization function is used to deal with the indetermination underlying metabolic networks. Available "omics" data represent an opportunity to refine current methods. In this article we discuss whether (or not) metabolomics data from isotope labeling experiments (ILEs) and EFMs can be integrated into a linear system of equations. Aside from refining current approaches to infer the physiological relevance of EFMs, this question is important for the integration of metabolomics data from ILEs into metabolic networks, which generally involve non-linear relationships. As a result of our analysis, we concluded that in general the concept of EFMs needs to be redefined at the atomic level for the modeling of ILEs. For this purpose, the concept of Elementary Carbon Modes (ECMs) is introduced. (C) 2011 Elsevier Ireland Ltd. All rights reserved.
Autores: Pio Osés, Rubén (Autor de correspondencia); Blanco Barrenechea, David; Pajares Villandiego, María José; et al.
ISSN 1471-2164  Vol. 11  2010  págs. 352
Background: Microarray strategies, which allow for the characterization of thousands of alternative splice forms in a single test, can be applied to identify differential alternative splicing events. In this study, a novel splice array approach was developed, including the design of a high-density oligonucleotide array, a labeling procedure, and an algorithm to identify splice events. Results: The array consisted of exon probes and thermodynamically balanced junction probes. Suboptimal probes were tagged and considered in the final analysis. An unbiased labeling protocol was developed using random primers. The algorithm used to distinguish changes in expression from changes in splicing was calibrated using internal non-spliced control sequences. The performance of this splice array was validated with artificial constructs for CDC6, VEGF, and PCBP4 isoforms. The platform was then applied to the analysis of differential splice forms in lung cancer samples compared to matched normal lung tissue. Overexpression of splice isoforms was identified for genes encoding CEACAM1, FHL-1, MLPH, and SUSD2. None of these splicing isoforms had been previously associated with lung cancer. Conclusions: This methodology enables the detection of alternative splicing events in complex biological samples, providing a powerful tool to identify novel diagnostic and prognostic biomarkers for cancer and other pathologies.
Autores: Guruceaga Martínez, Elisabet; Segura Ruiz, Victor; Corrales Izquierdo, Fernando José; et al.
ISSN 1064-3745  Vol. 593  2010  págs. 157 - 174
High-throughput gene expression technologies based on DNA microarrays allow the examination of biological systems. However, the interpretation of the complex molecular descriptions generated by these approaches is still challenging. The development of new methodologies to identify common regulatory mechanisms involved in the control of the expression of a set of co-expressed genes might enhance our capacity to extract functional information from genomic data sets. In this chapter, we describe a method that integrates different sources of information: gene expression data, genome sequence information, described transcription factor binding sites (TFBSs), functional information, and bibliographic data. The starting point of the analysis is the extraction of promoter sequences from a whole genome and the detection of TFBSs in each gene promoter. This information allows the identification of enriched TFBSs in the proximal promoter of differentially expressed genes. The functional and bibliographic interpretation of the results improves our biological insight into the regulatory mechanisms involved in a microarray experiment.
Autores: Ortiz Estévez, María; Bengtsson, H.; Rubio Díaz-Cordoves, Ángel (Autor de correspondencia)
ISSN 1367-4803  Vol. 26  Nº 15  2010  págs. 1827 - 1833
Motivation: Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods are focused on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs. Results: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays dealing directly with the cross hybridization between probes within SNP probesets. This algorithm outperforms (or at least it performs as well as) other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and it also gives more precise allele-specific CNs.
Autores: Antón González, Miguel Ángel; Aramburu Ibarlucea, Amaya; Rubio Díaz-Cordoves, Ángel (Autor de correspondencia)
ISSN 1471-2105  Vol. 11  2010 
Background: Exon arrays provide a way to measure the expression of different isoforms of genes in an organism. Most of the procedures to deal with these arrays are focused on gene expression or on exon expression. Although the only biological analytes that can be properly assigned a concentration are transcripts, there are very few algorithms that focus on them. The reason is that previously developed summarization methods do not work well if applied to transcripts. In addition, gene structure prediction, i.e., the correspondence between probes and novel isoforms, is a field which is still unexplored. Results: We have modified and adapted a previous algorithm to take advantage of the special characteristics of the Affymetrix exon arrays. The structure and concentration of transcripts (some of them possibly unknown) in microarray experiments were predicted using this algorithm. Simulations showed that the suggested modifications improved both specificity (SP) and sensitivity (ST) of the predictions. The algorithm was also applied to different real datasets, showing its effectiveness and its concordance with PCR-validated results. Conclusions: The proposed algorithm shows a substantial improvement in performance over the previous version. This improvement is mainly due to the exploitation of the redundancy of the Affymetrix exon arrays. An R package of SPACE with the updated algorithms has been developed and is freely available.
Autores: Martínez Climent, José Ángel; Fontan, L.; Fresquet Arnau, Vicente José; et al.
Libro:  Methods in Molecular Biology (MIMB) Cancer Gene Profiling
Vol. 576  2010  págs. 231 - 277
During the last decade, gene expression microarrays and array-based comparative genomic hybridization (array-CGH) have unraveled the complexity of human tumor genomes more precisely and comprehensively than ever before. More recently, the simultaneous assessment of global changes in messenger RNA (mRNA) expression and in DNA copy number through "integrative oncogenomic" analyses has allowed researchers the access to results uncovered through the analysis of one-dimensional data sets, thus accelerating cancer gene discovery. In this chapter, we discuss the major contributions of DNA microarrays to the study of hematological malignancies, focusing on the integrative oncogenomic approaches that correlate genomic and transcriptomic data. We also present the basic aspects of these methodologies and their present and future application in clinical oncology.
Autores: Gil Nobajas, Jorge Juan; Rubio Díaz-Cordoves, Ángel