Our researchers

Ángel Rubio Díaz-Cordoves

Most recent scientific publications (since 2010)

Authors: Garcia, I.; Aldaregia, J.; Vicentic, J. M.; et al.
ISSN 2045-2322  Vol. 7  No. 46575  2017
Glioblastoma remains the most common and deadliest type of brain tumor and contains a population of self-renewing, highly tumorigenic glioma stem cells (GSCs), which contributes to tumor initiation and treatment resistance. Developmental programs participating in tissue development and homeostasis re-emerge in GSCs, supporting the development and progression of glioblastoma. SOX1 plays an important role in neural development and neural progenitor pool maintenance. Its impact on glioblastoma remains largely unknown. In this study, we have found that high levels of SOX1 observed in a subset of patients correlate with lower overall survival. At the cellular level, SOX1 expression is elevated in patient-derived GSCs and it is also higher in oncosphere culture compared to differentiation conditions in conventional glioblastoma cell lines. Moreover, genetic inhibition of SOX1 in patient-derived GSCs and conventional cell lines decreases self-renewal and proliferative capacity in vitro and tumor initiation and growth in vivo. Contrarily, SOX1 over-expression moderately promotes self-renewal and proliferation in GSCs. These functions seem to be independent of its activity as Wnt/beta-catenin signaling regulator. In summary, these results identify a functional role for SOX1 in regulating glioma cell heterogeneity and plasticity, and suggest SOX1 as a potential target in the GSC population in glioblastoma.
Authors: Ochoa, María del Carmen; et al.
ISSN 2045-2322  Vol. 7  2017  p. 14358
Constraint-based modeling for genome-scale metabolic networks has emerged in the last years as a promising approach to elucidate drug targets in cancer. Beyond the canonical biosynthetic routes to produce biomass, it is of key importance to focus on metabolic routes that sustain the proliferative capacity through the regulation of other biological means in order to improve in-silico gene essentiality analyses. Polyamines are polycations with central roles in cancer cell proliferation, through the regulation of transcription and translation among other things, but are typically neglected in in silico cancer metabolic models. In this study, we analysed essential genes for the biosynthesis of polyamines. Our analysis corroborates the importance of previously known regulators of the pathway, such as Adenosylmethionine Decarboxylase 1 (AMD1) and uncovers novel enzymes predicted to be relevant for polyamine homeostasis. We focused on Adenine Phosphoribosyltransferase (APRT) and demonstrated the detrimental consequence of APRT gene silencing on different leukaemia cell lines. Our results highlight the importance of revisiting the metabolic models used for in-silico gene essentiality analyses in order to maximize the potential for drug target identification in cancer.
Authors: Pajares, María Josefa; et al.
ISSN 1574-7891  Vol. 10  No. 9  2016  pp. 1437 - 1449
Increasing interest has been devoted in recent years to the understanding of alternative splicing in cancer. In this study, we performed a genome-wide analysis to identify cancer-associated splice variants in non-small cell lung cancer. We discovered and validated novel differences in the splicing of genes known to be relevant to lung cancer biology, such as NFIB, ENAH or SPAG9. Gene enrichment analyses revealed an important contribution of alternative splicing to cancer-related molecular functions, especially those involved in cytoskeletal dynamics. Interestingly, a substantial fraction of the altered genes found in our analysis were targets of the protein quaking (QKI), pointing to this factor as one of the most relevant regulators of alternative splicing in non-small cell lung cancer. We also found that ESYT2, one of the QKI targets, is involved in cytoskeletal organization. ESYT2-short variant inhibition in lung cancer cells resulted in a cortical distribution of actin whereas inhibition of the long variant caused an increase of endocytosis, suggesting that the cancer-associated splicing pattern of ESYT2 has a profound impact in the biology of cancer cells. Finally, we show that low nuclear QKI expression in non-small cell lung cancer is an independent prognostic factor for disease-free survival (HR = 2.47; 95% CI = 1.11-5.46, P = 0.026). In conclusion, we identified several splicing variants with functional relevance in lung cancer largely regulated by the splicing factor QKI, a tumor suppressor associated with prognosis in lung cancer.
Authors: Romero, J. P.; et al.
ISSN 1471-2164  Vol. 17  2016  p. 467
Background: Alternative splicing (AS) is a major source of variability in the transcriptome of eukaryotes. There is an increasing interest in its role in different pathologies. Before sequencing technology appeared, AS was measured with specific arrays. However, these arrays did not perform well in the detection of AS events and provided very large false discovery rates (FDR). Recently the Human Transcriptome Array 2.0 (HTA 2.0) has been deployed. It includes junction probes. However, the interpretation software provided by its vendor (TAC 3.0) does not fully exploit its potential (does not study jointly the exons and junctions involved in a splicing event) and can only be applied to case-control studies. New statistical algorithms and software must be developed in order to exploit the HTA 2.0 array for event detection. Results: We have developed EventPointer, an R package (built under the aroma.affymetrix framework) to search and analyze Alternative Splicing events using HTA 2.0 arrays. This software uses a linear model that broadens its application from plain case-control studies to complex experimental designs. Given the CEL files and the design and contrast matrices, the software retrieves a list of all the detected events indicating: 1) the type of event (exon cassette, alternative 3', etc.), 2) its fold change and its statistical significance, and 3) the potential protein domains affected by the AS events and the statistical significance of the possible enrichment. Our tests have shown that EventPointer has an extremely low FDR value (only 1 false positive within the tested top-200 events). This software is publicly available and it has been uploaded to GitHub. Conclusions: This software empowers the HTA 2.0 arrays for AS event detection as an alternative to RNA-seq: simplifying considerably the required analysis, speeding it up and reducing the required computational power.
Authors: Rubio, Ángel; de Villar, F.
ISSN 2073-4859  Vol. 7  No. 2  2015  pp. 275 - 287
Code analysis tools are crucial to understand program behavior. Profile tools use the results of time measurements in the execution of a program to gain this understanding and thus help in the optimization of the code. In this paper, we review the different available packages to profile R code and show the advantages and disadvantages of each of them. In addition, we present GUIProfiler, a package that fulfills some unmet needs. Package GUIProfiler generates an HTML report with the timing for each code line and the relationships between different functions. This package mimics the behavior of the MATLAB profiler. The HTML report includes information on the time spent on each of the lines of the profiled code (the slowest code is highlighted). If the package is used within the RStudio environment, the user can navigate across the bottlenecks in the code and open the editor to modify the lines of code where more time is spent. It is also possible to edit the code using Notepad++ (a free editor for Windows) by simply clicking on the corresponding line. The graphical user interface makes it easy to identify the specific lines which slow down the code. The integration in RStudio and the generation of an HTML report make GUIProfiler a very convenient tool to perform code optimization.
Authors: Rezola, Alberto; Tobalina, L.; et al.
ISSN 1467-5463  Vol. 16  No. 2  2015  pp. 265 - 279
With the emergence of metabolic networks, novel mathematical pathway concepts were introduced in the past decade, aiming to go beyond canonical maps. However, the use of network-based pathways to interpret 'omics' data has been limited owing to the fact that their computation has, until very recently, been infeasible in large (genome-scale) metabolic networks. In this review article, we describe the progress made in the past few years in the field of network-based metabolic pathway analysis. In particular, we review in detail novel optimization techniques to compute elementary flux modes, an important pathway concept in this field. In addition, we summarize approaches for the integration of metabolic pathways with gene expression data, discussing recent advances using network-based pathway concepts.
Authors: Aramburu, A.; Zudaire, María Isabel; Pajares, María Josefa; et al.
ISSN 1471-2164  Vol. 16  2015  p. 752
Background: The development of a more refined prognostic methodology for early non-small cell lung cancer (NSCLC) is an unmet clinical need. An accurate prognostic tool might help to select patients at early stages for adjuvant therapies. Results: A new integrated bioinformatics searching strategy, that combines gene copy number alterations and expression, together with clinical parameters was applied to derive two prognostic genomic signatures. The proposed methodology combines data from patients with and without clinical data with a priori information on the ability of a gene to be a prognostic marker. Two initial candidate sets of 513 and 150 genes for lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC), respectively, were generated by identifying genes which have both: a) significant correlation between copy number and gene expression, and b) significant prognostic value at the gene expression level in external databases. From these candidates, two panels of 7 (ADC) and 5 (SCC) genes were further identified via semi-supervised learning. These panels, together with clinical data (stage, age and sex), were used to construct the ADC and SCC hazard scores combining clinical and genomic data. The signatures were validated in two independent datasets (n = 73 for ADC, n = 97 for SCC), confirming that the prognostic value of both clinical-genomic models is robust, statistically significant (P = 0.008 for ADC and P = 0.019 for SCC) and outperforms both the clinical models (P = 0.060 for ADC and P = 0.121 for SCC) and the genomic models applied separately (P = 0.350 for ADC and P = 0.269 for SCC). Conclusion: The present work provides a methodology to generate a robust signature using copy number data that can potentially be applied to any cancer. Using it, we found new prognostic scores based on tumor DNA that, jointly with clinical information, are able to predict overall survival (OS) in patients with early-stage ADC and SCC.
Authors: Rezola, Alberto; Rubio, Ángel; et al.
Journal: PLOS ONE
ISSN 1932-6203  Vol. 9  No. 8  2014
Metabolism expresses the phenotype of living cells and understanding it is crucial for different applications in biotechnology and health. With the increasing availability of metabolomic, proteomic and, to a larger extent, transcriptomic data, the elucidation of specific metabolic properties in different scenarios and cell types is a key topic in systems biology. Despite the potential of the elementary flux mode (EFM) concept for this purpose, its use has been limited so far, mainly because their computation has been infeasible for genome-scale metabolic networks. In a recent work, we determined a subset of EFMs in human metabolism and proposed a new protocol to integrate gene expression data, spotting key 'characteristic EFMs' in different scenarios. Our approach was successfully applied to identify metabolic differences among several human healthy tissues. In this article, we evaluated the performance of our approach in a clinically interesting situation. In particular, we identified key EFMs and metabolites in adenocarcinoma and squamous-cell carcinoma subtypes of non-small cell lung cancers. Results are consistent with previous knowledge of these major subtypes of lung cancer in the medical literature. Therefore, this work constitutes the starting point to establish a new methodology that could help distinguish key metabolic processes among different clinical outcomes.
Authors: Sharma, Ravi Datta; Pajares, María Josefa; et al.
ISSN 0008-5472  Vol. 74  No. 4  2014  pp. 1105 - 1115
Abnormal alternative splicing has been associated with cancer. Genome-wide microarrays can be used to detect differential splicing events. In this study, we have developed ExonPointer, an algorithm that uses data from exon and junction probes to identify annotated cassette exons. We used the algorithm to profile differential splicing events in lung adenocarcinoma A549 cells after downregulation of the oncogenic serine/arginine-rich splicing factor 1 (SRSF1). Data were generated using two different microarray platforms. The PCR-based validation rate of the top 20 ranked genes was 60% and 100%. Functional enrichment analyses found a substantial number of splicing events in genes related to RNA metabolism. These analyses also identified genes associated with cancer and developmental and hereditary disorders, as well as biologic processes such as cell division, apoptosis, and proliferation. Most of the top 20 ranked genes were validated in other adenocarcinoma and squamous cell lung cancer cells, with validation rates of 80% to 95% and 70% to 75%, respectively. Moreover, the analysis allowed us to identify four genes, ATP11C, IQCB1, TUBD1, and proline-rich coiled-coil 2C (PRRC2C), with a significantly different pattern of alternative splicing in primary non-small cell lung tumors compared with normal lung tissue. In the case of PRRC2C, SRSF1 downregulation led to the skipping of an exon overexpressed in primary lung tumors. Specific siRNA downregulation of the exon-containing var
Authors: Tabas-Madrid, D.; Sanchez-Caballero, I.; et al.
ISSN 1471-2164  Vol. 15  No. Suppl10:S2  2014
Background: MicroRNAs are short RNA molecules that post-transcriptionally regulate gene expression. Today, microRNA target prediction remains challenging since very few have been experimentally validated and sequence-based predictions have large numbers of false positives. Furthermore, due to the different measuring rules used in each database of predicted interactions, the selection of the most reliable ones requires extensive knowledge about each algorithm. Results: Here we propose two methods to measure the confidence of predicted interactions based on experimentally validated information. The output of the methods is a combined database where new scores and statistical confidences are re-assigned to each predicted interaction. The new scores allow the robust combination of several databases without the effect of low-performing algorithms dragging down good-performing ones. The combined databases obtained using both algorithms described in this paper outperform each of the existing predictive algorithms that were considered for the combination. Conclusions: Our approaches are a useful way to integrate predicted interactions from different databases. They reduce the selection of interactions to a unique database based on an intuitive score and allow comparing databases between them.
Authors: Rubio, Ángel; de Nó, Joaquín Juan; et al.
ISSN 0957-4174  Vol. 41  No. 11  2014  pp. 5190 - 5200
The objective of this research is to select a reduced group of surface electromyographic (sEMG) channels and signal-features that is able to provide an accurate classification rate in a myoelectric control system for any user. To that end, the location of 32 sEMG electrodes placed around-along the forearm and 86 signal-features are evaluated simultaneously in a static-hand gesture classification task (14 different gestures). A novel multivariate variable selection filter method named mRMR-FCO is presented as part of the selection process. This process finds the most informative and least redundant combination of sEMG channels and signal-features among all the possible ones. The performance of the selected set of channels and signal-features is evaluated with a Support Vector Machine classifier. (C) 2014 Elsevier Ltd. All rights reserved.
Authors: Planes, Francisco Javier; et al.
ISSN 1467-5463  Vol. 14  No. 3  2013  pp. 263 - 278
miRNAs are small RNA molecules (~22 nt) that interact with their target mRNAs, inhibiting translation and/or cleaving the target mRNA. This interaction is guided by sequence complementarity and results in the reduction of mRNA and/or protein levels. miRNAs are involved in key biological processes and different diseases. Therefore, deciphering miRNA targets is crucial for diagnostics and therapeutics. However, miRNA regulatory mechanisms are complex and there is still no high-throughput and low-cost miRNA target screening technique. In recent years, several computational methods based on sequence complementarity of the miRNA and the mRNAs have been developed. However, the predicted interactions using these computational methods are inconsistent and the expected false positive rates are still large. Recently, it has been proposed to use the expression values of miRNAs and mRNAs (and/or proteins) to refine the results of sequence-based putative targets for a particular experiment. These methods have been shown to be effective at identifying the most prominent interactions from the databases of putative targets. Here, we review these methods that combine both expression and sequence-based putative targets to predict miRNA targets.
Authors: Rezola, Alberto; de Figueiredo, L.F.; et al.
ISSN 1367-4803  Vol. 29  No. 16  2013  pp. 2009 - 2016
Motivation: The analysis of high-throughput molecular data in the context of metabolic pathways is essential to uncover their underlying functional structure. Among different metabolic pathway concepts in systems biology, elementary flux modes (EFMs) hold a predominant place, as they naturally capture the complexity and plasticity of cellular metabolism and go beyond predefined metabolic maps. However, their use to interpret high-throughput data has been limited so far, mainly because their computation in genome-scale metabolic networks has been unfeasible. To face this issue, different optimization-based techniques have been recently introduced and their application to human metabolism is promising. Results: In this article, we exploit and generalize the K-shortest EFM algorithm to determine a subset of EFMs in a human genome-scale metabolic network. This subset of EFMs involves a wide number of reported human metabolic pathways, as well as potential novel routes, and constitutes a valuable database where high-throughput data can be mapped and contextualized from a metabolic perspective. To illustrate this, we took expression data of 10 healthy human tissues from a previous study and predicted their characteristic EFMs based on enrichment analysis. We used a multivariate hypergeometric test and showed that it leads to more biologically meaningful results than standard hypergeometric. Finally, a biological discussion on the characteristic EFMs obtained in liver is conducted, finding a high level of agreement when compared with the literature.
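The enrichment step mentioned in this abstract can be illustrated with a minimal sketch of the standard (univariate) hypergeometric test, which the abstract uses as its baseline. This is not the multivariate variant the authors propose, and the function name, parameter names and toy numbers below are hypothetical, chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    Hypothetical mapping to the enrichment setting:
      population: total number of genes considered
      annotated:  genes associated with one pathway (e.g. one EFM)
      sample:     genes flagged as highly expressed in a tissue
      hits:       flagged genes that also belong to the pathway
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `hits` or more overlapping genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 20 genes, 5 in the pathway, 5 flagged, 3 overlapping.
p_value = hypergeom_enrichment_pvalue(20, 5, 5, 3)
print(round(p_value, 4))
```

A small p-value suggests that the overlap is larger than expected by chance. A multivariate version would score several gene categories jointly instead of one overlap count.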
Authors: Valgepea, K.; Rubio, Ángel; et al.
ISSN 1752-0509  Vol. 7  2013
Background: The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue with this progress, it is essential to efficiently integrate experimental data into metabolic modeling. Results: We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and influence obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influences the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and real scenario. Finally, we apply this approach to find novel pathways relevant in the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets which could be relevant for better understanding of the phenomenon leading to impaired acetate overflow are proposed. Conclusions: A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.
Authors: Aramburu, A.; Rubio, Ángel
ISSN 1748-7188  Vol. 7  No. 19  2012
Background: The selection of the reference to scale the data in a copy number analysis has paramount importance to achieve accurate estimates. Usually this reference is generated using control samples included in the study. However, these control samples are not always available and in these cases, an artificial reference must be created. A proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples minimizing possible batch effects. Results: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) have been analyzed. It is shown that using only tumoral samples, NSA is able to remove the bias in the copy number estimation, to reduce the noise and therefore, to increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state of the art methods. Conclusions: NSA provides a robust and accurate reference for scaling probe signals data to CN values without the need of control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps to better detect recurrent CNAs than current methods. The automatic selection of references makes it useful to perform bulk analysis of many GEO or ArrayExpress experiments without the need of developing a parser to find the normal samples or possible batches within the data.
The method is available in the open-source R package NSA, which is an add-on to the framework.
Authors: Rubio, Ángel; Theodoropoulos, C.; et al.
ISSN 1096-7176  Vol. 14  No. 4  2012  pp. 344 - 353
Constraints-based modeling is an emergent area in Systems Biology that includes an increasing set of methods for the analysis of metabolic networks. In order to refine its predictions, the development of novel methods integrating high-throughput experimental data is currently a key challenge in the field. In this paper, we present a novel set of constraints that integrate tracer-based metabolomics data from Isotope Labeling Experiments and metabolic fluxes in a linear fashion. These constraints are based on Elementary Carbon Modes (ECMs), a recently developed concept that generalizes Elementary Flux Modes at the carbon level. To illustrate the effect of our ECMs-based constraints, a Flux Variability Analysis approach was applied to a previously published metabolic network involving the main pathways in the metabolism of glucose. The addition of our ECMs-based constraints substantially reduced the under-determination resulting from a standard application of Flux Variability Analysis, which shows a clear progress over the state of the art. In addition, our approach is adjusted to deal with combinatorial explosion of ECMs in genome-scale metabolic networks. This extension was applied to infer the maximum biosynthetic capacity of non-essential amino acids in human metabolism. Finally, as linearity is the hallmark of our approach, its importance is discussed at a methodological, computational and theoretical level and illustrated with a practical application in the field of Isotope Labeling Experiments. (C) 2012 Elsevier Inc. All rights reserved.
Authors: Aramburu, A.; Bengtsson, H.; et al.
ISSN 1367-4803  Vol. 28  No. 13  2012  pp. 1793 - 1794
CalMaTe calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g., Affymetrix and Illumina.
Authors: Nogales-Cadenas, R.; Vazquez, M.; et al.
Journal: PLOS ONE
ISSN 1932-6203  Vol. 7  No. 2  2012  p. e30766
miRNAs are small RNA molecules (~22 nt) that interact with their corresponding target mRNAs inhibiting the translation of the mRNA into proteins and cleaving the target mRNA. This second effect diminishes the overall expression of the target mRNA. Several miRNA-mRNA relationship databases have been deployed, most of them based on sequence complementarities. However, the number of false positives in these databases is large and they do not overlap completely. Recently, it has been proposed to combine expression measurement from both miRNA and mRNA and sequence based predictions to achieve more accurate relationships. In our work, we use LASSO regression with non-positive constraints to integrate both sources of information. LASSO enforces the sparseness of the solution and the non-positive constraints restrict the search of miRNA targets to those with down-regulation effects on the mRNA expression. We named this method TaLasso (miRNA-Target LASSO). We used TaLasso on two public datasets that have paired expression levels of human miRNAs and mRNAs. The top ranked interactions recovered by TaLasso are especially enriched (more than using any other algorithm) in experimentally validated targets. The functions of the genes with mRNA transcripts in the top-ranked interactions are meaningful. This is not the case using other algorithms. TaLasso is available as Matlab or R code. There is also a web-based tool for human miRNAs at
Authors: De las Rivas, J.; Fontanillo, C.; et al.
ISSN 0888-7543  Vol. 97  No. 2  2011  pp. 86 - 93
DNA copy number aberrations (CNAs) are genetic alterations common in cancer cells. Their transcriptional consequences are still poorly understood. Based on the fact that DNA copy number (CN) is highly correlated with the genomic position, we have applied a segmentation algorithm to gene expression (GE) to explore its relation with CN. We have found a strong correlation between segmented CN (sCN) and segmented GE (sGE), corroborating that CNAs have clear effects on genome-wide expression. We have found that most of the recurrent regions of sGE are common to those obtained from sCN analysis. Results for two cancer datasets confirm the known targets of aberrations and provide new candidates to study. The suggested methodology makes it possible to find recurrent aberrations specific to sGE, revealing loci where the expression of the genes is independent from their CNs. R code and additional files are available as supplementary material. (C) 2010 Elsevier Inc. All rights reserved.
Authors: Theodoropoulos, C.; Rezola, Alberto; et al.
ISSN 0303-2647  Vol. 105  No. 2  2011  pp. 140 - 146
The elementary flux modes (EFMs) approach is an efficient computational tool to predict novel metabolic pathways. Elucidating the physiological relevance of EFMs in a particular cellular state is still an open challenge. Different methods have been presented to carry out this task. However, these methods typically use little experimental data, exploiting methodologies where an a priori optimization function is used to deal with the indetermination underlying metabolic networks. Available "omics" data represent an opportunity to refine current methods. In this article we discuss whether (or not) metabolomics data from isotope labeling experiments (ILEs) and EFMs can be integrated into a linear system of equations. Aside from refining current approaches to infer the physiological relevance of EFMs, this question is important for the integration of metabolomics data from ILEs into metabolic networks, which generally involve non-linear relationships. As a result of our analysis, we concluded that in general the concept of EFMs needs to be redefined at the atomic level for the modeling of ILEs. For this purpose, the concept of Elementary Carbon Modes (ECMs) is introduced. (C) 2011 Elsevier Ireland Ltd. All rights reserved.
Authors: Antón, R.; Rubio, Ángel
ISSN 1471-2105  Vol. 11  2010
Background: Exon arrays provide a way to measure the expression of different isoforms of genes in an organism. Most of the procedures to deal with these arrays are focused on gene expression or on exon expression. Although the only biological analytes that can be properly assigned a concentration are transcripts, there are very few algorithms that focus on them. The reason is that previously developed summarization methods do not work well if applied to transcripts. In addition, gene structure prediction, i.e., the correspondence between probes and novel isoforms, is a field which is still unexplored. Results: We have modified and adapted a previous algorithm to take advantage of the special characteristics of the Affymetrix exon arrays. The structure and concentration of transcripts (some of them possibly unknown) in microarray experiments were predicted using this algorithm. Simulations showed that the suggested modifications improved both specificity (SP) and sensitivity (ST) of the predictions. The algorithm was also applied to different real datasets showing its effectiveness and the concordance with PCR validated results. Conclusions: The proposed algorithm shows a substantial improvement in the performance over the previous version. This improvement is mainly due to the exploitation of the redundancy of the Affymetrix exon arrays. An R package of SPACE with the updated algorithms has been developed and is freely available.
Authors: Bengtsson, H.; Rubio, Ángel
ISSN 1367-4803  Vol. 26  No. 15  2010  pp. 1827 - 1833
Motivation: Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods are focused on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs. Results: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays dealing directly with the cross hybridization between probes within SNP probesets. This algorithm outperforms (or at least it performs as well as) other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and it also gives more precise allele-specific CNs.
Authors: Pio, R.; Blanco, David; Pajares, María Josefa; et al.
ISSN 1471-2164  Vol. 11  2010  p. 352
Background: Microarray strategies, which allow for the characterization of thousands of alternative splice forms in a single test, can be applied to identify differential alternative splicing events. In this study, a novel splice array approach was developed, including the design of a high-density oligonucleotide array, a labeling procedure, and an algorithm to identify splice events. Results: The array consisted of exon probes and thermodynamically balanced junction probes. Suboptimal probes were tagged and considered in the final analysis. An unbiased labeling protocol was developed using random primers. The algorithm used to distinguish changes in expression from changes in splicing was calibrated using internal non-spliced control sequences. The performance of this splice array was validated with artificial constructs for CDC6, VEGF, and PCBP4 isoforms. The platform was then applied to the analysis of differential splice forms in lung cancer samples compared to matched normal lung tissue. Overexpression of splice isoforms was identified for genes encoding CEACAM1, FHL-1, MLPH, and SUSD2. None of these splicing isoforms had been previously associated with lung cancer. Conclusions: This methodology enables the detection of alternative splicing events in complex biological samples, providing a powerful tool to identify novel diagnostic and prognostic biomarkers for cancer and other pathologies.