DATAI Seminars. Academic Year 2023-2024




Deep learning and biosignal processing

28/06/2024 / Marisol Gómez Fernández



Biosignals such as electrocardiograms (ECG), electroencephalograms (EEG), and electromyograms (EMG) contain complex patterns that traditional signal processing methods cannot always interpret accurately. Deep learning techniques, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are in turn widely used to extract and classify features from data.

This talk reviews the current state of deep learning applications in biosignal analysis, focusing mainly on EEG signals. It examines how well these techniques suit different types of biosignals and addresses challenges such as data scarcity and model interpretability.





On fingerprinting and functional reconfiguration of functional connectomes

13/06/2024 / Joaquín Goñi



Functional connectomes (FCs) contain pairwise estimates of functional coupling based on the activity of pairs of brain regions, derived from fMRI BOLD signals. FCs are commonly represented as correlation matrices, which are symmetric positive definite (SPD) matrices lying on or inside the SPD manifold. Since the geometry of the SPD manifold is non-Euclidean, the inter-related entries of FCs undermine the use of Euclidean-based distances and their stability when FCs are used as features in machine learning algorithms. By projecting FCs into a tangent space, we can obtain tangent functional connectomes (tangent-FCs), whose entries would not be inter-related and thus allow the use of Euclidean-based methods. Tangent-FCs have shown a higher predictive power of behavior and cognition, but no studies have evaluated the effect of such projections with respect to fingerprinting.
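A tangent space projection of this kind can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the speaker's implementation: the function names, the regularization weight `tau`, the synthetic data, and the specific projection formula log(Cref^(-1/2) C Cref^(-1/2)) with a log-Euclidean mean as reference are all assumptions chosen for the example.

```python
import numpy as np

def _spd_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def regularize(fc, tau=1.0):
    """Add tau to the main diagonal (hypothetical weight) to keep the matrix SPD."""
    return fc + tau * np.eye(fc.shape[0])

def log_euclidean_mean(mats):
    """Log-Euclidean mean of SPD matrices, one possible reference matrix Cref."""
    logs = [_spd_apply(m, np.log) for m in mats]
    return _spd_apply(np.mean(logs, axis=0), np.exp)

def tangent_project(fc, c_ref):
    """Project an SPD matrix onto the tangent space at c_ref (assumed formula)."""
    inv_sqrt = _spd_apply(c_ref, lambda v: 1.0 / np.sqrt(v))
    whitened = inv_sqrt @ fc @ inv_sqrt
    whitened = (whitened + whitened.T) / 2  # remove floating-point asymmetry
    return _spd_apply(whitened, np.log)

# Synthetic stand-in for BOLD time series: 4 scans, 6 regions, 100 time points.
rng = np.random.default_rng(0)
fcs = [regularize(np.corrcoef(rng.standard_normal((6, 100)))) for _ in range(4)]
c_ref = log_euclidean_mean(fcs)
tangent_fcs = [tangent_project(fc, c_ref) for fc in fcs]
print(tangent_fcs[0].shape)
```

Projecting the reference matrix onto its own tangent space gives the zero matrix, which is a convenient sanity check for any implementation along these lines.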

In this work, we hypothesize that tangent-FCs have a higher fingerprint than “regular” (i.e., non-tangent-projected) FCs. Fingerprinting was measured by identification rates (ID rates) using the standard test-retest approach as well as incorporating monozygotic and dizygotic twins. We assessed: (i) Choice of the reference matrix Cref. Tangent projections require a reference point on the SPD manifold, so we explored the effect of choosing different reference matrices. (ii) Main-diagonal regularization. We explored the effect of weighted main-diagonal regularization. (iii) Different fMRI conditions. We included resting state and seven fMRI tasks. (iv) Parcellation granularities from 100 to 900 cortical brain regions (plus subcortical). (v) Different distance metrics. Correlation and Euclidean distances were used to compare regular FCs as well as tangent-FCs. (vi) fMRI scan length on resting state and when comparing task-based versus (matching scan length) resting-state fingerprint.
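As a rough illustration of how a test-retest identification rate can be computed, the sketch below matches each test matrix to its most correlated retest matrix and counts the correct matches. The function names and the one-directional matching rule are assumptions made for this example; the abstract does not specify the exact procedure used in the study.

```python
import numpy as np

def upper_triangle(mat):
    """Vectorize the strictly upper-triangular entries of a square matrix."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]

def identification_rate(test, retest):
    """Fraction of subjects whose test matrix is most correlated with their own retest matrix.

    test[i] and retest[i] are assumed to belong to the same subject.
    """
    a = np.array([upper_triangle(m) for m in test])
    b = np.array([upper_triangle(m) for m in retest])
    n = len(a)
    # Cross-correlation block between test rows and retest rows.
    similarity = np.corrcoef(a, b)[:n, n:]
    predicted = similarity.argmax(axis=1)
    return float(np.mean(predicted == np.arange(n)))
```

Replacing the correlation-based similarity with a negative Euclidean distance would give the other distance metric mentioned in the abstract.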

Our results showed that identification rates are systematically higher when using tangent-FCs. Specifically, we found: (i) Riemann and log-Euclidean matrix references systematically led to higher ID rates for all configurations assessed. (ii) In tangent-FCs, main-diagonal regularization prior to tangent space projection was critical for ID rate when using Euclidean distance, whereas it barely affected ID rates when using correlation distance. (iii) ID rates were dependent on condition and fMRI scan length. (iv) Parcellation granularity was key for ID rates in FCs, as well as in tangent-FCs with fixed regularization, whereas optimal regularization of tangent-FCs mostly removed this effect. (v) Correlation distance in tangent-FCs outperformed any other configuration of distance on FCs or on tangent-FCs across the “fingerprint gradient” (here sampled by assessing test-retest, monozygotic twins and dizygotic twins). (vi) ID rates tended to be higher in task scans compared to resting-state scans when accounting for fMRI scan length.

I will also introduce our ongoing work on task-to-rest and rest-to-task functional reconfiguration based on tangent space projections of functional connectivity.





RNA-to-Image synthesis: generating synthetic digital pathology tiles based on NGS data using deep generative models

02/05/2024 / Francisco Carrillo-Pérez



Data scarcity remains a major obstacle in biomedical machine learning, where data acquisition is frequently costly or difficult. Synthetic data generation offers a compelling solution for training more robust and generalizable machine learning models. While the generation of synthetic data for cancer diagnosis has been investigated in the literature, it has primarily focused on single-modality settings, such as whole-slide image tiles or RNA-Seq data. However, inspired by the success of text-to-image synthesis models in generating natural images based on a text prompt, a compelling question arises: can this framework be extended to gene expression data and digital pathology, given the established relationship between the two in existing research? This presentation introduces two novel RNA-to-Image generation models. We demonstrate that they effectively synthesize image tiles preserving gene expression patterns for both healthy and multi-cancer tissues. These models offer significant potential for biomedical research, including augmenting datasets to improve model performance, generating privacy-preserving synthetic samples, and enabling the investigation of causal links between gene expression modifications and tissue morphology changes.





Accelerated and Sparse Algorithms for Approximate Personalized PageRank

10/04/2024 / David Martínez Rubio



This talk will go over the basics of the PageRank problem, studied initially by the founders of Google, which allowed them to create their search engine by applying it to the internet graph, with hyperlinks defining the edges. Then, I will explain our new results on the problem for undirected graphs, whose main application is finding local clusters in networks and which is used in many branches of science. We now have algorithms that find local clusters in a time that does not depend on the whole graph but only on the local cluster itself, which is significantly smaller. This is joint work with Elias Wirth and Sebastian Pokutta.
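To make the locality idea concrete, here is a minimal sketch of the classic "push" procedure for approximate personalized PageRank on an undirected graph. It is a well-known baseline, not the accelerated and sparse algorithms presented in the talk. The function name, the adjacency-dictionary format, and the parameters `alpha` and `eps` are assumptions for this example.

```python
from collections import deque

def approximate_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank via local push operations.

    graph: dict mapping each node to a list of neighbours (undirected,
    no isolated nodes). Returns (p, r): the approximation and the residual.
    """
    p = {}             # approximate PageRank mass
    r = {seed: 1.0}    # residual mass still to be distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(graph[u])
        if r.get(u, 0.0) < eps * deg_u:
            continue  # stale queue entry, residual already small enough
        ru = r[u]
        p[u] = p.get(u, 0.0) + alpha * ru          # settle a fraction at u
        r[u] = (1 - alpha) * ru / 2                # lazy walk keeps half at u
        share = (1 - alpha) * ru / (2 * deg_u)     # rest is spread over neighbours
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r

# Two triangles joined by the edge 2-3; the seed sits in the first triangle.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_ppr(graph, seed=0)
print(sorted(p, key=p.get, reverse=True))
```

Only nodes whose residual exceeds `eps` times their degree are ever touched, so the work depends on the neighbourhood of the seed and not on the size of the whole graph.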




Crime Analysis and Crime Forecasting

20/03/2024 / Gaston Pezzuchi



In essence, crime analysis aims to understand the reality of crime in order to give anticipatory capacity to the public security system, the criminal justice system, and the police and judicial agencies involved in them. We will very briefly explore the state of the art and some advances in spatio-temporal forecasting methods for crime. The field we will explore includes tools from Data Science and from Geographic Information Science and Systems.





Beyond Empirical Risk Minimization

21/02/2024 / Santiago Mazuelas



The empirical risk minimization (ERM) approach for supervised learning chooses prediction rules that fit training samples and are “simple” (generalize). This approach has been the workhorse of machine learning methods and has enabled a myriad of applications. However, ERM methods strongly rely on the specific training samples available and cannot easily address scenarios affected by distribution shifts and corrupted samples. Robust risk minimization (RRM) is an alternative approach that does not aim to fit training examples and instead chooses prediction rules minimizing the maximum expected loss (risk). This talk presents a learning framework based on the generalized maximum entropy principle that leads to minimax risk classifiers (MRCs). The proposed MRCs can efficiently minimize worst-case expected 0-1 loss and provide tight performance guarantees. In particular, MRCs are strongly universally consistent using feature mappings given by characteristic kernels. MRC learning is based on expectation estimates and does not strongly rely on specific training samples. Therefore, the methods presented can provide techniques that are robust to practical situations that defy conventional assumptions, e.g., training samples that follow a different distribution or are corrupted by noise.
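The contrast between the two approaches can be written compactly. The notation below is an assumption introduced for illustration (a loss \ell, a rule class \mathcal{H}, an uncertainty set \mathcal{U} of distributions, a feature mapping \Phi with expectation estimate \tau and confidence vector \lambda); the abstract itself gives no formulas, and the form of \mathcal{U} is a hedged reading of "based on expectation estimates".

```latex
% ERM: fit the n training samples; a restricted class H (or a penalty) keeps the rule "simple"
\hat{h}_{\mathrm{ERM}} \in \arg\min_{h \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(h, (x_i, y_i)\bigr)

% RRM: minimize the maximum expected loss over an uncertainty set U of distributions
h_{\mathrm{RRM}} \in \arg\min_{h} \; \max_{p \in \mathcal{U}} \; \mathbb{E}_{p}\, \ell\bigl(h, (x, y)\bigr)

% Assumed form of U: distributions whose feature expectations are close to the estimate tau
\mathcal{U} = \bigl\{\, p \;:\; \bigl| \mathbb{E}_{p}\, \Phi(x, y) - \tau \bigr| \preceq \lambda \,\bigr\}
```

Under this reading, the training samples enter only through the estimate \tau, which is consistent with the statement that MRC learning does not strongly rely on specific training samples.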





Problems of the digital age: Digital forensics and data recovery

24/01/2024 / Vinny Dunne



In today's talk, I will delve into the realms of digital forensics, shedding light on the crucial aspects of my work. Together, we will explore the challenges and triumphs encountered in the pursuit of recovering invaluable data. It is not just a profession for me; it is a passion that has driven me to establish a pioneering data recovery lab right here in Navarra. I will share insights into real-world cases that highlight the intricacies of digital forensics. These cases will not only captivate your interest but also provide a glimpse into the critical role that data recovery plays in our technologically driven world. 





Unveiling the Black Box: Applying Explainable AI in Precision Medicine

13/12/2023 / Fernando Carazo



In today's AI-driven landscape, understanding the decisions made by artificial intelligence systems is crucial. Explainable AI (XAI) emerges as a pivotal solution, shedding light on the opaque nature of some AI models. The significance of XAI spans various industries, and its application is particularly transformative in the pharmaceutical sector. As we delve into a practical case study, we'll witness how XAI has revolutionized drug research and development, offering efficiency gains, cost reductions, and a deeper understanding of critical decisions in this vital field.




Monitoring research and innovation from heterogeneous sources using knowledge graphs

22/11/2023 / Vanni Zaravella



Knowledge Graphs are machine-readable representations of information via predicative triples, typically defined by an underlying ontology schema. The recent rise of the Open Science paradigm and advances in Natural Language Processing models have led to the creation of Information Extraction pipelines that can generate large-scale scholarly Knowledge Graphs from scientific publications and patents, enabling advanced ‘semantic’ services such as fine-grained document classification, retrieval, question answering, and innovation tracking. However, tracking the complex research-industry dynamics of a target technological domain also requires incorporating alternative text sources such as news and micro-blogging posts, from which conventional NLP methods and models typically struggle to extract information accurately and with high recall. In this talk, we present an enhanced information extraction pipeline tailored to the generation of a knowledge graph comprising open-domain entities from micro-blogging posts. It leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over non-contextual word embeddings. We provide a use case that demonstrates the extraction of semantic triples within the domain of Digital Transformation from X/Twitter.




Resource-Constrained Project Scheduling Problem: A bi-objective approach with time-dependent resource costs

25/10/2023 / Laura Anton Sanchez




This talk provides new insights on bi-criteria resource-constrained project scheduling problems. We define a realistic problem where the objectives to combine are the makespan and the total cost of resource usage. Time-dependent costs are assumed for the resources, i.e., the cost depends on when a resource is used. An optimization model is presented, followed by the development of an algorithm aimed at finding the set of Pareto solutions. The intractability of the optimization models underlying the problem also justifies the development of a metaheuristic for approximating the same front. We design a bi-objective evolutionary algorithm that includes problem-specific knowledge and is based on the Non-dominated Sorting Genetic Algorithm (NSGA-II). The results demonstrate the efficiency of the proposed metaheuristic. In more recent work, six further multi-objective evolutionary algorithms have been implemented to solve this problem, and their performance has been exhaustively compared with that of the NSGA-II based algorithm. A computational and statistically supported study is conducted, using instances built from those available in the literature and applying a set of performance measures to the solution sets obtained by each methodology.
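The notion of a Pareto set for the two objectives can be illustrated with a small non-dominated filter. This is only a sketch of the dominance relation that NSGA-II style algorithms rely on, not the algorithm from the talk. The function names and the sample (makespan, cost) values are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and better in at least one.

    Both objectives (makespan, total resource cost) are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Return the sorted list of non-dominated (makespan, cost) pairs."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))

# Hypothetical schedules evaluated as (makespan, total cost).
schedules = [(10, 50), (12, 40), (11, 60), (15, 40), (9, 70)]
print(pareto_front(schedules))  # → [(9, 70), (10, 50), (12, 40)]
```

Here (11, 60) is dominated by (10, 50) and (15, 40) by (12, 40), so only three trade-offs between makespan and cost remain on the front.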