UPEFinder: A Bioinformatic Tool for the Study of Uncharacterized Proteins Based on Gene Expression Correlation and the PageRank Algorithm
The Human Proteome Project (HPP) is leading the international effort to characterize the human proteome. Although the project initially focused on the detection of missing proteins, a new challenge arose from the need to assign biological functions to the uncharacterized human proteins and to describe their implications in human diseases. This challenge, neXt-CP50, covers not only the proteins with experimental evidence (uPE1 proteins) but also the uncharacterized missing proteins (uMPs). In this work, we developed a new bioinformatic approach to infer biological annotations for the uPE1 proteins and uMPs based on a "guilt-by-association" analysis of public RNA-Seq data sets. We used the correlation of these proteins with the well-characterized PE1 proteins to construct a network. We then applied the PageRank algorithm to this network to identify the most relevant nodes, which were taken as the biological annotations of the uncharacterized proteins. All of the generated information was stored in a database. In addition, we implemented the web application UPEFinder (https://upefinder.proteored.org) to facilitate access to this new resource. This information is especially relevant for HPP researchers who are interested in generating and validating new hypotheses about the functions of these proteins. Both the database and the web application are publicly available (https://github.com/tibioinformat/UPEfinder).
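The guilt-by-association strategy summarized above can be illustrated with a minimal Python sketch. It is not the UPEFinder implementation. It assumes a three-layer network (uncharacterized protein, correlated PE1 proteins, annotation terms), Pearson correlation on toy expression vectors, an absolute-correlation cutoff of 0.8, and a plain power-iteration PageRank with a damping factor of 0.85. All protein names, term names, expression values, and parameter values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_network(target, expr, annotations, threshold=0.8):
    """Undirected weighted network: target -- correlated PE1 protein -- terms.

    Assumption: edges to PE1 proteins are weighted by |r| and kept only when
    |r| >= threshold; protein-to-term edges get weight 1.0.
    """
    edges = {}

    def add(u, v, w):
        edges.setdefault(u, {})[v] = w
        edges.setdefault(v, {})[u] = w

    for protein, terms in annotations.items():
        r = pearson(expr[target], expr[protein])
        if abs(r) >= threshold:
            add(target, protein, abs(r))
            for term in terms:
                add(protein, term, 1.0)
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an adjacency dictionary."""
    nodes = list(edges)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            total = sum(edges[u].values())
            for v, w in edges[u].items():
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank


# Hypothetical toy data: one uncharacterized protein and three PE1 proteins.
expr = {
    "uPE1_A": [1, 2, 3, 4, 5],
    "PE1_X": [2, 4, 6, 8, 10],
    "PE1_Y": [1, 2, 3, 5, 4],
    "PE1_Z": [5, 1, 4, 2, 3],
}
annotations = {
    "PE1_X": ["term_1", "term_2"],
    "PE1_Y": ["term_1"],
    "PE1_Z": ["term_3"],
}

network = build_network("uPE1_A", expr, annotations)
ranks = pagerank(network)

# Keep only annotation nodes and sort them by decreasing PageRank score.
all_terms = {t for terms in annotations.values() for t in terms}
ranked_terms = sorted((n for n in ranks if n in all_terms),
                      key=ranks.get, reverse=True)
print(ranked_terms)
```

In this toy setting, a term shared by several co-expressed PE1 proteins accumulates more rank than a term supported by a single protein, and terms attached only to weakly correlated proteins never enter the network. A real analysis would need a justified correlation cutoff and a curated annotation source.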