University of Catania
Department of Clinical and Experimental Medicine
office at Department of Computer Science
viale A. Doria, 6, 95125 Catania
email giugno at dmi.unict.it
research interests: algorithms and data mining for biomedicine, bioinformatics and pharmacogenomics
I am currently affiliated with the Bioinformatics group at the University of Catania. Our group brings together expertise in Sequence Analysis (alignment, gene design, miRNA targeting and design), Indexing Techniques, and Biological Network analysis (exact and inexact subgraph matching).
An up-to-date list of publications is available on Scopus, Google Scholar and similar services.
Since 2008, I have taught Basi di Dati (Databases) at the University of Catania. This is the page of the course.
Graph Searching and Network Analysis
This is a partial map of the users of the software GraphGrep and its variants (graph searching tools).
GraphGrep: A Fast and Universal Method for Querying Graphs
GraphGrep is an application-independent method for querying graphs that finds all the occurrences of a subgraph in a database of graphs. The interface to GraphGrep is Glide, a regular-expression graph query language that combines features from XPath and Smart. Glide incorporates both single-node and variable-length wildcards. GraphGrep uses hash-based fingerprinting to represent the graphs in an abstract form and to filter the database. Graphs and indexes are stored in K and BerkeleyDB relational databases.
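As a rough illustration of the fingerprint-and-filter idea (a minimal sketch, not the GraphGrep code), the Python below counts the label sequences of all simple paths up to a fixed length and discards database graphs whose counts cannot cover those of the query. The function names, the dictionary-based graph encoding and the `max_len` parameter are assumptions made for this example; a Python `Counter` stands in for the hash table.

```python
from collections import Counter

def path_fingerprint(adj, labels, max_len=3):
    """Count the label sequences of all simple paths with up to max_len nodes.

    adj maps each node to the set of its neighbours; labels maps each node
    to its label. Both encodings are assumptions made for this sketch.
    """
    counts = Counter()

    def extend(path):
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:          # keep paths simple (no repeated node)
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return counts

def may_contain(db_fp, query_fp):
    """Filter test: every label path of the query must occur at least as often
    in the database graph. Passing the test does not prove a match; failing it
    rules the graph out."""
    return all(db_fp[p] >= n for p, n in query_fp.items())

# Query: a single A-B edge.
query_fp = path_fingerprint({0: {1}, 1: {0}}, {0: "A", 1: "B"})

# Database graph 1: path A-B-C (contains the query).
g1_fp = path_fingerprint({0: {1}, 1: {0, 2}, 2: {1}}, {0: "A", 1: "B", 2: "C"})

# Database graph 2: a single A-C edge (cannot contain the query).
g2_fp = path_fingerprint({0: {1}, 1: {0}}, {0: "A", 1: "C"})

survivors = [name for name, fp in [("g1", g1_fp), ("g2", g2_fp)]
             if may_contain(fp, query_fp)]
```

Only the graphs that survive this filter need to be passed to the more expensive subgraph matching step.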
GraphFind: enhancing graph searching by low support data mining techniques
GraphFind implements effective data storage that also relies on low-support data mining.
SING: Subgraph search In Non-homogeneous Graphs
SING implements a novel indexing system able to cope with large graphs. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used both to improve the filtering performance and to speed up the subgraph isomorphism task.
GraphGrepSX: Enhancing Graph Database Indexing By Suffix Tree Structure
This is the latest release of our graph database searching tool.
Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all occurrences of a query graph. GraphGrepSX implements efficient graph searching algorithms together with an advanced filtering technique. GraphGrepSX is compared with SING, GraphFind, CTree and GCoding. Experiments show that GraphGrepSX outperforms the compared systems on a very large collection of large data. In particular, it reduces the size and the time for the construction of large database index.
SIGMA: A Set-cover-based Inexact Graph Matching Algorithm
SIGMA is an indexing system for inexact graph matching. Given a query graph and a fixed maximum number of edge misses, SIGMA looks for all the inexact matches of the query in the database of target graphs. After a preprocessing phase, SIGMA executes the query phase by efficiently mining the database of target graphs to verify whether there are matches. It outputs the set of matching graphs, the number of edge misses and other performance statistics.
NetMatch: a Cytoscape plugin for searching biological networks
NetMatch is a Cytoscape plugin which allows searching biological networks for subcomponents matching a given query. Queries may be approximate in the sense that certain parts of the subgraph-query may be left unspecified. To make the query creation process easy, a drawing tool is provided. Cytoscape is a bioinformatics software platform for the visualization and analysis of biological networks.
RI: A subgraph isomorphism algorithm (NEW!!)
RI is a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch, which was originally distributed in Modula-2) on synthetic, molecular, and interaction network data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase. Subgraph isomorphism algorithms are used intensively, for example, by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism, highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the datasets used.
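For readers unfamiliar with the problem, the following is a minimal backtracking sketch of subgraph isomorphism search in Python. It is not the RI algorithm: the static ordering of query nodes by decreasing degree is a simple stand-in chosen for this example, and the function name and the adjacency-set encoding of undirected, unlabeled graphs are assumptions.

```python
def count_embeddings(query, target):
    """Count injective mappings of query nodes onto target nodes such that
    every query edge is mapped onto a target edge.

    Graphs are dicts mapping each node to the set of its neighbours.
    """
    # Static visiting order: highest-degree query nodes first (an assumption
    # of this sketch, used only to fail early on constrained nodes).
    order = sorted(query, key=lambda v: len(query[v]), reverse=True)
    mapping = {}   # query node -> target node
    used = set()   # target nodes already taken

    def search(i):
        if i == len(order):
            return 1
        q = order[i]
        total = 0
        for t in target:
            if t in used:
                continue
            # Every already-mapped neighbour of q must be adjacent to t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                total += search(i + 1)
                del mapping[q]
                used.discard(t)
        return total

    return search(0)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
k4 = {a: {b for b in range(4) if b != a} for a in range(4)}   # complete graph
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}         # 4-cycle

in_k4 = count_embeddings(triangle, k4)          # 4 * 3 * 2 ordered mappings
in_square = count_embeddings(triangle, square)  # a 4-cycle has no triangle
```

The cost of this kind of search depends heavily on the order in which query nodes are visited, which is why the search strategy matters.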
miR-EdiTar: A database of predicted A-to-I edited miRNA target sites
A-to-I RNA editing is an important mechanism which consists of the conversion of specific adenosines into inosines in RNA molecules. Its dysregulation has been associated with several human diseases, including cancer. Recent work has demonstrated a role for A-to-I editing in microRNA (miRNA) mediated gene expression regulation. In fact, edited forms of mature miRNAs can target sets of genes that differ from the targets of their unedited forms. The specific deamination of mRNAs can generate novel binding sites in addition to potentially altering existing ones. miR-EdiTar is a database of predicted A-to-I edited miRNA binding sites. The database contains predicted miRNA binding sites that could be affected by A-to-I editing and sites that could become miRNA binding sites as a result of A-to-I editing.
miRandola: Extracellular Circulating MicroRNAs Database
MicroRNAs are small noncoding RNAs that play an important role in the regulation of various biological processes through their interaction with cellular messenger RNAs. MicroRNAs are also present in extracellular human body fluids such as serum, plasma, saliva, and urine. Most circulating microRNAs present in human plasma and serum cofractionate with the Argonaute2 (Ago2) protein. However, circulating microRNAs have also been found in membrane-bound vesicles such as exosomes. Since microRNAs circulate in the bloodstream in a highly stable, extracellular form, they may be used as blood-based biomarkers for cancer and other diseases. miRandola is a comprehensive manually curated classification of extracellular circulating miRNAs. miRandola is connected to miRò, the miRNA knowledge base, allowing users to infer the potential biological functions of circulating miRNAs and their connections with phenotypes.
miRò is a web-based knowledge base that provides users with miRNA–phenotype associations in humans. It integrates data from various online sources, such as databases of miRNAs, ontologies, diseases and targets, into a unified database equipped with an intuitive and flexible query interface and data mining facilities. The main goal of miRò is the establishment of a knowledge base which allows non-trivial analysis through sophisticated mining techniques and the introduction of a new layer of associations between genes and phenotypes inferred from miRNA annotations. Furthermore, a specificity function applied to validated data highlights the most significant associations.
miRScape: A Cytoscape Plugin to Annotate Biological Networks with microRNAs
miRScape is a Cytoscape plugin for mining biological networks annotated with microRNAs. It makes use of the knowledge base miRò, which introduces a new layer of associations between genes and phenotypes based on miRNA annotations. Given a network previously loaded into Cytoscape, miRScape allows users to identify relationships among genes, processes, functions and diseases at the miRNA level and to annotate them as attributes of each network node. These annotated networks may be further analyzed with the mining features available as Cytoscape plugins, for example to find hubs, interesting motifs and so on.
miRiam: miRNA target prediction
MicroRNAs (miRNAs) are small RNA molecules that modulate gene expression through degradation of specific mRNAs and/or repression of their translation. miRNAs are involved in both physiological and pathological processes, such as apoptosis and cancer. Their presence has been demonstrated in several organisms as well as in viruses. Virus-encoded miRNAs can act as viral gene expression regulators, but they may also interfere with the expression of host genes. Viral miRNAs may control host cell proliferation by targeting cell-cycle and apoptosis regulators. Therefore, they could be involved in cancer pathogenesis. Computational prediction of miRNA/target pairs is a fundamental step in these studies. miRiam is a novel program based on both thermodynamic features and empirical constraints to predict viral miRNA/human target interactions. miRiam exploits target mRNA secondary structure accessibility and interaction rules inferred from validated miRNA/mRNA pairs. A set of genes involved in apoptosis and cell-cycle regulation was identified as the target for our studies. This choice was supported by the knowledge that DNA tumor viruses interfere with the above processes in humans. miRNAs were selected from two cancer-related viruses, Epstein-Barr Virus (EBV) and Kaposi-Sarcoma-Associated Herpes Virus (KSHV). Results show that several transcripts possess potential binding sites for these miRNAs. This work has produced a set of plausible hypotheses on the involvement of v-miRNAs and human apoptosis genes in cancer development. Our results suggest that during viral infection, besides the protein-based host regulation mechanism, a post-transcriptional level of interference may exist.
VIRGO: Visualization of A-to-I RNA editing sites in genomic sequences
RNA editing is a type of post-transcriptional modification that takes place in eukaryotes. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Several forms of RNA editing have been discovered, including A-to-I, C-to-U, U-to-C and G-to-A. In recent years, the application of global approaches to the study of A-to-I editing, including high-throughput sequencing, has led to important advances. However, in spite of enormous efforts, the real biological mechanism underlying this phenomenon remains unknown. VIRGO is a web-based tool that maps A-to-G mismatches between genomic and EST sequences as candidate A-to-I editing sites. VIRGO is built on top of a knowledge base integrating gene information from UCSC, ESTs from NCBI, SNPs, DARNED, and Next Generation Sequencing (NGS) data. The tool is equipped with a user-friendly interface allowing users to analyze genomic sequences in order to identify candidate A-to-I editing sites. The integration of NGS data allows the computation of p-values and adjusted p-values to measure the confidence of the mapped editing sites. The whole knowledge base is available for download and will be continuously updated as new NGS data become available.
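The core idea of flagging A-to-G mismatches can be sketched in a few lines of Python. Inosine is read as guanosine by sequencing, so an edited adenosine shows up as a G in the transcript. This is only an illustration and not the VIRGO pipeline: it assumes the genomic and EST sequences are already aligned, on the same strand, gap-free and of equal length, and the function name is hypothetical.

```python
def a_to_g_mismatches(genomic, est):
    """Return 0-based positions where the genome has A and the EST has G.

    Assumes both sequences are pre-aligned, gap-free and of equal length
    (an assumption of this sketch; real data need an alignment step first).
    """
    if len(genomic) != len(est):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (g, e) in enumerate(zip(genomic.upper(), est.upper()))
        if g == "A" and e == "G"
    ]

candidates = a_to_g_mismatches("ACGATTA", "GCGATTG")
```

Each reported position is only a candidate: a known SNP at the same position, for instance, would explain the mismatch without any editing, which is why additional data sources are needed to assess confidence.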
DBSCAN: Enhancing density-based clustering, parameter reduction and outlier detection
Clustering is a widely used unsupervised data mining technique. In density-based clustering, a cluster is defined as a connected dense component and grows in the direction driven by the density. The basic structure of density-based clustering presents some common drawbacks: (i) parameters have to be set; (ii) the behavior of the algorithm is sensitive to the density of the starting object; and (iii) adjacent clusters of different densities may not be properly identified. In this paper, we address all the above problems.
Our method, based on the concept of space stratification, efficiently identifies the different densities in the dataset and, accordingly, ranks the objects of the original space. Next, it exploits this knowledge by projecting the original data into a space with one more dimension. It then performs a density-based clustering that takes into account the reverse nearest neighbors of the objects. Our method also reduces the number of input parameters by giving a guideline for setting them in a suitable way. Experimental results indicate that our algorithm is able to deal with clusters of different densities and outperforms the most popular algorithms, DBSCAN and OPTICS, on all the standard benchmark datasets. DBStrata is a software system that implements the density-based clustering architecture together with several extensions able to boost the clustering performance and to efficiently identify outliers.
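To make the notion of reverse nearest neighbors concrete, here is a small brute-force Python sketch. It is not the DBStrata method: it only counts, for each point, how many other points have it among their k nearest neighbors. The function name, the Euclidean distance and the toy data are assumptions of this example.

```python
import math

def reverse_knn_counts(points, k=1):
    """For each point, count how many other points list it among their
    k nearest neighbours (brute force, Euclidean distance)."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Indices of all other points, closest first.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts

# Toy data: three points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 0.0)]
counts = reverse_knn_counts(points, k=1)
```

In this toy example the isolated point at (5.0, 0.0) is nobody's nearest neighbor, which hints at why reverse-neighbor information can be useful for spotting outliers.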
EasyBack: Sequence similarity is more relevant than species specificity in probabilistic backtranslation
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. EasyBack is a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated probabilistically. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained with a set of the most similar sequences from all species.
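The ambiguity caused by the degeneracy of the genetic code is easy to quantify. The Python sketch below (an illustration only, unrelated to the EasyBack implementation) builds the standard genetic code and counts how many distinct coding sequences translate to a given peptide. The function and variable names are assumptions of this example.

```python
import math
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the list of its codons.
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)

def count_backtranslations(peptide):
    """Number of distinct coding sequences that translate to the peptide."""
    return math.prod(len(CODONS_FOR[aa]) for aa in peptide.upper())

n_mw = count_backtranslations("MW")    # Met and Trp have one codon each
n_mkl = count_backtranslations("MKL")  # 1 (Met) * 2 (Lys) * 6 (Leu)
```

Because the count is a product over residues, it grows exponentially with peptide length, so a backtranslation method needs some criterion to choose among the candidates.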
Office hours are by appointment. Check the Basi di Dati forum for weekly office hours and news.