My primary research interest is designing efficient algorithms and machine learning methods to process and interpret biological networks and large-scale high-throughput genomics data. Over the years, I have worked mainly in the following three areas:
Computational Methods for Analysis of scRNA Sequencing Data
At Memorial Sloan Kettering Cancer Center, I have been responsible for implementing an in-house pipeline for processing and analyzing 10X scRNA sequencing data, from FASTQ files to gene-count matrices, without relying on 10X Cell Ranger. The pipeline also clusters phenotypically similar cells and discovers differentially expressed genes specific to each cluster. Moreover, I have designed and implemented novel algorithms and machine learning approaches for identifying copy number, discovering allelic expression, and identifying and removing artifacts in scRNA sequencing data.
Computational Methods for Analysis of SELEX and HT-SELEX Data
Aptamers, short synthetic RNA/DNA molecules that bind specific targets with high affinity and specificity, are used in a growing spectrum of biomedical applications. Aptamers are identified in vitro via the Systematic Evolution of Ligands by Exponential Enrichment (SELEX) protocol. Starting from a pool of random ssDNA/RNA sequences, SELEX selects binders iteratively, amplifying target-affine species through a series of selection cycles. HT-SELEX, which combines SELEX with high-throughput sequencing, is capable of generating nearly one billion aptamer sequences. Given the massive amount of data generated by HT-SELEX, available computational methods for visualizing high-throughput sequencing data and identifying binding motifs neither possess the required scalability nor take advantage of important properties of the experimental procedure.
We introduced AptaGUI, an open-source and platform-independent graphical user interface (GUI) to visualize HT-SELEX data. AptaGUI contains many computational tools for HT-SELEX analysis, including data pre-processing and tracking the changes of individual aptamers and entire aptamer families (groups of aptamers sharing highly similar nucleotide sequences) throughout the selection cycles.
We recently developed AptaTRACE, a novel approach for identifying sequence-structure binding motifs in the massive amounts of sequence data produced by HT-SELEX experiments. Our approach leverages the experimental design of the SELEX protocol and identifies sequence-structure motifs that show a signature of selection towards a preferred structure. In the initial pool, the secondary structural context of each k-mer (that is, its tendency to reside in a hairpin, bulge loop, inner loop, multiple loop, or dangling end, or to be paired) is distributed according to a background distribution.
For sequence motifs involved in binding, this distribution shifts in later selection cycles towards the structural context favored by the binding interaction with the target site. Using a relative-entropy-based scoring function, AptaTRACE is able to identify the motifs that converge to a specific structural context throughout the selection cycles of HT-SELEX experiments.
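To make the idea of a relative-entropy-based score concrete, the following is a minimal Python sketch, not the AptaTRACE implementation. It assumes that each k-mer has one structural-context profile per selection cycle (six probabilities, one per context), and it scores the k-mer by summing the Kullback-Leibler divergence of every later cycle from the initial pool. The function names, the pseudocount, the summation scheme, and the example numbers are all illustrative assumptions.

```python
import math

# Structural contexts named in the text; the ordering here is an arbitrary assumption.
CONTEXTS = ("hairpin", "bulge loop", "inner loop", "multiple loop", "dangling end", "paired")


def relative_entropy(p, q, pseudocount=1e-9):
    """Kullback-Leibler divergence D(p || q) in bits for two discrete distributions.

    A small pseudocount (an assumption of this sketch) avoids log(0) and
    division by zero; both inputs are renormalized after smoothing.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_smooth = [x + pseudocount for x in p]
    q_smooth = [x + pseudocount for x in q]
    p_total, q_total = sum(p_smooth), sum(q_smooth)
    return sum(
        (a / p_total) * math.log2((a / p_total) / (b / q_total))
        for a, b in zip(p_smooth, q_smooth)
    )


def context_shift_score(profiles):
    """Sum D(cycle_i || initial pool) over all later cycles for one k-mer.

    `profiles[0]` is the context profile in the initial pool (the background);
    the remaining entries are the profiles in later selection cycles.
    """
    background = profiles[0]
    return sum(relative_entropy(profile, background) for profile in profiles[1:])


# Hypothetical profiles for one k-mer: it drifts towards the hairpin context.
shifting_kmer = [
    [0.20, 0.10, 0.15, 0.10, 0.15, 0.30],  # initial pool
    [0.40, 0.08, 0.12, 0.08, 0.12, 0.20],  # intermediate cycle
    [0.70, 0.04, 0.06, 0.04, 0.06, 0.10],  # late cycle
]
# Hypothetical profiles for a k-mer whose context distribution never changes.
stable_kmer = [shifting_kmer[0]] * 3

print("shifting k-mer score:", context_shift_score(shifting_kmer))
print("stable k-mer score:", context_shift_score(stable_kmer))
```

Under these assumptions, a k-mer whose profile stays at the background distribution scores near zero, while a k-mer that converges to a single context accumulates a positive score, which is the signature of selection described above.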