We are interested in developing computational technologies to understand the functions of coding and non-coding elements, especially in the context of human physiology and disease. We are focusing on the following areas:

Algorithm development for functional screening (esp. CRISPR/Cas9 knockout screening)

We developed a comprehensive computational solution for functional screens using CRISPR/Cas9, including guide-RNA design algorithms (SSC, CRISPR-DO, CRISPR-FOCUS), algorithms for modeling and processing CRISPR screens (MAGeCK/MAGeCK-VISPR/MAGeCKFlute), and an algorithm for interpreting CRISPR screens using network or pathway information (MAGeCK-NEST). These algorithms have become popular in the field: the MAGeCK suite has reached over 20,000 paper visits and over 38,000 software downloads.
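To give a flavor of what processing a CRISPR screen involves, the sketch below computes a simple gene-level score from guide-RNA read counts. It is only an illustration under stated assumptions, not MAGeCK's implementation: the function name, the input layout, the total-count normalization and the median summary are all hypothetical choices made for this example, and MAGeCK itself relies on a more elaborate statistical model.

```python
import math
from collections import defaultdict
from statistics import median


def gene_log_fold_changes(counts, pseudocount=1.0):
    """Summarise a screen as one median log2 fold change per gene.

    counts: list of (sgrna_id, gene, control_count, treatment_count).
    Hypothetical input layout, chosen for illustration only.
    """
    total_ctrl = sum(ctrl for _, _, ctrl, _ in counts)
    total_trt = sum(trt for _, _, _, trt in counts)
    per_gene = defaultdict(list)
    for _, gene, ctrl, trt in counts:
        # Scale each sample to reads per million so that the two
        # libraries are comparable, then add a pseudocount to avoid log(0).
        ctrl_norm = ctrl / total_ctrl * 1e6 + pseudocount
        trt_norm = trt / total_trt * 1e6 + pseudocount
        per_gene[gene].append(math.log2(trt_norm / ctrl_norm))
    # The median over a gene's guides is robust to one outlier guide.
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


example_counts = [
    ("g1", "A", 100, 10),   # guides for gene A are depleted
    ("g2", "A", 100, 20),
    ("g3", "B", 100, 110),  # guides for gene B are roughly unchanged
    ("g4", "B", 100, 100),
]
scores = gene_log_fold_changes(example_counts)
```

In this toy input, gene A receives a strongly negative score (its guides drop out under treatment), which is the kind of signal a knockout screen uses to flag genes required for growth.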



Analyzing gene functions using MAGeCK-VISPR in a single experiment (left) and two experiments (right)



Functional analysis of coding and non-coding elements from screening and genomics data

Using the computational frameworks we developed, we collaborated with experimental and clinical scientists around the world to study DNA functions and their associations with human diseases.


Example 1: targeting endocrine resistant breast cancer

Over 70% of breast cancer patients are ER positive, and endocrine therapy has been a standard treatment for these patients for decades. However, most patients with advanced-stage disease eventually develop resistance to ER inhibition therapies through unknown mechanisms. We collaborated with the Myles Brown lab (Dana-Farber Cancer Institute/Harvard Medical School) to study the mechanisms of breast cancer endocrine resistance and potential treatments for it. By analyzing genome-wide CRISPR knockout screening data, we found an unusual tumor suppressor, c-src tyrosine kinase (CSK), whose loss accelerates cell growth in the absence of hormone and is associated with high-grade tumors and worse survival rates in patients.

From the screens, we also identified genes that are synthetic lethal with CSK loss and can serve as drug targets. The top hit (a PAK family kinase) was confirmed as a vulnerable target for endocrine-resistant patients, and a small-molecule PAK inhibitor suppressed tumor growth in various confirmation experiments.

The paper was published in PNAS in 2018, and a corresponding patent application has been submitted.


Analyzing critical genes in breast cancer

Example 2: Studying functional long non-coding RNAs in cancer

Long non-coding RNAs (lncRNAs) are not translated into protein, but they play important roles in many biological processes, including cancer. In collaboration with the Wensheng Wei laboratory (Peking University), we developed a novel computational and experimental protocol to screen for lncRNAs using paired gRNAs (pgRNAs). This technology introduces both gRNAs of a pair into the same cell and efficiently knocks out non-coding elements by introducing large genomic deletions. We demonstrated its ability to knock out lncRNAs quickly and efficiently.
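The core idea of a paired-gRNA deletion can be sketched in a few lines: choose one cut site upstream and one downstream of the element so that the excised segment covers it. The example below is a minimal illustration under stated assumptions, not the published design algorithm: the function name, the coordinate convention and the `max_deletion` limit are hypothetical, and a real design would also score each guide for efficiency and off-target risk.

```python
from itertools import product


def pair_guides(cut_sites, target_start, target_end, max_deletion=5000):
    """Return (upstream, downstream) cut-site pairs that flank a target.

    cut_sites: genomic coordinates where candidate gRNAs would cut.
    A pair is kept if the deletion it creates covers the whole target
    and is no longer than max_deletion (a hypothetical size limit).
    """
    upstream = [c for c in cut_sites if c < target_start]
    downstream = [c for c in cut_sites if c > target_end]
    pairs = [
        (u, d)
        for u, d in product(upstream, downstream)
        if d - u <= max_deletion
    ]
    # Prefer the smallest deletion that still removes the target.
    return sorted(pairs, key=lambda p: p[1] - p[0])


pairs = pair_guides([100, 900, 1200, 2100, 9000],
                    target_start=1000, target_end=2000,
                    max_deletion=3000)
# pairs == [(900, 2100), (100, 2100)]
```

The cut site at 1200 lies inside the target and the one at 9000 would create too large a deletion, so neither appears in any returned pair.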

The paper was published in Nature Biotechnology.


lncRNA screening: designing algorithm (left) and identifying top hits (right)


Transcriptome dynamics from RNA-seq

RNA-Seq is an exciting technology for studying the transcriptome using second-generation sequencing. We studied the problem of de novo transcriptome assembly from RNA-Seq reads: reconstructing all possible messenger RNA compositions simultaneously, without using any information from current gene annotations. We developed a series of influential algorithms for RNA-Seq transcriptome assembly and expression analysis: IsoInfer, IsoLasso, CEM and ISP. IsoInfer and IsoLasso were the first algorithms to apply combinatorial methods and regularized least squares methods to the assembly problem in RNA-Seq.
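As a rough illustration of how regularized least squares applies to assembly, the sketch below fits non-negative abundances for a set of candidate isoforms to the observed read coverage on each exon segment, with an L1 penalty that pushes unsupported isoforms to zero. It is a toy example under stated assumptions and not the IsoLasso implementation: the function name, the binary design matrix, the penalty value and the plain coordinate-descent solver are all choices made for this example.

```python
def lasso_isoform_abundance(design, coverage, lam=0.1, n_iter=500):
    """Non-negative LASSO solved by coordinate descent.

    design[i][j] is 1 if candidate isoform j contains segment i, else 0.
    coverage[i] is the observed read coverage on segment i.
    Minimises 0.5 * sum_i (coverage[i] - sum_j design[i][j] * x[j])**2
              + lam * sum_j x[j], subject to x[j] >= 0.
    """
    n_seg = len(design)
    n_iso = len(design[0])
    x = [0.0] * n_iso
    for _ in range(n_iter):
        for j in range(n_iso):
            col_sq = sum(design[i][j] ** 2 for i in range(n_seg))
            if col_sq == 0:
                continue  # isoform covers no segment; leave it at zero
            # Correlation of column j with the residual of all other isoforms.
            rho = 0.0
            for i in range(n_seg):
                others = sum(design[i][k] * x[k]
                             for k in range(n_iso) if k != j)
                rho += design[i][j] * (coverage[i] - others)
            # Soft-threshold and clip at zero (abundances are non-negative).
            x[j] = max(0.0, (rho - lam) / col_sq)
    return x


# Three segments, three candidate isoforms (columns):
# full-length, segment-2-skipping, and one lacking segment 3.
design = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
coverage = [15.0, 10.0, 15.0]  # generated by abundances 10, 5 and 0
abundance = lasso_isoform_abundance(design, coverage)
```

On this input the first two isoforms recover abundances close to 10 and 5, while the third candidate, which the coverage does not support, is driven to zero by the penalty.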


The IsoLasso splicing model