[Selected publications] [Research summary in one slide] [Google Scholar]

Journal (*equal contribution, #corresponding)

  1. SSBlazer: a genome-wide nucleotide-resolution model for predicting single-strand break sites.
    S Xu, J Wei, S Sun, J Zhang, TF Chan, Y Li#. Genome Biology, 2024. [Full text]
  2. ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers.
    Q Tan, J Xiao, J Chen, Y Wang, Z Zhang, T Zhao, Y Li#. Briefings in Bioinformatics, 2024. accepted
  3. RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models.
    H Huang, Z Lin, D He, L Hong, Y Li#. Bioinformatics, 2024. accepted
  4. scNovel: a scalable deep learning-based network for novel rare cell discovery in single-cell transcriptomics.
    C Zheng, Y Wang, Y Cheng, X Wang, H Wei, Y Li#. Briefings in Bioinformatics, 2024. [Full text]
  5. ProNet DB: A proteome-wise database for protein surface property representations and RNA-binding profiles.
    J Wei, J Xiao, S Chen, L Zong, X Gao, Y Li#. Database, 2024. [Full text]
  6. A machine learning-based risk score for prediction of infective endocarditis among patients with Staphylococcus aureus bacteraemia.
    C Lai, E Leung, Y He, C-C Cheung, O Mui, Q Yu, T Li, A Lee, Y Li, C-Y Lui. The Journal of Infectious Diseases, 2024. [Full text]
  7. Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering.
    Q Li, L Li, Y Li#. Biophysics Reports, 2024. [Full text]
  8. USPNet: unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model.
    J Shen, Q Yu, S Chen, Q Tan, J Li, Y Li#. Nature Computational Science, 2023. [Full text]
  9. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data.
    Y Cheng, X Fan, J Zhang, Y Li#. Communications Biology, 2023. [Full text]
  10. AcrNET: Predicting anti-CRISPR with Deep Learning.
    Y Li, Y Wei, S Xu, Q Tan, L Zong, J Wang, Y Wang, J Chen, L Hong, Y Li#. Bioinformatics, 2023. [Full text]
  11. Con-AAE: Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration.
    X Wang, Z Hu, T Yu, Y Wang, R Wang, Y Wei, J Shu, J Ma, Y Li#. Bioinformatics, 2023. [Full text]
  12. The High-dimensional Space of Human Diseases Built from Diagnosis Records and Mapped to Genetic Loci.
    G Jia*, Y Li*, X Zhong, K Wang, M Pividori, R Alomairy, A Esposito, H Ltaief, C Terao, M Akiyama, K Matsuda, D Keyes, H Im, T Gojobori, Y Kamatani, M Kubo, N Cox, X Gao#, A Rzhetsky#. Nature Computational Science, 2023. [Full text]
  13. Deep autoencoder for interpretable tissue-adaptive deconvolution and cell-type-specific gene analysis.
    Y Chen*, Y Wang*, Y Chen, Y Cheng, Y Wei, Y Li, J Wang, Y Wei, TF Chan#, Y Li#. Nature Communications, 2022. [Full text]
  14. The highly conserved RNA-binding specificity of nucleocapsid protein facilitates the identification of drugs with broad anti-coronavirus activity.
    S Fan, W Sun, L Fan, N Wu, W Sun, H Ma, S Chen, Z Li, Y Li, J Zhang, J Yan. Computational and Structural Biotechnology Journal, 2022.
  15. Self-supervised contrastive learning for integrative single cell RNA-seq data analysis.
    W Han*, Y Cheng*, J Chen*, H Zhong, Z Hu, S Chen, L Zong, L Hong, TF Chan, I King, X Gao#, Y Li#. Briefings in Bioinformatics, 2022. [Full text]
  16. Deep learning identifies and quantifies recombination hotspot determinants.
    Y Li*,#, S Chen*, T Rapakoulia, H Kuwahara, KY Yip, X Gao#. Bioinformatics, 2022. [Full text]
  17. Protein-RNA interaction prediction with deep learning: Structure matters.
    J Wei*, S Chen*, L Zong*, X Gao#, Y Li#. Briefings in Bioinformatics, 2022. [Full text]
  18. DeepCellState: an autoencoder-based framework for predicting cell type specific transcriptional states induced by drug treatment.
    R Umarov, Y Li, E Arner. PLOS Computational Biology, 2021. [Full text]
  19. NERO: A Biomedical Named-entity (Recognition) Ontology with a Large, Annotated Corpus Reveals Meaningful Associations Through Text Embedding.
    K Wang, R Stevens, H Alachram, Y Li, L Soldatova, R King, S Ananiadou, M Li, F Christopoulou, J Ambite, S Garg, U Hermjakob, D Marcu, E Sheng, T Beißbarth, E Wingender, A Galstyan, X Gao, B Chambers, B Khomtchouk, J Evans, A Schoene, W Pan, J Mathew, A Rzhetsky. npj Systems Biology and Applications, 2021. [Full text]
  20. ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation.
    R Umarov, Y Li, T Arakawa, S Takizawa, X Gao, E Arner. PLOS Computational Biology, 2021. [Full text]
  21. Structural and functional studies of the pyroptosis-related human Pannexin1 channel.
    S Zhang, B Yuan, J Lam, J Zhou, X Zhou, G Mandujano, X Tian, Y Liu, R Han, Y Li, X Gao, M Li, M Yang. Cell Discovery, 2021.
  22. Lunar Features Detection for Energy Discovery via Deep Learning.
    S Chen*, Y Li*,#, T Zhang, X Zhu, S Sun#, X Gao#. Applied Energy, 2021.
  23. Disease Gene Prioritization with Privileged Information and Heteroscedastic Gaussian Dropout.
    J Shu, Y Li, S Wang, J Ma. Bioinformatics, 2021.
  24. HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes.
    Y Li*, Z Xu*, W Han*, H Cao, R Umarov, A Yan, M Fan, H Chen, L Li, P Ho, X Gao. Microbiome, 2021. [Full text]
  25. DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing.
    Y Li*, S Wang*, C Bi, Z Qiu, M Li, X Gao. Bioinformatics, 2020. [Code] [PDF]
  26. A Self-adaptive Deep Learning Algorithm for Accelerating Multi-component Flash Calculation.
    T Zhang, Y Li, Y Li, S Sun, X Gao. Computer Methods in Applied Mechanics and Engineering, 2020.
  27. Modern Deep Learning in Bioinformatics.
    H Li*, S Tian*, Y Li*, R Tan, Y Pan, C Huang, Y Xu, X Gao. Journal of Molecular Cell Biology, 2020.
  28. DeeReCT-APA: prediction of alternative polyadenylation site usage through deep learning.
    Z Li, Y Li, B Zhang, Y Li, Y Long, X Zou, M Zhang, Y Hu, W Chen, X Gao. Genomics, Proteomics & Bioinformatics (GPB), 2020.
  29. Long-read Individual-molecule Sequencing Reveals CRISPR-induced Genetic Heterogeneity in Human ESCs.
    C Bi, L Wang, B Yuan, X Zhou, Y Li, S Wang, Y Pang, X Gao, Y Huang, M Li. Genome Biology, 2020.
  30. A deep learning framework to predict binding preference of RNA constituents on protein surface.
    J Lam*, Y Li*, L Zhu, R Umarov, H Jiang, A Heliou, F Sheong, T Liu, Y Long, Y Li, L Fang, R Altman, W Chen, X Huang, X Gao. Nature Communications, 2019.
    [KAUST news] [Chinese introduction] [PDF] [Code] [Server]
  31. Estimating heritability and genetic correlations from large health datasets in the absence of genetic data.
    G Jia, Y Li, H Zhang, I Chattopadhyay, A Jensen, D Blair, L Davis, P Robinson, T Dahlén, S Brunak, M Benson, G Edgren, N Cox, X Gao, A Rzhetsky. Nature Communications, 2019. [PDF]
    [UChicago news] [Chinese introduction]
  32. Two symmetric Arginine residues play distinct roles in Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage.
    J Lei, G Sheng, P Cheung, S Wang, Y Li, X Gao, Y Zhang, Y Wang, X Huang. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2019.
  33. Accelerating Flash Calculation through Deep Learning Methods.
    Y Li, T Zhang, S Sun, X Gao. Journal of Computational Physics, 2019. [PDF]
  34. Deep learning in bioinformatics: introduction, application, and perspective in big data era.
    Y Li, C Huang, L Ding, Z Li, Y Pan, X Gao. Methods, 2019. [PDF] [Code]
    Cover article of the Methods issue: Deep Learning in Bioinformatics
    Highly cited paper
  35. mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning.
    Z Zou, S Tian, X Gao, Y Li#. Frontiers in Genetics, 2019. [PDF] [Server]
  36. Promoter analysis and prediction in the human genome using sequence-based deep learning models.
    R Umarov, H Kuwahara, Y Li, X Gao, V Solovyev. Bioinformatics, 2019. [PDF] [Code]
  37. H-NS uses an autoinhibitory conformational switch to achieve environment-controlled gene silencing.
    U Hameed, C Liao, A Radhakrishnan, F Huser, S Aljedani, X Zhao, A Momin, F Melo, X Guo, C Brooks, Y Li, X Cui, X Gao, J Ladbury, L Jaremko, M Jaremko, J Li, S Arold. Nucleic Acids Research (NAR), 2018.
  38. DeeReCT-PolyA: a robust and generic deep learning method for PAS identification.
    Z Xia, Y Li, B Zhang, Z Li, Y Hu, W Chen, X Gao. Bioinformatics, 2018. [PDF] [Code]
  39. DeepSimulator: a deep simulator for nanopore sequencing.
    Y Li, R Han, C Bi, M Li, S Wang, X Gao. Bioinformatics, 2018. [PDF] [Code]
  40. DLBI: Deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy.
    Y Li, F Xu, F Zhang, P Xu, M Fan, L Li, X Gao, R Han. Bioinformatics, 2018. [PDF] [Code]
  41. PredMP: a web server for de novo prediction and visualization of membrane proteins.
    S Wang, S Fei, Z Wang, Y Li, J Xu, F Zhao, X Gao. Bioinformatics, 2018. [PDF] [Server]
  42. An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.
    R Han, Y Li, X Gao, S Wang. Bioinformatics, 2018. [PDF] [Code]
  43. DES-Mutation: System for Exploring Links of Mutations and Diseases.
    V Kordopati, A Salhi, R Razali, A Radovanovic, F Tifratene, M Uludag, Y Li, A Bokhari, A AlSaieedi, A Raies, C Neste, M Essack, V Bajic. Scientific Reports, 2018. [PDF] [Server]
  44. AuTom-dualx: a toolkit for fully automatic fiducial marker-based alignment of dual-axis tilt series with simultaneous reconstruction.
    R Han, X Wan, L Li, A Lawrence, P Yang, Y Li, S Wang, F Sun, Z Liu, X Gao, F Zhang. Bioinformatics, 2018.
  45. DEEPre: sequence-based enzyme EC number prediction by deep learning.
    Y Li, S Wang, R Umarov, B Xie, M Fan, L Li, X Gao. Bioinformatics, 2017. [PDF] [Server]
  46. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.
    H Dai, R Umarov, H Kuwahara, Y Li, L Song, X Gao. Bioinformatics, 2017. [PDF] [Code]
  47. The dynamic multisite interactions between two intrinsically disordered proteins.
    S Wu, D Wang, J Liu, Y Feng, J Weng, Y Li, X Gao, J Liu, W Wang. Angewandte Chemie, 2017.
  48. Reward sensitivity predicts ice cream-related attentional bias assessed by inattentional blindness.
    X Li, Q Tao, Y Fang, C Cheng, Y Hao, J Qi, Y Li, W Zhang, Y Wang, X Zhang. Appetite, 2015.

Conference (*equal contribution, #corresponding)

  1. RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models.
    H Huang, Z Lin, D He, L Hong, Y Li. The Thirty-Second Conference on Intelligent Systems for Molecular Biology (ISMB-24)
  2. GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling.
    Y Fan, Y Li, J Ding, Y Li. The 28th Annual International Conference on Research in Computational Molecular Biology (RECOMB-24)
  3. Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning.
    Z Hu, Q Yu, Y Guo, T Wang, I King, X Gao, L Song, Y Li. The 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB-23)
  4. Understanding Dropout for Graph Neural Networks.
    J Shu, B Xi, Y Li, F Wu, C Kamhoua, J Ma. GraphLearning-2022.
  5. CLMB: deep contrastive learning for robust metagenomic binning.
    P Zhang, Z Jiang, Y Wang, Y Li. The 26th Annual International Conference on Research in Computational Molecular Biology (RECOMB-22). Preprint
  6. Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation.
    Q Wang, J Chen, Y Zhou, Y Li, L Zheng, Z Li, S Cui. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
  7. DeepCURATER: Deep Learning for CoURse And Teaching Evaluation and Review.
    Z Hu, B Thumu, Y Qin, T Wong, Y Lu, Z Tao, O Kan, Y Li, I King. 2021 IEEE International Conference on Engineering, Technology & Education (TALE-21)
  8. Disease Gene Prioritization with Privileged Information and Heteroscedastic Gaussian Dropout.
    J Shu, Y Li, S Wang, J Ma. The Twenty-Ninth Conference on Intelligent Systems for Molecular Biology (ISMB-21)
  9. RNA Secondary Structure Prediction By Learning Unrolled Algorithms.
    X Chen*, Y Li*, R Umarov, X Gao, L Song. Eighth International Conference on Learning Representations (ICLR-20),
    Oral (Acceptance rate = 48/2599 = 1.85%)
    [GaTech news] [Chinese news] [Chinese introduction] [Plain explanation]
  10. Learning to Stop While Learning to Predict.
    X Chen, H Dai, Y Li, X Gao, L Song. Thirty-seventh International Conference on Machine Learning (ICML-20).
  11. Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test.
    L Ding, M Yu, L Liu, F Zhu, Y Liu, Y Li, L Shao. Thirty-third Conference on Neural Information Processing Systems (NeurIPS-19)
  12. Linear Kernel Tests via Empirical Likelihood for High Dimensional Data.
    L Ding, Z Liu, Y Li, S Liao, Y Liu, P Yang, G Yu, L Shao, X Gao. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
  13. Approximate Kernel Selection with Strong Approximate Consistency.
    L Ding, S Liao, Y Liu, Y Li, P Yang, Y Pan, C Huang, L Shao, X Gao. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
  14. DLBI: Deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy.
    Y Li*, F Xu*, F Zhang, P Xu, M Fan, L Li, X Gao, R Han. The Twenty-Sixth Conference on Intelligent Systems for Molecular Biology (ISMB-18)
  15. An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.
    R Han, Y Li, X Gao, S Wang. The Seventeenth European Conference on Computational Biology (ECCB-18)