[Selected publications]
[Research summary in one slide]
[Google Scholar]
All publications (*equal contribution, #corresponding author)
- Deep Learning for RNA Structure Prediction.
J Wang, Y Fan, L Hong, Z Hu, Y Li. Current Opinion in Structural Biology, 2025, accepted.
- cfDecon: Accurate and Interpretable methylation-based cell type deconvolution for cell-free DNA.
Y Wang, J Li, J Li, S Yang, Y Huang, X Liu, Y Fan, I King, Y Li, Y Li. RECOMB-25, 2025, accepted.
- Linking Dietary Fiber to Human Malady through Cumulative Profiling of Microbiota Disturbance.
X Zhang, H Liu, Y Li, X Wen, T Xu, C Chen, S Hao, J Hu, S Nie, F Gao, G Jia. iMeta, 2025, accepted.
- An effective encoding of human medical conditions in disease space provides a versatile framework for deciphering disease associations.
T Xu, Y Li, X Gao, A Rzhetsky, G Jia. Quantitative Biology, 2025, accepted.
- AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures.
Y Gao, C Zheng, E Xie, H Shi, T Hu, Y Li, M Ng, Z Li, Z Liu. Transactions on Machine Learning Research, 2025, accepted.
- From 2015 to 2023: How Machine Learning Aids Natural Product Analysis.
S Shi, Z Huang, X Gu, X Lin, C Zhong, J Hang, J Lin, C Zhong, L Zhang, Y Li, J Huang. Chemistry Africa, 2025, accepted.
- Accurate RNA 3D structure prediction using a language model-based deep learning approach.
T Shen, Z Hu, S Sun, D Liu, F Wong, J Chen, J Wang, L Hong, Y Wang, J Xiao, L Zheng, T Krishnamoorthi, I King, S Wang#, P Yin#, J Collins#, Y Li#. Nature Methods, 2024. [Full text]
[Nature Methods Research Briefing]
- Fast, sensitive detection of protein homologs using deep dense retrieval.
L Hong*, Z Hu*, S Sun#, X Tang*, J Wang, Q Tan, L Zheng, S Wang, S Xu, I King, M Gerstein#, Y Li#. Nature Biotechnology, 2024. [Full text]
[Nature Biotechnology Research Briefing]
[CSE News]
- Deep generative design of RNA aptamers using structural predictions.
F Wong, D He, A Krishnan, L Hong, A Wang, J Wang, Z Hu, S Omori, A Li, J Rao, Q Yu, W Jin, T Zhang, K Ilia, J Chen, S Zheng, I King, Y Li#, J Collins#. Nature Computational Science, 2024. [Full text] [GitHub]
- SSBlazer: a genome-wide nucleotide-resolution model for predicting single-strand break sites.
S Xu, J Wei, S Sun, J Zhang, TF Chan, Y Li#. Genome Biology, 2024. [Full text]
- A fast and adaptive detection framework for genome-wide chromatin loop mapping from Hi-C data.
S Chen*, J Wang*, I Jung, Z Qiu, X Gao#, Y Li#. Genome Research, 2024. accepted
- RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models.
H Huang, Z Lin, D He, L Hong, Y Li#. Bioinformatics, 2024. [Full text]
Conference version: ISMB-2024
- Progress and opportunities of foundation models in bioinformatics.
Q Li, Z Hu, Y Wang, L Li, Y Fan, I King, G Jia, S Wang, L Song, Y Li#. Briefings in Bioinformatics, 2024. [Full text]
- ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers.
Q Tan, J Xiao, J Chen, Y Wang, Z Zhang, T Zhao, Y Li#. Briefings in Bioinformatics, 2024. [Full text]
- Learning Meaningful Representation of Single-Neuron Morphology via Large-scale Pre-training.
Y Fan*, Y Li*, Y Zhong, L Hong, L Li, Y Li#. Bioinformatics, 2024. accepted
Conference version: ECCB-2024
- scNovel: a scalable deep learning-based network for novel rare cell discovery in single-cell transcriptomics.
C Zheng, Y Wang, Y Cheng, X Wang, H Wei, Y Li#. Briefings in Bioinformatics, 2024. [Full text]
- ProNet DB: A proteome-wise database for protein surface property representations and RNA-binding profiles.
J Wei, J Xiao, S Chen, L Zong, X Gao, Y Li#. Database, 2024. [Full text]
- DAPE: Data-Adaptive Positional Encoding for Length Extrapolation.
C Zheng, Y Gao, H Shi, M Huang, J Li, J Xiong, X Ren, M Ng, X Jiang, Z Li, Y Li#.
NeurIPS-24.
- MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions.
L Zhang, J Chen, T Shen, Y Li, S Sun.
NeurIPS-24.
- HORSE: Hierarchical Representation for Large-Scale Neural Subset Selection.
B Xie, Y Wang, Y Chen, K Zhou, Y Li, W Meng, J Cheng.
NeurIPS-24.
- Revisiting the Epitaxial Growth Mechanism of 2D TMDC Single Crystal.
C Li, F Zheng, J Min, N Yang, Y‐M Chang, H Liu, Y Zhang, P Yang, Q Yu, Y Li, Z Luo, A Aljarb, K Shih, J‐K Huang, L‐J Li, Y Wan.
Advanced Materials.
- Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering.
Q Li, L Li, Y Li#. Biophysics Reports, 2024. [Full text].
- Meta-Learning without Data via Unconditional Diffusion Models.
Y Wei, Z Hu, L Shen, Z Wang, L Li, Y Li, C Yuan. IEEE Transactions on Circuits and Systems for Video Technology, 2024. accepted
- Lyra: Orchestrating Dual Correction in Automated Theorem Proving.
C Zheng, H Wang, E Xie, Z Liu, J Sun, H Xin, J Shen, Z Li, Y Li. Transactions on Machine Learning Research, 2024. [Full text]
- Strong Correlation between A-Site Cation Order and Self-Trapped Exciton Emission in Zero-Dimensional Hybrid Perovskites.
F Fang, Y Shen, Y Li, K Shih, H Hu, H Zhong, Y Shi, T Wu. Small Science, 2024. [Full text]
- Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization.
M Hu, L Zong, H Wang, J Zhou, J Li, Y Gao, K-F Wong, Y Li, I King. EMNLP-2024 Findings, 2024. [Full text]
- A machine learning-based risk score for prediction of infective endocarditis among patients with Staphylococcus aureus bacteraemia.
C Lai, E Leung, Y He, C-C Cheung, O Mui, Q Yu, T Li, A Lee, Y Li, C-Y Lui. The Journal of Infectious Diseases, 2024. [Full text].
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models.
Y Wei, Z Hu, L Shen, Z Wang, Y Li, C Yuan, and D Tao. Forty-first International Conference on Machine Learning (ICML-24).
- GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling.
Y Fan, Y Li, J Ding, Y Li. The 28th Annual International Conference on Research in Computational Molecular Biology (RECOMB-24).
- USPNet: unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model.
J Shen, Q Yu, S Chen, Q Tan, J Li, Y Li#. Nature Computational Science, 2023. [Full text]
- The High-dimensional Space of Human Diseases Built from Diagnosis Records and Mapped to Genetic Loci.
G Jia*, Y Li*, X Zhong, K Wang, M Pividori, R Alomairy, A Esposito, H Ltaief, C Terao, M Akiyama, K Matsuda, D Keyes, H Im, T Gojobori, Y Kamatani, M Kubo, N Cox, X Gao#, A Rzhetsky#. Nature Computational Science, 2023. [Full text]
- A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data.
Y Cheng, X Fan, J Zhang, Y Li#. Communications Biology, 2023. [Full text]
- AcrNET: Predicting anti-CRISPR with Deep Learning.
Y Li, Y Wei, S Xu, Q Tan, L Zong, J Wang, Y Wang, J Chen, L Hong, Y Li#. Bioinformatics, 2023. [Full text]
- Con-AAE: Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration.
X Wang, Z Hu, T Yu, Y Wang, R Wang, Y Wei, J Shu, J Ma, Y Li#. Bioinformatics, 2023. [Full text]
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning.
Z Hu, Q Yu, Y Guo, T Wang, I King, X Gao, L Song, Y Li. The 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB-23).
- Deep autoencoder for interpretable tissue-adaptive deconvolution and cell-type-specific gene analysis.
Y Chen*, Y Wang*, Y Chen, Y Cheng, Y Wei, Y Li, J Wang, Y Wei, TF Chan#, Y Li#. Nature Communications, 2022. [Full text]
[CSE News]
- Protein-RNA interaction prediction with deep learning: Structure matters.
J Wei*, S Chen*, L Zong*, X Gao#, Y Li#. Briefings in Bioinformatics, 2022. [Full text]
- Self-supervised contrastive learning for integrative single cell RNA-seq data analysis.
W Han*, Y Cheng*, J Chen*, H Zhong, Z Hu, S Chen, L Zong, L Hong, TF Chan, I King, X Gao#, Y Li#. Briefings in Bioinformatics, 2022. [Full text]
- Deep learning identifies and quantifies recombination hotspot determinants.
Y Li*,#, S Chen*, T Rapakoulia, H Kuwahara, KY Yip, X Gao#. Bioinformatics, 2022. [Full text]
- CLMB: deep contrastive learning for robust metagenomic binning.
P Zhang, Z Jiang, Y Wang, Y Li. The 26th Annual International Conference on Research in Computational Molecular Biology (RECOMB-22). Preprint
- The highly conserved RNA-binding specificity of nucleocapsid protein facilitates the identification of drugs with broad anti-coronavirus activity.
S Fan, W Sun, L Fan, N Wu, W Sun, H Ma, S Chen, Z Li, Y Li, J Zhang, J Yan. Computational and Structural Biotechnology Journal, 2022.
- Understanding Dropout for Graph Neural Networks.
J Shu, B Xi, Y Li, F Wu, C Kamhoua, J Ma. GraphLearning-2022.
- Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation.
Q Wang, J Chen, Y Zhou, Y Li, L Zheng, Z Li, S Cui. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22).
- HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes.
Y Li*, Z Xu*, W Han*, H Cao, R Umarov, A Yan, M Fan, H Chen, L Li, P Ho, X Gao. Microbiome, 2021. [Full text]
- Lunar Features Detection for Energy Discovery via Deep Learning.
S Chen*, Y Li*,#, T Zhang, X Zhu, S Sun#, X Gao#. Applied Energy, 2021.
- Disease Gene Prioritization with Privileged Information and Heteroscedastic Gaussian Dropout.
J Shu, Y Li, S Wang, J Ma. Bioinformatics, 2021.
Conference version: ISMB-2021
- Structural and functional studies of the pyroptosis-related human Pannexin1 channel.
S Zhang, B Yuan, J Lam, J Zhou, X Zhou, G Mandujano, X Tian, Y Liu, R Han, Y Li, X Gao, M Li, and M Yang. Cell Discovery, 2021.
- DeepCellState: an autoencoder-based framework for predicting cell type specific transcriptional states induced by drug treatment.
R Umarov, Y Li, E Arner. PLOS Computational Biology, 2021. [Full text]
- NERO: A Biomedical Named-entity (Recognition) Ontology with a Large, Annotated Corpus Reveals Meaningful Associations Through Text Embedding.
K Wang, R Stevens, H Alachram, Y Li, L Soldatova, R King, S Ananiadou, M Li, F Christopoulou, J Ambite, S Garg, U Hermjakob, D Marcu, E Sheng, T Beißbarth, E Wingender, A Galstyan, X Gao, B Chambers, B Khomtchouk, J Evans, A Schoene, W Pan, J Mathew, A Rzhetsky. npj Systems Biology and Applications, 2021. [Full text]
- ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation.
R Umarov, Y Li, T Arakawa, S Takizawa, X Gao, E Arner. PLOS Computational Biology, 2021. [Full text]
- DeepCURATER: Deep Learning for CoURse And Teaching Evaluation and Review.
Z Hu, B Thumu, Y Qin, T Wong, Y Lu, Z Tao, O Kan, Y Li, I King.
2021 IEEE International Conference on Engineering, Technology & Education (TALE-21).
- DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing.
Y Li*, S Wang*, C Bi, Z Qiu, M Li, X Gao. Bioinformatics, 2020.
[Code]
[PDF]
- RNA Secondary Structure Prediction By Learning Unrolled Algorithms.
X Chen*, Y Li*, R Umarov, X Gao, L Song. Eighth International Conference on Learning Representations (ICLR-20),
Oral (Acceptance rate = 48/2599 = 1.85%)
[GaTech news]
[Chinese news]
[Chinese introduction]
[Plain explanation]
- Modern Deep Learning in Bioinformatics.
H Li*, S Tian*, Y Li*, R Tan, Y Pan, C Huang, Y Xu, and X Gao. Journal of Molecular Cell Biology, 2020.
- Long-read Individual-molecule Sequencing Reveals CRISPR-induced Genetic Heterogeneity in Human ESCs.
C Bi, L Wang, B Yuan, X Zhou, Y Li, S Wang, Y Pang, X Gao, Y Huang, M Li. Genome Biology, 2020.
- DeeReCT-APA: prediction of alternative polyadenylation site usage through deep learning.
Z Li, Y Li, B Zhang, Y Li, Y Long, X Zou, M Zhang, Y Hu, W Chen, X Gao. Genomics, Proteomics & Bioinformatics (GPB), 2020.
- Learning to Stop While Learning to Predict.
X Chen, H Dai, Y Li, X Gao, and L Song. Thirty-seventh International Conference on Machine Learning (ICML-20).
- A Self-adaptive Deep Learning Algorithm for Accelerating Multi-component Flash Calculation.
T Zhang, Y Li, Y Li, S Sun, and X Gao. Computer Methods in Applied Mechanics and Engineering, 2020.
- A deep learning framework to predict binding preference of RNA constituents on protein surface.
J Lam*, Y Li*, L Zhu, R Umarov, H Jiang, A Heliou, F Sheong, T Liu, Y Long, Y Li, L Fang, R Altman, W Chen, X Huang, X Gao. Nature Communications, 2019.
[KAUST news]
[Chinese introduction]
[PDF]
[Code]
[Server]
- Estimating heritability and genetic correlations from large health datasets in the absence of genetic data.
G Jia, Y Li, H Zhang, I Chattopadhyay, A Jensen, D Blair, L Davis, P Robinson, T Dahlén, S Brunak, M Benson, G Edgren, N Cox, X Gao, A Rzhetsky. Nature Communications, 2019.
[PDF]
[UChicago news]
[Chinese introduction]
- Deep learning in bioinformatics: introduction, application, and perspective in big data era.
Y Li, C Huang, L Ding, Z Li, Y Pan, X Gao. Methods, 2019.
[PDF]
[Code]
Cover article of the Methods issue: Deep Learning in Bioinformatics
Highly cited paper
- Two symmetric Arginine residues play distinct roles in Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage.
J Lei, G Sheng, P Cheung, S Wang, Y Li, X Gao, Y Zhang, Y Wang, X Huang. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2019.
- Accelerating Flash Calculation through Deep Learning Methods.
Y Li, T Zhang, S Sun, X Gao. Journal of Computational Physics, 2019.
[PDF]
- mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning.
Z Zou, S Tian, X Gao, Y Li#. Frontiers in Genetics, 2019.
[PDF]
[Server]
- Promoter analysis and prediction in the human genome using sequence-based deep learning models.
R Umarov, H Kuwahara, Y Li, X Gao, V Solovyev. Bioinformatics, 2019.
[PDF]
[Code]
- Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test.
L Ding, M Yu, L Liu, F Zhu, Y Liu, Y Li, L Shao. Thirty-third Conference on Neural Information Processing Systems (NeurIPS-19).
- Linear Kernel Tests via Empirical Likelihood for High Dimensional Data.
L Ding, Z Liu, Y Li, S Liao, Y Liu, P Yang, G Yu, L Shao, X Gao. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
- Approximate Kernel Selection with Strong Approximate Consistency.
L Ding, S Liao, Y Liu, Y Li, P Yang, Y Pan, C Huang, L Shao, X Gao. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
- DeepSimulator: a deep simulator for nanopore sequencing.
Y Li, R Han, C Bi, M Li, S Wang, X Gao. Bioinformatics, 2018.
[PDF]
[Code]
- DLBI: Deep learning guided Bayesian inference for structure reconstruction of super-resolution fluorescence microscopy.
Y Li, F Xu, F Zhang, P Xu, M Fan, L Li, X Gao, R Han. Bioinformatics, 2018.
[PDF]
[Code]
Conference version: ISMB-2018
- H-NS uses an autoinhibitory conformational switch to achieve environment-controlled gene silencing.
U Hameed, C Liao, A Radhakrishnan, F Huser, S Aljedani, X Zhao, A Momin, F Melo, X Guo, C Brooks, Y Li, X Cui, X Gao, J Ladbury, L Jaremko, M Jaremko, J Li, S Arold. Nucleic Acids Research (NAR), 2018.
- DeeReCT-PolyA: a robust and generic deep learning method for PAS identification.
Z Xia, Y Li, B Zhang, Z Li, Y Hu, W Chen, X Gao. Bioinformatics, 2018.
[PDF]
[Code]
- PredMP: a web server for de novo prediction and visualization of membrane proteins.
S Wang, S Fei, Z Wang, Y Li, J Xu, F Zhao, X Gao. Bioinformatics, 2018.
[PDF]
[Server]
- An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.
R Han, Y Li, X Gao, S Wang. Bioinformatics, 2018.
[PDF]
[Code]
Conference version: ECCB-2018
- DES-Mutation: System for Exploring Links of Mutations and Diseases.
V Kordopati, A Salhi, R Razali, A Radovanovic, F Tifratene, M Uludag, Y Li, A Bokhari, A AlSaieedi, A Raies, C Neste, M Essack, V Bajic. Scientific Reports, 2018.
[PDF]
[Server]
- AuTom-dualx: a toolkit for fully automatic fiducial marker-based alignment of dual-axis tilt series with simultaneous reconstruction.
R Han, X Wan, L Li, A Lawrence, P Yang, Y Li, S Wang, F Sun, Z Liu, X Gao, F Zhang. Bioinformatics, 2018.
- DEEPre: sequence-based enzyme EC number prediction by deep learning.
Y Li, S Wang, R Umarov, B Xie, M Fan, L Li, X Gao. Bioinformatics, 2017.
[PDF]
[Server]
- Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.
H Dai, R Umarov, H Kuwahara, Y Li, L Song, X Gao. Bioinformatics, 2017.
[PDF]
[Code]
- The dynamic multisite interactions between two intrinsically disordered proteins.
S Wu, D Wang, J Liu, Y Feng, J Weng, Y Li, X Gao, J Liu, W Wang. Angewandte Chemie, 2017.
- Reward sensitivity predicts ice cream-related attentional bias assessed by inattentional blindness.
X Li, Q Tao, Y Fang, C Cheng, Y Hao, J Qi, Y Li, W Zhang, Y Wang, X Zhang. Appetite, 2015.