With socioeconomic development, people increasingly care about their health. Consequently, the fields of genomics and healthcare, especially personalized genomics and precision medicine, have accumulated a tremendous amount of data waiting to be analyzed. This course is designed to equip students with the ability to analyze such data, which benefits both the students’ personal development and society. In the course, we will cover high-throughput experimental methods, standard data processing pipelines, sequence alignment and mapping, foundational concepts of data analytics, data exploration and visualization, clustering and classification, dimension reduction, and their applications in personalized genomics and precision medicine. For personalized genomics, we will also cover the integration of heterogeneous sequencing and non-sequencing data, single-cell data analysis, multi-omics analysis methods, and cancer genomics. For precision medicine, we will cover protein-RNA interactions, biological graph analysis, and a gentle introduction to biomedical imaging and electronic health records.
Yu LI (liyuATcse.cuhk.edu.hk), SHB-106. Office hour: 3pm-5pm, Friday
Licheng ZONG (lczong21ATcse.cuhk.edu.hk), SHB-1026. Office hour: 3pm-5pm, Monday
Xinyi ZHOU (xyzhou21ATcse.cuhk.edu.hk), SHB-903. Office hour: 9am-11am, Thursday
Wednesday: 9:30am-11am, SC-L3.
Friday: 9:30am-10:15am, ERB-803.
Friday: 10:30am-11:15am, ERB-803 (tutorial).
Mixed. Slides will be available the day before the lecture day. Video recordings will be available after the lecture.
Blackboard is the main platform for managing the course, and grading will be done through Blackboard. We will use Piazza (BMEG3105) for discussion. You can ask questions on Piazza, even anonymously. For personal matters, please send a private post to the instructor and the TAs. You are also very welcome to email the instructor and TAs.
Bonus (up to 6%): One bonus question in the midterm. One additional scribing: 1%. Pre-course survey + post-lecture surveys: 0.5% each, up to a maximum of 3%. I encourage you to complete all of them so that I can get your feedback and adjust the course accordingly. Send your names to the TA.
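As a rough illustration of how the bonus rules above add up, here is a minimal Python sketch. The function name `bonus_percent` and its parameters are hypothetical, and the value of the midterm bonus question is not stated on this page, so it is passed in as an assumption rather than fixed.

```python
def bonus_percent(surveys_done, extra_scribing=False, midterm_bonus=0.0):
    """Estimate the bonus (in percentage points) under the stated rules.

    surveys_done   -- number of completed surveys (0.5% each, capped at 3%)
    extra_scribing -- True if one additional scribing note was done (1%)
    midterm_bonus  -- points from the midterm bonus question (assumed input;
                      the page does not say how much the question is worth)
    """
    if surveys_done < 0 or midterm_bonus < 0:
        raise ValueError("inputs must be non-negative")
    survey_part = min(0.5 * surveys_done, 3.0)    # survey cap: 3%
    scribing_part = 1.0 if extra_scribing else 0.0
    total = survey_part + scribing_part + midterm_bonus
    return min(total, 6.0)                        # overall cap: 6%


# Example: 4 surveys and no extra scribing give 2.0 percentage points.
print(bonus_percent(4))
```

The two `min` calls simply encode the two caps (3% for surveys, 6% overall); the instructor's actual bookkeeping may differ.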
All exams and quizzes are open-book. You may bring any paper-based materials. However, phones, computers, and other communication tools are not allowed. Discussion is not allowed.
Python (the TA will prepare a recitation class to introduce it, mainly for the non-grading homework and your project), or any other language you are familiar with. For Python, we suggest using Colab.
The programming credits include the non-grading assignment (10%) and the grading programming part of the project (5%). If you really do not want to try hands-on experiments at all, the bonus is sufficient to cover all the programming credit. However, we do encourage you to try.
Please sign up in the Scribing preference sheet. We should have at least one student for each lecture, and we may adjust the assignment if necessary. Note that your notes and scribing will be posted online for others' reference. You can choose whether or not to remove your name. Deadline for signing up for scribing: 11:59 pm on 12th Sep. After that, the Google sheet will be closed.
Projects are done individually. You can propose your own project and seek our help, or choose from projects that we predefine. Some potential projects:
Both the mid-term report (1 page) and the final report should be submitted.
Each student will have 6 late days to turn in assignments, which can be used on A1, A2, A3, PA1, and the project mid-term report. They cannot be used on the project final report and the scribing note. A maximum of 2 late days can be used for each assignment. Grades will be deducted by 25% for each additional late day.
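The late-day arithmetic above can be sketched in Python as follows. This is only one reading of the policy, under these assumptions: up to 2 days per assignment are covered by the remaining late-day budget, every late day not covered costs 25% of the raw grade, and the grade cannot go below zero. The function name and parameters are hypothetical.

```python
def apply_late_policy(raw_score, days_late, late_days_left,
                      max_free_per_assignment=2, penalty_per_day=0.25):
    """Return (final_score, late_days_left_after) for one assignment.

    Assumption: late days are spent first (at most 2 per assignment and
    never more than the remaining budget); each uncovered day then
    removes 25% of the raw score, floored at zero.
    """
    if days_late < 0 or late_days_left < 0:
        raise ValueError("days must be non-negative")
    free = min(days_late, max_free_per_assignment, late_days_left)
    penalized = days_late - free
    factor = max(0.0, 1.0 - penalty_per_day * penalized)
    return raw_score * factor, late_days_left - free


# Example: an assignment worth 80 points handed in 3 days late,
# with the full budget of 6 late days still available.
score, left = apply_late_policy(80.0, 3, 6)
print(score, left)
```

In this example two days are covered by late days and one day is penalized. Remember that late days cannot be used on the project final report or the scribing note.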
Deadline for each survey: 11:59pm on the day before the next lecture. We set this deadline so that I have time to answer the questions you raise in the survey. Please enter 1 in the Google sheet (Survey results) once you have finished a survey. Usually, we will trust the 1s you enter in the Google sheet, but we will check in detail if the number of survey forms we received and the number of 1s in the Google sheet are not consistent.
| Lecture | Date | Location | Topic | Slides/Video | Notes | Reading | Important dates (all due at 11:59 pm) |
|---|---|---|---|---|---|---|---|
| 1 | Sep 8 (Wed) | SC-L3 | Introduction | Lec-1, YouTube, Bilibili | Note-1, Note-2 | Learning objectives | |
| 2 | Sep 10 (Fri) | ERB-803 | Data & Python | Lec-2, YouTube, Bilibili | Note-1, Note-2 | Sample code | PA0 posted |
| 3 | Sep 15 (Wed) | SC-L3 | Sequence and DP | Lec-3, YouTube, Bilibili | Note-1, Note-Wang, Han-Yi | Sample code, Chapter 2&3 | PA0 due |
| 4 | Sep 17 (Fri) | ERB-803 | Assembly & Mapping | Lec-4 (video unavailable due to technical issue), Makeup video YouTube, Makeup video Bilibili | Note-1, Note-2 | RNA-seq analysis | A1 posted |
| - | Sep 22 (Wed) | - | - | - | - | - | Mid-Autumn Festival |
| 5 | Sep 24 (Fri) | ERB-803 | Data exploration | Lec-5, YouTube, Bilibili | Note-1, Note-2 | Repetitive DNA, Python for DA, Sample code, Sample code-2 | |
| 6 | Sep 29 (Wed) | SC-L3 | Clustering | Lec-6, YouTube, Bilibili | Note-1, Note-2, Note-3 | Data mining book | A1 due |
| - | Oct 1 (Fri) | - | - | - | - | - | National Day |
| 7 | Oct 6 (Wed) | SC-L3 | Clustering & Classification | Lec-7, YouTube, Bilibili | Note-1, Note-2 | Data mining book, Correlation | A2 posted |
| 8 | Oct 8 (Fri) | ERB-803 | Classification | Lec-8, YouTube, Bilibili | Note-1 | Data mining book | |
| 9 | Oct 13 (Wed) | SC-L3 | Perf evaluation | Lec-9,10 | | | Cancelled due to typhoon |
| 10 | Oct 15 (Fri) | ERB-803 | Perf evaluation | Lec-9,10, YouTube, Bilibili | Note-Chan, Wai Shing, Note-1 | Data mining book | |
| 11 | Oct 20 (Wed) | SC-L3 | Mid-term review | Lec-11, YouTube, Bilibili | Note-CHAN, Chi Chung, Note-1, Note-2, Note-3 | | ParticipationQ, A2 due |
| 12 | Oct 22 (Fri) | ERB-803 | - | - | - | - | 8:30am-11:15am, Mid-term |
| Module 2 start | | | | | | | |
| 13 | Oct 27 (Wed) | SC-L3 | Dim reduction | Lec-13, YouTube, Bilibili | Note-LI, Xuanxuan, Note-FUNG, Cheuk Wa, Note-1, Note-2 | PML book | PA1 posted |
| 14 | Oct 29 (Fri) | ERB-803 | Multi-omics & cancer genomics overview | Lec-14, YouTube, Bilibili | Note-1, Note-2, Note-3 | Intro to cancer, Cancer genomics | |
| 15 | Nov 3 (Wed) | SC-L3 | TUT | | | | |
| 16 | Nov 5 (Fri) | ERB-803 | Cancer genomics | Lec-16, YouTube, Bilibili | Note-WONG, Wing Hei, Note-1, Note-2 | Cancer genomics, GATK, GWAS, Epigenetics, ENCODE | |
| 17 | Nov 10 (Wed) | SC-L3 | Single cell & visualization | Lec-17, YouTube, Bilibili | Note-LO, Yue Lung Edison, Note-1 | Current best practice, Tutorial-1, Tutorial-2, Tutorial-3, Clustering challenges | Project M-report |
| 18 | Nov 12 (Fri) | ERB-803 | Protein-RNA | Lec-18, YouTube, Bilibili | Note-CHAN, Yun Lam, Note-LO, Yue Lung Edison | Sequence motif, PCA-tSNE-UMAP | |
| Module 3 start | | | | | | | |
| 19 | Nov 17 (Wed) | SC-L3 | Deep learning | Lec-19, YouTube, Bilibili | Note-1, Note-2 | Pytorch examples | PA1 due, A3 posted |
| 20 | Nov 19 (Fri) | ERB-803 | How to present | Lec-20, YouTube, Bilibili | Note-1 | Pytorch examples | |
| 21 | Nov 24 (Wed) | SC-L3 | EHRs & How to learn | Lec-21, YouTube, Bilibili | Note-1, Note-2, Note-3 | EHRs processing | |
| 22 | Nov 26 (Fri) | ERB-803 | Project pres | Presentation schedule | | | A3 due |
| 23 | Dec 1 (Wed) | SC-L3 | Course review | Lec-23, YouTube, Bilibili | | | ParticipationQ |
| 24 | Dec 3 (Fri) | ERB-803 | Project pres | Presentation schedule | | | Project report |