Prerequisites
Good with numbers
The way to happiness is simple: you just “Be good with numbers, and be good with people.” – The Pursuit of Happyness (2006)
College-level statistics. For example, you should be confident testing the difference between means, running hypothesis tests grounded in probability theory, and interpreting OLS and fixed-effects models. Knowing how to apply probability theory flexibly to hypothesis testing is especially important.
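As an illustration of applying probability theory flexibly to hypothesis testing, here is a minimal sketch of a two-sided permutation test for a difference in means, using only the Python standard library. The function names, the sample data, and the choice of 10,000 shuffles are hypothetical and chosen for illustration; this is one resampling approach among several, not a method prescribed by the course.

```python
import random
from statistics import fmean


def diff_in_means(a, b):
    """Test statistic: mean of group a minus mean of group b."""
    return fmean(a) - fmean(b)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so
    shuffling the pooled data and re-splitting it into groups of the
    original sizes approximates the sampling distribution of the
    statistic. The p-value is the share of shuffles whose absolute
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = diff_in_means(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = diff_in_means(pooled[:n_a], pooled[n_a:])
        # Small tolerance so floating-point rounding does not hide ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcome data for two groups (illustration only).
treated = [12.1, 14.3, 13.8, 15.0, 14.7, 13.9]
control = [11.2, 12.0, 11.8, 12.5, 11.9, 12.3]

observed, p_value = permutation_test(treated, control)
print(f"difference in means = {observed:.2f}, p = {p_value:.4f}")
```

The design choice here is to simulate the null distribution by shuffling group labels rather than assuming a normal sampling distribution, as a t-test does; comparing the two results is a useful sanity check.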
Working knowledge of linear algebra. For example, you should know what a vector is and how to calculate the distance between two vectors.
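A minimal sketch of these vector calculations in standard-library Python: Euclidean distance, plus cosine similarity, which is added here as an assumption because it is a common way to compare word or document vectors in NLP. The function names and example vectors are illustrative only.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance: sqrt(sum_i (u_i - v_i)^2)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0
print(cosine_similarity([1, 0], [0, 1]))          # 0.0 (orthogonal vectors)
```

In practice you would more likely call the built-in `math.dist` (Python 3.8+) or NumPy, for example `numpy.linalg.norm(u - v)` on arrays, but writing the formula out once makes clear what those functions compute.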
Intermediate to advanced programming skills, with the help of Generative AI (GAI) tools.
Vibe Coding Updates
It’s 2026. ChatGPT has been around for a few years already, so who still writes code line by line? Everyone is doing something called vibe coding, and here is a great online tutorial: Vibe Coding 101. Starting this semester, we will experiment with vibe coding, so please review this tutorial in the first week.
Having said all this, I want to step back and acknowledge that vibe coding is too good to be true, at least for now. In higher education, attitudes toward GAI span a spectrum, from outright objection to wholehearted support. I’m somewhere in between, leaning toward support. I believe there will be a fundamental change in pedagogy regarding what should be taught in class, but we are still exploring, and I want to explore together with you. So be cautious: what we are trying may prove less effective, or even wrong, by next year. It’s an uncharted journey.
Below are the prerequisites from previous years, which I still recommend. As someone who learned programming the hard way, I value the lessons from that experience and remain confident in them, even in the age of GAI.
Programming may be intimidating, but remember that you are coding for social good, and “Sometimes it’s the people no one imagines anything of who do the things that no one can imagine.” – The Imitation Game (2014)
Start coding today.
The class is Python-based, but you can use R or any other programming language as long as you can complete the assignments and the final project. R certainly has its own advantages, but I personally recommend Python because most state-of-the-art NLP implementations are in Python. Example Python packages used in this course: Pandas, Requests, re (regular expressions), NetworkX, NLTK, TensorFlow, Keras, Transformers, and Gensim.
Programming is an essential part of this course, but it is not the purpose of the course and will not be taught in class. You are expected to have intermediate to advanced programming skills before entering the class. At a minimum, you need to pass the following courses before registering for this course (or be confident that these modules would be too easy for you):
Required fundamentals (no particular order)
- Programming:
  - Introduction to Python (4 hours)
  - Intermediate Python (4 hours)
  - Introduction to Shell for Data Science (4 hours)
  - Writing Functions in Python
  - Writing Efficient Python Code
- Data preprocessing and exploratory analysis: