Prerequisites
Good with numbers
The way to happiness is simple, you just “Be good with numbers, and be good with people.” – The Pursuit of Happyness (2006)
College-level statistics. For example, you should be comfortable testing the difference between means, running hypothesis tests grounded in probability theory, and working with OLS and fixed-effects models. Being able to apply probability theory flexibly in hypothesis testing is especially important.
Working knowledge of linear algebra. For example, you should know what a vector is and how to calculate the distance between two vectors.
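As a quick self-check on the linear algebra item, here is a minimal sketch of two common ways to measure the distance between two vectors. The function names and the sample vectors are illustrative assumptions, not part of the course materials, and the course itself may use library routines instead of hand-written ones.

```python
import math


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_u * norm_v)


# Hypothetical sample vectors, chosen so the arithmetic is easy to check by hand.
u = [1.0, 2.0, 2.0]
v = [4.0, 6.0, 2.0]

print(euclidean_distance(u, v))  # differences are (3, 4, 0), so this prints 5.0
print(cosine_distance(u, v))
```

Euclidean distance depends on vector length, while cosine distance depends only on direction. If you can follow both functions without difficulty, you likely meet this prerequisite.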
Intermediate to advanced programming skills
Programming may be intimidating, but remember you are coding for social good and “Sometimes it’s the people no one imagines anything of who do the things that no one can imagine.” – The Imitation Game (2014)
Start coding today.
The class is Python-based, but you can use R or any other programming language as long as you can complete the assignments and final project. R has its own advantages, but I personally recommend Python because most state-of-the-art NLP implementations are in Python. Example Python packages used in this course include Pandas, Requests, regular expressions (re), NetworkX, NLTK, TensorFlow, Keras, Transformers, and Gensim.
Programming is an essential part of this course but not its purpose, and it will not be taught in this class. You are expected to have intermediate to advanced programming skills before entering the class. At a minimum, you need to complete the following courses before registering for this course (unless you are confident that these modules are too easy for you):
Required fundamentals (no particular order)
Register with your UT email (either @utexas.edu or @austin.utexas.edu) to get a free license for DataCamp. After completing the modules below, you should be familiar with all the topics listed in this tutorial.
- Programming:
- Introduction to Python (4 hours)
- Intermediate Python (4 hours)
- Introduction to Shell for Data Science (4 hours)
- Writing Functions in Python
- Writing Efficient Python Code
- Data preprocessing and exploratory analysis: