Course Schedule
Computational methods and research design fundamentals (4 weeks)
- Week 1: Course introduction
- Key points: course structure, assignments, syllabus
- Week 2: Computational methods for social sciences: Overview
- Key points: philosophical and epistemological fundamentals, research design overview, comparison between CSS and conventional approaches
- Week 3: Analyzing computational methods from a research design perspective
- Key points: data management, concept representation, data analysis, and scientific communication
- Week 4: Field visit: Texas Advanced Computing Center
- Visit TACC; Discussion on final project options
Analyzing computational social science methods (8 + 1 weeks)
Instructor-led sessions are chosen by class vote before 1/27 from these options
- Week 5: Computational methods: NLP algorithms and models as concept representation tools (instructor-led)
- Key points: methodological background and overview, vector semantics and embeddings, Word2Vec, Doc2Vec, semantic similarity
- Week 6: Research design: Data management (student-led)
- Week 7: Computational methods: classification and topic modeling (instructor-led)
- Key points: NLP+ML classification, topic modeling fine-tuning, multilingual topic modeling
- Week 8: Research design: Concept representation (student-led)
- Week 9: Instructor-led seminar on computational methods: Voted module 3
- Week 10: Research design: Data analysis (student-led)
- Week 11: Instructor-led seminar on computational methods: Voted module 4
- Week 12: Group consultation on final project (no class)
- Week 13: Research design: Scientific communication (student-led)
Final project
- Week 14: Final project presentations
Weekly Details
Week 1: Course introduction
Before class
- Readings:
- Hofman, Jake M., Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, et al. 2021. “Integrating Explanation and Prediction in Computational Social Science.” Nature 595 (7866): 181–88. https://doi.org/10.1038/s41586-021-03659-0.
- Edelmann, Achim, Tom Wolff, Danielle Montagne, and Christopher A. Bail. 2020. “Computational Social Science and Sociology.” Annual Review of Sociology 46 (1): 61–81. https://doi.org/10.1146/annurev-soc-121919-054621.
- Lazer, David M. J., Alex Pentland, Duncan J. Watts, Sinan Aral, Susan Athey, Noshir Contractor, Deen Freelon, et al. 2020. “Computational Social Science: Obstacles and Opportunities.” Science 369 (6507): 1060–62. https://doi.org/10.1126/science.aaz8170.
In class
- Course overview:
- Motivation and history of this course.
- Course sites: Syllabus website, Canvas, and how to use them.
- Helpful resources: open source communities, ChatGPT (and how to responsibly use it for educational purposes), etc.
- Review final project options.
- Discussion on readings: Analytical capacity of CSS methods
- Review CSS Empirical Studies Database.
After class
- Register Accounts:
- Review “Getting started with Chameleon Cloud”
Week 2: Computational methods for social sciences: Overview
Before class
- Ragin, Charles C., and Lisa M. Amoroso. 2011. “The Goals of Social Research.” In Constructing Social Research: The Unity and Diversity of Method, 135–62. Pine Forge Press.
- Leonelli, Sabina. 2020. “Scientific Research and Big Data.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Summer 2020. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2020/entries/science-big-data/.
In class
- Review upcoming assignments.
- Discussion and lecture on readings.
- Hands-on: High-performance cloud computing with Chameleon
- Start an instance on Chameleon Cloud
- Install Anaconda Python and Jupyter Notebook.
- Snapshot the instance as an image.
- Discussion on final project options.
After class
Week 3: Analyzing computational methods from a research design perspective
Before class
- Ragin, Charles C., and Lisa M. Amoroso. 2011. “What Is (and Is Not) Social Research?” In Constructing Social Research: The Unity and Diversity of Method, 5–32. Pine Forge Press.
- Ma, Ji, Islam Akef Ebeid, Arjen de Wit, Meiying Xu, Yongzheng Yang, René Bekkers, and Pamala Wiepking. 2021. “Computational Social Science for Nonprofit Studies: Developing a Toolbox and Knowledge Base for the Field.” VOLUNTAS: International Journal of Voluntary and Nonprofit Organizations, October. https://doi.org/10.1007/s11266-021-00414-x.
In class
- Discussion and lecture on readings.
- Discussion on final project options.
Week 4: Field visit: Texas Advanced Computing Center (TBD)
In class
- Visit TACC.
- Discussion on final project options.
After class
Week 5: Computational methods: NLP algorithms and models as concept representation tools
Before class
- Required readings (copies of the GRS chapters are on the course’s Canvas site because of copyright)
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. 2022. “Social Science Research and Text Analysis.” In Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton, New Jersey Oxford: Princeton University Press.
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. 2022. “Principles of Measurement.” In Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton, New Jersey Oxford: Princeton University Press.
- Jurafsky, Daniel, and James H. Martin. 2022. “Vector Semantics and Embeddings.” In Speech and Language Processing, 3rd draft. https://web.stanford.edu/~jurafsky/slp3/.
- Prepare your computational environment by making sure that your Jupyter Lab server has these packages installed:
- NLTK: Preprocessing.
- Stanza: Preprocessing, POS, NER, sentiment analysis.
- Gensim: Preprocessing, vectorization, topic modeling (fixed word-embedding).
- BERTopic: Topic modeling (fixed and contextualized word-embedding, multilingual support, visualization).
- Top2Vec: Topic modeling (fixed and contextualized word-embedding, multilingual support). I recently used it for a multilingual topic modeling task.
- SentenceTransformers: Vectorize sentences or documents. Used by many preceding packages. I sometimes use it to obtain the raw vector values if the analysis requires them (e.g., calculating text similarity in this and this article, visualizing semantic spaces, etc.).
- Transformers: Train or fine-tune pretrained BERT models. Used by many preceding packages. I used it to fine-tune a BERT model for classifying nonprofits according to their mission statements.
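One quick way to confirm the environment is ready is to check whether each package can be found by Python. The sketch below is only an illustration, not part of the course materials: the import names (`nltk`, `stanza`, `gensim`, `bertopic`, `top2vec`, `sentence_transformers`, `transformers`) are assumed to be the usual ones for the packages listed above, and `check_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above.
# The pip/conda package names may differ from these import names.
REQUIRED = {
    "NLTK": "nltk",
    "Stanza": "stanza",
    "Gensim": "gensim",
    "BERTopic": "bertopic",
    "Top2Vec": "top2vec",
    "SentenceTransformers": "sentence_transformers",
    "Transformers": "transformers",
}


def check_packages(required):
    """Map each display name to True if its module can be located, else False.

    find_spec only looks the module up; it does not import it, so the check
    stays fast even for heavy packages.
    """
    return {name: find_spec(module) is not None for name, module in required.items()}


# Print a simple report; anything marked MISSING still needs to be installed.
for name, found in check_packages(REQUIRED).items():
    print(f"{name:22s} {'installed' if found else 'MISSING'}")
```

Running this in a Jupyter Lab cell (or as a script on the same server) lists which packages are still missing. It only checks that a package can be located, not that its version or its downloaded models are correct.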
Week 6: Research design: Data management (student-led)
Before class
- Recommended readings:
- Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature News 533 (7604): 452. https://doi.org/10.1038/533452a.
- Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, et al. 2014. “Best Practices for Scientific Computing.” PLOS Biology 12 (1): e1001745. https://doi.org/10.1371/journal.pbio.1001745.
- Gentzkow, Matthew, and Jesse M. Shapiro. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf.
- Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.
- Boyd, Nora Mills. 2018. “Evidence Enriched.” Philosophy of Science 85 (3): 403–21. https://doi.org/10.1086/697747.
- Leonelli, Sabina. 2020. “Scientific Research and Big Data.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Summer 2020. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2020/entries/science-big-data/.
- Empirical readings (TBD by student group)
In class
- Discussion and lecture on readings.
- Discussion on final project options.
After class
Provide feedback on the group report.
Week 8: Research design: Concept representation (student-led) (TBD)
Before class
- Recommended readings:
- Gerring, John. 2012. “Mere Description.” British Journal of Political Science 42 (4): 721–46. https://doi.org/10.1017/S0007123412000130.
- Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
- Empirical readings (TBD by student group)
In class
- Discussion and lecture on readings.
- Discussion on final project options.
After class
Provide feedback on the group report.
Week 10: Research design: Data analysis (student-led) (TBD)
Before class
- Recommended readings:
- Empirical readings (TBD by student group)
In class
- Discussion and lecture on readings.
- Discussion on final project options.
After class
Provide feedback on the group report.
Week 13: Research design: Scientific communication (student-led) (TBD)
Before class
- Recommended readings:
- Kirk, Andy. 2019. Data Visualisation: A Handbook for Data Driven Design. 2nd edition. S.l.: SAGE Publications Ltd.
- Empirical readings (TBD by student group)
In class
- Discussion and lecture on readings.
- Discussion on final project options.
After class
Provide feedback on the group report.