Options for voted sessions
Text as data
- Text analysis in social science research: Overview
- Key points: typical process and applications, research design, text corpus resources
- Readings (TBD):
- GRS: Introduction, Social science research and text analysis
- Preprocessing
- Key points: regular expressions, tokenization, part-of-speech tagging, meaningful and meaningless words, stopwords
- Readings (TBD):
- GRS: Selection and representation
- JM: Regular Expressions, Text Normalization, Edit Distance
- Text representation and vectorization methods
- Key points: bag-of-words, count vector, word vector, distributed representation of words, word embedding, contextual word embedding
- Readings (TBD):
- JM: Vector semantics and embeddings
- Text analysis: Scaling
- Key points: semantic similarity, sentiment analysis
- Readings (TBD):
- Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
- Text analysis: Identification
- Key points: Classification, multilingual topic modeling, named-entity recognition
- Readings (TBD):
- Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
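To make the preprocessing and representation key points above more concrete, here is a minimal sketch of regex tokenization, stopword removal, bag-of-words count vectors, and cosine similarity using only the Python standard library. The toy corpus, the stopword list, and all function names are assumptions made for illustration; real projects would more likely rely on the packages recommended later in this document.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
DOCS = [
    "The senator praised the new education bill.",
    "The new bill on education was praised by a senator.",
    "Heavy rain flooded the river valley.",
]
STOPWORDS = {"the", "a", "on", "was", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def count_vectors(docs):
    """Return the sorted vocabulary and one count vector per document."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab, vectors = count_vectors(DOCS)
sim_related = cosine(vectors[0], vectors[1])    # same words, different order
sim_unrelated = cosine(vectors[0], vectors[2])  # no shared words
```

Because a bag-of-words representation ignores word order, the first two sentences end up with identical count vectors even though their syntax differs, while the third shares no vocabulary with them. This limitation is one motivation for the word embedding and contextual embedding approaches listed above.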
Relation as data
- Network analysis in social science research: Overview
- Key points: Basic concepts and applications, research design, network components and levels of analysis
- Readings (TBD):
- Scott, John. 2017. “What Is Social Network Analysis?” In Social Network Analysis, Fourth edition. Thousand Oaks, CA: SAGE Publications Ltd.
- Watts, Duncan J. 2004. “The ‘New’ Science of Networks.” Annual Review of Sociology 30 (1): 243–70. https://doi.org/10.1146/annurev.soc.30.020404.104342.
- Scott, John. 2017. “Terminology for Network Analysis.” In Social Network Analysis, Fourth edition, 73–94. Thousand Oaks, CA: SAGE Publications Ltd.
- Data collection: How to generate networks
- Readings (TBD):
- Scott, John. 2017. “Organising and Analysing Network Data.” In Social Network Analysis, Fourth edition. Thousand Oaks, CA: SAGE Publications Ltd.
- Scott, John. 2017. “Data Collection for Social Network Analysis.” In Social Network Analysis, Fourth edition. Thousand Oaks, CA: SAGE Publications Ltd.
- Analysis of nodes
- Key concepts: degree, betweenness, eigenvector centrality, etc.
- Readings (TBD):
- Scott, John. 2017. “Popularity Mediation and Exclusion.” In Social Network Analysis, Fourth edition. Thousand Oaks, CA: SAGE Publications Ltd.
- Analysis of communities
- Key concepts: community detection (Louvain clustering, “rich club”)
- Readings (TBD):
- Scott, John. 2017. “Groups, Factions and Social Divisions.” In Social Network Analysis, Fourth edition. Thousand Oaks, CA: SAGE Publications Ltd.
- Network topology and hypothesis testing
- Key concepts: modularity, clustering coefficients, random graphs
- Readings (TBD).
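As a rough illustration of two of the key concepts above, the sketch below computes degree centrality and the local clustering coefficient on a small undirected network using only the Python standard library. The toy edge list and the function names are assumptions made for illustration; in practice, NetworkX offers equivalents such as `nx.degree_centrality` and `nx.clustering`.

```python
from itertools import combinations

# Hypothetical undirected friendship network, for illustration only.
EDGES = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "cat"),
    ("cat", "dan"),
    ("dan", "eve"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def local_clustering(adj, node):
    """Share of pairs of a node's neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


adj = build_adjacency(EDGES)
centrality = degree_centrality(adj)
clustering = {node: local_clustering(adj, node) for node in adj}
```

In this toy network, "cat" has the most ties, yet its neighbors are less interconnected than those of "ann" or "bob", who sit inside a closed triangle. Contrasts of this kind between node-level and neighborhood-level measures are what the sessions on nodes and topology are meant to unpack.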
Recommended packages
Here I recommend some Python packages based on my own research experience (I may cover some of them in class). Neither the list nor my descriptions are comprehensive. As a social science researcher, I usually define the goals of my analysis first, then look for appropriate packages or functions. The technical documentation often enlightens (or empowers) me to address more novel questions.
- NLTK: Preprocessing.
- Stanza: Preprocessing, POS, NER, sentiment analysis.
- Gensim: Preprocessing, vectorization, topic modeling (fixed word embeddings).
- BERTopic: Topic modeling (fixed and contextualized word embeddings, multilingual support, visualization).
- Top2Vec: Topic modeling (fixed and contextualized word embeddings, multilingual support). I recently used it for a multilingual topic modeling task.
- SentenceTransformers: Vectorizes sentences or documents. Used by many of the preceding packages. I sometimes use it to obtain the raw vector values when an analysis requires them (e.g., calculating text similarity in this and this article, visualizing semantic spaces, etc.).
- Transformers: Train or fine-tune pretrained BERT models. Used by many of the preceding packages. I used it to fine-tune a BERT model for classifying nonprofits according to their mission statements.
- NetworkX: Network analysis.
- igraph: Network analysis. More efficient than NetworkX, but I primarily use it for visualization or for functions that NetworkX does not have.
- Gephi: Network visualization. Computing metrics on large networks in Gephi is very slow and strongly discouraged. I usually use NetworkX for crunching numbers, then Gephi for visualization.