SOCY 5601 ADVANCED DATA ANALYSIS
TR 3:00-4:15 Ketchum 117 Fall 2002
Instructor: Professor Fred Pampel Phone: 492-5620
Office: 102A IBS #3 (1424 Broadway) Email: fred.pampel@colorado.edu
Office Hours: 1:00-2:00 MW, 2:00-3:00 TR, or by appointment
Readings
Statistics with STATA, Version 7.0 by Lawrence Hamilton
Applied Regression: An Introduction by Michael Lewis-Beck
Logistic Regression: A Primer by Fred C. Pampel
Event History Analysis by Paul D. Allison
Selected articles and chapters (listed at end of syllabus)
Objectives
This class begins with brief coverage of the basics of regression, factor analysis, scale construction, and residual analysis. It then concentrates on statistical techniques used to analyze categorical or limited dependent variables. Because the regression model assumes a continuous dependent variable and normally distributed errors, many social science applications that involve a qualitative outcome (e.g., getting divorced, committing a crime, voting for a candidate) violate the regression assumptions. We will study several statistical techniques designed for these kinds of outcome variables: binary, multinomial, and ordered logistic regression, probit analysis, and event history analysis. The quantitative research literatures in Sociology, Political Science, and other disciplines employ each of these techniques extensively.
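To make the point about violated regression assumptions concrete, the following minimal sketch simulates a binary outcome and compares ordinary least squares with logistic regression. It is written in Python with NumPy purely for illustration (the class itself uses SPSS and STATA), and the simulated data, coefficient values, and variable names are all hypothetical assumptions rather than course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): one predictor x and a binary outcome y
n = 500
x = rng.normal(0.0, 1.5, size=n)
true_prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(1, true_prob)

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept

# Ordinary least squares on the 0/1 outcome (a "linear probability model")
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_ols = X @ beta_ols

# Logistic regression fitted by Newton-Raphson on the log-likelihood
beta_logit = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    beta_logit = beta_logit + np.linalg.solve(hessian, gradient)
pred_logit = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))

# Count predicted "probabilities" that fall outside the unit interval
n_bad_ols = int(np.sum((pred_ols < 0) | (pred_ols > 1)))
n_bad_logit = int(np.sum((pred_logit < 0) | (pred_logit > 1)))
print("OLS predictions outside [0, 1]:  ", n_bad_ols)
print("Logit predictions outside [0, 1]:", n_bad_logit)
```

With data like these, the least squares fit predicts values below 0 or above 1 for some cases, while the logistic predictions stay inside the unit interval by construction.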
More informally, I intend this class as a practicum in advanced multivariate analysis that emphasizes using statistics with real data. The most important and perhaps the most difficult skill to teach involves application of the statistical techniques to real research problems. The course thus emphasizes the match between theoretical reasoning, substantive research problems, and statistical results. In other words, the course should help prepare you to use advanced quantitative research techniques in your own research as well as to understand their use in the research of others.
The assigned readings include books and articles on statistical techniques, the computer programs used to implement the techniques, and the application of the techniques to address substantive problems. Corresponding to the readings, class will involve 1) reviewing the statistical material in the readings, 2) making sample computer runs on SPSS and STATA, and 3) discussing substantive articles that use the statistics.
First, we will devote class time to reviewing key points and clarifying difficult issues in the statistical readings. Because I won’t be able to cover every point made in the readings, you will need to prepare for class by studying the assigned material. Still, many topics prove difficult enough that going over them in class will solidify your understanding.
Second, we will devote time to discussing research studies from social science journals that use the techniques we are studying. Understanding how the techniques address substantive issues is an important part of understanding advanced statistics. As the semester goes on, I’ll ask students to find and discuss articles of their own that use the techniques.
Third, we will exploit our location in the stat lab by devoting class time to making computer runs with SPSS and STATA. Most students know SPSS better than STATA, but since STATA is far better than SPSS for use with advanced statistics, I will spend time teaching the use of STATA. In any case, obtaining, viewing, and interpreting the statistical output will be a crucial part of the class. Make sure you talk to Jeff Hayes about getting a Windows NT account.
Assignments and Grading
Rather than concentrate on mathematical formulas and derivations, the class concentrates on using the computer to generate appropriate statistics and on understanding and interpreting the output. Toward that end, the class requires completion of five homework assignments and three short papers. I will pass out more detailed instructions for the homework and paper assignments during the semester, but I can provide an overview of them now.
The five homework assignments involve the interpretation of computer output or published tables. Each assignment contributes 5 percent to the total grade (25 percent together). Although worth only a small part of the grade, these assignments are crucial because they provide the groundwork for the interpretation of your own data in other assignments. More importantly, the course requires that you write three short papers based on the quantitative analysis of your own data set. Each paper contributes 25 percent to the total grade (75 percent together). The first paper should emphasize theory, data, and measurement, use a basic regression model, and run about 5-7 pages. Each subsequent paper should revise the previous material and add new material based on the use of statistical techniques for a categorical dependent variable.
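As a quick check on how these weights combine, here is a minimal sketch in Python. The function name and the example scores are hypothetical, and it assumes each assignment is scored out of 100, which the syllabus does not specify.

```python
# Weights stated above: five homeworks at 5% each, three papers at 25% each
HOMEWORK_WEIGHT = 0.05
PAPER_WEIGHT = 0.25


def course_grade(homework_scores, paper_scores):
    """Return the weighted course score on a 0-100 scale (hypothetical helper)."""
    if len(homework_scores) != 5 or len(paper_scores) != 3:
        raise ValueError("expected five homework scores and three paper scores")
    return (HOMEWORK_WEIGHT * sum(homework_scores)
            + PAPER_WEIGHT * sum(paper_scores))


# The weights should account for the whole grade
total_weight = 5 * HOMEWORK_WEIGHT + 3 * PAPER_WEIGHT

# Hypothetical scores, each out of 100
example = course_grade([90, 85, 100, 80, 95], [88, 92, 84])
print(round(total_weight, 2), round(example, 2))
```

The homework contributes at most 25 points and the papers at most 75, so the papers dominate the final grade.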
For the three papers, you will need to select (by the second week) a data set and a dependent variable to analyze. The 2000 General Social Surveys (GSS) and the 2000 National Election Surveys (NES) are already available on the sobek S drive. It is also possible to create a file for other years of the GSS and NES if you would like. In addition, I have access to the following data sets that might be of interest: 1) Eurobarometer Survey of attitudes and behaviors among citizens of nations in the European Community, 2) National Health Interview Survey in 1990 with measures of smoking, drinking, exercise, and subsequent mortality, 3) National Youth Survey on delinquency and crime, 4) Monitoring the Future survey of drug use among high school seniors, 5) Matlab survey on fertility and health in Bangladesh, and 6) aggregate data for high income nations on suicide, homicide, fertility, and life expectancy.
The papers should be clearly written, as if for a professional audience, with a tight connection between theory and results. One needs considerable practice to write clear, organized, and theoretically meaningful prose when describing statistical results. We will discuss writing issues often throughout the semester, but the real learning will come from your efforts to rewrite, revise, edit, and (perhaps most importantly) organize your papers until they read smoothly, proceed logically, and highlight the substantive meaning of the statistical results.
Schedule
Week  Date    Topic (see Readings)              Assignment
1     Aug 27  Orientation
      Aug 29  Theory and Statistics
2     Sep 3   Factor Analysis and Scales
      Sep 5   Factor Analysis and Scales        Select Data
3     Sep 10  Regression Review
      Sep 12  Regression Review
4     Sep 17  Residual Analysis
      Sep 19  Residual Analysis                 HW #1 Due
5     Sep 24  Non-Linear Models
      Sep 26  Non-Additive Models
6     Oct 1   Logistic Regression Background    Paper #1 Due
      Oct 3   Logistic Regression Background
7     Oct 8   Logistic Regression Coefficients  Reschedule Class Time
      Oct 10  No Class: Fall Break
8     Oct 15  Logistic Regression Coefficients  Jane Menken Teaches
      Oct 17  Logistic Regression Coefficients  Jane Menken Teaches
9     Oct 22  Logistic Regression Fit           HW #2 Due
      Oct 24  Logistic Regression Fit
10    Oct 29  Probit Analysis
      Oct 31  Probit Analysis                   HW #3 Due
11    Nov 5   Multinomial Logit
      Nov 7   Multinomial Logit
12    Nov 12  Ordered Logit                     Paper #2 Due
      Nov 14  Ordered Logit
13    Nov 19  Event History Background
      Nov 21  Event History Background          HW #4 Due
14    Nov 26  Event History Interpretation
      Nov 28  Thanksgiving Holiday
15    Dec 3   Sample Selection Models
      Dec 5   Sample Selection Models           HW #5 Due
16    Dec 10  Open
      Dec 12  Open
17    Dec 16  Finals Week                       Paper #3 Due
Readings
Readings marked (JSTOR) are available for download from www.jstor.org. All other readings, except those from the assigned texts, are available in the Sociology office for copying.
Week 1. Theory and Statistics
McCloskey, Donald N. 1990. “Formalism in the Social Sciences, Rhetorically Speaking.” The American Sociologist 21:3-11
Freedman, David A. 1991. “Statistical Models and Shoe Leather.” Sociological Methodology 21:291-313
Maclure, Malcolm. 1985. “Popperian Refutation in Epidemiology.” American Journal of Epidemiology 121:343-350
Hamilton, pp. 1-6, 18-29, 32-35
Week 2. Factor Analysis and Scales
SPSS Professional Statistics, pp. 47-75
Hamilton, pp. 266-272
SPSS Professional Statistics, pp. 143-148, and STATA Reference Manual, “alpha”
Sampson, R.J. and J.H. Laub. 1990. “Crime and Deviance over the Life Course: The Salience of Adult Social Bonds.” American Sociological Review 55:609-627 (JSTOR)
Week 3. Regression Review
Lewis-Beck, pp. 9-75
SPSS Base System User's Guide, pp. 311-323, 338-346
Hamilton, pp. 124-146
Bryson, B. 1996. “Anything but Heavy Metal: Symbolic Exclusion and Musical Dislikes.” American Sociological Review 61:884-899 (JSTOR)
Week 4. Residual Analysis
Bollen, Kenneth A. and Robert W. Jackman. 1985. "Regression Diagnostics: An Expository Treatment of Outliers and Influential Cases." Sociological Methods and Research 13:510-542
SPSS Base System User's Guide, pp. 324-337, 351-357
Hamilton, pp. 152-170
Muller, E.N. 1995. “Economic Determinants of Democracy.” American Sociological Review 60:966-982 (JSTOR)
Week 5. Non-Linear and Non-Additive Models
Paternoster et al. 1998. “Using the Correct Statistical Test for the Equality of Regression Coefficients.” Criminology 36:859-866
Jaccard, James, Robert Turrisi, and Choi K. Wan. 1990. Interaction Effects in Multiple Regression (Sage Publications), pp. 7-33
Hamilton, pp. 149-151
Mouw, T. and Y. Xie. 1999. “Bilingualism and the Academic Achievement of First- and Second-Generation Asian Americans.” American Sociological Review 64:232-252 (JSTOR)
Week 6. Logistic Regression Background
Pampel, pp. 1-18
Pampel, pp. 74-82
Hamilton, pp. 177-183
Wald, K.D., J.W. Button, and B.A. Rienzo. 1996. “The Politics of Gay Rights in American Communities: Explaining Antidiscrimination Ordinances and Policies.” American Journal of Political Science 40:1152-1178 (JSTOR)
Week 7. Logistic Regression: Coefficient Interpretation
Pampel, pp. 18-39
South, S.J. and K.D. Crowder. 1999. “Neighborhood Effects on Family Formation.” American Sociological Review 64:113-132 (JSTOR)
Week 8. Logistic Regression: More Coefficient Interpretation
DeMaris, A. 1995. “A Tutorial in Logistic Regression.” Journal of Marriage and the Family 57:956-968
Hamilton, pp. 213-223
SPSS Advanced Statistics, pp. 1-14
Lindstrom, D.P. and B. Berhanu. 1999. “The Impact of War, Famine, and Economic Decline on Marital Fertility in Ethiopia.” Demography 36:247-261 (JSTOR)
Week 9. Logistic Regression: Goodness of Fit
Pampel, pp. 39-54
Menard, Scott. 1995. Logistic Regression (Sage Publications), pp. 58-79
Hamilton, pp. 223-227 and SPSS, pp. 19-23
Cohen, L.E., J.P. Broschak, and H.A. Haveman. 1998. “And Then There Were More? The Effect of Organizational Sex Composition on the Hiring and Promotion of Managers.” American Sociological Review 63:711-727 (JSTOR)
Week 10. Probit Analysis
Pampel, pp. 54-68
Stolzenberg, R.M., M. Blair-Loy, and L.J. Waite. 1995. “Religious Participation in Early Adulthood: Age and Family Life Cycle Effects on Church Membership.” American Sociological Review 60:84-103 (JSTOR)
Week 11. Multinomial Logit
Menard, Scott. 1995. Logistic Regression (Sage Publications), pp. 80-91
Long, J. Scott and Jeremy Freese. 2001. Regression Models for Categorical Dependent Variables with Stata (Stata Press), pp. 175-198
Hamilton, pp. 230-236
Girard, C. 1993. “Age, Gender, and Suicide: A Cross-National Analysis.” American Sociological Review 58:553-574 (JSTOR)
Week 12. Ordered Logit
DeMaris, Alfred. 1992. Logit Modeling: Practical Applications (Sage Publications), pp. 71-78
Long, J. Scott and Jeremy Freese. 2001. Regression Models for Categorical Dependent Variables with Stata (Stata Press), pp. 141-168
Hamilton, pp. 228-229
Soss, J., S.F. Schram, and T.P. Vartanian. 2001. “Setting the Terms of Relief: Explaining State Policy Choices in the Devolution Revolution.” American Journal of Political Science 45:378-395
Week 13. Event History Background
Allison, pp. 9-22
Allison, pp. 22-42
Hamilton, pp. 237-252
Edelman, L.B. 1990. “Legal Environments and Organizational Governance: The Expansion of Due Process in the American Workplace.” American Journal of Sociology 95:1401-1440 (JSTOR)
Week 14. Event History Interpretation
Teachman, Jay D. and Mark D. Hayward. 1993. “Interpreting Hazard Rate Models.” Sociological Methods and Research 21:340-371
Burton, R.P.D., R.J. Johnson, C. Ritter, and R.R. Clayton. 1996. “The Effects of Role Socialization on the Initiation of Cocaine Use: An Event History from Adolescence into Middle Adulthood.” Journal of Health and Social Behavior 37:75-90 (JSTOR)
Week 15. Sample Selection
Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48:386-398 (JSTOR)
Stata Reference, “heckman”
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Evidence in Qualitative Research (Princeton University Press), pp. 128-149.
Timpone, R.J. 1998. “Structure, Behavior, and Voter Turnout in the United States.” American Political Science Review 92:145-158 (JSTOR)
Week 16. Open
To be Announced