SOCY 5601 ADVANCED DATA ANALYSIS

                                               TR 3:00-4:15   Ketchum 117   Fall 2002

 

 

Instructor: Professor Fred Pampel                                             Phone: 492-5620

Office: 102A IBS #3 (1424 Broadway)                                    Email: fred.pampel@colorado.edu

Office Hours: 1:00-2:00 MW, 2:00-3:00 TR, and by appointment

 

 

Readings

 

Statistics with STATA, Version 7.0  by Lawrence Hamilton

Applied Regression: An Introduction  by Michael Lewis-Beck

Logistic Regression: A Primer  by Fred C. Pampel

Event History Analysis  by Paul D. Allison

Selected articles and chapters (listed at end of syllabus)

 

 

Objectives

 

This class begins with brief coverage of the basics of regression, factor analysis, scale construction, and residual analysis.  It then concentrates on statistical techniques used to analyze categorical or limited dependent variables.  Because the regression model assumes a continuous dependent variable and normally distributed errors, many social science applications that involve a qualitative outcome (e.g., getting divorced, committing a crime, voting for a candidate) violate the regression assumptions.  We will study several statistical techniques designed for these kinds of outcome variables: binary, multinomial, and ordered logistic regression, probit analysis, and event history analysis.  The quantitative research literatures in Sociology, Political Science, and other disciplines use each of these techniques extensively.

 

More informally, I intend this class as a practicum in advanced multivariate analysis that emphasizes using statistics with real data.  The most important, and perhaps the most difficult, skill to teach is applying statistical techniques to real research problems.  The course thus emphasizes the match between theoretical reasoning, substantive research problems, and statistical results.  In other words, the course should help prepare you to use advanced quantitative techniques in your own research as well as to understand their use in the research of others.

 

The assigned readings include books and articles on statistical techniques, the computer programs used to implement the techniques, and the application of the techniques to address substantive problems.  Corresponding to the readings, class will involve 1) reviewing the statistical material in the readings, 2) making sample computer runs on SPSS and STATA, and 3) discussing substantive articles that use the statistics. 

 


First, we will devote class time to reviewing key points and clarifying difficult issues in the statistical readings.  Because I won’t be able to cover every point made in the readings, you will need to prepare for class by studying the assigned material.  Still, many topics prove difficult enough that going over them in class will solidify your understanding. 

 

Second, we will devote time to discussing research studies from social science journals that use the techniques we are studying.  Understanding how well these techniques can address substantive questions is an important part of understanding advanced statistics.  As the semester goes on, I’ll ask students to find and discuss articles of their own that use the techniques.

 

Third, we will exploit our location in the stat lab by devoting class time to making computer runs with SPSS and STATA.  Most students know SPSS better than STATA, but since STATA is far better suited than SPSS to advanced statistics, I will spend time teaching its use.  In any case, obtaining, viewing, and interpreting statistical output will be a crucial part of the class.  Make sure you talk to Jeff Hayes about getting a Windows NT account.

 

 

Assignments and Grading

 

Rather than concentrating on mathematical formulas and derivations, the class focuses on using the computer to generate appropriate statistics and on understanding and interpreting the output.  Toward that end, the class requires completion of five homework assignments and three short papers.  I will hand out more detailed instructions for the homework and paper assignments during the semester; an overview follows.

 

The five homework assignments involve interpreting computer output or published tables.  Each assignment contributes 5 percent of the total grade (25 percent together).  Although worth only a small part of the grade, these assignments are crucial because they lay the groundwork for interpreting your own data in the other assignments.  More importantly, the course requires that you write three short papers based on the quantitative analysis of your own data set.  Each paper is worth 25 percent of the total grade (75 percent together).  The first paper should emphasize theory, data, and measurement, use the basic regression model, and run about 5-7 pages.  Each subsequent paper should revise the previous material and add new material based on statistical techniques for a categorical dependent variable.

 

For the three papers, you will need to select (by the second week) a data set and a dependent variable to analyze.  The 2000 General Social Surveys (GSS) and the 2000 National Election Studies (NES) are already available on the sobek S drive.  It is also possible to create a file for other years of the GSS and NES if you would like.  In addition, I have access to the following data sets that might be of interest: 1) the Eurobarometer survey of attitudes and behaviors among citizens of nations in the European Community, 2) the 1990 National Health Interview Survey, with measures of smoking, drinking, exercise, and subsequent mortality, 3) the National Youth Survey on delinquency and crime, 4) the Monitoring the Future survey of drug use among high school seniors, 5) the Matlab survey on fertility and health in Bangladesh, and 6) aggregate data for high-income nations on suicide, homicide, fertility, and life expectancy.


The papers should be clearly written, as if for a professional audience, with a tight connection between theory and results.  One needs considerable practice to write clear, organized, and theoretically meaningful prose when describing statistical results.  We will discuss writing issues often throughout the semester, but the real learning will come from your efforts to rewrite, revise, edit, and (perhaps most importantly) organize your papers until they read smoothly, proceed logically, and highlight the substantive meaning of the statistical results.

 

 

Schedule

 

Week   Date     Topic (see Readings)                  Assignment

 1     Aug 27   Orientation
       Aug 29   Theory and Statistics
 2     Sep 3    Factor Analysis and Scales
       Sep 5    Factor Analysis and Scales            Select Data
 3     Sep 10   Regression Review
       Sep 12   Regression Review
 4     Sep 17   Residual Analysis
       Sep 19   Residual Analysis                     HW #1 Due
 5     Sep 24   Non-Linear Models
       Sep 26   Non-Additive Models
 6     Oct 1    Logistic Regression Background        Paper #1 Due
       Oct 3    Logistic Regression Background
 7     Oct 8    Logistic Regression Coefficients      Reschedule Class Time
       Oct 10   No Class: Fall Break
 8     Oct 15   Logistic Regression Coefficients      Jane Menken Teaches
       Oct 17   Logistic Regression Coefficients      Jane Menken Teaches
 9     Oct 22   Logistic Regression Fit               HW #2 Due
       Oct 24   Logistic Regression Fit
10     Oct 29   Probit Analysis
       Oct 31   Probit Analysis                       HW #3 Due
11     Nov 5    Multinomial Logit
       Nov 7    Multinomial Logit
12     Nov 12   Ordered Logit                         Paper #2 Due
       Nov 14   Ordered Logit
13     Nov 19   Event History Background
       Nov 21   Event History Background              HW #4 Due
14     Nov 26   Event History Interpretation
       Nov 28   Thanksgiving Holiday
15     Dec 3    Sample Selection Models
       Dec 5    Sample Selection Models               HW #5 Due
16     Dec 10   Open
       Dec 12   Open
17     Dec 16   Finals Week                           Paper #3 Due


 

Readings

 

(JSTOR) indicates the article is available for download from www.jstor.org.  Except for those from the assigned texts, the other readings are available in the Sociology office for copying.

 

Week 1. Theory and Statistics

 

McCloskey, Donald N. 1990. “Formalism in the Social Sciences, Rhetorically Speaking.”  The American Sociologist 21:3-11

 

Freedman, David A. 1991. “Statistical Models and Shoe Leather.” Sociological Methodology 21:291-313

 

Maclure, Malcolm. 1985. “Popperian Refutation in Epidemiology.” American Journal of Epidemiology 121:343-350

 

Hamilton, pp. 1-6, 18-29, 32-35

 

Week 2. Factor Analysis and Scales

 

SPSS Professional Statistics, pp. 47-75

 

Hamilton, pp. 266-272

 

SPSS Professional Statistics, pp. 143-148, and STATA Reference Manual, “alpha”

 

Sampson, R.J. and J.H. Laub. 1990. “Crime and Deviance over the Life Course: The Salience of Adult Social Bonds.” American Sociological Review 55:609-627 (JSTOR)

 

Week 3. Regression Review

 

Lewis-Beck, pp. 9-75

 

SPSS Base System User's Guide, pp. 311-323, 338-346

 

Hamilton, pp. 124-146

 

Bryson, B. 1996. “Anything but Heavy Metal: Symbolic Exclusion and Musical Dislikes.” American Sociological Review 61:884-899 (JSTOR)

 

Week 4.  Residual Analysis

 

Bollen, Kenneth A. and Robert W. Jackman. 1985. “Regression Diagnostics: An Expository Treatment of Outliers and Influential Cases.” Sociological Methods and Research 13:510-542


 

SPSS Base System User's Guide, pp. 324-337, 351-357

 

Hamilton, pp. 152-170

 

Muller, E.N. 1995. “Economic Determinants of Democracy.” American Sociological Review 60:966-982 (JSTOR)

 

Week 5. Non-Linear and Non-Additive Models

 

Paternoster et al. 1998. “Using the Correct Statistical Test for the Equality of Regression Coefficients.” Criminology 36:859-866

 

Jaccard, James, Robert Turrisi, and Choi K. Wan. 1990. Interaction Effects in Multiple Regression (Sage Publications), pp. 7-33

 

Hamilton, pp. 149-151

 

Mouw, T. and Y. Xie. 1999. “Bilingualism and the Academic Achievement of First- and Second-Generation Asian Americans.” American Sociological Review 64:232-252 (JSTOR)

 

Week 6. Logistic Regression Background

 

Pampel, pp. 1-18

 

Pampel, pp. 74-82

 

Hamilton, pp. 177-183

 

Wald, K.D., J.W. Button, and B.A. Rienzo. 1996. “The Politics of Gay Rights in American Communities: Explaining Antidiscrimination Ordinances and Policies.” American Journal of Political Science 40:1152-1178 (JSTOR)

 

Week 7. Logistic Regression: Coefficient Interpretation

 

Pampel, pp. 18-39

 

South, S.J. and K.D. Crowder. 1999. “Neighborhood Effects on Family Formation.” American Sociological Review 64:113-132 (JSTOR)

 

Week 8. Logistic Regression: More Coefficient Interpretation

 

DeMaris, A. 1995. “A Tutorial in Logistic Regression.” Journal of Marriage and the Family 57:956-968

 


Hamilton, pp. 213-223

 

SPSS Advanced Statistics, pp. 1-14

 

Lindstrom, D.P. and B. Berhanu. 1999. “The Impact of War, Famine, and Economic Decline on Marital Fertility in Ethiopia.” Demography 36:247-261 (JSTOR)

 

Week 9.  Logistic Regression: Goodness of Fit

 

Pampel, pp. 39-54

 

Menard, Scott. 1995. Logistic Regression (Sage Publications), pp. 58-79

 

Hamilton, pp. 223-227 and SPSS, pp. 19-23

 

Cohen, L.E., J.P. Broschak, and H.A. Haveman. 1998. “And Then There Were More? The Effect of Organizational Sex Composition on the Hiring and Promotion of Managers.” American Sociological Review 63:711-727 (JSTOR)

 

Week 10. Probit Analysis

 

Pampel, pp. 54-68

 

Stolzenberg, R.M., M. Blair-Loy, and L.J. Waite. 1995. “Religious Participation in Early Adulthood: Age and Family Life Cycle Effects on Church Membership.” American Sociological Review 60:84-103 (JSTOR)

 

Week 11. Multinomial Logit

 

Menard, Scott. 1995. Logistic Regression (Sage Publications), pp. 80-91

 

Long, J. Scott and Jeremy Freese. 2001. Regression Models for Categorical Dependent Variables with Stata (Stata Press), pp. 175-198

 

Hamilton, pp. 230-236

 

Girard, C. 1993. “Age, Gender, and Suicide: A Cross-National Analysis.” American Sociological Review 58:553-574 (JSTOR)

 

Week 12. Ordered Logit

 

DeMaris, Alfred. 1992. Logit Modeling: Practical Applications (Sage Publications), pp. 71-78

 

Long, J. Scott and Jeremy Freese. 2001. Regression Models for Categorical Dependent Variables with Stata (Stata Press), pp. 141-168


Hamilton, pp. 228-229

 

Soss, J., S.F. Schram, and T.P. Vartanian. 2001. “Setting the Terms of Relief: Explaining State Policy Choices in the Devolution Revolution.” American Journal of Political Science 45:378-395

 

Week 13.  Event History Background

 

Allison, pp. 9-22

 

Allison, pp. 22-42

 

Hamilton, pp. 237-252

 

Edelman, L.B. 1990. “Legal Environments and Organizational Governance: The Expansion of Due Process in the American Workplace.” American Journal of Sociology 95:1401-1440 (JSTOR)

 

Week 14. Event History Interpretation

 

Teachman, Jay D. and Mark D. Hayward. 1993. “Interpreting Hazard Rate Models.” Sociological Methods and Research 21:340-371

 

Burton, R.P.D., R.J. Johnson, C. Ritter, and R.R. Clayton. 1996. “The Effects of Role Socialization on the Initiation of Cocaine Use: An Event History from Adolescence into Middle Adulthood.” Journal of Health and Social Behavior 37:75-90 (JSTOR)

 

Week 15.  Sample Selection

 

Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48:386-398 (JSTOR)

 

STATA Reference Manual, “heckman”

 

King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research (Princeton University Press), pp. 128-149

 

Timpone, R.J. 1998. “Structure, Behavior, and Voter Turnout in the United States.” American Political Science Review 92:145-158 (JSTOR)

 

Week 16. Open

 

To be announced