A practical guide to multilevel modeling

https://doi.org/10.1016/j.jsp.2009.09.002

Abstract

Collecting data from students within classrooms or schools, and collecting data from students on multiple occasions over time, are two common sampling methods used in educational research that often require multilevel modeling (MLM) data analysis techniques to avoid Type-1 errors. The purpose of this article is to clarify the seven major steps involved in a multilevel analysis: (1) clarifying the research question, (2) choosing the appropriate parameter estimator, (3) assessing the need for MLM, (4) building the level-1 model, (5) building the level-2 model, (6) multilevel effect size reporting, and (7) likelihood ratio model testing. The seven steps are illustrated with both a cross-sectional and a longitudinal MLM example from the National Educational Longitudinal Study (NELS) dataset. The goal of this article is to assist applied researchers in conducting and interpreting multilevel analyses and to offer recommendations to guide the reporting of MLM analysis results.

Section snippets

Analysis Example 1: Cross-Sectional MLM

This article uses a cross-sectional model and a longitudinal model example from the National Educational Longitudinal Study (NELS; NELS: 88/2000 public use data files; National Center for Education Statistics [NCES], 2002) data set for illustration purposes; no theoretical research questions are tested and no empirical inferences should be drawn from the presented results. The NELS data set consists of various student academic achievement and school environment variables collected from N = 12,144

Clarifying the Research Question

The first step in any data analysis situation involves clarifying the research question, which is particularly important in a multilevel model. As will be shown, clarifying the research question will facilitate the analysis decisions made in subsequent steps. In general, educational researchers are often interested in research questions that focus primarily on a level-1 (e.g., student-level) variable, primarily on a level-2 (e.g., school-level) variable, or an interaction between variables.

Choosing an Estimation Method

MLM software packages generally give researchers the choice between two maximum likelihood (ML) estimators: full information maximum likelihood (FIML) and restricted maximum likelihood (REML). At first glance, this is a technical issue that may tempt researchers into relying on the default settings of their statistical software package. However, the choice of estimator impacts parameter estimates and nested-model test results, so researchers will benefit from making informed decisions about the

Is Multilevel Modeling Needed?

Prior to the analysis of any nested dataset, the question of whether multilevel modeling is needed is a prudent one. Nested datasets do not automatically require multilevel modeling. If there is no variation in response variable scores across level-2 units (e.g., schools), the data can be analyzed using OLS multiple regression. So the question of whether MLM is needed becomes, “How much response variable variation is present at level-2?” Answering this question involves the calculation of the

Building the Level-1 Model

Recall that, in the NELS data analysis example, student SES was the level-1 (i.e., student-level) predictor of science achievement scores. Results for the unconditional model (Eq. (3)) showed significant level-1 variation in NELS science achievement scores. One or more student-level predictors could be added to the level-1 model to explain this variation. However, two additional questions influence the specification of the level-1 model. First, one question that applied researchers face

Building the Level-2 Model

Following the specification of the Level-1 model, the next step involves adding school-level predictors of interest. Recall that NELS science achievement results from the previous analyses showed significant variation in science achievement scores across schools (i.e., intercept variance) and that the impact of SES on science achievement also varied across schools (i.e., slope variance). These results, respectively, are reflected in the current intercept and slope models at Level-2 shown in

Multilevel Effect Size Reporting

Effect sizes in ANOVA and multiple regression analyses, such as Cohen's d, eta-squared (η²), and R², are familiar to applied researchers, and conversion formulas allow each to be placed on a similar metric to enable appropriate comparisons (see Huberty, 2002). Effect sizes in MLM analyses are not as straightforward, and currently no consensus exists as to the effect sizes that are most appropriate. The MLM effect sizes shown below are generally accepted indices (Singer and Willett, 2003,

Likelihood Ratio Model Testing

In multiple regression, an omnibus F test is used to test whether the explained variance is statistically different from zero. An analogous omnibus test can be conducted in MLM analyses using the likelihood ratio test. Specifically, a likelihood ratio test can be used with the NELS example to compare the unconditional model containing no predictors to the cross-level interaction model that contains student SES, student-to-teacher ratio, and the interaction between the two variables to test the
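The mechanics of a likelihood ratio test can be sketched in a few lines of Python. This is a minimal illustration, not the article's own procedure: it assumes the usual form of the test, in which the difference between the deviances (-2 log-likelihood) of two nested models is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters. The function names and the deviance values are hypothetical, and the deviances are assumed to come from full (not restricted) maximum likelihood fits of both models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)              # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(deviance_reduced, deviance_full,
                          n_params_reduced, n_params_full):
    """Compare two nested models by their deviances (hypothetical helper)."""
    df = n_params_full - n_params_reduced
    if df <= 0:
        raise ValueError("the full model must estimate more parameters")
    chi_sq = deviance_reduced - deviance_full
    return chi_sq, df, chi2_sf(chi_sq, df)


if __name__ == "__main__":
    # Hypothetical deviances, NOT values from the NELS example.
    chi_sq, df, p = likelihood_ratio_test(1000.0, 990.0, 3, 5)
    print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p:.4f}")
```

A small p-value would suggest that the additional parameters in the larger model improve fit beyond what chance alone would produce.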

Is Multilevel Modeling Needed?

The question of whether MLM is needed in longitudinal data scenarios is more straightforward because students are the level-2 analysis unit, and reading achievement scores can intuitively be expected to vary significantly across students. However, to confirm this, an unconditional means (i.e., random effect ANOVA) model can be estimated to compute ICC and design effect statistics (see Eqs. (4), (5)).

Level-1: $Y_{ti} = \beta_{0i} + r_{ti}$
Level-2: $\beta_{0i} = \gamma_{00} + u_{0i}$
Combined: $Y_{ti} = \gamma_{00} + u_{0i} + r_{ti}$
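As a rough illustration of how the two statistics follow from the unconditional means model, the sketch below assumes the standard definitions, since Eqs. (4) and (5) are not reproduced in this excerpt: the ICC is the share of total variance that lies between level-2 units, and the design effect is 1 + (average cluster size - 1) × ICC. The function names and the variance estimates are hypothetical and are not NELS results.

```python
def intraclass_correlation(tau_00, sigma_sq):
    """Proportion of total variance at level 2.

    tau_00   -- variance of the level-2 intercept residuals u_0i
    sigma_sq -- variance of the level-1 residuals r_ti
    """
    return tau_00 / (tau_00 + sigma_sq)


def design_effect(icc, avg_cluster_size):
    """Design effect under the assumed formula 1 + (n_c - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    icc = intraclass_correlation(tau_00=20.0, sigma_sq=80.0)
    deff = design_effect(icc, avg_cluster_size=11)
    print(f"ICC = {icc:.2f}, design effect = {deff:.2f}")
```

In the longitudinal case the level-2 unit is the student, so the cluster size is the number of measurement occasions per student rather than the number of students per school.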

As stated previously, the MLM

Building the Level-1 Model

Although not readily apparent, the unconditional means model describes the change in each student's reading achievement scores over time as a flat line with a slope of zero located at each student's mean reading achievement score. Adding a level-1 ‘time’ predictor to the model allows the changes in each student's reading achievement scores over time to be modeled with a straight line with a non-zero slope.

Level-1: $Y_{ti} = \beta_{0i} + \beta_{1i}(\mathit{TIME}_{ti}) + r_{ti}$
Level-2: $\beta_{0i} = \gamma_{00} + u_{0i}$
Level-2: $\beta_{1i} = \gamma_{10} + u_{1i}$
Combined:Yij=γ00+γ10(T

Building the Level-2 Model

The previous level-1 longitudinal model indicated significant intercept and slope variance in reading achievement growth across students. A binary level-2 predictor variable, gender (i.e., female = 1, male = 0), was added to the Level-2 model to explain intercept and slope variance in reading achievement. As shown below, gender was added to the Level-2 models uncentered because, by definition, a binary dummy variable has a meaningful zero point (although centering gender is also appropriate).Level-1

Multilevel Effect Size Reporting

The global pseudo-R² effect size statistic for the longitudinal reading achievement model can be computed in the same way the pseudo-R² statistic was computed for the cross-sectional model example (see Eq. (19)). Specifically, predicted reading achievement scores (Ŷti) were computed by solving Eq. (37) for each participant. The correlation between the observed and predicted reading achievement scores was r = .25; squaring this value ([.25]² = .06) suggests that 6% of the variation in reading
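The computation described here, squaring the correlation between observed and model-predicted scores, can be sketched as follows. This is only an illustration: the function names are hypothetical, the predicted scores are assumed to have already been obtained from the fitted model, and the small data set is invented rather than taken from NELS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def pseudo_r_squared(observed, predicted):
    """Squared correlation between observed and predicted outcome scores."""
    return pearson_r(observed, predicted) ** 2


if __name__ == "__main__":
    # Invented scores, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(f"pseudo-R^2 = {pseudo_r_squared(observed, predicted):.2f}")

    # The value reported in the text: r = .25 squared.
    print(f"{0.25 ** 2:.4f}")
```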

Likelihood Ratio Model Testing

Recall that in the cross-sectional model example, two likelihood ratio model tests were conducted. The first was an overall model test that compared the cross-level interaction model to the unconditional model to assess the efficacy of the predictor variables. A similar overall model test was conducted with the NELS longitudinal reading achievement data; the cross-level gender-by-grade level interaction model (Eq. (37)) was tested against the unconditional means model (Eq. (25)). However,

Conclusion

The goal of this article was twofold. The first goal was to clarify the decisions that need to be made by applied researchers prior to MLM data analyses. The second goal was to assist applied researchers in conducting and interpreting MLM analyses and reporting the results. To further both goals, the process of conducting and interpreting MLM analyses was presented as a series of seven steps: (1) clarifying the research question under investigation, (2) choosing the correct parameter estimation

References (31)

  • D.A. Hofmann et al. Centering decisions in hierarchical linear models: Implications for research in organizations. Journal of Management (1998)
  • B.O. Muthén et al. Multilevel aspects of varying parameters in structural models.
  • L.S. Aiken et al. Multiple regression: Testing and interpreting interactions (1991)
  • J.C. Biesanz et al. The role of coding time in estimating and interpreting growth curve models. Psychological Methods (2004)
  • A. Baraldi et al. A primer on modern missing data handling methods. Journal of School Psychology (2010)
  • M.A. Clements et al. Using multilevel modeling to examine the effects of multitiered interventions. Psychology in the Schools (2007)
  • S.L. Graves et al. Multilevel modeling and school psychology: A review and practical example. School Psychology Quarterly (2009)
  • C.K. Enders et al. Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods (2007)
  • A.J. Fairchild et al. Evaluating mediation and moderation effects in school psychology: A presentation of methods and review of current practice. Journal of School Psychology (2010)
  • J. Ferron. Moving between hierarchical model notations. Journal of Educational and Behavioral Statistics (1997)
  • J. Hox. Multilevel analysis: Techniques and applications (2002)
  • C.J. Huberty. A history of effect size indices. Educational and Psychological Measurement (2002)
  • I.G.G. Kreft et al. Introducing multilevel modeling (1998)
  • I.G.G. Kreft et al. The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research (1995)