Factor analyze this

Pedagogical Tools and Methods

Andrew Worster, MD, MSc;*† Geoff Norman, PhD

From the *Division of Emergency Medicine, McMaster University, Hamilton, Ont., and the †Department of Clinical Epidemiology & Biostatistics, McMaster University, Hamilton, Ont.

CJEM 2009;11(3):240

In this issue of CJEM,1 we learn how medical students interested in emergency medicine are different from those interested in the lesser specialties. To demonstrate this, the authors surveyed 2168 medical students at the beginning of their medical school education. The students were asked how each of 27 variables influenced their first choice of career specialty. These variables or items represented mostly lifestyle and work-style preferences. Although 27 variables is fewer than the 41 items in the original survey, the number remains high enough to make interpretation cumbersome.

To make this list more reader-friendly, the authors could simply sort the items into a handful of categories and report a score for each. Of course, this assumes that most of the items, although unique, have something in common with at least some of the other items in the list. In other words, it assumes there is communality among the variables that allows them to be grouped into categories. To do this, the authors would first have to decide on how many categories to create. This is critical because it determines everything else that happens after.2 Next, the authors would need to determine the inclusion and exclusion criteria for each category. But how would they manage items that fit equally well into more than 1 category? What about those items that don't fit into any category? If they created additional categories for the "loner" items, they'd begin to defeat the purpose of the exercise, that is, reducing the list size. Last but not least, the authors couldn't be sure that the readers would accept their results as being valid.

So how did the authors reduce the 27 career influences into 6 categories, and how can we be sure that the results are valid? You guessed it (or you read the methods section in the study): factor analysis (FA). Factor analysis is a statistical method of analyzing the pattern of interrelationships among a large number of variables and explaining these patterns by a smaller number of common hypothetical constructs called factors.2 It is important to point out that the term FA is often used interchangeably with principal component analysis. It is also sometimes used as a catch-all term for a myriad of statistical methods that achieve similar objectives, the details of which are beyond the scope of this paper.2

Condensing and compartmentalizing data typically results in a loss of information, but FA minimizes this loss while eliminating those variables that contribute little or nothing to the solution. The communality referred to earlier is the characteristic that groups of variables have in common. In FA, we construct underlying "factors," which are just linear combinations of all the variables. The first factor is computed as the linear combination of variables that explains the most variance in the data. The step is then repeated to find the combination that explains the most of the remaining variance, and the process could continue until we have created as many factors as variables. However, this would be of very little value, because the later factors explain too little variance to be informative. Instead, the number of factors retained is decided by some criterion, usually the Kaiser criterion,3 which calls for retaining only factors with an eigenvalue (a standardized measure of the variance explained) greater than 1. This is a standard method (albeit with some limitations) that precludes retaining factors that explain less variance than any single original variable.2 When deciding which variables to include in each of the selected factors, one looks at the factor loading (a measure of the correlation between the individual variable and the factor). Ideally, only variables with a factor loading greater than 0.5 are selected, but commonly variables with loadings as low as 0.25 are also retained.
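For readers who prefer to see the mechanics, the steps just described can be sketched in a few lines of Python with NumPy. This is an illustration only, not the analysis performed in the study: it uses principal-components extraction without rotation, the data are synthetic (7 made-up items driven by 2 hidden traits), and the function name extract_factors and its loading_cutoff parameter are hypothetical.

```python
import numpy as np


def extract_factors(data, loading_cutoff=0.5):
    """Principal-components extraction with the Kaiser criterion (illustrative)."""
    # Correlation matrix of the items (one column per item).
    corr = np.corrcoef(data, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest variance explained first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Kaiser criterion: retain only factors with an eigenvalue greater than 1.
    keep = eigenvalues > 1.0
    # Loadings: correlation between each item and each retained factor.
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    # Assign items to a factor when the absolute loading exceeds the cutoff.
    members = [
        np.flatnonzero(np.abs(loadings[:, j]) > loading_cutoff).tolist()
        for j in range(loadings.shape[1])
    ]
    return eigenvalues, loadings, members


# Synthetic data (an assumption for illustration, not the survey data):
# items 0-3 reflect one hidden trait, items 4-6 reflect another.
rng = np.random.default_rng(0)
n_students = 2000
traits = rng.standard_normal((n_students, 2))
noise = 0.5 * rng.standard_normal((n_students, 7))
data = np.column_stack([traits[:, [0, 0, 0, 0]], traits[:, [1, 1, 1]]]) + noise

eigenvalues, loadings, members = extract_factors(data)
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors retained:", loadings.shape[1])
print("Items assigned to each factor:", members)
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue of 1 corresponds to the variance of a single standardized item, which is why the Kaiser cutoff sits there. In practice, a rotation step usually follows extraction to make the loadings easier to interpret; it is omitted here for brevity.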

The authors' application of FA reduced the 27 variables into 6 easily interpreted factors. In doing so, they used statistics to eliminate some potential career influences such as "focus on nonurgent care," and "don't like uncertainty." We would all probably agree that medical students who rated these variables highly are unlikely to enjoy emergency medicine.

REFERENCES

  1. Scott IM, Abu-Laban RB, Gowans MC, et al. Emergency medicine as a career choice: a descriptive study of Canadian medical students. CJEM 2009;11(3):196-206.
  2. Norman GR, Streiner DL. Principal components and factor analysis (Chapter 18). In: Biostatistics, the bare essentials. 2nd ed. Hamilton (ON): BC Decker; 2000. p. 163-76.
  3. Kaiser HF. The application of electronic computers to factor analysis. Educ Psychol Meas 1960;20:141-51.