The China Study again
In the comments section of Denise Minger’s post on July 16, 2010, which discusses some of the data from the China Study as a follow-up to a previous post on the same topic, Denise herself posted the data she used in her analysis. So I decided to take a look at that data and run a couple of multivariate analyses on it using WarpPLS (warppls.com).
First I built a model to test the assumption that the consumption of animal protein causes colorectal cancer, via an intermediate effect on total cholesterol. I built the model with various hypothesized associations so as to explore several relationships simultaneously, including some commonsense ones. Including commonsense relationships is usually a good idea in exploratory multivariate analyses.
The model is shown on the graph below, with the results. Click on it to enlarge. Use the “CTRL” and “+” keys to zoom in. The arrows explore causative associations between variables. The variables are shown within ovals. The meaning of each variable is the following: aprotein = animal protein consumption; pprotein = plant protein consumption; cholest = total cholesterol; crcancer = colorectal cancer.
The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard univariate (or Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable). A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to.
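To make the difference between a Pearson correlation and a coefficient that controls for a competing effect more concrete, here is a minimal sketch in plain Python. It is not the WarpPLS algorithm and it does not use the China Study data; the numbers are made up, and the function names (`pearson`, `standardized_betas`) are hypothetical. It uses the textbook formula for standardized regression coefficients with two predictors.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardized_betas(x1, x2, y):
    """Standardized coefficients for y regressed on x1 and x2 together.

    Each coefficient is the correlation of a predictor with y,
    adjusted for the overlap between the two predictors.
    """
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1 - r12 ** 2
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    return beta1, beta2

# Made-up illustrative data (NOT from the China Study).
# x1 and x2 are strongly correlated; y depends only on x2 by construction.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2 * v for v in x2]

r1y = pearson(x1, y)
beta1, beta2 = standardized_betas(x1, x2, y)

print("correlation of x1 with y:", round(r1y, 3))
print("coefficient of x1, controlling for x2:", round(abs(beta1), 3))
print("coefficient of x2, controlling for x1:", round(beta2, 3))
```

In this contrived case x1 correlates strongly with y on its own, yet its coefficient drops to essentially zero once x2 is taken into account. That is the sense in which such coefficients control for competing effects.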
The P values indicate the statistical significance of the relationship; a P lower than 0.05 means a significant relationship (95 percent or higher likelihood that the relationship is real). The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fit with the data. Ignore the “R1i” below the variable names; it simply means that each of the variables is measured through a single indicator (that is, a single measure).
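For readers who want to see how numbers like these can be computed, the following is a rough sketch in plain Python for a single predictor. It computes an R-squared and then a P value for the slope using a leave-one-out (jackknife) resampling scheme, which is one way to get a P value without assuming normally distributed data. Several things here are assumptions: the data are made up, the function names are hypothetical, the P value uses a normal approximation to the test statistic, and the exact procedure WarpPLS follows may differ.

```python
import math
from statistics import NormalDist

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def r_squared(x, y):
    """Share of the variance in y explained by a straight line in x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def jackknife_slope_test(x, y):
    """Jackknife standard error and two-sided P value for the slope.

    Each observation is left out once, the slope is re-estimated,
    and the spread of those estimates gives the standard error.
    The P value uses a normal approximation (an assumption).
    """
    n = len(x)
    full = slope(x, y)
    loo = [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo))
    z = full / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return full, se, p

# Made-up illustrative data (NOT from the China Study).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

r2 = r_squared(x, y)
b, se, p = jackknife_slope_test(x, y)

print("R-squared:", round(r2, 3))
print("slope:", round(b, 3), "jackknife SE:", round(se, 4))
print("significant at 0.05:", p < 0.05)
```

With real data the relationship would be much noisier than in this toy example, so the R-squared would be lower and the P value larger.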
I should note that the P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed to be met. This is good, because I checked the data, and it does not look like it is normally distributed. So what does the model above tell us? It tells us that:


