library(ggplot2) # for general plotting
library(car)     # for ANOVA (Type II used, better than Type I when there is an unbalanced design)

Information of data source

% inhibition of Cathepsin L activity was assessed againsed a commercial Cathepsin L.

Read in the data and view structure to identify any issues in data formatting

inhibition.data <- read.csv("~/Lab Stuff/Adapted mites/Tomato/% inhibition assays/all on Heinz/TU-A R data.csv", header = TRUE)

# Trail as a factor
inhibition.data$Trial <- factor(inhibition.data$Trial)

str(inhibition.data)
## 'data.frame':    33 obs. of  3 variables:
##  $ Trial     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment : Factor w/ 3 levels "No mite","TU",..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ Inhibition: num  14.7 25.5 23.5 29.8 56.6 ...

Formulate hypothesis

H0: No difference in % inhibition will be observed.

HA: % inhibition will increase with mite application. TU-A % ihibition will be less than TU % inhibition.

Conduct data exploration

Outliers in the response variable (Inhibition) within explanatory variables (Trial, Treatment).

ggplot(inhibition.data, aes(x = Trial, y = Inhibition)) + geom_boxplot()+ theme_classic()

ggplot(inhibition.data, aes(x = Treatment, y = Inhibition)) + geom_boxplot() + theme_classic()

Outliers left in, probably represents real variaility.

Collinearity of the explanatory variables

Des not apply, all explanatory variables are categorical/factorial.

Spatial/temporal or other hierarchical aspects of sampling design

No, I am treating Trial as a main effect to check for reproducibility (not a random effect/blocking factor).

Interactions (is the quality of the data good enough to include them?)

Yes, interaction betweenTrial and Mite.Strain will be performed to test for reproducibility.

Zero inflation in Y

No

Are categorical covariates balanced?

summary(inhibition.data)
##  Trial    Treatment    Inhibition   
##  1:12   No mite:11   Min.   :14.70  
##  2: 9   TU     :11   1st Qu.:26.31  
##  3:12   TU-A   :11   Median :40.69  
##                      Mean   :37.36  
##                      3rd Qu.:45.88  
##                      Max.   :60.17

Trial not balanced, Treatment balanced.

Apply model

# fit linear model and display model fit information and ANOVA table
m <- lm(Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data)
summary(m)
## 
## Call:
## lm(formula = Inhibition ~ Treatment + Trial + Treatment:Trial, 
##     data = inhibition.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1440  -5.0618   0.2101   2.7072  19.3258 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            23.385      4.939   4.735 8.15e-05 ***
## TreatmentTU            17.461      6.985   2.500  0.01965 *  
## TreatmentTU-A           5.192      6.985   0.743  0.46450    
## Trial2                  7.006      7.544   0.929  0.36231    
## Trial3                 19.887      6.985   2.847  0.00889 ** 
## TreatmentTU:Trial2     -3.915     10.669  -0.367  0.71690    
## TreatmentTU-A:Trial2   -2.579     10.669  -0.242  0.81107    
## TreatmentTU:Trial3    -12.608      9.878  -1.276  0.21400    
## TreatmentTU-A:Trial3   -4.930      9.878  -0.499  0.62223    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.878 on 24 degrees of freedom
## Multiple R-squared:  0.4874, Adjusted R-squared:  0.3165 
## F-statistic: 2.853 on 8 and 24 DF,  p-value: 0.02222
Anova(m)
## Anova Table (Type II tests)
## 
## Response: Inhibition
##                  Sum Sq Df F value   Pr(>F)   
## Treatment        842.46  2  4.3173 0.025026 * 
## Trial           1213.89  2  6.2208 0.006659 **
## Treatment:Trial  170.21  4  0.4361 0.781179   
## Residuals       2341.62 24                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate effect size and display
result.anova<-Anova(m)
ss<-result.anova$"Sum Sq"  ##ss = sum of squares
pes<-ss/(ss+ss[length(ss)])  ##pes = partial e squared
pes[length(pes)]<-""
result.anova$"Part E Sq"<-pes
result.anova
## Anova Table (Type II tests)
## 
## Response: Inhibition
##                  Sum Sq Df F value  Pr(>F) Part E Sq
## Treatment        842.46  2  4.3173 0.02503   0.26458
## Trial           1213.89  2  6.2208 0.00666   0.34141
## Treatment:Trial  170.21  4  0.4361 0.78118   0.06776
## Residuals       2341.62 24
# perform post-hoc Tukey-Kramer test of contrasts
TukeyHSD(aov(Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data)
## 
## $Treatment
##                   diff        lwr       upr     p adj
## TU-No mite   11.808726   1.290576 22.326876 0.0256985
## TU-A-No mite  2.695627  -7.822523 13.213778 0.7996892
## TU-A-TU      -9.113098 -19.631249  1.405052 0.0982634
## 
## $Trial
##          diff       lwr      upr     p adj
## 2-1  4.841539 -6.035695 15.71877 0.5163993
## 3-1 14.040510  3.970148 24.11087 0.0052752
## 3-2  9.198971 -1.678263 20.07621 0.1085350
## 
## $`Treatment:Trial`
##                            diff         lwr      upr     p adj
## TU:1-No mite:1       17.4612168  -6.2791138 41.20155 0.2803235
## TU-A:1-No mite:1      5.1917338 -18.5485969 28.93206 0.9974487
## No mite:2-No mite:1   7.0059565 -18.6365313 32.64844 0.9888079
## TU:2-No mite:1       20.5526293  -5.0898585 46.19512 0.1914181
## TU-A:2-No mite:1      9.6189813 -16.0235065 35.26147 0.9291800
## No mite:3-No mite:1  19.8867442  -3.8535864 43.62707 0.1528757
## TU:3-No mite:1       24.7395190   0.9991884 48.47985 0.0365764
## TU-A:3-No mite:1     20.1482173  -3.5921133 43.88855 0.1424736
## TU-A:1-TU:1         -12.2694831 -36.0098137 11.47085 0.7077770
## No mite:2-TU:1      -10.4552604 -36.0977482 15.18723 0.8924017
## TU:2-TU:1             3.0914125 -22.5510754 28.73390 0.9999677
## TU-A:2-TU:1          -7.8422355 -33.4847233 17.80025 0.9775257
## No mite:3-TU:1        2.4255274 -21.3148032 26.16586 0.9999910
## TU:3-TU:1             7.2783022 -16.4620284 31.01863 0.9771932
## TU-A:3-TU:1           2.6870005 -21.0533302 26.42733 0.9999801
## No mite:2-TU-A:1      1.8142227 -23.8282651 27.45671 0.9999995
## TU:2-TU-A:1          15.3608955 -10.2815923 41.00338 0.5351571
## TU-A:2-TU-A:1         4.4272476 -21.2152402 30.06974 0.9995245
## No mite:3-TU-A:1     14.6950105  -9.0453202 38.43534 0.4935113
## TU:3-TU-A:1          19.5477853  -4.1925454 43.28812 0.1672700
## TU-A:3-TU-A:1        14.9564835  -8.7838471 38.69681 0.4709312
## TU:2-No mite:2       13.5466728 -13.8662997 40.95965 0.7523889
## TU-A:2-No mite:2      2.6130248 -24.7999477 30.02600 0.9999947
## No mite:3-No mite:2  12.8807877 -12.7617001 38.52328 0.7366069
## TU:3-No mite:2       17.7335625  -7.9089253 43.37605 0.3530881
## TU-A:3-No mite:2     13.1422608 -12.5002270 38.78475 0.7164231
## TU-A:2-TU:2         -10.9336480 -38.3466205 16.47932 0.9033570
## No mite:3-TU:2       -0.6658851 -26.3083729 24.97660 1.0000000
## TU:3-TU:2             4.1868897 -21.4555981 29.82938 0.9996841
## TU-A:3-TU:2          -0.4044120 -26.0468998 25.23808 1.0000000
## No mite:3-TU-A:2     10.2677629 -15.3747249 35.91025 0.9014710
## TU:3-TU-A:2          15.1205377 -10.5219501 40.76303 0.5549612
## TU-A:3-TU-A:2        10.5292360 -15.1132518 36.17172 0.8886934
## TU:3-No mite:3        4.8527748 -18.8875558 28.59311 0.9984057
## TU-A:3-No mite:3      0.2614731 -23.4788575 24.00180 1.0000000
## TU-A:3-TU:3          -4.5913017 -28.3316323 19.14903 0.9989226
# plot interactions
interaction.plot(inhibition.data$Treatment, inhibition.data$Trial, inhibition.data$Inhibition, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Inhibition", xlab="Treatment", main="Treatment:Trial")

Validate models

inhibition.data$m.fit <- fitted(m)    # fitted values
inhibition.data$m.res <- rstandard(m) # Pearson residuals

Residual distribution / Overdispersion

We assumed normal residuals. This is the least important regression assumption but its can be tested with a qq plot.

ggplot(inhibition.data, aes(sample = m.res)) + geom_qq() +
  geom_abline(intercept = 0, slope = 1) + theme_classic()

Nice.

Residuals vs fitted values

Testing for:

Linearity - there should be no curvilinear pattern in the residuals.

Equal variance - the vertical spread of the residuals should be constant across all fitted values.

ggplot(inhibition.data, aes(x = m.fit, y = m.res)) + 
  geom_point() + geom_hline(yintercept = 0) + geom_smooth() + theme_classic()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Looks good.

Residuals vs explanatory variables

Should be centered around 0, if not then model requires another explanatory variable(s), to account for observed variation.

ggplot(inhibition.data, aes(x = Trial, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

ggplot(inhibition.data, aes(x = Treatment, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

ggplot(inhibition.data, aes(x = Treatment:Trial, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

Nice.

Model is valid and interpretation of ANOVA is good.

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_3.0-3     carData_3.0-2 ggplot2_3.1.1
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.0.2         Rcpp_1.0.1        cellranger_1.1.0 
##  [4] pillar_1.4.1      compiler_3.6.0    plyr_1.8.4       
##  [7] forcats_0.4.0     tools_3.6.0       digest_0.6.19    
## [10] evaluate_0.14     tibble_2.1.1      gtable_0.3.0     
## [13] pkgconfig_2.0.2   rlang_0.3.4       openxlsx_4.1.0   
## [16] curl_3.3          yaml_2.2.0        haven_2.1.0      
## [19] xfun_0.7          rio_0.5.16        withr_2.1.2      
## [22] stringr_1.4.0     knitr_1.23        hms_0.4.2        
## [25] grid_3.6.0        data.table_1.12.2 readxl_1.3.1     
## [28] foreign_0.8-71    rmarkdown_1.13    magrittr_1.5     
## [31] scales_1.0.0      htmltools_0.3.6   abind_1.4-5      
## [34] colorspace_1.4-1  labeling_0.3      stringi_1.4.3    
## [37] lazyeval_0.2.2    munsell_0.5.0     crayon_1.3.4