library(ggplot2) # for general plotting
library(car)     # for ANOVA (Type II used, better than Type I when there is an unbalanced design)

Information of data source

% inhibition of Cathepsin L activity was assessed againsed a commercial Cathepsin L.

Read in the data and view structure to identify any issues in data formatting

inhibition.data <- read.csv("~/Lab Stuff/Adapted mites/Tomato/% inhibition assays/TU-A R data.csv", header = TRUE)

# Trail as a factor
inhibition.data$Trial <- factor(inhibition.data$Trial)

str(inhibition.data)
## 'data.frame':    35 obs. of  3 variables:
##  $ Trial     : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment : Factor w/ 3 levels "No mite","TU-A Mites",..: 1 1 1 1 1 3 3 3 3 2 ...
##  $ Inhibition: num  11.84 13.2 4.52 12.32 15.46 ...

Formulate hypothesis

H0: No difference in % inhibition will be observed.

HA: % inhibition will increase with mite application. TU-A % ihibition will be less than TU % inhibition.

Conduct data exploration

Outliers in the response variable (Inhibition) within explanatory variables (Trial, Treatment).

ggplot(inhibition.data, aes(x = Trial, y = Inhibition)) + geom_boxplot()+ theme_classic()

ggplot(inhibition.data, aes(x = Treatment, y = Inhibition)) + geom_boxplot() + theme_classic()

Outlier left in, it probably represents real variaility.

Collinearity of the explanatory variables

Des not apply, all explanatory variables are categorical/factorial.

Spatial/temporal or other hierarchical aspects of sampling design

No, I am treating Trial as a main effect to check for reproducibility (not a random effect/blocking factor).

Interactions (is the quality of the data good enough to include them?)

Interaction betweenTrial and Mite.Strain will be performed to test for reproducibility.

Zero inflation in Y

No

Are categorical covariates balanced?

summary(inhibition.data)
##  Trial       Treatment    Inhibition   
##  1:14   No mite   :12   Min.   : 4.52  
##  2:21   TU-A Mites:12   1st Qu.:17.66  
##         TU Mites  :11   Median :27.63  
##                         Mean   :27.33  
##                         3rd Qu.:32.24  
##                         Max.   :63.75

Trial not balanced, Treatment almost balanced.

Apply model

# fit linear model and display model fit information and ANOVA table
m <- lm(Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data)
summary(m)
## 
## Call:
## lm(formula = Inhibition ~ Treatment + Trial + Treatment:Trial, 
##     data = inhibition.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3940  -3.7536  -0.0567   3.0468  21.5639 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  11.470      3.064   3.743 0.000799 ***
## TreatmentTU-A Mites          19.260      4.333   4.445 0.000118 ***
## TreatmentTU Mites             5.463      4.596   1.189 0.244277    
## Trial2                       13.685      4.012   3.411 0.001924 ** 
## TreatmentTU-A Mites:Trial2   -2.231      5.674  -0.393 0.697086    
## TreatmentTU Mites:Trial2     -1.109      5.877  -0.189 0.851670    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.852 on 29 degrees of freedom
## Multiple R-squared:  0.7116, Adjusted R-squared:  0.6619 
## F-statistic: 14.31 on 5 and 29 DF,  p-value: 4.341e-07
Anova(m)
## Anova Table (Type II tests)
## 
## Response: Inhibition
##                  Sum Sq Df F value    Pr(>F)    
## Treatment       2065.65  2 22.0010 1.536e-06 ***
## Trial           1324.16  1 28.2068 1.069e-05 ***
## Treatment:Trial    7.26  2  0.0773    0.9258    
## Residuals       1361.39 29                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate effect size and display
result.anova<-Anova(m)
ss<-result.anova$"Sum Sq"    ##ss = sum of squares
pes<-ss/(ss+ss[length(ss)])  ##pes = partial e squared
pes[length(pes)]<-""
result.anova$"Part E Sq"<-pes
result.anova
## Anova Table (Type II tests)
## 
## Response: Inhibition
##                  Sum Sq Df F value  Pr(>F) Part E Sq
## Treatment       2065.65  2 22.0010 0.00000   0.60275
## Trial           1324.16  1 28.2068 0.00001   0.49307
## Treatment:Trial    7.26  2  0.0773 0.92582   0.00530
## Residuals       1361.39 29
# perform post-hoc Tukey-Kramer test of contrasts
TukeyHSD(aov(Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Inhibition ~ Treatment + Trial + Treatment:Trial, data = inhibition.data)
## 
## $Treatment
##                           diff        lwr       upr     p adj
## TU-A Mites-No mite   17.959303  11.051314 24.867293 0.0000015
## TU Mites-No mite      5.482718  -1.580527 12.545962 0.1519604
## TU Mites-TU-A Mites -12.476586 -19.539830 -5.413341 0.0004245
## 
## $Trial
##         diff      lwr     upr    p adj
## 2-1 12.53952 7.704539 17.3745 1.09e-05
## 
## $`Treatment:Trial`
##                                 diff         lwr        upr     p adj
## TU-A Mites:1-No mite:1     19.260478   6.0503939 32.4705618 0.0015071
## TU Mites:1-No mite:1        5.462593  -8.5488169 19.4740030 0.8386607
## No mite:2-No mite:1        13.684573   1.4544116 25.9147341 0.0214206
## TU-A Mites:2-No mite:1     30.714465  18.4843041 42.9446266 0.0000003
## TU Mites:2-No mite:1       18.038409   5.8082478 30.2685703 0.0013129
## TU Mites:1-TU-A Mites:1   -13.797885 -27.8092948  0.2135251 0.0554832
## No mite:2-TU-A Mites:1     -5.575905 -17.8060663  6.6542562 0.7325554
## TU-A Mites:2-TU-A Mites:1  11.453987  -0.7761738 23.6841487 0.0765168
## TU Mites:2-TU-A Mites:1    -1.222069 -13.4522301 11.0080924 0.9996086
## No mite:2-TU Mites:1        8.221980  -4.8696257 21.3135853 0.4139641
## TU-A Mites:2-TU Mites:1    25.251872  12.1602668 38.3434778 0.0000305
## TU Mites:2-TU Mites:1      12.575816  -0.5157895 25.6674215 0.0652772
## TU-A Mites:2-No mite:2     17.029893   5.8653339 28.1944512 0.0008682
## TU Mites:2-No mite:2        4.353836  -6.8107225 15.5183949 0.8385153
## TU Mites:2-TU-A Mites:2   -12.676056 -23.8406150 -1.5114977 0.0189648
# plot interactions
interaction.plot(inhibition.data$Treatment, inhibition.data$Trial, inhibition.data$Inhibition, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Inhibition", xlab="Treatment", main="Treatment:Trial")

Validate models

inhibition.data$m.fit <- fitted(m)    # fitted values
inhibition.data$m.res <- rstandard(m) # Pearson residuals

Residual distribution / Overdispersion

We assumed normal residuals. This is the least important regression assumption but its can be tested with a qq plot.

ggplot(inhibition.data, aes(sample = m.res)) + geom_qq() +
  geom_abline(intercept = 0, slope = 1) + theme_classic()

Nice.

Residuals vs fitted values

Testing for:

Linearity - there should be no curvilinear pattern in the residuals.

Equal variance - the vertical spread of the residuals should be constant across all fitted values.

ggplot(inhibition.data, aes(x = m.fit, y = m.res)) + 
  geom_point() + geom_hline(yintercept = 0) + geom_smooth() + theme_classic()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Decent.

Residuals vs explanatory variables

Should be centered around 0, if not then model requires another explanatory variable(s), to account for observed variation.

ggplot(inhibition.data, aes(x = Trial, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

ggplot(inhibition.data, aes(x = Treatment, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

ggplot(inhibition.data, aes(x = Trial:Treatment, y = m.res)) + 
  geom_boxplot() + geom_hline(yintercept = 0) + theme_classic()

Nice.

Model is valid and interpretation of ANOVA is good.

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_3.0-3     carData_3.0-2 ggplot2_3.1.1
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.0.2         Rcpp_1.0.1        cellranger_1.1.0 
##  [4] pillar_1.4.1      compiler_3.6.0    plyr_1.8.4       
##  [7] forcats_0.4.0     tools_3.6.0       digest_0.6.19    
## [10] evaluate_0.14     tibble_2.1.1      gtable_0.3.0     
## [13] pkgconfig_2.0.2   rlang_0.3.4       openxlsx_4.1.0   
## [16] curl_3.3          yaml_2.2.0        haven_2.1.0      
## [19] xfun_0.7          rio_0.5.16        withr_2.1.2      
## [22] stringr_1.4.0     knitr_1.23        hms_0.4.2        
## [25] grid_3.6.0        data.table_1.12.2 readxl_1.3.1     
## [28] foreign_0.8-71    rmarkdown_1.13    magrittr_1.5     
## [31] scales_1.0.0      htmltools_0.3.6   abind_1.4-5      
## [34] colorspace_1.4-1  labeling_0.3      stringi_1.4.3    
## [37] lazyeval_0.2.2    munsell_0.5.0     crayon_1.3.4