library(ggplot2) # for general plotting
library(car)     # for ANOVA (Type II used, better than Type I when there is an unbalanced design)

Information of data source

Enzyme inhibitors were used to treat mites, fecundity and enzyme activity following enzyme inhibition were assessed.

Inhibitors:

PBO - CYP (cytochrome P450) inhibitor

DEM - GST inhibitor

DEF - esterase inhibitor

Read in the data and view structure to identify any issues in data formatting

PBO.data <- read.csv("~/Lab Stuff/Adapted mites/Tomato/Inhibitor assay/Thesis - enzyme activity/PBO/TA CYP enzyme activity R data.csv", header = TRUE)

# Trial as a factor
PBO.data$Trial <- factor(PBO.data$Trial)

str(PBO.data)
## 'data.frame':    86 obs. of  4 variables:
##  $ Trial      : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 2 2 ...
##  $ Mite.Strain: Factor w/ 2 levels "TU","TU-A": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment  : Factor w/ 2 levels "PBO","Water": 2 2 2 2 1 1 1 1 2 2 ...
##  $ Activity   : num  452 449 471 492 386 ...

Formulate hypothesis

H0: There will be no difference in enzyme activity.

HA: Treatment with detoxification enzyme inhibitor will result in degreased enzyme activity.

Conduct data exploration

Outliers in the response variable (Activity) within explanatory variables (Trial, Treatment, Mite.Strain).

ggplot(PBO.data, aes(x = Trial, y = Activity)) + geom_boxplot()+ theme_classic()

ggplot(PBO.data, aes(x = Treatment, y = Activity)) + geom_boxplot()+ theme_classic()

ggplot(PBO.data, aes(x = Mite.Strain, y = Activity)) + geom_boxplot() + theme_classic()

Outliers probably represent real variability and they mostly seem to come from Trial 4, I will keep them in the analysis.

Collinearity of the explanatory variables

Does not apply, all explanatory variables are categorical/factorial.

Spatial/temporal or other hierarchical aspects of sampling design

No, I am treating Trial as a main effect to check for reproducibility (not a random effect/blocking factor).

Interactions (is the quality of the data good enough to include them?)

Interaction betweenTrial and Mite.Strain will be performed to test for reproducibility.

Interaction betweenTrial and Treatment will be performed to test for reproducibility.

Interaction betweenMite.Strain and Treatment will be performed to check if the mite strains responded differently to treatment.

Zero inflation in Y

No

Are categorical covariates balanced?

summary(PBO.data)
##  Trial  Mite.Strain Treatment     Activity    
##  1: 8   TU  :47     PBO  :42   Min.   :263.0  
##  2:26   TU-A:39     Water:44   1st Qu.:360.3  
##  3:32                          Median :424.5  
##  4:20                          Mean   :470.0  
##                                3rd Qu.:514.1  
##                                Max.   :862.7

No, but they are close except for Trial.

Apply model

# fit linear model and display model fit information and ANOVA table
# full model including 3 way interaction term - to verify it is not significant, if it is, interpretation of hypothesis testing will be problematic
m.0 <- lm(Activity ~ Treatment * Mite.Strain * Trial, data = PBO.data)
summary(m.0)
## 
## Call:
## lm(formula = Activity ~ Treatment * Mite.Strain * Trial, data = PBO.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -117.286  -18.029   -2.871   23.370   75.058 
## 
## Coefficients: (2 not defined because of singularities)
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             400.03      18.92  21.139  < 2e-16
## TreatmentWater                           65.58      26.76   2.450  0.01670
## Mite.StrainTU-A                         203.19      23.94   8.489 1.88e-12
## Trial2                                 -101.71      24.43  -4.163 8.58e-05
## Trial3                                  -37.84      23.18  -1.633  0.10692
## Trial4                                  177.09      25.39   6.975 1.23e-09
## TreatmentWater:Mite.StrainTU-A          -38.17      33.85  -1.128  0.26326
## TreatmentWater:Trial2                   -39.92      34.05  -1.172  0.24497
## TreatmentWater:Trial3                   -39.26      32.78  -1.198  0.23490
## TreatmentWater:Trial4                    28.44      35.91   0.792  0.43085
## Mite.StrainTU-A:Trial2                 -101.34      32.41  -3.127  0.00255
## Mite.StrainTU-A:Trial3                 -137.02      30.51  -4.490 2.65e-05
## Mite.StrainTU-A:Trial4                      NA         NA      NA       NA
## TreatmentWater:Mite.StrainTU-A:Trial2    74.25      45.09   1.647  0.10392
## TreatmentWater:Mite.StrainTU-A:Trial3    21.73      43.15   0.504  0.61607
## TreatmentWater:Mite.StrainTU-A:Trial4       NA         NA      NA       NA
##                                          
## (Intercept)                           ***
## TreatmentWater                        *  
## Mite.StrainTU-A                       ***
## Trial2                                ***
## Trial3                                   
## Trial4                                ***
## TreatmentWater:Mite.StrainTU-A           
## TreatmentWater:Trial2                    
## TreatmentWater:Trial3                    
## TreatmentWater:Trial4                    
## Mite.StrainTU-A:Trial2                ** 
## Mite.StrainTU-A:Trial3                ***
## Mite.StrainTU-A:Trial4                   
## TreatmentWater:Mite.StrainTU-A:Trial2    
## TreatmentWater:Mite.StrainTU-A:Trial3    
## TreatmentWater:Mite.StrainTU-A:Trial4    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.85 on 72 degrees of freedom
## Multiple R-squared:  0.9499, Adjusted R-squared:  0.9408 
## F-statistic:   105 on 13 and 72 DF,  p-value: < 2.2e-16
Anova(m.0)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
## Anova Table (Type II tests)
## 
## Response: Activity
##                              Sum Sq Df  F value    Pr(>F)    
## Treatment                     40564  1  28.3177 1.114e-06 ***
## Mite.Strain                  242027  1 168.9599 < 2.2e-16 ***
## Trial                       1604636  3 373.4003 < 2.2e-16 ***
## Treatment:Mite.Strain           102  1   0.0713   0.79018    
## Treatment:Trial               10811  3   2.5156   0.06503 .  
## Mite.Strain:Trial             49941  2  17.4319 6.700e-07 ***
## Treatment:Mite.Strain:Trial    4363  2   1.5228   0.22504    
## Residuals                    103137 72                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# linear model without non-significant 3-way interaction (want to reduce comparisons made in Tukey-Kramer post-hoc test)
m <- lm(Activity ~ Treatment + Mite.Strain + Trial + Mite.Strain:Treatment + Treatment:Trial + Mite.Strain:Trial, data = PBO.data)
summary(m)
## 
## Call:
## lm(formula = Activity ~ Treatment + Mite.Strain + Trial + Mite.Strain:Treatment + 
##     Treatment:Trial + Mite.Strain:Trial, data = PBO.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -120.250  -18.643    1.106   19.528   78.022 
## 
## Coefficients: (1 not defined because of singularities)
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     400.028     19.057  20.991  < 2e-16 ***
## TreatmentWater                   65.580     26.951   2.433  0.01738 *  
## Mite.StrainTU-A                 186.397     19.110   9.754 6.35e-15 ***
## Trial2                         -112.663     23.701  -4.753 9.60e-06 ***
## Trial3                          -34.875     22.760  -1.532  0.12972    
## Trial4                          185.486     24.490   7.574 8.25e-11 ***
## TreatmentWater:Mite.StrainTU-A   -4.582     17.279  -0.265  0.79160    
## TreatmentWater:Trial2           -19.585     32.028  -0.611  0.54276    
## TreatmentWater:Trial3           -45.189     31.346  -1.442  0.15363    
## TreatmentWater:Trial4            11.651     33.038   0.353  0.72536    
## Mite.StrainTU-A:Trial2          -62.650     22.682  -2.762  0.00724 ** 
## Mite.StrainTU-A:Trial3         -126.155     21.728  -5.806 1.49e-07 ***
## Mite.StrainTU-A:Trial4               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.11 on 74 degrees of freedom
## Multiple R-squared:  0.9478, Adjusted R-squared:   0.94 
## F-statistic: 122.1 on 11 and 74 DF,  p-value: < 2.2e-16
Anova(m)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
## Anova Table (Type II tests)
## 
## Response: Activity
##                        Sum Sq Df  F value    Pr(>F)    
## Treatment               40564  1  27.9231 1.226e-06 ***
## Mite.Strain            242027  1 166.6057 < 2.2e-16 ***
## Trial                 1604636  3 368.1976 < 2.2e-16 ***
## Treatment:Mite.Strain     102  1   0.0703   0.79160    
## Treatment:Trial         10811  3   2.4806   0.06762 .  
## Mite.Strain:Trial       49941  2  17.1890 7.392e-07 ***
## Residuals              107499 74                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate effect size and display
result.anova<-Anova(m)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
ss<-result.anova$"Sum Sq"  ##ss = sum of squares
pes<-ss/(ss+ss[length(ss)])  ##pes = partial e squared
pes[length(pes)]<-""
result.anova$"Part E Sq"<-pes
result.anova
## Anova Table (Type II tests)
## 
## Response: Activity
##                        Sum Sq Df  F value  Pr(>F) Part E Sq
## Treatment               40564  1  27.9231 0.00000   0.27396
## Mite.Strain            242027  1 166.6057 0.00000   0.69244
## Trial                 1604636  3 368.1976 0.00000   0.93721
## Treatment:Mite.Strain     102  1   0.0703 0.79160   0.00095
## Treatment:Trial         10811  3   2.4806 0.06762   0.09138
## Mite.Strain:Trial       49941  2  17.1890 0.00000   0.31720
## Residuals              107499 74
# perform post-hoc Tukey-Kramer test of contrasts
TukeyHSD(aov(Activity ~ Treatment + Mite.Strain + Trial + Mite.Strain:Treatment + Treatment:Trial + Mite.Strain:Trial, data = PBO.data))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Activity ~ Treatment + Mite.Strain + Trial + Mite.Strain:Treatment + Treatment:Trial + Mite.Strain:Trial, data = PBO.data)
## 
## $Treatment
##               diff      lwr      upr    p adj
## Water-PBO 38.89777 22.51478 55.28076 1.05e-05
## 
## $Mite.Strain
##             diff      lwr      upr p adj
## TU-A-TU 108.8118 92.36187 125.2617     0
## 
## $Trial
##           diff         lwr       upr     p adj
## 2-1 -115.93944 -156.442119 -75.43675 0.0000000
## 3-1  -82.90045 -122.499614 -43.30129 0.0000030
## 4-1  228.95771  187.049896 270.86553 0.0000000
## 3-2   33.03899    6.588823  59.48915 0.0083585
## 4-2  344.89715  315.101472 374.69282 0.0000000
## 4-3  311.85816  283.302799 340.41353 0.0000000
## 
## $`Treatment:Mite.Strain`
##                          diff        lwr       upr     p adj
## Water:TU-PBO:TU      42.23254  13.000752  71.46433 0.0016605
## PBO:TU-A-PBO:TU     112.84235  81.785321 143.89938 0.0000000
## Water:TU-A-PBO:TU   147.20105 116.572157 177.82995 0.0000000
## PBO:TU-A-Water:TU    70.60981  39.846873 101.37275 0.0000003
## Water:TU-A-Water:TU 104.96851  74.637860 135.29916 0.0000000
## Water:TU-A-PBO:TU-A  34.35870   2.265242  66.45216 0.0311636
## 
## $`Treatment:Trial`
##                        diff          lwr         upr     p adj
## Water:1-PBO:1     62.244798  -21.8233569  146.312952 0.3025261
## PBO:2-PBO:1     -107.178146 -175.8195072  -38.536786 0.0001605
## Water:2-PBO:1    -62.871959 -130.2764375    4.532519 0.0852601
## PBO:3-PBO:1      -61.128702 -127.5904133    5.333010 0.0940520
## Water:3-PBO:1    -42.427401 -108.8891131   24.034310 0.4946642
## PBO:4-PBO:1      222.309398  151.9729341  292.645863 0.0000000
## Water:4-PBO:1    297.850824  227.5143592  368.187288 0.0000000
## PBO:2-Water:1   -169.422944 -238.0643047 -100.781583 0.0000000
## Water:2-Water:1 -125.116757 -192.5212351  -57.712279 0.0000043
## PBO:3-Water:1   -123.373499 -189.8352108  -56.911787 0.0000043
## Water:3-Water:1 -104.672199 -171.1339107  -38.210487 0.0001369
## PBO:4-Water:1    160.064601   89.7281365  230.401065 0.0000000
## Water:4-Water:1  235.606026  165.2695617  305.942490 0.0000000
## Water:2-PBO:2     44.306187   -2.4650163   91.077391 0.0762450
## PBO:3-PBO:2       46.049445    0.6474524   91.451437 0.0444108
## Water:3-PBO:2     64.750745   19.3487526  110.152738 0.0007554
## PBO:4-PBO:2      329.487545  278.5817494  380.393340 0.0000000
## Water:4-PBO:2    405.028970  354.1231746  455.934766 0.0000000
## PBO:3-Water:2      1.743258  -41.7661457   45.252661 1.0000000
## Water:3-Water:2   20.444558  -23.0648455   63.953962 0.8227779
## PBO:4-Water:2    285.181358  235.9560869  334.406629 0.0000000
## Water:4-Water:2  360.722783  311.4975121  409.948054 0.0000000
## Water:3-PBO:3     18.701300  -23.3327771   60.735377 0.8599311
## PBO:4-PBO:3      283.438100  235.5118781  331.364322 0.0000000
## Water:4-PBO:3    358.979525  311.0533033  406.905747 0.0000000
## PBO:4-Water:3    264.736800  216.8105779  312.663022 0.0000000
## Water:4-Water:3  340.278225  292.3520031  388.204447 0.0000000
## Water:4-PBO:4     75.541425   22.3720558  128.710795 0.0008021
## 
## $`Mite.Strain:Trial`
##                       diff        lwr        upr     p adj
## TU-A:1-TU:1             NA         NA         NA        NA
## TU:2-TU:1     -122.2341493 -175.65853 -68.809769 0.0000000
## TU-A:2-TU:1     -0.8505638  -54.27494  52.573816 1.0000000
## TU:3-TU:1      -57.4435988 -108.92462  -5.962578 0.0181623
## TU-A:3-TU:1      0.4549738  -51.02605  51.935994 1.0000000
## TU:4-TU:1      191.3372387  134.94261 247.731871 0.0000000
## TU-A:4-TU:1    375.3904613  318.99583 431.785094 0.0000000
## TU:2-TU-A:1             NA         NA         NA        NA
## TU-A:2-TU-A:1           NA         NA         NA        NA
## TU:3-TU-A:1             NA         NA         NA        NA
## TU-A:3-TU-A:1           NA         NA         NA        NA
## TU:4-TU-A:1             NA         NA         NA        NA
## TU-A:4-TU-A:1           NA         NA         NA        NA
## TU-A:2-TU:2    121.3835855   74.75096 168.016207 0.0000000
## TU:3-TU:2       64.7905504   20.39762 109.183484 0.0005198
## TU-A:3-TU:2    122.6891231   78.29619 167.082056 0.0000000
## TU:4-TU:2      313.5713879  263.56347 363.579304 0.0000000
## TU-A:4-TU:2    497.6246106  447.61669 547.632526 0.0000000
## TU:3-TU-A:2    -56.5930350 -100.98597 -12.200102 0.0038361
## TU-A:3-TU-A:2    1.3055376  -43.08740  45.698471 1.0000000
## TU:4-TU-A:2    192.1878025  142.17989 242.195718 0.0000000
## TU-A:4-TU-A:2  376.2410251  326.23311 426.248941 0.0000000
## TU-A:3-TU:3     57.8985726   15.86450  99.932650 0.0012923
## TU:4-TU:3      248.7808375  200.85462 296.707059 0.0000000
## TU-A:4-TU:3    432.8340601  384.90784 480.760282 0.0000000
## TU:4-TU-A:3    190.8822649  142.95604 238.808487 0.0000000
## TU-A:4-TU-A:3  374.9354875  327.00927 422.861709 0.0000000
## TU-A:4-TU:4    184.0532226  130.88385 237.222592 0.0000000
# plot interactions
interaction.plot(PBO.data$Mite.Strain, PBO.data$Treatment, PBO.data$Activity, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Activity", xlab="Mite.Strain", main="Mite.Strain:Treatment")

interaction.plot(PBO.data$Treatment, PBO.data$Trial, PBO.data$Activity, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Activity", xlab="Treatment", main="Treatment:Trial")

interaction.plot(PBO.data$Trial, PBO.data$Mite.Strain, PBO.data$Activity, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Activity", xlab="Trial", main="Trial:Mite.Strain")

Validate model

PBO.data$m.fit <- fitted(m)    # fitted values
PBO.data$m.res <- rstandard(m) # Pearson residuals

Residual distribution / Overdispersion

We assumed normal residuals. This is the least important regression assumption but its can be tested with a qq plot.

ggplot(PBO.data, aes(sample = m.res)) + geom_qq() +
  geom_abline(intercept = 0, slope = 1) + theme_classic()

Nice, a few points deviating from normal near the ends, but I donโ€™t see a problem with the assumption of normaliy.

Residuals vs fitted values

Testing for:

Linearity - there should be no curvilinear pattern in the residuals.

Equal variance - the vertical spread of the residuals should be constant across all fitted values.

ggplot(PBO.data, aes(x = m.fit, y = m.res)) + 
  geom_point() +   geom_hline(yintercept = 0) +  geom_smooth() + theme_classic()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Little curvature, confidence intervals always include 0. Spread of variables looks pretty good.

Residuals vs explanatory variables

Should be centered around 0, if not then model requires another explanatory variable(s), to account for observed variation.

ggplot(PBO.data, aes(x = Trial, y = m.res)) + 
  geom_boxplot() + theme_classic() + geom_hline(yintercept = 0) 

ggplot(PBO.data, aes(x = Treatment, y = m.res)) + 
  geom_boxplot() + theme_classic() + geom_hline(yintercept = 0) 

ggplot(PBO.data, aes(x = Mite.Strain, y = m.res)) + 
  geom_boxplot() + theme_classic() + geom_hline(yintercept = 0) 

Nice.

Models is valid and interpretation of ANOVAs is good.

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_3.0-3     carData_3.0-2 ggplot2_3.1.1
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.0.2         Rcpp_1.0.1        cellranger_1.1.0 
##  [4] pillar_1.4.1      compiler_3.6.0    plyr_1.8.4       
##  [7] forcats_0.4.0     tools_3.6.0       digest_0.6.19    
## [10] evaluate_0.14     tibble_2.1.1      gtable_0.3.0     
## [13] pkgconfig_2.0.2   rlang_0.3.4       openxlsx_4.1.0   
## [16] curl_3.3          yaml_2.2.0        haven_2.1.0      
## [19] xfun_0.7          rio_0.5.16        withr_2.1.2      
## [22] stringr_1.4.0     knitr_1.23        hms_0.4.2        
## [25] grid_3.6.0        data.table_1.12.2 readxl_1.3.1     
## [28] foreign_0.8-71    rmarkdown_1.13    magrittr_1.5     
## [31] scales_1.0.0      htmltools_0.3.6   abind_1.4-5      
## [34] colorspace_1.4-1  labeling_0.3      stringi_1.4.3    
## [37] lazyeval_0.2.2    munsell_0.5.0     crayon_1.3.4