library(ggplot2) # for general plotting
library(car)     # for ANOVA (Type II used, better than Type I when there is an unbalanced design)

Information of data source

Suppression assay using TU and TU-A mites as inducers of tomato defences and checking TU fecundity on adjacent leaf.

Read in the data and view structure to identify any issues in data formatting

suppression.data <- read.csv("~/Lab Stuff/Adapted mites/Tomato/Suppression Analysis/Systemic Suppression Assay/Suppression Analysis-Fecundity assay/All reps R.csv", header = TRUE)

# Trial as a factor
suppression.data$Trial <- factor(suppression.data$Trial)

str(suppression.data)
## 'data.frame':    40 obs. of  3 variables:
##  $ Trial      : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Mite.Strain: Factor w/ 2 levels "TU","TU-A": 1 1 1 1 1 1 2 2 2 2 ...
##  $ Eggs.Mite  : num  9.18 8.94 7.94 11.79 6.29 ...

Formulate biological hypothesis

H0: There will be no change in fecundity of TU mites when co-infested with TU-A.

HA: TU-A mite co-infestation will increase TU fecundity.

Conduct data exploration

Outliers in the response variable (Eggs.Mite) within explanatory variables (Trial, Mite.Strain).

ggplot(suppression.data, aes(x = Trial, y = Eggs.Mite)) + geom_boxplot() + theme_classic()

ggplot(suppression.data, aes(x = Mite.Strain, y = Eggs.Mite)) + geom_boxplot() + theme_classic()

No outliers.

Collinearity of the explanatory variables

Des not apply, all explanatory variables are categorical/factorial.

Spatial/temporal or other hierarchical aspects of sampling design

No, I am treating Trial as a main effect to check for reproducibility (not a random effect/blocking factor).

Interactions (is the quality of the data good enough to include them?)

Interaction betweenTrial and Mite.Strain will be performed to test for reproducibility.

Zero inflation in Y

No

Are categorical covariates balanced?

summary(suppression.data)
##  Trial  Mite.Strain   Eggs.Mite     
##  1:12   TU  :20     Min.   : 3.000  
##  2:16   TU-A:20     1st Qu.: 5.351  
##  3:12               Median : 6.377  
##                     Mean   : 6.630  
##                     3rd Qu.: 7.821  
##                     Max.   :11.789

Not in Trial (close), yes in Mite.Strain

Apply model

# fit linear model and display model fit information and ANOVA table
m <- lm(Eggs.Mite ~ Mite.Strain + Trial + Mite.Strain:Trial, data = suppression.data)
summary(m)
## 
## Call:
## lm(formula = Eggs.Mite ~ Mite.Strain + Trial + Mite.Strain:Trial, 
##     data = suppression.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9821 -1.0611 -0.2934  0.9985  3.9042 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              9.2762     0.6484  14.305    6e-16 ***
## Mite.StrainTU-A         -1.7754     0.9170  -1.936 0.061215 .  
## Trial2                  -3.4027     0.8578  -3.967 0.000356 ***
## Trial3                  -3.4243     0.9170  -3.734 0.000689 ***
## Mite.StrainTU-A:Trial2   2.3635     1.2131   1.948 0.059680 .  
## Mite.StrainTU-A:Trial3   1.0451     1.2969   0.806 0.425929    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.588 on 34 degrees of freedom
## Multiple R-squared:  0.4446, Adjusted R-squared:  0.3629 
## F-statistic: 5.442 on 5 and 34 DF,  p-value: 0.0008776
Anova(m)
## Anova Table (Type II tests)
## 
## Response: Eggs.Mite
##                   Sum Sq Df F value    Pr(>F)    
## Mite.Strain        2.668  1  1.0574 0.3110713    
## Trial             56.213  2 11.1406 0.0001901 ***
## Mite.Strain:Trial  9.772  2  1.9366 0.1597651    
## Residuals         85.778 34                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate effect size and display
result.anova<-Anova(m)
ss<-result.anova$"Sum Sq"  ##ss = sum of squares
pes<-ss/(ss+ss[length(ss)])  ##pes = partial e squared
pes[length(pes)]<-""
result.anova$"Part E Sq"<-pes
result.anova
## Anova Table (Type II tests)
## 
## Response: Eggs.Mite
##                   Sum Sq Df F value  Pr(>F) Part E Sq
## Mite.Strain        2.668  1  1.0574 0.31107   0.03016
## Trial             56.213  2 11.1406 0.00019   0.39589
## Mite.Strain:Trial  9.772  2  1.9366 0.15976   0.10227
## Residuals         85.778 34
# plot interactions

interaction.plot(suppression.data$Mite.Strain, suppression.data$Trial, suppression.data$Eggs.Mite, type="l", leg.bty="o", leg.bg="grey95", lwd=2, ylab="Eggs.Mite", xlab="Mite Strain", main="Mite.Strain:Trial")

Validate model

suppression.data$m.fit <- fitted(m)    # fitted values
suppression.data$m.res <- rstandard(m) # Pearson residuals

Residual distribution / Overdispersion

We assumed normal residuals. This is the least important regression assumption but its can be tested with a qq plot.

ggplot(suppression.data, aes(sample = m.res)) + geom_qq() +
  geom_abline(intercept = 0, slope = 1) + theme_classic()

Nice.

Residuals vs fitted values

Testing for:

Linearity - there should be no curvilinear pattern in the residuals.

Equal variance - the vertical spread of the residuals should e constant across all fitted values.

ggplot(suppression.data, aes(x = m.fit, y = m.res)) + 
  geom_point() + geom_hline(yintercept = 0) + geom_smooth() + theme_classic()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Linearity - pretty good - confidence interval includes 0 for all values.

Equal variance - no obvious problems

Residuals vs explanatory variables

Should be centered around 0, if not then model requires another explanatory variable(s), to account for observed variation.

ggplot(suppression.data, aes(x = Mite.Strain, y = m.res)) + 
  geom_boxplot() + 
  geom_hline(yintercept = 0) + theme_classic()

ggplot(suppression.data, aes(x = Trial, y = m.res)) + 
  geom_boxplot() + 
  geom_hline(yintercept = 0) + theme_classic()

ggplot(suppression.data, aes(x = Mite.Strain:Trial, y = m.res)) + 
  geom_boxplot() + 
  geom_hline(yintercept = 0) + theme_classic()

Looks good.

Model is valid and interpretation of ANOVA is good.

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_3.0-3     carData_3.0-2 ggplot2_3.1.1
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.0.2         Rcpp_1.0.1        cellranger_1.1.0 
##  [4] pillar_1.4.1      compiler_3.6.0    plyr_1.8.4       
##  [7] forcats_0.4.0     tools_3.6.0       digest_0.6.19    
## [10] evaluate_0.14     tibble_2.1.1      gtable_0.3.0     
## [13] pkgconfig_2.0.2   rlang_0.3.4       openxlsx_4.1.0   
## [16] curl_3.3          yaml_2.2.0        haven_2.1.0      
## [19] xfun_0.7          rio_0.5.16        withr_2.1.2      
## [22] stringr_1.4.0     knitr_1.23        hms_0.4.2        
## [25] grid_3.6.0        data.table_1.12.2 readxl_1.3.1     
## [28] foreign_0.8-71    rmarkdown_1.13    magrittr_1.5     
## [31] scales_1.0.0      htmltools_0.3.6   abind_1.4-5      
## [34] colorspace_1.4-1  labeling_0.3      stringi_1.4.3    
## [37] lazyeval_0.2.2    munsell_0.5.0     crayon_1.3.4