Linear mixed effects matching Kanai - saccade latency
Test against the null model
bfKanaiLateral = anovaBF(latency~stimulation*leg*direction+subject, data = data.frame(filter(latencyMedianBaseline, type == "lateral")), whichModels = "withmain", whichRandom = "subject", progress = FALSE, iterations = 100000) # compute Bayes Factors
bfKanaiLateral = sort(bfKanaiLateral, decreasing = TRUE) # sort such that winning model is at the top
First we compare all models to the simplest (null) model, which is the intercept-only + random effect model: latency ~ subject. This does not test for effects of SUBJECT, but models it as a nuisance factor (whichRandom = "subject"). In addition, to decrease the model space, we do not consider models that have an interaction without the corresponding main effects (whichModels = "withmain").
kable(select(extractBF(bfKanaiLateral), bf)) # show only the Bayes factors in a table
| model | bf |
|:---|---:|
| leg + subject | 0.8399089 |
| stimulation + subject | 0.7416393 |
| stimulation + leg + subject | 0.6465591 |
| direction + subject | 0.1357138 |
| direction + leg + subject | 0.1114138 |
| stimulation + direction + subject | 0.0995612 |
| stimulation + direction + leg + subject | 0.0872255 |
| stimulation + leg + stimulation:leg + subject | 0.0478623 |
| stimulation + direction + stimulation:direction + leg + subject | 0.0246398 |
| stimulation + direction + stimulation:direction + subject | 0.0215074 |
| direction + leg + direction:leg + subject | 0.0110546 |
| stimulation + direction + leg + direction:leg + subject | 0.0085640 |
| stimulation + direction + leg + stimulation:leg + subject | 0.0063807 |
| stimulation + direction + stimulation:direction + leg + direction:leg + subject | 0.0018722 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + subject | 0.0014185 |
| stimulation + direction + leg + stimulation:leg + direction:leg + subject | 0.0006242 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + subject | 0.0001322 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + stimulation:direction:leg + subject | 0.0000286 |
The winning model is the one with just a main effect of LEG, with a Bayes factor of 0.84. However, even for this model there is less evidence than for the null model, which by definition has a Bayes factor of 1.
The conventional interpretation for Bayes Factors is the following:
- BF10 < 0.1: strong evidence for H0
- 0.1 < BF10 < 0.33: moderate evidence for H0
- 0.33 < BF10 < 1: anecdotal evidence for H0
- BF10 = 1: equivalent evidence for H0 and H1
- 1 < BF10 < 3: anecdotal evidence for H1
- 3 < BF10 < 10: moderate evidence for H1
- BF10 > 10: strong evidence for H1

So to switch between expressing the Bayes Factor as evidence for H1 (BF10) or H0 (BF01), you simply take the reciprocal: BF01 = 1/BF10.
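Since the BayesFactor package supports inverting a BFBayesFactor object, the winning model's BF10 can be re-expressed as a BF01 directly (a quick sketch, using the sorted object from above):
1 / bfKanaiLateral[1] # BF01 of the leg-only model: about 1 / 0.84 ≈ 1.19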
We can compute the evidence for a particular effect by comparing this winning model with the best-fitting model that does not contain the effect. We can compute the evidence for absence of a particular effect by comparing the winning model with the best-fitting model that does contain the effect.
The evidence for the effect of LEG can be quantified by comparing the Bayes factors of the first and the 2nd model (because that is the best-fitting model that does not contain LEG as a factor): 0.84 / 0.74 ≈ 1.1. This constitutes only anecdotal evidence for the presence of a leg effect, even though it was significant in the classical ANOVA. But given that the leg model itself is such a poor fit, this weak evidence is no surprise in the Bayesian analysis.
The evidence for the absence of the STIMULATION by DIRECTION interaction can be quantified by comparing the Bayes factors of the first and the 9th model (because that is the best-fitting model that does contain the interaction): 0.84 / 0.0246 ≈ 34.1. This constitutes strong evidence for the absence of the interaction.
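These comparisons can also be computed directly, since dividing two BFBayesFactor objects that share the same denominator yields the Bayes factor of one model against the other. A minimal sketch, with indices referring to the sorted object above:
bfKanaiLateral[1] / bfKanaiLateral[2] # evidence for LEG: best model with vs. best model without LEG
bfKanaiLateral[1] / bfKanaiLateral[9] # evidence against the STIMULATION:DIRECTION interaction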
Test against the full model
Another option for quantifying evidence for a particular effect is to compare the full model to a model where that effect is omitted (whichModels = "top"). The full model is stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + stimulation:direction:leg + subject.
bfKanaiLateralFull = anovaBF(latency~stimulation*leg*direction+subject, data = data.frame(filter(latencyMedianBaseline, type == "lateral")), whichModels = "top", whichRandom = "subject", progress = FALSE, iterations = 100000) # compute Bayes Factors
bfKanaiLateralFull
Bayes factor top-down analysis
--------------
When effect is omitted from stimulation + direction + leg + stimulation:direction + stimulation:leg + direction:leg + stimulation:direction:leg + subject , BF is...
[1] Omit direction:leg:stimulation : 5.384841 ±5.62%
[2] Omit direction:leg : 9.833237 ±5.51%
[3] Omit leg:stimulation : 13.49779 ±5.66%
[4] Omit direction:stimulation : 4.282142 ±5.45%
[5] Omit leg : 1.191349 ±7.91%
[6] Omit direction : 7.358471 ±5.63%
[7] Omit stimulation : 1.295222 ±8.68%
Against denominator:
latency ~ stimulation + direction + leg + stimulation:direction + stimulation:leg + direction:leg + stimulation:direction:leg + subject
---
Bayes factor type: BFlinearModel, JZS
Removing the LEG effect from the model yields a higher Bayes factor, so removing this effect actually improved the model, although only by a little bit. The evidence thus goes in the direction of the null; when expressed in favor of the alternative, the Bayes Factor becomes less than one: 1 / 1.191 ≈ 0.84.
Similarly, removing the DIRECTION by STIMULATION interaction improves the model, and this time by a bit more: 1 / 4.282 ≈ 0.23, in the range of moderate evidence for the null.
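To express all of the top-down Bayes factors in favor of including each effect at once, the omission BFs can simply be inverted (a sketch operating on the plain numbers):
1 / extractBF(bfKanaiLateralFull)$bf # reciprocal of each omission BF, in the order printed above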
Bayesian model averaging
The problem with the 1st option (comparing single models, for example to the winning model) is that for designs with many factors (and therefore models), it becomes risky and a bit subjective to base conclusions on just two models. In this case, the winning model is even a bad fit, so it doesn’t seem appropriate to use it as a benchmark.
The problem with the 2nd option (comparing against the full model) is actually very apparent in this dataset: the full model is a terrible fit, as it comes in last. There is even a lot of evidence against it when compared to the null model: its BF10 is 0.0000286, i.e. roughly 35,000-to-1 in favor of the null!
One solution is to do Bayesian Model Averaging: compare multiple models and aggregate the Bayes Factors.
In the JASP stats package, this analysis is also called the “inclusion Bayes factor”. Briefly, it compares all models that include the effect of interest vs. all models that do not. For examples and a conceptual explanation, see example 5 in: Bayesian inference for psychology. Part II: Example applications with JASP.
There’s also a different way of calculating inclusion Bayes factors (conceptualized by Sebastiaan Mathot) that is currently being implemented in JASP. This is called the “inclusion Bayes factor across matched models”, and is more selective in the set of models that’s being compared than the standard inclusion BF. Briefly, the inclusion BF across matched models compares:
- all models that include the effect of interest, but NO interactions with the effect of interest, VERSUS
- the models that result from stripping the effect of interest from this set of models.
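To make the default inclusion BF concrete, here is a hand-rolled sketch for a single effect (an illustration only; it assumes equal prior probabilities for all models, with the intercept-only null model included in the model space, which the inclusionBF() helper used below does for every effect at once):
bfs <- c(extractBF(bfKanaiLateral)$bf, 1) # BF10 of each model vs. the null, plus the null itself (BF = 1)
models <- c(rownames(extractBF(bfKanaiLateral)), "subject only")
terms <- strsplit(models, " + ", fixed = TRUE) # split each model name into its component terms
hasEffect <- sapply(terms, function(t) "stimulation:direction" %in% t) # models containing the effect
post <- bfs / sum(bfs) # posterior model probabilities under equal priors
priorOdds <- sum(hasEffect) / sum(!hasEffect)
postOdds <- sum(post[hasEffect]) / sum(post[!hasEffect])
postOdds / priorOdds # inclusion BF for stimulation:direction; matches the value in the table below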
Let’s compare the default inclusion Bayes Factors for all effects:
# Default inclusion Bayes Factors
kable(inclusionBF(bfKanaiLateral))
| effect | inclusion BF |
|:---|---:|
| stimulation | 0.2873386 |
| direction | 0.0556146 |
| stimulation:direction | 0.0287605 |
| leg | 0.3194814 |
| stimulation:leg | 0.0327913 |
| direction:leg | 0.0128231 |
| stimulation:direction:leg | 0.0001360 |
Doing the analysis this way, we also find strong evidence against most of these effects; particularly the interactions.
My preferred approach is the inclusion Bayes factor across matched models. The evidence is qualitatively similar, but less strong, probably because we aren't including as many poorly fitting models in the model comparison (models which have a very small BF10 and can therefore exert a large influence on the composite Bayes Factor):
kable(inclusionBF(bfKanaiLateral, models = "matched"))

| effect | inclusion BF |
|:---|---:|
| stimulation | 0.7547571 |
| direction | 0.1344014 |
| stimulation:direction | 0.2449654 |
| leg | 0.8555487 |
| stimulation:leg | 0.0733784 |
| direction:leg | 0.0962754 |
| stimulation:direction:leg | 0.2163735 |
Linear mixed effects matching Kanai - center saccades
bfKanaiCenter = anovaBF(latency~stimulation*leg*direction+subject, data = data.frame(filter(latencyMedianBaseline, type == "center")), whichModels="withmain", whichRandom = "subject", progress = FALSE, iterations = 100000) # compute Bayes Factors
bfKanaiCenter <- sort(bfKanaiCenter, decreasing = TRUE) # sort such that winning model is at the top
kable(select(extractBF(bfKanaiCenter), bf)) # show only the Bayes factors in a table
| model | bf |
|:---|---:|
| stimulation + direction + subject | 555.2527534 |
| stimulation + direction + stimulation:direction + subject | 401.8383357 |
| stimulation + direction + leg + subject | 94.0231519 |
| stimulation + direction + stimulation:direction + leg + subject | 71.1333686 |
| stimulation + subject | 56.4457669 |
| stimulation + leg + subject | 9.9269257 |
| stimulation + direction + leg + direction:leg + subject | 9.1764175 |
| direction + subject | 8.1385456 |
| stimulation + direction + stimulation:direction + leg + direction:leg + subject | 6.8923098 |
| stimulation + direction + leg + stimulation:leg + subject | 6.3767641 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + subject | 4.7562332 |
| direction + leg + subject | 1.2906652 |
| stimulation + direction + leg + stimulation:leg + direction:leg + subject | 0.6672675 |
| stimulation + leg + stimulation:leg + subject | 0.6516657 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + subject | 0.5806634 |
| leg + subject | 0.1579592 |
| direction + leg + direction:leg + subject | 0.1210615 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + stimulation:direction:leg + subject | 0.0672617 |
Interestingly, there is a lot more evidence across the board, especially for the models that feature Stimulation and Direction.
kable(inclusionBF(bfKanaiCenter, models = "matched"))

| effect | inclusion BF |
|:---|---:|
| stimulation | 67.6885828 |
| direction | 9.7544628 |
| stimulation:direction | 0.7290812 |
| leg | 0.1726179 |
| stimulation:leg | 0.0681792 |
| direction:leg | 0.0981963 |
| stimulation:direction:leg | 0.1158360 |
There seems to be a mismatch with the classical ANOVA: An effect of stimulation receives strong support, whereas it was non-significant. The effect of direction was significant, but it is less strongly supported by the inclusion Bayes Factor (although the evidence still goes in the right direction).
Main effect of stimulation
latencyMedianBaseline %>%
filter(type == "center") %>% # keep only center saccades
group_by(subject,stimulation) %>% # for each subject and stimulation session
summarise(latency = mean(latency)) %>% # average over all other variables
ggplot(aes(stimulation, latency)) +
geom_hline(yintercept = 0, linetype = "dashed") + # dashed line: no change from baseline
stat_summary(fun.y = mean, geom = "line", aes(group = 1), size = 2) + # thick line: group average
geom_line(aes(colour = subject, group = subject)) # thin lines: individual subjects

It is easy to see why this effect is non-significant: the average difference is tiny and there is a lot of variability. The plot does show one major outlier in terms of effect size (S01): let's see what happens to the Bayes Factor if we take their data out.
latencyNoS01 <- latencyMedianBaseline %>%
filter(subject != "S01") %>%
mutate(subject = factor(subject))
bfKanaiCenterNoS01 = anovaBF(latency~stimulation*leg*direction+subject, data = data.frame(filter(latencyNoS01, type == "center")), whichModels="withmain", whichRandom = "subject", progress = FALSE, iterations = 100000) # compute Bayes Factors
bfKanaiCenterNoS01 <- sort(bfKanaiCenterNoS01, decreasing = TRUE) # sort such that winning model is at the top
kable(select(extractBF(bfKanaiCenterNoS01), bf)) # show only the Bayes factors in a table
| model | bf |
|:---|---:|
| stimulation + direction + subject | 1370.5021891 |
| direction + subject | 610.7330230 |
| stimulation + direction + leg + subject | 531.6196659 |
| stimulation + direction + stimulation:direction + subject | 352.9869220 |
| direction + leg + subject | 200.7493579 |
| stimulation + direction + stimulation:direction + leg + subject | 123.1974214 |
| stimulation + direction + leg + direction:leg + subject | 63.0498822 |
| stimulation + direction + leg + stimulation:leg + subject | 42.8446155 |
| direction + leg + direction:leg + subject | 24.4897141 |
| stimulation + direction + stimulation:direction + leg + direction:leg + subject | 14.9256981 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + subject | 8.3443542 |
| stimulation + direction + leg + stimulation:leg + direction:leg + subject | 4.0678602 |
| stimulation + subject | 1.9848227 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + subject | 1.0168697 |
| stimulation + leg + subject | 0.6129997 |
| leg + subject | 0.3006916 |
| stimulation + direction + stimulation:direction + leg + stimulation:leg + direction:leg + stimulation:direction:leg + subject | 0.1633244 |
| stimulation + leg + stimulation:leg + subject | 0.0422355 |
kable(inclusionBF(bfKanaiCenterNoS01, models = "matched"))
| effect | inclusion BF |
|:---|---:|
| stimulation | 2.3502132 |
| direction | 699.4732539 |
| stimulation:direction | 0.2487328 |
| leg | 0.3664546 |
| stimulation:leg | 0.0767869 |
| direction:leg | 0.1186097 |
| stimulation:direction:leg | 0.1606149 |
This completely abolishes the strong support for Stimulation, and greatly enhances the support for Direction, bringing the Bayesian and the classical ANOVAs more in line. At present it is unclear why the classical and Bayesian analyses differ in this regard. When simulating this case for normally distributed data, the Bayes Factors and p-values track each other nicely (see discussion on JASP/BayesFactor forum). This suggests there must be some assumption that is not met in this particular dataset, which is causing the divergence between the analyses.
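For instance, a minimal sketch of such a simulation (illustrative values only; n = 26 is chosen to match the df = 25 in the t-tests below):
set.seed(1)
sims <- replicate(500, { # simulate 500 one-sample datasets
x <- rnorm(26, mean = 1, sd = 3) # normally distributed change scores
c(bf = extractBF(ttestBF(x), onlybf = TRUE), # Bayes factor against the point null
p = t.test(x)$p.value) # classical one-sample p-value
})
cor(log(sims["bf", ]), log(sims["p", ])) # log BF and log p track each other closely (strongly negative)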
Still, let’s do some follow-up tests with all of the data (i.e. including this subject) to see whether the anodal or cathodal change scores are significantly different from 0 on their own.
Bayesian one-sample t-tests:
latencyMedianBaseline %>%
filter(type == "center") %>% # keep only center saccades
group_by(stimulation,subject) %>% # for each session and subject
summarise(deviation.end = mean(latency)) %>% # average over all other variables
spread(stimulation,deviation.end) %>% # make separate columns with test data
summarise_if(is.numeric, funs(extractBF(ttestBF(.), onlybf = TRUE))) %>% # run Bayesian t-test on each column, keeping only the BF
gather(stimulation,BF,anodal,cathodal) %>% # make row for each stimulation condition
kable(.)
| stimulation | BF |
|:---|---:|
| anodal | 0.3218926 |
| cathodal | 0.2840670 |
Frequentist one-sample t-tests:
latencyMedianBaseline %>%
filter(type == "center") %>% # keep only center saccades
group_by(stimulation,subject) %>% # for each session and subject
summarise(deviation.end = mean(latency)) %>% # average over all other variables (df is now still grouped per stimulation)
summarise_if(is.numeric, funs(list(tidy(t.test(.))))) %>% # run one-sample t-test for each stimulation condition, return tidy data frames
unnest() %>% # unpack the list-column with data frame for each test
kable(.)
| stimulation | estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|:---|---:|---:|---:|---:|---:|---:|:---|:---|
| anodal | -1.608974 | -0.9875463 | 0.3328370 | 25 | -4.964508 | 1.746559 | One Sample t-test | two.sided |
| cathodal | 1.096154 | 0.8335773 | 0.4124131 | 25 | -1.612139 | 3.804446 | One Sample t-test | two.sided |
So neither is actually significant, nor does either have a Bayes Factor with evidence for the alternative.