A simulation study inspired by experiments in improving Wikipedia editing experience, and demonstrating multiple methodologies for analyzing data.
In these notes we will use the statistical software and programming language R, along with a few packages. RStudio IDE is highly recommended, but not required.
Use the following code to install those packages if you want to follow along:
install.packages(
c("tidyverse", "zeallot", "lme4", "car", "brms", "tidybayes", "cmdstanr"),
repos = c("https://mcstan.org/rpackages/", getOption("repos"))
)
cmdstanr::install_cmdstan()
# Optional:
install.packages(c("gt", "gtsummary")) # tables
Suppose we’re running a test of a new editor interface for making edits to an article and we wish to know whether this new editing interface is an improvement – defined as “whether the user published the edit after starting it.” Specifically, we’re interested in a lift of 20% in the publish (successful edit attempt) probability.
We will build a model of edit attempt outcomes \(y\) – which is 1 if the edit was published and 0 if it was not. We use \(p\) to indicate the probability of a successful edit – in other words: \(p = \text{Pr}(y = 1)\). The simplest version of our model is a logistic regression:
\[ \begin{align*} y &\sim \text{Bernoulli}(p)\\ \text{logit}(p) &= \beta_0 + \beta_1 \times \text{used new interface} \end{align*} \]
where \(\beta_0\) is the intercept, and \(\beta_1\) is the slope and the effect associated with using the new interface. In other words:
\[ \begin{align*} & \text{Pr}(\text{edit published}\text{did not use new interface}) & =~ & \text{logit}^{1}(\beta_0)\\ & \text{Pr}(\text{edit published}\text{used new interface}) & =~ & \text{logit}^{1}(\beta_0 + \beta_1) \end{align*} \]
The goal of the statistical model here is to infer the values of \(\beta_0\) and \(\beta_1\).
It is important to remember that these model parameters (also called coefficients) are on the logodds scale, and that when we interpret their estimates we must be careful and sometimes we need to apply transformations to make sense of them.
For example, (Gelman, Hill, and Vehtari 2021) suggest using the “divideby4” rule for convenience. Under this transformation \(\beta_1/4\) is a reasonable approximation of the maximum increase in the probability of success corresponding to a unit difference – in our case the difference between which interface was used (\(\text{used new interface} = 0\) vs \(\text{used new interface} = 1\)).
Alternatively, \(\exp(\beta_1)\) – or \(e^{\beta_1}\) – represents the multiplicative effect on the odds of an edit getting published.
To see the methods in action we will simulate users, their edit attempts, and the outcomes of those attempts – whether they were successful (edit published) or not – and then see how the models are able to recover the true values of the parameters which were used in the simulation.
where b1
is the effect of the new editor interface on the logodds scale (refer to the introduction above).
simple_example < tibble(
new_interface = c(FALSE, TRUE),
formula = c("\\[\\beta_0\\]", "\\[\\beta_0 + \\beta_1\\]")
) %>%
mutate(
log_odds = b0 + b1 * new_interface,
prob = invlogit(log_odds)
)
simple_example %>%
gt() %>%
fmt(vars(new_interface), fns = function(x) ifelse(x, "New", "Old")) %>%
fmt_percent(vars(prob), decimals = 1) %>%
cols_label(
new_interface = "Editing interface",
formula = html("logit<sup>1</sup>(p)"),
log_odds = "Logodds scale",
prob = "Publish probability"
) %>%
tab_header(
"Logistic regression model",
html(sprintf("Using β<sub>0</sub> = %.1f and β<sub>1</sub> = %.1f", b0, b1))
)
Logistic regression model  

Using β_{0} = 0.7 and β_{1} = 0.8  
Editing interface  logit^{1}(p)  Logodds scale  Publish probability 
Old  \[\beta_0\]  0.7  33.2% 
New  \[\beta_0 + \beta_1\]  0.1  52.5% 
Let’s generate some data to use for our study! We will first build a user simulator and then use that to simulate many users making edits, sometimes using the new interface and sometimes using the old interface.
simulate_user_edit_attempts < function() {
# unique for each user; informs that user's "base publish prob."
user_intercept < b0 + rnorm(1, 0, 0.75)
# how many edit attempts did this user make?
n_edit_attempts < ceiling(rexp(1, 1/50))
# simulate very active users either very into new interface or not
prob_using_new_interface < ifelse(
n_edit_attempts > 150, # 'active': 150+ edit attempts
sample(c(0.05, 0.85), 1), # for active users
rbeta(1, 1.5, 1.5) # for everyone else
)
# simulate which edit attempts were performed with which interface:
used_new_interface < rbernoulli(n_edit_attempts, p = prob_using_new_interface)
# each edit attempt's success prob. depends on user and interface used:
user_log_odds < user_intercept + b1 * used_new_interface
success_probabilities < invlogit(user_log_odds)
# simulate each edit attempt's outcome, based on that edit attempt's prob.:
edit_attempts < map_lgl(success_probabilities, rbernoulli, n = 1)
# return simulated user's edits and outcomes:
tibble(
user_intercept = user_intercept,
user_success = boot::inv.logit(user_intercept),
new_interface = used_new_interface * 1L,
success_prob = success_probabilities,
edit_success = edit_attempts * 1L
)
}
To see this simulation in action, let’s simulate one user and print the first 10 edit attempts:
# A tibble: 10 x 5
user_intercept user_success new_interface success_prob edit_success
<dbl> <dbl> <int> <dbl> <int>
1 0.247 0.561 0 0.561 1
2 0.247 0.561 1 0.740 0
3 0.247 0.561 1 0.740 1
4 0.247 0.561 1 0.740 1
5 0.247 0.561 1 0.740 0
6 0.247 0.561 0 0.561 1
7 0.247 0.561 0 0.561 0
8 0.247 0.561 0 0.561 1
9 0.247 0.561 1 0.740 1
10 0.247 0.561 0 0.561 0
Notice how the user’s individual intercept gets transformed into their base probability, how the choice of the interface either keeps it the same or lifts it, and how even using the new interface (and in this case having 74% probability of publishing the edit) did not yield successes in every instance.
That was just one user, but our analysis will require more. Let’s simulate 100 users:
set.seed(42)
simulated_edits < map_dfr(
1:100,
~ simulate_user_edit_attempts(),
.id = "user_id"
)
simulated_edits < simulated_edits %>%
mutate(
# format user IDs to use zero padding:
user_id = factor(sprintf("%03.0f", as.numeric(user_id)))
)
First, let’s see what results we get from a simple model that treats all edit attempts as independent and interchangeable, even though we know this to be wrong since edit attempts from the same user are going to share a base success probability (based on that user’s simulated intercept):
The results are:
fit0 %>%
tbl_regression(
label = list(new_interface = "Using new interface"),
intercept = TRUE
)
Characteristic  log(OR)^{1}  95% CI^{1}  pvalue 

(Intercept)  0.82  0.90, 0.74  <0.001 
Using new interface  0.82  0.71, 0.93  <0.001 
^{
1
}
OR = Odds Ratio, CI = Confidence Interval

This model has done an OK job of estimating the coefficients \(\beta_0\) = 0.7 and \(\beta_1\) = 0.8.
However, the model above is fundamentally flawed because it assumes all of the edit attempts are independent and identically distributed, but this is not the case. A much more correct model would incorporate the fact that these edit attempts are related to each other because they were done by the same users.
Hierarchical regression models – also known as multilevel models and mixedeffects models (due to the presence of random effects in addition to fixed effects) – are used to model this structure when individuals observations are grouped. We refer to these groupings as random effects. In fact, we can even model more deeply nested structures – for example, if we had simulated multiple wikis and multiple users within those wikis then we could model this hierarchy.
The results are:
tbl1 < fit1 %>%
tbl_regression(
label = list(new_interface = "Using new interface"),
intercept = TRUE
)
tbl1
Characteristic  log(OR)^{1}  95% CI^{1}  pvalue 

(Intercept)  0.77  0.95, 0.60  <0.001 
Using new interface  0.86  0.72, 1.0  <0.001 
^{
1
}
OR = Odds Ratio, CI = Confidence Interval

This model has done a better job of estimating the coefficients \(\beta_0\) = 0.7 and \(\beta_1\) = 0.8.
In order to interpret these estimates, we have to apply a transformation. However, we need to be careful about the confidence intervals. The delta method models can be used to estimate the confidence intervals of the transformed parameters:
c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit1,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation") %>%
gt(rowname_col = "transformation") %>%
fmt_number(vars(`Estimate`, `SE`, `2.5 %`, `97.5 %`)) %>%
cols_merge(vars(`2.5 %`, `97.5 %`), pattern = "({1}, {2})") %>%
cols_label(`2.5 %` = "95% CI", SE = "Standard Error") %>%
tab_header("Transformed effects", "Estimated using delta method") %>%
tab_footnote("CI: Confidence Interval", cells_column_labels(vars(`2.5 %`)))
Transformed effects  

Estimated using delta method  
Estimate  Standard Error  95% CI^{1}  
b1/4  0.21  0.02  (0.18, 0.25) 
exp(b1)  2.36  0.17  (2.03, 2.69) 
^{
1
}
CI: Confidence Interval

The maximum lift in edit attempt success probability is 21% (95% CI: 18%25%), and because the confidence interval does not include 0 this is statistically significant at the 0.05 level. When using the new interface the odds of publishing the edit are multiplied by 2.36 (95% CI: 2.032.69) – effectively double – and in this case we would determine statistical significance (at the 0.05 level) by checking whether this confidence interval contains 1, since a multiplicative effect of 1 is no change either way.
One of the things we did in the simulation was have special behavior for highlyactive users. Specifically, when a user made more than 150 simulated edit attempts our simulation made approximately 5% of their edits use the new interface OR 85% of their edits use the new interface – effectively simulating power users who are very apprehensive about the new interface or who have a strong preference for it.
most_active_users %>%
mutate(
prop_new_interface = n_new_interface / n_edits,
prop_successes = n_successes / n_edits,
prop_successes_new = n_successes_new / n_new_interface
) %>%
gt() %>%
fmt(
starts_with("prop_"),
fns = function(x) sprintf("%s%.0f%%", ifelse(x < 0.1, " ", ""), 100 * x)
) %>%
cols_merge(ends_with("_new_interface"), pattern = "{1} (<b>{2}</b>)") %>%
cols_merge(ends_with("_successes"), pattern = "{1} (<b>{2}</b>)") %>%
cols_merge(ends_with("_successes_new"), pattern = "{1} (<b>{2}</b>)") %>%
cols_align("left", vars(user_id)) %>%
cols_align("right", starts_with("n_")) %>%
cols_label(
user_id = "User ID",
n_edits = "Total edit attempts",
n_new_interface = "Edits made using new interface",
n_successes = "Total edits published",
n_successes_new = "Edits published with new interface"
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of total edits which were made using the new interface"),
locations = cells_column_labels(vars(n_new_interface))
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of edits made using the new interface which were published"),
locations = cells_column_labels(vars(n_successes_new))
) %>%
tab_footnote(
footnote = html("(<b>%</b>) is the proportion of total edits published"),
locations = cells_column_labels(vars(n_successes))
) %>%
tab_style(
cell_text(font = "monospace", size = "large"),
cells_body(starts_with("n_"))
) %>%
tab_style(
style = list(
cell_fill(color = "gray95")
),
cells_body(rows = (1:nrow(most_active_users)) %% 2 == 1)
) %>%
tab_header("Most active (simulated) users")
Most active (simulated) users  

User ID  Total edit attempts  Edits made using new interface^{1}  Edits published with new interface^{2}  Total edits published^{3} 
002  424  349 (82%)  142 (41%)  161 (38%) 
003  262  8 ( 3%)  2 (25%)  118 (45%) 
023  150  118 (79%)  82 (69%)  99 (66%) 
025  125  22 (18%)  10 (45%)  37 (30%) 
035  180  6 ( 3%)  3 (50%)  55 (31%) 
040  182  164 (90%)  75 (46%)  81 (45%) 
059  188  10 ( 5%)  2 (20%)  43 (23%) 
067  122  90 (74%)  45 (50%)  52 (43%) 
070  176  9 ( 5%)  1 (11%)  16 ( 9%) 
094  159  8 ( 5%)  2 (25%)  15 ( 9%) 
^{
1
}
(%) is the proportion of total edits which were made using the new interface
^{
2
}
(%) is the proportion of edits made using the new interface which were published
^{
3
}
(%) is the proportion of total edits published

These users account for 36% of all edits in this simulated dataset.
Let us see what happens to the results when we exclude those users from analysis:
tbl2 < fit2 %>%
tbl_regression(
label = list(new_interface = "Using new interface"),
intercept = TRUE
)
tbl_merge(
list(tbl1, tbl2),
c("Including most active users", "Excluding most active users")
)
Characteristic  Including most active users  Excluding most active users  

log(OR)^{1}  95% CI^{1}  pvalue  log(OR)^{1}  95% CI^{1}  pvalue  
(Intercept)  0.77  0.95, 0.60  <0.001  0.74  0.92, 0.55  <0.001 
Using new interface  0.86  0.72, 1.0  <0.001  0.89  0.73, 1.0  <0.001 
^{
1
}
OR = Odds Ratio, CI = Confidence Interval

list(
`Including most active users` = c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit1,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation"),
`Excluding most active users` = c("b1/4", "exp(b1)") %>%
map_dfr(
.f = car::deltaMethod,
object = fit2,
parameterNames = paste0("b", 0:1)
) %>%
rownames_to_column(var = "transformation")
) %>%
bind_rows(.id = "dataset") %>%
gt(rowname_col = "transformation", groupname_col = "dataset") %>%
fmt_number(vars(`Estimate`, `SE`, `2.5 %`, `97.5 %`)) %>%
cols_merge(vars(`2.5 %`, `97.5 %`), pattern = "({1}, {2})") %>%
cols_label(`2.5 %` = "95% CI", SE = "Standard Error") %>%
tab_header("Transformed effects") %>%
tab_footnote("CI: Confidence Interval", cells_column_labels(vars(`2.5 %`)))
Transformed effects  

Estimate  Standard Error  95% CI^{1}  
Including most active users  
b1/4  0.21  0.02  (0.18, 0.25) 
exp(b1)  2.36  0.17  (2.03, 2.69) 
Excluding most active users  
b1/4  0.22  0.02  (0.18, 0.26) 
exp(b1)  2.43  0.20  (2.04, 2.81) 
^{
1
}
CI: Confidence Interval

In this case the estimates did not change much when excluding these most active users, but still there is no reason not to include those users.
One of the benefits of switching to a Bayesian approach with this model is we can reason about the parameters (and any other quantities) probabilistically. This benefit will become apparent soon, but first we need to fit the Bayesian model and obtain draws of the parameters from what’s called the joint posterior distribution – so called because it’s a reallocation of possibilities (from values of parameters we thought were likely apriori) after taking into account observed data.
summary(fit3)
Family: bernoulli
Links: mu = logit
Formula: edit_success ~ new_interface + (1  user_id)
Data: simulated_edits (Number of observations: 5519)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total postwarmup samples = 4000
GroupLevel Effects:
~user_id (Number of levels: 100)
Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS
sd(Intercept) 0.70 0.07 0.57 0.85 1.00 1023
Tail_ESS
sd(Intercept) 2119
PopulationLevel Effects:
Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS
Intercept 0.77 0.09 0.95 0.60 1.00 1295
new_interface 0.85 0.07 0.72 1.00 1.00 5895
Tail_ESS
Intercept 2141
new_interface 3459
Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
tbl3 < fit3 %>%
spread_draws(b_new_interface, b_Intercept) %>%
mutate(
exp_b = exp(b_new_interface),
b4 = b_new_interface / 4,
avg_lift = invlogit(b_Intercept + b_new_interface)  invlogit(b_Intercept)
) %>%
pivot_longer(
b_new_interface:avg_lift,
names_to = "param",
values_to = "val"
) %>%
group_by(param) %>%
summarize(
ps = c(0.025, 0.5, 0.975),
qs = quantile(val, probs = ps),
.groups = "drop"
) %>%
mutate(
quantity = ifelse(
param %in% c("b_Intercept", "b_new_interface"),
"Parameter", "Function of parameter(s)"
),
param = factor(
param,
c("b_Intercept", "b_new_interface", "exp_b", "b4", "avg_lift"),
c("(Intercept)", "Using new interface", "Multiplicative effect on odds", "Divideby4 rule", "Average lift")
),
ps = factor(ps, c(0.025, 0.5, 0.975), c("lower", "median", "upper")),
) %>%
pivot_wider(names_from = "ps", values_from = "qs") %>%
arrange(quantity, param)
tbl3 %>%
gt(rowname_col = "param", groupname_col = "quantity") %>%
row_group_order(c("Parameter", "Function of parameter(s)")) %>%
fmt_number(vars(lower, median, upper), decimals = 3) %>%
fmt_percent(columns = vars(median, lower, upper), rows = 2:3, decimals = 1) %>%
cols_align("center", vars(median, lower, upper)) %>%
cols_merge(vars(lower, upper), pattern = "({1}, {2})") %>%
cols_move_to_end(vars(lower)) %>%
cols_label(median = "Point Estimate", lower = "95% CI") %>%
tab_style(cell_text(weight = "bold"), cells_row_groups()) %>%
tab_footnote("CI: Credible Interval", cells_column_labels(vars(lower))) %>%
tab_footnote(
html("Average lift = Pr(SuccessNew interface)  Pr(SuccessOld interface) = logit<sup>1</sup>(β<sub>0</sub> + β<sub>1</sub>)  logit<sup>1</sup>(β<sub>0</sub>)"),
cells_body(vars(median), 3)
) %>%
tab_header("Posterior summary of model parameters")
Posterior summary of model parameters  

Point Estimate  95% CI^{1}  
Parameter  
(Intercept)  −0.773  (−0.952, −0.601) 
Using new interface  0.854  (0.715, 0.999) 
Function of parameter(s)  
Multiplicative effect on odds  2.349  (2.044, 2.717) 
Divideby4 rule  21.3%  (17.9%, 25.0%) 
Average lift  20.4%^{2}  (17.1%, 23.8%) 
^{
1
}
CI: Credible Interval
^{
2
}
Average lift = Pr(SuccessNew interface)  Pr(SuccessOld interface) = logit^{1}(β_{0} + β_{1})  logit^{1}(β_{0})

The results are essentially the same as the nonBayesian model in the previous section. A couple of key differences:
Hypothesis testing in this paradigm works a little differently. Rather than performing null hypothesis significance testing (NHST), rejecting or failingtoreject the null hypothesis, and calculating pvalues, we can ask for probabilities of ranges.
What’s the probability using the new interface at least doubles the odds of an edit getting published?
hyp1 < "exp(new_interface) > 2"
fit3 %>%
hypothesis(hyp1)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (exp(new_interfac... > 0 0.36 0.17 0.09 0.65
Evid.Ratio Post.Prob Star
1 87.89 0.99 *

'CI': 90%CI for onesided and 95%CI for twosided hypotheses.
'*': For onesided hypotheses, the posterior probability exceeds 95%;
for twosided hypotheses, the value tested against lies outside the 95%CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Of interest is the Post.Prob
column – the posterior probability. In this case, we are almost absolutely sure that the odds of an edit getting published at least double when the new interface is being used.
What’s the probability that the average lift in edit attempt success probability is positive?
hyp2 < "invlogit((Intercept) + new_interface)  invlogit((Intercept)) > 0"
fit3 %>%
hypothesis(hyp2)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0.2 0.02 0.18 0.23
Evid.Ratio Post.Prob Star
1 Inf 1 *

'CI': 90%CI for onesided and 95%CI for twosided hypotheses.
'*': For onesided hypotheses, the posterior probability exceeds 95%;
for twosided hypotheses, the value tested against lies outside the 95%CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
What’s the probability that the average lift in edit attempt success probability is at least 20%?
h < "invlogit((Intercept) + new_interface)  invlogit((Intercept)) > 0.2"
(hyp3 < hypothesis(fit3, hypothesis = h))
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0 0.02 0.02 0.03
Evid.Ratio Post.Prob Star
1 1.46 0.59

'CI': 90%CI for onesided and 95%CI for twosided hypotheses.
'*': For onesided hypotheses, the posterior probability exceeds 95%;
for twosided hypotheses, the value tested against lies outside the 95%CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
There is a 0.5935 probability that the difference between the Pr(Success  New interface) and Pr(Success  Old interface) is greater than 0.2 – and it is up to the stakeholder to decide whether that is high enough for them or if they would prefer to see a probability of, say, 0.8.
Another question we may ask is “is there a significant difference in proportions of successful edits between edits made using the new interface and those made using the old interface?”
edit_counts < simulated_edits %>%
group_by(user_id) %>%
summarize(
n_total_edits = n(),
new_ui_total_edits = sum(new_interface),
old_ui_total_edits = sum(1  new_interface),
new_ui_successful_edits = sum(edit_success * new_interface),
old_ui_successful_edits = sum(edit_success * (1  new_interface))
) %>%
# Only users who have used both UIs at least once:
filter(
new_ui_total_edits > 0,
old_ui_total_edits > 0
) %>%
mutate(
new_ui_prop_success = new_ui_successful_edits / new_ui_total_edits,
old_ui_prop_success = old_ui_successful_edits / old_ui_total_edits
)
Note: 4 users were excluded from this analysis because they did not use both interfaces, and this analysis requires at least one edit attempt using each of the interfaces.
tidy_counts < edit_counts %>%
select(user_id, contains("ui")) %>%
pivot_longer(
!user_id,
names_to = c("interface", ".value"),
names_pattern = "(.*)_ui_(.*)",
values_drop_na = TRUE
)
ggplot(tidy_counts, aes(y = prop_success, x = interface)) +
geom_boxplot() +
geom_jitter(aes(size = total_edits), width = 0.2, height = 0, alpha = 0.5) +
scale_y_continuous("Proportion of edits published", labels = scales::percent_format(1)) +
scale_size_continuous(
"Edits attempted by user",
range = c(1, 4), trans = "log", breaks = c(1, 5, 55)
) +
ggtitle("Proportion of edits published, by interface") +
theme(legend.position = "bottom")
The users at the top (100%) and bottom (0%) made very few edits, so it is easy for the proportion to be “all” or “none” when the denominator is, say, 1.
To actually test the hypothesis we perform a paired ttest since we have a success proportion from each editor for each user.
t.test(
x = edit_counts$new_ui_prop_success,
y = edit_counts$old_ui_prop_success,
alternative = "greater",
paired = TRUE
)
Paired ttest
data: edit_counts$new_ui_prop_success and edit_counts$old_ui_prop_success
t = 5.0261, df = 95, pvalue = 1.175e06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.09907877 Inf
sample estimates:
mean of the differences
0.1479853
The average difference in proportion of published edits between the interfaces is 14.8% (p < 0.001), which is both practically and statistically significant.
The underlying question here is different. The models above inferred the latent (hidden) value of the publish probability, which is different than inferring the average proportion of published edits.
Also, for anyone curious:
hyp4 < "invlogit((Intercept) + new_interface)  invlogit((Intercept)) > 0.148"
fit3 %>%
hypothesis(hyp4)
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (invlogit((Interc... > 0 0.06 0.02 0.03 0.08
Evid.Ratio Post.Prob Star
1 1999 1 *

'CI': 90%CI for onesided and 95%CI for twosided hypotheses.
'*': For onesided hypotheses, the posterior probability exceeds 95%;
for twosided hypotheses, the value tested against lies outside the 95%CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Software  Version 

OS  macOS Big Sur 10.16 
R  R version 4.0.4 (20210215) 
CmdStan  2.26.0 
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/bearloga/wmfmlmglmnotes, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".