There is a common perception that it is difficult to detect interactions (moderations) in multiple linear regression (MLR). There are several possible reasons for this, but a priori, I think the power for detecting an interaction is often equivalent to that for detecting a main effect, except in a few special cases, as demonstrated in the code below.

In a MLR with two independent variables (regressors) X and Z predicting the dependent variable Y, we can express a (linear) interaction as the product of X and Z:

`Y ~ Bx*X + Bz*Z + Bxz*(X*Z) + E`

where `Y`

, `X`

and `Z`

are vectors,
`Bx`

, `Bz`

and `Bxz`

are parameters
(scalars), `*`

is the element-wise product, and
`E`

is a vector of residuals. We’ll label the interaction
term as XZ.

This paper McClelland & Judd 1993 argues that moderation effects are difficult to detect. It makes a useful distinction between “experimental” and “field” studies. Experimental studies are assumed to manipulate (often extreme) values of X and Z, represented here as 2 levels of each factor (i.e categorical variables, indicated using dummy variables with values [0,1] in the regressors). Field studies, on the other hand, measure continuous values of X and Z. Below I simulate both types, and argue that interactions are in principle just as detectable as main effects for experimental studies, and can be less or more detectable than main effects in field studies, depending on the range (spread) of X and Z, the correlation between them, and the presence of any error in measuring X and Z.

Before simulating, here are some other possible reasons for a difficulty in detecting an interaction, which are not so relevant to discussion here:

It may be an empirical fact that interactions between the types of variables measured in many studies have a smaller effect size than the main effect of those variables. Or there may be theoretical reasons why the interaction could only be ordinal rather than disordinal (cross-over) (see Lakens Blog, in which case the effect size might be half that for the main effects. Here however we consider the case when there is no such empirical or a priori knowledge, i.e, when effect sizes for X, Z and XZ are equal (ie

`Bx=Bz=Bxz=1`

).The addition of the interaction term to the MLR model eats an additional degree of freedom (df). This is true, but it is strange to not include it, unless we have a priori reason not to expect an interaction, and in any case, its effect on statistical power is negligible (compared to tests of main effects) if we have enough data, as here.

It is important to mean-correct X and Z before multiplying them in equation above, otherwise the interaction term XZ will be more correlated with X and Z (if one or both have non-zero mean), i.e, the interaction and main effects will be confounded. (Note interaction terms are orthogonal to main effects in experimental studies, by design).

Sometimes people think they have low power to detect “interactions” because they have fewer values per cell when they break down the data according to the specific levels of two or more factors. However, this really refers to the power to detect the “simple effects” on subsets of the data, for example to interpret what is driving an overall interaction; here we only consider the omnibus interaction involving all the data.

Sometimes people say (for experimental studies) that taking differences of random samples reduces the signal-to-noise ratio (SNR). This is true if the mean of two independent samples have the same sign, because signal will be decreased by subtraction but noise will be increased. Hence one might think that interactions, as differences of differences, would have even lower SNR. However, interactions are no different from main effects in this sense, since both involve taking differences and averages across conditions (i.e., in a 2x2 design, the contrasts for the two main effects across four conditions might be [1 1 -1 -1] and [1 -1 1 -1], in which case the interaction would be [1 -1 -1 1], all of which involve the same amount of subtraction/averaging, as code below demonstrates).

There has been research looking at the effect of “range restriction” on power to detect interactions in MLR (e.g., Aguinis & Stone-Romero 1997), for example where participants are only selected for a study if they score above some criterion on one variable, eg X. This clearly reduces the range of the interaction term too, which of course reduces its power, but here we assume no such range restriction.

Power is easily simulated by generating random data many times (ie simulating many studies), fitting a model, and counting how many studies produce a significant result (here, defined as the p-value being less than alpha=0.05, calculated separately for each main effect and interaction).

Let’s start by defining some parameters:

```
nsim = 20 # Number of simulations (determines precision of power estimate)
nexp = 100 # Number of experiments/studies (determines precision of power estimate)
nsub = 40 # Number of subjects (must be >1 and multiple of 4 if arg$expt=1)
Beta = c(1, 1, 1) # Betas (effect sizes) for X, Z, XZ
nsd = 15 # sd of measurement noise
```

Most should be self-explanatory, or expanded below. The values of
`nsim`

, `nexp`

and `nsub`

are just
chosen to give reasonable precision on power estimates below that can be
calculated in reasonable time on typical PC (10-30s), but you can
increase them if you want more precise estimates (at the expense of
time).

We can gather these parameters into an R list `arg`

and
pass to the function `pow`

below to return the proportion of
true effects detected (with p-value < 0.05). In brief, the
`pow`

function itself loops through `nsim`

iterations of `nexp`

experiments on `nsub`

subjects, on each occasion generating random data with the
`gen_dat`

function, and then fitting the MLR above using
`lm`

or `lme`

in R. The details of these functions
can be explored later, but for the moment, just define them:

```
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
```

```
# main power function
pow <- function(arg) {
if (is.null(arg$Xrep)) {arg$Xrep = 0}
if (is.null(arg$Zrep)) {arg$Zrep = 0}
if (is.null(arg$seqTest)) {arg$seqTest = 0}
if (is.null(arg$msd)) {arg$msd = 0}
SE = sub_eff(arg$nsub, arg$Xrep, arg$Zrep)
set.seed(100) # if want same results each time run
psig = matrix(NA, nrow = arg$nsim, ncol = 3)
alpha = 0.05
for (i in 1:arg$nsim) {
nsig = 0
for (j in 1:arg$nexp) {
d = gen_dat(arg)
if (arg$msd > 0) { # If regressors measured with noise (msd)
d$X = d$X + rnorm(n = length(d$X), mean = 0, sd = arg$msd)
d$Z = d$Z + rnorm(n = length(d$Z), mean = 0, sd = arg$msd)
d$X = (d$X - mean(d$X)); d$Z = (d$Z - mean(d$Z))
d$XZ = d$X*d$Z
}
if (arg$expt & (arg$Xrep | arg$Zrep)) {
d$ind = SE$ind
m = lme(y ~ X + Z + XZ, data = d, random = ~ 1 | ind)
nsig = nsig + (summary(m)$tTable[2:4,5] < alpha)
} else { # basic lm quicker (ie when no subject effects)
m = lm(y ~ X + Z + XZ, data = d)
if (arg$seqTest) { nsig = nsig + (anova(m)$`Pr(>F)`[1:3] < alpha) }
else { nsig = nsig + (summary(m)$coefficients[2:4, 4] < alpha) }
}
}
psig[i, ] = nsig / arg$nexp
}
return(psig)
}
# generate data
gen_dat <- function(arg) {
if (is.null(arg$Xrep)) {arg$Xrep = 0}
if (is.null(arg$Zrep)) {arg$Zrep = 0}
if (arg$expt) {
O = matrix(1, nrow = arg$nsub/4, ncol = 1)
X = c(0, 0, 1, 1)
Z = c(0, 1, 0, 1)
X = X %x% O; Z = Z %x% O
X = scale(X); Z = scale(Z)
} else if (is.null(arg$R)) {
C = matrix(c(arg$vX, arg$cXZ, arg$cXZ, arg$vZ), ncol = 2)
R = mvtnorm::rmvnorm(arg$nsub, c(X = 0, Z = 0), C)
X = R[, 1]; Z = R[, 2]
} else {
X = arg$R[, 1]; Z = arg$R[, 2]
}
X = (X - mean(X)) # /sd(X) - may not want to scale if range meaningful and of interest
Z = (Z - mean(Z)) # /sd(X) - may not want to scale if range meaningful and of interest
XZ = X * Z
if (arg$expt & (arg$Xrep>0 | arg$Zrep>0)) {
Xm = matrix(c(1, arg$Xrep, arg$Xrep, 1), ncol = 2)
Zm = matrix(c(1, arg$Zrep, arg$Zrep, 1), ncol = 2)
C = (Xm %x% Zm) %x% (arg$scor * diag(arg$nsub/4))
C = C + (diag(arg$nsub) * (1-arg$scor))
C = arg$nsd * C
} else {
C = arg$nsd * diag(arg$nsub)
}
E = mvtnorm::rmvnorm(1, matrix(0, nrow = arg$nsub, ncol = 1), C)
y = arg$Beta[1] * X + arg$Beta[2] * Z + arg$Beta[3] * XZ + t(E)
d = data.frame(y, X, Z, XZ)
return(d)
}
# sub function to deal with subject effects
sub_eff <- function(nsub,Xrep,Zrep) {
SX = diag(2)
SZ = diag(2)
if (Xrep > 0) { SX = matrix(c(1, 1), nrow = 2, ncol = 1) }
if (Zrep > 0) { SZ = matrix(c(1, 1), nrow = 2, ncol = 1) }
mat = (SX %x% SZ) %x% diag(nsub/4)
ind = matrix(NA, nrow = nrow(mat), ncol = 1)
for (j in 1:ncol(mat)) {
ind[(mat[,j]!=0)] = j
}
SE = list(mat=mat, ind=ind)
return(SE)
}
# plot results
plot_bar <- function(psig) {
d = data.frame(effect = c('X','Z','XZ'), power = apply(psig,2,mean), sd = apply(psig,2,sd))
ggplot(d) +
geom_bar( aes(x=factor(effect, levels = c('X','Z','XZ')), y=power), stat="identity", fill="skyblue", alpha=0.7) + xlab('Effect') +
geom_errorbar( aes(x=effect, ymin=power-sd, ymax=power+sd), width=0.4, colour="orange", alpha=0.9, linewidth=1.3)
}
```

Ok, so for experimental studies, we can create regressors from dummy
coding the two levels of X and Z in a 2x2 design. This is done within
the `gen_dat`

function with the lines:

```
arg = list(nsub=40) # not in gen_dat, but to get snippet below working
O = matrix(1, nrow = arg$nsub/4, ncol = 1)
X = c(0, 0, 1, 1); Z = c(0, 1, 0, 1)
X = X %x% O; Z = Z %x% O
X = scale(X); Z = scale(Z)
XZ = X * Z
```

(see Henson,
2015 if you want more information about the general linear model for
factorial designs, eg ANOVAs). Note that the scaling of dummy regressors
does matter - they need to be Z-scored (`scale`

in R) so that
their mean=0 and SD=1 - on assumption that we do not know the true
scaling of the underlying factors being manipulated (or orthonormalised
if more than 2 levels per factor). The interaction term is then
calculated by the product of Z-scored X and Z regressors (see comment
above about mean-correction).

To visualise these regressors together with some example data (with an interaction):

```
set.seed(100)
Y = 0 + 0*X + 0*Z + 1*XZ + rnorm(40, mean=0, sd=1) # Just generate some example data with an interaction
colors = c("black", "green", "blue", "red")
matplot(cbind(Y, X, Z, XZ), type = "l", lwd = c(1,2,2,2), main="Data and GLM Design Matrix for 2x2 experimental design", col=colors, lty=c(3,3,2,1), ylab="Regressor value")
legend(35, -0.25, legend=c("Y", "X", "Z", "XZ"), col=colors, lty=c(3,3,2,1), cex=0.8)
```

The GLM finds the best linear combination of the three regressors (solid lines) to fit the data (dotted line); in this case, you should be able to see that the parameter estimates should be (on average) 1 for XZ regressor and 0 for the X and Z regressors.

The rest of `gen_dat`

deals with adding correlated data
across subjects (determined by the parameter `scor`

) if one
or both factors is a repeated measure, which we will consider for
within-subject designs later. First, let’s estimate power for a simple
case where each level of X and Z is measured on different subjects (i.e,
a fully between-subject design), i.e., we have nsub/4 subjects in each
of 4 groups.

To tell the code that X and Z are from an experimental design (rather
than field study) in which two levels were tested for each, we set the
`expt`

variable to 1. We can gather these into a list called
`arg`

to pass to the `pow`

function:

```
arg = list(nsim = nsim, nexp = nexp, nsub = nsub, Beta = Beta, nsd = nsd)
arg = list.append(arg, expt = 1)
#system.time({psig = pow(arg)}) # should be ~10s
plot_bar(pow(arg))
```

You should see three bars that are equivalent in terms of overlapping error bars (with mean power of ~35%) - but most importantly, the power for interaction XZ does not differ from that of both main effects X and Z.

If you want to convert to a fully within-subject design, you need to
set Xrep and Zrep parameters to 1, to indicate that both are
repeated-measures. The degree of correlation between measures is then
controlled by the parameter `scor`

, which we can set to 0.5
for example. This affects the covariance used to create the data in
`gen_dat`

, and switches to (slightly slower) `lme`

in `pow`

function, in order to model subjects as random
effects.

If you uncomment the first call to plot_bar below, you will see that power increases to ~60% relative to the between-subject design above, but remains equivalent between the interaction and the main effects. Perhaps more interesting is to make one factor (eg X) a repeated-measure, but keep the other factor (eg Z) as two groups of independent subjects:

```
arg$Xrep = 1; arg$Zrep = 1; arg$scor = 0.5;
#plot_bar(pow(arg))
arg$scor = 0.5; arg$Xrep = 1; arg$Zrep = 0;
plot_bar(pow(arg))
```

Now you will see that the interaction has the same power as the main effect of the repeated-measure factor (X), and both have greater power than the between-subject factor (Z). In other words, the advantage of having measurements of one factor repeated within subjects transfers to interactions involving that factor too.

Now let’s consider a study in which X and Z are continuous variables,
e.g, measurements of some variable in the environment (we will assume
zero noise in those measurements for moment; though see later). We can
do this by saying it is not an experimental study, ie
`arg$expt=0`

. This then generates the regressors afresh for
each simulated study (to minimise random error) from a multivariate
normal distribution paramaterised by variances `vX`

and
`vZ`

and covariance `cXZ`

, as in these lines of
`gen_dat`

:

```
arg = list(vX = 1, vZ = 1, cXZ = 0) # not in gen_dat, but to get snippet below working
C = matrix(c(arg$vX, arg$cXZ, arg$cXZ, arg$vZ), ncol = 2)
R = mvtnorm::rmvnorm(nsub, c(X = 0, Z = 0), C)
X = R[, 1]; Z = R[, 2]
X = (X - mean(X))
Z = (Z - mean(Z))
XZ = X * Z
```

The above example assumes X and Z are independent, ie covariance
`cXZ=0`

(i.e., orthogonal on average), as in first example
below. To visualise these regressors and some example data (without an
interaction):

```
set.seed(100)
Y = 0 + 0*X + 0*Z + 1*XZ + rnorm(40, mean=0, sd=1) # Just generate some example data with an interaction
colors = c("black", "green", "blue", "red")
matplot(cbind(Y, X, Z, XZ), type = "l", lwd = c(1,2,2,2), main="Data and GLM Design Matrix for 2x2 field design", col=colors, lty=c(3,3,2,1), ylab="Regressor value")
legend(35, -1, legend=c("Y", "X", "Z", "XZ"), col=colors, lty=c(3,3,2,1), cex=0.8)
```

Again, in this example generated with only an interaction, you should be able to see that the data closely follows the interaction regressor XZ.

To start with, we will assume X and Z have unit variance (sd=1), by
setting the `vX=1`

and `vZ=1`

, and are
independent, by setting `cXZ=0`

:

```
arg = list(nsim = nsim, nexp = nexp, nsub = nsub, Beta = Beta, nsd = nsd)
arg = list.append(arg, expt = 0, vX = 1, vZ = 1, cXZ = 0)
plot_bar(pow(arg))
```

In this case, the power for the interaction is comparable to that for
the main effects. It is interesting to look at the distributions of X, Z
and XZ (and Y), which we can do with `ggpairs`

(first
increasing the number of subjects for clearer distributions):

`arg2=arg; arg2$nsub = 1000; ggpairs(gen_dat(arg2))`

The interaction regressor XZ clearly has a non-Gaussian distribution (more kurtotic), since it is the element-wise product of two Gaussians. (Note the distribution of the predictors does not matter for MLR; only the distribution of the error matters, which must be Gaussian, which is ensured here by how the data are generated). However, the XZ regressor remains largely uncorrelated with the X and Z regressors, so has comparable power (we will see below how high correlation between regressors reduces the power to detect their unique effects).

In fact, if you increase the number of simulations
(e.g. `nsim=1000; nexp=1000`

), to reduce the error bars on
the power estimates, you will see (after several hours!) a small but
reliable decrease in power for the interaction XZ relative to the main
effects. I suspect this reflects a reduced effective “range” of the XZ
regressor owing to the higher-order terms in Equation 2 of McClelland
& Judd 1993. Nonetheless, at least for the parameter values
here, the loss in power is negligible compared to effects of more
correlated regressors and measurement noise considered below.

Consider what happens when X and Z have a smaller range, ie
`vX<1`

and `vZ<1`

(while being independent,
ie `cXZ=0`

):

```
arg$vX = 0.5; arg$vZ = 0.5; arg$cXZ = 0
plot_bar(pow(arg))
```

This is the first time when the interaction has appreciably less
power than the main effects. This is simply because the range (sd in
this case) of the interaction term is the product of the ranges for both
main effects. The smaller the range of a variable, the harder it is to
detect in the presence of additive noise. Since the sd’s of X and Z
(i.e, sqrt(0.5)) are less than 1, the sd of their product (even if not
Gaussian) must be even smaller (i.e, 0.5). More generally, the
interaction will tend to have less power than a main effect when the
product of the sd’s of the main effects is less than the sd of that main
effect (e.g, when `vX*vZ < vX`

when comparing to main
effect of X). On the other hand, if the sd’s of X and Z are greater than
1, then the interaction regressor will have a larger sd than the main
effects, and the interaction will therefore have higher power than the
main effects!

So when will the range (e.g. sd) of measured variables be smaller or greater than 1? This is difficult to answer in general, because it depends on whether X and Z are measured with meaningful scales (and that those scales are commensurate across X and Z). If their scales are not meaningful / commensurate, then they are usually standardised (i.e, Z-scored) before entering the MLR, which ensures they have sd=1 and therefore that the interaction will have comparable power (at least when X and Z are independent). Nonetheless, it should be remembered that if X and Z have meaningful scales (and are therefore not standardised), along which they vary little, then their interaction can be more difficult to detect.

Another situation where the interaction has less power is when the
variables are not measured exactly, i.e, there is noise in their
measurement. Strictly speaking, this is an “errors-in-variables”
situation, but in practice such error can be absorbed into the
`lm`

model’s single error term. This can be simulated by
setting the parameter `msd>0`

(note that the data are
generated in `gen_dat`

with the true values of X and Z, but
then fit in `pow`

with values of X and Z that have additional
Gaussian noise with sd of `msd`

).

```
arg = list(nsim = nsim, nexp = nexp, nsub = nsub, Beta = Beta, nsd = nsd)
arg = list.append(arg, expt = 0, vX = 1, vZ = 1, cXZ = 0)
arg$nsd = 2; arg$msd = 2 # reduce main noise (nsd) to get power comparable to previous examples
plot_bar(pow(arg))
```

The interaction is now more difficult to detect than the main effects because the measurement noise in X and Z has been amplified simply by their multiplication to create the regressor XZ. Therefore, if one has imperfect measurement of the true variables producing the data, then interactions will suffer in their sensitivity.

In a final example where the power for detecting an interaction is
reduced, we will generate cases where the interaction term is correlated
with one or both main effects (even after Z-scoring), and where one
decides to partition shared variance “sequentially” to each term in the
MLR, giving priority to main effects over interactions. This is a
situation where the order of regressors matters in a MLR, and arises
when interactions appear to “the right of” main effects in the
`lm`

equation. In the `pow`

function, sequential
testing is implemented by using the default `anova`

function
on the lm fit, which uses type I sum of squares, rather than the usual
`summary`

function.

We will also take this opportunity to illustrate a final option of
passing user-specified regressors directly to `pow`

via the
`arg$R`

parameter. Here we will generate a uniform
distribution for X, and make Z a squared version of X (before
mean-correcting X). Both will be standardised before entering the MLR,
to ensure their range is equivalent, but the lack of mean-correction of
X prior to multiplication with Z ensures that X is highly correlated
with Z, and also that the product XZ is correlated with both main
effects, as can be seen here:

```
arg = list(nsim = nsim, nexp = nexp, nsub = nsub, Beta = Beta, nsd = nsd, expt = 0)
X = seq(1,arg$nsub)/arg$nsub; Z = X^2; X = scale(X); Z = scale(Z); arg$R = cbind(X, Z)
arg2=arg; arg2$nsub = 1000; ggpairs(gen_dat(arg2))
```

With the default simultaneous testing of each regressor (after reducing noise just to get power off floor with such highly correlated regressors), the interaction term has most power (like in previous example of correlated regressors), because it is less correlated with X and Z than they are with each other:

```
arg$nsd=1
plot_bar(pow(arg))
```

However, when we apply sequential testing (effectively orthogonalising each term with respect to all terms to the left of it), the interaction now has least power:

```
arg$seqTest = 1;
plot_bar(pow(arg))
```

Nonetheless, there must always be an a priori reason to orthogonalise one regressor with respect to others in MLR (i.e, effectively assigning the shared variance to the other regressors). One could argue that main effects always have priority over interactions, but I would argue that this is again domain-specific knowledge, and in principle interactions are as interesting as main effects.

The relative power of interactions versus main effects in multiple
linear regression depends on a number of considerations, such as the
range of each regressor, the correlation between them, whether there is
additional measurement noise in regressor values, and whether
significance is assessed with simultaneous or sequential testing.
Notwithstanding the caveats mentioned at the start of this piece (e.g,
equal effect sizes), I would argue that, in experimental designs, where
the scaling of factors is arbitrary (ordinal), and main effects and
interactions are orthogonal by design, interactions are equivalent in
their power to main effects, assuming all are either measured within- or
between-subjects (and when one factor is measured within-subject, the
interaction involving that factor gains the same power advantage over
between-subject effects). For field designs, with continuous, measured
variables, I would argue that interactions still do not
*necessarily* have less power, but they can when 1) there is
measurement noise in the variables, 2) when the variables have a
meaningful scale (such that their standardisation is inappropriate) and
their range of values is low, or 3) sequential testing is employed such
that interaction terms are effectively de-emphasised by orthogonalising
(residualising) them with respect to (correlated) main effects.