Estimating noise ceilings for the Pearson correlation coefficient

In this post, I derive a commonly used split-half estimator for correlational noise ceilings. This involves a detour through classical test theory and the Spearman-Brown prediction formula.


Many scientific experiments have the following setup: choose some condition $X \in \mathcal{X}$, then produce a real-valued measurement $Y \in \mathbb{R}$ that is the average of multiple “reps” taken at condition $X$. In my graduate school lab, the canonical example was choosing an image $X$, then measuring the rep-averaged firing rate $Y$ of a particular neuron (or multiunit) in a monkey’s visual cortex.

This procedure was noisy: if we repeated it using the same image $X$, we would almost surely record a different rep-averaged firing rate $Y$ from that neuron. Because of this, there was a fundamental limit to how well we could predict that neuron’s firing rate based on the image that was shown. In other words, there was a noise ceiling.

Getting an estimate of the noise ceiling is important if one wants to build models $\hat{Y} = f(X)$: it lets one distinguish when the model is genuinely failing as opposed to hitting irreducible randomness in the data.

Interestingly, this fairly universal situation can be addressed using an approach originally developed by the psychologists Charles Spearman and William Brown to deal with the rather specific problem that human responses to mental tests are noisy (Brown, 1910; Spearman, 1910). This approach is particularly popular in systems neuroscience, where one often sees Spearman and Brown name-dropped in methods sections – e.g. (Issa & DiCarlo, 2012; Ratan Murty et al., 2021; Lahner et al., 2024).

Conceptually, there are two parts to this blog post: presenting a definition of a noise ceiling, then presenting an estimator for it. The short version is this: the noise ceiling is $\eta_\star = \sqrt{\operatorname{Var}(\mathbb{E}[Y|X]) / \operatorname{Var}(Y)}$, and it can be estimated by computing split-half correlations, applying the Spearman-Brown correction, and taking a square root.

Motivating a definition of a noise ceiling

To motivate this post’s working definition of a noise ceiling, I’ll first assume a particular modeling goal: that a model $\hat{Y} = f(X)$ is to be evaluated in terms of its population Pearson correlation coefficient (PCC) over some distribution $(X, Y) \sim P$. This is a probabilistic frame, where experiments consist of draws from a joint distribution over conditions $X$ and measurements $Y$.

There are ways in which this modeling goal might not suit your use case. For example, the PCC is invariant to rescalings and offsets of the model’s predictions. If you care about identifying a model $f$ that produces predictions $\hat{Y}$ in the same units as your measurements $Y$, rather than merely correlating positively with them, you might be better served by a goodness-of-fit metric like mean squared error.

But given that models are evaluated using the PCC, a natural noise ceiling definition is:

$$\boxed{\eta_\star := \sup_f \operatorname{Corr}(f(X), Y)}$$

In words, $\eta_\star$ is the highest PCC that any model could hope to achieve with $Y$.

At first glance, this might seem like an intractable definition: it’s not as if we can loop over all possible models $f$, calculate their corresponding population PCCs, then select the highest one; there are infinitely many models. But it turns out we can simplify this formula, and write $\eta_\star$ purely in terms of moments of $P$.

Deriving the noise ceiling of the Pearson correlation coefficient

The derivation below is a somewhat tedious, mechanical calculation. But the core idea is intuitive: the highest possible Pearson correlation coefficient (PCC) is achieved by a model $f_\star(X)$ that is a copy of the reproducible “signal” $s(X) := \mathbb{E}[Y | X]$. So if we can calculate the PCC achieved by this model, we have our noise ceiling $\eta_\star$.

First, by definition of the population Pearson correlation coefficient $\operatorname{Corr}$,

$$
\begin{aligned}
\eta_\star &:= \sup_f \operatorname{Corr}(f(X), Y) \\
&= \sup_f \left( \frac{\operatorname{Cov}(f(X), Y)}{\sqrt{\operatorname{Var}(f(X))} \sqrt{\operatorname{Var}(Y)}} \right)
\end{aligned}
$$

It is easy to show the term inside the $\sup$ is invariant to scalings and offsets of $f(X)$, so WLOG we shall consider functions $g$ where $\mathbb{E}[g(X)] = 0$ and $\operatorname{Var}(g(X)) = 1$. Thus,

$$
\begin{aligned}
\eta_\star &= \sup_{g:\, \mathbb{E}[g(X)]=0,\, \operatorname{Var}(g(X))=1} \left( \frac{\operatorname{Cov}(g(X), Y)}{\sqrt{\operatorname{Var}(g(X))} \sqrt{\operatorname{Var}(Y)}} \right) \\
&= \sup_{g:\, \mathbb{E}[g(X)]=0,\, \operatorname{Var}(g(X))=1} \left( \frac{\operatorname{Cov}(g(X), Y)}{\sqrt{\operatorname{Var}(Y)}} \right)
\end{aligned}
$$

By an elementary property of covariance and the fact that $\mathbb{E}[g(X)] = 0$,

$$
\begin{aligned}
\operatorname{Cov}(g(X), Y) &= \mathbb{E}[g(X)Y] - \mathbb{E}[g(X)]\,\mathbb{E}[Y] \\
&= \mathbb{E}[g(X)Y]
\end{aligned}
$$

By the law of total covariance,

$$\operatorname{Cov}(g(X), Y) = \mathbb{E}[\operatorname{Cov}(g(X), Y | X)] + \operatorname{Cov}(\mathbb{E}[g(X) | X], \mathbb{E}[Y | X])$$

As $g(X)$ has no variance conditional on $X$, the first term vanishes; and since $\mathbb{E}[g(X) | X] = g(X)$, this simplifies to

$$\operatorname{Cov}(g(X), Y) = \operatorname{Cov}(g(X), \mathbb{E}[Y | X])$$

To simplify notation, let $s(X) := \mathbb{E}[Y|X]$. Substituting,

$$\eta_\star = \sup_{g:\, \mathbb{E}[g(X)]=0,\, \operatorname{Var}(g(X))=1} \left( \frac{\operatorname{Cov}(g(X), s(X))}{\sqrt{\operatorname{Var}(Y)}} \right)$$

For any random variables $A$ and $B$, we have the covariance inequality, by Cauchy–Schwarz:

$$|\operatorname{Cov}(A, B)| \leq \sqrt{\operatorname{Var}(A) \operatorname{Var}(B)}$$

Moreover, we know equality is achieved on the positive side iff one variable is an affine function of the other with positive slope. Thus, a function of the form $g_\star(X) := \alpha s(X) + \beta$ with $\alpha > 0$ will achieve the upper bound of $\eta_\star$. Recall we restricted ourselves to functions $g$ where $g(X)$ has mean zero and unit variance, so we know the upper bound $\eta_\star$ is realized by

$$g_\star(X) = \frac{s(X) - \mathbb{E}[s(X)]}{\sqrt{\operatorname{Var}(s(X))}}$$

Substituting,

$$
\begin{aligned}
\eta_\star &= \frac{\operatorname{Cov}(g_\star(X), s(X))}{\sqrt{\operatorname{Var}(Y)}} \\
&= \frac{\sqrt{\operatorname{Var}(g_\star(X))}\sqrt{\operatorname{Var}(s(X))}}{\sqrt{\operatorname{Var}(Y)}}
\end{aligned}
$$

We know $\operatorname{Var}(g_\star(X)) = 1$, and recall the notation $s(X) := \mathbb{E}[Y|X]$. Simplifying,

$$\boxed{\eta_\star = \sqrt{\frac{\operatorname{Var}(\mathbb{E}[Y|X])}{\operatorname{Var}(Y)}}}$$

In words: the noise ceiling for the Pearson correlation coefficient is the square root of the signal-to-total-variance ratio.
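As a quick sanity check, here is a minimal simulation sketch confirming that the best-possible model, $f_\star(X) = s(X)$, achieves a PCC matching this formula. The Gaussian generative model and all the numbers are my own illustrative assumptions, not anything implied by the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative generative model: signal s(X) ~ N(0, 1), noise ~ N(0, 2^2),
# so Var(E[Y|X]) = 1 and Var(Y) = 1 + 4 = 5.
s = rng.normal(0.0, 1.0, size=n)        # s(X) = E[Y|X]
y = s + rng.normal(0.0, 2.0, size=n)    # Y

best_model_pcc = np.corrcoef(s, y)[0, 1]  # PCC of the optimal model f_star(X) = s(X)
predicted_ceiling = np.sqrt(1.0 / 5.0)    # sqrt(Var(E[Y|X]) / Var(Y))

print(best_model_pcc, predicted_ceiling)  # both come out ≈ 0.447
```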

An estimator for the noise ceiling $\eta_\star$ using split-half reliability

In practice, we cannot ever know the value of the population parameter $\eta_\star$, as we don’t have direct access to the true values of $\operatorname{Var}(\mathbb{E}[Y|X])$ or $\operatorname{Var}(Y)$. In a real experiment, we often only have access to $n$ samples $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ drawn from $P$.

Here, I’ll go over a strategy which is available when each of your measurements $Y_i$ is itself an average of $m$ repeated measurements, or reps, taken at $X_i$. Notationally, I’ll use $R_{ij}$ to denote the $j$th repeated measurement taken at $X_i$. So:

$$Y_i = \frac{1}{m}\sum_{j=1}^{m} R_{ij}$$

If your data look like this, then you can use the split-half method to estimate the noise ceiling $\eta_\star$. The basic move is to split the reps at each condition into two disjoint halves, average the reps within each half, and then ask how correlated those two half-datasets are across conditions.

Let’s write down the notation. For a random split $b$, let $A_i^{(b)}$ and $B_i^{(b)}$ denote the two sets of rep indices for condition $X_i$. These sets are disjoint, each has size $m/2$ (assume for now that $m$ is even), and together they partition the rep indices $\{1, \ldots, m\}$. For that split, define the two half-averaged measurements:

$$
\begin{aligned}
Y_i^{(b, 1)} &:= \frac{1}{|A_i^{(b)}|}\sum_{j \in A_i^{(b)}} R_{ij} \\
Y_i^{(b, 2)} &:= \frac{1}{|B_i^{(b)}|}\sum_{j \in B_i^{(b)}} R_{ij}
\end{aligned}
$$

Now stack these half-averaged measurements across the $n$ sampled experimental conditions:

$$
\begin{aligned}
\mathbf{Y}^{(b, 1)} &:= (Y_1^{(b, 1)}, Y_2^{(b, 1)}, \ldots, Y_n^{(b, 1)}) \\
\mathbf{Y}^{(b, 2)} &:= (Y_1^{(b, 2)}, Y_2^{(b, 2)}, \ldots, Y_n^{(b, 2)})
\end{aligned}
$$

The split-half correlation for split $b$ is then just the sample Pearson correlation between these two length-$n$ vectors:

$$\hat{r}^{(b)}_\text{half} := \hat{r}(\mathbf{Y}^{(b, 1)}, \mathbf{Y}^{(b, 2)})$$

Now, before going any further, I’ll just present the estimator for the noise ceiling $\eta_\star$ without explanation:

$$\boxed{\hat{\eta}_\star = \frac{1}{B}\sum_{b=1}^{B} \sqrt{\frac{2 \hat{r}^{(b)}_\text{half}}{1 + \hat{r}^{(b)}_\text{half}}}}$$

where $B$ is the number of random splits, with larger $B$ being better. In the last section of this post, I’ll describe how this formula was derived.
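To make the procedure concrete, here is a minimal NumPy sketch of the whole pipeline. The function name, the (conditions × reps) array layout, and the demo numbers are my own choices for illustration, not a standard API:

```python
import numpy as np

def split_half_noise_ceiling(R: np.ndarray, B: int = 100, seed: int = 0) -> float:
    """Split-half estimate of eta_star from an (n conditions, m reps) array R.

    Assumes every condition has the same, even number of reps m.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    estimates = []
    for _ in range(B):
        perm = rng.permutation(m)                  # random split b of rep indices
        half1 = R[:, perm[: m // 2]].mean(axis=1)  # Y^(b,1), length n
        half2 = R[:, perm[m // 2 :]].mean(axis=1)  # Y^(b,2), length n
        r_half = np.corrcoef(half1, half2)[0, 1]   # split-half correlation
        # Spearman-Brown correction (k=2), then square root.
        # Note: r_half <= 0 makes this NaN -- see the caveats later in the post.
        estimates.append(np.sqrt(2 * r_half / (1 + r_half)))
    return float(np.mean(estimates))

# Illustrative demo: 300 conditions, 50 reps, signal variance 1, noise variance 4.
# The true ceiling is sqrt(1 / (1 + 4/50)) ≈ 0.96.
rng = np.random.default_rng(1)
s = rng.normal(size=(300, 1))
R = s + rng.normal(scale=2.0, size=(300, 50))
print(split_half_noise_ceiling(R))
```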

Where the estimator comes from

To understand where the estimator $\hat{\eta}_\star$ comes from, we must take a brief detour into the land of classical test theory before returning to the more general setting.

I’ll organize the derivation of the estimator into four steps:

1. Reliability

In classical test theory, the objects of study are tests. A test consists of a set of $m$ items, each of which yields a real-valued score when administered to a person. These $m$ individual item scores are averaged together to yield an overall test score, which reflects something about an individual’s underlying psychological construct, like their ability to perform multiplication.

Suppose you administered a test to a population of people, and you observed variation in their test scores. A key methodological concern that classical test theory addresses is: how much of the raw test-score variation is due to true variation in psychological constructs, as opposed to random noise inherent to the test-taking process?

To measure this, classical test theory introduced a real-valued quantity called reliability, denoted by the symbol $\rho$:

$$\rho := \frac{\operatorname{Var}(\mathbb{E}[\text{Test Score} | \text{Person}])}{\operatorname{Var}(\text{Test Score})}$$

This should look suspiciously similar to the definition of the noise ceiling for the Pearson correlation coefficient (PCC), $\eta_\star$, from before. Indeed, if we translate into the notation we’ve been using (the person plays the role of the condition $X$, and the test score plays the role of the measurement $Y$), then we see that reliability is equal to the square of the noise ceiling:

$$\rho = \frac{\operatorname{Var}(\mathbb{E}[Y | X])}{\operatorname{Var}(Y)} = \eta_\star^2$$

Equivalently, the noise ceiling for the Pearson correlation coefficient is the square root of the reliability:

$$\boxed{\eta_\star = \sqrt{\rho}}$$
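For a quick illustrative example (numbers mine): if 64% of test-score variance is variance in the true scores $\mathbb{E}[\text{Test Score} | \text{Person}]$, then $\rho = 0.64$, and no model mapping people to test scores can achieve a population PCC above $\eta_\star = \sqrt{0.64} = 0.8$.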

It’s a little mind-bending to understand what a noise ceiling means here, if you’re more used to thinking about noise ceilings for neurons (as I am). In the classical test-taking setting, the noise ceiling $\eta_\star$ indicates the highest PCC that a model mapping people $X$ to test scores $Y$ could achieve.

It’s also worth pausing to note that the terminology can be confusing: the symbol $\rho$ is often used to denote both reliability and the population Pearson correlation coefficient. In this post, I use $\operatorname{Corr}$ for the population PCC, and $\rho$ for reliability.

2. Deriving the Spearman-Brown prediction formula

Now we’re equipped with the needed vocabulary to present the Spearman-Brown prediction formula, which predicts the reliability $\rho_k$ of a test after its number of items is increased by a factor of $k$. The key assumption in this formula is that the test items are parallel: conditioned on a test taker, the item scores all have the same expected value, and the item-score residuals are independent with the same variance.

The Spearman-Brown prediction formula is:

$$\rho_k = \frac{k \rho}{1 + (k-1) \rho}$$
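As a quick worked example (numbers mine): doubling the length ($k = 2$) of a test with reliability $\rho = 0.5$ is predicted to yield $\rho_2 = \frac{2 \times 0.5}{1 + 0.5} = \frac{2}{3} \approx 0.67$. Adding parallel items always increases reliability, with diminishing returns as $\rho$ approaches 1.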

To derive the formula, one starts by calculating the reliability of the original, base-length test $Y$. Recall that $X$ denotes the person taking the test.

$$\rho = \frac{\operatorname{Var}(\mathbb{E}[Y|X])}{\operatorname{Var}(Y)}$$

Next, recall that a test score $Y$ is the average of $m$ items:

$$Y := \frac{1}{m} \sum_{j=1}^{m} R_j$$

A key assumption is that the item scores $R_j$ have the same expected value conditional on the test taker $X$. That is, there exists some function $s$ where $s(X) = \mathbb{E}[R_j | X]$ for all $j$. We will also let $E_j := R_j - s(X)$ denote the item-score residuals. Substituting,

$$
\begin{aligned}
Y &= \frac{1}{m} \sum_{j=1}^{m}(s(X) + E_j) \\
&= s(X) + \frac{1}{m} \sum_{j=1}^{m} E_j
\end{aligned}
$$

Calculating the variance of $Y$,

$$\operatorname{Var}(Y) = \operatorname{Var}\left( s(X) + \frac{1}{m} \sum_{j=1}^{m} E_j\right)$$

By assumption, all residuals $E_j$ are independent of $s(X)$. Thus,

$$\operatorname{Var}(Y) = \operatorname{Var}(s(X)) + \operatorname{Var}\left( \frac{1}{m} \sum_{j=1}^{m} E_j\right)$$

By assumption, the residuals $E_j$ are independent across items and have common variance $\operatorname{Var}(E)$:

$$\operatorname{Var}(Y) = \operatorname{Var}(s(X)) + \frac{1}{m} \operatorname{Var}(E)$$

Substituting into the reliability formula,

$$
\begin{aligned}
\rho &= \frac{\operatorname{Var}(\mathbb{E}[Y|X])}{\operatorname{Var}(Y)} \\
&= \frac{\operatorname{Var}(s(X))}{\operatorname{Var}(s(X)) + \frac{1}{m} \operatorname{Var}(E)}
\end{aligned}
$$

Now suppose you increased the number of items by a factor of $k$. Denote the reliability of this lengthened test by $\rho_k$:

$$\rho_k = \frac{\operatorname{Var}(s(X))}{\operatorname{Var}(s(X)) + \frac{1}{km} \operatorname{Var}(E)}$$

One can rewrite $\rho_k$ in terms of $\rho$, using simple algebraic manipulations. First, substitute $\operatorname{Var}(s(X)) = \rho \left(\operatorname{Var}(s(X)) + \frac{1}{m} \operatorname{Var}(E)\right)$:

$$
\begin{aligned}
\rho_k &= \frac{\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right)}{\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right) + \frac{1}{km}\operatorname{Var}(E)} \\
&= \frac{k\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right)}{k\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right) + \frac{1}{m}\operatorname{Var}(E)} \\
&= \frac{k\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right)}{k\rho \left(\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)\right) + \frac{1}{m}\operatorname{Var}(E) + \operatorname{Var}(s(X)) - \operatorname{Var}(s(X))} \\
&= \frac{k\rho}{k\rho + 1 - \frac{\operatorname{Var}(s(X))}{\operatorname{Var}(s(X)) + \frac{1}{m}\operatorname{Var}(E)}} \\
&= \frac{k\rho}{k\rho + 1 - \rho} \\
&= \frac{k\rho}{1 + (k-1)\rho} \quad \square
\end{aligned}
$$

3. The relationship of split-half correlations to the noise ceiling $\eta_\star$

One can show that the population Pearson correlation coefficient between two split halves $Y^{(1)}$ and $Y^{(2)}$ of an experiment with $m$ reps is equal to the reliability $\rho_\text{half}$ of a hypothetical experiment with $m/2$ reps:

$$\operatorname{Corr}(Y^{(1)}, Y^{(2)}) = \frac{\operatorname{Cov}(Y^{(1)}, Y^{(2)})}{\sqrt{\operatorname{Var}(Y^{(1)})\operatorname{Var}(Y^{(2)})}}$$

As $Y^{(1)}$ and $Y^{(2)}$ are i.i.d. conditional on $X$, they have a common marginal variance, which we denote $\operatorname{Var}(Y^{(h)}) := \operatorname{Var}(Y^{(1)}) = \operatorname{Var}(Y^{(2)})$. Moreover, they are conditionally independent with the same conditional mean, so the law of total covariance implies $\operatorname{Cov}(Y^{(1)}, Y^{(2)}) = \operatorname{Var}(\mathbb{E}[Y^{(h)}|X])$. So,

$$\operatorname{Corr}(Y^{(1)}, Y^{(2)}) = \frac{\operatorname{Var}(\mathbb{E}[Y^{(h)}|X])}{\operatorname{Var}(Y^{(h)})}$$

By definition, this is equal to the reliability of an experiment with $m/2$ reps. So,

$$\rho_\text{half} = \operatorname{Corr}(Y^{(1)}, Y^{(2)})$$

Now, to obtain the reliability $\rho_\text{full}$ of the “full” experiment with $m$ reps from the reliability $\rho_\text{half}$ of the experiment with $m/2$ reps, we simply apply the $k=2$ case of the Spearman-Brown formula:

$$\rho_\text{full} = \frac{2\rho_\text{half}}{1 + \rho_\text{half}} = \frac{2 \operatorname{Corr}(Y^{(1)}, Y^{(2)})}{1+\operatorname{Corr}(Y^{(1)}, Y^{(2)})}$$

Finally, we know from above that $\eta_\star = \sqrt{\rho_\text{full}}$. Thus,

$$\boxed{\eta_\star = \sqrt{\frac{2 \operatorname{Corr}(Y^{(1)}, Y^{(2)})}{1+\operatorname{Corr}(Y^{(1)}, Y^{(2)})}}}$$
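To make this concrete (numbers mine): a split-half correlation of $0.5$ implies $\rho_\text{full} = \frac{2 \times 0.5}{1 + 0.5} = \frac{2}{3}$, and hence a noise ceiling of $\eta_\star = \sqrt{2/3} \approx 0.82$.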

4. Assembling the estimator

We are now equipped to assemble an estimator $\hat{\eta}_\star$. The guiding idea is the plug-in principle, where we “plug in” sample-based estimates for the population parameters in the formula for $\eta_\star$.

Namely, we use the sample Pearson correlation coefficient $\hat{r}$ as an estimator for $\operatorname{Corr}$. Given random split halves $\mathbf{Y}^{(b, 1)}$ and $\mathbf{Y}^{(b, 2)}$, we have the estimator:

$$\hat{r}^{(b)}_\text{half} = \hat{r}(\mathbf{Y}^{(b, 1)}, \mathbf{Y}^{(b, 2)})$$

We then simply plug this into the population formula for $\eta_\star$ to obtain the estimator:

$$\hat{\eta}^{(b)}_\star = \sqrt{ \frac{2 \hat{r}^{(b)}_\text{half}}{1 + \hat{r}^{(b)}_\text{half}} }$$

In practice, this procedure is repeated over $B$ random splits, and the resulting estimates are averaged. This averaging reduces the variation in our final estimate, as the choice of split $b$ is random. This leads to the final estimator for the noise ceiling:

$$\boxed{\hat{\eta}_\star = \frac{1}{B}\sum_{b=1}^{B} \sqrt{\frac{2 \hat{r}^{(b)}_\text{half}}{1 + \hat{r}^{(b)}_\text{half}}}}$$

In general, the estimator is biased: the Spearman-Brown correction and the square root are nonlinear functions of a noisy sample correlation, and nonlinear functions of noisy estimates are generally biased. Moreover, the estimator can behave badly when split-half correlations are near zero or negative, since the quantity under the square root becomes unstable or undefined.

The estimator is more defensible as the number of sampled conditions and number of sampled reps grow. To give an arbitrary reference point, folks in the lab would regularly use this formula to estimate noise ceilings for visual neuron spike rates when they had datasets consisting of ~100s of images, with ~50 reps each.

In the derivation above, we assumed every condition had the same number of reps $m$, and that each split half contained $m/2$ reps. In the more realistic scenario where rep counts vary across conditions, the above derivation no longer applies exactly. People often still use this estimator heuristically, especially when the rep counts are similar or all reasonably large, for instance by splitting each condition’s reps independently, as in the sketch below.
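Here is one such heuristic variant, sketched under my own assumptions: the function name and the list-of-arrays input format are mine, and one rep is simply dropped whenever a condition has an odd rep count:

```python
import numpy as np

def split_half_noise_ceiling_ragged(reps, B: int = 100, seed: int = 0) -> float:
    """Heuristic split-half estimate when rep counts vary across conditions.

    `reps` is a list of 1-D arrays, one per condition (each with >= 2 reps).
    Each condition's reps are split independently; one rep is dropped
    whenever the count is odd.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(B):
        half1, half2 = [], []
        for r in reps:
            perm = rng.permutation(len(r))
            h = len(r) // 2
            half1.append(r[perm[:h]].mean())
            half2.append(r[perm[h : 2 * h]].mean())
        r_half = np.corrcoef(half1, half2)[0, 1]
        estimates.append(np.sqrt(2 * r_half / (1 + r_half)))
    return float(np.mean(estimates))
```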

Closing thoughts

I first learned about the estimator $\hat{\eta}_\star$ as a first-year graduate student, from my PhD advisor and senior lab colleagues, who passed it down almost like a piece of lab folklore. It was pretty mysterious to me how a method developed to analyze psychological test data ended up becoming a workhorse for analyzing spike rates from visually driven neurons. It felt like I was learning some sort of esoteric statistical spell, not taught in the classroom.

Indeed, I had never even encountered the notion of a noise ceiling in any of the introductory statistics courses I had taken. When it came to the study of Pearson correlation coefficients, the focus was usually on null hypothesis testing against $H_0: \operatorname{Corr}(f(X), Y) = 0$. The noise-ceiling perspective provides an interesting complement to this frame, in that it encourages one to also think about $H_0: \operatorname{Corr}(f(X), Y) = \eta_\star$.

As a final note, the split-half estimator in this post is not the only possible estimator of $\eta_\star$. Since $\eta_\star$ is defined in terms of $\operatorname{Var}(\mathbb{E}[Y|X])$ and $\operatorname{Var}(Y)$, one could instead try to estimate these variance components directly, rather than going through the ceremony of generating split halves. I actually suspect that’s a better method, in some sense. Maybe that’s a topic for a future blog post. This one ended up being much longer than I thought!


  1. Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 1904-1920, 3(3), 296–322.
  2. Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3(3), 271.
  3. Issa, E. B., & DiCarlo, J. J. (2012). Precedence of the eye region in neural processing of faces. Journal of Neuroscience, 32(47), 16666–16682.
  4. Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J., & Kanwisher, N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1), 5540.
  5. Lahner, B., Dwivedi, K., Iamshchinina, P., Graumann, M., Lascelles, A., Roig, G., Gifford, A. T., Pan, B., Jin, S. Y., Ratan Murty, N. A., & others. (2024). Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nature Communications, 15(1), 6241.