A couple of weeks ago, I was hunting for a non-linear association measure for a use case I was working on. That’s when I came across a paper introducing “Rearrangement Correlation,” a fresh take on the tried-and-tested Pearson’s r. I couldn’t use the result for my specific problem. Nonetheless, the paper is a pretty cool demonstration of how first-principles thinking can unearth patterns hidden in plain sight.
The core idea is based on the fact that:
Pearson’s r is essentially scaled covariance, with the scaling factor determined by the Cauchy-Schwarz inequality.
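In symbols (standard definitions, my notation rather than the paper’s):

```latex
r \;=\; \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
\lvert \operatorname{cov}(X, Y) \rvert \;\le\; \sigma_X \, \sigma_Y
\quad \text{(Cauchy-Schwarz)}.
```

The Cauchy-Schwarz bound is what pins r to [−1, 1], and it is attained exactly when Y is an affine function of X, which is why |r| = 1 characterizes perfect linear relationships.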
Decades ago, it was shown that if we use a bound looser than Cauchy-Schwarz, the ability of Pearson’s r to capture linear relationships shrinks. The looser the bound, the smaller the space of linear relationships the coefficient can capture. An extreme example of this is the Concordance Coefficient: it can only capture identical relationships (e.g., Y = ±X). It cannot even capture additive shifts like Y = X + c. Why? Because it uses a bound much looser than Cauchy-Schwarz.
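Assuming the concordance coefficient meant here is Lin’s concordance correlation coefficient (CCC), a quick numerical check makes the point: for a pure shift Y = X + 3, Pearson’s r stays at 1 while the CCC collapses, because its denominator is a looser bound that also absorbs the mean difference.

```python
import numpy as np

def pearson_r(x, y):
    # Pearson's r: covariance scaled by the Cauchy-Schwarz bound (sigma_x * sigma_y)
    return np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())

def concordance_ccc(x, y):
    # Lin's CCC: covariance scaled by a looser bound,
    # (var_x + var_y + (mean_x - mean_y)^2) / 2, which also penalizes shifts
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    return 2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = x + 3.0                    # a purely additive shift

print(pearson_r(x, y))         # ~1.0: Pearson still sees a perfect linear relationship
print(concordance_ccc(x, y))   # ~0.18: the looser bound can't "reach" the shifted line
```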
But what if we do the complete opposite? What if we use a bound tighter than Cauchy-Schwarz? Would this expand the range of relationships that can be captured to include non-linear but monotonic ones?
It turns out the answer is yes! The paper explains how, giving the procedure a really bad name in the process: Rearrangement Correlation.
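The tightening comes from the rearrangement inequality: over all ways of re-pairing the samples, the covariance is largest when both variables are sorted in the same order (and smallest when they are sorted in opposite orders), and that sorted covariance is itself never larger than the Cauchy-Schwarz bound σ_X·σ_Y. Normalizing by it gives a coefficient that reaches ±1 for any monotonic relationship, not just a linear one. Below is a minimal sketch of that idea; it is my own reading of the construction, not necessarily the paper’s exact estimator.

```python
import numpy as np

def rearrangement_corr(x, y):
    # Sketch of a rearrangement-based correlation (my interpretation, not
    # necessarily the paper's exact definition).
    #
    # By the rearrangement inequality, over all re-pairings of the samples the
    # covariance is largest when x and y are both sorted ascending, and smallest
    # when one is ascending and the other descending. Both bounds are tighter
    # than the Cauchy-Schwarz bound sigma_x * sigma_y.
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    x_sorted, y_sorted = np.sort(x), np.sort(y)
    cov_max = np.cov(x_sorted, y_sorted, bias=True)[0, 1]        # comonotone pairing
    cov_min = np.cov(x_sorted, y_sorted[::-1], bias=True)[0, 1]  # antimonotone pairing
    # Scale by the bound on the relevant side, so the result hits +1 for any
    # increasing relationship and -1 for any decreasing one.
    return cov_xy / cov_max if cov_xy >= 0 else -cov_xy / cov_min

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=1_000)
y = np.exp(x)                      # non-linear but monotonic

print(np.corrcoef(x, y)[0, 1])     # Pearson: noticeably below 1
print(rearrangement_corr(x, y))    # ~1.0: the tighter (sorted) bound is attained
```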
Now, the most important question: we already have Spearman’s rank-order correlation, which can capture non-linear monotonic relationships. So why bother? I don’t know. The paper didn’t do a good job of explaining what competitive advantage Rearrangement Correlation would have over Spearman’s. Maybe that you don’t need to sort the data prior to computing the correlation?
Anyway, it was a fun read nonetheless!
