Hey information theorists/terrorists: what is a good way to measure information in a multivariate setting?
I can compute the per-variable and joint Shannon entropy just fine, but would like something that prioritizes a big per-variable decrease over a slightly higher joint decrease. (It’s better if I can determine two variables completely instead of something complex that allows me to kind-of determine the combination of five.)
I could hack something that essentially lexicographically compares sorted vectors of per-variable decreases, but I wonder if there is a best-practice approach I should know about…
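For concreteness, here is a minimal sketch of the lexicographic hack I have in mind (names and the toy data are made up): compute each variable's empirical Shannon entropy before and after, sort the per-variable decreases in descending order, and compare the resulting tuples. A large drop in any single variable then dominates several small drops spread across many variables.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of an empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def decrease_key(before_cols, after_cols):
    """Per-variable entropy decreases, sorted descending.
    Tuples compare lexicographically, so a big decrease in one
    variable outranks many small decreases."""
    drops = [entropy(b) - entropy(a) for b, a in zip(before_cols, after_cols)]
    return tuple(sorted(drops, reverse=True))

# Toy example: two binary variables, each as a column of samples.
before = [[0, 1, 0, 1], [0, 0, 1, 1]]
after_a = [[0, 0, 0, 0], [0, 0, 1, 1]]  # fully determines variable 1
after_b = [[0, 1, 0, 1], [0, 0, 0, 1]]  # only partially reduces variable 2
print(decrease_key(before, after_a) > decrease_key(before, after_b))  # → True
```

This preference is non-standard (it isn’t any single information measure I know of), which is exactly why I’m asking whether something principled exists.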