MeanEmbeddingTest
class hyppo.ksample.MeanEmbeddingTest(num_randfreq=5)
Mean Embedding test statistic and p-value.
The Mean Embedding test is a two-sample test that uses differences in (analytic) mean embeddings of two data distributions in a reproducing kernel Hilbert space [1].
- Parameters
num_randfreq (int) -- Number of random frequencies. Used to construct a random array of size (p, q), where p is the number of dimensions of the data and q is the number of random frequencies at which the test is performed. This array holds the random test points at which the test is evaluated (see Notes).
Notes
The test statistic, like the Smooth CF statistic, takes the following form:
\[S_n = n W_n^T \Sigma_n^{-1} W_n\]
As seen in the above formulation, this test statistic takes the same form as the Hotelling \(T^2\) statistic found in hyppo.ksample.Hotelling. However, the components are defined differently in this case. Given data sets X and Y, define \(Z_i\), the vector of differences, as:
\[Z_i = (k(X_i, T_1) - k(Y_i, T_1), \ldots, k(X_i, T_J) - k(Y_i, T_J)) \in \mathbb{R}^J\]
The above is the vector of differences between kernel evaluations at the test points \(T_j\). The kernel maps into the reproducing kernel Hilbert space. Next, \(W_n\) is defined as the mean of these vectors:
\[W_n = \frac{1}{n} \sum_{i = 1}^n Z_i\]
This leaves \(\Sigma_n\), the covariance matrix:
\[\Sigma_n = \frac{1}{n}ZZ^T\]
where \(Z\) is the \(J \times n\) matrix whose columns are the \(Z_i\). Once \(S_n\) is calculated, a threshold \(r_{\alpha}\) corresponding to the \(1 - \alpha\) quantile of a chi-squared distribution with \(J\) degrees of freedom is chosen. The null hypothesis is rejected if \(S_n\) is larger than this threshold.
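The steps in these Notes can be sketched directly in NumPy. The following is an illustration under stated assumptions, not hyppo's implementation: the Gaussian kernel, the standard-normal test points, and the names `gaussian_kernel` and `mean_embedding_sketch` are all hypothetical choices made for this example. It assumes x and y have the same number of rows, and it scales the quadratic form by n so that it can be compared with the chi-squared threshold.

```python
import numpy as np
from scipy.stats import chi2


def gaussian_kernel(data, point):
    """Assumed kernel: k(x, T) = exp(-||x - T||^2 / 2) for every row of data."""
    sq_dist = np.sum((data - point) ** 2, axis=1)
    return np.exp(-sq_dist / 2.0)


def mean_embedding_sketch(x, y, num_points=5, alpha=0.05, seed=0):
    """Return (S_n, r_alpha, reject) following the formulas in the Notes."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    # J random test points T_1, ..., T_J, one per row (assumed standard normal).
    test_points = rng.standard_normal((num_points, p))
    # Row i of z is Z_i, so z has shape (n, J) and is the transpose of
    # the J x n matrix Z used in the Notes.
    z = np.column_stack(
        [gaussian_kernel(x, t) - gaussian_kernel(y, t) for t in test_points]
    )
    w = z.mean(axis=0)                        # W_n
    sigma = z.T @ z / n                       # Sigma_n = (1/n) Z Z^T
    stat = n * w @ np.linalg.solve(sigma, w)  # S_n
    # r_alpha: the (1 - alpha) quantile of chi-squared with J degrees of freedom.
    threshold = chi2.ppf(1 - alpha, df=num_points)
    return stat, threshold, bool(stat > threshold)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of \(\Sigma_n\), which is the usual choice for numerical stability.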
References
[1] Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. arXiv:1506.04725 [math, stat], 2015.
Methods Summary
| statistic(x, y, random_state) | Calculates the mean embedding test statistic. |
| test(x, y, random_state=None) | Calculates the mean embedding test statistic and p-value. |
MeanEmbeddingTest.statistic(x, y, random_state)
Calculates the mean embedding test statistic.
- Parameters
- Returns
stat (float) -- The computed mean embedding statistic.
MeanEmbeddingTest.test(x, y, random_state=None)
Calculates the mean embedding test statistic and p-value.
- Parameters
- Returns
Examples
>>> import numpy as np
>>> from hyppo.ksample import MeanEmbeddingTest
>>> np.random.seed(1234)
>>> x = np.random.randn(500, 10)
>>> y = np.random.randn(500, 10)
>>> stat, pvalue = MeanEmbeddingTest().test(x, y, random_state=1234)
>>> '%.2f, %.3f' % (stat, pvalue)
'5.33, 0.377'