SmoothCFTest
class hyppo.ksample.SmoothCFTest(num_randfreq=5)
Smooth Characteristic Function test statistic and p-value.
The Smooth Characteristic Function test is a two-sample test that uses differences in the smoothed (analytic) characteristic functions of two data distributions to determine how different the two samples are [1].
- Parameters
num_randfreq (int) -- Used to construct a random array of size (p, q), where p is the number of dimensions of the data and q is the number of random frequencies at which the test is performed. These are the random test points at which the test is evaluated (see Notes).
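As a rough illustration of the shapes involved, the sketch below builds a (p, q) array of random frequencies and projects a data matrix onto it. Drawing the frequencies from a standard normal distribution is an assumption made for this sketch; the description above only fixes the array size.

```python
import numpy as np

rng = np.random.default_rng(1234)

p = 10  # number of dimensions of the data
q = 5   # num_randfreq, the number of random frequencies

# Assumption: frequencies are standard normal draws (not stated in the text above).
random_frequencies = rng.normal(size=(p, q))

# Each sample (row of x) is projected onto every random frequency.
x = rng.normal(size=(500, p))
projections = x @ random_frequencies  # shape (500, q)
```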
Notes
The test statistic takes the following form:
\[S_n = n W_n^T \Sigma_n^{-1} W_n\]
As seen in the above formulation, this test statistic takes the same form as the Hotelling \(T^2\) statistic. However, the components are defined differently in this case. Given data sets X and Y, define \(Z_i\), the vector of differences, as:
\[Z_i = (k(X_i, T_1) - k(Y_i, T_1), \ldots, k(X_i, T_J) - k(Y_i, T_J)) \in \mathbb{R}^J\]
This is the vector of differences between kernels evaluated at the test points \(T_j\). The same formulation is used in the Mean Embedding Test. \(W_n\) is then the sample mean of the \(Z_i\):
\[W_n = \frac{1}{n} \sum_{i = 1}^n Z_i\]
This leaves \(\Sigma_n\), the covariance matrix, where \(Z\) is the matrix whose columns are the \(Z_i\):
\[\Sigma_n = \frac{1}{n}ZZ^T\]
In the specific case of the Smooth Characteristic Function test, the vector of differences is defined as:
\[Z_i = (f(X_i)\sin(X_iT_1) - f(Y_i)\sin(Y_iT_1), f(X_i)\cos(X_iT_1) - f(Y_i)\cos(Y_iT_1), \ldots) \in \mathbb{R}^{2J}\]
Once \(S_n\) is calculated, a threshold \(r_{\alpha}\) corresponding to the \(1 - \alpha\) quantile of a chi-squared distribution is chosen, with as many degrees of freedom as \(Z_i\) has components (\(J\) in the general formulation, \(2J\) for the smooth CF differences). The null hypothesis is rejected if \(S_n\) is larger than this threshold.
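The formulas above can be sketched in NumPy as follows. This is a minimal illustration, not hyppo's implementation: the function names are hypothetical, and the Gaussian envelope used for the smoothing function \(f\) is an assumption, since the notes do not define \(f\). The code stores the \(Z_i\) as rows, so \(\frac{1}{n}ZZ^T\) in the column convention becomes `z.T @ z / n` here.

```python
import numpy as np


def gaussian_envelope(a):
    """Hypothetical smoothing function f (an assumption for this sketch)."""
    return np.exp(-0.5 * np.sum(a**2, axis=1, keepdims=True))


def smooth_cf_differences(x, y, test_points, smooth=gaussian_envelope):
    """Return an (n, 2J) matrix whose i-th row is Z_i.

    The sine differences come first, then the cosine differences. The
    statistic does not depend on the order of the components.
    """
    fx, fy = smooth(x), smooth(y)              # (n, 1) each
    xt, yt = x @ test_points, y @ test_points  # (n, J) each
    sin_diff = fx * np.sin(xt) - fy * np.sin(yt)
    cos_diff = fx * np.cos(xt) - fy * np.cos(yt)
    return np.hstack([sin_diff, cos_diff])


def hotelling_type_statistic(z):
    """Compute S_n = n * W_n^T Sigma_n^{-1} W_n from the rows Z_i of z."""
    n = z.shape[0]
    w = z.mean(axis=0)   # W_n
    sigma = z.T @ z / n  # Sigma_n
    return n * (w @ np.linalg.solve(sigma, w))


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
y = rng.normal(size=(500, 3))
test_points = rng.normal(size=(3, 5))  # J = 5 random test points

z = smooth_cf_differences(x, y, test_points)
stat = hotelling_type_statistic(z)
```

Using `np.linalg.solve` avoids forming \(\Sigma_n^{-1}\) explicitly, which is usually the more stable choice when only the quadratic form is needed.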
References
- [1] Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. arXiv:1506.04725 [math, stat], 2015.
Methods Summary
SmoothCFTest.statistic(x, y, random_state) | Calculates the smooth CF test statistic.
SmoothCFTest.test(x, y, random_state=None) | Calculates the smooth CF test statistic and p-value.
SmoothCFTest.statistic(x, y, random_state)
Calculates the smooth CF test statistic.
- Parameters
- Returns
stat (float) -- The computed Smooth CF statistic.
SmoothCFTest.test(x, y, random_state=None)
Calculates the smooth CF test statistic and p-value.
- Parameters
- Returns
Examples
>>> import numpy as np
>>> from hyppo.ksample import SmoothCFTest
>>> np.random.seed(1234)
>>> x = np.random.randn(500, 10)
>>> y = np.random.randn(500, 10)
>>> stat, pvalue = SmoothCFTest().test(x, y, random_state=1234)
>>> '%.2f, %.3f' % (stat, pvalue)
'4.70, 0.910'
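To see how a statistic of 4.70 maps to a p-value of 0.910, the sketch below evaluates the upper tail of a chi-squared distribution by hand. It assumes the degrees of freedom equal the length of the smooth CF difference vector, that is 2 * num_randfreq = 10 for the default num_randfreq=5; this is an assumption, but it reproduces the p-value shown above. The helper `chi2_sf_even_df` is hypothetical and uses the closed-form tail that exists for even degrees of freedom.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper tail P(X > stat) for a chi-squared variable with even df.

    For df = 2k the tail is exp(-s/2) * sum_{i<k} (s/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("the closed form used here needs a positive even df")
    half = stat / 2.0
    terms = (half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


num_randfreq = 5
# Assumption: degrees of freedom = 2 * num_randfreq (sine and cosine parts).
pvalue = chi2_sf_even_df(4.70, 2 * num_randfreq)
print(f"{pvalue:.3f}")  # prints 0.910
```

A p-value this large gives no evidence against the null hypothesis, as expected when both samples are drawn from the same standard normal distribution.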