DiscrimTwoSample
class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)
Two Sample Discriminability test statistic and p-value.
The two sample test measures whether the discriminability of one dataset differs from that of another. More details can be found in [1].
Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,
\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]
Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\).
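The three alternatives differ only in which part of the null distribution counts as "at least as extreme" as the observed difference. The following is a minimal sketch of that idea, assuming the test statistic is the difference \(\hat D_{x_1} - \hat D_{x_2}\) and that a set of null differences is already available; `permutation_pvalue` is a hypothetical helper written for illustration and is not necessarily how hyppo computes its p-value.

```python
import numpy as np


def permutation_pvalue(observed_diff, null_diffs, alt="neq"):
    """Hypothetical helper: p-value for an observed difference D1 - D2.

    null_diffs holds differences simulated under H0 (an assumption of
    this sketch; how they are generated is not covered here).
    """
    null_diffs = np.asarray(null_diffs, dtype=float)
    if alt == "greater":
        # H_A: D_{x_1} > D_{x_2}, so large positive differences are extreme
        extreme = null_diffs >= observed_diff
    elif alt == "less":
        # H_A: D_{x_1} < D_{x_2}, so large negative differences are extreme
        extreme = null_diffs <= observed_diff
    elif alt == "neq":
        # H_A: D_{x_1} != D_{x_2}, so large magnitudes are extreme
        extreme = np.abs(null_diffs) >= abs(observed_diff)
    else:
        raise ValueError("alt must be 'greater', 'less', or 'neq'")
    # Fraction of null differences at least as extreme as the observed one
    return float(extreme.mean())


null = [-0.2, -0.1, 0.0, 0.1, 0.2]
p_greater = permutation_pvalue(0.15, null, alt="greater")
p_less = permutation_pvalue(0.15, null, alt="less")
p_neq = permutation_pvalue(0.15, null, alt="neq")
```

The same observed difference can therefore yield quite different p-values depending on the alternative, which is why the alternative should be chosen before looking at the data.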
Methods Summary
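DiscrimTwoSample.statistic(x, y) | Helper function that calculates the discriminability test statistic.
DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1, random_state=None) | Calculates the test statistic and p-value for a two sample test for discriminability.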
DiscrimTwoSample.statistic(x, y)
Helper function that calculates the discriminability test statistic.
- Parameters
x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
- Returns
stat (float) -- The computed two sample discriminability statistic.
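To make the quantity behind the statistic more concrete, here is a minimal NumPy sketch of a sample discriminability estimate for a single dataset. It assumes Euclidean distances and half credit for ties, and `sample_discriminability` is a hypothetical name used only for illustration; it is not hyppo's implementation, although on the data from the Examples section it gives the same values (0.5 and 1.0).

```python
import numpy as np


def sample_discriminability(x, y):
    """Hypothetical sketch: fraction of comparisons in which a
    same-label distance is smaller than a different-label distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances, shape (n, n)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = []
    for i in range(len(y)):
        same = y == y[i]
        same[i] = False          # do not compare a sample with itself
        other = y != y[i]
        between = dist[i, other]
        if between.size == 0 or not same.any():
            continue             # nothing to compare against
        for d_within in dist[i, same]:
            closer = (between < d_within).sum()
            ties = (between == d_within).sum()
            # Ties count as half (an assumption of this sketch)
            scores.append(1.0 - (closer + 0.5 * ties) / between.size)
    return float(np.mean(scores))


y = np.concatenate([np.zeros(5), np.ones(5)])
x_flat = np.ones((10, 2))                                  # all points identical
x_split = np.concatenate([np.zeros((5, 2)), np.ones((5, 2))])
d_flat = sample_discriminability(x_flat, y)
d_split = sample_discriminability(x_split, y)
```

When every point is identical, all comparisons are ties and the estimate is 0.5; when the two labels are perfectly separated, every within-label distance is smaller than every between-label distance and the estimate is 1.0.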
DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1, random_state=None)
Calculates the test statistic and p-value for a two sample test for discriminability.
- Parameters
x1, x2 (ndarray of float) -- Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x1 and x2 can be distance matrices, where the shapes must both be (n, n), and is_dist must be set to True in this case.
y (ndarray of float) -- A vector containing the sample ids for our n samples. Should be matched to the inputs such that y[i] is the corresponding label for x1[i, :] and x2[i, :].
reps (int, optional (default: 1000)) -- The number of replications used to estimate the null distribution in the permutation test that calculates the p-value.
alt ({"greater", "less", "neq"} (default: "neq")) -- The alternative hypothesis for the test. Can test that the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or unequally discriminable (alt = "neq").
workers (int, optional (default: -1)) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
- Returns
Examples
>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
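The parameter description notes that x1 and x2 may instead be (n, n) distance matrices when is_dist is True. As a rough sketch of how such inputs might be prepared, the block below builds two Euclidean distance matrices with NumPy alone. The helper `euclidean_distance_matrix`, the random data, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def euclidean_distance_matrix(x):
    """Hypothetical helper: (n, p) data -> (n, n) Euclidean distances."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=(n, 3))          # first approach: n samples, p = 3
x2 = rng.normal(size=(n, 5))          # second approach: n samples, q = 5
y = np.repeat(np.arange(n // 2), 2)   # 10 sample ids, 2 measurements each

d1 = euclidean_distance_matrix(x1)
d2 = euclidean_distance_matrix(x2)
# Both matrices are (n, n), symmetric, with zeros on the diagonal,
# and row i of each corresponds to label y[i].
```

Following the parameter description, matrices like `d1` and `d2` would then be passed together with `y` to `DiscrimTwoSample(is_dist=True).test(...)` in place of the raw data.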