Module dstats.tests

Hypothesis testing beyond simple CDFs. All functions work with input ranges with elements implicitly convertible to double unless otherwise noted.

Author

David Simcha

Functions

Name Description
binomialTest(k, n, p) Two-sided binomial test for whether P(success) == p. The one-sided alternatives are covered by dstats.distrib.binomialCDF and binomialCDFR. k is the number of successes observed, n is the number of trials, p is the probability of success under the null.
chiSquareContingency(inputData) Performs a Pearson's chi-square test on a contingency table of arbitrary dimensions. When the chi-square test is mentioned, this is usually the one being referred to. Takes a set of finite forward ranges, one for each column in the contingency table. These can be expressed either as a tuple of ranges or a range of ranges. Returns a P-value for the alternative hypothesis that frequencies in each row of the contingency table depend on the column against the null that they don't.
chiSquareFit(observed, expected, countProp) Performs a one-way Pearson's chi-square goodness of fit test between a range of observed and a range of expected values. This is a useful statistical test for testing whether a set of observations fits a discrete distribution.
chiSquareObs(x, y) Given two vectors of observations of jointly distributed variables x, y, tests the null hypothesis that values in x are independent of the corresponding values in y. This is done using Pearson's Chi-Square Test. For a similar test that assumes the data has already been tabulated into a contingency table, see chiSquareContingency.
correlatedAnova(dataIn) Performs a correlated sample (within-subjects) ANOVA. This is a generalization of the paired T-test to 3 or more treatments. This function accepts data as either a tuple of ranges (1 for each treatment, such that a given index represents the same subject in each range) or similarly as a range of ranges.
dAgostinoK(range) A test for normality of the distribution of a range of values. Based on the assumption that normally distributed values will have a sample skewness and sample kurtosis very close to zero.
falseDiscoveryRate(pVals, dep) Computes the false discovery rate statistic given a list of p-values, according to Benjamini and Hochberg (1995) (independent) or Benjamini and Yekutieli (2001) (dependent). The Dependency parameter controls whether hypotheses are assumed to be independent, or whether the more conservative assumption that they are correlated must be made.
fisherExact(contingencyTable, alt) Fisher's Exact test for difference in odds between rows/columns in a 2x2 contingency table. Specifically, this function tests the odds ratio, which is defined, for a contingency table c, as (c[0][0] * c[1][1]) / (c[1][0] * c[0][1]). Alternatives are Alt.less, meaning true odds ratio < 1, Alt.greater, meaning true odds ratio > 1, and Alt.twoSided, meaning true odds ratio != 1.
fisherExact(contingencyTable, alt) Convenience function. Converts a dynamic array to a static one, then calls the overload.
fishersMethod(pVals) Fisher's method of meta-analyzing a set of P-values to determine whether there are more significant results than would be expected by chance. Based on a chi-square statistic for the sum of the logs of the P-values.
friedmanTest(dataIn) The Friedman test is a non-parametric within-subject ANOVA. It's useful when parametric assumptions cannot be made. Usage is identical to correlatedAnova().
fTest(data) The F-test is a one-way ANOVA extension of the T-test to >2 groups. It's useful when you have 3 or more groups with equal variance and want to test whether their means are equal. Data can be input as either a tuple or a range. This may contain any combination of ranges of numeric types, MeanSD structs and Summary structs.
gTestContingency(inputData) The G or likelihood ratio chi-square test for contingency tables. Roughly the same as Pearson's chi-square test (chiSquareContingency), but may be more accurate in certain situations and less accurate in others.
gTestFit(observed, expected, countProp) The G or likelihood ratio chi-square test for goodness of fit. Roughly the same as Pearson's chi-square test (chiSquareFit), but may be more accurate in certain situations and less accurate in others. However, it is still based on asymptotic distributions, and is not exact. Usage is identical to chiSquareFit.
gTestObs(x, y) Given two ranges of observations of jointly distributed variables x, y, tests the null hypothesis that values in x are independent of the corresponding values in y. This is done using the Likelihood Ratio G test. Usage is similar to chiSquareObs. For an otherwise identical test that assumes the data has already been tabulated into a contingency table, see gTestContingency.
hochberg(pVals) Uses the Hochberg procedure to control the familywise error rate assuming that hypothesis tests are independent. This is more powerful than Holm-Bonferroni correction, but requires the independence assumption.
holmBonferroni(pVals) Uses the Holm-Bonferroni method to adjust a set of P-values in a way that controls the familywise error rate (the probability of making at least one Type I error). This is basically a less conservative version of Bonferroni correction that remains valid under arbitrary dependence between tests. Therefore, there aren't too many good reasons to use regular Bonferroni correction instead.
kendallCorTest(range1, range2, alt, exactThresh) Tests the hypothesis that the Kendall Tau-b between two ranges is different from 0. Alternatives are Alt.less (kendallCor(range1, range2) < 0), Alt.greater (kendallCor(range1, range2) > 0) and Alt.twoSided (kendallCor(range1, range2) != 0).
kruskalWallis(dataIn) The Kruskal-Wallis rank sum test. Tests the null hypothesis that data in each group is not stochastically ordered with respect to data in each of the other groups. This is a one-way non-parametric ANOVA and can be thought of as either a generalization of the Wilcoxon rank sum test to >2 groups or a non-parametric equivalent to the F-test. Data can be input as either a tuple of ranges (one range for each group) or a range of ranges (one element for each group).
ksTest(F, Fprime) Performs a Kolmogorov-Smirnov (K-S) 2-sample test. The K-S test is a non-parametric test for a difference between two empirical distributions or between an empirical distribution and a reference distribution.
ksTest(Femp, F) One-sample Kolmogorov-Smirnov test against a reference distribution. Takes a callable object for the CDF of the reference distribution.
ksTestDestructive(F, Fprime) Same as ksTest, except sorts in place, avoiding memory allocations.
ksTestDestructive(Femp, F) Ditto.
levenesTest(data) Tests the null hypothesis that the variances of all groups are equal against the alternative that heteroscedasticity exists. data must be either a tuple of ranges or a range of ranges. central is an alias for the measure of central tendency to be used. This can be any function that maps a forward range of numeric types to a numeric type. The commonly used ones are median (default) and mean (less robust). Trimmed mean is sometimes useful, but is currently not implemented in dstats.summary.
multinomialTest(countsIn, proportions) The exact multinomial goodness of fit test for whether a set of counts fits a hypothetical distribution. counts is an input range of counts. proportions is an input range of expected proportions. These are normalized automatically, so they can sum to any value.
pairedTTest(before, after, testMean, alt, confLevel) Paired T test. Tests the hypothesis that the mean difference between corresponding elements of before and after is testMean. Alternatives are Alt.less, meaning that the true mean difference (before[i] - after[i]) is less than testMean, Alt.greater, meaning the true mean difference is greater than testMean, and Alt.twoSided, meaning the true mean difference is not equal to testMean.
pairedTTest(diffSummary, testMean, alt, confLevel) Compute a paired T test directly from summary statistics of the differences between corresponding samples.
pearsonCorTest(range1, range2, alt, confLevel) Tests the hypothesis that the Pearson correlation between two ranges is different from 0. Alternatives are Alt.less (pearsonCor(range1, range2) < 0), Alt.greater (pearsonCor(range1, range2) > 0) and Alt.twoSided (pearsonCor(range1, range2) != 0).
pearsonCorTest(cor, N, alt, confLevel) Same as overload, but uses pre-computed correlation coefficient and sample size instead of computing them.
runsTest(obs, alt) Wald-Wolfowitz or runs test for randomness of the distribution of elements for which positive() evaluates to true. For example, given a sequence of coin flips [H,H,H,H,H,T,T,T,T,T] and a positive() function of "a == 'H'", this test would determine that the heads are non-randomly distributed, since they are all at the beginning of obs. This is done by counting the number of runs of consecutive elements for which positive() evaluates to true, and the number of runs of consecutive elements for which it evaluates to false. In the example above, we have 2 runs. These are the block of 5 consecutive heads at the beginning and the 5 consecutive tails at the end.
signTest(before, after, alt) Sign test for differences between paired values. This is a very robust but very low power test. Alternatives are Alt.less, meaning elements of before are typically less than corresponding elements of after, Alt.greater, meaning elements of before are typically greater than elements of after, and Alt.twoSided, meaning that there is a significant difference in either direction.
signTest(data, mu, alt) Similar to the overload, but allows testing for a difference between a range and a fixed value mu.
spearmanCorTest(range1, range2, alt) Tests the hypothesis that the Spearman correlation between two ranges is different from 0. Alternatives are Alt.less (spearmanCor(range1, range2) < 0), Alt.greater (spearmanCor(range1, range2) > 0) and Alt.twoSided (spearmanCor(range1, range2) != 0).
studentsTTest(data, testMean, alt, confLevel) One-sample Student's T-test for difference between mean of data and a fixed value. Alternatives are Alt.less, meaning mean(data) < testMean, Alt.greater, meaning mean(data) > testMean, and Alt.twoSided, meaning mean(data) != testMean.
studentsTTest(sample1, sample2, testMean, alt, confLevel) Two-sample T test for a difference in means. Assumes the variances of the samples are equal. Alternatives are Alt.less, meaning mean(sample1) - mean(sample2) < testMean, Alt.greater, meaning mean(sample1) - mean(sample2) > testMean, and Alt.twoSided, meaning mean(sample1) - mean(sample2) != testMean.
welchAnova(data) Same as fTest, except that this test does not require the assumption of equal variances. In exchange it's slightly less powerful.
welchTTest(sample1, sample2, testMean, alt, confLevel) Two-sample T-test for difference in means. Does not assume variances are equal. Alternatives are Alt.less, meaning mean(sample1) - mean(sample2) < testMean, Alt.greater, meaning mean(sample1) - mean(sample2) > testMean, and Alt.twoSided, meaning mean(sample1) - mean(sample2) != testMean.
wilcoxonRankSum(sample1, sample2, alt, exactThresh) Computes Wilcoxon rank sum test statistic and P-value for a set of observations against another set, using the given alternative. Alt.less means that sample1 is stochastically less than sample2. Alt.greater means sample1 is stochastically greater than sample2. Alt.twoSided means sample1 is stochastically less than or greater than sample2.
wilcoxonSignedRank(before, after, alt, exactThresh) Computes a test statistic and P-value for a Wilcoxon signed rank test against the given alternative. Alt.less means that elements of before are stochastically less than corresponding elements of after. Alt.greater means elements of before are stochastically greater than corresponding elements of after. Alt.twoSided means there is a significant difference in either direction.
wilcoxonSignedRank(data, mu, alt, exactThresh) Same as the overload, but allows testing whether a range is stochastically less than or greater than a fixed value mu rather than paired elements of a second range.
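
The description of fishersMethod above mentions a chi-square statistic built from the sum of the logs of the P-values. As a rough illustration of that idea (not dstats code, and not necessarily how the library computes it), the Python sketch below forms the statistic -2 * sum(ln p_i) and evaluates its upper tail under a chi-square distribution with 2k degrees of freedom. The function name fishers_method_sketch is hypothetical, and the closed-form tail used here only works because the degrees of freedom are even.

```python
import math

def fishers_method_sketch(p_values):
    """Combine independent P-values with Fisher's method (illustrative only)."""
    p_values = list(p_values)
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    # Half of the test statistic: x / 2 = -sum(ln p_i), where x = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in p_values)
    # Upper tail of a chi-square variable with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical P-values from three independent tests.
combined = fishers_method_sketch([0.04, 0.10, 0.20])
```

With a single P-value the combined result equals that P-value, which is a convenient sanity check on the formula.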

Structs

Name Description
ConfInt A plain old data struct for returning the results of hypothesis tests that also produce confidence intervals. Contains, and can implicitly convert to, a TestRes.
GTestRes This struct is a subtype of TestRes and is used to return the results of gTestContingency and gTestObs. Due to the information theoretic interpretation of the G test, it contains an extra field to return the mutual information in bits.
RunsTest Runs test as in runsTest(), except calculates online instead of from stored array elements.
TestRes A plain old data struct for returning the results of hypothesis tests.
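
To make the idea of an online runs calculation concrete, here is a minimal Python sketch that updates the run count one element at a time instead of storing the observations. It is not the RunsTest struct itself: the class name RunsCounterSketch, its put() method, and expected_runs() are hypothetical names chosen for illustration, and only the counting step and the standard expected-runs formula are shown, not a P-value.

```python
class RunsCounterSketch:
    """Counts runs one element at a time (hypothetical helper, not the dstats API)."""

    def __init__(self, positive):
        self.positive = positive  # predicate deciding which elements are "positive"
        self.n_pos = 0
        self.n_neg = 0
        self.runs = 0
        self._last = None  # classification of the previous element, if any

    def put(self, elem):
        is_pos = bool(self.positive(elem))
        if is_pos:
            self.n_pos += 1
        else:
            self.n_neg += 1
        # A new run starts at the first element and whenever the class changes.
        if self._last is None or is_pos != self._last:
            self.runs += 1
        self._last = is_pos

    def expected_runs(self):
        """Expected number of runs under randomness: 2*n1*n2/(n1+n2) + 1."""
        n = self.n_pos + self.n_neg
        return 2.0 * self.n_pos * self.n_neg / n + 1.0


# The coin-flip example from the runsTest description.
counter = RunsCounterSketch(lambda a: a == "H")
for flip in "HHHHHTTTTT":
    counter.put(flip)
print(counter.runs, counter.expected_runs())  # prints: 2 6.0
```

Observing 2 runs where about 6 would be expected is what marks the sequence as non-random in that example.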

Enums

Name Description
Alt Alternative hypotheses. Exact meaning varies with test used.
Dependency For falseDiscoveryRate.
Expected For chiSquareFit and gTestFit. Specifies whether the range of expected values contains counts or proportions.
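
The distinction that Expected draws can be sketched in a few lines of Python. This is an illustration of the Pearson goodness-of-fit statistic only, not dstats code: the function name chi_square_stat_sketch and the boolean flag expected_is_proportion are hypothetical stand-ins, and the sketch assumes that proportions are rescaled to the total observed count, which this page states for multinomialTest but does not confirm for chiSquareFit.

```python
def chi_square_stat_sketch(observed, expected, expected_is_proportion):
    """Pearson chi-square goodness-of-fit statistic (illustrative only)."""
    observed = list(observed)
    expected = list(expected)
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if expected_is_proportion:
        # Assumption: rescale proportions so they sum to the total observed count.
        scale = sum(observed) / sum(expected)
        expected = [e * scale for e in expected]
    # Sum of (observed - expected)^2 / expected over all categories.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The same hypothesis expressed as counts and as proportions.
stat_counts = chi_square_stat_sketch([10, 20, 30], [20, 20, 20], False)
stat_props = chi_square_stat_sketch([10, 20, 30], [1, 1, 1], True)
```

Both calls describe a uniform expectation over three categories, so they yield the same statistic.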

Manifest constants

Name Type Description
isArrayLike
isSummary Tests whether a struct/class has the necessary information for calculating a T-test. It must have the properties .mean (mean), .stdev (standard deviation), .var (variance), and .N (sample size).

Aliases

Name Type Description
chiSqrContingency
chiSqrFit
kcorTest
pcorTest
scorTest