Module dstats.sort
A comprehensive sorting library for statistical functions. Each function takes N arguments, which are arrays or arraylike objects, sorts the first and sorts the rest in lockstep. For merge and insertion sort, if the last argument is a ulong*, increments the dereference of this ulong* by the bubble sort distance between the first argument and the sorted version of the first argument. This is useful for some statistical calculations.
All sorting functions have the precondition that all parallel input arrays must have the same length.
Notes
Comparison functions must be written such that compFun(x, x) == false. For example, "a < b" is good, "a <= b" is not.
These functions are heavily optimized for sorting arrays of ints and floats (by far the most common case when doing statistical calculations). In these cases, they can be several times faster than the equivalent functions in std.algorithm. Since sorting is extremely important for nonparametric statistics, this results in important realworld performance gains. However, it comes at a price in terms of generality:
1. They assume that what they are sorting is cheap to copy via normal assignment.
2. They don't work at all with general ranges, only arrays and maybe ranges very similar to arrays.
3. All tuning and microoptimization is done with ints and floats, not classes, large structs, strings, etc.
Examples
auto foo = [3, 1, 2, 4, 5]. dup;
auto bar = [8, 6, 7, 5, 3]. dup;
qsort(foo, bar);
assert(foo == [1, 2, 3, 4, 5]);
assert(bar == [6, 7, 8, 5, 3]);
auto baz = [1.0, 0, 1, 2, 3]. dup;
mergeSort!("a > b")(bar, foo, baz);
assert(bar == [8, 7, 6, 5, 3]);
assert(foo == [3, 2, 1, 4, 5]);
assert(baz == [1.0, 0, 1, 2, 3]);
Author
David Simcha
Functions
Name  Description 

heapSort

Heap sort. Unstable, O(N log N) time average and worst case, O(1) space, large constant term in time complexity. 
heapSortImpl


insertionSort

Insertion sort. O(N^{2}) time worst, average case, O(1) space, VERY small constant, which is why it's useful for sorting small subarrays in divide and conquer algorithms. If last argument is a ulong*, increments the dereference of this argument by the bubble sort distance between the input array and the sorted version of the input. 
insertionSortImpl


intIsNaN


makeMultiHeap


medianOf3


merge


mergeInPlace


mergeSort

Merge sort. O(N log N) time, O(N) space, small constant. Stable sort. If last argument is a ulong* instead of an arraylike type, the dereference of the ulong* will be incremented by the bubble sort distance between the input array and the sorted version. This is useful in some statistics functions such as Kendall's tau. 
mergeSortImpl


mergeSortInPlace

Inplace merge sort, based on C++ STL's stable_sort(). O(N log^{2} N) time complexity, O(1) space complexity, stable. Much slower than plain old mergeSort(), so only use it if you really need the O(1) space. 
mergeSortInPlaceImpl


mergeSortTemp

Merge sort, allowing caller to provide a temp variable. This allows recycling instead of repeated allocations. If D is data, T is temp, and U is a ulong* for calculating bubble sort distance, this can be called as mergeSortTemp(D, D, D, T, T, T, U) or mergeSortTemp(D, D, D, T, T, T) where each D has a T of corresponding type. 
multiSiftDown


partitionK

Partitions the input data according to compFun, such that position k contains the kth largest/smallest element according to compFun. For all elements e with indices < k, !compFun(data[k], e) is guaranteed to be true. For all elements e with indices > k, !compFun(e, data[k]) is guaranteed to be true. For example, if compFun is "a < b", all elements with indices < k will be <= data[k], and all elements with indices larger than k will be >= k. Reorders any additional input arrays in lockstep. 
partitionKImpl


postProcess


postProcess


prepareForSorting


qsort

Quick sort. Unstable, O(N log N) time average, worst case, O(log N) space, small constant term in time complexity. 
qsortImpl


quickSelect

Returns the kth largest/smallest element (depending on compFun, 0indexed) in the input array in O(N) time. Allocates memory, does not modify input array. 
removeWhitespace


rotateLeft


rotateRight

Classes
Name  Description 

SortException

Structs
Name  Description 

TopN

Given a set of data points entered through the put function, this output range maintains the invariant that the top N according to compFun will be contained in the data structure. Uses a heap internally, O(log N) insertion time. Good for finding the largest/smallest N elements of a very large dataset that cannot be sorted quickly in its entirety, and may not even fit in memory. If less than N datapoints have been entered, all are contained in the structure. 
Templates
Name  Description 

isSimpleComparison

Enum values
Name  Type  Description 

DATA


TEMP

Aliases
Name  Type  Description 

ArrayElemType

T
