Numerical utilities (`pwkit.numutil`)¶

The numpy and scipy packages provide a whole host of routines, but there are still some that are missing. The pwkit.numutil module provides several useful additions.

The functionality in this module can be grouped into these categories:

Making functions that auto-broadcast their arguments

Convenience functions for statistics

Convenience functions for pandas.DataFrame objects

Parallelized versions of simple math algorithms

Tophat and step functions

Making functions that auto-broadcast their arguments¶

@pwkit.numutil.broadcastize(n_arr, ret_spec=0, force_float=True)¶

Wrap a function to automatically broadcast numpy.ndarray arguments.

It’s often desirable to write numerical utility functions in a way that’s compatible with vectorized processing. It can be tedious to do this, however, since the function arguments need to turned into arrays and checked for compatible shape, and scalar values need to be special cased.

The @broadcastize decorator takes care of these matters. The decorated function can be implemented in vectorized form under the assumption that all array arguments have been broadcast to the same shape. The broadcasting of inputs and (potentially) de-vectorizing of the return values are done automatically. For instance, if you decorate a function foo(x,y) with @numutil.broadcastize(2), you can implement it assuming that both x and y are numpy.ndarray objects that have at least one dimension and are both of the same shape. If the function is called with only scalar arguments, x and y will have shape (1,) and the function’s return value will be turned back into a scalar before reaching the caller.

The n_arr argument specifies the number of array arguments that the function takes. These are required to be at the beginning of its argument list.

The ret_spec argument specifies the structure of the function’s return value.

0 indicates that the value has the same shape as the (broadcasted) vector arguments. If the arguments are all scalar, the return value will be scalar too.
1 indicates that the value is an array of higher rank than the input arguments. For instance, if the input has shape (3,), the output might have shape (4,4,3); in general, if the input has shape s, the output will have shape t + s for some tuple t. If the arguments are all scalar, the output will have a shape of just t. The numpy.asarray() function is called on such arguments, so (for instance) you can return a list of arrays [a, b] and it will be converted into a numpy.ndarray.
None indicates that the value is completely independ of the inputs. It is returned as-is.
A tuple t indicates that the return value is also a tuple. The elements of the ret_spec tuple should contain the values listed above, and each element of the return value will be handled accordingly.

The default ret_spec is 0, i.e. the return value is expected to be an array of the same shape as the argument(s).

If force_float is true (the default), the input arrays will be converted to floating-point types if necessary (with numpy.asfarray()) before being passed to the function.

Example:

@numutil.broadcastize (2, ret_spec=(0, 1, None)):
def myfunction (x, y, extra_arg):
    print ('a random non-vector argument is:', extra_arg)
    z = x + y
    z[np.where (y)] *= 2
    higher_vector = [x, y, z]
    return z, higher_vector, 'hello'

Convenience functions for statistics¶

`rms`(x)	Return the square root of the mean of the squares of `x`.
`weighted_mean`(values, uncerts, **kwargs)
`weighted_mean_df`(df, **kwargs)	The same as `weighted_mean()`, except the argument is expected to be a two-column `pandas.DataFrame` whose first column gives the data values and second column gives their uncertainties.
`weighted_variance`(x, weights)	Return the variance of a weighted sample.

pwkit.numutil.rms(x)[source]¶: Return the square root of the mean of the squares of x.

pwkit.numutil.weighted_mean(values, uncerts, **kwargs)[source]¶

pwkit.numutil.weighted_mean_df(df, **kwargs)[source]¶: The same as weighted_mean(), except the argument is expected to be a two-column pandas.DataFrame whose first column gives the data values and second column gives their uncertainties. Returns (weighted_mean, uncertainty_in_mean).

pwkit.numutil.weighted_variance(x, weights)[source]¶

Return the variance of a weighted sample.

The weighted sample mean is calculated and subtracted off, so the returned variance is upweighted by n / (n - 1). If the sample mean is known to be zero, you should just compute np.average(x**2, weights=weights).

Convenience functions for `pandas.DataFrame` objects¶

`reduce_data_frame`(df, chunk_slicers[, …])	“Reduce” a DataFrame by collapsing rows in grouped chunks.
`reduce_data_frame_evenly_with_gaps`(df, …)	“Reduce” a DataFrame by collapsing rows in grouped chunks, grouping based on gaps in one of the columns.
`slice_around_gaps`(values, maxgap)	Given an ordered array of values, generate a set of slices that traverse all of the values.
`slice_evenly_with_gaps`(values, target_len, …)	Given an ordered array of values, generate a set of slices that traverse all of the values.
`dfsmooth`(window, df, ucol[, k])	Smooth a `pandas.DataFrame` according to a window, weighting based on uncertainties.
`smooth_data_frame_with_gaps`(window, df, …)	Smooth a `pandas.DataFrame` according to a window, weighting based on uncertainties, and breaking the smoothing process at gaps in a time axis.
`fits_recarray_to_data_frame`(recarray[, …])	Convert a FITS data table, stored as a Numpy record array, into a Pandas DataFrame object.
`data_frame_to_astropy_table`(dataframe)	This is a backport of the Astropy method `astropy.table.table.Table.from_pandas()`.
`usmooth`(window, uncerts, data, *kwargs)	Smooth data series according to a window, weighting based on uncertainties.
`page_data_frame`(df[, pager_argv])	Render a DataFrame as text and send it to a terminal pager program (e.g.

pwkit.numutil.reduce_data_frame(df, chunk_slicers, avg_cols=(), uavg_cols=(), minmax_cols=(), nchunk_colname='nchunk', uncert_prefix='u', min_points_per_chunk=3)[source]¶

“Reduce” a DataFrame by collapsing rows in grouped chunks. Returns another DataFrame with similar columns but fewer rows.

Arguments:

df: The input pandas.DataFrame.
chunk_slicers: An iterable that returns values that are used to slice df with its pandas.DataFrame.iloc() indexer. An example value might be the generator returned from slice_evenly_with_gaps().
avg_cols: An iterable of names of columns that are to be reduced by taking the mean.
uavg_cols: An iterable of names of columns that are to be reduced by taking a weighted mean.
minmax_cols: An iterable of names of columns that are to be reduced by reporting minimum and maximum values.
nchunk_colname: The name of a column to create reporting the number of rows contributing to each chunk.
uncert_prefix: The column name prefix for locating uncertainty estimates. By default, the uncertainty on the column "temp" is given in the column "utemp".
min_points_per_chunk: Require at least this many rows in each chunk. Smaller chunks are discarded.

Returns a new pandas.DataFrame.

pwkit.numutil.reduce_data_frame_evenly_with_gaps(df, valcol, target_len, maxgap, **kwargs)[source]¶

“Reduce” a DataFrame by collapsing rows in grouped chunks, grouping based on gaps in one of the columns.

This function combines reduce_data_frame() with slice_evenly_with_gaps().

pwkit.numutil.slice_around_gaps(values, maxgap)[source]¶: Given an ordered array of values, generate a set of slices that traverse all of the values. Within each slice, no gap between adjacent values is larger than maxgap. In other words, these slices break the array into chunks separated by gaps of size larger than maxgap.

pwkit.numutil.slice_evenly_with_gaps(values, target_len, maxgap)[source]¶: Given an ordered array of values, generate a set of slices that traverse all of the values. Each slice contains about target_len items. However, no slice contains a gap larger than maxgap, so a slice may contain only a single item (if it is surrounded on both sides by a large gap). If a non-gapped run of values does not divide evenly into target_len, the algorithm errs on the side of making the slices contain more than target_len items, rather than fewer. It also attempts to keep the slice size uniform within each non-gapped run.

pwkit.numutil.dfsmooth(window, df, ucol, k=None)[source]¶

Smooth a pandas.DataFrame according to a window, weighting based on uncertainties.

Arguments are:

window: The smoothing window.
df: The pandas.DataFrame.
ucol: The name of the column in df that contains the uncertainties to weight by.
k = None: If specified, only every k-th point of the results will be kept. If k is None (the default), it is set to window.size, i.e. correlated points will be discarded.

Returns: a smoothed data frame.

The returned data frame has a default integer index.

Example:

sdata = numutil.dfsmooth(np.hamming(7), data, 'u_temp')

pwkit.numutil.fits_recarray_to_data_frame(recarray, drop_nonscalar_ok=True)[source]¶

Convert a FITS data table, stored as a Numpy record array, into a Pandas DataFrame object. By default, non-scalar columns are discarded, but if drop_nonscalar_ok is False then a ValueError is raised. Column names are lower-cased. Example:

from pwkit import io, numutil
hdu_list = io.Path('my-table.fits').read_fits()
# assuming the first FITS extension is a binary table:
df = numutil.fits_recarray_to_data_frame(hdu_list[1].data)

FITS data are big-endian, whereas nowadays almost everything is little-endian. This seems to be an issue for Pandas DataFrames, where df[['col1', 'col2']] triggers an assertion for me if the underlying data are not native-byte-ordered. This function normalizes the read-in data to native endianness to avoid this.

pwkit.numutil.data_frame_to_astropy_table(dataframe)[source]¶: This is a backport of the Astropy method astropy.table.table.Table.from_pandas(). It converts a Pandas pandas.DataFrame object to an Astropy astropy.table.Table.

pwkit.numutil.usmooth(window, uncerts, *data, **kwargs)[source]¶

Smooth data series according to a window, weighting based on uncertainties.

Arguments:

window: The smoothing window.
uncerts: An array of uncertainties used to weight the smoothing.
data: One or more data series, of the same size as uncerts.
k = None: If specified, only every k-th point of the results will be kept. If k is None (the default), it is set to window.size, i.e. correlated points will be discarded.

Returns: (s_uncerts, s_data[0], s_data[1], ...), the smoothed uncertainties and data series.

Example:

u, x, y = numutil.usmooth(np.hamming(7), u, x, y)

pwkit.numutil.page_data_frame(df, pager_argv=['less'], **kwargs)[source]¶

Render a DataFrame as text and send it to a terminal pager program (e.g. less), so that one can browse a full table conveniently.

df: The DataFrame to view
pager_argv: default ['less']: A list of strings passed to subprocess.Popen that launches the pager program
kwargs: Additional keywords are passed to pandas.DataFrame.to_string().

Returns None. Execution blocks until the pager subprocess exits.

Parallelized versions of simple math algorithms¶

`parallel_newton`(func, x0[, fprime, …])	A parallelized version of `scipy.optimize.newton()`.
`parallel_quad`(func, a, b[, par_args, …])	A parallelized version of `scipy.integrate.quad()`.

pwkit.numutil.parallel_newton(func, x0, fprime=None, par_args=(), simple_args=(), tol=1.48e-08, maxiter=50, parallel=True, **kwargs)[source]¶

A parallelized version of scipy.optimize.newton().

Arguments:

func: The function to search for zeros, called as f(x, [*par_args...], [*simple_args...]).
x0: The initial point for the zero search.
fprime: (Optional) The first derivative of func, called the same way.
par_args: Tuple of additional parallelized arguments.
simple_args: Tuple of additional arguments passed identically to every invocation.
tol: The allowable error of the zero value.
maxiter: Maximum number of iterations.
parallel: Controls parallelization; default uses all available cores. See pwkit.parallel.make_parallel_helper().
kwargs: Passed to scipy.optimize.newton().

Returns: an array of locations of zeros.

Finds zeros in parallel. The values x0, tol, maxiter, and the items of par_args should all be numeric, and may be N-dimensional Numpy arrays. They are all broadcast to a common shape, and one zero-finding run is performed for each element in the resulting array. The return value is an array of zero locations having the same shape as the common broadcast of the parameters named above.

The simple_args are passed to each function identically for each integration. They do not need to be Pickle-able.

Example:

>>> parallel_newton(lambda x, a: x - 2 * a, 2,
                    par_args=(np.arange(6),))
<<< array([  0.,   2.,   4.,   6.,   8.,  10.])
>>> parallel_newton(lambda x: np.sin(x), np.arange(6))
<<< array([  0.00000000e+00,   3.65526589e-26,   3.14159265e+00,
             3.14159265e+00,   3.14159265e+00,   6.28318531e+00])

pwkit.numutil.parallel_quad(func, a, b, par_args=(), simple_args=(), parallel=True, **kwargs)[source]¶

A parallelized version of scipy.integrate.quad().

Arguments are:

func: The function to integrate, called as f(x, [*par_args...], [*simple_args...]).
a: The lower limit(s) of integration.
b: The upper limits(s) of integration.
par_args: Tuple of additional parallelized arguments.
simple_args: Tuple of additional arguments passed identically to every invocation.
parallel: Controls parallelization; default uses all available cores. See pwkit.parallel.make_parallel_helper().
kwargs: Passed to scipy.integrate.quad(). Don’t set full_output to True.

Returns: integrals and errors; see below.

Computes many integrals in parallel. The values a, b, and the items of par_args should all be numeric, and may be N-dimensional Numpy arrays. They are all broadcast to a common shape, and one integral is performed for each element in the resulting array. If this common shape is (X,Y,Z), the return value has shape (2,X,Y,Z), where the subarray [0,…] contains the computed integrals and the subarray [1,…] contains the absolute error estimates. If a, b, and the items in par_args are all scalars, the return value has shape (2,).

The simple_args are passed to each integrand function identically for each integration. They do not need to be Pickle-able.

Example:

>>> parallel_quad(lambda x, u, v, q: u * x + v,
                  0, # a
                  [3, 4], # b
                  (np.arange(6).reshape((3,2)), np.arange(3).reshape((3,1))), # par_args
                  ('hello',),)

Computes six integrals and returns an array of shape (2,3,2). The functions that are evaluated are:

[[ 0*x + 0, 1*x + 0 ],
 [ 2*x + 1, 3*x + 1 ],
 [ 4*x + 2, 5*x + 2 ]]

and the bounds of the integrals are:

[[ (0, 3), (0, 4) ],
 [ (0, 3), (0, 4) ],
 [ (0, 3), (0, 4) ]]

In all cases the unused fourth parameter q is 'hello'.

Tophat and step functions¶

`unit_tophat_ee`(x)	Tophat function on the unit interval, left-exclusive and right-exclusive.
`unit_tophat_ei`(x)	Tophat function on the unit interval, left-exclusive and right-inclusive.
`unit_tophat_ie`(x)	Tophat function on the unit interval, left-inclusive and right-exclusive.
`unit_tophat_ii`(x)	Tophat function on the unit interval, left-inclusive and right-inclusive.
`make_tophat_ee`(lower, upper)	Return a ufunc-like tophat function on the defined range, left-exclusive and right-exclusive.
`make_tophat_ei`(lower, upper)	Return a ufunc-like tophat function on the defined range, left-exclusive and right-inclusive.
`make_tophat_ie`(lower, upper)	Return a ufunc-like tophat function on the defined range, left-inclusive and right-exclusive.
`make_tophat_ii`(lower, upper)	Return a ufunc-like tophat function on the defined range, left-inclusive and right-inclusive.
`make_step_lcont`(transition)	Return a ufunc-like step function that is left-continuous.
`make_step_rcont`(transition)	Return a ufunc-like step function that is right-continuous.

pwkit.numutil.unit_tophat_ee(x)[source]¶: Tophat function on the unit interval, left-exclusive and right-exclusive. Returns 1 if 0 < x < 1, 0 otherwise.

pwkit.numutil.unit_tophat_ei(x)[source]¶: Tophat function on the unit interval, left-exclusive and right-inclusive. Returns 1 if 0 < x <= 1, 0 otherwise.

pwkit.numutil.unit_tophat_ie(x)[source]¶: Tophat function on the unit interval, left-inclusive and right-exclusive. Returns 1 if 0 <= x < 1, 0 otherwise.

pwkit.numutil.unit_tophat_ii(x)[source]¶: Tophat function on the unit interval, left-inclusive and right-inclusive. Returns 1 if 0 <= x <= 1, 0 otherwise.

pwkit.numutil.make_tophat_ee(lower, upper)[source]¶: Return a ufunc-like tophat function on the defined range, left-exclusive and right-exclusive. Returns 1 if lower < x < upper, 0 otherwise.

pwkit.numutil.make_tophat_ei(lower, upper)[source]¶: Return a ufunc-like tophat function on the defined range, left-exclusive and right-inclusive. Returns 1 if lower < x <= upper, 0 otherwise.

pwkit.numutil.make_tophat_ie(lower, upper)[source]¶: Return a ufunc-like tophat function on the defined range, left-inclusive and right-exclusive. Returns 1 if lower <= x < upper, 0 otherwise.

pwkit.numutil.make_tophat_ii(lower, upper)[source]¶: Return a ufunc-like tophat function on the defined range, left-inclusive and right-inclusive. Returns 1 if lower < x < upper, 0 otherwise.

pwkit.numutil.make_step_lcont(transition)[source]¶: Return a ufunc-like step function that is left-continuous. Returns 1 if x > transition, 0 otherwise.

pwkit.numutil.make_step_rcont(transition)[source]¶: Return a ufunc-like step function that is right-continuous. Returns 1 if x >= transition, 0 otherwise.

Numerical utilities (pwkit.numutil)¶

Making functions that auto-broadcast their arguments¶

Convenience functions for statistics¶

Convenience functions for pandas.DataFrame objects¶

Parallelized versions of simple math algorithms¶

Tophat and step functions¶

Numerical utilities (`pwkit.numutil`)¶

Convenience functions for `pandas.DataFrame` objects¶