Numerical utilities (pwkit.numutil
)¶
The numpy
and scipy
packages provide a whole host of
routines, but there are still some that are missing. The pwkit.numutil
module provides several useful additions.
The functionality in this module can be grouped into these categories:
Making functions that auto-broadcast their arguments¶
-
@
pwkit.numutil.
broadcastize
(n_arr, ret_spec=0, force_float=True)¶ Wrap a function to automatically broadcast
numpy.ndarray
arguments.It’s often desirable to write numerical utility functions in a way that’s compatible with vectorized processing. It can be tedious to do this, however, since the function arguments need to turned into arrays and checked for compatible shape, and scalar values need to be special cased.
The
@broadcastize
decorator takes care of these matters. The decorated function can be implemented in vectorized form under the assumption that all array arguments have been broadcast to the same shape. The broadcasting of inputs and (potentially) de-vectorizing of the return values are done automatically. For instance, if you decorate a functionfoo(x,y)
with@numutil.broadcastize(2)
, you can implement it assuming that both x and y arenumpy.ndarray
objects that have at least one dimension and are both of the same shape. If the function is called with only scalar arguments, x and y will have shape(1,)
and the function’s return value will be turned back into a scalar before reaching the caller.The n_arr argument specifies the number of array arguments that the function takes. These are required to be at the beginning of its argument list.
The ret_spec argument specifies the structure of the function’s return value.
0
indicates that the value has the same shape as the (broadcasted) vector arguments. If the arguments are all scalar, the return value will be scalar too.1
indicates that the value is an array of higher rank than the input arguments. For instance, if the input has shape(3,)
, the output might have shape(4,4,3)
; in general, if the input has shapes
, the output will have shapet + s
for some tuplet
. If the arguments are all scalar, the output will have a shape of justt
. Thenumpy.asarray()
function is called on such arguments, so (for instance) you can return a list of arrays[a, b]
and it will be converted into anumpy.ndarray
.None
indicates that the value is completely independ of the inputs. It is returned as-is.- A tuple
t
indicates that the return value is also a tuple. The elements of the ret_spec tuple should contain the values listed above, and each element of the return value will be handled accordingly.
The default ret_spec is
0
, i.e. the return value is expected to be an array of the same shape as the argument(s).If force_float is true (the default), the input arrays will be converted to floating-point types if necessary (with
numpy.asfarray()
) before being passed to the function.Example:
@numutil.broadcastize (2, ret_spec=(0, 1, None)): def myfunction (x, y, extra_arg): print ('a random non-vector argument is:', extra_arg) z = x + y z[np.where (y)] *= 2 higher_vector = [x, y, z] return z, higher_vector, 'hello'
Convenience functions for statistics¶
rms (x) |
Return the square root of the mean of the squares of x . |
weighted_mean (values, uncerts, **kwargs) |
|
weighted_mean_df (df, **kwargs) |
The same as weighted_mean() , except the argument is expected to be a two-column pandas.DataFrame whose first column gives the data values and second column gives their uncertainties. |
weighted_variance (x, weights) |
Return the variance of a weighted sample. |
-
pwkit.numutil.
weighted_mean_df
(df, **kwargs)[source]¶ The same as
weighted_mean()
, except the argument is expected to be a two-columnpandas.DataFrame
whose first column gives the data values and second column gives their uncertainties. Returns(weighted_mean, uncertainty_in_mean)
.
-
pwkit.numutil.
weighted_variance
(x, weights)[source]¶ Return the variance of a weighted sample.
The weighted sample mean is calculated and subtracted off, so the returned variance is upweighted by
n / (n - 1)
. If the sample mean is known to be zero, you should just computenp.average(x**2, weights=weights)
.
Convenience functions for pandas.DataFrame
objects¶
reduce_data_frame (df, chunk_slicers[, …]) |
“Reduce” a DataFrame by collapsing rows in grouped chunks. |
reduce_data_frame_evenly_with_gaps (df, …) |
“Reduce” a DataFrame by collapsing rows in grouped chunks, grouping based on gaps in one of the columns. |
slice_around_gaps (values, maxgap) |
Given an ordered array of values, generate a set of slices that traverse all of the values. |
slice_evenly_with_gaps (values, target_len, …) |
Given an ordered array of values, generate a set of slices that traverse all of the values. |
dfsmooth (window, df, ucol[, k]) |
Smooth a pandas.DataFrame according to a window, weighting based on uncertainties. |
smooth_data_frame_with_gaps (window, df, …) |
Smooth a pandas.DataFrame according to a window, weighting based on uncertainties, and breaking the smoothing process at gaps in a time axis. |
fits_recarray_to_data_frame (recarray[, …]) |
Convert a FITS data table, stored as a Numpy record array, into a Pandas DataFrame object. |
data_frame_to_astropy_table (dataframe) |
This is a backport of the Astropy method astropy.table.table.Table.from_pandas() . |
usmooth (window, uncerts, *data, **kwargs) |
Smooth data series according to a window, weighting based on uncertainties. |
page_data_frame (df[, pager_argv]) |
Render a DataFrame as text and send it to a terminal pager program (e.g. |
-
pwkit.numutil.
reduce_data_frame
(df, chunk_slicers, avg_cols=(), uavg_cols=(), minmax_cols=(), nchunk_colname='nchunk', uncert_prefix='u', min_points_per_chunk=3)[source]¶ “Reduce” a DataFrame by collapsing rows in grouped chunks. Returns another DataFrame with similar columns but fewer rows.
Arguments:
- df
- The input
pandas.DataFrame
. - chunk_slicers
- An iterable that returns values that are used to slice df with its
pandas.DataFrame.iloc()
indexer. An example value might be the generator returned fromslice_evenly_with_gaps()
. - avg_cols
- An iterable of names of columns that are to be reduced by taking the mean.
- uavg_cols
- An iterable of names of columns that are to be reduced by taking a weighted mean.
- minmax_cols
- An iterable of names of columns that are to be reduced by reporting minimum and maximum values.
- nchunk_colname
- The name of a column to create reporting the number of rows contributing to each chunk.
- uncert_prefix
- The column name prefix for locating uncertainty estimates. By default, the
uncertainty on the column
"temp"
is given in the column"utemp"
. - min_points_per_chunk
- Require at least this many rows in each chunk. Smaller chunks are discarded.
Returns a new
pandas.DataFrame
.
-
pwkit.numutil.
reduce_data_frame_evenly_with_gaps
(df, valcol, target_len, maxgap, **kwargs)[source]¶ “Reduce” a DataFrame by collapsing rows in grouped chunks, grouping based on gaps in one of the columns.
This function combines
reduce_data_frame()
withslice_evenly_with_gaps()
.
-
pwkit.numutil.
slice_around_gaps
(values, maxgap)[source]¶ Given an ordered array of values, generate a set of slices that traverse all of the values. Within each slice, no gap between adjacent values is larger than maxgap. In other words, these slices break the array into chunks separated by gaps of size larger than maxgap.
-
pwkit.numutil.
slice_evenly_with_gaps
(values, target_len, maxgap)[source]¶ Given an ordered array of values, generate a set of slices that traverse all of the values. Each slice contains about target_len items. However, no slice contains a gap larger than maxgap, so a slice may contain only a single item (if it is surrounded on both sides by a large gap). If a non-gapped run of values does not divide evenly into target_len, the algorithm errs on the side of making the slices contain more than target_len items, rather than fewer. It also attempts to keep the slice size uniform within each non-gapped run.
-
pwkit.numutil.
dfsmooth
(window, df, ucol, k=None)[source]¶ Smooth a
pandas.DataFrame
according to a window, weighting based on uncertainties.Arguments are:
- window
- The smoothing window.
- df
- The
pandas.DataFrame
. - ucol
- The name of the column in df that contains the uncertainties to weight by.
- k = None
- If specified, only every k-th point of the results will be kept. If k
is None (the default), it is set to
window.size
, i.e. correlated points will be discarded.
Returns: a smoothed data frame.
The returned data frame has a default integer index.
Example:
sdata = numutil.dfsmooth(np.hamming(7), data, 'u_temp')
-
pwkit.numutil.
fits_recarray_to_data_frame
(recarray, drop_nonscalar_ok=True)[source]¶ Convert a FITS data table, stored as a Numpy record array, into a Pandas DataFrame object. By default, non-scalar columns are discarded, but if drop_nonscalar_ok is False then a
ValueError
is raised. Column names are lower-cased. Example:from pwkit import io, numutil hdu_list = io.Path('my-table.fits').read_fits() # assuming the first FITS extension is a binary table: df = numutil.fits_recarray_to_data_frame(hdu_list[1].data)
FITS data are big-endian, whereas nowadays almost everything is little-endian. This seems to be an issue for Pandas DataFrames, where
df[['col1', 'col2']]
triggers an assertion for me if the underlying data are not native-byte-ordered. This function normalizes the read-in data to native endianness to avoid this.See also
pwkit.io.Path.read_fits_bintable()
.
-
pwkit.numutil.
data_frame_to_astropy_table
(dataframe)[source]¶ This is a backport of the Astropy method
astropy.table.table.Table.from_pandas()
. It converts a Pandaspandas.DataFrame
object to an Astropyastropy.table.Table
.
-
pwkit.numutil.
usmooth
(window, uncerts, *data, **kwargs)[source]¶ Smooth data series according to a window, weighting based on uncertainties.
Arguments:
- window
- The smoothing window.
- uncerts
- An array of uncertainties used to weight the smoothing.
- data
- One or more data series, of the same size as uncerts.
- k = None
- If specified, only every k-th point of the results will be kept. If k
is None (the default), it is set to
window.size
, i.e. correlated points will be discarded.
Returns:
(s_uncerts, s_data[0], s_data[1], ...)
, the smoothed uncertainties and data series.Example:
u, x, y = numutil.usmooth(np.hamming(7), u, x, y)
-
pwkit.numutil.
page_data_frame
(df, pager_argv=['less'], **kwargs)[source]¶ Render a DataFrame as text and send it to a terminal pager program (e.g. less), so that one can browse a full table conveniently.
- df
- The DataFrame to view
- pager_argv: default
['less']
- A list of strings passed to
subprocess.Popen
that launches the pager program - kwargs
- Additional keywords are passed to
pandas.DataFrame.to_string()
.
Returns
None
. Execution blocks until the pager subprocess exits.
Parallelized versions of simple math algorithms¶
parallel_newton (func, x0[, fprime, …]) |
A parallelized version of scipy.optimize.newton() . |
parallel_quad (func, a, b[, par_args, …]) |
A parallelized version of scipy.integrate.quad() . |
-
pwkit.numutil.
parallel_newton
(func, x0, fprime=None, par_args=(), simple_args=(), tol=1.48e-08, maxiter=50, parallel=True, **kwargs)[source]¶ A parallelized version of
scipy.optimize.newton()
.Arguments:
- func
- The function to search for zeros, called as
f(x, [*par_args...], [*simple_args...])
. - x0
- The initial point for the zero search.
- fprime
- (Optional) The first derivative of func, called the same way.
- par_args
- Tuple of additional parallelized arguments.
- simple_args
- Tuple of additional arguments passed identically to every invocation.
- tol
- The allowable error of the zero value.
- maxiter
- Maximum number of iterations.
- parallel
- Controls parallelization; default uses all available cores. See
pwkit.parallel.make_parallel_helper()
. - kwargs
- Passed to
scipy.optimize.newton()
.
Returns: an array of locations of zeros.
Finds zeros in parallel. The values x0, tol, maxiter, and the items of par_args should all be numeric, and may be N-dimensional Numpy arrays. They are all broadcast to a common shape, and one zero-finding run is performed for each element in the resulting array. The return value is an array of zero locations having the same shape as the common broadcast of the parameters named above.
The simple_args are passed to each function identically for each integration. They do not need to be Pickle-able.
Example:
>>> parallel_newton(lambda x, a: x - 2 * a, 2, par_args=(np.arange(6),)) <<< array([ 0., 2., 4., 6., 8., 10.]) >>> parallel_newton(lambda x: np.sin(x), np.arange(6)) <<< array([ 0.00000000e+00, 3.65526589e-26, 3.14159265e+00, 3.14159265e+00, 3.14159265e+00, 6.28318531e+00])
-
pwkit.numutil.
parallel_quad
(func, a, b, par_args=(), simple_args=(), parallel=True, **kwargs)[source]¶ A parallelized version of
scipy.integrate.quad()
.Arguments are:
- func
- The function to integrate, called as
f(x, [*par_args...], [*simple_args...])
. - a
- The lower limit(s) of integration.
- b
- The upper limits(s) of integration.
- par_args
- Tuple of additional parallelized arguments.
- simple_args
- Tuple of additional arguments passed identically to every invocation.
- parallel
- Controls parallelization; default uses all available cores. See
pwkit.parallel.make_parallel_helper()
. - kwargs
- Passed to
scipy.integrate.quad()
. Don’t set full_output to True.
Returns: integrals and errors; see below.
Computes many integrals in parallel. The values a, b, and the items of par_args should all be numeric, and may be N-dimensional Numpy arrays. They are all broadcast to a common shape, and one integral is performed for each element in the resulting array. If this common shape is (X,Y,Z), the return value has shape (2,X,Y,Z), where the subarray [0,…] contains the computed integrals and the subarray [1,…] contains the absolute error estimates. If a, b, and the items in par_args are all scalars, the return value has shape (2,).
The simple_args are passed to each integrand function identically for each integration. They do not need to be Pickle-able.
Example:
>>> parallel_quad(lambda x, u, v, q: u * x + v, 0, # a [3, 4], # b (np.arange(6).reshape((3,2)), np.arange(3).reshape((3,1))), # par_args ('hello',),)
Computes six integrals and returns an array of shape
(2,3,2)
. The functions that are evaluated are:[[ 0*x + 0, 1*x + 0 ], [ 2*x + 1, 3*x + 1 ], [ 4*x + 2, 5*x + 2 ]]
and the bounds of the integrals are:
[[ (0, 3), (0, 4) ], [ (0, 3), (0, 4) ], [ (0, 3), (0, 4) ]]
In all cases the unused fourth parameter q is
'hello'
.
Tophat and step functions¶
unit_tophat_ee (x) |
Tophat function on the unit interval, left-exclusive and right-exclusive. |
unit_tophat_ei (x) |
Tophat function on the unit interval, left-exclusive and right-inclusive. |
unit_tophat_ie (x) |
Tophat function on the unit interval, left-inclusive and right-exclusive. |
unit_tophat_ii (x) |
Tophat function on the unit interval, left-inclusive and right-inclusive. |
make_tophat_ee (lower, upper) |
Return a ufunc-like tophat function on the defined range, left-exclusive and right-exclusive. |
make_tophat_ei (lower, upper) |
Return a ufunc-like tophat function on the defined range, left-exclusive and right-inclusive. |
make_tophat_ie (lower, upper) |
Return a ufunc-like tophat function on the defined range, left-inclusive and right-exclusive. |
make_tophat_ii (lower, upper) |
Return a ufunc-like tophat function on the defined range, left-inclusive and right-inclusive. |
make_step_lcont (transition) |
Return a ufunc-like step function that is left-continuous. |
make_step_rcont (transition) |
Return a ufunc-like step function that is right-continuous. |
-
pwkit.numutil.
unit_tophat_ee
(x)[source]¶ Tophat function on the unit interval, left-exclusive and right-exclusive. Returns 1 if 0 < x < 1, 0 otherwise.
-
pwkit.numutil.
unit_tophat_ei
(x)[source]¶ Tophat function on the unit interval, left-exclusive and right-inclusive. Returns 1 if 0 < x <= 1, 0 otherwise.
-
pwkit.numutil.
unit_tophat_ie
(x)[source]¶ Tophat function on the unit interval, left-inclusive and right-exclusive. Returns 1 if 0 <= x < 1, 0 otherwise.
-
pwkit.numutil.
unit_tophat_ii
(x)[source]¶ Tophat function on the unit interval, left-inclusive and right-inclusive. Returns 1 if 0 <= x <= 1, 0 otherwise.
-
pwkit.numutil.
make_tophat_ee
(lower, upper)[source]¶ Return a ufunc-like tophat function on the defined range, left-exclusive and right-exclusive. Returns 1 if lower < x < upper, 0 otherwise.
-
pwkit.numutil.
make_tophat_ei
(lower, upper)[source]¶ Return a ufunc-like tophat function on the defined range, left-exclusive and right-inclusive. Returns 1 if lower < x <= upper, 0 otherwise.
-
pwkit.numutil.
make_tophat_ie
(lower, upper)[source]¶ Return a ufunc-like tophat function on the defined range, left-inclusive and right-exclusive. Returns 1 if lower <= x < upper, 0 otherwise.
-
pwkit.numutil.
make_tophat_ii
(lower, upper)[source]¶ Return a ufunc-like tophat function on the defined range, left-inclusive and right-inclusive. Returns 1 if lower < x < upper, 0 otherwise.