Title: | Approximate Summation of Series |
---|---|
Description: | Application of theoretical results which ensure that the summation of an infinite discrete series is within an arbitrary margin of error of its true value. The C code under the hood is shared through header files to allow users to sum their own low level functions as well. Based on the paper by Braden (1992) <doi: 10.2307/2324995>. |
Authors: | Guido A. Moreira [cre, aut] , Luiz Max Carvalho [ctb] |
Maintainer: | Guido A. Moreira <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.14 |
Built: | 2024-11-13 20:23:51 UTC |
Source: | https://github.com/guidoamoreira/sumr |
A discrete finite series is calculated exactly with no need for approximations. This can also be used for infinite series approximation with a pre-determined number of iterations, but this has no guarantee of quality of approximation. Result is returned in the log scale.
finiteSum(logFunction, parameters = numeric(), n, n0 = 0)
finiteSum(logFunction, parameters = numeric(), n, n0 = 0)
logFunction |
The function that returns the series value
an in
the log scale. Can either be an |
parameters |
A numeric vector with parameters used in logFunction.
Vectorized summation over various parameter values sets is not implemented.
Use |
n |
A single integer positive number indicating the number of iterations to perform in the function. |
n0 |
The sum will be performed for the series starting at this value. |
A summed-objects()
object. Note that the sum is returned in the
log
scale.
precompiled()
provides a list with precompiled functions
that can be used for the summation.
# Sum values from 5 to 100 finiteSum(function(x, p) log(x), numeric(), 100 - 5, 5)
# Sum values from 5 to 100 finiteSum(function(x, p) log(x), numeric(), 100 - 5, 5)
For series that pass the ratio test, the approximation is analytically guaranteed to have an error that is smaller than epsilon. This can occasionally not happen due to floating point arithmetic. Result is returned in the log scale.
infiniteSum( logFunction, parameters = numeric(), logL = NULL, alternate = FALSE, epsilon = 1e-15, maxIter = 1e+05, n0 = 0, forceAlgorithm = 0 )
infiniteSum( logFunction, parameters = numeric(), logL = NULL, alternate = FALSE, epsilon = 1e-15, maxIter = 1e+05, n0 = 0, forceAlgorithm = 0 )
logFunction |
The function that returns the series absolute value
|an| in
the log scale. If it is an alternating series, this is defined in
argument |
parameters |
A numeric vector with parameters used in |
logL |
The log of the limit value of
an+1/an which
must be smaller than 1, or smaller than 0 in the log scale. Ignored if the
series is alternating, defined with argument |
alternate |
Either -1, 0 or 1. If 0 (or |
epsilon |
The desired error margin for the approximation. See 'details'. |
maxIter |
The maximum number of iterations for the approximation. In most cases, this number will not be reached unless it is very small. A value too high is not recommended as an array of this size is reserved in memory during the algorithm. |
n0 |
The sum will be approximated for the series starting at this value
for the first parameter of |
forceAlgorithm |
A value to control which summation algorithm to use.
Ignored if the series is alternating, defined with argument |
The approximated sum is based on some theoretical results which,
analytically, guarantee that the approximation will be within epsilon
distance to the true value. It is possible that the numerical result fails
to fall in this distance due to floating point arithmetic. The C
code
under the hood is being continuously reviewed to minimize this problem. They
seem to occur more often when the series decays very fast to zero or when the
total is a large number.
For these theoretical results to work, the series must pass the ratio test,
which means that the ratio
an+1/an must
converge to a number when
goes to infinity. The log of
should be provided to the function for a better approximation.
This is not necessary in case a precompiled function is used. In this case
the value of
is coded into the package.
Another requirement in the current installment of this function is that the series must have only a single maximum. This is the case for most discrete probability distributions and marginalization problems. This limitation will be addressed in the future.
There are currently two implemented algorithms that perform the calculations.
The first, called Sum-To-Threshold, sums the series values until the series
values are smaller than epsilon
. This is the fastest algorithm, but
it is only guaranteed to provide an approximation within the desired error
margin when .
The second algorithm, called Error bounding pairs is based on a more general
result which works for any . This algorithm sums the series
until
Then the approximation is the added values of the sum plus
0.5 (an+1/(1-L) + an+1 an/(an - an+1))The Error bounding pairs method usually requires less function evaluations
than the Sum-To-Threshold one, however the convergence checking is more
demanding, which means that it is typically slower, albeit slightly. If
, the convergence checking can be reduced and the Error bounding
pairs becomes almost as fast as the Sum-To-Threshold method.
The third algorithm is called batches method and is used when is
left at
NULL
. Its use requires some fine tuning, so there is a
standalone function for it called infiniteSum_batches()
. Its use
and functionality can be seen in its own documentation. When called as a
result of this function, default settings are used.
The forceAlgorithm
parameter can be used to
control which algorithm to use. When it is 0, the program automatically
selects the Sum-to-threshold when and the Error bounding pairs
when
. Method batches is selected when
is left
NULL
. If forceAlgorithm
is set to 1, the Sum-To-Threshold
algorithm is forced. If it is 2, then the Error bounding pairs is forced. A
small note, the Error bounding pairs algorithm can go up to maxIter
+
1 function evaluations. This is due to its convergence checking dependence on
an+1. Finally, if the
parameter is set as 3, the batches algorithm is used with default settings.
If the series is alternating, the Sum-To-Threshold convergence condition on the series absolute value guarantees the result, regardless of the ratio limit.
The function to be summed can be an R function or a string naming the
precompiled function in the package. The list of precompiled functions can
be found in precompiled()
, and more functions will be added in
time. As is intuitive, using a precompiled function is much faster than using
an R
function. In fact, it has been observed to be dozens times faster.
The advanced user can program their own precompiled functions and use the package's summation algorithms by linking the appropriate header file. See the GitHub readme for the a quick tutorial.
A summed-objects()
object. Note that the sum is returned in the
log
scale.
precompiled()
provides a list with precompiled functions
that can be used for the summation. infiniteSum_batches()
is
an alternate method which does not require knowledge of the logL
argument.
## Define some function that is known to pass the ratio test. param <- 0.1 funfun <- function(k, p) return(k * log1p(-p[1])) result <- infiniteSum(funfun, parameters = param, logL = log1p(-param)) ## This series is easy to verify analytically TrueSum <- -log(param) TrueSum - result$sum # Since exp(logL) = 0.9, the Error bounding pairs # algorithm is used. Notice that it only required # 2 function evaluations for the approximation, that is result$n ## A common problem is finding the normalizing constant for the ## Conway-Maxwell-Poisson distribution. It has already been included ## in the precompiled list of functions. comp_params <- c(lambda = 5, nu = 3) result <- infiniteSum("COMP", comp_params) result
## Define some function that is known to pass the ratio test. param <- 0.1 funfun <- function(k, p) return(k * log1p(-p[1])) result <- infiniteSum(funfun, parameters = param, logL = log1p(-param)) ## This series is easy to verify analytically TrueSum <- -log(param) TrueSum - result$sum # Since exp(logL) = 0.9, the Error bounding pairs # algorithm is used. Notice that it only required # 2 function evaluations for the approximation, that is result$n ## A common problem is finding the normalizing constant for the ## Conway-Maxwell-Poisson distribution. It has already been included ## in the precompiled list of functions. comp_params <- c(lambda = 5, nu = 3) result <- infiniteSum("COMP", comp_params) result
A simple method to perform the summation. It adds the values in batches and
stops when the accumulated batch is smaller than the desired threshold. There
is an implementation purely in R
and one in C
. The one in R is
usually slightly faster due to vectorized computing. Result is returned
in the log scale.
infiniteSum_batches( logFunction, parameters = numeric(), batch_size = 40, epsilon = 1e-15, maxIter = 1e+05, n0 = 0 ) infiniteSum_batches_C( logFunction, parameters = numeric(), batch_size = 40, epsilon = 1e-15, maxIter = 1e+05, n0 = 0 )
infiniteSum_batches( logFunction, parameters = numeric(), batch_size = 40, epsilon = 1e-15, maxIter = 1e+05, n0 = 0 ) infiniteSum_batches_C( logFunction, parameters = numeric(), batch_size = 40, epsilon = 1e-15, maxIter = 1e+05, n0 = 0 )
logFunction |
The function that returns the series value
an in
the log scale. Can either be an |
parameters |
A numeric vector with parameters used in logFunction.
Vectorized summation over various parameter values sets is not implemented.
Use |
batch_size |
The batch size at which point convergence checking is performed. The algorithm perform at least twice this number of function evaluations. See 'details'. |
epsilon |
The desired error margin for the approximation. See 'details'. |
maxIter |
The maximum number of iterations for the approximation. In most cases, this number will not be reached unless it is very small. |
n0 |
The sum will be approximated for the series starting at this value. |
The series an must pass
the ratio convergence test, meaning that the ratio
an+1/an must
converge to a number when
goes to infinity.
The batches algorithm consists of evaluating the function a fixed number of
times for two checkpoints. If the difference between the sum at these
checkpoints is smaller than epsilon
, the code stops and the later
checkpoint sum is returned. Else, continue summing until the next checkpoint.
All checkpoints are batch_size
long.
This function's efficiency is reliant on the choice of batch_size
.
If it is set too large, the algorithm overshoots the necessary
number of function evaluations too much. If it is set too small, the
algorithm will need to process too many partial summations which slows it
down. However, if they are well calibrated for the series, they can
potentially be very efficient.
Since the batch sizes are known before the calculations are made,
function evaluations can be vectorized. This is why there are two functions
available. infiniteSum_batches
does the calculations at the R
level, while infiniteSum_batches_C
interfaces the low level C
code. However, the C
code does not use vectorization since it isn't
available on long double precision type, and therefore the R
level
function should be faster in most cases.
Another difference is that the low level code uses double precision for the calculations. This means that it is less prone to rounding errors. But this also means that the two functions can sometimes require a different number of iterations and function evaluations to reach the stop criteria. This is shown in the examples.
Another requirement in the current installment of this function is that the series must have only a single maximum. This is the case for most discrete probability distributions and marginalization problems. This limitation will be addressed in the future.
A summed-objects()
object. Note that the sum is returned in the
log
scale.
precompiled()
provides a list with precompiled functions
that can be used for the summation. infiniteSum()
is a more
efficient algorithm.
## Define some function that is known to pass the ratio test. param = 0.1 funfun <- function(k, p) return(k * log1p(-p[1])) result <- infiniteSum_batches(funfun, parameters = param) ## This series is easy to verify analytically TrueSum = -log(param) TrueSum - result$sum # Notice that it required 400 function evaluations for the approximation. result$n # If we use the C function, it reaches a lower error, but requires more # iterations result_C <- infiniteSum_batches_C(funfun, parameters = param) TrueSum - result_C$sum result_C$n ## A common problem is finding the normalizing constant for the ## Conway-Maxwell-Poisson distribution. It has already been included ## in the precompiled list of functions. comp_params = c(lambda = 5, nu = 3) result <- infiniteSum_batches("COMP", comp_params) # With a specifically chosen argument value, the summation can be done with # fewer iterations. But it is usually hard to know the ideal choice for # applications beforehand result$n infiniteSum_batches("COMP", comp_params, batch_size = 11)$n # A small batch_size ensures a small number of iterations, but slows the # method due to multiple checking. infiniteSum_batches("COMP", comp_params, batch_size = 2)$n
## Define some function that is known to pass the ratio test. param = 0.1 funfun <- function(k, p) return(k * log1p(-p[1])) result <- infiniteSum_batches(funfun, parameters = param) ## This series is easy to verify analytically TrueSum = -log(param) TrueSum - result$sum # Notice that it required 400 function evaluations for the approximation. result$n # If we use the C function, it reaches a lower error, but requires more # iterations result_C <- infiniteSum_batches_C(funfun, parameters = param) TrueSum - result_C$sum result_C$n ## A common problem is finding the normalizing constant for the ## Conway-Maxwell-Poisson distribution. It has already been included ## in the precompiled list of functions. comp_params = c(lambda = 5, nu = 3) result <- infiniteSum_batches("COMP", comp_params) # With a specifically chosen argument value, the summation can be done with # fewer iterations. But it is usually hard to know the ideal choice for # applications beforehand result$n infiniteSum_batches("COMP", comp_params, batch_size = 11)$n # A small batch_size ensures a small number of iterations, but slows the # method due to multiple checking. infiniteSum_batches("COMP", comp_params, batch_size = 2)$n
More functions are periodically added to this list for convenience and speed.
These functions are all evaluated in the log scale and pass the ratio test,
that is, the limit of
an+1/an as n
goes to infinity is a value . The value of
is
indicated in each entry. It is calculated automatically when the precompiled
functions are used in the summation.
This series is the kernel of the Conway-Maxwell-Poisson distribution, which generalizes the Poisson and Geometric distributions. Its form is
for and
.
When , this series reduces to the Poisson distribution kernel
and the sum (in the log scale) is known to be
. When
and
, the series reduces to the Geometric
distribution kernel with parameter
. The series is known to
sum to
. Finally, as
goes to
the distribution
approaches a Bernoulli distribution with parameter
.
Another known result is when , in which case the sum is the
modified Bessel function of the first kind of order 0 evaluated at
.
String to access the precompiled function: "COMP"
.
parameter vector: c(lambda, nu)
.
This series is the kernel of the double Poisson distribution, which is a special case of the double exponential family, which extends it. Its form is
CC01C8fC38BWvCn8UdxTAcaF5lGqI0tJ
for and
.
When , this series reduces to the Poisson distribution kernel
and the sum (in the log scale) is known to be 0.
String to acccess the precompiled function: "double_poisson"
.
parameter vector: c(mu, phi)
This is the series form solution for the function. There are more time efficient methods for its evaluation however they don't guarantee good approximations with large parameters. Its form is
mbz8Y164527PHVYWJJsmiXtGrdUMXilG-1
for and
any real value.
The modified Bessel function of the second kind can be obtained with
where represents the modified function of the first kind and
of the second kind. It is worth remembering the
infiniteSum()
function returns the sum in the log scale, which must be adjusted for the
formula above.
String to access the precompiled function: "bessel_I"
parameter vector: c(x, alpha)
This is the same function as the one above, except that parameter is
given in the log scale. This is provided for numerical stability. For the
cases where
x
is not very large, sums using this function and the
above should return the same sum.
String to access the precompiled function: "bessel_I_logX"
parameter vector: c(logx, alpha)
In some cases, the sum of the series is known in closed form for some values of the parameters. The package function does not check for these cases and just performs the approximation. If the exact value is desired by the user when it is known, they must take responsibility for checking and providing these values.
Another important thing to note is that the precompiled functions perform all calculations with twice the numerical precision than R. Therefore, in some cases, there might be very small differences in the sum when comparing the results of the function using the precompiled function and the same function defined at the R level.
infiniteSum()
, finiteSum()
and
infiniteSum_batches()
Contains the summations in the log scale. The value can either be an approximation to an infinite series or a finite sum.
## S3 method for class 'summed' print(x, ...) ## S3 method for class 'summed' as.double(x, ...)
## S3 method for class 'summed' print(x, ...) ## S3 method for class 'summed' as.double(x, ...)
x |
The |
... |
Currently unused. |
For print
: The invisible object.
For as.numeric
/as.double
: The approximated sum.
sum
The resulting sum in the log scale.
n
The performed number of iterations. This value represents the number of series elements evaluations performed during the summation.
method
The method used for the summation.
maxReached
TRUE
or FALSE
. Indicates whether the
maximum iterations was reached.
infiniteSum()
, infiniteSum_batches()
and
finiteSum()
for available methods.