Discussion:
Nested functions in numlib
(too old to reply)
Werner Pamler
2017-04-03 16:32:25 UTC
Permalink
Raw Message
In addition to the incomplete gamma function I am planning to add a
series of other special functions useful for statistical calculations,
among them the incomplete beta function and its inverse.

Calculation of the inverse incomplete beta function usually is performed
in the literature by numerical root finding procedures. Numlib is
well-equipped for this purpose, it has the unit "roo" with the procedure
"roof1r" which does a bisection search of the root lying between
positive and negative guess values. The function for which the zero
point is to be found must be specified as a procedure variable of type
rfunc1r (defined in "typ", probably meaning "real function of one real
variable"); in case of the inverse incomplete beta function, this
function is the complete beta function, of course, and this depends on
three parameters, a, b, and x.

In the current state of numlib, parameters must be declared globally so
that the function passed as a parameter gets access to them:

var
_a, _b, _y: ArbFloat;

function _betai(x: ArbFloat): ArbFloat;
begin
Result := betai(_a, _b, x) - _y;
end;

function invbetai(a, b, y: ArbFloat): ArbFloat;
const
EPS = 1e-7;
var
term: ArbInt = 0;
begin
_a := a;
_b := b;
_y := y;
roof1r(@_betai, 0, 1, EPS, EPS, Result, term);
end;

This is not very nice. It would be better if the test function could be
declared as a nested function inside "invbetai":

function invbetai(a, b, y: ArbFloat): ArbFloat;

function _invbetai(x: ArbFloat): ArbFloat;
begin
Result := betai(a, b, x) - y;
end;

const
EPS = 1e-7;
var
term: ArbInt = 0;
begin
roof1r(@_betai(0, 1, EPS, EPS, Result, term);
end;

In order to make this construction pass the compiler I would like to

- declare another rfuncf1r type, now as "is nested"
type
rfuncf1rn = function(x: ArbFloat): ArbFloat is nested; //
trailing "n" means "nested"

- modify roof1r to use the nested function
Procedure roof1r(f: rfunc1rn; a, b, ae, re: ArbFloat; Var x:
ArbFloat; var term: ArbInt); overload;

- to avoid breaking existing code and to avoid forcing users of the
original version to add a {$modeswitch nestedprocvars} to their code, I
want to overload this function with a version using the non-nested
function, but calling the nested version:

Procedure roof1r(f: rfunc1r; a, b, ae, re: ArbFloat; Var x: ArbFloat;
Var term: ArbInt); overload;

function _f(x: ArbFloat): ArbFloat;
begin
Result := f(x);
end;

begin
roof1r(@_f, a, b, ae, re, x, term);
end;

Is there a chance that such a patch would be accepted?

_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo
Marco van de Voort
2017-04-04 01:28:55 UTC
Permalink
Raw Message
Post by Werner Pamler
Is there a chance that such a patch would be accepted?
Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/f
Werner Pamler
2017-04-04 15:33:41 UTC
Permalink
Raw Message
Post by Marco van de Voort
Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?
Right, I should have done that before asking...

Here are the results of a test running the original roof1r routine (A),
the modified one using the nested function (B) and other modified one
using a non-nested function but calling the version with the nested
function (C). In each case, several functions are passed to the root
finder which is called 5 million times, each call with a (reproducibly)
different parameter:

f(x) = x
(A) ORIGINAL version: 0.656s for 5000000
runs (check: y = 0.00000000)
(B) NESTED version: 0.703s for 5000000
runs (7%)
(C) Global function calling nested function: 0.735s for 5000000
runs (12%)

f(x) = x^2
ORIGINAL version: 6.296s for 5000000
runs (check: y = 0.00000000)
NESTED version: 6.313s for 5000000
runs (0%)
Global function calling nested function: 6.546s for 5000000
runs (4%)

f(x) = exp(x)
ORIGINAL version: 6.734s for 5000000
runs (check: y = 0.00000000)
NESTED version: 6.703s for 5000000
runs (0%)
Global function calling nested function: 6.890s for 5000000
runs (2%)

f(x) = arcsin(x)
ORIGINAL version: 5.718s for 5000000
runs (check: y = 0.00000000)
NESTED version: 5.718s for 5000000
runs (0%)
Global function calling nested function: 5.937s for 5000000
runs (4%)

f(x) = erf(x)
ORIGINAL version: 6.391s for 5000000
runs (check: y = 0.00000000)
NESTED version: 6.422s for 5000000
runs (0%)
Global function calling nested function: 6.673s for 5000000
runs (4%)

f(x) = gammaLn(x)
ORIGINAL version: 15.260s for 5000000
runs (check: y = 0.00000000)
NESTED version: 15.142s for 5000000
runs (-1%)
Global function calling nested function: 15.426s for 5000000
runs (1%)

I would interpret these results such that there are no dramatic
slow-downs due to calling variant C. Variant B (nested funtion) is
roughly the same speed as the original procedure.
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/list
Marco van de Voort
2017-04-04 06:59:17 UTC
Permalink
Raw Message
Post by Marco van de Voort
Post by Werner Pamler
Is there a chance that such a patch would be accepted?
Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?
(no it can't be helped with pointer work, it is a loop around a function
call, not a memory access, don't reply in the middle of the
night Marco)
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://l
Michael Schnell
2017-04-04 08:10:50 UTC
Permalink
Raw Message
Performance-wise, moreover for some tasks (such as Matrix
multiplication) with modern multi-core machines parallel calculation
could increase performance greatly. This can be done e.g. by using a
thread pool. (I once did a thread pool implementation based on TThread,
but I suppose there are more "official" sources).

-Michael
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepasca

Loading...