[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

J. Gareth Moreton

2018-11-17 02:52:28 UTC

At the moment, I'm experimenting with overhauling the x86_64 optimizer to
see if I can reduce the number of passes through a block of code - my hope
is to greatly increase the speed of the compiler without sacrificing the
optimisations performed under -O1 and -O2. At present, I've tried not to
modify i386 because I wish to use it as a control case (i.e. do my
changes break other platforms?)

It's probably not worthy of the bounty, but I'm enjoying the challenge of
seeing if I can improve the overall speed in places.
Gareth aka. Kit

With some compiler tuning and a few tricks (two changes to the code
and hand-simulated peephole optimizations, but I

You can improve performance further by devirtualising all method calls
using wpo. First compile it with -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS and
next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it
gives a small boost, and makes the results also more stable).
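The two-step wpo build described above can be sketched as a small script. This is only an illustration: the source file name is an assumption (the thread only shows the binary name), and the function just prints the two compiler invocations so they can be checked before running them.

```shell
#!/bin/sh
# Assumption: the main program file name is not given in the thread,
# so this default is hypothetical. Override it with SRC=...
SRC="${SRC:-vipribenchmemcache_nodeps.pas}"
FEEDBACK="vipri.wpo"            # feedback file name as used in the thread
WPO_OPTS="DEVIRTCALLS,OPTVMTS"  # optimizations named in the thread

# Print the two compiler invocations, one per line.
wpo_commands() {
  # Pass 1: -FW writes the feedback file, -OW collects info for WPO_OPTS
  echo "fpc -FW$FEEDBACK -OW$WPO_OPTS $SRC"
  # Pass 2: -Fw reads the feedback file, -Ow applies WPO_OPTS
  echo "fpc -Fw$FEEDBACK -Ow$WPO_OPTS $SRC"
}

wpo_commands
```

Piping the output to sh (wpo_commands | sh) would run both passes in order, assuming fpc is on the PATH and any other options you normally use are added to both lines.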

Since I only have a preliminary llvm version (with Dwarf EH) running on
macOS, I can't provide a direct Kylix comparison. The versions below are
both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way
off.

$ time ./vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4

.................................................................................................

Time: 5016ms = 9669059 pkts/s = 14680 MB/s
real    0m5.137s
user    0m5.042s
sys     0m0.017s

FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm
IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no

$ time ./vipribenchmemcache_nodeps_llvm
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4

.................................................................................................................

Time: 5018ms = 11259466 pkts/s = 17094 MB/s
real    0m5.161s
user    0m5.060s
sys     0m0.017s
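
To put the two results side by side, the relative gain can be worked out from the pkts/s figures printed by each run. A minimal sketch, assuming awk is available; the variable and function names are made up for illustration, and the numbers are copied from the two outputs above.

```shell
#!/bin/sh
# Throughput in pkts/s, copied from the two benchmark outputs.
FIRST_RUN_PKTS=9669059    # first run (./vipribenchmemcache_nodeps)
LLVM_RUN_PKTS=11259466    # second run (./vipribenchmemcache_nodeps_llvm)

# Percentage gain of "new" over "base", rounded to one decimal place.
speedup_pct() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

speedup_pct "$FIRST_RUN_PKTS" "$LLVM_RUN_PKTS"   # prints 16.4
```

The MB/s figures (14680 and 17094) give the same ratio, so the second binary moves roughly 16% more data in the same 5 second window on this machine.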

compiler/nmem.pas | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/compiler/nmem.pas b/compiler/nmem.pas
index d5c1d85e8f..52add1fd81 100644
--- a/compiler/nmem.pas
+++ b/compiler/nmem.pas
@@ -1176,7 +1176,7 @@ implementation
       begin
         include(flags,nf_write);
         { see comment in tsubscriptnode.mark_write }
-        if not(is_implicit_pointer_object_type(left.resultdef)) then
+        if not(is_implicit_array_pointer(left.resultdef)) then
           left.mark_write;
       end;
?

Hmmm, it needs a few more of my changes to make it work, though it should
work if used only with the benchmark.

_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel