Got some new timing metrics for compiling Lazarus under Windows. I haven't
got them for Linux because I only have it on a virtual machine, which
significantly skews the performance:
----
-O3: Trunk
[125.383] 1285571 lines compiled, 125.4 sec, 9137600 bytes code, 788740 bytes data
[122.078] 1285571 lines compiled, 122.1 sec, 9137600 bytes code, 788740 bytes data
[119.125] 1285571 lines compiled, 119.1 sec, 9137600 bytes code, 788740 bytes data
Avg. 122.195 sec. Binary size = 19,325,952 bytes
-O3: Overhaul
[103.133] 1285571 lines compiled, 103.1 sec, 9133968 bytes code, 788740 bytes data
[105.234] 1285571 lines compiled, 105.2 sec, 9133968 bytes code, 788740 bytes data
[104.906] 1285571 lines compiled, 104.9 sec, 9133968 bytes code, 788740 bytes data
Avg. 104.424 sec. Binary size = 19,322,368 bytes
Time improvement: 14.5%
Size improvement: 0.0185%
----
-O2: Trunk
[118.852] 1285571 lines compiled, 118.9 sec, 9103760 bytes code, 788996 bytes data
[120.266] 1285571 lines compiled, 120.3 sec, 9103760 bytes code, 788996 bytes data
[116.531] 1285571 lines compiled, 116.5 sec, 9103760 bytes code, 788996 bytes data
Avg. 118.550 sec. Binary size = 19,292,672 bytes
-O2: Overhaul
[100.875] 1285571 lines compiled, 100.9 sec, 9100096 bytes code, 788996 bytes data
[100.922] 1285571 lines compiled, 100.9 sec, 9100096 bytes code, 788996 bytes data
[101.813] 1285571 lines compiled, 101.8 sec, 9100096 bytes code, 788996 bytes data
Avg. 101.203 sec. Binary size = 19,289,088 bytes
Time improvement: 14.6%
Size improvement: 0.0186%
----
-O1: Trunk
[114.641] 1285571 lines compiled, 114.6 sec, 10196576 bytes code, 788996 bytes data
[112.734] 1285571 lines compiled, 112.7 sec, 10196576 bytes code, 788996 bytes data
[113.516] 1285571 lines compiled, 113.5 sec, 10196576 bytes code, 788996 bytes data
Avg. 113.630 sec. Binary size = 20,370,432 bytes
-O1: Overhaul
[99.711] 1285571 lines compiled, 99.7 sec, 10193536 bytes code, 788996 bytes data
[102.375] 1285571 lines compiled, 102.4 sec, 10193536 bytes code, 788996 bytes data
[102.211] 1285571 lines compiled, 102.2 sec, 10193536 bytes code, 788996 bytes data
Avg. 101.432 sec. Binary size = 20,370,360 bytes
Time improvement: 10.7%
Size improvement: 0.000353%
----
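For anyone who wants to re-check the summary lines, here is a minimal
sketch of how the averages and percentages above can be recomputed. The
program name and the helpers Mean3 and PercentSaved are made up for
illustration; only the -O3 figures are plugged in, and the other
optimisation levels work the same way.

```pascal
program TimingSummary;
{$mode objfpc}

{ Mean of three wall-clock timings, in seconds. }
function Mean3(const a, b, c: Double): Double;
begin
  Result := (a + b + c) / 3.0;
end;

{ Saving of "after" relative to "before", as a percentage of "before". }
function PercentSaved(const before, after: Double): Double;
begin
  Result := (before - after) / before * 100.0;
end;

var
  trunk, overhaul: Double;
begin
  { -O3 timings and binary sizes copied from the runs listed above }
  trunk := Mean3(125.383, 122.078, 119.125);
  overhaul := Mean3(103.133, 105.234, 104.906);
  WriteLn('Avg. trunk:    ', trunk:0:3, ' sec');
  WriteLn('Avg. overhaul: ', overhaul:0:3, ' sec');
  WriteLn('Time saved:    ', PercentSaved(trunk, overhaul):0:1, '%');
  WriteLn('Size saved:    ', PercentSaved(19325952, 19322368):0:4, '%');
end.
```

With the -O3 numbers this should reproduce the 122.195 and 104.424 sec
averages and the 14.5% and 0.0185% improvements quoted above.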
Note there are no actual new optimisations, so to speak. The size
improvements come from more intelligent elimination of dead labels and the
removal of unnecessary alignment hints, which allows some new branch
optimisations to be found.
One thing that is a little mysterious and might warrant further
investigation: the binary size is larger on -O3 than on -O2, for both
trunk and the overhaul patches. It might be that some of the
optimisations sacrifice code size for speed, but a difference of about
33 KB (33,280 bytes in both cases) feels rather significant. I may be
wrong though.
Does anyone have other test projects to compile that would give more
coverage for the timing metrics?
Gareth aka. Kit
On Thu 06/12/18 15:55, "J. Gareth Moreton" ***@moreton-family.com sent:
I believe I've fixed the bug. Thanks for your help.
I had misunderstood one of the internal methods and, as a result, it
wasn't resetting the register allocation usage with each iteration of the
loop (and to add insult to injury, caused a memory leak!). By sheer
coincidence, this wasn't a problem under Windows because of some additional
code that skipped over the function prologue, but got triggered under
Linux.
I've updated all of the patch files in the bug report and added an
additional one, since one function in particular got a bigger rework than
everything else (overhaul-mov-refactor).
I haven't had a chance to re-test the timings yet, although I've tried to
provide a couple of additional savings for -O1 and -O2.
Gareth aka. Kit
P.S. Note that the code is very messy with functions being split between
i386 and x86_64. This is for testing and control cases. If x86_64 is
successful, I intend to remove the distinctions and have i386 and x86_64
share the same overhaul. One platform at a time though!
On Sun 02/12/18 23:21 , "Marģers ." ***@inbox.lv sent:
I ran it on Linux. This is the problematic part of the code:
type PLongData = ^TLongData;
     TLongData = array [0..100] of longint;
function binarySearchLong(sortedArray: PLongData; nLen, toFind: longint): longint;
var low, high, mid, l, h, m : longint;
begin
   { Returns index of toFind in sortedArray, or -1 if not found}
   low := 0;
   high := nLen - 1;
   l := sortedArray^[low];
   h := sortedArray^[high];
   while ((l <= toFind) and (h >= toFind)) do
   begin
        mid := (low + high) shr 1;  { var "low" in register r8d }
        m := sortedArray^[mid];
        if (m < toFind) then
        begin
             low := mid + 1;
             l := sortedArray^[low];
      { asm code generated
-- with trunk
       lea    r8d, [r11d+1H]
       mov    esi, r8d
-- end trunk
-- with overhaul: r8d is never set to the new value, but it should be
       lea    esi, [r11d+1H]
-- end overhaul
       mov    r10d, dword [rdi+rsi*4]
       jmp    ?_00144
      }
        end else
        if (m > toFind) then
        begin
             high := mid - 1;
             h := sortedArray^[high];
        end else
        begin
           binarySearchLong:=mid;
           exit;
        end;
   end;
   if (sortedArray^[low] = toFind) then
   begin
        binarySearchLong:=low;
   end else
       binarySearchLong := -1; { Not found}
end;
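
A minimal sketch of a standalone reproducible case built around the
function above. Two things here are assumptions rather than part of the
original post: the loop condition "(l <= toFind) and (h >= toFind)",
which is a guess at what the posted line lost in transit, and the Check
helper and sample data, which are made up for the test. Whether such a
small program provokes the same register allocation as the original
project is not guaranteed.

```pascal
program BinSearchRepro;

type
  PLongData = ^TLongData;
  TLongData = array [0..100] of longint;

function binarySearchLong(sortedArray: PLongData; nLen, toFind: longint): longint;
var
  low, high, mid, l, h, m: longint;
begin
  { Returns index of toFind in sortedArray, or -1 if not found }
  low := 0;
  high := nLen - 1;
  l := sortedArray^[low];
  h := sortedArray^[high];
  { Assumed loop condition, see the note above the listing }
  while (l <= toFind) and (h >= toFind) do
  begin
    mid := (low + high) shr 1;
    m := sortedArray^[mid];
    if m < toFind then
    begin
      low := mid + 1;
      l := sortedArray^[low];
    end
    else if m > toFind then
    begin
      high := mid - 1;
      h := sortedArray^[high];
    end
    else
    begin
      binarySearchLong := mid;
      exit;
    end;
  end;
  if sortedArray^[low] = toFind then
    binarySearchLong := low
  else
    binarySearchLong := -1;
end;

{ Hypothetical helper: report a mismatch and set a non-zero exit code. }
procedure Check(var data: TLongData; n, toFind, expected: longint);
var
  got: longint;
begin
  got := binarySearchLong(@data, n, toFind);
  if got <> expected then
  begin
    WriteLn('FAIL: searching for ', toFind, ' returned ', got,
            ', expected ', expected);
    ExitCode := 1;
  end;
end;

var
  data: TLongData;
  i: longint;
begin
  { Sorted sample values 1, 3, 5, 7, 9, 11 in the first six slots }
  for i := 0 to 5 do
    data[i] := 2 * i + 1;

  Check(data, 6, 7, 3);    { present, reached via the "low := mid + 1" branch }
  Check(data, 6, 1, 0);    { first element }
  Check(data, 6, 11, 5);   { last element }
  Check(data, 6, 4, -1);   { absent, falls between two values }
  Check(data, 6, 12, -1);  { absent, above the largest value }

  if ExitCode = 0 then
    WriteLn('All searches returned the expected index');
end.
```

Building this with the same -O level on both trunk and the patched
compiler and comparing the results (or watching for a hang) gives a
quick check; the search for 7 goes through the "low := mid + 1" branch
discussed in the asm comment.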
   ----- Reply to message -----
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 2 December 2018, 23:32:36
From: J. Gareth Moreton
To: FPC developers' list

Thanks for the feedback. Do you have a reproducible case, and does it
fail on Linux or Windows? I'll have a look for the infinite loops in the
meantime.

Gareth aka. Kit
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel