x86_64 Optimizer Overhaul
J. Gareth Moreton
2018-12-01 15:06:00 UTC
Hi everyone,

So for the past few weeks, I've attempted to do some work on the peephole
optimizer, specifically to reduce the number of passes required to produce
optimised code, and I can finally submit a patch:
https://bugs.freepascal.org/view.php?id=34628

This is only for x86_64, and I've tried to keep i386 separate until I know
x86_64 works.  I hope it works and is to a high standard.  The bug report
should explain most of the logic behind the changes.

I've had problems testing it under Linux due to configuration
difficulties, so if anyone is willing to try out "make all", I'll be most
grateful.

Thank you.

Gareth aka. Kit
J. Gareth Moreton
2018-12-02 04:39:12 UTC
Following advice from Florian, I've split my submission into five separate
patches so they are easier to test.  It also now compiles under
x86_64-linux.  There appears to be a fault in one of the MOV
optimisations that was causing incorrect code to be generated in some
instances.  I have a good idea of what's going on and can try to fix
this at another time.

Hopefully now it's stable enough for time metrics to be taken and to
confirm it doesn't break other platforms.
Some more refactoring should be performed down the line; I plan to do this
once my code is confirmed reasonable and I begin adapting it for i386,
where there's a bounty for speed gains!

Find all the new patch files over here:
https://bugs.freepascal.org/view.php?id=34628 - note that some of the
patches require others to work; prerequisite information is given in the
first note.

Gareth aka. Kit
J. Gareth Moreton
2018-12-02 20:30:11 UTC
Thanks for the feedback.  Do you have a reproducible case, and does it
fail on Linux or Windows?  I'll have a look for the infinite loops in the
meantime.

Gareth aka. Kit
Post by J. Gareth Moreton
I've had problems testing it under Linux due to configuration
difficulties, so if anyone is willing to try out "make all", I'll be most
grateful. 

"make all" works well on Linux.

Compiler options -O3 and -O4 are broken.
It was possible to compile my program, but at some point the program went
into a never-ending loop: CPU usage at 100% and no response.

Compiling my speed test program with -O2, the optimizations made by the
overhaul caused a speed loss of 2% compared to current trunk.  I guess the
optimizations are good for the compiler itself, but not so much for user
programs.

margers
       
Marģers . via fpc-devel
2018-12-02 23:21:06 UTC
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

J. Gareth Moreton
2018-12-02 22:24:47 UTC
That's interesting. Thanks for that. Time to get fixing.

In the meantime I'm also fixing up the buggy optimisation that caused the
original crash on Linux... nothing against the contributor, but it looks
like some badly-copied code from the MovXX routine... it even still
mentions "movsx" in the comments!

Hopefully this effort on the overhaul won't all be for naught.

Gareth aka. Kit

On Sun 02/12/18 23:21 , "Marģers ." ***@inbox.lv sent:
I ran it on Linux. Here is the problematic part of the code.

type PLongData = ^TLongData;
      TLongData = array [0..100] of longint;

function binarySearchLong ( sortedArray:PLongData; nLen, toFind:longint):longint;
var low, high, mid, l, h, m : longint;
begin
    { Returns index of toFind in sortedArray, or -1 if not found}
    low := 0;
    high := nLen - 1;

    l := sortedArray^[low];
    h := sortedArray^[high];

    while ((l <= toFind) and (h >= toFind)) do
    begin
         mid := (low + high) shr 1;   { var "low" in register r8d }
         m := sortedArray^[mid];

         if (m < toFind) then
         begin
              low := mid + 1;
              l := sortedArray^[low];

        { asm code generated
          -- with trunk
                lea     r8d, [r11d+1H]
                mov     esi, r8d
          -- end trunk
          -- with overhaul: it never sets r8d to the new value, but should
                lea     esi, [r11d+1H]
          -- end overhaul

                mov     r10d, dword [rdi+rsi*4]
                jmp     ?_00144
        }
         end else
         if (m > toFind) then
         begin
              high := mid - 1;
              h := sortedArray^[high];
         end else
         begin
            binarySearchLong:=mid;
            exit;
         end;
         
    end;

    if (sortedArray^[low] = toFind) then
    begin
         binarySearchLong:=low;
    end else
        binarySearchLong := -1; { Not found}
end;
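
A minimal driver for exercising this function might look like the sketch
below. The program name, the array contents, and the two search values are
assumptions made for illustration; they are not taken from the original
speed test program. The loop condition is assumed to be the usual two-sided
bounds check. Both searches take the "low := mid + 1" branch, which is
where the missing update of r8d was reported.

```pascal
program searchrepro;   { hypothetical program name }

type
  PLongData = ^TLongData;
  TLongData = array [0..100] of longint;

function binarySearchLong(sortedArray: PLongData; nLen, toFind: longint): longint;
var
  low, high, mid, l, h, m: longint;
begin
  { Returns index of toFind in sortedArray, or -1 if not found }
  low := 0;
  high := nLen - 1;
  l := sortedArray^[low];
  h := sortedArray^[high];

  { Assumed loop condition: keep going while toFind lies within [l, h] }
  while (l <= toFind) and (h >= toFind) do
  begin
    mid := (low + high) shr 1;
    m := sortedArray^[mid];
    if m < toFind then
    begin
      low := mid + 1;            { the update reported as lost with the overhaul }
      l := sortedArray^[low];
    end
    else if m > toFind then
    begin
      high := mid - 1;
      h := sortedArray^[high];
    end
    else
    begin
      binarySearchLong := mid;
      exit;
    end;
  end;

  if sortedArray^[low] = toFind then
    binarySearchLong := low
  else
    binarySearchLong := -1;      { Not found }
end;

var
  data: TLongData;
  i: longint;
begin
  { Assumed test data: the sorted even numbers 0, 2, ..., 200 }
  for i := 0 to 100 do
    data[i] := i * 2;

  writeln(binarySearchLong(@data, 101, 150));   { value present in the array }
  writeln(binarySearchLong(@data, 101, 151));   { value absent from the array }
end.
```

With a correctly working compiler this should print 75 and then -1.
Comparing a build with -O3 against one with -O2 or lower is one way to
check whether the optimisation in question is involved.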

----- Reply to message -----
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 2 December 2018 23:32:36
From: J. Gareth Moreton
To: FPC developers' list

Thanks for the feedback.  Do you have a reproducible case, and does it
fail on Linux or Windows?  I'll have a look for the infinite loops in the
meantime.

Gareth aka. Kit

Marģers . via fpc-devel
2018-12-02 20:54:39 UTC