Discussion:
[fpc-devel] ARM/AARCH64 work
J. Gareth Moreton via fpc-devel
2021-04-26 06:09:52 UTC
Hi everyone,

So a quick update on my current work in progress on ARM/AArch64. First the annoying news, besides
the broken laptop... I've mislaid my ARM (32-bit) MicroSD card for the Raspberry Pi, so I can't test
on that platform for the moment until I find it again. Hopefully I can find it, otherwise I'll have
to buy a new one and wait for my laptop to return so I can flash the 32-bit Raspberry Pi OS onto it.

In terms of actual development, I've been pursuing a couple of things so far. One is some improved
peephole optimisations for ldr and str instructions, and the other is implementing "magic division"
where division by a constant is replaced with a multiplication. The ldr/str optimisations have
stalled for the moment because of the heap corruption bug that occurs on the trunk, which my
optimisations seem to expose a bit more. My magic-div changes are almost there, but I'm
having problems with very large numbers. None of the dedicated division tests picked
it up, but I got some mysterious failures elsewhere, and I eventually found a reproducible case in a
benchmark test I'm writing. This also shows the speed improvements when built under -O2:

Trunk:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
Unsigned 32-bit division by 2 - Pass - average iteration duration: 2.095 ns
Unsigned 32-bit division by 3 - Pass - average iteration duration: 4.191 ns
Unsigned 32-bit division by 10 - Pass - average iteration duration: 3.958 ns
Unsigned 32-bit division by 100 - Pass - average iteration duration: 3.492 ns
Unsigned 64-bit division by 2 - Pass - average iteration duration: 2.095 ns
Unsigned 64-bit division by 3 - Pass - average iteration duration: 4.191 ns
Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.958 ns
Unsigned 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 6.519 ns
Signed 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
Signed 64-bit division by 18 - Pass - average iteration duration: 3.958 ns
Signed 64-bit division by 24 - Pass - average iteration duration: 3.725 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 6.985 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 5.821 ns

ok
- Sum of average durations: 59.372 ns
- Overall average duration: 4.241 ns

magic-div:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
Unsigned 32-bit division by 2 - Pass - average iteration duration: 1.630 ns
Unsigned 32-bit division by 3 - Pass - average iteration duration: 2.328 ns
Unsigned 32-bit division by 10 - Pass - average iteration duration: 2.328 ns
Unsigned 32-bit division by 100 - Pass - average iteration duration: 2.328 ns
Unsigned 64-bit division by 2 - Pass - average iteration duration: 1.630 ns
Unsigned 64-bit division by 3 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 1,000,000,000 - FAIL - 18446744073709551615 div 1000000000; expected 18446744073 got 1266874893
Signed 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 18 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 24 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 3.027 ns

I figure once I fix that failure, I can submit a patch. I'll submit the bench test too because it
will be good for speed comparisons and can act as a test case itself.
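
For readers unfamiliar with the technique, here is a minimal sketch of "magic division" for the unsigned case, written in Python rather than Pascal so that arbitrary-precision integers hide all register-width concerns. The names magic_unsigned and div_by_magic are hypothetical and are not taken from the compiler; the constant is the standard round-up reciprocal, which is an assumption about the general method and not a description of the patch itself.

```python
def magic_unsigned(d, bits=64):
    """Hypothetical helper: return (m, shift) such that
    n // d == (n * m) >> shift for every 0 <= n < 2**bits."""
    if d < 1:
        raise ValueError("divisor must be positive")
    s = (d - 1).bit_length()        # ceil(log2(d))
    shift = bits + s
    m = -((-(1 << shift)) // d)     # ceil(2**shift / d); can need bits + 1 bits
    return m, shift


def div_by_magic(n, m, shift):
    # One multiplication and one shift stand in for the division.
    return (n * m) >> shift


if __name__ == "__main__":
    # The input reported as failing in the magic-div run above.
    n, d = 18446744073709551615, 1000000000
    m, shift = magic_unsigned(d)
    print(div_by_magic(n, m, shift), n // d)
```

The multiplier can be one bit wider than the operand, so a real code generator has to handle that extra bit explicitly. Inputs near the top of the range, like the one in the failing line, are the ones that exercise it.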

Gareth aka. Kit
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
https://lists.freepascal.org/cgi-bin
Florian Klämpfl via fpc-devel
2021-04-26 16:26:33 UTC
Post by J. Gareth Moreton via fpc-devel
and the other is implementing "magic division"
where division by a constant is replaced with a multiplication.
You are aware there is code for 32-bit ARM and e.g. x86 which can most likely be reused/adapted? It might even be that somebody has made a generic implementation of it in the meantime?
J. Gareth Moreton via fpc-devel
2021-04-26 14:14:35 UTC
Indeed. I'm reusing the code that generates the magic constant for me and any additional
flags. I've also fixed the bug that caused the failure in my test. Just doing some
final checks and improving my bench test.
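
As an illustration of why a magic constant may come with an extra flag, the sketch below restricts the multiplier to 64 bits and reports when the true multiplier needs a 65th bit. It is written in Python; magic_u64, div_u64 and the needs_add flag are hypothetical names, and the fix-up sequence is the textbook one, assumed here for illustration and not taken from the compiler's generator.

```python
MASK64 = (1 << 64) - 1


def magic_u64(d):
    """Hypothetical helper: return (magic, needs_add, shift) for
    unsigned 64-bit division by a constant 2 <= d < 2**64."""
    if d < 2 or d > MASK64:
        raise ValueError("divisor out of range")
    shift = (d - 1).bit_length()             # ceil(log2(d))
    m = -((-(1 << (64 + shift))) // d)       # ceil(2**(64 + shift) / d)
    while shift > 0 and m % 2 == 0:          # drop common factors of two
        m >>= 1
        shift -= 1
    needs_add = (m >> 64) != 0               # true multiplier is 2**64 + magic
    return m & MASK64, needs_add, shift


def div_u64(n, magic, needs_add, shift):
    hi = (n * magic) >> 64                   # high half of the 64x64 product,
                                             # e.g. what UMULH computes on AArch64
    if needs_add:
        # Add the missing n * 2**64 term without exceeding 64 bits.
        hi = ((n - hi) >> 1) + hi
        return hi >> (shift - 1)
    return hi >> shift


if __name__ == "__main__":
    for d in (7, 10, 1000000000):
        magic, needs_add, shift = magic_u64(d)
        print(d, hex(magic), needs_add, shift, div_u64(MASK64, magic, needs_add, shift))
```

Divisors such as 7 take the needs_add path, while 10 does not. Forgetting that path is one way, though not necessarily the only one, for large inputs to produce wrong quotients.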

It might be possible to make more code cross-platform later on. I'mm not due.

Gareth aka. Kit

J. Gareth Moreton via fpc-devel
2021-04-26 15:25:03 UTC
I meant to say "I'm not sure" at the end there. The joys of writing an e-mail on the phone!

Gareth aka. Kit
J. Gareth Moreton via fpc-devel
2021-04-26 19:41:34 UTC
Submitted a patch with the "magic division" and a second patch with a new bench test to showcase the
improvements. I hope it is to your satisfaction.

https://bugs.freepascal.org/view.php?id=38806

Gareth aka. Kit