Discussion:
Bitset assembler
Jy V
2016-09-08 09:02:39 UTC
Hello to all assembler experts,

I would greatly appreciate it if someone could help me prepare some asm
code for FreePascal for Win32, Win64, Linux x86, Linux x64 (and maybe
32-bit ARM and AArch64).
I am using Lazarus 1.6, FPC 3.0.0 SVN revision 51630
x86_64-win64-win32/win64

function BitsetGet(const Bits; Index: UInt32): Boolean; assembler;
asm
{$IFDEF WIN64} // Win64 IN: rcx = Bits, edx = Index OUT: rax = Result
bt (%rcx), %edx // -> Error asm: [bt reg32,mem32]
// bt (%rcx), %rdx // -> Error asm: [bt reg64,mem64]
sbb %eax, %eax
and %eax, $1
{$ELSE} // Linux IN: rdi = Bits, esi = Index OUT: rax = Result
bt (%rdi), %esi
sbb %rax, %rax
and %rax, $1
{$ENDIF}
end;

for reference the x86 DCC code which is working :

function BitsetGet(const Bits; Index: UInt32): Boolean;
asm
bt [eax], edx
sbb eax, eax
and eax, 1
end;

and the x64 DCC code which should be working :

function BitsetGet(const Bits; Index: UInt32): Boolean;
asm
bt [rcx], edx
sbb eax, eax
and eax, 1
end;

--------------------------------------------------------
procedure BitsetSet(var Bits; Index: UInt32); assembler;
asm
{$IFDEF WIN64} // Win64 IN: rcx = Bits, edx = Index OUT: eax = Result
bts (%rcx), %edx // -> Error asm: [bt reg32,mem32]
sbb %eax, %eax
and %eax, $1
{$ELSE} // Linux IN: rdi = Bits, esi = Index OUT: eax = Result
bts (%rdi), %esi
sbb %eax, %eax
and %eax, $1
{$ENDIF}
end;

for reference the x86 DCC code which is working :

procedure BitsetSet(var Bits; Index: UInt32);
asm
bts [eax], edx
end;

and the x64 DCC code which should be working :

procedure BitsetSet(var Bits; Index: UInt32);
asm
bts [rcx], edx
end;


--------------------------------------------------------
procedure BitsetReset(var Bits; Index: UInt32); assembler;
asm
{$IFDEF WIN64} // Win64 IN: rcx = Bits, edx = Index OUT: eax = Result
btr (%rcx), %edx // -> Error asm: [bt reg32,mem32]
sbb %eax, %eax
and %eax, $1
{$ELSE} // Linux IN: rdi = Bits, esi = Index OUT: eax = Result
btr (%rdi), %esi
sbb %eax, %eax
and %eax, $1
{$ENDIF}
end;

for reference the x86 DCC code which is working :

procedure BitsetReset(var Bits; Index: UInt32);
asm
btr [eax],edx
end;

and the x64 DCC code which should be working :

procedure BitsetReset(var Bits; Index: UInt32);
asm
btr [rcx], edx
sbb eax, eax
and eax, 1
end;

Thank you for any help.
Michael Schnell
2016-09-08 11:35:40 UTC
Just a note:

Such functions are often needed for semaphores etc.

In that case they need to take multi-threading and multi-processor issues
into account: on x86, a plain bts or btr on memory is not atomic unless it
carries the lock prefix.

-Michael
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Thaddy de Koning
2016-09-09 09:00:34 UTC
Before I answer that: did you check what assembler code the compiler
generates? That may be just as efficient as hand-coded assembly in this
case.
With the proper optimizations it will probably be hard to improve on.
Compile the code with -O4 and -s. That generates the assembler output in a
*.s file.

The compiler is rather good at bit manipulations.
Jy V
2016-09-10 10:55:39 UTC
Post by Thaddy de Koning
With the proper optimizations it will probably be hard to improve on.
Compile the code with -O4 and -s. That generates the assembler output in a
*.s file.
The compiler is rather good at bit manipulations.
I am not sure the FreePascal compiler is able to convert the code of the
procedure BitsetSet(var Bits; Index: UInt32);
PUInt64Array(@Bits)^[Index shr 6] := PUInt64Array(@Bits)^[Index shr 6] or
(Int64(1) shl (Index and 63));

into a single instruction:

bts [eax], edx
Jonas Maebe
2016-09-10 15:17:20 UTC
Post by Jy V
I am not sure the FreePascal compiler is able to convert the code of the
procedure BitsetSet(var Bits; Index: UInt32);
PUInt64Array(@Bits)^[Index shr 6] := PUInt64Array(@Bits)^[Index shr 6] or
(Int64(1) shl (Index and 63));
into a single instruction:
bts [eax], edx
It could easily do it with

type
tbitarray = bitpacked array[0..high(qword)-1] of boolean;
pbitarray = ^tbitarray;

var
ba: pbitarray;
index: qword;
begin
...
ba^[index]:=true;
end.

but only *if* someone would first override
thlcgobj.a_load_regconst_subsetref_intern() for x86 in the compiler
source code and implement the special cases for setting a single bit to
0 or 1 (which is not the case, currently). The bts instruction is
already used for include(setvar,value), but sets are obviously limited
to 256 elements.
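
To make the two constructs mentioned above concrete, here is a minimal sketch of a bitpacked array of Boolean next to Include on a set. The program name, the type names and the 1024-bit size are assumptions chosen for illustration only. It shows the source-level form, not what instructions the compiler emits for it.

```pascal
program bitpackeddemo;
{$mode objfpc}

type
  // Hypothetical size: 1024 bits, one bit per element.
  TBitArray = bitpacked array[0..1023] of Boolean;
  // Sets are limited to 256 elements, as noted above.
  TSmallSet = set of 0..255;

var
  ba: TBitArray;
  s: TSmallSet;
  i: Integer;
begin
  FillChar(ba, SizeOf(ba), 0);   // start with all bits cleared
  i := 70;

  ba[i] := True;                 // set a single bit
  ba[3] := True;
  ba[3] := False;                // and clear one again

  s := [];
  Include(s, i);                 // the set-based equivalent of "set bit i"

  // Stop with a non-zero exit code if any bit has an unexpected value.
  if not ba[70] then Halt(1);
  if ba[3] then Halt(2);
  if not (70 in s) then Halt(3);

  WriteLn('bit 70 set: ', ba[70], ', bit 3 set: ', ba[3]);
end.
```

Compiling this with -a and reading the generated .s file is one way to check which instructions a given compiler version actually produces for the two assignments.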


Jonas
Jy V
2016-09-11 07:43:58 UTC
Thank you Jonas,
so back to my original question:
is there an asm expert out there who knows whether the syntax is invalid, or
whether the compiler simply does not implement the bt, bts and btr instructions?

function BitsetGet(const Bits; Index: UInt32): Boolean; assembler;
asm
{$IFDEF WIN64} // Win64 IN: rcx = Bits, edx = Index OUT: rax = Result
bt (%rcx), %edx // -> Error asm: [bt reg32,mem32]
// bt (%rcx), %rdx // -> Error asm: [bt reg64,mem64]
sbb %eax, %eax
and %eax, $1
{$ELSE} // Linux IN: rdi = Bits, esi = Index OUT: rax = Result
bt (%rdi), %esi
sbb %rax, %rax
and %rax, $1
{$ENDIF}
end;
Tomas Hajny
2016-09-11 08:36:45 UTC
Post by Jy V
Thank you Jonas,
so back to my original question,
is there an asm expert out there who knows if the syntax is invalid, or
simply the compiler does not implement bt, bts, btr instructions
.
.

In general, GNU assembler syntax requires you to specify the operand size
in the instruction name. You can use the Intel syntax (i.e. the working
code you tried with DCC) by adding {$ASMMODE INTEL} at the top. You can
also have a look at the translated GNU assembler syntax version by
compiling with the command line parameter -a and looking at the generated
file with the extension .s.
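
As a sketch of the Intel-syntax route, the three routines from the original question could look as follows on x86_64. The register assignments are taken from the comments in the question (rcx/edx on Win64, rdi/esi elsewhere). The program name, the nostackframe modifier and the small self-check at the end are additions for illustration. Note also that GNU (AT&T) syntax puts the source operand first and the destination last, so the lines in the question (for example "and %eax, $1") have their operands in Intel order, which would probably explain the "[bt reg32,mem32]" message.

```pascal
program bitsetintel;
{$mode objfpc}
{$asmmode intel}   // read inline assembler in Intel syntax

{$ifndef CPUX86_64}
  {$fatal This sketch only covers x86_64 targets}
{$endif}

function BitsetGet(const Bits; Index: UInt32): Boolean; assembler; nostackframe;
asm
{$ifdef WIN64}
  bt  [rcx], edx    // Win64: rcx = address of Bits, edx = Index
{$else}
  bt  [rdi], esi    // other x86_64 targets: rdi = address of Bits, esi = Index
{$endif}
  sbb eax, eax      // eax = -1 if the tested bit was set (carry), else 0
  and eax, 1        // reduce to 0 or 1; the Boolean result is read from al
end;

procedure BitsetSet(var Bits; Index: UInt32); assembler; nostackframe;
asm
{$ifdef WIN64}
  bts [rcx], edx
{$else}
  bts [rdi], esi
{$endif}
end;

procedure BitsetReset(var Bits; Index: UInt32); assembler; nostackframe;
asm
{$ifdef WIN64}
  btr [rcx], edx
{$else}
  btr [rdi], esi
{$endif}
end;

var
  Words: array[0..3] of UInt32;
begin
  FillChar(Words, SizeOf(Words), 0);

  BitsetSet(Words, 37);                       // bit 37 = bit 5 of Words[1]
  if not BitsetGet(Words, 37) then Halt(1);
  if Words[1] <> (UInt32(1) shl 5) then Halt(2);

  BitsetReset(Words, 37);
  if BitsetGet(Words, 37) then Halt(3);

  WriteLn('ok');
end.
```

One caveat with this form: with a 32-bit register as bit index, the processor treats the offset as a signed value, so indices of 2^31 and above would address memory before Bits.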

Hope this helps

Tomas


Jy V
2016-09-11 12:51:13 UTC
Thank you Tomas, I will experiment with {$ASMMODE INTEL}.
Jeppe Johansen
2016-09-11 13:11:27 UTC
Post by Jy V
Hello to all assembler experts,
I would greatly appreciate if some people could help me prepare some
asm code for FreePascal for Win32, Win64, Linux x86, Linux x64 (and
maybe some ARM32bit + AARCH64)
I am using Lazarus 1.6, FPC 3.0.0 SVN revision 51630
x86_64-win64-win32/win64
Here's an ARM version that runs in 5 cycles on a Cortex A8:
mov r2,r1,lsr #5
mov r12,#1
ldr r3,[r0, r2, lsl #2]!
orr r2,r3,r12,lsl r1
str r2,[r0]
and r0,r12,r3,lsr r1

It's one cycle faster than what the compiler can generate due to it not
doing the pre-indexed writeback optimization when the address
calculation has shifts.
Jonas Maebe
2016-09-11 13:23:14 UTC
Post by Jeppe Johansen
mov r2,r1,lsr #5
mov r12,#1
ldr r3,[r0, r2, lsl #2]!
orr r2,r3,r12,lsl r1
str r2,[r0]
and r0,r12,r3,lsr r1
It's one cycle faster than what the compiler can generate due to it not
doing the pre-indexed writeback optimization when the address
calculation has shifts.
Given that this code will be in a non-inlinable routine (we can't
inline routines with inline assembler), the Pascal version is probably
faster then (since you won't have the call/return overhead).
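
For comparison, a plain Pascal version that the compiler is at least allowed to inline might look like the sketch below. It uses 32-bit words rather than the 64-bit words of the earlier Pascal snippet, to match the 32-bit bt/bts/btr forms. TBitWords and PBitWords are hypothetical helper types, and whether a call is actually inlined remains the compiler's decision.

```pascal
program bitsetpascal;
{$mode objfpc}
{$inline on}

type
  // Hypothetical helper types: view the untyped parameter as 32-bit words.
  TBitWords = array[0..$0FFFFFFF] of UInt32;
  PBitWords = ^TBitWords;

function BitsetGet(const Bits; Index: UInt32): Boolean; inline;
begin
  // Word index is Index div 32, bit position within the word is Index mod 32.
  Result := ((PBitWords(@Bits)^[Index shr 5] shr (Index and 31)) and 1) <> 0;
end;

procedure BitsetSet(var Bits; Index: UInt32); inline;
var
  P: PBitWords;
begin
  P := PBitWords(@Bits);
  P^[Index shr 5] := P^[Index shr 5] or (UInt32(1) shl (Index and 31));
end;

procedure BitsetReset(var Bits; Index: UInt32); inline;
var
  P: PBitWords;
begin
  P := PBitWords(@Bits);
  P^[Index shr 5] := P^[Index shr 5] and UInt32(not (UInt32(1) shl (Index and 31)));
end;

var
  Words: array[0..3] of UInt32;
begin
  FillChar(Words, SizeOf(Words), 0);

  BitsetSet(Words, 37);                       // bit 37 = bit 5 of Words[1]
  if not BitsetGet(Words, 37) then Halt(1);
  if Words[1] <> (UInt32(1) shl 5) then Halt(2);

  BitsetReset(Words, 37);
  if BitsetGet(Words, 37) then Halt(3);

  WriteLn('ok');
end.
```

Unlike the assembler routines, this version is not tied to one CPU or calling convention, so the same source should also build for the ARM targets mentioned at the start of the thread.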


Jonas
