Discussion:
Maximum symbol length
B***@blaise.ru
2018-06-15 08:54:24 UTC
What is the official maximum symbol length? If it is 255, as the definition "TSymStr = ShortString" suggests, then FPC does not always take proper care of it.
In particular, TGNUAssembler.WriteTree has plenty of concatenations similar to this one
writer.AsmWriteLn(tai_symbol(hp).sym.name + ':')
which result in corrupt asm files.

(I am bumping into this bug in practice, with closures.)

I presume those concatenations should be converted into chains of invocations. This should also make things faster for the "symansistr" case.
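
[Illustration, not part of the original message.] A minimal sketch of the truncation described above, assuming a symbol name that already fills all 255 characters of a ShortString. Write and WriteLn stand in for the assembler writer's methods; that the writer offers a non-terminating write alongside AsmWriteLn is an assumption here, and the program name and variables are hypothetical.

```pascal
program ShortConcatDemo;
{$mode objfpc}{$H-}  // {$H-}: plain "string" means ShortString (max 255 chars)

var
  SymName, Joined: ShortString;
begin
  // Hypothetical symbol name that already uses the full 255 characters.
  SymName := StringOfChar('A', 255);

  // The target is a ShortString, so the result cannot exceed 255 characters:
  // the trailing ':' is silently dropped.
  Joined := SymName + ':';
  WriteLn(Length(Joined));           // still 255
  WriteLn(Joined[Length(Joined)]);   // last character is 'A', not ':'

  // "Chain of invocations": emit the pieces one after another,
  // so no over-long temporary string is ever built.
  Write(SymName);
  WriteLn(':');
end.
```

Writing the pieces separately also avoids building a temporary string at all, which is presumably why it would help the ansistring case as well.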

Side question: I see that the define "symansistr" is used for JVM and LLVM. What was the rationale?
--
βþ
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fp
B***@blaise.ru
2018-06-22 15:19:21 UTC
Post by B***@blaise.ru
What is the official maximum symbol length? If it is 255, as the definition "TSymStr = ShortString" suggests, then FPC does not always take proper care of it.
In particular, TGNUAssembler.WriteTree has plenty of concatenations similar to this one
    writer.AsmWriteLn(tai_symbol(hp).sym.name + ':')
which result in corrupt asm files.
(I am bumping into this bug in practice, with closures.)
I presume, those concatenations should be converted into chains of invocations. This should also make things faster for "symansistr" case.
Side question: I see that the define "symansistr" is used for JVM and LLVM. What was the rationale?
Anyone?

In case I was not clear enough: when I asked whether "those concatenations should be converted into chains of invocations", I meant "will a patch with such fixes be accepted?" Or is there a recommended solution for these bugs?

If such concatenations are somehow dear to someone, I could imagine a solution where the actual limit is defined to be, say, 250. But then, still, care would need to be taken not to overconcatenate in lines like
writer.AsmWriteLn('.quad .' + tai_symbol(hp).sym.name + ', ***@tocbase, 0');

Also, I need the definitive answer on the limit to properly shorten symbols. Right now, it seems there are places where some heuristics are applied that do not guarantee proper results (i.e. symbols may be cut, not shortened -- even before they reach the above buggy output code).
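
[Illustration, not part of the original message.] A rough sketch of the difference between cutting and shortening a symbol, assuming a limit of 255 characters. The over-long tail is replaced by a hash so that names sharing the same head stay distinct. MaxSymLen, ShortenSymbol, and the choice of FNV-1a are all hypothetical and only serve the example; this is not the compiler's actual scheme.

```pascal
program ShortenSymbolDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  MaxSymLen = 255;  // assumed limit, matching ShortString
  HashLen = 8;      // hex digits of a 32-bit hash

// 32-bit FNV-1a; picked only because it is short to write down.
function Fnv1a32(const S: AnsiString): LongWord;
var
  I: Integer;
begin
  Result := 2166136261;
  for I := 1 to Length(S) do
    // Multiply in 64 bits and mask back to 32 bits to avoid overflow checks.
    Result := LongWord((QWord(Result xor Ord(S[I])) * 16777619) and $FFFFFFFF);
end;

// Keep the head of the name and append '$' plus a hash of the removed tail.
function ShortenSymbol(const S: AnsiString): AnsiString;
var
  HeadLen: Integer;
begin
  if Length(S) <= MaxSymLen then
    Exit(S);
  HeadLen := MaxSymLen - HashLen - 1;
  Result := Copy(S, 1, HeadLen) + '$' +
    IntToHex(Int64(Fnv1a32(Copy(S, HeadLen + 1, Length(S)))), HashLen);
end;

var
  LongName: AnsiString;
begin
  LongName := StringOfChar('X', 300);
  WriteLn(Length(ShortenSymbol(LongName)));  // exactly MaxSymLen
  WriteLn(ShortenSymbol('SHORT'));           // short names pass through unchanged
end.
```

A shortened name leaves room for nothing else, so any suffix appended by an output routine would still have to be written separately rather than concatenated.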
--
βþ
Jonas Maebe
2018-06-22 18:42:24 UTC
Post by B***@blaise.ru
Post by B***@blaise.ru
Side question: I see that the define "symansistr" is used for JVM and
LLVM. What was the rationale?
Anyone?
The rationale for the above is that they need symbols that are longer
than 255 characters. The reason the rest uses shortstrings is that this
is assumed to be faster.
Post by B***@blaise.ru
In case I was not clear enough: when I asked should "those
concatenations be converted into chains of invocations", I meant "will a
patch with such fixes be accepted", or is there a recommended solution
for these bugs?
I would propose to switch all targets to use ansistrings for symbol
names.


Jonas
B***@blaise.ru
2018-06-22 20:49:10 UTC
Post by Jonas Maebe
The rationale for the above is that they need symbols that are longer than 255 characters.
And such symbols could not be shortened by hashing heads or tails?
Post by Jonas Maebe
The reason the rest uses shortstrings is that this is assumed to be faster.
I see that overloaded methods taking ShortString are not always provided (supposedly, for the new stuff implemented only in ansistrings). Hello, hidden conversions; bye-bye to performance :)
Post by Jonas Maebe
I would propose to switch all targets to use ansistrings for symbol names.
Is this the consensus?

Personally, if I had any stake in this, I would be against it. I mean, FPC is already slower than DCC.

In my book, proper ShortString plumbing is not that hard, and I am willing to fix the code I stumble upon. And properly trim and hash symbols exceeding the limit.
--
βþ
Jonas Maebe
2018-06-23 08:48:22 UTC
Post by B***@blaise.ru
Post by Jonas Maebe
The rationale for the above is that they need symbols that are longer
than 255 characters.
And such symbols could not be shortened by hashing heads or tails?
The type information must be fully encoded in the symbol names on those
platforms. You could of course rewrite the assembler symbol dictionary
and assembler writers to keep that type information and the base name
separate, and then write them piece by piece every time they're needed.
That would just complicate things though.
Post by B***@blaise.ru
Post by Jonas Maebe
I would propose to switch all targets to use ansistrings for symbol names.
Is this the consensus?
Personally, if I had any stake in this, I would be against it. I mean,
FPC is already slower than DCC.
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames). Additionally, this
hashing makes it impossible to provide debuggers with a function to
reverse-map function symbol names onto class/method/type-overload, which
is a pain.

In theory, you could probably add support to debuggers to ignore the
symbol names and have them concatenate the class name, method name, and
parameter types, reproducing all the same hashing done by the compiler,
but in general debuggers don't do this for performance reasons (so you
can set breakpoints without parsing the debug information of the entire
binary up front).


Jonas
Sven Barth via fpc-devel
2018-06-23 09:19:32 UTC
Post by Jonas Maebe
Post by B***@blaise.ru
I would propose to switch all targets to use ansistrings for symbol names.
Is this the consensus?
Personally, if I had any stake in this, I would be against it. I
mean, FPC is already slower than DCC.
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames). Additionally,
this hashing makes it impossible to provide debuggers with a function
to reverse-map function symbol names onto class/method/type-overload,
which is a pain.
In theory, you could probably add support to debuggers to ignore the
symbol names and have them concatenate the class name, method name,
and parameter types, reproducing all the same hashing done by the
compiler, but in general debuggers don't do this for performance
reasons (so you can set breakpoints without parsing the debug
information of the entire binary up front).
But aren't there output formats that do have length restrictions for
symbol names? I take it that ELF and PE/COFF won't be problematic, but
what about those used for OS/2, DOS, etc.?

Regards,
Sven
Jonas Maebe
2018-06-23 09:24:38 UTC
Post by Sven Barth via fpc-devel
But aren't there output formats that do have length restrictions for
symbol names? I take it that ELF and PE/COFF won't be problematic, but
what about those used for OS/2, DOS, etc.?
If needed somewhere, symbol names could be shortened in the respective
assembler writers. They should be the exception rather than the rule.


Jonas
n***@gmail.com
2018-06-23 14:31:53 UTC
Post by Sven Barth via fpc-devel
Post by Jonas Maebe
Post by B***@blaise.ru
Post by Jonas Maebe
I would propose to switch all targets to use ansistrings for
symbol names.
Is this the consensus?
Personally, if I had any stake in this, I would be against it. I
mean, FPC is already slower than DCC.
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames). Additionally,
this hashing makes it impossible to provide debuggers with a function
to reverse-map function symbol names onto class/method/type-overload,
which is a pain.
In theory, you could probably add support to debuggers to ignore the
symbol names and have them concatenate the class name, method name,
and parameter types, reproducing all the same hashing done by the
compiler, but in general debuggers don't do this for performance
reasons (so you can set breakpoints without parsing the debug
information of the entire binary up front).
But aren't there output formats that do have length restrictions for
symbol names? I take it that ELF and PE/COFF won't be problematic, but
what about those used for OS/2, DOS, etc.?
The OMF object format (used by DOS and Win16) has a limit of 255
characters for symbol names. But since we have an internal linker for
i8086-msdos, we can invent an extension to the format that allows for
longer names. With some additional work, it can even be made backward
compatible.

Nikolay
Florian Klaempfl
2018-07-01 18:27:45 UTC
Post by Jonas Maebe
Post by B***@blaise.ru
Personally, if I had any stake in this, I would be against it. I mean,
FPC is already slower than DCC.
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames).
I tested on x86_64-linux and the increase is around 10 % for make cycle.
So it cannot be neglected imo.
Jonas Maebe
2018-07-01 18:49:37 UTC
Post by Florian Klaempfl
Post by Jonas Maebe
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames).
I tested on x86_64-linux and the increase is around 10 % for make cycle.
So it cannot be neglected imo.
Many compiler helpers still use "string" or "shortstring"
parameters/results/local variables, so there are still quite a few type
conversions going on from shortstring to ansistring.


Jonas
Florian Klaempfl
2018-07-01 19:17:57 UTC
Post by Jonas Maebe
Post by Florian Klaempfl
Post by Jonas Maebe
I doubt this is a major contributor to that fact (especially since
implicit exception frames are disabled for the compiler binary, so
ansistrings don't result in extra exception frames).
I tested on x86_64-linux and the increase is around 10 % for make cycle.
So it cannot be neglected imo.
Many compiler helpers still use "string" or "shortstring"
parameters/results/local variables, so there are still quite a few type
conversions going on from shortstring to ansistring.
Yes, but this needs to be profiled in detail.

Maciej Izak
2018-06-22 18:46:38 UTC
Post by Jonas Maebe
I would propose to switch all targets to use use ansistrings for symbol
names.
+1

I was asking for this some time ago on the core mailing list... As a temporary
solution, the compiler can be compiled with

OPT="-dsymansistr"

Even without anything special like closures, FPC is unable to link much of
my Delphi code:

===begin example===
unit X2.Logic.Aggregates.PXI.Proxy.Implementations;

...

interface

type
  TPXI_Proxy_TGlobalVariablesAggregate = class(TPXI_Proxy,
    IPXI_Proxy_TGlobalVariablesAggregate)
  protected
    function GetOPERATOR_LOGIN: PUTF8Char; CDecl;
===end example===

This is because the generated symbol is longer than 255 chars:

WRPR_$X2.LOGIC.AGGREGATES.PXI.PROXY.IMPLEMENTATIONS_$$_TPXI_PROXY_TGLOBALVARIABLESAGGREGATE_$_IPXI_PROXY_TGLOBALVARIABLESAGGREGATE_$_8_$_X2.LOGIC.AGGREGATES.PXI.PROXY.IMPLEMENTATIONS$_$TPXI_PROXY_TGLOBALVARIABLESAGGREGATE_$__$$_GETOPERATOR_LOGIN$$PUTF8CHA
--
Best regards,
Maciej Izak