Discussion:
UTF-8 string literals
Mattias Gaertner
2017-05-05 11:53:35 UTC
Hi,

AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.

This has several side effects:

1. When using a character outside BMP FPC stops with:
Error: UTF-8 code greater than 65535 found
For example:
const Eyes = '👀';

2. Assigning a UTF-8 literal to an UTF8String requires a
widestringmanager.
For example non ISO-8859-1 chars are mangled:
var u: UTF8String = 'äöüالعَرَبِيَّة';

3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead.


What would happen if FPC would be extended to store UTF-8
literals as UTF8String?
What are the disadvantages?


Mattias
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://l
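Point 2 of the opening mail can be tried with a small program along these lines. This is only a sketch: the program name is made up, and it assumes a Unix target, where the cwstring unit installs a widestring manager. Removing the uses clause should show whether the bytes come out mangled on a given compiler version.

```pascal
program Utf8Literal;
{$mode objfpc}{$H+}
{$codepage utf8}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager on Unix targets
{$endif}

var
  u: UTF8String;
  i: Integer;
begin
  u := 'äöü';
  // Dump the raw bytes; correct UTF-8 would be C3 A4 C3 B6 C3 BC.
  for i := 1 to Length(u) do
    Write(HexStr(Ord(u[i]), 2), ' ');
  WriteLn;
end.
```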
Michael Van Canneyt
2017-05-05 12:30:32 UTC
Post by Mattias Gaertner
Hi,
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
2. Assigning a UTF-8 literal to an UTF8String requires a
widestringmanager.
var u: UTF8String = 'äöüالعَرَبِيَّة';
I assume you mean UTF-16 literal ?
Post by Mattias Gaertner
3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead.
You should weigh the advantages you outline here against the disadvantages of
no longer knowing how string literals will be encoded.

It means e.g. the resource string tables will have entries that are UTF16 encoded
or entries that are UTF8 encoded, depending on the unit they come from.
This is highly undesirable.

By forcing everything UTF16 we ensure delphi compatibility (yes it does matter)
and we also ensure a uniform set of string tables.

Michael.
Mattias Gaertner
2017-05-05 12:57:01 UTC
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
Post by Michael Van Canneyt
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
2. Assigning a UTF-8 literal to an UTF8String requires a
widestringmanager.
var u: UTF8String = 'äöüالعَرَبِيَّة';
I assume you mean UTF-16 literal ?
Huh? The codepage is utf-8, the string type is utf-8, FPC stores UCS-2,
why do you ask about UTF-16?
Post by Michael Van Canneyt
Post by Mattias Gaertner
3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead.
You should weigh the advantages you outline here against the disadvantages of
no longer knowing how string literals will be encoded.
At the moment string literals are encoded in two different ways
depending on codepage, character values, literal format and probably
some more attributes I don't know. That often confuses users. IMO it
would be less confusing if matching string type and codepage would work
without conversion.
Post by Michael Van Canneyt
It means e.g. the resource string tables will have entries that are UTF16 encoded
or entries that are UTF8 encoded, depending on the unit they come from.
This is highly undesirable.
Ehm, the compiled-in resourcestring tables are AnsiString.
AFAIK you need the UTF-8 system codepage to use the full UTF-16
capabilities of the rsj files.
Post by Michael Van Canneyt
By forcing everything UTF16 we ensure delphi compatibility (yes it does matter)
and we also ensure a uniform set of string tables.
It will be a glory day, when this is accomplished.
But some people can't wait that long.

Mattias
Michael Van Canneyt
2017-05-05 13:55:32 UTC
Post by Mattias Gaertner
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
To make sure I was presenting correct facts, I did some tests.

As a result of the tests, I think the above statement is wrong.

{$codepage utf8}

var
p : pchar;

begin
P:=Pchar('some string literal');
end.

Results in the following assembler:

.globl _$PROGRAM$_Ld1
_$PROGRAM$_Ld1:
.ascii "some string literal\000"
.Le11:

Not widestring as far as I can see ?

To be sure, I added some russian characters:

.Ld1:
.ascii "some string literal \320\272\320\270\321\202\320\260"
.ascii "\320\271\321\201\320\272\320\276\320\263\320\276\000"

Again, not widestring ?

home: >cat u.pp
{$codepage utf8}
var
p : pchar;

begin
P:=Pchar('some string literal китайского');
end.

So, I tried a resourcestring:


.Ld3$strlab:
.short 65001,1
.long 0
.quad -1,30
.Ld3:
.ascii "some more \320\272\320\270\321\202\320\260\320\271\321"
.ascii "\201\320\272\320\276\320\263\320\276\000"

Again, no widestring, as far as I can see.

Michael.
Mattias Gaertner
2017-05-05 14:13:12 UTC
On Fri, 5 May 2017 15:55:32 +0200 (CEST)
Post by Michael Van Canneyt
Post by Mattias Gaertner
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
To make sure I was presenting correct facts, I did some tests.
As a result of the tests, I think the above statement is wrong.
Naah, not wrong, just a non precise term "UTF-8 string literal". ;)

ASCII is stored by FPC as 8-bit string. No problem with that.
The interesting part are the non ASCII strings. Try your code with
the string examples I gave.

Mattias
Michael Van Canneyt
2017-05-05 14:14:22 UTC
Post by Mattias Gaertner
On Fri, 5 May 2017 15:55:32 +0200 (CEST)
Post by Michael Van Canneyt
Post by Mattias Gaertner
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
To make sure I was presenting correct facts, I did some tests.
As a result of the tests, I think the above statement is wrong.
Naah, not wrong, just a non precise term "UTF-8 string literal". ;)
ASCII is stored by FPC as 8-bit string. No problem with that.
The interesting part are the non ASCII strings. Try your code with
the string examples I gave.
I used non-ascii. Did you not see the russian characters ?

Michael.
Sven Barth via fpc-devel
2017-05-05 14:16:18 UTC
Post by Michael Van Canneyt
Post by Mattias Gaertner
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
To make sure I was presenting correct facts, I did some tests.
As a result of the tests, I think the above statement is wrong.
In all three cases you are either explicitly or implicitly forcing the
compiler to convert it to Ansi/UTF-8 and since it's a constant it takes a
compiletime shortcut.
If you'd do a Writeln without the typecast then it will be a UTF-16
constant that is stored in the binary *if* the string contains a
character > $7F.
Regards,
Sven
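The two cases Sven describes can be put side by side in one file. This is only a sketch based on the statements in this thread; the program name is made up, and the storage of each constant is best checked in the generated assembler (for example by compiling with -al, which keeps the assembler file).

```pascal
{$codepage utf8}
program LiteralStorage;

var
  p: PChar;
begin
  // Explicit typecast: the constant is converted at compile time and
  // emitted as 8-bit data (UTF-8 bytes in Michael's assembler listing).
  p := PChar('some string literal китайского');
  WriteLn(p);

  // No typecast: the literal contains characters > $7F, so according to
  // Sven's description it is stored as a UTF-16 constant.
  WriteLn('some string literal китайского');
end.
```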
Michael Van Canneyt
2017-05-05 14:22:18 UTC
Post by Sven Barth via fpc-devel
Post by Michael Van Canneyt
Post by Mattias Gaertner
On Fri, 5 May 2017 14:30:32 +0200 (CEST)
[...]
Post by Mattias Gaertner
AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
instead of UTF8String. Please correct me if I'm wrong.
To make sure I was presenting correct facts, I did some tests.
As a result of the tests, I think the above statement is wrong.
In all three cases you are either explicitly or implicitly forcing the
compiler to convert it to Ansi/UTF-8 and since it's a constant it takes a
compiletime shortcut.
That was on purpose because Mattias' example on the Lazarus list required
this. The point was that PChar() is not usable on string literals.

See also his initial mail, which contains the statement:

"3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead."

So, I did a typecast. (even though I think it is horrible code).
Post by Sven Barth via fpc-devel
If you'd do a Writeln without the typecast then it will be a UTF-16
constant that is stored in the binary *if* the string contains a
character > $7F.
Well, at least now I understand very well why people find it confusing :-)

I think we'll need a comprehensive table in the documentation.
Can this be produced somehow ?

Michael.
Martok
2017-05-05 13:20:00 UTC
Post by Michael Van Canneyt
You should weigh the advantages you outline here against the disadvantages of
no longer knowing how string literals will be encoded.
As a programmer, either I don't want to know (declared const without giving
explicit type) or I do, then I did declare it correctly:

{$codepage utf8}
var u: UTF8String = 'äöüالعَرَبِيَّة';
-> UTF8String containing the characters I entered in the source file (in this
case(!!) just 1:1 copy).

{$codepage utf8}
var u: UCS4String= 'äöü';
-> UCS4-encoded version, either 000000e4 000000f6 000000fc or the equivalent
with combining characters

There should probably be an error if the characters I typed don't actually exist
in the declared type (emoji in an UCS2String), but otherwise, there's no good
reason why that shouldn't "just work".
Post by Michael Van Canneyt
It means e.g. the resource string tables will have entries that are UTF16 encoded
or entries that are UTF8 encoded, depending on the unit they come from.
This is highly undesirable.
Always convert from "unit CP" to UTF8 (or UTF16 if some binary compat is
required), done. Aren't they just internal anyway?
Post by Michael Van Canneyt
By forcing everything UTF16 we ensure delphi compatibility (yes it does matter)
and we also ensure a uniform set of string tables.
If that was what happened, ok. But from the error message Mattias listed as (1)
I would assume that the actual string type is UCS2String, at least at some point
in the process.

Just my 2 cents...

Martok

PS: adding to the discussion over on the Lazarus ML: I just found a fourth wiki
page describing a slightly different Unicode support. This is getting ridiculous.


Sven Barth via fpc-devel
2017-05-06 07:39:11 UTC
Post by Martok
PS: adding to the discussion over on the Lazarus ML: I just found a fourth wiki
page describing a slightly different Unicode support. This is getting ridiculous.
That might be the one from Michael Schnell. Probably it should be marked
with a big, fat warning that it's merely a user's suggestion and nothing
official.

Regards,
Sven
Martok
2017-05-06 12:51:30 UTC
That might be the one from Michael Schnell. Probably it should be marked with a
big, fat warning that it's merely a user's suggestion and nothing official.
Not even that. This one looks relatively obvious to me ;)

I've filed a bug as <https://bugs.freepascal.org/view.php?id=31758> for reference.


Martok

Michael Schnell
2017-05-09 11:02:41 UTC
Post by Sven Barth via fpc-devel
That might be the one from Michael Schnell. Probably it should be
marked with a big, fat warning that it's merely a user's suggestion
and nothing official.
I hope it is absolutely clear in the text that this is only a suggestion
and not something that is real (or will be real in the near future).

-Michael

Michael Schnell
2017-05-09 12:59:16 UTC
Post by Sven Barth via fpc-devel
That might be the one from Michael Schnell.
Very unlikely, as this text does not mention anything about how a source
file byte sequence is converted in a String constant / literal.

-Michael
Mattias Gaertner
2017-05-10 06:38:28 UTC
On Tue, 9 May 2017 14:59:16 +0200
Post by Michael Schnell
Post by Sven Barth via fpc-devel
That might be the one from Michael Schnell.
Very unlikely, as this text does not mention anything about how a source
file byte sequence is converted in a String constant / literal.
I think he meant this one:
http://wiki.lazarus.freepascal.org/index.php?title=not_Delphi_compatible_enhancement_for_Unicode_Support&action=history

I thought Mschnell is Michael Schnell. Was this wrong?

Mattias
Martok
2017-05-10 16:00:38 UTC
That's the one I also think Sven was talking about.
I just searched for "Unicode". Michael's proposal comes up, but I guess the
title is fairly obvious.


But apparently everything is rainbows and unicorns and there is absolutely no
problem with the documentation at all, so I guess this week-long discussion here
never happened anyway.


Martok
Post by Mattias Gaertner
On Tue, 9 May 2017 14:59:16 +0200
Post by Michael Schnell
Post by Sven Barth via fpc-devel
That might be the one from Michael Schnell.
Very unlikely, as this text does not mention anything about how a source
file byte sequence is converted in a String constant / literal.
http://wiki.lazarus.freepascal.org/index.php?title=not_Delphi_compatible_enhancement_for_Unicode_Support&action=history
I thought Mschnell is Michael Schnell. Was this wrong?
Mattias
Michael Van Canneyt
2017-05-11 08:03:58 UTC
Post by Martok
But apparently everything is rainbows and unicorns and there is absolutely no
problem with the documentation at all, so I guess this week-long discussion here
never happened anyway.
This is not quite correct. I have proposed to add a table to the official
documentation, documenting in the programmer's guide how the strings are
stored.

But I personally lack the information to put in this table, and I am waiting
for input from others. I have put in a mail what I know, it needs to be
amended by the compiler people.

Michael.
Juha Manninen
2017-05-11 11:52:37 UTC
Post by Martok
I just searched for "Unicode".
I wanted to delete the old page
http://wiki.freepascal.org/LCL_Unicode_Support
completely but I don't know how to do it so I just made it empty.
Does anybody know how to delete it?

I also renamed the "Better Unicode Support ..." page to "Unicode Support ...".
http://wiki.freepascal.org/Unicode_Support_in_Lazarus
I am now improving and simplifying it.
I try to concentrate on how to code in a Delphi compatible way.

Martok, please take care of the other pages you found. Mark them as
invalid or deprecated, delete wrong info, rename them ... whatever. Be
creative.
FYI, the wiki can be edited by anybody.

Juha
Mattias Gaertner
2017-05-11 11:59:52 UTC
On Thu, 11 May 2017 14:52:37 +0300
Post by Juha Manninen
Post by Martok
I just searched for "Unicode".
I wanted to delete the old page
http://wiki.freepascal.org/LCL_Unicode_Support
completely but I don't know how to do it so I just made it empty.
Does anybody know how to delete it?
Don't delete a page. Add a hint where the new content is.

Mattias

Juha Manninen
2017-05-05 14:03:23 UTC
On Fri, May 5, 2017 at 2:53 PM, Mattias Gaertner
Post by Mattias Gaertner
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
I copy a related post from Lazarus list by myself and Sven Barth.
It belongs here:

On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
Post by Mattias Gaertner
That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
a problem anymore...
That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.

Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?

Juha
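For reference, the surrogate-pair encoding mentioned above works as follows for the emoji from the first mail. This is a minimal sketch of the standard UTF-16 rule, not the code of the compiler fix; the procedure and variable names are made up for illustration.

```pascal
program SurrogatePair;
{$mode objfpc}

// Split a code point above $FFFF into its UTF-16 surrogate pair.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;                 // 20 significant bits remain
  HighSur := Word($D800 or (V shr 10));    // top 10 bits
  LowSur  := Word($DC00 or (V and $3FF));  // low 10 bits
end;

var
  H, L: Word;
begin
  ToSurrogates($1F440, H, L);  // U+1F440 EYES, the emoji from the example
  WriteLn(HexStr(H, 4), ' ', HexStr(L, 4));  // prints D83D DC40
end.
```

A code point up to $FFFF needs no pair, which is why the old "greater than 65535" error only showed up for characters outside the BMP.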
Michael Van Canneyt
2017-05-05 14:05:28 UTC
Post by Juha Manninen
On Fri, May 5, 2017 at 2:53 PM, Mattias Gaertner
Post by Mattias Gaertner
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
I copy a related post from Lazarus list by myself and Sven Barth.
On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
Post by Mattias Gaertner
That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
a problem anymore...
That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.
Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?
Incomplete UTF-16 support is a bug. Bugs should always be fixed?

Michael.
Sven Barth via fpc-devel
2017-05-05 14:08:41 UTC
Post by Juha Manninen
On Fri, May 5, 2017 at 2:53 PM, Mattias Gaertner
Post by Mattias Gaertner
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
I copy a related post from Lazarus list by myself and Sven Barth.
On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
Post by Mattias Gaertner
That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
a problem anymore...
That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.
Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?
Now it is fixed :D (revision 36116; maybe we should merge that to fixes
once I or someone else tested a big endian target)

Regards,
Sven
Mattias Gaertner
2017-05-06 17:01:14 UTC
On Fri, 5 May 2017 16:08:41 +0200
Post by Sven Barth via fpc-devel
[...]
Now it is fixed :D (revision 36116; maybe we should merge that to fixes
once I or someone else tested a big endian target)
Thank You!

Mattias
Sven Barth via fpc-devel
2017-05-07 10:13:18 UTC
Post by Sven Barth via fpc-devel
Post by Juha Manninen
On Fri, May 5, 2017 at 2:53 PM, Mattias Gaertner
Post by Mattias Gaertner
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
I copy a related post from Lazarus list by myself and Sven Barth.
On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
Post by Mattias Gaertner
That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
a problem anymore...
That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.
Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?
Now it is fixed :D (revision 36116; maybe we should merge that to fixes
once I or someone else tested a big endian target)

Okay, it works correctly on big endian targets as well (and Mac OS X 10.4
even has valid characters for the console to test with :D ). Thus this
change could be merged to 3.0.3.

Regards,
Sven
Florian Klaempfl
2017-05-07 08:27:58 UTC
Post by Mattias Gaertner
Hi,
AFAIK FPC stores UTF-8 string literals (-Fcutf8)
-Fc only tells the compiler the encoding (code page) of the source file; it
says nothing about how string constants shall be encoded.
Post by Mattias Gaertner
as widestrings
instead of UTF8String. Please correct me if I'm wrong.
Error: UTF-8 code greater than 65535 found
const Eyes = '👀';
2. Assigning a UTF-8 literal to an UTF8String requires a
widestringmanager.
var u: UTF8String = 'äöüالعَرَبِيَّة';
3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead.
Well, it depends on what you expect :)
Post by Mattias Gaertner
What would happen if FPC would be extended to store UTF-8
literals as UTF8String?
What are the disadvantages?
1. Backward compatibility. Due to its Windows origins and history, the
default Unicode encoding in FPC is UTF-16, and FPC also uses UTF-16
internally everywhere.

2. What would happen then the other way around? When casting the string
constant to a PUnicodeChar (which a lot of Delphi code probably does)?

3. Personally, I still think, UTF-16 is the "native" unicode type: all
important APIs use UTF-16, for me, UTF-8 is a hack.

What we could of course do is this: if a constant is assigned to a
string with explicit UTF-8 encoding, the compiler does the
conversion at run time. But it complicates things even more. This does
not solve the PChar problem, but I think that when somebody uses Unicode
source files and PChar, he is on his own :)

I think it would be nice if Michael (v. C.) prepares some section for the
docs and we comment and help him to improve it.

Mattias Gaertner
2017-05-07 08:43:41 UTC
On Sun, 7 May 2017 10:27:58 +0200
Post by Florian Klaempfl
[...]
2. What would happen then the other way around? When casting the string
constant to a PUnicodeChar (which a lot of Delphi code probably does)?
Good point.
Post by Florian Klaempfl
[...]
I think it would be nice if Michael (v. C.) prepares some section for the
docs and we comment and help him to improve it.
That would be highly appreciated.

Mattias
Michael Van Canneyt
2017-05-07 09:41:37 UTC
Post by Mattias Gaertner
On Sun, 7 May 2017 10:27:58 +0200
Post by Florian Klaempfl
[...]
2. What would happen then the other way around? When casting the string
constant to a PUnicodeChar (which a lot of Delphi code probably does)?
Good point.
Post by Florian Klaempfl
[...]
I think it would be nice if Michael (v. C.) prepares some section for the
docs and we comment and help him to improve it.
That would be highly appreciated.
I would be glad to do so, but I need something to start with.

In my reply to Sven I asked if a set of rules exist.

As far as I understand:

- By default, strings are stored internally as UTF-16.
- Unless it is an ascii string, in which case it is stored as plain ascii
- In special cases such as a typecast, the compiler stores them as UTF8 ?

A bit shallow...


Michael.
Marco van de Voort
2017-05-07 12:16:02 UTC
Post by Sven Barth via fpc-devel
Post by Sven Barth via fpc-devel
Post by Juha Manninen
Is there a plan to fix it?
Now it is fixed :D (revision 36116; maybe we should merge that to fixes
once I or someone else tested a big endian target)
Okay, it works correctly on big endian targets as well (and Mac OS X 10.4
even has valid characters for the console to test with :D ). Thus this
change could be merged to 3.0.3.
Done.
Sven Barth via fpc-devel
2017-05-07 15:55:12 UTC
Post by Sven Barth via fpc-devel
Post by Sven Barth via fpc-devel
Post by Juha Manninen
Is there a plan to fix it?
Now it is fixed :D (revision 36116; maybe we should merge that to fixes
once I or someone else tested a big endian target)
Okay, it works correctly on big endian targets as well (and Mac OS X 10.4
even has valid characters for the console to test with :D ). Thus this
change could be merged to 3.0.3.
Done.
Thanks :)

Regards,
Sven
