Discussion:
TBufferedFileStream
José Mejuto
2016-09-03 20:08:16 UTC
Hello,

I have added an implementation of TBufferedFileStream to the bug tracker;
there is one note about a compatibility issue in the bug entry.

http://bugs.freepascal.org/view.php?id=30549
--
_______________________________________________
fpc-devel maillist - fpc-***@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Michael Van Canneyt
2016-09-03 20:15:46 UTC
Post by José Mejuto
Hello,
Added to the bugtracker implementation of TBufferedFileStream, there is one
compatibility issue note in the bug entry.
http://bugs.freepascal.org/view.php?id=30549
Nice. I have assigned this to myself.

Out of curiosity:
Why do you think this is needed?
Is the caching provided by most operating systems not enough?

Michael.
José Mejuto
2016-09-03 20:28:16 UTC
Post by Michael Van Canneyt
Post by José Mejuto
Added to the bugtracker implementation of TBufferedFileStream, there
is one compatibility issue note in the bug entry.
http://bugs.freepascal.org/view.php?id=30549
Nice. I have assigned this to myself.
Why do you think this is needed ? Is the caching provided by most of the
operating systems not enough ?
Hello,

1) Delphi compatibility (It is available in Delphi).

2) TFileStream is basically a front end for FileRead and FileWrite
(talking about Windows), and once you invoke one of those you enter
multithread contention, thread swapping, and maybe kernel mode (I'm not
sure), so if you read or write small pieces of data from files the
performance is very poor. A typical example is TFileStream.GetByte, which
I'm using in a parser: the file could be very big, and I don't want to
read everything into a TMemoryStream. The performance of cached/buffered
and regular streams is dramatically different:

----------------------------------------------
CACHE 1000000 byte sequential reads in 46 ms.
FILE 1000000 byte sequential reads in 2200 ms.
----------------------------------------------

And this result is with the system cache hot (in fact the full file is
in system cache).

On the other hand, a cache system is more powerful than a buffered system
if you are writing something like a filesystem over a TFileStream, where
you may need to jump here and there to read data, file allocation tables,
attributes, and so on. In this case a cache with multiple pages instead of
just one improves the result (it avoids multiple reads of the same zones).
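
As a rough illustration of the gap in the figures above, the following sketch times byte-sized sequential reads through a plain TFileStream and through a buffered wrapper. It is not the test from the bug entry: it assumes TReadBufStream from FPC's existing bufstream unit as a stand-in for the buffered class, and the scratch file name and the 64 KB buffer size are arbitrary choices, so the timings will differ from the ones quoted.

```pascal
program bytereadbench;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, bufstream;

const
  ReadCount   = 1000000;       // same number of reads as in the figures above
  ScratchFile = 'bench.tmp';   // hypothetical scratch file name

// Reads ReadCount bytes one at a time from S; returns elapsed milliseconds.
function TimeByteReads(S: TStream): QWord;
var
  i: Integer;
  Start: QWord;
begin
  Start := GetTickCount64;
  for i := 1 to ReadCount do
    S.ReadByte;
  Result := GetTickCount64 - Start;
end;

var
  F: TFileStream;
  B: TReadBufStream;
begin
  // Create a scratch file holding ReadCount bytes.
  F := TFileStream.Create(ScratchFile, fmCreate);
  try
    F.Size := ReadCount;
  finally
    F.Free;
  end;

  F := TFileStream.Create(ScratchFile, fmOpenRead or fmShareDenyWrite);
  try
    WriteLn('FILE   ', ReadCount, ' byte sequential reads in ',
      TimeByteReads(F), ' ms.');
    F.Position := 0;
    // Assumption: a 64 KB buffer; B does not own F, so F is freed below.
    B := TReadBufStream.Create(F, 64 * 1024);
    try
      WriteLn('BUFFER ', ReadCount, ' byte sequential reads in ',
        TimeByteReads(B), ' ms.');
    finally
      B.Free;
    end;
  finally
    F.Free;
  end;
  DeleteFile(ScratchFile);
end.
```

Every unbuffered ReadByte ends up in a separate operating system call, while the buffered stream only goes to the operating system when its buffer runs out.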
silvioprog
2016-09-03 22:23:58 UTC
On Sat, Sep 3, 2016 at 5:28 PM, José Mejuto <***@gmail.com> wrote:
[...]
... Typical example is TFileStream.GetByte which I'm using in a parser,
because the file could be very big, and I don't want to read everything in
a TMemoryStream, performance of cache/buffered and regular is dramatically
----------------------------------------------
CACHE 1000000 byte sequential reads in 46 ms.
FILE 1000000 byte sequential reads in 2200 ms.
----------------------------------------------
[...]

Did you get this result from some sample? If so, could you share it?! I
would be glad to test it checking the TBufferedFileStream performance. :-)
--
Silvio Clécio
José Mejuto
2016-09-03 23:20:58 UTC
Post by José Mejuto
----------------------------------------------
CACHE 1000000 byte sequential reads in 46 ms.
FILE 1000000 byte sequential reads in 2200 ms.
----------------------------------------------
[...]
Did you get this result from some sample? If so, could you share it?! I
would be glad to test it checking the TBufferedFileStream performance. :-)
Hello,

Yes, the code is in the test for TBufferedFileStream in the bug ticket

http://bugs.freepascal.org/view.php?id=30549

Delphi's TBufferedFileStream should be faster, as I think it implements a
dumb buffer, or rather a read-ahead buffer, so it works better for
sequential reads and worse for butterfly reads over the same zones.

For my own work I just use my class TCacheStream, which applies the same
cache code but over any stream already created. I need this as a stream
filter because I'm working on virtual file systems which access a stream
(whichever class) as a block device stream. For example, I can work with
ZIP files as if they were a disk, with functions to browse the ZIP
entries, create files, delete, read and write. Once each "file" is closed
and the stream that holds the whole file is freed, its destructor updates
the ZIP accordingly, compressing new and modified streams and
copying/moving the untouched blocks.

Part of this code is in the Excel reader/writer in the fpspreadsheet
package.

Most of these classes are beta versions, working in my environment but
not ready for wide use; that's the reason I have not published them
(along with the lack of comments and the convoluted code).

Currently I have, more or less:

Native filesystem: Read/Write. Maps the native filesystem to my virtual
filesystem.

ZIP filesystem: Read/Write. Up to 2GB zip files.

FAT16: Read. Typical dd images.

FAT32: Read. Typical dd images.

Microsoft Binary Compound: Read/Write with limitations. Used in XLS, DOC,...

Sample filesystem: Read/Write. Very limited sample filesystem.

MBR Partition: Read. Used to access dd images when partition information
is available.

RAR: Browse only. With a specially crafted DLL for Windows it can also
decompress, but it is a dirty hack.

ISO: Read only. Limited and based on GPL code.

The virtual file system allows mounting a filesystem (from the list above)
in a folder for browsing, opening, and so on. For example, if you open a
ZIP and find another ZIP inside it, you can mount the inner stream in a
folder and access the inner files transparently from your code, with a
path like

F:=VirtualLayerRoot.CreateStream('/zip1/zip2/myfile.txt',fmOpenRead);

I can publish the code, but I cannot guarantee that it works correctly
or that it will not wipe your whole hard disk :)
silvioprog
2016-09-04 00:04:19 UTC
Post by José Mejuto
----------------------------------------------
Post by José Mejuto
CACHE 1000000 byte sequential reads in 46 ms.
FILE 1000000 byte sequential reads in 2200 ms.
----------------------------------------------
[...]
Did you get this result from some sample? If so, could you share it?! I
would be glad to test it checking the TBufferedFileStream performance. :-)
Hello,
Yes, the code is in the test for TBufferedFileStream in the bug ticket
http://bugs.freepascal.org/view.php?id=30549
Great. I'm going to download it.



If I understood right, did you create something like GIO or GVFS? Or
neither of them hehe

Another question, does it work remotely too?

Post by José Mejuto
I can publish the code but I can not provide any guarantee of correct work
and that it will not delete all your hard disk :)
Lol! :-D
--
Silvio Clécio
José Mejuto
2016-09-04 00:39:55 UTC
Post by silvioprog
If I understood right, did you create something like GIO or GVFS? Or
neither of them hehe
Hello,

Yes and no :) It's something like GVFS, but at your program level only. I
originally developed it to be used in a forensic tool (which in the end
was never developed), to allow the same data scan engine to work on
compressed files as if they were real files. The file access logic is
completely isolated from the information gathering engine, as the latter
works over a TVirtualFileSystem; how you implement your
TVirtualFileSystem is up to you.

You basically need to implement these functions:

function intfOpenFile(const AFileName: UTF8String; const AMode: cardinal): TvlHandle; virtual; abstract;
function intfCloseFile(const Handle: TvlHandle): Boolean; virtual; abstract;
function intfFindList(const APath: UTF8String; const AMask: UTF8String): TVirtualLayer_FolderList; virtual; abstract;
function intfSeek(const AHandle: TvlHandle; const APosition: int64; const Origin: word): int64; virtual; abstract;
function intfRead(const Handle: TvlHandle; const Buffer: PBYTE; const Size: int64): int64; virtual; abstract;
function intfWrite(const Handle: TvlHandle; const Buffer: PBYTE; const Size: int64): int64; virtual; abstract;
function intfGetFileSize(const AHandle: TvlHandle): int64; virtual; abstract;
function intfSetFileSize(const AHandle: TvlHandle; const ANewFileSize: int64): Boolean; virtual; abstract;
function intfDeleteFile(const AFileName: UTF8String): boolean; virtual; abstract;
function intfGetFreeSpace(const APath: UTF8String): int64; virtual; abstract;
function intfIsWritableMedia(): Boolean; virtual; abstract;
function intfMakeFolder(const AFolder: UTF8String): Boolean; virtual; abstract;
function intfRemoveFolder(const AFolder: UTF8String): Boolean; virtual; abstract;
function intfCopy(const ASourceFileName,ATargetFileName: UTF8String): Boolean; virtual;
function intfMove(const ASourceFileName,ATargetFileName: UTF8String): Boolean; virtual;

And you get a working file system. Of course it is a basic file system,
without many functions and with some restrictions; for example, the whole
system works in case-sensitive mode regardless of what the original one
supports.
Post by silvioprog
Another question, does it work remotely too?
As long as you can implement those functions, yes. In fact I started
(almost nothing so far) to write an HTTP access layer that uses a remote
ZIP file in read-only mode and accesses it through HTTP byte ranges.

Maybe I should resume this work :) if there is some interest in the
community.

Attached is a sample (some folder names erased) of the explorer I use to
test the different file systems in "real" situations.


--
silvioprog
2016-09-04 01:58:52 UTC

Nice API.

Post by José Mejuto
As far as you can implement those functions yes, in fact I started (almost
nothing) to write an http access using a remote zip file for read only and
using the http byte ranges to access it.
I spent a lot of time working on an efficient byte-serving layer, because
the company needed to serve files with support for resuming uploads and
downloads. I did it with a raw TFileStream, but I need to spend some time
testing TBufferedFileStream. I think this class can be very useful for
creating a fast layer that accelerates downloads of big files (which
involve small, repetitive sequential reads), and I've already implemented
a structure that supports simultaneous byte range groups like the
following:

HTTP/1.1 206 Partial Content
Date: Wed, 15 Nov 1995 06:25:24 GMT
Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
Content-Length: 1741
Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES

--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 500-999/8000

...the first bytes range...
--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 7000-7999/8000

...the second bytes range
--THIS_STRING_SEPARATES--
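
To make the arithmetic of the part headers above concrete, here is a minimal sketch. The helper names ContentRange and RangeLength are hypothetical and are not taken from the byte-serving layer described in this message; the only assumption is that byte ranges are inclusive at both ends, as in the response above.

```pascal
program contentrange;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Builds the Content-Range header value for an inclusive byte range.
function ContentRange(First, Last, Total: Int64): string;
begin
  Result := Format('bytes %d-%d/%d', [First, Last, Total]);
end;

// Number of bytes carried by an inclusive range First..Last.
function RangeLength(First, Last: Int64): Int64;
begin
  Result := Last - First + 1;
end;

begin
  // The two ranges from the example response above.
  WriteLn('Content-Range: ', ContentRange(500, 999, 8000),
    ' (', RangeLength(500, 999), ' bytes)');
  WriteLn('Content-Range: ', ContentRange(7000, 7999, 8000),
    ' (', RangeLength(7000, 7999), ' bytes)');
end.
```

Because both ends are inclusive, the range 500-999 carries 500 bytes and 7000-7999 carries 1000 bytes.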

As soon as possible I'm going to redo my benchmarks, replacing TFileStream
with TBufferedFileStream, to check whether I really get any performance
gain.

Post by José Mejuto
Maybe I should retake this work :) if there are some interest in the
community.
Searching files under compressed files really seems like a very useful
feature.
Post by José Mejuto
Attached is a sample (some folder names erased) of the explorer I use to
test the different file systems in "real" situations.
I took a look at it, thanks! :-)

--
Silvio Clécio
silvioprog
2016-09-04 02:45:43 UTC
On Sat, Sep 3, 2016 at 10:58 PM, silvioprog <***@gmail.com> wrote:
[...]
... under compressed files ...
Sorry, I meant "within compressed files".
--
Silvio Clécio
Michael Van Canneyt
2016-09-04 05:15:38 UTC
And why did you not use the bufstream unit of FPC ?

The TBufStream stream implemented there works with any other stream, not just files.

Michael.
José Mejuto
2016-09-04 11:44:25 UTC
Post by Michael Van Canneyt
Post by José Mejuto
In the other hand a cache system is powerful than a buffered system if
you are writing something like a filesystem over a TFileStream where
you may need to jump here and there to read data, file allocation
tables, attributes, and so on, in this case a cache with multiple
pages instead just one improves the result (avoid multiple reads in
the same zones).
And why did you not use the bufstream unit of FPC ?
The TBufStream stream implemented there works with any other stream, not just files.
Hello,

In fact I wrote my code over TStream, but in Delphi the class inherits
from TFileStream, so I took my "TStreamFilter" and adapted it to inherit
from TFileStream just as Delphi does.

The second powerful reason is that I was not aware of TBufStream :)

Does TBufStream support Seek? And SetSize?

My code, like TBufStream, inherits from TStream. What is the advantage of
inheriting from TOwnerStream?
Michael Van Canneyt
2016-09-04 12:04:53 UTC
Post by José Mejuto
The second powerful reason is that I was not aware about TBufStream :)
It's even documented.

http://www.freepascal.org/docs-html/current/fcl/bufstream/index.html
Post by José Mejuto
TBufStream does support seek ? And SetSize ?
Seek: yes. SetSize: no.

The TBufStream class has 2 descendants: one for reading, one for writing.
This means the implementation is much simpler, and for most use cases
this is sufficient.
Post by José Mejuto
My code like TBufStream inherits from TStream. Which is the advantage of
inherit from TOwnerStream ?
TOwnerStream is only useful if you have a second stream which you use as a
source. It will free it for you. This is useful when chaining streams.
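
A minimal sketch of that ownership behaviour, assuming TReadBufStream from the bufstream unit (a TOwnerStream descendant) and its SourceOwner property; the variable names and the 'hello' payload are only illustrative:

```pascal
program ownerdemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, bufstream;

var
  Src: TMemoryStream;
  Buf: TReadBufStream;
  Msg, Got: AnsiString;
begin
  Msg := 'hello';

  // Source stream holding a few bytes.
  Src := TMemoryStream.Create;
  Src.WriteBuffer(Msg[1], Length(Msg));
  Src.Position := 0;

  // Chain a buffered reader on top of the source and hand over ownership.
  Buf := TReadBufStream.Create(Src);
  Buf.SourceOwner := True;   // Buf now frees Src in its destructor
  try
    SetLength(Got, Length(Msg));
    Buf.ReadBuffer(Got[1], Length(Got));
    WriteLn(Got);
  finally
    Buf.Free;                // also frees Src; do not free Src separately
  end;
end.
```

With SourceOwner left at its default of False, the caller would have to free Src itself after freeing Buf.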

I will add your implementation to the bufstream unit.
It offers more functionality, but works only on files.

It's - as usual - a tradeoff.

Michael.
José Mejuto
2016-09-04 12:29:07 UTC
Post by Michael Van Canneyt
Post by José Mejuto
The second powerful reason is that I was not aware about TBufStream :)
It's even documented.
http://www.freepascal.org/docs-html/current/fcl/bufstream/index.html
Hello,

Sure :) But I'm not aware about all the gems in fpc code :)
Post by Michael Van Canneyt
Post by José Mejuto
TBufStream does support seek ? And SetSize ?
Seek: yes. Setsize not.
I was asking after looking at the code:

function TWriteBufStream.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  if (Offset=0) and (Origin=soCurrent) then
    Result := FTotalPos
  else
    BufferError(SErrInvalidSeek);
end;

Maybe I'm missing something ?

On the SetSize side I think there is a missing check. SetSize can make
the file smaller, so think of a 1 KB file: you read the first byte, so
the buffer is filled with the 1 KB of data; now you SetSize to 1 byte
and read another byte. This read will report 1 byte read successfully,
even though it lies beyond the bounds of the file.
I know this is a terrible border case, and the solution could be to just
invalidate the buffer when SetSize is called.
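
The border case can be sketched with a deliberately tiny read buffer. TTinyBufReader below is a hypothetical class written only for this illustration (it is neither TBufStream nor the implementation in the bug entry); it keeps a single 1 KB buffer over a source stream and drops it whenever SetSize is called.

```pascal
program setsizedemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  // Hypothetical single-buffer reader, for illustration only.
  TTinyBufReader = class(TStream)
  private
    FSource: TStream;
    FBuf: array[0..1023] of Byte;
    FBufStart: Int64;   // source offset of FBuf[0]
    FBufLen: Integer;   // valid bytes in FBuf; 0 means "no valid data"
    FPos: Int64;        // logical read position
  protected
    procedure SetSize(const NewSize: Int64); override;
  public
    constructor Create(ASource: TStream);
    function Read(var Buffer; Count: Longint): Longint; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
  end;

constructor TTinyBufReader.Create(ASource: TStream);
begin
  inherited Create;
  FSource := ASource;
end;

function TTinyBufReader.Read(var Buffer; Count: Longint): Longint;
var
  Off: Integer;
begin
  Result := 0;
  if Count <= 0 then Exit;
  // Refill when the position falls outside the buffered window.
  if (FBufLen = 0) or (FPos < FBufStart) or (FPos >= FBufStart + FBufLen) then
  begin
    FSource.Position := FPos;
    FBufStart := FPos;
    FBufLen := FSource.Read(FBuf, SizeOf(FBuf));
    if FBufLen <= 0 then
    begin
      FBufLen := 0;
      Exit;             // nothing left to read at this position
    end;
  end;
  Off := Integer(FPos - FBufStart);
  Result := Count;
  if Result > FBufLen - Off then
    Result := FBufLen - Off;
  Move(FBuf[Off], Buffer, Result);
  Inc(FPos, Result);
end;

function TTinyBufReader.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  case Origin of
    soBeginning: FPos := Offset;
    soCurrent:   FPos := FPos + Offset;
    soEnd:       FPos := FSource.Size + Offset;
  end;
  Result := FPos;
end;

procedure TTinyBufReader.SetSize(const NewSize: Int64);
begin
  FSource.Size := NewSize;
  FBufLen := 0;   // invalidate: buffered bytes may now lie past the end
end;

var
  M: TMemoryStream;
  R: TTinyBufReader;
  B: Byte;
begin
  M := TMemoryStream.Create;
  R := TTinyBufReader.Create(M);
  try
    M.Size := 1024;   // the 1 KB "file"
    R.Read(B, 1);     // first byte read; the buffer now holds the whole 1 KB
    R.Size := 1;      // shrink to a single byte
    WriteLn('bytes read after shrink: ', R.Read(B, 1));
  finally
    R.Free;
    M.Free;
  end;
end.
```

With the invalidation in place the second read has to go back to the source and should report 0 bytes; if the `FBufLen := 0` line is removed, the read is served from the stale buffer and reports 1 byte, which is the behaviour described above.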
Post by Michael Van Canneyt
The TBufStream class has 2 descendents: one for reading, one for writing.
This means the implementation is much more simple, and for most
usecases, this is sufficient.
That's more or less what I was using in other programs, with my own
class (I should read more docs).
Post by Michael Van Canneyt
Post by José Mejuto
My code like TBufStream inherits from TStream. Which is the advantage
of inherit from TOwnerStream ?
TOwnerStream is only useful if you have a second stream which you use as a
source. It will free it for you. This is useful when chaining streams.
Oh! I see, I'm using the typical Create(Stream,Owned=true)
Post by Michael Van Canneyt
I will add your implementation to the bufstream unit. It offers more
functionality, but works only on files.
It's - as usual - a tradeoff.
Not exactly. It is possible to add my original implementation,
TCacheStream, which works over any TStream, and create a
TBufferedFileStream (for Delphi compatibility) inheriting from it, but
it will not have the same inheritance chain.

What is best for FPC? A generic stream cache, an inheritance-compatible
TBufferedFileStream? Or maybe both? Or none? :)
Michael Van Canneyt
2016-09-04 12:51:43 UTC
Post by José Mejuto
What's the best for fpc ? A generic stream cache, a inheritance compatible
TBufferedFileStream ? Or maybe both ? Or None ? :)
Generic stream cache.
If you can rework your implementation to replace TBufStream, that would be absolutely superb!

We can keep the 2 descendants for backwards compatibility.

Michael.
