JIT compiler efficiency

tinycc-devel mailing list
Hi TinyCC developers,

Firstly I want to thank you guys for maintaining this amazing compiler that I have just discovered.

My question is:

When using libtcc as a JIT compiler, I would like to know whether TCC already has some mechanism to reuse the results of processing the include files, libtcc.dll, and the *.def and *.a files.

The reason I ask is that I want to use TCC as a JIT compiler for dynamically generated C code, compiling many times in one process, and I want to avoid redoing the I/O and processing of the header files, static/dynamic libraries, etc. on every compilation.

Can you please confirm whether there is some type of reuse/caching in the JIT compiler to handle this scenario? If yes, where in the code is this done? If not, how would I go about addressing this?

Thank you
Faisal


_______________________________________________
Tinycc-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: JIT compiler efficiency

Joshua Scholar
I've only been playing with libtcc for a week, so I don't have all the answers, but I am interested in a similar use.  You might be interested in the questions I've asked and the answers I got.

My impressions so far:
1) tcc is a C compiler, and it doesn't have any features added to make it suitable as a JIT other than
a) the ability to compile code that's in memory to a buffer that's in memory
b) the ability to supply the addresses of external symbols to the compiler
c) the ability to retrieve the addresses of symbols that the compiler generated.

Just the simplest things have been done. 
There's no support for adding new code to a system (other than by making a new, unrelated state and supplying addresses from the previous compilations to it).

You have to make a new state for each time you call the compiler.  You can't delete the old states or the code from them will be unusable.  You can't reuse the symbol tables from them.

Every time you compile, a few things from the runtime system are duplicated. Bits of the runtime system are linked again, taking more memory. I'm told that if you make no calls into the runtime system then some of that code won't be linked, but that probably means very basic things would be missing.

I've been told that at least some of the runtime comes from the enclosing program, for instance the heap.  Thank God.

Obviously it has to load the headers and libraries that aren't part of libtcc every compilation - luckily the compiler is super fast.

I've been told that the above flaws wouldn't be too hard to fix, but we'd have to dig through the code and fix them ourselves. 

I HAVE made a special version that keeps the include and library directories embedded in the runtime, so it doesn't have to read from the disk to use them.  It works, but it's not complete and it's not hosted anywhere.  If I keep working on it, I'll fork the source on GitHub.

My current version has design decisions that I don't like, and I'm busy remedying them.  The current version starts with a zip file of the directories you want embedded, turns that into source code that's a byte array, then compiles that in.
The runtime links in zlib and minizip to decompress that data when a file is included or linked. It all works, but:

1) it adds way too much complexity (zlib and minizip) to a project that's supposed to be cross-platform

2) it slows down compilation by
a) having to decompress the data
b) requiring a critical section on the minizip/zlib code
c) allocating memory for each file as you use it and deallocating it when done
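
For anyone following along, the "turn it into source code that's a byte array" step can be sketched roughly as below. This is only an illustration, not the code from my tree: `emit_byte_array` and its parameters are hypothetical names, and it assumes the data to embed is already in a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of tcc): writes `data` to `out` as a C
   array definition named `symbol`, followed by a length constant named
   `<symbol>_len`. Returns 0 on success, -1 if writing failed. */
int emit_byte_array(FILE *out, const char *symbol,
                    const unsigned char *data, size_t len)
{
    size_t i;

    assert(out != NULL && symbol != NULL);
    /* An empty initializer list is not valid before C23, so require data. */
    assert(data != NULL && len > 0);

    fprintf(out, "const unsigned char %s[] = {", symbol);
    for (i = 0; i < len; i++) {
        /* Start a new line every 12 bytes to keep the output readable. */
        fputs(i % 12 == 0 ? "\n    " : " ", out);
        fprintf(out, "0x%02x,", data[i]);
    }
    fprintf(out, "\n};\nconst unsigned long %s_len = %luUL;\n",
            symbol, (unsigned long)len);

    return ferror(out) ? -1 : 0;
}
```

The generated file can then be compiled into the library like any other source file, which is all "compiles that in" means here.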

I'll soon be done with a new version without these problems.
1) it doesn't need zlib or minizip; instead it uses a simple program that I wrote in C that can be part of the project, one that tcc itself can compile too, of course.
2) the data is stored in memory uncompressed, so that opening a file is nothing more than finding a pointer to memory that's already there.  No decompressing, no locks, no memory allocation and deallocation.
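
A minimal sketch of what "opening a file is just finding a pointer" could look like, assuming a generated table sorted by path. Everything here is hypothetical (`struct embedded_file`, `embedded_find`, the two sample entries), and it uses a plain binary search rather than a hash lookup, purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One embedded file: a path plus a pointer to data already in memory. */
struct embedded_file {
    const char *path;
    const unsigned char *data;
    size_t size;
};

/* Illustrative contents only; a real table would be generated. */
static const unsigned char stdbool_h[] = "#define bool _Bool\n";
static const unsigned char stddef_h[] = "typedef unsigned long size_t;\n";

/* Must stay sorted by path, because the lookup is a binary search. */
static const struct embedded_file embedded_files[] = {
    { "include/stdbool.h", stdbool_h, sizeof stdbool_h - 1 },
    { "include/stddef.h",  stddef_h,  sizeof stddef_h - 1 },
};

/* Returns the entry for `path`, or NULL if it is not embedded.
   No allocation, no locking, no decompression: the caller just reads
   from entry->data for entry->size bytes. */
const struct embedded_file *embedded_find(const char *path)
{
    size_t lo = 0;
    size_t hi = sizeof embedded_files / sizeof embedded_files[0];

    assert(path != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(path, embedded_files[mid].path);
        if (cmp == 0)
            return &embedded_files[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

Since the table is const and never modified, several compilations can read from it at the same time without a critical section.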

So I've already done some work to make libtcc more friendly - creating internal assets instead of requiring a large directory structure on a user's machine.

But all said, I'm beginning to think that tcc won't meet the requirements of my project.
When I was testing a hash function for this new code, I noticed that Spooky Hash compiled with tcc runs at 1/10th the speed it does when compiled with gcc.  A 10 to 1 slowdown is a much bigger factor than I anticipated.
On the other hand, it may be that this was an unusually optimizable loop and tcc does much better than that on average.
One sign of that is that if I use tcc to compile tcc, then use the version it created to do the process again, the compile time is still fast.
I haven't measured it, but it doesn't seem much more than 2 times slower than a tcc built by the Microsoft compiler.

For my own project, while I could probably add the features I said TCC was missing above, I doubt my ability to add an optimization phase to TCC.  So if I stick with TCC at all, it will only be because I'm having fun adding things to the TCC project, not because it will get me something I can stick with.  For that, I can't find anything that has enough features other than LLVM, and I don't think JIT support in LLVM is mature.  It's being used, but the component isn't stable and the new versions aren't compatible with the old ones, so the new one might not even be used anywhere yet.

Joshua Scholar

On Sun, Dec 27, 2020 at 1:23 AM fm663-subs--- via Tinycc-devel <[hidden email]> wrote:
