Author Topic: Some guy working since 1987 SPEAKS OUT!!!!!!  (Read 1079 times)
Offline (Unknown gender) justyellowboy
Posted on: August 20, 2010, 07:24:48 PM

Fuck those two words.

http://www.drdobbs.com/blog/archives/2010/08/c_compilation_s.html;jsessionid=QVLRWW0F0X0PBQE1GHRSKHWATMY32JVN

But anyway, this guy has been coding compilers since 1987. Since ENIGMA is a compiler with (debatably) slow compile times (I don't care too much about it; at least there's only one "compilation window." Now somebody find a way to make the game show up as a window in my task bar, or I will surely do it myself someday!), maybe this can shed some light on the culprits behind the waiting time.

The more reasons we keep in mind, the less confusion, and the easier it will be to stomp any issues of our own before moving on to these (if we can progressively do this; and why am I talking about "we"? I mean "you, until I get up and pin this language and compilers down." There).
Offline (Male) Rusky
Reply #1 Posted on: August 20, 2010, 07:38:16 PM

Resident Troll
ENIGMA suffers from all the problems in that article, with the added awesomeness of yet another pass to translate GML constructs to C++ before jumping into the actual compilation. Unless Josh somehow changes his mind about EDL being GML + C++, those problems are unavoidable.
Offline (Male) Josh @ Dreamland
Reply #2 Posted on: August 20, 2010, 09:38:41 PM

Prince of all Goldfish
Developer
Quote
1. Trigraph and Universal character name conversion.
2. Backslash line splicing.
3. Conversion to preprocessing tokens. The Standard notes this is context dependent.
4. Preprocessing directives executed, macros expanded, #include's read and run through phases 1..4.
5. Conversion of source characters inside char and string literals to the execution character set.
6. String literal concatenation.
7. Conversion of preprocessing tokens to C++ tokens.

My parser does the first six in one pass. The seventh is unnecessary for my purposes. Also, my compiler skips over the code inside implemented functions and is not precise in its template instantiations. As a result, I can parse a whole lot of libraries (the entire POSIX C library and the C++ standard template library, in addition to ENIGMA's own headers) in under a third of a second on my machine. (The libraries in question: sys/signal.h, cpio.h, dirent.h, fcntl.h, grp.h, pthread.h, pwd.h, sys/ipc.h, sys/msg.h, sys/sem.h, sys/stat.h, sys/time.h, sys/types.h, sys/utsname.h, sys/wait.h, tar.h, termios.h, unistd.h, and utime.h; plus map, list, stack, string, vector, limits, iostream, fstream, cstdlib, cstdio, and cmath.)
I can't control how the GCC does its passes, though, and so yes, I inherit its problems, to which that list applies.

I have, however, done my best to make ENIGMA modular. The vast majority of ENIGMA is partitioned into independent modules that only need to be recompiled when updated.

Someone told me I should roll my own compiler, too (not necessarily for C++, but for EDL). I could probably modify my parser to pay closer attention to templates and to tree up the code as well as the declarations, without doubling the time taken. However, I don't have what it takes to assemble and optimize all of that code myself. I believe I could do it faster than the GCC; I have no idea whether I could do it faster than Clang. But I could not do as good a job on the output as either.

Item number 5 on that man's list, though, is just the kind of thing my parser fixes.
Quote
5. The meaning of every semantic and syntactic (not just lexical) construct depends on the totality of the source text that precedes it. Nothing is context independent. There's no way to correctly preparse, or even lex, a file without looking at the #include file contents. Headers can mean different things the second time they are #include'd (and in fact, there are headers that take advantage of this).
The actual function code in the headers is never necessary for that assessment; the very point of my parser is to fix that. ENIGMA reads the headers up front, extracting only type and parameter information, in order to emulate GM's speedy syntax check. The method is rough at present, but the parsers are still quite fast. They will probably increase in precision over time, as it becomes necessary for me to do accurate checks on function parameter casts.

Anyway, the point is, I have recognized these problems for a while and have done my best to deal with them. We will always have dependencies on the GCC (or some other competent compiler; I've been looking into ways to allow compiler trade-out for anything that supports the GNU extensions I use, which presently may just be lrint). It falls on them to implement fixes for the obvious problems.
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Unknown gender) luiscubal
Reply #3 Posted on: September 17, 2010, 12:28:55 PM
No matter how well Josh optimizes his parser, the output is still just passed to GCC.
Even though GCC allows stuff like trigraphs to be disabled (and since Josh says his parser already deals with trigraphs, I'm guessing GCC already skips that phase), some phases are necessary and impossible to avoid.

I have reason to believe that the bottleneck of ENIGMA compilation times is GCC.

Clang is pretty incredible when it comes to compile times, although in most cases it is still slower than GCC when it comes to runtime speed (well, at least LLVM-GCC is).

If you ever decided to roll your own compiler for EDL, and that parser left out C++ (no #include, no templates, and so on), then you could probably get better compile times than Clang. Runtime speed should be comparable if you also used LLVM, which greatly simplifies code generation and optimization.
If you decided to roll your own compiler for EDL and include all C++ features, then I don't think you'd have good odds of beating Clang.
Offline (Male) Josh @ Dreamland
Reply #4 Posted on: September 17, 2010, 06:16:15 PM

Prince of all Goldfish
Developer
Aware. The compilation speed issues definitely fall on GCC, and such is indeed out of my power to fix. It doesn't help that I don't ship ENIGMA with precompiled object files. That much needs to be improved, and will need to be very shortly as people start compiling for more platforms. I need to keep a folder of objects for each device, at the very least.

Eventually I will need to invest in precompiled headers, as well.

For now, though, I've tried to use few GNU extensions that would prevent faster compilers from being incorporated. Right now, the only function that comes to mind is lrint(), but compiler support for that seems to be good. Serp used some GNU #defines in his #ifs in various functions, which should only fail to optimize correctly on faster compilers.

Ultimately, though, GCC must do the final compilation of all projects. Despite its slower compile times, it has the widest support among those who work toward cross-compilation, and is quite consistent between platforms in its #defines and its type sizes. It is by that saving grace that I managed to compile all the audio codecs natively without a huge recode.

Anyway, something about tacos.