1. Trigraph replacement and universal-character-name conversion.
2. Backslash line splicing.
3. Conversion to preprocessing tokens. The Standard notes this is context dependent.
4. Preprocessing directives executed, macros expanded, #included files read and run through phases 1-4.
5. Conversion of source characters inside char and string literals to the execution character set.
6. String literal concatenation.
7. Conversion of preprocessing tokens to C++ tokens.
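To put the early phases in concrete terms, here's a small fragment annotated with what happens to it. (My own illustration, not lifted from the Standard; note that GCC only honors trigraphs under -trigraphs or a strict -std mode.)

    // Phase 1: the trigraph ??= is replaced with # before anything else runs
    ??=define GREETING "Hello, "

    // Phase 2: the backslash-newline splice makes these two physical
    // lines one logical line before tokenization ever happens
    #define SUBJECT \
        "world"

    // Phase 4 expands both macros; phase 6 then concatenates the two
    // adjacent string literals into the single literal "Hello, world"
    const char *message = GREETING SUBJECT;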
My parser does the first six in one pass; the seventh is unnecessary for my purposes. Also, my parser skips over the code inside implemented functions and is not precise in its template instantiations. As a result, I can parse a whole lot of headers (a good slice of the C POSIX library and the C++ standard library, in addition to ENIGMA's own headers) in under a third of a second on my machine. (The headers in question: sys/signal.h, cpio.h, dirent.h, fcntl.h, grp.h, pthread.h, pwd.h, sys/ipc.h, sys/msg.h, sys/sem.h, sys/stat.h, sys/time.h, sys/types.h, sys/utsname.h, sys/wait.h, tar.h, termios.h, unistd.h, utime.h; and map, list, stack, string, vector, limits, iostream, fstream, cstdlib, cstdio, and cmath.)
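The body-skipping itself is cheap; a minimal sketch of the idea (a hypothetical helper, not ENIGMA's actual code) is just brace matching that refuses to count braces inside string literals:

    #include <cstddef>
    #include <string>

    // Skip a function body: pos must index the opening '{'; the return
    // value indexes just past the matching '}'. A real parser must also
    // skip character literals and comments so braces inside them don't
    // skew the count; this sketch handles string literals only.
    std::size_t skip_body(const std::string &src, std::size_t pos) {
        int depth = 0;
        while (pos < src.size()) {
            char c = src[pos];
            if (c == '"') {                       // ignore braces inside strings
                while (++pos < src.size() && src[pos] != '"')
                    if (src[pos] == '\\') ++pos;  // jump over escape sequences
            } else if (c == '{') {
                ++depth;
            } else if (c == '}' && --depth == 0) {
                return pos + 1;                   // found the matching brace
            }
            ++pos;
        }
        return pos;  // unbalanced braces: ran off the end of the source
    }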
I can't control how GCC does its passes, though, so yes, I inherit its problems; that list applies to me as well.
I have, however, done my best to make ENIGMA modular. The vast majority of ENIGMA is partitioned into independent modules that only need to be recompiled when updated.
Someone told me I should roll my own compiler, too (not necessarily for C++, but for EDL). I could probably modify my parser to pay closer attention to templates and to build a tree of the code as well as the declarations, without doubling the time taken. However, I don't have what it takes to assemble and optimize all of that code myself. I believe I could do it faster than GCC; I have no idea whether I could do it faster than Clang. But I could not do as good a job on the output as either.
Item number 5 on that man's list, though, is just the kind of thing my parser fixes.
5. The meaning of every semantic and syntactic (not just lexical) construct depends on the totality of the source text that precedes it. Nothing is context independent. There's no way to correctly preparse, or even lex, a file without looking at the #include file contents. Headers can mean different things the second time they are #include'd (and in fact, there are headers that take advantage of this).
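He's right about headers changing meaning on re-inclusion; <cassert> is the canonical example, since the Standard requires assert() to be redefined at each inclusion according to whether NDEBUG is defined at that point:

    // <cassert> is explicitly re-includable: each inclusion redefines
    // assert() based on whether NDEBUG is currently defined.
    #undef NDEBUG
    #include <cassert>
    void checked(int x)   { assert(x > 0); }  // assert() is live here

    #define NDEBUG
    #include <cassert>
    void unchecked(int x) { assert(x > 0); }  // assert() expands to nothing here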
The actual function code in the headers is never necessary for that assessment; the very point of my parser is to exploit that. ENIGMA reads the headers up front, extracting only type and parameter information, in order to emulate GM's speedy syntax check. The method is rough at present, but the parsers are still quite fast, and they will probably grow more precise over time as I come to need accurate checks on function parameter casts.
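To give an idea of how little that means storing, per-function records along these lines (hypothetical names, not ENIGMA's real structures) are enough to check a call site's argument count instantly:

    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical record: just enough to syntax-check a call site
    // without ever having parsed the function's body.
    struct FunctionSig {
        std::string return_type;            // e.g. "double"
        std::vector<std::string> params;    // parameter type names
        std::size_t min_args, max_args;     // range allowed by default arguments
    };

    // One table built up front from the headers; a call like sqrt(x, y)
    // can then be rejected immediately by comparing argument counts.
    std::map<std::string, FunctionSig> signatures;

    bool check_call(const std::string &name, std::size_t argc) {
        auto it = signatures.find(name);
        if (it == signatures.end()) return false;  // unknown function
        return argc >= it->second.min_args && argc <= it->second.max_args;
    }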
Anyway, the point is, I have recognized these problems for a while and have done my best to deal with them. We will always have a dependency on GCC (or some other competent compiler; I've been looking into ways to allow swapping the compiler out for anything that supports the GNU extensions I use, which at present may be nothing more than lrint). It falls to them to implement fixes for the obvious problems.