ENIGMA Forums

General fluff => Announcements => Topic started by: Josh @ Dreamland on November 03, 2011, 12:46:26 pm

Title: We can't decide
Post by: Josh @ Dreamland on November 03, 2011, 12:46:26 pm
The ENIGMA project has hit a snag recently. Basically, our collective ideologies have become tangled.
It looks something like this:

(http://dl.dropbox.com/u/1052740/Reaction/clusterfsck/tangle.jpg).

Ism and I are both sorting out issues with our projects. Ism is dealing with Java's ill-equipped generics, trying to structure LGM to be more extensible for future releases. As far as you need to be concerned, this is so we can add our Definitions and Overworld resources, as well as other potential resources down the road.

As for myself, right now I am dealing with a small issue regarding a dynamic type that implements implicit accessors and setters for generic data, which has presented a number of "how should I"s, but in the big picture, my concern is for the progression of ENIGMA's parser.

One point is certain: I need to rewrite or do serious work on the C parser. ENIGMA is presently ill-equipped to make distinctions between the following sample lines:

(id)
(int)
(show_message)
(100001)

This has led to issues in interpreting some GM6 code unless extraneous parentheses are added, which is unacceptable to newcomers, as well as to issues implementing some of C++'s more desirable features, such as complicated ternary expressions. This is because ENIGMA needs information from C.

The purpose of the C++ parser—be it the current one, a rewrite, or Clang—is to provide ENIGMA's EDL parser with information about available types, functions, and other constructs. For example, ENIGMA would not know what var a; meant if the C++ parser couldn't understand var's header, and it would not understand what show_message was if it could not read function definitions. The parser also needs to be able to resolve complicated types for further error checking. Hence, we need some mechanism of parsing C++ sources accurately enough for ENIGMA to produce correct C++ code.
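
To make the dependency concrete, here is a minimal C++ sketch of how the token inside a pair of parentheses might be classified once the names of types and functions are known. The Definitions struct, the ParenKind enum, and classify() are hypothetical names made up for illustration, not ENIGMA's actual code; the point is only that (id), (int), (show_message), and (100001) cannot be told apart without lookup tables filled from the C++ headers.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical categories for the single token found inside "( ... )".
enum class ParenKind { Variable, Type, Function, Literal };

// Hypothetical lookup tables; in practice these would be filled by
// whatever parses the C++ headers (custom parser or Clang).
struct Definitions {
    std::set<std::string> types;
    std::set<std::string> functions;
};

// Decide what a parenthesized token is, given the known definitions.
ParenKind classify(const std::string& tok, const Definitions& defs) {
    assert(!tok.empty() && "caller must pass a non-empty token");
    if (std::isdigit(static_cast<unsigned char>(tok[0])))
        return ParenKind::Literal;            // e.g. (100001)
    if (defs.types.count(tok))
        return ParenKind::Type;               // e.g. (int): probably a cast
    if (defs.functions.count(tok))
        return ParenKind::Function;           // e.g. (show_message)
    return ParenKind::Variable;               // e.g. (id): assumed to be a variable
}
```

With "int" registered as a type and "show_message" as a function, the four sample lines come back as Variable, Type, Function, and Literal respectively. Without the tables, all three identifiers look identical.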

I am faced with two options which I have filtered out as the best.


That said, I am torn between the two options. I need people to say, "50 MB and a potential shitton of problems that aren't yours, in exchange for four languages? That sounds worth it to me!" Or to say, "for that price, just write your own."




TL;DR version:

(1) Custom
- Tiny (less than 1 MB); fast, pointed runtime.
- Gives precisely the needed information, no more, no less.
- Likely to be sole maintainer, responsible for all aspects including any potential errors. This would be no different from now. At worst, it could mean a second recode in the future, but ideally I would make the code sufficiently extensible to prevent that this time.

(2) Clang
- 50 MB; parses EVERYTHING, though quickly.
- Gives general information that can likely be used to meet all of ENIGMA's purposes.
- Supports interfacing with other languages (Lua, Python, JavaScript) at the cost of hundreds of megabytes on top of Clang.
- Maintenance involves separating Clang from LLVM as cleanly as possible every time an update is made; any parse errors are not the responsibility of the ENIGMA team, and may or may not be dealt with in a timely manner. Potentially, we'd be facing another MinGW fiasco. (See #13297 (http://sourceware.org/bugzilla/show_bug.cgi?id=13297))



Additional Q/A:
dazappa: Clang is a "frontend to LLVM." So you would use Clang but not LLVM? And what's the final size decision, 1gb or 50mb?
Josh@Dreamland: Well, clang has LLVM dependencies, so I would be cutting LLVM into little pieces and throwing away the ones I don't need. 50MB is the projected size after I throw away the little pieces.

dazappa: Would you rather maintain 50mb of shit that you don't know, or 1mb of shit you do?
Josh@Dreamland: Good question. Ideally, for Clang, the maintenance would just be updating Clang headers, adding any more pieces of LLVM that become necessary, and making sure it compiles as though the configure script had run. Maintaining a megabyte of C++ parser can be just as difficult, if not more so, as my responsibility extends beyond simply making sure it compiles.

dazappa: Well, you failed to write the C++ parser happily the first time, so you think you'd be able to do it better the second time? Using clang might save you time if you can get it set up and be able to easily update the headers like you think.
Josh@Dreamland: I do think I'd be able to do better the second time. As you can see, the current version succeeds for the most part, with one warning it throws three times. If I code the second version knowing everything I do from the first, and with all that shit in mind, I should be able to get it to play nice.

dazappa: ...Until you realize you have to rewrite it for a 3rd time.
Josh@Dreamland: I'm going to use a system very similar to the recursive descent scheme Rusky talks about. Basically, it would use the body of the current C parser, which invokes a function to handle the token in the context of each statement. Instead of calling this big mash-up switch statement that makes a hundred if checks, the parser would call one function based on the current context and pass it the token, and that function would deal with it appropriately. This means adding a new type of statement to the list would be pretty easy.
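
A rough C++ sketch of that dispatch idea, under heavy assumptions: the Context enum, the Parser struct, and the three handler functions are invented for illustration and are far simpler than anything a real C parser needs. It is not a full recursive descent parser; it only shows one handler being selected per context instead of one giant switch, so that adding a statement type means adding a handler and registering it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical parse contexts; a real parser would have many more.
enum class Context { Statement, Declaration, Expression };

struct Parser;
using Handler = void (*)(Parser&, const std::string&);

struct Parser {
    Context context = Context::Statement;
    std::vector<std::string> log;         // records what each handler did
    std::map<Context, Handler> handlers;  // exactly one handler per context

    // Pass the token to whichever handler owns the current context.
    void feed(const std::string& token) {
        auto it = handlers.find(context);
        assert(it != handlers.end() && "every context needs a handler");
        it->second(*this, token);
    }
};

void handle_statement(Parser& p, const std::string& tok) {
    if (tok == "int" || tok == "var") {   // toy rule: these two start a declaration
        p.context = Context::Declaration;
        p.log.push_back("begin declaration: " + tok);
    } else {
        p.context = Context::Expression;
        p.log.push_back("begin expression: " + tok);
    }
}

void handle_declaration(Parser& p, const std::string& tok) {
    if (tok == ";") {
        p.context = Context::Statement;
        p.log.push_back("end declaration");
    } else {
        p.log.push_back("declare: " + tok);
    }
}

void handle_expression(Parser& p, const std::string& tok) {
    if (tok == ";") {
        p.context = Context::Statement;
        p.log.push_back("end expression");
    } else {
        p.log.push_back("operand: " + tok);
    }
}

// Register the handlers; a new statement type is one more line here.
Parser make_parser() {
    Parser p;
    p.handlers[Context::Statement] = handle_statement;
    p.handlers[Context::Declaration] = handle_declaration;
    p.handlers[Context::Expression] = handle_expression;
    return p;
}
```

Feeding the tokens int, x, ; and then y, ; walks the parser through a declaration and an expression and leaves it back in the Statement context.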

Go to town.
Title: Re: We can't decide
Post by: luiscubal on November 03, 2011, 01:12:36 pm
Quote
MinGW LD/MSys
The horror!

Quote
interpreter for C++ which, depending on its speed, could mean a native method of doing execute_string()
Highly unlikely. C++ takes some time to parse. Depending on the exact implementation (e.g. which headers are imported and how they are used), it could take several seconds just to get the code compiled. We're talking about time BEFORE the code in execute_string even STARTS running.

My suggestion is something along the lines of extern "C".
Decide which features of C++ you want in EDL. C++ includes a lot of junk which you may want to ignore.
Then decide a good interface to have C++ <-> EDL interop. So people can write code directly in C++ *without* any EDL improvements, or code in EDL with limited C++ features. So effectively two compilers, but one of those is already developed so you wouldn't need to worry about it.

Example:

myfile.cpp
Code: [Select]
#include <iostream>
#include "enigma.h"

using std::cout;
using std::endl;

ENIGMA_EXPORT void my_cpp_func(int value) {
   cout << value << endl;
}

interface.eci
Code: [Select]
foreign(c) function "exit" void(int : on_invalid(replace 123) );
foreign(cpp) function "my_cpp_func" void(int : on_invalid(abort) ) ; //exceptions also acceptable

hello.edl
Code: [Select]
var x = 3;
int y = 4;
my_cpp_func(x);
my_cpp_func(y);
my_cpp_func("Hello"); //Compile-time error
string z = "Hello";
my_cpp_func(z); //Run-time error
exit("Hello"); //exit with error code 123

ECI files would essentially be EDL, just like H files are C++.

In the end, link the whole thing together using whichever linker is preferred (either MinGW ld or LLVM ld).
Also, the syntax here is merely an example.
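
For illustration only, the on_invalid policies in the ECI example could boil down to small conversion helpers that generated glue code calls before invoking the foreign function. The sketch below assumes a hypothetical Variant type standing in for EDL's var; none of these names exist in ENIGMA, and the abort policy is modeled as an exception, which the example above notes is also acceptable.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for EDL's dynamically typed var.
struct Variant {
    bool is_string;
    double number;
    std::string text;

    static Variant from_number(double n) { return {false, n, ""}; }
    static Variant from_string(std::string s) { return {true, 0.0, std::move(s)}; }
};

// on_invalid(replace N): a non-numeric argument is swapped for a fixed value.
int to_int_or_replace(const Variant& v, int replacement) {
    return v.is_string ? replacement : static_cast<int>(v.number);
}

// on_invalid(abort): modeled here as throwing instead of terminating.
int to_int_or_throw(const Variant& v) {
    if (v.is_string)
        throw std::invalid_argument("expected a numeric argument");
    return static_cast<int>(v.number);
}
```

Under these assumptions, exit("Hello") would go through to_int_or_replace(..., 123) and end up calling exit(123), while my_cpp_func(z) with a string z would throw at run time before the C++ function is ever reached.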
Title: Re: We can't decide
Post by: Josh @ Dreamland on November 03, 2011, 01:54:17 pm
I'm still personally leaning toward V8 for execute_string. It was just another option worth mentioning. LLVM brings a lot to the table. Too much, really, is the problem.
Title: Re: We can't decide
Post by: Fede-lasse on November 03, 2011, 05:16:44 pm
I don't mind 50 MB, as long as you have some fast servers so the download won't take forever.

I do mind 1 GB, though.
Title: Re: We can't decide
Post by: Josh @ Dreamland on November 03, 2011, 05:21:03 pm
We could do without nearly the entirety of LLVM. It'd be an option for users who want neat stuff.
Clang, plus my estimate of its LLVM dependencies, is only about 50 MB, I believe.
Title: Re: We can't decide
Post by: Rusky on November 03, 2011, 08:46:10 pm
luiscubal is exaggerating the performance impact on execute_string- all the header files, etc. could be parsed once on startup (which is pretty much unnoticeable, especially with clang), and the project Josh was referring to is Cling (http://root.cern.ch/drupal/content/cling), which is designed to be a sort of REPL for C++ so it would be well-suited to execute_string. It probably wouldn't be any faster than GM's, but most uses of execute_string can be eliminated with new functions and will be in a standard (heh) way in GM9.

---

The biggest problem here is due to the fact that ENIGMA is distributed primarily through Subversion in source form. This does generally tend to keep it compatible with all the systems it gets used on, but it also requires that everyone download the libraries and even some toolchain stuff for platforms they don't necessarily use. Isolating all the platform-specific stuff to an installer/package outside of the repository would solve the Subversion problem as 1) various distros' package managers can handle dependencies and 2) developers can be expected to download things outside of the repository.

I see two options beyond that:

Continue with a C++ Compiler

The size requirement on Windows is large no matter what process is used to compile C++- MinGW packages binutils and libstdc++ (which is part of the g++ package <_<) are required pretty much no matter what. Supporting C++ directly, as with Definitions, requires a large compiler- either GCC or Clang. Clang's appeal is due to the fact that it provides a reusable C++ parser to replace ENIGMA's.

A Windows ENIGMA installer would include the 77MB combination of mingw-get and Clang, then run "mingw-get install g++ msys-bash msys-make", leading to a 26MB download and an installed size of 229MB. Without Clang, that would drop 76MB from the installer (to 153MB installed size) but add a considerable maintenance component to ENIGMA: a complete C++ parser separate from, and thus probably subtly incompatible with, the one used to compile the games.

This system would require minimal change to ENIGMA's current infrastructure, with at worst a new C++ parser to go with GCC and at best just some calls to Clang's libraries. It would be able to call all the existing C++ standard library and other libraries with minimal work from end users. It would be relatively simple to debug games with existing debuggers.

Write Your Own LLVM Frontend

Running with a custom recursive descent parser, ENIGMA would have enough information to generate code for EDL directly using LLVM. LLVM is fully capable of generating object files, but requires a system-specific linker to turn them into an executable. This would be essentially replacing both GCC and Clang with an EDL compiler, and libc/libstdc++ with an ENIGMA library with the appropriate libraries statically linked.

An EDL compiler with the required LLVM libraries linked would pull in about 19MB (about 2MB of that is x86-specific; ARM is not included but is also about 2MB), and mingw-get is only 453KB. The Windows installer in this case would download 1.79MB for binutils, and unpack it into 22MB. In total, the installed size would probably be 50-55MB, compared to the full-blown compiler installation which is anywhere from 150-250MB.

This smaller package would use only a single parser and code generator, but would still require some kind of intermediary between the EDL and the ENIGMA library, either through a combination of extern "C"/name mangling and better language features in EDL, or something like what luiscubal described, which may or may not be generated automatically at ENIGMA-build-time using Clang. EDL would probably get its own standard library that would be portable across all output languages. There would be more flexibility with the implementation of things like var, with, switch, etc. and they would be less likely to break.

---

Considering the goal of generating both native executables and JavaScript, it may be a good idea to isolate EDL, Definitions, and extensions from each other and the underlying platform. Instead of Definitions just being project-specific code written in whatever language the project targets, Definitions would be a part of GM-style extensions and include implementations of their API in EDL, C++ and/or JavaScript.

This would require some extension developers to have their own C++ compilers (i.e. not distributed with ENIGMA), but if they're writing C++ directly already that shouldn't be a problem. It would require multiple binaries of the platform-specific parts, but then again that's probably necessary in source form already, and the cross-platform bits could be written in EDL. This kind of a change would also make ENIGMA more compatible with actual GM extensions.

---

Cross-platform building is still a sticky issue in all of these situations. Using GCC, cross-compilation would require rebuilding the compiler for every target- infinitely larger than anything previously discussed. Clang and thus LLVM can cross-compile object files, but not link them, and linkers are always highly platform-specific. Cross-linking, the real problem here, can only really be eliminated by a custom linker (bad idea) or a runner of some kind (pointless and 20MB games).
Title: Re: We can't decide
Post by: The 11th plague of Egypt on November 04, 2011, 04:49:43 pm
I do not mind 50mbs, nor would I mind 500mbs.

SDKs are huge these days. Enigma is going to be an SDK, so I guess 1gb would be acceptable as well.

I download games from Steam, big games that are well over 1gb. Lots of people do.

That said, I don't think supporting all the flaws of GML is the way to go.

Most GM games are developed half way through completion and then dropped, while the completed
ones receive little to no maintenance. They are essentially dead projects. No need for conversion.

If you have such troubles with GML, why not write your own language, it may be even better, hardly worse.

BTW ternary expressions often make code more difficult to read, even for experts
Title: Re: We can't decide
Post by: TheExDeus on November 04, 2011, 05:34:02 pm
Quote
why not write your own language
ENIGMA already has one. It's called EDL.

Quote
BTW ternary expressions often make code more difficult to read, even for experts
They also make the code smaller where it needs to be. There is no need to write some almost redundant if statement when you can just write a ternary expression. Also, I just recently learned that I can do this:
Code: [Select]
(a?max:min)(50,25)
so I want this to be part of EDL support as well. :)

And I too don't care about the size. 1gb of course is a little much, but 250-300mb seems ok. 50mb is better of course, but so is 5mb or 500kb.
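
For anyone wondering how the (a?max:min)(50,25) trick looks in plain C++, here is a minimal sketch. It assumes two ordinary functions, max_int and min_int (hypothetical names), because std::max and std::min are overloaded templates and cannot be named bare inside a conditional like this.

```cpp
#include <cassert>

// Plain functions with identical signatures, so both branches of the
// conditional have the same type.
int max_int(int a, int b) { return a > b ? a : b; }
int min_int(int a, int b) { return a < b ? a : b; }

// The conditional picks a function, and the result is called with (50, 25).
int pick(bool want_max) {
    return (want_max ? max_int : min_int)(50, 25);
}
```

pick(true) calls max_int and pick(false) calls min_int, so for EDL to support the same syntax the parser has to know that max and min are functions with compatible signatures.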
Title: Re: We can't decide
Post by: Rusky on November 04, 2011, 06:37:35 pm
Guess what you can't do in C++, but could do in EDL if Josh wanted to write a better code generator?
Code: [Select]
(a ? foo : bar) = 3
Title: Re: We can't decide
Post by: luiscubal on November 04, 2011, 06:44:16 pm
Quote
BTW ternary expressions often make code more difficult to read, even for experts
So can ifs, whiles, etc. Pretty much any language feature can be used for obfuscation when used wrong.
That doesn't mean they don't have their uses. IMHO, ternary conditionals are awesome. My only complaint about them is their precedence.

As for size, I'm surprised LLVM is actually that big. I wonder what they include to fill up all that space.

Regarding language design, I still think merging C++ with GML and keeping all features of both is a massive task, both with and without Clang's help.
Title: Re: We can't decide
Post by: Rusky on November 04, 2011, 08:36:22 pm
LLVM is probably composed 30% of C++ virtual function tables, 30% of indirection, and 20% of processor-specific data tables. The last 10% is the actual source code.
Title: Re: We can't decide
Post by: Josh @ Dreamland on November 05, 2011, 12:19:32 am
Quote
Guess what you can't do in C++, but could do in EDL if Josh wanted to write a better code generator?
Code: [Select]
(a ? foo : bar) = 3

Since when can you not do this in C++? This is part of the "complicated ternary expressions" I was talking about.
Title: Re: We can't decide
Post by: Rusky on November 05, 2011, 11:04:08 am
Code: [Select]
int main() {
int foo, bar;
(0 ? foo : bar) = 3;
}
Code: [Select]
<stdin>: In function ‘main’:
<stdin>:3:18: error: lvalue required as left operand of assignment

edit: evidently it is possible in C++, just not in C. :P
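
A small C++ sketch of the working case, for reference. In C++ the conditional expression is an lvalue when both branches are lvalues of the same type, so it can sit on the left of an assignment; in C the usual workaround is to go through pointers, as in *(cond ? &foo : &bar) = 3. The Pair struct and the function name are made up for this example.

```cpp
#include <cassert>

struct Pair {
    int foo;
    int bar;
};

// Assign 3 to whichever member the condition selects (valid C++, not valid C).
Pair assign_through_conditional(bool pick_foo) {
    Pair p{0, 0};
    (pick_foo ? p.foo : p.bar) = 3;
    assert((p.foo == 3) != (p.bar == 3));  // exactly one member was written
    return p;
}
```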
Title: Re: We can't decide
Post by: Fede-lasse on November 06, 2011, 07:53:47 am
Quote
I do not mind 50mbs, nor would I mind 500mbs.

SDKs are huge these days. Enigma is going to be an SDK, so I guess 1gb would be acceptable as well.

I download games from Steam, big games that are well over 1gb. Lots of people do.
Insert Disc 2 to continue ENIGMA installation.

Quote
That said, I don't think supporting all the flaws of GML is the way to go.
Agreed, though the main structure of GML should undeniably still remain intact so ENIGMA doesn't lose its GM'ness. GML has a lot of errors that I can live without.
Title: Re: We can't decide
Post by: ugriffin on November 06, 2011, 07:29:18 pm
YoYo Games is using LLVM. Russell Kay tweeted that some time ago.

Just a heads up, really.
Title: Re: We can't decide
Post by: Rusky on November 07, 2011, 12:05:46 pm
http://enigma-dev.org/forums/index.php?topic=898.0
Title: Re: We can't decide
Post by: IsmAvatar on November 07, 2011, 05:07:30 pm
Quote
our collective ideologies have become tangled
Kind of an overstatement really. Generally when a project's collective ideologies become tangled, it spells doom for the project. In this case, we're just trying to compensate for our code's shortcomings, and hitting some roadblocks that are taking us a couple of weeks to get past.

The irony to all the shit I do is that, if I do it right, you won't notice any difference. That is, until we start adding more resources, like Definitions, in which case, everyone will probably just be like "what took you so long to add a simple little instantiable code editor?"
Title: Re: We can't decide
Post by: Josh @ Dreamland on November 07, 2011, 05:09:36 pm
I suppose. I'd still say we're in a pretty gunky predicament, especially with the need to support old GM formats and new, never-load formats.
Title: Re: We can't decide
Post by: IsmAvatar on November 12, 2011, 03:49:04 am
In other news, I'm pretty much done with my hacked-together solution to dynamic resources, and it has been tested satisfactorily to demonstrate that it works. Current issue is that I can't get Subclipse working properly, and I think it may be because of a JavaHL issue (fresh install of Eclipse and Subclipse on Ubuntu 11.10).
Also, I don't remember why the fuck I did this hack-together in the first place... v_v I think (read: fear) it may have just been to fix a single suppressed warning that was bugging me (actually, more of a generics error that was suppressed by using reflection, thus being able to depend on erasure - a hack of a hack that's been replaced by a few more suppressed warnings that may or may not have displaced, reduced, or increased the prior number of suppressed warnings).

Once I figure out how to get Subclipse working, I'll review my code and then commit it so that you can all start bitching about how I broke everything.