TGMG's interested in maintaining the EGMJS port I started a while back, and doing that under the new parser will be pretty easy, imo.
This topic is to try to make it even easier for him.
I will record general notes for all implementers on the Wiki. Notes specifically concerning my thoughts on the implementation of EGMJS will go in this topic.
My first concern is how TGMG will load definitions. Presently, ENIGMA uses a central JDI context to store its definitions. Since JDI is inherently a C++ parser, this is done by invoking it on the engine file directly. JDI is not a JavaScript parser, so it cannot be pointed at a JavaScript engine the same way. However, JDI's structure is easy to figure out, and JavaScript is capable of reflection. The way I see it, there are three ways you can go about this:
1) Choose the language that is going to host the crawler.
This can be Java using javax.script.ScriptEngine (obtained via new javax.script.ScriptEngineManager().getEngineByName("JavaScript")), or C++ using Google V8. Both methods have their advantages:
- If you use Java's ScriptEngine class, no additional libraries need to be included or set up. Java's also pretty good about doing the integration for you, and building V8 for Windows is an impossible task (it requires MSVC++). The difficulty is that you have to get this information back to ENIGMA, and it adds ENIGMA.jar as a dependency to the process (meaning a CLI build without Java will be completely impossible).
- If you use Google V8, everything can be done from within C++; you can use JavaScript reflection to call native methods directly. The C++ methods can populate JDI structures in memory while the JavaScript engine is doing the iteration. This is bound to be more efficient, as Java does not guarantee its scripting engines are even compiled, to my knowledge.
The bottom line is, by this method, you need to use JavaScript reflection to communicate a list of available functions to ENIGMA so the parser can do syntax checking.
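To give a rough idea of what the reflection side of that could look like, here is a minimal sketch of a crawler written in JavaScript itself, so it could run under either host. Everything named here is an assumption for illustration: the `engine` object is a made-up stand-in for whatever namespace the EGMJS engine actually exposes, and `crawlDefinitions` is a hypothetical helper, not an existing ENIGMA or JDI function.

```javascript
// Hypothetical stand-in for the EGMJS engine namespace (not the real engine).
const engine = {
  draw_sprite: function (sprite, subimg, x, y) {},
  instance_create: function (x, y, obj) {},
  room_speed: 30, // non-function members are skipped by the crawler
  math: { lengthdir_x: function (len, dir) {} },
};

// Walk an object with reflection and collect every function reachable from it,
// recording its qualified name and declared parameter count.
function crawlDefinitions(root, prefix = "", seen = new Set()) {
  const found = [];
  if (seen.has(root)) return found; // guard against cyclic references
  seen.add(root);
  for (const name of Object.getOwnPropertyNames(root)) {
    const value = root[name];
    const qualified = prefix + name;
    if (typeof value === "function") {
      // Function.length is the number of declared parameters.
      found.push({ name: qualified, argc: value.length });
    } else if (value !== null && typeof value === "object") {
      found.push(...crawlDefinitions(value, qualified + ".", seen));
    }
  }
  return found;
}

const definitions = crawlDefinitions(engine);
console.log(JSON.stringify(definitions));
```

The resulting list of names and argument counts is what the host would hand back to ENIGMA for syntax checking. How it gets there (a string returned through ScriptEngine, or a native callback under V8 that fills JDI structures directly) depends on which host you pick and is not shown here.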
The other method that I can see you using is having emscripten parse the JavaScript engine, and then polling it for definition names to pack into JDI classes. This method has similar advantages. On the downside, it means that EGMJS is dependent on LLVM, which is a heavy dependency that I'm in general not fond of. On the other hand, it means that you'll be asking LLVM for the definitions and (probably) using LLVM to store the code so emscripten can compile it, which would open doors for ENIGMA to compile to other languages for which LLVM has pretty-printers. It might also introduce some issues in the translation, but from what I can tell, as long as you keep within a relatively decent-sized subset of LLVM instructions, you should avoid such issues.
I see a great amount of merit in each option, so I do not care which method you choose. If you go with the V8/ScriptEngine method, I will be happy to have a two-megabyte JavaScript export extension. If you go with emscripten, I will be happy to have LLVM as an abstraction layer. Let me know what you're thinking, though.