There is a lot of room in a typical piece of GML for optimizations; the kind GCC is not capable of making in arbitrary code. I want ENIGMA's optimizer to be extensible so that we aren't hard-coding a lot of passes into the mechanism. That said, some passes will need to be hard-coded, and hard-coding them does not seem like a hack.
To best avoid hacks, we need to lay out how we want the optimizer to promote extensible lists of things to optimize. To lay out that framework, we need to know the kinds of optimizations that need to be made. I will name as many optimizations and classes of optimizations as I can. I've added emphasis to the phrases I'm trying to pay attention to, such as what needs to be hard-coded, which optimizations we can automatically enumerate, and which optimizations we can kickstart through pattern matching.
ENIGMA, as you all know, is typed; I wouldn't call it strongly typed, due to var, but you can specify strong types explicitly. When you do not specify a type, presently, var is used as the default. This is *terrible*, as var is roughly half the speed of a regular primitive in terms of raw calculations per second, even when optimized. Variant, on the other hand, is as fast as a double.
ENIGMA should, at the very least, determine whether the variable is used as an array, and if it is not, use variant. Then GCC will do some inlining magic and, in general, no speed will be lost.
ENIGMA should, preferably, go one step further than that, and determine whether strings are ever assigned to it, and if not, use a double instead. Or, if no reals are assigned, only strings, it should use a string.
ENIGMA should, ideally, then go one step further and determine the slowest data type assigned (in terms of double, fastint_t, and long) and narrow the variable type down to it.
This process will simply be hard coded, but will possibly employ other aspects.
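The narrowing process above can be sketched as a small type lattice. This is a minimal illustration in C++ (all the names here are mine, not ENIGMA's); a real pass would walk the AST collecting every type assigned to the variable, then join them:

```cpp
#include <algorithm>
#include <vector>

// Candidate types, cheapest first. T_INT stands in for fastint_t here.
enum InferredType { T_INT, T_LONG, T_DOUBLE, T_STRING, T_VAR };

// Joining two types picks the more general one; mixing reals and
// strings forces the catch-all var.
InferredType join(InferredType a, InferredType b) {
  if (a == b) return a;
  bool a_num = a <= T_DOUBLE, b_num = b <= T_DOUBLE;
  if (a_num && b_num) return std::max(a, b);  // slowest numeric wins
  return T_VAR;  // real/string mix: only var can hold both
}

// Fold over every type assigned to the variable in the script.
InferredType narrow(const std::vector<InferredType>& assigned) {
  if (assigned.empty()) return T_VAR;  // no assignments seen: no information
  InferredType t = assigned[0];
  for (size_t i = 1; i < assigned.size(); ++i) t = join(t, assigned[i]);
  return t;
}
```

Note that a string-only variable narrows to string and a mixed real/string variable falls through to var, matching the rules above.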
Let's go one step further. The above seem easy to implement, yes? But consider this case:
a = choose(1, 2, 3, 4, 5);
We're going to look at two optimizations that can be made to that snippet. An intelligent human would reduce that code to the single line int a = random_integer(1, 5);, but an EDL optimizer could only reasonably be expected to produce one of these outputs:
int a = choose(1, 2, 3, 4, 5);
a = random_integer(1, 5);
In the first sample output, a is reduced to int, even though choose returns variant. I believe this can be done in an enumerable fashion by having either a macro or an entry in an optimization file to denote which parameter(s) share a potential return type. For choose(), any parameter can define the return type. I can't think of a function that has a specific parameter which can be either real or string, and defines a return type, but I'm sure there is one.
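That enumerable fashion could be as simple as a table naming which functions take their return type from their arguments, consulted when inferring the type of an assignment. A hedged sketch (the table and all names are hypothetical, not ENIGMA's actual API):

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

enum InferredType { T_INT, T_DOUBLE, T_STRING, T_VARIANT };

// Hypothetical registry: for each listed function, the return type is
// the join of its arguments' types. choose() behaves this way.
static const std::set<std::string> return_follows_args = {"choose"};

InferredType join(InferredType a, InferredType b) {
  if (a == b) return a;
  if (a != T_STRING && b != T_STRING) return std::max(a, b);
  return T_VARIANT;  // real/string mix
}

// Tighten a declared return type (variant, for choose) to the join of
// the argument types when the function is registered.
InferredType infer_return(const std::string& fn,
                          const std::vector<InferredType>& args,
                          InferredType declared) {
  if (!return_follows_args.count(fn) || args.empty()) return declared;
  InferredType t = args[0];
  for (size_t i = 1; i < args.size(); ++i) t = join(t, args[i]);
  return t;
}
```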
In the second sample output, choose(sequence) is replaced with random_integer(min(sequence), max(sequence)). This is a non-trivial replacement which would need to be hard-coded, though it is possible we could enumerate functions which might need to be replaced with other functions based on their parameters. This way, implementers would need only supply the name of the function and a method/virtual class which does the checking and replacement.
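The name-plus-virtual-class registration could look roughly like this. ChooseRewriter and the Call node are illustrative stand-ins for whatever the real AST provides, and this sketch only handles integer-literal arguments:

```cpp
#include <string>
#include <vector>

// A call site: function name plus integer-literal arguments. A real
// node would hold arbitrary expressions.
struct Call { std::string name; std::vector<int> args; };

// Implementers supply the function name plus an object that inspects
// the arguments and performs the rewrite.
struct CallRewriter {
  virtual ~CallRewriter() {}
  virtual bool rewrite(Call& c) const = 0;
};

// choose(a, a+1, ..., b) over a consecutive run of integer literals
// can become random_integer(a, b).
struct ChooseRewriter : CallRewriter {
  bool rewrite(Call& c) const override {
    if (c.name != "choose" || c.args.size() < 2) return false;
    for (size_t i = 1; i < c.args.size(); ++i)
      if (c.args[i] != c.args[i - 1] + 1) return false;  // not consecutive
    int lo = c.args.front(), hi = c.args.back();
    c.name = "random_integer";
    c.args = {lo, hi};
    return true;
  }
};
```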
Moving the assignment into the initializer is a potential optimization, though in most languages it would not make a difference, so the attempt could only cause harm if initialization were invalid for whatever reason.
Simpler optimizations exist in other code. Consider this snippet:
draw_set_color(c_red);
b = 10;
draw_set_color(c_blue);
It's clear to a human that the first call to draw_set_color does nothing. However, if the assignment b = 10 were replaced with draw_circle(mouse_x,mouse_y,10,0), removing line 1 would cause misbehavior. It would therefore be necessary to have the implementer either specify a list of functions to reference in determining whether two successive function calls of the same type undo each other, or else specify a class/method for examining an AST between two given nodes to make that call for the optimizer.
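The list-of-functions version of that check can be sketched as a forward scan: a setter call is dead if the same setter appears again before any function that reads the state. Both sets here are implementer-supplied, as described above, and the Stmt node is a simplification:

```cpp
#include <set>
#include <string>
#include <vector>

struct Stmt { std::string fn, arg; };  // one call per statement, sketch only

// Functions that overwrite drawing state, and functions that read it.
static const std::set<std::string> setters = {"draw_set_color"};
static const std::set<std::string> readers = {"draw_circle", "draw_text"};

// Drop a setter call when the same setter is called again before any
// reader observes the state it set.
std::vector<Stmt> elide_dead_setters(const std::vector<Stmt>& code) {
  std::vector<Stmt> out;
  for (size_t i = 0; i < code.size(); ++i) {
    bool dead = false;
    if (setters.count(code[i].fn)) {
      for (size_t j = i + 1; j < code.size(); ++j) {
        if (code[j].fn == code[i].fn) { dead = true; break; }  // overwritten
        if (readers.count(code[j].fn)) break;  // state observed first
      }
    }
    if (!dead) out.push_back(code[i]);
  }
  return out;
}
```

With the snippet above, the first draw_set_color is elided; replace b = 10 with a draw_circle call and nothing is removed.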
We then run into a separate, but related, case:
draw_set_color_rgba(0, 0, 0, 0.5);
draw_set_color(c_blue);
The most efficient code output for that is as follows:
draw_set_color_rgba(0, 0, 255, 0.5);
But how does the optimizer know to do that? My only thought is that a good pattern to look for would be consecutive calls to related functions in a given set. A class would be provided to give that set of functions, along with hard code to do the merging—how else do we get 0,0,255 out of c_blue? The best we seem to be able to automate here is auto-matching consecutive calls to functions in set A which are not separated by calls to any functions in set B, and then invoking the merge method on the functions from the first set.
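The merge method itself might fold a run of consecutive color-state calls into one accumulated RGBA state, with a hard-coded constant table doing the c_blue-to-(0,0,255) lookup. A sketch with simplified Call nodes (the table and structures are mine):

```cpp
#include <map>
#include <string>
#include <vector>

struct Rgba { int r, g, b; double a; };

// One call per statement; nums holds numeric args, name holds a color
// constant when one was passed instead.
struct Call { std::string fn; std::vector<double> nums; std::string name; };

// Hard-coded constant table: this is how 0,0,255 comes out of c_blue.
static const std::map<std::string, Rgba> constants = {
    {"c_blue", {0, 0, 255, 1.0}}, {"c_red", {255, 0, 0, 1.0}}};

// Fold a run of consecutive color-state calls into a single
// draw_set_color_rgba, with later calls overwriting earlier state.
Call merge_color_calls(const std::vector<Call>& run) {
  Rgba s = {0, 0, 0, 1.0};
  for (const Call& c : run) {
    if (c.fn == "draw_set_color_rgba")
      s = {(int)c.nums[0], (int)c.nums[1], (int)c.nums[2], c.nums[3]};
    else if (c.fn == "draw_set_color") {
      Rgba k = constants.at(c.name);
      s.r = k.r; s.g = k.g; s.b = k.b;  // alpha untouched by this call
    } else if (c.fn == "draw_set_alpha")
      s.a = c.nums[0];
  }
  return {"draw_set_color_rgba",
          {(double)s.r, (double)s.g, (double)s.b, s.a}, ""};
}
```

Run on the two-line snippet above, this yields draw_set_color_rgba(0, 0, 255, 0.5): the color from the second call, the alpha from the first.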
I will post more optimizations when I have some more time and am not so tired. I'll also do some proofreading, because I'm sure this reads like ass.
From what I have written here, it looks like the best approach is to have a base class defining a kind of optimization to perform, and then have child classes to carry out a specific operation, which can then have child classes for very similar optimizations. So call-consolidating (as in draw_set_color optimizations above) would be one child of the optimization class, which would employ its own virtual class for specifying sets of functions to consolidate, as described above.
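That hierarchy might start out like the following. The consolidation logic here is a throwaway placeholder, but the shape (pass base class, child pass, user-supplied set object) is the point:

```cpp
#include <string>
#include <vector>

// Stand-in for a parsed statement list; a real pass would walk an AST.
using Code = std::vector<std::string>;

// Base class: one kind of optimization, applied over a code body.
struct OptimizationPass {
  virtual ~OptimizationPass() {}
  virtual std::string name() const = 0;
  virtual void apply(Code& code) const = 0;
};

// The virtual class an implementer supplies to name the set of
// functions a consolidation pass may merge.
struct ConsolidationSet {
  virtual ~ConsolidationSet() {}
  virtual bool contains(const std::string& fn) const = 0;
};

// Child pass for call consolidation, parameterized on that set.
struct CallConsolidationPass : OptimizationPass {
  const ConsolidationSet& set;
  explicit CallConsolidationPass(const ConsolidationSet& s) : set(s) {}
  std::string name() const override { return "call-consolidation"; }
  void apply(Code& code) const override {
    // Placeholder: drop a call when the very next one is in the same set.
    Code out;
    for (size_t i = 0; i < code.size(); ++i)
      if (!(i + 1 < code.size() && set.contains(code[i]) &&
            set.contains(code[i + 1])))
        out.push_back(code[i]);
    code = out;
  }
};

// Example concrete set for the draw_set_color case above.
struct DrawColorSet : ConsolidationSet {
  bool contains(const std::string& fn) const override {
    return fn == "draw_set_color";
  }
};
```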
Please do submit feedback; this process is going to need a lot more thought.