Show Posts

This section allows you to view all posts made by this member.


Messages - Josh @ Dreamland

2551
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 29, 2009, 10:20:26 AM »
They're basically telling me that it'd be easier for me to use someone else's method, because mine's too hard for me to debug and understand.

Plus, if I use Yacc, other projects can make off with even more easily modifiable pieces of ENIGMA.

2552
Issues Help Desk / Re: *solved* execute_string
« on: August 26, 2009, 05:14:22 PM »
A C++ interpreter is a HUGE amount of work. I don't really even think I'm cut out for it, but my intention is to start the interpreter small and then see where I can take it from there. That is a ways away, though.

2553
Issues Help Desk / Re: Compiling Enigma
« on: August 26, 2009, 05:12:53 PM »
I haven't seen that error since I stopped using Dev. I think I recall it being a linker error to do with zlib. Are you linking to libzlib.a?
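(For reference, with g++/MinGW that's just a -l flag: -lzlib if the file is named libzlib.a, or -lz if it's libz.a, plus a -L pointing at whatever folder holds it.)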

2554
Issues Help Desk / Re: Suggest image editor?
« on: August 26, 2009, 05:08:30 PM »
GIMP is a lot to take in. But once you realize that there IS a one-pixel pencil tool, it starts to work its way into your heart. <3

2555
General ENIGMA / Re: Expanding Enigma
« on: August 26, 2009, 05:03:19 PM »
Yoyo would have to be pretty desperate to break down and buy ENIGMA. No matter how well it worked.
'Sides, they'd ruin it.

Linux version *should* work on Mac, with a few hours of tinkering.
Most of ENIGMA is from scratch. Two exceptions are ZLib and LibFFI.

3D functions will be what I'm working on when everyone else is doing the easier functions. (Not long after the release of R4).

2556
General ENIGMA / Re: Enigma IDE (written in C++ using GTK+)
« on: August 24, 2009, 10:34:31 PM »
You're a Linux user. The developers care what their stuff looks like on Linux, and Linux endorses GTK, so your Pidgin download isn't 20 MB.

2557
General ENIGMA / Re: Enigma IDE (written in C++ using GTK+)
« on: August 24, 2009, 09:23:41 PM »
If I had to choose one, it'd be wxWidgets, I imagine. I've used wxWidgets applications without ever knowing they were cross-platform. That's more than I can say for so much as one GTK application. I mean, aside from the obvious GIMP, whose main toolbar... WINDOW... is so inconvenient and tedious to find that it isn't worth using at all (until recently, as it now floats on top, but it still kinda reeks), and which everyone knows is GTK, even the ones who don't know what GTK is. Let's look at Pidgin.

One fine day, I'm on Pidgin, as I need to speak to Ism on LGM matters. My contact list isn't enormous, but it's convenient to just type the first few letters of the name you're looking for and hit enter. I do that for all my Windows applications.

So I get as far as the first letter, when the box that pops up freezes. Won't accept input. But it had selected her name.

So I press Enter. Nothing happens.
So I press Escape. Nothing happens.

Each time I forgot and tried that in any GTK app, I'd be stuck waiting five seconds for the damn box to go away. That includes GIMP's file selector.


Oh, and in my getting mad about the little things, I didn't even mention that GTK has a runtime of 17 or so MB, which basically every program that uses it ships its own updated copy of. The compiler's big enough as it is; MinGW is roughly 10 MB zipped, and ENIGMA adds another couple to that. Can't really say I want a 19 MB runner if it can be avoided.


However, despite it all, GTK is relatively clean, and ENIGMA could use an interface that can be natively compiled without incident and without breaking GPL rules. So no matter what runtime/API anyone uses, I'd be happy to see an interface.

Though as the above suggests, I'm partial to wx. I started making a UI one day before I realized what a task it'd be. Did I mention Code::Blocks has a built-in wxWidgets WYSI(basically)WYG editor?

2558
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 24, 2009, 06:40:11 PM »
Doesn't matter if the spec is human-authored or not. (And for all you know, it isn't. [That was a joke; you don't have to go calling people morons over it])
ENIGMA's API is human-written. GCC's optimizer is human-written. But there are some things that only a human can optimize.

Maybe a parser's not one of them; maybe it is. It doesn't really matter to me; my way will, at the very, very least, keep up. And considering its comparative performance to GM, which I'm not going to bother quantifying again, I'd say it'll do much more than that anyway.

Either way, if both methods had identical output with identical speed, I would still stick to this one simply because I'm already writing it, and I'm the only one that will ever be reading it. Anyone that thinks they have the incentive can fork the project and write their own parser that does what mine can.

Until then, we're sticking to what I'm writing.

2559
General ENIGMA / Re: Expanding Enigma
« on: August 23, 2009, 08:06:38 PM »
Yeah. My intention is to assemble a small group of coders ranging anywhere from novice to expert, as long as they have what it takes to rake together some GM functions.

Getting the behind-the-scenes things to work is the hard part; from there, it's all going to start flowing. (I hope.)

2560
General ENIGMA / Re: Expanding Enigma
« on: August 23, 2009, 07:39:01 PM »
When R4 comes out, you'll notice a new script-like resource that I codenamed "whitespace." (The name's cute, but it'd confuse people.)

Basically, this is your very own piece of C++ that ENIGMA will parse to gain additional function names. If you recreate an unimplemented GM function (or set of functions where possible) or a potentially useful library of functions, then I'd be happy to add them officially.

If for some reason I don't add them officially, you can still distribute them and others can easily add them in.
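
Roughly, think of it like this (a made-up snippet, not the final format): the resource just holds plain C++, and ENIGMA picks the function names out of it.
Code: [Select]
// Hypothetical "whitespace" contents: ordinary C++ functions that ENIGMA
// would parse so their names become callable from GML.
#include <cmath>

double sign(double x) {       // GM-style sign(): -1, 0 or 1
  return (x > 0) - (x < 0);
}

double frac(double x) {       // GM-style frac(): fractional part of x
  double intpart;
  return std::modf(x, &intpart);
}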

2561
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 23, 2009, 07:36:21 PM »
The preprocessor is what handles #preprocessor directives.

2562
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 22, 2009, 08:40:33 PM »
I believe this is the reason compilation was just "too difficult" for the Game Maker team. They were all thinking in a box: this is the way it must be done, we must use a tree, we can't leave it to a far more experienced, far longer-lived team such as the G++ one.

I'll say it again: it takes some nerve to even think we can compete with that compiler. It's totally out of the question to even try. Drop your fantasies (or at least your scenarios for the usefulness of a token tree) and realize that there's no need for it. Retro's right: the point is to make it so GCC, a far more capable compiler than either of us could hope to write, can read it. Only this, and nothing more.


You just told me you no longer want to use a complete tree, just a ...not-tree... generated parser. So why don't we let this war die until then?

In fact, until anyone has something to showcase, I don't really want to hear any more about it. The flaming is starting to get on my nerves. So show your knowledge and exhibit your idea by posting progress, not by flaming the hell out of anyone that disagrees with you.


As for me...
Progress: CFile parser is working quite nicely. There's a scope problem I'm just about to fix: checking if a variable already exists to report redeclaration needs to be sensitive to what scope the originally declared thing is in.

For example.

Code: [Select]
struct a;
namespace b
{
  int a;
}

Currently bitches that I'm naming two types in the declaration (int and ::a). It shouldn't do that.

All right, solved, 12 minutes later.
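
For anyone curious, the fix boils down to something like this (a bare-bones sketch of the idea, not the actual CFile parser code): a name only counts as a redeclaration if the earlier declaration lives in the scope being parsed right now, not in some enclosing one.
Code: [Select]
// Sketch only: each scope is a map of names, and scopes nest.
#include <map>
#include <string>

struct scope {
  scope* parent;
  std::map<std::string, int> members;  // name -> some type id
};

// Redeclaration means the name already exists in the *innermost* scope.
// A match in an outer scope is just shadowing, so int a; inside
// namespace b is fine even with ::a declared.
bool is_redeclaration(const scope* s, const std::string& name) {
  return s->members.count(name) != 0;
}

bool is_declared_anywhere(const scope* s, const std::string& name) {
  for (; s != 0; s = s->parent)
    if (s->members.count(name)) return true;
  return false;
}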

2563
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 22, 2009, 12:32:29 PM »
Optimizing compiler should take care of all that, anyway.

2564
Announcements / Re: Parsers -- A novel by Josh @ Dreamland
« on: August 22, 2009, 10:09:02 AM »
That's why I've redone var at least 100 times.
I'll be modifying std::string next; have to make sure it can interface with var flawlessly. And boy do I have a plan for that.
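
I won't spoil the plan here, but the gist is the usual operator-overloading game. A toy version of the idea (nothing like the real var; purely an illustration):
Code: [Select]
// Toy illustration of a var that converts to and from std::string.
#include <sstream>
#include <string>

struct var {
  double      real;
  std::string str;
  bool        is_string;

  var(double d = 0)         : real(d), is_string(false) {}
  var(const std::string& s) : real(0), str(s), is_string(true) {}

  operator std::string() const {
    if (is_string) return str;
    std::ostringstream o; o << real; return o.str();
  }
};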

2565
Announcements / Parsers -- A novel by Josh @ Dreamland
« on: August 21, 2009, 10:49:19 PM »
For fear of some readers being more pious than others, I'd like to clear some things up.
In other words, stop reading at the first big caption.


In summary, Rusky's trying to get me to use a token tree parser by starting work on one as a proof of concept. I assume he will continue to develop it from time to time, until it actually competes with the one I wrote as a competent proof of concept or until I'm all done. Whichever comes first.

I met it with some pretty harsh skepticism. But it's not because I feel token trees are bad.

A tree is a very useful tool. It lets you easily skip over large chunks of data to weed out only what you need. Trees are used in probably more than 90% of modern parsers. (I'm not looking up an actual statistic; sorry, math fans.)

However, I don't feel they're appropriate for the formatter-type parser.

The entire point of the formatter parser is to add semicolons and weed out variable names by declaration. It doesn't really call for a tree, in my opinion.

Here's a sample code:
Code: [Select]
a=0b=1
That code is valid GML. So now, how do we deal with it?

Those who are weak of heart should stop reading now.

Token tree method:
The code is read into memory in a structure called a tree. A representation is depicted below:
Code: [Select]
=
├─a
└─0
=
├─b
└─1
To save size, a symbol table is used. It's basically a list of the variable names and other identifiers that appear in the code. From the time it's built onward, string comparisons operate in constant time, because each string is compared as an index into the table rather than byte for byte. (Meaning it's efficient.)
The code is then rebuilt with all proper punctuation.
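
For those who haven't seen one, the structure boils down to roughly this (simplified to the point of uselessness, but it shows what gets stored):
Code: [Select]
// Simplified sketch of the token-tree approach; not anyone's actual code.
#include <string>
#include <vector>

std::vector<std::string> symbols;          // the symbol table

int symbol_id(const std::string& name) {   // look up a name, adding it if new
  for (size_t i = 0; i < symbols.size(); i++)
    if (symbols[i] == name) return (int)i;
  symbols.push_back(name);
  return (int)symbols.size() - 1;
}

struct node {                              // one node per token
  int symbol;                              // index into the symbol table
  std::vector<node*> children;             // e.g. '=' gets two children
};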

My method:
The code undergoes a (somewhat) organized series of stream-like modifications.

First, the code is reread into its current buffer free of comments and excess whitespace. This won't affect the code above.
Next, a second block of memory is allocated the size of the first. The first is named code, the second synt.

They look like this:
Code: [Select]
a=0b=1
Code: [Select]
n=0n=0
Next, the blocks are reallocated at twice the size for all upcoming changes. A pass is made that keeps a lookout for patterns of tokens. You'll notice above in the syntax string that variable names have been replaced with n, while numbers have been replaced with 0. This is so only one check has to be made to tell what any given character in the code is a part of.

Notice the "0n". A pass is made to look out for things like that.
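
In code, that pass looks roughly like this (a stripped-down sketch; the real thing handles far more token types than names and numbers):
Code: [Select]
// Sketch: build the aligned "synt" string, then scan for spots where a
// statement must have ended, such as a number followed by a name ("0n").
#include <cctype>
#include <string>

std::string build_synt(const std::string& code) {
  std::string synt = code;
  for (size_t i = 0; i < code.size(); i++) {
    if (std::isalpha((unsigned char)code[i]) || code[i] == '_') synt[i] = 'n';
    else if (std::isdigit((unsigned char)code[i]))              synt[i] = '0';
    // operators and punctuation stay as themselves
  }
  return synt;
}

void add_semicolons(std::string& code, std::string& synt) {
  for (size_t i = 1; i < synt.size(); i++)
    if (synt[i - 1] == '0' && synt[i] == 'n') {  // the "0n" pattern
      code.insert(i, 1, ';');
      synt.insert(i, 1, ';');
      i++;                                       // skip past the insertion
    }
}

Run on "a=0b=1", build_synt gives "n=0n=0", and the second pass turns the code into "a=0;b=1".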


More complicated code
Trees are usually, or at least very often, built using a parser generator of sorts, which takes a grammar file that can be easily edited. It organizes the process by giving what I've seen to be a regex-like interface for specifying what patterns of tokens the parser looks for.

My method's basically a hard-coded version of that. I specify what tokens require a semicolon between them in a separate function, instead of using a generator that reads that info from a text file.

In the code
Code: [Select]
if 0a=10else b=10
the token tree would do just what it did above, using if as a root of another sublevel, then putting the next = token inside it. It would then look up the rules for what's inside if to see what it puts next.

Mine does this, instead:   semis->push(')','s'); write_symbol('(');
That does nothing but insert a left parenthesis and tell it that the next time it needs to insert a semicolon, it should use ')' instead. This is still done in that same pass.
(If I were using only find and replace like I said I could, I'd replace "if(.*);" with "if\($0\)"  )

Rusky said he didn't add for loops to his proof of concept. Wouldn't make much sense for me to do so, because all I had to do to implement for() was push two semicolons and an end parenthesis onto the stack, like I did above.
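
For the flavor of it, the stack discipline is about this much code (a sketch, obviously; semis and write_symbol stand in for whatever the real pass keeps around):
Code: [Select]
// Sketch of the stack idea: where a semicolon is due, pop the stack and
// emit whatever is on top instead, so conditions get closed properly.
#include <stack>
#include <string>

std::stack<char> semis;
std::string out;

void write_symbol(char c) { out += c; }

void on_if() {                 // if <cond> ...       ->  if (<cond>) ...
  write_symbol('(');
  semis.push(')');
}

void on_for() {                // for <a> <b> <c> ... ->  for (<a>; <b>; <c>) ...
  write_symbol('(');
  semis.push(')');
  semis.push(';');
  semis.push(';');
}

void statement_boundary() {    // called wherever a ';' would normally go
  if (!semis.empty()) { out += semis.top(); semis.pop(); }
  else out += ';';
}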


Partitioning my parser
Rusky's other complaint is that my parser is divided into sections. One's responsible for the syntax check, one's responsible for C definitions, one's responsible for adding semicolons and other GM-to-C replacements. I don't really see what there is to criticize about that, but here's a nice reason to do it...

The formatter parser has more passes and gets more attention than both the syntax check and C parsers combined. However... They're what Rusky will call a disaster, and what I'll tell you are faster than greased lightning.

Instead of lexing, they keep a couple of variables handy (integers, actually) that I like to refer to as registers. (I may in fact devote a register to each of them, as they are so few.) They let the CFile parser identify what it's currently going through. Each C++ keyword has an ID (not dissimilar to the symbol table of the token tree).
In addition to this ID, a second variable is kept indicating mode. I keep track of these mentally, because that's how I goddamn roll. (Ask score_under). That's the part that really makes Rusky cringe; I rarely comment, and I don't document, because I just memorize all the values, or review my thinking to get back to them.

This ID, however, is the difference between "using" and "using namespace", just for example.

(Typedef is handled differently. Typedef's value is 128 or some crap, so I can just OR whatever I have with it to indicate that I'm typedef'ing it. (C and C++ will let you totally whore typedef.))
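
In other words, something on the order of this (values made up for illustration; the real IDs are whatever I hard-coded them to be):
Code: [Select]
// Illustration only.
enum {
  KW_USING     = 1,
  KW_NAMESPACE = 2,
  KW_STRUCT    = 3,
  // ...
  FLAG_TYPEDEF = 128  // OR'd onto whatever is currently being read
};

int mode = KW_STRUCT | FLAG_TYPEDEF;          // e.g. "typedef struct ..."
bool typedefing = (mode & FLAG_TYPEDEF) != 0; // easy to test back out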

The most complicated thing to deal with, and the one that may require a stack, is pointers to arrays of pointers... to functions.


Now, here's the kicker: I use a tree to store scope data. Le gasp, right? Each variable is stored the same way. An integer can have members if I really want it to. And I can tell you just from using trees there how beautiful a system they can be. But knowing that, I still say a pair of aligned strings is the way to go for a formatter.
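
That tree, for the record, is more or less this shape (again, just a sketch): everything is the same kind of node, which is why even something declared as an int can carry members if I decide it should.
Code: [Select]
// Sketch of the scope-data tree: namespaces, types and plain variables
// are all the same kind of node, so any entry can have children.
#include <map>
#include <string>

struct entry {
  std::string type;                       // "int", "struct", "namespace", ...
  std::map<std::string, entry*> members;  // members, if any
};

entry global_scope;
// e.g.  global_scope.members["b"] = new entry();
//       global_scope.members["b"]->type = "namespace";
//       global_scope.members["b"]->members["a"] = new entry();
//       global_scope.members["b"]->members["a"]->type = "int";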


Benefits/Downfalls
  • Tree offers O(1) string comparison. But the parser doesn't need that.
  • Tree does all punctuation in one pass. But it's liable to leave out, say, entire else statements. And not understand unary unless begged.
  • String requires multiple passes, but offers some find-replace methodology. To find a declaration, for example, I just search "d".
  • Tree is more comprehensive. I'm not sure if that's good or bad. I bring up unary again... Luis's parser had a fit on it. And Rusky's won't treat ++ as adding a unary-positive real value, or as incrementing the previous. (But he'll tell you how easy it is to add if you ask him)
  • Being component-based means that my parser's syntax check only needs to do what's necessary for syntax checking... It never touches the C parser. The formatter needs to know the members of declared classes, so it calls the C parser to get that. The syntax check can just add the name as a type.

Okay, this is the single longest thing I've ever typed up. If you did any more than just skim it, I feel sorry for you.