Rusky
Posted on: August 17, 2009, 04:42:22 pm
Joined: Feb 2008
Posts: 954
Josh likes his little string parser thingy, but IMO it would be much easier and more extensible to use a more conventional parser. I pulled together a small EDL->C++ translator using Lex and Yacc to show how much simpler one can be to write and maintain. This was written in about 7 programming days.

Download

You can build it yourself with make if you're on Linux; if you're on Windows you can just use gml.exe from the zip (or build it yourself with make if you have MSYS or Cygwin). It parses its standard input until it hits EOF and then prints the C++ code it has generated on standard output. I've provided a sample file, test.txt, that you can translate. Have fun.

The first bunch of replies (through #14) are about the first version, which didn't work as well. Pretty much everything has been fixed, although it still doesn't parse full EDL, just a subset of it. It does:
- declarations (var only, but you can give defaults, like var a = 3)
- assignments (including +=, etc.)
- if (including else; did it right this time, Josh), while and for
- expressions using +, -, *, /, >, <, >=, <=, !=, == (= works like == in ifs) and parentheses
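Two items in the list above, `=` being printed as `==` inside an `if` condition and `else` being emitted at all, live entirely in the output phase of a tree-based translator. A minimal C++ sketch of that printing step, assuming hypothetical node types (`Expr`, `Stmt`, `Assign`, `If`) rather than whatever parser.ypp actually builds, and with the tree constructed by hand instead of by Yacc:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical node types for a small slice of EDL; not the structures
// actually used in parser.ypp.
struct Expr {
    std::string lhs, op, rhs;  // e.g. "a", "=", "3"
};

struct Stmt {
    virtual ~Stmt() = default;
    virtual std::string print() const = 0;  // emit C++ for this statement
};

struct Assign : Stmt {
    std::string target, value;
    Assign(std::string t, std::string v) : target(std::move(t)), value(std::move(v)) {}
    std::string print() const override { return target + " = " + value + ";"; }
};

struct If : Stmt {
    Expr cond;
    std::unique_ptr<Stmt> then_branch;
    std::unique_ptr<Stmt> else_branch;  // null when there is no else

    std::string print() const override {
        // In an EDL condition a lone '=' is a comparison, so print it as '=='.
        const std::string op = (cond.op == "=") ? "==" : cond.op;
        std::string out = "if (" + cond.lhs + " " + op + " " + cond.rhs + ") " + then_branch->print();
        // The else branch is emitted here and nowhere else; dropping this
        // line would silently lose it from the output.
        if (else_branch) out += " else " + else_branch->print();
        return out;
    }
};
```

Because `else` is printed in exactly one place, a missing `else` in the output (the bug reported later in the thread) would be a one-line omission in a function like `If::print`, not a parse failure.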
Another method that still takes advantage of a parser generator but doesn't actually build the tree (because Josh thinks that's overkill) would be to print things as soon as you know they're correct (i.e., in the semantic action blocks for the productions in parser.ypp). This would use less memory and would probably be faster, but it would combine the printing and parsing phases and might require a separate syntax checker like Josh has. I haven't thought much about this one.
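The single-pass alternative can be pictured with a hand-written recursive-descent parser in which each parse function appends to the output as it matches its piece. This is only a rough analogue: in a Yacc parser an action runs once a whole production has been reduced, whereas here text is written while matching. The sketch covers just `+ - * /`, identifiers, integers and parentheses, and the `Translator` class is hypothetical, not generated from parser.ypp:

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Recursive descent that writes output directly instead of building a tree.
class Translator {
public:
    explicit Translator(std::string src) : s_(std::move(src)) {}

    std::string run() {
        expr();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return out_;
    }

private:
    std::string s_, out_;
    std::size_t pos_ = 0;

    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    char peek() {
        skip_ws();
        return pos_ < s_.size() ? s_[pos_] : '\0';
    }

    // expr := term (('+' | '-') term)*
    void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            out_ += ' '; out_ += s_[pos_++]; out_ += ' ';  // emitted immediately
            term();
        }
    }
    // term := factor (('*' | '/') factor)*
    void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            out_ += ' '; out_ += s_[pos_++]; out_ += ' ';
            factor();
        }
    }
    // factor := identifier | integer | '(' expr ')'
    void factor() {
        const char c = peek();
        if (c == '(') {
            ++pos_; out_ += '(';
            expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_; out_ += ')';
        } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            while (pos_ < s_.size() &&
                   (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
                out_ += s_[pos_++];
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
};
```

The trade-off shows up directly: for an input like `a+`, the text `a + ` has already been written when the error is thrown, so the caller has to discard partial output or run a separate syntax check first.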
« Last Edit: August 24, 2009, 11:58:35 am by Rusky »
Josh @ Dreamland
Reply #1 Posted on: August 20, 2009, 08:57:31 pm
Prince of all Goldfish
Location: Pittsburgh, PA, USA Joined: Feb 2008
Posts: 2950
I stopped when a=0b=1c=2 came out as a=0;b=1c=2.
With my parser, if one works, they all work.
And on for() not being implemented in your parser: when I was done with if(), for() was added just by writing this:
sy_semi=sy_semi->push(')','s');
sy_semi=sy_semi->push(';','s');
sy_semi=sy_semi->push(';','s');
if() was done with only the first line of that code.
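One plausible reading of the three `push` calls is a stack of terminators the scanner expects before ordinary statement handling resumes: `for(` expects two header semicolons and then `)`, while `if(` expects only `)`. The sketch below is a hypothetical reconstruction of that idea, not the actual ENIGMA code; `TerminatorStack`, `on_if` and `on_for` are invented names, a `std::vector` replaces the linked nodes returned by `push()`, and since the post doesn't say what the `'s'` tag means, it is only carried along:

```cpp
#include <vector>

// One pending terminator: the character that closes it plus the tag that
// was passed to push() (its meaning is not stated in the post).
struct Expected {
    char symbol;  // e.g. ')' or ';'
    char kind;    // e.g. 's'
};

class TerminatorStack {
public:
    void push(char symbol, char kind) { stack_.push_back({symbol, kind}); }

    // If c closes the innermost pending level, pop it and return true.
    bool consume(char c) {
        if (!stack_.empty() && stack_.back().symbol == c) {
            stack_.pop_back();
            return true;
        }
        return false;
    }
    bool empty() const { return stack_.empty(); }

private:
    std::vector<Expected> stack_;
};

// After "if(": only the closing parenthesis is pending.
void on_if(TerminatorStack& s) { s.push(')', 's'); }

// After "for(": pushed last, the two header semicolons are matched first,
// then the closing parenthesis.
void on_for(TerminatorStack& s) {
    s.push(')', 's');
    s.push(';', 's');
    s.push(';', 's');
}
```

Under this reading, a `;` that matches the top of the stack belongs to the `for` header and is not treated as the end of a statement.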
The only thing you've demonstrated is how something can work once, and then just stop. The only way mine would ever stop working is if I managed to misalign the code string and the token string. But only a real twat would do that.
I'm writing the parser to do exactly what I want done. It's more focused than your all-purpose tree, and works independently of context. The context parsers are the syntax checker and the C parser. Neither of those needs a lexer; they read the code just as you and I do and write things down as they go.
It would be aggravating, but doable, to write this GML-C++ formatter parser without doing any sort of lexing. But doing it this way is so much more general...
The only thing I'm not supporting from the entire C++ language with each of these parsers is literal suffixes, as in 100000000L or 0.5f. I find they're unnecessary anyway.
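Supporting suffixes like those in `100000000L` or `0.5f` mostly means letting the number scanner accept a few trailing letters. A minimal C++ sketch, assuming only the suffix letters u/U, l/L and f/F and ignoring hex literals and exponents; `NumberToken` and `scan_number` are illustrative names:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct NumberToken {
    std::string digits;  // value part, e.g. "100000000" or "0.5"
    std::string suffix;  // trailing suffix letters, e.g. "L" or "f"; empty if none
};

// Scans a decimal literal starting at src[pos] and advances pos past it.
// At most one '.' is accepted, so ".7" and "0.5" both work.
NumberToken scan_number(const std::string& src, std::size_t& pos) {
    NumberToken tok;
    bool seen_dot = false;
    while (pos < src.size()) {
        const char c = src[pos];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            tok.digits += c;
        } else if (c == '.' && !seen_dot) {
            seen_dot = true;
            tok.digits += c;
        } else {
            break;
        }
        ++pos;
    }
    // Accept any run of suffix letters; a real lexer would also reject
    // invalid combinations such as "fL".
    static const std::string suffix_chars = "uUlLfF";
    while (pos < src.size() && suffix_chars.find(src[pos]) != std::string::npos)
        tok.suffix += src[pos++];
    return tok;
}
```

Keeping the suffix separate lets a translator either pass it through unchanged or drop it.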
Even things like struct a {}; struct a b; rather than just a b; can all be so greatly generalized this way.
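For reference, both declaration forms mentioned here name the same type in C++, while C (without a typedef) accepts only the form with the `struct` keyword. A small C++ check:

```cpp
#include <type_traits>

struct a {};  // an empty struct: fine in C++, not allowed in standard C

struct a b;   // elaborated form: what C requires unless a typedef is added
a c;          // C++ also lets the tag name stand alone

// Both variables have exactly the same type.
static_assert(std::is_same<decltype(b), decltype(c)>::value, "b and c share a type");
```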
A tree just sounds like a great way to over-complicate things.
Edit: I just tried a simple if statement to see if that'd work, and I noticed that it replaced = with ==. Which was nice, until I noticed it also deletes else{} statements. I'll drop dead before something that stupid happens in my parser. And I don't care how easy it is to fix the goddamn thing. Point is, you somehow managed to delete else(), and that's not the first or only thing you can really fuck up with a token tree.
« Last Edit: August 20, 2009, 09:13:11 pm by Josh @ Dreamland »
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble "I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Josh @ Dreamland
Reply #4 Posted on: August 21, 2009, 08:06:51 pm
I'll do all those wonderful things when I, or someone on my team, actually intends to program support for any of them.
And Flash is the only one of those that will require total recoding of the parser; the rest is trivial modification. The syntax check will still be valid in any of those cases, and looky here, it ruins the entire rest of the project anyway because the only thing I know about Flash is that it's fuck ugly.
If it wasn't for integrating the entirety of C++, this parser would have been done within three days of my starting it. I can't think of a good reason to entirely support Flash even if I got hit in the head hard enough to want to support it.
So basically, Flash and Obj-C are off topic and out of the question.
And, as I said in advance -- as in, before you could even bring it up again -- I don't care how trivial the mistake is. Point is, it was made. You'd have to put out some really, really nasty code to ever make my parser put a semicolon where it doesn't belong, and that's the only mistake it's capable of making.
Where can I see my parser's outcome not being entirely predictable for all audiences?
a = b++ c
That's it. That's the only real ambiguity.
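Without semicolons, `a = b++ c` can be split as `a = b++; c` or as `a = b; ++c`, and the two splits do not do the same thing. A small C++ sketch that spells out both readings (the `State` struct and function names are only for illustration):

```cpp
// The same starting values run through both readings of "a = b++ c".
struct State {
    int a, b, c;
};

// Reading 1: "a = b++; c;" (the trailing c is an expression statement with no effect)
State postfix_reading(State s) {
    s.a = s.b++;
    return s;
}

// Reading 2: "a = b; ++c;"
State prefix_reading(State s) {
    s.a = s.b;
    ++s.c;
    return s;
}
```

Starting from `{a = 0, b = 1, c = 5}`, both readings leave `a == 1`, but they disagree about whether `b` or `c` gets incremented, so a translator has to pick one rule and apply it consistently.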
And idiotic mistakes in mine... heh. If it works once, it won't just stop working, because everything is done procedurally: each step makes assumptions about how the code is formatted, based on the success of the previous step.
On the subject of which, I think you missed something from the first thing I said.
a=0b=1c=2 became
a=0;b=1c=2
Note that it only added the semicolon once. The second time, nope; it just keeps ignoring the later ones. Another simple fix, I say! Look how simple this is to get working if you stare at it with a magnifying glass for hours on end, retesting everything each time to make sure two definitions in your grammar file aren't conflicting and that no order of statements is going to cause total collapse.
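The complaint here is that a boundary rule fired once and then stopped. In a token-based approach the same rule is checked at every pair of adjacent tokens, so it works for all three statements or for none. A minimal C++ sketch, assuming a deliberately simple rule (an identifier or number directly followed by an identifier starts a new statement) that ignores keywords such as `if` and `else`; `tokenize` and `insert_semicolons` are hypothetical helpers, not code from either parser:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Ident, Number, Op };

struct Token {
    Kind kind;
    std::string text;
};

// Splits source into identifiers, numbers and single-character operators;
// whitespace is dropped.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const std::size_t start = i;
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            while (i < src.size() && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            out.push_back({Kind::Ident, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < src.size() && (std::isdigit(static_cast<unsigned char>(src[i])) || src[i] == '.')) ++i;
            out.push_back({Kind::Number, src.substr(start, i - start)});
        } else {
            out.push_back({Kind::Op, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}

// Inserts ';' wherever an identifier or number is directly followed by an
// identifier, and terminates the last statement. The rule is applied at
// every token boundary, so it cannot work for the first statement only.
std::string insert_semicolons(const std::string& src) {
    const std::vector<Token> toks = tokenize(src);
    std::string out;
    for (std::size_t k = 0; k < toks.size(); ++k) {
        if (k > 0 && toks[k - 1].kind != Kind::Op && toks[k].kind == Kind::Ident)
            out += ';';
        out += toks[k].text;
    }
    if (!toks.empty() && toks.back().text != ";")
        out += ';';
    return out;
}
```

With this rule, `a=0b=1c=2` becomes `a=0;b=1;c=2;`, and input that already ends in `;` is left without a doubled semicolon.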
I'll give you this, though: I did forget one thing last time. I totally forgot about allowing .7 as well as 0.7, which led to a semicolon being added. In retrospect, I should have been replacing .0 with 00 anyway, because 0.n is valid. Either way, boy did I feel stupid. That said, never did I manage to leave out half the goddamn code just because of "a single if statement that I left out".
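The `.7` case comes down to deciding whether a `.` starts a number. A small C++ sketch, assuming a `.` counts as the start of a numeric literal only when a digit follows it, plus an optional rewrite of `.7` to `0.7` in the output; both function names are illustrative:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if a numeric literal begins at src[pos]: either a digit, or a '.'
// immediately followed by a digit (".7"). A '.' followed by anything else,
// as in "obj.x", is left to be treated as an operator.
bool starts_number(const std::string& src, std::size_t pos) {
    auto is_digit_at = [&src](std::size_t i) {
        return i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])) != 0;
    };
    return is_digit_at(pos) || (pos < src.size() && src[pos] == '.' && is_digit_at(pos + 1));
}

// Optional normalization: write ".7" as "0.7" in the generated code.
std::string with_leading_zero(const std::string& literal) {
    if (!literal.empty() && literal[0] == '.') return "0" + literal;
    return literal;
}
```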
Anyway, feel free to continue working on this. If it beats mine in a benchmark (that other silly little factor that just doesn't apply to something so great as a token tree), I'll be happy to use it. And I know it doesn't matter to you how fast it is, but let me tell you, it certainly matters to me. Why?
I wish you luck with the C++ part of that. You'll need it. I've heard horror stories from "professionals", about as happy to hear of my parser's methodology as you are, who tried it with the almighty token tree parser and got a record-breaking parse time of ten seconds for an STL header. It'll be pretty hard for me to manage a time like that in one pass.
RetroX
Reply #5 Posted on: August 21, 2009, 08:20:34 pm
Master of all things Linux
Location: US Joined: Apr 2008
Posts: 1055
To be honest, after looking at this topic, the code, and the EXE, I had a good laugh.
The EXE didn't work, and all it said was "unknown character" every time I typed anything. And, main() wasn't in any of the source files.
This has to be some kind of joke, right?
« Last Edit: August 21, 2009, 08:25:14 pm by RetroX »
My Box: Phenom II 3.4GHz X4 | ASUS ATI RadeonHD 5770, 1GB GDDR5 RAM | 1x4GB DDR3 SRAM | Arch Linux, x86_64 (Cube) / Windows 7 x64 (Blob)
Why do all the pro-Microsoft people have troll avatars?
Rusky
Reply #6 Posted on: August 22, 2009, 01:26:55 pm
Your "trivial modification" would involve what, rewriting random bits of the program scattered all around? You've completely mixed the input and output code together. Redoing a parser written with a generator just requires a different way to print out the tree, and that code is all in one place.
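The argument that a new target only needs a new printer can be illustrated with one expression tree and two output functions. A C++ sketch; `Node`, `leaf`, `binary`, `to_infix` and `to_prefix` are made up for this example, and the prefix notation simply stands in for "some other output language":

```cpp
#include <memory>
#include <string>
#include <utility>

// One tree, two output passes: changing the target only means writing
// another printer; the code that builds the tree is untouched.
struct Node {
    std::string text;                   // operator or leaf value
    std::unique_ptr<Node> left, right;  // both null for leaves
};

std::unique_ptr<Node> leaf(std::string t) {
    auto n = std::make_unique<Node>();
    n->text = std::move(t);
    return n;
}

std::unique_ptr<Node> binary(std::string op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->text = std::move(op);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Back end 1: C++-style infix with explicit parentheses.
std::string to_infix(const Node& n) {
    if (!n.left) return n.text;
    return "(" + to_infix(*n.left) + " " + n.text + " " + to_infix(*n.right) + ")";
}

// Back end 2: a Lisp-like prefix notation.
std::string to_prefix(const Node& n) {
    if (!n.left) return n.text;
    return "(" + n.text + " " + to_prefix(*n.left) + " " + to_prefix(*n.right) + ")";
}
```

For the tree of `a + b * 2`, `to_infix` gives `(a + (b * 2))` and `to_prefix` gives `(+ a (* b 2))`; neither function knows anything about how the tree was parsed.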
The only place ordering matters in mine is the lexer and the operator precedence section of the parser. Once the lexer works I can ignore it, the same way you can ignore previous passes. So no, it's not really that hard. Besides, the parser generator warns you about conflicting definitions, unlike the C++ compiler. So it's really not that hard, I say again. I already have it working, actually.
See, leaving out else... you're still ignoring the fact that the two programs work completely differently. Mistakes in yours lead to weird syntax errors, mistakes in mine lead to leaving stuff out. Which do you think is easier to fix, by the way? Figuring out where the syntax needs to be fixed, or figuring out where to print something out? The printing in mine is a completely separate phase, it's extremely easy to modify.
Finally, speed matters to you because you're looking at it too closely. All it needs to do is be fast enough that the user won't care. You also need to care about usability. You do want others to be able to work with your parser at some time in the future, don't you? Using a parser generator leaves the source code as a standard way of describing the structure of a program. Yours leaves it as... something that would take quite a while to understand, let alone modify.
RetroX, I'm going to ignore you until you figure out where main() is, and until you use a normal character set.
RetroX
Reply #9 Posted on: August 22, 2009, 08:11:36 pm
Yes, main() is somewhere. Keep looking. You do realize this wasn't written entirely in C++, right?
Unless gml.cpp was merely an include for another script in another language, there must be main() in that file if it is a console script, and it wasn't there. You left code out that wasn't in parser.exe. Also, if your tree does not work with a UNIX character set, then that's just sad. Even Windows 7 said anything with semicolons was bad syntax, and it's better syntax to have them.