Author Topic: GM8.1 format changes  (Read 25086 times)
Offline (Female) IsmAvatar
Posted on: April 23, 2011, 11:49:21 pm

LateralGM Developer
LGM Developer
Location: Pennsylvania/USA
Joined: Apr 2008
Posts: 877

Herein I'll try to maintain a full listing of changes between GM8.0 (800) and GM8.1 (810).

Most of this information can be gleaned by carefully reading the GM release notes and discarding the irrelevant parts.
http://store.yoyogames.com/downloads/gm4win/release-notes.html

Note that, at this time, none of the changes in 810 affect the filesize. They have finally learned to recycle bytes. Also note that almost all of the changes are completely backwards compatible, meaning that if you change the version identifier back to 800, it will load in GM8.0, although some minor fields may have incorrect values (e.g. Font Range Min usually changed to 255).

8.1.X (oldest)
Version identifier appearing after magic number changed from 800 to 810.
Font "Range Min" bytes now shared with Charset (default 0). First 2 bytes (little endian) still indicate Range Min. Next byte indicates charset identifier. Final byte reserved (0).
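To make the byte layout above concrete, here is a minimal sketch. It is not LGM's actual reader: the function names are hypothetical, and it assumes the field is the same 4 bytes that an 8.0-style reader would treat as one little-endian 32-bit integer.

```python
import struct

def split_font_field(raw: bytes):
    """Split the 4-byte field into (range_min, charset, reserved)."""
    # '<HBB' = little-endian unsigned 16-bit value followed by two single bytes.
    range_min, charset, reserved = struct.unpack("<HBB", raw)
    return range_min, charset, reserved

def read_as_gm80(raw: bytes) -> int:
    """Read the same 4 bytes as a single 32-bit integer (assumed 8.0 behaviour)."""
    return struct.unpack("<i", raw)[0]

# Range Min 32 with the default charset 0, and the same Range Min with charset 1.
default_field = struct.pack("<HBB", 32, 0, 0)
other_field = struct.pack("<HBB", 32, 1, 0)
```

With the default charset the 32-bit reading is still the plain Range Min. A non-zero charset byte pushes the 32-bit value far out of range, which would be consistent with the incorrect Range Min values mentioned above.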

8.1.69
Font "Range Min" and Charset bytes now merged with Anti-Aliasing (AA) selection (default 3). Final byte after the charset identifier now indicates the AA. 0 for Off, other values are 1, 2, and 3. Still uncertain what the different values of anti-aliasing mean.
Game Settings "Uninitialized variables" bytes now shared with "Throw an error when arguments aren't initialised correctly." (I abbreviate it as "Mismatched Arguments" or "Parameter Checking". Default On or 1). First byte is a bit-share, where Uninitialized Variables occupies &0x01, and Mismatched Arguments occupies &0x02.
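A rough sketch of the two 8.1.69 changes, assuming the layouts exactly as described above. All names here are made up for illustration and do not come from LGM or GM.

```python
# Bit masks taken from the description above.
UNINIT_VARS = 0x01
MISMATCHED_ARGS = 0x02

def split_font_field_8169(raw: bytes):
    """Return (range_min, charset, aa) from the 4-byte font field."""
    range_min = int.from_bytes(raw[0:2], "little")  # first two bytes, little endian
    charset = raw[2]                                # charset identifier
    aa = raw[3]                                     # 0 = off, otherwise 1, 2 or 3
    return range_min, charset, aa

def split_settings_byte(value: int):
    """Return (uninitialized_variables, mismatched_arguments) as booleans."""
    return bool(value & UNINIT_VARS), bool(value & MISMATCHED_ARGS)

def join_settings_byte(uninit: bool, mismatched: bool) -> int:
    """Pack the two flags back into the shared byte."""
    return (UNINIT_VARS if uninit else 0) | (MISMATCHED_ARGS if mismatched else 0)
```

For example, `split_settings_byte(0x02)` reports Uninitialized Variables off and Mismatched Arguments on.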

8.1.71-91
No known changes.

LGM is at this point.

8.1.106
May have changed all strings, especially script code, to support Unicode. Unconfirmed.

8.1.107-108
No known format changes. Introduces new functions ansi_char(?), get_function_address(?), string_byte_length(?), string_byte_at(?).

8.1.109-123
No known changes.

8.1.125
May have changed the saved room settings format. Unconfirmed.

8.1.126-135
No known changes.
« Last Edit: August 28, 2011, 10:53:56 pm by IsmAvatar » Logged
Offline (Unknown gender) Medo42
Reply #1 Posted on: August 13, 2011, 10:57:06 am
Member
Joined: Aug 2011
Posts: 4

I tried to confirm the character encoding changes by creating test files with GM 8.1.132 using unicode characters in every string field I found, and verifying that they were actually stored as UTF-8. As far as I can tell all string fields are assumed to be UTF-8 now, even things like filenames and extension names (and even when the extension name is read from the .gex, not the .gm81). There seems to be no marker to indicate whether UTF-8 or the old encoding strategy is used, which simply used the system default encoding.

I suggest always attempting to read source files with version 810 or above as if they contain UTF-8 strings, by turning the CHARSET constant of GmStreamDecoder into an actual instance field and setting it appropriately depending on file version. Writing these versions should be modified the same way, to mirror GM's own behaviour. Autodetecting the charset seems to be too much effort, considering that the result would likely be unreliable anyway.
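The version-dependent charset could look roughly like this. It is only a sketch of the idea in Python, not GmStreamDecoder itself; the function names are hypothetical, and using the locale's preferred encoding as the "system default" is an assumption.

```python
import locale

def charset_for_version(version: int) -> str:
    """Pick the string encoding from the file's version identifier."""
    if version >= 810:
        return "utf-8"
    # Older files: fall back to whatever the current system uses by default.
    return locale.getpreferredencoding(False)

def decode_gm_string(raw: bytes, version: int) -> str:
    """Decode a string field read from a file of the given version."""
    return raw.decode(charset_for_version(version), errors="replace")

def encode_gm_string(text: str, version: int) -> bytes:
    """Encode a string field for writing, mirroring the reader's choice."""
    return text.encode(charset_for_version(version), errors="replace")
```

Reading and writing go through the same lookup, so a file saved as 810 round-trips as UTF-8 while an 800 file keeps the old system-encoding behaviour.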
Offline (Male) Rusky
Reply #2 Posted on: August 13, 2011, 11:13:44 am

Resident Troll
Joined: Feb 2008
Posts: 954
MSN Messenger - rpjohnst@gmail.com
Auto-detecting the charset would actually not be a problem with UTF-8; that's the beauty of that encoding. ASCII chars are already valid UTF-8, and anything else has the high bits set to indicate that it's part of a multi-byte sequence.

The problem would come if someone used weird characters from their system encoding, which would have been unreliable in the first place.
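A minimal sketch of that kind of detection: try strict UTF-8 first and only fall back to a legacy encoding when the bytes are not valid UTF-8. The function name is hypothetical and cp1252 is just an assumed stand-in for "the system encoding".

```python
def decode_with_fallback(raw: bytes, legacy: str = "cp1252"):
    """Return (text, encoding_used) for a string field of unknown encoding."""
    try:
        # Pure ASCII and well-formed multi-byte sequences both pass here.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume it was written in the legacy encoding.
        return raw.decode(legacy, errors="replace"), legacy
```

The weak spot is legacy text whose bytes happen to form valid UTF-8. For instance, the cp1252 bytes for "Ã±" (0xC3 0xB1) would be read as "ñ", so the guess is only as reliable as the input is plain.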
Offline (Unknown gender) Medo42
Reply #3 Posted on: August 13, 2011, 11:30:37 am
Member
Joined: Aug 2011
Posts: 4

Quote from: Rusky
Auto-detecting the charset would actually not be a problem with UTF-8; that's the beauty of that encoding. ASCII chars are already valid UTF-8, and anything else has the high bits set to indicate that it's part of a multi-byte sequence.
The problem would come if someone used weird characters from their system encoding, which would have been unreliable in the first place.

If you assume that nobody uses non-ASCII characters then it's not a problem to simply read as UTF-8 either, so you are basically saying that autodetection is easy as long as it's not needed anyway.
Offline (Unknown gender) luiscubal
Reply #4 Posted on: August 13, 2011, 01:54:10 pm
Member
Joined: Jun 2009
Posts: 452

@Medo42 - Does the UTF-8 include the BOM (starting every UTF-8 string with 0xEF 0xBB 0xBF)? Probably not, but just to be sure.

If not, and a BOM were added, would that break GM 8.1 compatibility?
Offline (Unknown gender) Medo42
Reply #5 Posted on: August 13, 2011, 03:22:09 pm
Member
Joined: Aug 2011
Posts: 4

Sadly, there is no BOM at the start of strings, and I do believe that adding one could lead to problems - I didn't test it though. Even if it doesn't, it still won't help when opening files created by GM.
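For anyone who wants to check their own files, a BOM test is only a few lines. This is a generic sketch with hypothetical helper names; it says nothing about how GM itself reacts to a BOM, which is untested.

```python
import codecs

def has_utf8_bom(raw: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM (0xEF 0xBB 0xBF)."""
    return raw.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading BOM if present, otherwise return the bytes unchanged."""
    return raw[len(codecs.BOM_UTF8):] if has_utf8_bom(raw) else raw
```

Note that a plain UTF-8 decode does not remove the BOM; it comes through as the character U+FEFF at the start of the string, which is one way it could cause trouble if a writer added it.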
Offline (Male) Rusky
Reply #6 Posted on: August 13, 2011, 07:22:37 pm

Resident Troll
Joined: Feb 2008
Posts: 954
MSN Messenger - rpjohnst@gmail.com
Heh, technically that is what I said. I guess I was more trying to point out that before UTF-8, it was already unreliable to use non-ASCII characters in GM because they could change between systems. This means that autodetection is probably unnecessary with real-world GM files since people wouldn't have been able to use non-ASCII characters easily in the first place.

Also, luiscubal: UTF-8 is byte-order independent. A BOM would be stupid.
Offline (Unknown gender) luiscubal
Reply #7 Posted on: August 14, 2011, 09:42:39 am
Member
Joined: Jun 2009
Posts: 452

Stupid or not, there is an official BOM for UTF-8 (I'm painfully aware of this since it epically screwed my PHP scripts and I spent a LONG time trying to figure out what was wrong). What I was wondering was whether GM used it because, in the very unlikely event that it did, it would simplify some things.

I think Medo42 is right on this one - mimic the behavior of the newer GM versions. It's not like LGM users would have any problem GM users would not.
Also, remember, this is something the new GM *fixed*. The old behavior was incredibly outdated (Unicode is not a feature: *not* having Unicode is a bug)
Offline (Male) Josh @ Dreamland
Reply #8 Posted on: August 14, 2011, 09:58:58 am

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

We have no say in the contents of the GM format. Those encodings will always be hard-coded until Yoyo collectively develops a cerebrum.

In the meantime, the EGM format can easily describe the encoding of the text it stores, whether needed on a per-game, per-object/script, or per-code-snippet basis. If you feel that is necessary, now is the time to tell us so we can include it in the first EGM spec. In that case, for scripts, we would add it to the YAML file that denotes the script file, while for events, we would include it in the object's EEF Descriptor.

Don't worry, though; I made the spec very extensible (it's largely zipped YAML), so we *can* add it in later versions anyway. I just think that'd be something to throw in from square one if we're going to run with it.
« Last Edit: August 14, 2011, 10:13:13 am by Josh @ Dreamland » Logged
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Unknown gender) Medo42
Reply #9 Posted on: August 14, 2011, 10:22:32 am
Member
Joined: Aug 2011
Posts: 4

I don't know anything about EGM, but I feel qualified to comment anyway: IMO, best stick with one fixed unicode encoding (probably UTF-8). I do not see any big advantage in allowing alternatives, and it would complicate working with the format.
Offline (Unknown gender) luiscubal
Reply #10 Posted on: August 14, 2011, 10:37:39 am
Member
Joined: Jun 2009
Posts: 452

Agreed. UTF-8 is the way to go. The only time a different encoding should be used would be when importing or exporting Game Maker files from/to GM<8.1 (even GM 8.1 should be interpreted as UTF-8).
For 8.1 GM and EGM, UTF-8 should be used.
« Last Edit: August 14, 2011, 10:40:38 am by luiscubal » Logged
Offline (Male) Josh @ Dreamland
Reply #11 Posted on: August 14, 2011, 10:39:06 am

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

I've considered support for extended character sets by replacing the symbol with the concatenation of its unicode number and two dollar signs. For instance, if we opted to go all out with UTF-16, I'd replace `int piñas;' with `int pi$00F1$as;'. This would rely on a largely unsupported compiler feature, but it'd work for our purposes since all of our presently planned platforms are reached through some version of the GCC.
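The substitution itself is easy to sketch. This is only an illustration of the `$XXXX$` scheme described above, with a hypothetical function name; it naively rewrites every non-ASCII character in the input, including any inside string literals, which a real implementation would presumably leave alone.

```python
def mangle_non_ascii(source: str) -> str:
    """Replace each non-ASCII character with $<hex code point>$."""
    parts = []
    for ch in source:
        if ord(ch) < 128:
            parts.append(ch)  # plain ASCII passes through unchanged
        else:
            # At least four uppercase hex digits, wrapped in dollar signs.
            parts.append("${:04X}$".format(ord(ch)))
    return "".join(parts)
```

Running it on `int piñas;` gives `int pi$00F1$as;`. Code points above U+FFFF would simply come out with more than four hex digits.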

UTF-16 is, of course, serious overkill for the wants of 99% of our user base, so by default we'd certainly go with UTF-8. I was just wondering if it would be prudent to insert a delimiter up front. Like I said, though, we could always add support for other encodings later.
Offline (Unknown gender) TheExDeus
Reply #12 Posted on: August 14, 2011, 11:06:29 am

Developer
Joined: Apr 2008
Posts: 1860

How will draw_text support UTF-8? Do you plan to render a whole lot of chars to texture?
Offline (Male) Josh @ Dreamland
Reply #13 Posted on: August 14, 2011, 11:17:45 am

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

I'd rather have a draw_text_unicode. Anyway, we have a number of options.

1) Like Yoyo, we can allow selecting ranges of unicode chars to include in the font texture.
2) We can use TTF to load the glyphs into GL lists, and draw any glyphs not encompassed by those list ranges manually.
3) We can get creative, as many unicode glyphs are simply a letter with a symbol drawn over them.

While it would require flipping on antialiasing to look decent, (2) seems to be the most practical option. Perhaps we could merge (1) and (2), drawing any symbol we have in our texture as a sprite, and rendering the rest as primitives.
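The merged (1)+(2) approach boils down to a per-character lookup. A rough sketch of just that decision, with made-up names and no actual rendering; the range list stands in for whatever ranges the font texture was built with.

```python
def in_ranges(code_point: int, ranges) -> bool:
    """True if the code point falls inside any inclusive (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)

def plan_text(text: str, texture_ranges):
    """Tag each character with the path that would draw it."""
    plan = []
    for ch in text:
        if in_ranges(ord(ch), texture_ranges):
            plan.append((ch, "texture"))    # glyph is baked into the font texture
        else:
            plan.append((ch, "primitive"))  # glyph rendered from the TTF outline
    return plan
```

With only the range (32, 127) in the texture, the "a" in "añ" would be drawn as a sprite and the "ñ" as primitives.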
Offline (Unknown gender) luiscubal
Reply #14 Posted on: August 14, 2011, 01:06:39 pm
Member
Joined: Jun 2009
Posts: 452

Quote
so by default we'd certainly go with UTF-8
If UTF-8 covers 100% of your use-cases, and is the best option for 99% of your audience, why even bother to have the *option* to go UTF-16?

Quote
I'd replace `int piñas;' with `int pi$00F1$as;'
Are you talking about file formats or compilation result? (this topic seems to have been mostly about file formats, but your comment about GCC seems more relevant in the context of compilation)
If you're talking about file formats, I say ignore all that junk and go all-out on UTF-8, which in the worst case will result in some weird characters on obsolete versions of GM (so, yes, I'm suggesting that going UTF-8 might be acceptable even for gm6 and gmk).
If you're talking about compilation result, then feel free to do whatever you want. Your $$ suggestion seems a bit wasteful in terms of size, but whatever works for you without introducing weird bugs in the compiler is perfectly fine, since the compiled result is not something most people would see anyway.
« Last Edit: August 14, 2011, 01:08:22 pm by luiscubal » Logged