ENIGMA Forums

General fluff => General ENIGMA => Topic started by: Goombert on April 09, 2014, 08:31:44 pm

Title: Unicode Fonts
Post by: Goombert on April 09, 2014, 08:31:44 pm
Well, as Harri pointed out to me, GM's Unicode font support is limited. How, you ask? They use Windows-only fonts such as Arial Baltic because their IDE is written in Delphi and only for Windows; this is true even for Studio.
http://en.wikipedia.org/wiki/Arial#Code_page_variants
(http://i.imgur.com/9vq6ofL.png)

As many of you may have already noticed, LGM does not have these fonts available. Why? Well, that's simple: again, GM's IDE relies on native Windows shit for fonts. Java applications, and all non-Windows software, provide full Unicode support.

(http://i.imgur.com/Lt37mQS.png)

LGM and ENIGMA are capable of getting around this limitation imposed by GM, allowing you full access to all 2,147,483,647 (arbitrary limit imposed by Java int max value) characters supported by the Java/Unicode standard, as depicted in the above screenshot. You will however be limited to no more than 1000 characters per range, meaning if you want 10,000 sequential Unicode characters you will need at minimum 10 character ranges to achieve it with LateralGM; this is to stop the font editor from lagging with excessive ranges. This will require that you set the character ranges yourself. You can do this simply by entering the translated text and hitting preview, which will automagically calculate the character ranges you have used.
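
The range calculation described above can be sketched roughly as follows. This is not LateralGM's actual code (LGM is written in Java); it is a minimal C++ illustration, and the names `CharRange` and `compute_ranges`, as well as passing the cap as a parameter, are assumptions made for the example. It collects the distinct code points used in some text, merges consecutive ones into ranges, and starts a new range whenever a run reaches the cap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: an inclusive range of code points [first, last].
struct CharRange {
  std::uint32_t first;
  std::uint32_t last;
};

// Groups the distinct code points used in `text` into contiguous ranges,
// none of them longer than `max_len` characters.
std::vector<CharRange> compute_ranges(const std::u32string& text, std::size_t max_len) {
  assert(max_len > 0);
  std::set<char32_t> used(text.begin(), text.end());  // sorted, duplicates removed
  std::vector<CharRange> ranges;
  for (char32_t cp : used) {
    // Extend the current range only if cp is adjacent and the cap is not reached.
    bool extend = !ranges.empty() &&
                  ranges.back().last + 1 == cp &&
                  ranges.back().last - ranges.back().first + 1 < max_len;
    if (extend)
      ranges.back().last = cp;
    else
      ranges.push_back({static_cast<std::uint32_t>(cp), static_cast<std::uint32_t>(cp)});
  }
  return ranges;
}
```

With a cap of 1000, a run of 10,000 consecutive characters comes out as 10 ranges, which matches the arithmetic in the post.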

The reason I am posting this topic is both to be informative and to ask everyone's opinion. First of all, what do you think of this solution? It is more powerful than GM, which limits you to 255 characters. They will also most likely remove the 255 character limit and go this route in the future when they do their new IDE. Thoughts, feedback, and criticism are, again, always welcome.

Edit: Studio v1.3 does in fact make this change, but does not yet remove Arial Baltic and such fonts!
Title: Re: Unicode Fonts
Post by: TheExDeus on April 10, 2014, 10:20:37 am
I personally think that this is basically as good as it can be. These ranges are necessary because we still render everything using textures and so it isn't practical to render all Unicode characters on a texture.

Maybe you can give a download link for LGM so I can play with the font dialog? I know that it won't work right now with ENIGMA, but I don't need ENIGMA to run LGM.

We still need to implement this in ENIGMA, but as I mentioned to Robert, I am not entirely sure how. Maybe Josh will have a more solid idea.
Title: Re: Unicode Fonts
Post by: Goombert on April 10, 2014, 11:46:38 am
Here is the download for the soon to be released new plugin and LateralGM.

https://www.dropbox.com/s/wcr3mt8uf19i7ho/enigma.jar
https://www.dropbox.com/s/dnb7caeroqgcael/lateralgm.jar

NOTE: There was a bug with the recent files menu and spaces. Basically, LGM was delimiting the recent file paths with spaces, which would obviously make file paths with spaces incompatible, so I switched it to tab-delimited. This version addresses that issue, but you may need to clear that menu and then close and reload LGM.
Title: Re: Unicode Fonts
Post by: time-killer-games on April 10, 2014, 07:00:08 pm
If the infinity symbol is still out of range I'm going to be pissed.
Title: Re: Unicode Fonts
Post by: TheExDeus on April 11, 2014, 10:31:24 am
Quote
Here is the download for the soon to be released new plugin and LateralGM.
I see you have made a lot of other changes as well.

I also remembered a bug I keep forgetting. It should be relatively trivial to fix. When you load a sprite and want to set the offset with the mouse, the offset is set relative to top-left of the window, not the sprite. So when you press on the sprite you always get bottom-right (because that is what it is limited to). Basically the sprite preview window doesn't take the centered drawing into account and so you cannot effectively use the mouse to set the offset.

And for a suggestion: Can you make it so that when you type in the font dialog box it filters the fonts? I am sure Java's widget already has an option like that, but it must somehow be enabled. Right now when you type in the font box (like "Times") it won't actually move the selection to "Times New Roman", so if you want to type in a font name, you must type the whole name from memory. Or scroll through the fonts, which is hard as it has hundreds of fonts for me while it lists only 8 at a time.

Quote
If the infinity symbol is still out of range I'm going to be pissed.
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Title: Re: Unicode Fonts
Post by: Goombert on April 11, 2014, 11:50:40 am
Quote from: TheExDeus
I also remembered a bug I keep forgetting. It should be relatively trivial to fix. When you load a sprite and want to set the offset with the mouse, the offset is set relative to top-left of the window, not the sprite. So when you press on the sprite you always get bottom-right (because that is what it is limited to). Basically the sprite preview window doesn't take the centered drawing into account and so you cannot effectively use the mouse to set the offset.

I actually just fixed that one. The problem was I switched the image previews for Background and Sprite to center aligned because I thought that looked nicer, but didn't account for it in the mouse coordinates.
https://github.com/IsmAvatar/LateralGM/commit/0f430a9c7a1e4f9dd03eb3b96204c27348d0e56e
This commit followed the former.
https://github.com/IsmAvatar/LateralGM/commit/1a45337891b32a4856781c25f0755d74d68d8921

Quote from: TheExDeus
And for a suggestion: Can you make it so that when you type in the font dialog box it filters the fonts? I am sure Java's widget already has an option like that, but it must somehow be enabled.
Yes I wanted that too, I'll try to get it added.

Quote from: TheExDeus
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Now as for the fonts, I need to recant. Our code editors are UTF-8 encoded; Josh said IsmAvatar did this a long time ago. I put the label there saying UTF-8 a little while ago and I actually had no idea. Anyway, most code editors are UTF-8 encoded, and I assume GM's is as well because their file functions support nothing beyond UTF-8 either.

So anyway, all of our strings including literals are already UTF-8 encoded, we just need to fix draw_text with the new LGM, and then we add multiple character ranges in a map to replace glyphstart glyphcount.
Title: Re: Unicode Fonts
Post by: TheExDeus on April 11, 2014, 06:10:02 pm
Quote
So anyway, all of our strings including literals are already UTF-8 encoded, we just need to fix draw_text with the new LGM, and then we add multiple character ranges in a map to replace glyphstart glyphcount.
But LGM's code editor's encoding has nothing to do with C++ std::string or its capabilities. I don't see how std::string suddenly supports UTF-8.
Title: Re: Unicode Fonts
Post by: Goombert on April 11, 2014, 07:21:42 pm
(http://i.imgur.com/z3PF1JY.png)

As you can see in the above screenshot I have now got the UTF-8 text rendering working perfectly.

The commit link is below.
https://github.com/RobertBColton/enigma-dev/commit/41fc6e7fc9dabd450f27bc8a9fccf200d8ce6458
The change was committed to pull request #623
https://github.com/enigma-dev/enigma-dev/pull/683

As the commit message also states, the only thing left is to add the multiple character ranges to the engine, compiler, and plugin.

Quote from: TheExDeus
But LGM's code editor's encoding has nothing to do with C++ std::string or its capabilities. I don't see how std::string suddenly supports UTF-8.
I don't know, Josh didn't explain it thoroughly to me, ask him, all I know is that it does.

Quote from: TKG
If the infinity symbol is still out of range I'm going to be pissed.
Yes of course you will be able to use that symbol.
Title: Re: Unicode Fonts
Post by: Josh @ Dreamland on April 11, 2014, 09:18:54 pm
Once again, let me forward the method I would use. Assuming there is an enigma::glyph class representing a font glyph:

Code: (cpp)
/** Copyright 2014 Josh Ventura
***
*** This file is a part of the ENIGMA Development Environment.
***
*** ENIGMA is free software: you can redistribute it and/or modify it under the
*** terms of the GNU General Public License as published by the Free Software
*** Foundation, version 3 of the license or any later version.
***
*** This application and its source code is distributed AS-IS, WITHOUT ANY
*** WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
*** FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
*** details.
***
*** You should have received a copy of the GNU General Public License along
*** with this code. If not, see <http://www.gnu.org/licenses/>
**/

class glyph_set {
  struct glyph_range: private vector<glyph*> {
    size_t firstindex;
    using vector<glyph*>::operator[];
    void swap(vector<glyph*>& x, size_t first) {
      firstindex = first;
      vector<glyph*>::swap(x);
    }
    glyph_range(size_t first, size_t count): vector<glyph*>(count), firstindex(first) {}
    glyph_range(): firstindex(0) {}
  };
 
  typedef map<size_t, glyph_range> gmap;
  gmap glyphs;
 
  public:
  glyph *operator[](size_t x) {
    gmap::iterator it = glyphs.lower_bound(x);
    if (it == glyphs.end() or x < it->second.firstindex)
      return NULL;
    return it->second[x - it->second.firstindex];
  }

  void store_range(size_t first, vector<glyph*> &inglyphs) {
    size_t last = first + inglyphs.size() - 1;
   
    #ifdef DEBUG_MODE
    bool overlap = (*this)[last];
    if (overlap)
      show_error("Glyph ranges overlap", 0);
    #endif
   
    glyphs[last].swap(inglyphs, first);
   
    #ifdef DEBUG_MODE
    if (!overlap && inglyphs.size())
      show_error("Something wicked has happened to the glyph map", 0);
    #endif
  }
};

Make sure that compiles in debug mode, too. It should compile in release mode. I haven't tested it in ENIGMA, obviously, but I see no reason it wouldn't work. You populate a vector of glyphs, then use store_range to add that vector from a given glyph index. You can then look up the glyph in the map as you would in a normal map.

The bonus is that it allows you to have dense ranges of glyphs which are themselves distributed sparsely. So you can have, eg, ASCII, Cyrillic, and cat smilies.
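
For anyone who wants to try the lookup idea outside of ENIGMA, here is a stripped-down, self-contained sketch of the same technique: dense runs stored in a map keyed by the last index of each run, found with lower_bound. The `glyph` struct is a stand-in (the real enigma::glyph is not shown in this thread), the name `sparse_glyphs` is made up, glyphs are stored by value rather than by pointer, and the debug overlap check is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-in for enigma::glyph; the real class is not shown in this thread.
struct glyph {
  int advance;
};

// Dense runs of glyphs stored sparsely. Each map entry is keyed by the LAST
// index of its run, so lower_bound(x) finds the only run that could hold x.
class sparse_glyphs {
  struct run {
    std::size_t first = 0;
    std::vector<glyph> items;
  };
  std::map<std::size_t, run> runs;

 public:
  // Stores a run of glyphs covering [first, first + items.size() - 1].
  void store_range(std::size_t first, std::vector<glyph> items) {
    assert(!items.empty());
    std::size_t last = first + items.size() - 1;
    run& r = runs[last];
    r.first = first;
    r.items = std::move(items);
  }

  // Returns the glyph for character index x, or nullptr if no run covers it.
  const glyph* find(std::size_t x) const {
    auto it = runs.lower_bound(x);
    if (it == runs.end() || x < it->second.first) return nullptr;
    return &it->second.items[x - it->second.first];
  }
};
```

Storing printable ASCII at 32, Cyrillic at 0x410 and a single infinity symbol at 0x221E gives three small vectors instead of one table spanning the whole gap, and each lookup costs one O(log R) map search, where R is the number of ranges.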
Title: Re: Unicode Fonts
Post by: time-killer-games on April 12, 2014, 12:13:33 pm
Quote from: TheExDeus
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Yay! Now I'm anti-pissed! :D
Title: Re: Unicode Fonts
Post by: TheExDeus on April 12, 2014, 03:02:59 pm
Quote
I don't know, Josh didn't explain it thoroughly to me, ask him, all I know is that it does.
Well, I now see how it "supports UTF8":
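Code:
static uint32_t getUnicodeCharacter(const string& str, size_t& pos) {
  // Decode the UTF-8 sequence starting at str[pos]; pos ends on its last byte.
  const unsigned char lead = (unsigned char)str[pos];
  if (lead < 0xC0) return lead; // plain ASCII (or a stray continuation byte)
  // 110xxxxx, 1110xxxx and 11110xxx lead bytes carry 1, 2 and 3 extra bytes.
  const size_t extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
  uint32_t character = lead & (0x3F >> extra);
  for (size_t ii = 0; ii < extra; ii++) {
    if (pos + 1 >= str.size() || ((unsigned char)str[pos + 1] & 0xC0) != 0x80) break; // truncated sequence
    character = (character << 6) | ((unsigned char)str[++pos] & 0x3F);
  }
  return character;
}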
If you do it like this, then you might even just use a vector<unsigned char> or char array[] as well. Or anything really. You just take all the bytes used for the char and add them together. I do have slight doubts about the efficiency though as a for loop per char isn't the best way to do things. But I guess the first IF short circuits the regular ASCII, so if you use the regular ASCII it will work basically just as fast as before.
Title: Re: Unicode Fonts
Post by: Josh @ Dreamland on April 12, 2014, 05:08:12 pm
"Supports" is a strong word. UTF-8 is eight bits, so an std::string is all we need to store it. You still need special methods to get a character from a position.
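
To illustrate what such "special methods" might look like on top of a plain std::string, here is a minimal sketch. The function names `utf8_length`, `utf8_offset` and `utf8_char_at` are hypothetical and are not ENIGMA's actual functions; the code also assumes the input is valid UTF-8 and does no error checking beyond one assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points by counting every byte that is NOT a continuation byte
// (continuation bytes have the form 10xxxxxx). Assumes `s` is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
  std::size_t n = 0;
  for (unsigned char c : s)
    if ((c & 0xC0) != 0x80) ++n;
  return n;
}

// Returns the byte offset at which the index-th code point starts,
// or s.size() if the string has fewer code points than that.
std::size_t utf8_offset(const std::string& s, std::size_t index) {
  std::size_t seen = 0;
  for (std::size_t pos = 0; pos < s.size(); ++pos) {
    if ((static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80) {
      if (seen == index) return pos;
      ++seen;
    }
  }
  return s.size();
}

// Returns the bytes of the index-th code point as its own string.
std::string utf8_char_at(const std::string& s, std::size_t index) {
  std::size_t begin = utf8_offset(s, index);
  assert(begin < s.size());  // index must be in range
  std::size_t end = utf8_offset(s, index + 1);
  return s.substr(begin, end - begin);
}
```

Note that both the length and the positional lookup are O(N) in the number of bytes, whereas std::string::size() and operator[] are O(1); that cost is what the rest of this discussion is about.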

I would offer to write a utf8 string implementation, but I'm afraid you two would start using it exclusively for everything.
Title: Re: Unicode Fonts
Post by: TheExDeus on April 12, 2014, 05:58:06 pm
Quote
I would offer to write a utf8 string implementation, but I'm afraid you two would start using it exclusively for everything.
But why wouldn't we? I don't think using several encodings is a good idea. If we use UTF-8, then we should use it everywhere. Besides, all the normal std::string member stuff will not work either way. Things like string_length() probably already broke with that change. So we might as well have a custom implementation for it.
Title: Re: Unicode Fonts
Post by: Goombert on April 12, 2014, 06:53:13 pm
Quote from: TheExDeus
So we might as well have a custom implementation for it.
No, they all work fine, including the new string_length_utf8. GM Studio's, on the other hand, is still not working as of v1.3.
Title: Re: Unicode Fonts
Post by: Josh @ Dreamland on April 12, 2014, 07:01:41 pm
For the same reason we currently offer two overloads of string_length: one accepts const char*, the other accepts std::string. If all you have is const char*, then length is O(N), but does not necessarily entail a copy. If we only accept std::string, a copy becomes necessary, so now we're N in complexity and memory.

In order to have a utf8_string whose complexity is the same as std::string (which is completely possible), I must keep TWO strings. The first is a string of at most 4N bytes, where N is the length in characters of the string. The second string is of size_ts and denotes the starting byte of each character. So it'll usually look like ⟨0, 1, 2, 3, ...⟩ or ⟨0, 2, 4, 6, ...⟩ but will often be much uglier, primarily where other languages use English (ASCII) punctuation.
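
A rough sketch of that two-container layout, under some assumptions: the class name `utf8_indexed` and its members are made up for illustration, a std::vector<size_t> stands in for the "string of size_ts", and the input is assumed to be valid UTF-8. This is not the class offered in this post, only the general shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: the raw UTF-8 bytes plus the byte offset at which
// each character starts, built once at construction.
class utf8_indexed {
  std::string bytes;                // at most 4 bytes per character
  std::vector<std::size_t> starts;  // starts[i] = byte offset of character i

 public:
  explicit utf8_indexed(const std::string& utf8) : bytes(utf8) {
    // Every byte that is not a continuation byte (10xxxxxx) starts a character.
    for (std::size_t pos = 0; pos < bytes.size(); ++pos)
      if ((static_cast<unsigned char>(bytes[pos]) & 0xC0) != 0x80)
        starts.push_back(pos);
  }

  // Length in characters, O(1).
  std::size_t length() const { return starts.size(); }

  // Bytes of character i, found without scanning the string.
  std::string at(std::size_t i) const {
    assert(i < starts.size());
    std::size_t end = (i + 1 < starts.size()) ? starts[i + 1] : bytes.size();
    return bytes.substr(starts[i], end - starts[i]);
  }

  const std::vector<std::size_t>& offsets() const { return starts; }
};
```

The trade-off is memory and construction time: every character costs an extra sizeof(size_t) bytes for its index entry, and the index has to be rebuilt whenever the string is created or modified.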

For your interest, I'll write the class. But I am disclaiming liability for slowdown from you two constructing one or more strings in addition to the simple ASCII strings you are usually asked to operate on.
Title: Re: Unicode Fonts
Post by: TheExDeus on April 13, 2014, 07:04:52 am
Quote
No, they all work fine, including the new string_length_utf8. GM Studio's, on the other hand, is still not working as of v1.3.
If by that you mean making a different function just for UTF-8, then that is exactly what I am afraid of. We DON'T need several functions for that. We should use UTF-8 everywhere and use one.

Quote
For the same reason we currently offer two overloads of string_length: one accepts const char*, the other accepts std::string. If all you have is const char*, then length is O(N), but does not necessarily entail a copy. If we only accept std::string, a copy becomes necessary, so now we're N in complexity and memory.
And const char* is only possible in ENIGMA if explicitly used right? Like "char array[20]; string_length(array);"? Even though the code for "string_length(const char* str)" seems a little fishy. It doesn't seem to check end of line characters or anything like that. It just checks if the value is not null.

Quote
In order to have a utf8_string whose complexity is the same as std::string (which is completely possible), I must keep TWO strings. The first is a string of at most 4N bytes, where N is the length in characters of the string. The second string is of size_ts and denotes the starting byte of each character. So it'll usually look like ⟨0, 1, 2, 3, ...⟩ or ⟨0, 2, 4, 6, ...⟩ but will often be much uglier, primarily where other languages use English (ASCII) punctuation.
My idea was just to use UTF-16 or something which apparently has the best memory/speed tradeoff for most of earth's languages. But I guess it doesn't really matter. Most of the time (i.e. English) the complexity won't be large enough to cause real slowdowns. Maybe we could figure out a way to optimize that getUnicodeCharacter() though.

Quote
For your interest, I'll write the class. But I am disclaiming liability for slowdown from you two constructing one or more strings in addition to the simple ASCII strings you are usually asked to operate on.
I guess there is no need for that. I just wanted a way to signal that all std::strings in our code are actually UTF-8 encoded, because by default std::string isn't meant to be. In the same way, I don't want two versions of all string manipulation functions.
Title: Re: Unicode Fonts
Post by: Rusky on April 13, 2014, 09:49:43 am
UTF-16 is a terrible idea. string_length on UTF-8 is slow and extremely complicated to write correctly. The majority of GM and ENIGMA games don't need Unicode for anything but random symbols. If you do want to use Unicode for other languages, GM's string interface is not the way to do it. An i18n interface would be better.
Title: Re: Unicode Fonts
Post by: TheExDeus on April 13, 2014, 10:54:30 am
Quote
string_length on UTF-8 is slow and extremely complicated to write correctly.
Not any slower than actually drawing the text. Another reason I thought we should use classes is because then we could just add a .length member to the string and calculate it once when creating/assigning the string. It wouldn't make "string_length("This is a string!");" any faster, but it would speed up cases that use variables.

Quote
The majority of GM and ENIGMA games don't need Unicode for anything but random symbols.
And localizations.. you know, the main reason Unicode exists.

Quote
If you do want to use Unicode for other languages, GM's string interface is not the way to do it. An i18n interface would be better.
How would i18n be of any benefit here? We need a way not only to store the strings, but to also draw them. Either way we need some kind of UTF encoding. We could include additional functions for easier localization of course. But that is beside the point here.
Title: Re: Unicode Fonts
Post by: Goombert on April 13, 2014, 04:47:35 pm
After my last commit it is now working perfectly, and TKG's infinity symbol is also working now. After a few more improvements to LGM and stuff we can look to getting this release rolled out to everyone.
(http://i.imgur.com/sw7ZuKN.png)