Author Topic: Unicode Fonts  (Read 3961 times)
Offline (Male) Goombert
Posted on: April 09, 2014, 08:31:44 PM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3110

View Profile
Well, as Harri pointed out to me, GM's Unicode font support is limited. How, you ask? They use Windows-only fonts such as Arial Baltic because their IDE is written in Delphi and runs only on Windows; this is true even for Studio.
http://en.wikipedia.org/wiki/Arial#Code_page_variants


As many of you may have already noticed, LGM does not have these fonts available. Why? Well, that's simple: again, GM's IDE relies on native Windows shit for fonts. Java applications, like most non-Windows software, provide full Unicode support.



LGM and ENIGMA are capable of getting around this limitation imposed by GM, giving you access to character codes up to 2,147,483,647 (an arbitrary limit imposed by Java's int max value; Unicode itself only defines code points up to U+10FFFF), as depicted in the above screenshot. You will, however, be limited to no more than 1000 characters per range, meaning that if you want 10,000 sequential Unicode characters you will need at minimum 10 character ranges to achieve it with LateralGM. This is to stop the font editor from lagging with excessive ranges. You will need to set the character ranges yourself, which you can do simply by entering the translated text and hitting preview; that will automagically calculate the character ranges you have used.
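
To make the range calculation concrete, here is a rough sketch of how the code points used in a piece of text could be grouped into ranges of at most 1000 entries. It is written in C++ for illustration only; LGM itself is Java, and the names `CodeRange` and `build_ranges` are made up for this example rather than taken from LGM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper type: a closed range [first, last] of code points.
using CodeRange = std::pair<std::uint32_t, std::uint32_t>;

// Group the distinct code points into runs of consecutive values, starting a
// new range whenever the current one already holds max_len entries.
std::vector<CodeRange> build_ranges(const std::set<std::uint32_t>& used,
                                    std::size_t max_len = 1000) {
  assert(max_len > 0);
  std::vector<CodeRange> ranges;
  for (std::uint32_t cp : used) {  // std::set iterates in ascending order
    if (!ranges.empty()) {
      CodeRange& cur = ranges.back();
      const bool adjacent = (cp == cur.second + 1);
      const bool has_room = (cur.second - cur.first + 1) < max_len;
      if (adjacent && has_room) {
        cur.second = cp;  // extend the current range by one code point
        continue;
      }
    }
    ranges.push_back({cp, cp});  // gap or full range: start a new one
  }
  return ranges;
}
```

With this sketch, the letters A to C plus the infinity symbol (U+221E) come out as two ranges, and 10,000 consecutive code points come out as ten ranges of 1000 each.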

The reason I am posting this topic is both to be informative and to ask everyone's opinion. First of all, what do you think of this solution? It is more powerful than GM, which limits you to 255 characters. They will also most likely remove the 255 character limit and go this route in the future when they do their new IDE. Thoughts, feedback, and criticism are always welcome.

Edit: Studio v1.3 does in fact make this change, but does not yet remove Arial Baltic and such fonts!
« Last Edit: April 09, 2014, 09:16:11 PM by Robert B Colton » Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Unknown gender) TheExDeus
Reply #1 Posted on: April 10, 2014, 10:20:37 AM

Developer
Joined: Apr 2008
Posts: 1872

View Profile
I personally think that this is basically as good as it can be. These ranges are necessary because we still render everything using textures and so it isn't practical to render all Unicode characters on a texture.

Maybe you can give a download link for LGM so I can play with the font dialog? I know that it won't work right now with ENIGMA, but I don't need ENIGMA to run LGM.

We still need to implement this in ENIGMA, but as I mentioned to Robert, I am not entirely sure how. Maybe Josh will have a more solid idea.
Logged
Offline (Male) Goombert
Reply #2 Posted on: April 10, 2014, 11:46:38 AM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3110

View Profile
Here is the download for the soon to be released new plugin and LateralGM.

https://www.dropbox.com/s/wcr3mt8uf19i7ho/enigma.jar
https://www.dropbox.com/s/dnb7caeroqgcael/lateralgm.jar

NOTE: There was a bug with the recent files menu and spaces. LGM was delimiting the recent file paths with spaces, which obviously made file paths containing spaces incompatible, so I switched it to tab-delimited. This version addresses that issue, but you may need to clear that menu and then close and reload LGM.
Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Male) time-killer-games
Reply #3 Posted on: April 10, 2014, 07:00:08 PM

Contributor
Location: Virginia Beach
Joined: Jan 2013
Posts: 1166

View Profile Email
If the infinity symbol is still out of range I'm going to be pissed.
Logged
Offline (Unknown gender) TheExDeus
Reply #4 Posted on: April 11, 2014, 10:31:24 AM

Developer
Joined: Apr 2008
Posts: 1872

View Profile
Quote
Here is the download for the soon to be released new plugin and LateralGM.
I see you have made a lot of other changes as well.

I also remembered a bug I keep forgetting. It should be relatively trivial to fix. When you load a sprite and want to set the offset with the mouse, the offset is set relative to top-left of the window, not the sprite. So when you press on the sprite you always get bottom-right (because that is what it is limited to). Basically the sprite preview window doesn't take the centered drawing into account and so you cannot effectively use the mouse to set the offset.

And for a suggestion: Can you make it so that when you type in the font dialog box it filters the fonts? I am sure Java's widget already has an option like that, but it must somehow be enabled. Right now when you type in the font box (like "Times") it won't actually move the selection to "Times New Roman", so if you want to type in a font name, you must type the whole name from memory. Or scroll through the fonts, which is hard as it has hundreds of fonts for me while it lists only 8 at a time.

Quote
If the infinity symbol is still out of range I'm going to be pissed.
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Logged
Offline (Male) Goombert
Reply #5 Posted on: April 11, 2014, 11:50:40 AM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3110

View Profile
Quote from: TheExDeus
I also remembered a bug I keep forgetting. It should be relatively trivial to fix. When you load a sprite and want to set the offset with the mouse, the offset is set relative to top-left of the window, not the sprite. So when you press on the sprite you always get bottom-right (because that is what it is limited to). Basically the sprite preview window doesn't take the centered drawing into account and so you cannot effectively use the mouse to set the offset.

I actually just fixed that one. The problem was I switched the image previews for Background and Sprite to center aligned because I thought that looked nicer, but didn't account for it in the mouse coordinates.
https://github.com/IsmAvatar/LateralGM/commit/0f430a9c7a1e4f9dd03eb3b96204c27348d0e56e
This commit followed the former.
https://github.com/IsmAvatar/LateralGM/commit/1a45337891b32a4856781c25f0755d74d68d8921
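
For anyone curious what "accounting for it in the mouse coordinates" amounts to, the idea is roughly the following. This is a C++ sketch rather than the actual Java from the commits above, and the helper name `panel_to_image` and the clamping behaviour are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: convert a mouse coordinate in the preview panel to a
// coordinate on the image, assuming the image is drawn centered in the panel.
// Apply it once for x (with widths) and once for y (with heights).
int panel_to_image(int mouse, int panel_size, int image_size) {
  assert(image_size > 0);
  const int margin = (panel_size - image_size) / 2;  // blank space before the image
  const int local = mouse - margin;                  // position relative to the image
  return std::max(0, std::min(local, image_size - 1));  // keep it on the image
}
```

Without the `margin` subtraction, any click inside a small centered sprite lands past the sprite's far edge and gets clamped to the bottom-right corner, which matches the behaviour described in the bug report.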

Quote from: TheExDeus
And for a suggestion: Can you make it so that when you type in the font dialog box it filters the fonts? I am sure Java's widget already has an option like that, but it must somehow be enabled.
Yes I wanted that too, I'll try to get it added.

Quote from: TheExDeus
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Now, as for the fonts, I need to recant. Our code editors are UTF-8 encoded; Josh said IsmAvatar did this a long time ago. I put the label there saying UTF-8 a little while ago and actually had no idea. Anyway, most code editors are UTF-8 encoded, and I assume GM's is as well, because their file functions support nothing beyond UTF-8 either.

So anyway, all of our strings, including literals, are already UTF-8 encoded. We just need to fix draw_text with the new LGM, and then add multiple character ranges in a map to replace glyphstart and glyphcount.
Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Unknown gender) TheExDeus
Reply #6 Posted on: April 11, 2014, 06:10:02 PM

Developer
Joined: Apr 2008
Posts: 1872

View Profile
Quote
So anyway, all of our strings including literals are already UTF-8 encoded, we just need to fix draw_text with the new LGM, and then we add multiple character ranges in a map to replace glyphstart glyphcount.
But LGM's code editor's encoding has nothing to do with C++ std::string or its capabilities. I don't see how std::string suddenly supports UTF-8.
Logged
Offline (Male) Goombert
Reply #7 Posted on: April 11, 2014, 07:21:42 PM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3110

View Profile


As you can see in the above screenshot I have now got the UTF-8 text rendering working perfectly.

The commit link is below.
https://github.com/RobertBColton/enigma-dev/commit/41fc6e7fc9dabd450f27bc8a9fccf200d8ce6458
The change was committed to pull request #683
https://github.com/enigma-dev/enigma-dev/pull/683

As the commit message also states, the only thing left is to add the multiple character ranges to the engine, compiler, and plugin.

Quote from: TheExDeus
But LGM's code editor's encoding has nothing to do with C++ std::string or its capabilities. I don't see how std::string suddenly supports UTF-8.
I don't know, Josh didn't explain it thoroughly to me, ask him, all I know is that it does.

Quote from: TKG
If the infinity symbol is still out of range I'm going to be pissed.
Yes of course you will be able to use that symbol.
Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Male) Josh @ Dreamland
Reply #8 Posted on: April 11, 2014, 09:18:54 PM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2958

View Profile Email
Once again, let me forward the method I would use. Assuming there is an enigma::glyph class representing a font glyph:

Code: (C++) [Select]
/** Copyright 2014 Josh Ventura
***
*** This file is a part of the ENIGMA Development Environment.
***
*** ENIGMA is free software: you can redistribute it and/or modify it under the
*** terms of the GNU General Public License as published by the Free Software
*** Foundation, version 3 of the license or any later version.
***
*** This application and its source code is distributed AS-IS, WITHOUT ANY
*** WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
*** FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
*** details.
***
*** You should have received a copy of the GNU General Public License along
*** with this code. If not, see <http://www.gnu.org/licenses/>
**/


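#include <cstddef>
#include <map>
#include <vector>
using std::map;
using std::size_t;
using std::vector;

// Dense ranges of glyphs, stored sparsely: each range is keyed by its LAST index.
class glyph_set {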
  struct glyph_range: private vector<glyph*> {
    size_t firstindex;
    using vector<glyph*>::operator[];
    void swap(vector<glyph*>& x, size_t first) {
      firstindex = first;
      vector<glyph*>::swap(x);
    }
    glyph_range(size_t first, size_t count): vector<glyph*>(count), firstindex(first) {}
    glyph_range(): firstindex(0) {}
  };
 
  typedef map<size_t, glyph_range> gmap;
  gmap glyphs;
 
  public:
  glyph *operator[](size_t x) {
    gmap::iterator it = glyphs.lower_bound(x);
    if (it == glyphs.end() or x < it->second.firstindex)
      return NULL;
    return it->second[x - it->second.firstindex];
  }

  void store_range(size_t first, vector<glyph*> &inglyphs) {
    size_t last = first + inglyphs.size() - 1;
   
    #ifdef DEBUG_MODE
    bool overlap = (*this)[last];
    if (overlap)
      show_error("Glyph ranges overlap", 0);
    #endif
   
    glyphs[last].swap(inglyphs, first);
   
    #ifdef DEBUG_MODE
    if (!overlap && inglyphs.size())
      show_error("Something wicked has happened to the glyph map", 0);
    #endif
  }
};

Make sure that compiles in debug mode, too. It should compile in release mode. I haven't tested it in ENIGMA, obviously, but I see no reason it wouldn't work. You populate a vector of glyphs, then use store_range to add that vector from a given glyph index. You can then look up the glyph in the map as you would in a normal map.

The bonus is that it allows you to have dense ranges of glyphs which are themselves distributed sparsely. So you can have, eg, ASCII, Cyrillic, and cat smilies.
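
As a usage illustration, here is a simplified, self-contained variant of the same idea that can be compiled outside ENIGMA. It is a sketch, not the class above: `glyph` is a stand-in struct (the real enigma::glyph is not shown in this thread), `sparse_glyphs` and `find` are hypothetical names, and it stores glyphs by value instead of by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-in for enigma::glyph; the real class is not shown in this thread.
struct glyph { int advance; };

// Each dense range is stored under the index of its LAST glyph, so
// lower_bound(x) lands on the only range that could contain x.
class sparse_glyphs {
  struct range { std::size_t first; std::vector<glyph> items; };
  std::map<std::size_t, range> ranges;  // key = last index covered by the range

 public:
  void store_range(std::size_t first, std::vector<glyph> items) {
    assert(!items.empty());
    const std::size_t last = first + items.size() - 1;
    ranges[last] = range{first, std::move(items)};
  }

  // Returns nullptr when x falls in a gap between ranges.
  const glyph* find(std::size_t x) const {
    auto it = ranges.lower_bound(x);  // first range whose last index is >= x
    if (it == ranges.end() || x < it->second.first) return nullptr;
    return &it->second.items[x - it->second.first];
  }
};
```

Storing printable ASCII with `store_range(32, ...)` and Cyrillic with `store_range(0x410, ...)` gives two small vectors and a two-entry map; a lookup for any index between them returns `nullptr` instead of costing memory for the gap.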
« Last Edit: April 11, 2014, 09:20:31 PM by Josh @ Dreamland » Logged
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Male) time-killer-games
Reply #9 Posted on: April 12, 2014, 12:13:33 PM

Contributor
Location: Virginia Beach
Joined: Jan 2013
Posts: 1166

View Profile Email
Quote from: TheExDeus
It's there. :) As Robert mentioned Java supports 16bit Unicode and so it supports a lot of characters, but not the whole set. But it does depend on the font as well. For example, I cannot draw Japanese or Chinese characters with Arial font, while I can with some others.
Yay! Now I'm anti-pissed! :D
Logged
Offline (Unknown gender) TheExDeus
Reply #10 Posted on: April 12, 2014, 03:02:59 PM

Developer
Joined: Apr 2008
Posts: 1872

View Profile
Quote
I don't know, Josh didn't explain it thoroughly to me, ask him, all I know is that it does.
Well, I now see how it "supports UTF8":
Code: [Select]
static uint32_t getUnicodeCharacter(const string str, size_t& pos) {
  uint32_t character = 0;
  if (str[pos] & 0xC0) {
    character = (((uint32_t)str[pos] & 0x1F) << 6);
    for (size_t ii = 1; ii <= 6; ii++) {
      if ((str[pos + ii] & 0xC0) != 0x80) { pos += ii - 1; break; }
      character |= (str[pos + ii] & 0x3F);
    }
  } else {
    character = (uint32_t)str[pos];
  }
  return character;
}
If you do it like this, then you might even just use a vector<unsigned char> or char array[] as well. Or anything really. You just take all the bytes used for the char and add them together. I do have slight doubts about the efficiency though as a for loop per char isn't the best way to do things. But I guess the first IF short circuits the regular ASCII, so if you use the regular ASCII it will work basically just as fast as before.
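
For comparison, here is a sketch of a decoder that follows the UTF-8 bit layout more closely. As far as I can tell from the snippet as quoted, it ORs every continuation byte into the same six bits without shifting, so it only reconstructs two-byte sequences correctly, and its first test (`str[pos] & 0xC0`) is also non-zero for ASCII bytes from 0x40 to 0x7F. The version below is not the code from the commit: `decode_utf8` is a hypothetical name, it advances `pos` past the character (the quoted function leaves it on the last byte), and it trusts the input rather than rejecting overlong or otherwise malformed sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode the code point that starts at byte `pos`, then advance `pos` past it.
std::uint32_t decode_utf8(const std::string& str, std::size_t& pos) {
  assert(pos < str.size());
  const unsigned char lead = static_cast<unsigned char>(str[pos++]);
  if (lead < 0x80) return lead;  // ASCII fast path: one byte, no loop

  // The lead byte's high bits give the sequence length:
  // 110xxxxx = 2 bytes, 1110xxxx = 3 bytes, 11110xxx = 4 bytes.
  int extra = 0;
  std::uint32_t cp = 0;
  if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
  else return 0xFFFD;  // stray continuation byte or invalid lead byte

  // Each continuation byte (10xxxxxx) contributes six more bits.
  for (; extra > 0 && pos < str.size(); --extra) {
    const unsigned char next = static_cast<unsigned char>(str[pos]);
    if ((next & 0xC0) != 0x80) break;  // truncated sequence: stop early
    cp = (cp << 6) | (next & 0x3F);
    ++pos;
  }
  return cp;
}
```

Plain ASCII still returns after a single comparison, so the short-circuit mentioned above is kept; the loop only runs for multi-byte characters and at most three times.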
Logged
Offline (Male) Josh @ Dreamland
Reply #11 Posted on: April 12, 2014, 05:08:12 PM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2958

View Profile Email
"Supports" is a strong word. UTF-8 uses eight-bit code units, so a std::string is all we need to store it. You still need special methods to get a character from a position.

I would offer to write a utf8 string implementation, but I'm afraid you two would start using it exclusively for everything.
Logged
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Unknown gender) TheExDeus
Reply #12 Posted on: April 12, 2014, 05:58:06 PM

Developer
Joined: Apr 2008
Posts: 1872

View Profile
Quote
I would offer to write a utf8 string implementation, but I'm afraid you two would start using it exclusively for everything.
But why wouldn't we? I don't think using several encodings is a good idea. If we use UTF-8, then we should use it everywhere. Besides, all the normal std::string member stuff will not work either way. Things like string_length() probably already broke with that change. So we might as well have a custom implementation for it.
« Last Edit: April 12, 2014, 06:02:02 PM by TheExDeus » Logged
Offline (Male) Goombert
Reply #13 Posted on: April 12, 2014, 06:53:13 PM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3110

View Profile
Quote from: TheExDeus
So we might as well have a custom implementation for it.
No, they all work fine, including the new string_length_utf8. GM Studio's, on the other hand, is still not working as of v1.3.
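
For reference, counting characters in a UTF-8 std::string does not need a full decoder. The sketch below shows one common way to do it; `utf8_length_sketch` is a made-up name and this is not necessarily how ENIGMA's string_length_utf8 is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by counting every byte that is NOT a continuation byte.
// Assumes the input is valid UTF-8.
std::size_t utf8_length_sketch(const std::string& str) {
  std::size_t count = 0;
  for (unsigned char byte : str) {
    // Continuation bytes look like 10xxxxxx; any other byte starts a character.
    if ((byte & 0xC0) != 0x80) ++count;
  }
  assert(count <= str.size());  // a character is never shorter than one byte
  return count;
}
```

The plain `std::string::length()` keeps returning the byte count, which is why the two numbers differ as soon as the text contains anything outside ASCII.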
Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Male) Josh @ Dreamland
Reply #14 Posted on: April 12, 2014, 07:01:41 PM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2958

View Profile Email
For the same reason we currently offer two overloads of string_length: one accepts const char*, the other accepts std::string. If all you have is const char*, then length is O(N), but does not necessarily entail a copy. If we only accept std::string, a copy becomes necessary, so now we're N in complexity and memory.

In order to have a utf8_string whose complexity is the same as std::string (which is completely possible), I must keep TWO strings. The first is a string of at most 4N bytes, where N is the length of the string in characters. The second string is of size_ts and denotes the byte offset of each character. So it'll usually look like ⟨0, 1, 2, 3, ...⟩ or ⟨0, 2, 4, 6, ...⟩ but will often be much uglier, primarily where other languages use English (ASCII) punctuation.
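
A rough sketch of that two-string layout, to make the trade-off visible. This is only an illustration of the idea described above, not the class Josh goes on to write: `utf8_string_sketch`, `char_at` and `index` are hypothetical names, the offset table is a `std::vector<std::size_t>`, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class utf8_string_sketch {
  std::string bytes;                 // raw UTF-8, at most 4 bytes per character
  std::vector<std::size_t> offsets;  // offsets[i] = byte offset of character i

 public:
  explicit utf8_string_sketch(std::string utf8) : bytes(std::move(utf8)) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
      // Any byte that is not a continuation byte (10xxxxxx) starts a character.
      if ((static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80)
        offsets.push_back(i);
    }
  }

  std::size_t length() const { return offsets.size(); }  // O(1), in characters

  // Finds character i in O(1) and returns the bytes that encode it.
  std::string char_at(std::size_t i) const {
    assert(i < offsets.size());
    const std::size_t begin = offsets[i];
    const std::size_t end = (i + 1 < offsets.size()) ? offsets[i + 1] : bytes.size();
    return bytes.substr(begin, end - begin);
  }

  const std::vector<std::size_t>& index() const { return offsets; }
};
```

The cost is the one disclaimed above: every string carries a second array with one `size_t` per character, so an ASCII-only string that used to take N bytes now takes N bytes plus N offsets.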

For your interest, I'll write the class. But I am disclaiming liability for slowdown from you two constructing one or more strings in addition to the simple ASCII strings you are usually asked to operate on.
Logged
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire