Author Topic: GL3 changes from immediate to retained mode  (Read 8061 times)
Offline (Unknown gender) TheExDeus
Posted on: July 30, 2013, 11:44:34 AM

Developer
Joined: Apr 2008
Posts: 1872

Hi! I think I made a topic about this a long time ago, but let's do it again. We have had GL3 for some time now, but most things are still rendered in immediate mode. So I propose changing all the drawing functions not to use it. The problem is with caching, if we even decide to use it. Right now we render a sprite like so:
Code: [Select]
void draw_sprite(int spr, int subimg, gs_scalar x, gs_scalar y)
{
    get_spritev(spr2d,spr);
    const int usi = subimg >= 0 ? (subimg % spr2d->subcount) : int(((enigma::object_graphics*)enigma::instance_event_iterator->inst)->image_index) % spr2d->subcount;
    texture_use(GmTextures[spr2d->texturearray[usi]]->gltex);

    glPushAttrib(GL_CURRENT_BIT);
    glColor4f(1,1,1,1);

    const float tbx = spr2d->texbordxarray[usi], tby = spr2d->texbordyarray[usi],
                xvert1 = x-spr2d->xoffset, xvert2 = xvert1 + spr2d->width,
                yvert1 = y-spr2d->yoffset, yvert2 = yvert1 + spr2d->height;

    glBegin(GL_QUADS);
    glTexCoord2f(0,0);
    glVertex2f(xvert1,yvert1);
    glTexCoord2f(tbx,0);
    glVertex2f(xvert2,yvert1);
    glTexCoord2f(tbx,tby);
    glVertex2f(xvert2,yvert2);
    glTexCoord2f(0,tby);
    glVertex2f(xvert1,yvert2);
    glEnd();

    glPopAttrib();
}
This means that in immediate mode it sends vertices one by one, which is bad, slow and deprecated. The change would be to use VAOs or VBOs, which sadly are meant for more static geometry: dynamic data requires rebuilding the buffer all the time before drawing. So if we just do this:
Code: [Select]
void draw_sprite(int spr, int subimg, gs_scalar x, gs_scalar y)
{
    get_spritev(spr2d,spr);
    const int usi = subimg >= 0 ? (subimg % spr2d->subcount) : int(((enigma::object_graphics*)enigma::instance_event_iterator->inst)->image_index) % spr2d->subcount;
    texture_use(GmTextures[spr2d->texturearray[usi]]->gltex);

    const float tbx = spr2d->texbordxarray[usi], tby = spr2d->texbordyarray[usi],
        xvert1 = x-spr2d->xoffset, xvert2 = xvert1 + spr2d->width,
        yvert1 = y-spr2d->yoffset, yvert2 = yvert1 + spr2d->height;

    float data[][7] = {
       {  xvert1, yvert1, 0.0, 0.0, 1.0, 1.0, 1.0  },
       {  xvert2, yvert1, tbx, 0.0, 1.0, 1.0, 1.0  },
       {  xvert2, yvert2, tbx, tby, 1.0, 1.0, 1.0  },

       {  xvert2, yvert2, tbx, tby, 1.0, 1.0, 1.0  },
       {  xvert1, yvert2, 0.0, tby, 1.0, 1.0, 1.0  },
       {  xvert1, yvert1, 0.0, 0.0, 1.0, 1.0, 1.0  }
    };

    GLuint spriteVBO;
    glGenBuffers(1, &spriteVBO);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    glBindBuffer(GL_ARRAY_BUFFER, spriteVBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW);
    glVertexPointer( 2, GL_FLOAT, sizeof(float) * 7, NULL );
    glTexCoordPointer( 2, GL_FLOAT, sizeof(float) * 7, (void*)(sizeof(float) * 2) );
    glColorPointer( 3, GL_FLOAT, sizeof(float) * 7, (void*)(sizeof(float) * 4) );

    glDrawArrays( GL_TRIANGLES, 0, 6);

    glDisableClientState( GL_COLOR_ARRAY );
    glDisableClientState( GL_TEXTURE_COORD_ARRAY );
    glDisableClientState( GL_VERTEX_ARRAY );

    glDeleteBuffers(1, &spriteVBO);
}
Then we will be using VBOs, but because we rebuild both the VBO and the data on every call, it ends up A LOT slower. So does anyone have any ideas on how we could buffer this? Originally I wondered whether it would be possible to assign some ID to each draw call which could be used across frames, so that the same sprite could be redrawn if things like sprite index or position have not changed. Or whether creating 1 VBO per sprite and then just rebuilding the data per render would be a lot faster. But I just did some tests and even if I only call:
Code: [Select]
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_DYNAMIC_DRAW);
glDrawArrays( GL_TRIANGLES, 0, 6);
Then it is still a lot slower than immediate mode. The problem, I guess, is that we need to batch sprites into one VBO. But that cannot be done painlessly with all the dynamic things we have. I tried a quick hack using a global VBO and populating it in draw_sprite() just before drawing, and I got a 3 times boost over immediate mode (though I seemed to get capped at exactly 100-101FPS, which means it maybe was more, but for some reason I got vsynced). That is, I drew 25k objects together with their logic (simple "bounce against walls" logic) and I got 30FPS with immediate mode and 100FPS with the global VBO. 100k objects ran at 9FPS in immediate mode and a stable 25FPS with the VBO. But I tested without drawing and found that I was actually capped at 33FPS by my use of a vector (I pushed 6*7 values for each sprite and there were 100k of them). Dunno how to improve that much though. After reserving and using a manual counter (so no need for clear, just overwrite) I got to 44FPS without drawing (which was 30FPS with drawing).

So basically what I propose is this:
1) Have 1 global VBO.
2) In all drawing functions we populate this VBO with x,y,tx,ty,r,g,b,a and do that until texture_use() fails (e.g. when the currently used texture is not the same as the requested one), at which point we draw the VBO and clear it.
3) Bind the new texture and repeat.
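
A minimal CPU-side sketch of steps 1) to 3), with no GL calls so it can be compiled on its own. The names `SpriteBatch`, `push_quad`, `flush` and `count_flushes` are hypothetical, not ENIGMA functions; `flush()` only counts where the real code would upload the buffer (`glBufferData`) and draw it (`glDrawArrays`). It also uses the preallocated vector with a manual write cursor mentioned above instead of clear() and push_back().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side batch: one vertex is x, y, tx, ty, r, g, b, a.
struct SpriteBatch {
    static constexpr std::size_t kFloatsPerVertex = 8;
    static constexpr std::size_t kVerticesPerQuad = 6;  // two triangles

    std::vector<float> data;      // allocated once, overwritten every batch
    std::size_t used = 0;         // manual write cursor into data
    int bound_texture = -1;       // -1 means "nothing bound yet"
    std::size_t flush_count = 0;  // draw calls we would have issued

    explicit SpriteBatch(std::size_t max_quads)
        : data(max_quads * kVerticesPerQuad * kFloatsPerVertex) {}

    // Stand-in for: upload data[0..used) to the global VBO, draw it, reset.
    void flush() {
        if (used == 0) return;
        ++flush_count;
        used = 0;
    }

    void push_quad(int texture, float x1, float y1, float x2, float y2,
                   float tbx, float tby) {
        const std::size_t quad_floats = kVerticesPerQuad * kFloatsPerVertex;
        // A texture change or a full buffer ends the current batch.
        if (texture != bound_texture || used + quad_floats > data.size()) {
            flush();
            bound_texture = texture;
        }
        assert(used + quad_floats <= data.size());
        const float v[kVerticesPerQuad][4] = {
            {x1, y1, 0.0f, 0.0f}, {x2, y1, tbx, 0.0f}, {x2, y2, tbx, tby},
            {x2, y2, tbx, tby},   {x1, y2, 0.0f, tby}, {x1, y1, 0.0f, 0.0f}};
        for (const auto& p : v) {
            for (float f : p) data[used++] = f;               // x, y, tx, ty
            for (int c = 0; c < 4; ++c) data[used++] = 1.0f;  // r, g, b, a
        }
    }
};

// Counts the flushes needed to draw one quad per entry of `textures`.
std::size_t count_flushes(const std::vector<int>& textures) {
    SpriteBatch batch(1024);
    for (int t : textures) batch.push_quad(t, 0.0f, 0.0f, 32.0f, 32.0f, 1.0f, 1.0f);
    batch.flush();  // end of frame
    return batch.flush_count;
}
```

With this sketch, four quads sharing a texture need one flush, while four quads alternating between two textures need four, which is the interleaving cost listed under the disadvantages.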

Advantages:
1) This way we will batch as much as we can before drawing and yet have the possibility to use different drawing functions (even sprite and background) interchangeably.
2) When we add sprite packing (or more precisely texture packing), then we will have a massive speed boost without changing any drawing functions. This is because we push the texture coordinates and render only when texture changes. So less texture changes means more batching.
3) Tiles would automatically be batched (usually), because calling draw_background_ext_transformed like before would automatically add them to the same VBO (if the same tilestrip is used, which it often is). Right now it seems some GLLists are made and populated, but I think that is slower (especially when many glBegin and glEnd calls are used per tile). Of course remaking the tile system to use 1 VBO per layer could maybe be better and speed the whole thing up (but will take more memory).
4) Port to GLES (Android and such) would be a lot simpler, as it doesn't support immediate mode and requires the app to basically be GL3 (so no gl transformation functions either). So we must push towards that for easier maintenance and compatibility.

Disadvantages:
1) If a lot of texture switching happens (like having two objects with the same depth that are created interleaved with one another, so the draw event is called interleaved as well), then there will be a performance impact. In a game with a few hundred sprites it will probably not be seen, but with thousands of sprites the impact could be noticeable. The good thing is that things like depth changes would reduce the impact. As would texture packing.

note: Functions like glEnableClientState and such are actually also deprecated. Now all of that has to happen on a vertex shader. I plan to test that too and maybe implement it that way. But this global VBO thing is a lot simpler and could potentially give a lot of speed.

So, any ideas?

edit: By replacing glBufferData with glBufferSubData I got to 36FPS with 100k objects, but this won't be possible in the implementation mentioned here (as the size will change all the time depending on how many sprites are drawn and how many texture swaps happen). But with a much smaller VBO the impact of that function will not be so great. It is even recommended to use several smaller VBO's than one big one anyway.
« Last Edit: July 31, 2013, 04:53:23 AM by TheExDeus » Logged
Offline (Male) Josh @ Dreamland
Reply #1 Posted on: July 31, 2013, 07:25:33 AM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

The point of the GL3 graphics system is to assume that the hardware supports VBOs and shaders, as opposed to call lists and matrices.

This has been planned for a loooong time, but has only recently begun being implemented.

Purging immediate mode from GL3 is certainly a goal. Texture batching is also an interest, if possible.
Logged
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Male) Goombert
Reply #2 Posted on: July 31, 2013, 07:34:13 AM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3107

Yes, these were all the things I was originally planning to do with OpenGL 3. But Harri, for that I was thinking of a common interface for vertex formats of all the basic shapes like a plane and whatnot, and including them from a common header; that's what those GLshapes.cpp and GL3shapes.cpp files are about. I just didn't have the energy to do it. Also, that is why there's a shaders folder: we need to rewrite all the expected behavior as GPU programs as well. Especially for particle effects.
Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Unknown gender) TheExDeus
Reply #3 Posted on: July 31, 2013, 08:46:52 AM

Developer
Joined: Apr 2008
Posts: 1872

Quote
This has been planned for a loooong time, but has only recently begun being implemented.
Well I have been thinking about this for a long time as well. The problem is that it is not that straightforward. If you create your own GL project then it is easy to batch things, as you do all the logic quite differently and you can decide what is going to be static and what dynamic. With the GM way of doing things (and now ENIGMA's of course) it is a lot harder, because you can do this:
Code: [Select]
draw_sprite(spr_0,0,x,y);
draw_background(back_0,10,10);
draw_line(10,10,250,300);
draw_sprite(spr_0,0,60,10);
Which cannot be straightforwardly batched. Even if the background and sprite functions draw a static image (so the x and y are static), they are still considered dynamic and so must be either redrawn every time via immediate mode (like now) or batched into a dynamic VBO (like it is proposed here). If we had sprite packing then at least this simple code wouldn't cause a texture rebind (and in turn a VBO regen), but it still wouldn't be as good as it could be. If we had a way to figure out whether the drawn images are static or not (like if none of the arguments are variables), then it could be possible to batch them together in a different VBO which would be reused and never regenerated. The problem though is that the static image can be inside an if(){}, or that draw ordering would break. Some of these things could be fixed by some extreme analysis of the code at compile time, but I can't even imagine how much thinking that would require. I think JDI already returns many things about the functions, so it could be possible. And the draw_line() breaks the whole thing even further. I guess we can have a separate vector which would hold the drawing mode or even textures. Then just push everything into a single VBO, bind the different textures and draw only part of the VBO each time via glDrawElements().
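
A rough sketch of that last idea: a side vector of draw ranges describing which slice of the shared vertex buffer uses which texture and primitive type. `DrawRange`, `RangeRecorder` and `record_example` are made-up names for illustration, and the sketch assumes non-indexed vertices, where each range would map to one `glDrawArrays(mode, first, count)` call after binding its texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Primitive { Triangles, Lines };  // stand-ins for GL_TRIANGLES / GL_LINES

// One contiguous slice of the shared vertex buffer drawn with one state.
struct DrawRange {
    int texture;        // -1 for untextured primitives such as lines
    Primitive mode;
    std::size_t first;  // index of the first vertex in the shared buffer
    std::size_t count;  // number of vertices in this slice
};

struct RangeRecorder {
    std::vector<DrawRange> ranges;
    std::size_t vertex_count = 0;  // vertices appended to the buffer so far

    // Call once per draw function, after appending `n` vertices to the buffer.
    void record(int texture, Primitive mode, std::size_t n) {
        assert(n > 0);
        if (!ranges.empty() && ranges.back().texture == texture &&
            ranges.back().mode == mode) {
            ranges.back().count += n;  // same state: extend the current slice
        } else {
            ranges.push_back({texture, mode, vertex_count, n});
        }
        vertex_count += n;
    }
};

// Replays the four calls from the example: sprite, background, line, sprite.
std::vector<DrawRange> record_example() {
    const int spr_0 = 0, back_0 = 1, untextured = -1;
    RangeRecorder rec;
    rec.record(spr_0, Primitive::Triangles, 6);
    rec.record(back_0, Primitive::Triangles, 6);
    rec.record(untextured, Primitive::Lines, 2);
    rec.record(spr_0, Primitive::Triangles, 6);
    return rec.ranges;
}
```

The four calls still produce four ranges, so the buffer is uploaded once but drawn in four pieces. If spr_0 and back_0 lived on the same atlas they would share a texture id and the first two ranges would merge into one.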

Quote
But Harri, for that I was thinking of a common interface for vertex formats of all the basic shapes like a plane and what not, and include them from a common header, thats what those GLshapes.cpp and GL3shapes.cpp files are about.
Can you explain in more detail? Did you mean that you thought of common shape functions that return vertices or something? Like vert_plane(x,y,w,h,r), which would push a rotated plane to a common vertex array?
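
For illustration only, such a helper might look roughly like the sketch below. The signature follows the vert_plane(x,y,w,h,r) mentioned above, but the vertex layout (`Vertex2D`), the use of radians, rotation around the plane's (x, y) corner and the normalized 0..1 texture coordinates are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vertex2D { float x, y, tx, ty; };

// Hypothetical helper: appends a w-by-h plane, rotated by `r` radians around
// its (x, y) corner, to a shared vertex array as two triangles (6 vertices).
void vert_plane(std::vector<Vertex2D>& out, float x, float y,
                float w, float h, float r) {
    assert(w >= 0.0f && h >= 0.0f);
    const float c = std::cos(r), s = std::sin(r);
    // Corner offsets before rotation, paired with their texture coordinates.
    const Vertex2D corners[4] = {{0.0f, 0.0f, 0.0f, 0.0f},
                                 {w,    0.0f, 1.0f, 0.0f},
                                 {w,    h,    1.0f, 1.0f},
                                 {0.0f, h,    0.0f, 1.0f}};
    const int order[6] = {0, 1, 2, 2, 3, 0};
    for (int i : order) {
        const Vertex2D& p = corners[i];
        // Standard 2D rotation matrix applied to the offset, then translated.
        out.push_back({x + p.x * c - p.y * s, y + p.x * s + p.y * c, p.tx, p.ty});
    }
}
```

Every drawing function would then call a helper like this instead of writing vertices by hand, so changing the vertex format later only touches one header.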
« Last Edit: July 31, 2013, 08:53:39 AM by TheExDeus » Logged
Offline (Male) Goombert
Reply #4 Posted on: July 31, 2013, 09:21:42 AM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3107

Harri, yup, exactly what I was thinking.
Logged

Offline (Unknown gender) TheExDeus
Reply #5 Posted on: July 31, 2013, 10:03:16 AM

Developer
Joined: Apr 2008
Posts: 1872

Then I don't get that at all. Taking into account that some things need to be rotated and some not, I think it will be better if I just populate the thing inside the drawing functions. Though I guess if it is done this way, then it will be easier to change formats later on. I will investigate.
Logged
Offline (Male) Josh @ Dreamland
Reply #6 Posted on: July 31, 2013, 10:08:22 AM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

That's why I added that "if possible" clause to batching. It's hard to do batching when people layer sprites at the current depth, and intermix texture calls with untextured calls.

Fortunately, we can do some hackery in the compiler to help with that. We have a few options, which I propose we support as options in full:
  • Undefine the behavior of intermixed calls. We can add an option to treat depths as the only logical barrier for batching. This means that if you draw, as I mentioned in Robert's announcement earlier, an arm, a torso, then another arm, you might end up with both arms being in front of or behind the torso instead of sandwiched. For some games, this behavior is unacceptable—for others, this isn't an issue at all. Giving the option of drawing sprites immediately or batching them is a relatively simple operation which will have little impact on code size.
  • Place batching barriers in parallel. The compiler can register with the built-in sprite batching class that a draw event has started, and that a draw event has completed. This information can be used by the batch as a heuristic for what order to bind textures in. Barriers will be created not for each object, but instead for each time the user implicitly switches modes or textures. Examples below.
  • Do code profiling in a special mode. If ENIGMA has a Profile mode down the road, the sprite batch unit can denote, as a map of pairs, the texture IDs most commonly switched between. These pair counts can be used in generating texture atlases to avoid rebinding altogether, even when drawing sprites intermittently.

None of these options, alone, will solve the problem, but you can imagine that together these are extremely powerful options. Let me elaborate on points (2) and (3).

2. Parallel batch barriers
Say we have three events which are run during the game.
Code: (EDL) [Select]
draw_sprite(spr_wing_bottom, 0, x, y);
draw_sprite(spr_bird, 0, x, y);
draw_sprite(spr_wing_top, 0, x, y);
Code: (EDL) [Select]
draw_sprite(spr_fire, -1, x, y);
draw_sprite(spr_wing_bottom, 0, x, y);
draw_sprite(spr_firebird, 0, x, y);
draw_sprite(spr_wing_top, 0, x, y);
Code: (EDL) [Select]
draw_circle_color(x, y, 64, c_white, c_red, false);
draw_sprite(spr_wing_bottom, 0, x, y);
draw_rectangle_color(x-16, y-16, x+16, y+16, c1, c2, c3, c4);
draw_sprite(spr_wing_top, 0, x, y);

Our batch mechanism would work by keeping a list, in order, of each type of sprite, line, ...whatever needs drawn. At the beginning of each draw event would be batch_chunk_start(), at the end would be batch_chunk_end().

0. A list of batch jobs is created, and is initially empty.
1. The batch_chunk_start() method moves the head position to the beginning of the list.
2. Each time the user tries to draw something, the head advances until a batch job of that type is encountered.
3. If no batch job of that type is encountered, the head is moved back where it was, and a new batch job is inserted there.
4. The batch_chunk_end() method doesn't do anything except maybe a check in debug mode.

By the above process, the batch jobs generated for the above codes, in order, will be as follows (assuming the codes are first encountered in the order given above and then in any sequence for repetition):
  • Draw all circle_color.
  • Draw all spr_fire.
  • Draw all spr_wing_bottom.
  • Draw all rectangle_color.
  • Draw all spr_firebird.
  • Draw all spr_bird.
  • Draw all spr_wing_top
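
The list above can be reproduced with a small sketch of steps 0 to 4. `BatchList` and `replay_example` are hypothetical names, jobs are identified by plain strings for readability, and "head" is interpreted here as the position just past the last job used in the current draw event, which is one possible reading of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class BatchList {
public:
    // Step 1: move the head to the beginning of the list.
    void chunk_start() { head_ = 0; }

    // Steps 2 and 3: find a job of this kind at or after the head; if there
    // is none, insert a new job at the head. Returns the index of the job.
    std::size_t draw(const std::string& kind) {
        for (std::size_t i = head_; i < jobs_.size(); ++i) {
            if (jobs_[i] == kind) {
                head_ = i + 1;
                return i;
            }
        }
        jobs_.insert(jobs_.begin() + static_cast<std::ptrdiff_t>(head_), kind);
        return head_++;
    }

    // Step 4: nothing to do except a sanity check in debug builds.
    void chunk_end() const { assert(head_ <= jobs_.size()); }

    const std::vector<std::string>& jobs() const { return jobs_; }

private:
    std::vector<std::string> jobs_;  // step 0: initially empty
    std::size_t head_ = 0;
};

// Replays the three draw events above in the order given.
std::vector<std::string> replay_example() {
    const std::vector<std::vector<std::string>> events = {
        {"spr_wing_bottom", "spr_bird", "spr_wing_top"},
        {"spr_fire", "spr_wing_bottom", "spr_firebird", "spr_wing_top"},
        {"circle_color", "spr_wing_bottom", "rectangle_color", "spr_wing_top"}};
    BatchList list;
    for (const auto& event : events) {
        list.chunk_start();
        for (const auto& kind : event) list.draw(kind);
        list.chunk_end();
    }
    return list.jobs();
}
```

Under this reading, an event that draws the same kind twice in a row gets two separate jobs unless a later job of that kind already exists; a real implementation might prefer to reuse the job it just filed into.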

The worst case for this batch algorithm is when every object draws everything uniquely or in reverse order of another object which already drew it. The issue is that in this system, everything must be batchable, or have a batch node. Every. Single. Draw. Function.

3. Profiling
To improve further on the above, code profiling can be done by creating texture pairs as described. With our batch class in place, the pairs generated will be (spr_fire, spr_wing_bottom), (spr_wing_bottom, spr_firebird), (spr_firebird, spr_bird), (spr_bird, spr_wing_top). A very complicated (relatively speaking—I mean in terms of runtime complexity rather than in difficulty) algorithm would then decide the best arrangement for these sprites. An obvious answer (aside from put them all on the same sheet) is to arrange them so that spr_fire and spr_wing_bottom are on one atlas, and spr_firebird, spr_bird, and spr_wing_top are on another. The point is to minimize the number of texture switches in a batched or unbatched environment; for more complicated games, where these transitions will not be made 1:1 by the batch tool, the profiling will come in handy to a much higher degree.
Logged
Offline (Unknown gender) TheExDeus
Reply #7 Posted on: July 31, 2013, 12:35:01 PM

Developer
Joined: Apr 2008
Posts: 1872

I don't think taking away drawing order management other than depth is such a good thing. I always rely on the drawing order for hud drawing and now it would either require shit ton of objects or changing depth mid draw, which I don't think would work in your case (or it would just call the buffer to be drawn and reset?), like so:
Code: [Select]
depth = 0;
draw_sprite(spr_wing_bottom, 0, x, y);
draw_sprite(spr_bird, 0, x, y);
depth = 1;
draw_sprite(spr_wing_top, 0, x, y);
Also note how I changed the depth in reverse order.

I now implemented a system which could be something like the system in the final version. These are the results. In the best case (0 texture switching) and drawing 50k sprites I get this:
Code: [Select]
int i = 0;
var inst;
repeat (50000){
inst = instance_create(random(room_width),50+random(room_height-50),obj_0);
inst.spr = (i%1==0?spr_0:spr_1);
++i;
}

Without batching (like GL1) gives 18FPS, so it's a speed increase of 400%. Now the worst case:
Code: [Select]
int i = 0;
var inst;
repeat (50000){
inst = instance_create(random(room_width),50+random(room_height-50),obj_0);
inst.spr = (i%2==0?spr_0:spr_1);
++i;
}

Without batching this gives 14FPS (so a slight decrease because of texture switching), but the VBO is 230% slower here. In this case I have 50k sprites, but two different ones are drawn (25k each), and as they are created interleaved, they are rendered interleaved as well. This means a texture switch happens for every draw_sprite, and thus a VBO flush as well.

So some thoughts and questions:
1) Does this seem acceptable to be committed? For the worst case this could be a step back performance-wise, but some points to consider:
    * Normally you don't render this many sprites like this. If you render thousands of sprites then they are for things like particles, which in this case would batch perfectly.
    * This runs fast enough for 500 and even 5000 (60 fps) worst case sprites, so it shouldn't impact any current game.
    * Worst case almost never happens (tm).
2) All sprite functions (like draw_sprite, draw_sprite_ext, _transformed etc.) are rendered together.
3) This clearly shows we need to use a texture atlas. Some thoughts:
    * Do we make it runtime or compile time? At runtime it would be better because we could pack also when using sprite_add() functions, but that would require a loading screen (as startup will get slower). We could also have a middle ground where all the compile time resources are packed at compile time (like fonts are now) and runtime sprite_add() packs at runtime. I would love some help implementing this.
    * Do we pack sprite and background resources together? As texture wise there is no difference, then I suggest we do.
    * How to select packing size? At runtime we could provide a function which allows the user to choose size (like 1024x1024, 4096x4096 etc.), but at compile time it might require either a macro or an option in ENIGMA settings.
    * If we do it at compile time, do we make it universal or tied to a graphics system? I think it should work if it is universal.
4) When this is drawn using shaders instead of glDrawElements(), then it could be faster.
« Last Edit: July 31, 2013, 12:36:49 PM by TheExDeus » Logged
Offline (Male) Josh @ Dreamland
Reply #8 Posted on: July 31, 2013, 12:39:54 PM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

Quote
I don't think taking away drawing order management other than depth is such a good thing.
The proposal I gave above in (2) doesn't do that. It emulates it perfectly, with speedup in your "worst case" comparable to speedup in the best case. The only thing it takes away is instance ID behaving as a secondary depth. The order of those drawings is deterministic, but different, and should not differ in any meaningful way.

In my method, an object that draws spr1, then spr2, then spr3 will behave like this:

Code: (EDL) [Select]
with (obj_0)
  draw_sprite(spr_1);
with (obj_0)
  draw_sprite(spr_2);
with (obj_0)
  draw_sprite(spr_3);

Instead of like this:
Code: (EDL) [Select]
with (obj_0)
  draw_sprite(spr_1),
  draw_sprite(spr_2),
  draw_sprite(spr_3);
And dynamically changing to that behavior is basically trivial.


That said, go ahead and commit what you have for now, as improvement is improvement, and your solution is much less involved than mine.
« Last Edit: July 31, 2013, 12:43:45 PM by Josh @ Dreamland » Logged
Offline (Unknown gender) TheExDeus
Reply #9 Posted on: July 31, 2013, 02:43:11 PM

Developer
Joined: Apr 2008
Posts: 1872

Quote
The proposal I gave above in (2) doesn't do that.
Well your example ordered 7 batches and if they are rendered in that order, then the output will differ. And if you draw a hud then that can be a difference between having text on the background or the background on the text (or even a player on the text).

Quote
In my method, an object that draws spr1, then spr2, then spr3 will behave like this:
I don't see how that would change much. The slowdown now happens only when switching textures. That means I can draw 20 objects with different depth and ID order and still get only 1 VBO if they all draw the same thing. If the thing differs, then you must switch texture and do the same thing again. So in that spr_1, spr_2, spr_3 example it would take the same amount of time whatever you do. Even if you use 1 VBO for each or 1 global one (of course it will work faster with 1 global one). By my testing it seems that you must render about 10 things in batch to have any speed gain over immediate mode. So what we really need is texture atlas. Any comments on that (the compile time vs runtime)?
Logged
Offline (Male) Goombert
Reply #10 Posted on: July 31, 2013, 05:41:18 PM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3107

Code: [Select]
d3d_transform_set_identity();
d3d_transform_add_rotation_x(90);
draw_sprite(spr_wall, 0, 0, 0);

:P

That is possible in Game Maker and using the DX batching class. DX can also outperform this with different textured sprites, I presume because of batching, and it must be mixing it with a shader. If we add all that to the OpenGL one Harri committed, we could make it a LOT faster than what he has right now.
« Last Edit: July 31, 2013, 05:58:03 PM by Robert B Colton » Logged

Offline (Male) Josh @ Dreamland
Reply #11 Posted on: July 31, 2013, 06:15:19 PM

Prince of all Goldfish
Developer
Location: Pittsburgh, PA, USA
Joined: Feb 2008
Posts: 2950

Quote
Well your example ordered 7 batches and if they are rendered in that order, then the output will differ. And if you draw a hud then that can be a difference between having text on the background or the background on the text (or even a player on the text).
The output will only differ if individual objects overlap. The correct draw order for everything drawn inside each object's draw event is preserved. So if you draw a square sprite, then a circle sprite over it, the corner of the square sprite for one object will not be able to overlap the circle sprite of another object, because all square sprites are drawn at the same time, and all circle sprites are drawn after.

This problem in fact manifests anywhere multiple sprite draws are used to create one sprite, such as for attaching a hat, or armor, or other equipables to a character sprite. In this case, other objects on that depth which are drawn in the same way would not overlap correctly. Consider two characters with these equipables standing such that they overlap each other. This would cause one character to appear to have all equipables drawn on him, while the body of the other character remains behind him. This is an unfortunate, but rare, side effect of the conversion. Proper texture atlasing would fix that.

In your example above, Harri, the bombs would all be drawn under the nuclear signs. Assuming spr_0 is the bomb. The disaster case for my algorithm is doing all that drawing in a single loop instead of at one depth.

Quote
I don't see how that would change much. The slowdown now happens only when switching textures.
This is exactly what my method avoids, Harri. It does this by batching sprites of the same texture together under strict conditions. How are you not noticing a difference between those two codes? I think you missed the point of what I was saying. The bottom code demonstrates the original, un-batched behavior, when ENIGMA asks each object to perform its draw event. It requires 3n texture binds, where n is the number of instances. The top code shows how the batching algorithm refactors the code to look; it requires only three texture binds, regardless of how many instances there are. All sprites with spr_1's texture are drawn in batch. Then all sprites for spr_2, then all sprites for spr_3. The order is determined from the order they are drawn in the code, so it will look identical to the original except in cases of one-sprite overlap (described above).
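
To make the 3n versus 3 count concrete, here is a small sketch that only counts texture binds for the two orderings. `count_binds`, `per_instance_order` and `batched_order` are illustrative names, and sprite ids 1 to 3 stand for spr_1 to spr_3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts texture binds for a sequence of draws: one bind per texture change.
std::size_t count_binds(const std::vector<int>& draws) {
    std::size_t binds = 0;
    int bound = -1;  // nothing bound yet; sprite ids here are >= 0
    for (int tex : draws) {
        assert(tex >= 0);
        if (tex != bound) {
            bound = tex;
            ++binds;
        }
    }
    return binds;
}

// Unbatched: every instance draws spr_1, spr_2, spr_3 in its own draw event.
std::vector<int> per_instance_order(std::size_t instances) {
    std::vector<int> draws;
    draws.reserve(instances * 3);
    for (std::size_t i = 0; i < instances; ++i)
        for (int spr = 1; spr <= 3; ++spr) draws.push_back(spr);
    return draws;
}

// Batched: all spr_1 draws, then all spr_2 draws, then all spr_3 draws.
std::vector<int> batched_order(std::size_t instances) {
    std::vector<int> draws;
    draws.reserve(instances * 3);
    for (int spr = 1; spr <= 3; ++spr)
        for (std::size_t i = 0; i < instances; ++i) draws.push_back(spr);
    return draws;
}
```

For 1000 instances the per-instance order needs 3000 binds and the batched order needs 3.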

As for texture atlasing, I cannot think of an efficient way to do this at run time. I think our best option is to allow the user to specify groups of sprites for texture atlases, then atlas those atlases together according to profiler data from a special compilation mode.

Again in your example, method (3) from my post would return the tuples (spr_0, spr_1) with 25,000 hits, and (spr_1, spr_0) with 24,999 hits. The result would be that the profiler would strongly recommend (to the IDE/Compiler) placing spr_0 and spr_1 on the same atlas. The user could also manually fix the glitch in appearance from my method (2) by atlasing them together manually in the interface.
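
A sketch of what the profiler's pair counting could look like, assuming it simply records every texture switch as an ordered (from, to) pair. `count_switch_pairs`, `busiest_pair` and `alternating_draws` are hypothetical names; texture ids 0 and 1 stand for spr_0 and spr_1.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using TexturePair = std::pair<int, int>;  // (texture before, texture after)

// Counts how often each ordered pair of textures is switched between.
std::map<TexturePair, std::size_t> count_switch_pairs(const std::vector<int>& draws) {
    std::map<TexturePair, std::size_t> hits;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        if (draws[i - 1] != draws[i]) ++hits[{draws[i - 1], draws[i]}];
    }
    return hits;
}

// The pair the profiler would most strongly recommend atlasing together.
TexturePair busiest_pair(const std::map<TexturePair, std::size_t>& hits) {
    assert(!hits.empty());
    auto best = hits.begin();
    for (auto it = hits.begin(); it != hits.end(); ++it) {
        if (it->second > best->second) best = it;
    }
    return best->first;
}

// The worst-case test room: n instances alternating spr_0, spr_1, spr_0, ...
std::vector<int> alternating_draws(std::size_t n) {
    std::vector<int> draws(n);
    for (std::size_t i = 0; i < n; ++i) draws[i] = (i % 2 == 0) ? 0 : 1;
    return draws;
}
```

Run on the 50,000 alternating draws, this yields 25,000 hits for (spr_0, spr_1) and 24,999 for (spr_1, spr_0), matching the numbers above.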


@Robert
That's a legitimate concern. Unfortunately, the option is to either stash matrix data in the batch operation, or treat transform calls as another barrier, which can be devastating for the performance of that batch algorithm.


One extra consideration:
Perhaps it would be a good idea to allow placing sprites in multiple texture atlases, and making it simple to check if the current atlas contains a sprite. This would further improve batching.
« Last Edit: July 31, 2013, 06:29:02 PM by Josh @ Dreamland » Logged
Offline (Male) Goombert
Reply #12 Posted on: July 31, 2013, 08:31:27 PM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3107

Well I would also like to mention that contrary to Josh's belief the D3D9 sprite batcher can also render tiled sprites by simply setting the source rectangle larger than the bounds and enabling texture repetition render state...

The only thing I don't know is whether I should force texture repetition on and leave it on in the sampler when these functions get called, or implement a complex system for checking whether it's enabled and then disabling it again.
« Last Edit: July 31, 2013, 11:07:10 PM by Robert B Colton » Logged

Offline (Unknown gender) TheExDeus
Reply #13 Posted on: August 01, 2013, 02:14:07 AM

Developer
Joined: Apr 2008
Posts: 1872

Quote
So if you draw a square sprite, then a circle sprite over it, the corner of the square sprite for one object will not be able to overlap the circle sprite of another object, because all square sprites are drawn at the same time, and all circle sprites are drawn after.
Ok, I get what you meant. I think this is a compile-time thing instead of a runtime thing, no? So if there is not much to change in the drawing code then I guess it could be provided as an option. But I really do want it as an option, because I like how very deterministic the drawing is now. Because if we do this change, then at some point users will come asking about this overlap problem and the only thing we could offer them would be changing depth (which is often hard, as people don't do 0 for player, 10000 for background, -10000 for foreground etc., it is usually 0, 1, -1).

Quote
This is exactly what my method avoids, Harri. It does this by batching sprites of the same texture together under strict conditions. How are you not noticing a difference between those two codes? I think you missed the point of what I was saying. The bottom code demonstrates the original, un-batched behavior, when ENIGMA asks each object to perform its draw event. It requires 3n texture binds, where n is the number of instances. The top code shows how the batching algorithm refactors the code to look; it requires only three texture binds, regardless of how many instances there are. All sprites with spr_1's texture are drawn in batch. Then all sprites for spr_2, then all sprites for spr_3. The order is determined from the order they are drawn in the code, so it will look identical to the original except in cases of one-sprite overlap (described above).
Yeah, sorry, it was late and I understood that code only when I was lying in bed.

Quote
As for texture atlasing, I cannot think of an efficient way to do this at run time. I think our best option is to allow the user to specify groups of sprites for texture atlases, then atlas those atlases together according to profiler data from a special compilation mode.
I didn't mean using some magical heuristic in real time though. I just thought that we pack all sprites (as many as possible) into an n x n texture at runtime (or when sprite_add() is called) without taking usage into account. Usually the texture size can be quite massive, some even suggest 16k x 16k for a modern PC (which GL3 is meant for). And in that size we could pack sprites for most 2D games (that texture can fit 65k 64x64 sprites, or 16k 128x128 sprites.. I think you get the point). At runtime it would also be possible to pack into GL_MAX_TEXTURE_SIZE and so work no matter what. The larger the maximum texture size, the better it would go. Giving users the ability to set this would also be good of course.
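
The capacity figures are just (atlas size / sprite size) squared, and a usage-agnostic packer can be very simple. The sketch below uses a basic shelf (row by row) packer, which is one common choice and not something decided in this thread; `grid_capacity`, `ShelfPacker` and `count_packed` are hypothetical names.

```cpp
#include <cassert>

// How many equally sized square sprites fit in a square atlas.
long long grid_capacity(int atlas_size, int sprite_size) {
    assert(atlas_size > 0 && sprite_size > 0);
    const long long per_row = atlas_size / sprite_size;
    return per_row * per_row;
}

struct Rect { int x, y, w, h; };

// Minimal shelf packer: fills a row left to right, then starts a new row
// below the tallest sprite of the current row.
class ShelfPacker {
public:
    explicit ShelfPacker(int size) : size_(size) { assert(size > 0); }

    // Returns true and writes the placement to `out`, or false if the sprite
    // does not fit (the caller would then start another atlas).
    bool place(int w, int h, Rect& out) {
        if (w <= 0 || h <= 0 || w > size_ || h > size_) return false;
        if (shelf_x_ + w > size_) {  // row is full: open a new shelf
            shelf_y_ += shelf_h_;
            shelf_x_ = 0;
            shelf_h_ = 0;
        }
        if (shelf_y_ + h > size_) return false;
        out = {shelf_x_, shelf_y_, w, h};
        shelf_x_ += w;
        if (h > shelf_h_) shelf_h_ = h;
        return true;
    }

private:
    int size_;
    int shelf_x_ = 0, shelf_y_ = 0, shelf_h_ = 0;
};

// Packs identical w-by-h sprites until the atlas is full; returns the count.
int count_packed(int atlas_size, int w, int h) {
    ShelfPacker packer(atlas_size);
    Rect r{0, 0, 0, 0};
    int n = 0;
    while (packer.place(w, h, r)) ++n;
    return n;
}
```

A shelf packer wastes space when sprite heights vary a lot within a row, so sorting sprites by height before packing is a common refinement.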

Quote
Perhaps it would be a good idea to allow placing sprites in multiple texture atlases, and making it simple to check if the current atlas contains a sprite. This would further improve batching.
Well we will have to do this anyway. If the person doesn't have enough VRAM (or we just choose a conservative size when packing at compile time), then we must use multiple atlases. And I was thinking not about a way to check if a sprite is in an atlas, but that sprite returns in which atlas it is in. So basically nothing in the drawing functions would really have to change (only a little bit of texture coords). The texture_use() would automatically work.
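
As a sketch of "the sprite returns which atlas it is in": each sprite could carry an atlas index plus its texture coordinates inside that atlas. `AtlasEntry`, `make_entry` and `SpriteTable` are made-up names, and square atlases measured in pixels are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where a sprite lives: which atlas, and its normalized texture rectangle.
struct AtlasEntry {
    int atlas;
    float u0, v0, u1, v1;
};

// Converts a pixel rectangle inside a square atlas to texture coordinates.
AtlasEntry make_entry(int atlas, int atlas_size, int x, int y, int w, int h) {
    assert(atlas_size > 0 && x >= 0 && y >= 0);
    assert(x + w <= atlas_size && y + h <= atlas_size);
    const float s = static_cast<float>(atlas_size);
    return {atlas, x / s, y / s, (x + w) / s, (y + h) / s};
}

// Indexed by sprite id; drawing code asks this for the atlas to bind.
struct SpriteTable {
    std::vector<AtlasEntry> entries;
    int atlas_of(std::size_t sprite) const { return entries.at(sprite).atlas; }
};
```

Drawing functions would then pass `atlas_of(sprite)` to the texture bind and use u0..v1 in place of the 0..tbx and 0..tby ranges, so two sprites on the same atlas never trigger a rebind.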

Quote
That is possible in Game Maker and using the DX batching class. DX can also outperform this with different textured sprites, I presume because of batching, and it must be mixing it with a shader. If we add all that to the OpenGL one Harri committed, we could make it a LOT faster than what he has right now.
And it also worked in immediate mode. In GL3 transformations themselves are a massive beast, as we must rewrite all those functions to use our own matrix math. The problem is that it would probably break batching, as I would need to call glDrawElements as many times as there are transformations. Only vertex shaders could help there.
Logged
Offline (Male) Goombert
Reply #14 Posted on: August 01, 2013, 02:21:56 AM

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 3107

Quote
Only vertex shaders could help there.
Exactly, it is OpenGL 3, the goal was to rewrite all of it to use shaders. In fact, just Google, I found all the basic immediate mode functions recreated into shaders yesterday somewhere, lost the link :/, but it was open source.
Logged
