Goombert
Posted on: November 28, 2013, 09:57:27 am
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993
So while working on the global batchers to get the context managers finished for OGL3 and D3D, so that I can add the new matrix functions for Harri, I decided to do a little test. I ran into a particular issue with an auto-magic global batching system: it performs badly when you are constantly changing the stride or primitive type of a global batcher. I was disappointed by how much slower it made my Box2D physics example run, since it draws a lot of circles and rectangles with outlines, so I decided to test whether Studio also has this issue. Sure enough, it does. You can run the two examples below by making an empty project with one object whose draw event is the code below, putting that object in a room, and hitting run in either ENIGMA or Studio.

The first example demonstrates the batching problem that no batching system can really resolve. It also ran at 183 FPS in Game Maker 8.1, because 8.1 just drew the primitives instantly without any overhead.

minFPS, maxFPS, avgFPS: 41, 52, 48

    room_speed = 1000;
    draw_text(0, 0, "FPS: " + string(fps));
    repeat (500) {
        var xx, yy;
        xx = random(room_width);
        yy = 50 + random(room_height);
        draw_set_color(c_blue);
        draw_circle(xx, yy, 10, false);  // filled circle
        draw_set_color(c_red);
        draw_circle(xx, yy, 10, true);   // outline
    }

The second demonstration shows that global batching does help when you are not constantly switching stride and primitive type: this example ran 3 times slower in 8.1.

minFPS, maxFPS, avgFPS: 60, 583, 524

    room_speed = 1000;
    draw_text(0, 0, "FPS: " + string(fps));
    repeat (500) {
        var xx, yy;
        xx = random(room_width);
        yy = 50 + random(room_height);
        draw_set_color(c_blue);
        draw_text(xx, yy, "wtf");
    }

I am really at a loss for what to do here. Historically I have been in favor of users learning to do the batching themselves so they can fine tune it to perfection; I do not like trying to auto-magically make games faster. But then again, I suppose it is not that big of an issue, since I can always add the ability to disable global batching later on, and we never had any plans to add global batching to OpenGL 1.1 anyway, so it's perfect. Moving forward, ENIGMA will have this same issue with drawing shape outlines in Direct3D and OpenGL 3, and I will later add an option if there is a lot of need for it. OpenGL 1.1 will never have this issue, since it expects you to do shit yourself.
« Last Edit: November 28, 2013, 11:07:57 am by Robert B Colton »
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.
Goombert
Reply #2 Posted on: November 28, 2013, 10:10:40 am
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993
Yes, but you are missing the point entirely; I explained why ENIGMA has this same issue as well. This is actually why I've been stuck for the past couple of days, unable to finish my improvements to D3D and OGL3: I am unsure exactly what to do. I am trying to think of a solution that makes ENIGMA faster in all scenarios.
« Last Edit: November 28, 2013, 10:19:55 am by Robert B Colton »
AsuMagic
Reply #3 Posted on: November 28, 2013, 10:29:47 am
Joined: Nov 2013
Posts: 23
Oh sorry, I didn't read the last sentence. It's pretty strange. So if I understood correctly, OpenGL 1.1 runs smoothly but OGL 3 and DX 9 lag? (Sorry for my English, I'm French.)
Edit: About the example showing 'wtf' everywhere: it's because GM:S is using TrueType fonts, which slow the game down a lot. Actually, I get exactly the same FPS with or without 'draw_set_color(c_blue);'. For some reason, the first script behaves the same way even without 'draw_set_color()'. With an empty room I get 27 FPS, with the 'wtf' example 24, and with the circle example 25.
« Last Edit: November 28, 2013, 10:35:36 am by AsuMagic »
Goombert
Reply #4 Posted on: November 28, 2013, 10:37:01 am
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993
AsuMagic, it is only a particular case. The way graphics APIs work, you want to try to batch EVERYTHING into a single vertex buffer object. For instance, in ENIGMA each d3d model is a single VBO, and sometimes it has an IBO (index buffer object) attached. These VBOs can contain 6 different primitive types, in the following order, due to me being the one who designed it:

INDEXED TRIANGLES | INDEXED POINTS | INDEXED LINES | TRIANGLES | POINTS | LINES

Now the issue is this. If you batch a bunch of triangle primitives together (which is what the filled circles are, since every filled circle is a triangle fan), they keep their depth relative to each other and are drawn first when the VBO actually renders to the screen. The global VBO decides to draw its contents when the stride changes, or when other render states such as color or text change. The circle outlines get batched as line lists, and the outlines also maintain their depth relative to each other.

The problem occurs when drawing both filled and outlined circles. The indexed triangles are drawn first and keep their depth, and so do the indexed lines, but the two groups don't keep their depth relative to each other. In other words, all the line primitives render on top of all the filled circle/triangle primitives, which screws up the depth. That means the global VBO now has to render every time its primitive type changes, which drastically slows it down, because each draw has to go through all of the batching overhead instead of just being sent directly to the GPU and rendered.

This is why 8.1 is faster with this test case: it didn't do any batching, so there was no overhead, and it sent everything directly to the GPU after interpretation. Sorry if it doesn't make much sense, but it is a rather complex graphics issue.
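To make the flush behavior described above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not ENIGMA's actual batcher: the class `GlobalBatcher`, the function `count_flushes`, and the per-shape vertex count of 24 are all hypothetical, and the model ignores stride and every render state other than primitive type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalBatcher:
    """Toy batcher (hypothetical) that must flush when the primitive type changes."""
    current_type: Optional[str] = None
    pending_vertices: int = 0
    flushes: int = 0

    def draw(self, prim_type: str, vertex_count: int) -> None:
        # A change of primitive type forces the pending batch out first,
        # otherwise depth order between the two types would be lost.
        if self.current_type is not None and prim_type != self.current_type:
            self.flush()
        self.current_type = prim_type
        self.pending_vertices += vertex_count

    def flush(self) -> None:
        # Only count a flush if there is something to render.
        if self.pending_vertices > 0:
            self.flushes += 1
            self.pending_vertices = 0
        self.current_type = None


def count_flushes(draw_sequence) -> int:
    """Feed (primitive type, vertex count) pairs to the batcher and count flushes."""
    batcher = GlobalBatcher()
    for prim_type, vertex_count in draw_sequence:
        batcher.draw(prim_type, vertex_count)
    batcher.flush()  # end of frame: render whatever is still pending
    return batcher.flushes


# 500 filled circles, each followed by its outline (vertex count is illustrative).
interleaved = [("triangles", 24), ("lines", 24)] * 500
# The same number of shapes, but all of one primitive type.
same_type = [("triangles", 24)] * 1000

print(count_flushes(interleaved))  # 1000
print(count_flushes(same_type))    # 1
```

In this model the interleaved fill/outline pattern degenerates to one flush per shape, which is the situation where the batcher's bookkeeping costs more than it saves.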
« Last Edit: November 28, 2013, 10:39:20 am by Robert B Colton »
TheExDeus
Reply #5 Posted on: November 28, 2013, 01:59:14 pm
Joined: Apr 2008
Posts: 1860
I can't test with GM, but I get 1000 FPS on the second test (it caps out) and 480 FPS on the first one with GL3; with DX9 I get 333 FPS. GL3 is actually faster than DX9 in all these tests (and DX9, if I remember correctly, was already batching everything via high-level DX magic), so that's funny.

Anyway, you select the drawing mode when calling glDrawElements. This means you should be able to push all the circles (the triangles and the lines) into a single VBO and then use glDrawElements to draw them in succession. That should be faster, as all the information is already on the GPU and you only say what to draw. I don't know if it's even possible to draw it any faster without doing the drawing in shaders.

Quote: "users learning to do the batching themselves so they can fine tune it to perfection"

And that is what they can do. The batching we have now (the sprite batching, for example) is about as good as it can be, and I doubt a user could make their own better (there is a limited number of ways to batch something, after all). But the user can take this batching into account (if we document it on the wiki or something), so that a user knows that drawing the same sprite several times in a row will be a lot faster than alternating between several. If the user knows how batching works, then he can fine tune the code for a massive speed boost.
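The single-VBO idea above can be sketched as follows. This is a toy Python model, not OpenGL code: `glDrawElements` is a real OpenGL call, but nothing here calls it, and the function `pack_single_buffer` and the `(mode, first, count)` command layout are hypothetical. The point is only that all vertices go into one buffer (one upload), while each shape keeps its own ranged draw command in submission order, so depth order is preserved.

```python
def pack_single_buffer(shapes):
    """Pack every shape into one vertex list and record one ranged draw per shape.

    shapes: list of (mode, vertices) pairs in the order they were submitted.
    Returns (buffer, commands), where each command is (mode, first, count).
    """
    buffer = []
    commands = []
    for mode, vertices in shapes:
        first = len(buffer)          # offset of this shape inside the shared buffer
        buffer.extend(vertices)
        commands.append((mode, first, len(vertices)))
    return buffer, commands


# Two filled shapes with an outline between them (tiny illustrative geometry).
shapes = [
    ("triangles", [(0, 0), (1, 0), (0, 1)]),
    ("lines", [(0, 0), (1, 0)]),
    ("triangles", [(2, 2), (3, 2), (2, 3)]),
]

buffer, commands = pack_single_buffer(shapes)
print(len(buffer))  # 8 vertices uploaded once
for mode, first, count in commands:
    # In a real renderer each command would become one draw call over this range.
    print(mode, first, count)
```

Under this scheme the number of draw calls does not go down, but the buffer upload happens once per frame instead of once per primitive type change.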
Goombert
Reply #7 Posted on: November 28, 2013, 04:57:11 pm
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993
TheExDeus,

Quote: "(which if I remember correctly was already batching everything via high-level DX magic), so that's funny."

No, this was actually my fault: I was not setting the index buffers to write only, I was not setting the new global batchers to dynamic memory usage, and I had the models in the wrong pools. Don't blame Direct3D for my lack of experience with the API; it has enough reasons of its own for why it sucks. And it is still slower than OpenGL as well, from actual test cases I have been running with the proper memory pool set up and everything.

Quote: "And that is what they can do."

A simple draw_batch_begin()/end() would suffice, like every other game engine. But because of it having to be like this, I am going to add draw_set_batching_enabled().

Quote: "If the user knows how batching works, then he can fine tune the code for massive speed boost."

You are still completely missing the problem here. Having a system such as this does not make every game faster: my Box2D example, with the new system on my end as optimal as it can be, now runs at 1/3rd of the speed it did before. There really is no solution to this problem.

Sslaxx, not in this particular case, no, as we already have texture binding memorization in my context managers. Texture atlasing can improve game performance down the road for certain games. But please don't be misled by YoYoGames: texture atlasing is not the only thing you need to worry about. You need to worry about ALL render state changes, and that includes enabling/disabling lighting, texture repetition, and anything else that is either FFP or shader based. Texture atlasing won't become necessary until we start on mobile ports. Also, my context managers do memorize all render states and everything; this is part of what I'm currently working on for OpenGL 3, as it needs to be perfected before I can add the new matrix functions that were just released.
TheExDeus
Reply #8 Posted on: November 29, 2013, 02:22:48 am
Joined: Apr 2008
Posts: 1860
Quote: "You are still completely missing the problem here, having such a system as this does not make every game faster, my Box2D example with the new system on my end as optimal as it can be now runs 1/3rd of the speed it did run at. There really is no solution to this problem."

But you just said the problem was that rendering interspersed triangle lists and line lists causes a VBO flush. I said that you could render the whole thing with one VBO, and since only one glBufferDataARB/glBufferSubDataARB call is then needed, it should be faster. Taking into account that very few points are used per circle (the default is 24), it makes sense that flushing after every 24 vertices is going to be bad. But only calling glDrawElements() after every 24 vertices should be a lot faster.

Quote: "A simple draw_batch_begin()/end() would suffice like every other game engine."

I think this is a worse solution. They couldn't batch everything together even if they tried. Batching is dependent on the resources the drawing uses. If you have to draw from two different textures, then batching would fail anyway. If you have to draw from the same resources, then batching can be automatic (like now). So I don't see how this would change much, unless you mean it creating a batch like a GL display list. That could potentially be useful. But still, it should do this automatically as well.

Quote: "But because of it having to be like this I am going to add draw_set_batching_enabled()"

Doesn't this mean you also have to have equivalent fixed pipeline functions for every drawing function? This potentially doubles the code base, and I don't think that is smart. Also, when I launch a GL debugger, I wouldn't want to see any deprecated functions in GL3.

Quote: "for certain games"

Actually, it is quite important and will improve performance for ALL games. Render state changes don't mean much when the VBO is flushed every time a different sprite is drawn. Taking into account that people usually draw sprites at the same depth, this ends up slowing the whole thing down a lot. Even in your own example: create sprites spr_circle and spr_circle_outline, where one of course is a circle and the other an outline. Without texture pages this will be a lot slower:

    room_speed = 1000;
    draw_text(0, 0, "FPS: " + string(fps));
    repeat (500) {
        var xx, yy;
        xx = random(room_width);
        yy = 50 + random(room_height);
        draw_sprite(spr_circle, 0, xx, yy);
        draw_sprite(spr_circle_outline, 0, xx, yy);
    }

It's basically the same as your draw_circle problem. This will be 1000 draw calls (with all the other functions associated with a VBO flush also called). If you had a texture atlas, this would be 1 draw call. Now imagine a game with hundreds of sprites on the screen at one time: texture paging would turn hundreds of draw calls into just 1. This would potentially double the performance in many cases.

Also, I know "VBO flush" is not a correct term, but I don't know what else to call it in our case. It basically batches for as long as it can and then "flushes", that is, renders the whole thing.
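The 1000-versus-1 draw call arithmetic above can be checked with a small toy model in Python. This is a sketch under stated assumptions rather than how ENIGMA or Studio actually count draw calls: the function `count_draw_calls` and the texture names are hypothetical, a new draw call is assumed to start only when the bound texture changes, and the `draw_text` call from the example is left out.

```python
def count_draw_calls(sprite_sequence, texture_of):
    """Count draw calls, assuming a batch is flushed whenever the texture changes."""
    calls = 0
    bound_texture = None
    for sprite in sprite_sequence:
        texture = texture_of[sprite]
        if texture != bound_texture:
            # A different texture cannot join the current batch.
            calls += 1
            bound_texture = texture
    return calls


# The repeat (500) loop from the example: circle, outline, circle, outline, ...
sequence = ["spr_circle", "spr_circle_outline"] * 500

# Without texture pages, every sprite lives in its own texture.
separate_textures = {
    "spr_circle": "tex_circle",
    "spr_circle_outline": "tex_circle_outline",
}
# With an atlas, both sprites share one texture page.
atlas = {
    "spr_circle": "atlas_page_0",
    "spr_circle_outline": "atlas_page_0",
}

print(count_draw_calls(sequence, separate_textures))  # 1000
print(count_draw_calls(sequence, atlas))              # 1
```

The same counting also shows why sprite order matters without an atlas: drawing all 500 circles first and all 500 outlines afterwards would give 2 draw calls in this model, at the cost of changing the depth order.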
« Last Edit: November 29, 2013, 02:24:29 am by TheExDeus »