Author Topic: GameMaker: Studio is EXTREMELY slow...  (Read 15636 times)
Offline (Male) Goombert
Posted on: November 30, 2013, 10:34:50 am

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Well, I figured out how they are handling draw_set_color().

Harri, you were wrong: they are not adding it to each vertex if one is not passed. They are halting the ENTIRE graphics pipeline and adding the currently set drawing color. I am amazed at their stupidity; the only thing I can think of is that Micro$hit gave them this idea. See the following article.
http://msdn.microsoft.com/en-us/library/windows/desktop/bb206331%28v=vs.85%29.aspx

I tested this by simply doing the following before that model is drawn in the above screenshot.
Code: (edl) [Select]
draw_set_color(choose(c_green, c_red, c_blue));
draw_set_alpha(random(100)/100);

Sure enough, this proved that the entire graphics pipeline was halted in order to update the vertex colors. This is only possible on a model without colors added to each vertex already, which is why I couldn't get any results when I originally tested this; I had to go and remove the color and alpha from their AddCube() script. I also tested with the built-in block and other shapes and got the same results.

So anyway, I am going to scrap what we're doing by adding it per-vertex, and I am sure as HELL not doing what they are doing. I am going to change the default drawing color to white and alpha to 1.0, then handle it all in the shader, it's the most optimal solution.
« Last Edit: November 30, 2013, 10:37:21 am by Robert B Colton » Logged
I think it was Leonardo da Vinci who once said something along the lines of "If you build the robots, they will make games." or something to that effect.

Offline (Unknown gender) TheExDeus
Reply #1 Posted on: November 30, 2013, 11:21:45 am

Developer
Joined: Apr 2008
Posts: 1860

Quote
they are not adding it to each vertex if one is not passed.
And I never said they did. I actually said the opposite (read this: http://enigma-dev.org/forums/index.php?topic=1375). I said that we probably will have to do that (and we did). I said that they used the bound color when drawing, not when adding the vertex (which is what you are saying here).

Quote
this is only possible on a model without colors added to each vertex already
This of course was also discussed in that topic. I noted that the vertex_color functions override the bound color, and that is what we did as well. If you used only the vertex functions, it would use the bound color; when using vertex_color, it would use the color provided to the function.

Quote
then handle it all in the shader, it's the most optimal solution.
You still need to pass per-vertex color to the shader, so you still need to add it to some kind of array and send via glVertexAttribPointer. So nothing really changed.
Logged
Offline (Male) Goombert
Reply #2 Posted on: November 30, 2013, 12:08:08 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Oh I see. Anyway, no, you can't just do that Harri. Once it's buffered it's buffered, you can't modify the contents. I am talking about checking the shader if a vertex doesn't have a color then give it the set drawing color. That would solve everything: just have the default drawing color be white, and everybody already knows enough to reset the color to white after they change it to draw something, so it would be super fast. I don't see any problem with that solution? But uhm, passing the set drawing color makes the vertex buffer substantially larger, when they all get the same color anyway, you could squeeze a lot more FPS out by not even passing it, which is why I am dropping our current solution too.

At any rate Harri, I now have Direct3D drawing just about everything except curves and stuff. I am starting to delete about 2/3rds of the graphics systems, since all draw_sprite/background/standard draw functions now use the draw_vertex_* functions, which are batched automatically. The only thing that will be different between the systems is the model struct; the General folder now has source files as well as headers, like the following...
https://github.com/RobertBColton/enigma-dev/blob/master/ENIGMAsystem/SHELL/Graphics_Systems/General/GSsprite.cpp

So now we can get GLES working again, and it will be extremely easy to maintain graphics functions. And in a case like the one I posted yesterday, where you want a lot of outlined shapes, we simply add draw_set_batching_enabled(false); and then the draw_primitive_* functions will simply switch over to using software vertex processing or immediate mode for instance.

Anyway, once I fix OpenGL3 and 1 we can replace the entire FFP with shaders and then get GLES working, and then cheeseboy is going to try to get Android working again.
« Last Edit: November 30, 2013, 12:12:23 pm by Robert B Colton » Logged

Offline (Unknown gender) TheExDeus
Reply #3 Posted on: November 30, 2013, 02:48:41 pm

Developer
Joined: Apr 2008
Posts: 1860

Quote
you can't modify the contents
Well, technically you can, but that's slow (subdata). So I agree that it shouldn't be done.

Quote
I am talking about checking the shader if a vertex doesn't have a color then give it the set drawing color
Yes, but how will you know if a vertex has color or not?

Quote
But uhm, passing the set drawing color makes the vertex buffer substantially larger, when they all get the same color anyway, you could squeeze a lot more FPS out by not even passing it, which is why I am dropping our current solution too.
True, but now vertex color is sent to the GPU only when a _color function is used. So when using the default color the vertex buffer itself won't grow, although the vector in RAM will.

So basically what I am saying is that I agree with you and I know why you are trying to do it this way, but right now I just don't understand how you will pull this off. Especially these parts:
1) How will you allow using vertex() and vertex_color() together?
2) You will have to use attributes to send color data to the FS. To do so you must use glVertexAttribPointer, and that means you will have to put color information inside the buffer object anyway. If you know for a fact that you will exclusively draw with the default color, then you can disable that attribute and not populate the buffer object with color. But if you draw regular vertices together with colored vertices, how will you combine them? Like having draw_sprite() and then draw_sprite_ext(). I doubt causing the VBO to flush would be the best idea here. It's then better to send the color data as well. It's only 16 bytes per vertex (4 RGBA components * sizeof(float)) anyway; that's 16 MB per 1 million vertices.
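
For a rough sense of the numbers in point 2, here is a small C++ sketch that works out the color footprint per vertex. The struct names (ColorFloat4, ColorByte4) and the color_bytes helper are made up for illustration and are not ENIGMA's actual vertex layout; the byte-per-channel variant is there only for comparison.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical color layouts, for illustration only.
struct ColorFloat4 { float r, g, b, a; };         // one float per channel
struct ColorByte4  { std::uint8_t r, g, b, a; };  // one byte per channel

static_assert(sizeof(ColorFloat4) == 16, "assumes 4-byte floats with no padding");
static_assert(sizeof(ColorByte4) == 4, "assumes no padding between the bytes");

// Bytes of color data needed for the given number of vertices.
template <typename Color>
constexpr std::size_t color_bytes(std::size_t vertexCount) {
    return vertexCount * sizeof(Color);
}
```

With these layouts, color_bytes<ColorFloat4>(1000000) comes to 16,000,000 bytes, which matches the 16 MB figure, while the byte layout needs a quarter of that.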

I was thinking along the lines of a custom attribute with the smallest type size that would basically work as a boolean. It would let you tell whether the vertex has a color or not, but the problem is that you cannot send only part of a buffer. It's not possible to have a buffer with 100 vertices and then 20 colors at the end; the number of colors always has to be the same as the number of vertices.

Quote
we simply add draw_set_batching_enabled(false); and then the draw_primitive_* functions will simply switch over to using software vertex processing or immediate mode for instance.
This is a bad idea. In GL3 there should be no immediate mode functions. Did you read the last post I made in the previous topic? I explained how you could draw all those circles with 1 VBO glBufferData call. It could be a lot faster.
Logged
Offline (Male) Goombert
Reply #4 Posted on: November 30, 2013, 11:24:09 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Quote
So I agree that it shouldn't be done.
Right, but you can set flags for dynamic buffers and stuff, which does give serious performance boosts. And for OpenGL you simply want to call glBufferSubData if the data store previously allocated by glBufferData is still big enough to fit the new vertex data; otherwise you have to call glBufferData so a new data store is allocated.
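
The reuse-or-reallocate rule described here can be sketched without touching a real GL context. FakeBuffer and upload below are hypothetical stand-ins: a std::vector plays the role of the buffer's data store, and the two branches mark where glBufferSubData and glBufferData would be called in real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a GPU buffer object (assumption: real code would hold a GL buffer id).
struct FakeBuffer {
    std::vector<float> store;  // currently allocated data store
    std::size_t used = 0;      // number of floats holding valid vertex data
    int reallocations = 0;     // times the "glBufferData" path was taken
    int subUpdates = 0;        // times the "glBufferSubData" path was taken
};

void upload(FakeBuffer& buf, const std::vector<float>& vertices) {
    if (vertices.size() <= buf.store.size()) {
        // New data fits: overwrite in place (the glBufferSubData case).
        std::copy(vertices.begin(), vertices.end(), buf.store.begin());
        ++buf.subUpdates;
    } else {
        // Too big: allocate a new data store (the glBufferData case).
        buf.store = vertices;
        ++buf.reallocations;
    }
    buf.used = vertices.size();
    assert(buf.used <= buf.store.size());
}
```

The store never shrinks in this sketch, so a buffer that was once large keeps taking the cheaper in-place path for smaller batches.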

Quote
Yes, but how will you know if a vertex has color or not?
I don't know, that is what I was wondering: how the hell does the FFP know? GL_COLOR or whatever was a shader constant back in the old immediate mode shit.

Quote
So when using the default color the vertex buffer itself won't grow, although the vector in RAM will.
Yes, I changed my mind already, Harri. Why? Because then each time you change the drawing color you would have to flush the global batcher, so I guess doing it per vertex is the best solution. But then, the only problem is, what about models that didn't have color values added? They would already be buffered with the set drawing color from when they were created.

Quote
I was thinking along the lines of a custom attribute with the smallest type size that would basically work as a boolean.
This is another issue I/we have yet to resolve: nobody has come up with a fix to me storing color as 4 separate floats. I also love the fact that despite that we kick Studio's ass on that cubes demonstration. But anyway, somebody told me it doesn't matter, as the uhm color data would end up as 4 floats down the line anyway, but I don't believe that. I did attempt to merge the RGBA ints together into a single float, but I remember Direct3D being a piece of shit with anything past Red, probably because of its FFP shaders, for which I had to use a damn custom vertex declaration, which took me forever to figure out, in order to avoid the FVF shit.

Quote
This is a bad idea.
No, not at all. When you disable batching OpenGL 1 would use immediate mode calls, and OpenGL3 and Direct3D would use vertex arrays. This would allow the data to be sent directly to the GPU without any overhead, which would speed up games such as my Box2D example and the draw circles test that was faster in 8.1 than Studio over in the other topic. You forget my model classes have to do rather intensive batching in order to get everything in the same primitive type, which is fine for models, but not for deferred rendering.

At any rate, I decided to let polygonz merge what I currently have.
https://github.com/enigma-dev/enigma-dev/pull/528

You read correctly: 2/3rds of the graphics systems are gone, and all standard drawing and sprites/backgrounds are located in Graphics_Systems/General and all utilize draw_primitive_* functions, which are themselves batched. So now Direct3D is pretty much as capable as the other graphics systems, and since I have been working on it with batching for some time, it is currently the most stable. There are a few glitches now, but the kinks just need to be worked out. OpenGL3 is pretty much a copy-and-paste effort over into GLES now, and cheeseboy is going to attempt fixing Android; I'll probably help.

Also, now that you have these context managers in place Harri, you are free to replace the FFP with shaders as well as add the matrix functions you wanted. Studio also added matrix functions now as well and we will need to mimic their version. They are listed at the link below.
http://enigma-dev.org/docs/Wiki/Unimplemented#Other

I would link you the damn interim documentation they had since these functions were just added, but the fuckers have their website down.
Logged

Offline (Unknown gender) TheExDeus
Reply #5 Posted on: December 01, 2013, 08:25:03 am

Developer
Joined: Apr 2008
Posts: 1860

Quote
how the hell does the FFP know
It doesn't. It does the same thing - adds default color. When you draw with a VBO and don't specify color, it uses the default one. If you draw with a VBO and specify color, it draws with that color. You cannot draw half of the vertices with color and the other half without.

Quote
But then, the only problem is, what about models that didn't have color values added?
Then it would work like the previous system. If you created the model without using any vertex_color functions, then some flag like hasColor is false and you just don't specify that attribute. Then it will use the default. Of course the last thing we can do is pass the color and alpha to the shaders as uniform variables and just blend with that. In that case it will be able to blend even vertices with color. Like a triangle with 2 default vertices (white) and 1 with c_red. Then if you blend with c_blue, you will have 2 blue vertices and one that is 50% c_red and 50% c_blue. This is not GM:S behavior though, and it does require creating all the blend modes in shaders, but I guess this is something we will have to do anyway.

Quote
fix to me storing color as 4 separate floats
Store where exactly? You can store them in float arrays just fine (this was done previously in my batching) and you can use them in a VBO (and by extension in shaders) as well. You can even pass the bound color as a vec4() to shaders.

Quote
OpenGL3 and Direct3D would use vertex arrays.
Well I was referring to GL3 in particular. But how will vertex_arrays be faster than VBO?

Quote
You forget my model classes have to do rather intensive batching
So the batching was the slow thing in that example? I thought it was the rendering.

Quote
you are free to replace the FFP with shaders as well as add the matrix functions you wanted
I guess I'll try. But only for GL3 of course. GL1 will still use the FFP. And as we didn't figure out what matrix lib to use, I will just write a simple one of my own. It's not that hard, especially when we look exclusively at 4x4 matrices.
Logged
Offline (Male) Goombert
Reply #6 Posted on: December 01, 2013, 03:28:59 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Quote
It doesn't. It does the same thing - adds default color.
Right, so it adds default color, if the VBO does not specify color, without knowing if the VBO specifies color? Wtf harri?

Quote
Store where exactly? You can store them in float arrays just fine (this was done previously in my batching)
Yes, but you have no idea how much slower that is, which is why I find it funny we still kick Game Maker's ass at speed. But, you can simply merge those four bytes into a single float which would be MUCH faster, but I tried and tried and I could not get the alpha to work :\ which is why I temporarily just went with 4 floats. But for Direct3D this has to be done for the vertex functions they just added.

http://enigma-dev.org/docs/Wiki/Vertex_Functions

Quote
I guess I'll try. But only for GL3 of course.
Right, again, Direct3D is more stable with the batching because I have thoroughly tested it, and it was designed around the fact that Direct3D does not memorize render states and texture bindings AT ALL. Meaning you have to cache them and restore them on BeginScene(), which is actually kind of nice because this gives all the other possibilities I was raving about. OpenGL is going to be kind of weird with this concept though, because OpenGL is not very OOP; just look at the difference between the two files so far.

https://github.com/enigma-dev/enigma-dev/blob/master/ENIGMAsystem/SHELL/Bridges/General/DX9Context.h
https://github.com/enigma-dev/enigma-dev/blob/master/ENIGMAsystem/SHELL/Bridges/General/GL3Context.h

Quote
But how will vertex_arrays be faster than VBO?
Heh, that depends on the situation, as far as batching we already know buffering is faster, eg. 575fps in Direct3D with 500 draw_text calls. But a vertex array is faster if you do a bunch of circles then circle outlines repetitively in the case I mentioned in the other topic. So when you want to draw a bunch of outlined stuff like that example or in my Box2D example, you would want to disable the batching so that it goes directly into a Vertex Array and then straight to the GPU without ANY of the overhead from the mesh classes. Because the mesh classes as optimal as they are, are REALLY slow for something very very small, which happens in the case of the circle example where batch flushes are frequent. The mesh classes are slow because they have to manipulate the contents and automatically generate the index buffer as well as perform a lot of memory copying around to interleave the vertex data.

At any rate, when batching is disabled we can simply use vertex arrays in all 3 graphics systems: Direct3D, OpenGL3 and OpenGL 1. But I also have another idea where we could automatically detect if the user is making frequent batch flushes and disable the batching for them. For instance, on begin scene we increment flushes by 1 and then in end scene decrement it by 1; if flushes ever becomes larger than say 5 consecutive flushes, we disable global batching, and if it drops below 5 consecutive flushes we re-enable batching.
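
One possible reading of the flush-counting idea, as a sketch only: count batch-breaking events per frame and toggle batching at the end of the scene. BatchMonitor, on_batch_break and end_scene are hypothetical names, and counting per frame (rather than incrementing and decrementing across begin/end scene) is an assumption made to keep the example simple.

```cpp
#include <cassert>

// Hypothetical helper, not part of ENIGMA.
class BatchMonitor {
public:
    explicit BatchMonitor(int threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Call whenever a flush happens, or would have happened if batching
    // were enabled (for example on a state or primitive type change).
    void on_batch_break() { ++breaksThisFrame_; }

    // Call once per frame: more breaks than the threshold turns batching
    // off for the next frame, otherwise it is turned (back) on.
    void end_scene() {
        batchingEnabled_ = breaksThisFrame_ <= threshold_;
        breaksThisFrame_ = 0;
    }

    bool batching_enabled() const { return batchingEnabled_; }

private:
    int threshold_;
    int breaksThisFrame_ = 0;
    bool batchingEnabled_ = true;
};
```

The counter has to keep running while batching is off; otherwise the next frame would see zero flushes and switch batching straight back on.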
« Last Edit: December 01, 2013, 03:36:13 pm by Robert B Colton » Logged

Offline (Unknown gender) TheExDeus
Reply #7 Posted on: December 01, 2013, 05:19:50 pm

Developer
Joined: Apr 2008
Posts: 1860

Quote
Right, so it adds default color, if the VBO does not specify color, without knowing if the VBO specifies color? Wtf harri?
It knows that you specify color. That's what the attributes (or previously glColorPointer (which you still use)) are for. You specify what you have in the vertex buffer and in what order. If you don't specify color in that buffer (don't enable GL_COLOR_ARRAY), then it knows that it must use the bound color. You cannot enable/disable it between vertices. So it's all or nothing.
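
Given that all-or-nothing constraint, one way to let plain and colored vertices share a buffer is to backfill the bound color the first time a colored vertex shows up. The MeshBuilder below is a hypothetical sketch of that idea, not ENIGMA's mesh class; the 2D positions and the 32-bit color type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical builder used only to illustrate the backfill idea.
struct MeshBuilder {
    std::vector<float> positions;            // x, y per vertex
    std::vector<std::uint32_t> colors;       // one entry per vertex once usesColor is set
    bool usesColor = false;                  // whether the color array would be enabled
    std::uint32_t boundColor = 0xFFFFFFFFu;  // current drawing color, white by default

    std::size_t vertex_count() const { return positions.size() / 2; }

    void add_vertex(float x, float y) {
        positions.push_back(x);
        positions.push_back(y);
        // Once colors are in use, every vertex must carry one.
        if (usesColor) colors.push_back(boundColor);
    }

    void add_vertex_color(float x, float y, std::uint32_t color) {
        if (!usesColor) {
            // First colored vertex: give all earlier vertices the bound color.
            colors.assign(vertex_count(), boundColor);
            usesColor = true;
        }
        positions.push_back(x);
        positions.push_back(y);
        colors.push_back(color);
        assert(colors.size() == vertex_count());
    }
};
```

A mesh built only with add_vertex never allocates a color array, so it can still be drawn with whatever color is bound at draw time.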

Quote
Yes, but you have no idea how much slower that is
I doubt that is much slower, as that is the standard way to do this (as you cannot specify a format like that using glColorPointer). You may also sacrifice precision, but if each channel only ranges from 0 to 255, then I guess 4 bytes should pack it fine. This packing would be possible if you used attributes and shaders. Then you could send a custom attribute (packed float) to the fragment shader and draw with the specified color. The problem, though, is that you would have to unpack them anyway (as you cannot specify 1 float as a color in the fragment shader). So you sacrifice a little bit of shader speed for a smaller data bandwidth. Benchmarks would have to be conducted to see which is in fact faster.

Quote
Because the mesh classes as optimal as they are, are REALLY slow for something very very small, which happens in the case of the circle example where batch flushes are frequent.
I don't get why that "optimization" has to be done outside the model and primitive functions. Doing some optimization for draw_sprite, which is just a quad, is of course slow. In this case the previous method of just pushing an array into a vector works a lot better. I have looked at the code a little and I am a little confused about how your current way works. I will look further and run it with a debugger, because right now it seems it would just draw stuff 1 by 1.
Logged
Offline (Male) Goombert
Reply #8 Posted on: December 01, 2013, 11:34:57 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Quote
That's what the attributes (or previously glColorPointer (which you still use)) are for.
Oh I see, but yah, people say glColorPointer is for vertex arrays, but I use it anyway because it is easier than using the damn attribute functions. All it does is tell OpenGL the offset and stride; Direct3D is a lot nicer with this.

Quote
I doubt that is much slower
As I said, it is much slower. I went and decided to finally figure it out, and Josh complained to me about trying to do it as well, saying to just pass it as four floats. But what neither of you guessed is that OpenGL and Direct3D's FFP both expect only a 4 byte color (see D3DDECLUSAGE_COLOR). The only reason I was using floats before was because I could not figure out how to store all the vertex data in a single vector, and after I nagged Josh to listen to me, he helped me get it working with unions.

It gave a 30fps speed boost in the text batching test from yesterday, and a 30fps boost in the Studio cubes demo as well, with OpenGL 3 being the fastest and peaking at 300fps.
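
The union approach mentioned above could look roughly like the following sketch: every slot of the vertex vector is 4 bytes wide and is either a float or a packed 32-bit color. VertexElement, pack_rgba, push_vertex and color_of are illustrative names, not the code from the linked commit, and the channel order (red in the low byte) is an assumption; D3DCOLOR, for instance, orders its channels differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 4-byte slot of vertex data: either a float or a packed color.
union VertexElement {
    float f;
    std::uint32_t rgba;
};
static_assert(sizeof(VertexElement) == 4, "assumes a 4-byte float");

// Pack four 8-bit channels into one 32-bit value (red in the low byte).
constexpr std::uint32_t pack_rgba(std::uint8_t r, std::uint8_t g,
                                  std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

constexpr std::size_t kStride = 3;  // x, y, color

// Append one interleaved vertex: two float slots and one color slot.
void push_vertex(std::vector<VertexElement>& out, float x, float y, std::uint32_t rgba) {
    VertexElement e;
    e.f = x;       out.push_back(e);
    e.f = y;       out.push_back(e);
    e.rgba = rgba; out.push_back(e);
}

// Read back the color slot of a vertex; only the member that was written is read.
std::uint32_t color_of(const std::vector<VertexElement>& v, std::size_t vertexIndex) {
    assert((vertexIndex + 1) * kStride <= v.size());
    return v[vertexIndex * kStride + 2].rgba;
}
```

Because the color slot is written and read as an integer, the bits are never interpreted as a float, which sidesteps the problem of packing color bytes into a float value.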


I have added it to the following pull request where you can view the changes made.
https://github.com/RobertBColton/enigma-dev/commit/5e71f562812588712aae58d20ed1278d4646222e

Quote
a vector works a lot better.
No it doesn't when you are drawing lots and lots of sprites; this allows it to get even faster than the 5000 sprite calls at 30fps we had before for Direct3D and OpenGL3.

In fact the following code, with the added overhead of it being a text drawing function, renders at 100fps for me, so it's clearly necessary for very large batches to be run through my mesh class and be converted to a single draw call, just not for small/frequent batch flushes.
Code: (edl) [Select]
    room_speed = 1000;
    draw_text(0, 0, "FPS: " + string(fps));
     
    repeat (5000) {
        var xx, yy;
        xx = random(room_width);
        yy = 50 + random(room_height);
        draw_set_color(c_blue);
        draw_text(xx, yy, "A");
    }



And if I add the following code it still stays in the range of 60fps.
Code: (edl) [Select]
draw_set_color(choose(c_red,c_blue,c_green));
draw_set_alpha(random(100)/100);
So if we continue to properly think out the solutions to these things, we will continue to have them beat in every single scenario.
« Last Edit: December 02, 2013, 12:14:11 am by Robert B Colton » Logged

Offline (Unknown gender) TheExDeus
Reply #9 Posted on: December 02, 2013, 01:33:03 pm

Developer
Joined: Apr 2008
Posts: 1860

Quote
Oh I see, but yah, people say glColorPointer is for vertex arrays, but I use it anyway because it is easier than using the damn attribute functions. All it does is tell OpenGL the offset and stride; Direct3D is a lot nicer with this.
Attributes are not that hard to use as they are barely different from these pointers, but it's just that attributes require shaders to be used and pointer functions like glColorPointer are deprecated in GL3. So again, if we want to write it as best as possible, we will have to ditch them.

Quote
But what neither of you guessed is that OpenGL and Direct3D's FFP both expect only a 4 byte color
I know that it "can" expect that. That's what the glColorPointer type argument is all about. But the fact that "The initial value is GL_FLOAT." (see http://www.opengl.org/sdk/docs/man2/xhtml/glColorPointer.xml) should give the idea that float is the most common way. Either way, if you just set "GL_UNSIGNED_BYTE" as the glColorPointer type, then it will work fine. But I just realized that if attributes are used then you can still pack them just fine inside a float and also specify the type in glVertexAttribPointer. So basically nothing changes.

Quote
very large batches to be run through my mesh class and be converted to a single draw call
But previously it was also converted into 1 draw call. What I am saying is that you try to apply "optimizations", but those are not necessary when we already write optimized functions for sprites/backgrounds/surfaces etc. They can be applied when a user uses vertex functions, but I doubt they are necessary for when you are just drawing 1 sprite. Like this is the current draw_sprite:
Code: [Select]
void draw_sprite(int spr, int subimg, gs_scalar x, gs_scalar y)
{
    get_spritev(spr2d, spr);
    const int usi = subimg >= 0 ? (subimg % spr2d->subcount) : int(((enigma::object_graphics*)enigma::instance_event_iterator->inst)->image_index) % spr2d->subcount;

    const float tbx = spr2d->texbordxarray[usi], tby = spr2d->texbordyarray[usi],
                xvert1 = x - spr2d->xoffset, xvert2 = xvert1 + spr2d->width,
                yvert1 = y - spr2d->yoffset, yvert2 = yvert1 + spr2d->height;

    draw_primitive_begin_texture(pr_trianglestrip, spr2d->texturearray[usi]);
    draw_vertex_texture(xvert1, yvert1, 0, 0);
    draw_vertex_texture(xvert2, yvert1, tbx, 0);
    draw_vertex_texture(xvert1, yvert2, 0, tby);
    draw_vertex_texture(xvert2, yvert2, tbx, tby);
    draw_primitive_end();
}
draw_primitive_end(); will call d3d_model_primitive_end(); and that will call mesh->end(). Mesh->end() has this for trianglestrips (which the sprite is):
Code: [Select]
case enigma_user::pr_trianglestrip:
    triangleIndexedVertices.insert(triangleIndexedVertices.end(), vertices.begin(), vertices.end());
    if (indices.size() > 0) {
        for (std::vector<GLuint>::iterator it = indices.begin(); it != indices.end(); ++it) { *it += triangleIndexedCount; }
        for (unsigned i = 0; i < indices.size() - 2; i++) {
            // check for and continue if indexed triangle is degenerate, because the GPU won't render it anyway
            if (indices[i] == indices[i + 1] || indices[i] == indices[i + 2] || indices[i + 1] == indices[i + 2] ) { continue; }
            triangleIndices.push_back(indices[i]);
            triangleIndices.push_back(indices[i+1]);
            triangleIndices.push_back(indices[i+2]);
        }
    } else {
        unsigned offset = (triangleIndexedVertices.size() - vertices.size()) / stride;
        for (unsigned i = 0; i < vertices.size() / stride - 2; i++) {
            if (i % 2) {
                triangleIndices.push_back(offset + i + 2);
                triangleIndices.push_back(offset + i + 1);
                triangleIndices.push_back(offset + i);
            } else {
                triangleIndices.push_back(offset + i);
                triangleIndices.push_back(offset + i + 1);
                triangleIndices.push_back(offset + i + 2);
            }
        }
    }
    break;
               
As you are using draw_vertex_texture(), which doesn't add indices manually, you are doing that second loop to push back indices. If you used a function which added indices as well, then it would do the first double loop to check for degenerate triangles (of which there are of course none, as the sprite function specifically draws a quad). So in both cases there are useless if checks and useless loops. You should be able to populate the index buffer from draw_sprite() and reduce some of the operations used. Of course in the quad case the loop will only run twice, and the compiler will probably unroll it, but still. I think it would be a lot better if those operations weren't done for stuff we can do manually.
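
To make the comparison concrete, here is a sketch of both approaches side by side: the generic strip-to-triangle-list loop (mirroring the non-indexed branch quoted above) and a quad-only version that writes the six indices directly. The function names are hypothetical and neither is taken from the ENIGMA source.

```cpp
#include <cassert>
#include <vector>

// Generic path: expand a triangle strip of vertexCount vertices into
// triangle-list indices, flipping the winding of every odd triangle.
void append_strip_indices(std::vector<unsigned>& out, unsigned offset, unsigned vertexCount) {
    assert(vertexCount >= 3);
    for (unsigned i = 0; i + 2 < vertexCount; ++i) {
        if (i % 2) {
            out.push_back(offset + i + 2);
            out.push_back(offset + i + 1);
            out.push_back(offset + i);
        } else {
            out.push_back(offset + i);
            out.push_back(offset + i + 1);
            out.push_back(offset + i + 2);
        }
    }
}

// Quad-only path: the same six indices with no loop and no parity check.
void append_quad_indices(std::vector<unsigned>& out, unsigned offset) {
    const unsigned quad[6] = { offset,     offset + 1, offset + 2,
                               offset + 3, offset + 2, offset + 1 };
    out.insert(out.end(), quad, quad + 6);
}
```

For a 4-vertex strip both functions append the same indices, so a sprite function could use the quad version without changing what gets drawn. Whether that is measurably faster would still need benchmarking.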
« Last Edit: December 02, 2013, 01:35:10 pm by TheExDeus » Logged
Offline (Male) Goombert
Reply #10 Posted on: December 02, 2013, 02:16:55 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Quote
pack them just fine inside a float
No, that was in fact why it wasn't working for me. I already knew you couldn't pack inside a float from when I was working on binary buffers; Josh clarified for me that IEEE does not guarantee any byte past the first byte for floats. Which is where the union comes into play, and I was finally able to resolve it now. But as for Direct3D, Direct3D not only expects it, you have to use this for D3DCOLOR, as it is essentially a DWORD or long unsigned int. FVF requires D3DCOLOR.

Quote
But previously it was also converted into 1 draw call.
But without automatic indexing, unless we want to duplicate the code to do the same thing, which would be rather verbose as the rest of mesh class overhead is not even a pinch on the arm.

Quote
As you are using draw_vertex_texture(), which doesn't add indices manually, you are doing that second loop to push back indices.
Right, that is faster. What, you think manually adding them and increasing the memory copies 10 fold is faster? Not only do you have the added memory copies, but you also have the compiler going through more function symbols, only to have the same end result. Whereas the mesh class simply reserves the memory and generates those indices automatically, all in one go.

At any rate, this is all theory, unless something is presented that actually gives better performance all around for games, I am generally uninterested. The graphics code is much faster than all Game Maker versions, very easy to add new drawing functions, and faster than what we had before, and of course, no more immediate mode. So we got everything perfect, this system is optimal enough, let's just move on to fixing bugs with OpenGL3 batching and get the FFP into shaders and stuff now so we can get GLES working again, alrighty?
Logged

Offline (Unknown gender) TheExDeus
Reply #11 Posted on: December 02, 2013, 02:33:28 pm

Developer
Joined: Apr 2008
Posts: 1860

Quote
No, that was in fact why it wasn't working for me. I already knew you couldn't pack inside a float from when I was working on binary buffers; Josh clarified for me that IEEE does not guarantee any byte past the first byte for floats. Which is where the union comes into play, and I was finally able to resolve it now. But as for Direct3D, Direct3D not only expects it, you have to use this for D3DCOLOR, as it is essentially a DWORD or long unsigned int. FVF requires D3DCOLOR.
Correct, I was thinking of float as 4 bytes, not as a type. But yes, anyway, you can still pack that in a way you could still put in vector<float>.

Quote
But without automatic indexing
I was talking about draw_sprite, which didn't have automatic indexing but a manual one. I think for functions like that a manual one should be faster, ergo my example given previously.

Quote
Right, that is faster. What, you think manually adding them and increasing the memory copies 10 fold is faster?
It would also work with reservations like your current code. Just without one if statement (i%2), a loop (I thought it could be unrolled, but actually I no longer think that is possible as the vector size can change, so it possibly doesn't unroll the loop even if 4 vertices are used and so requires a jmp) and possibly a simpler offset calculation.

Quote
At any rate, this is all theory, unless something is presented that actually gives better performance all around for games, I am generally uninterested.
Agreed, I will try those optimizations later myself and see if it makes a difference. You shouldn't waste much time on it as it's already working quite fast.

Quote
So we got everything perfect, this system is optimal enough, let's just move on to fixing bugs with OpenGL3 batching and get the FFP into shaders and stuff now so we can get GLES working again, alrighty?
Agreed. I see that you are already trying to use our own arrays for transformation matrices. I guess I will write a simple class for that later if you don't plan to and then we will finally be able to use shaders with the rendering.
Logged
Offline (Male) Goombert
Reply #12 Posted on: December 02, 2013, 03:37:32 pm

Developer
Location: Cappuccino, CA
Joined: Jan 2013
Posts: 2993

Quote
But yes, anyway, you can still pack that in a way you could still put in vector<float>.
Yes, but only using unions. As I said, you can't pack into a float like you can into an integer for an RGB/RGBA color value because of IEEE; if you try, it may work, but it might not on other hardware, because IEEE is not always followed and I guess ISO doesn't have a say?

Quote
Just without one if statement (i%2),
Oh yes, definitely, knock yourself out if you can make it faster :P

Quote
You shouldn't waste much time on it as it's already working quite fast.
Right yes, I am going to stop now. I just wanted it all perfected so we can throw out the old GLES code and get it all working properly again.

Quote
I guess I will write a simple class for that later if you don't plan to and then we will finally be able to use shaders with the rendering.
Yes, just make sure you do the gm_Matrices or whatever stupid constant they use, check their manual for the matrix shits to make sure it matches up.
http://docs.yoyogames.com

Also, we have some side effects now, such as OpenGL3 not being quite as prepared for the batching, and I've been working with Direct3D longer, which has caused it to be more stable. Nevertheless, much better performance and lower RAM usage. Part of the issue is just the OpenGL 3 context manager not flushing on state changes; I didn't add them all to it.
Logged
