I think this is a compile-time thing instead of a run-time thing, no?
No, this is exclusively a run-time thing. Instead of doing anything in the drawing functions themselves, the draw functions just look for an appropriate batch job to include their operations in. Sprite functions look for a triangle batch job with the same texture, then for a sub-job with color information if needed. Filled shape operations look for a batch job of the same color or of arbitrary color. Curve, line, and outline operations look for a line batch job of the same color/arbitrary color.
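To make the lookup concrete, here is a minimal sketch of what "look for an appropriate batch job" could mean. Every name in it (BatchKind, BatchJob, BatchQueue, find_or_create) is made up for illustration and is not the engine's actual code. It also simplifies the color handling: color is treated as part of the match instead of as a sub-job inside a texture job.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the engine's actual structures.
enum class BatchKind { Triangles, Lines };

struct BatchJob {
    BatchKind kind;
    int texture;               // -1 for untextured jobs
    unsigned color;            // ignored when arbitrary_color is true
    bool arbitrary_color;      // true if the job stores per-vertex colors
    std::size_t vertex_count;  // vertices queued so far
};

struct BatchQueue {
    std::vector<BatchJob> jobs;

    // Return the index of a compatible job, creating one if none exists.
    std::size_t find_or_create(BatchKind kind, int texture, unsigned color) {
        for (std::size_t i = 0; i < jobs.size(); ++i) {
            const BatchJob &j = jobs[i];
            if (j.kind == kind && j.texture == texture &&
                (j.arbitrary_color || j.color == color))
                return i;
        }
        jobs.push_back(BatchJob{kind, texture, color, false, 0});
        return jobs.size() - 1;
    }

    // A sprite is two triangles, so it adds six vertices to a triangle job.
    void draw_sprite(int texture, unsigned color) {
        assert(texture >= 0);  // sprites always need a real texture
        jobs[find_or_create(BatchKind::Triangles, texture, color)].vertex_count += 6;
    }

    // A line adds two vertices to an untextured line job.
    void draw_line(unsigned color) {
        jobs[find_or_create(BatchKind::Lines, -1, color)].vertex_count += 2;
    }
};
```

The draw functions do no GL work here; they only append to whichever job matches, and the jobs would be flushed to the card later.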
I didn't mean using some magical heuristic in real time, though. I just thought that we pack all sprites (as many as possible) into an nxn texture at run-time (or when sprite_add() is called) without taking usage into account. Usually the texture size can be quite massive; some even suggest 16kx16k for a modern PC (which GL3 is meant for). At that size we could pack the sprites for most 2D games (that texture can fit 65k 64x64 sprites, or 16k 128x128 sprites... I think you get the point). At run-time it would also be possible to pack into GL_MAX_TEXTURE_SIZE and so work no matter what. The larger the maximum texture size, the better it would go. Giving users the ability to set this would also be good, of course.
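The capacity figures are easy to check. The sketch below uses hypothetical helper names and assumes square sprites with no padding; the memory figure additionally assumes uncompressed RGBA at 4 bytes per pixel. The hardware limit is passed in as a plain number so the example runs without a GL context.

```cpp
#include <algorithm>
#include <cassert>

// How many square sprites of sprite_size fit in a square atlas of atlas_size,
// assuming no padding between sprites.
long long sprites_per_atlas(long long atlas_size, long long sprite_size) {
    assert(sprite_size > 0);
    long long per_row = atlas_size / sprite_size;
    return per_row * per_row;
}

// Memory footprint of the atlas, assuming uncompressed RGBA (4 bytes per pixel).
long long atlas_bytes(long long atlas_size) {
    return atlas_size * atlas_size * 4;
}

// Clamp the size we would like to the limit the hardware reports
// (the value that GL_MAX_TEXTURE_SIZE would give at run-time).
long long choose_atlas_size(long long requested, long long max_texture_size) {
    return std::min(requested, max_texture_size);
}
```

With these, sprites_per_atlas(16384, 64) gives 65536 and sprites_per_atlas(16384, 128) gives 16384, matching the 65k and 16k figures. Under the 4-bytes-per-pixel assumption, atlas_bytes(16384) comes to 1 GiB.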
Yes, this would be a great thing, except we need to keep input from the user in mind at all times. For larger games with shitloads of sprites, the user might want to swap them in and out of memory frequently enough that reconstructing these huge atlases could be a problem. We need some useful heuristic; not a magical one, but one based on the user's intended use. I suggested earlier using the resource tree as a method to group sprites for mass load/unload. We might instead want to just consider a new resource type for general logical grouping. One of these grouping types could be for use in generating atlases. Another could be for use in mass load/unload. A third could just be for generic categorization, so it's easy to check if a resource is in a certain category. Other heuristic data comes from profile mode, which converts the ((spr_0, spr_1), 25000) and ((spr_1, spr_0), 24999) tuples into a more useful combined entry, roughly {50,000 => (spr_0, spr_1)}, in a map or priority queue. So when generating these atlases on, say, embedded systems, where MAX_TEXTURE_SIZE is on the order of 32x32 (or, more practically, 512x512), precedence will be given to grouping spr_0 with spr_1, as otherwise we will be saddled with roughly 50,000 texture misses. If other pairs have higher weights, they will, of course, be given precedence; I use that 50,000 as a good example of an extreme case. I think typical values will range from the 5s and 10s up to the 50s-100s in a typical game.
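A minimal sketch of that tuple merge, assuming the profiler hands us ordered (previous sprite, next sprite) transition counts. The type and function names are hypothetical. Each pair is normalized so that (spr_1, spr_0) and (spr_0, spr_1) land on the same key, the counts are summed, and the result is sorted heaviest first so the atlas generator can walk it in precedence order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> SpritePair;
typedef std::pair<SpritePair, long> Transition;    // ((previous, next), count)
typedef std::pair<long, SpritePair> WeightedPair;  // (combined count, pair)

// Fold ordered transition counts into unordered pair weights, heaviest first.
std::vector<WeightedPair> merge_transitions(const std::vector<Transition> &transitions) {
    std::map<SpritePair, long> weights;
    for (std::size_t i = 0; i < transitions.size(); ++i) {
        assert(transitions[i].second >= 0);  // counts are never negative
        SpritePair key = transitions[i].first;
        // Normalize the order so both directions share one key.
        if (key.second < key.first) std::swap(key.first, key.second);
        weights[key] += transitions[i].second;
    }

    std::vector<WeightedPair> ranked;
    for (std::map<SpritePair, long>::const_iterator it = weights.begin();
         it != weights.end(); ++it)
        ranked.push_back(WeightedPair(it->second, it->first));

    std::sort(ranked.begin(), ranked.end(),
              [](const WeightedPair &a, const WeightedPair &b) {
                  return a.first > b.first;
              });
    return ranked;
}
```

Fed the two tuples from the example, this yields a single entry of weight 49,999 for (spr_0, spr_1), which is the "50,000" above before rounding.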
Well, we will have to do this anyway. If the person doesn't have enough VRAM (or we just choose a conservative size when packing at compile time), then we must use multiple atlases. And I was thinking not about a way to check whether a sprite is in an atlas, but that the sprite itself reports which atlas it is in. So basically nothing in the drawing functions would really have to change (only the texture coords a little bit). The texture_use() would automatically work.
Not what I'm saying. I mean (spr_0, subimage 0) isn't in one atlas. It's in two or three, because otherwise, we'll have misses out the wazoo for this one sprite, because we don't have room for all the sprites in our profiler tuples.
Robert's code will not break batching, especially if we applied the matrices software-side for sprites (which is what we do now). This isn't necessary, though. The only way those matrix operations can hurt us is if they're being applied randomly in every single draw event. Otherwise, the batching mechanism can treat them as a separate barrier to move between. When a matrix has the same result as a previous matrix, it becomes the new head position moved to by batch_chunk_start(). That is, after a call to a matrix operation, the batch_chunk_start() method can only move the head back as far as the first matrix node matching the current matrix configuration.
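Here is one way to read the barrier rule, as a sketch only. Chunk and BatchList are hypothetical names, matrices are compared for exact equality, and the batch list is modeled as a flat list of chunks, one per distinct matrix configuration. batch_chunk_start() returns the first chunk recorded under the current matrix, so a draw made after returning to an earlier matrix joins that earlier chunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<float, 16> Matrix;

// Hypothetical chunk: every draw job recorded under one matrix configuration.
struct Chunk {
    Matrix matrix;
    std::vector<int> textures;  // one entry per batched draw job
};

struct BatchList {
    std::vector<Chunk> chunks;
    Matrix current;

    explicit BatchList(const Matrix &initial) : current(initial) {}

    // A matrix operation only records the new configuration.
    void set_matrix(const Matrix &m) { current = m; }

    // Index of the first chunk whose matrix equals the current one.
    // If no chunk matches, a new one is appended and becomes the head.
    std::size_t batch_chunk_start() {
        for (std::size_t i = 0; i < chunks.size(); ++i)
            if (chunks[i].matrix == current) return i;
        chunks.push_back(Chunk{current, std::vector<int>()});
        return chunks.size() - 1;
    }

    void draw(int texture) {
        assert(texture >= 0);
        chunks[batch_chunk_start()].textures.push_back(texture);
    }
};
```

Drawing under matrix A, then B, then A again leaves two chunks, with both A draws in the first one. Note that this moves the third draw ahead of the B draw in submission order.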
Since it's a run-time mechanism, there is no guess and check here. It will simply work.
That still means that there would be a lot of batch creation/destruction going on. ... I am still not convinced it would give a massive speed boost, and it would also foobar the rendering order.
Think of it this way. Batch or no batch, this information needs to make it to the card. Moving vertices to the card is expensive, too; a batch operation allows us to amortize this cost over a much larger fraction of the drawing work. So as long as the savings from that are greater than the cost of maintaining the batch, we're fine.
Still, we should be sure it's relatively easy to enable and disable the batch algorithm. A simple way to do that while minimizing the number of data transfers is to just nullify the behavior of batch_chunk_start(). If it no longer moves the head backward, batching will only be done where possible without re-ordering anything. A new batch job will be created for every draw call, and the behavior will be identical to what we have now.
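A sketch of that off switch, with hypothetical names (Job, Batcher, batching_enabled). The only thing the flag changes is what batch_chunk_start() returns. When batching is disabled, the head sits past the last job, so the search finds nothing to reuse and every draw call gets its own job in call order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical job: a texture and the number of draw calls folded into it.
struct Job {
    int texture;
    int draws;
};

struct Batcher {
    std::vector<Job> jobs;
    bool batching_enabled;

    explicit Batcher(bool enabled) : batching_enabled(enabled) {}

    // Where the search for a reusable job begins. With batching disabled the
    // head never moves backward, so no earlier job is visible.
    std::size_t batch_chunk_start() const {
        return batching_enabled ? 0 : jobs.size();
    }

    void draw(int texture) {
        assert(texture >= 0);
        for (std::size_t i = batch_chunk_start(); i < jobs.size(); ++i) {
            if (jobs[i].texture == texture) {
                ++jobs[i].draws;
                return;
            }
        }
        jobs.push_back(Job{texture, 1});
    }
};
```

Drawing textures 1, 2, 1 gives two jobs with batching on (the second draw of texture 1 is folded into the first job) and three jobs in the original order with it off.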
As for implementing texture atlases... You can do that at load time relatively easily. Use the existing rectangle packer used for font glyphs. But you might want to figure out what's causing the font off-by-one error in the engine, first, as it will probably affect you. You'll have to change the load API a bit. Right now, each sprite load is done through graphics_create_texture. You'll need to replace it with some kind of graphics_place_texture() function which returns not just the texture ID, but the top-left and bottom-right float coordinates, as well. You'll also need some kind of finalize method so you can convert from your rect packer + empty texture + pxdata state to just a texture in memory. Best I can say is, good luck.
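For the shape of that API, here is a rough sketch. The function name graphics_place_texture() is the one suggested above, but the signature, the Placement struct, and the AtlasBuilder class are assumptions. A naive shelf packer stands in for the existing glyph rect packer, whose real interface is not shown here. The pixel copy and the finalize step are left out; this covers only the placement and the returned float coordinates.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical return value: the atlas texture ID plus the top-left (u0, v0)
// and bottom-right (u1, v1) coordinates of the placed image.
struct Placement {
    bool ok;
    int texture_id;
    float u0, v0, u1, v1;
};

// Naive shelf packer for a square atlas; a stand-in for the real rect packer.
class AtlasBuilder {
public:
    AtlasBuilder(int texture_id, int size)
        : id_(texture_id), size_(size), x_(0), y_(0), row_h_(0) {
        assert(size > 0);
    }

    // Reserve a w-by-h region. Fills rows left to right and starts a new row
    // when the current one is full. Fails without changing state if no room.
    Placement graphics_place_texture(int w, int h) {
        Placement fail = {false, -1, 0.0f, 0.0f, 0.0f, 0.0f};
        if (w <= 0 || h <= 0 || w > size_ || h > size_) return fail;

        int nx = x_, ny = y_, nrow = row_h_;
        if (nx + w > size_) {  // row is full: move down to the next shelf
            ny += nrow;
            nx = 0;
            nrow = 0;
        }
        if (ny + h > size_) return fail;  // atlas is full

        float s = static_cast<float>(size_);
        Placement p = {true, id_, nx / s, ny / s, (nx + w) / s, (ny + h) / s};

        x_ = nx + w;
        y_ = ny;
        row_h_ = std::max(nrow, h);
        return p;
    }

private:
    int id_, size_;
    int x_, y_, row_h_;  // cursor position and height of the current row
};
```

A failed placement is the signal to finalize this atlas and start another, which is where the multiple-atlas case comes in.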