This is a proposed research topic for anybody interested in graphics. Even though #1648 was less code, faster, and made things work better in Direct3D, I believe we can still do better. It may be possible to replace the low-level texture transfer interface I designed with one oriented toward DMA texture transfer.
It may be that only Direct3D would benefit from this, or possibly all backends would. Regardless, a DMA interface would allow us to write what we read from one texture directly into another texture. That would eliminate the overhead of an additional copy of the pixel data to system memory.
This topic actually includes more than just textures. All of the generalized screen/surface/pixel interfaces that I am proposing could potentially be replaced by a DMA one. Also, for backends that do not truly support DMA, it can always be emulated, which should be no slower than if the interface were not designed for DMA transfer.
Such a proposed DMA interface may look something like the following.
    unsigned char* graphics_texture_lock(int texture, int x, int y, int width, int height, bool read, bool write);
    void graphics_texture_unlock(int texture);
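To make the emulation point concrete, here is a minimal sketch of how a backend without real DMA could implement the two functions on top of a system memory buffer, plus a texture-to-texture copy written against them. Everything except the two function signatures is an assumption for illustration: `EmulatedTexture`, `emulated_texture_create`, `texture_copy_region`, the RGBA8 format, the tightly packed staging buffer, and the rule of one lock per texture are all hypothetical and not part of any existing backend.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical emulated backend: each texture is an RGBA8 buffer in system memory.
struct EmulatedTexture {
  int width = 0, height = 0;
  std::vector<unsigned char> pixels;   // width * height * 4 bytes
  std::vector<unsigned char> staging;  // tightly packed copy of the locked region
  int lx = 0, ly = 0, lw = 0, lh = 0;  // locked rectangle
  bool locked = false, write = false;
};

static std::map<int, EmulatedTexture> textures;

// Hypothetical helper so the sketch has something to lock.
int emulated_texture_create(int width, int height) {
  static int next_id = 0;
  EmulatedTexture& t = textures[next_id];
  t.width = width;
  t.height = height;
  t.pixels.assign(std::size_t(width) * height * 4, 0);
  return next_id++;
}

// Returns a pointer to the region's pixels, or nullptr if the request is invalid
// or the texture is already locked (an assumption of this sketch).
unsigned char* graphics_texture_lock(int texture, int x, int y, int width, int height,
                                     bool read, bool write) {
  EmulatedTexture& t = textures.at(texture);
  if (t.locked || x < 0 || y < 0 || width <= 0 || height <= 0 ||
      x + width > t.width || y + height > t.height)
    return nullptr;
  t.staging.assign(std::size_t(width) * height * 4, 0);
  if (read) {  // only pay for the read-back when the caller asked for it
    for (int row = 0; row < height; ++row)
      std::memcpy(&t.staging[std::size_t(row) * width * 4],
                  &t.pixels[(std::size_t(y + row) * t.width + x) * 4],
                  std::size_t(width) * 4);
  }
  t.lx = x; t.ly = y; t.lw = width; t.lh = height;
  t.locked = true;
  t.write = write;
  return t.staging.data();
}

void graphics_texture_unlock(int texture) {
  EmulatedTexture& t = textures.at(texture);
  if (!t.locked) return;
  if (t.write) {  // flush the staged pixels back into the texture
    for (int row = 0; row < t.lh; ++row)
      std::memcpy(&t.pixels[(std::size_t(t.ly + row) * t.width + t.lx) * 4],
                  &t.staging[std::size_t(row) * t.lw * 4],
                  std::size_t(t.lw) * 4);
  }
  t.locked = false;
  t.staging.clear();
}

// Copies a region between two distinct textures using only lock/unlock.
bool texture_copy_region(int src, int dst, int x, int y, int w, int h) {
  unsigned char* from = graphics_texture_lock(src, x, y, w, h, true, false);
  if (!from) return false;
  unsigned char* to = graphics_texture_lock(dst, x, y, w, h, false, true);
  if (!to) {
    graphics_texture_unlock(src);
    return false;
  }
  std::memcpy(to, from, std::size_t(w) * h * 4);
  graphics_texture_unlock(dst);
  graphics_texture_unlock(src);
  return true;
}
```

In this emulated form the data still passes through system memory, so nothing is gained over the current interface. The idea is that `texture_copy_region` would not need to change if a backend could hand back a pointer that maps the texture more directly.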
It would also be possible to design the interface in an RAII style, similar to how Direct3D hands back a lock structure.
    void graphics_texture_lock(int texture, enigma::DMA_LOCK& dmaLock, bool read, bool write);
    void graphics_texture_unlock(int texture);
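As a rough sketch of what the RAII variant could look like in practice, the lock and unlock calls can be wrapped in a small guard whose destructor releases the texture. Note the assumptions: `enigma::DMA_LOCK` is named above but not defined, so the `pixels` and `pitch` fields here are a guess at its layout, and `ScopedTextureLock` and the fake 4x4 RGBA backend exist only so the example runs on its own.

```cpp
#include <cstddef>
#include <map>
#include <vector>

namespace enigma {
// Hypothetical layout; the proposal does not define this struct.
struct DMA_LOCK {
  unsigned char* pixels = nullptr;  // first byte of the locked texture
  int pitch = 0;                    // bytes per row
};
}  // namespace enigma

// Stand-in backend: every texture is a 4x4 RGBA8 buffer in system memory.
static const int kSize = 4;
static std::map<int, std::vector<unsigned char>> fake_pixels;
static std::map<int, bool> fake_locked;

void graphics_texture_lock(int texture, enigma::DMA_LOCK& dmaLock, bool read, bool write) {
  (void)read;   // a real backend would use these flags to skip
  (void)write;  // unnecessary read-backs or uploads
  std::vector<unsigned char>& px = fake_pixels[texture];
  if (px.empty()) px.assign(static_cast<std::size_t>(kSize) * kSize * 4, 0);
  fake_locked[texture] = true;
  dmaLock.pixels = px.data();
  dmaLock.pitch = kSize * 4;
}

void graphics_texture_unlock(int texture) { fake_locked[texture] = false; }

// Hypothetical guard: locks in the constructor, unlocks in the destructor.
class ScopedTextureLock {
 public:
  ScopedTextureLock(int texture, bool read, bool write) : texture_(texture) {
    graphics_texture_lock(texture_, lock_, read, write);
  }
  ~ScopedTextureLock() { graphics_texture_unlock(texture_); }

  // Non-copyable so the texture is unlocked exactly once.
  ScopedTextureLock(const ScopedTextureLock&) = delete;
  ScopedTextureLock& operator=(const ScopedTextureLock&) = delete;

  // Pointer to the first byte of row y, honoring the backend's pitch.
  unsigned char* row(int y) const {
    return lock_.pixels + static_cast<std::size_t>(y) * lock_.pitch;
  }

 private:
  int texture_;
  enigma::DMA_LOCK lock_;
};
```

With a guard like this, the unlock cannot be forgotten on an early return, which is the main thing RAII would buy us here.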
Despite all the research I did and the optimizations I found in #1725, I still could not get Direct3D to beat OpenGL. This suggests that OpenGL does some redundant caching of the texture in system memory. I actually experimented with forcing the D3D9 backend into managed mode, which renders more slowly, and it does drop the copy speeds down to about half of OpenGL's. Because the managed memory pool is deprecated in D3D9Ex, we could instead always keep the texture pixels in RAM, or keep the system memory copy we use to upload the texture around at all times, if copy speeds were really a concern. I am not convinced they are, since I got D3D fast enough that probably nobody will notice.
Considering that alone, I don't think the additional abstraction and copying we have now, compared with DMA pixel transfer, is actually our biggest concern. It really doesn't seem to impact performance much. No matter what you do, having the CPU read from and write to the GPU is slow.