Empty Game/Room Caption/Var Construction Recursion

Reporter: RobertBColton  |  Status: open  |  Last Modified: June 23, 2019, 09:23:50 PM

This is happening for me on Windows in compile mode. Steps to reproduce:

  • Create a new game with one object (no room!)
  • Compile an exe of it (opening the exe will crash)
  • Add to the create event a call to string_height("");
  • Compile an exe again, opening it this time will not crash
  • <
RobertBColton  
So, I finally managed to get to the bottom of it. It's room_caption and current_caption global initialization. I figured it out by removing -s from the SHELL Makefile for compile mode, then adding -g3 -ggdb to get the symbols. I did a full clean rebuild and got the below stack trace. Note the length 12 is because I changed the two vars to initialize to "hello rusky".

#18569 0x0000000000535593 in var::var<char [12]>(char const (&) [12]) [clone .constprop.0] ()
#18570 0x0000000000535593 in var::var<char [12]>(char const (&) [12]) [clone .constprop.0] ()
#18571 0x0000000000535593 in var::var<char [12]>(char const (&) [12]) [clone .constprop.0] ()
#18572 0x000000000053603d in global constructors keyed to 65535_0_SHELLmain.o.70040 ()
#18573 0x0000000000485b32 in __do_global_ctors ()
    at E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:67
#18574 0x0000000000485b8f in __main ()
    at E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:83
#18575 0x000000000040137f in __tmainCRTStartup ()
    at E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:329
#18576 0x00000000004014db in WinMainCRTStartup ()
    at E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:195

JoshDreamland  

It's not clear to me why calling string_height fixes anything. It's also not clear to me that we're seeing recursion. Sounds like an undefined access that adding another method call is masking. I suggest room_caption is being initialized to room[0].caption, which is OOB. Maybe declaring that create event pushes another string in the way. Either way, the problem is likely with how room_caption is initialized. Not with recursion between two constructors.
RobertBColton  

Here's the thing, I deleted my earlier comment. If you change string_height to draw_text that also works, as well as just about any other function in GSfont.cpp except for draw_set_font or draw_get_font which do not fix it. Rusky suggested it might just be functions that take var, but I tried show_message, array_length_1d, and string which do not fix it. Pretty weird, but I think it's just related to the fact that this only occurs with -flto which must be separating the code segments into Enclaves.

Now let me explain why I think it's recursion other than the stack being 20,000 frames of var construction. If I add an overload to variant_string_wrapper that takes const char* then the recursion goes away but I end up with a memmove segfault, which I think is caused by the release just below that.

struct variant_string_wrapper : std::string {

JoshDreamland  

That's a weird problem to encounter. And that solution shouldn't work. It seems GCC is up to its old antics of trying to hop from type A to type B via var. That isn't legal and I don't understand how it's doing that. Can you get line numbers?
RobertBColton  

Right, no I don't think I can get line numbers because it only occurs with -flto which destroys that. I mean, it's giving line numbers, they just happen to be in linker-separated modules instead of the original line numbers. If you have an idea of what to add to the -g3 -ggdb I used to get the original line numbers, please share.

@JoshDreamland here's something to add, std::string has no virtual destructor, and thus deriving from it is not recommended.
https://stackoverflow.com/q/6006860

RobertBColton  

Ok, so I decided to do some printing from the constructors to get us some more information and see all the variables involved here as well as exactly which constructors are running. I discovered that as far as strings go, it is just the caption basically. However, I also realized that we are casting the room caption every step to std::string and checking if it's empty to set the window caption.
if (!((std::string)enigma_user::room_caption).empty())

I had to comment that out for a second in order to see first the constructors that are being run.

********* EXECUTE:
C:/msys64/tmp/egm13900845851131715855.exe

rvalue_ref string variant_string_wrapper() this be the room caption
variant(const char *str) this be the room caption
rvalue_ref string variant_string_wrapper() this be the current caption
variant(const char *str) this be the current caption
Initializing audio system...

I then gave this a go in compile mode and the stack trace was totally different. I even uncommented the setting of the room caption again and got the same.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000005339d0 in enigma::propagate_locals(enigma::object_planar*) [clone .cold] ()
(gdb) bt
#0  0x00000000005339d0 in enigma::propagate_locals(enigma::object_planar*) [clone .cold] ()
#1  0x0000000000992850 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

Once I set the two caption vars back to empty string literals, I was able to get back to the original issue.

#0  0x0000000000532c62 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) [clone .isra.0] [clone .lto_priv.6] ()
#1  0x0000000000532cc4 in var::var<char [1]>(char const (&) [1]) [clone .constprop.0] ()

So it just seems it's something to do with the linker optimization not liking vars constructed from empty string literals.

RobertBColton  

So, with #1754 merged, I'm back digging for this one again. I discovered that .constprop means constant propagation: https://stackoverflow.com/a/14796736

I see now we have ONLY the memmove segfault. So, I think what was going on with current_caption then is that the optimizer was inlining it? And I don't know where to look for this memmove.

#0  0x00007ffb95c74400 in msvcrt!memmove ()
   from C:\WINDOWS\System32\msvcrt.dll
#1  0x0000000000536616 in ?? ()
#2  0x0000000000482d62 in ?? ()
#3  0x000000000040137f in ?? ()
#4  0x00000000004014db in ?? ()
#5  0x00007ffb95d44034 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\System32\kernel32.dll
#6  0x00007ffb97c53691 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#7  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

RobertBColton  

Ok, I attempted to reproduce it outside of ENIGMA. It didn't work out ultimately. I tried recreating the same source structure, static linking the runtime, etc.

Download Attempted Reproducer: var_crash.zip

Now going to look at asan for leaks in #1756 to see if I can find it that way.

RobertBColton  

Ok, so now, after looking at the disassembly and learning some new tricks, I am actually getting close to solving this one. The first thing I want to point out is that in b7c9dd5 I originally removed the default initialization of the room background and view arrays, because I felt it was redundant since they are loaded from the executable anyway.

Here is the complete decompiled code from an empty x64 bit game executable: Decompiled Empty Game.zip

The crash occurs after a jump from line 321,853 goto addr_533636_16;

If we go to the label addr_533636_16 we see it falls smack dab between setting of room_caption to null and the initialization of the background var arrays.

_ZN11enigma_user12room_captionE = 0; // line 321,769
...
addr_533636_16: // line 321,791
...
_ZN11enigma_user18background_visibleE = 0; // line 321,872

If we look back in the room source code, we can see there is no other vars between the caption and background visible, only two ints. If we look for these ints in the decompiled code, we can see they were grouped with other ints such as the room width and height.

RobertBColton  

So, here's the constructor order for room_caption in run mode.

#  define rvalue_ref &&

std::string(x)
variant_string_wrapper(std::string rvalue_ref  x)
variant(const char *str)
var(const T &v)

It's matching with the templated var constructor which calls variant's C-string constructor, which calls its parent rvalue_ref string wrapper constructor. In compile mode, based on my analysis of the decompiled game, it seems the problem is that the linker is inlining all of these constructors for the global initialization of the caption.

RobertBColton  

@JoshDreamland alright, I got a lot more info by decompiling. Where should we go from here?
RobertBColton  

Disabling all extensions except Paths extension and setting Widgets/Audio to None has changed the stack.

Download Minimum SDL Executable: minimum-sdl-exe.zip

(gdb) bt
#0  0x00007ffb97c3c283 in ntdll!RtlCreateHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007ffb97bfd26c in ntdll!RtlSizeHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007ffb97bf9725 in ntdll!RtlAllocateHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007ffb97cdf55e in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#4  0x00007ffb97c8b530 in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll
#5  0x00007ffb97bf9725 in ntdll!RtlAllocateHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#6  0x00007ffb95c19960 in msvcrt!malloc () from C:\WINDOWS\System32\msvcrt.dll
#7  0x000000000050aa69 in operator new(unsigned long long) ()
#8  0x000000000050b955 in global constructors keyed to 65535_0_SHELLmain.o.51871 ()
#9  0x000000000050b9e0 in global constructors keyed to 65535_0_SHELLmain.o.51871 ()
(gdb)

Please sign in to post comments, or you can view this issue on GitHub.