ENIGMA Forums

General fluff => Off-Topic => Topic started by: luiscubal on November 14, 2010, 06:06:41 pm

Title: C++0x and garbage collection
Post by: luiscubal on November 14, 2010, 06:06:41 pm
What happens to variables you forgot to free?
Previously, you'd expect to get a memory leak in C++, but in C++0x, according to wikipedia (http://en.wikipedia.org/wiki/C%2B%2B0x#Allow_garbage_collected_implementations):

Quote
It is implementation defined whether unreachable dynamically allocated objects are automatically reclaimed.

Of course, considering how permissive C++ is regarding memory, decent garbage collectors are nearly impossible, so we're most left with stuff like Boehm's conservative GC.

Note that implementation defined doesn't mean that all implementations GC. Only if the compiler makers decide so.
On GCC, I'd guess garbage collector would end up as some compiler flag.
Title: Re: C++0x and garbage collection
Post by: Rusky on November 15, 2010, 10:51:28 am
The idea was to allow implementation of real garbage collection from within the language (possibly through extensions to operator new, etc.), and that idea has been around for a decade at least. However, the committee just keeps putting it off so they decided to put that in rather than do nothing.

My guess is that it was intended to allow compiler writers to experiment so they can standardize on something later - maybe Clang will be able to come up with something using LLVM's GC support?
Title: Re: C++0x and garbage collection
Post by: RetroX on November 15, 2010, 05:51:12 pm
I think that GC is a waste of time.  If you leak memory, that's your own fault, and you can use valgrind and a debugger to find it.

The fact that languages like Java that heavily rely on GC are being used for teaching is terrible, and that's probably why my graphics drivers can leak 20 MB by just opening a window.
Title: Re: C++0x and garbage collection
Post by: Josh @ Dreamland on November 15, 2010, 09:56:42 pm
Don't give Valgrind too much credit.
Though the idea of a GC irks me, Valgrind is not perfect. It has been difficult to determine what ENIGMA leaks Valgrind indicates are legitimate, since it gives a definite loss record on all GL Mesa calls, and a possible loss record on all ENIGMA's pointer arithmetic (used for such sinister concoctions as lua_table<>).

So I can't run anything in ENIGMA without an 8MB loss record from one GLX call.

That said, the idea of relying on a GC to clean up after me is disgusting.
Title: Re: C++0x and garbage collection
Post by: luiscubal on November 16, 2010, 12:25:29 pm
Some people use GCs for memory leak detection.
Title: Re: C++0x and garbage collection
Post by: RetroX on November 16, 2010, 05:52:06 pm
Some people use GCs for memory leak detection.
I like this idea, however, I don't like it as a standard practice.
Title: Re: C++0x and garbage collection
Post by: Rusky on November 16, 2010, 09:01:28 pm
Unilaterally condemning garbage collection as "disgusting" is extremely short-sighted. GC has practical benefits beyond ease of use, and while they may not apply to your area of programming they definitely apply to others. GC greatly improves cache usage, improves allocation performance (which in some cases, no matter how hard it is for you to believe, can result in an overall performance increase) and can occasionally improve collection performance. Garbage collection is a useful trade-off, not a crutch for lazy people - it just shifts work around between malloc/free, really. However, beyond it not always being the best idea for games, I'll agree that there are better ways to manage memory in general, like automatic stack allocation.

I'm especially irritated by Retro's hypocritical attitude of "If you leak memory, that's your own fault." You have an incredibly powerful machine sitting in front of you - why not take advantage of it? The trade-off is often worth it, and you do it all the time anyway, with things like type systems (static or dynamic, they automatically choose the right opcodes for your data types), shell scripts (oh my gosh dynamic typing and garbage collection you horrible sloppy pig), your CPU's MMU (Singularity, an operating system written in C#, eliminates the need for this and is much faster despite using garbage collection and "over-use" of classes) and a secure operating system (if you run malicious programs or let stupid people use your computer it's your own fault).

Really, these technologies have uses. There's almost never such a thing as an unconditionally bad technique or tool.
Title: Re: C++0x and garbage collection
Post by: RetroX on November 16, 2010, 10:13:40 pm
Unilaterally condemning garbage collection as "disgusting" is extremely short-sighted. GC has practical benefits beyond ease of use, and while they may not apply to your area of programming they definitely apply to others. GC greatly improves cache usage, improves allocation performance (which in some cases, no matter how hard it is for you to believe, can result in an overall performance increase) and can occasionally improve collection performance.
Although, you fail to explain how.  If there actually is a good reason, I'd like to know what it is.

I'm especially irritated by Retro's hypocritical attitude of "If you leak memory, that's your own fault." You have an incredibly powerful machine sitting in front of you - why not take advantage of it? The trade-off is often worth it, and you do it all the time anyway, with things like type systems (static or dynamic, they automatically choose the right opcodes for your data types), shell scripts (oh my gosh dynamic typing and garbage collection you horrible sloppy pig), your CPU's MMU (Singularity, an operating system written in C#, eliminates the need for this and is much faster despite using garbage collection and "over-use" of classes) and a secure operating system (if you run malicious programs or let stupid people use your computer it's your own fault).

Really, these technologies have uses. There's almost never such a thing as an unconditionally bad technique or tool.
Yes, I have an incredibly powerful machine.  AMD's drivers leak 20 MB from opening a window.  That's a massive amount of memory, and if there was GC, it would all be solved.

How many calculations does it require to collect 20 MB of memory?  If we assume that all of those MB are in floats or integers, we can pretend that it's one calculation per 4 bytes.  That's 2 * 1024^2 "calculations."  Sure, four integers per frame in a game isn't bad for GC, but 20 MB?  There are some cases where GC might be a bit much, and these things add up.  If it's 20 MB per window, and you open multiple windows at once, that number doubles or triples or quadruples.  Now, it takes a few seconds for programs to start when they could start instantaneously.

Good programming is far better than having some garbage collector come and pick up the trash that you leave behind.  I agree that for finding these errors, GC is extremely useful, but not in every case should it be used as common practice.
Title: Re: C++0x and garbage collection
Post by: Rusky on November 16, 2010, 11:05:51 pm
Great job, you have shown a complete and utter lack of knowledge about both GC and non-GC systems.

In this post, I'm talking specifically about generational/compacting GC, which is what most major VMs and runtimes use, but other types often have their own benefits. GC improves cache usage by keeping objects that are used together close by each other in memory so they can be in the same cache lines. You do know what that means, right? GC improves allocation performance because allocation consists of a store and an increment rather than walking a list, removing a node and reinserting a node. GC can improve collection performance because rather than walking a list, removing, merging and inserting a node on every free you follow some pointers and then compact the objects you find in the heap, only when you need more memory - these approaches can both be advantageous depending on the situation.

Garbage collection doesn't scale with the amount of memory - that would be an insanely brain-dead way to do it. It scales with the number of references- what happens is that when you hit a certain threshold of allocated memory you follow the pointers in globals and on the stack through to pointers in objects and so on until you see everything that's reachable. Everything else on the heap is dead and can be overwritten. Can you see how this is just a redistribution of the work? Rather than leave everything to the programmer, some of the run-time work like managing sizes or reference counts goes into the compiler describing what variables are pointers, and some work moves from allocation to freeing, and all the freeing gets combined so costs can be amortized.

Good programming is understanding the tools you have available and choosing the ones that work the best for the current situation. Dynamically allocated memory - automatically or manually collected - is not always the best tool. When it is, sometimes automatic garbage collection is a better tool than manual memory management. There is virtually never a reason to ban something for every situation or even most situations. GC is useful for more than debugging, whether or not you care to understand how it works and whether or not you consider it lazy.
Title: Re: C++0x and garbage collection
Post by: RetroX on November 17, 2010, 04:26:38 pm
Great job, you have shown a complete and utter lack of knowledge about both GC and non-GC systems.
Gee, thanks for letting me know.  You might as well have said "You fucking moron, why the fuck do you not know what GC is, what the fuck, you're stupid," because it probably would have said the same thing.

You're terrible at explaining things, and you're a dick when you do it.


On the topic of what you actually said, I honestly don't know anything about how managing memory works, but as far as I know, it's removing unclaimable "garbage" objects.  In other words, if I store a new pointer to an object over an old, and that object isn't pointed to anywhere, it will be deleted.

That's probably wrong, but both you and wikipedia do an extremely terrible job at explaining it to people that haven't had five years of education on the subject.
Title: Re: C++0x and garbage collection
Post by: Fede-lasse on November 18, 2010, 08:38:53 am
Here we go...

Josh, you oughta get more server space for the forums.
Title: Re: C++0x and garbage collection
Post by: luiscubal on November 18, 2010, 02:26:11 pm
The argument for GC is the following:

1.
malloc and free are relatively expensive. By reusing the same memory location for multiple purposes, you can save some time.
Code: [Select]
int x = 0;
for (int i = 0; i < 1000; ++i) {
   char* array = (char*) malloc(sizeof(char) * 1000);
   memset(array, 0, sizeof(char) * 1000);
   //do something with array.
   x += array[0].
   free(array);
}
//vs
int x = 0;
char* array = (char*) malloc(sizeof(char) * 1000);
for (int i = 0; i < 1000; ++i) {
   memset(array, 0, sizeof(char) * 1000);
   //do something with array.
   x += array[0].
}
free(array);
The second alternative is more efficient.
Of course, the example above is a very simple case that can be solved with very minimalistic code redesign. However, for more complex situations, you'll find yourself needing to use Memory pools (http://en.wikipedia.org/wiki/Memory_pool) and custom allocators.
I'll come back to this topic later.

2. Code and data locality benefits cache (http://en.wikipedia.org/wiki/CPU_Cache).
Rusky is claiming that garbage collectors can reduce the frequency of cache misses.
The CPU cache makes RAM access substantially faster, and cache misses truly damage program speed.
Again, I'll come back to this topic later.

3. It is as retarded to say "X is universally bad" as it is to say "X is universally good". Not understanding X is not a good reason to claim X is bad.

So, I promised I'd come back to 1. and 2., so here we are.
Regarding memory pools and locality, be aware that GCs like Boehm can not possibly be expected to do the really awesome GC magic that Rusky mentioned. Perhaps only a very weak version of it.
GCs like Boehm too can suffer from fragmentation:
(images from Mono's Compacting GC page (http://www.mono-project.com/Compacting_GC):
(http://www.mono-project.com/files/a/aa/Fragmentation.png)
GCs like Mono's SGen can perform tricks C++ can not.
(http://www.mono-project.com/files/b/bc/Compacting.png)

Now THAT can improve locality.
Title: Re: C++0x and garbage collection
Post by: serprex on November 18, 2010, 04:53:20 pm
Retro,
Firefox has areas where it isn't a memory leak that causes high memory usage, but a large amount of fragementation. If you can't allocate memory in evenly sized blocks without being overly pessimistic so that small objects bloat the memory usage though, then I guess that's your own fault
Really I think the issue is that when people say manual memory management, they think malloc/free. I've been leaning more towards mmap as of late, but then you're outside of standard C. I was recently toying with an explicit memory manager which also had compacting, though the implementation was rather brain dead due to the half ass proof of concept nature of the project. Garbage collection is a design pattern, and where it isn't supplied by the underlying framework, it'll end up being poorly implemented by the application. There are a number of refcounted APIs in C, but refcounting isn't necessarily the superior mechanism. It's only that refcounting is suitable for explicit use by the programmer, as it doesn't deviate much from malloc/free
In any case, you should learn to read Rusky's responses at times. He answered your questions, albeit with hostility. He over evangelizes, but that's what happens when one is forced to play devil's advocate in a crowd
Title: Re: C++0x and garbage collection
Post by: RetroX on November 18, 2010, 05:59:07 pm
Retro,
Firefox has areas where it isn't a memory leak that causes high memory usage, but a large amount of fragementation. If you can't allocate memory in evenly sized blocks without being overly pessimistic so that small objects bloat the memory usage though, then I guess that's your own fault
Really I think the issue is that when people say manual memory management, they think malloc/free. I've been leaning more towards mmap as of late, but then you're outside of standard C. I was recently toying with an explicit memory manager which also had compacting, though the implementation was rather brain dead due to the half ass proof of concept nature of the project. Garbage collection is a design pattern, and where it isn't supplied by the underlying framework, it'll end up being poorly implemented by the application. There are a number of refcounted APIs in C, but refcounting isn't necessarily the superior mechanism. It's only that refcounting is suitable for explicit use by the programmer, as it doesn't deviate much from malloc/free
In any case, you should learn to read Rusky's responses at times. He answered your questions, albeit with hostility. He over evangelizes, but that's what happens when one is forced to play devil's advocate in a crowd
I read his post.  It was calling me stupid because I didn't know what I was talking about (which I didn't), then proceeded to explain as if I knew anything.  I don't know how memory is allocated; based upon my extremely simplistic knowledge on the subject, it seemed like GC was a bad idea.  I kind of see why it's a better idea now.

I'll admit that I was being terribly ignorant on the subject.

After looking at luis's example, there's a reason that static variables exist.  Why recreate a variable for every call to a function when you can create it once and reuse it?  The first example is extremely inefficient for regular variables, and I always define variables outside of a block and try to use as few as possible.

I still think that while GC can help memory allocation be a lot faster, it's not always the best option.  Or maybe it is.  I still don't know enough about it.
Title: Re: C++0x and garbage collection
Post by: luiscubal on November 18, 2010, 06:32:23 pm
@RetroX Multi-threading pretty much screws up static variables way more than stack variables(although in this case, only the pointer is in the stack. The values pointed by "array" are in the "heap"). Also, recursive functions don't play well along static variables.

Also, remember it was an *example*. I was trying to use simple C to show you the *concept*.
The problem is not my example, because if that was the worst case then GC wouldn't help much.
The *real* problem comes when the "array" variables aren't together in a single function that is easily modified to be more efficient. The problem comes when the variables are spread across the application. In some cases, there is no obvious way to "reuse" array using plain old malloc/free.
The Mono GC image I quoted is a good example of what's wrong with manual allocations.

Note that not all GCs are created equal.
For instance, Boehm can't hope to perform the tricks Mono's GC does, simply because Boehm tries to play nice with standard C++, so pointers should always point to fixed locations. Boehm, therefore, can do very little to help with fragmentation. In addition, Boehm has no typing information, so it can't know the difference between integers and pointers. As a result, it can sometimes think that a value is a pointer to some memory location when it is just some random integer.
Reference counters that are simply a struct { int refs; void* ptr } also obviously don't help with the efficiency situation discussed above. Of course, they *are* simple and straightforward to implement, not to mention that with a few locks or atomic integers, it is easy to make it multi-threaded.
Some garbage collection algorithms are also based on the concept of "stopping the world", which can really help with implementation in multi-threaded environments, but may unpredictably slow down the application.
Other garbage collection algorithms will intentionally "leak" memory, simply because free is expensive, and there's lots of memory. And then only when memory runs low, they perform a few short major free()s, performing better in some cases but showing high memory consumption in profiling tools.
Also, some garbage collections will initiate at apparently random patterns, which causes unpredictable performance. In some cases, predictably slow is better than sometimes-fast-other-times-slow, so those GCs tend to be bad in those cases. Some reference counters can perform well if this is a problem.

Finally, I can't think how scripting languages would be if there was no garbage collection.
Similarly, I can't see how "undefined behavior" garbage collection fits C++.

You *can* implement good GCs for C++, but it will almost always be a better job for the application itself to provide the garbage collector(so that it can opt-out of C++ features in exchange for GC efficiency).
GCs as "do it if you feel like it, but, like, dude, do as you will, really" is not a good idea. People who like GCs will be disgusted that GC is not always enabled, and how primitive and badly performing it will be. People who don't like GCs will be disgusted by the fact that a GC *might* be attached to their programs. Noobs won't realize their application leaks tons of memory because they heard "someone" say they didn't have to worry about it(conveniently ignoring the part that it isn't always enabled).

In the end, these are the main points:
1. GCs impact in performance is not clear unless you specify what type of GC you are discussing;
2. There is more beyond malloc/free. Even manual allocation goes way beyond that;
3. GCs have their use cases;
4. Not all GCs are equal. Specify what types of GCs you mean when you criticize them;
5. GCs greatly benefit from having typing information and the ability to modify pointers at any time;
6. Good GCs can be implemented in C++ by well disciplined programmers;
7. A GC that respects all limitations and features of C++ will likely miss a lot of the GC world;
8. When answering "is the language garbage collected?", both "Yes" and "No" are good questions. "Undefined behavior" is not.
Title: Re: C++0x and garbage collection
Post by: serprex on November 18, 2010, 07:06:36 pm
Stack variables are allocated all together. Static variables are not faster than the stack. The stack let's different code reuse the same memory. Keeping variables declared at the inner most scope is best, as it gets reused rather than redefined (Register allocation.) Those two kinds of memory don't suffer fragmentation, and don't have to deal with runtime allocation. This limits pretty close to malloc/free, which becomes close to mmap if you use large blocks which are then partitioned by one's own means. See things like arenas and pools. To understand more about malloc/free, read about free lists

Rusky contends that GC helps cache locality, but preferably one designs their objects to maintain locality by explicitly putting them together in memory, which resolves the issue also. That's the issue I have with people saying that JITs are going to bypass C: In Java, the JIT proves that array bounds are safe, and stops checking. In C, the compiler assumes the array bounds are safe, and doesn't check. One has a JIT to deal with, the other doesn't

Also, Rusky, LLVM doesn't have very good mechanisms for GC. They have some hooks to facillitate a shadow stack, but that says nothing of how much C++'s semantics fail to facillitate garbage collection. Hence Boehm admits that it cannot be standard compliant, only implementation friendly

And at that last part of I still don't know enough about it: Neither is ever always the right. Hence D allows the programmer to disable the GC for some things
Title: Re: C++0x and garbage collection
Post by: Rusky on November 19, 2010, 12:34:44 am
The stack is indeed a much better way to allocate memory in terms of cache usage and runtime overhead - memory access is minimized, fragmentation is impossible and there's no runtime mechanisms to keep track of what's allocated beyond a couple of registers that are heavily optimized within the processor. However, when you can't use the stack for whatever reason (dynamic size, data's scope doesn't follow the stack's behavior nicely, small stack, etc.) you need some kind of dynamic allocation.

Using malloc/free for each object is the least efficient, because each object can be a different size, each uses a few bytes to keep track of size/usage and eventually you end up with a fragmented heap without a lot of contiguous free memory. Allocating larger chunks for memory pools and free lists can help alleviate this problem and is often a good idea. However, you can still get bad cache usage with objects from different pools or far apart in their pools.

You can manually put objects together in memory by carefully tuning allocation, but (compacting/generational) garbage collection is sometimes a better way to do this because it automatically puts things allocated close together in the same chunk of memory. It also has the ability to rearrange the heap, so that even when the original assumption that objects allocated together are used together is wrong, objects that stay in use and do get accessed together get placed close together after a collection or two.

The tradeoffs between GC and manual memory management should always be taken into consideration - besides the end result of performance, which is complicated enough, there's also the cost in programmer time. In the real world it's often worth it to have somewhat sub-optimal performance rather than pay programmers more than that extra performance would earn you with extra sales. The same applies to scripting languages - people are willing to trade runtime performance for ease of use because overall the task gets done much more quickly.

---

LLVM supports more than a shadow stack - that's just the simplest way to do it because it's already 90% set up. Their garbage collector documentation (http://llvm.org/docs/GarbageCollection.html) explains what's possible beyond that. The Boehm collector has to work around a lot of limitations in C++'s semantics, but some of these could be removed by compiler support. While it's impossible to find all the pointers in a C++ program because of its weak typing, it is possible to be completely sure about things that stay within the type system.

Finally, JIT has nothing to do with bounds checking. The issue here is that people confuse JIT with all the type checking that most JIT languages include - you could easily write a JIT for C that could improve performance over statically compiled programs. Also, startup time is only a problem for some types of applications. Servers that are designed to stay running forever may start out slower but will just keep getting better and better as they run. Better system designs can help avoid repeating startup delays for normal desktop applications as well. By itself, JIT really can improve performance, not just "keep up."
Title: Re: C++0x and garbage collection
Post by: Josh @ Dreamland on November 19, 2010, 10:48:26 am
Quote
when you can't use the stack for whatever reason (dynamic size
alloca() :troll:
Title: Re: C++0x and garbage collection
Post by: Rusky on November 19, 2010, 11:48:10 am
Which is non-standard, doesn't always work the way you want or the same across platforms and is only one reason. :)
Title: Re: C++0x and garbage collection
Post by: serprex on November 19, 2010, 12:57:23 pm
It's standard in C99
Title: Re: C++0x and garbage collection
Post by: luiscubal on November 19, 2010, 02:11:54 pm
If you need to create a pointer and have it outlive the function, alloca won't work.
That said, it's still a good function to know about.