Author Topic: C++0x and garbage collection  (Read 2376 times)
Offline (Unknown gender) luiscubal
Posted on: November 14, 2010, 06:06:41 PM
Member
Joined: Jun 2009
Posts: 452

What happens to variables you forgot to free?
Previously, you'd expect to get a memory leak in C++, but in C++0x, according to Wikipedia:

Quote
It is implementation defined whether unreachable dynamically allocated objects are automatically reclaimed.

Of course, considering how permissive C++ is regarding memory, decent garbage collectors are nearly impossible, so we're mostly left with stuff like Boehm's conservative GC.

Note that implementation-defined doesn't mean that all implementations will have a GC, only those whose compiler makers decide so.
On GCC, I'd guess a garbage collector would end up behind some compiler flag.
Offline (Male) Rusky
Reply #1 Posted on: November 15, 2010, 10:51:28 AM

Resident Troll
Joined: Feb 2008
Posts: 961
The idea was to allow implementation of real garbage collection from within the language (possibly through extensions to operator new, etc.), and that idea has been around for a decade at least. However, the committee just keeps putting it off, so they decided to put that wording in rather than do nothing.

My guess is that it was intended to allow compiler writers to experiment so they can standardize on something later - maybe Clang will be able to come up with something using LLVM's GC support?
Offline (Male) RetroX
Reply #2 Posted on: November 15, 2010, 05:51:12 PM

Master of all things Linux
Contributor
Location: US
Joined: Apr 2008
Posts: 1055
I think that GC is a waste of time.  If you leak memory, that's your own fault, and you can use valgrind and a debugger to find it.

The fact that languages like Java that heavily rely on GC are being used for teaching is terrible, and that's probably why my graphics drivers can leak 20 MB by just opening a window.
My Box: Phenom II 3.4GHz X4 | ASUS ATI RadeonHD 5770, 1GB GDDR5 RAM | 1x4GB DDR3 SRAM | Arch Linux, x86_64 (Cube) / Windows 7 x64 (Blob)
Quote from: Fede-lasse
Why do all the pro-Microsoft people have troll avatars? :(
Offline (Male) Josh @ Dreamland
Reply #3 Posted on: November 15, 2010, 09:56:42 PM

Prince of all Goldfish
Developer
Location: Ohio, United States
Joined: Feb 2008
Posts: 2953

Don't give Valgrind too much credit.
Though the idea of a GC irks me, Valgrind is not perfect. It has been difficult to determine which of the ENIGMA leaks Valgrind reports are legitimate, since it gives a definite loss record on all GL Mesa calls and a possible loss record on all of ENIGMA's pointer arithmetic (used for such sinister concoctions as lua_table<>).

So I can't run anything in ENIGMA without an 8MB loss record from one GLX call.

That said, the idea of relying on a GC to clean up after me is disgusting.
"That is the single most cryptic piece of code I have ever seen." -Master PobbleWobble
"I disapprove of what you say, but I will defend to the death your right to say it." -Evelyn Beatrice Hall, Friends of Voltaire
Offline (Unknown gender) luiscubal
Reply #4 Posted on: November 16, 2010, 12:25:29 PM
Some people use GCs for memory leak detection.
Offline (Male) RetroX
Reply #5 Posted on: November 16, 2010, 05:52:06 PM

Quote from: luiscubal
Some people use GCs for memory leak detection.

I like this idea; however, I don't like it as standard practice.
Offline (Male) Rusky
Reply #6 Posted on: November 16, 2010, 09:01:28 PM

Unilaterally condemning garbage collection as "disgusting" is extremely short-sighted. GC has practical benefits beyond ease of use, and while they may not apply to your area of programming they definitely apply to others. GC greatly improves cache usage, improves allocation performance (which in some cases, no matter how hard it is for you to believe, can result in an overall performance increase) and can occasionally improve collection performance. Garbage collection is a useful trade-off, not a crutch for lazy people - it just shifts work around between malloc/free, really. However, beyond it not always being the best idea for games, I'll agree that there are better ways to manage memory in general, like automatic stack allocation.

I'm especially irritated by Retro's hypocritical attitude of "If you leak memory, that's your own fault." You have an incredibly powerful machine sitting in front of you - why not take advantage of it? The trade-off is often worth it, and you do it all the time anyway, with things like type systems (static or dynamic, they automatically choose the right opcodes for your data types), shell scripts (oh my gosh dynamic typing and garbage collection you horrible sloppy pig), your CPU's MMU (Singularity, an operating system written in C#, eliminates the need for this and is much faster despite using garbage collection and "over-use" of classes) and a secure operating system (if you run malicious programs or let stupid people use your computer it's your own fault).

Really, these technologies have uses. There's almost never such a thing as an unconditionally bad technique or tool.
Offline (Male) RetroX
Reply #7 Posted on: November 16, 2010, 10:13:40 PM

Quote from: Rusky
Unilaterally condemning garbage collection as "disgusting" is extremely short-sighted. GC has practical benefits beyond ease of use, and while they may not apply to your area of programming they definitely apply to others. GC greatly improves cache usage, improves allocation performance (which in some cases, no matter how hard it is for you to believe, can result in an overall performance increase) and can occasionally improve collection performance.

You fail to explain how, though.  If there actually is a good reason, I'd like to know what it is.

Quote from: Rusky
I'm especially irritated by Retro's hypocritical attitude of "If you leak memory, that's your own fault." You have an incredibly powerful machine sitting in front of you - why not take advantage of it? The trade-off is often worth it, and you do it all the time anyway, with things like type systems (static or dynamic, they automatically choose the right opcodes for your data types), shell scripts (oh my gosh dynamic typing and garbage collection you horrible sloppy pig), your CPU's MMU (Singularity, an operating system written in C#, eliminates the need for this and is much faster despite using garbage collection and "over-use" of classes) and a secure operating system (if you run malicious programs or let stupid people use your computer it's your own fault).

Really, these technologies have uses. There's almost never such a thing as an unconditionally bad technique or tool.

Yes, I have an incredibly powerful machine.  AMD's drivers leak 20 MB from opening a window.  That's a massive amount of memory, and if there were GC, it would all be solved.

How many calculations does it require to collect 20 MB of memory?  If we assume that all of those MB are in floats or integers, we can pretend that it's one calculation per 4 bytes.  That's 20 * 1024^2 / 4 = 5 * 1024^2 "calculations," or a bit over five million.  Sure, four integers per frame in a game isn't bad for GC, but 20 MB?  There are some cases where GC might be a bit much, and these things add up.  If it's 20 MB per window, and you open multiple windows at once, that number doubles or triples or quadruples.  Now, it takes a few seconds for programs to start when they could start instantaneously.

Good programming is far better than having some garbage collector come and pick up the trash that you leave behind.  I agree that for finding these errors, GC is extremely useful, but not in every case should it be used as common practice.
Offline (Male) Rusky
Reply #8 Posted on: November 16, 2010, 11:05:51 PM

Great job, you have shown a complete and utter lack of knowledge about both GC and non-GC systems.

In this post, I'm talking specifically about generational/compacting GC, which is what most major VMs and runtimes use, but other types often have their own benefits. GC improves cache usage by keeping objects that are used together close by each other in memory so they can be in the same cache lines. You do know what that means, right? GC improves allocation performance because allocation consists of a store and an increment rather than walking a list, removing a node and reinserting a node. GC can improve collection performance because rather than walking a list, removing, merging and inserting a node on every free you follow some pointers and then compact the objects you find in the heap, only when you need more memory - these approaches can both be advantageous depending on the situation.
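To make the "a store and an increment" claim concrete, here is a minimal sketch of bump-pointer allocation, the scheme that the young generation of a compacting collector commonly relies on. The BumpNursery type, its 1 KB size and the 16-byte alignment are assumptions made for illustration, not any real runtime's API.

```cpp
#include <cstddef>

// Hypothetical bump-pointer nursery (illustrative names and sizes).
// Allocating is a bounds check plus an increment of one cursor.
struct BumpNursery {
    static constexpr std::size_t kSize = 1024;  // assumed nursery size
    static constexpr std::size_t kAlign = 16;   // assumed alignment

    alignas(kAlign) unsigned char storage[kSize];
    std::size_t top = 0;  // offset of the next free byte

    // Returns nullptr when the request does not fit. A real collector
    // would run a minor collection at that point instead of failing.
    void* allocate(std::size_t bytes) {
        if (bytes > kSize) return nullptr;  // also rules out overflow below
        std::size_t rounded = (bytes + kAlign - 1) / kAlign * kAlign;
        if (rounded > kSize - top) return nullptr;
        void* result = storage + top;
        top += rounded;
        return result;
    }

    // Once the survivors have been copied elsewhere, the whole nursery
    // is reclaimed in one step, with no per-object free.
    void reset() { top = 0; }
};
```

A free-list allocator, by contrast, has to find a suitable block and unlink it on every allocation, which is the list walk described above.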

Garbage collection doesn't scale with the amount of memory - that would be an insanely brain-dead way to do it. It scales with the number of reachable references - what happens is that when you hit a certain threshold of allocated memory, you follow the pointers in globals and on the stack through to pointers in objects and so on until you have seen everything that's reachable. Everything else on the heap is dead and can be overwritten. Can you see how this is just a redistribution of the work? Rather than leave everything to the programmer, some of the run-time work like managing sizes or reference counts goes into the compiler describing what variables are pointers, some work moves from allocation to freeing, and all the freeing gets combined so costs can be amortized.
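The tracing step described above can be sketched as a plain graph walk. This is a minimal illustration under assumed names (Obj, mark); a real collector would get the reference layout from compiler-provided type information rather than from a vector stored in each object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object layout: each object lists the objects it points to.
struct Obj {
    std::vector<Obj*> refs;
    bool marked = false;
};

// Marks everything reachable from the roots (globals and stack slots)
// and returns how many objects were visited. The work is proportional
// to the reachable objects and references, not to the heap size.
std::size_t mark(const std::vector<Obj*>& roots) {
    std::vector<Obj*> work(roots.begin(), roots.end());
    std::size_t visited = 0;
    while (!work.empty()) {
        Obj* o = work.back();
        work.pop_back();
        if (o == nullptr || o->marked) continue;  // skip nulls and repeats
        o->marked = true;
        ++visited;
        for (Obj* child : o->refs) work.push_back(child);
    }
    return visited;
}
```

Anything left unmarked afterwards is garbage, however many bytes it occupies, and it is never touched by the walk.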

Good programming is understanding the tools you have available and choosing the ones that work the best for the current situation. Dynamically allocated memory - automatically or manually collected - is not always the best tool. When it is, sometimes automatic garbage collection is a better tool than manual memory management. There is virtually never a reason to ban something for every situation or even most situations. GC is useful for more than debugging, whether or not you care to understand how it works and whether or not you consider it lazy.
Offline (Male) RetroX
Reply #9 Posted on: November 17, 2010, 04:26:38 PM

Quote from: Rusky
Great job, you have shown a complete and utter lack of knowledge about both GC and non-GC systems.

Gee, thanks for letting me know.  You might as well have said "You fucking moron, why the fuck do you not know what GC is, what the fuck, you're stupid," because it probably would have said the same thing.

You're terrible at explaining things, and you're a dick when you do it.


On the topic of what you actually said, I honestly don't know anything about how managing memory works, but as far as I know, it's removing unclaimable "garbage" objects.  In other words, if I store a new pointer to an object over an old, and that object isn't pointed to anywhere, it will be deleted.

That's probably wrong, but both you and Wikipedia do an extremely terrible job at explaining it to people who haven't had five years of education on the subject.
Post made November 18, 2010, 08:38:53 AM was deleted at the author's request.
Offline (Unknown gender) luiscubal
Reply #11 Posted on: November 18, 2010, 02:26:11 PM
The argument for GC is the following:

1.
malloc and free are relatively expensive. By reusing the same memory location for multiple purposes, you can save some time.
Code:
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memset */

// Version 1: allocate and free the buffer on every iteration.
int x = 0;
for (int i = 0; i < 1000; ++i) {
   char* array = (char*) malloc(sizeof(char) * 1000);
   memset(array, 0, sizeof(char) * 1000);
   //do something with array.
   x += array[0];
   free(array);
}
//vs
// Version 2: allocate once, reuse the buffer, free once.
int x = 0;
char* array = (char*) malloc(sizeof(char) * 1000);
for (int i = 0; i < 1000; ++i) {
   memset(array, 0, sizeof(char) * 1000);
   //do something with array.
   x += array[0];
}
free(array);
The second alternative is more efficient.
Of course, the example above is a very simple case that can be solved with a minimal code redesign. However, for more complex situations, you'll find yourself needing to use memory pools and custom allocators.
I'll come back to this topic later.
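As a rough illustration of the memory pools mentioned above, here is a minimal fixed-size block pool. FixedPool and its member names are hypothetical, and the sketch assumes single-threaded use and blocks that are all the same size.

```cpp
#include <cstddef>

// Hypothetical fixed-size block pool: every block is carved out of one
// array up front, and free blocks are chained through their own storage.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    union Slot {
        Slot* next;  // valid only while the slot is on the free list
        alignas(std::max_align_t) unsigned char bytes[BlockSize];
    };
    Slot slots_[BlockCount];
    Slot* free_ = nullptr;

public:
    FixedPool() {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Pops a block off the free list, or returns nullptr when exhausted.
    void* allocate() {
        if (free_ == nullptr) return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return s;
    }

    // Pushes a block obtained from allocate() back onto the free list.
    void release(void* p) {
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }
};
```

With something like FixedPool<1000, 8>, the 1000-byte buffers from the example could be handed out and returned in constant time without going through malloc each time.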

2. Code and data locality benefits cache.
Rusky is claiming that garbage collectors can reduce the frequency of cache misses.
The CPU cache makes RAM access substantially faster, and cache misses truly damage program speed.
Again, I'll come back to this topic later.

3. It is as retarded to say "X is universally bad" as it is to say "X is universally good". Not understanding X is not a good reason to claim X is bad.

So, I promised I'd come back to 1. and 2., so here we are.
Regarding memory pools and locality, be aware that GCs like Boehm cannot possibly be expected to do the really awesome GC magic that Rusky mentioned. Perhaps only a very weak version of it.
GCs like Boehm can suffer from fragmentation too
(images from Mono's Compacting GC page).

GCs like Mono's SGen can perform tricks C++ cannot.

Now THAT can improve locality.
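The trick referred to here is presumably compaction: a moving collector slides live objects together and rewrites the references to them. A minimal sketch, assuming a toy heap of fixed-size cells addressed through index handles (Cell and compact are hypothetical names, not Mono's implementation):

```cpp
#include <cstddef>
#include <vector>

// Toy heap cell: a payload plus a liveness flag set by an earlier mark phase.
struct Cell {
    int value;
    bool live;
};

// Slides live cells to the front of the heap and rewrites every handle
// (an index into the heap) to the cell's new position. Every handle is
// assumed to refer to a live cell. Returns the number of live cells.
std::size_t compact(std::vector<Cell>& heap, std::vector<std::size_t>& handles) {
    std::vector<std::size_t> forward(heap.size(), 0);  // old index -> new index
    std::size_t next = 0;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].live) continue;
        forward[i] = next;
        heap[next] = heap[i];
        ++next;
    }
    for (std::size_t& h : handles) h = forward[h];
    heap.resize(next);  // the gaps left by dead cells are gone
    return next;
}
```

After the call, the survivors sit next to each other, which is where the locality benefit comes from. Raw C++ pointers cannot be rewritten behind the program's back like these handles, which is why a collector that has to leave them alone cannot do this.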
Offline (Female) serprex
Reply #12 Posted on: November 18, 2010, 04:53:20 PM
Smooth ER
Developer
Joined: Apr 2008
Posts: 106

Retro,
Firefox has areas where it isn't a memory leak that causes high memory usage, but a large amount of fragmentation. If you can't allocate memory in evenly sized blocks without being so pessimistic that small objects bloat the memory usage, then I guess that's your own fault.
Really, I think the issue is that when people say manual memory management, they think malloc/free. I've been leaning more towards mmap as of late, but then you're outside of standard C. I was recently toying with an explicit memory manager which also had compacting, though the implementation was rather brain-dead due to the half-assed proof-of-concept nature of the project. Garbage collection is a design pattern, and where it isn't supplied by the underlying framework, it'll end up being poorly implemented by the application. There are a number of refcounted APIs in C, but refcounting isn't necessarily the superior mechanism. It's only that refcounting is suitable for explicit use by the programmer, as it doesn't deviate much from malloc/free.
In any case, you should learn to read Rusky's responses at times. He answered your questions, albeit with hostility. He over-evangelizes, but that's what happens when one is forced to play devil's advocate in a crowd.
Offline (Male) RetroX
Reply #13 Posted on: November 18, 2010, 05:59:07 PM

Quote from: serprex
Retro,
Firefox has areas where it isn't a memory leak that causes high memory usage, but a large amount of fragmentation. If you can't allocate memory in evenly sized blocks without being so pessimistic that small objects bloat the memory usage, then I guess that's your own fault.
Really, I think the issue is that when people say manual memory management, they think malloc/free. I've been leaning more towards mmap as of late, but then you're outside of standard C. I was recently toying with an explicit memory manager which also had compacting, though the implementation was rather brain-dead due to the half-assed proof-of-concept nature of the project. Garbage collection is a design pattern, and where it isn't supplied by the underlying framework, it'll end up being poorly implemented by the application. There are a number of refcounted APIs in C, but refcounting isn't necessarily the superior mechanism. It's only that refcounting is suitable for explicit use by the programmer, as it doesn't deviate much from malloc/free.
In any case, you should learn to read Rusky's responses at times. He answered your questions, albeit with hostility. He over-evangelizes, but that's what happens when one is forced to play devil's advocate in a crowd.

I read his post.  It was calling me stupid because I didn't know what I was talking about (which I didn't), then proceeded to explain as if I knew anything.  I don't know how memory is allocated; based upon my extremely simplistic knowledge on the subject, it seemed like GC was a bad idea.  I kind of see why it's a better idea now.

I'll admit that I was being terribly ignorant on the subject.

After looking at luis's example, there's a reason that static variables exist.  Why recreate a variable for every call to a function when you can create it once and reuse it?  The first example is extremely inefficient for regular variables, and I always define variables outside of a block and try to use as few as possible.

I still think that while GC can help memory allocation be a lot faster, it's not always the best option.  Or maybe it is.  I still don't know enough about it.
Offline (Unknown gender) luiscubal
Reply #14 Posted on: November 18, 2010, 06:32:23 PM
@RetroX Multi-threading pretty much screws up static variables way more than stack variables (although in this case, only the pointer is on the stack. The values pointed to by "array" are in the "heap"). Also, recursive functions don't play well with static variables.

Also, remember it was an *example*. I was trying to use simple C to show you the *concept*.
The problem is not my example, because if that was the worst case then GC wouldn't help much.
The *real* problem comes when the "array" variables aren't together in a single function that is easily modified to be more efficient. The problem comes when the variables are spread across the application. In some cases, there is no obvious way to "reuse" array using plain old malloc/free.
The Mono GC image I quoted is a good example of what's wrong with manual allocations.

Note that not all GCs are created equal.
For instance, Boehm can't hope to perform the tricks Mono's GC does, simply because Boehm tries to play nice with standard C++, so pointers should always point to fixed locations. Boehm, therefore, can do very little to help with fragmentation. In addition, Boehm has no typing information, so it can't know the difference between integers and pointers. As a result, it can sometimes think that a value is a pointer to some memory location when it is just some random integer.
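The misidentification described above can be sketched with a toy model. This is not Boehm's actual API or algorithm: Block, looks_like_pointer and count_apparent_pointers are hypothetical names, and the addresses used with them are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A heap block described only by its address range (toy model).
struct Block {
    std::uintptr_t begin;
    std::uintptr_t end;  // one past the last byte
};

// Conservative test: with no type information, any word whose value
// falls inside a heap block has to be treated as a pointer to it.
bool looks_like_pointer(std::uintptr_t word, const std::vector<Block>& heap) {
    for (const Block& b : heap) {
        if (word >= b.begin && word < b.end) return true;
    }
    return false;
}

// Counts the scanned words (say, the contents of a stack) that would
// keep some heap block alive under the conservative test.
std::size_t count_apparent_pointers(const std::vector<std::uintptr_t>& words,
                                    const std::vector<Block>& heap) {
    std::size_t count = 0;
    for (std::uintptr_t w : words) {
        if (looks_like_pointer(w, heap)) ++count;
    }
    return count;
}
```

With one block spanning 0x1000 to 0x1400, a genuine pointer 0x1010 and an unrelated integer that happens to equal 0x1200 are indistinguishable, so both keep the block alive, and neither value may be changed to move the block.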
Reference counters that are simply a struct { int refs; void* ptr } also obviously don't help with the efficiency situation discussed above. Of course, they *are* simple and straightforward to implement, not to mention that with a few locks or atomic integers, it is easy to make them thread-safe.
Some garbage collection algorithms are also based on the concept of "stopping the world", which can really help with implementation in multi-threaded environments, but may unpredictably slow down the application.
Other garbage collection algorithms will intentionally "leak" memory, simply because free is expensive, and there's lots of memory. And then only when memory runs low, they perform a few short major free()s, performing better in some cases but showing high memory consumption in profiling tools.
Also, some garbage collectors kick in at apparently random times, which causes unpredictable performance. In some cases, predictably slow is better than sometimes-fast-other-times-slow, so those GCs tend to be bad in those cases. Some reference counters can perform well if this is a problem.
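For comparison, the simple struct { int refs; void* ptr } counter mentioned above could look roughly like this with an atomic count. RefBox, ref_create, ref_acquire and ref_release are hypothetical names chosen for the sketch.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical reference-counted box around a malloc'd payload.
struct RefBox {
    std::atomic<int> refs;
    void* ptr;
    explicit RefBox(void* p) : refs(1), ptr(p) {}
};

// Allocates a payload and returns a box holding one reference,
// or nullptr if the payload allocation fails.
RefBox* ref_create(std::size_t bytes) {
    void* p = std::malloc(bytes);
    return p ? new RefBox(p) : nullptr;
}

// Takes an additional reference.
void ref_acquire(RefBox* box) {
    box->refs.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference. Returns true when this call dropped the last
// one and therefore freed both the payload and the box.
bool ref_release(RefBox* box) {
    if (box->refs.fetch_sub(1, std::memory_order_acq_rel) != 1) return false;
    std::free(box->ptr);
    delete box;
    return true;
}
```

Reclamation happens at a predictable point, the last ref_release, which is the appeal when pauses matter. A plain counter like this cannot reclaim objects that reference each other in a cycle, though.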

Finally, I can't imagine what scripting languages would be like if there was no garbage collection.
Similarly, I can't see how "implementation-defined" garbage collection fits C++.

You *can* implement good GCs for C++, but it will almost always be a better job for the application itself to provide the garbage collector (so that it can opt out of C++ features in exchange for GC efficiency).
GCs as "do it if you feel like it, but, like, dude, do as you will, really" is not a good idea. People who like GCs will be disgusted that GC is not always enabled, and by how primitive and badly performing it will be. People who don't like GCs will be disgusted by the fact that a GC *might* be attached to their programs. Noobs won't realize their application leaks tons of memory because they heard "someone" say they didn't have to worry about it (conveniently ignoring the part that it isn't always enabled).

In the end, these are the main points:
1. A GC's impact on performance is not clear unless you specify what type of GC you are discussing;
2. There is more beyond malloc/free. Even manual allocation goes way beyond that;
3. GCs have their use cases;
4. Not all GCs are equal. Specify what types of GCs you mean when you criticize them;
5. GCs greatly benefit from having typing information and the ability to modify pointers at any time;
6. Good GCs can be implemented in C++ by well disciplined programmers;
7. A GC that respects all limitations and features of C++ will likely miss a lot of the GC world;
8. When answering "is the language garbage collected?", both "Yes" and "No" are good answers. "Implementation-defined" is not.