The Mystery of Concurrent GC



There’s been a discussion going on within DevelopMentor for a couple weeks regarding concurrent GC and when it really applies.


The idea behind the concurrent collector is to do as much of the GC while the UI thread continues to process UI stuff and then only interrupt the application threads when memory is being shuffled around and fixups are occurring. This provides for more responsive UI applications at the expense of slower collections and a higher memory utilization.


The instructors who are involved with .NET were discussing this because of a discrepancy in various reliable sources – First, Maoni Stephens, the ultimate authority on all things .NET GC, stated in her blog entries that concurrent GC was the default behavior for the workstation version of GC (see http://blogs.msdn.com/maoni for more information on this).


This is well known, and now well documented in various places. However, Jeff Richter seems to say in his book “CLR via C#” that concurrent collections occur only on multiprocessor machines. Several other authors back this up (notably Stephen Pratschner in “Customizing the .NET Framework Common Language Runtime” and Joe Duffy in his “Professional .NET Framework 2.0” book; both are excellent btw).


So, we had some differing opinions from people in the “know”. Our Effective .NET and Essential .NET courses were written around Maoni’s blogs and so the diagram we presented showed concurrent collections even on single-processor machines since she didn’t explicitly say it required multiple processors to turn it on. One of the guys noticed this and said “Wait! That’s wrong!”


We tried to actually see the concurrent collection in action but it turns out that it’s actually quite difficult to get this to happen because concurrent collections only occur on Gen2, which for a well-written application shouldn’t occur that often. To add to the complexity, the CLR attempts to optimize this behavior – just because it can do the GC concurrently, it might not.  Finally, the thread which is used for this collection is actually created and destroy as necessary so it doesn’t exist most of the time.


So, to settle the argument, Jason Whittington and I spent some time spelunking into the CLR this past week to see if we could spot the concurrent collection in action. With a little WinDBG and some symbols, I think we’ve put the question to rest once and for all (at least for us) :-) .  If you scan the GC symbols in mscorwks.dll, you’ll find a treasure trove of information; one of the things that caught my eye was this:

0:000> x mscorwks!WKS::gcheap::*
79f8c5dd mscorwks!WKS::GCHeap::Relocate = <no type information>
7a0d09d9 mscorwks!WKS::GCHeap::ValidateObjectMember = <no type information>
7a088e2a mscorwks!WKS::GCHeap::IsConcurrentGCInProgress = <no type information>

Disassembling that method showed me that it checks a flag in memory —
0:000> u mscorwks!WKS::GCHeap::IsConcurrentGCInProgress
mscorwks!WKS::GCHeap::IsConcurrentGCInProgress:
7a088e2a a1701b387a mov eax,dword ptr [mscorwks!WKS::gc_heap::settings+0x10 (7a381b70)]
7a088e2f c3 ret
I tried putting a breakpoint at the end of the GC process and dumping that flag out when it was a Gen2 collection (the breakpoint itself stolen shamelessly from Maoni’s MSDN article in November 2006):
0:000> bp mscorwks!WKS::GCHeap::RestartEE “j 
(dwo(mscorwks!WKS::GCHeap::GcCondemnedGeneration)==2)
‘dd mscorwks!WKS::gc_heap::settings+0×10 L1′;’g'”
But I didn’t get much success seeing it set to “1″ – I suspect that it requires a more complex application than I was using to cause the CLR to decide a concurrent collection is worth the effort. So I decided this was a dead end and started looking to see where the GC process determines that concurrent collections are allowed at all. I figured GCHeap::Initialize sounded like a good spot —
0:000> u mscorwks!WKS::GCHeap::Initialize
mscorwks!WKS::GCHeap::Initialize:
79ed6924 a14412387a mov eax,dword ptr [mscorwks!g_pConfig (7a381244)]
79ed6929 8b4058 mov eax,dword ptr [eax+58h]
…..
79ed697b 51 push ecx
79ed697c e823040000 call mscorwks!WKS::gc_heap::initialize_gc (79ed6da4)
79ed6981 85c0 test eax,eax
Going into that function yielded what I was looking for —
0:000> u mscorwks!WKS::gc_heap::initialize_gc
mscorwks!WKS::gc_heap::initialize_gc:
79ed6da4 56 push esi
79ed6da5 e80cf1ffff call mscorwks!WKS::write_watch_api_supported (79ed5eb6)
79ed6daa 33f6 xor esi,esi

79ed6dc6 c70564ce387a01000000 mov dword ptr [mscorwks!WKS::gc_heap::gc_can_use_concurrent (7a38ce64)],1
Looking at that particular memory location, I tried various configurations to make sure I had the right spot. First, I did a simple console app with no config file and dumped out the location on a multi-processor box:
0:000> dd mscorwks!WKS::gc_heap::gc_can_use_concurrent L1
7a38ce64 <b>00000001
b>
Next, I set the application up in server mode (also on a multi-processor box):
<xml version=”1.0″ encoding=”utf-8″ ?>
<configuration>
<runtime>
<gcServer enabled=”true” />
</
runtime>
</configuration>
0:000> dd mscorwks!WKS::gc_heap::gc_can_use_concurrent L1
7a38ce64 <b>00000000
b>
Showing that it was now off.. next I tried a VM session which emulates a single processor:
0:000> dd mscorwks!WKS::gc_heap::gc_can_use_concurrent L1
7a38ce64 <b>00000001
b>

It appears that the CLR is capable of doing concurrent collections on a single processor! My final test was to try turning it off altogether via a configuration switch:

<xml version=”1.0″ encoding=”utf-8″ ?>
<configuration>
<runtime>
<gcConcurrent enabled=”false” />
</
runtime>
</configuration>
0:000> dd mscorwks!WKS::gc_heap::gc_can_use_concurrent L1
7a38ce64 <b>00000000
b>

As a final note on this, Jason asked Maoni directly and her response was that concurrent GC does not depend on the number of processors – just as we are seeing above. So, apparently several primary sources are actually incorrect on this behavior. It just goes to show how complex the overall GC process really is — simple in concept, but boy there are a lot of moving parts!

SOS: finding the method bound to an EventHandler with WinDbg.

I was preparing a sample memory leak application for an Advanced C# class at Microsoft this past week and debugging through it with SOS.DLL (“Son of Strike”). My prepared application was an ASP.NET application that would leak memory by holding references to the page objects after they had completed their work. I did this by having the page hook up an event handler to a global event and then never remove the handler.




This, of course, is bad form because the System.Web.UI.Page object is intended to be a transient object – it goes away at the end of the request – in production code, I would really bind the event to a handler in global.asax instead. But as I said, this was a sample.




So, I was debugging through it prior to the class just to make sure it exhibited the behavior I was looking for. I run the web page a few times, noted my working set going up and never coming back down. Next, I used ADPLUS.VBS to take a hang dump of the process. I then loaded this dump up into WinDBG and started poking around. First, I looked at the heap and sure enough I saw a bunch of page objects:


0:000> .load sos
0:000> !DumpHeap -stat
total 36955 objects
Statistics:
      MT    Count    TotalSize Class Name
7b4ecd7c        1           12 System.Windows.Forms.ButtonInternal.ButtonPopupAdapter
7b481f00        1           12 System.Windows.Forms.LinkLabel+LinkComparer
7b475ca8        1           12 System.Windows.Forms.FormCollection
7b474f8c        1           12 System.Windows.Forms.Layout.DefaultLayout
7b4749e0        1           12 System.Windows.Forms.ClientUtils+WeakRefCollection
7b473ca8        1           12 System.Windows.Forms.Layout.ArrangedElementCollection
7a755834        1           12 System.Diagnostics.PerformanceCounterCategoryType
7a753394        1           12 System.Diagnostics.TraceOptions
7a71a710        1           12 System.Net.TimeoutValidator
.......
00166030      891       169744      Free
054d24d4     3128       187680 System.Web.UI.LiteralControl
0548cbd4      519       197220 ASP.default_aspx
791242ec     1545       297960 System.Collections.Hashtable+bucket[]
79124670     1185      1090500 System.Char[]
79124228    11961      1279380 System.Object[]
790fa3e0    19149      1561392 System.String
Total 110069 objects

So, next I used DumpHeap to just look at this specific type by giving it a metadata token:

0:000> !DumpHeap -mt 0548cbd4        
 Address       MT     Size
.....     
01854ff0 0548cbd4      380     
01860130 0548cbd4      380     
0186b2b4 0548cbd4      380     
018773f8 0548cbd4      380     
01882538 0548cbd4      380     
0188d6bc 0548cbd4      380     
01898840 0548cbd4      380     
018a39c4 0548cbd4      380     
018aeb48 0548cbd4      380     
total 519 objects
Statistics:
      MT    Count    TotalSize Class Name
0548cbd4      519       197220 ASP.default_aspx
Total 519 objects

I then used the very cool GCRoot command to determine why a page instance was still rooted and therefore not collectable.


Note: it appears that GCRoot doesn’t work well inside VS.NET 2005 – apparently the SOS debugging extension is using some debugger API which isn’t fully supported in VS.NET, so you need to familiarize yourself with WinDBG to do this.



0:000> !gcroot 018aeb48 
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 3a8
Scan Thread 2 OSTHread e8
Scan Thread 3 OSTHread 1a8
Scan Thread 6 OSTHread 7d4
Scan Thread 7 OSTHread 2b4
Scan Thread 8 OSTHread fdc
Scan Thread 9 OSTHread eac
DOMAIN(001E5E08):HANDLE(Pinned):12312f0:Root:0226c498(System.Object[])->
018af940(System.EventHandler)->
0186c0ac(System.Object[])->
018af920(System.EventHandler)->
018aeb48(ASP.default_aspx)

With this output, I can tell that my default_aspx object is being kept alive through an EventHandler as I expected. The interesting thing about this output is I cannot tell which event handler is keeping it alive – i.e. there is nothing in this root list that points to a specific object holding it other than a object[]. That essentially means this is a static event and there isn’t an actual object around on the heap for it. This just makes the debugging exercise more interesting – after all if it were an instance event then I would see the class name at the top of the root list and we could stop right here.




So, my next step is to dump the event handler to try to identify what method it is wrapping – I could then search the code for this method and find out where it is being bound. I dump out the EventHandler object:

0:000> !do 018af920
Name: System.EventHandler
MethodTable: 7910d61c
EEClass: 790c3a7c
Size: 32(0x20) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790f9c18  40000f9        4        System.Object  0 instance 018aeb48 _target
79109208  40000fa        8 ...ection.MethodBase  0 instance 00000000 _methodBase
790fe160  40000fb        c        System.IntPtr  0 instance 88962888 _methodPtr
790fe160  40000fc       10        System.IntPtr  0 instance        0 _methodPtrAux
790f9c18  4000106       14        System.Object  0 instance 00000000 _invocationList
790fe160  4000107       18        System.IntPtr  0 instance        0 _invocationCount

From this I get the internals of the EventHandler object. It gives me some specific information:

  • _target is the object the delegate is holding onto – my default_aspx in this case. The value better match up with my original object!
  • _methodBase is used for dynamic code and is null here.
  • _methodPtr is the method associated with the instance, or possibly a dynamically generated thunk for static methods. This is what I’m interested in.
  • _methodPtrAux is used for static methods; this holds the method descriptor and the _methodPtr is a dynamically generated block of code to remove the this reference. Noting that this is null, I can infer that this is an instance method bound to the delegate.
  • _invocationList and _invocationCount are used for Multicast Delegates — these are both zero indicating that this is a single delegate and there is no chain to follow.

My next step is to try to get a valid method name from the _methodPtr so I convert that to hex and pass it into IP2MD.

0:000> !ip2md 0n88962888 
Failed to request MethodData, not in JIT code range

This is a pretty common error and simply means that the method may not have been JITted yet, or may be a dynamic block of code which never went through the JIT compiler. In .NET 1.1 we could start dumping method descriptors and trying to match up the address (DumpClass -md) for this class and it’s base class.




However, under .NET 2.0 this rarely works anymore – the address doesn’t appear to ever match up to a valid descriptor. However, it clearly is part of the managed heap due to it’s address. So, failing to locate this address, I tried disassembling the code:

0:000> !u 0n88962888 
Unmanaged code
054d7748 e862289b74      call    mscorwks!LogHelp_TerminateOnAssert+0x3f5f (79e89faf)
054d774d 5e              pop     esi
054d774e cc              int     3
054d774f cc              int     3
054d7750 38c8            cmp     al,cl
054d7752 48              dec     eax
054d7753 05a0774d05      add     eax,54D77A0h
054d7758 0100            add     dword ptr [eax],eax
054d775a 0011            add     byte ptr [ecx],dl
054d775c 0000            add     byte ptr [eax],al

Humph. This doesn’t even look like valid code to me.. this looks like random data, so I dumped it out:

0:000> dd 0n88962888 
054d7748  9b2862e8 cccc5e74 0548c838 054d77a0
054d7758  11000001 90000000 054d77a0 11000002
054d7768  90000004 00000000 054d77a0 00000000

The third and fourth DWORD look interesting because they appear to fall in the managed heap as well — so I began to dump them out trying to figure out what they were. I found that the first value is a method descriptor:

0:000> !dumpmd 0548c838 
Method Name: _Default.OnDatabaseHasChanged(System.Object, System.EventArgs)
Class: 054ab574
MethodTable: 0548c86c
mdToken: 06000004
Module: 0548c35c
IsJitted: no
m_CodeOrIL: ffffffff

It is the real method bound to the delegate instance. The other DWORD appears to be an metadata reference to the event owner itself:

0:000> !dumpmt 054d77a0
EEClass: 0551940c
Module: 048ac9ec
Name: DatabaseMonitor
mdToken: 02000002  (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\leakypage\2a399ab5\b1e04c63\App_Code.onwg1zqj.dll)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 8

From here I can see the module that this code is defined in (the dynamically generated App_Code directory) and the name (DatabaseMonitor). This gives me enough information to stop here and begin looking at the code itself – specifically where default.aspx binds it’s OnDatabaseHasChanged method to the DatabaseMonitor static class defined somewhere in the App_Code directory. If I didn’t have the source code available, I could locate the module and then save it out to a file through the savemodule command:

0:000> lm m App_Code_onwg1zqj
start    end        module name
04de0000 04de8000   App_Code_onwg1zqj C (no symbols)           
0:000> !savemodule 04de0000 c:\appcode.dll
3 sections in file
section 0 - VA=2000, VASize=504, FileAddr=200, FileSize=600
section 1 - VA=4000, VASize=2c8, FileAddr=800, FileSize=400
section 2 - VA=6000, VASize=c, FileAddr=c00, FileSize=200

I could then ILDasm or Reflector the generated assembly and look for my bug from that.



I love SOS and WinDBG.. if only they could match up managed source code to the image I’d abandon VS.NET as a debugger altogether..

Debugging the underlying Telephony calls

So, a question was asked “How do I determine what’s happening in the TAPI3 wrapper”?  The answer is you turn on the internal trace source — ITapi3 was built with a build in tracing facility to tell you when it had any underlying interface or COM failures and it’s easy to activate.  First, add an Application Configuration File to your project.  Open that file and add the following lines:


<?xml version=1.0 encoding=utf-8 ?>
<
configuration>
   <
system.diagnostics>
      <
sources>
         <
source name=ITapiTrace switchName=tapiSwitch switchType=System.Diagnostics.SourceSwitch>
            <
listeners>
               
<add name=MyTraceLog type=System.Diagnostics.TextWriterTraceListener initializeData=MyTrace.txt />
            </
listeners>
         </
source>
      </
sources>
      <
switches>
         <
add name=tapiSwitch value=All />
      </
switches>
   </
system.diagnostics>
</
configuration>


This will create a file called “MyTrace.txt” in your working directory.  The important line is the source tag which identifies the internal TraceSource object used by the ITapi3 library.  Inside this file will be the internal TAPI3 calls being made for your application.  As an example, the following trace shows me that several underlying COM errors occurred in the running of a simple TAPI3 application — it was unable to retrieve the ITTerminal interface from the ITAddressEvent interface (which actually isn’t really an error), failed to open the line (because Unimodem won’t allow the media type VOICE to be passed for my modem), and failed to set the play list for this MSP – [0x80040216] is actually a DirectShow error [VFW_E_NOT_FOUND].


ITapiTrace Verbose: 0 : Creating ITTAPI instance
ITapiTrace Verbose: 0 : Hooking up connection sink to ITTAPI interface
ITapiTrace Information: 0 : ITTapi::put_EventFilter(0x8001F) hr=0×0
ITapiTrace Error: 0 : COM Hresult 0×80040004 “The MEDIATYPE passed in to this method was invalid.” generated 
   at JulMar.Tapi3.TapiException.ThrowExceptionForHR(Int32 hr)
   at JulMar.Tapi3.TTapi.RegisterCallNotifications(ITAddress* pitf, Int16 vbMonitor, Int16 vbOwner, Int32 supportedMediaTypes)
   at JulMar.Tapi3.TAddress.Open(TAPIMEDIATYPES supportedMediaTypes)
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 5, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 4, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 3, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 2, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 1, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 0, Terminal=
ITapiTrace Verbose: 0 : Processing TapiCallNotificationEventArgs: Event=CNE_OWNER, Call=TCall: 171360625 CS_OFFERING
ITapiTrace Verbose: 0 : Processing TapiCallStateEventArgs: Call=TCall: 171360625 CS_OFFERING, State=CS_OFFERING, Cause=CEC_NONE
ITapiTrace Error: 0 : COM Hresult 0×80040216 “” generated 
   at JulMar.Tapi3.TapiException.ThrowExceptionForHR(Int32 hr)
   at JulMar.Tapi3.TTerminal.set_MediaPlayList(String[] fileList)
   at AnsMachine.AutoAttendantForm.AnswerCall()
   at AnsMachine.AutoAttendantForm.OnCallState(Object sender, TapiCallStateEventArgs e)
ITapiTrace Verbose: 0 : Processing TapiCallStateEventArgs: Call=TCall: 171360625 CS_DISCONNECTED, State=CS_DISCONNECTED, Cause=CEC_DISCONNECT_NORMAL
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 5, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 4, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 3, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 2, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 1, Terminal=
ITapiTrace Error: 0 : ITAddressEvent::get_Terminal failed hr=0×80040055
ITapiTrace Verbose: 0 : Processing TapiAddressChangedEventArgs: Evt=AE_RINGING, Address=DSSP Line #1 – Address 0, Terminal=
ITapiTrace Error: 0 : ITTapi::Shutdown hr=0×0