Reviewing merge request #878: Debugged a deadlock that was happening shortly after the startup of a comple...
Debugged a deadlock that was happening shortly after the startup of a complex application.
Summary: cf commit log for 91613c1.
More details:
The lockup happened on a laptop with a release build, so the stack traces below were generated with "Debug Diagnostic Tool" and are not complete.
GUI thread:
ntdll!KiFastSystemCallRet
ntdll!ZwWaitForSingleObject+c
ntdll!RtlpWaitOnCriticalSection+155
ntdll!RtlEnterCriticalSection+152
quartz!CAutoLock::CAutoLock+14
quartz!CMapperCache::BreakCacheIfNotBuildingCache+12
quartz!CFilterMapper2::RegisterFilter+7b
devenum!RegisterClassManagerFilter+12b
devenum!CWaveOutClassManager::CreateRegKeys+82
devenum!CClassManagerBase::VerifyRegistryInSync+b7
devenum!CClassManagerBase::CreateClassEnumerator+98
devenum!CCreateSwEnum::CreateClassEnumerator+3f
devenum!CCreateSwEnum::CreateClassEnumerator+19
phonon_ds94!qt_plugin_instance+5ae (should be "Backend::objectDescriptionIndexes")
msvcr80_67360000!free+cd
phonon4!Phonon::GlobalConfig::audioOutputDeviceListFor+32e
QtCore4!QSettings::QSettings+26 (looks bogus)
phonon4!Phonon::GlobalConfig::audioOutputDeviceFor+35
phonon4!Phonon::AudioOutput::setMuted+4c9
phonon4!Phonon::AudioOutput::AudioOutput+88
MediaObject WorkerThread
ntdll!KiFastSystemCallRet
ntdll!ZwWaitForSingleObject+c
kernel32!WaitForSingleObjectEx+be
kernel32!WaitForSingleObject+12
devenum!CClassManagerBase::CreateClassEnumerator+5b
devenum!CCreateSwEnum::CreateClassEnumerator+3f
devenum!CCreateSwEnum::CreateClassEnumerator+19
quartz!CMapperCache::ProcessOneCategory+2a
quartz!CMapperCache::Cache+1ea
quartz!CMapperCache::Refresh+38
quartz!CMapperCache::RegEnumFilterInfo+31
quartz!CEnumRegMonikers::Next+117
quartz!CFilterGraph::NextFilter+146
quartz!CFilterGraph::RenderViaIntermediate+256
quartz!CFilterGraph::RenderRecursively+37
quartz!CFilterGraph::RenderFile+172
phonon_ds94!qt_plugin_instance+e7fa (should be "WorkerThread::handleTask")
Hence the analysis:
* GUI thread is in Backend::objectDescriptionIndexes which calls CreateClassEnumerator (lock B) which calls CMapperCache (lock A)
* MediaObject WorkerThread is doing RenderFile which calls CMapperCache (lock A) which calls CreateClassEnumerator (lock B) => deadlock
And the solution: a lock in phonon over the DirectShow API, given that I don't feel like trying to convince Microsoft that they should fix DirectShow for this use case :-) . Actually the DirectShow documentation doesn't really talk about this API being threadsafe (they only talk about threads when talking about writing custom filters).
About the fix: I hesitated between a static mutex (using "extern" to use it from the other file) or a cleaner C++ solution (class members). I went for the latter, although it makes the patch slightly bigger; hope that's ok.
Also, I'm not 100% sure I caught all instances of calls to the DirectShow API that might end up in CreateClassEnumerator. Maybe a few more lockers are needed, but at least this fixes the deadlock that I was consistently seeing.
Commits that would be merged:
- f23fa54
- 91613c1
Fix deadlock inside DirectShow which apparently doesn't expect two threads calling into it at the same time.
f23fa54-91613c1Comments
We found a way to trigger this more reproduceably than when I was initially debugging it: simply installing ffdshow-tryout from sourceforge.net. That codec pops up a dialog on initialization, so it makes the race happen every single time; the worker thread plays RenderFile while the GUI thread is in the CreateClassEnumerator code, and the deadlock happens.
I admit that I didn’t try a standalone phonon testcase though, but I recommend installing ffdshow-tryout if you are trying to reproduce the deadlock.
Hello David,
I've reworked the locking in 4.6 (mainline). It solved a race condition at the same position. Could you try it please and tell me if the crash still occurs?
My fix is for a complete deadlock, not for a race.
You are referring to commit 507c0ff17beeddfd281a4e978c2612031ce49154 I suppose?
I cannot test the change because it was happening at the customer’s site in a specific environment. If I go back there I’ll definitely test it (meanwhile you could download ffdshow-tryout to give it a try, btw).
But does your fix really prevent audioOutputDeviceListFor from calling DirectShow’s CreateClassEnumerator if RenderFile is currently being used? I don’t see that in the commit, so I don’t think it fixes the problem I was seeing. Your mutexes only lock up the “sound playing” code, not the “enumerate devices” kind of code.
My guess was that maybe you stack trace was broken. I had lots of strange results with the current 4.5 code wrt to calling RenderFile. I’ll try with ffdshow-tryout and tell you if I can reproduce the problem.
hmm, I installed ffdshow-tryout but it doesn’t show the dialog you’re mentioning… Do I need to setup some special option for that to happen?
Broken? The stack trace looks quite valid to me. Backend::objectDescriptionIndexes calls CreateClassEnumerator, doesn’t it? and that’s from a different thread as the MediaObject worker thread.
And the cause for the deadlock is quite visible in the stacktraces: the main thread calls CreateClassEnumerator which ends up in CMapperCache, while the worker thread calls (indirectly) CMapperCache which ends up calling CreateClassEnumerator. Obviously doing this at the same time leads to the typical “reversed mutex order” deadlock, and this is why I added a mutex in phonon, at a higher level so that these calls do not happen at the same time (and this fixed the issue indeed).
The dialog should show whenever playing a sound; the initialization of the ffdshow-tryout dll (when phonon initializes DirectShow) shows the dialog. I'm a bit surprised that you’re not seeing it. I can’t make a testcase though (I don’t have the customer code at hand anymore, this was an on-site consulting trip).
I have submitted a patch slightly different from yours. I hope it will make the problem disappear as I really wasn’t able to reproduce it.


Add a new comment:
Login or create an account to post a comment