Friday, March 28, 2014

Allocating heap memory from within an asynchronous signal handler? Not a good idea on Mac OS.

Just this Wednesday I noticed something strange in Chrome. When I started it with a bunch of tabs refreshing their content, Peerbelt's menu refused to appear. Restarting Chrome did not help much…

A quick glance at Activity Monitor confirmed that the Chrome process hosting Peerbelt's NPAPI plugin was unresponsive. A sample of that process revealed the following:

    +      2764 free  (in libsystem_malloc.dylib) + 277  [0x94504d1d]
    +        2764 szone_free_definite_size  (in libsystem_malloc.dylib) + 673  [0x944fada9]
    +          2764 get_tiny_previous_free_msize  (in libsystem_malloc.dylib) + 47  [0x944fef33]
    +            2764 _sigtramp  (in libsystem_platform.dylib) + 43  [0x92e81deb]
    +              2764 signalHandler(int, __siginfo*, void*)  (in PeerBelt) + 182  [0x7abbd06]  ZombieReaper.cpp:57
    +                2764 _beginthreadex(void*, unsigned int, unsigned int (*)(void*), void*, unsigned int, unsigned long*)  (in PeerBelt) + 113  [0x7abf7a1]  winapi.cpp:2033
    +                  2764 _beginthreadInternal(void*, unsigned int, void (*)(void*), unsigned int (*)(void*), void*, unsigned int, unsigned long*)  (in PeerBelt) + 688  [0x7ac1520]  winapi.cpp:1959
    +                    2764 operator new(unsigned long)  (in libc++abi.dylib) + 39  [0x9b1abf77]
    +                      2764 malloc  (in libsystem_malloc.dylib) + 52  [0x94504f44]
    +                        2764 malloc_zone_malloc  (in libsystem_malloc.dylib) + 75  [0x9450455b]
    +                          2764 ???  (in Google Chrome Framework)  load address 0x84000 + 0x818fb7  [0x89cfb7]
    +                            2764 szone_malloc  (in libsystem_malloc.dylib) + 24  [0x944f7b6a]
    +                              2764 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 102  [0x94502013]
    +                                2764 _OSSpinLockLockSlow  (in libsystem_platform.dylib) + 58  [0x92e816b0]
    +                                  2764 syscall_thread_switch  (in libsystem_kernel.dylib) + 10  [0x9692c082]

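At first glance it looks as if the OS hooked asynchronous signal delivery into free. What the trace actually shows is that the signal arrived while the thread was in the middle of free (inside get_tiny_previous_free_msize), and _sigtramp, the user-space trampoline through which the kernel invokes signal handlers, ran Peerbelt's signalHandler right on top of it. An asynchronous signal can interrupt a thread at any instruction, including halfway through the allocator.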

The signal handler, still running on top of the unfinished free, then tries to allocate more memory with C++ new. Without inspecting the malloc/free source, the trace suggests what happens next: malloc ends up in _OSSpinLockLockSlow, spinning on a lock that free most likely acquired before the signal arrived. But free cannot release it, because it sits lower on the call stack of the very same thread and will only resume once the handler returns. The thread has deadlocked on itself.
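A common way to keep heap work out of a handler is to let the handler only record that the signal arrived, and do the real work later from normal code, where no malloc or free can be sitting half-finished underneath. The sketch below assumes a POSIX system and a single consumer; the names `on_signal`, `install_handler`, `process_pending`, `g_pending_signal` and `g_log` are hypothetical and not taken from Peerbelt's code.

```cpp
#include <signal.h>
#include <string>
#include <vector>

// The handler only writes this flag; normal code reads and clears it.
// volatile sig_atomic_t is the classic type a handler may safely write to.
static volatile sig_atomic_t g_pending_signal = 0;

// Heap-backed state (hypothetical), touched ONLY outside the handler.
static std::vector<std::string> g_log;

// Async signal handler: no new, no malloc, no locks. Just record the signal.
extern "C" void on_signal(int signo) {
    g_pending_signal = signo;
}

// Install on_signal for signo; returns false if sigaction() fails.
bool install_handler(int signo) {
    struct sigaction sa {};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Called from the program's normal flow (e.g. an event-loop tick).
// Allocating here is fine: we are not running on top of an interrupted malloc/free.
bool process_pending() {
    int signo = g_pending_signal;
    if (signo == 0) return false;
    g_pending_signal = 0;
    g_log.push_back("handled signal " + std::to_string(signo));
    return true;
}
```

A second signal arriving between the read and the reset in `process_pending` is coalesced with the first; that is acceptable for a wake-up flag, but not if every signal has to be counted.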

Now, the interesting question is why it works at all most of the time. My speculation is that it comes down to timing: the handler only deadlocks if the signal interrupts the thread while that thread holds the allocator's lock, which is a narrow window. Once that happens, every other thread that touches the same heap ends up waiting on the lock held by the stuck thread, which is exactly what the call stacks of the other threads suggest.

The funny thing is, I knew that starting a thread from a signal handler just to reap a zombie process, with the sole benefit of a clean Activity Monitor, was not such a good idea. But the implementation was so easy, and it sat in such an uninteresting spot functionality-wise, that I gave in. Mark Pincus talked about “death by a thousand cuts” some time back. Grand moral of the story aside: do not allocate memory on the heap from your asynchronous signal handlers! Otherwise, once in a while, there will be consequences.
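For the zombie-reaping case specifically, no extra thread is needed: POSIX lists `waitpid` among the async-signal-safe functions, so a `SIGCHLD` handler can reap children directly. This is a minimal sketch assuming POSIX `sigaction`; `reap_children`, `install_reaper` and `g_reaped` are hypothetical names, not necessarily how Peerbelt's fix works.

```cpp
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>  // fork() and _exit() for whoever spawns the children

// Number of children reaped so far; sig_atomic_t so the handler may update it.
static volatile sig_atomic_t g_reaped = 0;

// SIGCHLD handler that reaps zombies directly instead of starting a thread.
// Only async-signal-safe calls here: waitpid() is on the POSIX list, new/malloc are not.
extern "C" void reap_children(int) {
    int saved_errno = errno;  // don't clobber errno seen by the interrupted code
    int status;
    // Several children may exit before a single SIGCHLD is delivered, so loop
    // until no terminated child is left (WNOHANG: never block in the handler).
    while (waitpid(-1, &status, WNOHANG) > 0) {
        g_reaped = g_reaped + 1;
    }
    errno = saved_errno;
}

// Install reap_children for SIGCHLD; returns false if sigaction() fails.
bool install_reaper() {
    struct sigaction sa {};
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    // SA_NOCLDSTOP: deliver SIGCHLD only for terminated children, not stopped ones.
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

Note that `waitpid(-1, ...)` reaps every child of the process. Inside a host such as Chrome, which manages its own child processes, waiting on the specific pid you spawned is the safer choice.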

A fix is now available and will be released soon, together with a lookup performance improvement, adjusted Bing support on Mac, and other minor UI changes. The auto-updater will pick it up.