Have you ever wondered what to do when the very tool you use to predict and handle crashes, Crashlytics, encounters a problem itself? You might think it's an impasse, but don't worry: we will do some detective work in this post. I came across a unique deadlock within Firebase Crashlytics' urgent mode. After some deep digging, I found an unexpected yet efficient solution, drawing inspiration from an unlikely place: XCTest's "expectation" implementation.

Firebase The Final Frontier

Let's begin by revealing what the "urgent" mode is. We spend countless hours testing and fixing bugs before deployment. But then something unexpected happens, and your app crashes upon launch! Not a single request can be sent to inform you about that incident. So how do you learn the reason for the crash if it was not reproducible?

Firebase Crashlytics comes to the rescue! It has a feature that detects a crash during app startup. If that happens, Crashlytics will pause Main thread initialization on the next launch to prevent the app from crashing again right away; the crash info will hopefully be sent to the server before the crash happens again. The name of that feature is "urgent mode."

Discovering The Culprit

Let's jump back to the issue at hand. I observed my app was taking an unusually long time to launch. To dig into this, I used lldb to pause my app and examined the issue in detail. As I went through the stack, it didn't take long to spot the culprit: Firebase Crashlytics was interrupting the launch process.

The function regenerateInstallIDIfNeededWithBlock had appeared on the Main thread. This was odd because if you use a symbolic breakpoint, you'll notice that regenerateInstallIDIfNeededWithBlock is normally invoked from a background thread, not the Main thread. This unusual shift was a clear red flag that the expected process flow was off.

Use the source, Luke

Now, let's unravel this deadlock situation.
A close examination reveals that regenerateInstallID is preceded by prepareAndSubmitReport, which is itself preceded by processExistingActiveReportPath. Let's dive into the code to understand it better.

- (void)processExistingActiveReportPath:(NSString *)path
                    dataCollectionToken:(FIRCLSDataCollectionToken *)dataCollectionToken
                               asUrgent:(BOOL)urgent {
  FIRCLSInternalReport *report = [FIRCLSInternalReport reportWithPath:path];

  if (![report hasAnyEvents]) {
    // call is scheduled to the background queue
    [self.operationQueue addOperationWithBlock:^{
      [self.fileManager removeItemAtPath:path];
    }];

    return;
  }

  if (urgent && [dataCollectionToken isValid]) {
    // called from the Main thread
    [self.reportUploader prepareAndSubmitReport:report
                            dataCollectionToken:dataCollectionToken
                                       asUrgent:urgent
                                 withProcessing:YES];
    return;
  }

  // ... (remainder of the method not shown)
}

The "urgent" parameter determines whether the code will run in the background or on the Main thread. Submitting a report from the Main thread seems like expected behavior. But why does it halt? regenerateInstallID is waiting for the semaphore to signal, which should occur when [self.installations installationIDWithCompletion] is completed. The code of regenerateInstallID looks like this (for the sake of brevity, the code is simplified):

- (void)regenerateInstallID {
  dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);

  // This runs Completion async, so wait a reasonable amount of time for it to finish.
  [self.installations
      installationIDWithCompletion:^(void) {
        dispatch_semaphore_signal(semaphore);
      }];

  intptr_t result = dispatch_semaphore_wait(
      semaphore, dispatch_time(DISPATCH_TIME_NOW, FIRCLSInstallationsWaitTime));
}

To figure out why the completion does not fire, I dug down the chain of calls to installationIDWithCompletion and did not notice any path that could ignore the completion. The real issue revealed itself when I noticed the completion wrapped in a FBLPromise.then {} block. This block is dispatched asynchronously on the Main thread, as shown here:

@implementation FBLPromise (ThenAdditions)

- (FBLPromise *)then:(FBLPromiseThenWorkBlock)work {
  // Where defaultDispatchQueue is gFBLPromiseDefaultDispatchQueue by default
  return [self onQueue:FBLPromise.defaultDispatchQueue then:work];
}

@end

static dispatch_queue_t gFBLPromiseDefaultDispatchQueue;

+ (void)initialize {
  if (self == [FBLPromise class]) {
    gFBLPromiseDefaultDispatchQueue = dispatch_get_main_queue();
  }
}

So, the deadlock essentially boils down to this: a semaphore is waiting on the Main thread for a signal from the completion handler to release it, but the completion handler itself is stuck, waiting for the Main thread to run the block submitted with dispatch_async. This circular dependency was causing our app launch to stall.

Searching for the optimal solution

So, what options are we left with?

We could pass a non-main queue to the promise when we wait for completion on the Main thread. However, this approach would require proposing a new interface to FBLPromise.

We could alter the default queue for all promises. This, however, is a risky move that would affect every call in the SDK.

With my preference for containing bug fixes in their local context to avoid introducing new bugs, I chose not to tweak FBLPromise. Instead, I looked for a solution that would be minimal and confined to this particular case.

If only we could execute an async callback on the Main thread while simultaneously waiting on it... Sounds familiar? Well, it should! We do have this capability in XCTest via waitForExpectations. Here's an example:

// This test will pass
func testExample() throws {
	let testExpectation = expectation(description: "")
	DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
		testExpectation.fulfill()
	}
	assert(Thread.isMainThread == true)
	waitForExpectations(timeout: .infinity)
}

Intrigued, I delved deeper into the XCTest framework's source code to understand how it does that trick. Here's the related piece of code:

func primitiveWait(using runLoop: RunLoop, duration timeout: TimeInterval) {
	let timeIntervalToRun = min(0.1, timeout)

	runLoop.run(mode: .default, before: Date(timeIntervalSinceNow: timeIntervalToRun))
}

Surprisingly, I discovered we could handle dispatched callbacks on the current thread using a nested RunLoop spinner. This seemed like a promising way out of our deadlock.

The Fix

To address this deadlock, the code was adjusted to spin the run loop instead of waiting on the semaphore when running on the main thread. Spinning the run loop lets the block submitted with dispatch_async execute on the main thread while we wait, so the wait no longer blocks the completion it depends on.

- (void)regenerateInstallID {
	dispatch_semaphore_t semaphore = nil;

	// Written from the completion block, so it needs __block storage.
	__block bool completed = false;
	bool isMainThread = NSThread.isMainThread;
	if (!isMainThread) {
	  semaphore = dispatch_semaphore_create(0);
	}

	[self.installations
		installationIDWithCompletion:^(void) {
		NSAssert(NSThread.isMainThread, @"We expect to get a completion on the main thread");
		completed = true;
		if (!isMainThread) {
		  dispatch_semaphore_signal(semaphore);
		}
	}];

	intptr_t result = 0;
	if (isMainThread) {
	  NSDate *deadline =
		  [NSDate dateWithTimeIntervalSinceNow:FIRCLSInstallationsWaitTime / NSEC_PER_SEC];
	  while (!completed) {
		NSDate *now = [[NSDate alloc] init];
		if ([now timeIntervalSinceDate:deadline] > 0) {
		  break;
		}
		[[NSRunLoop mainRunLoop] runMode:NSDefaultRunLoopMode beforeDate:deadline];
	  }
	  if (!completed) {
		result = -1;
	  }
	} else {  // not on the main thread
	  result = dispatch_semaphore_wait(semaphore,
									   dispatch_time(DISPATCH_TIME_NOW, FIRCLSInstallationsWaitTime));
	}
}

Although the proposed solution worked, the maintainers of the Firebase SDK discovered an even more elegant and streamlined solution. They found that calling regenerateInstallID was not required. The most straightforward fix is the most effective, sidestepping the need for complex or "big-brained" solutions. I want to highlight the importance of constantly refining our solutions to focus on simplicity and efficiency in our code.

Final Thoughts

Understanding and preventing deadlocks is key to keeping your app responsive. Tools like run loops, locks, and semaphores can help manage tasks across multiple threads, but they can also make things complex and cause deadlocks if not used correctly. When using these tools, watch for race conditions and circular waits like the one described here. Keep your code simple, always balance semaphore waits with signals, and try not to hold locks during lengthy tasks. Applying these concepts correctly can help your app stay responsive and provide a smooth user experience.
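
To make the two waiting strategies concrete, here is a minimal Swift sketch that contrasts them. It is not the Crashlytics code: fetchValue, waitWithSemaphore, and waitWithRunLoop are hypothetical names, and fetchValue is an assumed stand-in for any API that always delivers its completion on the main queue, the way the promise above does by default. It is meant to be run as a plain script on the main thread.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for an API that always calls back on the main queue.
func fetchValue(completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async {
        DispatchQueue.main.async { completion(42) }
    }
}

// Semaphore strategy: blocks the calling thread. When the caller is the main
// thread, the completion cannot run while we wait, so the wait can only time out.
func waitWithSemaphore(timeout: TimeInterval) -> Int? {
    let semaphore = DispatchSemaphore(value: 0)
    var value: Int?
    fetchValue { result in
        value = result
        semaphore.signal()
    }
    let outcome = semaphore.wait(timeout: .now() + timeout)
    return outcome == .success ? value : nil
}

// Run loop strategy: keeps servicing the main run loop while waiting, so blocks
// dispatched to the main queue still get a chance to execute.
func waitWithRunLoop(timeout: TimeInterval) -> Int? {
    precondition(Thread.isMainThread, "This sketch spins the main run loop")
    var value: Int?
    fetchValue { result in value = result }

    let deadline = Date(timeIntervalSinceNow: timeout)
    while value == nil && Date() < deadline {
        // Run in short slices so the completion flag is rechecked regularly.
        _ = RunLoop.main.run(mode: .default,
                             before: Date(timeIntervalSinceNow: 0.05))
    }
    return value
}

// Both calls are made from the main thread.
let blocked = waitWithSemaphore(timeout: 0.2)
let spun = waitWithRunLoop(timeout: 2.0)
print("semaphore:", blocked as Any, "run loop:", spun as Any)
```

On the main thread, the semaphore variant is expected to give up with nil, because the completion stays queued behind the blocked thread, while the run loop variant should receive the value. The short slices mirror the XCTest snippet shown earlier; the trade-off is that a nested run loop may also run unrelated main-queue work while you wait, which is one more reason the simpler upstream fix is preferable.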


This story is an opinion piece based on the author’s POV and does not necessarily reflect the views of HackerNoon.

Unconventional Deadlock Fix Inspired by XCTest's "Expectation"
