This is going to be a low-level article, but I guess you already knew that since you landed here, right? I wanted to talk about this mysterious term Mach-O… What is it? How does it work? Really, what is it!? To answer all of that, we’ll have to dig deep and get our hands dirty…

When we build an application in Xcode, a lot of things happen at the same time. One of them is converting all the source code into an executable. This executable contains the machine code that will run on the CPU: the ARM processor on an iOS device, or the Intel processor on a Mac. The format of this executable is called Mach-O.

Well, this was easy and fun, Goodbye! Unless you want to stick around to know about the internals 😈

Mach-O format

Mach-O is a binary stream of bytes grouped into meaningful chunks of data. These chunks carry metadata such as byte order, CPU type, size of the chunk and so on. There are various Mach-O types; the ones you have most likely seen are

- Main app binary (Executable), like Example.app/Example
- Dynamic library (Dylib), like libswiftCore.dylib

Yeah, I know right, lurking right under our nose!

So, Mach-O files are divided into several segments, but before diving into the segments let’s look into something else.

Mach-O Header

Every Mach-O file begins with a header structure that describes the file. It contains information about the file type and the target architecture (armv7, armv7s, i386, etc.).

Just below the header structure are a bunch of load commands which help in the layout and linking of the file. Load commands can specify

- The initial layout of the file in the virtual memory (we’ll come back to this)
- Section names and addresses
- dylibs to be loaded
- “main” function address
- Code Signature

Put together, the Mach-O header is followed by a bunch of load commands which define the addresses of the sections, the main function, and the dependent binaries to be loaded.
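To make the header and load commands a little more concrete, here is a minimal sketch in Python that reads them back from a small, hand-built buffer. The struct layouts and constants follow my reading of Apple's <mach-o/loader.h>. The sample bytes are synthetic, and parse_macho and build_sample are hypothetical helper names, not part of any Apple tool. It only handles thin, little-endian, 64-bit files.

```python
import struct

# Constants as I read them in <mach-o/loader.h> and <mach/machine.h>
MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPES = {0x0100000C: "arm64", 0x01000007: "x86_64"}
FILE_TYPES = {2: "MH_EXECUTE", 6: "MH_DYLIB"}
LC_SEGMENT_64, LC_LOAD_DYLIB, LC_MAIN = 0x19, 0x0C, 0x80000028

HEADER = struct.Struct("<IiiIIIII")        # mach_header_64
SEGMENT = struct.Struct("<II16sQQQQiiII")  # segment_command_64
DYLIB = struct.Struct("<IIIIII")           # dylib_command, path string follows
ENTRY = struct.Struct("<IIQQ")             # entry_point_command


def parse_macho(data):
    """Read the header, then walk ncmds load commands using cmdsize."""
    magic, cputype, _sub, filetype, ncmds, _size, _flags, _res = HEADER.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O")
    info = {"cpu": CPU_TYPES.get(cputype, hex(cputype)),
            "filetype": FILE_TYPES.get(filetype, str(filetype)),
            "segments": [], "dylibs": [], "entryoff": None}
    offset = HEADER.size
    for _ in range(ncmds):
        # Every load command starts with its type and its total size.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            name = SEGMENT.unpack_from(data, offset)[2]
            info["segments"].append(name.rstrip(b"\0").decode())
        elif cmd == LC_LOAD_DYLIB:
            # The path lives inside the command, at name_off from its start.
            name_off = DYLIB.unpack_from(data, offset)[2]
            raw = data[offset + name_off:offset + cmdsize]
            info["dylibs"].append(raw.split(b"\0", 1)[0].decode())
        elif cmd == LC_MAIN:
            info["entryoff"] = ENTRY.unpack_from(data, offset)[2]
        offset += cmdsize  # unknown commands are simply skipped
    return info


def build_sample():
    """Build a tiny synthetic Mach-O prefix (header + 4 load commands)."""
    def segment(name, vmaddr, vmsize):
        return SEGMENT.pack(LC_SEGMENT_64, SEGMENT.size, name,
                            vmaddr, vmsize, 0, 0, 0, 0, 0, 0)

    path = b"/usr/lib/libSystem.B.dylib\0"
    path += b"\0" * (-len(path) % 8)  # pad the command to 8 bytes
    dylib = DYLIB.pack(LC_LOAD_DYLIB, DYLIB.size + len(path),
                       DYLIB.size, 0, 0, 0) + path
    entry = ENTRY.pack(LC_MAIN, ENTRY.size, 0x3F00, 0)
    cmds = [segment(b"__PAGEZERO", 0, 0x100000000),
            segment(b"__TEXT", 0x100000000, 0x4000), dylib, entry]
    body = b"".join(cmds)
    header = HEADER.pack(MH_MAGIC_64, 0x0100000C, 0, 2, len(cmds), len(body), 0, 0)
    return header + body


info = parse_macho(build_sample())
print(info)
```

On a Mac, pointing the same parse_macho function at the bytes of a real thin 64-bit binary should work the same way, since unknown load commands are skipped. Fat (universal) binaries start with a different header and are not handled here.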
The addresses mentioned above are actually offsets from the memory address where your Mach-O is loaded. This is done because the starting memory address is randomised every time your app launches, using a nifty technique called Address Space Layout Randomisation or, as we lovingly call it, ASLR.

What this means is that when your app process starts, you do not know beforehand which address it will start from. Let’s imagine the implications: assume you have a global variable which occupies some memory address, but since you don’t know where your process started from, you cannot possibly determine the memory address of this global variable ahead of time!

As you might have guessed, this is done for security purposes. Otherwise it would become very easy to hack the binary, since everything would have the same address on every launch!

Segments

Let’s look at the individual segments of the Mach-O file.

__PAGEZERO

This is the first segment of an executable file. It has no data inside, so it takes up no space in the file. It reserves the lowest addresses of the process with no access permissions, so that NULL pointer dereferences are caught. You might have faced an EXC_BAD_ACCESS crash; a classic cause is that something in your code tried to access data from here, which is not allowed.

As an aside, this segment can be a good place to hide malicious code 😉

__TEXT

This segment contains executable code and read-only data. It is made read-only to allow the sharing of the segment when it is mapped into memory. This is primarily used with frameworks, bundles and shared libraries. And, since the __TEXT segment is read-only, there are no changes that need to be saved back to the disk. If the kernel needs to free up memory, it will simply discard __TEXT pages and re-read them from disk when needed.

This is why iOS and OS X can cache their dynamic libraries so aggressively.

__DATA

This segment contains writable data (e.g.
globals, static variables, etc.), and because it is writable, the __DATA segment of a framework or other shared library is logically copied for each process linking with the library.

If you have any experience with Swift, you must be familiar with copy-on-write, which essentially means: do not create a copy until the thing being referenced is edited. Similarly, when the __DATA segment is "copied", nothing is really copied until some process modifies a page; that process then receives its own private copy of the page.

__OBJC

This is an optional segment and contains data used by the Objective-C language runtime support library.

__IMPORT

This is also an optional segment and contains symbol stubs and non-lazy pointers to symbols not defined in the executable. This segment is generated only for executables targeted for the IA-32 architecture.

__LINKEDIT

This segment contains raw data for the linker (dyld), like symbol and string tables, compressed dynamic linking info, code signing info, and the indirect symbol table, all of which occupy regions as specified by the load commands.

So, now that we have an understanding of the individual segments, let’s try to look at the bigger picture and see how it all fits together.

The Big Picture: DYLD

Till now we know

- How a Mach-O file is generated, and that its load commands are used to link dependencies in various ways.
- Load commands are used to map the segments into memory.
- Execution of the file begins from the entry point given by LC_MAIN.

Well, this is only information, and this information requires a brain to process it. And this brain is called Dyld!

🤫 Let me tell you a secret! Well, it’s not really a secret.

When you launch your app by tapping the app icon, instead of launching your app the kernel launches dyld!

I know right! This guy is a big deal around here. The kernel will actually load dyld at some random address, and dyld itself has its own __TEXT segment, __DATA segment… well, you get the idea.
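Since everything gets loaded at a random address, an address stored in the file only becomes meaningful once the random offset (usually called the slide) is added to it. Here is a minimal sketch of that arithmetic in Python. The page size, the range of possible slides and the address of the global are all assumptions made up for illustration, and pick_slide only stands in for whatever the kernel really does.

```python
import random

PAGE_SIZE = 0x4000              # assumed 16 KB pages
LINK_TIME_GLOBAL = 0x100008010  # hypothetical address of a global in __DATA


def pick_slide(rng, max_pages=0x1000):
    """Pick a random, page-aligned slide (a stand-in for the kernel's choice)."""
    return rng.randrange(1, max_pages) * PAGE_SIZE


def runtime_address(link_time_address, slide):
    """Where something really lives in this particular launch."""
    return link_time_address + slide


# Three simulated launches: same binary, different slide each time.
for launch in range(3):
    slide = pick_slide(random.Random(launch))
    address = runtime_address(LINK_TIME_GLOBAL, slide)
    print(f"launch {launch}: slide={slide:#x} global at {address:#x}")
```

The global keeps the same distance from the start of the image in every launch; only the starting point moves, which is why the file can get away with storing offsets.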
It's the job of dyld to load and set up all the dependent dylibs for us.

Load dylibs

This is where dyld reads the Mach-O header to find out about the dependent dylibs. It then finds each library file on the file system and parses it.

This process is done recursively because dylib A can depend on dylib B, which can depend on dylib C, so dyld has to resolve the whole graph of each dylib’s dependencies and finally memory map all of their segments into the process.

Now, remember we talked about ASLR and how you cannot know in advance which address will be assigned to the variables in your app. This is something that dyld has to fix using the techniques below.

Rebasing

The __LINKEDIT segment contains the locations of all the pointers that need to be shifted. Dyld will go through all these pointers and shift them based on your application’s start address, that is, by the difference between where the binary was actually loaded and where it was linked to load.

Notice that to do this, dyld has to read and write the data pages, causing those pages to become dirty and triggering a copy-on-write. This is why rebasing is expensive in I/O.

Binding

References to functions in other dylibs, like NSLog, malloc, etc., are fixed using binding. Once dyld loads the dependent libraries, it needs to search their symbol tables and find the implementation of these symbols.

So, there’s actually a string named _NSLog inside your binary that is unresolved, and what dyld will do is look up the symbol tables and fill in the pointers with the addresses of these functions from the dependent libraries.

This is computationally complex and therefore expensive.

Objc Runtime

All Objc class definitions need to be registered. Why? Because you can look up an Objc class from a string by calling NSClassFromString(_:). So, a table of these classes has to be built before the app can launch.
Categories are also added to method lists. What this means is, if you have created a category on UIView and added a bunch of new methods, those new methods will be added to the method list of UIView. The runtime also ensures selectors are unique.

Run Initialisers

- Objc +load methods are called at this point
- C++ static initialisers are run

This happens in a bottom-up fashion, so the libraries your app depends on will be initialised first.

Whew!! After all of this is done, finally, your main() will execute.

And, this is the story behind the elusive Mach-O file.

Summary

- Binaries use the Mach-O format with __TEXT, __DATA and __LINKEDIT segments.
- Dyld needs to parse and load all dynamic library dependencies.
- Dyld needs to fix all pointers, both internal and external (rebase, bind, set up the runtime).
- Run static initialisers and +load methods.
- AND THEN main()

Where to go from here?

To the dark side: https://lowlevelbits.org/parsing-mach-o-files/

If you want to dive even deeper and extract some code from dylibs, check out this article from objc.io.

Even Moar documentation!!

Previously published at https://medium.com/tokopedia-engineering/a-curious-case-of-mach-o-executable-26d5ecadd995

A Curious Case of Mach-O Executable
