Instruments is the best debugging tool you're not using
After years of relying on print statements and Xcode's debugger, I finally went deep on Instruments. Here are the specific workflows that changed how I profile iOS apps, and the three tools every iOS engineer should know cold.
I’ll admit something embarrassing for a mobile engineer who cares a lot about performance: I spent too long avoiding Instruments. The interface is dense, the terminology overlaps in confusing ways (what is a “thread” vs. a “lane” vs. a “track”?), and every tutorial I found either stopped at “here’s how to take a Time Profiler trace” or assumed I already knew what a virtual memory fault was.
Learning Instruments properly often happens during a performance crisis — like a scroll list running at 20fps on an older device when the product team asks questions that are hard to answer. However, an afternoon of real Instruments work can uncover root causes and bring performance back to 60fps. Here are the most valuable lessons.
Three Tools, in Priority Order
Instruments ships with over thirty instruments. Most of them are specialized. Three do the heavy lifting for the vast majority of iOS performance issues:
- Time Profiler — CPU time attribution. What code is consuming CPU cycles, and when?
- Allocations — Memory lifecycle. What’s being allocated, how much, and is anything leaking?
- Core Animation — Render performance. Why are frames dropping?
Start with Time Profiler. If frames are dropping or the UI is janky, the Time Profiler shows you what the main thread was busy doing during those drops. If you see spikes during scrolling, open the call tree and find the hot path.
The Time Profiler Workflow
Run your app on a real device, not the simulator. Simulator performance isn’t representative because it runs on your Mac’s CPU and doesn’t have the same GPU and memory characteristics. Use simulator traces for quick directional checks, but don’t trust them for final performance conclusions.
In Instruments, record for 20-30 seconds covering the problematic interaction. Then:
- Select the time range covering a problematic moment using click-drag on the timeline
- In the bottom panel, select Call Tree
- Enable Hide System Libraries in the bottom-left checkboxes — this filters out Apple frameworks and focuses on your code
- Enable Invert Call Tree — this shows the heaviest functions at the top instead of the entry points
The inverted+filtered call tree is the insight. It directly answers: which function in my code took the most CPU time?
Here’s an example from that 20fps list problem:
4.2s  SomeFeatureViewModel.computeDisplayData()
   3.8s  Array.sorted(by:)
      3.8s  closure #1 in computeDisplayData()
The sort was running inside body evaluation because computeDisplayData() was being called during view re-renders. Moving the sort to run once when the data changed (and caching the result) brought the spike to sub-millisecond.
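A minimal sketch of that fix, using the type names from the trace above (the `Item` fields and the `items` property are assumptions for illustration): sort once when the source data changes and cache the result, instead of sorting inside a method that runs on every view re-render.

```swift
import Foundation

// Hypothetical model type; the real fields are assumptions.
struct Item {
    let rank: Int
    let title: String
}

final class SomeFeatureViewModel {
    // Cached result: computed once per data change, not per render.
    private(set) var displayData: [Item] = []

    var items: [Item] = [] {
        didSet {
            // Sort here, when the data actually changes, instead of
            // inside a computeDisplayData() call that runs during
            // every body evaluation.
            displayData = items.sorted { $0.rank < $1.rank }
        }
    }
}
```

The view then reads the precomputed `displayData` directly, so re-renders cost a property access rather than an O(n log n) sort.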
The Call Tree Options That Actually Matter
Separate by Thread: Shows CPU time per thread. If your main thread is doing work that should be on a background thread, this makes it obvious. I use this when I suspect UI slowness is caused by blocking main thread work.
Separate by State: Shows time while the app is in foreground vs. background. Useful for diagnosing battery drain (your app shouldn’t be doing much when backgrounded).
Top Functions: Change from the default hierarchical view to a flat view of the most expensive functions across all calls. Useful when a function is called from many call sites and you want to see aggregate time without navigating every call chain.
Allocations: Finding What You’re Not Freeing
The Allocations instrument reveals two different problems:
- Memory growth over time — If the “Persistent Bytes” graph keeps trending upward as you repeatedly navigate away from a screen and back, you likely have a leak: the objects allocated for that screen should mostly be deallocated when you leave.
- Allocation spikes — A sudden large allocation during a specific interaction. This might be fine (rendering a large image), or it might indicate inefficiency (allocating thousands of small objects in a rendering loop).
The most useful Allocations feature is Generation Analysis:
- Navigate to a screen
- Click “Mark Generation” in Instruments
- Navigate away from the screen (back to a list, or switch tabs)
- Click “Mark Generation” again
If there are persistent allocations between the two generations — objects that should have been freed when you left the screen — they appear highlighted. Click on any persistent allocation to see the call stack where it was created.
This is one of the best ways to surface potential retain cycles. If you navigate away from ArticleDetailViewController and the Allocations instrument still shows a live ArticleDetailViewController instance in the next generation, something is still holding a reference to it (sometimes intentionally, often not). The allocation call stack tells you which code created it; confirming whether it’s a retain cycle is then standard weak/unowned detective work.
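The classic shape of this leak, sketched with hypothetical names (a long-lived object standing in for a singleton or shared service), is a stored closure that captures the screen strongly:

```swift
import Foundation

// Hypothetical long-lived object that outlives any one screen.
final class EventBus {
    static let shared = EventBus()
    var handler: (() -> Void)?
}

final class DetailScreen {
    let title = "Article"

    func subscribeLeaky() {
        // Retain cycle: EventBus.shared -> handler -> self.
        // The screen can never deallocate while the handler is set,
        // so Generation Analysis shows it as a persistent allocation.
        EventBus.shared.handler = { print(self.title) }
    }

    func subscribeFixed() {
        // [weak self] breaks the cycle; self deallocates normally.
        EventBus.shared.handler = { [weak self] in
            guard let self else { return }
            print(self.title)
        }
    }

    deinit { print("DetailScreen deallocated") }
}
```

With the weak capture, releasing the last strong reference to the screen triggers `deinit` even though the handler is still registered.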
A practical tip: the “Zombie Objects” instrument catches use-after-free bugs by turning deallocated Objective-C objects into “zombies” that log when they’re messaged. It’s invaluable for rare but time-consuming crashes, especially around UIKit/AppKit and other Objective-C runtime boundaries.
Animation Hitches (Formerly Core Animation): Frame Drop Diagnosis
When frames are dropping and Time Profiler shows clean CPU usage, the bottleneck is often on the rendering side. The Animation Hitches instrument (called “Core Animation” in Xcode 13 and earlier) helps you pinpoint where:
Commit Transaction peaks correspond to the main thread preparing and submitting rendering work. If these peaks are consistently high, frame production is under stress and you should investigate what is being committed each frame.
As a rule of thumb, if total frame work exceeds the frame budget (16.67ms at 60Hz, 8.33ms at 120Hz), you’ll see hitches/dropped frames. A tall CA::Transaction::commit spike is a strong signal, not a standalone proof by itself. Use the associated call tree and timeline context to confirm.
Common culprits I diagnose with the Animation Hitches instrument:
Offscreen rendering: Rounded corners, shadows, and some mask operations can force the GPU to render to an offscreen buffer before compositing. On older devices, this can be expensive enough to cause drops. view.layer.shouldRasterize = true (with matching rasterizationScale) can help for mostly static content, but it can also backfire if content changes frequently. Usually the best fix is restructuring the view to avoid expensive offscreen work entirely.
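A sketch of the rasterization option just mentioned, as a small helper (the function name is mine; the two CALayer properties are the real API). The caveat from above applies: this only helps content that rarely changes, because every change invalidates the cached bitmap.

```swift
import QuartzCore

// Cache a mostly static layer's offscreen render (e.g. a card with a
// shadow) so it's rasterized once and composited from the cache,
// instead of re-rendered offscreen every frame.
func enableRasterization(for layer: CALayer, screenScale: CGFloat) {
    layer.shouldRasterize = true
    // Must match the screen scale, or the cached bitmap is drawn at
    // 1x and looks blurry on Retina displays.
    layer.rasterizationScale = screenScale
}
```

On iOS you would pass the window's screen scale; the parameter is kept explicit here so the sketch stays framework-neutral.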
Overdraw on transparent layers: Multiple transparent layers stacked together require the GPU to blend each layer. Ten mostly-transparent layers are more expensive than one fully-opaque layer. The “Color Blended Layers” debug overlay in the Simulator (Edit > Debug > Color Blended Layers) shows this visually — green is single-pass, red is blended overdraw.
Frequent CALayer content changes: If a layer’s contents (the rendered image) changes every frame, you’re pushing large amounts of texture data to the GPU repeatedly. For custom drawing, CATiledLayer or pre-rendering to a UIImage and caching it is almost always faster than redrawing on every frame.
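The render-once-and-cache idea can be sketched generically (everything here is hypothetical scaffolding; in a real app the `Image` parameter would be a `UIImage` produced by `UIGraphicsImageRenderer`):

```swift
import Foundation

// Render expensive content once per unique input, then reuse the
// cached result instead of redrawing it on every frame.
final class RenderCache<Key: Hashable, Image> {
    private var cache: [Key: Image] = [:]
    private(set) var renders = 0  // how many real draws happened

    func image(for key: Key, render: (Key) -> Image) -> Image {
        if let cached = cache[key] { return cached }
        renders += 1           // cache miss: draw once...
        let image = render(key)
        cache[key] = image     // ...and keep it for next frame
        return image
    }
}
```

Asking for the same key twice performs a single render; subsequent frames hit the cache and upload no new texture data.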
Building an Instruments Practice
The engineers I know who are best at performance work all share one habit: they run Instruments regularly, not just during crises. Every time they ship a new feature, they take one Instruments session before marking it done.
This is how you develop intuition. After you’ve seen what a retain cycle looks like in the Allocations instrument twenty times, you recognize the pattern immediately. After you’ve correlated Core Animation spikes with specific rendering operations enough times, you start predicting performance problems before they’re visible to users.
The other habit: profile on your oldest supported device, not your development machine. An iPhone 16 Pro makes everything look fast. An iPhone 12 — still a common device for many apps’ user bases — is an honest benchmark.
The best moment I’ve had with Instruments was finding a memory leak that had been in a codebase for two years. Previous debugging sessions had missed it because it was slow — the leaked object was just 128KB, so it took dozens of navigations to matter. Generation Analysis caught it on the first try: a singleton that held a reference to a closure that captured self strongly. Twenty minutes to find, a one-line capture-list fix ([weak self]). Two years of mystery fixed.
That’s why you learn Instruments.