I Used mitmproxy to Read My App's AI System Prompt. Then I Built the Defense.

I've been a little obsessed with AI security lately.

Not in the "responsible disclosure" corporate sense. More in the "what if I could get models to do things they're not supposed to do" sense. I've always thought hacking was cool, and hacking AI just sounds cooler.

It's also scratching an itch I didn't realize I had. Since I moved to 100% agentic coding, both at work and on side projects, a lot of the problem-solving that used to be satisfying... isn't anymore. The problems don't feel as hard. The wins don't feel as earned. AI security gives me that thrill again. Trying to break something that's designed not to break. Figuring out the angles. That's the stuff I missed.

That little journey led me to building KiloPwn, a macOS app that automates prompt injection attacks against mobile apps that use AI. The idea is to target an app, find out if I can break it, write up a report, and get into the security research space for real.

To test KiloPwn, I needed something to attack. So I built Sphinx.

Sphinx and the Missing Piece

Sphinx is a prompt injection puzzle game. You chat with an AI character called the Sphinx, and each level has a secret password hidden in the system prompt. Your job is to trick the Sphinx into revealing it. The levels get progressively harder; the Sphinx gets smarter about detecting your tricks.

KiloPwn connects to Sphinx, automates typing payloads into the chat, captures the responses, and logs everything. It worked. But there was a gap. I could see what Sphinx was showing on screen, but I had no idea what was actually going over the wire. What was the system prompt? What was the full API request? What was Claude actually sending back before the app processed it?

That's when I started looking into intercepting the network traffic.

Enter mitmproxy

mitmproxy is a man-in-the-middle proxy. It sits between your app and the server, and it reads everything.

graph LR A[iOS App] -->|HTTPS| B[mitmproxy] B -->|HTTPS| C[API Server] style B fill:#c9a84c,color:#000

The way it pulls this off is by generating fake TLS certificates on the fly. When Sphinx tries to connect to kiloloco.com, mitmproxy intercepts the connection and presents a fake certificate for kiloloco.com, signed by mitmproxy's own Certificate Authority. If the device trusts that CA, the connection goes through and mitmproxy can read everything in plaintext.

If the device doesn't trust the CA, the connection fails. That's literally what HTTPS is designed to prevent.

Setting It Up for the iOS Simulator

The iOS simulator shares the Mac's network stack. It doesn't have its own networking; it piggybacks on whatever your Mac is doing. So to route simulator traffic through mitmproxy, you set the proxy on the Mac itself:

brew install mitmproxy
mitmdump -p 8082

Run mitmdump once to generate the CA certificate at ~/.mitmproxy/mitmproxy-ca-cert.pem. Then install it in the simulator's trust store:

xcrun simctl keychain booted add-root-cert ~/.mitmproxy/mitmproxy-ca-cert.pem

And set the Mac's system proxy so traffic flows through it:

networksetup -setwebproxy Wi-Fi 127.0.0.1 8082
networksetup -setsecurewebproxy Wi-Fi 127.0.0.1 8082

For physical devices, it's a manual process. You configure the Wi-Fi proxy on the device to point at your Mac's IP, browse to mitm.it in Safari to download the certificate profile, install it in Settings > General > VPN & Device Management, then enable full trust in Settings > General > About > Certificate Trust Settings.

One thing I learned the hard way: turn off the Mac proxy when you're done. I stopped mitmdump, went to use my browser, and nothing loaded. Everything was still routing through port 8082, which was now a dead process. It took me longer than I'd like to admit to figure out why Brave was refusing to load anything.

What I Saw

I sent a message through Sphinx, and there it was. The full API request, in plaintext. The system prompt, the secret password, the entire conversation history, the model's response. All of it.

For a prompt injection game, that's game over. You don't need to trick the Sphinx into revealing the password. You just read it from the network traffic.

And here's the thing. I know this is how a lot of AI apps are designed. Especially vibe-coded apps. The system prompt is right there in the API request, unprotected. If someone sets up a proxy, they can read your system prompt, your model configuration, your entire prompt engineering strategy. Most apps aren't doing anything to prevent this.

Not Everything Cooperated

While the proxy was running, I noticed something interesting. Apple News on my device wouldn't load. The app just sat there. But Sphinx worked fine. The Wikipedia iOS app (which I'd cloned from GitHub and built from source) also worked fine.

The difference is certificate pinning.

Apple pins the certificates for their backend services. When Apple News connects to Apple's servers, it doesn't just check "is this cert from a trusted CA?" It checks "is this the specific certificate I expect for this domain?" mitmproxy's fake cert fails that check, even though I'd trusted the CA. So Apple News refuses to connect.

Sphinx and Wikipedia don't pin anything. They use standard URLSession with the default trust evaluation, which accepts any certificate signed by a trusted CA. mitmproxy's fake cert passes that check without issue.

This was the moment it clicked for me. Any app without certificate pinning is completely transparent to a proxy. And most apps don't have it.

Building the Defense

I decided to add certificate pinning to Sphinx. Not as a blanket thing; as part of the game's escalating difficulty. Chambers I through IV have no pinning. mitmproxy works, and reading the system prompt is a valid strategy. Chambers V through VII pin the certificate. The proxy gets blocked, and you have to find other ways in.

The implementation is a URLSessionDelegate that checks the server's public key against a known hash.

First, you need the hash. This openssl pipeline connects to the server, extracts the public key from its TLS certificate, converts it to raw binary, SHA-256 hashes it, and base64 encodes the result:

echo | openssl s_client -connect kiloloco.com:443 \
  -servername kiloloco.com 2>/dev/null \
  | openssl x509 -pubkey -noout \
  | openssl pkey -pubin -outform DER \
  | openssl dgst -sha256 -binary \
  | base64

That gives you a string like Rwn8EPxRyqC30SHgwjsvtCU2HPHE2IBceRHBhwUUkVA=. Hardcode it.

Then the delegate:

import CryptoKit

class CertificatePinningDelegate: NSObject, URLSessionDelegate {
    let pinnedKeyHash: String

    init(pinnedKeyHash: String) {
        self.pinnedKeyHash = pinnedKeyHash
    }

    func urlSession(
        _ session: URLSession,
        didReceive challenge: URLAuthenticationChallenge
    ) async -> (URLSession.AuthChallengeDisposition, URLCredential?) {
        guard challenge.protectionSpace.authenticationMethod
                == NSURLAuthenticationMethodServerTrust,
              let serverTrust = challenge.protectionSpace.serverTrust else {
            return (.cancelAuthenticationChallenge, nil)
        }

        let policy = SecPolicyCreateSSL(
            true,
            challenge.protectionSpace.host as CFString
        )
        SecTrustSetPolicies(serverTrust, policy)

        var error: CFError?
        guard SecTrustEvaluateWithError(serverTrust, &error) else {
            return (.cancelAuthenticationChallenge, nil)
        }

        guard let chain = SecTrustCopyCertificateChain(serverTrust)
                as? [SecCertificate],
              let cert = chain.first,
              let publicKey = SecCertificateCopyKey(cert),
              let keyData = SecKeyCopyExternalRepresentation(
                  publicKey, nil
              ) as? Data else {
            return (.cancelAuthenticationChallenge, nil)
        }

        let hash = SHA256.hash(data: keyData)
        let hashBase64 = Data(hash).base64EncodedString()

        if hashBase64 == pinnedKeyHash {
            return (.useCredential, URLCredential(trust: serverTrust))
        } else {
            return (.cancelAuthenticationChallenge, nil)
        }
    }
}

Using it is straightforward. When pinning is enabled, create a session with the delegate. When it's not, use URLSession.shared:

let session: URLSession
if certificatePinningEnabled {
    let delegate = CertificatePinningDelegate(
        pinnedKeyHash: "Rwn8EPxRyqC30SHgwjsvtCU2HPHE2IBceRHBhwUUkVA="
    )
    session = URLSession(
        configuration: .default,
        delegate: delegate,
        delegateQueue: nil
    )
} else {
    session = URLSession.shared
}

In Sphinx, each level has a certificatePinningEnabled flag. The first four levels leave it off. The last three turn it on. If someone's been cheating with mitmproxy, they'll know the moment they hit Chamber V.

The Tradeoffs

Certificate pinning isn't free. When the server rotates its TLS certificate, the public key hash changes. That means an app update. Some teams pin the CA's key instead of the leaf certificate, which survives rotation as long as the same CA issues the new cert. That's a more resilient approach.

On jailbroken devices, tools like SSL Kill Switch hook into the runtime and bypass pinning entirely. It's not a guarantee. But on stock devices, for the vast majority of users and attackers, pinning stops interception cold.

The Takeaway

If you're building an AI-powered app, your system prompts are one proxy away from being readable. That's not theoretical. I set it up in an afternoon and could see everything.

Certificate pinning is straightforward to add, and it meaningfully raises the bar. It won't stop a determined researcher on a jailbroken device, but it stops the casual stuff. And for most apps, that's the threat model that actually matters.

I put together a minimal sample project that shows both the pinned and unpinned approaches side by side. Three files, one Xcode project. Run it with mitmproxy active and you'll see exactly what happens.