Skip to content

Conversation

@beekhc
Copy link
Contributor

@beekhc beekhc commented Oct 29, 2025

This is proposed instrumentation for Apple's MetricKit. The technical details for the instrumentation are thoroughly documented in the README file.

I am sure some details of the proposal will be controversial. I'd encourage you to particularly pay attention to:

  • the shape of the telemetry:
    • a span for MXMetricPayload with attributes for all reported values
    • log events for each diagnostic in MXDiagnosticPayload
  • the timestamps:
    • the MXMetricPayload span is a 24-hour span over the time the data was aggregated
    • log events are recorded as they are received
  • the way MXHistograms are converted to approximate averages

Of course any other feedback would also be appreciated.

@beekhc
Copy link
Contributor Author

beekhc commented Oct 29, 2025

I don't know why the macOS tests are failing. MXMetricPayload should be available in macOS v12. And it works on my machine, albeit with a different SDK version.

@ArielDemarco
Copy link
Member

ArielDemarco commented Oct 30, 2025

I don't know why the macOS tests are failing. MXMetricPayload should be available in macOS v12. And it works on my machine, albeit with a different SDK version.

I downloaded all macOS SDKs from version 26.0 down to Monterey (12.0), and the only one where the API is defined like this:

API_AVAILABLE(ios(13.0), macos(10.15)) API_UNAVAILABLE(tvos, watchos)
@interface MXMetricPayload : NSObject <NSSecureCoding>

is in macOS 26.0.

After that, in all the others, it looks like this:

API_AVAILABLE(ios(13.0)) API_UNAVAILABLE(macos, tvos, watchos)
@interface MXMetricPayload : NSObject <NSSecureCoding>

Maybe it's related to this thing mentioned in the MXMetricManager documentation:
image

So, seems that unless you compile it using the latest versions of Xcode (26.0 and up), this problem will happen

Edit: I'd suggest removing macOS support until we fully migrate to Xcode 26.0 in CI/CD


let namespacedAttrs: [String: AttributeValueConvertable] = [
"hang_duration": $0.hangDuration,
"exception.stacktrace_json": stacktraceJson
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why do we have exception.stacktrace and exception.stacktrace_tree? From the semconv, seems that different formats are already implied, and we can just add the metric kit json format to the list https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#stacktrace-representation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I removed exception.stacktrace_json. It was redundant.
  2. I added some code to simplify the stacktrace, mostly flattening it, based on feedback from the SIG.
  3. I added a doc to define the new format and the differences from Apple's JSON format.

The use case is so specific, I'm not convinced we can reuse the format for other kinds of crashes, but I'm open to the discussion.


@available(iOS 13.0, macOS 12.0, macCatalyst 13.1, visionOS 1.0, *)
public func reportMetrics(payload: MXMetricPayload) {
let span = getMetricKitTracer().spanBuilder(spanName: "MXMetricPayload")
Copy link
Contributor

@williazz williazz Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was some debate about capturing MXMetric as a metric, span, or log. Could you document your decision to record this as a span?

With the way its setup now, seems that we can also extend this to support metrics in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in the SIG, and I think we're on the same page. This is documented in the README.

decoding: callStackTree.jsonRepresentation(),
as: UTF8.self
)
namespacedAttrs["exception.stacktrace_json"] = stacktraceJson
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here about stacktrace_json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresses above.


// Standard OTel exception attributes - will be populated below
var otelType: String?
var otelMessage: String?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How good are signalDescription and machExceptionDescription at describing crashes, and allowing them to be grouped together by common root cause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It varies, but probably not sufficient. It just maps the human-readable names for the signals from Apple's documentation. For example, SIGSEGV will have the description "segmentation violation".

  1. I looked into what KSCrash does with its "Crash Doctor" code to make better message names. I don't think it's feasible to adapt that code for this PR. There's a lot of complexity to it, and it depends heavily on the fields KSCrash populates. I think it would be a reasonable future change if we can get it working, though.
  2. Even with KSCrash's better descriptions, I doubt it would be sufficient for crash deduplication. I dunno.

My recommendation would be for us to go with this, and see if we can improve the descriptions in the future, based on usage.

The Crash Reporter instrumentation you're working on may make users less reliant on this particular diagnostic anyway.

@williazz
Copy link
Contributor

Left a bunch of questions. Thanks Bee!

@@ -0,0 +1,244 @@
# MetricKit Instrumentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc is really great

@beekhc
Copy link
Contributor Author

beekhc commented Nov 5, 2025

I removed support for macOS, as recommended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants