-
Notifications
You must be signed in to change notification settings - Fork 1.6k
add blog post announcing proposed changes/oteps for stability work #8208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
682cc40
b47c251
c615866
540eafd
b4de4e5
16adeaf
91851fb
89a0b40
453f5d1
25e806a
10c039d
dee201d
61bb7fb
f2c8479
72d7a6a
0bc3c8d
8c0bead
c917aab
547446f
cd8148c
749b108
c96ad1b
a7c9828
8239264
cb7974f
52739fc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,269 @@ | ||||||
| --- | ||||||
| title: Evolving OpenTelemetry’s Stabilization and Release Practices | ||||||
| linkTitle: Stability Proposal Announcement # Mandatory, make sure that your short title. | ||||||
| date: 2025-10-23 # Put the current date, we will keep the date updated until your PR is merged | ||||||
| author: | ||||||
| >- # If you have only one author, then add the single name on this line in quotes. | ||||||
| "OpenTelemetry Governance Committee" | ||||||
| # canonical_url: http://somewhere.else/ # TODO: if this blog post has been posted somewhere else already, uncomment & provide the canonical URL here. | ||||||
| issue: the issue ID for this blog post # TODO: See https://opentelemetry.io/docs/contributing/blog/ for details | ||||||
| sig: Governance Committee # TODO: add the name of the SIG that sponsors this blog post | ||||||
| --- | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| OpenTelemetry is changing how the project approaches stability and releases, | ||||||
| across all sub-projects and repositories. We would like to make a clearer | ||||||
| distinction between internal and external stability levels; As a project, we | ||||||
| should be more concerned with the packaging and 'product' of OpenTelemetry | ||||||
| distributions. | ||||||
|
|
||||||
| **Why?** | ||||||
|
|
||||||
| - User interviews, surveys, and discussions have demonstrated that the | ||||||
| complexity of OpenTelemetry creates impediments to production deployments. | ||||||
|
|
||||||
| **What's Changing?** | ||||||
|
|
||||||
| - We'll be moving to a simpler, but stricter, definition of what components can | ||||||
| be present in a stable release and updating stability requirements to include | ||||||
| new guidance on documentation, performance testing, etc. | ||||||
| - Instrumentation stability guidelines will be updated to encourage faster iteration, | ||||||
| greater federation, and to unblock consumers to allow stable instrumentation | ||||||
| releases. | ||||||
| - To make these releases easier to consume, we'll be introducing 'epoch | ||||||
| releases' -- similar to the Kubernetes release cycle. | ||||||
|
|
||||||
| **What we need your input on:** | ||||||
|
|
||||||
| - Implementation timelines and stability requirements. | ||||||
| - Proposed changes to instrumentation stability. | ||||||
| - How we can make releases better. | ||||||
|
|
||||||
| [Join the discussion!](https://github.com/open-telemetry/community/discussions/3098). | ||||||
|
|
||||||
| ## Why are we doing this? | ||||||
|
|
||||||
| OpenTelemetry has grown into a massive, complex ecosystem. We support four | ||||||
| different telemetry signals (tracing, metrics, logs, and profiles) across more | ||||||
| than a dozen programming languages. Each language has its own runtime | ||||||
| requirements and execution environments. The | ||||||
| [specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) | ||||||
| shows just how much we're trying to accomplish – and it's overwhelming. | ||||||
|
|
||||||
| This complexity creates real barriers to adoption. Organizations ready to deploy | ||||||
| OpenTelemetry in production encounter unexpected roadblocks: configuration that | ||||||
| breaks between minor versions, performance regressions that only appear at | ||||||
| scale, and the challenge of coordinating rollouts across hundreds or thousands | ||||||
| of services. Many teams end up delaying or scaling back their OpenTelemetry | ||||||
| deployments as a result. | ||||||
|
|
||||||
| For maintainers, this complexity makes their job harder than it needs to be. | ||||||
| There's a lack of clear milestones and guidance about what's 'most important' at | ||||||
| any given time. Stability efforts involve a lot of churn and there's often | ||||||
| conflicting guidance about where you should focus your time. | ||||||
|
|
||||||
| With this in mind, the OpenTelemetry project would like to announce some | ||||||
| proposed changes that we’d like to institute, with the goal of normalizing our | ||||||
| language around component stability and component offerings. | ||||||
|
|
||||||
| The Governance Committee believes that these changes need community involvement | ||||||
| and discussion to be a success, so we’re taking this opportunity to announce our | ||||||
| intention and open an | ||||||
| [GitHub discussion](https://github.com/open-telemetry/community/discussions/3098) | ||||||
| in order to get feedback from users, maintainers, and contributors. We do not | ||||||
| anticipate that these changes will be completed overnight, and want to assure | ||||||
| everyone that we will continue to prioritize our existing commitments to users | ||||||
| and maintainers even as we consider necessary changes for the overall wellbeing | ||||||
| and maturity of the project. | ||||||
austinlparker marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| ## 1. Stable By Default | ||||||
|
|
||||||
| Stability guarantees have been a long-held principle in OpenTelemetry, with | ||||||
| exceedingly high bars. There is a tension between this and user needs that we'd | ||||||
| like to discuss. | ||||||
|
|
||||||
| ### Background | ||||||
|
|
||||||
| OpenTelemetry is a specification for how cloud native software -- libraries, | ||||||
| frameworks, infrastructure abstractions, executable code, etc. -- produces and | ||||||
| communicates telemetry data about its operation. This specification is designed | ||||||
| to be exhaustive, comprehensive, and low-level. Many of the elements of the | ||||||
| specification are hard-won knowledge from the combined decades of experience its | ||||||
| authors have with building, operating, or designing telemetry systems at planet | ||||||
| scale. | ||||||
|
|
||||||
| A specification with no implementation is not a useful thing for end users, | ||||||
| though. Developers and operators approach telemetry through a variety of lenses; | ||||||
| Some organizations have high standards for observability, with entire teams | ||||||
| dedicated to building internal monitoring and instrumentation frameworks. Other | ||||||
| organizations view observability and monitoring as a second or third order | ||||||
| priority -- something that needs to happen, but not something that's | ||||||
| incentivized. OpenTelemetry, as a specification, needs to serve all of these | ||||||
| users and their use cases. | ||||||
|
|
||||||
| To make OpenTelemetry useful, we need to provide an 'on-ramp' from existing | ||||||
| methods and modes, existing tools and strategies, which means we need to provide | ||||||
| implementations of not just the specification, but _applications_ of it as well. | ||||||
| In practice, this means we need to distribute libraries to add OpenTelemetry | ||||||
| instrumentation to existing HTTP servers and clients, or Collector receivers to | ||||||
| scrape metrics from MySQL and translate them into OTLP. | ||||||
|
|
||||||
| Most of the value our community derives from OpenTelemetry comes directly from | ||||||
| instrumentation libraries and Collector components – not the core SDKs. While we | ||||||
| organize these as `contrib` repositories to distinguish them from core | ||||||
| components, end users don't see or care about this distinction. They just want | ||||||
| instrumentation that works. | ||||||
|
|
||||||
| For maintainers and project leadership, our stability goals and the nature of | ||||||
| `contrib` present a significant challenge. Users want stable, well-tested, and | ||||||
| performant releases -- that _also_ perform the same function as commercial | ||||||
| instrumentation agents. | ||||||
|
|
||||||
| ### What's Changing? | ||||||
|
|
||||||
| There's three major changes that the project will be rolling out. Keep in mind | ||||||
| the exact details of some of this may change, but these goals are more or less | ||||||
| fixed. | ||||||
|
|
||||||
| 1. All components across all repositories (including semantic conventions) | ||||||
| should adhere to a consistent way of communicating stability, through a | ||||||
| metadata file/information, that can be discovered and parsed in a | ||||||
| programmatic way. The exact format will be defined through an OTEP and | ||||||
| incorporated into the specification. | ||||||
| 2. Stability requirements will be expanded to include more requirements around | ||||||
| documentation and where it's hosted, example code, performance benchmarks | ||||||
| (where applicable), implementation cookbooks, and other artifacts as | ||||||
| necessary. | ||||||
| 3. Stable distributions of OpenTelemetry should only enable stable components by | ||||||
| default. Users should be able to select a desired minimum stability level | ||||||
| with a documented and consistent configuration option. | ||||||
|
|
||||||
| This is a big change for maintainers, especially those who have shipped v1+ of | ||||||
| their libraries. We would deeply appreciate your feedback on this proposal in | ||||||
| the [discussion](https://github.com/open-telemetry/community/discussions/3098). | ||||||
tedsuo marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
|
||||||
| ## 2. Instrumentation Stability and Semantic Conventions | ||||||
|
|
||||||
| As mentioned above, we would like to adjust the instrumentation stability | ||||||
| process in order to provide greater consistency across projects, but also to | ||||||
| better communicate the interplay between telemetry stability and implementation | ||||||
| stability. | ||||||
|
|
||||||
| ### Background | ||||||
|
|
||||||
| Semantic conventions evolve slowly and deliberately because they must work | ||||||
| across diverse telemetry systems. While OpenTelemetry is designed for | ||||||
| interconnected signals flowing together, users deploy many different storage and | ||||||
| analysis engines to consume this data. Each backend has its own constraints and | ||||||
| capabilities. Maintainers must balance competing concerns – keeping cardinality | ||||||
| manageable, ensuring attributes are useful but not overly specific, and making | ||||||
| conventions that work well regardless of where the data ends up. | ||||||
|
|
||||||
| The downside of this is that progress on semantic conventions can be slow, and | ||||||
| this slowness impacts all consumers of the conventions. Many instrumentation | ||||||
| libraries are currently stuck on pre-release versions because they depend on | ||||||
| experimental semantic conventions. Outside contributors are stuck between | ||||||
| emitting unspecified telemetry or trying to engage in the process, which | ||||||
| requires a long commitment. Finally, we're internally inconsistent in | ||||||
| instrumentation across the project; Some libraries are mapped to conventions, | ||||||
| others exist independently of it. | ||||||
|
|
||||||
| ### What's Changing? | ||||||
|
|
||||||
| Our goals here are designed to achieve three outcomes. | ||||||
|
|
||||||
| 1. Instrumentation stability should be decoupled from semantic convention | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure if this is a good point for end users. The end users depend on semantic conventions in their visualization and alerting tools. As a user updating the instrumentation library, I am very interested in knowing if the instrumentation changed any attributes, because it can break my dashboarding and alerting. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that isn't going away; what's changing is that instrumentation stability shouldn't be perma-0.x just because semconv isn't 1.x. if an instrumentation library takes a breaking change in telemetry outputs, it will still need to version properly and treat it as a breaking change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall it be mentioned in this document? |
||||||
| stability. | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see how not having an explanation on this point could trigger some misinterpretation. |
||||||
| 2. Semantic conventions should be more federated; OpenTelemetry should not be | ||||||
| the final word on what conventions exist, and instead should focus on | ||||||
| creating core conventions that can be extended and built upon. | ||||||
| 3. Semantic convention development and iteration should not be a blocker on | ||||||
| distribution maintainers. | ||||||
|
|
||||||
| To this end, we have a few recommendations we'd like to codify into the | ||||||
| specification. First, our position around instrumentation libraries in | ||||||
| OpenTelemetry is that they exist as concrete implementations of the semantic | ||||||
| conventions. This gives us a concrete target for 'first party' instrumentation | ||||||
| libraries that we wish to support in distributions. In addition, maintainers | ||||||
| should prioritize instrumentations that align to existing conventions and | ||||||
| deprioritize others. | ||||||
|
|
||||||
| Second, we'd like to make it easier for maintainers to ship stable | ||||||
| instrumentations. If an instrumentation's API surface is stable, then we believe | ||||||
| that semantic convention stability should not block the stabilization of that | ||||||
| instrumentation library. This means that we'll need to be thoughtful in | ||||||
| providing migration pathways for telemetry as operators upgrade to new major versions | ||||||
| of instrumentation libraries. | ||||||
|
|
||||||
| Finally, we'd like to make it easier for third-parties to publish their own | ||||||
| semantic conventions by formalizing and stabilizing necessary parts of the | ||||||
| semantic conventions in order for other organizations to ship conventions for | ||||||
| their libraries, frameworks, tech stacks, etc. | ||||||
|
|
||||||
| In order to accomplish this, we're looking for feedback on several areas from | ||||||
| maintainers and end-users -- especially around the maturity/lifecycle of | ||||||
| semantic conventions, as well as what's missing in terms of federating semantic | ||||||
| conventions. We are more flexible on proposals here, but our outcomes aren't. | ||||||
| Remember, a core goal of the project is to encourage other libraries, tools, and | ||||||
| frameworks to | ||||||
| [natively adopt OpenTelemetry](https://www.youtube.com/watch?v=l8xiNOCIdLY) -- | ||||||
| semantic conventions are a big part of that. | ||||||
|
|
||||||
| ## 3. Confident, Stable, Releases. | ||||||
|
|
||||||
| ## Background | ||||||
|
|
||||||
| OpenTelemetry isn't just a single binary deployed into a Kubernetes cluster. | ||||||
| Subtle differences in everything from configuration to telemetry output between | ||||||
| different versions of instrumentation libraries, collector receivers, and SDKs | ||||||
| can cause a real headache for adopters. In addition, the rapid release cadence | ||||||
| of many components causes real difficulty for end users, especially around the | ||||||
| Collector. Enterprise deployments and upgrades are slow, deliberate things -- | ||||||
| teams simply do not have the bandwidth to validate and roll out changes at the | ||||||
| cadence we ship. | ||||||
|
|
||||||
| ## What's Changing? | ||||||
|
|
||||||
| We are proposing the creation of a Release SIG that will be responsible for | ||||||
austinlparker marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| creating a schedule of 'epoch' releases for OpenTelemetry. These epoch versions | ||||||
| would essentially be a manifest pointing to a tested, documented, and stable set | ||||||
| of components that meet project stability requirements. | ||||||
|
|
||||||
| This is not a trivial undertaking, to be clear. It is also not as well-defined | ||||||
| as the earlier proposals in this blog -- those efforts will communicate many of | ||||||
| the requirements these epoch releases must follow, after all. To our maintainers | ||||||
| and contributors, though, a _non-goal_ of this effort is to force you to change | ||||||
| your versioning or release cadence. | ||||||
|
|
||||||
| ## Looking Forward | ||||||
|
|
||||||
| These changes are a reflection of the impact and importance of OpenTelemetry to | ||||||
| the cloud native software community. | ||||||
| [OpenTelemetry has been the second highest velocity project in the CNCF over the past few years](https://www.cncf.io/wp-content/uploads/2025/04/CNCF-Annual-Report-2024_v2.pdf), | ||||||
| and | ||||||
| [nearly 50% of surveyed cloud native end user companies have adopted the project](https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf). | ||||||
| These changes are setting up the next chapter of our success, and becoming truly | ||||||
| ubiquitous. | ||||||
|
|
||||||
| Our mission as a project is not changing, but our priorities are. | ||||||
|
|
||||||
| 1. Stability and usability for all developers and users. | ||||||
| 2. Clear packaging, installation, and usage paths. | ||||||
| 3. Predictability and consistency. | ||||||
|
|
||||||
| For contributors and maintainers, what does this mean? We'll fast-track | ||||||
| proposals that align with these priorities. If there's feature work or | ||||||
| instrumentation that doesn't align to this, that's fine -- we'd ask that you | ||||||
| work on it outside the project and discover where our existing integration | ||||||
| points and patterns don't work; That's good feedback, and will help us improve | ||||||
| the specification for everyone. | ||||||
|
|
||||||
| For maintainers, contributors, and integrators -- we would appreciate your | ||||||
| feedback in | ||||||
| [this GitHub Discussion](https://github.com/open-telemetry/community/discussions/3098) | ||||||
| on the topics and proposals raised here. You can also send feedback on this | ||||||
| proposal to [feedback@opentelemetry.io](mailto:feedback@opentelemetry.io) or on | ||||||
| the CNCF Slack in the #opentelemetry channel. We also look forward to meeting | ||||||
| the cloud native community in person at KubeCon this fall -- please join us | ||||||
| there with comments! | ||||||
Uh oh!
There was an error while loading. Please reload this page.