Commit 2f1ad9d

Jaswanth51, derdeljan-msft, JonathanC-ARM, apsonawane, chwarr authored
Sync with Microsoft ONNX Runtime - 01/09/2025 (#801)
* [CPU] Optimize GQA attention bias application for FP16 (microsoft#25871)

### Description

When using attention bias input for GQA op with FP16, on the platforms that don't natively support FP16 math a cast to fp32 needs to be performed, and thus a temporary buffer needs to be created to store the fp32 values. The issue is that this temporary buffer was being allocated / deallocated inside of a loop for every token being processed. Refactored the implementation so that the allocation takes place only once.

Phi model throughput increased by 15%.

* Fixes for DynamicQuantizeMatMul and Attention3D tests (microsoft#25814)

### Description

This change fixes correctness issues in two areas that were causing failures in onnxruntime_test_all:

- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it’s fixed

1) DynamicQuantizeMatMul.WithConstantBInputs
- Root cause: The Kleidi dynamic quantization GEMM path could be selected even when the B scales contained values such as (zero, negative, or non-finite). That violates kernel assumptions and can lead to incorrect results.
- Fix: In `onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`, we now explicitly validate that all B scales are finite and strictly positive before enabling the Kleidi/MLAS dynamic path. If any scale is invalid, we disable that path.

2) Attention tests (Attention3DDefault, Attention3DWithPastAndPresentQkMatmul)
- Root causes in `onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
  - Incorrect handling of GEMM corner cases for alpha/beta and K==0 (e.g., not respecting C = beta*C when alpha==0 or K==0).
  - Unnecessary or premature fallbacks for small shapes.
- Fixes:
  - Add early-outs for degenerate sizes: if M==0 or N==0, return handled.
  - Correctly implement alpha/beta semantics:

---------

Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>

* Fix MoE CPP tests (microsoft#25877)

This change adds skip test for QMoE CPU tests when running on TensorRT or CUDA EP. In the QMoE kernel there was a memory overwrite bug in the accumulate part, updated that and this fixed the python tests back

* [c++] Eliminate dynamic initialization of static Ort::Global<void>::api_ (microsoft#25741)

### Description

Delay the call to `OrtGetApiBase()` until the first call to `Ort::GetApi()` so that `OrtGetApiBase()` is typically called after dynamic library loading.

### Motivation and Context

When ORT_API_MANUAL_INIT is not defined (which is the default), the static `Ort::Global<void>::api_` has a dynamic initializer that calls `OrtGetApiBase()->GetApi(ORT_API_VERSION)`. This dynamic initialization can cause problems when it interacts with other global/static initialization. On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries.

* Replace the templated `Global<void>::api_` with an inline static initialized to nullptr.
* `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls `detail::Global::DefaultInit()` if initialization is needed.
  * When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns nullptr, which will eventually cause the program to crash. The callers have violated the initialization contract by not calling one of the `Ort::InitApi` overloads.
  * When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a function-level static to compute the result of `OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it.
* `Ort::Global<void>` has been replaced with a non-templated type and moved inside a `detail` namespace. Since the `Global<void>` object was documented as being used internally, it is believed that these changes here are non-breaking, as they do not impact a public API. The public APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain unchanged.
* Add `#pragma detect_mismatch` to surface issues with compilation units that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* python GPU IO Bindings for NVIDIA (microsoft#25776)

### Description

1. A Small change to use the shared allocator in Python binding.
2. Remove the FP64 support from the EP.

### Motivation and Context

The Python GPU IO binding is necessary for performance. The change will enable the shared allocator for GPU allocation. The FP64 was using the FP32 inference, aligned WRT TRT RTX support.

---------

Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>

* [CANN] Add a `enable_cann_subgraph` feature parameter (microsoft#25867)

### Description

Add a `enable_cann_subgraph` feature parameter. this parameter controls whether graph splitting is performed and can help quickly identify issues in certain scenarios.

* [EP ABI] Add OpAttr_GetTensorAttributeAsOrtValue and replace the existing Node_GetTensorAttributeAsOrtValue (microsoft#25886)

### Description

Replace `Node_GetTensorAttributeAsOrtValue` with `OpAttr_GetTensorAttributeAsOrtValue`. Change the API signature to make it one of the `OpAttr` interfaces instead of the `OrtNode` interface.

The original API was added [here](microsoft#25566).

* Language bindings for model compatibility API (microsoft#25878)

### Description

This change builds on top of microsoft#25841 , and adds the scaffolding necessary to call into this API from C++ / C# / Python.

### Motivation and Context

microsoft#25454 talks more about the broader notion of precompiled model compatibility. This change is directed at app developers whose apps may want to determine if a particular precompiled model (e.g. on a server somewhere) is compatible with the device where the application is running. There is functionality in `OrtEpFactory` for making this determination, which was exposed as a C API in microsoft#25841, and this change makes the API more broadly available in other languages.

### Testing and Validation

Introduced new unit test cases across each language, and verified that the API was being called and returned the correct result for the default CPU EP.

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

* [QNN-EP] Introduce Level1 Transformer into qnn.preprocess (microsoft#25883)

### Description

- Introduce Level1 Transformer into qnn.preprocess to support various optimizations.

### Motivation and Context

- This change brings in several useful optimizations such as `ConvBnFusion` and `ConstantFolding`, which are part of `TransformerLevel::Level1` and can benefit QNNEP.
- The goal is to optimize the ONNX model before quantization by integrating these passes into the Python tooling workflow.

* [QNN EP] Minor fix weight name missing when not valid QDQ node group (microsoft#25887)

### Description

Minor fix weight name missing when not valid QDQ node group

### Motivation and Context

Some quantized model failed QDQ node group validation, the weights then won't be folded as initializer. QNN EP failed to handle the dynamic weights here due to the transpose op input name look up. This change make sure we process the weights tensor before adding transposes.

* Add custom ops library_path to EP metadata (microsoft#25830)

## Summary

Adds EP metadata library path support to enable custom ops DLL registration with proper path resolution.

## Changes

- Added `library_path` metadata key to EP metadata infrastructure
- Pass resolved library path directly to `EpLibraryProviderBridge` constructor
- Simplified implementation per reviewer feedback (removed virtual method complexity)
- Added `#include <utility>` for std::move compliance

## Purpose

Enables downstream applications (like onnxruntime-genai) to resolve relative custom ops library paths using EP metadata, improving DLL registration reliability.

## Files Modified

- `plugin_ep/ep_factory_provider_bridge.h`
- `plugin_ep/ep_library.h`
- `plugin_ep/ep_library_plugin.h`
- `plugin_ep/ep_library_provider_bridge.cc`
- `plugin_ep/ep_library_provider_bridge.h`
- `utils.cc`

* [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.23 (microsoft#25884)

### Description

This update introduces multiple improvements, fixes, and feature enhancements to the OpenVINO Execution Provider (OVEP) and related components in ONNX Runtime:

#### Configuration & Properties

- Updated load_config mapping to act as a passthrough to OpenVINO properties.
- Added support for providing layout information to inputs/outputs in OpenVINO.

#### Inference & Tensor Handling

- Improved OVInferRequest::SetTensor to correctly handle cached binding shape mismatches.
- Added support for self-detecting on-the-fly bfloat16 → float16 conversion.
- Fixed issues with input ONNX models when used with shared execution contexts.

#### Model Handling & Operator Support

- Fixed model copying behavior for QDQ stripping.
- Updated operator support status for OpenVINO 2025.2.

#### Platform & Integration Fixes

- Applied multiple PSU Lora fixes and related updates.
- Resolved filename confusion issues with wrapped OVIRs in EPCtx.
- Enabled memory-mapped native binaries for OpenVINO 2025.3.

#### Quality & Maintenance

- Addressed linting issues.
- Fixed coverage gaps in OVEP.
- Added a new test script for OpenVINO with ORT ABI integration.

---------

Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>

* [java] Auto EP and compile model support (microsoft#25131)

### Description

Java API for compile model and EP discovery APIs. Roughly equivalent to the C# version in microsoft#24604.

cc: @skottmckay.

I haven't quite got the CMake configured so the Java tests for the ep registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure in what conditions it should be built and how we should handle it so I don't know where/when to plumb it through.

### Motivation and Context

API parity for Java.

* Add error handling to extract_nuget_files.ps1 (microsoft#25866)

### Description

1. Check process exit code when running 7z.exe . Currently the errors were silently ignored.
2. Add snld20 flag to the 7z.exe commands, which is needed to be compatible with the latest 7z release.

* [Fix] illegal memory access in GetInputIndices with optional inputs (microsoft#25881)

### Description

Fix illegal memory access in GetInputIndices with optional inputs

### Motivation and Context

When an input is optional, its ValueInfo may be nullptr. The current implementation directly calls InputValueInfo->GetName(), leading to illegal memory access.

Update logic to skip optional inputs when valueInfo is nullptr.

* Re-enable cpuinfo for ARM64EC (microsoft#25863)

### Description

Re-enable cpuinfo for ARM64EC build and fix `CPUIDINFO_ARCH_ARM` so it is actually used.

Patch cpuinfo to support vcpkg ARM64EC build. See pytorch/cpuinfo#324.

### Motivation and Context

Fix for workaround in microsoft#25831.

---------

Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Christopher Warrington <chwarr@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ishwar Raut <iraut@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: Xinpeng Dou <15529241576@163.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: adrastogi <aditya.rastogi@microsoft.com>
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com>
Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com>
Co-authored-by: Pradeep Sakhamoori <psakhamoori@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: mingyue <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
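
The GEMM corner cases named in the commit message (early-out when M==0 or N==0, and C = beta*C when alpha==0 or K==0) can be illustrated with a plain reference loop. This is only a sketch under stated assumptions: `ReferenceSgemm` is a hypothetical name, the layout is row-major, and treating beta==0 as "overwrite C" follows the usual BLAS convention rather than anything confirmed by the commit. It is not the KleidiAI or MLAS code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM: C = alpha * A * B + beta * C (row-major).
// A is M x K, B is K x N, C is M x N. Not the actual MLAS/KleidiAI kernel.
void ReferenceSgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                    const std::vector<float>& A, const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
  // Degenerate output: there is nothing to write.
  if (M == 0 || N == 0) return;

  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      // When alpha == 0 the product term is skipped; when K == 0 the loop
      // body never runs. In both cases only the beta * C term remains.
      if (alpha != 0.0f) {
        for (std::size_t k = 0; k < K; ++k) {
          acc += A[m * K + k] * B[k * N + n];
        }
      }
      float& c = C[m * N + n];
      // Assumption: beta == 0 overwrites C instead of scaling it, so stale
      // NaN/Inf values in C do not propagate.
      c = alpha * acc + (beta == 0.0f ? 0.0f : beta * c);
    }
  }
}
```

Calling it with K==0 and beta=0.5 on C = {2, 4} leaves C = {1, 2}, which is the "C = beta*C" behavior the commit message says was previously not respected.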
1 parent be346bb commit 2f1ad9d
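
The DynamicQuantizeMatMul fix in the commit message says the B scales are checked to be finite and strictly positive before the Kleidi/MLAS dynamic path is enabled. A minimal sketch of such a check follows; `AllScalesFiniteAndPositive` is a hypothetical helper name and the `std::vector<float>` input is an assumption, not the signature used in `dynamic_quantize_matmul.cc`.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: returns true only if every scale is finite and > 0.
// An empty input returns true (vacuously); a real caller may want to treat
// that case separately.
bool AllScalesFiniteAndPositive(const std::vector<float>& scales) {
  for (float s : scales) {
    // std::isfinite rejects NaN and +/-Inf; the comparison rejects zero
    // and negative values.
    if (!std::isfinite(s) || s <= 0.0f) {
      return false;
    }
  }
  return true;
}
```

A caller would presumably gate the fast path on the result, for example `bool use_dynamic_path = AllScalesFiniteAndPositive(b_scales);`, and fall back to the generic path otherwise.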

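The lazy initialization described for microsoft#25741 (an inline static pointer initialized to nullptr, filled on first use through a function-level static) can be sketched as below. Every name here (`FakeApi`, `FakeGetApiBase`, `FAKE_API_MANUAL_INIT`, `g_base_calls`) is a hypothetical stand-in and not part of the ONNX Runtime headers; the sketch assumes C++17 for the inline static member and does not make the pointer write atomic.

```cpp
// Hypothetical stand-in for the API table; not the real OrtApi.
struct FakeApi {
  int version;
};

// Counts how often the (pretend) expensive lookup runs.
static int g_base_calls = 0;

// Hypothetical stand-in for OrtGetApiBase()->GetApi(ORT_API_VERSION).
inline const FakeApi* FakeGetApiBase() {
  ++g_base_calls;
  static const FakeApi api{1};
  return &api;
}

namespace detail {
struct Global {
  // Constant-initialized: no dynamic initializer runs at load time.
  static inline const FakeApi* api_ = nullptr;

  static const FakeApi* DefaultInit() {
#if defined(FAKE_API_MANUAL_INIT)
    // Manual-init builds must set api_ themselves before first use.
    return nullptr;
#else
    // Function-level static: computed once, on the first call.
    static const FakeApi* cached = FakeGetApiBase();
    return cached;
#endif
  }

  static const FakeApi& GetApi() {
    if (api_ == nullptr) {
      api_ = DefaultInit();
    }
    return *api_;  // dereferences nullptr if manual init was skipped
  }
};
}  // namespace detail

inline const FakeApi& GetApi() { return detail::Global::GetApi(); }
```

The point of the pattern is that nothing runs during static initialization: the lookup is deferred until the first `GetApi()` call, by which time dependent libraries are typically loaded.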
78 files changed: +3007 -509 lines changed

cmake/CMakeLists.txt

Lines changed: 2 additions & 6 deletions
@@ -1607,7 +1607,6 @@ if ("${CMAKE_SYSTEM_NAME}" STREQUAL "Linux")
   endif()
 endif()

-
 #Now the 'onnxruntime_EXTERNAL_LIBRARIES' variable should be sealed. It will be used in onnxruntime.cmake which will be included in the next.
 #The order of the following targets matters. Right depends on left. If target A appears before target B. Then A.cmake can not use variables defined in B.cmake.
 set(ONNXRUNTIME_CMAKE_FILES onnxruntime_flatbuffers onnxruntime_common onnxruntime_mlas onnxruntime_graph onnxruntime_lora onnxruntime_framework onnxruntime_util onnxruntime_providers onnxruntime_optimizer onnxruntime_session ${ONNXRUNTIME_EAGER_CMAKE_FILE_NAME})
@@ -1623,9 +1622,6 @@ if (onnxruntime_USE_WINML)
   list(APPEND ONNXRUNTIME_CMAKE_FILES winml)
 endif() # if (onnxruntime_USE_WINML)

-if (onnxruntime_BUILD_APPLE_FRAMEWORK AND NOT ${CMAKE_SYSTEM_NAME} MATCHES "Darwin|iOS|visionOS|tvOS")
-  message(FATAL_ERROR "onnxruntime_BUILD_APPLE_FRAMEWORK can only be enabled for macOS or iOS or visionOS or tvOS.")
-endif()
 list(APPEND ONNXRUNTIME_CMAKE_FILES onnxruntime)

 if (onnxruntime_BUILD_JAVA)
@@ -1690,8 +1686,8 @@ if (WIN32 AND NOT GDK_PLATFORM AND NOT CMAKE_CROSSCOMPILING)
   endif()
 endif()

-foreach(target_name ${ONNXRUNTIME_CMAKE_FILES})
-  include(${target_name}.cmake)
+foreach(onnxruntime_cmake_file ${ONNXRUNTIME_CMAKE_FILES})
+  include(${onnxruntime_cmake_file}.cmake)
 endforeach()
 if (UNIX)
   option(BUILD_PKGCONFIG_FILES "Build and install pkg-config files" ON)

cmake/external/onnxruntime_external_deps.cmake

Lines changed: 27 additions & 34 deletions
@@ -313,41 +313,32 @@ onnxruntime_fetchcontent_makeavailable(nlohmann_json)
 if (onnxruntime_ENABLE_CPUINFO)
   # Adding pytorch CPU info library
   # TODO!! need a better way to find out the supported architectures
-  list(LENGTH CMAKE_OSX_ARCHITECTURES CMAKE_OSX_ARCHITECTURES_LEN)
+  set(CPUINFO_SUPPORTED FALSE)
   if (APPLE)
+    list(LENGTH CMAKE_OSX_ARCHITECTURES CMAKE_OSX_ARCHITECTURES_LEN)
     if (CMAKE_OSX_ARCHITECTURES_LEN LESS_EQUAL 1)
       set(CPUINFO_SUPPORTED TRUE)
-    elseif (onnxruntime_BUILD_APPLE_FRAMEWORK)
-      # We stitch multiple static libraries together when onnxruntime_BUILD_APPLE_FRAMEWORK is true,
-      # but that would not work for universal static libraries
-      message(FATAL_ERROR "universal binary is not supported for apple framework")
-    endif()
-  else()
-    # if xnnpack is enabled in a wasm build it needs clog from cpuinfo, but we won't internally use cpuinfo
-    # so we don't set CPUINFO_SUPPORTED in the CXX flags below.
-    if (CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND NOT onnxruntime_USE_XNNPACK)
-      set(CPUINFO_SUPPORTED FALSE)
     else()
+      message(WARNING "cpuinfo is not supported when CMAKE_OSX_ARCHITECTURES has more than one value.")
+    endif()
+  elseif (CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
+    # if xnnpack is enabled in a wasm build it needs clog from cpuinfo, but we won't internally use cpuinfo.
+    if (onnxruntime_USE_XNNPACK)
       set(CPUINFO_SUPPORTED TRUE)
     endif()
-    if (WIN32)
-      # There's an error when linking with cpuinfo on arm64ec with a vcpkg build (--use_vcpkg).
-      # TODO Fix it and then re-enable cpuinfo on arm64ec.
-      if (onnxruntime_target_platform STREQUAL "ARM64EC")
-        set(CPUINFO_SUPPORTED FALSE)
-      else()
-        set(CPUINFO_SUPPORTED TRUE)
-      endif()
-    elseif (NOT ${onnxruntime_target_platform} MATCHES "^(i[3-6]86|AMD64|x86(_64)?|armv[5-8].*|aarch64|arm64)$")
-      message(WARNING
-        "Target processor architecture \"${onnxruntime_target_platform}\" is not supported in cpuinfo. "
-        "cpuinfo not included."
-      )
-      set(CPUINFO_SUPPORTED FALSE)
+  elseif (WIN32)
+    set(CPUINFO_SUPPORTED TRUE)
+  else()
+    if (onnxruntime_target_platform MATCHES "^(i[3-6]86|AMD64|x86(_64)?|armv[5-8].*|aarch64|arm64)$")
+      set(CPUINFO_SUPPORTED TRUE)
+    else()
+      message(WARNING "Target processor architecture \"${onnxruntime_target_platform}\" is not supported in cpuinfo.")
     endif()
   endif()
-else()
-  set(CPUINFO_SUPPORTED FALSE)
+
+  if(NOT CPUINFO_SUPPORTED)
+    message(WARNING "onnxruntime_ENABLE_CPUINFO was set but cpuinfo is not supported.")
+  endif()
 endif()

 if (CPUINFO_SUPPORTED)
@@ -358,23 +349,26 @@ if (CPUINFO_SUPPORTED)

   # if this is a wasm build with xnnpack (only type of wasm build where cpuinfo is involved)
   # we do not use cpuinfo in ORT code, so don't define CPUINFO_SUPPORTED.
-  if (NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
-    string(APPEND CMAKE_CXX_FLAGS " -DCPUINFO_SUPPORTED")
+  if (CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND onnxruntime_USE_XNNPACK)
+  else()
+    add_compile_definitions(CPUINFO_SUPPORTED)
   endif()

-
   set(CPUINFO_BUILD_TOOLS OFF CACHE INTERNAL "")
   set(CPUINFO_BUILD_UNIT_TESTS OFF CACHE INTERNAL "")
   set(CPUINFO_BUILD_MOCK_TESTS OFF CACHE INTERNAL "")
   set(CPUINFO_BUILD_BENCHMARKS OFF CACHE INTERNAL "")
   if (onnxruntime_target_platform STREQUAL "ARM64EC" OR onnxruntime_target_platform STREQUAL "ARM64")
-    message(STATUS "Applying a patch for Windows ARM64/ARM64EC in cpuinfo")
+    message(STATUS "Applying patches for Windows ARM64/ARM64EC in cpuinfo")
     onnxruntime_fetchcontent_declare(
       pytorch_cpuinfo
       URL ${DEP_URL_pytorch_cpuinfo}
       URL_HASH SHA1=${DEP_SHA1_pytorch_cpuinfo}
       EXCLUDE_FROM_ALL
-      PATCH_COMMAND ${Patch_EXECUTABLE} -p1 < ${PROJECT_SOURCE_DIR}/patches/cpuinfo/patch_cpuinfo_h_for_arm64ec.patch
+      PATCH_COMMAND
+        ${Patch_EXECUTABLE} -p1 < ${PROJECT_SOURCE_DIR}/patches/cpuinfo/patch_cpuinfo_h_for_arm64ec.patch &&
+        # https://github.com/pytorch/cpuinfo/pull/324
+        ${Patch_EXECUTABLE} -p1 < ${PROJECT_SOURCE_DIR}/patches/cpuinfo/patch_vcpkg_arm64ec_support.patch
       FIND_PACKAGE_ARGS NAMES cpuinfo
     )
   else()
@@ -584,8 +578,7 @@ endif()

 set(onnxruntime_EXTERNAL_LIBRARIES ${onnxruntime_EXTERNAL_LIBRARIES_XNNPACK} ${WIL_TARGET} nlohmann_json::nlohmann_json
     onnx onnx_proto ${PROTOBUF_LIB} re2::re2 Boost::mp11 safeint_interface
-    flatbuffers::flatbuffers ${GSL_TARGET} ${ABSEIL_LIBS} date::date
-    ${ONNXRUNTIME_CLOG_TARGET_NAME} Eigen3::Eigen)
+    flatbuffers::flatbuffers ${GSL_TARGET} ${ABSEIL_LIBS} date::date Eigen3::Eigen)

 # The source code of onnx_proto is generated, we must build this lib first before starting to compile the other source code that uses ONNX protobuf types.
 # The other libs do not have the problem. All the sources are already there. We can compile them in any order.
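
The diff above switches from appending `-DCPUINFO_SUPPORTED` to the compiler flags to calling `add_compile_definitions(CPUINFO_SUPPORTED)`. Either way, C++ sources see the same preprocessor symbol. A minimal sketch of how a translation unit might branch on it is shown below; `CpuFeatureSource` is a hypothetical function used only for illustration and does not come from the ONNX Runtime sources.

```cpp
#include <string>

// Hypothetical example: reports which code path was compiled in.
// CPUINFO_SUPPORTED is the definition set by the CMake logic above;
// when the build does not define it, the fallback branch is compiled.
inline std::string CpuFeatureSource() {
#if defined(CPUINFO_SUPPORTED)
  return "cpuinfo";
#else
  return "fallback";
#endif
}
```

Built without the definition, `CpuFeatureSource()` returns "fallback"; built with `-DCPUINFO_SUPPORTED` (or the equivalent `add_compile_definitions` call), it returns "cpuinfo".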

cmake/onnxruntime.cmake

Lines changed: 12 additions & 1 deletion
@@ -350,8 +350,19 @@ if (winml_is_inbox)
   endif()
 endif()

-# Assemble the Apple static framework (iOS and macOS)
+# Assemble the Apple static framework
 if(onnxruntime_BUILD_APPLE_FRAMEWORK)
+  if (NOT CMAKE_SYSTEM_NAME MATCHES "Darwin|iOS|visionOS|tvOS")
+    message(FATAL_ERROR "onnxruntime_BUILD_APPLE_FRAMEWORK can only be enabled for macOS or iOS or visionOS or tvOS.")
+  endif()
+
+  list(LENGTH CMAKE_OSX_ARCHITECTURES CMAKE_OSX_ARCHITECTURES_LEN)
+  if (CMAKE_OSX_ARCHITECTURES_LEN GREATER 1)
+    # We stitch multiple static libraries together when onnxruntime_BUILD_APPLE_FRAMEWORK is true,
+    # but that would not work for universal static libraries
+    message(FATAL_ERROR "universal binary is not supported for apple framework")
+  endif()
+
   # when building for mac catalyst, the CMAKE_OSX_SYSROOT is set to MacOSX as well, to avoid duplication,
   # we specify as `-macabi` in the name of the output static apple framework directory.
   if (PLATFORM_NAME STREQUAL "macabi")

cmake/onnxruntime_common.cmake

Lines changed: 4 additions & 53 deletions
@@ -194,59 +194,10 @@ if(APPLE)
   target_link_libraries(onnxruntime_common PRIVATE "-framework Foundation")
 endif()

-if(MSVC)
-  if(onnxruntime_target_platform STREQUAL "ARM64")
-    set(ARM64 TRUE)
-  elseif (onnxruntime_target_platform STREQUAL "ARM")
-    set(ARM TRUE)
-  elseif(onnxruntime_target_platform STREQUAL "x64")
-    set(X64 TRUE)
-  elseif(onnxruntime_target_platform STREQUAL "x86")
-    set(X86 TRUE)
-  endif()
-elseif(APPLE)
-  if(CMAKE_OSX_ARCHITECTURES_LEN LESS_EQUAL 1)
-    set(X64 TRUE)
-  endif()
-elseif(NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
-  if (CMAKE_SYSTEM_NAME STREQUAL "Android")
-    if (CMAKE_ANDROID_ARCH_ABI STREQUAL "armeabi-v7a")
-      set(ARM TRUE)
-    elseif (CMAKE_ANDROID_ARCH_ABI STREQUAL "arm64-v8a")
-      set(ARM64 TRUE)
-    elseif (CMAKE_ANDROID_ARCH_ABI STREQUAL "x86_64")
-      set(X86_64 TRUE)
-    elseif (CMAKE_ANDROID_ARCH_ABI STREQUAL "x86")
-      set(X86 TRUE)
-    endif()
-  else()
-    execute_process(
-      COMMAND ${CMAKE_C_COMPILER} -dumpmachine
-      OUTPUT_VARIABLE dumpmachine_output
-      ERROR_QUIET
-    )
-    if(dumpmachine_output MATCHES "^arm64.*")
-      set(ARM64 TRUE)
-    elseif(dumpmachine_output MATCHES "^arm.*")
-      set(ARM TRUE)
-    elseif(dumpmachine_output MATCHES "^aarch64.*")
-      set(ARM64 TRUE)
-    elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^riscv64.*")
-      set(RISCV64 TRUE)
-    elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^(i.86|x86?)$")
-      set(X86 TRUE)
-    elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|amd64)$")
-      set(X86_64 TRUE)
-    endif()
-  endif()
-endif()
-
-if (RISCV64 OR ARM64 OR ARM OR X86 OR X64 OR X86_64)
-  # Link cpuinfo if supported
-  if (CPUINFO_SUPPORTED)
-    onnxruntime_add_include_to_target(onnxruntime_common cpuinfo::cpuinfo)
-    list(APPEND onnxruntime_EXTERNAL_LIBRARIES cpuinfo::cpuinfo ${ONNXRUNTIME_CLOG_TARGET_NAME})
-  endif()
+if(CPUINFO_SUPPORTED)
+  # Link cpuinfo if supported
+  onnxruntime_add_include_to_target(onnxruntime_common cpuinfo::cpuinfo)
+  list(APPEND onnxruntime_EXTERNAL_LIBRARIES cpuinfo::cpuinfo)
 endif()

 if (NOT onnxruntime_BUILD_SHARED_LIB)

cmake/onnxruntime_java.cmake

Lines changed: 2 additions & 2 deletions
@@ -159,7 +159,7 @@ if (WIN32)
   if(NOT onnxruntime_ENABLE_STATIC_ANALYSIS)
     add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime> ${JAVA_PACKAGE_LIB_DIR}/$<TARGET_FILE_NAME:onnxruntime>)
     add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime4j_jni> ${JAVA_PACKAGE_JNI_DIR}/$<TARGET_FILE_NAME:onnxruntime4j_jni>)
-    if (onnxruntime_USE_CUDA OR onnxruntime_USE_DNNL OR onnxruntime_USE_OPENVINO OR onnxruntime_USE_TENSORRT OR (onnxruntime_USE_QNN AND NOT onnxruntime_BUILD_QNN_EP_STATIC_LIB))
+    if (TARGET onnxruntime_providers_shared)
       add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime_providers_shared> ${JAVA_PACKAGE_LIB_DIR}/$<TARGET_FILE_NAME:onnxruntime_providers_shared>)
     endif()
     if (onnxruntime_USE_CUDA)
@@ -207,7 +207,7 @@ if (WIN32)
 else()
   add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime> ${JAVA_PACKAGE_LIB_DIR}/$<TARGET_LINKER_FILE_NAME:onnxruntime>)
   add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime4j_jni> ${JAVA_PACKAGE_JNI_DIR}/$<TARGET_LINKER_FILE_NAME:onnxruntime4j_jni>)
-  if (onnxruntime_USE_CUDA OR onnxruntime_USE_DNNL OR onnxruntime_USE_OPENVINO OR onnxruntime_USE_TENSORRT OR (onnxruntime_USE_QNN AND NOT onnxruntime_BUILD_QNN_EP_STATIC_LIB))
+  if (TARGET onnxruntime_providers_shared)
     add_custom_command(TARGET onnxruntime4j_jni POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:onnxruntime_providers_shared> ${JAVA_PACKAGE_LIB_DIR}/$<TARGET_LINKER_FILE_NAME:onnxruntime_providers_shared>)
   endif()
   if (onnxruntime_USE_CUDA)

cmake/onnxruntime_nodejs.cmake

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ include(node_helper.cmake)

 # setup ARCH
 if (APPLE)
+  list(LENGTH CMAKE_OSX_ARCHITECTURES CMAKE_OSX_ARCHITECTURES_LEN)
   if (CMAKE_OSX_ARCHITECTURES_LEN GREATER 1)
     message(FATAL_ERROR "CMake.js does not support multi-architecture for macOS")
   endif()

cmake/onnxruntime_unittests.cmake

Lines changed: 4 additions & 0 deletions
@@ -1640,6 +1640,10 @@ if (NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
       add_custom_command(TARGET onnxruntime_providers_qnn POST_BUILD
         COMMAND ${CMAKE_COMMAND} -E copy ${QNN_LIB_FILES} ${JAVA_NATIVE_TEST_DIR})
     endif()
+    if (WIN32)
+      set(EXAMPLE_PLUGIN_EP_DST_FILE_NAME $<IF:$<BOOL:${WIN32}>,$<TARGET_FILE_NAME:example_plugin_ep>,$<TARGET_LINKER_FILE_NAME:example_plugin_ep>>)
+      add_custom_command(TARGET custom_op_library POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different $<TARGET_FILE:example_plugin_ep> ${JAVA_NATIVE_TEST_DIR}/${EXAMPLE_PLUGIN_EP_DST_FILE_NAME})
+    endif()

     # delegate to gradle's test runner


Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index aedc983..dab589e 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -72,6 +72,17 @@ IF(CMAKE_SYSTEM_NAME MATCHES "FreeBSD" AND CPUINFO_TARGET_PROCESSOR STREQUAL "am
+ ENDIF()
+ IF(IS_APPLE_OS AND CMAKE_OSX_ARCHITECTURES MATCHES "^(x86_64|arm64.*)$")
+   SET(CPUINFO_TARGET_PROCESSOR "${CMAKE_OSX_ARCHITECTURES}")
++ELSEIF(MSVC AND CMAKE_VERSION VERSION_GREATER_EQUAL "3.10")
++  # Use CMAKE_C_COMPILER_ARCHITECTURE_ID. MSVC values are documented as available since CMake 3.10.
++  IF(CMAKE_C_COMPILER_ARCHITECTURE_ID STREQUAL "X86")
++    SET(CPUINFO_TARGET_PROCESSOR "x86")
++  ELSEIF(CMAKE_C_COMPILER_ARCHITECTURE_ID STREQUAL "x64")
++    SET(CPUINFO_TARGET_PROCESSOR "x86_64")
++  ELSEIF(CMAKE_C_COMPILER_ARCHITECTURE_ID MATCHES "^(ARM64|ARM64EC)$")
++    SET(CPUINFO_TARGET_PROCESSOR "arm64")
++  ELSE()
++    MESSAGE(FATAL_ERROR "Unsupported MSVC compiler architecture ID \"${CMAKE_C_COMPILER_ARCHITECTURE_ID}\"")
++  ENDIF()
+ ELSEIF(CMAKE_GENERATOR MATCHES "^Visual Studio " AND CMAKE_VS_PLATFORM_NAME)
+   IF(CMAKE_VS_PLATFORM_NAME STREQUAL "Win32")
+     SET(CPUINFO_TARGET_PROCESSOR "x86")
+@@ -88,7 +99,7 @@ ENDIF()
+
+ # ---[ Build flags
+ SET(CPUINFO_SUPPORTED_PLATFORM TRUE)
+-IF(NOT CMAKE_SYSTEM_PROCESSOR)
++IF(NOT CPUINFO_TARGET_PROCESSOR)
+   IF(NOT IOS)
+     MESSAGE(WARNING
+       "Target processor architecture is not specified. "
+@@ -201,12 +212,12 @@ IF(CPUINFO_SUPPORTED_PLATFORM)
+         src/arm/linux/chipset.c
+         src/arm/linux/midr.c
+         src/arm/linux/hwcap.c)
+-      IF(CMAKE_SYSTEM_PROCESSOR MATCHES "^armv[5-8]")
++      IF(CPUINFO_TARGET_PROCESSOR MATCHES "^armv[5-8]")
+         LIST(APPEND CPUINFO_SRCS src/arm/linux/aarch32-isa.c)
+         IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND ANDROID_ABI STREQUAL "armeabi")
+           SET_SOURCE_FILES_PROPERTIES(src/arm/linux/aarch32-isa.c PROPERTIES COMPILE_FLAGS -marm)
+         ENDIF()
+-      ELSEIF(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64)$")
++      ELSEIF(CPUINFO_TARGET_PROCESSOR MATCHES "^(aarch64|arm64)$")
+         LIST(APPEND CPUINFO_SRCS src/arm/linux/aarch64-isa.c)
+       ENDIF()
+     ELSEIF(IS_APPLE_OS AND CPUINFO_TARGET_PROCESSOR MATCHES "arm64.*")
+@@ -395,7 +406,7 @@ IF(CPUINFO_SUPPORTED_PLATFORM AND CPUINFO_BUILD_MOCK_TESTS)
+     TARGET_COMPILE_DEFINITIONS(cpuinfo_mock PRIVATE _GNU_SOURCE=1)
+   ENDIF()
+
+-  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv5te|armv7-a)$")
++  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CPUINFO_TARGET_PROCESSOR MATCHES "^(armv5te|armv7-a)$")
+     ADD_EXECUTABLE(atm7029b-tablet-test test/mock/atm7029b-tablet.cc)
+     TARGET_INCLUDE_DIRECTORIES(atm7029b-tablet-test BEFORE PRIVATE test/mock)
+     TARGET_LINK_LIBRARIES(atm7029b-tablet-test PRIVATE cpuinfo_mock gtest)
+@@ -577,7 +588,7 @@ IF(CPUINFO_SUPPORTED_PLATFORM AND CPUINFO_BUILD_MOCK_TESTS)
+     ADD_TEST(NAME xperia-sl-test COMMAND xperia-sl-test)
+   ENDIF()
+
+-  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv5te|armv7-a|aarch64)$")
++  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CPUINFO_TARGET_PROCESSOR MATCHES "^(armv5te|armv7-a|aarch64)$")
+     ADD_EXECUTABLE(alcatel-revvl-test test/mock/alcatel-revvl.cc)
+     TARGET_INCLUDE_DIRECTORIES(alcatel-revvl-test BEFORE PRIVATE test/mock)
+     TARGET_LINK_LIBRARIES(alcatel-revvl-test PRIVATE cpuinfo_mock gtest)
+@@ -774,7 +785,7 @@ IF(CPUINFO_SUPPORTED_PLATFORM AND CPUINFO_BUILD_MOCK_TESTS)
+     ADD_TEST(NAME xperia-c4-dual-test COMMAND xperia-c4-dual-test)
+   ENDIF()
+
+-  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(i686|x86_64)$")
++  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CPUINFO_TARGET_PROCESSOR MATCHES "^(i686|x86_64)$")
+     ADD_EXECUTABLE(alldocube-iwork8-test test/mock/alldocube-iwork8.cc)
+     TARGET_INCLUDE_DIRECTORIES(alldocube-iwork8-test BEFORE PRIVATE test/mock)
+     TARGET_LINK_LIBRARIES(alldocube-iwork8-test PRIVATE cpuinfo_mock gtest)
+@@ -831,7 +842,7 @@ IF(CPUINFO_SUPPORTED_PLATFORM AND CPUINFO_BUILD_UNIT_TESTS)
+     ADD_TEST(NAME brand-string-test COMMAND brand-string-test)
+   ENDIF()
+
+-  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv[5-8].*|aarch64)$")
++  IF(CMAKE_SYSTEM_NAME STREQUAL "Android" AND CPUINFO_TARGET_PROCESSOR MATCHES "^(armv[5-8].*|aarch64)$")
+     ADD_LIBRARY(android_properties_interface STATIC test/name/android-properties-interface.c)
+     CPUINFO_TARGET_ENABLE_C99(android_properties_interface)
+     CPUINFO_TARGET_RUNTIME_LIBRARY(android_properties_interface)
+@@ -879,7 +890,7 @@ IF(CPUINFO_SUPPORTED_PLATFORM AND CPUINFO_BUILD_TOOLS)
+   TARGET_LINK_LIBRARIES(cache-info PRIVATE cpuinfo)
+   INSTALL(TARGETS cache-info RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+
+-  IF(CMAKE_SYSTEM_NAME MATCHES "^(Android|Linux)$" AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv[5-8].*|aarch64)$")
++  IF(CMAKE_SYSTEM_NAME MATCHES "^(Android|Linux)$" AND CPUINFO_TARGET_PROCESSOR MATCHES "^(armv[5-8].*|aarch64)$")
+     ADD_EXECUTABLE(auxv-dump tools/auxv-dump.c)
+     CPUINFO_TARGET_ENABLE_C99(auxv-dump)
+     CPUINFO_TARGET_RUNTIME_LIBRARY(auxv-dump)
