@yashwantbezawada commented Nov 6, 2025

Description

Fixes #62915

When concatenating DatetimeIndex objects across DST transitions, the frequency preservation logic in _concat_same_type was failing because it used naive addition that didn't account for timezone offset changes at DST boundaries.

Problem

At line 2386 in pandas/core/arrays/datetimelike.py, the code checked:

if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):

When crossing a DST boundary (e.g., Europe/Helsinki, where DST ended on 2025-10-26):

  • pair[0][-1] = 2025-10-26 00:00:00+03:00
  • pair[0][-1] + Day() = 2025-10-27 00:00:00+03:00 (naive add, offset unchanged)
  • pair[1][0] = 2025-10-27 00:00:00+02:00 (actual value after DST)

These timestamps are not equal due to different UTC offsets, causing the assertion to fail even though they represent consecutive days.
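The gap between the two boundary timestamps can be checked with the standard library alone. The sketch below is not pandas code: it uses fixed UTC offsets as stand-ins for Europe/Helsinki on either side of the transition (an assumption taken from the offsets quoted above) and contrasts elapsed time with the wall-clock difference.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Europe/Helsinki before (+03:00)
# and after (+02:00) the transition, matching the values quoted above.
EEST = timezone(timedelta(hours=3))
EET = timezone(timedelta(hours=2))

last_of_left = datetime(2025, 10, 26, tzinfo=EEST)   # plays the role of pair[0][-1]
first_of_right = datetime(2025, 10, 27, tzinfo=EET)  # plays the role of pair[1][0]

# Subtracting aware datetimes measures absolute (UTC) elapsed time.
elapsed = first_of_right - last_of_left

# Dropping tzinfo compares the wall-clock readings only.
wall_clock = first_of_right.replace(tzinfo=None) - last_of_left.replace(tzinfo=None)

print(elapsed)     # 1 day, 1:00:00
print(wall_clock)  # 1 day, 0:00:00
```

The two midnights are one calendar day apart on the wall clock but 25 hours apart in elapsed time, so whether adding a one-day offset lands on the second timestamp depends on whether the offset is applied as 24 elapsed hours or as one calendar day.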

Solution

For fixed (Tick) frequencies like Day, Hour, etc., the fix compares the underlying int64 values (UTC nanoseconds since epoch) instead of relying on timezone-aware arithmetic:

freq_nanos = obj.freq.nanos
if all(pair[1][0]._value - pair[0][-1]._value == freq_nanos for pair in pairs):

This correctly identifies consecutive timestamps regardless of DST transitions.
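As a rough illustration of the int64 comparison, the sketch below reimplements the idea with the standard library. The helpers `epoch_nanos` and `boundaries_match` are hypothetical names for this example, not pandas functions, and fixed UTC offsets again stand in for Europe/Helsinki.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_HOUR = 3_600 * 10**9
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_nanos(ts: datetime) -> int:
    """UTC nanoseconds since the epoch (hypothetical stand-in for Timestamp._value)."""
    return (ts - EPOCH) // timedelta(microseconds=1) * 1_000


def boundaries_match(pairs, freq_nanos: int) -> bool:
    """True if every right-hand start is exactly freq_nanos after the left-hand end."""
    return all(
        epoch_nanos(right[0]) - epoch_nanos(left[-1]) == freq_nanos
        for left, right in pairs
    )


EEST = timezone(timedelta(hours=3))
EET = timezone(timedelta(hours=2))

# Hourly data across the fall-back hour: 03:00 appears twice on the wall clock.
hourly_left = [datetime(2025, 10, 26, 2, tzinfo=EEST), datetime(2025, 10, 26, 3, tzinfo=EEST)]
hourly_right = [datetime(2025, 10, 26, 3, tzinfo=EET), datetime(2025, 10, 26, 4, tzinfo=EET)]
hourly_ok = boundaries_match([(hourly_left, hourly_right)], NANOS_PER_HOUR)

# Daily midnights across the same transition, checked against a 24-hour step.
daily_left = [datetime(2025, 10, 26, tzinfo=EEST)]
daily_right = [datetime(2025, 10, 27, tzinfo=EET)]
daily_ok = boundaries_match([(daily_left, daily_right)], 24 * NANOS_PER_HOUR)

print(hourly_ok, daily_ok)
```

In this sketch the hourly boundary is exactly one hour of elapsed time apart, so the check passes despite the repeated wall-clock hour. The daily boundary is 25 elapsed hours apart, so comparing it against a 24-hour step does not match; it is worth verifying which frequencies the int64 comparison is meant to cover.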

For non-fixed frequencies like MonthEnd or BusinessDay where freq.nanos raises ValueError, the code falls back to the original comparison method:

except (ValueError, AttributeError):
    # Non-fixed frequency, fall back to original comparison
    pairs_match = all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs)
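The try/except structure described here can be sketched in isolation. Everything below is illustrative: `FixedStep` and `MonthBeginStep` are hypothetical stand-ins for fixed and non-fixed offsets, `apply` is an invented method name, and `pairs_match` is written as a function for clarity rather than copied from the patch.

```python
from datetime import date


class FixedStep:
    """Hypothetical fixed-length offset: exposes its size via .nanos."""

    def __init__(self, nanos: int):
        self.nanos = nanos


class MonthBeginStep:
    """Hypothetical non-fixed offset: .nanos raises, as described above."""

    @property
    def nanos(self):
        raise ValueError("non-fixed frequency has no nanos")

    def apply(self, d: date) -> date:
        # First day of the following month, rolling over the year in December.
        return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def pairs_match(pairs, freq) -> bool:
    try:
        freq_nanos = freq.nanos
    except (ValueError, AttributeError):
        # Non-fixed frequency: fall back to stepping the left end forward.
        return all(freq.apply(left[-1]) == right[0] for left, right in pairs)
    # Fixed frequency: compare integer differences directly.
    return all(right[0] - left[-1] == freq_nanos for left, right in pairs)


fixed_ok = pairs_match([([0, 10], [20, 30])], FixedStep(10))
months = [([date(2025, 11, 1), date(2025, 12, 1)], [date(2026, 1, 1)])]
calendar_ok = pairs_match(months, MonthBeginStep())
print(fixed_ok, calendar_ok)
```

Keeping only the `.nanos` lookup inside the `try` means an unrelated `ValueError` raised during the comparison itself is not silently routed to the fallback path.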

Testing

Added test_union_dst_boundary in test_setops.py, which reproduces the exact scenario from the issue report using the Europe/Helsinki DST transition.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New tests added to cover the fix

@yashwantbezawada
Author

Closing this PR as the bug is already fixed on main by commit 1bd1830 (#61985). I was incorrectly testing with pandas 2.3.3 (released version) instead of the main development branch. The fix is already in place and just needs a regression test, which others have already claimed. Thank you @rhshadrach for clarifying!

Development

Successfully merging this pull request may close these issues.

BUG: index.union fails at DST boundary
