CoW: add readonly flag to ExtensionArrays, return read-only EA/ndarray in .array/EA.to_numpy() #61925

jorisvandenbossche · 2025-07-22T22:51:04Z

Addresses one of the remaining TODO items from #48998

Similar to #51082 and some follow-up PRs, this ensures we also mark EAs as read-only, as we already do for numpy arrays, when the user gets the underlying EA from a pandas object.
For that purpose, this adds a _readonly attribute to the EA class that is False by default.
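
To make the idea concrete, here is a minimal sketch of a class-level read-only flag on an array wrapper. `ToyArray` and `readonly_view` are hypothetical names for illustration only, not the pandas classes, and the error type and message are assumptions chosen to resemble what numpy does for a locked ndarray.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not the pandas class)."""

    # Class-level default, so existing instances and subclasses stay writable.
    _readonly = False

    def __init__(self, data):
        # np.asarray does not copy an existing ndarray, so wrappers can share a buffer.
        self._data = np.asarray(data)

    def __setitem__(self, key, value):
        if self._readonly:
            # Assumed error type and message, for illustration only.
            raise ValueError("Cannot modify read-only array")
        self._data[key] = value

    def view(self):
        # New wrapper around the same buffer; the flag starts at the default.
        return ToyArray(self._data)


def readonly_view(arr):
    """Hypothetical helper: hand out a view that refuses writes."""
    out = arr.view()
    out._readonly = True
    return out


owned = ToyArray([1, 2, 3])
exposed = readonly_view(owned)
owned[0] = 10  # the owner can still write, and the view sees the change
```

Only the object handed to the user is flagged here; the wrapper held internally stays writable.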

Still need to add more tests and fix a bunch of tests

closes #58007

simonjayhawkins · 2025-07-23T10:16:18Z

pandas/core/arrays/base.py

    #  strictly less than 2000 to be below Index.__pandas_priority__.
    __pandas_priority__ = 1000

+    _readonly = False


why not use arr.flags.writeable to be consistent with numpy?

Because this was easier for a quick POC ;)
It would indeed keep usage more consistent, so that might be a reason to add a flags attribute: code that needs to work with both ndarray and EA could then use one code path. But I don't think we would ever add any of the other flags that numpy has, so I'm not sure it would be worth adding a nested attribute just for this.
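
As a rough illustration of the "one code path" concern, code that has to handle both kinds of object could hide the difference behind a small helper. `is_readonly` and `ToyArray` below are hypothetical names; the only assumptions are that ndarrays expose `flags.writeable` and that the EA-like object carries a `_readonly` attribute as described in this PR.

```python
import numpy as np


def is_readonly(arr) -> bool:
    """Hypothetical helper: report read-only status for ndarray or EA-like objects."""
    flags = getattr(arr, "flags", None)
    if flags is not None:
        # numpy spelling: a nested flags object with a writeable attribute
        return not flags.writeable
    # EA-like spelling assumed here: a plain boolean attribute
    return bool(getattr(arr, "_readonly", False))


class ToyArray:
    """Hypothetical EA-like object with only the flag."""

    _readonly = False


nd = np.arange(3)
toy = ToyArray()
before = (is_readonly(nd), is_readonly(toy))

nd.flags.writeable = False
toy._readonly = True
after = (is_readonly(nd), is_readonly(toy))
```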

pandas/_libs/ops.pyx

jbrockmendel · 2025-07-24T16:12:16Z

pandas/core/arrays/base.py

+        elif self._readonly and astype_is_view(self.dtype, result.dtype):
+            # If the ExtensionArray is readonly, make the numpy array readonly too
+            result = result.view()
+            result.flags.writeable = False


should this be done below the setting of na_value on L616?

I don't think so, because in that case the result array is already a copy, so there is no need to take a read-only view.
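
A small numpy-only sketch of why the diff takes a `.view()` before flipping the flag, and why a copy does not need the same protection. The function name `readonly_view_of` is hypothetical.

```python
import numpy as np


def readonly_view_of(arr: np.ndarray) -> np.ndarray:
    """Return a read-only view without touching the flags of `arr` itself."""
    result = arr.view()
    result.flags.writeable = False
    return result


internal = np.array([1.0, 2.0, 3.0])

exported = readonly_view_of(internal)
# The view shares memory with the internal buffer, so it must be protected.
shares = np.shares_memory(internal, exported)

# A result that was already copied (e.g. to fill in a missing-value marker)
# has its own buffer, so writes to it cannot leak back.
filled = internal.copy()
filled[1] = -1.0
```

Setting the flag directly on `internal` instead would lock the buffer that the owning object still needs to write to.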

pandas/tests/arrays/test_datetimelike.py

pandas/core/arrays/sparse/array.py

jbrockmendel · 2025-07-24T18:25:44Z

i get why .values and .array are made read-only, but why are we bothering with to_numpy?

jorisvandenbossche · 2025-07-24T22:13:37Z

That's a good question, I didn't really think about it deeply. But for the non-extension dtypes we also did it for .values / __array__ and to_numpy() (#51082), so I followed along here.

I do think there is value in being consistent across those different ways to get a numpy array from the pandas object. So one could also ask: why not for to_numpy()? And compared to .values, to_numpy() actually gives you more control, with the ability to ask for a copy.
(In practice the implementations of __array__ and to_numpy() also overlap quite a bit for the EAs.)

jbrockmendel · 2025-07-25T14:35:14Z

So could also ask, why not for to_numpy()?

I don't feel strongly about this, but asked in the first place because it seems most of the code complexity in this PR is driven by to_numpy changes. Without that, most of this is just boilerplate edits to __getitem__ methods.

The main reason i can think of to treat to_numpy differently from .array and .values is that it has an explicit copy keyword. With copy=False, the user ideally understands that they are getting a view on existing data.

jorisvandenbossche · 2025-08-03T09:27:04Z

asked in the first place because it seems most of the code complexity in this PR is driven by to_numpy changes.

Looking at the diff again, I think it is about 50/50 between to_numpy() and __array__. But to_numpy() also reuses the result from __array__ in some cases, so if we wanted to_numpy() to consistently not return readonly data, that would also require some changes in to_numpy(). So regarding the implementation, I'm not entirely sure this would be a lot simpler (but I didn't look in detail).

The main reason i can think of to treat to_numpy differently from .array and .values is that it has an explicit copy keyword. With copy=False, the user ideally understands that they are getting a view on existing data.

Yeah, we could potentially also change the default of copy to None instead of False, with the same meaning (i.e. avoid a copy if possible), so that if someone explicitly passes copy=False, we wouldn't set the readonly flag.
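
To spell out what that would mean, here is a purely hypothetical sketch of the proposed keyword semantics. It is not what pandas does; `to_numpy_sketch` and its `readonly` parameter are made up for illustration.

```python
import numpy as np


def to_numpy_sketch(values: np.ndarray, *, readonly: bool, copy=None) -> np.ndarray:
    """Hypothetical semantics: copy=None means 'avoid a copy, but protect the view'."""
    if copy:
        # An explicit copy is always safe to mutate.
        return values.copy()
    result = values.view()
    if readonly and copy is None:
        # Default path: zero-copy, but locked.
        result.flags.writeable = False
    # An explicit copy=False falls through: the caller asked for a view knowingly.
    return result


vals = np.arange(3)
default = to_numpy_sketch(vals, readonly=True)
explicit_view = to_numpy_sketch(vals, readonly=True, copy=False)
explicit_copy = to_numpy_sketch(vals, readonly=True, copy=True)
```

With these semantics only the default call returns a locked array; both explicit choices return something writable, one sharing memory and one not.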

From previous discussions (maybe #52823), I seem to remember that at some point we did bring up whether it would be worth having a keyword to control this behaviour, i.e. a way to ask for a numpy array that is guaranteed to be mutable. Of course you could do to_numpy(copy=True), which also guarantees that, but that doesn't cover the case where you want to get the data zero-copy if possible and you know that mutating it is fine (for example because the holding DataFrame or Series is discarded after converting).
At the moment, the documentation (https://pandas.pydata.org/docs/dev/user_guide/copy_on_write.html#read-only-numpy-arrays) suggests to manually reset the readonly flag:

arr = ser.to_numpy()
arr.flags.writeable = True

instead of adding a keyword like arr = ser.to_numpy(ensure_writable=True). But in theory copy=False could also cover that.

(but this is probably a discussion for #52823)
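
For reference, the effect of resetting the flag can be reproduced with plain numpy, without pandas. This sketch assumes the exported array is a locked view of a buffer that is itself still writable; the variable names are illustrative.

```python
import numpy as np

# Stand-in for the buffer owned by the Series.
base = np.array([1, 2, 3])

# Stand-in for what a read-only export looks like: a locked view.
arr = base.view()
arr.flags.writeable = False

# Opting back in, as the user guide suggests.
arr.flags.writeable = True
arr[0] = 100

# The write went through to the shared buffer.
shared_value = int(base[0])
```

This is why the reset is a deliberate opt-out: after it, writes reach the data still held by the original object.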

pandas/core/arrays/_mixins.py

jorisvandenbossche · 2025-09-21T15:07:03Z

Are there any other public APIs with inplace keywords (that actually modify inplace) we should test a read only EA with?

Good point to verify. For the EA class itself, it seems we have some methods that can act inplace:

- EA.fillna() if copy=False. For the default base implementation, this uses self[:] and then normal setitem, so depending on the point below about getitem, this will be covered. But I can add a specific test to ensure we do this consistently for other EAs as well, which might have a custom implementation of fillna.
- __getitem__ with a slice returns a view, and should thus keep the read-only flag?
- (EA.insert() does not work inplace)
- EA._putmask() -> goes through setitem
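
For comparison, this is how plain numpy already behaves for the getitem question: a basic slice is a view and inherits the read-only flag, while list or boolean-mask indexing produces a fresh, writable copy.

```python
import numpy as np

ro = np.arange(5)
ro.flags.writeable = False

sliced = ro[1:4]     # basic slicing: view on the same memory
picked = ro[[0, 2]]  # integer-list indexing: new array
masked = ro[ro > 2]  # boolean-mask indexing: new array

flags = (sliced.flags.writeable, picked.flags.writeable, masked.flags.writeable)
```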

For DataFrame et al, there are indeed some methods that allow inplace modifications, and that causes a problem with underlying read-only arrays, but that is not specific to EAs. It is already an issue on main for default numpy dtypes if they end up using read-only arrays. This is essentially #62144, but not specifically for __setitem__: it also applies to other inplace methods (so let's discuss this there?).

mroeschke

cc @jbrockmendel if you want to have another look

jbrockmendel · 2025-10-21T17:35:06Z

pandas/core/arrays/base.py

        """
        raise AbstractMethodError(self)

+    def _getitem_returns_view(self, key) -> bool:


do we expect anyone to override this? i.e. does it need to be a method? or just convenient to put it here?

i guess this makes it easy for subclass authors to use

We currently don't override it anywhere; I just put it in the base class because it gets used in the __getitem__ implementation of various subclasses. So indeed it does not need to be a method.

I don't expect EA authors to override the method (and if they did, they would essentially just be defining it for their own use; overriding it would not influence anything else on the base class).
They might want to use it for their own getitem implementation, and at that point it is a bit easier to have it as a method instead of a helper function somewhere. But let's start conservative and not expose it to EA authors (we can always add it later).
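
A hedged sketch of what such a helper and its use in a getitem implementation might look like. The function body is a conservative guess for a numpy-backed array, not the code from this PR, and `ToyArray` is a hypothetical wrapper.

```python
import numpy as np


def getitem_returns_view(key) -> bool:
    """Conservative guess: only basic indexing (slices, Ellipsis) yields a view."""
    if isinstance(key, tuple):
        return all(isinstance(k, slice) or k is Ellipsis for k in key)
    return isinstance(key, slice) or key is Ellipsis


class ToyArray:
    """Hypothetical numpy-backed wrapper (not a pandas class)."""

    _readonly = False

    def __init__(self, data):
        self._data = np.asarray(data)

    def __getitem__(self, key):
        if isinstance(key, (int, np.integer)):
            # Scalar access returns an element, so there is nothing to flag.
            return self._data[key]
        result = ToyArray(self._data[key])
        if self._readonly and getitem_returns_view(key):
            # Only results that share memory need to inherit the flag.
            result._readonly = True
        return result


arr = ToyArray([1, 2, 3, 4])
arr._readonly = True
view_result = arr[1:3]
copy_result = arr[[0, 1]]
```

Keeping the check in a standalone function means subclasses can call it without it becoming part of the interface that EA authors are expected to override.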

jbrockmendel · 2025-10-21T17:41:13Z

pandas/tests/extension/test_sparse.py

+        result = data.fillna(data_missing[1])
+        assert result[0] == data_missing[1]
+
+        # copy=False is ignored -> so same result as above


this comment (in a few places) confused me until (i think) i figured out it refers to the keyword in the fillna method. could clarify for future-me

Looking back at it, the first comment above "copy=False keyword is not ignored by SparseArray.fillna" also seems wrong, because SparseArray.fillna just completely ignores copy.

Attempted to clarify this.

jbrockmendel · 2025-10-21T17:41:44Z

small comments, no objections

jbrockmendel · 2025-10-22T18:09:30Z

pandas/tests/extension/test_sparse.py

        super().test_fillna_no_op_returns_copy(data)

+    def test_fillna_readonly(self, data_missing):
+        # copy=False keyword is not ignored by SparseArray.fillna


not necessarily for this PR, but should we add an EA attribute specifying whether copy=False is ignored? that way we could avoid overriding tests?

Or could also be an attribute on the Test class

yah either way it can be saved for a follow-up
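
One possible shape for the test-class variant of that follow-up, sketched with toy classes. All names here (`ToyDense`, `ToySparse`, `BaseFillnaChecks`, `fillna_ignores_copy`) are hypothetical and do not exist in the pandas test suite.

```python
class ToyDense:
    """Hypothetical array whose fillna honors copy=False by filling in place."""

    def __init__(self, data):
        self.data = list(data)

    def fillna(self, value, copy=True):
        target = type(self)(self.data) if copy else self
        target.data = [value if x is None else x for x in target.data]
        return target


class ToySparse(ToyDense):
    """Hypothetical array whose fillna ignores copy and always returns a new object."""

    def fillna(self, value, copy=True):
        return super().fillna(value, copy=True)


class BaseFillnaChecks:
    # Subclasses flip this attribute instead of overriding the whole test.
    fillna_ignores_copy = False
    array_type = ToyDense

    def check_fillna_copy_false(self):
        arr = self.array_type([1, None, 3])
        result = arr.fillna(0, copy=False)
        assert result.data == [1, 0, 3]
        # A new object is expected exactly when copy=False is ignored.
        assert (result is not arr) == self.fillna_ignores_copy


class SparseFillnaChecks(BaseFillnaChecks):
    fillna_ignores_copy = True
    array_type = ToySparse


BaseFillnaChecks().check_fillna_copy_false()
SparseFillnaChecks().check_fillna_copy_false()
```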

jbrockmendel · 2025-11-07T18:18:31Z

thanks @jorisvandenbossche

CoW: add readonly flag to ExtensionArrays, return read-only EA/ndarra…

a9df51b

…y in .array/EA.to_numpy()

jorisvandenbossche added the Copy / view semantics label Jul 22, 2025

jorisvandenbossche mentioned this pull request Dec 11, 2023

Copy-on-Write (PDEP-7) follow-up overview issue #48998

Open

38 tasks

jorisvandenbossche added 5 commits July 23, 2025 01:16

cleanup

9cd6e4f

fixup attribute name in tests

c6f37d1

fix tests

8058d9a

more test fixes

91465ee

add tests for .array being readonly

856dc02

jorisvandenbossche mentioned this pull request Jul 23, 2025

TST[string]: update expecteds for using_string_dtype to fix xfails #61727

Merged

7 tasks

simonjayhawkins reviewed Jul 23, 2025

jorisvandenbossche requested a review from jbrockmendel July 23, 2025 22:04

jbrockmendel reviewed Jul 24, 2025

pandas/_libs/ops.pyx

jbrockmendel reviewed Jul 24, 2025

pandas/tests/arrays/test_datetimelike.py (outdated)

jbrockmendel reviewed Jul 24, 2025

pandas/core/arrays/sparse/array.py (outdated)

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

828fadc

typing

ee1ed6e

mroeschke reviewed Aug 13, 2025

pandas/core/arrays/_mixins.py (outdated)

jorisvandenbossche added 3 commits August 19, 2025 17:14

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

3f7bc3e

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

1765fe7

address feedback: use _values in tests + add comment

a7abee3

jorisvandenbossche mentioned this pull request Sep 7, 2025

TYP: update EA.view() type annotation to indicate it returns an EA without dtype parameter #62285

Merged

jorisvandenbossche added 4 commits September 8, 2025 21:28

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

4b6ced0

update typing

f235aa3

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

f76bbc8

fix numpy test setup

5cfb0f8

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

0af4b39

jorisvandenbossche modified the milestones: 3.0, 2.3.3 Sep 21, 2025

jorisvandenbossche added 2 commits September 21, 2025 12:11

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

959451e

add whatsnew note

a4accf8

jorisvandenbossche mentioned this pull request Sep 21, 2025

BUG / API: setitem on pandas object fails if underlying numpy array is read-only #62144

Open

jorisvandenbossche added 2 commits September 21, 2025 17:14

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

d5d4db4

let getitem propagate readonly property

84e83c7

jorisvandenbossche modified the milestones: 2.3.3, 3.0 Sep 29, 2025

jorisvandenbossche added 6 commits October 8, 2025 19:35

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

6ec9b06

fixup merge

9f78d76

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

f6f300e

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

c243f75

fix typing

fa063bf

add test for fillna

ef4a36f

mroeschke approved these changes Oct 21, 2025

jbrockmendel reviewed Oct 21, 2025

jbrockmendel reviewed Oct 22, 2025

jorisvandenbossche added 3 commits November 5, 2025 21:03

Merge remote-tracking branch 'upstream/main' into cow-ea-readonly

3fcc02f

move getitem_returns_view from method to helper function

649f052

clarify comments in test_fillna_readonly

605716f

jbrockmendel merged commit f1904ae into pandas-dev:main Nov 7, 2025
42 checks passed

jbrockmendel mentioned this pull request Nov 8, 2025

not necessarily for this PR, but should we add an EA attribute specifying whether copy=False is ignored? that way we could avoid overriding tests? #63040

Open

jorisvandenbossche deleted the cow-ea-readonly branch November 10, 2025 08:29

jorisvandenbossche mentioned this pull request Nov 11, 2025

COMPAT: setting crs inplace for pandas 3+ where .array is a read-only view geopandas/geopandas#3672

Open
