@@ -89,15 +89,6 @@ or convert from existing pandas data:
8989 However there are four distinct :class:`StringDtype` variants that may be utilized.
9090See :ref:`text.four_string_variants` section below for details.
9191
92- .. _text.differences:
93-
94- Behavior differences
95- ====================
96-
97- There are various behavior differences between using NumPy ``object`` dtype,
98- ``dtype="str"``, and ``dtype="string"``. See the
99- :ref:`String migration guide <string_migration_guide-differences>` section for further details.
100-
10192.. _text.string_methods:
10293
10394String methods
@@ -686,6 +677,77 @@ String ``Index`` also supports ``get_dummies`` which returns a ``MultiIndex``.
686677
687678 See also :func:`~pandas.get_dummies`.
688679
680+ .. _text.differences:
681+
682+ Behavior differences
683+ ====================
684+
685+ Differences in behavior will be primarily due to the kind of NA value.
686+
687+ ``StringDtype`` with ``np.nan`` NA values
688+ -----------------------------------------
689+
690+ 1. Like ``dtype="object"``, :ref:`string accessor methods<api.series.str>`
691+    that return **integer** output will return a NumPy array that is
692+    either dtype int or float depending on the presence of NA values.
693+    Methods returning **boolean** output will return a NumPy array that is
694+    dtype bool, with the value ``False`` when an NA value is encountered.
695+
696+    .. ipython:: python
697+
698+       s = pd.Series(["a", None, "b"], dtype="str")
699+       s
700+       s.str.count("a")
701+       s.dropna().str.count("a")
702+
703+    When NA values are present, the output dtype is float64. However,
704+    **boolean** output results in ``False`` for the NA values.
705+
706+    .. ipython:: python
707+
708+       s.str.isdigit()
709+       s.str.match("a")
710+
711+ 2. Some string methods, like :meth:`Series.str.decode`, are not
712+    available because the underlying array can only contain
713+    strings, not bytes.
714+ 3. Comparison operations will return a NumPy array with dtype bool. Missing
715+    values will always compare as unequal, just as :attr:`np.nan` does.
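
As a minimal sketch of the comparison behavior in point 3, the snippet below assumes a pandas version in which ``dtype="str"`` selects the NaN-backed string dtype (the default in pandas 3.0). The variable names ``eq`` and ``ne`` are illustrative only.

```python
import pandas as pd

# Assumption: dtype="str" maps to the NaN-backed StringDtype (pandas 3.0 default).
s = pd.Series(["a", None, "b"], dtype="str")

# Equality yields a plain bool result; the missing entry compares as False.
eq = s == "a"
print(eq.dtype)     # bool
print(eq.tolist())  # [True, False, False]

# Inequality treats the missing entry as unequal, so it compares as True.
ne = s != "a"
print(ne.tolist())  # [False, True, True]
```

Because the result is ordinary ``bool``, it can be used directly as a mask without handling missing values first.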
716+
717+ ``StringDtype`` with ``pd.NA`` NA values
718+ ----------------------------------------
719+
720+ 1. :ref:`String accessor methods<api.series.str>`
721+    that return **integer** output will always return a nullable integer dtype,
722+    rather than either int or float dtype (depending on the presence of NA values).
723+    Methods returning **boolean** output will return a nullable boolean dtype.
724+
725+    .. ipython:: python
726+
727+       s = pd.Series(["a", None, "b"], dtype="string")
728+       s
729+       s.str.count("a")
730+       s.dropna().str.count("a")
731+
732+    Both outputs are ``Int64`` dtype. Similarly, methods returning boolean values give a nullable boolean dtype.
733+
734+    .. ipython:: python
735+
736+       s.str.isdigit()
737+       s.str.match("a")
738+
739+ 2. Some string methods, like :meth:`Series.str.decode`, are not available
740+    because the underlying array can only contain strings, not bytes.
741+ 3. Comparison operations will return an object with :class:`BooleanDtype`,
742+    rather than a ``bool`` dtype object. Missing values will propagate
743+    in comparison operations, rather than always comparing
744+    unequal like :attr:`numpy.nan`.
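
The propagation described in point 3 can be sketched as follows. This is an illustrative example only, and the variable name ``eq`` is not part of the original text.

```python
import pandas as pd

# dtype="string" selects the pd.NA-backed StringDtype.
s = pd.Series(["a", None, "b"], dtype="string")

# The comparison returns a nullable boolean result; the missing entry stays missing.
eq = s == "a"
print(eq.dtype)            # boolean
print(eq.isna().tolist())  # [False, True, False]

# Decide explicitly how missing results should be treated, e.g. before masking.
print(eq.fillna(False).tolist())  # [True, False, False]
```

Filling the missing results explicitly, as in the last line, keeps the choice of how to treat NA visible at the call site.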
745+
746+
747+ .. important::
748+    Everything else that follows in the rest of this document applies equally to
749+    ``'str'``, ``'string'``, and ``object`` dtype.
750+
689751.. _text.four_string_variants:
690752
691753The four :class:`StringDtype` variants