-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
ENH: add dtype_from_format option to preserve Excel text formatting #63037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Overall, this seems far too complex an implementation for the feature being implemented. Why can we not just determine the format upon reading each cell and react appropriately?
In addition, it does not seem to me pandas should ever read a cell as numeric if that cell is text. I think we should not implement this as a flag.
| ordered_levels.append(level_idx) | ||
| return ordered_levels | ||
|
|
||
| def _convert_index_labels(self, index, levels_to_convert: list[int]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like code got duplicated?
| ordered_levels.append(level_idx) | ||
| return ordered_levels | ||
|
|
||
| def _convert_index_labels(self, index, levels_to_convert: list[int]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks the same?
| This behavior currently applies to the ``openpyxl`` and ``xlrd`` engines. Other | ||
| engines simply ignore the flag until text format detection is implemented for | ||
| them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm negative on diverging behavior between readers unless absolutely necessary.
| @staticmethod | ||
| def _parser_engine(parser): | ||
| return getattr(parser, "_engine", parser) | ||
|
|
||
| @classmethod | ||
| def _parser_attr(cls, parser, attribute: str): | ||
| if hasattr(parser, attribute): | ||
| return getattr(parser, attribute) | ||
| engine = cls._parser_engine(parser) | ||
| if engine is not parser and hasattr(engine, attribute): | ||
| return getattr(engine, attribute) | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these necessary?
|
|
||
| @staticmethod | ||
| def _cell_is_text_formatted(cell) -> bool: | ||
| number_format = getattr(cell, "number_format", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does cell have and not have the number_format attribute?
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.