-
Notifications
You must be signed in to change notification settings - Fork 122
3288 - CSV file empty headers fix #3473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3288 - CSV file empty headers fix #3473
Conversation
|
@igorbeslic could you review this PR? |
igorbeslic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to see you made it!! It looks good.
Go through my comments and make requested changes. We can sync in Monday 10th.
| BufferedReader bufferedReader, ReadConfiguration configuration) throws IOException { | ||
|
|
||
| MappingIterator<Object> iterator; | ||
| Iterator<CSVRecord> iterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not declare iterator here at all. Instead, just return iterator from method immediately after it is ready. either
- csvParser.iterator(); if there is header line or
- return csvFormat.parse(bufferedReader).iterator(); if no header line
NOTE that method doesn't need else clause. Content can stays and code is more readable without else.
| .readerForListOf(String.class) | ||
| .with(CsvParser.Feature.WRAP_AS_ARRAY) | ||
| .readValues(bufferedReader); | ||
| CSVFormat csvFormat = CSVFormat.DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karlocehulic19 please verify this. If DEFAULT format is hardcoded here it means that we can read CSV which is not in DEFAULT format only if CSV file has header.
We should be able to parse CSV with arbitrary delimiter no matter if header is present or not.
| .get(); | ||
|
|
||
| CSVParser csvParser = csvFormat.parse(bufferedReader); | ||
| iterator = csvParser.iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
Make inner public class CSVHeaderBuilder with method String[] asArray(String headerLine) which can be called:
CSVHeaderBuilder.asArray(String headerLine)
Simplify code now:
declare headerRow=null;
if configuration.headerRow() {
headerRow = CSVHeaderBuilder.asArray(bufferedReader.readLine())
}
CSVFormat csvFormat = CSVFormat.Builder.create()
.setIgnoreEmptyLines(false)
.setDelimiter(delimiter)
.setHeader(usableHeaderRow)
.get();
Look, now you have covered both, if header line is present or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always commit TEST source files separately from regular java source files. Test java code relates to runtime java code but those two are different worlds and can live separately.
For us, it is easier to drop commit and merge other commits with test java sources are good (and vice-versa).
| BufferedReader bufferedReader, ReadConfiguration configuration) throws IOException { | ||
|
|
||
| Iterator<CSVRecord> iterator; | ||
| List<String> regexSpecials = Arrays.asList(".", "+", "*", "?", "^", "$", "(", ")", "[", "]", "{", "}", "|", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use different name for this variable. I perfectly understand the goal. I think better name for reserved characters would make code more readable.
4d91fba to
51b6e41
Compare
|
@igorbeslic, I believe I fixed all requested changes mentioned in commit b6fb396. I am not sure why CI failed, I did gradle checks, and they passed locally. I also added another test case for csv file without a header row, but with a special delimiter (This test prevents CsvFormat.DEFAULT implementation to pass), can easily be dropped if it's not needed. We can sync up if needed. |
…nted conversions and exceptions
… place, avoid duplication)
This creates an parser for header that explicitly wraps null Apaches CSV pareser is mandatory Jackson's parser doesn't allow for custom headers
51b6e41 to
fa4415a
Compare
|
@karlocehulic19 we merged... there are couple technical details that we have to go through. We disabled test because read logic is not good... This is HINT... |
Description
Enables empty and duplicate header names by creating a small header parser and then using the parsed header string. This is a draft PR to confirm that this implementation satisfies CSV component usage needs.
Fixes #3288
This successfully passes all integration and unit tests for the CSV component, but it fails in the Gradle check, and I have yet to figure out why, after I confirm this is a valid approach for the given issue.