Skip to content

Conversation

@karlocehulic19
Copy link
Contributor

@karlocehulic19 karlocehulic19 commented Nov 5, 2025

Description

Enables empty and duplicate header names by creating a small header parser and then using the parsed header string. This is a draft PR to confirm that this implementation satisfies CSV component usage needs.

Fixes #3288

This successfully passes all integration and unit tests for the CSV component, but it fails in the Gradle check, and I have yet to figure out why, after I confirm this is a valid approach for the given issue.

@karlocehulic19
Copy link
Contributor Author

@igorbeslic could you review this PR?

Copy link
Collaborator

@igorbeslic igorbeslic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to see you made it!! It looks good.

Go through my comments and make requested changes. We can sync in Monday 10th.

BufferedReader bufferedReader, ReadConfiguration configuration) throws IOException {

MappingIterator<Object> iterator;
Iterator<CSVRecord> iterator;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not declare iterator here at all. Instead, just return iterator from method immediately after it is ready. either

  • csvParser.iterator(); if there is header line or
  • return csvFormat.parse(bufferedReader).iterator(); if no header line
    NOTE that method doesn't need else clause. Content can stays and code is more readable without else.

.readerForListOf(String.class)
.with(CsvParser.Feature.WRAP_AS_ARRAY)
.readValues(bufferedReader);
CSVFormat csvFormat = CSVFormat.DEFAULT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlocehulic19 please verify this. If DEFAULT format is hardcoded here it means that we can read CSV which is not in DEFAULT format only if CSV file has header.
We should be able to parse CSV with arbitrary delimiter no matter if header is present or not.

.get();

CSVParser csvParser = csvFormat.parse(bufferedReader);
iterator = csvParser.iterator();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
Make inner public class CSVHeaderBuilder with method String[] asArray(String headerLine) which can be called:
CSVHeaderBuilder.asArray(String headerLine)

Simplify code now:
declare headerRow=null;
if configuration.headerRow() {
headerRow = CSVHeaderBuilder.asArray(bufferedReader.readLine())
}

CSVFormat csvFormat = CSVFormat.Builder.create()
.setIgnoreEmptyLines(false)
.setDelimiter(delimiter)
.setHeader(usableHeaderRow)
.get();

Look, now you have covered both, if header line is present or not...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always commit TEST source files separately from regular java source files. Test java code relates to runtime java code but those two are different worlds and can live separately.
For us, it is easier to drop commit and merge other commits with test java sources are good (and vice-versa).

BufferedReader bufferedReader, ReadConfiguration configuration) throws IOException {

Iterator<CSVRecord> iterator;
List<String> regexSpecials = Arrays.asList(".", "+", "*", "?", "^", "$", "(", ")", "[", "]", "{", "}", "|",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use different name for this variable. I perfectly understand the goal. I think better name for reserved characters would make code more readable.

@karlocehulic19 karlocehulic19 force-pushed the 3288-csv-file-empty-headers branch from 4d91fba to 51b6e41 Compare November 10, 2025 21:42
@karlocehulic19 karlocehulic19 marked this pull request as ready for review November 10, 2025 21:50
@karlocehulic19
Copy link
Contributor Author

karlocehulic19 commented Nov 11, 2025

@igorbeslic, I believe I fixed all requested changes mentioned in commit b6fb396. I am not sure why CI failed, I did gradle checks, and they passed locally.

I also added another test case for csv file without a header row, but with a special delimiter (This test prevents CsvFormat.DEFAULT implementation to pass), can easily be dropped if it's not needed.

We can sync up if needed.

@karlocehulic19 karlocehulic19 force-pushed the 3288-csv-file-empty-headers branch from 51b6e41 to fa4415a Compare November 12, 2025 17:58
@igorbeslic
Copy link
Collaborator

@karlocehulic19 we merged... there are couple technical details that we have to go through. We disabled test because read logic is not good... This is HINT...
If CSV file has header line, that line should be red only once. In current implementation, readHeaderValues is called every time.
Lets say this is not done yet. We will sync.

@igorbeslic igorbeslic closed this Nov 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CSV Read Action Must Process Files with delimiter other than coma

2 participants