
@linhnam-nguyen

Issues addressed by this PR

Closes #206

Intended Functionality

  • Support both .csv and .txt file types.
  • Provide ReadFromCsvFile and WriteToCsvFile methods with configurable options (e.g., delimiter, include headers, encoding).
  • Use a CsvConfig object to control formatting and parsing.
  • Ensure compatibility with existing File_Engine patterns.
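The write side can be made concrete with a minimal C# sketch of a WriteToCsvFile-style method that exposes the delimiter, header and encoding options listed above. The class name, parameter list and UTF-8 default are assumptions for illustration, not the PR's actual signature. Fields are quoted only when needed, and embedded quotes are doubled (the RFC 4180 convention).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical sketch of a WriteToCsvFile-style method; names and defaults are assumptions.
public static class CsvWriteSketch
{
    public static void WriteToCsvFile(
        string path,
        IReadOnlyList<string> header,
        IEnumerable<IReadOnlyList<string>> rows,
        char delimiter = '\t',
        bool includeHeader = true,
        Encoding encoding = null)
    {
        var lines = new List<string>();
        if (includeHeader && header != null)
            lines.Add(JoinFields(header, delimiter));
        lines.AddRange(rows.Select(row => JoinFields(row, delimiter)));

        // Default to UTF-8 without a byte-order mark unless the caller passes an encoding.
        File.WriteAllLines(path, lines, encoding ?? new UTF8Encoding(false));
    }

    // Quote a field only if it contains the delimiter, a quote or a line break,
    // doubling any embedded quotes.
    public static string EscapeField(string field, char delimiter)
    {
        if (field == null) return string.Empty;
        bool needsQuotes = field.IndexOf(delimiter) >= 0
                           || field.IndexOfAny(new[] { '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    private static string JoinFields(IEnumerable<string> fields, char delimiter) =>
        string.Join(delimiter.ToString(), fields.Select(f => EscapeField(f, delimiter)));
}
```

Quoting only on demand keeps simple values readable while still letting delimiters and quotes survive a round trip.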

Expected Behaviour

  • Users can read a CSV/TXT file into a structured object collection.
  • Users can write collections of objects or strings to CSV/TXT files.

Desired Inputs/Outputs

  • Input: File path, configuration settings (delimiter, header inclusion, etc.).
  • Output: Parsed data structure (e.g., string[,], IEnumerable<IObject>).
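The sketch below shows one possible read side that returns the string[,] output mentioned above. It is a hedged example, not the PR's implementation, and it makes three assumptions: each record sits on a single line (no line breaks inside quoted fields), blank lines are skipped, and short rows are padded with empty strings so the result stays rectangular.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical sketch of a ReadFromCsvFile-style method returning a rectangular grid.
public static class CsvReadSketch
{
    public static string[,] ReadFromCsvFile(string path, char delimiter = '\t', Encoding encoding = null)
    {
        var rows = File.ReadAllLines(path, encoding ?? Encoding.UTF8)
                       .Where(line => line.Length > 0)          // skip blank lines
                       .Select(line => ParseLine(line, delimiter))
                       .ToList();

        int columnCount = rows.Count == 0 ? 0 : rows.Max(r => r.Count);
        var grid = new string[rows.Count, columnCount];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < columnCount; c++)
                grid[r, c] = c < rows[r].Count ? rows[r][c] : string.Empty; // pad short rows
        return grid;
    }

    // Split one line into fields, honouring quoted fields and doubled quotes.
    public static List<string> ParseLine(string line, char delimiter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char ch = line[i];
            if (inQuotes)
            {
                if (ch == '"' && i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; } // escaped quote
                else if (ch == '"') inQuotes = false;                                                    // closing quote
                else current.Append(ch);
            }
            else if (ch == '"') inQuotes = true;                                                          // opening quote
            else if (ch == delimiter) { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(ch);
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```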

CsvConfig (available options)

  • Delimiter: The delimiter used in the CSV file. Default: \t (tab). Common alternatives: comma (,) or semicolon (;).
  • IncludeObjects: If true, objects without a natural string representation are included using ToString() or a placeholder. If false, such objects are skipped. Default: false.
  • PropertyName: If set, the value of this property is serialized for objects. If null, the object type name is shown.
  • IncludeHeader: Whether to include a header row with column names. Default: true.
  • ColumnDataFormats: A list of per-column formatting instructions (StringType?). If null, default formatting is applied based on data type.
  • BooleanAsNumber: If true, booleans are written as 1 (true) or 0 (false). Default: false (booleans are written as text).
  • DecimalSeparator: The character used as the decimal separator in numeric values. Default: ".".
  • Digit: If set, numerical values are rounded to the given number of decimal places. Default: null (no rounding).
  • DateTimeFormat: Formatting style for dates. Default: EU. Options:
    • ISO8601: 2023-10-05T14:48:00Z
    • US: 10/05/2023
    • EU: 05/10/2023
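The options above could map onto a C# class roughly like this. The sketch assumes the listed defaults. The enum name CsvDateTimeFormat, the FormatValue helper and the List<string> type for ColumnDataFormats are hypothetical, since the PR leaves that element type open. Dates and numbers are formatted with the invariant culture so the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical enum for the DateTimeFormat option.
public enum CsvDateTimeFormat { ISO8601, US, EU }

// Sketch of CsvConfig with the defaults listed in the PR description.
public class CsvConfig
{
    public char Delimiter { get; set; } = '\t';
    public bool IncludeObjects { get; set; } = false;
    public string PropertyName { get; set; } = null;
    public bool IncludeHeader { get; set; } = true;
    public List<string> ColumnDataFormats { get; set; } = null; // element type is an assumption
    public bool BooleanAsNumber { get; set; } = false;
    public string DecimalSeparator { get; set; } = ".";
    public int? Digit { get; set; } = null;
    public CsvDateTimeFormat DateTimeFormat { get; set; } = CsvDateTimeFormat.EU;

    // Hypothetical helper: turns a single value into CSV text using the options above.
    public string FormatValue(object value)
    {
        switch (value)
        {
            case null:
                return string.Empty;
            case bool b:
                return BooleanAsNumber ? (b ? "1" : "0") : b.ToString();
            case DateTime dt:
                return FormatDate(dt);
            case double d:
                // Math.Round defaults to banker's rounding (MidpointRounding.ToEven).
                if (Digit.HasValue) d = Math.Round(d, Digit.Value);
                return d.ToString(CultureInfo.InvariantCulture).Replace(".", DecimalSeparator);
            case decimal m:
                if (Digit.HasValue) m = Math.Round(m, Digit.Value);
                return m.ToString(CultureInfo.InvariantCulture).Replace(".", DecimalSeparator);
            case IFormattable f:
                return f.ToString(null, CultureInfo.InvariantCulture);
            default:
                return value.ToString();
        }
    }

    private string FormatDate(DateTime dt)
    {
        switch (DateTimeFormat)
        {
            case CsvDateTimeFormat.ISO8601:
                // Local times are converted to UTC; Unspecified values are assumed to be UTC already.
                DateTime utc = dt.Kind == DateTimeKind.Local ? dt.ToUniversalTime() : dt;
                return utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
            case CsvDateTimeFormat.US:
                return dt.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
            default:
                return dt.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture);
        }
    }
}
```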

Test files

Test.xlsx

Changelog

Additional comments

Nested enums cannot currently be parsed directly from Excel_UI strings; only numeric values are supported. Additional work in the relevant repositories would be required to enable robust parsing of enums (including nested enums) from strings.
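For reference, on the .NET side a nested enum can be parsed by name with Enum.TryParse, which accepts both member names and numeric strings. The sketch below assumes the target enum type is known at conversion time. PipeSection.Profile is a hypothetical nested enum, and wiring this into Excel_UI is not covered here.

```csharp
using System;

// Hypothetical nested enum, standing in for enums declared inside BHoM classes.
public class PipeSection
{
    public enum Profile { Rectangular = 0, Circular = 1 }
}

public static class EnumParseSketch
{
    // Accepts member names (case-insensitive) or numeric strings, and rejects
    // numbers that do not correspond to a defined member.
    public static bool TryParseEnum<TEnum>(string text, out TEnum value) where TEnum : struct, Enum
    {
        if (Enum.TryParse(text?.Trim(), true, out value) && Enum.IsDefined(typeof(TEnum), value))
            return true;
        value = default;
        return false;
    }
}
```

The Enum.IsDefined check matters because Enum.TryParse on its own accepts any integer string, even one that matches no member.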

@pawelbaran (Member) left a comment


I had an initial look at the proposed code and it looks good 👍 Of course I need to test it and dive into the details before approving, but first I wanted to raise a few conceptual questions to agree on with @adecler / @alelom:

  • Should CSV be supported by the Excel or the File toolkit? IMO File is better, as CSV is a plain, platform-agnostic format.
  • Shouldn't we split the config into CsvPushConfig and CsvPullConfig? Some properties seem irrelevant on Pull.
  • I would rather avoid enforcing number formatting on Push via CsvConfig.Digit; if rounding is needed, the data should be prepared accordingly before pushing.
  • The same goes for PropertyName and other properties; I would make sure they are unified with Excel_Toolkit so that interop with both tabular formats behaves the same.
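The suggested split might look like the sketch below, with shared options in a base class. Which option belongs on which side is an assumption for illustration: BooleanAsNumber only affects writing, while the decimal separator is needed for both writing and parsing. Digit and PropertyName are left out, in line with the points above.

```csharp
using System.Text;

// Hypothetical base class holding options that matter in both directions.
public abstract class CsvConfigBase
{
    public char Delimiter { get; set; } = '\t';
    public Encoding Encoding { get; set; } = new UTF8Encoding(false);
    public string DecimalSeparator { get; set; } = ".";
}

// Options that only affect Push (writing).
public class CsvPushConfig : CsvConfigBase
{
    public bool IncludeHeader { get; set; } = true;     // write a header row
    public bool BooleanAsNumber { get; set; } = false;  // write 1/0 instead of text
}

// Options that only affect Pull (reading).
public class CsvPullConfig : CsvConfigBase
{
    public bool HasHeader { get; set; } = true;         // treat the first row as column names
}
```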

Any immediate thoughts, @linhnam-nguyen @adecler @alelom?

@linhnam-nguyen (Author) commented Sep 22, 2025

@pawelbaran Thanks a lot for taking the time to review the code.

  • I agree that Digit isn’t essential and can be removed from CsvConfig.
  • Regarding PropertyName: my goal is to handle cases where IObject (or other complex objects) are exported to CSV. Without a way to specify which property to serialize, the output falls back to .ToString(), which can feel opaque to users. Exposing PropertyName lets them control the representation explicitly. I’d love your thoughts on this case.
  • The other options mainly provide a framework that simplifies data conversion and offers predefined formats across multiple languages. In Excel in particular, as I understand it, numeric values are stored as doubles, while integers, decimals and dates are essentially display formats applied to them. To convert DateTime values to strings correctly, users therefore need to predefine the desired output format. This is separate from how the values are displayed on screen.
  • Regarding splitting CsvConfig, or the overall structure of the PR, I’d really value your insights. Personally, I don’t think it fits well with the current layout, but I don’t yet have a better idea.
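The IncludeObjects/PropertyName behaviour described in the PR could be sketched with reflection as follows. ToCell and PipeSample are hypothetical names. Following the option descriptions, the helper returns null when an object should be skipped. It falls back to the type name when no property is given or the named property does not exist.

```csharp
using System;
using System.Globalization;

// Hypothetical sample type standing in for an IObject with properties.
public class PipeSample
{
    public string Name { get; set; }
    public double Diameter { get; set; }
}

public static class ObjectCellSketch
{
    // Returns the cell text for a value, or null when the object should be skipped.
    public static string ToCell(object obj, bool includeObjects, string propertyName)
    {
        if (obj == null) return string.Empty;
        if (obj is string s) return s;
        // Only primitives count as "natural" here; decimal, DateTime etc. would need their own branches.
        if (obj.GetType().IsPrimitive) return Convert.ToString(obj, CultureInfo.InvariantCulture);

        // From here on, obj has no natural string representation.
        if (!includeObjects) return null;

        if (propertyName != null)
        {
            var property = obj.GetType().GetProperty(propertyName);
            if (property != null)
                return Convert.ToString(property.GetValue(obj), CultureInfo.InvariantCulture);
        }
        return obj.GetType().Name; // no (matching) property: show the type name
    }
}
```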

@pawelbaran (Member)

Apologies for the late response, @linhnam-nguyen; it took me a while to regroup. But I think I've got it! 😉 My concern was that CSV interop has no chance of being bidirectional for objects with a nested structure: we can push chosen properties to .csv, but we will never be able to reconstruct the object from that table on Pull.

So what I think could work better here is to support pushing and pulling only Table (or potentially TableRow), with all property mapping contained in ToTable and FromTable methods:

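public static Table ToTable(this Pipe pipe)
{
    // Convert the pipe to a table here (project-specific property mapping)
    throw new NotImplementedException();
}

public static Pipe FromTable(this Table table)
{
    // Rebuild the pipe from the table here
    throw new NotImplementedException();
}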

These methods would not be part of File_Toolkit; they would sit in the project that actually uses the interop. Naturally, ToTable and FromTable could take more inputs, for example a dictionary of mappings or a mapping config containing the instructions for rebuilding the objects from the table.
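A mapping-dictionary variant of ToTable/FromTable might look like the sketch below. The BHoM Table and Pipe types are not reproduced here, so it uses System.Data.DataTable and a hypothetical stand-in Pipe class. The dictionary shape (column name to getter) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-in for the BHoM Pipe type, for illustration only.
public class Pipe
{
    public string Name { get; set; }
    public double Diameter { get; set; }
}

// Sketch of mapping-driven conversion, with DataTable standing in for the BHoM Table.
public static class TableMappingSketch
{
    public static DataTable ToTable<T>(IEnumerable<T> items, IDictionary<string, Func<T, object>> columns)
    {
        // Materialise the mapping once so column order and value order always match.
        var mapping = columns.ToList();
        var table = new DataTable();
        foreach (var column in mapping)
            table.Columns.Add(column.Key, typeof(object));
        foreach (var item in items)
            table.Rows.Add(mapping.Select(column => column.Value(item)).ToArray());
        return table;
    }

    // The caller supplies the instructions for rebuilding each object from a row.
    public static List<T> FromTable<T>(DataTable table, Func<DataRow, T> build) =>
        table.Rows.Cast<DataRow>().Select(build).ToList();
}
```

Only the mapped properties survive the round trip. This is the limitation described above for nested objects, and it is why the rebuild instructions have to live with the caller.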

Would that work for you @linhnam-nguyen, assuming you'd want to use this mechanism in Excel?

When it comes to .txt interop, I would keep it simple: enable writing/reading strings to and from files, if that works for you.
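A minimal sketch of that plain .txt interop (one string per line, no parsing) could be as small as this. The method names and the UTF-8 default are assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical plain-text helpers: each string becomes one line in the file.
public static class TxtSketch
{
    public static void WriteToTxtFile(string path, IEnumerable<string> lines, Encoding encoding = null) =>
        File.WriteAllLines(path, lines, encoding ?? new UTF8Encoding(false));

    public static List<string> ReadFromTxtFile(string path, Encoding encoding = null) =>
        new List<string>(File.ReadAllLines(path, encoding ?? Encoding.UTF8));
}
```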

@linhnam-nguyen (Author)

Hi Pawel,

As you suspected, I experimented with implementing literal string read/write operations to and from a text file. However, there’s a caveat with this approach: since the table contains BHoM objects, Excel_UI recognises them as objects before any post-processing occurs, which means their string representations can’t be directly used for conversion. This, in turn, requires defining a specific output format for the BHoM objects.

To make this work more seamlessly, some adjustments would be needed on the Excel_UI side — mainly to retrieve the object’s ID and type string from Excel’s cache prior to synchronisation. Unfortunately, this behaviour isn’t currently achievable within the scope of this repository.

Regarding the ToTable and FromTable methods you mentioned, I believe these should be implemented directly within the UI layer, depending on how each client internalises the object structure.

What are your thoughts on this approach?



Development

Successfully merging this pull request may close these issues.

Add support for CSV/TXT text-base data files

2 participants