Refactor Xls Reader #4118

Merged on Oct 13, 2024 (14 commits)


oleibman (Collaborator)

I have been having some time-out problems with php-cs-fixer in my environment (no problems *yet* in GitHub). When I brought that up with them, the first thing they suggested was that it might be due to very large modules. The largest module they use in testing is a bit over 1,000 lines, and we have about 16 that exceed that. The biggest of these is Xls Reader; at 7,647 lines, it is more than 2,000 lines longer than its nearest competitor (Calculation), and at least 5 times larger than the largest module in fixer's test suite.

It's not clear to me that breaking it up will actually solve my problem. On the other hand, perhaps it is time to do some refactoring anyhow; for example, parsing tfunc/tfuncv values by indexing constant arrays (possibly external) rather than through enormous switch statements is something that ought to have happened long ago. This turned out to be easier than I had thought. Breaking the reader into sub-modules that can access one another's protected properties and methods was fairly straightforward. There's a bit of overhead in having to allocate new classes, but Xlsx Reader has been doing that all along (without the protected access part).
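To make the tfunc/tfuncv point concrete, here is a minimal sketch, in Python purely for illustration (the actual reader is PHP), of replacing a per-index switch with a lookup in a constant table. Only a handful of entries from the BIFF8 built-in function table are shown (index mapped to name and fixed argument count); `TFUNC` and `decode_tfunc` are hypothetical names, not PhpSpreadsheet's API.

```python
from typing import Dict, Tuple

# Hypothetical constant table: tFunc function index -> (name, fixed argument count).
# Only a few entries are shown; a complete table would cover every built-in index.
TFUNC: Dict[int, Tuple[str, int]] = {
    2: ("ISNA", 1),
    3: ("ISERROR", 1),
    10: ("NA", 0),
    15: ("SIN", 1),
    16: ("COS", 1),
    19: ("PI", 0),
}


def decode_tfunc(index: int) -> Tuple[str, int]:
    """Return (name, argument count) for a tFunc index via a single table lookup."""
    try:
        return TFUNC[index]
    except KeyError:
        # Unknown indexes are reported instead of silently falling through.
        raise ValueError(f"Unrecognized function index {index}") from None


print(decode_tfunc(19))
```

Supporting another function then means adding one table row rather than another `case` block. A tFuncV token carries its own argument count, so a table for it would only need the names.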

I've managed to remove about 2,900 lines from Reader/Xls, scattering those among 2 existing and 6 new source modules. I'm not sure I've chosen the best possible, or most maintainable, approach. I'm not sure that I'm done (there may be opportunities to move the parsing to its own module). But I do want something in place as a contingency. There's no need to rush it into production; I plan to leave this in draft status for a while, at least until after release 3.0.0.
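For scale, taking the approximate figures above at face value, Reader/Xls ends up at roughly

$$7{,}647 - 2{,}900 \approx 4{,}750 \text{ lines},$$

still about four times the size of the largest module in fixer's test suite.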

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

oleibman marked this pull request as draft July 29, 2024 02:49
oleibman (Collaborator, Author)

Decent result with respect to fixer. Recent attempts have taken 1:18 to 1:27 to complete; this one completed in 0:50.
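Converting the m:ss timings to seconds (1:18 = 78 s, 1:27 = 87 s, 0:50 = 50 s), the saving relative to the recent runs works out to

$$\frac{78 - 50}{78} \approx 35.9\%, \qquad \frac{87 - 50}{87} \approx 42.5\%.$$

These are single runs, so the figures are indicative rather than a benchmark.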

One semi-valid message among a slew of false positives.
oleibman (Collaborator, Author)

No concerns with Scrutinizer "complexity" warnings.

oleibman (Collaborator, Author)

Experiment with Calculation resulted in only 1 second of additional savings. Per plan, I will back it out of this ticket.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Aug 12, 2024
I am becoming concerned with the increasing run-time of php-cs-fixer, especially since it can time out. Some relief may come from PR PHPOffice#4118, but that won't be merged for some time, if ever. So, bump up the timeout period now. Also replace properties which php-cs-fixer has deprecated with their non-deprecated equivalents. No change to any source code.
oleibman marked this pull request as ready for review October 7, 2024 00:13
oleibman changed the title from "WIP Refactor Xls Reader" to "Refactor Xls Reader" Oct 7, 2024
oleibman added this pull request to the merge queue Oct 13, 2024
Merged via the queue into PHPOffice:master with commit f4919af Oct 13, 2024
12 of 13 checks passed
oleibman deleted the brkupxlsr2 branch November 10, 2024 19:27
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Dec 14, 2024
After breaking up Xls Reader (PR PHPOffice#4118), it is a little easier to identify uncovered code. BIFF8 had no tests involving constant arrays. This PR adds some. Most of the work is in the tests, but some source code is modernized to use things like null coercion.