-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UTF-8 / multi-byte charset support to ADIF parser #830
Add UTF-8 / multi-byte charset support to ADIF parser #830
Conversation
Just tested ADIF import and export. It worked without any trouble. |
@mikaelnousiainen thanks for this, plan to merge tomorrow, suffering from poor health least week, just want to confirm everything is ok my end 👍🏻 |
@magicbug Thanks. The only thing I'm worried about is that the speed of imports might be slower now. Importing an ADIF with a bit over 1000 QSOs logged in WSJT-X took a significant time, maybe around 10-15 seconds on an otherwise idle, single-core cloud instance. Some of the parsing could be optimized in the future. Anyway, hope you'll recover soon! |
@mikaelnousiainen yeah, I did notice the slowdown here as well. Recently I made some modifications to the parser. On my local machine, it made a huge difference. It dropped the time from many hours to down to around 30 seconds on 26k QSOs. With this PR, it takes around 1 minute. I guess further optimizations are possible, but I guess Peter will decide how he proceeds. |
@AndreasK79 Would you like share those optimizations here? |
@mikaelnousiainen I have. They were merged into the parser a few weeks back. See here: #604 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not requesting you add the code but before merge I'd like to do the following to track speed changes in the adif code
- In development mode run the Codeigniter benchmarking tool to calculate processing time for a while which is then logged to file using Codeigniters logging features.
This is purely to make sure there's not a huge jump, it might also be useful to have a set few ADIF files to check against.
Sorry for the delays been quite unwell the last couple of months
I did a benchmark here by enabling $this->output->enable_profiler(TRUE); in the controller. 501 qsos: --without 7777 qsos: --without: There are other things that can be done to speedup things, as many calls are made to the database just for one QSO. |
Merged |
This PR demonstrates the changes required to add UTF-8 / multi-byte charset support to the ADIF parser. The ADIF text data can never be accessed using direct index references (e.g.
$data[$index]
) and all of these need to be replaced withmb_substr
. Additionally, other functions handling strings, likestrlen
,strlower
, etc, have been replaced with their multi-byte counterparts.The code here is certainly not very clean and could be improved further, but it works at least on my test ADIF logs with lots of non-ASCII chars and long comments.