
Conversation


@php1ic php1ic commented Oct 1, 2025

Continuing the work done in #1, we now make use of pandas' built-in read_fwf() to read the data, as it is in fixed-width format.

It turns out the format is not necessarily consistent within the files, so we can't use the widths parameter and have to set up our own start and end columns. It does, however, remove the need for the original method of reading line by line, slicing, then converting to the correct data type.

We still need to do some clean-up after the initial read, but that's relatively simple, so I think it's an improvement.
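A minimal sketch of the approach, assuming hypothetical column boundaries, names and header length (the real values depend on the year of the table being parsed):

```python
import pandas as pd

# Hypothetical (start, end) column boundaries and names for illustration only.
COLSPECS = [(0, 4), (5, 9), (11, 17), (18, 30)]
NAMES = ["A", "Z", "Symbol", "MassExcess"]

def read_mass_table(path: str) -> pd.DataFrame:
    """Read a fixed-width data file using explicit start/end columns."""
    return pd.read_fwf(
        path,
        colspecs=COLSPECS,  # explicit positions rather than the widths parameter
        names=NAMES,
        header=None,
        skiprows=39,        # assumption: skip the human-readable file header
    )
```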

Data has been added from 1983, 1993 and 1995/1997 (see README updates).

Quality-of-life addition for the end user. There is already a dictionary
mapping the Z value to its symbol, and creating the reverse is simple
enough that we might as well provide it.
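A minimal sketch of the convenience mapping, with a truncated, hypothetical Z-to-symbol dictionary standing in for the project's own:

```python
# Truncated stand-in for the existing Z -> symbol dictionary.
Z_TO_SYMBOL = {0: "n", 1: "H", 2: "He", 3: "Li"}

# The reverse lookup is a one-line dict comprehension.
SYMBOL_TO_Z = {symbol: z for z, symbol in Z_TO_SYMBOL.items()}

assert SYMBOL_TO_Z["He"] == 2
```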
The format of the file is given in its header. It's wrong, but while
working that out I found that we can set up various lists and
dictionaries and then use read_fwf to parse the file in one go. We still
need to make a few adjustments once the parse is done, but they are
fairly minor.
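The post-parse adjustments might look something like the following sketch; the column name and marker characters are assumptions, not the project's actual choices:

```python
import pandas as pd

def tidy_after_parse(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative clean-up applied after read_fwf has done the heavy lifting."""
    # Values estimated from systematics are flagged with a '#'; strip the flag
    # so the column can be treated as numeric.
    df["MassExcess"] = df["MassExcess"].astype(str).str.replace("#", "", regex=False)
    # Anything that still isn't numeric (e.g. a '*' placeholder) becomes NaN.
    df["MassExcess"] = pd.to_numeric(df["MassExcess"], errors="coerce")
    return df
```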
I misread the condition, so the default and 2020 cases were the wrong way
round.
Now that we don't read the file line by line, the tests needed to be
updated to account for this. I also learnt about the pandas.testing
module, so I started to use it.
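For example, instead of comparing individual values the tests can now compare whole frames (the expected values here are placeholders):

```python
import pandas as pd
import pandas.testing as pdt

def test_parser_returns_expected_frame():
    expected = pd.DataFrame({"A": [1, 2], "Z": [1, 1], "Symbol": ["H", "H"]})
    result = expected.copy()  # stand-in for the real parser call
    # Raises an AssertionError with a useful diff if the frames differ.
    pdt.assert_frame_equal(result, expected)
```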

There was a bug in the AME mass parsing. I assumed the atomic mass
always started with the same A as the isotope. This is not true, so the
parsing was updated to read the value from the file. The atomic mass
error was also too large but is now scaled appropriately.
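A rough sketch of the fix, assuming the AME convention that the atomic mass is split into an integer part and a micro-u part, both of which are read from the file:

```python
def atomic_mass_in_u(integer_part: int, micro_u: float, micro_u_error: float) -> tuple[float, float]:
    """Combine the atomic-mass columns of an AME mass table.

    The integer part comes from the file rather than being assumed equal to
    the isotope's A, and the fractional part and its error are in micro-u,
    so both are scaled by 1e-6.
    """
    return integer_part + micro_u * 1.0e-6, micro_u_error * 1.0e-6
```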
The format is different to the other years, so it needed a new set of START
and END markers as well as a few edge-case alterations to the dataframe
after the initial parse.
We can't parse all of these yet, but adding to the repo ready for when
we can.
Also needed to tweak the column start and end points for a few of the
columns in the mass table. The reaction files appear to be the same as
later years.
Two things in one here: the name of the test function was wrong, as we no
longer do a per-line read, and for maintainability it has been split into
per-year functions.
If there is anything different about a year's column positions, we now
give it its own case branch, so there is no need for the nesting.
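A sketch of the flattened branching, with made-up years and column positions:

```python
def mass_table_colspecs(year: int) -> list[tuple[int, int]]:
    """Return (start, end) column positions for a given table year."""
    match year:
        case 1983:
            return [(0, 4), (5, 9), (10, 16)]   # hypothetical positions
        case 1993 | 1995:
            return [(0, 4), (5, 10), (11, 17)]  # hypothetical positions
        case _:
            return [(0, 4), (5, 11), (12, 18)]  # default/most recent layout
```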
There is a lot going on with the line format and the format within a
column, so for the moment it's quite a rough parse and we drop a few
columns; there is scope for future improvements.
The class is now purely for storage, so Parse is no longer a good name:
it is both overly generic and incorrect.
We index on the year and use it during merging, so we definitely need it
at this stage.
Not sure how this passed the tests when we were focusing on the parsing
of this file. I must have broken it later on, but the fix was fairly simple
so I have not investigated further.
This is mainly adding in the new years and splitting the AME and NUBASE
parsing, as they no longer have matching years. This also meant we had
to merge in a slightly different way and ensure we don't lose any data
unique to one set.
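A sketch of that merge, assuming hypothetical key column names; an outer join keeps rows present in only one of the two tables:

```python
import pandas as pd

def combine_tables(ame: pd.DataFrame, nubase: pd.DataFrame) -> pd.DataFrame:
    """Merge AME and NUBASE data without dropping rows unique to either set."""
    return pd.merge(
        ame,
        nubase,
        how="outer",
        on=["TableYear", "A", "Z"],   # assumed key columns
        suffixes=("_AME", "_NUBASE"),
    )
```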

I have also removed the validation of the year. We currently have full
control over the years passed to the functions, so we should not get any
errors. The fact that AME and NUBASE now have different years means we
would also have had to add additional functionality, so the decision was
made to delete it. We can add it back in if required.
php1ic added 3 commits October 1, 2025 21:04
Added references to the new years of data that we can now parse and
fleshed out a bit more to demonstrate some basic usage.
ubuntu-latest points to 24.04. I will look into whether a PPA makes 3.14
available but, for the moment, I'm happy to just remove testing against this
version of Python.
@php1ic php1ic merged commit f3dea09 into main Oct 1, 2025
10 checks passed
@php1ic php1ic deleted the fwf_read branch October 1, 2025 20:32