Utilise read_fwf() from pandas #2

php1ic · 2025-10-01T20:18:06Z

Continuing the work done in #1, we now use pandas' built-in read_fwf() to read the data, since it is in fixed-width format.

It turns out the format is not necessarily consistent across the files, so we can't use the widths parameter and have to stick with setting up our own start and end columns. It does, however, remove the need for the original method of reading line by line, slicing, and then converting to the correct data type.
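As a minimal sketch of the approach described above (not the code from this PR), explicit start and end positions can be passed to read_fwf() through its colspecs parameter. The column names, positions, and sample values below are illustrative assumptions, not the real AME layout.

```python
import io

import pandas as pd

# Hypothetical layout: half-open (start, end) character positions, 0-indexed.
COLUMNS = {
    "A": (0, 3),
    "Z": (4, 7),
    "Symbol": (8, 10),
    "MassExcess": (11, 22),
}

# Build an illustrative fixed-width sample that matches the layout above.
ROWS = [(1, 0, "n", 8071.318), (1, 1, "H", 7288.971), (4, 2, "He", 2424.916)]
SAMPLE = "".join(f"{a:>3} {z:>3} {sym:<2} {me:>11.3f}\n" for a, z, sym, me in ROWS)


def read_table(text: str) -> pd.DataFrame:
    """Parse the whole table in one call using explicit column positions."""
    return pd.read_fwf(
        io.StringIO(text),
        colspecs=list(COLUMNS.values()),
        names=list(COLUMNS.keys()),
        header=None,
    )


df = read_table(SAMPLE)
```

Because colspecs takes arbitrary (start, end) pairs, gaps between columns or overlapping fields can be described, which a plain widths list cannot express.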

We still need to do some clean up after the initial read, but that's relatively simple so I think it's an improvement.

Data has been added from 1983, 1993 and 1995/1997 (see README updates).

Quality of life addition for the end user. There is already a dictionary to map the Z value to symbol and the creation of the reverse is fairly simple so might as well create it.
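For illustration, the reverse mapping can be built with a dictionary comprehension. The names Z_TO_SYMBOL and SYMBOL_TO_Z are assumptions for this sketch and only a few elements are shown.

```python
# Truncated, illustrative Z -> symbol mapping (the real one covers all elements).
Z_TO_SYMBOL = {0: "n", 1: "H", 2: "He", 3: "Li"}

# Invert it; this is safe because every symbol maps to exactly one Z.
SYMBOL_TO_Z = {symbol: z for z, symbol in Z_TO_SYMBOL.items()}
```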

The format of the file is given in the header of the file. It's wrong, but while finding that out I found that we can set up various lists and dictionaries to then use read_fwf and parse the file in one go. We still need to make a few adjustments once the parse is done, but they are fairly minor.

I misread the condition so had the default and 2020 cases the wrong way round.

Now that we don't read the file line by line, the tests needed to be updated to account for this. I also learnt about the pandas.testing module so started to use that. There was a bug in the AME mass parsing: I assumed the atomic mass always started with the same A as the isotope. This is not true, so the parsing was updated to read the value from the file. The atomic mass error was also too large but is now scaled appropriately.
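A rough sketch of both points, under stated assumptions: the atomic mass is taken to be stored as a leading whole number of mass units plus a micro-u part, and the helper name atomic_mass_u, the column names, and the values are all hypothetical. The comparison uses pandas.testing.assert_frame_equal with a relative tolerance.

```python
import pandas as pd


def atomic_mass_u(mass_units: int, micro_u: float) -> float:
    """Combine the leading mass units read from the file with the micro-u part.

    The leading value is read from the file rather than assumed equal to A.
    """
    return mass_units + micro_u * 1.0e-6


# Illustrative row: an A=100 isotope whose atomic mass starts with 99, not 100.
parsed = pd.DataFrame({"A": [100], "AtomicMass": [atomic_mass_u(99, 900000.0)]})
expected = pd.DataFrame({"A": [100], "AtomicMass": [99.9]})

# Raises AssertionError with a readable diff if the frames differ.
pd.testing.assert_frame_equal(parsed, expected, check_exact=False, rtol=1e-9)
```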

The format is different to the other years, so it needed a new set of START and END markers as well as a few edge-case alterations to the dataframe after the initial parse.

We can't parse all of these yet, but adding to the repo ready for when we can.

Also needed to tweak the column start and end points for a few of the columns in the mass table. The reaction files appear to be the same as later years.

Two things in one here: the name of the test function was wrong, as we no longer do a per-line read, and for maintainability the tests are now split into per-year functions.

If there is anything different about a year's column positions, we now give it its own case branch, so there is no need for the nesting.

There is a lot going on with the line format and the format within a column so for the moment, it's quite a rough parse and we drop a few columns so there is scope for future improvements.

The class is now purely for storage so Parse is no longer a good name as it is both overly generic and incorrect.

We index on the year and use it during merging, so we definitely need it at this stage.

Not sure how this passed the tests when we were focusing on the parsing of this file. I must have broken it later on, but the fix was fairly simple so I have not investigated further.

This is mainly adding in the new years and splitting the AME and NUBASE parsing, as they no longer have matching years. This also meant we had to merge in a slightly different way and ensure we don't lose any data unique to one set. I have also removed the validation of the year. We currently have full control over the years passed to the functions, so should not get any errors. Because AME and NUBASE now have different years, keeping the validation would have required additional functionality, so I decided to delete it. We can add it back in if required.
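One way to keep rows that exist in only one data set is an outer merge. The sketch below assumes that approach; the column names (TableYear, A, Z and the placeholder value columns) and the data are illustrative, and the actual merge in this PR may differ.

```python
import pandas as pd

# Illustrative frames: AME has a 1983 row with no NUBASE counterpart,
# and NUBASE has an isotope that this AME sample lacks.
ame = pd.DataFrame(
    {"TableYear": [1983, 2020], "A": [1, 1], "Z": [1, 1], "AmeValue": [0.1, 0.2]}
)
nubase = pd.DataFrame(
    {"TableYear": [2020, 2020], "A": [1, 4], "Z": [1, 2], "NubaseValue": ["x", "y"]}
)

# how="outer" keeps rows unique to either side; indicator=True adds a
# "_merge" column recording where each row came from.
merged = ame.merge(nubase, on=["TableYear", "A", "Z"], how="outer", indicator=True)
```

Rows present in only one frame get NaN in the other frame's columns, so nothing is dropped when the years do not line up.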

Added references to the new years of data that we can now parse and fleshed out a bit more to demonstrate some basic usage.

ubuntu-latest points to 24.04. I will look into whether a PPA makes 3.14 available, but for the moment I'm happy to just remove testing against this version of Python.

php1ic added 30 commits September 21, 2025 15:24

Create a symbol to Z dictionary

b8f501f

Quality of life addition for the end user. There is already a dictionary to map the Z value to symbol and the creation of the reverse is fairly simple so might as well create it.

Fix bug after moving to a match statement

c940a7d

I misread the condition so had the default and 2020 cases the wrong way round.

Convert AME reaction1 to read_fwf parsing

f85f34b

Remove new line from end of file

6b3af94

Convert AME reaction2 to read_fwf

72c1628

Update AME reaction 1 tests

e42bb4d

Update AME reaction 2 tests

54941dd

Add 1983 AME mass table parsing functionality

be16c1b

The format is different to the other years, so it needed a new set of START and END markers as well as a few edge-case alterations to the dataframe after the initial parse.

Add 1983 AME reaction 1 file parsing

62172ed

Add 1983 AME reaction 2 file parsing

01e940e

Add the files from 1983, 1993 and 1995

2a4a557

We can't parse all of these yet, but adding to the repo ready for when we can.

Add tests for AME 2016 data

5cecfc6

Add tests for AME 2012 data

0e254ed

Add tests for AME 2003 data

6100cce

Add tests for 1995 AME data

915e4f4

Also needed to tweak the column start and end points for a few of the columns in the mass table. The reaction files appear to be the same as later years.

Add tests for 1993 AME file parsing

6722fc4

Split AME tests by year

150ab2e

Two things in one here: the name of the test function was wrong, as we no longer do a per-line read, and for maintainability the tests are now split into per-year functions.

Remove nested match statement

4447c41

If there is anything different about a year's column positions, we now give it its own case branch, so there is no need for the nesting.

Update NUBASE parsing and tests

2b3962c

There is a lot going on with the line format and the format within a column so for the moment, it's quite a rough parse and we drop a few columns so there is scope for future improvements.

Use a better class name after recent updates

6b455d3

The class is now purely for storage so Parse is no longer a good name as it is both overly generic and incorrect.

Rename the file to match the new class name

1171bbe

Update tests to match ElementConverter class

5a1647d

Add the year to the parsed dataframes

0701f5c

We index on the year and use it during merging, so we definitely need it at this stage.

Add condition to remove repeated header in 2020 rct2 file

05f3e56

Not sure how this passed the tests when we were focusing on the parsing of this file. I must have broken it later on, but the fix was fairly simple so I have not investigated further.

Remove debug print message

9c8be5f

Add the table year to the nubase data and update tests

2fbdaa4

Delete old debug comment

3d46c42

php1ic added 3 commits October 1, 2025 21:04

Update README

b62ba3f

Added references to the new years of data that we can now parse and fleshed out a bit more to demonstrate some basic usage.

Update CI to include latest version of python

c343800

Latest ubuntu on github runner doesn't have 3.14

96fc638

ubuntu-latest points to 24.04. I will look into whether a PPA makes 3.14 available, but for the moment I'm happy to just remove testing against this version of Python.

php1ic mentioned this pull request Oct 1, 2025

read_fwf based implementation #1

Closed

php1ic merged commit f3dea09 into main Oct 1, 2025
10 checks passed

php1ic deleted the fwf_read branch October 1, 2025 20:32
