create and initialize structuredf.py #1

Open
tinatn29 wants to merge 9 commits into diffpy:main from tinatn29:tn-structuredf

Conversation

@tinatn29
Collaborator

Create a class StructureDF (inheriting from pandas DataFrame) to represent a local structure or cluster of atoms. Will later add _from_structure and _from_file methods.

@codecov
codecov bot commented Jan 23, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@tinatn29 tinatn29 requested a review from sbillinge January 26, 2026 21:40
@tinatn29
Collaborator Author

@sbillinge I've edited the docstring (moved it under init, and each parameter description now starts with "the").

Contributor

@sbillinge sbillinge left a comment

nice. Please see suggestions

Parameters
Methods
-------
from_structure(structure, site_index=None)
Contributor

A cleaner API may be a single load_structure() that simply checks what the input is (a pymatgen object or a file path/string) and behaves accordingly.

Collaborator Author

sounds good! Would you add load_structure() as a function in diffpy.clusterrender.io or as a method in this class?

Collaborator Author

And in that case, should __init__() take something like structure_input as an argument, with the docstring specifying that structure_input can be a pymatgen Structure, pathlib.Path, or str?

Contributor

right.

Regarding function or method, I think it is really a parser, so perhaps we should call it parse_data and have it as a method in the class. If the only way to do it is on instantiation, then we would presumably make it a private method.

site_index : int, optional
The index of atom in the structure to be treated as the
central atom.
filename : str, optional
Contributor

Let's accept a pathlib.Path or a string. That is our kind of new standard. Also, we only need a single structure entry if we resolve this in the method and allow it to be either a file path or a pymatgen structure object. In general, we may want to pass it a diffpy or objcryst structure object too, later, and this will be more easily extended if we do it this way.

Collaborator Author

ok we can have load_structure() take care of that
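
The input handling agreed on in this thread could look roughly like the sketch below. It is only an illustration, not the code in this PR: `resolve_structure_input` is a hypothetical helper name, and recognizing a structure object by its `sites` attribute is an assumption made to keep the example free of third-party imports. Real code would more likely use an `isinstance` check against the pymatgen `Structure` class.

```python
from pathlib import Path


def resolve_structure_input(structure_input):
    """Classify the input as a file path or an in-memory structure.

    Hypothetical helper: returns a (kind, value) tuple so the caller
    can decide how to parse the input.
    """
    # Strings and pathlib.Path objects are both treated as file paths.
    if isinstance(structure_input, (str, Path)):
        return "path", Path(structure_input)
    # Assumption: structure objects expose a `sites` attribute.
    if hasattr(structure_input, "sites"):
        return "structure", structure_input
    raise TypeError(
        "structure_input must be a str, pathlib.Path, or structure object, "
        f"got {type(structure_input).__name__}"
    )
```

Supporting another structure type later (for example a diffpy or objcryst object) would then only require one more branch in this function, which is the extensibility argument made above.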

@tinatn29
Collaborator Author

@sbillinge I addressed your comments in this commit (will add a _parse_data method in the next PR). Is this one ready to merge?

@@ -0,0 +1,23 @@
**Added:**

* create StructureDF class (DataFrame) to store a cluster of atoms
Contributor

Name change

Contributor

@sbillinge sbillinge left a comment

Nice. Nearly there. After reading what it does, I wonder whether we should call it "ClusterDataFrame", since it denotes a center atom and references everything to that? It is like a container for a cluster around a center atom?

But otherwise it looks good

@tinatn29
Collaborator Author

@sbillinge I changed the name to ClusterDataFrame and edited the news accordingly.

Contributor

@sbillinge sbillinge left a comment

This is good. Are we missing tests?

We just have a constructor but this is still a function that could be tested. Since all it does is create a DataFrame, have it create one and test that this is equal to what you expect.

@tinatn29
Collaborator Author

tinatn29 commented Feb 2, 2026

@sbillinge can we add the constructor test later? Currently, instantiating this class requires the method parse_structure that I haven't written yet.

Alternatively, we could change __init__ slightly such that if the input is already a DataFrame it will just create the class without calling parse_structure, then I can add the test?

@sbillinge
Contributor

> @sbillinge can we add the constructor test later? Currently, instantiating this class requires the method parse_structure that I haven't written yet.
>
> Alternatively, we could change __init__ slightly such that if the input is already a DataFrame it will just create the class without calling parse_structure, then I can add the test?

I think this comment was before we talked, but basically, we want the test to fail until it passes. Failing tests are good.

@tinatn29
Collaborator Author

tinatn29 commented Feb 9, 2026

@sbillinge I started setting up the test for the ClusterDataFrame constructor. I've only added a few test cases here (very basic -- input is either a dictionary or a pandas DataFrame), but wanted to make a PR to make sure the test format looks alright before I add more test cases.
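
A constructor test of the kind described here (dictionary or DataFrame in, expected DataFrame out) might be sketched as follows. Everything in it is an assumption for illustration: the `ClusterDataFrame` below is a minimal stand-in rather than the class in this PR, the column names are invented, and `check_constructor` is a hypothetical helper. Each case is preceded by a one-line statement of intent so the reviewer can see what is being tested.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


class ClusterDataFrame(pd.DataFrame):
    """Minimal stand-in for the class under review (assumed behavior)."""

    # Custom attributes listed here are carried through pandas operations.
    _metadata = ["site_index"]

    def __init__(self, data=None, site_index=None, **kwargs):
        super().__init__(data, **kwargs)
        self.site_index = site_index

    @property
    def _constructor(self):
        # Make pandas operations return ClusterDataFrame, not DataFrame.
        return ClusterDataFrame


def check_constructor(input_structure, site_index, expected_output):
    """Build a ClusterDataFrame and compare it with the expected frame."""
    cdf = ClusterDataFrame(input_structure, site_index=site_index)
    # Compare contents only; the subclass type is not part of this check.
    assert_frame_equal(cdf, expected_output, check_frame_type=False)
    assert cdf.site_index == site_index


# Invented example data: two atoms with fractional coordinates.
atoms = {"element": ["Ni", "O"], "x": [0.0, 0.5], "y": [0.0, 0.5], "z": [0.0, 0.5]}
expected = pd.DataFrame(atoms)

# C1: input is a dictionary of columns, expect the equivalent DataFrame.
check_constructor(atoms, 0, expected)
# C2: input is already a DataFrame, expect it to pass through unchanged.
check_constructor(pd.DataFrame(atoms), 0, expected)
```

In a pytest suite the two cases would normally become entries in a parametrized test rather than direct calls; the direct calls are used here only to keep the sketch self-contained.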

Contributor

@sbillinge sbillinge left a comment

Thanks for this. Please see inline comments though.

input_structure, site_index, expected_output = input_test_data
cdf = ClusterDataFrame(input_structure, site_index=site_index)

# check if the output matches the expected DataFrame
Contributor

we don't need comments like this as they are a bit trivial.


from diffpy.clusterrender.clusterdataframe import ClusterDataFrame

"""
Contributor

This probably is working but it is a slightly different structure than our most recent standard. I think that Yuchen is writing good standard tests, and so is Caden. The standards make the tests easier to read for the reviewers because the intent becomes very clear, and the test to test the intent is right below it and so things become easier to review. Could I maybe ask you to check with those guys, or check their code and try and use that style? Thanks so much.

Collaborator Author

Caden sent me an example from cmi. Will try to get to it today!
