Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging RSS code #69

Merged
merged 13 commits into from
Jul 29, 2024
Merged

Merging RSS code #69

merged 13 commits into from
Jul 29, 2024

Conversation

YuanbinLiu
Copy link
Collaborator

@YuanbinLiu YuanbinLiu commented Jun 12, 2024

This PR is mainly for merging RSS code.

Todo list

  • Refactor autoplex/data/common/utils.py into smaller files with informative names (e.g. selection.py)
  • Numpy docstrings for all functions
  • Write tests for RSS generation and selection
  • Full support for multicomponent generation and selection
  • Add buildcell installation similar to github workflow (we need to use the latest buildcell version)

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 12, 2024

Great. We need to find a way to run the tests. I think all action time is used up here as well.

@naik-aakash
Copy link
Collaborator

naik-aakash commented Jun 13, 2024

It seems not the action time, but some refactoring is done, and imports fail, leading to failures in all tests and linting issues. It is probably not yet ready for review and is still a work in progress. I guess

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 13, 2024

Ah, I thought i had used it all for the last pull request.

Before I start with the review all tests need to pass and we again need tests for the new modules.

@@ -783,3 +783,85 @@ def plot_energy_forces(
)
plt.savefig(train_name.replace("train", "energy_forces").replace(".extxyz", ".png"))
plt.show()


class Species:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A part of this should be covered by pymatgen and ase. Why is a new class needed here? @YuanbinLiu

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides some new functionalities. For example, return the unique atomic pairs in a dataset according to the specific format.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen these two parts but might be worth checking ase/pymatgen if some of the other functionality is already covered. I am hesistant to reinvent the wheel as it means more work in maintaining and testing code ;).

self._cell_seed(self.buildcell_options, self.tag)
bt_file = '{}.cell'.format(self.tag)

with Pool() as pool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to limit number of cpu used here, provide it as an arg ?

@JaGeo JaGeo changed the title Merging RSS code [WIP] Merging RSS code Jun 13, 2024

tmp_file_name = "tmp." + str(i) + '.' + tag + '.cell'

run("buildcell",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I found so far, this is a separate binary that needs to be compiled before one can use it. We also need this to be installed for tests to run successfully for this maker. We would need to add additional steps in the GitHub CI workflow to get this installed before tests, as we did for Julia.

@YuanbinLiu Is this how you installed it on your system?
https://airss-docs.github.io/getting-started/installation/

I did not find any pip installable package or precompiled binary available.

There is one package named airsspy, but it seems more like a wrapper and does not install buildcell.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check if buildcell is used in pymatgen? I remember something like rhis

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am wrong. ... We might need to compile it during the test setup... but we surely need to check that everything works. If we cannot get it to run here, other people will surely struggle as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildcell is very straightforward to compile, unlike many fortran things, and is central to RSS, so I think we ought to invest the effort in this

Copy link
Collaborator

@JaGeo JaGeo Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MorrowChem Does the conda version work? https://anaconda.org/conda-forge/airss-with-default-names

And, yes, we absolutely need to have this as a part of our test framework. This is clear.

Copy link
Collaborator Author

@YuanbinLiu YuanbinLiu Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the website instruction is what I am using for the installation of buildcell. I have noticed airsspy before. But I find there is still some difference between them. So agree with Joe's opinion here. @naik-aakash

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me (on linux, not mac), so agreed we should go with the conda version

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember where exactly in the tutorial it was but maybe we can incorporate some steps from @MorrowChem 's tutorial here https://github.com/MorrowChem/how-to-validate-potentials to set up buildcell.

@@ -1035,7 +1118,7 @@ def boltzhist_CUR(atoms,
else:
selected_atoms = selected_bolt_ats

ase.io.write('boltzhist_CUR.extxyz', selected_atoms, parallel=False)
ase.io.write('boltzhist_CUR.extxyz', selected_atoms, parallel=False) # TODO: should we be writing random files like this?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this info important. One can always access these files within joblow. If it is very important, we might want to have the info in the overall output

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as we do the latter aleady... Do we need this for restarts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sometimes use it for the structural analysis (e.g., see if the generated structures make sense). But it indeed does not affect the whole workflow. Will take it out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be useful to have this for debugging if it is easier like this. We can then leave it.

We also generate phonopy.yaml files as they might be easier to reuse than the database.

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 13, 2024

@MorrowChem can you try installinf buildcell via the workflow to have everything ready for creating automatic tests?

@YuanbinLiu
Copy link
Collaborator Author

It seems not the action time, but some refactoring is done, and imports fail, leading to failures in all tests and linting issues. It is probably not yet ready for review and is still a work in progress. I guess

Ah, I thought i had used it all for the last pull request.

Before I start with the review all tests need to pass and we again need tests for the new modules.

For sure. It is still in progress. I estimate it will take at least one week to pass all tests on our (Joe's and my) side so that we could be able to do the final test on github.

@YuanbinLiu
Copy link
Collaborator Author

How about renaming the job "generate_randomized_structures" to "generate_distorted_structures"? I realized that it could be confused with the random structures generated by buildcell. The "generate_randomized_structures" job doesn't create completely random structures but rather makes some perturbations. Therefore, to differentiate between the two, it might be better to rename it to "generate_distorted_structures". Would this be feasible? @QuantumChemist @naik-aakash

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 18, 2024

How about renaming the job "generate_randomized_structures" to "generate_distorted_structures"? I realized that it could be confused with the random structures generated by buildcell. The "generate_randomized_structures" job doesn't create completely random structures but rather makes some perturbations. Therefore, to differentiate between the two, it might be better to rename it to "generate_distorted_structures". Would this be feasible? @QuantumChemist @naik-aakash

"generate_rattled_structures" maybe? And the other ones, we call random structures? I don't think distorted captures what we are doing. (@YuanbinLiu )

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 18, 2024

Or displaced structures?

@YuanbinLiu
Copy link
Collaborator Author

How about renaming the job "generate_randomized_structures" to "generate_distorted_structures"? I realized that it could be confused with the random structures generated by buildcell. The "generate_randomized_structures" job doesn't create completely random structures but rather makes some perturbations. Therefore, to differentiate between the two, it might be better to rename it to "generate_distorted_structures". Would this be feasible? @QuantumChemist @naik-aakash

"generate_rattled_structures" maybe? And the other ones, we call random structures? I don't think distorted captures what we are doing.

ah, "generate_rattled_structures" sounds great!

@naik-aakash
Copy link
Collaborator

Hi @YuanbinLiu, Yeah, renaming it to "generate_rattled_structures" sounds good to differentiate between the two methods better.

@QuantumChemist
Copy link
Collaborator

Hi @YuanbinLiu, Yeah, renaming it to "generate_rattled_structures" sounds good to differentiate between the two methods better.

I agree.

@QuantumChemist QuantumChemist mentioned this pull request Jun 20, 2024
9 tasks
@JaGeo
Copy link
Collaborator

JaGeo commented Jun 21, 2024

@QuantumChemist maybe we can both check the code for an intermediate review.

The unit tests don't seem to run at the moment. Not sure why. Likely action time.

@QuantumChemist
Copy link
Collaborator

@QuantumChemist maybe we can both check the code for an intermediate review.

The unit tests don't seem to run at the moment. Not sure why. Likely action time.

I could check it locally or make a new branch in Aakash's repo to check the unit tests.

@naik-aakash
Copy link
Collaborator

@QuantumChemist maybe we can both check the code for an intermediate review.

The unit tests don't seem to run at the moment. Not sure why. Likely action time.

Hi @JaGeo , I have disabled it on the repo. As I suspected, there will be multiple commits and action time might get wasted in the process.

@naik-aakash
Copy link
Collaborator

Also, from what I can see, buildcell is not added yet to github ci workflow. So, tests will fail anyway, which depends on that particular program

@naik-aakash
Copy link
Collaborator

@QuantumChemist maybe we can both check the code for an intermediate review.
The unit tests don't seem to run at the moment. Not sure why. Likely action time.

I could check it locally or make a new branch in Aakash's repo to check the unit tests.

Maybe testing locally is good idea.

@JaGeo
Copy link
Collaborator

JaGeo commented Jun 21, 2024

Also, from what I can see, buildcell is not added yet to github ci workflow. So, tests will fail anyway, which depends on that particular program

Ah. Okay. There is an Installation instruction in the new readme file.

@JaGeo
Copy link
Collaborator

JaGeo commented Jul 8, 2024

I will try to get the buildcell installed on the repo so that @YuanbinLiu can continue with this pull request more easily.

@JaGeo JaGeo mentioned this pull request Jul 8, 2024
@JaGeo JaGeo mentioned this pull request Jul 16, 2024
@JaGeo
Copy link
Collaborator

JaGeo commented Jul 16, 2024

Buildcell will now be automatically installed on the repo.

@YuanbinLiu YuanbinLiu merged commit 11a7305 into autoatml:main Jul 29, 2024
@JaGeo
Copy link
Collaborator

JaGeo commented Jul 29, 2024

Can you please revert this, @YuanbinLiu ?

@YuanbinLiu YuanbinLiu changed the title [WIP] Merging RSS code Merging RSS code Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants