Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evo-search Custom Fitness Field #715

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

SWeav02
Copy link
Contributor

@SWeav02 SWeav02 commented Jan 30, 2025

This PR aims to add a custom fitness function to the evolutionary search app. Currently, the app only allows for energy per atom as the target value. The goal is for the user to be able to use any value they would like and the algorithm will try and minimize it, maximize it, or get it close to a target value.

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 30, 2025

@jacksund I'm going to try and work through this today and get it back to working condition. There are some adjustments to the evo search models that I made for the custom fitness field.

I also made some changes to the SteadystateSource model to prepare for automatically adjusting them in the future. These may need to be changed/updated when we get around to that part of the project. I'm not very attached to these updates if you'd like me to remove them.

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 30, 2025

I think this is more or less in working order, but I still have an issue with the way the transformations are working. Each time we try and create a new structure with a transformation, we run the following while loop:

def apply_from_database_and_selector(
self,
selector,
datatable,
select_kwargs: dict = {},
max_attempts: int = 100,
**kwargs, # for apply_transformation_with_validation
):
logging.info(f"Creating a transformed structure with '{self.name}'")
logging.info(f"Parent(s) will be selected using '{selector.name}'")
# grab parent structures using the selection method
parent_ids, parent_structures = selector.select_from_datatable(
nselect=self.ninput,
datatable=datatable,
**select_kwargs,
)
logging.info(f"Selected parents: {parent_ids}")
# Until we get a new valid structure (or run out of attempts), keep trying
# with our given source. Assume we don't have a valid structure until
# proven otherwise
new_structure = False
attempt = 0
while not new_structure and attempt <= max_attempts:
# add an attempt
attempt += 1
# make a new structure
new_structure = self.apply_transformation_with_validation(
parent_structures,
**kwargs,
)
# see if we got a structure or if we hit the max attempts and there's
# a serious problem!
if not new_structure:
logging.warning(
f"Failed to create a structure after {max_attempts} different"
"parent combinations. Giving up."
)
return False
# Otherwise we were successful
logging.info("Creation Successful.")
return parent_ids, new_structure

where in turn we call this while loop:

def apply_transformation_with_validation(
self,
structures, # either a structure or list of structures. Depends on ninput.
validators=[],
max_attempts=100,
):
# Until we get a new valid structure (or run out of attempts), keep trying
# with our given source. Assume we don't have a valid structure until
# proven otherwise
new_structure = False
attempt = 0
while not new_structure and attempt <= max_attempts:
# add an attempt
attempt += 1
# make a new structure
new_structure = self.apply_transformation(structures)
# check to see if the structure passes all validation checks.
if new_structure:
for validator in validators:
is_valid = validator.check_structure(new_structure)
if not is_valid:
# if it is not unique, we can throw away the structure and
# try the loop again.
logging.debug(
"Generated structure failed validation by "
f"{validator.name}. Trying again."
)
new_structure = None
# one failed validation is enough to stop. There is no
# need to test the other validation methods.
break
# see if we got a structure or if we hit the max attempts and there's
# a serious problem!
if not new_structure:
logging.warning(
"Failed to create a structure from input after "
f"{max_attempts} attempts"
)
return False
# return the structure and its parents
return new_structure

If after this we have failed to generate a new structure, we automatically remove this transformation source. My issue is that this nested while loop results in attempting to create new structures 100,000 times which takes a very long time even for the simplest of structures. In my testing, I've been using the ChemicalSystemSearch on NaCl and during the first search for Na, I always run into this issue for several of the transformations (presumably because several of the transformations are unlikely to succeed on very small structures).

I would push for us to reduce the amount of times that we try and generate structures or maybe add additional logic to prevent transformations that likely won't succeed for small structures similar to this snippet:

if len(composition.elements) == 1 and source_name in [
"from_ase.AtomicPermutation"
]:
logging.warning(
f"{source_name} is not possible with single-element structures."
" This is being removed from your steadystate_sources."
)
continue # skips to next source

@jacksund any thoughts?

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 30, 2025

I'm also getting an error when trying to create phase diagrams. I'm not sure if this is because of something on pymatgen's end. If you've seen this before let me know, but otherwise I can try and figure it out tomorrow.

In [3]: StaticEnergy.get_hull_diagram_plot(chemical_system="Na", workflow_name="static-energy.vasp.evo-tutorial")
Traceback (most recent call last):

  Cell In[3], line 1
    StaticEnergy.get_hull_diagram_plot(chemical_system="Na", workflow_name="static-energy.vasp.evo-tutorial")

  File ~/Documents/github/simmate/src/simmate/visualization/plotting/figure.py:28 in get_plot
    plot = cls.get_plot(input_obj, **kwargs)

  File ~/Documents/github/simmate/src/simmate/database/base_data_types/thermodynamics.py:249 in get_plot
    plot = plotter.get_plot(label_unstable=False)

  File ~/anaconda3/envs/simmate_dev/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py:2231 in get_plot
    fig = go.Figure(data=data)

  File ~/anaconda3/envs/simmate_dev/lib/python3.11/site-packages/plotly/graph_objs/_figure.py:629 in __init__
    super(Figure, self).__init__(data, layout, frames, skip_invalid, **kwargs)

  File ~/anaconda3/envs/simmate_dev/lib/python3.11/site-packages/plotly/basedatatypes.py:512 in __init__
    data = self._data_validator.validate_coerce(

  File ~/anaconda3/envs/simmate_dev/lib/python3.11/site-packages/_plotly_utils/basevalidators.py:2711 in validate_coerce
    self.raise_invalid_elements(invalid_els)

  File ~/anaconda3/envs/simmate_dev/lib/python3.11/site-packages/_plotly_utils/basevalidators.py:312 in raise_invalid_elements
    raise ValueError(

ValueError: 
    Invalid element(s) received for the 'data' property of 
        Invalid elements include: [None, None]

    The 'data' property is a tuple of trace instances
    that may be specified as:
      - A list or tuple of trace instances
        (e.g. [Scatter(...), Bar(...)])
      - A single trace instance
        (e.g. Scatter(...), Bar(...), etc.)
      - A list or tuple of dicts of string/value properties where:
        - The 'type' property specifies the trace type
            One of: ['bar', 'barpolar', 'box', 'candlestick',
                     'carpet', 'choropleth', 'choroplethmapbox',
                     'cone', 'contour', 'contourcarpet',
                     'densitymapbox', 'funnel', 'funnelarea',
                     'heatmap', 'heatmapgl', 'histogram',
                     'histogram2d', 'histogram2dcontour', 'icicle',
                     'image', 'indicator', 'isosurface', 'mesh3d',
                     'ohlc', 'parcats', 'parcoords', 'pie',
                     'pointcloud', 'sankey', 'scatter',
                     'scatter3d', 'scattercarpet', 'scattergeo',
                     'scattergl', 'scattermapbox', 'scatterpolar',
                     'scatterpolargl', 'scattersmith',
                     'scatterternary', 'splom', 'streamtube',
                     'sunburst', 'surface', 'table', 'treemap',
                     'violin', 'volume', 'waterfall']

        - All remaining properties are passed to the constructor of
          the specified trace type

        (e.g. [{'type': 'scatter', ...}, {'type': 'bar, ...}])

@jacksund
Copy link
Owner

I won't be able to fully dive into you're two comments until next week (or for the later bug until qe is working). But I will say that the first issue of transformations stalling should probably be addressed in a separate PR from this one!

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 31, 2025

Ok, no worries! I'll be travelling this afternoon/evening and won't be able to work on stuff until next week anyways. I'm happy to address the stalling transformations in a PR after this one. In terms of timeline, I'd love to get to a point where I can start benchmarking in the next couple of weeks before my third year seminar/prospectus in late Feb. Let me know if you think that makes sense! I can adjust my plan if it seems like it'll take longer.

I'm going to try and resolve the error I'm getting today, and then I think this PR will be ready for review. Other than the issues I mentioned above, the search seems to be working fine (at least with VASP). Hopefully I can resolve the QE issue soon so you can also test things out!

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 31, 2025

I'm also getting an error when trying to create phase diagrams

This is an issue with pymatgen and phase diagrams with only one element. It seems to be a bug that was present in the 2023 version of pymatgen that is fixed now. At one point, they called the following method for all situations (including 1 atom) which resulted in a returned None type object and eventual error. There may have been some additional problems.
https://github.com/materialsproject/pymatgen/blob/5b997f78c9ae785e7a18a157c697d996f7dba746/src/pymatgen/analysis/phase_diagram.py#L2694-L2754

Since updating pymatgen would probably cause other issues to spring up that will take a lot of work to fix, I'm just adding a warning to the plotting method in Simmate so that it doesn't halt the summary writeup for now.

@jacksund
Copy link
Owner

jacksund commented Feb 2, 2025

In terms of timeline, I'd love to get to a point where I can start benchmarking in the next couple of weeks

I did list custom fitness fields as an important feature a while back, but in our call about what the first paper will be, it didn't sound like non-energy opts would be in there. So I don't think this PR is an absolute requirement for you to start benchmarking. Also this shouldn't affect benchmark times at all, so you could even merge it while benchmarks are on-going. If you're struggling on time available, I'd deprioritize this PR in favor of debugging qe or whatever else you had in mind before benchmarking.

As a side thought, the SI probably should have one qe vs vasp run. Feels odd doing the whole paper with vasp when we're planning for our default to be qe

@SWeav02
Copy link
Contributor Author

SWeav02 commented Feb 3, 2025

I did list custom fitness fields as an important feature a while back, but in our call about what the first paper will be, it didn't sound like non-energy opts would be in there.

That's a good point, I had it in my mind that this was going to be part of the first paper, but that was mostly because I had already done work on it for the first PR (#700). I think it should be something we mention as a feature in the paper, but it doesn't need to be part of the benchmarks.

I'd deprioritize this PR in favor of debugging qe or whatever else you had in mind before benchmarking.

I think there really isn't much more we need for benchmarking since this was the last thing I was planning to do. The QE bug needs to be figured out before publishing, at least for the VASP/QE comparison you mentioned, but I don't want to use it for the benchmarking since it will likely take longer and need more troubleshooting.

At this point, the only thing I'm really waiting on then is space on warwulf. The benchmarks will take up a little less than half our resources I think, and some of the others are using a lot of space to get calculations done for their own research/theses. I'll talk to Scott about that in our meeting this week. and see if there's any way to have some of the others take a pause for a bit. Not sure what to do when we get to USPEX since I won't be able to reserve time with workers, but we'll cross that bridge when we come to it.

@SWeav02
Copy link
Contributor Author

SWeav02 commented Feb 3, 2025

I'm happy to address the stalling transformations in a PR after this one.

@jacksund if I were to do another PR before benchmarks, I would probably want it to be this one. It won't matter as much in the full benchmarks where there are many workers, but it does result in at least one worker freezing up for several minutes. The transformation is only removed after the long trial cycle, so if there are multiple submissions of the failing transformation, multiple workers will freeze. This might cause slightly poorer performance, especially for the smaller structures where the transformations seem likelier to fail.

I don't have a great sense of how long it would take to fix or if it would be worth it for this first paper. Any thoughts?

@jacksund
Copy link
Owner

jacksund commented Feb 4, 2025

I don't have a great sense of how long it would take to fix or if it would be worth it for this first paper. Any thoughts?

The quick and fallback fix is to put a timer in the transformation function and to check it after each attempt -- exiting when it takes too long. While the code won't be pretty, it's probably important to have this feature exist somewhere. The timer limit should probably be proportional to the complexity/size of the system

The more robust fix is to find the cases where it does get stuck and either catch it or adjust the number of transformation attempts ahead of time. It'd be nice to have a reproducible test case to work with -- you can try adding a bunch of logs to a search where it happens often, and when a structure gets stuck, you go in an pull out those parent structures + the transformation it was using. This is preferred but might take more time to track down.

either way, let's open a separate issue for this!

@SWeav02
Copy link
Contributor Author

SWeav02 commented Feb 4, 2025

The quick and fallback fix is to put a timer in the transformation function and to check it after each attempt -- exiting when it takes too long. While the code won't be pretty, it's probably important to have this feature exist somewhere. The timer limit should probably be proportional to the complexity/size of the system

The more robust fix is to find the cases where it does get stuck and either catch it or adjust the number of transformation attempts ahead of time. It'd be nice to have a reproducible test case to work with -- you can try adding a bunch of logs to a search where it happens often, and when a structure gets stuck, you go in an pull out those parent structures + the transformation it was using. This is preferred but might take more time to track down.

either way, let's open a separate issue for this!

Sounds good! Just opened the issue: #716

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants