Allow for Custom Fitness Field #700
Conversation
Added functionality for fitness fields other than minimizing energy. The field can be anything that exists as a result of the subworkflow, and the fitness function can be minimum, maximum, or target_value. Additional work is needed to make it easier to use custom fitness fields not already in simmate (i.e. to easily make new models).
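As a rough illustration of the three fitness modes described above, here is a minimal sketch. The function name `select_best` and its signature are hypothetical, not simmate's actual API:

```python
def select_best(values, fitness_function="minimum", target_value=None):
    """Return the index of the 'best' value under the chosen criterion.

    Hypothetical sketch: 'minimum' and 'maximum' pick the extreme value,
    while 'target_value' picks the value closest to a desired target.
    """
    if fitness_function == "minimum":
        return min(range(len(values)), key=lambda i: values[i])
    if fitness_function == "maximum":
        return max(range(len(values)), key=lambda i: values[i])
    if fitness_function == "target_value":
        if target_value is None:
            raise ValueError("target_value mode requires a target_value")
        # closest to the target wins, regardless of direction
        return min(range(len(values)), key=lambda i: abs(values[i] - target_value))
    raise ValueError(f"unknown fitness function: {fitness_function}")
```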
To make things easier on users, I've added a StagedCalculation model that should be used by any staged workflows. The model inherits the Calculation model and adds columns for subworkflow names and ids. It also has convenience methods to obtain the subworkflow results from their respective tables. In addition to this, I've made it so that the fixed_composition evo search checks whether the selected workflow is a staged calculation. If it is, it uses the database of the last workflow of the staged calculation when filtering for finished calculations. This way, the only things the user needs to do for a custom calculation are to define subworkflows and make sure the final workflow returns the desired fitness field. For custom fitness fields, they will need to make a custom model, whose only requirement is that it inherits the Structure model.
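To illustrate the shape of the record described above, here is a plain-Python sketch (not simmate's actual Django model; the class and method names are illustrative) of what a staged-calculation row stores and how a convenience lookup could work:

```python
from dataclasses import dataclass, field


@dataclass
class StagedCalculationSketch:
    """Illustrative stand-in for a StagedCalculation row: it records which
    subworkflows ran and the row ids of their results in other tables."""

    subworkflow_names: list = field(default_factory=list)
    subworkflow_ids: list = field(default_factory=list)

    def last_subworkflow(self):
        """Name/id of the final subworkflow -- the table whose results hold
        the fitness field used when filtering finished calculations."""
        return self.subworkflow_names[-1], self.subworkflow_ids[-1]

    def subworkflow_results(self, lookup):
        """Fetch each subworkflow's result row; `lookup(name, id)` stands in
        for a database query against that subworkflow's table."""
        return [lookup(n, i) for n, i in zip(self.subworkflow_names, self.subworkflow_ids)]
```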
The staged workflow and corresponding model now check if the subworkflows fail and, if they do, record which one. Additionally, necessary arguments for subworkflows can be passed through the run argument as a dictionary, "subworkflow_kwargs". So the only requirement now is that a Structure object is passed. The Warren lab and badelf apps were updated to use this new staged workflow setup.
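A minimal sketch of the kwargs-routing idea described above. The function `run_staged` and the convention of keying the dictionary by subworkflow name are assumptions for illustration, not simmate's actual implementation:

```python
def run_staged(structure, subworkflows, subworkflow_kwargs=None):
    """Run each subworkflow in sequence, feeding each one's output to the next.

    Illustrative sketch: `subworkflow_kwargs` maps a subworkflow's name to
    the extra keyword arguments that only it should receive.
    """
    subworkflow_kwargs = subworkflow_kwargs or {}
    result = structure
    for workflow in subworkflows:
        # each subworkflow receives only the kwargs registered under its name
        kwargs = subworkflow_kwargs.get(workflow.__name__, {})
        result = workflow(result, **kwargs)
    return result
```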
Added some additional columns and methods to the SteadystateSource table. Still need to add methods that get information such as the average field value or the average difference from the source structure.
Previously, each steady-state source was submitted up to a fixed allowed integer. This is problematic: if one source tends to produce structures that are easier to relax than another, it will submit more than its ideal proportion of structures. The submission logic now looks at all submissions and submits from the source that is furthest below its desired proportion. This will help with the next step of dynamically updating sources based on various criteria. There are still some bugs to work on.
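The selection rule described above can be sketched as follows. This is a simplified stand-in for the real submission logic, and the function name `next_source` is hypothetical:

```python
def next_source(target_proportions, submitted_counts):
    """Pick the steady-state source furthest below its target share.

    Illustrative sketch: instead of capping each source at a fixed count,
    compare each source's actual share of all submissions to its target
    proportion and submit from the one with the largest deficit.
    """
    total = sum(submitted_counts.values()) or 1  # avoid dividing by zero

    def deficit(source):
        actual_share = submitted_counts.get(source, 0) / total
        return target_proportions[source] - actual_share

    return max(target_proportions, key=deficit)
```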
The steady-state sources are now automatically adjusted based on their average improvement over the parent structure. Only transformations are included, and there may be a better way to adjust the rates.
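One simple way the adjustment above could work is to re-weight each transformation by its measured average improvement, with a floor so that a currently unproductive source is never shut off entirely. This is a sketch of one possible scheme, not the implementation in the PR:

```python
def adjust_proportions(avg_improvements, floor=0.05):
    """Re-weight transformation sources by average improvement over the parent.

    Illustrative sketch: larger average improvement -> larger share of
    submissions. Negative or zero improvements are clipped to `floor` so
    every source keeps a small chance of being selected.
    """
    weights = {source: max(value, floor) for source, value in avg_improvements.items()}
    total = sum(weights.values())
    # normalize so the proportions sum to 1
    return {source: w / total for source, w in weights.items()}
```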
Updated the chemical system model and workflow to use alternate fitness fields. It is not yet updated to properly print out information for the best structures. Added a very basic streamlit app.
@jacksund Talking with Scott, it sounds like the next steps before adding much else are to get everything to a good state, benchmark, and publish. One of the main things he remembered being an issue is structure generation as we get to larger systems. I'm planning to look into getting the structure generation faster, but I wasn't sure if there was a good system to use as a test case. I tried using a larger Na-Cl system but it didn't seem to have any issue creating a structure quickly. Do you remember running into the slowdown in a particular situation? Also, do you have any suggestions for other stuff we should work on/refine before publishing? I've mostly worked through the list you suggested in #695. The main outstanding things are the implementation of pgvector and a more involved streamlit dashboard. I figured I'd wait for you to get your pgvector work out, and for streamlit I think it might be useful to create a more involved dashboard for simmate as a whole rather than just for the evo app. That might make more sense for me to work on after/during my time at Corteva since I'll get more experience with it there. In the meantime I added a very basic dashboard to the app.
Nice! We should clearly define an upper limit for searches in the initial publication, and I'd suggest we target up to 25 atoms and ternary searches -- anything beyond our decided cutoffs should be discouraged via docs + warnings until some follow-up paper or release. Sounds like you and Scott were imagining a larger cutoff in the initial paper, like 50 atoms...? And maybe quaternary searches? If you want to shoot for those, there will be significantly more work required on optimizing transformations, fingerprinting, etc. And if we do aim this big, we should also try to set goal times -- e.g. is matching USPEX's 50-atom search time good enough to publish, or do we need to beat it like we do in <20-atom searches?
It was both structure creation AND transformations that struggled at >20 atom counts. I never got to fine-tuning transformations and fingerprinting, which is why I had "benchmarking transformations and adding adaptive steady-states" on that list in #695. I don't have the numbers to back it up, but I would guess the "% of transformations that lead to lower-energy structures" is much worse for large systems compared to the % in other evo software packages.
I'm excited to hear more about what you're imagining for a general dashboard. But yeah, I would wait until I can teach you what I've learned with user interfaces. I have a lot of fun stuff in the works for streamlit and other UI features that aren't yet documented. For the pgvector work, I'll find some time for it. Just give me a heads up on your timeline so I can plan for it.
Let me know once you have an upper limit set with Scott, and then we can prioritize things. Aiming for 25 vs 50 atoms as our cutoff will change which things to clean up / spend the most time on.
Sounds good! Talking more with Scott, we decided that we can focus just on improved speed at less than about 25 atoms and ternary materials. We can then try to improve it further for >25 atoms and quaternary materials in a follow-up paper, which would also ideally include other improvements such as ML-FFs. In terms of timeline, I'll probably have a better sense after the holidays. We're still finishing up the BadELF follow-up paper, so I haven't started looking into the additional benchmarking we'll likely need for the evo search before publishing it. Once I start working on that a bit more, I'll try to give you a more concrete idea of when we're aiming to publish.
Added a model to store ELF ionic radii calculated during a BadELF calculation. Also added a column in the badelf model to store ELF values at atom/electride sites.
from simmate.database.base_data_types import Calculation, Structure, table_column

class StagedCalculation(Structure, Calculation):
I don't think StagedCalculation should be a standalone table, but a mixin.
For total beginners that run a staged workflow, there's a big difference between giving a table that says... "here is where you can go find the results" vs. "here are the final results, and there are some extra columns that say where we got them from".
The staged relaxation and population analysis tables that I currently have in main are more of the latter, while your PR is pushing the former. Your approach is pretty common in other software (because it saves on disk space), but it's just a pain for new users. So my pushback is more of a personal preference + design philosophy thing.
So just as a heads up, once you're ready for review+edits, I'll be modifying some of your tables to clean things up
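For contrast, the mixin design being suggested could look roughly like this. All class names here are hypothetical stand-ins, not simmate's actual tables: the point is that the provenance columns live on the results table itself, so a new user queries one table for both final results and where they came from.

```python
class StagedMixin:
    """Hypothetical mixin: adds 'where did this come from' columns
    to whatever results table includes it."""
    subworkflow_names: list = None
    subworkflow_ids: list = None


class RelaxationResults:
    """Stand-in for an existing final-results table (e.g. energies)."""
    energy: float = None


class StagedRelaxation(StagedMixin, RelaxationResults):
    """One table with both the final results and their provenance,
    instead of a separate standalone StagedCalculation table."""
    pass
```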
The goal of this PR is to add functionality for fitness fields other than minimizing energy.
I've broken down the goal into these steps: