Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in processing entries of GGA/GGA+U/R2SCAN scheme #3113

Open
peikai opened this issue Jun 28, 2023 · 12 comments
Open

Bug in processing entries of GGA/GGA+U/R2SCAN scheme #3113

peikai opened this issue Jun 28, 2023 · 12 comments
Labels
analysis Concerning pymatgen.analysis mixing-schemes About mixing energies from different DFT functionals needs repro Bug report that needs a reproduction

Comments

@peikai
Copy link
Contributor

peikai commented Jun 28, 2023

I found that the entry mp-1181334-GGA+U for Cu-Fe-Li-O-Te chemical system can disappear and reappear in the processed entries result occasionally, when the GGA/GGA+U/R2SCAN scheme correction was conducted multiple times. The inconsistency issue happened about every third run in my test -- although the codes below been tested in the latest pymatgen package.

from pymatgen.ext.matproj import MPRester
from pymatgen.entries.mixing_scheme import MaterialsProjectDFTMixingScheme

with MPRester() as mpr:
    entryList = mpr.get_entries_in_chemsys('Cu-Fe-Li-O-Te', additional_criteria={'thermo_types':['GGA_GGA+U', 'R2SCAN']})
    entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, verbose=False)

    print(len(entryList))
    print('mp-1181334-GGA+U' in [e.entry_id for e in entryList])

There may be either 637 or 638 entries in the processed entryList. The latter case would contain an entry mp-1181334-GGA+U, which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme, as shown in the exported JSON file below.

mp-1181334-GGA+U ComputedStructureEntry - Fe4 O13      (Fe4O13)
Energy (Uncorrected)     = -92.6898  eV (-5.4523  eV/atom)
Correction               = -11.1170  eV (-0.6539  eV/atom)
Energy (Final)           = -103.8068 eV (-6.1063  eV/atom)
Energy Adjustments:
  MP2020 anion correction (superoxide): -2.0930   eV (-0.1231  eV/atom)
  MP2020 GGA/GGA+U mixing correction (Fe): -9.0240   eV (-0.5308  eV/atom)
Parameters:
  potcar_spec            = [{'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': '7a25bc5b9a5393f46600a4939d357982'}]
  is_hubbard             = True
  hubbards               = {'Fe': 5.3, 'O': 0.0}
  run_type               = GGA+U
Data:
  oxide_type             = superoxide
  aspherical             = True
  last_updated           = 2021-02-05 09:26:25.316000
  task_id                = mp-1829691
  material_id            = mp-1181334
  oxidation_states       = {}
  run_type               = GGA+U

The potential bug needs to be found, as the reliability of local processing algorithms has become critical for the generation of GGA/GGA+U/R2SCAN phase diagrams1.

Footnotes

  1. https://docs.materialsproject.org/methodology/materials-methodology/thermodynamic-stability/phase-diagrams-pds#gga-gga+u-r2scan

@janosh
Copy link
Member

janosh commented Jun 30, 2023

I can't repro this. Tried about 7 times. I get 638 entries every time.

which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme

I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you wouldn't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong.

@janosh janosh added needs repro Bug report that needs a reproduction analysis Concerning pymatgen.analysis mixing-schemes About mixing energies from different DFT functionals labels Jun 30, 2023
@rkingsbury
Copy link
Contributor

I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you woudln't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong here.

That's correct, some GGA/GGA+U entries will pass through the mixing scheme unmodified, if the hull is still being built with GGA/GGA+U entries. That would occur any time there are insufficient R2SCAN entries to build the hull. In such a high dimensional chemical system as this, I'm almost certain the hull would be built in GGA/GGA+U, so getting the entry back unmodified is exactly what should happen. Please refer to the mixing scheme publication for a more thorough explanation.

As for the non-reproducibility / I'm not sure. The mixing scheme does rely on structure matching, so perhaps due to some quirk in your computer system the matching occasionally fails to consider two structures "the same"? That's a stretch but it's the best thought I have.

Also, I note that you're using the old MPRester (pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from the mp-api package, but @munrojm would know better.

@peikai
Copy link
Contributor Author

peikai commented Jun 30, 2023

Also, I note that you're using the old MPRester (pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from the mp-api package, but @munrojm would know better.

@rkingsbury Oh, it is a typo here. Actually in my test codes, the MPRester method was imported by from mp_api.client import MPRester.

I tested it on a different computer and network, but the inconsistency issue still happened. I will try to run them with a fresh environment in a container next.

@peikai
Copy link
Contributor Author

peikai commented Jul 3, 2023

The same issue happened, even though I ran the codes in a fresh set-up container of python3.10-alpine. I think this is a rare case (the chemical system and entry) that induces this issue whereas would not happen in other chemical systems, but the potential bug may still result in entry loss or excess, thereby possibly changing the shape of the convex hull.

@janosh
Copy link
Member

janosh commented Jul 3, 2023

Can you pinpoint the bug?

@peikai
Copy link
Contributor Author

peikai commented Jul 5, 2023

@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry mp-1181411-GGA+U Fe4O13 with the entry mp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.

@rkingsbury
Copy link
Contributor

@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry mp-1181411-GGA+U Fe4O13 with the entry mp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.

@peikai it sounds to me like you are running into an edge case related to the numerical tolerances of StructureMatcher, where tiny numerical noise is causing those structures to sometimes match and sometimes not. I suggest 2 things:

  1. Can you reproduce the problem outside of the mixing scheme, by just running StructureMatcher on your list of input structures?
  2. If yes, then try adjusting the tolerance kwargs of StructureMatcher. If you slightly change the ltol, stol, or angle_tol, you can hopefully get reproducible results.

@peikai
Copy link
Contributor Author

peikai commented Jan 21, 2025

@rkingsbury, I tried with suggestion 1, the results are consistent that two entries are always distinguished to different groups.

I can still reproduce the aforementioned issue with the up-to-date pymatgen and online database.

@rkingsbury
Copy link
Contributor

OK, so if StructureMatcher does not match the structures, then the mixing scheme will treat them as distinct materials. Can you try experimenting with the tolerances and other kwargs of StructureMatcher to see if you can get them to match more reliably?

If you can, those settings can be passed into the mixing scheme.

@peikai
Copy link
Contributor Author

peikai commented Jan 22, 2025

As a supplement, I tried to find a comparation from pre-computed phase diagram on this chemical system. The pre-computed phase diagram contains the entry mp-1181334-GGA+U. However, the entryList getting from the MaterialsProjectDFTMixingScheme().process_entries(), contain it in mostly cases whereas sometimes don't. I tend to believe it should be included into the processed entries, since the StructureMatcher() did not match them into the same group. In a high-throughput context, I think the adjustment of default parameters might create further consistency on other chemical systems. Moreover, it has not been clear that why the result from process_entries() is inconsistent with those of StructureMatcher().

Although, this is an issue that did not bother so much, I committed it in case there are potential bugs or possible improvements. I once attempted to migrate to using pre-computing phase diagrams to keep consistency, but I found that for certain chemical systems, pre-computing phase diagrams are not always available (missing) for entire thermo types. As a result, it is necessary to rely on the local processing methods.

@rkingsbury
Copy link
Contributor

I believe this is fixed in #4255

@peikai
Copy link
Contributor Author

peikai commented Jan 28, 2025

I believe this is fixed in #4255

It fixed a new issue that r2SCAN entries are not included for the mixing correction.

The problem still exists that the entry mp-1181334-GGA+U can still appear or disappear in processed entries.

I record the output messages accordingly.

I also noticed there is an unusual message in above two outputs:

spglib: ssm_get_exact_positions failed.
spglib: get_bravais_exact_positions_and_lattice failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis Concerning pymatgen.analysis mixing-schemes About mixing energies from different DFT functionals needs repro Bug report that needs a reproduction
Projects
None yet
Development

No branches or pull requests

3 participants