Code for the paper:
Autoformalizing Euclidean Geometry
International Conference on Machine Learning (ICML), 2024
Logan Murphy*, Kaiyu Yang* (* equal contribution), Jialiang Sun, Zhaoyu Li, Anima Anandkumar, and Xujie Si
@inproceedings{murphy2024leaneuclid,
title={Autoformalizing {Euclidean} Geometry},
author={Murphy, Logan and Yang, Kaiyu and Sun, Jialiang and Li, Zhaoyu and Anandkumar, Anima and Si, Xujie},
booktitle={International Conference on Machine Learning (ICML)},
year={2024}
}
- Requirements
- Building
- The Formal System E
- LeanEuclid
- Evaluating Autoformalized Theorem Statements
- Experiments
- Acknowledgements
- A working setup of Lean 4, including elan and Lean's VSCode extension
- Install the latest version of Z3 and CVC5 and make sure they can be accessed from the command line
- Install Python dependencies:
pip install smt-portfolio openai
- Find out the location of
smt-portfolio
and make sure Lean's VSCode extension can also access it. For example, ifwhich smt-portfolio
outputs/Users/yangky/miniconda3/envs/lean/bin/smt-portfolio
, you should setServer Env Paths
in Lean's VSCode extension as below:
Take the following steps to build the Lean project:
- Run
lake script run check
to check if the requirements are satisfied. - Run
lake exe cache get
to download the mathlib cache - Run
lake build
to compile the formal system E - Open a file for Euclid's Elements in VS Code, e.g., Book/Prop01.lean. You should expect to see:
Optionally, you can:
- Run
lake -R -Kenv=dev build SystemE:docs
to build the documentation - View the auto-generated documentation locally at ./lake/build/doc, e.g., using
python -mhttp.server
If you have problems building the project, Dockerfile and scripts/build.sh may serve as useful references.
E is a formal system introduced by Avigad et al., 2009 for faithfully formalizing the proofs in Euclid’s Elements, including the use of diagrammatic reasoning. It defines a language for expressing basic objects in 2D Euclidean geometry (points, lines, circles, etc.), as well as the form of theorems and proofs. We implement the formal system E in Lean and use SMT solvers to perform diagrammatic reasoning automatically and implicitly. Details of our implementation can be found here and in the auto-generated documentation.
LeanEuclid is a benchmark for testing autoformalization. It consists of 173 Euclidean geometry problems manually formalized into Lean. The theorems and proofs are in the format prescribed by the formal system E. Among the 173 problems, 48 are from Euclid's Elements, and 125 are adapted from the UniGeo dataset (Chen et al., 2022).
We manually formalized the first book of Euclid's Elements, using an open-source LaTex version. The formalized theorems and proofs can be found here. Thanks to E and automatic diagrammatic reasoning, our formalized proofs are "faithful" in the sense that they correspond very closely to Euclid's original proofs. To our knowledge, this is the first time Euclid's Elements has been faithfully formalized in a proof assistant such as Lean, and our formalization has uncovered errors in Euclid's proofs (Appendix B of our paper).
For example, below is our formalization of Euclid's first proposition in Book I, which states you can construct an equilateral triangle given two distinct points:
Our formalized theorem and proof in Book/Prop01.lean:
theorem proposition_1 :
∀ (a b : Point) (AB : Line),
distinctPointsOnLine a b AB →
∃ c : Point, |(c─a)| = |(a─b)| ∧ |(c─b)| = |(a─b)| := by
euclid_intros
euclid_apply circle_from_points a b as BCD
euclid_apply circle_from_points b a as ACE
euclid_apply intersection_circles BCD ACE as c
euclid_apply point_on_circle_onlyif a b c BCD
euclid_apply point_on_circle_onlyif b a c ACE
use c
euclid_finish
Problems from UniGeo are generally easier than those from Elements. Unlike Elements, UniGeo's problem text does not include complete information about the problem, so we manually provide missing diagrammatic details to the text. For example:
Diagram in UniGeo/Congruent/diagrams/1.png:
UniGeo's theorem statement in UniGeo/Congruent/texts/1.txt.
Given T U ≅ R S. R S ∥ T U. Complete the proof that △ R T U ≅ △ T R S.
UniGeo's proof in UniGeo/Congruent/proofs/1.txt:
RS ∥ TU
TU ≅ RS
∠RTU ≅ ∠SRT
RT ≅ RT
△RTU ≅ △TRS
Diagrammatic details we provide:
There is a quadrilateral RSUT with vertices R, S, U, and T. There is a diagonal line segment RT drawn inside the quadrilateral, connecting vertices R and T.
Our formalized theorem and proof in UniGeo/Congruent/Thm01.lean:
theorem theorem_1 : ∀ (R S T U : Point) (RS ST RT TU RU : Line),
formTriangle R S T RS ST RT ∧
formTriangle R T U RT TU RU ∧
S.opposingSides U RT ∧
|(T─U)| = |(R─S)| ∧
¬ RS.intersectsLine TU →
(△ R:T:U).congruent (△ T:R:S) :=
by
euclid_intros
euclid_apply Elements.Book1.proposition_29''' S U R T RS TU RT
euclid_finish
In addition to proof automation, our symbolic reasoning engine can be used to check the equivalence of autoformalized theorem statements against ground-truth theorem statements. We have built a wrapper E3
which performs two forms of equivalence checking:
- Standard equivalence checking, i.e., check if the two theorem statements are logically equivalent or not. It may also be that one is strictly stronger than the other.
- Approximate equivalence checking, where we try to quantify how close the two statements are to one another.
Typically, the latter is only done if the statements are not proven equivalent, but you'd like to see how "close" the prediction was to the ground truth. More details on E3
can be found here.
In our paper, we used LeanEuclid to test state-of-the-art LLMs on theorem statements and proof autoformalization. We provide a Python wrapper which can be used to replicate our experiment procedure. More details and usage examples can be found here.
- We use (our own fork of) lean-smt for running SMT solvers from Lean.
- There are concurrent efforts on formalizing Euclidean geometry in Lean (e.g., EG and Hernandez-Espiet's work). To the best of our knowledge, LeanEuclid is unique in its use of external solvers to handle diagrammatic reasoning, which is typically performed explicitly. However, this comes at the expense of breaking soundness.