
@voodoos
Collaborator

voodoos commented Jan 17, 2025

Initial PR presentation by @Lucccyo:

The current implementation of ocaml-index uses Marshal to store its data on disk.
Searching for occurrences in large projects is slow because every search first loads all of the index's data structures from disk.
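To make the problem concrete, here is a minimal sketch of the eager approach, assuming the index can be modelled as a string-keyed `Map` from the Stdlib (the real index uses different key and value types, and `save`, `load`, and `find_in_file` are hypothetical names). Because the whole structure is a single Marshal value, `Marshal.from_channel` has to rebuild all of it before even one lookup can run:

```ocaml
(* Eager baseline: the whole index is one Marshal value.
   SMap and the helper names are hypothetical stand-ins. *)
module SMap = Map.Make (String)

let save path (index : int SMap.t) =
  let oc = open_out_bin path in
  Marshal.to_channel oc index [];
  close_out oc

let load path : int SMap.t =
  let ic = open_in_bin path in
  (* Marshal.from_channel rebuilds the entire map in memory. *)
  let index = Marshal.from_channel ic in
  close_in ic;
  index

(* A single query still pays for deserializing every binding. *)
let find_in_file path key = SMap.find_opt key (load path)
```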

This pull request replaces the plain use of Marshal with a granular variant to make reading the index more efficient.
It comes with granular implementations of two data structures, set and map, based on their Stdlib implementations.
During a search, the program lazily loads only the parts of the index it needs.
This works because the heavy nodes of granular_map and granular_set are reached through link indirections.
These links introduce serialization boundaries, which allow Marshal to delay the deserialization of the nodes' children.
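A minimal sketch of the serialization-boundary idea, under simplifying assumptions: a plain balanced binary search tree instead of the Stdlib-derived map, a boundary at every node rather than only at heavy ones, and an in-memory `bytes` buffer instead of a file. The names `stored_node`, `write`, `serialize`, and `find` are hypothetical and are not the API added by this PR. Each node is marshalled as a separate chunk that refers to its children by byte offset, so `Marshal.from_bytes` is only called on the nodes along the search path:

```ocaml
(* Granular sketch (hypothetical, not Merlin's granular_map): every node
   is its own Marshal chunk and refers to its children by byte offset. *)

type tree = Leaf | Node of tree * string * int * tree

(* On-disk node: children replaced by offsets; -1 means "no child". *)
type stored_node = { left : int; key : string; value : int; right : int }

(* Build a balanced tree from a sorted array of bindings. *)
let rec of_sorted a lo hi =
  if lo >= hi then Leaf
  else
    let mid = (lo + hi) / 2 in
    let key, value = a.(mid) in
    Node (of_sorted a lo mid, key, value, of_sorted a (mid + 1) hi)

(* Children are written before their parent, so their offsets are known
   when the parent chunk is marshalled. Returns the chunk's offset. *)
let rec write buf = function
  | Leaf -> -1
  | Node (l, key, value, r) ->
    let left = write buf l in
    let right = write buf r in
    let ofs = Buffer.length buf in
    Buffer.add_bytes buf (Marshal.to_bytes { left; key; value; right } []);
    ofs

let serialize t =
  let buf = Buffer.create 4096 in
  let root = write buf t in
  (Buffer.to_bytes buf, root)

(* Counts chunk deserializations, to show how little is loaded. *)
let reads = ref 0

let read_node bytes ofs : stored_node =
  incr reads;
  Marshal.from_bytes bytes ofs

(* Descend from the root, unmarshalling one chunk per visited node. *)
let rec find bytes ofs key =
  if ofs < 0 then None
  else
    let n = read_node bytes ofs in
    let c = compare key n.key in
    if c = 0 then Some n.value
    else find bytes (if c < 0 then n.left else n.right) key

let () =
  let a = Array.init 1000 (fun i -> (Printf.sprintf "k%04d" i, i)) in
  let bytes, root = serialize (of_sorted a 0 (Array.length a)) in
  reads := 0;
  (match find bytes root "k0420" with
   | Some v -> Printf.printf "found %d after %d chunk reads\n" v !reads
   | None -> print_endline "not found")
```

With 1000 bindings the balanced tree has depth 10, so any lookup unmarshals at most 10 small chunks instead of the whole structure. Placing a boundary at every node trades some file size and indexing time for maximal granularity; the PR instead puts link indirections only on heavy nodes.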

voodoos added a commit to voodoos/merlin that referenced this pull request Jan 21, 2025
voodoos added a commit to voodoos/merlin that referenced this pull request Jan 22, 2025
@voodoos
Collaborator Author

voodoos commented Feb 5, 2025

In my own testing, indexing time increases slightly (about 5%) with no increase in file size. Fetches are now close to instantaneous, and Merlin's memory usage is vastly reduced since indexes are no longer fully loaded into memory.

Indexing one large library in Dune:

Benchmark 1: Main
  Time (mean ± σ):      1.116 s ±  0.011 s    [User: 0.941 s, System: 0.159 s]
  Range (min … max):    1.100 s …  1.132 s    10 runs

Benchmark 1: With the new granular-marshal:
  Time (mean ± σ):      1.177 s ±  0.011 s    [User: 1.004 s, System: 0.158 s]
  Range (min … max):    1.162 s …  1.195 s    10 runs
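As a quick check of the "5%" figure, using the mean times from the two runs above:

$$
\frac{1.177\,\mathrm{s} - 1.116\,\mathrm{s}}{1.116\,\mathrm{s}} = \frac{0.061}{1.116} \approx 0.055 \approx 5.5\%
$$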

The impact on a larger, parallel Dune build appears to be much lower: a 1.6% increase.

@liam923 I am going to merge these improvements, which make sense for most users, but we should also evaluate the impact of this PR on "larger" codebases.

voodoos merged commit 102eee4 into ocaml:main Feb 5, 2025
5 checks passed
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Apr 9, 2025
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Apr 10, 2025
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Apr 23, 2025
voodoos added a commit to voodoos/opam-repository that referenced this pull request Jun 24, 2025
CHANGES:

Tue Jun 24 16:10:42 CEST 2025

  + merlin library
    - Expose utilities to manipulate typed-holes in `Merlin_analysis.Typed_hole`
      (ocaml/merlin#1888)
    - `locate` can now disambiguate between files with identical names and contents
      (ocaml/merlin#1882)
    - `occurrences` now reports stale files (ocaml/merlin#1885)
    - `inlay-hints`: fix inlay hints on function parameters (ocaml/merlin#1923)
    - Fix issues with ident validation and Lid comparison for occurrences (ocaml/merlin#1924)
    - Handle class type in outline (ocaml/merlin#1932)
    - Handle locally defined value in outline (ocaml/merlin#1936)
    - Fix a typer issue triggering assertions in the short-paths graph (ocaml/merlin#1935,
      fixes ocaml/merlin#1913)
    - Downstreamed a typer fix from 5.3.X that would trigger assertions linked
      to scopes bit masks when backtracking the typer cache (ocaml/merlin#1935)
    - Add a new selection field to outline results that contains the location of
      the symbol itself. (ocaml/merlin#1942)
    - Fix destruct hanging when printing patterns with (::). (ocaml/merlin#1944, fixes
      ocaml/ocaml-lsp#1489)
    - Reproduce and fix a handful of jump-to-definition (locate) issues (ocaml/merlin#1930,
      fixes ocaml/merlin#1580 and ocaml/merlin#1588, workaround for ocaml/merlin#1934)
  + ocaml-index
    - Improve the granularity of index reading by segmenting the marshalling
      of the involved data structures. (ocaml/merlin#1889)
  + test suite
    - Add a test case illustrating wrong open order proposed in issue ocaml/merlin#1900. (ocaml/merlin#1901)
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Jun 26, 2025
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Jun 27, 2025
voodoos added a commit to voodoos/merlin-jst that referenced this pull request Jun 30, 2025
liam923 pushed a commit to oxcaml/merlin that referenced this pull request Jul 10, 2025
liam923 added a commit to oxcaml/merlin that referenced this pull request Jul 16, 2025
* Downstream: Fix occurrences when the definition's source is hidden (ocaml/merlin#1865)

* Downstream: Use new uid info to fix jumps and provide occurrences in both the interface and the implementation (ocaml/merlin#1857)

* Downstream: perform less merges when indexing (ocaml/merlin#1881)

Immediately grow the final index instead of building and merging.

* Merge project-wide renaming changes from  (ocaml/merlin#1877)

* Merge changes adding granular marshal (ocaml/merlin#1889)

* Disable new upstream test relying on Dune

* Downstream: Fixes for renaming (ocaml/merlin#1924)

* Reformat files

* Undo  ->  change

* Add fold to Granular_set

* Resolve conflicts in occurrences.ml

* Promote good test changes

* Fix json serialization for renaming scope

* Promote more good test changes

* Review changes in locate.ml

* Review 1857

* Handle store_shapes in index.ml

* Review occurrences.ml

* Make functor renaming test run

* Review r-modules-and-types.t

* Resolve cr

* Add missing .merlin file

---------

Co-authored-by: Ulysse Gérard <thevoodoos@gmail.com>
Co-authored-by: Ulysse <5031221+voodoos@users.noreply.github.com>