feat: fuzzy matching completions #671

jspaezp · 2024-10-23T07:23:56Z

Closes #601

What are the key elements of this solution?

I am implementing fuzzy matching for auto completions. The two main cases that I focused on were typos and partial matches.
- Typos: Since right now a substring prefix is required, any typo prevents suggestions from being displayed ("slect" will not suggest "select"), this PR adds support for that.
- Partial matches: Since right now startswith is used, if I have a column named "team_sales", typing "sales" is not a match, since it does not share a prefix. This adds support for it.

Why did you design your solution this way? Did you assess any alternatives? Are there tradeoffs?

Some discussion here: Add fuzzy-matching for autocomplete #601 (comment)
I also tested various alternatives using a harnessed snippet with the completions for a local database, where I modified the number of possible characters and 2 was a good tradeoff between the worst case scenario performance and real world use.
Time to generate a completion in my laptop+database is still in the microsecond range but it does seem to be 3/4 times slower in the worst case scenarios.

Does this PR require a change to Harlequin's docs?

No.
Yes, and I have opened a PR at tconbeer/harlequin-web.
Yes; I haven't opened a PR, but the gist of the change is: ...

Did you add or update tests for this change?

Yes.
No, I believe tests aren't necessary.
- NOTE: Since the current implementation of the matching is not tested I didn't add it. Let me know if you think its needed and I would be happy to write it.
No, I need help with testing this change.

Please complete the following checklist:

I have added an entry to CHANGELOG.md, under the [Unreleased] section heading. That entry references the issue closed by this PR.
I acknowledge Harlequin's MIT license. I do not own my contribution.

tconbeer · 2024-10-25T15:15:01Z

This looks great - thank you so much!

As you noted, there aren't unit tests for the completers, but there are functional/snapshot tests here:

harlequin/tests/functional_tests/test_query_editor.py

Line 122 in e2a2793

async def test_word_autocomplete(

It would be great to tweak that test to test fuzzy matches, in addition to exact matches; in particular I want to make sure that we don't just append the match to the buffer, but replace the input characters. The snapshot tests can be a little bit of a pain, so let me know if you can't figure it out or don't want to, and I can take it from here. Little more info here: https://harlequin.sh/docs/contributing/index#inspecting-and-updating-snapshot-tests

test_word_autocomplete is failing (only on Ubuntu b/c that test is skipped on other platforms) because the matches have changed. That snapshot makes me realize that in this case, the matches are quite a bit worse (analyse instead of select for an input of se)

I think to address that, fuzzy matches should be ordered better -- right now they are listed alphabetically; would be good to order fuzzy matches by edit distance, or order all matches by:

exact matches first
startswith matches
fuzzy matches last (ordered ideally by edit distance or some heuristic or maybe just alphabetically)

A final optimization related to the ordering -- it might make sense to only compute fuzzy matches if the length of the list of exact and startswith matches is below some threshold. i.e., no reason to fuzzy match on a single character input that already returns dozens of results.

Let me know if you want to contribute the next iteration here. Thanks so much for getting this started!

jspaezp · 2024-10-28T00:08:42Z

I am happy to contribute the new version! (will get it done at some point this week)

jspaezp · 2024-11-02T22:37:15Z

Hummm oddly enough the tests hang forever after merging main. I was able to run them before the update.

jspaezp · 2024-11-02T22:51:22Z

Well in despite the tests being broken for me RN ...

I think to address that, fuzzy matches should be ordered better -- right now they are listed alphabetically; would be good to order fuzzy matches by edit distance, or order all matches by:

exact matches first
startswith matches
fuzzy matches last (ordered ideally by edit distance or some heuristic or maybe just alphabetically)
A final optimization related to the ordering -- it might make sense to only compute fuzzy > matches if the length of the list of exact and startswith matches is below some threshold. i.e., no reason to fuzzy match on a single character input that already returns dozens of results.

I implemented this idea with some minor variations ... I is Exact > starswith > fuzzy and I am sorting the fuzzy matches by length. This results in "more fuzzy" matches generally going last (which I feel is a good approximation). I am also skipping the fuzzy matching if there are already more than 20 direct matches.

Also took the freedom to add an unit test for it. I bundled the completions from a dummy database with just the 'iris' dataset added to it and stored the HarlequinCompletions as json, my only reservation on that is that it is a large-ish file and re-generating it entails some source code hooking/monkeypatching which I have not included in the PR.

LMK what you think and if you have any ideas on how to fix the testing issue.

Full traceback after killing a pytest session python -m pytest --full-trace tests/functional_tests/test_app.py (sorry, my python async is not great so I am having a hard time debugging this one ...)


=================================== test session starts ===================================
platform darwin -- Python 3.12.6, pytest-8.3.2, pluggy-1.5.0
rootdir: /Users/myuser/git/harlequin
configfile: pyproject.toml
plugins: syrupy-4.6.1, asyncio-0.21.2, textual-snapshot-0.4.0
asyncio: mode=Mode.STRICT
collected 49 items                                                                        

tests/functional_tests/test_app.py ^C

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

config = <_pytest.config.Config object at 0x101d86240>
doit = <function _main at 0x101e69620>

    def wrap_session(
        config: Config, doit: Callable[[Config, Session], int | ExitCode | None]
    ) -> int | ExitCode:
        """Skeleton command line program."""
        session = Session.from_config(config)
        session.exitstatus = ExitCode.OK
        initstate = 0
        try:
            try:
                config._do_configure()
                initstate = 1
                config.hook.pytest_sessionstart(session=session)
                initstate = 2
>               session.exitstatus = doit(config, session) or 0

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/main.py:283: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

config = <_pytest.config.Config object at 0x101d86240>
session = <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=49>

    def _main(config: Config, session: Session) -> int | ExitCode | None:
        """Default command line protocol for initialization, session,
        running tests and reporting."""
        config.hook.pytest_collection(session=session)
>       config.hook.pytest_runtestloop(session=session)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/main.py:337: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <HookCaller 'pytest_runtestloop'>
kwargs = {'session': <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=49>
}
firstresult = True

    def __call__(self, **kwargs: object) -> Any:
        """Call the hook.
    
        Only accepts keyword arguments, which should match the hook
        specification.
    
        Returns the result(s) of calling all registered plugins, see
        :ref:`calling`.
        """
        assert (
            not self.is_historic()
        ), "Cannot directly call a historic hook - use call_historic instead."
        self._verify_all_args_are_provided(kwargs)
        firstresult = self.spec.opts.get("firstresult", False) if self.spec else False
        # Copy because plugins may register other plugins during iteration (#438).
>       return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_hooks.py:513: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x100f006e0>
hook_name = 'pytest_runtestloop'
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/Users/sebasti
anpaez/Library/Caches/pypoetry/virtua...t 0x109417470>>, <HookImpl plugin_name='logging-plu
gin', plugin=<_pytest.logging.LoggingPlugin object at 0x109416d80>>]
kwargs = {'session': <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=49>
}
firstresult = True

    def _hookexec(
        self,
        hook_name: str,
        methods: Sequence[HookImpl],
        kwargs: Mapping[str, object],
        firstresult: bool,
    ) -> object | list[object]:
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_manager.py:120: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.logging.LoggingPlugin object at 0x109416d80>
session = <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=49>

    @hookimpl(wrapper=True)
    def pytest_runtestloop(self, session: Session) -> Generator[None, object, object]:
        if session.config.option.collectonly:
            return (yield)
    
        if self._log_cli_enabled() and self._config.getoption("verbose") < 1:
            # The verbose flag is needed to avoid messy test progress output.
            self._config.option.verbose = 1
    
        with catching_logs(self.log_cli_handler, level=self.log_cli_level):
            with catching_logs(self.log_file_handler, level=self.log_file_level):
>               return (yield)  # Run all the tests.

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/logging.py:805: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.terminal.TerminalReporter object at 0x109417470>

    @hookimpl(wrapper=True)
    def pytest_runtestloop(self) -> Generator[None, object, object]:
>       result = yield

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/terminal.py:673: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

session = <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=49>

    def pytest_runtestloop(session: Session) -> bool:
        if session.testsfailed and not session.config.option.continue_on_collection_errors:
            raise session.Interrupted(
                "%d error%s during collection"
                % (session.testsfailed, "s" if session.testsfailed != 1 else "")
            )
    
        if session.config.option.collectonly:
            return True
    
        for i, item in enumerate(session.items):
            nextitem = session.items[i + 1] if i + 1 < len(session.items) else None
>           item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/main.py:362: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <HookCaller 'pytest_runtest_protocol'>
kwargs = {'item': <Function test_select_1[duckdb]>, 'nextitem': <Function test_select_1[sql
ite]>}
firstresult = True

    def __call__(self, **kwargs: object) -> Any:
        """Call the hook.
    
        Only accepts keyword arguments, which should match the hook
        specification.
    
        Returns the result(s) of calling all registered plugins, see
        :ref:`calling`.
        """
        assert (
            not self.is_historic()
        ), "Cannot directly call a historic hook - use call_historic instead."
        self._verify_all_args_are_provided(kwargs)
        firstresult = self.spec.opts.get("firstresult", False) if self.spec else False
        # Copy because plugins may register other plugins during iteration (#438).
>       return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_hooks.py:513: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x100f006e0>
hook_name = 'pytest_runtest_protocol'
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/Users/seb
astianpaez/Library/Caches/pypoetry/vi...paez/Library/Caches/pypoetry/virtualenvs/harlequin-
FuiG_xwU-py3.12/lib/python3.12/site-packages/_pytest/warnings.py'>>]
kwargs = {'item': <Function test_select_1[duckdb]>, 'nextitem': <Function test_select_1[sql
ite]>}
firstresult = True

    def _hookexec(
        self,
        hook_name: str,
        methods: Sequence[HookImpl],
        kwargs: Mapping[str, object],
        firstresult: bool,
    ) -> object | list[object]:
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_manager.py:120: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    @pytest.hookimpl(wrapper=True, tryfirst=True)
    def pytest_runtest_protocol(item: Item) -> Generator[None, object, object]:
        with catch_warnings_for_item(
            config=item.config, ihook=item.ihook, when="runtest", item=item
        ):
>           return (yield)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/warnings.py:112: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    @hookimpl(wrapper=True, tryfirst=True)
    def pytest_runtest_protocol(item: Item) -> Generator[None, object, object]:
        """Setup the pytest_assertrepr_compare and pytest_assertion_pass hooks.
    
        The rewrite module will use util._reprcompare if it exists to use custom
        reporting via the pytest_assertrepr_compare hook.  This sets up this custom
        comparison for the test.
        """
        ihook = item.ihook
    
        def callbinrepr(op, left: object, right: object) -> str | None:
            """Call the pytest_assertrepr_compare hook and prepare the result.
    
            This uses the first result from the hook and then ensures the
            following:
            * Overly verbose explanations are truncated unless configured otherwise
              (eg. if running in verbose mode).
            * Embedded newlines are escaped to help util.format_explanation()
              later.
            * If the rewrite mode is used embedded %-characters are replaced
              to protect later % formatting.
    
            The result can be formatted by util.format_explanation() for
            pretty printing.
            """
            hook_result = ihook.pytest_assertrepr_compare(
                config=item.config, op=op, left=left, right=right
            )
            for new_expl in hook_result:
                if new_expl:
                    new_expl = truncate.truncate_if_required(new_expl, item)
                    new_expl = [line.replace("\n", "\\n") for line in new_expl]
                    res = "\n~".join(new_expl)
                    if item.config.getvalue("assertmode") == "rewrite":
                        res = res.replace("%", "%%")
                    return res
            return None
    
        saved_assert_hooks = util._reprcompare, util._assertion_pass
        util._reprcompare = callbinrepr
        util._config = item.config
    
        if ihook.pytest_assertion_pass.get_hookimpls():
    
            def call_assertion_pass_hook(lineno: int, orig: str, expl: str) -> None:
                ihook.pytest_assertion_pass(item=item, lineno=lineno, orig=orig, expl=expl)
    
            util._assertion_pass = call_assertion_pass_hook
    
        try:
>           return (yield)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/assertion/__init__.py:176: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    @hookimpl(wrapper=True)
    def pytest_runtest_protocol(item: Item) -> Generator[None, object, object]:
        if isinstance(item, TestCaseFunction) and "twisted.trial.unittest" in sys.modules:
            ut: Any = sys.modules["twisted.python.failure"]
            global classImplements_has_run
            Failure__init__ = ut.Failure.__init__
            if not classImplements_has_run:
                from twisted.trial.itrial import IReporter
                from zope.interface import classImplements
    
                classImplements(TestCaseFunction, IReporter)
                classImplements_has_run = True
    
            def excstore(
                self, exc_value=None, exc_type=None, exc_tb=None, captureVars=None
            ):
                if exc_value is None:
                    self._rawexcinfo = sys.exc_info()
                else:
                    if exc_type is None:
                        exc_type = type(exc_value)
                    self._rawexcinfo = (exc_type, exc_value, exc_tb)
                try:
                    Failure__init__(
                        self, exc_value, exc_type, exc_tb, captureVars=captureVars
                    )
                except TypeError:
                    Failure__init__(self, exc_value, exc_type, exc_tb)
    
            ut.Failure.__init__ = excstore
            try:
                res = yield
            finally:
                ut.Failure.__init__ = Failure__init__
        else:
>           res = yield

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/unittest.py:429: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    @pytest.hookimpl(wrapper=True, trylast=True)
    def pytest_runtest_protocol(item: Item) -> Generator[None, object, object]:
        timeout = get_timeout_config_value(item.config)
        if timeout > 0:
            import faulthandler
    
            stderr = item.config.stash[fault_handler_stderr_fd_key]
            faulthandler.dump_traceback_later(timeout, file=stderr)
            try:
                return (yield)
            finally:
                faulthandler.cancel_dump_traceback_later()
        else:
>           return (yield)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/faulthandler.py:87: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>, nextitem = <Function test_select_1[sqlite]>

    def pytest_runtest_protocol(item: Item, nextitem: Item | None) -> bool:
        ihook = item.ihook
        ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
>       runtestprotocol(item, nextitem=nextitem)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:113: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>, log = True
nextitem = <Function test_select_1[sqlite]>

    def runtestprotocol(
        item: Item, log: bool = True, nextitem: Item | None = None
    ) -> list[TestReport]:
        hasrequest = hasattr(item, "_request")
        if hasrequest and not item._request:  # type: ignore[attr-defined]
            # This only happens if the item is re-run, as is done by
            # pytest-rerunfailures.
            item._initrequest()  # type: ignore[attr-defined]
        rep = call_and_report(item, "setup", log)
        reports = [rep]
        if rep.passed:
            if item.config.getoption("setupshow", False):
                show_test_item(item)
            if not item.config.getoption("setuponly", False):
>               reports.append(call_and_report(item, "call", log))

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:132: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>, when = 'call', log = True, kwds = {}
ihook = <_pytest.config.compat.PathAwareHookProxy object at 0x101f1ee10>
reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

    def call_and_report(
        item: Item, when: Literal["setup", "call", "teardown"], log: bool = True, **kwds
    ) -> TestReport:
        ihook = item.ihook
        if when == "setup":
            runtest_hook: Callable[..., None] = ihook.pytest_runtest_setup
        elif when == "call":
            runtest_hook = ihook.pytest_runtest_call
        elif when == "teardown":
            runtest_hook = ihook.pytest_runtest_teardown
        else:
            assert False, f"Unhandled runtest hook case: {when}"
        reraise: tuple[type[BaseException], ...] = (Exit,)
        if not item.config.getoption("usepdb", False):
            reraise += (KeyboardInterrupt,)
>       call = CallInfo.from_call(
            lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
        )

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:241: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class '_pytest.runner.CallInfo'>
func = <function call_and_report.<locals>.<lambda> at 0x109475800>, when = 'call'
reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

    @classmethod
    def from_call(
        cls,
        func: Callable[[], TResult],
        when: Literal["collect", "setup", "call", "teardown"],
        reraise: type[BaseException] | tuple[type[BaseException], ...] | None = None,
    ) -> CallInfo[TResult]:
        """Call func, wrapping the result in a CallInfo.
    
        :param func:
            The function to call. Called without arguments.
        :type func: Callable[[], _pytest.runner.TResult]
        :param when:
            The phase in which the function is called.
        :param reraise:
            Exception or exceptions that shall propagate if raised by the
            function, instead of being wrapped in the CallInfo.
        """
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: TResult | None = func()

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:341: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>       lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
    )

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:242: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <HookCaller 'pytest_runtest_call'>
kwargs = {'item': <Function test_select_1[duckdb]>}, firstresult = False

    def __call__(self, **kwargs: object) -> Any:
        """Call the hook.
    
        Only accepts keyword arguments, which should match the hook
        specification.
    
        Returns the result(s) of calling all registered plugins, see
        :ref:`calling`.
        """
        assert (
            not self.is_historic()
        ), "Cannot directly call a historic hook - use call_historic instead."
        self._verify_all_args_are_provided(kwargs)
        firstresult = self.spec.opts.get("firstresult", False) if self.spec else False
        # Copy because plugins may register other plugins during iteration (#438).
>       return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_hooks.py:513: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x100f006e0>
hook_name = 'pytest_runtest_call'
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/Users/seb
astianpaez/Library/Caches/pypoetry/vi...brary/Caches/pypoetry/virtualenvs/harlequin-FuiG_xw
U-py3.12/lib/python3.12/site-packages/_pytest/threadexception.py'>>]
kwargs = {'item': <Function test_select_1[duckdb]>}, firstresult = False

    def _hookexec(
        self,
        hook_name: str,
        methods: Sequence[HookImpl],
        kwargs: Mapping[str, object],
        firstresult: bool,
    ) -> object | list[object]:
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_manager.py:120: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    @pytest.hookimpl(wrapper=True, tryfirst=True)
    def pytest_runtest_call() -> Generator[None, None, None]:
>       yield from thread_exception_runtest_hook()

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/threadexception.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def thread_exception_runtest_hook() -> Generator[None, None, None]:
        with catch_threading_exception() as cm:
            try:
>               yield

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/threadexception.py:68: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    @pytest.hookimpl(wrapper=True, tryfirst=True)
    def pytest_runtest_call() -> Generator[None, None, None]:
>       yield from unraisable_exception_runtest_hook()

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/unraisableexception.py:95: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def unraisable_exception_runtest_hook() -> Generator[None, None, None]:
        with catch_unraisable_exception() as cm:
            try:
>               yield

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/unraisableexception.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.logging.LoggingPlugin object at 0x109416d80>
item = <Function test_select_1[duckdb]>

    @hookimpl(wrapper=True)
    def pytest_runtest_call(self, item: nodes.Item) -> Generator[None, None, None]:
        self.log_cli_handler.set_when("call")
    
>       yield from self._runtest_for(item, "call")

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/logging.py:848: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.logging.LoggingPlugin object at 0x109416d80>
item = <Function test_select_1[duckdb]>, when = 'call'

    def _runtest_for(self, item: nodes.Item, when: str) -> Generator[None, None, None]:
        """Implement the internals of the pytest_runtest_xxx() hooks."""
        with catching_logs(
            self.caplog_handler,
            level=self.log_level,
        ) as caplog_handler, catching_logs(
            self.report_handler,
            level=self.log_level,
        ) as report_handler:
            caplog_handler.reset()
            report_handler.reset()
            item.stash[caplog_records_key][when] = caplog_handler.records
            item.stash[caplog_handler_key] = caplog_handler
    
            try:
>               yield

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/logging.py:831: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <CaptureManager _method='fd' _global_capturing=None _capture_fixture=None>
item = <Function test_select_1[duckdb]>

    @hookimpl(wrapper=True)
    def pytest_runtest_call(self, item: Item) -> Generator[None, None, None]:
        with self.item_capture("call", item):
>           return (yield)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/capture.py:879: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    @hookimpl(wrapper=True)
    def pytest_runtest_call(item: Item) -> Generator[None, None, None]:
        xfailed = item.stash.get(xfailed_key, None)
        if xfailed is None:
            item.stash[xfailed_key] = xfailed = evaluate_xfail_marks(item)
    
        if xfailed and not item.config.option.runxfail and not xfailed.run:
            xfail("[NOTRUN] " + xfailed.reason)
    
        try:
>           return (yield)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/skipping.py:257: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_select_1[duckdb]>

    def pytest_runtest_call(item: Item) -> None:
        _update_current_test_var(item, "call")
        try:
            del sys.last_type
            del sys.last_value
            del sys.last_traceback
            if sys.version_info >= (3, 12, 0):
                del sys.last_exc
        except AttributeError:
            pass
        try:
>           item.runtest()

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/runner.py:174: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Function test_select_1[duckdb]>

    def runtest(self) -> None:
        """Execute the underlying test function."""
>       self.ihook.pytest_pyfunc_call(pyfuncitem=self)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/python.py:1627: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <HookCaller 'pytest_pyfunc_call'>
kwargs = {'pyfuncitem': <Function test_select_1[duckdb]>}, firstresult = True

    def __call__(self, **kwargs: object) -> Any:
        """Call the hook.
    
        Only accepts keyword arguments, which should match the hook
        specification.
    
        Returns the result(s) of calling all registered plugins, see
        :ref:`calling`.
        """
        assert (
            not self.is_historic()
        ), "Cannot directly call a historic hook - use call_historic instead."
        self._verify_all_args_are_provided(kwargs)
        firstresult = self.spec.opts.get("firstresult", False) if self.spec else False
        # Copy because plugins may register other plugins during iteration (#438).
>       return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_hooks.py:513: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x100f006e0>
hook_name = 'pytest_pyfunc_call'
methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/Users/seb
astianpaez/Library/Caches/pypoetry/vi...Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_
xwU-py3.12/lib/python3.12/site-packages/pytest_asyncio/plugin.py'>>]
kwargs = {'pyfuncitem': <Function test_select_1[duckdb]>}, firstresult = True

    def _hookexec(
        self,
        hook_name: str,
        methods: Sequence[HookImpl],
        kwargs: Mapping[str, object],
        firstresult: bool,
    ) -> object | list[object]:
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook_name, methods, kwargs, firstresult)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pluggy/_manager.py:120: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pyfuncitem = <Function test_select_1[duckdb]>

    @hookimpl(trylast=True)
    def pytest_pyfunc_call(pyfuncitem: Function) -> object | None:
        testfunction = pyfuncitem.obj
        if is_async_function(testfunction):
            async_warn_and_skip(pyfuncitem.nodeid)
        funcargs = pyfuncitem.funcargs
        testargs = {arg: funcargs[arg] for arg in pyfuncitem._fixtureinfo.argnames}
>       result = testfunction(**testargs)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/_pytest/python.py:159: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = ()
kwargs = {'app_all_adapters': Harlequin(title='Harlequin', classes={'-dark-mode'}), 'app_sn
apshot': <function app_snapshot.<loc...are at 0x109544e00>, 'wait_for_workers': <function w
ait_for_workers.<locals>.wait_for_filtered_workers at 0x109544d60>}
coro = <coroutine object test_select_1 at 0x10946c7c0>
task = <Task pending name='Task-1' coro=<test_select_1() running at /Users/myuser/gi
t/harlequin/tests/functional_tests/test_app.py:32> wait_for=<Future pending cb=[Task.task_w
akeup()]>>

    @functools.wraps(func)
    def inner(*args, **kwargs):
        coro = func(*args, **kwargs)
        if not inspect.isawaitable(coro):
            pyfuncitem.warn(
                pytest.PytestWarning(
                    f"The test {pyfuncitem} is marked with '@pytest.mark.asyncio' "
                    "but it is not an async function. "
                    "Please remove asyncio marker. "
                    "If the test is not marked explicitly, "
                    "check for global markers applied via 'pytestmark'."
                )
            )
            return
        task = asyncio.ensure_future(coro, loop=_loop)
        try:
>           _loop.run_until_complete(task)

../../Library/Caches/pypoetry/virtualenvs/harlequin-FuiG_xwU-py3.12/lib/python3.12/site-pac
kages/pytest_asyncio/plugin.py:529: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task pending name='Task-1' coro=<test_select_1() running at /Users/myuser/
git/harlequin/tests/functional_tests/test_app.py:32> wait_for=<Future pending cb=[Task.task
_wakeup()]>>

    def run_until_complete(self, future):
        """Run until the Future is done.
    
        If the argument is a coroutine, it is wrapped in a Task.
    
        WARNING: It would be disastrous to call run_until_complete()
        with the same coroutine twice -- it would wrap it in two
        different Tasks and that can't be good.
    
        Return the Future's result, or raise its exception.
        """
        self._check_closed()
        self._check_running()
    
        new_task = not futures.isfuture(future)
        future = tasks.ensure_future(future, loop=self)
        if new_task:
            # An exception is raised if the future didn't complete, so there
            # is no need to log the "destroy pending task" message
            future._log_destroy_pending = False
    
        future.add_done_callback(_run_until_complete_cb)
        try:
>           self.run_forever()

/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/pytho
n3.12/asyncio/base_events.py:674: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=False closed=False debug=False>

    def run_forever(self):
        """Run until stop() is called."""
        self._check_closed()
        self._check_running()
        self._set_coroutine_origin_tracking(self._debug)
    
        old_agen_hooks = sys.get_asyncgen_hooks()
        try:
            self._thread_id = threading.get_ident()
            sys.set_asyncgen_hooks(firstiter=self._asyncgen_firstiter_hook,
                                   finalizer=self._asyncgen_finalizer_hook)
    
            events._set_running_loop(self)
            while True:
>               self._run_once()

/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/pytho
n3.12/asyncio/base_events.py:641: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=False closed=False debug=False>

    def _run_once(self):
        """Run one full iteration of the event loop.
    
        This calls all currently ready callbacks, polls for I/O,
        schedules the resulting callbacks, and finally schedules
        'call_later' callbacks.
        """
    
        sched_count = len(self._scheduled)
        if (sched_count > _MIN_SCHEDULED_TIMER_HANDLES and
            self._timer_cancelled_count / sched_count >
                _MIN_CANCELLED_TIMER_HANDLES_FRACTION):
            # Remove delayed calls that were cancelled if their number
            # is too high
            new_scheduled = []
            for handle in self._scheduled:
                if handle._cancelled:
                    handle._scheduled = False
                else:
                    new_scheduled.append(handle)
    
            heapq.heapify(new_scheduled)
            self._scheduled = new_scheduled
            self._timer_cancelled_count = 0
        else:
            # Remove delayed calls that were cancelled from head of queue.
            while self._scheduled and self._scheduled[0]._cancelled:
                self._timer_cancelled_count -= 1
                handle = heapq.heappop(self._scheduled)
                handle._scheduled = False
    
        timeout = None
        if self._ready or self._stopping:
            timeout = 0
        elif self._scheduled:
            # Compute the desired timeout.
            when = self._scheduled[0]._when
            timeout = min(max(0, when - self.time()), MAXIMUM_SELECT_TIMEOUT)
    
>       event_list = self._selector.select(timeout)

/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/pytho
n3.12/asyncio/base_events.py:1948: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <selectors.KqueueSelector object at 0x1094ad7f0>, timeout = 0.019949874840676785

    def select(self, timeout=None):
        timeout = None if timeout is None else max(timeout, 0)
        # If max_ev is 0, kqueue will ignore the timeout. For consistent
        # behavior with the other selector classes, we prevent that here
        # (using max). See https://bugs.python.org/issue29255
        max_ev = self._max_events or 1
        ready = []
        try:
>           kev_list = self._selector.control(None, max_ev, timeout)
E           KeyboardInterrupt

/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/pytho
n3.12/selectors.py:566: KeyboardInterrupt
================================= no tests ran in 29.92s ==================================

tconbeer

I think there's a typo in your ordering, otherwise this looks good to me.

re: tests, you probably need to run poetry install --sync, since a lot of dependencies were updated with the latest commits.

I'll take the unit test, although it would be better if the .json was smaller (like a dozen or two values) and therefore easier to maintain and tweak to test the exact behavior you're looking for.

tconbeer · 2024-11-04T20:14:34Z

src/harlequin/autocomplete/completers.py

+        # Sort in ascending length.
+        # I am assuming here that more insertions are less likely to be
+        # the "right" match.
+        matches.sort(key=lambda c: len(c.match_val), reverse=True)


Suggested change

matches.sort(key=lambda c: len(c.match_val), reverse=True)

matches.sort(key=lambda c: len(c.match_val))

From the snapshot tests, I think adding the reverse causes the longest fuzzy matches to be listed first, which is the opposite of what you want.

jspaezp · 2024-11-05T03:18:46Z

After nuking the env it works fine again!

571ad25 Adds the review suggestions and updates the .ambr files.
Trimmed the json ... left ~100 completions and pretty printed it.

Also noticed it now shows a "Tx: Auto" in several of the snapshots. Which I don't believe are related with my PR.

Thanks a lot for the feedback!

tconbeer · 2024-11-05T14:55:56Z

The Tx: Auto is added when using python 3.12+ with SQLite. For developing and testing I use 3.8 (soon 3.9)

jspaezp added 2 commits October 23, 2024 00:31

feat: fuzzy matching

11a010b

chore: updated changelog

43e103c

jspaezp force-pushed the feat/fuzzymatch branch from e5e6fd7 to 43e103c Compare October 23, 2024 07:32

jspaezp marked this pull request as ready for review October 23, 2024 07:33

jspaezp and others added 2 commits November 2, 2024 15:10

feat(completions): conditional fuzzy match

f2a22c6

Merge branch 'main' into feat/fuzzymatch

165314d

tconbeer requested changes Nov 4, 2024

View reviewed changes

(pr) review suggestions

571ad25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: fuzzy matching completions #671

feat: fuzzy matching completions #671

jspaezp commented Oct 23, 2024 •

edited

Loading

tconbeer commented Oct 25, 2024

jspaezp commented Oct 28, 2024

jspaezp commented Nov 2, 2024

jspaezp commented Nov 2, 2024

tconbeer left a comment

tconbeer Nov 4, 2024

jspaezp Nov 5, 2024

jspaezp commented Nov 5, 2024

tconbeer commented Nov 5, 2024

	matches.sort(key=lambda c: len(c.match_val), reverse=True)
	matches.sort(key=lambda c: len(c.match_val))

feat: fuzzy matching completions #671

Are you sure you want to change the base?

feat: fuzzy matching completions #671

Conversation

jspaezp commented Oct 23, 2024 • edited Loading

tconbeer commented Oct 25, 2024

jspaezp commented Oct 28, 2024

jspaezp commented Nov 2, 2024

jspaezp commented Nov 2, 2024

tconbeer left a comment

Choose a reason for hiding this comment

tconbeer Nov 4, 2024

Choose a reason for hiding this comment

jspaezp Nov 5, 2024

Choose a reason for hiding this comment

jspaezp commented Nov 5, 2024

tconbeer commented Nov 5, 2024

jspaezp commented Oct 23, 2024 •

edited

Loading