Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import nouveau #4291

Merged
merged 216 commits into from
Apr 22, 2023
Merged

Import nouveau #4291

merged 216 commits into from
Apr 22, 2023

Conversation

rnewson
Copy link
Member

@rnewson rnewson commented Dec 5, 2022

This PR imports Nouveau, a replacement for the aging dreyfus+clouseau system.

Nouveau is written in pure Java and uses the Dropwizard framework (" a Java framework for developing ops-friendly, high-performance, RESTful web `services") and supports Lucene 9 indexes

dreyfus+clouseau is not touched, nouveau adds new endpoints (_nouveau where _search is for dreyfus).

@@ -0,0 +1,216 @@
<!--
Copyright 2022 Cloudant. All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray copyright notice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are others, too, just leaving this as a placeholder. We probably also don't need a dedicated LICENSE copy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah oof, yes, I copied from clouseau's pom.xml, my bad

@janl
Copy link
Member

janl commented Dec 6, 2022

@rnewson this is great, thanks a lot:

for sake of easier consumption, it might be useful if you could pull out examples for what index definition and querying looks like in Search2 and Nouveau, so we can compare those easily (both JS and Mango).

I think I will prefer the version field (absence means Search2) variant, but I’m not 100% if this maps cleanly to the same _search /_find URL endpoints.

@rnewson rnewson force-pushed the import-nouveau branch 6 times, most recently from 362abfb to 7b392e4 Compare December 7, 2022 09:35
@big-r81
Copy link
Contributor

big-r81 commented Dec 7, 2022

Are the GH workflows working/necessary, if they are not in the root directory after the merge?

@rnewson
Copy link
Member Author

rnewson commented Dec 7, 2022

@big-r81 they worked in the original repo when they are at the root. good call, though. we either need to adjust them so they work here or remove them. haven't thought much about it.

@rnewson
Copy link
Member Author

rnewson commented Dec 7, 2022

API choices.

Currently you can define a nouveau index almost identically to clouseau, the index() function behaves the same way, you just need to use "nouveau" as a top-level key instead of "indexes";

clouseau;

{
  "indexes": {
    "bar": {
      "default_analyzer": "standard",
      "field_analyzers": {
        "foo": "english"
      },
      "index": "function(doc) { index(\"foo\", \"bar\", {\"store\":true}); index(\"bar\", doc.bar, {\"store\":true,\"facet\":true}); }"
    }
  }
}

nouveau

{
  "nouveau": {
    "bar": {
      "default_analyzer": "standard",
      "field_analyzers": {
        "foo": "english"
      },
      "index": "function(doc) { index(\"foo\", \"bar\", {\"store\":true}); index(\"bar\", doc.bar, {\"store\":true,\"facet\":true}); }"
    }
  }
}

the index() function is enhanced for nouveau (https://github.com/rnewson/nouveau#index-function) giving the user better control over the indexing process but the basic shape for clouseau works the same.

The alternative proposal is to use a version attribute

  "indexes": {
    "bar": {
      "version": 2,
      "default_analyzer": "standard",
      "field_analyzers": {
        "foo": "english"
      },
      "index": "function(doc) { index(\"foo\", \"bar\", {\"store\":true}); index(\"bar\", doc.bar, {\"store\":true,\"facet\":true}); }"
    }
  }
}

This could either be a "version": 2 or perhaps a "engine": "nouveau"

dreyfus and erlang-side nouveau would be adjusted to take the appropriate subset from the "indexes" object if we went this way.

@rnewson rnewson force-pushed the import-nouveau branch 4 times, most recently from 1ce2683 to 03bb9c0 Compare December 7, 2022 22:19
@janl
Copy link
Member

janl commented Dec 8, 2022

@rnewson thanks for the view definition side of things. How would the query URLs look for this? And how do things look like in mango?

@rnewson rnewson force-pushed the import-nouveau branch 5 times, most recently from 5711426 to e075520 Compare December 14, 2022 16:04
@rnewson rnewson force-pushed the import-nouveau branch 2 times, most recently from 3c966e5 to dc5a285 Compare December 21, 2022 19:43
@iilyak
Copy link
Contributor

iilyak commented Dec 22, 2022

JFYI
In case if there is an intend to run two backends at the same time.
Instead of relying on config:get_boolean("nouveau", "enable", false). it should be possible to enable/disable nouveau on per databases basis using feature flags. The feature flags are compiled down into an erlang module and therefore very efficient (if performance is the concern).

At the moment the feature flags are used for partition queries. However they can be used here as well. In order to make them work the subject need to be passed to the check function. Currently supported subjects are defined here. The check for a feature flag look like couch_flags:is_enabled(nouveau, Subject). In order to make it work for #idx{} and #index{} records we would need to extract the database name.

is_nouveau(#index{dbname = Subject}) ->
     couch_flags:is_enabled(nouveau, Subject);
is_nouveau(#idx{dbname = Subject}) ->
     couch_flags:is_enabled(nouveau, Subject).

The configuration for feature flag is done in ini files as follows.

[feature_flags]
; This enables nouveau for databases with given prefix. 
nouveau||my_database* = true

Supporting another type of a subject also supported via EPI.

-module(nouveue_plugin_feature_flags).

-export([
    rules/0,
    subject_key/1
]).

rules() ->
    [].

subject_key(#idx{dbname = Subject}) ->
   Subject;
subject_key(#index{dbname = Subject}) ->
   Subject;    
subject_key(_) ->
    no_decision.
-module(nouveau_epi).

-behaviour(couch_epi_plugin).

....
providers() ->
    [
         {chttpd_handlers, nouveau_httpd_handlers},
         {feature_flags, nouveau_plugin_feature_flags}
    ].

The only two places in the PR which are hard to do are in nouveau_api.erl since in these case we don't have access to database name.

}

private Path indexRootPath(final String name) {
final Path result = rootDir.resolve(name).normalize();

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1). This path depends on a [user-provided value](2). This path depends on a [user-provided value](3). This path depends on a [user-provided value](4). This path depends on a [user-provided value](5). This path depends on a [user-provided value](6).
@rnewson rnewson merged commit a28b75a into main Apr 22, 2023
@rnewson
Copy link
Member Author

rnewson commented Apr 22, 2023

leaving branch alive as I squashed the history when I merged. some of the commits might be useful for understanding nascent project history.

Copy link
Contributor

@nickva nickva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was merged prematurely?

There were a lot of nice improvements made but the major missing pieces were not addressed as discussed in #4291 (review)

  • Mango integration test coverage
  • Basic doc deletion tests.

It also doesn't seem like the integration tests are actually run in CI:

[2023-04-22T14:10:15.085Z] NouveauTest [test/elixir/test/nouveau_test.exs]
[2023-04-22T14:10:15.085Z] 
[2023-04-22T14:10:15.085Z] PartitionAllDocsTest 
     [test/elixir/test/partition_all_docs_test.exs]
[2023-04-22T14:10:15.085Z]  
   * test partition _all_docs with timeout [L#186]
   ...

The CI nodes now should have java and maven installed, specifically added to help test and maintain nouveau.

Looking over the PR again, another major missing piece I noticed is purge integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants