Elasticsearch filter #991

Closed
dkarlovi opened this issue Mar 15, 2017 · 19 comments

Comments

@dkarlovi
Contributor

It might make sense to ship an out-of-the-box advanced full-text search and filter capability, the obvious candidate being Elasticsearch with FOSElasticaBundle.

Since the data provider is still the ORM and only the filtering is done with Elasticsearch (which gets populated and updated out-of-band), I guess it would just be a custom filter (applying an ID-based filter and order), not a full-blown data provider? Would this make sense in this context?
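
A minimal sketch of the "ID-based filter and order" idea, assuming the search engine has already returned a ranked list of IDs and the ORM has fetched the matching rows in arbitrary order. The function name `orderByRankedIds` and the row shape are hypothetical and not part of API Platform or FOSElasticaBundle:

```php
<?php
declare(strict_types=1);

/**
 * Reorders rows to match the ID order returned by the search engine.
 * Rows whose ID is not in $rankedIds are dropped; IDs without a row are skipped.
 *
 * @param int[]             $rankedIds IDs in relevance order (hypothetical search-engine output)
 * @param array<int, array> $rows      rows fetched with "WHERE id IN (...)", in any order
 */
function orderByRankedIds(array $rankedIds, array $rows): array
{
    // Index the rows by ID for constant-time lookup.
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row['id']] = $row;
    }

    // Walk the ranked IDs and pick the matching rows in that order.
    $ordered = [];
    foreach ($rankedIds as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }

    return $ordered;
}

$rankedIds = [7, 3, 5];
$rows = [
    ['id' => 3, 'title' => 'B'],
    ['id' => 5, 'title' => 'C'],
    ['id' => 7, 'title' => 'A'],
];
$result = orderByRankedIds($rankedIds, $rows);
```

The ORM stays the data provider here; the search engine only decides which IDs match and in what order.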

@sstok

sstok commented Mar 20, 2017

Note: I plan to add an API Platform bridge for RollerworksSearch very soon, unless someone is able to help :) rollerworks/search#87 (comment)

Processing is actually provided now, but the https://github.com/rollerworks/search/blob/master/src/Exception/InvalidSearchConditionException.php needs to be converted to an error context for Hydra/Swagger or whatever it's called 😄 That would be a first start. You also need something to configure the data provider (of API Platform).

ElasticSearch support is planned, and next on my list of things to do.

@dunglas
Member

dunglas commented Mar 21, 2017

@sstok can't api-platform/docs#186 do the trick for the exception?

@sstok

sstok commented Mar 21, 2017

The exception thrown by the RollerworksSearch input processor can contain a list of error messages (similar to the Symfony Validation violations).

https://github.com/api-platform/core/blob/master/src/Hydra/Serializer/ConstraintViolationListNormalizer.php the basis of this should work 😃 I will try to experiment with this.

@sstok

sstok commented Mar 24, 2017

It's a start 😄 https://github.com/rollerworks/search-api-platform

@soyuka
Member

soyuka commented Mar 24, 2017

@sstok I made an Oracle Text extension that helps create indexes and build search queries with CONTAINS. Lmk if you'd be interested in such a thing ;).

@sstok

sstok commented Mar 24, 2017

@soyuka I haven't used/tested Oracle, but there is a Doctrine DBAL/ORM extension for the search system:

https://github.com/rollerworks/search-doctrine-dbal
https://github.com/rollerworks/search-doctrine-orm

@soyuka
Member

soyuka commented Mar 24, 2017

Yeah, I saw those, which is why I mentioned Oracle Text! I think modern DBMSs all have some kind of optimized way of doing text search, so sometimes there's no need to use ES on top.

@dkarlovi
Contributor Author

@soyuka yes, but those are often very basic, MySQL has basic search too.

ES gives you access to quite advanced features you most often won't get from these built-in implementations.

@soyuka
Member

soyuka commented Mar 24, 2017

Hmm, you mean MariaDB though, MySQL has been bought by Oracle \o/.

Of course ES gives you more advanced features, it has been built for search. Though it's often enough to just use the DBMS:

Both Postgres and Oracle have nice indexes with features like stop words and fuzzy matches :). Most of the time those are enough IMHO.

@dkarlovi
Contributor Author

Nope, I mean MySQL; it doesn't matter that Oracle bought them. "Oracle" in this context means "Oracle DB", which isn't the same thing as Oracle's MySQL offering.

Side note: Oracle actually bought Sun, who had previously bought MySQL AB. BTW, I still don't call MySQL "Sun". :) None of this is relevant to the issue at hand, which is "advanced search features for API Platform".

@sstok

sstok commented Apr 5, 2017

Amazing news! API-Platform support for RollerworksSearch is ready (for testing).
Note that ElasticSearch is currently not yet supported, but Doctrine ORM is.

RollerworksSearch 2.0 as a whole is still in alpha phase because I want to be sure that everything is taken care of properly. But most parts are heavily tested.

Installing

Integration for the Symfony FrameworkBundle is completely provided, install the following:

  • rollerworks/search-bundle (>=v2.0.0-ALPHA5)
  • rollerworks/search-doctrine-orm (>=v2.0.0-ALPHA2)
  • rollerworks/search-api-platform (>=0.2.0)

Then enable the RollerworksSearchBundle; bundle configuration is fully automatic 👍 (unless you use a custom entity manager).

Metadata

The integration bridge supports multiple contexts for a Resource (backend/frontend), each with their own configuration. A _defaults context allows you to define shared values for all contexts (similar to the Symfony DI _defaults logic).

Unless a context is provided in the Request attributes (_api_search_context), the _any context is used.

To make your Resource searchable, you need to set rollerworks_search.contexts in the Resource Metadata attributes. Unsupported Resources are simply ignored.

/**
 * A book.
 *
 * @see http://schema.org/Book Documentation on Schema.org
 *
 * @ORM\Entity
 * @ApiResource(
 *     iri="http://schema.org/Book",
 *     attributes={
 *         "rollerworks_search"={
 *             "contexts"={
 *                 "_defaults"={
 *                     "fieldset"="Acme\AppBundle\Search\FieldSet\BookFieldSet"
 *                 },
 *                 "_any"={
 *                     "doctrine_orm"={
 *                         "mappings"={
 *                             "id"="id",
 *                             "title"="title"
 *                         }
 *                     }
 *                 }
 *             }
 *         }
 *     }
 * )
 */
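
To illustrate how the contexts in the annotation above might be combined, here is a rough sketch in plain PHP. It assumes that `_defaults` values are merged into the selected context and that `_any` is used when no context is requested; `resolveSearchContext` is a hypothetical helper, not the bridge's actual API:

```php
<?php
declare(strict_types=1);

/**
 * Resolves the configuration for one search context.
 * Assumption: "_defaults" holds shared values and the named context overrides them.
 */
function resolveSearchContext(array $contexts, ?string $name = null): array
{
    // Fall back to "_any" when no context was requested.
    $name = $name ?? '_any';

    if (!isset($contexts[$name])) {
        throw new InvalidArgumentException(sprintf('Unknown search context "%s".', $name));
    }

    // Values of the named context win over the shared defaults.
    return array_replace_recursive($contexts['_defaults'] ?? [], $contexts[$name]);
}

$contexts = [
    '_defaults' => ['fieldset' => 'Acme\AppBundle\Search\FieldSet\BookFieldSet'],
    '_any' => ['doctrine_orm' => ['mappings' => ['id' => 'id', 'title' => 'title']]],
];
$config = resolveSearchContext($contexts);
```

With the example configuration, the resolved `_any` context carries both the shared `fieldset` and its own `doctrine_orm` mappings.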

A FieldSet defines which fields can be used for searching (independent of the data provider).
http://rollerworkssearch.readthedocs.io/en/latest/introduction.html#fieldset

The doctrine_orm configuration maps a search field to a property and sets up relations for child entities. Full example:

"doctrine_orm" = {
    "relations" = {
        "alias" = { "type" = "(left | right | inner)", "entity" = "...", "join" = "...", "conditionType" = null, "condition" = null, "indexBy" = null }
    },
    "mappings" = {
        "mapping-name" = { "property" = "...", "alias" = "...", "db_type" = null }
    },
},

When the mapping is a single value, it's assumed to be a property name; otherwise it needs a property key. The alias defaults to 'o' (the root entity). db_type is required when it cannot be guessed automatically.
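
The short and full mapping forms described above could be normalized roughly as follows. This is only a sketch under the stated rules (single value = property name, alias defaults to 'o', db_type optional); `normalizeMapping` is a hypothetical helper and not taken from the bridge's code:

```php
<?php
declare(strict_types=1);

/**
 * Expands the short mapping form into the full form.
 *
 * @param string       $name    search field name, only used in the error message
 * @param string|array $mapping either a property name or an array with a "property" key
 */
function normalizeMapping(string $name, $mapping): array
{
    // A single value is assumed to be the property name.
    if (is_string($mapping)) {
        $mapping = ['property' => $mapping];
    }

    if (!is_array($mapping) || !isset($mapping['property'])) {
        throw new InvalidArgumentException(sprintf('Mapping "%s" needs a "property" key.', $name));
    }

    // Fill in the documented defaults without overwriting given values.
    return $mapping + ['alias' => 'o', 'db_type' => null];
}

$short = normalizeMapping('title', 'title');
$full = normalizeMapping('author', ['property' => 'name', 'alias' => 'a']);
```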

Relations consist of an alias, a join and an entity (both are required); type defaults to left.
They are currently only used for searching (not for fetching; I'm not sure if this should be done here or elsewhere).

See https://github.com/rollerworks/search-api-platform/blob/master/Metadata-reference.md for a complete reference. Proper documentation is currently pending.

Searching

To perform a search, supply the search condition in array format in the URL. E.g.
http://127.0.0.1:8000/books.json?search[fields][id][simple-values][0]=1&search[fields][id][simple-values][1]=2
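
For clients that do not use the RollerworksSearch library, a query string of this shape can be produced with PHP's built-in `http_build_query`. This is a minimal sketch; the host and the `search` parameter name are copied from the example URL above, and the nested array layout is inferred from it:

```php
<?php
declare(strict_types=1);

// The search condition as a nested array, mirroring the URL structure.
$condition = [
    'fields' => [
        'id' => ['simple-values' => [1, 2]],
    ],
];

// http_build_query() percent-encodes the brackets; that encoded form is what should be sent.
$query = http_build_query(['search' => $condition], '', '&');

// Decoded here only to make the result readable and comparable with the example URL.
$url = 'http://127.0.0.1:8000/books.json?' . urldecode($query);
```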

It's best to use the RollerworksSearch core library (ConditionBuilder) as an SDK to compose a SearchCondition for processing.

Note: When the condition contains duplicate/redundant values, a redirect is issued with the new condition.

Processed conditions and generated DQL conditions are cached (unless caching is disabled) for future requests.

RFC

If you have any comments or ideas, please let me know 👍 ❤️

I'm not sure if the relation configuration is as clean as it can be, and it's possible there can be conflicts with eager loading. So please try as much as you can.

ElasticSearch is something I hope to start working on really soon now, and of course integration will be provided for the API-Platform bridge 😄

@soyuka
Member

soyuka commented Apr 5, 2017

Could you explain what RollerworksSearch does that the embedded filters can't do?

Great work btw, looks nice!

@sstok

sstok commented Apr 5, 2017

RollerworksSearch is a complete search solution, it takes care of everything.

It processes the input (transforming it to a Model format) and optimizes the condition (nested grouping is supported), and for Doctrine ORM you have some special options that allow for complex queries (like searching for a user based on both date and age (in years)). It also handles relations without too much complexity; this does require some manual set-up, but that allows for greater flexibility and fewer problems.

But what's most interesting is the built-in caching (for best performance) and error handling (similar to validation Constraints). Plus, you can reuse the existing search system if you have an API/classic server processing set-up.

And in the future I plan to add support for smarty-query rollerworks/search#23 and a client-side condition builder.

You use this search system when you actually need to search for something rather than limiting something based on a (simple) filter 👍

@soyuka
Member

soyuka commented Apr 5, 2017

Seems fair, thanks for the clarification. About the cache, it's a "search metadata cache" right (from your code)?

You said above that it can cause "conflicts with eager loading". Why is that?


Okay so you just added a PR to have the ability to add a query hint on the QueryBuilder. This is great lol, I missed this feature so much. Might help improve the code here btw (currently QueryResultExtensions are used for this because they let us do getQuery()->setHint()->getResult() :)). ref doctrine/orm#6359

/note to myself: check the doctrine query cache on api-platform.

@sstok

sstok commented Apr 6, 2017

"About the cache, it's a 'search metadata cache' right (from your code)?"

No, the caching is about keeping a processed (and optimized) search condition in your cache.

When you supply a condition /books.json?search... the ArrayInput processor transforms this input to a SearchCondition object (with values in the Model format, e.g. a DateTime object for a datetime value). This also includes constraint validation (using a custom validator, like the Symfony Validator component).

After this, the condition is run through an optimizer to remove duplicate values, merge overlapping ranges, etc.
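
As a rough illustration of what such an optimization step does, the following sketch removes duplicate values and merges overlapping ranges on plain integers. It is not RollerworksSearch's optimizer; `mergeRanges` and the `[lower, upper]` pair representation are assumptions made for this example (PHP 7.4+ for the arrow function):

```php
<?php
declare(strict_types=1);

/**
 * Merges overlapping inclusive ranges.
 *
 * @param array<int, array{0: int, 1: int}> $ranges list of [lower, upper] pairs
 */
function mergeRanges(array $ranges): array
{
    // Sort by lower bound so overlapping ranges end up next to each other.
    usort($ranges, fn (array $a, array $b): int => $a[0] <=> $b[0]);

    $merged = [];
    foreach ($ranges as [$lower, $upper]) {
        $last = count($merged) - 1;
        if ($last >= 0 && $lower <= $merged[$last][1]) {
            // Overlaps the previous range: extend its upper bound if needed.
            $merged[$last][1] = max($merged[$last][1], $upper);
        } else {
            $merged[] = [$lower, $upper];
        }
    }

    return $merged;
}

// Duplicate single values are removed while keeping first-seen order.
$values = array_values(array_unique([1, 2, 2, 5, 1]));
$ranges = mergeRanges([[10, 20], [1, 5], [15, 30], [4, 6]]);
```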

At the end, the condition is exported, producing a searchCode (which is also the cache key), and the processed condition is serialized and stored in the cache. A special serializer is used because the FieldSet is not serializable: the set name is stored instead, and the FieldSet is recreated the next time. So for the next request, the system will not have to process the condition again.
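
The caching idea can be sketched with an in-memory array standing in for the real cache. This is a simplification and not the library's implementation: here the key is a hash of the key-sorted raw input, whereas the comment above describes the searchCode as the exported condition. All function names are hypothetical:

```php
<?php
declare(strict_types=1);

/** Recursively sorts array keys so equivalent inputs produce the same cache key. */
function sortKeysRecursive(array $data): array
{
    ksort($data);
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = sortKeysRecursive($value);
        }
    }

    return $data;
}

/** Assumption: a SHA-256 hash of the normalized input stands in for the searchCode. */
function searchCode(array $condition): string
{
    return hash('sha256', json_encode(sortKeysRecursive($condition)));
}

/** Returns the cached result, processing the input only on a cache miss. */
function loadCondition(array $input, array &$cache, int &$processCount): array
{
    $key = searchCode($input);
    if (!isset($cache[$key])) {
        ++$processCount;
        // Stand-in for the expensive input processing and optimizing.
        $cache[$key] = ['processed' => sortKeysRecursive($input)];
    }

    return $cache[$key];
}

$cache = [];
$processCount = 0;
$a = ['fields' => ['id' => ['simple-values' => [1, 2]], 'title' => ['simple-values' => ['x']]]];
$b = ['fields' => ['title' => ['simple-values' => ['x']], 'id' => ['simple-values' => [1, 2]]]];
$first = loadCondition($a, $cache, $processCount);
$second = loadCondition($b, $cache, $processCount);
```

The second call differs only in key order, so it resolves to the same key and skips processing.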

In addition to this, the Doctrine ORM DqlCondition generator also caches the generated DQL condition.

Caching of the Metadata is handled entirely by API Platform itself; I did add another MetadataFactory to merge _defaults into each search context (so this doesn't happen at runtime 😄 ).

Caching the processed search condition and DQL condition is most powerful for complex conditions. It would probably be better to configure this per condition, but that's something I'm not sure can be done cleanly. Like I said, this is all still in alpha 😉

@dkarlovi
Contributor Author

dkarlovi commented Apr 6, 2017

@sstok first of all, this is an amazing amount of work, thank you!

I've yet to try this (will do over the weekend), but just to ask: with your Elasticsearch integration, do you intend to somehow integrate FOSElasticaBundle?

I ask because of, for example, EnqueueElasticaBundle (which extends FOS's): having these integrated would mean we get a lot of existing functionality for free (such as concurrent index population), but there might be some overlap in the way indices are defined in your bundle and FOS's. You're much better placed to judge here.

@sstok

sstok commented Apr 6, 2017

RollerworksSearch is a library, not a bundle 😉 It's more about searching than indexing, as other libraries have already solved that properly. My main goal was (and is) providing a powerful abstraction for searching without having to deal with the implementation details of various storage systems, and making usage as simple as possible.

It's the one thing most systems lack or only support for a single solution. Yes, you need to set up the mapping from a FieldSet to a storage query yourself, but my experience was that automating this introduces too much complexity and too many edge cases.

https://github.com/ruflin/Elastica is definitely something I will support.
In fact, it's probably easier than working with the raw DSL 😄

"RollerworksSearch is a complete search solution, it takes care of everything."

Except the storing, but this is because each system is different and ORM already provides a good system for storing (instead RollerworksSearch focuses on a bridge approach). Same goes for ElasticSearch with the Doctrine ORM bridge provided by the FOSElasticaBundle.

In the past there was a Metadata system, but in 2.0 I removed it because it was too limited and complex. For API Platform, Metadata is the only way to integrate the search system, which is one of the reasons why I provided this integration 😃

"I've yet to try this (will do over the weekend)", great 👍 let me know if anything is unclear or not working. Preferably by opening an issue in the RollerworksSearch issue tracker, thanks for your support.

@dkarlovi
Contributor Author

@sstok I'm sorry that my "weekend" took so long. :( I'm currently actively testing your API Platform search extension in my app so surely will have questions and/or comments.

@dkarlovi
Contributor Author

We've actually added support for searching & filtering via Elasticsearch. It's still quite rough, but it works.

It would need to integrate better with existing API Platform functionality (for example, hook into the documentation generator so we can document the search query properly), but it's not a bad first step.
