GitHub - boomerdigital/solidus_elastic_product: High performance sync integration for Elastic Search

Solidus Elastic Product

This integration for Elastic Search provides a high-performance way to index products for Solidus ecommerce stores. To achieve that, products are serialized and uploaded concurrently, in batches, by background jobs.

The gem is used in production at:

Tee Shirt Palace with Solidus 1.2

Serializing 500 products takes ~20 seconds. Once serialized, 200K products can be uploaded to Elastic in ~10 minutes.

The integration focuses on the backend synchronization of products with Elastic Search, and as such, does not have any frontend views, and does not construct any frontend search queries.

It has a dependency on the official Elasticsearch Model library and exposes its full interface through the Index and State classes.

Background

Existing integrations for indexing data to Elastic Search perform the serialization and upload operations on the fly, which does not allow for any optimizations to be added. Instead, by adding an intermediate storage for the serialized data, and separating the serialization and upload operations, we get to:

pre-load the data for serialization, thereby reducing SQL lookups and avoiding the N+1 query problem
upload batches of products to Elastic Search, reducing network round trips and the number of index operations performed by Elastic
inspect the serialized data with ease, as well as display it in an admin user interface
serialize and upload in parallel
do a full reindex of all products within minutes (~10 mins per 200K products), which is useful in two situations:
1. Changing the Elastic mappings
2. Recovering from a cluster failure (this eliminates the need to pay for redundant search clusters)
perform an inline update of the generated JSON data if only a single property is changed (such as view_count), avoiding a full re-serialization of the product
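
The last point, an inline update of the stored JSON, can be illustrated in plain Ruby. This is only a minimal sketch: the `patch_json` helper is hypothetical and not part of the gem, and it assumes the stored value is a JSON object string.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): patches selected keys in an
# already serialized product without re-running the full serialization.
def patch_json(stored_json, changes)
  document = JSON.parse(stored_json)
  # Stringify keys so symbol and string keys address the same field.
  document.merge!(changes.transform_keys(&:to_s))
  JSON.generate(document)
end

stored  = JSON.generate('name' => 'Plain Tee', 'view_count' => 41)
patched = patch_json(stored, view_count: 42)
puts patched
```

Only the changed property is rewritten; every other field of the stored document is carried over untouched.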

Installation and quick start

Add solidus_elastic_product to your Gemfile:

gem 'solidus_elastic_product'

Bundle your dependencies, then install and run the migrations:

bundle
bundle exec rake railties:install:migrations
bundle exec rake db:migrate

Serialize all products for the first time

Solidus::ElasticProduct::Schedule.serialize_all

# monitor the serialized products or just tail the logs
Solidus::ElasticProduct::State.needing_upload.count

# once serialized (or can stop midway if testing), upload them all to elastic
Solidus::ElasticProduct::ReindexJob.perform_now

To connect to Elastic Search

The cleanest way is to set an ELASTICSEARCH_URL environment variable, for example in .env. None is needed when Elastic Search runs on localhost.
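
As a rough illustration of that fallback, the sketch below resolves the URL in plain Ruby. The `elasticsearch_url` helper is hypothetical, and the `http://localhost:9200` default is an assumption about a stock local install rather than something this README states.

```ruby
require 'uri'

# Hypothetical helper: reads the Elastic Search URL from the environment and
# falls back to a local node (the port 9200 default is an assumption).
def elasticsearch_url(env = ENV)
  URI.parse(env.fetch('ELASTICSEARCH_URL', 'http://localhost:9200'))
end

url = elasticsearch_url
puts "#{url.host}:#{url.port}"
```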

Workflow

To work with an intermediate storage of the serialized data, the following workflow has been set up:

A corresponding one-to-one record in an Elastic::Product::State table is created for every product. It stores the state of an indexed product and consists of the following fields:

{
                             :id => nil,
                     :product_id => nil,
                           :json => nil,
                       :uploaded => false,
    :locked_for_serialization_at => nil,
           :locked_for_upload_at => nil
}

Fields:

json - string representation of a serialized product
uploaded - boolean flag indicating whether the product has been synced with Elastic
locked_for_serialization_at - time when a worker started processing the product for serialization
locked_for_upload_at - time when a worker started uploading the product to Elastic Search

The two locked columns ensure that concurrent serialization and upload processes do not overlap each other.
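
The idea behind the lock columns can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: `SketchState` and `claim_for_serialization` are hypothetical names, and the gem's actual claim logic may differ.

```ruby
# In-memory stand-in for a row of the state table (not the gem's model).
SketchState = Struct.new(:product_id, :json, :uploaded,
                         :locked_for_serialization_at, :locked_for_upload_at,
                         keyword_init: true)

# Hypothetical claim step: stamps the lock column on rows that still need
# serialization and are not already claimed, then returns the claimed rows.
def claim_for_serialization(states, now: Time.now)
  claimable = states.select do |state|
    state.json.nil? && state.locked_for_serialization_at.nil?
  end
  claimable.each { |state| state.locked_for_serialization_at = now }
end

states = [1, 2, 3].map { |id| SketchState.new(product_id: id, uploaded: false) }
first_worker  = claim_for_serialization(states)
second_worker = claim_for_serialization(states)
puts first_worker.size
puts second_worker.size
```

The second call finds nothing left to claim, which is the non-overlap property described above. Against a real database the claim would have to happen atomically; the sketch ignores that.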

To serialize products:

Solidus::ElasticProduct::SerializerJob.perform_now([product_id_1, product_id_2 ..]) - serializes just the product ids specified as arguments in an array.
Solidus::ElasticProduct::Schedule.serialize_all - splits all products in batches of 500, and creates a SerializerJob for each such batch.
run Solidus::ElasticProduct.monitor as a clock process - every minute it checks for products that need to be serialized (the json field in the State table is nil). If it finds any, it splits them into batches of 500 and creates SerializerJobs.

Usually, an upstart job or similar is set up to auto-start and daemonize this clock process.
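
The batching of 500 described above can be pictured with plain Ruby. This is a sketch only: `batches_for` is a hypothetical stand-in that returns the batches instead of enqueuing real SerializerJobs.

```ruby
BATCH_SIZE = 500 # batch size stated above

# Hypothetical stand-in for scheduling one SerializerJob per batch: returns
# the batches of product ids instead of enqueuing anything.
def batches_for(product_ids, batch_size: BATCH_SIZE)
  product_ids.each_slice(batch_size).to_a
end

batches = batches_for((1..1_234).to_a)
puts batches.map(&:size).inspect # prints [500, 500, 234]
```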

To upload products to Elastic:

Solidus::ElasticProduct::ReindexJob.perform_now - will create a new index in elastic search, upload all serialized products to this new index, swap the alias for the environment (development, production) to the new index, and delete any old indices for the same environment.

Use this one when starting out, when mappings have changed, or when you just want to start anew with a fresh index.
Solidus::ElasticProduct::UploaderJob.perform_now([product_id_1, product_id_2 ..]) - uploads just the product ids specified as arguments in an array.
run Solidus::ElasticProduct.monitor as a clock process - will check every minute for already serialized product states that need to be uploaded, and will create UploaderJobs to handle the upload
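
The ReindexJob steps (create a new index, upload, swap the alias, delete old indices) can be traced with an in-memory model. `FakeCluster`, the alias name, and the index-name suffix are all hypothetical; no Elastic Search client is involved and the gem's real naming scheme may differ.

```ruby
# In-memory stand-in for a cluster: index name => documents, alias => index.
class FakeCluster
  attr_reader :indices, :aliases

  def initialize
    @indices = {}
    @aliases = {}
  end

  # Mirrors the described steps: create, upload, swap alias, delete old.
  def reindex(alias_name, documents, suffix:)
    new_index = "#{alias_name}_#{suffix}"
    @indices[new_index] = documents.dup  # create the index and upload
    @aliases[alias_name] = new_index     # point the alias at the new index
    @indices.delete_if do |name, _docs|  # drop older indices for this alias
      name.start_with?("#{alias_name}_") && name != new_index
    end
    new_index
  end
end

cluster = FakeCluster.new
cluster.reindex('products_development', [{ id: 1 }], suffix: 'v1')
cluster.reindex('products_development', [{ id: 1 }, { id: 2 }], suffix: 'v2')
puts cluster.aliases['products_development']
puts cluster.indices.keys.inspect
```

Because readers go through the alias, the old index can keep serving searches until the swap, and only then is it removed.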

To operate through Elastic Model

Use the Solidus::ElasticProduct::Index class to perform class operations (define index name, do mappings, perform search or manipulate the index)
Use the Solidus::ElasticProduct::State class to perform instance level operations with individual indexed products (update, destroy, etc..)

To configure Elastic Search settings and mappings

All of the Elasticsearch Model class methods are available through the Index class, so you can customize them directly from an initializer:

# config/initializers/elastic_product.rb
Solidus::ElasticProduct::Index.index_name
Solidus::ElasticProduct::Index.document_type
Solidus::ElasticProduct::Index.mapping

For example, to change the default Elastic Search mappings, in an initializer (or Index decorator) do:

# config/initializers/elastic_product.rb
options = { ... }

Solidus::ElasticProduct::Index.mappings(options) do
  indexes :name,          type: 'string', analyzer: 'snowball'
  indexes :created_at,    type: 'date'
  indexes :taxons,        type: 'nested' do
    indexes :permaname,   type: 'keyword', index: 'not_analyzed'
    indexes :child do
      indexes :permaname, type: 'keyword', index: 'not_analyzed'
      indexes :child do
        indexes :permaname, type: 'keyword', index: 'not_analyzed'
      end
    end
  end
end

To customize the serialization

Change the default indexed product hash

Just define an as_indexed_hash method in your Spree product_decorator. Your method will then take precedence. For example:

def as_indexed_hash
  {
    name: name,
    popularity: indexed_popularity,
    view_count: view_count,
    image: display_image.as_indexed_hash
  }
end

Change any other serialized class (Variant, Property, Image) - again, just define an as_indexed_hash method on your class, and follow the default logic in the ElasticRepresentation module.
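
How nested as_indexed_hash methods compose into one JSON document can be shown with plain Ruby objects. `SketchImage` and `SketchProduct` are hypothetical stand-ins, not Spree models, and the field names are only examples.

```ruby
require 'json'

# Plain-Ruby stand-ins for an image and a product; not Spree models.
SketchImage = Struct.new(:url) do
  def as_indexed_hash
    { url: url }
  end
end

SketchProduct = Struct.new(:name, :view_count, :display_image) do
  # The product delegates to the image's own as_indexed_hash.
  def as_indexed_hash
    {
      name: name,
      view_count: view_count,
      image: display_image.as_indexed_hash
    }
  end
end

product = SketchProduct.new('Plain Tee', 42, SketchImage.new('/images/tee.png'))
puts JSON.generate(product.as_indexed_hash)
```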
Change the SerializationIterator preloader

You have two options:

a) redefine the full Serializer class by creating and specifying a Serializer class of your own.

```ruby
# config/initializers/spree.rb or so
Solidus::Elastic::Config.serializer_class = MyElasticSerializer
```

Your custom _serializer_class_ must respond to the `#generate_json` method and define an ActiveRecord refinement method `#each_for_serialization` to preload associations. See the default [Product::Serializer](https://github.com/boomerdigital/solidus_elastic_search/blob/master/app/models/solidus/elastic/product/serializer.rb) as an example.

b) use a decorator and redefine only the each_for_serialization iterator. For example:

```ruby
# solidus/elastic_product/serializer_decorator.rb
module Solidus::ElasticProduct::Serializer::SerializationIterator
  refine ActiveRecord::Relation do
    def each_for_serialization(&blk)
      # your code
    end
  end
end
```
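
For readers unfamiliar with Ruby refinements, here is a self-contained sketch of the mechanism. It refines Array as a stand-in because ActiveRecord::Relation is not available outside a Rails app; `SketchSerializationIterator` and `SketchSerializer` are hypothetical names.

```ruby
# Stand-in for the iterator module; it refines Array here because
# ActiveRecord::Relation is not available outside a Rails app.
module SketchSerializationIterator
  refine Array do
    # Yields each record; a real version would preload associations first.
    def each_for_serialization(&blk)
      each(&blk)
    end
  end
end

# The refinement is only active in scopes that call `using`.
module SketchSerializer
  using SketchSerializationIterator

  def self.names(records)
    collected = []
    records.each_for_serialization { |record| collected << record.upcase }
    collected
  end
end

puts SketchSerializer.names(%w[tee hoodie]).inspect
puts [].respond_to?(:each_for_serialization) # prints false
```

The refined method is visible inside `SketchSerializer` only, which is why a custom iterator does not leak into the rest of the application.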

To set up background workers

Serialization is a CPU-intensive task, so ideally you'd run it in multiple single-threaded processes. A Sidekiq example would be:

# config/deploy.rb
set :sidekiq_processes, 3

set :sidekiq_options_per_process, [
  "--config config/sidekiq.yml",
  "--config config/sidekiq-single-concurrency.yml",
  "--config config/sidekiq-single-concurrency.yml"
]

# config/sidekiq-single-concurrency.yml
:concurrency: 1
:queues:
  - elastic_serializer
  - paperclip

For upload, although you can upload in parallel, it is advisable to use a single-process, single-threaded upload worker so that the Elastic indexer is not overwhelmed with concurrent requests. The worker side of the upload is not the bottleneck, so there is little to gain from parallelizing it.

To run a sandbox app

cd spec/dummy
bin/rake db:drop
bin/rake db:reset
bin/rake spree_sample:load

Install ElasticSearch

Install Java - sudo apt-get install openjdk-8-jre
Follow elastic guide to install
Install Kibana for a user interface to elastic

Testing

First bundle your dependencies, then run rake. rake will default to building the dummy app if it does not exist, then it will run specs, and Rubocop static code analysis (not yet). The dummy app can be regenerated by using rake test_app.

bundle
bundle exec rake test_app

When testing your application's integration with this extension you may use its factories. Simply add this require statement to your spec_helper:

require 'solidus_elastic_product/factories'
