feat(layers): Use aggregation to count records
This project previously depended on Elasticsearch types in order to
provide a count of the number of records in each layer.

Elasticsearch types will be going away in Elasticsearch 6, and are
already deprecated in ES5.

Instead of relying on types, the count of records per layer is now
created using a [terms aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html)
similar to the source and layers [autodetection
code](pelias/api#1316) recently added to
pelias/api.
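
To illustrate the approach, here is a minimal sketch of how a terms aggregation response can be turned into per-layer counts. The `layer_counts_from` helper and the `sample_response` data are hypothetical and made up for illustration; only the `aggregations.layers.buckets` shape mirrors what this commit reads. The sketch also assumes `hits.total` may be either an integer or, as in Elasticsearch 7 and later, an object with a `value` key.

```ruby
require 'json'

# Hypothetical helper (not part of this commit): builds a
# { layer => count } hash from a terms aggregation response,
# starting from zeroed defaults so common layers always appear.
def layer_counts_from(response_body, default_layers)
  counts = default_layers.each_with_object({}) { |layer, acc| acc[layer] = 0 }

  # hits.total is an integer before Elasticsearch 7 and an object afterwards
  total = response_body['hits']['total']
  counts['total'] = total.is_a?(Hash) ? total['value'] : total

  # each bucket carries the layer name in 'key' and its size in 'doc_count'
  response_body['aggregations']['layers']['buckets'].each do |bucket|
    counts[bucket['key']] = bucket['doc_count']
  end
  counts
end

# Made-up response body, for illustration only
sample_response = JSON.parse(<<~JSON)
  {
    "hits": { "total": 150 },
    "aggregations": {
      "layers": {
        "buckets": [
          { "key": "address", "doc_count": 100 },
          { "key": "venue", "doc_count": 50 }
        ]
      }
    }
  }
JSON

counts = layer_counts_from(sample_response, %w(total address venue street))
puts counts
```

Layers missing from the buckets keep their zero default, and layers not in the default list are still picked up from the aggregation.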

This is one of the last few roadblocks for dropping Pelias's use of
types completely and merging PRs like
pelias/schema#293 that will drastically simplify
our schema.

Connects pelias/pelias#461
Connects pelias/pelias#719
orangejulius committed Jul 3, 2019
1 parent 90cbbea commit 9a97b48
Showing 1 changed file with 43 additions and 16 deletions.
59 changes: 43 additions & 16 deletions jobs/counts.rb
@@ -2,7 +2,11 @@
 require_relative 'include.rb'

 SCHEDULER.every '30s' do
-  types = %w(
+
+  # make a list of common layers
+  # total is also first to ensure it shows up first
+  default_layers = %w(
+    total
     address
     venue
     street
@@ -18,20 +22,43 @@
     region
     postalcode
   )
-  types_counts = Hash.new(value: 0)
-
-  # get total
-  total_url = URI.parse "#{@es_endpoint}#{@es_index}/_stats/docs"
-  total_response = JSON.parse Net::HTTP.get_response(total_url).body
-  total_count = as_val(total_response['indices'][@es_index]['primaries']['docs']['count'])
-  types_counts['total'] = { label: 'total', value: total_count }
-
-  # get types
-  types.each do |t|
-    url = URI.parse "#{@es_endpoint}#{@es_index}/#{t}/_count"
-    response = JSON.parse Net::HTTP.get_response(url).body
-    count = as_val(response['count'])
-    types_counts[t] = { label: t, value: count }
+  layer_counts = Hash.new(value: 0)
+
+  # initialize type counts to 0 for common layers
+  default_layers.each do |type|
+    layer_counts[type] = { label: type, value: 0 }
   end
-  send_event('types-counts', items: types_counts.values)
+
+  # aggregation query to get all layer counts
+  query = {
+    aggs: {
+      layers: {
+        terms: {
+          field: 'layer',
+          size: 1000
+        }
+      }
+    },
+    size: 0
+  }
+
+  # get layer counts by aggregation
+  url = URI.parse "#{@es_endpoint}#{@es_index}/_search"
+
+  response = Net::HTTP.post(url, query.to_json)
+
+  response_body = JSON.parse response.body
+
+  # set total count
+  layer_counts['total'] = { label: 'total', value: as_val(response_body['hits']['total']) }
+
+  layer_count_results = response_body['aggregations']['layers']['buckets']
+
+  layer_count_results.each do |result|
+    layer = result['key']
+    count = as_val(result['doc_count'])
+    layer_counts[layer] = { label: layer, value: count }
+  end
+
+  send_event('types-counts', items: layer_counts.values)
 end
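
One caveat with the `Net::HTTP.post` call in this change: Elasticsearch 6.0 and later enforce strict content-type checking on requests that carry a body, while `Net::HTTP.post` without a header argument falls back to a form-encoded content type. Below is a minimal sketch, assuming the same query shape, of building the request with an explicit JSON header. The `build_layer_count_request` helper, the endpoint URL, and the index name are hypothetical and not part of this commit.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper (not part of this commit): builds, but does not
# send, the layer-count aggregation request with a JSON content type.
def build_layer_count_request(es_endpoint, es_index)
  uri = URI.parse("#{es_endpoint}#{es_index}/_search")
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = {
    aggs: { layers: { terms: { field: 'layer', size: 1000 } } },
    size: 0 # return no documents, only the aggregation
  }.to_json
  [uri, request]
end

# Assumed endpoint and index name, for illustration only
uri, request = build_layer_count_request('http://localhost:9200/', 'pelias')
puts request['Content-Type']
puts request.body

# Sending it requires a reachable cluster, for example:
# response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
```

The shorter form `Net::HTTP.post(url, query.to_json, 'Content-Type' => 'application/json')` should behave equivalently; the helper only separates construction from sending so the request can be inspected without a network call.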