This repository has been archived by the owner on Feb 5, 2023. It is now read-only.

result from using search method on model is different from that when i GET url exposed by elastic search #92

Closed
chbro opened this issue Jan 3, 2018 · 11 comments

Comments


chbro commented Jan 3, 2018

Hi,
when i use dd(App\Posts::search('场景1')->get()), the result is

Collection {#265 ▼
  #items: []
}

while what i get from http://localhost:9200/my_index/posts/_search?q=content:场景1 is

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 2.9235544,
        "hits": [

i wonder why ik doesn't work.
my config/scout.php is the same as the one in the readme.

Contributor

kronthto commented Jan 3, 2018

It builds a different query, using a Request Body instead of URL-params.

I think what's fired to Elastic is:

GET http://localhost:9200/my_index/posts/_search
{"query":{"bool":{"must":[{"query_string":{"query":"*\u573a\u666f1*"}}]}}}
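To make the difference concrete, here is a minimal sketch in plain PHP (no Scout involved) that builds the body above next to a body form of the URL search. The `q` URL parameter is parsed as a `query_string` query, so `q=content:...` goes through the field's analyzer, while the wildcard term is, as far as I know, not analyzed and therefore may not match any token ik produced. Both function names are made up for illustration, and `scoutStyleBody` only mirrors my guess above, not confirmed driver output.

```php
<?php
// Body the Scout driver is believed to send (copied from the guess above).
function scoutStyleBody(string $term): array
{
    return ['query' => ['bool' => ['must' => [
        ['query_string' => ['query' => "*{$term}*"]],
    ]]]];
}

// Hypothetical request-body equivalent of the URL form ?q=<field>:<term>.
function uriStyleBody(string $field, string $term): array
{
    return ['query' => ['query_string' => ['query' => "{$field}:{$term}"]]];
}

// json_encode escapes non-ASCII characters as \uXXXX by default.
echo json_encode(scoutStyleBody('场景1')), PHP_EOL;
// JSON_UNESCAPED_UNICODE keeps the characters readable.
echo json_encode(uriStyleBody('content', '场景1'), JSON_UNESCAPED_UNICODE), PHP_EOL;
```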

Author

chbro commented Jan 3, 2018

@kronthto i agree.
anyway, the search method doesn't work here in my code.

now i use

shell_exec('curl http://localhost:9200/my_index/posts/_search?q=content:'.request('q'))

as a replacement.

Contributor

kronthto commented Jan 3, 2018

If you need to do it that way you could at least use:

file_get_contents('http://localhost:9200/my_index/posts/_search?q=content:'.request('q'))

which is probably faster and more secure than shell_exec, also you don't rely on curl being available on the CLI.

Using Scout would directly map the results to Model-entities, so it would be nice to solve your initial problem. You could try modifying what is sent to ES using the callback-function parameter of the Builder (see #56 / laravel/scout#111).
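One more thing on the URL approach: concatenating request('q') straight into the URL means a space or an & in the input changes the request. A minimal sketch of encoding the term first, in plain PHP; `buildSearchUrl` is a made-up helper name, and the sample term is only there to show the encoding. Elasticsearch will still interpret query_string operators inside the term, so this only protects the URL itself.

```php
<?php
// Build the search URL with the user-supplied term percent-encoded.
function buildSearchUrl(string $base, string $field, string $term): string
{
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of "+".
    $query = http_build_query(['q' => "{$field}:{$term}"], '', '&', PHP_QUERY_RFC3986);

    return $base . '?' . $query;
}

// Sample input containing a space and an ampersand.
$url = buildSearchUrl('http://localhost:9200/my_index/posts/_search', 'content', 'a b&pretty');
echo $url, PHP_EOL;
```

In the controller this would be used as file_get_contents(buildSearchUrl($base, 'content', request('q'))).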

Author

chbro commented Jan 4, 2018

thx a lot.
i rewrote with Builder, it works perfectly now.

Author

chbro commented Jan 4, 2018

so far i have managed to make it split Chinese characters, but a new problem has come up:

i have no idea how to strip html from my content.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
this document just gives an example demonstrating that es can strip html, but how can i use it in my search?

copy from stackoverflow:

I have a document with property that contains html tags. I want to remove html before indexing.  
I found this htmlstrip-charfilter but I can't find example in using this. 
I'm new to elastic search and analyzer concept.  Thanks

Contributor

kronthto commented Jan 4, 2018

I've never actually done that, but I think you need to define a custom analyzer that uses the html_strip char filter in your index settings and then assign this analyzer to the field in the mapping. (A normalizer is not the right tool here: as far as I know, normalizers only apply to keyword fields and do not accept html_strip.)

It could be something like (not tested; swap the standard tokenizer for your ik tokenizer if you need Chinese word splitting):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["html_strip"]
        }
      }
    }
  },
  "mappings": {
    "posts": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

(inspired by https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html)

Author

chbro commented Jan 5, 2018

PUT config to my_index and POST url like my_index/_analyze with param { analyzer: 'my_analyzer', text: '<p>hello</p>' }, then i can get the stripped content.

btw,

it seems using Builder cannot highlight.
my code is:

        return App\Docs::search($req->search, function($engine, $query) {

                $query['body']  = [
                    'query' => [
                        'multi_match' => [
                            'query' => request('q'),
                            'fields' => ['name', 'rich_text'] 
                        ]
                    ],
                    'highlight' => [
                        'fields' => [
                            'name' => [
                                'force_source' => true
                            ],
                            'rich_text' => [
                                'force_source' => true
                            ]
                        ]
                    ]
                ];

                return $engine->search($query);

            })->paginate();

but the returned value doesn't contain the highlight field.
strangely, when i dd($engine->search($query)), highlight is there in hits.hits:

  "took" => 48
  "timed_out" => false
  "_shards" => array:3 [▼
    "total" => 5
    "successful" => 5
    "failed" => 0
  ]
  "hits" => array:3 [▼
    "total" => 25
    "max_score" => 2.3903573
    "hits" => array:10 [▼
      0 => array:6 [▼
        "_index" => "laravel54"
        "_type" => "docs"
        "_id" => "88"
        "_score" => 2.3903573
        "_source" => array:2 [▶]
        "highlight" => array:1 [▼
          "name" => array:1 [▶]
        ]
      ]
      1 => array:6 [▶]
      2 => array:6 [▶]
      3 => array:6 [▶]
      4 => array:6 [▶]
      5 => array:6 [▶]
      6 => array:6 [▶]
      7 => array:6 [▶]
      8 => array:6 [▶]
      9 => array:6 [▶]
    ]
  ]
]

it would be very kind of you to explain this.

Contributor

kronthto commented Jan 5, 2018

When using get/paginate it ignores any field but _id and uses this to query the results from the database:

$keys = collect($results['hits']['hits'])
    ->pluck('_id')->values()->all();

$models = $model->whereIn(
    $model->getKeyName(), $keys
)->get()->keyBy($model->getKeyName());

This behaviour is intended for Scout-drivers. So, yes, highlight is ignored. The only thing you can do is use raw/paginateRaw, but then you lose the mapping to Eloquent.
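If you go the raw route, a minimal sketch of pulling the highlight fragments out of the raw response yourself could look like this. It is plain PHP working on an array shaped like the dump above; `highlightsById` is a made-up helper name and the sample data is invented.

```php
<?php
// Collect the highlight fragments of each hit, keyed by document _id.
function highlightsById(array $results): array
{
    $highlights = [];
    foreach ($results['hits']['hits'] ?? [] as $hit) {
        // Hits without a highlight section get an empty array.
        $highlights[$hit['_id']] = $hit['highlight'] ?? [];
    }

    return $highlights;
}

// Invented response in the shape of the dump above.
$results = ['hits' => ['hits' => [
    ['_id' => '88', 'highlight' => ['name' => ['<em>foo</em> bar']]],
    ['_id' => '89'],
]]];

print_r(highlightsById($results));
```

The resulting map could then be matched against models loaded by their keys, which keeps the Eloquent side at the cost of doing the merge by hand.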

Author

chbro commented Jan 5, 2018

thank you very much,

i've changed to elasticsearch-php, which gives me the raw data returned from elasticsearch so i can organize it on my own.

well, last question:
i want to post data to my_index/_analyze to get the stripped content, as follows:

$params = array(
    'http' => array(
        'method' => 'POST',
        'header' => 'Content-Type: application/json',
        'content' => http_build_query([
            'analyzer' => 'my_analyzer',
            'text' => $value['_source']['rich_text']
        ])
    )
);
$url = config('scout.elasticsearch.hosts')[0] . '/' . config('scout.elasticsearch.index') . '/_analyze';
$context = stream_context_create($params);
dd($result = file_get_contents($url, false, $context));

but the result is different from what i get in Postman. should i use http_build_query here?
i'm new to php.

Contributor

kronthto commented Jan 5, 2018

elasticsearch-php is what this library here uses under the hood anyways:

"elasticsearch/elasticsearch": "^5.0"

I think http_build_query is wrong here, because it builds a URL-encoded query string (key=value pairs joined by &), which belongs in the URL and not in a JSON request body. If anything, you probably want json_encode instead. I never really do requests that way.

In general, if you want to do HTTP Requests in PHP I can only recommend using Guzzle, it makes the code so much cleaner / easier to read because you don't have to deal with stream_context_create and stuff. Then it could look like:

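$client = new \GuzzleHttp\Client();
$url = config('scout.elasticsearch.hosts')[0] . '/' . config('scout.elasticsearch.index') . '/_analyze';

// The 'json' option JSON-encodes the array and sets the Content-Type header.
$response = $client->request('POST', $url, [
    'json' => [
        'analyzer' => 'my_analyzer',
        'text' => $value['_source']['rich_text'],
    ],
]);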
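If you would rather stay with file_get_contents, here is a small sketch of the difference between the two encoders, using the sample text from earlier in the thread. The request line is left commented out because it needs a running Elasticsearch; `$url` stands for the _analyze endpoint built as in the question.

```php
<?php
$payload = ['analyzer' => 'my_analyzer', 'text' => '<p>hello</p>'];

// Form-encoded string: not what a JSON endpoint expects.
$formBody = http_build_query($payload);
// JSON string: matches the Content-Type: application/json header.
$jsonBody = json_encode($payload);

echo $formBody, PHP_EOL;
echo $jsonBody, PHP_EOL;

// Stream context for file_get_contents() carrying the JSON body.
$context = stream_context_create(['http' => [
    'method'  => 'POST',
    'header'  => 'Content-Type: application/json',
    'content' => $jsonBody,
]]);

// $result = file_get_contents($url, false, $context);
```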

Author

chbro commented Jan 5, 2018

excellent !

time to close this issue.

feel very grateful for your help.

chbro closed this as completed Jan 5, 2018