Skip to content

Get/update fails for item IDs that contain spaces #1278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cdlate opened this issue Feb 8, 2023 · 6 comments
Closed

Get/update fails for item IDs that contain spaces #1278

cdlate opened this issue Feb 8, 2023 · 6 comments
Labels
Milestone

Comments

@cdlate
Copy link

cdlate commented Feb 8, 2023

Summary

I want to update or get a record having id containing a space using the PHP client library.
Expected: The record having the correct id is returned/updated
Actual: The record is not found

Code snippet of problem

Test dataset:

Kibana dev tools for inserts:

POST test_index/doc/_bulk
{"index": {"_index": "test_index", "_id": "test test"}}
{"field1": "value1"}

This will return data properly:
GET /test_index/_doc/test%20test

$params = [
  'index' => 'test_index',
  //'id'    => 'lp-3MIYB7NAUVXHg5pvt', THIS is working perfectly(no spaces)
  'id'    => 'test test', // this is not working because urlencode does not respect the url encode standard.
  'type' => '_doc'
];

$response = $elasticClient->get($params);

I think the problem is caused by the usage of urlencode instead of rawurlencode in the file AbstractEndPoint, not sure about how many client's methods are affected, not sure if the latest versions are affected, but I highly suspect they are.

See: https://www.php.net/manual/en/function.urldecode.php
See: https://www.php.net/manual/en/function.rawurlencode.php

System details

  • Ubuntu
  • 7.3
  • 6.x//7.x
  • 7.17
@ezimuel
Copy link
Contributor

ezimuel commented Feb 21, 2023

@cdlate you right, the + character is not encoded anymore in a space since ES 7.4. You can force to use + as space in ES 7.x with the settings es.rest.url_plus_as_space to true.
We need to fix this as you proposed using rawurlencode() instead of urlencode(). At the same time, we need to avoid BC break since we cannot change the URL encoding for existing document (containing space in ID) indexed with elasticsearch-php.
I'm thinking to use an env variable like ES_URL_PLUS_AS_SPACE that will be false as default. We will use rawurlencode() if the previous env will be false, otherwise we will continue to use urlencode().
Any feedback? Thanks!

@ezimuel ezimuel added the bug label Feb 21, 2023
@ezimuel ezimuel added this to the 8.7.0 milestone Feb 21, 2023
@cdlate
Copy link
Author

cdlate commented Feb 22, 2023

Hey @ezimuel
It is a good plan! I knew I was missing something about the evolution of the library, thanks for giving me context about it.

Can I contribute?

@ezimuel ezimuel modified the milestones: 8.7.0, 7.17.2 Mar 2, 2023
@ezimuel
Copy link
Contributor

ezimuel commented Mar 2, 2023

I'll provide a PR for fixing it in elasticsearch-php 7.17.2. You can help me with a review, thanks!

@ezimuel
Copy link
Contributor

ezimuel commented Mar 27, 2023

I sent #1303 for fixing the issue. @cdlate can you check if this work for you? Thanks.

@ezimuel
Copy link
Contributor

ezimuel commented Apr 20, 2023

@cdlate I just merged #1303 in 7.17 branch. I'll release 7.17.2 soon.

@ezimuel ezimuel closed this as completed Apr 20, 2023
@ezimuel
Copy link
Contributor

ezimuel commented Apr 21, 2023

Released 7.17.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants