BulkIndexError when search_index --update #146

Open
Jiydam opened this issue Jan 26, 2016 · 8 comments

@Jiydam

Jiydam commented Jan 26, 2016

I am unable to index my models.

My model:

class Program(models.Model):
    objects = ProgramManager()

    name = models.CharField(max_length=100, unique=True)
    description = models.TextField(null=True, blank=True)
    type = models.ForeignKey('ProgramType')
    school = models.ForeignKey('School')
    department = models.ForeignKey('Department', blank=True, null=True)
    campuses = models.ManyToManyField('Campus', blank=True, null=True)
    num_courses = models.IntegerField(blank=True, null=True)
    num_units = models.IntegerField(blank=True, null=True)
    staff = models.ManyToManyField('user_manager.Member', blank=True, null=True)

My index:

from catalog.models import Program
from bungiesearch.indices import ModelIndex

class ProgramIndex(ModelIndex):
    class Meta:
        model = Program
        exclude = {'campuses', 'num_courses', 'num_units', 'staff', 'department', 'school', 'type'}
        hotfixes = {
                    'name': {'boost': 1.75},
                    'description': {'boost': 1.35}}

When I run ./manage.py search_index --update, I get:

INFO:root:Updating models ['Program'] on indices ['main_index'].
INFO:root:Getting index for model Program.
WARNING:root:No updated date field found for Program - not restricting with start and end date
INFO:root:index 19 documents on index main_index
INFO:root:Index: documents 0 to 100 of 19 total on index main_index.
INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
INFO:elasticsearch:POST http://localhost:9200/main_index/Program/_bulk [status:200 request:0.128s]
No handlers could be found for logger "elasticsearch.trace"
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/Library/Python/2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/Library/Python/2.7/site-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Library/Python/2.7/site-packages/django/core/management/base.py", line 390, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Library/Python/2.7/site-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/Users/administrator/Documents/Metis/app/bungiesearch/management/commands/search_index.py", line 196, in handle
    update_index(src.get_model_index(model_name).get_model().objects.all(), model_name, bulk_size=options['bulk_size'], num_docs=options['num_docs'], start_date=options['start_date'], end_date=options['end_date'])
  File "/Users/administrator/Documents/Metis/app/bungiesearch/utils.py", line 62, in update_index
    bulk_index(src.get_es_instance(), data, index=index_name, doc_type=model.__name__, raise_on_error=True)
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 132, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: (u'19 document(s) failed to index.', [{u'index': {u'status': 500, u'_type': u'Program', u'_id': u'1', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'5', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'6', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'7', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'8', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'9', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'10', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'11', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'12', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'13', u'error': 
{u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'14', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'15', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'16', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'17', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'18', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'19', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'20', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'21', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'22', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}])

My mapping:

{
  "main_index": {
    "aliases": {},
    "mappings": {
      "Program": {
        "properties": {
          "_id": {
            "type": "integer"
          },
          "description": {
            "type": "string",
            "boost": 1.35,
            "analyzer": "snowball"
          },
          "id": {
            "type": "integer"
          },
          "name": {
            "type": "string",
            "boost": 1.75,
            "analyzer": "snowball"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1453791581592",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "6UmM8YJBQI6m7H1XKnEm6Q",
        "version": {
          "created": "2010099"
        }
      }
    },
    "warmers": {}
  }
}
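
Since all 19 failures in the traceback above carry the same reason, it can help to collapse the error list before reading it. The following is only a sketch: `summarize_bulk_errors` is a hypothetical helper, not part of Bungiesearch or elasticsearch-py, and it assumes the error items have the shape shown in the traceback.

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count failed bulk items by (status, error type, reason)."""
    counts = Counter()
    for item in errors:
        # Each item is keyed by the bulk action name, e.g. {"index": {...}}.
        for details in item.values():
            error = details.get("error", {})
            if isinstance(error, dict):
                key = (details.get("status"), error.get("type"), error.get("reason"))
            else:
                # Tolerate an error reported as a plain string instead of a dict.
                key = (details.get("status"), None, error)
            counts[key] += 1
    return counts


# Two items copied from the traceback above; the real list has 19.
errors = [
    {"index": {"status": 500, "_type": "Program", "_id": "1", "_index": "main_index",
               "error": {"reason": "java.lang.String cannot be cast to java.lang.Number",
                         "type": "class_cast_exception"}}},
    {"index": {"status": 500, "_type": "Program", "_id": "5", "_index": "main_index",
               "error": {"reason": "java.lang.String cannot be cast to java.lang.Number",
                         "type": "class_cast_exception"}}},
]

for (status, err_type, reason), count in summarize_bulk_errors(errors).items():
    print(count, status, err_type, reason)
```

Judging by the `raise BulkIndexError(...)` call in the traceback, the error list is the second argument of the exception, so catching it and passing `exc.args[1]` to the helper should work.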

@ChristopherRabotin
Owner

Did that index already exist before you tried using Bungiesearch on it?

@Jiydam
Author

Jiydam commented Jan 26, 2016

I was using django-haystack before, but I cleared that index. The only index I have now is main_index, which was created by Bungiesearch.

@ChristopherRabotin
Owner

Okay. Odd. And the id on your model is definitely an integer, with no null or other non-integer values? Bungiesearch detects the appropriate field type when it generates the mapping, and that part of the code hasn't changed in over a year... Does the ProgramManager mix in the Bungiesearch manager?

@Jiydam
Author

Jiydam commented Jan 26, 2016

My ProgramManager is existing code I had before; it doesn't really do anything related to search. Any pointers on how I would debug the issue?

@ChristopherRabotin
Owner

If I recall correctly (I haven't changed how Bungiesearch is used in production for months), adding the manager allows you to search the model by invoking aliases or the search attribute. However, I don't think it actually adds anything to the mapping.

To debug, I'd have a look at the ProgramManager and check that the ID field is definitely there (or rather, how it's defined in the parent model mixin).
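
Another angle, independent of the manager, is to compare the types the mapping declares with the Python values actually being sent. This is only a sketch under assumptions: `find_type_mismatches` is a hypothetical helper, the `properties` dict is copied from the mapping posted above, and the sample document is invented for illustration (Python 3 syntax).

```python
# Elasticsearch numeric field types considered by this check.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


def find_type_mismatches(properties, document):
    """Return (field, declared_type, python_type) for string values in numeric fields."""
    mismatches = []
    for field, value in sorted(document.items()):
        declared = properties.get(field, {}).get("type")
        if declared in NUMERIC_TYPES and isinstance(value, str):
            mismatches.append((field, declared, type(value).__name__))
    return mismatches


# "properties" section of the mapping posted earlier in this thread.
properties = {
    "_id": {"type": "integer"},
    "description": {"type": "string"},
    "id": {"type": "integer"},
    "name": {"type": "string"},
}

# Hypothetical document: the values are made up, with _id given as a string.
document = {"_id": "1", "id": 1, "name": "Example program", "description": None}

print(find_type_mismatches(properties, document))
```

With this sample data, the only field flagged is `_id`, which is declared as `integer` but carries a string value.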

@Jiydam
Author

Jiydam commented Jan 26, 2016

I just changed the mapping of _id from integer to string and it worked. Is that going to break other things?

@ChristopherRabotin
Owner

No, it should not break anything if the field is indeed an integer and never has a string value.

I'll take the code you posted and try to create a test case to see whether your issue is reproducible. Is there anything in the ProgramManager code that you can disclose and that affects the fields of the table?

@Jiydam
Author

Jiydam commented Jan 26, 2016

I removed the ProgramManager and was still getting the error. It seems that the ES bulk method expects the mapping of _id to be a string rather than an integer for some reason: when I ran the following in the Python console, it failed with _id mapped as an integer.

bulk_index(es_instance, data, index=index_name, doc_type=doc_type, raise_on_error=True)

If you check the bulk API, the _id is also provided as a string there.

I really appreciate your support. I am using it now and everything seems to work fine so far.
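
For anyone landing here later, this is roughly what a bulk request body looks like with the `_id` given as a string. It is a hand-built illustration of the newline-delimited bulk format, not what elasticsearch-py or Bungiesearch do internally; `build_bulk_body` and the sample document are hypothetical.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        # The document id goes in the action metadata, serialized here as a string.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc["id"])}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(doc, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical document; the index and type names are taken from this thread.
body = build_bulk_body("main_index", "Program", [{"id": 1, "name": "Example program"}])
print(body)
```

The `id` inside the document source stays an integer; only the `_id` in the action line is a string, which fits the observation that mapping `_id` as a string made the errors go away.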
