Allow the user to specify an `id` field for the document to uniquely identify it when sending it to Elasticsearch. By default, Elasticsearch generates its own unique document identifiers.
Just add three lines in the function `documents_from_file`. Here `'id'` refers to the `id` column in your xxx.csv file:

```python
if 'id' in row.keys():
    yield es.index_op(row, doc_type, True, index=index_name, id=row['id'])
else:
    yield es.index_op(row)
```

The complete working function is as follows:
```python
def documents_from_file(es, filename, delimiter, quiet, index_name, doc_type):
    """
    Return a generator for pulling rows from a given delimited file.
    :param es: an ElasticSearch client
    :param filename: the name of the file to read from or '-' if stdin
    :param delimiter: the delimiter to use
    :param quiet: don't output anything to the console when this is True
    :param index_name: the index to write the documents to
    :param doc_type: the document type to use when a row carries its own id
    :return: generator returning document-indexing operations
    """
    def all_docs():
        with open(filename, 'rb') if filename != '-' else sys.stdin as doc_file:
            # delimited file should include the field names as the first row
            fieldnames = doc_file.next().strip().split(delimiter)
            echo('Using the following ' + str(len(fieldnames)) + ' fields:', quiet)
            for fieldname in fieldnames:
                echo(fieldname, quiet)
            reader = csv.DictReader(doc_file, delimiter=delimiter, fieldnames=fieldnames)
            count = 0
            for row in reader:
                count += 1
                if count % 10000 == 0:
                    echo('Sent documents: ' + str(count), quiet)
                if 'id' in row.keys():
                    yield es.index_op(row, doc_type, True, index=index_name, id=row['id'])
                else:
                    yield es.index_op(row)
    return all_docs
```
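The same idea can be shown without an Elasticsearch client at hand. This is a minimal, self-contained sketch (the function name `index_ops` and the dict-shaped bulk actions are illustrative, not part of the library above): rows whose CSV has an `id` column get an explicit `_id`, while other rows are left for Elasticsearch to assign an identifier automatically.

```python
import csv
import io


def index_ops(doc_file, index_name, delimiter=','):
    """Yield bulk-style index actions for each CSV row.

    If the CSV has an 'id' column, its value is used as the document
    _id; otherwise no _id is set and Elasticsearch would generate one.
    """
    reader = csv.DictReader(doc_file, delimiter=delimiter)
    for row in reader:
        action = {'_index': index_name, '_source': row}
        if 'id' in row:
            action['_id'] = row['id']
        yield action


# CSV with an explicit id column: each action carries an _id
with_id = io.StringIO("id,name\n42,alice\n")
ops = list(index_ops(with_id, 'people'))

# CSV without an id column: no _id is set
without_id = io.StringIO("name\nbob\n")
auto_ops = list(index_ops(without_id, 'people'))
```

Here `ops[0]` has `_id` set to `'42'`, while `auto_ops[0]` carries no `_id` key at all, matching the two branches in the patched function above.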