
Add user specified id field from available fields #1

Open
rholder opened this issue Apr 19, 2015 · 1 comment

Comments

@rholder
Owner

rholder commented Apr 19, 2015

Allow the user to specify an id field for the document to uniquely identify it when sending it to Elasticsearch. By default Elasticsearch will generate its own unique document identifiers.

@TimCoder

TimCoder commented Sep 5, 2017

Just add three lines to the function 'documents_from_file'.
Here 'id' is the name of the id column in your xxx.csv file.

if 'id' in row:
    yield es.index_op(row, doc_type, True, index=index_name, id=row['id'])
else:
    yield es.index_op(row)  # rows without an 'id' column are indexed as before

The complete working function is as follows:

def documents_from_file(es, filename, delimiter, quiet, index_name, doc_type):
    """
    Return a generator for pulling rows from a given delimited file.

    :param es: an ElasticSearch client
    :param filename: the name of the file to read from or '-' if stdin
    :param delimiter: the delimiter to use
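    :param quiet: don't output anything to the console when this is True
    :param index_name: the index name passed along when a row has an 'id' field
    :param doc_type: the document type passed along when a row has an 'id' field
    :return: generator returning document-indexing operations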
    """
    def all_docs():
        with open(filename, 'rb') if filename != '-' else sys.stdin as doc_file:
            # delimited file should include the field names as the first row
            fieldnames = doc_file.next().strip().split(delimiter)
            echo('Using the following ' + str(len(fieldnames)) + ' fields:', quiet)
            for fieldname in fieldnames:
                echo(fieldname, quiet)

            reader = csv.DictReader(doc_file, delimiter=delimiter, fieldnames=fieldnames)
            count = 0
            for row in reader:
                count += 1
                if count % 10000 == 0:
                    echo('Sent documents: ' + str(count), quiet)
                if 'id' in row:
                    yield es.index_op(row, doc_type, True, index=index_name, id=row['id'])
                else:
                    yield es.index_op(row)

    return all_docs
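
The function above only looks for a column literally named 'id', while the issue asks for a field the user picks. A minimal sketch of that idea follows, assuming a hypothetical `id_field` setting that is not part of the code in this thread. The helper names `index_kwargs` and `ops_with_ids` are also made up for illustration, and the sketch works on plain dicts so it does not need an Elasticsearch client.

```python
def index_kwargs(row, id_field=None):
    """Return the extra keyword arguments for indexing one row.

    id_field is a hypothetical, user-chosen column name. When it is unset,
    missing from the row, or empty, no id is returned, so Elasticsearch
    would generate its own identifier.
    """
    if id_field and row.get(id_field):
        return {'id': row[id_field]}
    return {}


def ops_with_ids(rows, id_field=None):
    """Yield (row, kwargs) pairs, one per row, in input order."""
    for row in rows:
        yield row, index_kwargs(row, id_field)


if __name__ == '__main__':
    sample = [
        {'sku': 'a1', 'name': 'first'},
        {'sku': '', 'name': 'second'},  # empty id: fall back to a generated one
    ]
    for row, kwargs in ops_with_ids(sample, id_field='sku'):
        print(row['name'], kwargs)
```

Inside `all_docs`, the returned dict could be unpacked into the same call used above, e.g. `es.index_op(row, doc_type, True, index=index_name, **index_kwargs(row, id_field))`, assuming `index_op` accepts `id` as a keyword the way the snippet in this thread uses it.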
