New format handling CSV NDJSON #329

Merged: 5 commits merged on Oct 11, 2021

alallema
Contributor

@alallema alallema commented Oct 5, 2021

New handlers to accept the CSV and NDJSON formats.

  • New functions:
def add_documents_json(string_docs, primary_key)
def add_documents_csv(string_docs, primary_key)
def add_documents_ndjson(string_docs, primary_key)
def add_documents_raw(string_docs, primary_key, type)

These functions accept the document data as a string.
add_documents_raw() was added to avoid code duplication, but it is not mandatory.
add_documents() still exists and accepts documents as a list.
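
As a rough illustration of the delegation described above, here is a minimal sketch, not the code of this PR. The class `IndexSketch`, the injected `post` callable, and the MIME type values are assumptions made for the example; only the four method names come from the description.

```python
from typing import Callable, Optional, Union
from urllib.parse import urlencode

# Assumed MIME types per format; this PR description does not state them.
CONTENT_TYPES = {
    "json": "application/json",
    "ndjson": "application/x-ndjson",
    "csv": "text/csv",
}

# Hypothetical transport signature: (path, body, content_type) -> response.
Transport = Callable[[str, bytes, str], dict]


class IndexSketch:
    """Hypothetical stand-in for the SDK's index class."""

    def __init__(self, uid: str, post: Transport) -> None:
        self.uid = uid
        self._post = post

    def add_documents_raw(
        self,
        str_documents: Union[str, bytes],
        primary_key: Optional[str] = None,
        content_type: str = "application/json",
    ) -> dict:
        # Single place that builds the path and body for every format.
        path = f"indexes/{self.uid}/documents"
        if primary_key is not None:
            path += "?" + urlencode({"primaryKey": primary_key})
        if isinstance(str_documents, str):
            body = str_documents.encode("utf-8")
        else:
            body = str_documents
        return self._post(path, body, content_type)

    # The format-specific methods only pick the content type.
    def add_documents_json(self, str_documents, primary_key=None):
        return self.add_documents_raw(str_documents, primary_key, CONTENT_TYPES["json"])

    def add_documents_ndjson(self, str_documents, primary_key=None):
        return self.add_documents_raw(str_documents, primary_key, CONTENT_TYPES["ndjson"])

    def add_documents_csv(self, str_documents, primary_key=None):
        return self.add_documents_raw(str_documents, primary_key, CONTENT_TYPES["csv"])
```

Injecting the transport keeps the sketch runnable without a server; a real client would pass its own HTTP layer there.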

Example of usage:

from meilisearch.client import Client

client = Client('http://127.0.0.1:7700', 'masterKey')

# JSON: read the whole file, then send it as UTF-8 bytes
with open('../../data/dataset/movies.json', 'r', encoding='utf-8') as jsonfile:
    data = jsonfile.read()
index = client.index('movies')
update = index.add_documents_json(data.encode('utf-8'))

# NDJSON
with open('../../data/dataset/songs.ndjson', 'r', encoding='utf-8') as ndjsonfile:
    data = ndjsonfile.read()
index = client.index('songsND')
update = index.add_documents_ndjson(data.encode('utf-8'))

# CSV as file
with open('../../data/dataset/songs.csv', 'r', encoding='utf-8') as csvfile:
    data = csvfile.read()
index = client.index('songs')
update = index.add_documents_csv(data.encode('utf-8'))

Please note:

  • add_documents_csv() can also take a file object directly after opening it, provided the file is opened in binary mode: open(path_to_file, 'rb'). I haven't found a way to avoid this.
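
To illustrate why binary mode matters here, a small sketch: reading a file with 'rb' yields the same bytes as reading it as text and encoding it afterwards, so either result can serve as a request body. The helper names `read_csv_payload` and `read_csv_payload_text` are hypothetical and not part of the SDK.

```python
def read_csv_payload(path: str) -> bytes:
    """Read a CSV file as raw bytes (binary mode, no decoding step)."""
    with open(path, "rb") as f:
        return f.read()


def read_csv_payload_text(path: str, encoding: str = "utf-8") -> bytes:
    """Read the same file as text, then encode it back to bytes.

    newline="" disables newline translation so the bytes match the file.
    """
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read().encode(encoding)
```

For a valid UTF-8 file both helpers return identical bytes; the binary version simply skips the decode and encode round trip.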

Member

@curquiza curquiza left a comment

In the integration guide issue I wrote:

> The SDKs always send application/json as Content-Type to every request: this should be adapted to the concerned requests (POST and PUT)

What did your investigation show on this? Is it complicated to adapt?

Also, the Typesense docs don't seem to accept documents as a string, only as an object or a list
cf

Wouldn't accepting objects instead of strings be easier for users? It looks like there would be less parsing on the user side.

@alallema
Contributor Author

alallema commented Oct 6, 2021

@curquiza,

> The SDKs always send application/json as Content-Type to every request

That is still true: I kept application/json as the default Content-Type, and it only changes when CSV or NDJSON is sent.

For the format:

  • The real problem is CSV: it is really complicated to transform it into a string and make it work well. And it didn't work as expected, because I can send a file directly to it.
  • If we ask users to send objects or lists, it would work with the basic add_documents() function. It could be a good option; I will try it.

@curquiza
Member

curquiza commented Oct 6, 2021

> > The SDKs always send application/json as Content-Type to every request
>
> That is still true: I kept application/json as the default Content-Type, and it only changes when CSV or NDJSON is sent.

Is it possible to send Content-Type only for POST and PUT then, and not for DELETE and GET, or is it too complicated? That's what I meant by "this should be adapted to the concerned requests (POST and PUT)".
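
One way this could look, as a minimal sketch only: a helper that adds Content-Type for methods that carry a body and omits it otherwise. The function name `build_headers` and the constant `METHODS_WITH_BODY` are hypothetical; they are not taken from the SDK.

```python
from typing import Dict

# Methods for which a request body, and therefore a Content-Type, is expected.
METHODS_WITH_BODY = frozenset({"POST", "PUT"})


def build_headers(method: str, content_type: str = "application/json") -> Dict[str, str]:
    """Return request headers, adding Content-Type only for POST and PUT."""
    headers: Dict[str, str] = {}
    if method.upper() in METHODS_WITH_BODY:
        headers["Content-Type"] = content_type
    return headers
```

With this shape, GET and DELETE requests get no Content-Type at all, while document uploads can override the default per format.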

@alallema
Contributor Author

alallema commented Oct 6, 2021

> > > The SDKs always send application/json as Content-Type to every request
> >
> > That is still true: I kept application/json as the default Content-Type, and it only changes when CSV or NDJSON is sent.
>
> Is it possible to send Content-Type only for POST and PUT then, and not for DELETE and GET, or is it too complicated? That's what I meant by "this should be adapted to the concerned requests (POST and PUT)".

Oh yes, I think so! Sorry, I misunderstood.

@bidoubiwa
Contributor

bidoubiwa commented Oct 6, 2021

> The real problem is CSV: it is really complicated to transform it into a string and make it work well. And it didn't work as expected, because I can send a file directly to it.

I feel like I don't really understand the problem 😕 Why would the user not send their CSV as a string? Is it not possible to do the following:

const dataset = fs.readFileSync('./mydataset.csv')
client.index("movies").addDocumentsCsv(dataset)

and then at our side:

request.post(dataset, { "Content-type": "application/csv" })

I'm really sorry if I'm missing something
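
For the Python SDK, the same idea could be sketched roughly as below: read the file's bytes and attach them to a POST request with a CSV content type. This uses only the standard library and builds the request without sending it. The helper name `build_csv_request` and the default content type value are assumptions for the example, not what this PR implements.

```python
import urllib.request


def build_csv_request(
    base_url: str,
    index_uid: str,
    dataset: bytes,
    content_type: str = "text/csv",  # assumed value; not stated in this thread
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw CSV bytes."""
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/documents"
    return urllib.request.Request(
        url,
        data=dataset,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops before that so it runs without a server.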

@alallema alallema force-pushed the new-format branch 2 times, most recently from e213f6f to 52f2067 Compare October 7, 2021 16:00
@alallema alallema closed this Oct 7, 2021
@alallema alallema reopened this Oct 7, 2021
@alallema
Contributor Author

alallema commented Oct 7, 2021

> > The real problem is CSV: it is really complicated to transform it into a string and make it work well. And it didn't work as expected, because I can send a file directly to it.
>
> I feel like I don't really understand the problem 😕 Why would the user not send their CSV as a string? Is it not possible to do the following:
>
> const dataset = fs.readFileSync('./mydataset.csv')
> client.index("movies").addDocumentsCsv(dataset)
>
> and then at our side:
>
> request.post(dataset, { "Content-type": "application/csv" })
>
> I'm really sorry if I'm missing something

You're right, it works well!

@alallema alallema marked this pull request as ready for review October 11, 2021 09:11
@alallema alallema requested a review from curquiza October 11, 2021 09:11
@alallema alallema merged commit 1a42526 into bump-meilisearch-v0.23.0 Oct 11, 2021
@alallema alallema deleted the new-format branch October 11, 2021 09:13
bors bot added a commit that referenced this pull request Oct 12, 2021
327: Changes related to the next MeiliSearch release (v0.23.0) r=alallema a=meili-bot

Related to this issue: meilisearch/integration-guides#142

This PR:
- gathers the changes related to the next MeiliSearch release (v0.23.0) so that this package is ready when the official release is out.
- should pass the tests against the [latest pre-release of MeiliSearch](https://github.com/meilisearch/MeiliSearch/releases).
- might eventually contain test failures until the MeiliSearch v0.23.0 is out.

⚠️ This PR should NOT be merged until the next release of MeiliSearch (v0.23.0) is out.

_This PR is auto-generated for the [pre-release week](https://github.com/meilisearch/integration-guides/blob/master/guides/pre-release-week.md) purpose._

Done:
- #329 
    - Add new methods:
        - `addDocumentsJson(string $documents, ?string $primaryKey = null)`
        - `addDocumentsNdJson(string $documents, ?string $primaryKey = null)`
        - `addDocumentsCsv(string $documents, ?string $primaryKey = null)`
    - Add tests for new methods
    - Remove json header `application/json` for every HTTP method
- #331 

Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: alallema <amelie@meilisearch.com>
Co-authored-by: Amélie <alallema@users.noreply.github.com>