Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code to prepare support for NDJSON and CSV #213

Closed
wants to merge 4 commits into from

Conversation

juchom
Copy link
Contributor

@juchom juchom commented Jan 10, 2022

Pull Request

The main idea of this pull request is to cleanup and refactor some code in order to easily add support for CSV and NDJSON.

The only format supported is still JSON. I have renamed the methods according to the issue.

Batches methods were a bit akward and difficult to read / understand, so I removed the BatchOperationAsync method and added an EnumerableExtensions class which gives the Chunk.

Documents Add / Update unit tests are now Theory so we can easily extends them for CSV and NDJSON.

Documentation has been updated.

@curquiza let me know if you need help to review the code.

What does this PR do?

First step for #182

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to MeiliSearch!

@juchom
Copy link
Contributor Author

juchom commented Jan 12, 2022

I was digging a bit further to add NDJSON and CSV support and I found something that needs to be discussed a bit.

I think there is an issue with the JsonSerializerOptions here :

public static readonly JsonSerializerOptions JsonSerializerOptions = new JsonSerializerOptions
{
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
};

And this code

public static object RemoveNullValues<T>(this IEnumerable<T> objectToTransform)
{
var result = objectToTransform.Select(item =>
{
dynamic expando = new ExpandoObject();
var x = expando as IDictionary<string, object>;
foreach (var p in item.GetType().GetProperties().Where(p => p.GetValue(item) != null))
{
x[p.Name.ToLower()] = p.GetValue(item, null);
}
return expando;
}).ToList().AsEnumerable();
return result;
}

Used here

var filteredDocuments = documents.RemoveNullValues();

First issue is the custom function RemoveNullValues<T> is doing the same as the JsonSerializerOptions DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, duplicate code that can be removed easy one.

Second issue which is more serious, removing null properties is a bad decision.

Here is why :

Imagine we have this object we want to add to Meilisearch

public class UserAccount
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool? HasAcceptedLicense { get; set; }
}

When we add the object the first time we have the following values

Id = 1
Name = John
HasAcceptedLicense = true

A few days later, you change the license and require the user to accept or refuse it and you want to set HasAcceptedLicense to null which means "the user has not said if he accepts or refuses the new license".

You won't be able to do that with the current JsonSerializerOptions.

Last important thing is CSV is not as flexible as JSON when serializing, there is no optional field, the schema is the same for all records.

With the following records and actual options

Id = 1
Name = John
HasAcceptedLicense = true

Id = 2
Name = Jane
HasAcceptedLicense = null

The Jane record would not send the HasAcceptedLicense to Meilisearch with JSON or NDJSON, but it CSV it would send it.

To conclude

The best option would be to remove the custom RemoveNullValues<T> and DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull this would avoid the issues above.

If you're ok with that, I can add a small commit to this PR with the changes.

@juchom
Copy link
Contributor Author

juchom commented Jan 14, 2022

Before working on this PR, I suggest that we first finish the work to support Meilisearch 0.25.

Then, I will take some changes from this big PR and create smaller ones to prepare the support for NDJSON and CSV. It will be easier to review.

@curquiza
Copy link
Member

Before working on this PR, I suggest that we first finish the work to support Meilisearch 0.25.

Completely agreed!

@juchom juchom closed this Feb 2, 2022
bors bot added a commit that referenced this pull request Feb 4, 2022
230: Create a GetChunks extension method for batches r=brunoocasali a=juchom

# Pull Request

## What does this PR do?
This pull request is here to make code a little bit easier to follow for batches.

For exemple if we look at `AddDocumentsInBatchesAsync` we have this two piece of code.

https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L152-L161

https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L732-L745

To follow the code we have to start 159 with a call to the second snippet. We go to the second snippet where we do so computation to get chunks, then we call a function and we have to go up to the first snippet and see what is happening with this local function.

With this refactoring the `AddDocumentsInBatchesAsync` is self explaining.

The second snippet is now replaced by an extension method that returns a chunk on demand.

With the new code you see what the method is doing on first read

https://github.com/meilisearch/meilisearch-dotnet/blob/4264a71913db643356c128670de09122b6c7e7f4/src/Meilisearch/Index.cs#L152-L161

This is a part from a previous PR #213. I will make several from #213 because there is too much things in this one.

Co-authored-by: Julien Chomarat <j.chomarat@linoa.com>
bors bot added a commit that referenced this pull request Feb 8, 2022
230: Create a GetChunks extension method for batches r=brunoocasali a=juchom

# Pull Request

## What does this PR do?
This pull request is here to make code a little bit easier to follow for batches.

For exemple if we look at `AddDocumentsInBatchesAsync` we have this two piece of code.

https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L152-L161

https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L732-L745

To follow the code we have to start 159 with a call to the second snippet. We go to the second snippet where we do so computation to get chunks, then we call a function and we have to go up to the first snippet and see what is happening with this local function.

With this refactoring the `AddDocumentsInBatchesAsync` is self explaining.

The second snippet is now replaced by an extension method that returns a chunk on demand.

With the new code you see what the method is doing on first read

https://github.com/meilisearch/meilisearch-dotnet/blob/4264a71913db643356c128670de09122b6c7e7f4/src/Meilisearch/Index.cs#L152-L161

This is a part from a previous PR #213. I will make several from #213 because there is too much things in this one.

Co-authored-by: Julien Chomarat <j.chomarat@linoa.com>
@juchom juchom deleted the juchom/new-format branch May 18, 2022 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants