Refactor code to prepare support for NDJSON and CSV #213

juchom · 2022-01-10T18:56:33Z

Pull Request

The main idea of this pull request is to cleanup and refactor some code in order to easily add support for CSV and NDJSON.

The only format supported is still JSON. I have renamed the methods according to the issue.

Batches methods were a bit akward and difficult to read / understand, so I removed the BatchOperationAsync method and added an EnumerableExtensions class which gives the Chunk.

Documents Add / Update unit tests are now Theory so we can easily extends them for CSV and NDJSON.

Documentation has been updated.

@curquiza let me know if you need help to review the code.

What does this PR do?

First step for #182

PR checklist

Please check if your PR fulfills the following requirements:

Does this PR fix an existing issue?
Have you read the contributing guidelines?
Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to MeiliSearch!

…tween formats

juchom · 2022-01-12T08:30:33Z

I was digging a bit further to add NDJSON and CSV support and I found something that needs to be discussed a bit.

I think there is an issue with the JsonSerializerOptions here :

meilisearch-dotnet/src/Meilisearch/MeilisearchClient.cs

Lines 22 to 26 in 8d63ed4

    
           public static readonly JsonSerializerOptions JsonSerializerOptions = new JsonSerializerOptions 
        
           { 
        
               DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, 
        
               PropertyNamingPolicy = JsonNamingPolicy.CamelCase, 
        
           };

And this code

meilisearch-dotnet/src/Meilisearch/ObjectExtensions.cs

Lines 49 to 63 in 8d63ed4

    
           public static object RemoveNullValues<T>(this IEnumerable<T> objectToTransform) 
        
           { 
        
               var result = objectToTransform.Select(item => 
        
               { 
        
                   dynamic expando = new ExpandoObject(); 
        
                   var x = expando as IDictionary<string, object>; 
        
                   foreach (var p in item.GetType().GetProperties().Where(p => p.GetValue(item) != null)) 
        
                   { 
        
                       x[p.Name.ToLower()] = p.GetValue(item, null); 
        
                   } 
        
                   return expando; 
        
               }).ToList().AsEnumerable(); 
        
               return result; 
        
           }

Used here

meilisearch-dotnet/src/Meilisearch/Index.cs

Line 212 in 8d63ed4

var filteredDocuments = documents.RemoveNullValues();

First issue is the custom function RemoveNullValues<T> is doing the same as the JsonSerializerOptions DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, duplicate code that can be removed easy one.

Second issue which is more serious, removing null properties is a bad decision.

Here is why :

Imagine we have this object we want to add to Meilisearch

public class UserAccount
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool? HasAcceptedLicense { get; set; }
}

When we add the object the first time we have the following values

Id = 1
Name = John
HasAcceptedLicense = true

A few days later, you change the license and require the user to accept or refuse it and you want to set HasAcceptedLicense to null which means "the user has not said if he accepts or refuses the new license".

You won't be able to do that with the current JsonSerializerOptions.

Last important thing is CSV is not as flexible as JSON when serializing, there is no optional field, the schema is the same for all records.

With the following records and actual options

Id = 1
Name = John
HasAcceptedLicense = true

Id = 2
Name = Jane
HasAcceptedLicense = null

The Jane record would not send the HasAcceptedLicense to Meilisearch with JSON or NDJSON, but it CSV it would send it.

To conclude

The best option would be to remove the custom RemoveNullValues<T> and DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull this would avoid the issues above.

If you're ok with that, I can add a small commit to this PR with the changes.

juchom · 2022-01-14T16:04:09Z

Before working on this PR, I suggest that we first finish the work to support Meilisearch 0.25.

Then, I will take some changes from this big PR and create smaller ones to prepare the support for NDJSON and CSV. It will be easier to review.

curquiza · 2022-01-15T13:08:31Z

Before working on this PR, I suggest that we first finish the work to support Meilisearch 0.25.

Completely agreed!

230: Create a GetChunks extension method for batches r=brunoocasali a=juchom # Pull Request ## What does this PR do? This pull request is here to make code a little bit easier to follow for batches. For exemple if we look at `AddDocumentsInBatchesAsync` we have this two piece of code. https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L152-L161 https://github.com/meilisearch/meilisearch-dotnet/blob/3f14f432a658deb8a83ab505179b34559e9c1252/src/Meilisearch/Index.cs#L732-L745 To follow the code we have to start 159 with a call to the second snippet. We go to the second snippet where we do so computation to get chunks, then we call a function and we have to go up to the first snippet and see what is happening with this local function. With this refactoring the `AddDocumentsInBatchesAsync` is self explaining. The second snippet is now replaced by an extension method that returns a chunk on demand. With the new code you see what the method is doing on first read https://github.com/meilisearch/meilisearch-dotnet/blob/4264a71913db643356c128670de09122b6c7e7f4/src/Meilisearch/Index.cs#L152-L161 This is a part from a previous PR #213. I will make several from #213 because there is too much things in this one. Co-authored-by: Julien Chomarat <j.chomarat@linoa.com>

juchom added 3 commits January 10, 2022 19:43

Refactor code to prepare support for ndjson and csv

0a00383

Fix file encoding

60d2e0f

Suffix index name with test format to avoid sharing the same index be…

8d1815b

…tween formats

Remove Meilisearch.sln.DotSettings.user and fix .gitignore

4136e58

juchom mentioned this pull request Jan 31, 2022

Create a GetChunks extension method for batches #230

Merged

juchom closed this Feb 2, 2022

juchom deleted the juchom/new-format branch May 18, 2022 16:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor code to prepare support for NDJSON and CSV #213

Refactor code to prepare support for NDJSON and CSV #213

juchom commented Jan 10, 2022

juchom commented Jan 12, 2022 •

edited

Loading

juchom commented Jan 14, 2022

curquiza commented Jan 15, 2022

Refactor code to prepare support for NDJSON and CSV #213

Refactor code to prepare support for NDJSON and CSV #213

Conversation

juchom commented Jan 10, 2022

Pull Request

What does this PR do?

PR checklist

juchom commented Jan 12, 2022 • edited Loading

juchom commented Jan 14, 2022

curquiza commented Jan 15, 2022

juchom commented Jan 12, 2022 •

edited

Loading