gatsby-plugin-algolia is deleting my entire index before writing to it #93

Closed
miketheodorou opened this issue Sep 15, 2020 · 16 comments

@miketheodorou

miketheodorou commented Sep 15, 2020

I am writing to the same index from two different sources. One of which is using the 'algoliasearch' package in a node backend, and the other, using the gatsby-plugin-algolia to read and write from strapi. The objects on the backend are being added first, then after the gatsby build is complete, the plugin runs and deletes everything. I'm wondering if there is a way to use both of these methods to write to the same index without them colliding.

@Haroenv
Contributor

Haroenv commented Sep 16, 2020

Hi @miketheo423, I've published 0.12.0, which solves this use case. However, it requires that you do the following:

  1. use enablePartialUpdates
  2. make sure the external objects do not have any of your matchFields
  3. verify that external objects don't get deleted

@miketheodorou
Author

Hey @Haroenv, I am using the enablePartialUpdates flag and passing in a field that I know the other objects definitely do not have, but it still seems to blow everything out.

@Haroenv
Contributor

Haroenv commented Sep 16, 2020

Reopening to investigate

@Haroenv Haroenv reopened this Sep 16, 2020
@miketheodorou
Author

miketheodorou commented Sep 16, 2020

Here's an example of what I'm attempting to pass in now.

algolia-queries.js

const eventsQuery = `{
  allStrapiEventsPage {
    edges {
      node {
        events {
          objectID: id
          title
          description
          availability
          timezone
          rating
          genres
          published
          image {
            publicURL
          }
        }
      }
    }
  }
}`;

const eventsReducer = ({ data }) => {
  return data.allStrapiEventsPage.edges.reduce((acc, { node }) => {
    acc = [...acc, ...node.events];
    return acc;
  }, []);
};

const queries = [
  {
    query: eventsQuery,
    transformer: eventsReducer,
    matchFields: ['publicURL'],
  },
];

module.exports = queries;

gatsby-config.js

...,
{
  resolve: `gatsby-plugin-algolia`,
  options: {
    appId: process.env.ALGOLIA_APP_ID,
    apiKey: process.env.ALGOLIA_ADMIN_KEY,
    indexName: process.env.ALGOLIA_INDEX_NAME,
    enablePartialUpdates: true,
    matchFields: ['publicURL'],
    queries: require('./src/utils/algolia-queries'),
  },
},
...

This is what happened when I ran my build:

Algolia: 1 queries to index
Algolia: query #1: executing query
Algolia: query 0: graphql resulted in 1 records
Algolia: query 0: starting Partial updates
Algolia: query 0: found 1511 existing records
Algolia: query 0: Partial updates – [insert/update: 0, total: 1]
Algolia: query 0: splitting in 0 jobs
Algolia: deleting 1510 objects from prod_SOD index
⠴ onPostBuild

@miketheodorou
Author

miketheodorou commented Sep 16, 2020

@Haroenv Does that need to be matchFields: ['image.publicURL'] instead?

UPDATE:
Using that field above did not make a difference.

@Haroenv
Contributor

Haroenv commented Sep 17, 2020

Ah, I think the plugin isn't yet written to allow dots in matchFields. Can you try it out with a top-level attribute first? Then we can add support for dots in the attribute.
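
One way to try a top-level attribute with the query posted earlier is to copy the nested image.publicURL onto each record in the transformer. This is only a sketch: it assumes the record shape from the eventsQuery above, and nothing in this thread confirms that it avoids the deletion.

```javascript
// Sketch of a transformer that hoists image.publicURL to the top level.
// The data shape follows the eventsQuery posted in this thread.
const eventsReducer = ({ data }) =>
  data.allStrapiEventsPage.edges.flatMap(({ node }) =>
    node.events.map(event => ({
      ...event,
      // Top-level copy so that matchFields: ['publicURL'] can see it.
      publicURL: event.image ? event.image.publicURL : undefined,
    }))
  );
```

The nested image object is left untouched, so existing consumers of image.publicURL keep working.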

@miketheodorou
Author

@Haroenv Yeah it looks like the top-level attribute is yielding the same result unfortunately.

@Haroenv
Contributor

Haroenv commented Sep 24, 2020

Do you have a reproduction? With a top-level attribute that only exists in the Gatsby index I don't see an issue

@JesusFdezDav

Hi @Haroenv,
We are having the same problem as @miketheo423 but, by doing what you suggested, we would only be checking for updates in the objects from one of the two sources. Is that correct?

@Haroenv
Contributor

Haroenv commented Sep 30, 2020

I'm not sure what you mean. Could you make a reproduction or a script that makes this index + Gatsby configuration which removes the index? I've tried this multiple times, and as long as the Gatsby index has an attribute on top-level which is used for matchFields, which the other records don't have, I see no issues...

@prichey
Contributor

prichey commented Oct 9, 2020

After getting this issue myself I think I have an idea why this is happening, which I think is just a misunderstanding of how matchFields should be used.

In the source, you check if any of the matchFields have a truthy value in the fresh algoliaObjects object here:

Object.keys(algoliaObjects).forEach(objectID => {
  // if the object has one of the matchFields, it should be removed,
  // but objects without matchFields are considered "not controlled"
  // and stay in the index
  if (matchFields.some(field => algoliaObjects[objectID][field])) {
    currentIndexState.toRemove[objectID] = true;
  }
});

While this may work for a boolean flag such as a modified field, it doesn't work when you want to update a record if and only if a field's value has changed. In @miketheo423's example above, they're using publicURL as the match field; if it ends up being truthy (i.e. a non-empty string), it satisfies the matchFields.some check above and the record is therefore marked for removal.
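
To make the truthiness check concrete, here is a standalone sketch of the same test run against made-up records. The helper name markForRemoval and the sample data are hypothetical; only the matchFields.some condition is taken from the snippet above.

```javascript
// Standalone sketch of the check quoted above. `markForRemoval` and the
// sample records are hypothetical, not taken from the plugin.
function markForRemoval(algoliaObjects, matchFields) {
  const toRemove = {};
  Object.keys(algoliaObjects).forEach(objectID => {
    // Any truthy match field marks the record for removal.
    if (matchFields.some(field => algoliaObjects[objectID][field])) {
      toRemove[objectID] = true;
    }
  });
  return toRemove;
}

const existing = {
  a: { title: 'Has an image', publicURL: '/static/a.png' },
  b: { title: 'No such field' },
  c: { title: 'Empty value', publicURL: '' },
};

console.log(markForRemoval(existing, ['publicURL'])); // { a: true }
```

Only the record with a non-empty publicURL is marked; the check never compares the old value with the new one.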

Before digging into the source, I made the same assumption about how the plugin works. (I actually also made the same assumption about the object.key pattern.)

@Haroenv what would you think about adding a predicate function to the plugin config (and maybe even per query) which takes an object representing the previous value and returns true or false depending on whether the object should be updated in the index? Passing a function rather than an array of strings would allow for both the modified behavior I believe matchFields is built around and more complex cases.
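
A rough sketch of what such a predicate could look like, purely for discussion. The names selectUpdates and publicURLChanged are made up, the predicate here also receives the fresh record (an assumption beyond the description above), and none of this is an existing plugin option.

```javascript
// Hypothetical sketch: `selectUpdates` and `publicURLChanged` are illustrative
// names, not options or functions that exist in gatsby-plugin-algolia.
function selectUpdates(previousObjects, freshObjects, shouldUpdate) {
  return freshObjects.filter(fresh => {
    const previous = previousObjects[fresh.objectID];
    // Records that are not in the index yet are always sent.
    if (!previous) return true;
    // Otherwise the user-supplied predicate decides.
    return shouldUpdate(previous, fresh);
  });
}

// Example predicate: update only when the image URL has changed.
const publicURLChanged = (previous, fresh) =>
  previous.publicURL !== fresh.publicURL;
```

A predicate such as `previous => previous.modified` would reproduce the boolean-flag case, while the example above covers the "value changed" case.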

@Haroenv
Contributor

Haroenv commented Oct 12, 2020

I think that makes sense, @prichey. Since this plugin is still in 0.x, if you find a cleaner way to express the API, don't hesitate to make breaking changes. Thanks!

@prichey
Contributor

prichey commented Oct 12, 2020

Sounds good, I'll work on some changes then PR.

I'm actually also interested in making some changes to add a disableConcurrentAccess option to the plugin as an attempt to fix #20. All of my queries necessarily deal with the same index so I'm thinking that indexing sequentially rather than concurrently might fix the instances when my builds hang due to Algolia tasks getting stalled.
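
For illustration, the difference between the two modes can be sketched as follows. The jobs here are stand-ins with a timer in place of real Algolia calls, and makeJob, runConcurrently and runSequentially are hypothetical names rather than plugin code.

```javascript
// Hypothetical stand-in for one query's indexing work; not part of the plugin.
function makeJob(name, log) {
  return async () => {
    log.push(`start ${name}`);
    await new Promise(resolve => setTimeout(resolve, 10));
    log.push(`end ${name}`);
  };
}

// Concurrent: every job starts before any of them finishes.
async function runConcurrently(jobs) {
  await Promise.all(jobs.map(job => job()));
}

// Sequential: each job finishes before the next one starts.
async function runSequentially(jobs) {
  for (const job of jobs) {
    await job();
  }
}
```

With sequential execution, two queries that target the same index never have tasks in flight at the same time, which is the behavior a disableConcurrentAccess option would aim for.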

That being the case, @Haroenv would you prefer I make two separate PRs, or are you fine with accepting one that addresses both issues?

@Haroenv
Contributor

Haroenv commented Oct 12, 2020

Separate PRs will be easier to review, thanks @prichey!

@prichey
Contributor

prichey commented Nov 19, 2020

@miketheo423 Have you tried updating to the most recent version? This should be fixed now.

@Haroenv
Contributor

Haroenv commented Jul 30, 2021

Let's assume it's fixed :) If not, please open a new issue with a reproduction.

@Haroenv Haroenv closed this as completed Jul 30, 2021