FSCrawler overrides my own index mapping fscrawler_mapping_content after it starts #1797

Open
bobjiang1988 opened this issue Jan 12, 2024 · 10 comments
@bobjiang1988
bobjiang1988 commented Jan 12, 2024

The documentation says that if you want to define your own index settings and mapping, for example to set analyzers, you can update the needed component template before starting FSCrawler.

But _component_template/fscrawler_mapping_content is overwritten once FSCrawler starts.

Originally posted by @bobjiang1988 in #469 (comment)

@dadoonet dadoonet added the bug For confirmed bugs label Jan 14, 2024
@dadoonet dadoonet self-assigned this Jan 14, 2024
@dadoonet dadoonet added this to the 2.10 milestone Jan 14, 2024
@dadoonet dadoonet removed the bug For confirmed bugs label Jan 14, 2024
@dadoonet
Owner

I looked at this today and actually this is documented (see https://fscrawler.readthedocs.io/en/latest/admin/fs/elasticsearch.html#mappings):

You can stop FSCrawler creating/updating the index templates for you by setting push_templates to false:

name: "test"
elasticsearch:
  push_templates: false

Could you try this and confirm that it fixes your issue?

@dadoonet dadoonet removed this from the 2.10 milestone Jan 14, 2024
@bobjiang1988
Author

Setting push_templates to false stops FSCrawler from pushing all _component_templates and _index_templates. Could it skip only a specified template, such as _component_template/fscrawler_mapping_content?

@dadoonet
Owner

No. But here's my take on this.

I think I should add a CLI option like --setup which would only send the templates for a given job but not start it. Maybe --loop 0 would do this.
Then you can overwrite the template you want and start FSCrawler with push_templates set to false.

Would that work for you in the meantime? I would like to implement a better user experience.
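
A minimal sketch of the "overwrite the template you want" step, assuming an Elasticsearch node at http://localhost:9200 without authentication. The helper names and the choice of the `english` analyzer are illustrative assumptions, not part of FSCrawler. The script only prepares the PUT request for _component_template/fscrawler_mapping_content; sending it is left commented out.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_component_template(analyzer: str) -> dict:
    """Return a component template body that sets an analyzer on `content`."""
    return {
        "template": {
            "mappings": {
                "properties": {
                    "content": {"type": "text", "analyzer": analyzer}
                }
            }
        }
    }


def build_put_request(name: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) PUT /_component_template/<name>."""
    return urllib.request.Request(
        url=f"{ES_URL}/_component_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_put_request(
        "fscrawler_mapping_content", build_component_template("english")
    )
    # Sending needs a reachable cluster; uncomment to actually apply it.
    # urllib.request.urlopen(req)
    print(req.get_method(), req.full_url)
```

A PUT replaces the whole component template, so in practice it is safer to start from the body FSCrawler generated and change only what you need, rather than from this reduced one.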

@dadoonet dadoonet reopened this Jan 18, 2024
@bobjiang1988
Author

I think letting users put their custom templates in a specified directory would be a better user experience. FSCrawler would then read the templates from both the user's directory and the default directory. Would that work?

@aaltonenp

I used to edit the default file at ~/.fscrawler/_default/7/_settings.json, but that seems to be ignored now. I only need one small modification to make filenames case-insensitive. It's a bit cumbersome to push all templates beforehand for such a small change.

The old way where local modifications would be combined with defaults would seem best to me.

@dadoonet
Owner

dadoonet commented Jul 8, 2024

@aaltonenp Would the solution posted here work for you?

Basically, create an index template and disable fscrawler index templates?

@aaltonenp

@dadoonet It might once I figure out how to add the normalizer. I let fscrawler run once to create the templates and then try to modify them.

In the _settings.json I used to define the normalizer where the analysis section is. That is now in the fscrawler_mapping_path template. That can be modified but then using that normalizer in fscrawler_mapping_file template causes a "normalizer not found" error.

I don't know much about Index & Component Templates and how they draw the settings together. Do I need to define the normalizer in the fscrawler_mapping_file a second time?

@dadoonet
Owner

dadoonet commented Jul 8, 2024

You need to define the normalizer in the same component template where you use it. So probably in fscrawler_mapping_file indeed, and also in fscrawler_mapping_path if you use it there.
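
A rough sketch of that rule, under some assumptions: the normalizer name `filename_lowercase` is hypothetical, `file.filename` is assumed to be the keyword field being customized, and the checker only looks at `settings.analysis` (not the equivalent `settings.index.analysis` form). It reports normalizers that a template body uses in its mappings without defining them itself.

```python
BUILT_IN_NORMALIZERS = {"lowercase"}  # shipped with Elasticsearch


def referenced_normalizers(properties: dict) -> set:
    """Collect normalizer names used by fields, including sub-objects and multi-fields."""
    found = set()
    for field in properties.values():
        if "normalizer" in field:
            found.add(field["normalizer"])
        found |= referenced_normalizers(field.get("properties", {}))
        found |= referenced_normalizers(field.get("fields", {}))
    return found


def undefined_normalizers(body: dict) -> set:
    """Normalizers used in the mappings but not defined in this same template."""
    template = body.get("template", {})
    defined = set(
        template.get("settings", {}).get("analysis", {}).get("normalizer", {})
    )
    used = referenced_normalizers(template.get("mappings", {}).get("properties", {}))
    return used - defined - BUILT_IN_NORMALIZERS


# Hypothetical body for fscrawler_mapping_file: definition and usage live together.
mapping_file = {
    "template": {
        "settings": {
            "analysis": {
                "normalizer": {
                    "filename_lowercase": {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            "properties": {
                "file": {
                    "properties": {
                        "filename": {
                            "type": "keyword",
                            "normalizer": "filename_lowercase",
                        }
                    }
                }
            }
        },
    }
}

print(undefined_normalizers(mapping_file))  # empty set when nothing is missing
```

Dropping the `settings` block from `mapping_file` makes the function return `{"filename_lowercase"}`, which mirrors the "normalizer not found" situation described above.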

@aaltonenp

OK, thanks. Although it looks like I can get away with using the built-in lowercase normalizer, so I don't have to define a custom one at all.
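
For illustration, a sketch of what the mapping part could then look like, assuming `file.filename` is the field in question (an assumption, not taken from this thread). Because `lowercase` is built in, no `settings.analysis` section is needed.

```python
import json

# Assumed field path; only the mapping is needed, no normalizer definition.
filename_mapping = {
    "template": {
        "mappings": {
            "properties": {
                "file": {
                    "properties": {
                        "filename": {"type": "keyword", "normalizer": "lowercase"}
                    }
                }
            }
        }
    }
}

print(json.dumps(filename_mapping, indent=2))
```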

@gsgleason

I'm having a similar problem with the current docker build.

I use an ingest pipeline with a grok processor to match patterns in the filenames and use those to create custom fields. Some of those fields are date types with custom date formats.

I cannot figure out a streamlined way to do this that isn't totally hackish.

Perhaps the ability to create mappings when the index is initially created would be helpful.
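
As a sketch of the kind of mapping involved, assuming hypothetical field names and date patterns (none of them come from this thread): a small helper that builds a component template body declaring each custom field as a `date` with its own format.

```python
import json


def date_field_mappings(formats: dict) -> dict:
    """Build a component template body mapping each field to a `date` with its own format."""
    return {
        "template": {
            "mappings": {
                "properties": {
                    name: {"type": "date", "format": fmt}
                    for name, fmt in formats.items()
                }
            }
        }
    }


# Hypothetical fields that a grok processor might extract from file names.
body = date_field_mappings(
    {"report_date": "yyyyMMdd", "received_at": "yyyy-MM-dd HH:mm"}
)
print(json.dumps(body, indent=2))
```

Whether such a template survives a restart still depends on the push_templates setting discussed earlier in this thread.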
