Add additional tags when crawling files #1984
Comments
It's similar to #884, right? Note that FSCrawler supports tags when using the REST API.
Yes. The feature is available via the REST API, but it would be really valuable while crawling files.
Could you describe an example of
Following the example of the REST API (see https://fscrawler.readthedocs.io/en/latest/admin/fs/rest.html#additional-tags), we should be able to specify not only additional global tags but also the path to a file with tags for each folder branch. In my case, where I am indexing a library of books under
So
So if any folder had this
I started a first implementation. It's not exactly what you asked for, but it's a start at least. Have a look at #2017 for the current status of the feature.
Add support for external tags

The goal of this feature is to allow users to provide additional metadata when crawling files.

Whenever a directory is crawled, FSCrawler checks whether a file named `.meta.yml` is present in that directory. If it is, the content of this file is used to enrich the document.

## Example

For example, if you have a file named `.meta.yml` in the directory `/path/to/data/dir`:

```yaml
external:
  myTitle: "My document title"
```

Then the indexed document will have a new field named `external.myTitle` with the value `My document title`.

## Supported Fields

Only supported fields can be added to the document. If you try to add a field which is not supported, it will be ignored.

For example, if the `.meta.yml` file contains:

```yaml
foo: "bar"
external:
  myTitle: "My document title"
```

The indexed document will have a new field named `external.myTitle` with the value `My document title`. The field `foo` will be ignored.

If you really want to add a field named `foo`, you first need to add it as an external tag:

```yaml
external:
  foo: "bar"
  myTitle: "My document title"
```

and then use an ingest pipeline to rename the `external.foo` field to `foo`.

## Overwriting Fields

The `.meta.yml` file can also overwrite existing fields. For example, if you have the following `.meta.yml` file:

```yaml
content: "HIDDEN"
```

Then the `content` field will be replaced by `HIDDEN`, even though something else was extracted.

> **Note:** The `.meta.yml` file is not indexed. It is only used to enrich the document.

## Tags Settings

Here is a list of Tags settings (under the `tags.` prefix):

| Name                | Default value | Documentation                   |
|---------------------|---------------|---------------------------------|
| `tags.metaFilename` | `.meta.yml`   | [Meta Filename](#meta-filename) |

### Meta Filename

You can use another filename for the external tags file. For example, if you want to use `meta_tags.json` instead of `.meta.yml`, you can set:

```yaml
tags:
  metaFilename: "meta_tags.json"
```

> **Note:** Only JSON and YAML files are supported.

Closes #1984.
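To make the behaviour described above easier to follow, here is a minimal Python sketch of the enrichment rules: unsupported fields are ignored, supported fields overwrite extracted values, and a field such as `external.foo` can later be renamed by an ingest pipeline. It is not FSCrawler's actual code. The function names, the `SUPPORTED_FIELDS` set, and the top-level overwrite strategy are assumptions made for illustration; only the `rename` ingest processor with its `field`, `target_field`, and `ignore_missing` options is a real Elasticsearch feature. The sketch reads a JSON metadata file so that it needs only the standard library.

```python
import json
from pathlib import Path

# Assumption: an illustrative subset. The real list of supported
# fields is defined by FSCrawler, not by this sketch.
SUPPORTED_FIELDS = {"content", "external"}


def load_meta(directory, meta_filename="meta_tags.json"):
    """Return the parsed metadata file found in `directory`, or {} if absent."""
    meta_path = Path(directory) / meta_filename
    if not meta_path.is_file():
        return {}
    with meta_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def enrich(doc, meta):
    """Return a copy of `doc` with the supported fields of `meta` applied.

    Assumption: supported fields are replaced at the top level, so a value
    from the metadata file wins over whatever was extracted from the file.
    """
    enriched = dict(doc)
    for field, value in meta.items():
        if field in SUPPORTED_FIELDS:
            enriched[field] = value
        # Unsupported fields (for example `foo`) are silently ignored.
    return enriched


def rename_pipeline_body(source="external.foo", target="foo"):
    """Build the body of an ingest pipeline that renames `source` to `target`."""
    return {
        "description": f"Rename {source} to {target}",
        "processors": [
            {
                "rename": {
                    "field": source,
                    "target_field": target,
                    # Documents without the source field pass through unchanged.
                    "ignore_missing": True,
                }
            }
        ],
    }


if __name__ == "__main__":
    extracted = {"content": "text extracted from the file"}
    meta = {"foo": "bar", "external": {"foo": "bar", "myTitle": "My document title"}}
    print(json.dumps(enrich(extracted, meta), indent=2))
    print(json.dumps(rename_pipeline_body(), indent=2))
```

The dictionary returned by `rename_pipeline_body()` could be sent as the body of a `PUT _ingest/pipeline/<name>` request; the pipeline name and how it is attached to the FSCrawler job are left out here because they depend on the local setup.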
Looks great. Thanks a lot. Appreciated.
Is your feature request related to a problem? Please describe.
I'm always frustrated when I can't categorize or add context to the files being indexed by FSCrawler. The current plugin lacks the ability to add custom tags during the crawling process, which limits the flexibility and searchability of the indexed files.
Describe the solution you'd like
I would like to have the capability to specify additional tags when crawling files with FSCrawler. These tags should be included in the metadata of each indexed file, allowing for better categorization, context, and enhanced search capabilities. This feature should be configurable through the `fscrawler.yml` file.
Describe alternatives you've considered
I've considered manually adding tags to the files after they are indexed, but this approach is time-consuming and inefficient. Another alternative is to use a different tool that supports tagging, but I prefer to continue using FSCrawler due to its other features and integration with Elasticsearch.
Feature Request: Add Additional Tags When Crawling Files
Summary:
Enhance the FSCrawler Elasticsearch plugin by adding the capability to include additional tags when crawling files. This feature will allow users to specify custom tags that can be associated with the files being indexed, providing more flexibility and improving searchability.
Description:
The current FSCrawler plugin for Elasticsearch efficiently indexes files and extracts metadata. However, it lacks the ability to add custom tags during the crawling process. By introducing a feature that allows users to specify additional tags, we can significantly enhance the plugin's functionality. These tags can be used to categorize files, add context, and improve the overall search experience.
Use Cases:
Implementation:
`fscrawler.yml` file where users can define custom tags.
Benefits:
Conclusion:
Adding the ability to specify additional tags when crawling files will greatly enhance the FSCrawler plugin's usability and functionality. This feature will provide users with more control over their indexed data and improve the overall search experience.