Add additional tags when crawling files #1984

Open
4islam opened this issue Dec 7, 2024 · 2 comments
Labels
feature_request for feature request

Comments


4islam commented Dec 7, 2024

Is your feature request related to a problem? Please describe.

I'm always frustrated when I can't categorize or add context to the files being indexed by FSCrawler. FSCrawler currently cannot add custom tags while crawling, which limits how flexibly the indexed files can be organized and searched.

Describe the solution you'd like

I would like to have the capability to specify additional tags when crawling files with FSCrawler. These tags should be included in the metadata of each indexed file, allowing for better categorization, context, and enhanced search capabilities. This feature should be configurable through the fscrawler.yml file.

Describe alternatives you've considered

I've considered manually adding tags to the files after they are indexed, but this approach is time-consuming and inefficient. Another alternative is to use a different tool that supports tagging, but I prefer to continue using FSCrawler due to its other features and integration with Elasticsearch.


Feature Request: Add Additional Tags When Crawling Files

Summary:
Enhance FSCrawler so that additional tags can be attached to files as they are crawled. Users could specify custom tags to associate with the files being indexed, which adds flexibility and improves searchability.

Description:
FSCrawler already indexes files into Elasticsearch and extracts their metadata efficiently. However, it cannot add custom tags during the crawling process. A feature that lets users specify additional tags would close that gap: the tags can be used to categorize files, add context, and improve the overall search experience.

Use Cases:

  1. Categorization: Users can categorize files based on project, department, or any other custom criteria, making it easier to organize and retrieve relevant documents.
  2. Contextual Information: Tags such as "confidential," "urgent," or "archived" help users quickly identify the nature of a file. Tags could also come from a per-folder metadata file: when crawling a folder, a meta.inf file could describe the files it contains. For example, in a book folder whose pages are indexed individually, meta.inf could hold the book's title, author, and ISBN, which would then be added to each indexed page as metadata.
  3. Enhanced Search: Custom tags can be used as search filters, allowing users to perform more precise and targeted searches within the indexed files.
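
To make the meta.inf idea in use case 2 concrete, here is a minimal Python sketch. It is only an illustration, not FSCrawler code: the `key = value` file format, the function names `read_folder_metadata` and `tag_page`, and the choice to store the values under an `external` field are all assumptions made for this example.

```python
from pathlib import Path
import tempfile


def read_folder_metadata(folder: Path, name: str = "meta.inf") -> dict:
    """Parse a hypothetical key=value metadata file; return {} if it is absent."""
    meta_file = folder / name
    if not meta_file.is_file():
        return {}
    metadata = {}
    for line in meta_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        metadata[key.strip()] = value.strip()
    return metadata


def tag_page(page_doc: dict, folder_metadata: dict) -> dict:
    """Return a copy of the page document with folder metadata under 'external'.

    Values already present on the page win over folder-level values.
    """
    tagged = dict(page_doc)
    tagged["external"] = {**folder_metadata, **page_doc.get("external", {})}
    return tagged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp)
        # Hypothetical book folder: one meta.inf describing every page in it.
        (book / "meta.inf").write_text(
            "title = Example Book\nauthor = Jane Doe\n", encoding="utf-8"
        )
        meta = read_folder_metadata(book)
        print(tag_page({"content": "Text of page 1"}, meta))
```

Letting page-level values override folder-level ones is a design choice of this sketch; the opposite precedence would be equally defensible.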

Implementation:

  • Configuration: Introduce a new configuration option in the fscrawler.yml file where users can define custom tags.
  • Tagging Mechanism: Modify the crawling process to include the specified tags in the metadata of each indexed file.
  • Search Integration: Ensure that the custom tags are indexed and searchable within Elasticsearch, allowing users to filter search results based on these tags.
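
The three points above could fit together roughly as in the following Python sketch. It is not FSCrawler's implementation and no such configuration option exists today: the `configured_tags` dictionary stands in for whatever a new setting might hold, `apply_tags` and `tag_filter_query` are hypothetical helpers, and the `external` field name is an assumption. The query is an ordinary Elasticsearch `bool` filter with a `term` clause.

```python
import copy


def _merge(target: dict, source: dict) -> None:
    """Recursively merge source into target; source wins on conflicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def apply_tags(document: dict, tags: dict) -> dict:
    """Return a copy of the document with the configured tags merged in."""
    merged = copy.deepcopy(document)
    _merge(merged, tags)
    return merged


def tag_filter_query(field: str, value: str) -> dict:
    """Build an Elasticsearch query body that filters on one exact tag value."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


if __name__ == "__main__":
    # Assumption: tags a user might declare once in the job configuration.
    configured_tags = {"external": {"project": "apollo", "status": "confidential"}}
    # Simplified stand-in for a document produced while crawling a file.
    crawled = {"file": {"filename": "report.pdf"}, "content": "..."}

    print(apply_tags(crawled, configured_tags))
    print(tag_filter_query("external.status", "confidential"))
```

A `term` filter matches exact values, so it works best when the tag field is mapped as `keyword`; with default dynamic mapping the filter would likely need to target a `.keyword` sub-field instead.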

Benefits:

  • Improved file organization and retrieval.
  • Enhanced search capabilities with custom filters.
  • Greater flexibility in managing and categorizing indexed files.

Conclusion:
Adding the ability to specify additional tags when crawling files would make FSCrawler more usable and more capable. It would give users more control over their indexed data and improve the overall search experience.



@4islam 4islam added the feature_request for feature request label Dec 7, 2024
Owner

dadoonet commented Dec 7, 2024

It's similar to #884, right? Note that FSCrawler already supports tags when documents are uploaded through the REST API.

Author

4islam commented Dec 7, 2024

Yes. The feature is available via the REST API, but it would be really valuable to have it while crawling files as well.
