Commit 64877b0

Authored by: vkehfdl1, bwook00, jeffrey, 홍승우, hongsw
Make AutoRAG to Monorepo (Marker-Inc-Korea#960)
* just commit
* just commit
* add the root directory
* .gitignore in the autorag source folder
* edit github actions
* fix .env .gitignore
* add root .gitignore
* set PYTHONPATH at test.yml
* change the name of the test_base.py
* change the VERSION path at docs/conf.py
* Add api to repository
* Add api to repository
* add autorag at pythonpath
* edit gitignore for tracking projects folder
* add README.md at projects folder for tracking projects folder
* add autorag-frontend as git submodule
* Do not run API test at github actions
* rename: update file path from api/projects/README.md to projects/README.md
* 🚑 fix: Update .gitignore and add .dockerignore and Dockerfile
  Added various entries to ignore specific files and directories in both the root directory's .gitignore and the api directory's .dockerignore. Additionally, included a Dockerfile for building a Python 3.10-slim-based API image with specified dependencies and runtime configurations. A docker-compose.yml file was introduced to define services and networks for frontend and API components.
* 📝 docs: remove AutoRAG Workflow API documentation and related resources.
* ✨ feat: Add description for tutorial_1 project
* 🔧 chore: update .gitignore to exclude .DS_Store
* 🚑 fix: Update project naming convention in README and adjust requirements
  This commit updates the project naming convention in the README file from "AutoRAG API Server" to "AutoRAG-API" for consistency. Additionally, it modifies the version requirement in the `requirements.txt` file for AutoRAG to be greater than or equal to 0.3.8 to ensure compatibility with the latest features.
* 🚑 fix: Update ports and environment variables in docker-compose.yml to use port 5001 instead of 5000
* 🚑 fix: Update schema.py with corrected field indentation and added 'path' field in ParseRequest model
* 🚑 fix: Fix indentation in validate.py for decorator functions.
* 🚑 fix: refactor authentication decorator in auth.py
* 🚑 fix: Correct get_new_trial_dir parameter naming and handle trial directory creation accurately
* 🚑 fix: Corrected import formatting in qa_create.py and standardized function indentation.
* ✨ feat: Add dashboard module to autorag package and implement async parser function
* 🚑 fix: Refactor PandasTrialDB to handle trial operations more efficiently and improve error handling
* move upload file endpoint
* turn evaluate_history.py workable again
* just reformat and edit ignore files
* working with uvicorn now
* Add env variable to locate the project folder and resolve new pydantic version issues (Marker-Inc-Korea#971)
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
* Add env variable endpoints for managing env variable (Marker-Inc-Korea#975)
* add delete endpoint and change to .env based operations
* add api endpoint for gathering all env settings
* load env variable when start each task
* change GET /env to return everything (key & values)
  ---------
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
* upload multiple files at once using key 'files' (Marker-Inc-Korea#981)
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
* [API] fix validate and evaluation api config, set_trial_config Marker-Inc-Korea#984 (Marker-Inc-Korea#987)
* feat: refactor SQL Trial DB from Pandas Trial DB, and Test code
* 🚑 fix: Set correct WORK_DIR based on environment variable
  - Updated the logic in app.py to properly set the `WORK_DIR` based on the environment variable `AUTORAG_API_ENV`. If the environment is 'dev', the `WORK_DIR` will be located at `"../projects"`, otherwise, it will be set to `"projects"`. Additionally, the `.env` file path is now correctly constructed using the determined `WORK_DIR` value.
* 🚑 fix: Update method to use model_validate_json in trial_dict['config'] assignment and update set_trial_config for trial_id with TrialConfig model dump JSON. Add get_all_config_ids and get_all_trial_ids SQL query functions.
* ✨ feat: Add CORS headers and handle OPTIONS requests
  This commit introduces the addition of CORS headers in every response and explicit handling of OPTIONS requests in the API server. Includes setting Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Access-Control-Allow-Headers, and Access-Control-Allow-Methods based on the request origin.
* ✅ test: add test file for project creation with setup and cleanup fixtures, including logging configurations, environment setup, client creation, and project directory validation
* 🚑 fix: Remove unnecessary commented-out properties in Trial class
* 🚑 fix: Set correct WORK_DIR based on environment variable AUTORAG_WORK_DIR
* ♻️ refactor: Update code in app.py and schema.py for better handling of working directory and model configuration. Fix deprecated usage in test_app.py and enhance testing in test_trial_config.py.
* 📝 docs: update README with instructions for running using Docker Compose and monitoring options.
* ✨ feat: start parsing documents task with improved import handling
  This commit introduces changes to the document parsing task initiation. The import statement for `parse_documents` has been updated within the file. Additionally, the logic for initiating the parsing process has been streamlined and improved for better performance and handling of imports.
* ✅ test: add tests for project database operations such as initializing DB, setting/getting trials, updating trial configurations, and retrieving trial information by project or ID.
* ♻️ refactor: Improve database initialization in SQLiteProjectDB
  - Refactored the `_init_db` method to enhance database initialization.
  - Added logging and enhanced debugging statements for better clarity.
  - Now checks for the existence of the database file and its directory before initializing.
  - If the database file does not exist, it creates the necessary directory and tables.
  - Adjusted permissions for directories (777) and the database file (666) accordingly.
* 🚑 fix: correct chunking and parsing tasks in trial_tasks.py
* 🔧 chore: Update imports and debug logging level in app.py
  - Updated import statement in app.py to include chunk_documents from trial_tasks module.
  - Changed the logging level from INFO to DEBUG for more detailed logging information.
* ♻️ refactor: refactor parsing endpoint and improve error handling
  - Refactored the parsing endpoint to handle configuration data retrieval more efficiently.
  - Improved error handling to provide more informative error messages in case of missing data or failed tasks.
* 🚑 fix: Correct chunked data path and task handling in start_chunking function
* ✨ feat: Configure not to use uvloop, apply nest_asyncio, and correct import in app.py
  - Avoid using uvloop by setting asyncio event loop policy to DefaultEventLoopPolicy().
  - Apply nest_asyncio after that to prevent conflicts.
  - Change the import in app.py from `from database.project_db import SQLiteProjectDB` to the correct import.
  refactor: Update Celery configuration in celery_app.py
  - Adjust broker and backend URLs to use 'redis://redis:6379/0'.
  - Modify the timezone to 'Asia/Seoul' for better synchronization.
* 🚑 fix: Install system dependencies and pip, adjust Dockerfile for API service
  - Removed unnecessary comments related to installing pip as it's clear from the command itself
  - Added installation of 'watchfiles', setting PYTHONPATH and PYTHONUNBUFFERED environment variables
  - Created a directory for celery beat schedule and added an entrypoint script
  - Adjusted permissions for the entrypoint script and removed Windows line endings
  - Updated entrypoint to /entrypoint.sh in the API service section
  - Added environment variables for watching files, setting time zone, log level, and disabling Python output buffering
* 🔧 chore: update subproject commit reference in autorag-frontend
* 🔧 chore: add test_projects to .gitignore
* add new lines and fix .env.dev
* fix chunk_documents
  ---------
  Co-authored-by: Seungwoo hong <Seungwoo hong 1100974+hongsw@users.noreply.github.com>
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
* Make the default timezone at the API server to UTC (Marker-Inc-Korea#992)
* Change all datetime.now() to the timezone UTC
* properly working UTC timezone in the API server
  ---------
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
* ✨ feat: Add QA document generation task in trial_tasks.py and schema.py (Marker-Inc-Korea#1005)
* ✨ feat: Add QA document generation task in trial_tasks.py and schema.py
  - Added a new field `qa_task_id` in the Trial schema to store the QA task ID.
  - Introduced `generate_qa_documents` shared task in `trial_tasks.py` for creating QA documents.
  - Updated imports and added `QACreationRequest` in `trial_tasks.py`.
  - Included function `run_qa_creation` in `generate_qa_documents` task for generating QA documents with status tracking and database updates.
* 🚑 fix: Return full trial config in get_trial_config
  Adjusts the return statement in `get_trial_config` to return the complete trial configuration instead of just the model dump.
* 🔧 chore: update subproject commit in autorag-frontend to 1434e797
  ---------
  Co-authored-by: Seungwoo hong <Seungwoo hong 1100974+hongsw@users.noreply.github.com>
* Change the api port to 8000 (Marker-Inc-Korea#1007)
* artifacts/content GET endpoint for sending raw_data files (Marker-Inc-Korea#1008)
* Change the WORK_DIR setting
* send file directly
* Change the API server that qa, chunk, and qa contains to the project_id. (Marker-Inc-Korea#1011)
* get all parsed documents and the parse is not relevant to the trial_id now
* add get chunk list at the API server
* chunk document at project view
* /parse POST with parse_name
* QA creation endpoint
* Working API with SQL DB (Marker-Inc-Korea#1016)
* Refactor start_evaluate api endpoint
* if there is no .env, make one
* make to one api endpoint that retrieve file content /artifacts/content
* add /artifacts/content delete operation to delete the file
* upload korean filenames
* working parse with frontend
* working QA!
* validation back to normal, shout!
* checkpoint (working but no result at evaluation)
* Fix problem that trial_tasks.py cannot load the env
* Finally success!!!! Working evaluate and validate
* API server refactor to celery with report, streamlit, and Quart api server with streaming (Marker-Inc-Korea#1021)
* working running dashboard
* working running and closing report
* working and closing the chat streamlit server
* working and closing the external api server port to 8100
* add ko version at requirements.txt (Marker-Inc-Korea#1026)
* Update frontend
* update for api compatibility
* update autorag-frontend
* Add parsed data get endpoint (Marker-Inc-Korea#1041)
* add parsed file get endpoint
* Add an "all_files" endpoint.
* update AutoRAG version to 0.3.11rc3
* update AutoRAG version to 0.3.12
* update autorag frontend to the latest
* Enable the file extensions (data, html, etc.) (Marker-Inc-Korea#1053)
* change to the dynamic root directory
* enable uploading html and data file extensions
* merge main into Feature/Marker-Inc-Korea#959
* Feature/Marker-Inc-Korea#1080 (Marker-Inc-Korea#1082)
* Add documentation about AutoRAG GUI
* Delete talk with founders
* Change to build the docs
* move embedding package
  ---------
  Co-authored-by: kimbwook <bwook00@naver.com>
  Co-authored-by: jeffrey <vkefhdl1@gmail.com>
  Co-authored-by: 홍승우 <martin@hongseung-uui-MacBookAir.local>
  Co-authored-by: Seungwoo hong <1100974+hongsw@users.noreply.github.com>
  Co-authored-by: Seungwoo hong <Seungwoo hong 1100974+hongsw@users.noreply.github.com>
1 parent dd251e6 commit 64877b0
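
The commit message describes how the API server picks its working directory: `"../projects"` when `AUTORAG_API_ENV` is `dev`, `"projects"` otherwise, with the `.env` file placed inside that directory. A later bullet mentions `AUTORAG_WORK_DIR`. A minimal sketch of that logic follows. The function names are hypothetical, and letting `AUTORAG_WORK_DIR` take precedence is an assumption; neither is taken from the actual `app.py`.

```python
import os


def resolve_work_dir(environ=None):
    """Pick the project working directory from environment variables.

    Hypothetical helper; the precedence of AUTORAG_WORK_DIR is an assumption.
    """
    env = os.environ if environ is None else environ
    explicit = env.get("AUTORAG_WORK_DIR")
    if explicit:
        # Assumed: an explicit directory overrides the dev/prod switch.
        return explicit
    # Described in the commit message: 'dev' maps to ../projects, anything else to projects.
    return "../projects" if env.get("AUTORAG_API_ENV") == "dev" else "projects"


def env_file_path(work_dir):
    """Build the .env path inside the chosen working directory."""
    return os.path.join(work_dir, ".env")


if __name__ == "__main__":
    work_dir = resolve_work_dir({"AUTORAG_API_ENV": "dev"})
    print(work_dir, env_file_path(work_dir))
```

Passing the environment as a plain mapping keeps the helper easy to test without touching the real process environment.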

293 files changed: 5822 additions, 1563 deletions

Only part of this large commit is shown; some file diffs are hidden.

.github/workflows/docker-push.yml

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ on:
   push:
     branches: [ "main" ]
 
+defaults:
+  run:
+    working-directory: ./autorag
+
 env:
   DOCKER_REPO: "autoraghq/autorag"
 

.github/workflows/gpu-docker-push.yml

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ on:
   push:
     branches: [ "main" ]
 
+defaults:
+  run:
+    working-directory: ./autorag
+
 env:
   DOCKER_REPO: "autoraghq/autorag"
 

.github/workflows/publish.yml

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@ on:
     branches:
       - main
 
+defaults:
+  run:
+    working-directory: ./autorag
+
 jobs:
   pypi-publish:
     name: upload release to PyPI

.github/workflows/test.yml

Lines changed: 5 additions & 3 deletions
@@ -40,7 +40,7 @@ jobs:
         sudo apt-get install tesseract-ocr
     - name: Install AutoRAG
       run: |
-        pip install -e '.[ko,dev,parse,ja]'
+        pip install -e './autorag[ko,dev,parse,ja]'
     - name: Install dependencies
       run: |
         pip install -r tests/requirements.txt
@@ -54,6 +54,8 @@ jobs:
         python3 -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')"
     - name: delete tests package
       run: python3 tests/delete_tests.py
-    - name: Run tests
+    - name: Run AutoRAG tests
+      env:
+        PYTHONPATH: ${PYTHONPATH}:./autorag
       run: |
-        python3 -m pytest -o log_cli=true --log-cli-level=INFO -n auto tests/
+        python3 -m pytest -o log_cli=true --log-cli-level=INFO -n auto tests/autorag
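
The `PYTHONPATH` value in the step above can be inspected with a few lines of Python. This is a sketch under one assumption that the diff does not confirm: that the workflow `env:` value reaches the process as a literal string, without shell expansion of `${PYTHONPATH}`. In that case the first entry is just a placeholder-looking directory name, which Python skips if it does not exist, and `./autorag` is the entry that makes the package importable. `split_search_path` is a hypothetical helper name.

```python
def split_search_path(value, sep=":"):
    """Split a PYTHONPATH-style string into its non-empty entries."""
    return [entry for entry in value.split(sep) if entry]


if __name__ == "__main__":
    # The workflow value taken literally (assumed: no shell expansion applied).
    for entry in split_search_path("${PYTHONPATH}:./autorag"):
        print(entry)
```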

.gitignore

Lines changed: 6 additions & 4 deletions
@@ -106,8 +106,10 @@ ipython_config.py
 #pdm.lock
 # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
 # in version control.
-# https://pdm.fming.dev/#use-with-ide
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
 .pdm.toml
+.pdm-python
+.pdm-build/
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
@@ -158,10 +160,10 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
-pytest.ini
 .DS_Store
-projects/tutorial_1
-!projects/tutorial_1/config.yaml
+pytest.ini
+projects
+test_projects
 
 # Visual Studio Code
 .vscode/

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "autorag-frontend"]
+	path = autorag-frontend
+	url = https://github.com/Auto-RAG/autorag-frontend.git
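
The `.gitmodules` entry added here uses git's INI-like config syntax, so tooling can read it with Python's standard `configparser`. The sketch below lists each submodule's path and URL; `read_submodules` is a hypothetical helper, not part of AutoRAG, and it only handles the simple layout shown in this diff.

```python
import configparser

SAMPLE = (
    '[submodule "autorag-frontend"]\n'
    "\tpath = autorag-frontend\n"
    "\turl = https://github.com/Auto-RAG/autorag-frontend.git\n"
)


def read_submodules(gitmodules_text):
    """Return {submodule name: {"path": ..., "url": ...}} from .gitmodules text."""
    # Strip indentation so no line can be mistaken for a continuation line.
    cleaned = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    prefix = 'submodule "'
    result = {}
    for section in parser.sections():
        # Section headers look like: submodule "autorag-frontend"
        if section.startswith(prefix) and section.endswith('"'):
            result[section[len(prefix):-1]] = dict(parser[section])
    return result


if __name__ == "__main__":
    for name, settings in read_submodules(SAMPLE).items():
        print(name, settings["path"], settings["url"])
```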

CNAME

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-docs.auto-rag.com
+docs.auto-rag.com
