Autocrawling Agent

Auto crawling content of website using GPT-4O vision api.

!!! Due to limitation of performance, crawling result can be different compared to original website content. !!!
+@) the openai change of policy, after "gpt-4o-2024-05-13" model, can't be crawling text content of image.

How to use

make ".env" file. ".env" file must have "OPENAI_API_KEY" information.
```
cd autocrawling_agent
vi .env
...    
```

build docker image

docker build -t autocrawling_agent_image .

run docker container

docker run -itd -p 8000:8000 --name autocrawling_agent_api_container autocrawling_agent_image

docker exec to container

docker exec -it autocrawling_agent_api_container bash

install playwright
```
cd /workspace
playwright install
```
start uvicorn server
```
uvicorn src.api.main:app --port 8000
```

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Autocrawling Agent

How to use

About

Releases

Packages

Languages

JminJ/autocrawling_agent

Folders and files

Latest commit

History

Repository files navigation

Autocrawling Agent

How to use

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages