
Want to provide a Docker version #1

Open
zsinba opened this issue May 22, 2024 · 21 comments

Comments

zsinba commented May 22, 2024

Could you use Docker to package the environment and expose the API as a service? That would make it easier to verify and deploy. Thanks for your contribution!

@BUJIDAOVS

+1 plz

@adithya-s-k
Owner

Coming soon !!

zsinba commented May 25, 2024

Great

@adithya-s-k
Owner

@zsinba @BUJIDAOVS I have added Docker support, and along with that, SkyPilot support as well.
Enjoy!

zsinba commented May 31, 2024

@zsinba @BUJIDAOVS I have added Docker support, and along with that, SkyPilot support as well. Enjoy!

Great. I'll try it right away

zsinba commented May 31, 2024

@BUJIDAOVS
INFO:     192.168.1.222:55630 - "POST /convert HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/server.py", line 74, in convert_pdf_to_markdown
    markdown_text, metadata, image_data = parse_pdf_and_return_markdown(await pdf_file.read(), extract_images=extract_images)
  File "/app/server.py", line 17, in parse_pdf_and_return_markdown
    full_text, images, out_meta = convert_single_pdf(pdf_file, model_list)
  File "/app/marker/convert.py", line 65, in convert_single_pdf
    pages, toc = get_text_blocks(
  File "/app/marker/pdf/extract_text.py", line 85, in get_text_blocks
    char_blocks = dictionary_output(doc, page_range=page_range, keep_chars=True)
  File "/usr/local/lib/python3.10/dist-packages/pdftext/extraction.py", line 75, in dictionary_output
    pages = _get_pages(pdf_path, model, page_range, workers=workers)
  File "/usr/local/lib/python3.10/dist-packages/pdftext/extraction.py", line 26, in _get_pages
    pdf_doc = pdfium.PdfDocument(pdf_path)
  File "/usr/local/lib/python3.10/dist-packages/pypdfium2/_helpers/document.py", line 78, in __init__
    self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
  File "/usr/local/lib/python3.10/dist-packages/pypdfium2/_helpers/document.py", line 674, in _open_pdf
    raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'

Sorry, I didn't get a result.

zsinba commented May 31, 2024

if pdf_file:

[screenshot of the surrounding server.py code]
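
For what it's worth, the traceback suggests server.py opens the upload as a pypdfium2 PdfDocument before handing it to marker, which then tries to open it again inside pdftext. Below is a minimal sketch of an endpoint that sidesteps this by giving marker a plain file path instead; the /convert route, the pdf_file parameter, and the convert_single_pdf call come from the traceback, while load_all_models and the response shape are assumptions:

# Hypothetical sketch, not the repository's actual server.py. It writes the
# upload to a temporary file so marker/pdftext receive a plain path rather
# than an already-opened pypdfium2 PdfDocument (the cause of the TypeError).
import tempfile

from fastapi import FastAPI, File, UploadFile
from marker.convert import convert_single_pdf
from marker.models import load_all_models  # assumed model loader

app = FastAPI()
model_list = load_all_models()

@app.post("/convert")
async def convert_pdf_to_markdown(pdf_file: UploadFile = File(...)):
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(await pdf_file.read())
        tmp.flush()
        full_text, images, out_meta = convert_single_pdf(tmp.name, model_list)
    return {"markdown": full_text, "metadata": out_meta}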

@adithya-s-k
Owner

Will test it out and get back to you.

adithya-s-k reopened this May 31, 2024
@adithya-s-k
Owner

I have updated the Docker image.

docker pull savatar101/marker-api:0.2
# If you are running on a GPU
docker run --gpus all -p 8000:8000 savatar101/marker-api:0.2
# Otherwise
docker run -p 8000:8000 savatar101/marker-api:0.2

Let me know if everything works properly.

In the next update, I will make the server more concurrent to handle multiple API requests simultaneously.
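
For anyone verifying the container, here is a quick client-side check from Python; the pdf_file form field name is an assumption based on the parameter visible in the traceback above:

# Quick test of the /convert endpoint. The "pdf_file" field name is an
# assumption; adjust it if the server expects a different form field.
import requests

with open("sample.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/convert",
        files={"pdf_file": ("sample.pdf", f, "application/pdf")},
    )
resp.raise_for_status()
print(resp.json())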

zsinba commented Jun 1, 2024

I have updated the Docker image.

docker pull savatar101/marker-api:0.2
# If you are running on a GPU
docker run --gpus all -p 8000:8000 savatar101/marker-api:0.2
# Otherwise
docker run -p 8000:8000 savatar101/marker-api:0.2

Let me know if everything works properly.

In the next update, I will make the server more concurrent to handle multiple API requests simultaneously.

Thanks a lot. I will try it.

zsinba commented Jun 2, 2024

It's OK now.
[screenshot]

Thank you for sharing.

zsinba commented Jun 2, 2024

I used a 4090 GPU to convert a 17-page PDF (plain text), and it took about 27.37 seconds. Is this time within the normal range?

@BUJIDAOVS

@adithya-s-k
It seems that the model needs to be downloaded from Hugging Face after starting from Docker. Is it possible to directly include the model in the Docker image? Chinese users are unable to download the model from Hugging Face directly.

zsinba commented Jun 2, 2024

@BUJIDAOVS

True, but the current image is already 16 GB, and bundling the models would make it even larger, which is not convenient for later updates.

adithya-s-k commented Jun 2, 2024

I used a 4090 GPU to convert a 17-page PDF (plain text), and it took about 27.37 seconds. Is this time within the normal range?

Yep, it takes about that much time to parse it.
I will soon be adding support for optimised inference to speed up the whole process.

Currently working on it.

@adithya-s-k
Owner

@adithya-s-k It seems that the model needs to be downloaded from Hugging Face after starting from Docker. Is it possible to directly include the model in the Docker image? Chinese users are unable to download the model from Hugging Face directly.

I will create another Docker image with the weights already present as an alternative, but it might be around 20 to 25 GB in size.

zsinba commented Jun 2, 2024

Thanks a lot.

@BUJIDAOVS

@adithya-s-k It seems that the model needs to be downloaded from Hugging Face after starting from Docker. Is it possible to directly include the model in the Docker image? Chinese users are unable to download the model from Hugging Face directly.

I will create another Docker image with the weights already present as an alternative, but it might be around 20 to 25 GB in size.

Thank you, but I have already found a way to avoid re-downloading the models when the container restarts.

version: '3'

services:
  marker-api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    image: savatar101/marker-api:0.2
    volumes:
      - /home/user/Documents/Projects/hf-download/pdf2md/huggingface:/root/.cache/huggingface
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - HF_ENDPOINT=https://hf-mirror.com
    ports:
      - "17915:8000"

This requires copying the downloaded models out of the container (e.g. with docker cp) to the host machine after the first successful startup.
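
A related option is to pre-populate that host cache directory before the first run, so the container never needs to reach Hugging Face at all. Here is a sketch using huggingface_hub, where the repo id is only a placeholder for whatever models marker actually downloads:

# Sketch: pre-download weights into the host directory that the compose file
# mounts at /root/.cache/huggingface, going through the hf-mirror endpoint.
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # set before importing huggingface_hub

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="some-org/some-model",  # hypothetical placeholder
    cache_dir="/home/user/Documents/Projects/hf-download/pdf2md/huggingface",
)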

@liwenlong520

How do I run the Docker container on a Mac M1?

@liwenlong520

Detecting bboxes:   0%|          | 0/4 [00:00<?, ?it/s]
[W NNPACK.cpp:61] Could not initialize NNPACK! Reason: Unsupported hardware.

@yemoutao

(quoting the Hugging Face model-download discussion and the docker-compose workaround above)

You can also bind the hostnames this way:

extra_hosts:
  - "huggingface.co:13.33.174.80"
  - "cdn-lfs.huggingface.co:13.33.174.80"
  - "www.huggingface.co:13.33.174.80"
