Add support for Pyodide #94

Closed
Emasoft opened this issue Apr 3, 2023 · 6 comments

@Emasoft

Emasoft commented Apr 3, 2023

Please add support for Pyodide.
Pyodide was created in 2018 by Michael Droettboom at Mozilla as part of the Iodide project. Iodide is a web-based notebook environment for literate scientific computing and communication.

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly.

What is Pyodide?

Pyodide is a port of CPython to WebAssembly/Emscripten.

Pyodide makes it possible to install and run Python packages in the browser with micropip. Any pure Python package with a wheel available on PyPI is supported. Many packages with C extensions have also been ported for use with Pyodide. These include many general-purpose packages such as regex, PyYAML, and lxml, and scientific Python packages including NumPy, pandas, SciPy, Matplotlib, and scikit-learn.

Pyodide comes with a robust JavaScript ⟺ Python foreign function interface so
that you can freely mix these two languages in your code with minimal friction.
This includes full support for error handling, async/await, and much more.

When used inside a browser, Python has full access to the Web APIs.

Try Pyodide (no installation needed)

Try Pyodide in a REPL directly in your browser. For further information, see the documentation.

Example Projects running in Pyodide

Pydantic-core
Pydantic-core running in browser

@hauntsaninja
Collaborator

Thanks, I'm not currently planning on providing Pyodide wheels. I've linked to this issue in the FAQ (#98); if this proves to be a popular request, I'll reopen it.

@MRYingLEE

+1

@psymbio

psymbio commented Oct 21, 2023

I think the issue has enough support (#98) to be taken up again.

See:
#57
#134
josephrocca/gpt-2-3-tokenizer#2
pyodide/pyodide#3875
pyodide/pyodide#3663
pyodide/pyodide#3543
emscripten-forge/recipes#660

A pure Python version need not be created; only some patches are needed to build the wasm wheel for Pyodide. I have worked on a script for creating the wasm wheel, https://github.com/psymbio/tiktoken_rust_wasm/ (the README describes the steps). For testing, use this wheel, host index.html, and view the error in the console. Currently, there's an issue with the "requests" library, "ImportError: Can't connect to HTTPS URL because the SSL module is not available.", because the SSL module is not supported by Pyodide.

@Emasoft
Author

Emasoft commented Dec 10, 2023

There are already two pure Python implementations of the tokenizer:

In the educational version:
https://github.com/openai/tiktoken/blob/main/tiktoken/_educational.py
In this fork, courtesy of @kechan:
https://github.com/kechan/tiktoken
As discussed here: #36
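A pure Python tokenizer of this kind boils down to byte-pair encoding (BPE) over a table of mergeable ranks. As a rough illustration, here is a minimal sketch assuming a tiny hypothetical merge table (`TOY_RANKS`) invented for this example. It is not tiktoken's code: real encodings such as cl100k_base use a far larger learned table, and tiktoken also pre-splits text with a regex before merging, which is omitted here.

```python
# Minimal byte-level BPE sketch. TOY_RANKS is a hypothetical, tiny merge
# table made up for illustration; it is not a real tiktoken encoding.

# Every single byte gets a base rank equal to its value (0..255).
TOY_RANKS: dict[bytes, int] = {bytes([i]): i for i in range(256)}
# Hypothetical learned merges; a lower rank means the pair is merged earlier.
for token in [b"he", b"ll", b"hell", b"hello", b" w", b"or", b"ld", b" wor", b" world"]:
    TOY_RANKS[token] = len(TOY_RANKS)

# Reverse lookup used for decoding.
ID_TO_BYTES = {rank: token for token, rank in TOY_RANKS.items()}


def bpe_encode(data: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Repeatedly merge the adjacent pair whose concatenation has the lowest rank."""
    parts = [bytes([b]) for b in data]
    while True:
        best_idx, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:  # no adjacent pair can be merged any further
            break
        parts[best_idx : best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[part] for part in parts]


def encode(text: str) -> list[int]:
    return bpe_encode(text.encode("utf-8"), TOY_RANKS)


def decode(ids: list[int]) -> str:
    return b"".join(ID_TO_BYTES[i] for i in ids).decode("utf-8")


tokens = encode("hello world")
print(tokens)  # [259, 264]
assert decode(tokens) == "hello world"
```

The round trip checked at the end is the same property the `assert` lines in the Pyodide snippet in this thread check against the real encodings; speed, not correctness, is the main cost of doing this in pure Python.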

@hauntsaninja Since everything is in place, when are you going to add the Pyodide-compatible version of tiktoken? We really need it.

@psymbio

psymbio commented Dec 10, 2023

It's not fully complete: gpt2 isn't done yet, while gpt4, cl100k_base, p50k, and r50k are. I'll probably look into gpt2 this week. The compilation is pretty time-consuming and is documented here (this will need cleaning up as well).

But for the time being you can use the following:

# Run inside Pyodide (e.g. its REPL), where top-level await is supported.
import micropip
# keep_going=True makes micropip report every dependency it cannot resolve
# instead of stopping at the first one.
await micropip.install("https://raw.githubusercontent.com/psymbio/pyodide_wheels/main/tiktoken/tiktoken-0.5.1-cp311-cp311-emscripten_3_1_45_wasm32.whl", keep_going=True)
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"
enc = tiktoken.encoding_for_model("gpt-4")
assert enc.decode(enc.encode("hello world")) == "hello world"
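The wheel URL in the snippet above shows why such builds are tied to a particular environment: per the wheel filename convention (PEP 427), the name encodes a Python tag (`cp311`), an ABI tag, and a platform tag (`emscripten_3_1_45_wasm32`, i.e. Emscripten 3.1.45 on wasm32). Such a wheel generally only installs in a Pyodide build with a matching CPython version and Emscripten toolchain. Below is a minimal sketch of splitting those tags out, assuming a standard filename with at most one optional build tag; `WheelTags` and `parse_wheel_filename` here are hypothetical helpers, and for real code `packaging.utils.parse_wheel_filename` handles this more thoroughly.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    # Hypothetical container for the components of a wheel filename.
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(url_or_name: str) -> WheelTags:
    """Split a wheel filename (or a URL ending in one) into its PEP 427 components."""
    filename = url_or_name.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:  # name-version-build-python-abi-platform
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:  # name-version-python-abi-platform
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(name, version, python_tag, abi_tag, platform_tag)


url = ("https://raw.githubusercontent.com/psymbio/pyodide_wheels/main/tiktoken/"
       "tiktoken-0.5.1-cp311-cp311-emscripten_3_1_45_wasm32.whl")
print(parse_wheel_filename(url))
```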

@hauntsaninja can you look at the feasibility of a PR here?

@maartenbreddels

I compiled version 0.7 for wasm/Pyodide and put it at https://py.cafe/maartenbreddels/tiktoken-demo

Note that you can right-click the file in the file browser, copy the public URL to your clipboard, and use it in other projects for your requirements.txt.

At py.cafe you don't have CORS issues, so we did not use the trick from @psymbio for version 0.7 (the 0.5 build is similar to his).


5 participants