diff --git a/README.md b/README.md
index 8e011e1..0def946 100644
--- a/README.md
+++ b/README.md
@@ -15,10 +15,11 @@
 # Gemini Icon Gemini-API
 
-A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gemini.google.com) web chat (formerly Bard).
+A reverse-engineered asynchronous Python wrapper for [Google Gemini](https://gemini.google.com) web app (formerly Bard).
 
 ## Features
 
+- **(WIP) Auto Cookie Management**
 - **ImageFx Support** - Supports retrieving images generated by ImageFx, Google's latest AI image generator.
 - **Extension Support** - Supports generating contents with [Gemini extensions](https://gemini.google.com/extensions) on, like YouTube and Gmail.
 - **Classified Outputs** - Auto categorizes texts, web images and AI generated images from the response.
@@ -33,7 +34,8 @@ A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gem
   - [Authentication](#authentication)
 - [Usage](#usage)
   - [Initialization](#initialization)
-  - [Generate contents from text inputs](#generate-contents-from-text-inputs)
+  - [Generate contents from text](#generate-contents-from-text)
+  - [Generate contents from image](#generate-contents-from-image)
   - [Conversations across multiple turns](#conversations-across-multiple-turns)
   - [Retrieve images in response](#retrieve-images-in-response)
   - [Generate images with ImageFx](#generate-images-with-imagefx)
@@ -56,6 +58,7 @@ pip install gemini_webapi
 - Click any request and copy cookie values of `__Secure-1PSID` and `__Secure-1PSIDTS`
 
 > [!TIP]
+>
 > `__Secure-1PSIDTS` could get expired frequently if is kept opened in the browser after copying cookies. It's recommended to get cookies from a separate session (e.g. a new login in browser's private mode) if you are building a keep-alive service with this package.
 >
 > For more details, please refer to discussions in [issue #6](https://github.com/HanaokaYuzu/Gemini-API/issues/6)
@@ -82,9 +85,10 @@ asyncio.run(main())
 ```
 
 > [!TIP]
+>
 > `auto_close` and `close_delay` are optional arguments for automatically closing the client after a certain period of inactivity. This feature is disabled by default. In a keep-alive service like chatbot, it's recommended to set `auto_close` to `True` combined with reasonable seconds of `close_delay` for better resource management.
 
-### Generate contents from text inputs
+### Generate contents from text
 
 Ask a one-turn quick question by calling `GeminiClient.generate_content`.
@@ -97,8 +101,21 @@ asyncio.run(main())
 ```
 
 > [!TIP]
+>
 > Simply use `print(response)` to get the same output if you just want to see the response text
 
+### Generate contents from image
+
+Gemini supports image recognition and can generate contents from an image (currently only one image at a time is supported). You can optionally pass image data as `bytes`, or its file path as a `str`, to `GeminiClient.generate_content` together with the text prompt.
+
+```python
+async def main():
+    response = await client.generate_content("Describe the image", image="assets/banner.png")
+    print(response.text)
+
+asyncio.run(main())
+```
+
 ### Conversations across multiple turns
 
 If you want to keep conversation continuous, please use `GeminiClient.start_chat` to create a `ChatSession` object and send messages through it. The conversation history will be automatically handled and get updated after each turn.
@@ -113,6 +130,10 @@ async def main():
 asyncio.run(main())
 ```
 
+> [!TIP]
+>
+> Like `GeminiClient.generate_content`, `ChatSession.send_message` also accepts `image` as an optional argument.
+
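For reference, the flow this new TIP describes looks like the sketch below; it mirrors `test_chatsession_with_image` added further down in this diff, and assumes an initialized `client` and an imported `asyncio`, as in the README's own snippets.

```python
async def main():
    chat = client.start_chat()
    # The first turn carries the image; follow-up turns refer back to it via the chat history.
    response1 = await chat.send_message("Describe the image", image="assets/banner.png")
    print(response1.text)
    response2 = await chat.send_message("Tell me more about it.")
    print(response2.text)

asyncio.run(main())
```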
 ### Retrieve images in response
 
 Images in the API's output are stored as a list of `Image` objects. You can access the image title, URL, and description by calling `image.title`, `image.url` and `image.alt` respectively.
@@ -131,6 +152,7 @@ asyncio.run(main())
 ```
 
 In February 2022, Google introduced a new AI image generator called ImageFx and integrated it into Gemini. You can ask Gemini to generate images with ImageFx simply by natural language.
 
 > [!IMPORTANT]
+>
 > Google has some limitations on the image generation feature in Gemini, so its availability could be different per region/account. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/14286560) (as of February 15th, 2024):
 >
 > > Image generation in Gemini Apps is available in most countries, except in the European Economic Area (EEA), Switzerland, and the UK. It’s only available for **English prompts**.
@@ -149,6 +171,7 @@ asyncio.run(main())
 ```
 
 > [!NOTE]
+>
 > by default, when asked to send images (like the previous example), Gemini will send images fetched from web instead of generating images with AI model, unless you specifically require to "generate" images in your prompt. In this package, web images and generated images are treated differently as `WebImage` and `GeneratedImage`, and will be automatically categorized in the output.
 
 ### Save images to local files
@@ -167,6 +190,7 @@ asyncio.run(main())
 ### Generate contents with Gemini extensions
 
 > [!IMPORTANT]
+>
 > To access Gemini extensions in API, you must activate them on the [Gemini website](https://gemini.google.com/extensions) first. Same as image generation, Google also has limitations on the availability of Gemini extensions. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/13695044) (as of February 18th, 2024):
 >
 > > To use extensions in Gemini Apps:
@@ -191,6 +215,7 @@ asyncio.run(main())
 ```
 
 > [!NOTE]
+>
 > For the available regions limitation, it actually only requires your Google account's **preferred language** to be set to one of the three supported languages listed above. You can change your language settings [here](https://myaccount.google.com/language).
 
 ### Check and switch to other reply candidates
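One detail the new "Generate contents from image" section mentions but does not show: `image` also accepts raw `bytes`, not just a file path. A minimal sketch of that variant, under the same assumptions as the README snippets (initialized `client`, imported `asyncio`):

```python
async def main():
    # Read the image yourself and hand the raw bytes to the client.
    with open("assets/banner.png", "rb") as f:
        image_bytes = f.read()
    response = await client.generate_content("Describe the image", image=image_bytes)
    print(response.text)

asyncio.run(main())
```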
""" __slots__ = [ @@ -65,12 +66,12 @@ def __init__( ): self.cookies = {"__Secure-1PSID": secure_1psid} self.proxy = proxy - self.client: AsyncClient | None = None - self.access_token: Optional[str] = None + self.client: AsyncClient = None + self.access_token: str = None self.running: bool = False self.auto_close: bool = False self.close_delay: float = 300 - self.close_task: Task | None = None + self.close_task: Task = None if secure_1psidts: self.cookies["__Secure-1PSIDTS"] = secure_1psidts @@ -84,12 +85,12 @@ async def init( Parameters ---------- timeout: `float`, optional - Request timeout of the client in seconds. Used to limit the max waiting time when sending a request + Request timeout of the client in seconds. Used to limit the max waiting time when sending a request. auto_close: `bool`, optional If `True`, the client will close connections and clear resource usage after a certain period - of inactivity. Useful for keep-alive services + of inactivity. Useful for keep-alive services. close_delay: `float`, optional - Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True` + Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True`. """ try: self.client = AsyncClient( @@ -132,7 +133,7 @@ async def close(self, delay: float = 0) -> None: Parameters ---------- delay: `float`, optional - Time to wait before closing the client in seconds + Time to wait before closing the client in seconds. """ if delay: await asyncio.sleep(delay) @@ -155,7 +156,10 @@ async def reset_close_task(self) -> None: @running async def generate_content( - self, prompt: str, chat: Optional["ChatSession"] = None + self, + prompt: str, + image: Optional[bytes | str] = None, + chat: Optional["ChatSession"] = None, ) -> ModelOutput: """ Generates contents with prompt. @@ -163,15 +167,17 @@ async def generate_content( Parameters ---------- prompt: `str` - Prompt provided by user + Prompt provided by user. + image: `bytes` | `str`, optional + File data in bytes, or path to the image file to be sent together with the prompt. chat: `ChatSession`, optional - Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request + Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request. Returns ------- :class:`ModelOutput` Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list - of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output + of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output. """ assert prompt, "Prompt cannot be empty." @@ -184,7 +190,23 @@ async def generate_content( data={ "at": self.access_token, "f.req": json.dumps( - [None, json.dumps([[prompt], None, chat and chat.metadata])] + [ + None, + json.dumps( + [ + image + and [ + prompt, + 0, + None, + [[[await upload_file(image), 1]]], + ] + or [prompt], + None, + chat and chat.metadata, + ] + ), + ] ), }, ) @@ -280,7 +302,7 @@ def start_chat(self, **kwargs) -> "ChatSession": Returns ------- :class:`ChatSession` - Empty chat object for retrieving conversation history + Empty chat object for retrieving conversation history. 
""" return ChatSession(geminiclient=self, **kwargs) @@ -292,15 +314,15 @@ class ChatSession: Parameters ---------- geminiclient: `GeminiClient` - Async httpx client interface for gemini.google.com + Async httpx client interface for gemini.google.com. metadata: `list[str]`, optional - List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only + List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only. cid: `str`, optional - Chat id, if provided together with metadata, will override the first value in it + Chat id, if provided together with metadata, will override the first value in it. rid: `str`, optional - Reply id, if provided together with metadata, will override the second value in it + Reply id, if provided together with metadata, will override the second value in it. rcid: `str`, optional - Reply candidate id, if provided together with metadata, will override the third value in it + Reply candidate id, if provided together with metadata, will override the third value in it. """ # @properties needn't have their slots pre-defined @@ -339,23 +361,29 @@ def __setattr__(self, name: str, value: Any) -> None: self.metadata = value.metadata self.rcid = value.rcid - async def send_message(self, prompt: str) -> ModelOutput: + async def send_message( + self, prompt: str, image: Optional[bytes | str] = None + ) -> ModelOutput: """ Generates contents with prompt. - Use as a shortcut for `GeminiClient.generate_content(prompt, self)`. + Use as a shortcut for `GeminiClient.generate_content(prompt, image, self)`. Parameters ---------- prompt: `str` - Prompt provided by user + Prompt provided by user. + image: `bytes` | `str`, optional + File data in bytes, or path to the image file to be sent together with the prompt. Returns ------- :class:`ModelOutput` Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list - of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output + of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output. """ - return await self.geminiclient.generate_content(prompt, self) + return await self.geminiclient.generate_content( + prompt=prompt, image=image, chat=self + ) def choose_candidate(self, index: int) -> ModelOutput: """ @@ -364,7 +392,12 @@ def choose_candidate(self, index: int) -> ModelOutput: Parameters ---------- index: `int` - Index of the candidate to choose, starting from 0 + Index of the candidate to choose, starting from 0. + + Returns + ------- + :class:`ModelOutput` + Output data of the chosen candidate. 
""" if not self.last_output: raise ValueError("No previous output data found in this chat session.") diff --git a/src/gemini_webapi/constant.py b/src/gemini_webapi/constant.py index 13c9f65..cd08599 100644 --- a/src/gemini_webapi/constant.py +++ b/src/gemini_webapi/constant.py @@ -6,3 +6,5 @@ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", "X-Same-Domain": "1", } + +UPLOAD_PUSHID = "feeds/mcudyrk2a4khkz" diff --git a/src/gemini_webapi/utils.py b/src/gemini_webapi/utils.py new file mode 100644 index 0000000..aee55b5 --- /dev/null +++ b/src/gemini_webapi/utils.py @@ -0,0 +1,41 @@ +from httpx import AsyncClient +from pydantic import validate_call + +from .constant import UPLOAD_PUSHID + + +@validate_call +async def upload_file(file: bytes | str) -> str: + """ + Upload a file to Google's server and return its identifier. + + Parameters + ---------- + file : `bytes` | `str` + File data in bytes, or path to the file to be uploaded. + + Returns + ------- + `str` + Identifier of the uploaded file. + E.g. "/contrib_service/ttl_1d/1709764705i7wdlyx3mdzndme3a767pluckv4flj" + + Raises + ------ + `httpx.HTTPStatusError` + If the upload request failed. + """ + + if isinstance(file, str): + with open(file, "rb") as f: + file = f.read() + + async with AsyncClient() as client: + response = await client.post( + url="https://content-push.googleapis.com/upload/", + headers={"Push-ID": UPLOAD_PUSHID}, + files={"file": file}, + follow_redirects=True, + ) + response.raise_for_status() + return response.text diff --git a/tests/test_client_features.py b/tests/test_client_features.py index 4d211e2..68835f6 100644 --- a/tests/test_client_features.py +++ b/tests/test_client_features.py @@ -17,10 +17,20 @@ async def asyncSetUp(self): except AuthError: self.skipTest("Test was skipped due to invalid cookies") + @logger.catch(reraise=True) async def test_successful_request(self): response = await self.geminiclient.generate_content("Hello World!") self.assertTrue(response.text) + @logger.catch(reraise=True) + async def test_upload_image(self): + response = await self.geminiclient.generate_content( + "Describe the image", image="assets/banner.png" + ) + self.assertTrue(response.text) + logger.debug(response.text) + + @logger.catch(reraise=True) async def test_continuous_conversation(self): chat = self.geminiclient.start_chat() response1 = await chat.send_message("Briefly introduce Europe") @@ -30,6 +40,19 @@ async def test_continuous_conversation(self): self.assertTrue(response2.text) logger.debug(response2.text) + @logger.catch(reraise=True) + async def test_chatsession_with_image(self): + chat = self.geminiclient.start_chat() + response1 = await chat.send_message( + "Describe the image", image="assets/banner.png" + ) + self.assertTrue(response1.text) + logger.debug(response1.text) + response2 = await chat.send_message("Tell me more about it.") + self.assertTrue(response2.text) + logger.debug(response2.text) + + @logger.catch(reraise=True) async def test_send_web_image(self): response = await self.geminiclient.generate_content( "Send me some pictures of cats" @@ -39,6 +62,7 @@ async def test_send_web_image(self): self.assertTrue(image.url) logger.debug(image) + @logger.catch(reraise=True) async def test_ai_image_generation(self): response = await self.geminiclient.generate_content( "Generate some pictures of cats" @@ -48,6 +72,7 @@ async def test_ai_image_generation(self): self.assertTrue(image.url) logger.debug(image) + 
diff --git a/tests/test_client_features.py b/tests/test_client_features.py
index 4d211e2..68835f6 100644
--- a/tests/test_client_features.py
+++ b/tests/test_client_features.py
@@ -17,10 +17,20 @@ async def asyncSetUp(self):
         except AuthError:
             self.skipTest("Test was skipped due to invalid cookies")
 
+    @logger.catch(reraise=True)
     async def test_successful_request(self):
         response = await self.geminiclient.generate_content("Hello World!")
         self.assertTrue(response.text)
 
+    @logger.catch(reraise=True)
+    async def test_upload_image(self):
+        response = await self.geminiclient.generate_content(
+            "Describe the image", image="assets/banner.png"
+        )
+        self.assertTrue(response.text)
+        logger.debug(response.text)
+
+    @logger.catch(reraise=True)
     async def test_continuous_conversation(self):
         chat = self.geminiclient.start_chat()
         response1 = await chat.send_message("Briefly introduce Europe")
@@ -30,6 +40,19 @@
         self.assertTrue(response2.text)
         logger.debug(response2.text)
 
+    @logger.catch(reraise=True)
+    async def test_chatsession_with_image(self):
+        chat = self.geminiclient.start_chat()
+        response1 = await chat.send_message(
+            "Describe the image", image="assets/banner.png"
+        )
+        self.assertTrue(response1.text)
+        logger.debug(response1.text)
+        response2 = await chat.send_message("Tell me more about it.")
+        self.assertTrue(response2.text)
+        logger.debug(response2.text)
+
+    @logger.catch(reraise=True)
     async def test_send_web_image(self):
         response = await self.geminiclient.generate_content(
             "Send me some pictures of cats"
         )
@@ -39,6 +62,7 @@
             self.assertTrue(image.url)
             logger.debug(image)
 
+    @logger.catch(reraise=True)
     async def test_ai_image_generation(self):
         response = await self.geminiclient.generate_content(
             "Generate some pictures of cats"
         )
@@ -48,6 +72,7 @@
             self.assertTrue(image.url)
             logger.debug(image)
 
+    @logger.catch(reraise=True)
     async def test_extension_google_workspace(self):
         response = await self.geminiclient.generate_content(
             "@Gmail What's the latest message in my mailbox?"
         )
@@ -55,6 +80,7 @@
         self.assertTrue(response.text)
         logger.debug(response)
 
+    @logger.catch(reraise=True)
     async def test_extension_youtube(self):
         response = await self.geminiclient.generate_content(
             "@Youtube What's the lastest activity of Taylor Swift?"
         )
@@ -62,11 +88,10 @@
         self.assertTrue(response.text)
         logger.debug(response)
 
+    @logger.catch(reraise=True)
     async def test_reply_candidates(self):
         chat = self.geminiclient.start_chat()
-        response = await chat.send_message(
-            "Recommend a science fiction book for me."
-        )
+        response = await chat.send_message("Recommend a science fiction book for me.")
         if len(response.candidates) == 1:
             logger.debug(response.candidates[0])
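For completeness, a sketch of how the new `image` argument fits into the keep-alive setup recommended by the README TIP and the `init` docstring. It assumes the package exports `GeminiClient` at the top level, as the README's initialization snippet does; the cookie values are placeholders and the numbers are only examples.

```python
import asyncio

from gemini_webapi import GeminiClient


async def main():
    # Placeholder cookie values; see the Authentication section of the README.
    client = GeminiClient("__Secure-1PSID_value", "__Secure-1PSIDTS_value")
    # auto_close/close_delay are the init() options documented in this diff:
    # the client releases its connections after 300 seconds of inactivity.
    await client.init(timeout=30, auto_close=True, close_delay=300)

    response = await client.generate_content("Describe the image", image="assets/banner.png")
    print(response.text)

asyncio.run(main())
```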