feat: add support to upload image and generate contents from image

HanaokaYuzu · Mar 7, 2024 · d2274c2
1 parent 6c48d50
commit d2274c2
Showing 5 changed files with 160 additions and 34 deletions.
diff --git a/README.md b/README.md
@@ -15,10 +15,11 @@
 
 # <img src="assets/logo.svg" width="35px" alt="Gemini Icon" /> Gemini-API
 
-A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gemini.google.com) web chat (formerly Bard).
+A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gemini.google.com) web app (formerly Bard).
 
 ## Features
 
+- **(WIP) Auto Cookie Management**
 - **ImageFx Support** - Supports retrieving images generated by ImageFx, Google's latest AI image generator.
 - **Extension Support** - Supports generating contents with [Gemini extensions](https://gemini.google.com/extensions) on, like YouTube and Gmail.
 - **Classified Outputs** - Auto categorizes texts, web images and AI generated images from the response.
@@ -33,7 +34,8 @@ A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gem
 - [Authentication](#authentication)
 - [Usage](#usage)
  - [Initialization](#initialization)
- - [Generate contents from text inputs](#generate-contents-from-text-inputs)
+ - [Generate contents from text](#generate-contents-from-text)
+ - [Generate contents from image](#generate-contents-from-image)
  - [Conversations across multiple turns](#conversations-across-multiple-turns)
  - [Retrieve images in response](#retrieve-images-in-response)
  - [Generate images with ImageFx](#generate-images-with-imagefx)
@@ -56,6 +58,7 @@ pip install gemini_webapi
 - Click any request and copy cookie values of `__Secure-1PSID` and `__Secure-1PSIDTS`
 
 > [!TIP]
+>
 > `__Secure-1PSIDTS` may expire frequently if <https://gemini.google.com> is kept open in the browser after copying cookies. It's recommended to get cookies from a separate session (e.g. a new login in the browser's private mode) if you are building a keep-alive service with this package.
 >
 > For more details, please refer to discussions in [issue #6](https://github.com/HanaokaYuzu/Gemini-API/issues/6)
@@ -82,9 +85,10 @@ asyncio.run(main())
 ```
 
 > [!TIP]
+>
 > `auto_close` and `close_delay` are optional arguments for automatically closing the client after a certain period of inactivity. This feature is disabled by default. In a keep-alive service like a chatbot, it's recommended to set `auto_close` to `True` together with a reasonable `close_delay` (in seconds) for better resource management.
 
-### Generate contents from text inputs
+### Generate contents from text
 
 Ask a one-turn quick question by calling `GeminiClient.generate_content`.
 
@@ -97,8 +101,21 @@ asyncio.run(main())
 ```
 
 > [!TIP]
+>
 > Simply use `print(response)` to get the same output if you just want to see the response text
 
+### Generate contents from image
+
+Gemini supports image recognition and can generate contents from an image (currently only one image at a time). Optionally, you can pass image data as `bytes` or a file path as `str` to `GeminiClient.generate_content` together with the text prompt.
+
+```python
+async def main():
+    response = await client.generate_content("Describe the image", image="assets/banner.png")
+    print(response.text)
+
+asyncio.run(main())
+```
+
 ### Conversations across multiple turns
 
 If you want to keep conversation continuous, please use `GeminiClient.start_chat` to create a `ChatSession` object and send messages through it. The conversation history will be automatically handled and get updated after each turn.
@@ -113,6 +130,10 @@ async def main():
 asyncio.run(main())
 ```
 
+> [!TIP]
+>
+> Like `GeminiClient.generate_content`, `ChatSession.send_message` also accepts `image` as an optional argument.
+
 ### Retrieve images in response
 
 Images in the API's output are stored as a list of `Image` objects. You can access the image title, URL, and description by calling `image.title`, `image.url` and `image.alt` respectively.
@@ -131,6 +152,7 @@ asyncio.run(main())
 In February 2024, Google introduced a new AI image generator called ImageFx and integrated it into Gemini. You can ask Gemini to generate images with ImageFx simply by asking in natural language.
 
 > [!IMPORTANT]
+>
 > Google has some limitations on the image generation feature in Gemini, so its availability could be different per region/account. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/14286560) (as of February 15th, 2024):
 >
 > > Image generation in Gemini Apps is available in most countries, except in the European Economic Area (EEA), Switzerland, and the UK. It’s only available for **English prompts**.
@@ -149,6 +171,7 @@ asyncio.run(main())
 ```
 
 > [!NOTE]
+>
 > By default, when asked to send images (like the previous example), Gemini will send images fetched from the web instead of generating images with an AI model, unless you specifically ask it to "generate" images in your prompt. In this package, web images and generated images are treated differently as `WebImage` and `GeneratedImage`, and will be automatically categorized in the output.
 
 ### Save images to local files
@@ -167,6 +190,7 @@ asyncio.run(main())
 ### Generate contents with Gemini extensions
 
 > [!IMPORTANT]
+>
 > To access Gemini extensions in API, you must activate them on the [Gemini website](https://gemini.google.com/extensions) first. Same as image generation, Google also has limitations on the availability of Gemini extensions. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/13695044) (as of February 18th, 2024):
 >
 > > To use extensions in Gemini Apps:
@@ -191,6 +215,7 @@ asyncio.run(main())
 ```
 
 > [!NOTE]
+>
 > For the available regions limitation, it actually only requires your Google account's **preferred language** to be set to one of the three supported languages listed above. You can change your language settings [here](https://myaccount.google.com/language).
 
 ### Check and switch to other reply candidates

diff --git a/src/gemini_webapi/client.py b/src/gemini_webapi/client.py
@@ -9,6 +9,7 @@
 
 from .types import WebImage, GeneratedImage, Candidate, ModelOutput
 from .exceptions import APIError, AuthError, TimeoutError, GeminiError
+from .utils import upload_file
 from .constant import HEADERS
 
 
@@ -34,16 +35,16 @@ async def wrapper(self: "GeminiClient", *args, **kwargs):
 
 class GeminiClient:
  """
- Async httpx client interface for gemini.google.com
+ Async httpx client interface for gemini.google.com.
 
  Parameters
  ----------
  secure_1psid: `str`
- __Secure-1PSID cookie value
+ __Secure-1PSID cookie value.
  secure_1psidts: `str`, optional
- __Secure-1PSIDTS cookie value, some google accounts don't require this value, provide only if it's in the cookie list
+ __Secure-1PSIDTS cookie value, some google accounts don't require this value, provide only if it's in the cookie list.
  proxy: `dict`, optional
- Dict of proxies
+ Dict of proxies.
  """
 
  __slots__ = [
@@ -65,12 +66,12 @@ def __init__(
  ):
  self.cookies = {"__Secure-1PSID": secure_1psid}
  self.proxy = proxy
- self.client: AsyncClient | None = None
- self.access_token: Optional[str] = None
+ self.client: AsyncClient = None
+ self.access_token: str = None
  self.running: bool = False
  self.auto_close: bool = False
  self.close_delay: float = 300
- self.close_task: Task | None = None
+ self.close_task: Task = None
 
  if secure_1psidts:
  self.cookies["__Secure-1PSIDTS"] = secure_1psidts
@@ -84,12 +85,12 @@ async def init(
  Parameters
  ----------
  timeout: `float`, optional
- Request timeout of the client in seconds. Used to limit the max waiting time when sending a request
+ Request timeout of the client in seconds. Used to limit the max waiting time when sending a request.
  auto_close: `bool`, optional
  If `True`, the client will close connections and clear resource usage after a certain period
- of inactivity. Useful for keep-alive services
+ of inactivity. Useful for keep-alive services.
  close_delay: `float`, optional
- Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True`
+ Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True`.
  """
  try:
  self.client = AsyncClient(
@@ -132,7 +133,7 @@ async def close(self, delay: float = 0) -> None:
  Parameters
  ----------
  delay: `float`, optional
- Time to wait before closing the client in seconds
+ Time to wait before closing the client in seconds.
  """
  if delay:
  await asyncio.sleep(delay)
@@ -155,23 +156,28 @@ async def reset_close_task(self) -> None:
 
  @running
  async def generate_content(
- self, prompt: str, chat: Optional["ChatSession"] = None
+ self,
+ prompt: str,
+ image: Optional[bytes | str] = None,
+ chat: Optional["ChatSession"] = None,
  ) -> ModelOutput:
  """
  Generates contents with prompt.
 
  Parameters
  ----------
  prompt: `str`
- Prompt provided by user
+ Prompt provided by user.
+ image: `bytes` | `str`, optional
+ File data in bytes, or path to the image file to be sent together with the prompt.
  chat: `ChatSession`, optional
- Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request
+ Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request.
 
  Returns
  -------
  :class:`ModelOutput`
  Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list
- of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output
+ of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output.
  """
  assert prompt, "Prompt cannot be empty."
 
@@ -184,7 +190,23 @@ async def generate_content(
  data={
  "at": self.access_token,
  "f.req": json.dumps(
- [None, json.dumps([[prompt], None, chat and chat.metadata])]
+ [
+ None,
+ json.dumps(
+ [
+ image
+ and [
+ prompt,
+ 0,
+ None,
+ [[[await upload_file(image), 1]]],
+ ]
+ or [prompt],
+ None,
+ chat and chat.metadata,
+ ]
+ ),
+ ]
  ),
  },
  )
@@ -280,7 +302,7 @@ def start_chat(self, **kwargs) -> "ChatSession":
  Returns
  -------
  :class:`ChatSession`
- Empty chat object for retrieving conversation history
+ Empty chat object for retrieving conversation history.
  """
  return ChatSession(geminiclient=self, **kwargs)
 
@@ -292,15 +314,15 @@ class ChatSession:
  Parameters
  ----------
  geminiclient: `GeminiClient`
- Async httpx client interface for gemini.google.com
+ Async httpx client interface for gemini.google.com.
  metadata: `list[str]`, optional
- List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only
+ List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only.
  cid: `str`, optional
- Chat id, if provided together with metadata, will override the first value in it
+ Chat id, if provided together with metadata, will override the first value in it.
  rid: `str`, optional
- Reply id, if provided together with metadata, will override the second value in it
+ Reply id, if provided together with metadata, will override the second value in it.
  rcid: `str`, optional
- Reply candidate id, if provided together with metadata, will override the third value in it
+ Reply candidate id, if provided together with metadata, will override the third value in it.
  """
 
  # @properties needn't have their slots pre-defined
@@ -339,23 +361,29 @@ def __setattr__(self, name: str, value: Any) -> None:
  self.metadata = value.metadata
  self.rcid = value.rcid
 
- async def send_message(self, prompt: str) -> ModelOutput:
+ async def send_message(
+ self, prompt: str, image: Optional[bytes | str] = None
+ ) -> ModelOutput:
  """
  Generates contents with prompt.
- Use as a shortcut for `GeminiClient.generate_content(prompt, self)`.
+ Use as a shortcut for `GeminiClient.generate_content(prompt, image, self)`.
 
  Parameters
  ----------
  prompt: `str`
- Prompt provided by user
+ Prompt provided by user.
+ image: `bytes` | `str`, optional
+ File data in bytes, or path to the image file to be sent together with the prompt.
 
  Returns
  -------
  :class:`ModelOutput`
  Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list
- of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output
+ of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output.
  """
- return await self.geminiclient.generate_content(prompt, self)
+ return await self.geminiclient.generate_content(
+ prompt=prompt, image=image, chat=self
+ )
 
  def choose_candidate(self, index: int) -> ModelOutput:
  """
@@ -364,7 +392,12 @@ def choose_candidate(self, index: int) -> ModelOutput:
  Parameters
  ----------
  index: `int`
- Index of the candidate to choose, starting from 0
+ Index of the candidate to choose, starting from 0.
+
+ Returns
+ -------
+ :class:`ModelOutput`
+ Output data of the chosen candidate.
  """
  if not self.last_output:
  raise ValueError("No previous output data found in this chat session.")
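
The nested list passed to `json.dumps` in the `generate_content` hunk above can be read as a small pure function. The following is a minimal sketch, not code from this commit: `build_f_req`, `image_id`, and `metadata` are hypothetical names, with `image_id` standing in for the string returned by `upload_file` and `metadata` for `chat.metadata`.

```python
import json
from typing import Optional


def build_f_req(
    prompt: str,
    image_id: Optional[str] = None,
    metadata: Optional[list] = None,
) -> str:
    """Build the doubly JSON-encoded `f.req` form value (hypothetical helper)."""
    if image_id:
        # Same shape as in the diff: prompt, 0, None, then the uploaded file id paired with 1.
        message = [prompt, 0, None, [[[image_id, 1]]]]
    else:
        # Text-only request: just the prompt.
        message = [prompt]

    # The inner list is serialized first, then embedded as a string in the outer list.
    inner = json.dumps([message, None, metadata])
    return json.dumps([None, inner])
```

Splitting the branch out this way makes the text-only and image cases easy to compare, and lets the payload shape be checked without sending a request.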

diff --git a/src/gemini_webapi/constant.py b/src/gemini_webapi/constant.py
@@ -6,3 +6,5 @@
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "X-Same-Domain": "1",
 }
+
+UPLOAD_PUSHID = "feeds/mcudyrk2a4khkz"
diff --git a/src/gemini_webapi/utils.py b/src/gemini_webapi/utils.py
@@ -0,0 +1,41 @@
+from httpx import AsyncClient
+from pydantic import validate_call
+
+from .constant import UPLOAD_PUSHID
+
+
+@validate_call
+async def upload_file(file: bytes | str) -> str:
+    """
+    Upload a file to Google's server and return its identifier.
+
+    Parameters
+    ----------
+    file : `bytes` | `str`
+        File data in bytes, or path to the file to be uploaded.
+
+    Returns
+    -------
+    `str`
+        Identifier of the uploaded file.
+        E.g. "/contrib_service/ttl_1d/1709764705i7wdlyx3mdzndme3a767pluckv4flj"
+
+    Raises
+    ------
+    `httpx.HTTPStatusError`
+        If the upload request failed.
+    """
+
+    if isinstance(file, str):
+        with open(file, "rb") as f:
+            file = f.read()
+
+    async with AsyncClient() as client:
+        response = await client.post(
+            url="https://content-push.googleapis.com/upload/",
+            headers={"Push-ID": UPLOAD_PUSHID},
+            files={"file": file},
+            follow_redirects=True,
+        )
+        response.raise_for_status()
+        return response.text
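
`upload_file` reads a path with a blocking `open()` call inside a coroutine. One possible alternative, sketched here under the assumption that Python 3.9+ is available, moves the read to a worker thread with `asyncio.to_thread`. `read_image` is a hypothetical helper and is not part of this commit.

```python
import asyncio
from pathlib import Path
from typing import Union


async def read_image(file: Union[bytes, str]) -> bytes:
    """Return image bytes, reading from disk off the event loop when given a path."""
    if isinstance(file, bytes):
        # Already raw data: pass it through unchanged.
        return file
    # Path.read_bytes blocks, so run it in a worker thread.
    return await asyncio.to_thread(Path(file).read_bytes)
```

For small images the difference is likely negligible. The thread hop mainly matters when a keep-alive service handles many requests on one event loop.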