
Conversation

@0xjunhao (Owner) commented Aug 9, 2025

No description provided.

@0xjunhao (Owner, Author) commented Aug 9, 2025

/gemini review all changes

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @0xjunhao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've introduced a powerful "deep research" system that enables the application to perform iterative, AI-driven web searches and reasoning to answer complex questions. This new capability is seamlessly integrated into our existing evaluation framework, allowing us to test its performance. Additionally, I've made minor adjustments to thread management and improved the robustness of data loading for our evaluation datasets.

Highlights

  • Implemented a "Deep Research" Framework: I've added a new sampler/deep_research.py module that orchestrates multi-step research. This includes fetching content from URLs (including PDFs), summarizing information, performing iterative reasoning, and generating concise answers based on gathered data.
  • Integrated Ubicloud Models and Deep Research Samplers: I've introduced UbicloudCompletionSampler and DeepRsearchCompletionSampler to allow our system to leverage Ubicloud-hosted AI models and the new deep research pipeline directly within our sampling and evaluation infrastructure.
  • Updated Evaluation Configuration: The simple_evals.py script now includes the new samplers and switches the default grading model to a Ubicloud-hosted llama-3-3-70b-turbo. The evaluation scope has been temporarily narrowed to the simpleqa dataset to focus testing on the new features.
  • Enhanced Data Loading for SimpleQA: I've refactored simpleqa_eval.py to use the requests library for fetching the test dataset CSV, improving the reliability and error handling of network operations during data retrieval (a sketch of this pattern follows the list).
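
As a rough sketch of that last pattern (the function name and return shape below are illustrative assumptions, not the actual simpleqa_eval.py code):

import csv
import io

import requests

def fetch_simpleqa_csv(url: str) -> list[dict]:
    # Illustrative only: download the CSV with an explicit timeout and
    # surface HTTP errors instead of silently parsing an error page.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return list(csv.DictReader(io.StringIO(response.text)))
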
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
---------------------|---------------------|------------------------------------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@0xjunhao (Owner, Author) commented Aug 9, 2025

/gemini what is deep research?

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant new feature: a "deep research" agent capable of using search engines and language models to answer questions. It also adds new samplers for this agent and for Ubicloud endpoints. The changes are extensive and add valuable functionality.

My review focuses on improving the robustness, maintainability, and correctness of the new code. Key findings include a potential infinite recursion bug, incorrect type hints, and some unused code and parameters that could be cleaned up. I've also pointed out a bug in command-line argument parsing and some areas where logging could be improved over print statements.

Comment on lines +261 to +279
def deep_dive(state: ResearchState, search_engine: str) -> ResearchState:
    """Deep dive into a few areas."""
    subtopics = inference([
        create_system_message("""
Generate 1 to 2 search queries on the given topic.
Return a JSON array of strings.
Each string should be a well-structured search engine query.
At least one of them should explicitly ask for trustworthy sources, such as wiki, gov, bookforum, or fandom."""),
        create_user_message({
            "topic": state["topic"],
            "thinking": state.get("thinking", "")
        })
    ], InferenceMode.JSON)
    if subtopics is None:
        logger.warning("No JSON object found. Retrying deep_dive...")
        return deep_dive(state, search_engine)
    for subtopic in subtopics:
        state = gather_information(subtopic, state, search_engine)
    return state


critical

This function uses recursion to retry if it fails to get subtopics from the inference call. However, there is no base case to stop the recursion after a certain number of retries. This can lead to a RecursionError and a stack overflow if the inference call consistently returns None. This is a critical issue that could crash the application.

def deep_dive(state: ResearchState, search_engine: str, _retries_left: int = 3) -> ResearchState:
    """Deep dive into a few areas."""
    if _retries_left == 0:
        logger.error("Failed to get subtopics after multiple retries.")
        return state

    subtopics = inference([
        create_system_message("""
Generate 1 to 2 search queries on the given topic.
Return a JSON array of strings.
Each string should be a well-structured search engine query.
At least one of them should explicitly ask for trustworthy sources, such as wiki, gov, bookforum, or fandom."""),
        create_user_message({
            "topic": state["topic"],
            "thinking": state.get("thinking", "")
        })
    ], InferenceMode.JSON)
    if subtopics is None:
        logger.warning(f"No JSON object found. Retrying deep_dive... ({4 - _retries_left}/3)")
        return deep_dive(state, search_engine, _retries_left - 1)
    for subtopic in subtopics:
        state = gather_information(subtopic, state, search_engine)
    return state



def deep_research(
        topic: str, depth: int, search_engine: str, initial_state: Optional[dict], debug: bool) -> None:


high

The function is type-hinted to return None, but it actually returns state["answer"] on line 397. This type mismatch can cause issues with static analysis tools and makes the code harder to understand and maintain.

Suggested change
        topic: str, depth: int, search_engine: str, initial_state: Optional[dict], debug: bool) -> None:
        topic: str, depth: int, search_engine: str, initial_state: Optional[dict], debug: bool) -> str:

help="Path to a JSON file to resume state from")
parser.add_argument("--search_engine", type=str, default="duckduckgo",
help="The search engine to use, either DuckDuckGo or Tavily.")
parser.add_argument("--debug", type=bool, default=False, help="Run in debug mode")


high

Using type=bool for command-line flags in argparse is problematic and doesn't work as one might expect. For instance, --debug false would still result in True. The standard and recommended way to handle boolean flags is to use action="store_true".

Suggested change
parser.add_argument("--debug", type=bool, default=False, help="Run in debug mode")
parser.add_argument("--debug", action="store_true", help="Run in debug mode")

Comment on lines +24 to +31
    def __init__(
        self,
        model: str = "llama-3-3-70b-turbo",
        system_message: str | None = None,
        temperature: float = 0.5,
        max_tokens: int = 1024,
        depth: int = 3,
    ):


high

The model, temperature, and max_tokens parameters are initialized but never used. The deep_research function that this sampler calls has its own hardcoded model configurations and does not accept these parameters. This makes the sampler's API misleading and could cause confusion for developers using this class. Please remove these unused parameters and their corresponding instance attributes.

)


class DeepRsearchCompletionSampler(SamplerBase):


medium

There is a typo in the class name DeepRsearchCompletionSampler. It should be DeepResearchCompletionSampler. This should be corrected here and in any files where it is imported, such as simple_evals.py.

Suggested change
class DeepRsearchCompletionSampler(SamplerBase):
class DeepResearchCompletionSampler(SamplerBase):

Comment on lines +42 to +55
    def _handle_image(
        self, image: str, encoding: str = "base64", format: str = "png", fovea: int = 768
    ):
        new_image = {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/{format};{encoding},{image}",
            },
        }
        return new_image

    def _handle_text(self, text: str):
        return {"type": "text", "text": text}


medium

These methods are defined but are not used anywhere in the class. This represents dead code and should be removed to improve maintainability.

Comment on lines +70 to +77
print("Bad Request Error", e)
return ""
except Exception as e:
exception_backoff = 2**trial # expontial back off
print(
f"Rate limit exception so wait and retry {trial} after {exception_backoff} sec",
e,
)


medium

Using print for logging errors and status messages is not ideal for production code. It's better to use the logging module, which allows for configurable log levels, formats, and destinations. Please replace these print statements with appropriate logger calls (e.g., logger.error, logger.warning).
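
A rough sketch of what the retry loop could look like with logging in place of print (the helper name and overall structure here are illustrative, not the PR's actual code):

import logging
import time

logger = logging.getLogger(__name__)

def call_with_retries(request_fn, max_trials: int = 5) -> str:
    # Illustrative helper: request_fn stands in for the actual API call.
    for trial in range(max_trials):
        try:
            return request_fn()
        except Exception as e:
            exception_backoff = 2 ** trial  # exponential backoff
            logger.warning(
                "Rate limit exception, retry %d after %d sec: %s",
                trial, exception_backoff, e,
            )
            time.sleep(exception_backoff)
    logger.error("Giving up after %d trials", max_trials)
    return ""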

Comment on lines +39 to +52
    def _handle_image(
        self, image: str, encoding: str = "base64", format: str = "png", fovea: int = 768
    ):
        new_image = {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/{format};{encoding},{image}",
            },
        }
        return new_image

    def _handle_text(self, text: str):
        return {"type": "text", "text": text}


medium

These helper methods are not used within the class. They should be removed to avoid dead code.

Comment on lines +71 to +79
except openai.BadRequestError as e:
print("Bad Request Error", e)
return ""
except Exception as e:
exception_backoff = 2**trial # expontial back off
print(
f"Rate limit exception so wait and retry {trial} after {exception_backoff} sec",
e,
)


medium

The use of print for logging errors and retry attempts should be replaced with the logging module for better control over log levels and output in a production environment.

Comment on lines +174 to +175
# for eval_name in ["simpleqa", "mmlu", "math", "gpqa", "mgsm", "drop", "humaneval"]
for eval_name in ["simpleqa"]


medium

The list of evaluations to run is hardcoded to just ["simpleqa"], with the full list commented out. This appears to be for debugging purposes. This should be reverted to the full list before merging to ensure all evaluations are run.

Suggested change
# for eval_name in ["simpleqa", "mmlu", "math", "gpqa", "mgsm", "drop", "humaneval"]
for eval_name in ["simpleqa"]
for eval_name in ["simpleqa", "mmlu", "math", "gpqa", "mgsm", "drop", "humaneval"]

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant new feature for performing deep research using a combination of web search and large language models. The implementation is comprehensive, covering everything from data fetching and summarization to report generation. My review focuses on improving the robustness, maintainability, and clarity of the new code. I've identified a few critical issues, including a potential for unbounded recursion and a sampler class with misleading and unused code. I have also provided suggestions to address incorrect type hints, argument parsing, and brittle implementation details. Addressing these points will make the new feature more reliable and easier to maintain.

Comment on lines +274 to +276
    if subtopics is None:
        logger.warning("No JSON object found. Retrying deep_dive...")
        return deep_dive(state, search_engine)


critical

The recursive call to deep_dive when subtopics is None lacks a retry limit. If the inference call consistently fails to produce a valid JSON, this will lead to unbounded recursion and eventually a RecursionError. To make this more robust, you should introduce a retry mechanism with a maximum number of attempts.

Comment on lines +24 to +40
    def __init__(
        self,
        model: str = "llama-3-3-70b-turbo",
        system_message: str | None = None,
        temperature: float = 0.5,
        max_tokens: int = 1024,
        depth: int = 3,
    ):
        self.api_key_name = "OPENAI_API_KEY"
        self.client = OpenAI(base_url=f"https://{model}.ai.ubicloud.com/v1/")
        # using api_key=os.environ.get("OPENAI_API_KEY")  # please set your API_KEY
        self.model = model
        self.system_message = system_message
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.image_format = "url"
        self.depth = depth


critical

This class's __init__ method contains a significant amount of unused and misleading code. It accepts parameters (model, temperature, max_tokens) that are ignored in __call__. The self.client attribute is also initialized but never used. This makes the class difficult to understand and use correctly. The code should be refactored to remove all unused elements.

Suggested change
    def __init__(
        self,
        model: str = "llama-3-3-70b-turbo",
        system_message: str | None = None,
        temperature: float = 0.5,
        max_tokens: int = 1024,
        depth: int = 3,
    ):
        self.api_key_name = "OPENAI_API_KEY"
        self.client = OpenAI(base_url=f"https://{model}.ai.ubicloud.com/v1/")
        # using api_key=os.environ.get("OPENAI_API_KEY")  # please set your API_KEY
        self.model = model
        self.system_message = system_message
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.image_format = "url"
        self.depth = depth
    def __init__(
        self,
        system_message: str | None = None,
        depth: int = 3,
    ):
        self.api_key_name = "OPENAI_API_KEY"
        self.system_message = system_message
        self.depth = depth


    def get_model(self) -> str:
        """Retrieve the corresponding model name for the inference mode."""
        return globals().get(f"{self.name}_MODEL")


high

Using globals().get(...) to retrieve model names is brittle and relies on a naming convention for global variables. This makes the code harder to understand and maintain, as the connection between the InferenceMode enum and the model strings is implicit. A more robust and explicit approach would be to use a dictionary to map enum members to model names.

        return {
            "SUMMARIZATION": SUMMARIZATION_MODEL,
            "REASONING": REASONING_MODEL,
            "WRITING": WRITING_MODEL,
            "JSON": JSON_MODEL,
        }[self.name]

Comment on lines +379 to +380
def deep_research(
        topic: str, depth: int, search_engine: str, initial_state: Optional[dict], debug: bool) -> None:


high

The function deep_research is type-hinted to return None, but it actually returns state["answer"] which is a string. This mismatch can be misleading for developers using this function and can cause issues with static analysis tools. The return type hint should be updated to str.

def deep_research(
        topic: str, depth: int, search_engine: str, initial_state: Optional[dict], debug: bool) -> str:

help="Path to a JSON file to resume state from")
parser.add_argument("--search_engine", type=str, default="duckduckgo",
help="The search engine to use, either DuckDuckGo or Tavily.")
parser.add_argument("--debug", type=bool, default=False, help="Run in debug mode")


high

Using type=bool with argparse does not create a boolean flag as one might expect. For example, --debug false will be interpreted as True. The correct way to handle boolean flags is to use action="store_true".

    parser.add_argument("--debug", action="store_true", help="Run in debug mode")

)


class DeepRsearchCompletionSampler(SamplerBase):


high

There is a typo in the class name DeepRsearchCompletionSampler. It should be DeepResearchCompletionSampler. This should be corrected for clarity and consistency.

Suggested change
class DeepRsearchCompletionSampler(SamplerBase):
class DeepResearchCompletionSampler(SamplerBase):

Comment on lines +66 to +67
        return deep_research.deep_research(
            message_list[0]["content"], depth=self.depth, search_engine="tavily", initial_state=None, debug=False)


medium

The code accesses message_list[0] without checking if message_list is empty. If an empty list is passed, this will raise an IndexError. You should add a check to handle this case gracefully.

Suggested change
        return deep_research.deep_research(
            message_list[0]["content"], depth=self.depth, search_engine="tavily", initial_state=None, debug=False)
        if not message_list:
            return ""
        return deep_research.deep_research(
            message_list[0]["content"], depth=self.depth, search_engine="tavily", initial_state=None, debug=False)

@gemini-code-assist (bot) commented

The term "deep research" in this pull request refers to an automated process implemented in sampler/deep_research.py that leverages large language models (LLMs) and web search capabilities to comprehensively investigate a given topic.

Specifically, the deep_research function performs the following steps:

  1. Information Gathering: It starts by performing web searches related to the initial topic.
  2. Content Summarization: It fetches content from the search results (including handling PDFs) and summarizes them using an LLM.
  3. Iterative Deep Dive: For a specified depth, it iteratively generates new search queries (subtopics) based on the current understanding and then gathers and summarizes information for these subtopics. This allows for a more thorough exploration of related areas.
  4. Reasoning and Synthesis: An LLM is used to "think" about and synthesize all the gathered information, prioritizing trustworthy sources.
  5. Answer Generation: Finally, it generates a concise and accurate answer to the original research topic based on the accumulated knowledge.

In essence, it's an automated system designed to simulate a researcher's process of searching, reading, synthesizing, and answering questions on a topic, going beyond a single search query to explore related concepts in depth.
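
Put schematically (gather_information, deep_dive, inference, and InferenceMode appear in the snippets quoted above; the state shape and exact call sequence below are assumptions for illustration, not the actual implementation):

def deep_research_outline(topic: str, depth: int, search_engine: str) -> str:
    # Schematic outline of the flow described above -- not the real code.
    state = {"topic": topic, "summaries": [], "thinking": "", "answer": ""}
    # Steps 1-2: initial search, content fetching (HTML/PDF), summarization.
    state = gather_information(topic, state, search_engine)
    for _ in range(depth):
        # Step 3: generate subtopic queries and gather more material.
        state = deep_dive(state, search_engine)
    # Step 4: synthesize the gathered summaries, favoring trustworthy sources.
    state["thinking"] = inference(state["summaries"], InferenceMode.REASONING)
    # Step 5: produce a concise answer from the accumulated knowledge.
    state["answer"] = inference(state, InferenceMode.WRITING)
    return state["answer"]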
