[BUG] multimodal=True is not supported in Agent.kickoff

### Description

Setting `multimodal=True` is supposedly adding the `AddImageTool` to the agent's tools, however it is not the case when calling the kickoff method.

The current implementation could be patched quite simply by adding the tool at kickoff time if the parameter is true (it's literaly a one liner).

### Steps to Reproduce

See code snippet

### Expected behavior

We should observe an AddImage tool call

### Screenshots/Code snippets

```python
from crewai import Agent

agent = Agent(role="Image captioner", goal="caption images", backstory="You are used to caption images since you are a kid", multimodal=True)

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/960px-Cat_November_2010-1a.jpg"

result = agent.kickoff(f"What's in this image ? {image_url}")
print(result)
```

### Operating System

Ubuntu 20.04

### Python Version

3.10

### crewAI Version

1.5.0

### crewAI Tools Version

1.5.0

### Virtual Environment

Venv

### Evidence

`Agent.kickoff` code is quite explicit about the missing multimodal feature

### Possible Solution

Patch the kickoff method:

```python
from crewai.tools.agent_tools.add_image_tool import AddImageTool

def kickoff(...):
        if self.apps:
            platform_tools = self.get_platform_tools(self.apps)
            if platform_tools:
                self.tools.extend(platform_tools)
        if self.mcps:
            mcps = self.get_mcp_tools(self.mcps)
            if mcps:
                self.tools.extend(mcps)

        # PATCH HERE
        if self.multimodal:
               self.tools.extend(AddImageTool())
        # /PATCH HERE

        lite_agent = LiteAgent(
            id=self.id,
            role=self.role,
            goal=self.goal,
            backstory=self.backstory,
            llm=self.llm,
            tools=self.tools or [],
            max_iterations=self.max_iter,
            max_execution_time=self.max_execution_time,
            respect_context_window=self.respect_context_window,
            verbose=self.verbose,
            response_format=response_format,
            i18n=self.i18n,
            original_agent=self,
            guardrail=self.guardrail,
            guardrail_max_retries=self.guardrail_max_retries,
        )

        return lite_agent.kickoff(messages)
```

But tbh I don't really like doing this in the kickoff as it's not a pure method, if we call kickoff twice we'll have twice as much tools. It'd be better to just declare a method-scoped `tools` variable and feed it with the current tools, platform tools, mcp tools and multimodal tools

### Additional context

/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BUG] multimodal=True is not supported in Agent.kickoff #3936

Description

Steps to Reproduce

Expected behavior

Screenshots/Code snippets

Operating System

Python Version

crewAI Version

crewAI Tools Version

Virtual Environment

Evidence

Possible Solution

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] multimodal=True is not supported in Agent.kickoff #3936

Description

Description

Steps to Reproduce

Expected behavior

Screenshots/Code snippets

Operating System

Python Version

crewAI Version

crewAI Tools Version

Virtual Environment

Evidence

Possible Solution

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions