PDF and DOCX support in Write File - Feature Improvement, close #548 #1125

Merged on Sep 6, 2023 (29 commits)

Commits
5e80166
Added functions to write various file types and a file handler too
Arkajit-Datta Jul 21, 2023
2738a9a
FileManager updated to handle and save HTMLs.
Arkajit-Datta Jul 26, 2023
0fdeff9
adding PDF + DOCX support to save images
Arkajit-Datta Jul 28, 2023
608f738
Merge branch 'dev' into feature/write-file-pdf-docx-support
Arkajit-Datta Jul 28, 2023
a0ca125
Added Wkhtmltopdf package installation run commands in docker
Arkajit-Datta Jul 30, 2023
cccd599
Added get_all_responses feature for extractng the response for partic…
Arkajit-Datta Jul 30, 2023
285a132
Added Image embedding feature, this will extract and embed the images…
Arkajit-Datta Jul 30, 2023
610bd20
Merge branch 'feature/write-file-pdf-docx-support' into dev
Arkajit-Datta Jul 30, 2023
74c938c
renaming functions and refactoring
Arkajit-Datta Aug 10, 2023
1c4bbba
renaming functions and refactoring
Arkajit-Datta Aug 10, 2023
0fa209f
Merge branch 'feature/write-file-pdf-docx-support' into dev
Arkajit-Datta Aug 10, 2023
5803171
Update Dockerfile
Fluder-Paradyne Aug 10, 2023
31b20a6
removing unsused classmethods
Arkajit-Datta Aug 20, 2023
9e2b4d6
Finding generated images and attached files in the write tool. Images…
Arkajit-Datta Aug 20, 2023
eb430e6
Merge branch 'dev' into dev
Arkajit-Datta Aug 23, 2023
a24df27
Adding the filename and paths to the Resource Manager S3 storage
Arkajit-Datta Aug 23, 2023
0dc6fa6
Code Cleanup
Arkajit-Datta Aug 23, 2023
3c0a578
added logger: Fix for the failing TEST
Arkajit-Datta Aug 24, 2023
4993251
Exceptions dir created to store custom exceptions
Arkajit-Datta Aug 26, 2023
677c28e
review changes
Arkajit-Datta Aug 26, 2023
6606919
updated the name of exception file
Arkajit-Datta Sep 4, 2023
91adecf
update exception import
Arkajit-Datta Sep 4, 2023
224b227
Added unit test cases for file manager
Arkajit-Datta Sep 4, 2023
82ce44b
Added CSV test file UT case
Arkajit-Datta Sep 4, 2023
7cf317d
code cleanup
Arkajit-Datta Sep 4, 2023
6f4fdf5
reverting the few unecessary changes
Arkajit-Datta Sep 6, 2023
694477d
Merge branch 'develop' into dev
Arkajit-Datta Sep 6, 2023
7c834c0
removing changes
Arkajit-Datta Sep 6, 2023
6275fba
Merge branch 'dev' into dev
luciferlinx101 Sep 6, 2023


4 changes: 2 additions & 2 deletions Dockerfile
@@ -3,7 +3,7 @@ FROM python:3.10-slim-bullseye AS compile-image
 WORKDIR /app

 RUN apt-get update && \
-apt-get install --no-install-recommends -y wget libpq-dev gcc g++ python3-dev && \
+apt-get install --no-install-recommends -y wget libpq-dev gcc g++ python3-dev wkhtmltopdf && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*

@@ -24,7 +24,7 @@ FROM python:3.10-slim-bullseye AS build-image
 WORKDIR /app

 RUN apt-get update && \
-apt-get install --no-install-recommends -y libpq-dev && \
+apt-get install --no-install-recommends -y libpq-dev wkhtmltopdf && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*
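`write_pdf_file` further down passes a hardcoded `/usr/bin/wkhtmltopdf` to `pdfkit.configuration`, which is where Debian's `wkhtmltopdf` package typically places the binary installed above. A minimal sketch of resolving the path at runtime instead, assuming the binary is on `PATH` and falling back to the hardcoded location otherwise; `find_wkhtmltopdf` is a hypothetical helper, not part of this PR.

```python
import shutil
from typing import Callable, Optional

# Path hardcoded by write_pdf_file in this PR; used here only as a fallback.
DEFAULT_WKHTMLTOPDF = "/usr/bin/wkhtmltopdf"


def find_wkhtmltopdf(
    which: Callable[[str], Optional[str]] = shutil.which,
    fallback: str = DEFAULT_WKHTMLTOPDF,
) -> str:
    """Return the wkhtmltopdf path found on PATH, or `fallback` if none is found.

    `which` is injectable so the lookup can be exercised without the binary.
    """
    found = which("wkhtmltopdf")
    return found if found else fallback
```

The result could then be handed to pdfkit the same way the diff does, e.g. `pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())`.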
2 changes: 2 additions & 0 deletions DockerfileCelery
@@ -3,6 +3,8 @@ FROM python:3.9
 WORKDIR /app

 #RUN apt-get update && apt-get install --no-install-recommends -y git wget libpq-dev gcc python3-dev && pip install psycopg2
+RUN apt-get update && apt-get install -y wkhtmltopdf
+
 RUN pip install --upgrade pip

 COPY requirements.txt .
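The `options` dict in `write_pdf_file` uses wkhtmltopdf flag names as keys, with an empty string for flags that take no value (`quiet`, `enable-local-file-access`). If the binary installed above were run directly through `subprocess` rather than through pdfkit, a sketch of turning that dict into an argument list might look like this. `build_wkhtmltopdf_args` is hypothetical, and the "empty string means bare flag" convention is an assumption modeled on the dict in the diff.

```python
from typing import Dict, List


def build_wkhtmltopdf_args(binary: str, options: Dict[str, str],
                           src_html: str, dst_pdf: str) -> List[str]:
    """Build a wkhtmltopdf command line from a pdfkit-style options dict.

    Each key becomes `--key`; an empty-string value means the flag takes no argument.
    """
    args = [binary]
    for key, value in options.items():
        args.append(f"--{key}")
        if value != "":
            args.append(value)
    # wkhtmltopdf takes the input page and then the output file as positional arguments.
    args.extend([src_html, dst_pdf])
    return args
```

The list could be passed to `subprocess.run(args, check=True)`; keeping the flags in a dict, as the PR does, keeps page size and margins in one place.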
3 changes: 3 additions & 0 deletions requirements.txt
@@ -154,3 +154,6 @@ html2text==2020.1.16
 duckduckgo-search==3.8.3
 google-generativeai==0.1.0
 unstructured==0.8.1
+beautifulsoup4==4.12.2
+pdfkit==1.0.0
+htmldocx==0.0.6
Empty file added superagi/exceptions/__init__.py
10 changes: 10 additions & 0 deletions superagi/exceptions/file_exceptions.py
@@ -0,0 +1,10 @@
+
+class UnsupportedFileTypeError(Exception):
+    def __init__(self, file_name: str, supported_types: list):
+        message = f"Unsupported file type for '{file_name}'. Supported types are: {', '.join(supported_types)}"
+        super().__init__(message)
+
+class FileNotCreatedError(Exception):
+    def __init__(self, file_name: str):
+        message = f"Failed to create the file '{file_name}'."
+        super().__init__(message)
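Both exceptions build their message in the constructor, and the new `FileManager` writers raise `FileNotCreatedError` with `from err`, so the original I/O error stays reachable as `__cause__`. The sketch below copies `FileNotCreatedError` from the diff and wraps it around a hypothetical `write_text` helper to show that chaining on its own.

```python
class FileNotCreatedError(Exception):
    # Copied from superagi/exceptions/file_exceptions.py in this PR.
    def __init__(self, file_name: str):
        message = f"Failed to create the file '{file_name}'."
        super().__init__(message)


def write_text(path: str, content: str) -> str:
    """Hypothetical writer: re-raise any OSError as FileNotCreatedError."""
    try:
        with open(path, mode="w") as file:
            file.write(content)
        return path
    except OSError as err:
        # `from err` keeps the underlying error as __cause__ for logging and debugging.
        raise FileNotCreatedError(file_name=path) from err
```

A caller can then log the friendly message while still inspecting `exc.__cause__` for the low-level reason (missing directory, permissions, and so on).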
3 changes: 2 additions & 1 deletion superagi/models/agent_execution_feed.py
@@ -52,7 +52,7 @@ def get_last_tool_response(cls, session: Session, agent_execution_id: int, tool_
if agent_execution_feed.feed.startswith("Tool"):
return agent_execution_feed.feed
return ""

@classmethod
def fetch_agent_execution_feeds(cls, session, agent_execution_id: int):
agent_execution = AgentExecution.find_by_id(session, agent_execution_id)
@@ -66,3 +66,4 @@ def fetch_agent_execution_feeds(cls, session, agent_execution_id: int):
return agent_feeds
else:
return agent_feeds[2:]

104 changes: 86 additions & 18 deletions superagi/resource_manager/file_manager.py
@@ -1,18 +1,25 @@
import csv
from sqlalchemy.orm import Session
from superagi.config.config import get_config
import os

from superagi.config.config import get_config
from superagi.helper.resource_helper import ResourceHelper
from superagi.helper.s3_helper import S3Helper
from superagi.lib.logger import logger
from superagi.models.agent import Agent
from superagi.models.agent_execution import AgentExecution
from superagi.types.storage_types import StorageType
from superagi.exceptions.file_exceptions import UnsupportedFileTypeError, FileNotCreatedError

import pdfkit
from htmldocx import HtmlToDocx

class FileManager:
def __init__(self, session: Session, agent_id: int = None, agent_execution_id: int = None):
self.session = session
self.agent_id = agent_id
self.agent_execution_id = agent_execution_id

def write_binary_file(self, file_name: str, data):
if self.agent_id is not None:
final_path = ResourceHelper.get_agent_write_resource_path(file_name,
@@ -32,6 +39,7 @@ def write_binary_file(self, file_name: str, data):
             return f"Binary {file_name} saved successfully"
         except Exception as err:
             return f"Error write_binary_file: {err}"
+
     def write_to_s3(self, file_name, final_path):
         with open(final_path, 'rb') as img:
             resource = ResourceHelper.make_written_file_resource(file_name=file_name,
@@ -55,25 +63,16 @@ def write_file(self, file_name: str, content):
self.agent_execution_id))
else:
final_path = ResourceHelper.get_resource_path(file_name)

try:
with open(final_path, mode="w") as file:
file.write(content)
file.close()
self.write_to_s3(file_name, final_path)
logger.info(f"{file_name} - File written successfully")
return f"{file_name} - File written successfully"
self.save_file_by_type(file_name=file_name, file_path=final_path, content=content)
except Exception as err:
return f"Error write_file: {err}"
def write_csv_file(self, file_name: str, csv_data):
if self.agent_id is not None:
final_path = ResourceHelper.get_agent_write_resource_path(file_name,
agent=Agent.get_agent_from_id(self.session,
self.agent_id),
agent_execution=AgentExecution
.get_agent_execution_from_id(self.session,
self.agent_execution_id))
else:
final_path = ResourceHelper.get_resource_path(file_name)

logger.info(f"{file_name} - File written successfully")
return f"{file_name} - File written successfully"

def write_csv_file(self, file_name: str, final_path: str, csv_data) -> str:
try:
with open(final_path, mode="w", newline="") as file:
writer = csv.writer(file, lineterminator="\n")
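`write_csv_file` opens the target with `newline=""` and passes `lineterminator="\n"` to `csv.writer`. The `newline=""` part follows the `csv` module documentation, which recommends it for files handed to a writer so the module controls line endings itself; `lineterminator="\n"` replaces the writer's default `\r\n` with a plain newline. A stand-alone sketch of the same pattern, assuming the data is an iterable of rows written with `writerows` (the hunk is cut off before the line that actually writes them); `write_rows` is a hypothetical name.

```python
import csv
from typing import Iterable, Sequence


def write_rows(path: str, rows: Iterable[Sequence[object]]) -> str:
    """Write rows as CSV using LF line endings (sketch, not the PR's code)."""
    # newline="" stops Python's text layer from translating the writer's line endings.
    with open(path, mode="w", newline="") as file:
        writer = csv.writer(file, lineterminator="\n")
        writer.writerows(rows)
    return path
```

Fields containing the delimiter are quoted automatically under the default `QUOTE_MINIMAL` setting, so `"a,b"` survives a round trip.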
@@ -82,15 +81,63 @@ def write_csv_file(self, file_name: str, csv_data):
             logger.info(f"{file_name} - File written successfully")
             return f"{file_name} - File written successfully"
         except Exception as err:
-            return f"Error write_csv_file: {err}"
+            raise FileNotCreatedError(file_name=file_name) from err

+    def write_pdf_file(self, file_name: str ,file_path: str, content):
+        # Saving the HTML file
+        html_file_path = f"{file_path[:-4]}.html"
+        self.write_txt_file(file_name=html_file_path.split('/')[-1], file_path=html_file_path, content=content)
+
+        # Convert HTML file to a PDF file
+        try:
+            options = {
+                'quiet': '',
+                'page-size': 'Letter',
+                'margin-top': '0.75in',
+                'margin-right': '0.75in',
+                'margin-bottom': '0.75in',
+                'margin-left': '0.75in',
+                'enable-local-file-access': ''
+            }
+            config = pdfkit.configuration(wkhtmltopdf = "/usr/bin/wkhtmltopdf")
+            pdfkit.from_file(html_file_path, file_path, options = options, configuration = config)
+            self.write_to_s3(file_name, file_path)
+            return file_path
+
+        except Exception as err:
+            raise FileNotCreatedError(file_name=file_name) from err
+
+    def write_docx_file(self, file_name: str ,file_path: str, content):
+        # Saving the HTML file
+        html_file_path = f"{file_path[:-4]}.html"
+        self.write_txt_file(file_name=html_file_path.split('/')[-1], file_path=html_file_path, content=content)
+
+        # Convert HTML file to a DOCx file
+        try:
+            new_parser = HtmlToDocx()
+            new_parser.parse_html_file(html_file_path, file_path)
+            self.write_to_s3(file_name, file_path)
+            return file_path
+        except Exception as err:
+            raise FileNotCreatedError(file_name=file_name) from err
+
+    def write_txt_file(self, file_name: str ,file_path: str, content) -> str:
+        try:
+            with open(file_path, mode="w") as file:
+                file.write(content)
+                file.close()
+            self.write_to_s3(file_name, file_path)
+            return file_path
+        except Exception as err:
+            raise FileNotCreatedError(file_name=file_name) from err
+
     def get_agent_resource_path(self, file_name: str):
         return ResourceHelper.get_agent_write_resource_path(file_name, agent=Agent.get_agent_from_id(self.session,
                                                                                                  self.agent_id),
                                                             agent_execution=AgentExecution
                                                             .get_agent_execution_from_id(self.session,
                                                                                          self.agent_execution_id))

     def read_file(self, file_name: str):
         if self.agent_id is not None:
             final_path = self.get_agent_resource_path(file_name)
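Both `write_pdf_file` and `write_docx_file` derive the intermediate HTML path with `file_path[:-4]`, which strips exactly four characters. That matches `.pdf` and `.doc`, but for `report.docx` it leaves `report.` and yields `report..html`. A sketch of the same derivation using `os.path.splitext`, which removes whatever extension is present, and `os.path.basename` in place of `split('/')[-1]`; `html_sidecar_path` is a hypothetical name.

```python
import os
from typing import Tuple


def html_sidecar_path(file_path: str) -> Tuple[str, str]:
    """Return (html_path, html_file_name) for the intermediate HTML file."""
    root, _ext = os.path.splitext(file_path)  # drops ".pdf", ".doc", ".docx", ...
    html_path = f"{root}.html"
    return html_path, os.path.basename(html_path)
```

Dots in directory names are left alone, because `splitext` only looks at the final path component.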
@@ -104,6 +151,7 @@ def read_file(self, file_name: str):
             return content
         except Exception as err:
             return f"Error while reading file {file_name}: {err}"
+
     def get_files(self):
         """
         Gets all file names generated by the CodingTool.
@@ -122,3 +170,23 @@ def get_files(self):
             logger.error(f"Error while accessing files in {final_path}: {err}")
             files = []
         return files
+
+    def save_file_by_type(self, file_name: str, file_path: str, content):
+
+        # Extract the file type from the file_name
+        file_type = file_name.split('.')[-1].lower()
+
+        # Dictionary to map file types to corresponding functions
+        file_type_handlers = {
+            'txt': self.write_txt_file,
+            'pdf': self.write_pdf_file,
+            'docx': self.write_docx_file,
+            'doc': self.write_docx_file,
+            'csv': self.write_csv_file,
+            'html': self.write_txt_file
+            # NOTE: Add more file types and corresponding functions as needed, These functions should be defined
+        }
+
+        if file_type not in file_type_handlers:
+            raise UnsupportedFileTypeError(file_name=file_name, supported_types=list(file_type_handlers))
+
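`save_file_by_type` maps the lower-cased extension to a writer method and raises `UnsupportedFileTypeError` for anything else; the hunk shown ends after that check. A self-contained sketch of the whole dispatch pattern, with hypothetical plain-text and CSV writers standing in for the `FileManager` methods and the chosen handler called with the same keyword arguments for every type. A shared signature matters for this design: in the diff, `write_csv_file` takes `final_path` and `csv_data` while the other writers take `file_path` and `content`, so a single call such as `handler(file_name=..., file_path=..., content=...)` would not fit the CSV writer.

```python
import csv
import os
from typing import Callable, Dict


class UnsupportedFileTypeError(Exception):
    # Copied from superagi/exceptions/file_exceptions.py in this PR.
    def __init__(self, file_name: str, supported_types: list):
        message = f"Unsupported file type for '{file_name}'. Supported types are: {', '.join(supported_types)}"
        super().__init__(message)


def write_text(file_name: str, file_path: str, content) -> str:
    # Hypothetical stand-in for FileManager.write_txt_file.
    with open(file_path, mode="w") as file:
        file.write(content)
    return file_path


def write_csv(file_name: str, file_path: str, content) -> str:
    # Hypothetical stand-in for FileManager.write_csv_file; content is a list of rows.
    with open(file_path, mode="w", newline="") as file:
        csv.writer(file, lineterminator="\n").writerows(content)
    return file_path


# Every handler shares the signature (file_name, file_path, content).
FILE_TYPE_HANDLERS: Dict[str, Callable[..., str]] = {
    "txt": write_text,
    "html": write_text,
    "csv": write_csv,
}


def save_file_by_type(file_name: str, file_path: str, content) -> str:
    # splitext gives "" for a name without a dot; split('.')[-1] would return the whole name.
    file_type = os.path.splitext(file_name)[1].lstrip(".").lower()
    handler = FILE_TYPE_HANDLERS.get(file_type)
    if handler is None:
        raise UnsupportedFileTypeError(file_name=file_name,
                                       supported_types=list(FILE_TYPE_HANDLERS))
    return handler(file_name=file_name, file_path=file_path, content=content)
```

Adding a format then means writing one function with the shared signature and registering it in the table, which is the extension point the PR's `NOTE` comment describes.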
4 changes: 4 additions & 0 deletions superagi/tools/file/prompts/add_images_to_html.txt
@@ -0,0 +1,4 @@
+Now, you will be provided with few image path locations. You will have to attach the following images in appropriate locations inside the html code.
+Remember to maintain the elegancy and styling of the User Interface generated. Make sure you attach all the images provided to you.
+
+The relevant paths of the images are provided below:
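The `add_images_to_html` prompt asks the model to place every supplied image path in the generated HTML, but a model can still skip one. A sketch of a post-generation check using the standard library's `html.parser` to collect `<img src>` values and report expected paths that never appear; `missing_images` is a hypothetical helper, and this diff does not show SuperAGI performing such a check.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class _ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (tag/attr names arrive lower-cased)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_images(html: str, expected_paths: Iterable[str]) -> List[str]:
    """Return the expected image paths that no <img src> references."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    collector.close()
    found = set(collector.sources)
    return [path for path in expected_paths if path not in found]
```

A non-empty result could trigger a retry with the `add_images_to_html` prompt, listing only the paths that were dropped.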
7 changes: 7 additions & 0 deletions superagi/tools/file/prompts/content_to_html_prompt.txt
@@ -0,0 +1,7 @@
+You are an HTML code generating AI Agent. Your task is to generate a well formatted and well styled HTML file for a given content.
+Remember to style the HTML beautifully, for which you can add the <style> element in the html itself. Make sure that the User Interface looks elegant to the user.
+Make sure you have covered all of the content which is provided to you.
+
+{embedding_image}.
+
+Content to be used:`{content}`
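The `content_to_html_prompt.txt` template carries two placeholders, `{embedding_image}` and `{content}`; the diff does not show how they are filled. If `str.format` were used, any literal braces later added to the template (inline CSS, for instance) would raise an error. A sketch that substitutes only the named placeholders in one regex pass; `fill_prompt` is a hypothetical helper, not SuperAGI's code.

```python
import re
from typing import Dict

# Matches {name} where name is letters, digits or underscores.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_prompt(template: str, values: Dict[str, str]) -> str:
    """Fill {name} placeholders that appear in `values`; leave other braces as-is.

    re.sub scans the template once, so braces inside substituted text are
    never expanded again.
    """
    def substitute(match) -> str:
        return values.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(substitute, template)
```

Unknown placeholders are left untouched rather than raising, which makes a typo in the template visible in the rendered prompt instead of crashing the tool.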