⚡️ Speed up method BaseArangoService.get_records by 46%
#641
📄 46% (0.46x) speedup for `BaseArangoService.get_records` in `backend/python/app/connectors/services/base_arango_service.py`
⏱️ Runtime: 4.27 milliseconds → 2.93 milliseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 45% runtime improvement (4.27ms → 2.93ms) through several targeted micro-optimizations that reduce Python overhead in this database-heavy function.

Key optimizations applied:

- Pre-computed repeated operations: `sort_order.upper()` is calculated once and stored in `sort_order_upper` rather than being called multiple times in f-string interpolations, eliminating redundant string operations.
- Set creation optimization: `permissions_set = set(permissions) if permissions else None` avoids recreating the permissions set multiple times during intersection operations, reducing memory allocations.
- Iterator-based result extraction: Replaced `list(db.aql.execute(...))[0]` with `next(db.aql.execute(...))` for the count and filters queries, eliminating unnecessary list construction for single-value results.
- Consolidated dictionary creation: Combined the filter bind-variable assignments into a shared `bind_common` dictionary applied with unpacking (`**bind_common`), reducing dictionary-creation overhead and improving memory locality.
- Optimized conditional assignments: Converted the separate if-statements for filter bind vars into compact single-line conditionals, reducing Python bytecode execution.
- Method reference caching: Stored `available_filters.setdefault` as `af_setdefault` to avoid repeated attribute lookups during filter-structure initialization.

Why these optimizations work:
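The patterns above can be made concrete with a minimal standalone sketch. The names (`build_sort_clause_*`, `first_value`, `build_bind_vars`) are hypothetical illustrations, not the actual `get_records` source:

```python
def build_sort_clause_unoptimized(sort_order: str) -> str:
    # upper() is recomputed inside every f-string interpolation
    return (
        f"SORT doc.createdAtTimestamp {sort_order.upper()}, "
        f"doc._key {sort_order.upper()}"
    )

def build_sort_clause_optimized(sort_order: str) -> str:
    sort_order_upper = sort_order.upper()  # computed once, reused
    return (
        f"SORT doc.createdAtTimestamp {sort_order_upper}, "
        f"doc._key {sort_order_upper}"
    )

def first_value(cursor):
    # next() pulls a single result without materialising the cursor
    # into a list, replacing the list(cursor)[0] pattern
    return next(iter(cursor))

def build_bind_vars(org_id, user_id, extra_filters):
    # one shared dict unpacked into the final mapping, instead of
    # several separate per-filter dict assignments
    bind_common = {"org_id": org_id, "user_id": user_id}
    return {**bind_common, **extra_filters}
```

Both sort-clause builders produce identical AQL text; the optimized variants only cut redundant Python-level work, which is why the gains show up uniformly across test scenarios rather than in any one code path.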
Test case performance: The optimizations show consistent benefits across all test scenarios - basic operations, edge cases with exception handling, concurrent execution, and large-scale record processing (up to 500 records). The 45% improvement applies uniformly since the optimizations target fundamental Python operations used throughout the function.
Note: Throughput remains unchanged at 71,250 ops/sec as this appears to be an async function where the runtime improvement doesn't directly translate to throughput gains, likely due to I/O bottlenecks in the database operations.
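A rough illustration of that note, with made-up timings unrelated to the real service: when each call awaits a slow I/O operation, shaving the CPU-side work barely moves ops/sec because the awaited round-trips overlap and dominate wall time.

```python
import asyncio
import time

async def fetch_records(cpu_cost_s: float) -> None:
    # CPU-side query building / result shaping (the part that was optimized)
    deadline = time.perf_counter() + cpu_cost_s
    while time.perf_counter() < deadline:
        pass
    # simulated database round-trip; dominates and is unchanged
    await asyncio.sleep(0.05)

async def throughput(cpu_cost_s: float, calls: int = 10) -> float:
    # ops/sec over a batch of concurrent calls
    start = time.perf_counter()
    await asyncio.gather(*(fetch_records(cpu_cost_s) for _ in range(calls)))
    return calls / (time.perf_counter() - start)
```

With the sleeps overlapping across calls, halving `cpu_cost_s` changes total wall time only marginally, so measured throughput stays roughly flat.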
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import asyncio # used to run async functions
from unittest.mock import AsyncMock, MagicMock
import pytest # used for our unit tests
from app.connectors.services.base_arango_service import BaseArangoService
# --- Function under test ---
# (BaseArangoService.get_records is copied exactly as provided above.)
# For testing we mock the dependencies: logger, arango_client, config_service,
# kafka_service, and especially db.aql.execute. We use MagicMock for sync
# dependencies and AsyncMock for async ones if needed.
class DummyLogger:
    def __init__(self):
self.infos = []
self.errors = []
def info(self, msg):
self.infos.append(msg)
def error(self, msg):
self.errors.append(msg)
class DummyDbAql:
    def __init__(self, main_result=None, count_result=None, filters_result=None):
self._main_result = main_result if main_result is not None else []
self._count_result = count_result if count_result is not None else [0]
self._filters_result = filters_result if filters_result is not None else [{}]
self.calls = []
def execute(self, query, bind_vars=None):
# Record the call for inspection
self.calls.append((query, bind_vars))
# Return based on query content
if "RETURN kbCount + connectorCount + uniqueNewPermissionCount" in query:
return self._count_result
elif "RETURN {" in query and "recordTypes:" in query:
return self._filters_result
else:
return self._main_result
class DummyDb:
    def __init__(self, main_result=None, count_result=None, filters_result=None):
self.aql = DummyDbAql(main_result, count_result, filters_result)
class DummyArangoClient:
pass
class DummyConfigService:
pass
class DummyKafkaService:
pass
# Helper to create a BaseArangoService instance with dummy dependencies
def make_service(main_result=None, count_result=None, filters_result=None):
logger = DummyLogger()
arango_client = DummyArangoClient()
config_service = DummyConfigService()
kafka_service = DummyKafkaService()
service = BaseArangoService(logger, arango_client, config_service, kafka_service)
service.db = DummyDb(main_result, count_result, filters_result)
return service
# =======================
# BASIC TEST CASES
# =======================
@pytest.mark.asyncio
async def test_get_records_basic_returns_expected_tuple():
"""Test basic async/await behavior and tuple structure."""
# Setup: one record, count=1, filters with some values
main_result = [
{
"id": "rec1",
"recordName": "Doc1",
"recordType": "FILE",
"origin": "UPLOAD",
"connectorName": "KNOWLEDGE_BASE",
"permission": {"role": "OWNER", "type": "USER"},
"kb": {"id": "kb1", "name": "KnowledgeBase1"},
"fileRecord": {"id": "file1", "name": "File1.pdf"},
"mailRecord": None,
}
]
count_result = [1]
filters_result = [{
"recordTypes": ["FILE"],
"origins": ["UPLOAD"],
"connectors": ["KNOWLEDGE_BASE"],
"indexingStatus": ["INDEXED"],
"permissions": ["OWNER"]
}]
service = make_service(main_result, count_result, filters_result)
# Await the async function
records, count, available_filters = await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=10,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
@pytest.mark.asyncio
async def test_get_records_basic_empty_results():
"""Test that function returns empty results if no records found."""
service = make_service(main_result=[], count_result=[0], filters_result=[{}])
records, count, available_filters = await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=10,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
@pytest.mark.asyncio
async def test_get_records_basic_async_await_behavior():
"""Test that the function returns a coroutine and can be awaited."""
service = make_service(main_result=[], count_result=[0], filters_result=[{}])
    # Should be a coroutine before awaiting
    coro = service.get_records(
        user_id="user1", org_id="org1", skip=0, limit=10,
        search=None, record_types=None, origins=None, connectors=None,
        indexing_status=None, permissions=None, date_from=None, date_to=None,
        sort_by="createdAtTimestamp", sort_order="asc", source="all"
    )
result = await coro
# =======================
# EDGE TEST CASES
# =======================
@pytest.mark.asyncio
async def test_get_records_edge_concurrent_execution():
"""Test concurrent execution of multiple get_records calls."""
service = make_service(
main_result=[{"id": "rec1"}],
count_result=[1],
filters_result=[{"recordTypes": ["FILE"], "origins": ["UPLOAD"], "connectors": ["KNOWLEDGE_BASE"], "indexingStatus": ["INDEXED"], "permissions": ["OWNER"]}]
)
# Run 5 concurrent calls with different skip values
tasks = [
service.get_records(
user_id=f"user{i}", org_id="org1", skip=i, limit=1,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
for i in range(5)
]
results = await asyncio.gather(*tasks)
# Each result should be a tuple and have the correct structure
for records, count, available_filters in results:
pass
@pytest.mark.asyncio
async def test_get_records_edge_exception_handling():
"""Test that function handles exceptions and returns empty results."""
# Simulate db.aql.execute raising an exception
class ExceptionAql:
def execute(self, query, bind_vars=None):
raise RuntimeError("DB error")
class ExceptionDb:
        def __init__(self):
self.aql = ExceptionAql()
service = make_service()
service.db = ExceptionDb()
records, count, available_filters = await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=10,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
@pytest.mark.asyncio
async def test_get_records_edge_permissions_filtering():
"""Test that permissions filter disables KB records if no matching roles."""
# permissions that do not intersect with base_kb_roles
service = make_service(main_result=[], count_result=[0], filters_result=[{}])
records, count, available_filters = await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=10,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=["NOT_A_ROLE"], date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
@pytest.mark.asyncio
async def test_get_records_edge_date_filters():
"""Test that date_from and date_to are passed as bind vars and handled."""
# Setup service and inspect bind_vars
service = make_service(main_result=[{"id": "rec1"}], count_result=[1], filters_result=[{"recordTypes": ["FILE"]}])
await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=1,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=1700000000, date_to=1800000000,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
# Check that date_from and date_to were passed in bind_vars
main_call = service.db.aql.calls[0]
bind_vars = main_call[1]
# =======================
# LARGE SCALE TEST CASES
# =======================
@pytest.mark.asyncio
async def test_get_records_large_scale_many_records():
"""Test function with a large number of records (up to 500)."""
main_result = [{"id": f"rec{i}", "recordName": f"Doc{i}"} for i in range(500)]
count_result = [500]
filters_result = [{"recordTypes": ["FILE", "MAIL"], "origins": ["UPLOAD", "CONNECTOR"], "connectors": ["KNOWLEDGE_BASE"], "indexingStatus": ["INDEXED"], "permissions": ["OWNER", "READER"]}]
service = make_service(main_result, count_result, filters_result)
records, count, available_filters = await service.get_records(
user_id="user1", org_id="org1", skip=0, limit=500,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
@pytest.mark.asyncio
async def test_get_records_large_scale_concurrent_load():
"""Test concurrent execution with moderate load."""
main_result = [{"id": f"rec{i}"} for i in range(100)]
count_result = [100]
filters_result = [{"recordTypes": ["FILE"]}]
service = make_service(main_result, count_result, filters_result)
# Run 20 concurrent calls
tasks = [
service.get_records(
user_id=f"user{i}", org_id="org1", skip=0, limit=100,
search=None, record_types=None, origins=None, connectors=None,
indexing_status=None, permissions=None, date_from=None, date_to=None,
sort_by="createdAtTimestamp", sort_order="asc", source="all"
)
for i in range(20)
]
results = await asyncio.gather(*tasks)
for records, count, available_filters in results:
pass
# =======================
# THROUGHPUT TEST CASES
# =======================
#------------------------------------------------
import asyncio # used to run async functions
from unittest.mock import AsyncMock, MagicMock
import pytest # used for our unit tests
from app.connectors.services.base_arango_service import BaseArangoService
# The function to test (copied exactly as provided)
# ... (see above for the full BaseArangoService.get_records definition) ...

# Helper class to simulate ArangoDB's AQL execution
class DummyAQL:
    def __init__(self, responses):
        # responses: dict mapping query string to response
        self.responses = responses

    def execute(self, query, bind_vars=None):
        # Minimal stand-in for the truncated original: return the canned
        # response registered for this query text, defaulting to empty
        return self.responses.get(query, [])
class DummyDB:
    def __init__(self, responses):
        self.aql = DummyAQL(responses)

# Dummy logger for testing
class DummyLogger:
    def __init__(self):
        self.infos = []
        self.errors = []
# Dummy config service and kafka service (not used in get_records)
class DummyConfigService:
pass
class DummyKafkaService:
pass
@pytest.fixture
def base_arango_service_factory():
def factory(responses):
logger = DummyLogger()
arango_client = MagicMock()
config_service = DummyConfigService()
kafka_service = DummyKafkaService()
service = BaseArangoService(
logger=logger,
arango_client=arango_client,
config_service=config_service,
kafka_service=kafka_service,
)
service.db = DummyDB(responses)
return service
return factory
# ========== BASIC TEST CASES ==========
To edit these changes, run `git checkout codeflash/optimize-BaseArangoService.get_records-mhxiis50` and push.