Conversation
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (60.43%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #377 +/- ##
==========================================
+ Coverage 56.63% 56.97% +0.34%
==========================================
Files 55 56 +1
Lines 4953 4953
Branches 431 446 +15
==========================================
+ Hits 2805 2822 +17
+ Misses 2120 2090 -30
- Partials 28 41 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR advances COMPASS’s plugin architecture by introducing a centralized plugin registry, refactoring ordinance plugins into more generic base classes, and enabling prompt-driven collector/extractor implementations to reduce duplicated extraction logic across technologies.
Changes:
- Added a plugin registry (
PLUGIN_REGISTRY+register_plugin) and migrated the processing runner to resolve plugins via the registry. - Refactored ordinance plugin framework to support prompt-chain collectors/extractors and centralized configuration/validation.
- Updated threaded cleaned-text writing to be driven by per-plugin file output registration, and adjusted unit tests/docs accordingly.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
compass/plugin/registry.py |
Introduces plugin registration and the global plugin registry used by the runner. |
compass/plugin/ordinance.py |
Adds generalized ordinance plugin base + prompt-driven collector/extractor implementations; registers cleaned output filenames. |
compass/plugin/interface.py |
Renames/splits plugin base responsibilities (filtered pipeline base separated from ordinance-specific parsing). |
compass/plugin/base.py |
Adds JURISDICTION_DATA_FP hook and a base validate_plugin_configuration entrypoint. |
compass/plugin/__init__.py |
Re-exports new plugin framework symbols and registry utilities. |
compass/scripts/process.py |
Switches from hardcoded extractor registry to PLUGIN_REGISTRY for tech→plugin resolution. |
compass/services/threaded.py |
Makes cleaned text outputs configurable via CLEANED_FP_REGISTRY keyed by plugin tech. |
compass/validation/content.py |
Extends chunk-validation callback signature to support prompt-driven validation calls. |
compass/utilities/jurisdictions.py |
Removes TX water districts CSV from default registry (now added via plugin registration). |
compass/extraction/wind/plugin.py |
Migrates wind plugin to OrdinanceExtractionPlugin and registers it. |
compass/extraction/wind/ordinance.py |
Converts wind ordinance collectors/extractors to prompt-driven implementations. |
compass/extraction/solar/plugin.py |
Migrates solar plugin to OrdinanceExtractionPlugin and registers it. |
compass/extraction/solar/ordinance.py |
Converts solar ordinance collectors/extractors to prompt-driven implementations. |
compass/extraction/small_wind/plugin.py |
Migrates small-wind plugin to OrdinanceExtractionPlugin and registers it. |
compass/extraction/small_wind/ordinance.py |
Converts small-wind ordinance collectors/extractors to prompt-driven implementations. |
compass/extraction/water/plugin.py |
Registers the TX water rights plugin and supplies its jurisdiction dataset path. |
compass/extraction/__init__.py |
Temporarily imports plugins to force registration via side effects. |
compass/__init__.py |
Temporarily imports plugins at package import time to force registration. |
docs/source/conf.py |
Updates Sphinx crossrefs and adds nitpick ignores for moved/changed symbols. |
tests/python/unit/services/test_services_threaded.py |
Updates cleaned-file writer tests for the new registry-driven output behavior. |
tests/python/unit/plugin/test_plugin_ordinances.py |
Updates ordinance plugin validation tests for the new ordinance plugin base classes. |
| self._validate_collector_prompts() | ||
| self._validate_collector_prompts() | ||
| self._register_clean_file_names() |
There was a problem hiding this comment.
OrdinanceExtractionPlugin.validate_plugin_configuration() calls _validate_collector_prompts() twice, and _validate_collector_prompts is also defined twice (the second definition overwrites the first). As a result, prompts for PromptBasedTextCollector subclasses are never validated, and the second call is redundant. Split these into two distinct methods (e.g., _validate_text_collector_prompts and _validate_text_extractor_prompts) and call each once from validate_plugin_configuration.
| self._validate_collector_prompts() | |
| self._validate_collector_prompts() | |
| self._register_clean_file_names() | |
| self._validate_text_collector_prompts() | |
| self._validate_text_extractor_prompts() | |
| self._register_clean_file_names() | |
| def _validate_text_collector_prompts(self): | |
| """Validate prompts for text collectors""" | |
| self._validate_collector_prompts() | |
| def _validate_text_extractor_prompts(self): | |
| """Validate prompts for text extractors""" | |
| self._validate_collector_prompts() |
| fp_names = { | ||
| "relevant_text": "{jurisdiction} Ordinance Original text.txt", | ||
| "cleaned_text_for_extraction": "{jurisdiction} Cleaned Text.txt", | ||
| "districts_text": "{jurisdiction} Districts.txt", | ||
| } | ||
|
|
||
| CLEANED_FP_REGISTRY["cleaned_file_test"] = fp_names | ||
| outputs = threaded._write_cleaned_file( | ||
| doc, tmp_path, jurisdiction_name="Sample Jurisdiction" | ||
| doc, | ||
| tmp_path, | ||
| tech="cleaned_file_test", | ||
| jurisdiction_name="Sample Jurisdiction", |
There was a problem hiding this comment.
This test mutates the global CLEANED_FP_REGISTRY without restoring it, which can leak state across tests (especially with re-ordering or xdist). Use monkeypatch to set the registry entry and ensure cleanup (or delete the key at the end of the test).
| if plugin_class.JURISDICTION_DATA_FP is not None: | ||
| KNOWN_JURISDICTIONS_REGISTRY.add(plugin_class.JURISDICTION_DATA_FP) | ||
|
|
||
| plugin_instance = plugin_class(None, None) | ||
| plugin_instance.validate_plugin_configuration() | ||
|
|
||
| PLUGIN_REGISTRY[plugin_class.IDENTIFIER.casefold()] = plugin_class |
There was a problem hiding this comment.
register_plugin will silently overwrite an existing entry if another plugin registers with the same IDENTIFIER (case-insensitive). Consider raising COMPASSPluginConfigurationError when a duplicate identifier is detected to avoid non-deterministic behavior based on import order.
| # Temporarily import to register plugins | ||
| # Can drop once plugins register themselves | ||
| from .extraction import ( | ||
| COMPASSWindExtractor, | ||
| COMPASSSolarExtractor, | ||
| COMPASSSmallWindExtractor, | ||
| TexasWaterRightsExtractor, | ||
| ) |
There was a problem hiding this comment.
Importing all extraction plugins in compass.__init__ to force registration adds significant import-time side effects (loads pandas/LLM-related modules, runs register_plugin, mutates registries) for any consumer that imports any compass.* module. Prefer an explicit plugin registration entrypoint (e.g., compass.plugin.discover_plugins() called by the CLI) or lazy registration to keep library imports lightweight and reduce circular-import risk.
| """Validate that all text collectors have prompts defined""" | ||
|
|
||
| for collector in self.TEXT_COLLECTORS: | ||
| if not issubclass(collector, PromptBasedTextCollector): | ||
| continue | ||
| try: | ||
| num_prompts = len(collector.PROMPTS) | ||
| except NotImplementedError: | ||
| msg = ( | ||
| f"Text collector {self.__class__.__name__} is missing " | ||
| "required property 'PROMPTS'" | ||
| ) | ||
| raise COMPASSPluginConfigurationError(msg) from None | ||
|
|
||
| if num_prompts == 0: | ||
| msg = ( | ||
| f"Text collector {self.__class__.__name__} has an empty " | ||
| "'PROMPTS' property! Please provide at least one prompt " | ||
| "dictionary." | ||
| ) | ||
| raise COMPASSPluginConfigurationError(msg) | ||
|
|
||
| def _validate_collector_prompts(self): |
There was a problem hiding this comment.
This assignment to '_validate_collector_prompts' is unnecessary as it is redefined before this value is used.
| """Validate that all text collectors have prompts defined""" | |
| for collector in self.TEXT_COLLECTORS: | |
| if not issubclass(collector, PromptBasedTextCollector): | |
| continue | |
| try: | |
| num_prompts = len(collector.PROMPTS) | |
| except NotImplementedError: | |
| msg = ( | |
| f"Text collector {self.__class__.__name__} is missing " | |
| "required property 'PROMPTS'" | |
| ) | |
| raise COMPASSPluginConfigurationError(msg) from None | |
| if num_prompts == 0: | |
| msg = ( | |
| f"Text collector {self.__class__.__name__} has an empty " | |
| "'PROMPTS' property! Please provide at least one prompt " | |
| "dictionary." | |
| ) | |
| raise COMPASSPluginConfigurationError(msg) | |
| def _validate_collector_prompts(self): |
Generalized plugin classes a little bit as well as add prompt-driven plugin implementations. Also added a plugin registry.
This PR brings us about 90% of the way to the full plugin architecture. We are now also in a position where we can easily implement 1-shot extraction