-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/pipeline simpler fitting #36
Conversation
def get_max_length(self) -> int | None: | ||
return self.vector_index_client.embedder_max_length | ||
|
||
def get_dump_dir(self) -> Path | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Сделай get... просто как property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
context.config_logs(self.logging_config) | ||
context.config_vector_index(self.vector_index_config, self.embedder_config) | ||
|
||
self.optimize(context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Мб лучше это сделать как init для оптимизатора, а потом он сам по себе будет оптимизировать?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Предлагаешь ещё один класс создать, чтобы у него был свой инит, который бы создавал контекст либо принимал существующий?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Предлагаю оставить такой метод только у самого context
# Conflicts: # autointent/context/optimization_info/data_models.py # autointent/context/optimization_info/optimization_info.py # autointent/pipeline/inference/inference_pipeline.py # autointent/pipeline/optimization/pipeline_optimizer.py
* tess: added inference_test * test: added inference pipeline cli * test: fixed device * test: added optimization tests * fix `inference_config.yaml` not found error --------- Co-authored-by: voorhs <ilya_alekseev_2016@list.ru>
пр готов к мерджу, жду только ревью от кого-нибудь |
from .data_handler import Dataset | ||
|
||
|
||
class NumpyEncoder(json.JSONEncoder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Как будто этот класс больше не нужен
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
он используется еще в Context.dump()
и infererence.cli_enpoint.main()
|
||
|
||
@dataclass | ||
class ModulesList: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
А может везде Pydantic сделаем. Есть ли минусы?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Конкретно в данном месте pydantic не получилось использовать из-за сложной схемы с тайпингом и проблемой circular import. Была ошибка, что ещё не определен объект Module. Как только я заменил на датакласс, ошибка пропала
def get_max_length(self) -> int | None: | ||
return self.vector_index_client.embedder_max_length | ||
|
||
def get_dump_dir(self) -> Path | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -52,3 +52,6 @@ def predict(self, *args: list[str] | npt.NDArray[Any], **kwargs: dict[str, Any]) | |||
@abstractmethod | |||
def from_context(cls, context: Context, **kwargs: dict[str, Any]) -> Self: | |||
pass | |||
|
|||
def get_embedder_name(self) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Тогда можно будет убрать все переопределения этого метода.
def get_embedder_name(self) -> str | None:
if hasattr(self, "embedder_name"):
return getattr(self, "embedder_name", None)
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Пока хочу пожить с такой версией. Просто боюсь вдруг понадобится в одном из потомков как-то поменять название embedder_name
. Если даже в следующих релизах не понадобится, то уберем
помечу эту функцию в базовом Module
как экспериментальную
context.vector_index_client.delete_db() | ||
|
||
def optimize_from_dataset( | ||
self, train_data: Dataset, val_data: Dataset | None = None, force_multilabel: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
train_dataset
и test_dataset
в моем понимании лучше
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В будущем ближе к релизу мб исправим. Проблем с неймингами много
self.vector_index_config = VectorIndexConfig() | ||
self.embedder_config = EmbedderConfig() | ||
|
||
def set_config(self, config: LoggingConfig | VectorIndexConfig | EmbedderConfig) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Можно так написать для красоты. Только сообщение для ошибки вынести в отдельную переменную, иначе ruff падает.
def set_config(self, config: LoggingConfig | VectorIndexConfig | EmbedderConfig) -> None:
match config:
case LoggingConfig():
self.logging_config = config
case VectorIndexConfig():
self.vector_index_config = config
case EmbedderConfig():
self.embedder_config = config
case _:
raise TypeError("unknown config type")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
жесть не знал что в питоне есть свой switch-case...
augmenter=augmenter, | ||
) | ||
|
||
def set_datasets( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Метод set_datasets
, а на вход ..._data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
не оч понял
autointent/context/context.py
Outdated
self.seed = seed | ||
self._logger = logging.getLogger(__name__) | ||
|
||
def config_logs(self, config: LoggingConfig) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Я бы подобные методы называл configure_logging
или setup_logging
cfg.embedder.max_length, | ||
) | ||
context = Context(cfg.seed) | ||
context.config_logs(cfg.logs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logs
-> logging_config
и т. д.
Тем более так сделано в классе PipelineOptimizer
.
|
||
def predict(self, utterances: list[str]) -> list[LabelType]: | ||
scores = self.nodes[NodeType.scoring].module.predict(utterances) | ||
return self.nodes[NodeType.prediction].module.predict(scores) # type: ignore[return-value] | ||
|
||
def fit(self, utterances: list[str], labels: list[LabelType]) -> None: | ||
pass | ||
|
||
@classmethod | ||
def from_context(cls, context: Context) -> "InferencePipeline": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Рома пишет -> Self
. Надо договориться
Лаконичный пример работы с новым апи для оптимизации пайплайна.
Еще из фич:
Context
теперь не такая громоздкаяlogs.dump_modules=False
в конфигеTODO: