This repository was archived by the owner on Jul 4, 2025. It is now read-only.
Merged
Changes from all commits
Commits
28 commits
a4d190d
feat: AMD hardware API (#1797)
vansangpfiev Jan 14, 2025
763026e
fix: add cpu usage (#1868)
vansangpfiev Jan 17, 2025
cbfe6c7
Merge pull request #1914 from janhq/chore/cherry-pick-amd-hw
vansangpfiev Feb 3, 2025
5d355c8
fix: PATCH method for Thread and Messages management (#1923)
vansangpfiev Feb 4, 2025
3e8572d
fix: ignore compute_cap if not present (#1866)
vansangpfiev Jan 17, 2025
a57d392
Merge pull request #1924 from janhq/chore/cherry-pick-dev
vansangpfiev Feb 5, 2025
5dee826
fix: models.cc: symlinked model deletion shouldn't remove original fi…
ohaiibuzzle Feb 5, 2025
e435d54
Merge pull request #1933 from janhq/chore/cherry-pick-dev
vansangpfiev Feb 5, 2025
e48a0b4
fix: correct gpu info list (#1944)
vansangpfiev Feb 10, 2025
36a26cf
fix: gpu: filter out llvmpipe
sangjanai Feb 10, 2025
4be3a74
Merge pull request #1949 from janhq/fix/filter-out-llvmpipe
vansangpfiev Feb 10, 2025
cdb2446
fix: add vendor in gpu info (#1952)
vansangpfiev Feb 11, 2025
8b9f6f4
fix: correct get server name method (#1953)
vansangpfiev Feb 11, 2025
2d74836
fix: map nvidia and vulkan uuid (#1954)
vansangpfiev Feb 11, 2025
0dc62ee
fix: permission issue for default drogon uploads folder (#1870)
vansangpfiev Feb 3, 2025
7ff92ce
chore: change timeout
sangjanai Feb 11, 2025
36222f1
Merge pull request #1955 from janhq/chore/cherry-pick-bugfix
vansangpfiev Feb 11, 2025
4eba4ee
fix: make get hardware info function thread-safe (#1956)
vansangpfiev Feb 11, 2025
eb28d51
fix: cache data for gpu information (#1959)
vansangpfiev Feb 12, 2025
257573b
fix: handle path with space (#1963)
vansangpfiev Feb 13, 2025
0991133
fix: unload engine before updating (#1970)
vansangpfiev Feb 14, 2025
c18c650
fix: auto-reload model for remote engine (#1971)
vansangpfiev Feb 14, 2025
df4b1a2
fix: use updated configuration for remote model when reload (#1972)
vansangpfiev Feb 15, 2025
111a657
fix: correct engine interface order (#1974)
vansangpfiev Feb 17, 2025
26466d3
fix: improve error handling for remote engine (#1975)
vansangpfiev Feb 17, 2025
0127d84
fix: temporarily remove model setting recommendation (#1977)
vansangpfiev Feb 18, 2025
c4082e5
Merge branch 'main' of https://github.com/janhq/cortex.cpp into s/cho…
sangjanai Feb 18, 2025
9f030c7
Merge branch 'dev' into s/chore/sync-main
vansangpfiev Feb 18, 2025
8 changes: 3 additions & 5 deletions docs/docs/architecture/cortex-db.mdx
@@ -15,15 +15,14 @@ import TabItem from "@theme/TabItem";
This document outlines the Cortex database architecture, which is designed to store and manage models, engines,
files and more.

## Tables Structure

## Table Structure
### schema Table

The `schema` table is designed to hold the schema version for the Cortex database. Below is the structure of the table:

| Column Name | Data Type | Description |
|--------------------|-----------|---------------------------------------------------------|
| version | INTEGER | A unique schema version for database. |
| schema_version | INTEGER | A unique schema version for database. |


### models Table
The `models` table is designed to hold metadata about various AI models. Below is the structure of the table:
@@ -53,7 +52,6 @@ The `hardware` table is designed to hold metadata about hardware information. Be
| activated | INTEGER | A boolean value (0 or 1) indicating whether the hardware is activated or not. |
| priority | INTEGER | An integer value representing the priority associated with the hardware. |


### engines Table
The `engines` table is designed to hold metadata about the different engines available for usage with Cortex.
Below is the structure of the table:
1 change: 0 additions & 1 deletion engine/CMakeLists.txt
@@ -73,7 +73,6 @@ if(CMAKE_BUILD_INJA_TEST)
add_subdirectory(examples/inja)
endif()


find_package(jsoncpp CONFIG REQUIRED)
find_package(Drogon CONFIG REQUIRED)
find_package(yaml-cpp CONFIG REQUIRED)
12 changes: 6 additions & 6 deletions engine/cli/commands/server_start_cmd.cc
@@ -66,16 +66,16 @@ bool ServerStartCmd::Exec(const std::string& host, int port,
si.cb = sizeof(si);
ZeroMemory(&pi, sizeof(pi));
std::wstring params = L"--start-server";
params += L" --config_file_path " +
file_manager_utils::GetConfigurationPath().wstring();
params += L" --data_folder_path " +
file_manager_utils::GetCortexDataPath().wstring();
params += L" --config_file_path \"" +
file_manager_utils::GetConfigurationPath().wstring() + L"\"";
params += L" --data_folder_path \"" +
file_manager_utils::GetCortexDataPath().wstring() + L"\"";
params += L" --loglevel " + cortex::wc::Utf8ToWstring(log_level_);
std::wstring exe_w = cortex::wc::Utf8ToWstring(exe);
std::wstring current_path_w =
file_manager_utils::GetExecutableFolderContainerPath().wstring();
std::wstring wcmds = current_path_w + L"/" + exe_w + L" " + params;
CTL_DBG("wcmds: " << wcmds);
std::wstring wcmds = current_path_w + L"\\" + exe_w + L" " + params;
CTL_INF("wcmds: " << wcmds);
std::vector<wchar_t> mutable_cmds(wcmds.begin(), wcmds.end());
mutable_cmds.push_back(L'\0');
// Create child process
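The hunk above both quotes the two path arguments and switches the joiner to a backslash; without the quotes, CreateProcessW-style command-line tokenization splits a path such as `C:\Program Files\...` at the space. A minimal sketch of the quoting logic (helper names here are illustrative, not the project's API):

```cpp
#include <string>

// Wrap a path in double quotes so CreateProcessW-style command-line
// tokenization keeps it as one argument even when it contains spaces
// (e.g. C:\Program Files\...).
std::wstring QuoteArg(const std::wstring& path) {
  return L"\"" + path + L"\"";
}

// Build the child-process parameter string the way the fixed code does.
std::wstring BuildParams(const std::wstring& config_path,
                         const std::wstring& data_path) {
  std::wstring params = L"--start-server";
  params += L" --config_file_path " + QuoteArg(config_path);
  params += L" --data_folder_path " + QuoteArg(data_path);
  return params;
}
```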
6 changes: 5 additions & 1 deletion engine/common/hardware_common.h
@@ -79,6 +79,7 @@ struct GPU {
int64_t total_vram;
std::string uuid;
bool is_activated = true;
std::string vendor;
};

inline Json::Value ToJson(const std::vector<GPU>& gpus) {
@@ -100,7 +101,10 @@ inline Json::Value ToJson(const std::vector<GPU>& gpus) {
gpu["total_vram"] = gpus[i].total_vram;
gpu["uuid"] = gpus[i].uuid;
gpu["activated"] = gpus[i].is_activated;
res.append(gpu);
gpu["vendor"] = gpus[i].vendor;
if (gpus[i].total_vram > 0) {
res.append(gpu);
}
}
return res;
}
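The new `vendor` field and the `total_vram > 0` guard work together with the llvmpipe commit above: software rasterizers report no dedicated VRAM and would otherwise appear as phantom GPUs. A reduced sketch of the filter, with the jsoncpp serialization omitted (struct fields mirror the diff, the helper name is illustrative):

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct GPU {
  std::string id;
  std::string name;
  std::string vendor;      // new field from the diff: "NVIDIA", "AMD", ...
  int64_t total_vram = 0;  // MiB
  bool is_activated = true;
};

// Keep only GPUs that report usable VRAM; software rasterizers such as
// llvmpipe report 0 and are dropped, as in the ToJson change above.
std::vector<GPU> FilterUsableGpus(const std::vector<GPU>& gpus) {
  std::vector<GPU> out;
  for (const auto& g : gpus) {
    if (g.total_vram > 0) out.push_back(g);
  }
  return out;
}
```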
5 changes: 5 additions & 0 deletions engine/controllers/engines.cc
@@ -375,17 +375,21 @@ void Engines::UpdateEngine(
metadata = (*exist_engine).metadata;
}

(void)engine_service_->UnloadEngine(engine);

auto upd_res =
engine_service_->UpsertEngine(engine, type, api_key, url, version,
"all-platforms", status, metadata);
if (upd_res.has_error()) {
Json::Value res;
res["message"] = upd_res.error();
CTL_WRN("Error: " << upd_res.error());
auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
resp->setStatusCode(k400BadRequest);
callback(resp);
} else {
Json::Value res;
CTL_INF("Remote Engine update successfully!");
res["message"] = "Remote Engine update successfully!";
auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
resp->setStatusCode(k200OK);
@@ -394,6 +398,7 @@
} else {
Json::Value res;
res["message"] = "Request body is empty!";
CTL_WRN("Error: Request body is empty!");
auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
resp->setStatusCode(k400BadRequest);
callback(resp);
9 changes: 5 additions & 4 deletions engine/controllers/models.cc
@@ -218,10 +218,11 @@ void Models::ListModel(
obj["id"] = model_entry.model;
obj["model"] = model_entry.model;
obj["status"] = "downloaded";
auto es = model_service_->GetEstimation(model_entry.model);
if (es.has_value() && !!es.value()) {
obj["recommendation"] = hardware::ToJson(*(es.value()));
}
// TODO(sang) Temporarily remove this estimation
// auto es = model_service_->GetEstimation(model_entry.model);
// if (es.has_value() && !!es.value()) {
// obj["recommendation"] = hardware::ToJson(*(es.value()));
// }
data.append(std::move(obj));
yaml_handler.Reset();
} else if (model_config.engine == kPythonEngine) {
6 changes: 3 additions & 3 deletions engine/cortex-common/EngineI.h
@@ -59,14 +59,14 @@ class EngineI {
const std::string& log_path) = 0;
virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;

// Stop inflight chat completion in stream mode
virtual void StopInferencing(const std::string& model_id) = 0;

virtual Json::Value GetRemoteModels() = 0;
virtual void HandleRouteRequest(
std::shared_ptr<Json::Value> json_body,
std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
virtual void HandleInference(
std::shared_ptr<Json::Value> json_body,
std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

// Stop inflight chat completion in stream mode
virtual void StopInferencing(const std::string& model_id) = 0;
};
11 changes: 9 additions & 2 deletions engine/extensions/remote-engine/remote_engine.cc
@@ -29,8 +29,13 @@ size_t StreamWriteCallback(char* ptr, size_t size, size_t nmemb,
CTL_DBG(chunk);
Json::Value check_error;
Json::Reader reader;
if (reader.parse(chunk, check_error)) {
context->chunks += chunk;
if (reader.parse(context->chunks, check_error) ||
(reader.parse(chunk, check_error) &&
chunk.find("error") != std::string::npos)) {
CTL_WRN(context->chunks);
CTL_WRN(chunk);
CTL_INF("Request: " << context->last_request);
Json::Value status;
status["is_done"] = true;
status["has_error"] = true;
@@ -143,7 +148,9 @@ CurlResponse RemoteEngine::MakeStreamingChatCompletionRequest(
"",
config.model,
renderer_,
stream_template};
stream_template,
true,
body};

curl_easy_setopt(curl, CURLOPT_URL, full_url.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
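The callback change accumulates data in `context->chunks` because an upstream error payload can arrive split across several libcurl write callbacks, so parsing a single chunk in isolation would miss it. A simplified stand-in for that accumulate-then-check logic (a brace-balance scan replaces the jsoncpp parse the real callback uses; names are illustrative):

```cpp
#include <string>

struct StreamContext {
  std::string chunks;        // accumulated response body, as in the diff
  std::string last_request;  // logged when an error is detected
};

// Append each chunk before checking, so an error object split across
// callbacks is still caught. Treats the buffer as an error once it holds
// a brace-balanced object that mentions "error".
bool AccumulateAndCheckError(StreamContext& ctx, const std::string& chunk) {
  ctx.chunks += chunk;
  int depth = 0;
  for (char c : ctx.chunks) {
    if (c == '{') ++depth;
    else if (c == '}') --depth;
  }
  return depth == 0 && !ctx.chunks.empty() &&
         ctx.chunks.find("error") != std::string::npos;
}
```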
2 changes: 2 additions & 0 deletions engine/extensions/remote-engine/remote_engine.h
@@ -25,6 +25,8 @@ struct StreamContext {
extensions::TemplateRenderer& renderer;
std::string stream_template;
bool need_stop = true;
std::string last_request;
std::string chunks;
};
struct CurlResponse {
std::string body;
2 changes: 1 addition & 1 deletion engine/services/engine_service.cc
@@ -870,10 +870,10 @@ cpp::result<void, std::string> EngineService::UnloadEngine(
auto unload_opts = EngineI::EngineUnloadOption{};
e->Unload(unload_opts);
delete e;
engines_.erase(ne);
} else {
delete std::get<RemoteEngineI*>(engines_[ne].engine);
}
engines_.erase(ne);

CTL_DBG("Engine unloaded: " + ne);
return {};
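Moving `engines_.erase(ne)` out of the local-engine branch matters because the remote path previously deleted the engine object but left its dangling pointer in the map, setting up a use-after-free on the next lookup. A reduced illustration of the unconditional-erase pattern (types simplified, names illustrative):

```cpp
#include <map>
#include <string>

struct Engine {};  // stand-in for the engine interface

// Erase must happen for every engine kind; deleting the object while
// leaving the map entry behind would dangle, which is what the fix above
// addresses by making the erase unconditional.
bool UnloadEngine(std::map<std::string, Engine*>& engines,
                  const std::string& name) {
  auto it = engines.find(name);
  if (it == engines.end()) return false;
  delete it->second;
  engines.erase(it);  // unconditional: local and remote engines alike
  return true;
}
```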
21 changes: 12 additions & 9 deletions engine/services/hardware_service.cc
@@ -38,6 +38,7 @@ bool TryConnectToServer(const std::string& host, int port) {

HardwareInfo HardwareService::GetHardwareInfo() {
// append active state
std::lock_guard<std::mutex> l(mtx_);
auto gpus = cortex::hw::GetGPUInfo();
auto res = db_service_->LoadHardwareList();
if (res.has_value()) {
@@ -63,7 +64,8 @@ bool HardwareService::Restart(const std::string& host, int port) {
namespace luh = logging_utils_helper;
if (!ahc_)
return true;
auto exe = commands::GetCortexServerBinary();
auto exe = file_manager_utils::Subtract(
file_manager_utils::GetExecutablePath(), cortex_utils::GetCurrentPath());
auto get_config_file_path = []() -> std::string {
if (file_manager_utils::cortex_config_file_path.empty()) {
return file_manager_utils::GetConfigurationPath().string();
@@ -144,16 +146,17 @@
ZeroMemory(&pi, sizeof(pi));
// TODO (sang) write a common function for this and server_start_cmd
std::wstring params = L"--ignore_cout";
params += L" --config_file_path " +
file_manager_utils::GetConfigurationPath().wstring();
params += L" --data_folder_path " +
file_manager_utils::GetCortexDataPath().wstring();
params += L" --config_file_path \"" +
file_manager_utils::GetConfigurationPath().wstring() + L"\"";
params += L" --data_folder_path \"" +
file_manager_utils::GetCortexDataPath().wstring() + L"\"";
params += L" --loglevel " +
cortex::wc::Utf8ToWstring(luh::LogLevelStr(luh::global_log_level));
std::wstring exe_w = cortex::wc::Utf8ToWstring(exe);
std::wstring exe_w = exe.wstring();
std::wstring current_path_w =
file_manager_utils::GetExecutableFolderContainerPath().wstring();
std::wstring wcmds = current_path_w + L"/" + exe_w + L" " + params;
std::wstring wcmds = current_path_w + L"\\" + exe_w + L" " + params;
CTL_DBG("wcmds: " << wcmds);
std::vector<wchar_t> mutable_cmds(wcmds.begin(), wcmds.end());
mutable_cmds.push_back(L'\0');
// Create child process
@@ -185,7 +188,7 @@ bool HardwareService::Restart(const std::string& host, int port) {
auto dylib_path_mng = std::make_shared<cortex::DylibPathManager>();
auto db_srv = std::make_shared<DatabaseService>();
EngineService(download_srv, dylib_path_mng, db_srv).RegisterEngineLibPath();
std::string p = cortex_utils::GetCurrentPath() + "/" + exe;
std::string p = cortex_utils::GetCurrentPath() / exe;
commands.push_back(p);
commands.push_back("--ignore_cout");
commands.push_back("--config_file_path");
@@ -486,7 +489,7 @@ std::vector<int> HardwareService::GetCudaConfig() {
// Map uuid back to nvidia id
for (auto const& uuid : uuids) {
for (auto const& ngpu : nvidia_gpus) {
if (uuid == ngpu.uuid) {
if (ngpu.uuid.find(uuid) != std::string::npos) {
res.push_back(std::stoi(ngpu.id));
}
}
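The switch from `==` to `find` in `GetCudaConfig` reconciles two UUID formats (#1954): nvidia-smi reports `GPU-<uuid>` while the Vulkan backend reports the bare `<uuid>`, so an exact comparison never matched. A sketch of the mapping under that assumption (struct fields are illustrative):

```cpp
#include <string>
#include <vector>

struct NvidiaGpu {
  std::string id;    // nvidia-smi index: "0", "1", ...
  std::string uuid;  // e.g. "GPU-8f6c8a1e-1234"
};

// Map Vulkan-reported UUIDs (no "GPU-" prefix) back to NVIDIA device ids
// via substring match, mirroring the fix above.
std::vector<int> MapUuidsToIds(const std::vector<std::string>& uuids,
                               const std::vector<NvidiaGpu>& nvidia_gpus) {
  std::vector<int> res;
  for (const auto& uuid : uuids) {
    for (const auto& g : nvidia_gpus) {
      if (g.uuid.find(uuid) != std::string::npos) {
        res.push_back(std::stoi(g.id));
      }
    }
  }
  return res;
}
```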
2 changes: 2 additions & 0 deletions engine/services/hardware_service.h
@@ -2,6 +2,7 @@
#include <stdint.h>
#include <string>
#include <vector>
#include <mutex>

#include "common/hardware_config.h"
#include "database_service.h"
@@ -39,4 +40,5 @@ class HardwareService {
private:
std::shared_ptr<DatabaseService> db_service_ = nullptr;
std::optional<cortex::hw::ActivateHardwareConfig> ahc_;
std::mutex mtx_;
};
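The added `std::mutex` serializes `GetHardwareInfo` across request threads (#1956), and a companion commit caches GPU information (#1959) so the expensive probe runs only once. A sketch of the combined lock-and-cache pattern (class, members, and the probe are illustrative, not the project's API):

```cpp
#include <mutex>
#include <string>
#include <vector>

// GetHardwareInfo-style accessor callable from multiple request threads:
// a lock_guard keeps the probe-plus-cache update atomic, and the cached
// result is reused on later calls.
class HardwareInfoCache {
 public:
  std::vector<std::string> Get() {
    std::lock_guard<std::mutex> l(mtx_);
    if (cache_.empty()) cache_ = Probe();  // expensive probe, done once
    return cache_;
  }

 private:
  std::vector<std::string> Probe() { return {"gpu0", "gpu1"}; }
  std::mutex mtx_;
  std::vector<std::string> cache_;
};
```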
18 changes: 10 additions & 8 deletions engine/services/inference_service.cc
@@ -24,8 +24,12 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
auto status = std::get<0>(ir)["status_code"].asInt();
if (status != drogon::k200OK) {
CTL_INF("Model is not loaded, start loading it: " << model_id);
auto res = LoadModel(saved_models_.at(model_id));
// ignore return result
// For remote engine, we use the updated configuration
if (engine_service_->IsRemoteEngine(engine_type)) {
(void)model_service_.lock()->StartModel(model_id, {}, false);
} else {
(void)LoadModel(saved_models_.at(model_id));
}
}
}

@@ -38,7 +42,7 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
LOG_WARN << "Engine is not loaded yet";
return cpp::fail(std::make_pair(stt, res));
}

if (!model_id.empty()) {
if (auto model_service = model_service_.lock()) {
auto metadata_ptr = model_service->GetCachedModelMetadata(model_id);
@@ -72,7 +76,6 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
}
}


CTL_DBG("Json body inference: " + json_body->toStyledString());

auto cb = [q, tool_choice](Json::Value status, Json::Value res) {
@@ -217,10 +220,9 @@ InferResult InferenceService::LoadModel(
std::get<RemoteEngineI*>(engine_result.value())
->LoadModel(json_body, std::move(cb));
}
if (!engine_service_->IsRemoteEngine(engine_type)) {
auto model_id = json_body->get("model", "").asString();
saved_models_[model_id] = json_body;
}
// Save model config to reload if needed
auto model_id = json_body->get("model", "").asString();
saved_models_[model_id] = json_body;
return std::make_pair(stt, r);
}

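Two behaviors change here: every loaded model's config is now saved regardless of engine type, and when a completion request hits an unloaded model, remote engines go through `StartModel` (so they pick up updated configuration, per #1971/#1972) while local models replay the saved load body. A reduced sketch of that branch (enum and helper are illustrative):

```cpp
#include <map>
#include <string>

// How to bring a model back up when a request arrives and the model is no
// longer loaded, mirroring the branch added above.
enum class ReloadAction { kStartRemote, kReplaySavedLoad, kNone };

ReloadAction ChooseReload(bool is_remote_engine, const std::string& model_id,
                          const std::map<std::string, std::string>& saved) {
  // Remote engines restart via StartModel so updated config takes effect.
  if (is_remote_engine) return ReloadAction::kStartRemote;
  // Local engines replay the load body saved at first load time.
  if (saved.count(model_id)) return ReloadAction::kReplaySavedLoad;
  return ReloadAction::kNone;
}
```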
2 changes: 2 additions & 0 deletions engine/services/model_service.cc
@@ -1233,6 +1233,8 @@ cpp::result<std::optional<std::string>, std::string>
ModelService::MayFallbackToCpu(const std::string& model_path, int ngl,
int ctx_len, int n_batch, int n_ubatch,
const std::string& kv_cache_type) {
// TODO(sang) temporary disable this function
return std::nullopt;
assert(hw_service_);
auto hw_info = hw_service_->GetHardwareInfo();
assert(!!engine_svc_);
3 changes: 1 addition & 2 deletions engine/services/model_source_service.cc
@@ -475,14 +475,13 @@ ModelSourceService::AddCortexsoRepoBranch(const std::string& model_source,

void ModelSourceService::SyncModelSource() {
while (running_) {
std::this_thread::sleep_for(std::chrono::milliseconds(500));
std::this_thread::sleep_for(std::chrono::milliseconds(100));
auto now = std::chrono::system_clock::now();
auto config = file_manager_utils::GetCortexConfig();
auto last_check =
std::chrono::system_clock::time_point(
std::chrono::milliseconds(config.checkedForSyncHubAt)) +
std::chrono::hours(1);

if (now > last_check) {
CTL_DBG("Start to sync cortex.db");

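The sync loop now wakes every 100 ms instead of 500 ms, but still syncs at most once per hour, gated by the persisted `checkedForSyncHubAt` timestamp. That gate reduces to a single comparison:

```cpp
#include <chrono>

// Poll frequently but act at most once per hour, as the loop above does:
// sync only when more than an hour has passed since the last check.
bool ShouldSync(std::chrono::system_clock::time_point now,
                std::chrono::system_clock::time_point last_checked) {
  return now > last_checked + std::chrono::hours(1);
}
```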
13 changes: 9 additions & 4 deletions engine/utils/file_manager_utils.cc
@@ -17,14 +17,15 @@
#endif

namespace file_manager_utils {
std::filesystem::path GetExecutableFolderContainerPath() {

std::filesystem::path GetExecutablePath() {
#if defined(__APPLE__) && defined(__MACH__)
char buffer[1024];
uint32_t size = sizeof(buffer);

if (_NSGetExecutablePath(buffer, &size) == 0) {
// CTL_DBG("Executable path: " << buffer);
return std::filesystem::path{buffer}.parent_path();
return std::filesystem::path{buffer};
} else {
CTL_ERR("Failed to get executable path");
return std::filesystem::current_path();
Expand All @@ -35,7 +36,7 @@ std::filesystem::path GetExecutableFolderContainerPath() {
if (len != -1) {
buffer[len] = '\0';
// CTL_DBG("Executable path: " << buffer);
return std::filesystem::path{buffer}.parent_path();
return std::filesystem::path{buffer};
} else {
CTL_ERR("Failed to get executable path");
return std::filesystem::current_path();
@@ -44,13 +45,17 @@
wchar_t buffer[MAX_PATH];
GetModuleFileNameW(NULL, buffer, MAX_PATH);
// CTL_DBG("Executable path: " << buffer);
return std::filesystem::path{buffer}.parent_path();
return std::filesystem::path{buffer};
#else
LOG_ERROR << "Unsupported platform!";
return std::filesystem::current_path();
#endif
}

std::filesystem::path GetExecutableFolderContainerPath() {
return GetExecutablePath().parent_path();
}

std::filesystem::path GetHomeDirectoryPath() {
#ifdef _WIN32
const wchar_t* homeDir = _wgetenv(L"USERPROFILE");
2 changes: 2 additions & 0 deletions engine/utils/file_manager_utils.h
@@ -20,6 +20,8 @@ inline std::string cortex_config_file_path;

inline std::string cortex_data_folder_path;

std::filesystem::path GetExecutablePath();

std::filesystem::path GetExecutableFolderContainerPath();

std::filesystem::path GetHomeDirectoryPath();