Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build inference libs as independent opaque plugins and remove deduplication of their deps #177

Closed
iboB opened this issue Nov 1, 2024 · 9 comments
Assignees
Labels
build Build system and CI meta Issue is related to repo itself or dev processes

Comments

@iboB
Copy link
Member

iboB commented Nov 1, 2024

Basically follow #166 : Don't use third party code

@iboB iboB added meta Issue is related to repo itself or dev processes build Build system and CI labels Nov 1, 2024
@iboB iboB self-assigned this Nov 1, 2024
This was referenced Nov 2, 2024
@iboB
Copy link
Member Author

iboB commented Nov 4, 2024

@iboB
Copy link
Member Author

iboB commented Nov 4, 2024

Some thoughts, decisions and implications:

Breaking up the monorepo

Now, we could keep the inference libs in the monorepo, but they will still have to be configured separately. Almost all benefits of the monorepo are gone in this case and the negative aspects remain:

  • The build gets very complicated and CI scripts as well.
  • There's the implicit expectation that if you modify the language api, you have to update all inference libs
  • Third party inference libs would use a different build pattern than internal ones

And... the same can be said about the language wrappers. Having them here was decided only because of monorepo purity.

So, I decided to ditch the monorepo and move inference libs and language wrappers to separate ones.

(Alas, as anticipated, we went full circle with the wrappers and #113 is no more. We're keeping CMake, though)

API-level clashes and ODR violations

Having inference libs as plugins removes clashes and the risk of ODR violations between themselves, but not with ac-local itself. They are expected to link with ac-local's C++ interface. So, if any inference libs does use nlohmann::json, it will (subtly and nastily) clash with Dict.

That's true on two levels:

  • Symbol level, which in this particular case can easily be mitigated by tagging the template with a custom type
  • Header level, which is inevitable. We can't prevent a third party lib from including their version of json first

So, we are to wrap nlohmann::json as ac::nlohmann::json.

But that's not all. We also expose jalog. We do so only implicitly by relying on a common shared library using the same global storage. There is no risk of ODR violations or crashes here, but if not handled, inference libs won't be able to attach to the logger.

To solve this, we should wrap jalog too, but it's too much work at this point, so we'll add it to an issue for the future. For now we will optionally allow exposing jalog publicly so that inference libs may use it if they want.

Currently we don't expose anything other third party library implicitly or explicitly (well, besides the standard library, but let's just not go there, please. If we need to go there, we should just ditch C++ for the language api altogether)

But! Now exposing third party libraries via the language api should be guarded and each and every possible entry should be closely evaluated and a mitigation should be decided for it.

@iboB
Copy link
Member Author

iboB commented Nov 4, 2024

Moving inference libs to separate repos is tracked here. As #184, this also includes deleting issue labels and transferring issues.

@iboB
Copy link
Member Author

iboB commented Nov 5, 2024

Exposing jalog as discussed previously is not an option. It has dependencies of its own which can't be managed safely.

Delving deeper into this. The underlying problem is package/dependency consolidation. Unfortunately C++ has no standard package manager to speak of. Many projects use vcpkg, conan, and CPM, many use ad-hoc solutions like submodules or direct splicing. Even though a proper package manager would be able to consolidate the dependencies properly, we can't impose it on third party code.

If we were big and strong, we possibly could: If you want to be an ac-local plugin, use X. If we were big and strong we could even impose more, we could impose an ecosystem and force inference libs to respect it. Then #166 would not have been an issue at all.

Moreover the examples in #166 are one of the worst offenders in dependency management. Saying "If you want to be an ac-local plugin use vcpkg" is likely fine, but this excludes llama.cpp, for example, since it uses nlohmann::json directly in the code.

Our voice is not strong enough to influence bigger projects. Imposing a package manager is not an option. The only way to consolidate dependencies at this point is to have no dependencies.

For this reason, we're merging jalog and parts of itlib in our codebase, just like we did with nlohmann::json

Sad

@iboB
Copy link
Member Author

iboB commented Nov 7, 2024

Renaming the targets sounds very appealing:

  • Most importantly it will allow one to create a flavor in CMake just with add_subdirectory with no using install-s or export-s. The build process of a flavor is staright forward
  • It will allow a package manager to consolidate shared dependencies
  • It's relatively little work to rename the targets, and also a PR which allows target renames will likely get accepted in many places, which will allow us to use vanilla forks
  • The ODR/linker/symbol clash dangers are not increased. CMake will produce a fatal error on target clashes, so you HAVE to resolve it somehow

However it also increases the load for new inference plugins. Clashing targets need to be renamed. This goes potentially deep into the dep trees. This does turn into an endless support features. Granted this should be a rare occurence. GGML is the only known bad violater of package purity.

The above on its own would have been an acceptable sacrifice had it not been for the following problem:

Options will also clash. GGML has a lot of options. Some of them might be set something in a fork and other to another thing. Even if the target is renamed, the options will remain the same, so supporting the add_subdirectory flow will necessarily mean that the values of different options withing the renamed forks are to be the same across all forks.

This is a risk we're not willing to take.

Now, this doesn't mean that using add_subdirectory will be impossible. If you happen to have no target clashes, it will be. It just means that we're not investing in support for this flow for our forks and our inference lib implementations.

(As usual, this is subject to change)

iboB added a commit that referenced this issue Nov 7, 2024
@iboB
Copy link
Member Author

iboB commented Nov 8, 2024

Another variant which was considered (a brutal hack): write a script which renames all symbols (cmake, c, and c++) in a project, to produce another, renamed project. Thus no duplications would exist and everything can be added in a single build with add_subdirectory. The end result is quite appealing. Everything would be neatly packed and all build and dev flows (even a static build) would be possible and pleasant.

But:

  • This is too risky. Should the script fail to rename some symbols, it would at best lead to linker errors, but ODR violations are very likely, too.
  • This makes the job of inference lib contributors very hard. They will have to use this scrips and adapt it appropriately to the upstream specific code. Local changes to the source (or the source source, rather) will have to be multi-phase and seem very error prone.
  • The endless support is there too. The script will have to be continuously adapted to the upstream and local changes, with the above risks and problems always looming above

@iboB
Copy link
Member Author

iboB commented Nov 8, 2024

Yet another variant which was considered (a not-so-brutal hack): rename the symbols at object level. If successful, this would allow static library builds, and, combined with renaming CMake targets (and all of its cons) would also allow add_subdirectory

This is only a viable option for C codebases. While renaming C symbols is relatively easy, C++ mangling is super hard. Renaming C++ symbols seems close to impossible (or at least worthy of a super long project on its own). So you can only rename C symbols, which basically means a whitelist. Composing such a whitelist is almost as hard as the previous variant (not JUST as hard, because macros are exempt in this case). But that's not all, now this doesn't take care of clashing C++ symbols, so they will have to be taken care of somehow... It's a mess.

Moreover even if we did have a C codebase, this will completely break the debugger. So if you want to debug something, you will have to do it in isolation, without renames.

And even more... er... overer this will absolutely break the build system. You either rename in place, in which case renamed output doesn't match the build output and would seem out of sync, or rename to another file, in which case you can't use target dependency management, and will have to resort to CMake 2.6 techniques, like it's 2005.

@iboB
Copy link
Member Author

iboB commented Nov 8, 2024

Dropping CMake was also considered. Semi-manually building with make (or rake, or nix, or many other options) will make the build and integration process more straight-forward. But first it doesn't fix the clashes. Plugins will still have to be used (and the main cons of plugins are deployment on restricted systems like mobile and browser).

But mainly this will make supporting multiple platforms super hard. CMake may be lacking in some regards, but it's the de-facto build system standard for C++. Practically every project uses it. Practically every IDE or external build system (like gradle) has CMake support. Dropping CMake will make us invest a considerable amount of toolchain-specific effort, which we now get for free. Dropping CMake will mean that we have to recreate the build of practically every third party lib we use, which we now get for free, by just using their CMakeLists.

iboB added a commit that referenced this issue Nov 13, 2024
iboB added a commit that referenced this issue Nov 13, 2024
iboB added a commit that referenced this issue Nov 15, 2024
iboB added a commit that referenced this issue Nov 18, 2024
@iboB
Copy link
Member Author

iboB commented Nov 20, 2024

This is done, further plugin development to be tracked separately

@iboB iboB closed this as completed Nov 20, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Local SDK Beta Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Build system and CI meta Issue is related to repo itself or dev processes
Projects
Status: Done
Development

No branches or pull requests

1 participant