Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libtokenizers.a path should not depend on build source directory #4

Closed
wants to merge 2 commits into from

Conversation

clems4ever
Copy link
Contributor

By relatively locating the library, one can just drop the pre-built library in their own project to make it work.

By relatively locating the library, one can just drop the pre-built
library in their own project to make it work.
@clems4ever
Copy link
Contributor Author

When I imported the module and dropped the prebuilt library at the root of my project, I had the following error:

$ go run main.go
# command-line-arguments
/usr/lib/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/bin/ld: cannot find /home/user/go/pkg/mod/github.com/daulet/tokenizers@v0.5.0/libtokenizers.a: No such file or directory
collect2: error: ld returned 1 exit status

With the change in this commit, it just works.

@daulet
Copy link
Owner

daulet commented Jul 24, 2023

Yes, it will work if you just have one package, but if your repo has multiple packages that use tokenizers, they will all look for the binary in the root of each package. So making the path relative to the module source decouples it from user's repo structure.

@clems4ever
Copy link
Contributor Author

Yes, it will work if you just have one package, but if your repo has multiple packages that use tokenizers, they will all look for the binary in the root of each package. So making the path relative to the module source decouples it from user's repo structure.

Fine. Maybe a better way would be to make the library built during the installation by using the //go:generate macro somehow, maybe that could work and avoid the burden of searching for the exact path where the module is located in the go directory to drop the prebuilt library. WDYT?

@daulet
Copy link
Owner

daulet commented Jul 25, 2023

Maybe a better way would be to make the library built during the installation by using the //go:generate macro somehow

If you could make it work, sure.

avoid the burden of searching for the exact path where the module is located

isn't that always under $GOPATH/pkg/mod, unless you vendor.

@clems4ever
Copy link
Contributor Author

clems4ever commented Jul 25, 2023

Maybe a better way would be to make the library built during the installation by using the //go:generate macro somehow

If you could make it work, sure.

avoid the burden of searching for the exact path where the module is located

isn't that always under $GOPATH/pkg/mod, unless you vendor.

Nop, as you can see in the error above, the library should land in /home/user/go/pkg/mod/github.com/daulet/tokenizers@v0.5.0/

I mean that's not so bad but that's definitely a barrier that we can get rid of to increase adoption of the project in my opinion.

I'll try the generate method and tell you how it goes.

@daulet
Copy link
Owner

daulet commented Jul 28, 2023

As I mentioned, under proposed solution once your project has two packages using this module you'd have to copy the lib to two places .

@RJKeevil
Copy link

RJKeevil commented Aug 7, 2023

I think this proposed change is an improvement, as we build our application via docker CI/CD, and have a complex, version dependent, pre-setup step of having to initialize a full path (/home/user/go/pkg/mod/github.com/daulet/tokenizers@v0.5.0/) in order to drop in libtokenizers.a before compiling our app. At least with this change we can just place libtokenizers.a in the root of our app before compiling. Alternative is to actually commit this file to the repo, so it will be there without the copy (though committing artifacts is a bit nasty)?

@clems4ever
Copy link
Contributor Author

I tried the generate method but I did not manage to make it work.
I was in the same situation as @RJKeevil , that's why I made the change in the first place. However, I think that commiting the file to the repo is not the right approach because the lib could be misaligned with the code. I would not recommend it.

@sunhailin-Leo
Copy link

@clems4ever @RJKeevil

@daulet
Copy link
Owner

daulet commented Nov 9, 2023

Alternative is to actually commit this file to the repo, so it will be there without the copy (though committing artifacts is a bit nasty)?

Artifacts are platform (OS/CPU) dependent so there is no single artifact to satisfy all.

@daulet
Copy link
Owner

daulet commented Nov 9, 2023

At least with this change we can just place libtokenizers.a in the root of our app before compiling.

You'd need to copy it to the directory of each module that uses this dependency, which means N copies, while making it source agnostic allows a single copy. Does that make sense?

@RJKeevil
Copy link

RJKeevil commented Nov 9, 2023

@daulet for us at least that is not an issue, as we use it in a single module, and the docker build becomes much more simple/reliable than having to determine the current tokenizer version, generating a deeply nested folder and copying the lib to that path. The main issue is having the version number in the path, in reality it can be placed anywhere, as long as the version number isnt part of the path.

Edit: I see you now have an example dockerfile, which helps makes this simpler for people new to the project.

@daulet
Copy link
Owner

daulet commented Nov 9, 2023

The main issue is having the version number in the path, in reality it can be placed anywhere, as long as the version number isnt part of the path.

I bet you can write a script to extract the version from go.mod since it will specify it, then that becomes the only place that requires bumping the version.

@RJKeevil
Copy link

Yes it's definitely possible and no big issue, just might be easier for future users if it can be an argument, something like how it is done in this project https://github.com/yalue/onnxruntime_go

@RJKeevil
Copy link

Ok one additional issue with the lib being in the go path - it prevents you from using the docker run target feature in goland/intellij. This feature dynamically compiles and runs your dev code in an existing docker image. But it fails to compile as go mod sees that the tokenizer folder exists in the image, and assumes the go code for the dependency has already been fetched. So I would need to also copy the tokenizer go code in my Docker build process, which is very messy.

@daulet
Copy link
Owner

daulet commented Nov 16, 2023

Yes it's definitely possible and no big issue, just might be easier for future users if it can be an argument, something like how it is done in this project https://github.com/yalue/onnxruntime_go

Thanks for suggestion, was interesting read. They have a different problem where the lib they are trying to load is outside of their control and named differently on different platforms. So they end up loading the library at runtime, which brings its own complications with packaging and releasing. This library produces static build that is self sufficient.

@RJKeevil
Copy link

RJKeevil commented Nov 17, 2023

Hello, in understanding cgo a little better I perhaps have a simple solution to this discussion. I see that you can switch cgo directives using build tags. This means that if you replace the directive statement with something like

#cgo redirect_tokenizer LDFLAGS: /usr/lib/libtokenizers.a -ldl -lm -lstdc++
#cgo !redirect_tokenizer LDFLAGS: ${SRCDIR}/libtokenizers.a -ldl -lm -lstdc++

Then using go build -tags redirect_tokenizer would instead point to the fixed path at compile time. This solves my main issues (allows library to be used in Goland's docker run targets, simplifies image build). It's probably a nice answer for supporting different architectures in future, e.g. we could now package linux, darwin versions of libtokenizer.a and switch based on build flags.

@daulet
Copy link
Owner

daulet commented Nov 19, 2023

@RJKeevil that's cool, want to send a PR? I think we should the directive something like tokenizers_srcdir_relative

@RJKeevil
Copy link

Thanks @daulet , submitted a PR to #15 . As mentioned in the notes this does change the default behaviour, let me know if this is ok or not.

@AlmogBaku
Copy link

The thing is, the go module cache is read-only by design... long story short- this shouldn't be there, and it's probably should be configurable....

@jmoney
Copy link
Contributor

jmoney commented Jun 4, 2024

I think #18 allows for the modularity that is being described here.

@daulet
Copy link
Owner

daulet commented Jun 12, 2024

I believe #18 solves everyone's use cases.

@daulet daulet closed this Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants