
baby-llama : use unnamed namespace in baby_llama_layer #9557

Open
wants to merge 3 commits into master
Conversation

danbev
Collaborator

@danbev danbev commented Sep 20, 2024


This commit renames the struct llama_layer to baby_llama_layer in the
baby-llama example. This is to avoid a symbol conflict with the
llama_layer struct in llama.cpp.

I ran into this while investigating an issue related to the baby-llama
example. Stepping through the code, I noticed the following in
`init_model`:
```console
$ gdb --args ./llama-baby-llama
Reading symbols from ./llama-baby-llama...
(gdb) br init_model
Breakpoint 1 at 0x251767: file examples/baby-llama/baby-llama.cpp,
line 190.
(gdb) r
...
(gdb)
204	    model->layers.resize(n_layer);
```
If we inspect the size of `model->layers` we see that it is 0:
```console
(gdb) p model->layers.size()
$1 = 0
```
And also `n_layer` is 1:
```console
(gdb) p n_layer
$3 = 1
```
And we can inspect the type of `model->layers`:
```console
(gdb) ptype model->layers
type = std::vector<llama_layer>
```

Now if we step over the resize function we will see something
interesting:
```console
(gdb) p model->layers.size()
$2 = 12
```
I also added two print statements to show the values of `n_layer` and
`model->layers.size()`:
```console
n_layer: 1
layers.size(): 2049638230412172414
```

I later realized that there is a symbol conflict. There is a
`llama_layer` in llama.cpp and this object file is compiled into the
`llama-baby-llama` binary:
```console
/usr/bin/ccache c++ -std=c++11 -fPIC -O0 -g -Wall -Wextra -Wpedantic
-Wcast-qual -Wno-unused-function -Wmissing-declarations
-Wmissing-noreturn -pthread -fopenmp  -march=native -mtune=native
-Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include
-Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE
-D_GLIBCXX_ASSERTIONS -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE
ggml/src/llamafile/sgemm.o ggml/src/ggml.o ggml/src/ggml-alloc.o
ggml/src/ggml-backend.o ggml/src/ggml-quants.o ggml/src/ggml-aarch64.o
src/llama.o src/llama-vocab.o src/llama-grammar.o src/llama-sampling.o
src/unicode.o src/unicode-data.o common/common.o common/arg.o
common/log.o common/console.o common/ngram-cache.o common/sampling.o
common/train.o common/build-info.o common/json-schema-to-grammar.o
examples/baby-llama/baby-llama.o -o llama-baby-llama -g
```
This could be worked around by renaming the `llama_layer` in
baby-llama.cpp to `baby_llama_layer`, or by using a namespace in
baby-llama.cpp. I initially considered not compiling llama.o into the
llama-baby-llama binary, but the baby-llama example uses `train.h`,
which depends on llama.cpp, so I opted to rename the struct.
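The failure mode can be sketched in a single file. In the real build both structs are named `llama_layer`, each in its own translation unit; the names below are hypothetical stand-ins so the sketch compiles, but the size mismatch is the point: inlined code compiled against one layout operating on a vector built with the other misreads the element size, which is how `resize(1)` can appear to produce 12 or garbage.

```cpp
#include <cstddef>

// Hypothetical stand-ins: in the actual build, BOTH of these are named
// `llama_layer`, one in llama.cpp and one in baby-llama.cpp. Defining the
// same class type differently in two translation units is an ODR violation,
// so the toolchain is free to mix the two layouts.

// Roughly the shape llama.cpp gives the struct (many tensor pointers).
struct layer_as_in_llama_cpp {
    void * tensors[24];  // placeholder members
};

// Roughly the shape the baby-llama example gives it (a handful of members).
struct layer_as_in_baby_llama {
    void * attention_norm;
    void * wq;
    void * wk;
    void * wv;
    void * wo;
};

// The two layouts disagree on element size, so size()/resize() arithmetic
// done against the wrong sizeof produces nonsense element counts.
std::size_t size_per_llama_cpp()  { return sizeof(layer_as_in_llama_cpp); }
std::size_t size_per_baby_llama() { return sizeof(layer_as_in_baby_llama); }
```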

After renaming the resize function works as expected:
```console
(gdb) p model->layers.size()
$1 = 0
(gdb) ptype model->layers
type = std::vector<baby_llama_layer>
(gdb) p model->layers.size()
$2 = 1
```
@ggerganov
Owner

That's not a symbol conflict since llama_layer is a type. Something else is going wrong in your debugging, but in any case, this change is not necessary.

@danbev
Collaborator Author

danbev commented Sep 20, 2024

> That's not a symbol conflict since llama_layer is a type. Something else is going wrong in your debugging, but in any case, this change is not necessary.

Yeah, sorry, it is indeed not a symbol conflict. I do think there might be an issue here on Linux (using gcc) with some form of memory corruption. I've just tried this on macOS but I can't reproduce it there. I'll take another look when I get a chance, but I'll close this PR for now. Sorry about the noise.

@danbev danbev closed this Sep 20, 2024
@ggerganov
Owner

> but in any case, this change is not necessary.

Actually, I was wrong and it might be a good change. When the structs start having member functions, their symbols will be duplicated (this already happened in #9449), which prevents static builds.

One convention for the types in examples is to prefix with gpt_. Some of the structs in common already do that. For example, llama_layer becomes gpt_layer.

@danbev danbev reopened this Sep 20, 2024
@danbev
Collaborator Author

danbev commented Sep 20, 2024

> One convention for the types in examples is to prefix with gpt_. Some of the structs in common already do that. For example, llama_layer becomes gpt_layer.

Sounds good, I'll update this shortly 👍

@slaren
Collaborator

slaren commented Sep 20, 2024

Another option is to use unnamed namespaces, which would make the struct accessible only from the file where it is defined.
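A minimal sketch of this suggestion (the member names are illustrative, not the real struct): placing the definition in an unnamed namespace makes the type unique to its translation unit, so it can no longer conflict with the `llama_layer` in llama.o, even if both versions later grow member functions.

```cpp
#include <cstddef>
#include <vector>

namespace {

// Inside an unnamed namespace, this type is distinct from any `llama_layer`
// defined in another translation unit, so there is no ODR conflict with
// llama.cpp.
struct llama_layer {
    float * wq = nullptr;  // illustrative members, not the real layout
    float * wk = nullptr;
};

} // unnamed namespace

// resize() now behaves as expected, because only one definition of the
// element type is visible to this program.
std::size_t make_layers(std::size_t n) {
    std::vector<llama_layer> layers;
    layers.resize(n);
    return layers.size();
}
```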

@danbev
Collaborator Author

danbev commented Sep 20, 2024

> Another option is to use unnamed namespaces, that would make the struct only accessible from the file where it is defined.

I'd be happy to use that, just let me know what you prefer.

@slaren
Collaborator

slaren commented Sep 20, 2024

There are other structs in baby-llama.cpp with conflicting names like llama_hparams, llama_model and llama_kv_cache, so just putting the entire file in an unnamed namespace may be the easiest solution.

Though, I suspect that this example will become outdated after the next sync with ggml, and probably will be removed.

@ggerganov
Owner

Should we anonymous namespace the llama code instead?

@slaren
Collaborator

slaren commented Sep 20, 2024

Yeah I think it would be good to move all the local C++ code into unnamed namespaces, since just declaring functions and globals as static is not enough to hide all the symbols in C++.
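A sketch of the distinction, using a hypothetical struct: `static` can only be applied to functions and objects at namespace scope, not to a class or its member functions, so those members keep external mangled symbols. Moving the type into an unnamed namespace makes every symbol it produces unique to the translation unit.

```cpp
namespace {

// `static` cannot be applied to this struct or its member function, but
// because the type lives in an unnamed namespace, the mangled name of
// hparams::layer_count() is unique to this translation unit. A static build
// therefore cannot collide with an identically named struct in another file.
struct hparams {                 // hypothetical example type
    int n_layer = 1;
    int layer_count() const { return n_layer; }
};

} // unnamed namespace

// Plain `static` would hide this free function; the unnamed namespace is
// what hides the struct's member-function symbols as well.
int count_layers() {
    return hparams{}.layer_count();
}
```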

Use an unnamed namespace to make identifiers unique to the translation
unit.
@danbev
Collaborator Author

danbev commented Sep 20, 2024

> Yeah I think it would be good to move all the local C++ code into unnamed namespaces, since just declaring functions and globals as static is not enough to hide all the symbols in C++.

I'd be happy to take a stab at adding an anonymous namespace to the llama code.

@danbev danbev changed the title baby-llama : rename llama_layer to baby_llama_layer baby-llama : use unnamed namespace in baby_llama_layer Sep 20, 2024
danbev added a commit to danbev/llama.cpp that referenced this pull request Sep 23, 2024
This commit introduces an anonymous namespace in llama.cpp to
encapsulate the following structs and types:

* llama_state
* llama_hparams
* llama_cparams
* llama_layer
* llama_ubatch
* llama_kv_cell
* llama_kv_cache
* llama_control_vector
* e_model

There are potentially more structs, and also functions that are currently
declared as static, which could be included in this anonymous namespace
in the future.

The motivation for this change is to avoid polluting the global
namespace with these types.

Refs: ggerganov#9557

@vignesh1507 vignesh1507 left a comment


Hmm, I approve of the changes.
