rdkafka can't be built with native compilers #319
Comments
I did get librdkafka+mklove to build on Solaris x86, but that might have been with the GNU toolchain. If you could set me up with an account on a box with the proper toolchain, or provide a VirtualBox image for the same, I could do the porting work required. I'm very reluctant to adopt autoconf; it is just bad technology, and mklove should be able to handle this. |
While autoconf is not a tool I particularly like, it handles a lot of different cases that mklove currently does not. For example, rdkafka doesn't build on our Linux / older gcc versions out of the box: neither the dev branch nor the prod branch builds. I upgraded to a more recent version of gcc, and that builds the production branch. However, in the dev branch I get: rdendian.h:96:2: error: #error Missing definition for be64toh. mklove also doesn't seem to handle setting CFLAGS to values like "-D ABC=123" correctly. For mklove to work with a native compiler, one needs to be able to specify and override all the compiler/linker options. I will try and play around some more. |
Yeah, I hear you, but I also think this can be fixed in mklove and librdkafka without too much effort given that I get access to the proper dev environment. Do you know if they are freely available? |
I will try and see. We, unfortunately, don't expose our machine environments for public consumption. We would also want to port to AIX which we still support. At this point I have confirmed that using gcc/g++ builds cleanly on Linux and Solaris for us. AIX is failing and I am investigating. |
Could you provide the exact build env names and version numbers and I'll try to see how close I can get? |
I actually got it to compile on AIX using the gcc compiler. Or at least most of the code. It is failing to build the .so - some gcc atomic ops failing to get resolved. I can give you a pull request with the changes which are not much. I haven't tested it yet but it does compile. Give me some more time to play with it. |
Awesome, I suggest you make your changes to 'dev' branch which has better porting support (and is the future of software!) |
I am working in the dev branch and am now struggling to get the code compiling using gcc 4.8.4 on linux with no modifications to the code. The current roadblocks are:
Can you fix these before I try and merge my changes in? |
dev branch builds fine on my Debian machine with gcc/g++ 4.8.4. |
|
|
|
If you want to suggest something to try or send me code changes I can give it a try. |
Can you look in your /usr/include/endian.h and see what the prerequisites for be64toh() are? Do you know if there are VirtualBox or Vagrant images available for something similar to your system? |
I can probably do a CentOS image, but which version? |
Our older version (which I am testing on) doesn't have any of the be* macros defined at all. On a more recent version they exist, so you will need a fallback case. This did compile in your prod branch. |
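A minimal sketch of the kind of fallback being asked for here, assuming the only goal is to convert between host order and big-endian without relying on `<endian.h>`. The names `rd_be64toh` and `rd_htobe64` are hypothetical and not taken from librdkafka; the conversion goes byte by byte, so it does not need to know the host's endianness.

```c
#include <stdint.h>

/* Hypothetical fallback helpers (names are illustrative only).
 * Both work one byte at a time, so they behave the same on
 * little-endian and big-endian hosts and need no <endian.h>. */

/* Interpret the 8 bytes of 'be' as a big-endian number. */
static uint64_t rd_be64toh(uint64_t be)
{
    const unsigned char *p = (const unsigned char *)&be;
    uint64_t host = 0;
    int i;

    for (i = 0; i < 8; i++)
        host = (host << 8) | p[i];
    return host;
}

/* Store 'host' so that its in-memory bytes are big-endian. */
static uint64_t rd_htobe64(uint64_t host)
{
    uint64_t be;
    unsigned char *p = (unsigned char *)&be;
    int i;

    for (i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(host & 0xff);
        host >>= 8;
    }
    return be;
}
```

A build system could select helpers like these only when a configure-time check finds no native be64toh(); on platforms that do provide it, the native version would normally be preferred.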
I've reproduced this issue on CentOS 5: I'll grab an image and fix it. |
Cool. When I hacked around that problem to see what else would come up, I quickly found that strndup is missing as well. Once you push changes back, let me know and I will continue trying on my end. As an FYI, getting this to build on AIX required small changes to 8 files. |
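For platforms whose libc lacks strndup(), a replacement can be written with only malloc() and memcpy(). This is a sketch under that assumption; the name `rd_strndup` is hypothetical and is not claimed to be what the dev branch ended up using.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for strndup() (name is illustrative only).
 * Copies at most n bytes of s and always NUL-terminates the result.
 * Returns NULL if the allocation fails; the caller frees the result. */
static char *rd_strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    /* Measure the string without reading past n bytes. */
    while (len < n && s[len] != '\0')
        len++;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

The length is measured by hand rather than with strnlen(), since strnlen() tends to be missing on the same older systems that lack strndup().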
Do you have a rough ETA on when you will push changes back to dev with the fixes for older Linux envs? |
Looking at the dev branch - native compiler issues:
#ifdef __GNUC__
|
I am not sure how to properly change the mklove code to say "if the compiler is gcc then do this, else if the compiler is xlc (AIX) do that, else if it is Sun's cc do xxx", so that is why I posted the above. After I hacked around those issues I found the other stuff. |
Sorry for all the updates. I think this issue has now devolved into two separate problems:
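On the C side, the same gcc / xlc / Sun cc distinction can be made with each compiler's predefined macros: `__SUNPRO_C` for Sun/Oracle Studio C, `__xlC__` or `__IBMC__` for IBM XL C, and `__GNUC__` for GCC and compatible compilers. The sketch below shows only that pattern; the `RD_COMPILER_NAME` and `RD_UNUSED` macros and the `rd_compiler_name()` function are hypothetical names, not part of librdkafka or mklove.

```c
/* Identify the compiler from its predefined macros. The vendor
 * compilers are tested first because some of them can also define
 * __GNUC__ in GCC-compatibility modes. All RD_* names here are
 * hypothetical and for illustration only. */
#if defined(__SUNPRO_C)
#define RD_COMPILER_NAME "suncc"
#elif defined(__xlC__) || defined(__IBMC__)
#define RD_COMPILER_NAME "xlc"
#elif defined(__GNUC__)
#define RD_COMPILER_NAME "gcc"
#else
#define RD_COMPILER_NAME "unknown"
#endif

/* Wrap a GCC-specific extension so other compilers see nothing. */
#if defined(__GNUC__)
#define RD_UNUSED __attribute__((unused))
#else
#define RD_UNUSED
#endif

/* Returns the name chosen at compile time. */
static RD_UNUSED const char *rd_compiler_name(void)
{
    return RD_COMPILER_NAME;
}
```

Keeping every compiler-specific construct behind one macro like this means a new compiler only needs one more branch in a single header, rather than edits scattered across the code.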
My suggested fix for this issue is that when running configure a new option be provided to use the specified compiler/linker values and not intermix them with defaults. Part of the problem is that you are providing some defaults which are appended to by user specified values and the defaults are GCC specific. If we have the option to not use any defaults but to take what is specified then that might make it easier to implement. The downside to this approach is that everyone who wants to use a native compiler has to know all their options well. And you may need to add a lot more options to configure to maximize the flexibility you need. The alternative is to detect the compiler provided and set appropriate options internally (like you do for gcc). I can help with the AIX and SUN compilers. Let me know what approach you want to take and I will try and help out. |
Thanks for all the effort you are putting into this, very much appreciated. |
Cool. Note that issue 328 is now for the AIX port. This one I will keep for the native compilers. |
I think the alternative approach is best: |
be64toh and strndup are now fixed on the dev branch, can you give it a go? |
I merged master into dev as well. |
I picked up the dev branch and this builds cleanly on my gcc/linux version. However on sun (using gcc) it fails: If I ifdef this out for sun I get more problems, e.g. struct timespec undefined, strdup undeclared etc. There seem to be a lot of mismatched #include statements for Sun. |
Interesting - the master branch builds fine on both SunOS and Linux. So do you want me to merge my AIX changes into that or do you want to continue working on the dev branch and have me merge it there when you are done? |
No, we can work off the dev branch. What would be helpful is to email me when you have made significant dev changes. I can pull and do test builds on various platforms. If you want to avoid cluttering this issue with a ton more comments, you can email me directly at shalstead@bloomberg.net |
Will do. |
Yes indeed. For the consumer we only plan to use it on Linux. For the publisher I need windows, aix, solaris and linux. |
Okay. The consumer implementation will get an overhaul on the dev branch to fix a number of issues (mainly offset storage bugs) and get ready for the broker based consumer groups that will be available in Kafka 0.9. I see that collectdwin is written in C#, how are you planning on integrating librdkafka in that project? |
I am not 100% sure since I am not a C# developer. C# can call C and we will just have to build the glue (dll ?) to do it. Collectdwin doesn't do dlopen / dlsym like the Unix variant. |
Sounds like proper C# bindings for librdkafka are what you need. |
Nice to have but not required; we need to wrap it regardless to match the collectdwin plugin interface. Low priority. AIX, Windows and native compiler support are higher priority from my perspective. As an aside, I need to disable the linker script support (WITH_LDS) at ./configure time in your master branch. Do you have an option to do this? It is not supported on my Solaris machine. |
WITH_LDS: if the configure check succeeds but the actual use in the makefile fails I think the configure check should be fixed so it properly fails and disables WITH_LDS. |
I will be on vacation for a bit and back on Aug 12th. It would be nice if you could aim to release something by Aug 17th. If not, a stable dev version with support for Windows, etc. by then could be a fallback. Also, from a build perspective it might be useful to have separate targets for just the libs, examples and tests, and then a build-all target to do everything. When we do packaging we only want to build the libs. Everything should of course build on all platforms and run correctly with appropriate config changes. |
How goes the next release (current dev branch) with AIX, Win32 and hopefully native compiler support? Do you have an ETA for it yet? |
Welcome back from vacation! I'm currently on vacation myself 'til monday. Would rolling your own release off dev work for the time being? |
If you are on vacation you shouldn't be so active on github ;-) I presume that you believe your dev branch to be stable at this point. I will get a copy of this and try to do builds (using gcc) on our platforms (linux, aix and solaris) and see how that goes. |
Linux
============================
make tests: warning should be cleaned up. Note that everything else built cleanly and the tests ran and completed correctly.

Solaris
============================
make failed building the shared library
However this command worked:
The examples did not build:
This can be fixed by adding -lsocket -lnsl to the link line. The tests did not build either. The compilation of test.c did not honor the -m64 option and linking fails:
Then it failed to link due to the same missing -lsocket -lnsl. Finally, once it linked, it crashed on Solaris:
stack of crash
I didn't try and do windows or AIX due to the above errors. |
I decided to give IBM a try.
|
I am trying to debug the Solaris (on SPARC) problem. With optimization turned off, I can see that it is crashing while reading through the topics, on the first one. It looks like the message is good.
Any suggestions? I may need to move the macro code inline to debug further. |
The problem appears to be when the assignment / write is done to the topics[0] fields inside the for loop on line 876 of rdkafka_broker.c (dev branch). I tried assigning 0 to both the err and partition_cnt fields and both crash. Kind of at a loss. The MSH_ALLOC calls seem to be returning reasonable values and I can print out the info in a debugger. |
I think the problem is with unaligned access, a typical SPARC thing. I'll push a fix.
|
Let me know when you push and I will give it a try. It would be useful to also:
|
So, please also fix the library archiving for Solaris and AIX. It needs to be the following command |
The problem on Solaris (SPARC) is that the memory returned from the _MSH_ALLOC() macro is not 64 byte aligned. I changed lines 727-728 of rdkafka_broker.c (in the dev branch) to be: #define _MSH_ALLOC(PTR,LEN) do { The first two tests then pass. The third test is failing and I am looking into it: |
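The replacement macro is cut off above, so the following is not a reconstruction of it. It is only a sketch of the general technique of keeping every allocation from a contiguous buffer aligned, assuming that 8 bytes (the natural alignment of a 64-bit field) is what the platform requires and that the buffer itself starts aligned, as malloc() memory does. `RD_ALIGNMENT`, `rd_align_up`, `struct rd_arena` and `rd_arena_alloc` are hypothetical names.

```c
#include <stddef.h>

/* Assumed requirement: 8 bytes, the natural alignment of a 64-bit value. */
#define RD_ALIGNMENT 8

/* Round an offset up to the next multiple of RD_ALIGNMENT. */
static size_t rd_align_up(size_t off)
{
    return (off + (RD_ALIGNMENT - 1)) & ~(size_t)(RD_ALIGNMENT - 1);
}

/* Hypothetical bump allocator over a caller-supplied buffer.
 * 'base' is assumed to be suitably aligned already. */
struct rd_arena {
    char  *base;   /* start of the buffer */
    size_t size;   /* total bytes available */
    size_t used;   /* bytes handed out so far */
};

/* Hand out 'len' bytes starting at an aligned offset, or NULL if full. */
static void *rd_arena_alloc(struct rd_arena *a, size_t len)
{
    size_t start = rd_align_up(a->used);

    if (start > a->size || len > a->size - start)
        return NULL;

    a->used = start + len;
    return a->base + start;
}
```

The cost is a few bytes of padding between allocations, so the total buffer size has to be estimated with that padding included.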
When I ifdef out the 3rd test, the 5th test 0005_order fails with message too long as well: So something more fundamental with the message is a problem. |
Magnus - I think you can go ahead and close this out once you have merged the dev branch changes you have made so far to the main branch. I will look to do some retesting and put in new issues, if I find any, off of that. Do you have an ETA on when you will do your next release ? |
I have rewritten all the buffer handling to be alignment safe - no more crashes on Sparc, no more soft-alignment interrupts on ARM. |
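As an illustration of what "alignment safe" can mean for reading a wire buffer (a sketch, not the actual rewrite), a field can be assembled from individual bytes or copied out with memcpy() instead of dereferencing a cast pointer. The Kafka protocol encodes integers in big-endian order; the helper names `rd_read_be32` and `rd_load_u64` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit integer from any address. Only single
 * bytes are accessed, so no alignment is required of 'p'. */
static int32_t rd_read_be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) |
                 ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |
                  (uint32_t)p[3];
    return (int32_t)v;
}

/* Load a host-order 64-bit value from a possibly unaligned address.
 * memcpy() lets the compiler pick a safe access for the target,
 * unlike *(const uint64_t *)p, which can trap on SPARC. */
static uint64_t rd_load_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

On platforms that tolerate unaligned access, compilers typically reduce both helpers to plain loads, so the portable form usually costs little.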
OK - let me know when you are ready... and from which branch to work off of. |
awesome, will do |
@shalstea Not sure if this is still on your radar, but could you give latest master a try? |
We need to build opensource products with the native compiler, e.g. SUN's Workshop tools on Solaris machines not gcc/g++. The code uses gcc specific conventions and the mklove tools assume the same. It would be useful to convert this to autoconf and to make the code compiler agnostic.
Would you be open to take changes for this?