Build instruction\requirements windows? #3
Comments
MeCab doesn't have any special requirements in itself. There are two ways to build it on Windows.
To be honest, developing on Windows is always troublesome, and I don't have a development environment set up on Windows to test it right now. I hope these few tips help; otherwise I'll set up a little test environment in a virtual machine. |
I actually started experimenting, but so far I have only been able to build mecab with the MSVC compiler. I suppose I'm going to switch to the MSVC ABI to see whether it works, since I couldn't work out how to build mecab with gcc. For 64-bit MSVC there are instructions from another project. UPD: Right now, with gcc, I'm struggling with this:
But it seems to be more or less possible to build libmecab itself. P.S. I wasn't able to run configure under msys, so I just wrote a CMake config:
cmake_minimum_required(VERSION 2.6)
project(Mecab)
SET( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Os -Wall -DNDEBUG -lstdc++")
SET( CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Os -Wall -DNDEBUG")
include_directories("${PROJECT_BINARY_DIR}")
add_definitions(-DVERSION="0.996")
add_definitions(-DMECAB_DEFAULT_RC="mecabrc")
add_definitions(-DPACKAGE="mecab")
add_definitions(-DDIC_VERSION=102)
add_definitions(-D_WIN32_IE=0x0900)
add_definitions(-DMECAB_USE_THREAD -D_CRT_SECURE_NO_DEPRECATE -DHAVE_WINDOWS_H -DDLL_EXPORT -DHAVE_GETENV -DUNICODE -D_UNICODE)
add_library(mecab STATIC mecabrc viterbi.cpp tagger.cpp utils.cpp utils.h eval.cpp iconv_utils.cpp iconv_utils.h dictionary_rewriter.h dictionary_rewriter.cpp dictionary_generator.cpp dictionary_compiler.cpp context_id.h context_id.cpp winmain.h thread.h connector.cpp nbest_generator.h nbest_generator.cpp connector.h writer.h writer.cpp mmap.h ucs.h string_buffer.h string_buffer.cpp tokenizer.h stream_wrapper.h common.h darts.h char_property.h ucstable.h freelist.h viterbi.h param.cpp tokenizer.cpp ucstable.h char_property.cpp dictionary.h scoped_ptr.h param.h mecab.h dictionary.cpp feature_index.cpp feature_index.h lbfgs.cpp lbfgs.h learner_tagger.cpp learner_tagger.h learner.cpp learner_node.h libmecab.cpp)
add_executable(mecab-dict-index mecab-dict-index.cpp)
target_link_libraries(mecab-dict-index mecab)
add_executable(mecab-dict-gen mecab-dict-gen.cpp)
target_link_libraries(mecab-dict-gen mecab)
add_executable(mecab-system-eval mecab-system-eval.cpp)
target_link_libraries(mecab-system-eval mecab)
add_executable(mecab-cost-train mecab-cost-train.cpp)
target_link_libraries(mecab-cost-train mecab)
add_executable(mecab-test-gen mecab-test-gen.cpp)
target_link_libraries(mecab-test-gen mecab)
add_executable(mecab-cli mecab.cpp)
target_link_libraries(mecab-cli mecab)
UPD 2: |
OK, adding stdc++ and pthread to the linker options solves the build issue with gcc. |
What exactly is it panicking on? Can you give me a backtrace? |
It panics in a method. I don't really understand what is wrong as of now, but it happens here:
For some reason the result contains an invalid UTF-8 string...
|
Hmm... could it be that the input string you provide is not UTF-8 encoded but ASCII or Shift-JIS? This especially applies to the Windows command line and PowerShell. Neither uses UTF-8 by default, and that could cause this error. Or your actual Rust source file is saved as ASCII. |
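To illustrate the encoding point, here is a minimal sketch (not code from mecab-rs) of how Rust treats the same text in both encodings. The helper `is_valid_utf8` is a name made up for this example; the Shift-JIS bytes are written out by hand because the standard library has no Shift-JIS encoder.

```rust
use std::str;

/// Returns true when `bytes` form valid UTF-8.
/// (Hypothetical helper for this sketch, not part of mecab-rs.)
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // "日本語" as UTF-8, which is what a Rust string literal holds.
    let utf8 = "日本語".as_bytes();
    // The same text encoded as Shift-JIS (code page 932), written by hand.
    let shift_jis: [u8; 6] = [0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA];

    println!("UTF-8 bytes valid: {}", is_valid_utf8(utf8));
    println!("Shift-JIS bytes valid: {}", is_valid_utf8(&shift_jis));

    // Lossy conversion never fails; invalid sequences become U+FFFD.
    println!("lossy: {}", String::from_utf8_lossy(&shift_jis));
}
```

If bytes like the second array reach a `String` conversion that insists on UTF-8, the conversion fails, which would match the "invalid UTF-8 string" symptom.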
I'm using the example from the repository, which is saved as UTF-8. https://github.com/tsurai/mecab-rs/blob/master/examples/simple.rs
Input itself is correctly printed, but something goes wrong inside mecab :( |
Anyway, from my experiments with msys2 it seems there is actually no need to build with gcc at all. You can link against the Windows DLL without problems, even from Rust with the gcc toolchain, but the resulting binary will require that DLL at run time. UPD:
|
I've set up a gcc toolchain test environment and am currently investigating the problem. There seems to be a problem with the String conversion before passing it to mecab. Some functions work fine and others return an invalid UTF-8 string. I will post an update when I find something out. Edit: |
Thanks, let me know if you'll need any help |
Turns out that a lifetime problem was the cause of the errors and panics. I've tested the new version on my Windows machine and it works fine now. Please check whether this fixes your problems and I'll upload a new crate version. |
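The thread does not show the actual fix, so the following is only a sketch of one common form this kind of lifetime bug takes when passing strings over FFI: a temporary `CString` is dropped before the C side reads its pointer. `c_side_len` is a hypothetical stand-in for a C function and `pass_to_c` is a made-up wrapper; neither is part of the MeCab or mecab-rs API.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Stand-in for a C function that reads a NUL-terminated string.
/// (Hypothetical; it is not part of the MeCab API.)
unsafe fn c_side_len(ptr: *const c_char) -> usize {
    // SAFETY: the caller guarantees `ptr` points to a live,
    // NUL-terminated buffer.
    unsafe { CStr::from_ptr(ptr).to_bytes().len() }
}

/// Converts `input` to a C string and hands the pointer to the C side.
fn pass_to_c(input: &str) -> usize {
    // Keep the CString in a named binding so it outlives the raw pointer.
    let owned = CString::new(input).expect("input must not contain NUL bytes");
    let ptr = owned.as_ptr();
    // BAD (do not do this): `CString::new(input).unwrap().as_ptr()`
    // drops the temporary CString at the end of the statement, so the
    // pointer dangles before the C side ever reads it.
    unsafe { c_side_len(ptr) }
}

fn main() {
    // Prints the byte length the C side sees for the UTF-8 input.
    println!("{}", pass_to_c("日本語"));
}
```

With the dangling variant the freed buffer may be reused, so the C side can see arbitrary bytes. That would be consistent with some calls working while others return invalid UTF-8, though whether this was the exact cause here is an assumption.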
Hm... Strange, but it doesn't work for me.
It seems the input's ptr is still getting corrupted in my case.
Result:
UPD: UPD 2: UPD 3: |
Please bear with me as I express my love for Windows... This is definitely not a problem with the code or Rust itself. My whole system is set to Japanese and the code page of my command line is set to 932 (Japanese Shift-JIS), which is UTF-8 compatible (?).
Now this is where the fun begins: changing the Japanese font MS ゴシック to the TrueType font Consolas causes the program to print the usual missing-character symbols, as expected. But changing it to the default bitmap font causes the program to panic and crash while PRINTING the characters.
Setting the code page to 65001 (UTF-8) should also work, but the problem is that Windows only offers a few fonts for it (none of them supports Japanese). Adding a new font can only be done by changing an entry in the registry, which requires a REBOOT to apply. Please note that there are numerous reports of bugs and undefined behavior with the UTF-8 code page, because Windows has a slightly different standard to support some legacy behavior.
That sums things up. Your best bet is to change the code page to 932 if you want to get it to work with the standard command line. Other than that, you could use a different console program with proper UTF-8 support. Sadly, UTF-8 is a second-class citizen on Windows. |
I suppose. Anyway, I think we need these requirements described for Windows, and since it works just fine we can close this issue :) |
Thanks a lot for your research into this. I wish I could have helped you more, but Windows is a bit stubborn in this case. I added a few notes to the README and linked this issue for transparency's sake. |
Np. For reference, here are links to my final CMake config for the mecab sources and to the 64-bit library and binaries: |
I just tested it with my gcc toolchain and it seems that you don't even need msys2 to use the pre-built binaries (if you are using the 32-bit toolchain). Just copy the pre-built DLL to /lib/rustlib//lib and the compiler will handle the rest by itself. |
That's even greater! |
I'm pretty sure that you will need to bundle the two together for it to work. Or at least have it somewhere reachable in your path variable. |
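As an alternative to copying the DLL into the toolchain directory, a Cargo build script can point the linker at the directory that holds the library. This is only a sketch under assumptions and nothing in this thread confirms it for mecab-rs: `MECAB_LIB_DIR` is an environment variable name invented for the example, `link_directives` is a made-up helper, and the library name `mecab` may need adjusting to match the actual file. The `cargo:rustc-link-search` and `cargo:rustc-link-lib` directives themselves are standard Cargo build-script output.

```rust
use std::env;

/// Builds the Cargo directives that tell rustc where to search for a
/// native library and which library to link. `lib_name` is passed
/// through unchanged. (Hypothetical helper for this sketch.)
fn link_directives(lib_dir: &str, lib_name: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}", lib_dir),
        format!("cargo:rustc-link-lib=dylib={}", lib_name),
    ]
}

fn main() {
    // MECAB_LIB_DIR is a hypothetical variable name chosen for this sketch;
    // fall back to the current directory when it is not set.
    let lib_dir = env::var("MECAB_LIB_DIR").unwrap_or_else(|_| String::from("."));
    // "mecab" is an assumed library name; adjust it to the actual file.
    for line in link_directives(&lib_dir, "mecab") {
        println!("{}", line);
    }
}
```

This only affects linking at build time. The DLL still has to sit next to the executable or be reachable through the path variable at run time, as noted above.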
Yes, of course feel free to use these links |
subj :)