vulkan-api on macOS should use libvulkan instead of libMoltenVK by default, and ideally the same lib as the one GLFW already loaded #24

Closed
cjay opened this issue Mar 10, 2019 · 16 comments

Comments

@cjay
Contributor

cjay commented Mar 10, 2019

I have tried a lot of things trying to get vulkan-examples (executable ve-06-Drawing) to work with MoltenVK on macOS, but always get the error "ve-06-Drawing: VulkanException {vkeCode = Nothing, vkeMessage = "GLFW reports that vulkan is not supported!"}". It would be awesome if the README had instructions on how to pull that off. I'm assuming this is possible because the README claims that vulkan-api is tested with MoltenVK.

System: macOS 10.14.3
cabal-install and Cabal version: 2.4.1.0
ghc: 8.6.4

Things I tried:

  • cabal flags --extra-include-dirs and --extra-lib-dirs
  • --constraint="bindings-GLFW +system-GLFW" with glfw from homebrew
  • putting the MoltenVK.framework and vulkan.framework in /Library/Frameworks and writing "frameworks: MoltenVK, vulkan" into cabal file
  • DYLD_INSERT_LIBRARIES
  • reading glfw C source code from bindings-GLFW trying to figure out how it detects Vulkan.

I think that static linking to the frameworks doesn't work because the C code is built without cmake which would define _GLFW_VULKAN_STATIC. This define is queried in vulkan.c. Interestingly this file only mentions .dll and .so files as parameters to dlopen, not .dylib. But I'm probably missing something.

@cjay
Contributor Author

cjay commented Mar 10, 2019

Discovered related bsl/bindings-GLFW#61. So I was on the right track, though I'm still puzzled how the author of vulkan-api worked around that when testing with MoltenVK. I tried inserting current glfw C code from github into bindings-GLFW, but this doesn't compile. As suggested in bsl/bindings-GLFW#64 there is more work needed on this.

@o1lo01ol1o
Contributor

See #20. The bindings were built and tested but binaries weren't run via GLFW. I think bsl/bindings-GLFW#64 was mostly done with the needed api updates to support a recent commit of GLFW. If you pick this work up, let me know how you fare.

@cjay
Contributor Author

cjay commented Mar 13, 2019

I'm on it. I finished the changes to GLFW.hsc that you already started, though I didn't check if there were more C functions to add. A small change to GLFW-b was necessary because of some removed functions. GLFW now finds MoltenVK, but I get another error:

GLFW version: 3.3.0 Cocoa NSGL
Initialized GLFW window.
[***MoltenVK ERROR***] VK_ERROR_LAYER_NOT_PRESENT: Vulkan layer VK_LAYER_LUNARG_standard_validation is not supported.
Closed GLFW window.
Terminated GLFW.
ve-06-Drawing: VulkanException {vkeCode = Just VK_ERROR_LAYER_NOT_PRESENT, vkeMessage = "vkCreateInstance: Failed to create vkInstance."}

I haven't figured out the cause yet, and I'm not sure which repos the fix will touch.

To share my changes, should I fork your fork of bindings-GLFW to later address a pull request to you, or is it better to fork the original bindings-GLFW myself and pull your commits into that?

@achirkin
Owner

Great job @cjay , thanks for moving this forward!

Regarding the validation layers:

Just to check whether the whole thing is working at all, you can disable them; for example, by switching off the cabal flag dev in vulkan-triangles or by changing the source at Vulkan.Instance.defaultLayers.

To go deeper, check out this thread on reddit.

@o1lo01ol1o
Contributor

To share my changes, should I fork your fork of bindings-GLFW to later address a pull request to you, or is it better to fork the original bindings-GLFW myself and pull your commits into that?

The latter would be easiest, yes?

Cool!

@cjay
Contributor Author

cjay commented Mar 16, 2019

Forks are at https://github.com/cjay/bindings-GLFW/tree/Upgrade-3.3 and https://github.com/cjay/GLFW-b/tree/Upgrade-3.3

After turning off the validation layer now I'm stuck with:

Vulkan error: VK_ERROR_EXTENSION_NOT_PRESENT
*** Vulkan command returned an error VkResult
CallStack (from HasCallStack):
runVk, called at src/Lib/Vulkan/Presentation.hs:36:7 in vulkan-triangles-0.3.0.0-inplace:Lib.Vulkan.Presentation

Interestingly the call liftIO GLFW.getRequiredInstanceExtensions in createGLFWVulkanInstance returns an empty list. So createSurface uses the instance that is created with an empty list of extensions, but still complains about an extension not being present.

Not sure I'm doing the whole linking thing right. Bindings-GLFW only seems to find MoltenVK when I put all the .dylib files from the SDK (including libMoltenVK.dylib and libvulkan.dylib) into the working directory. --extra-lib-dirs wasn't enough. I did use --constraint="vulkan-api +usePlatformMacosMvk" and also tried +useNativeFFI-1-1 in addition to that. Neither changed the error message above. Once again I'll try to look at the C code to find out what's going on there.

Edit: This is the error that seems to be happening, I don't know why the error message doesn't get displayed:

if (!_glfw.vk.extensions[0])
{
_glfwInputError(GLFW_API_UNAVAILABLE,
"Vulkan: Window surface creation extensions not found");
return VK_ERROR_EXTENSION_NOT_PRESENT;
}

I hope you don't get spammed with notification mails because of my edits :)

@cjay
Contributor Author

cjay commented Mar 16, 2019

Correction: with +useNativeFFI-1-1 I need --extra-framework-dirs and get the following:

: can't load framework: MoltenVK (dlopen(/Library/Frameworks/MoltenVK.framework/MoltenVK, 5): no suitable image found. Did find:
/Library/Frameworks/MoltenVK.framework/MoltenVK: unknown file type, first eight bytes: 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A
/Library/Frameworks/MoltenVK.framework/MoltenVK: unknown file type, first eight bytes: 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A
/Library/Frameworks/MoltenVK.framework/Versions/A/MoltenVK: unknown file type, first eight bytes: 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A
/Library/Frameworks/MoltenVK.framework/Versions/A/MoltenVK: unknown file type, first eight bytes: 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A)

Edit: When I change "frameworks: MoltenVK" of vulkan-api.cabal to frameworks: vulkan MoltenVK, I get the original error though.
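
A side note on the dlopen error above: the eight bytes it reports spell `!<arch>\n` in ASCII, which is the header of a static `ar` archive. That would suggest the binary inside MoltenVK.framework is a static library, which dlopen cannot load. A minimal sketch of that check follows; the function and variable names are hypothetical and not part of any library mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Magic string at the start of every Unix `ar` static archive. */
static const char AR_MAGIC[] = "!<arch>\n";

/* Returns 1 if the leading bytes of a file match the `ar` magic, else 0. */
int looks_like_static_archive(const unsigned char *head, size_t len)
{
    assert(head != NULL);
    const size_t magic_len = sizeof AR_MAGIC - 1; /* 8 bytes, without the NUL */
    return len >= magic_len && memcmp(head, AR_MAGIC, magic_len) == 0;
}

/* The eight bytes quoted in the dlopen error message. */
static const unsigned char reported_bytes[8] = {
    0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A
};
```

Calling `looks_like_static_archive(reported_bytes, 8)` returns 1, whereas the first bytes of a 64-bit Mach-O dynamic library (0xCF 0xFA 0xED 0xFE ...) would give 0.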

@cjay
Contributor Author

cjay commented Mar 16, 2019

Found out that the Vulkan loader searches for ICDs in OS-specific locations, and that MoltenVK is an ICD. Used the VK_ICD_FILENAMES env variable to get the loader to use MoltenVK. Now the expected extensions are present, including VK_MVK_macos_surface. However, GLFW code in cocoa_window.m errors at this spot:

vkCreateMacOSSurfaceMVK = (PFN_vkCreateMacOSSurfaceMVK)
vkGetInstanceProcAddr(instance, "vkCreateMacOSSurfaceMVK");
if (!vkCreateMacOSSurfaceMVK)
{
_glfwInputError(GLFW_API_UNAVAILABLE,
"Cocoa: Vulkan instance missing VK_MVK_macos_surface extension");
return VK_ERROR_EXTENSION_NOT_PRESENT;
}

So the loader can't find vkCreateMacOSSurfaceMVK despite the MoltenVK related extension being present. The symbol _vkCreateMacOSSurfaceMVK is present in the text sections of both libvulkan.dylib and libMoltenVK.dylib though.
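
The VK_ICD_FILENAMES step described above can also be done from inside the process instead of the shell. This is only a sketch, assuming the variable is set before the first Vulkan call so that the loader sees it during ICD discovery; the function name is hypothetical, and the manifest path is whatever your SDK installation uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Points the Vulkan loader at a single ICD manifest (a .json file).
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int select_icd_manifest(const char *manifest_path)
{
    assert(manifest_path != NULL);
    /* Overwrite any existing value so this manifest is the one the loader reads. */
    return setenv("VK_ICD_FILENAMES", manifest_path, 1);
}
```

Exporting the variable in the shell before launching the executable has the same effect and needs no code change.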

@achirkin
Owner

achirkin commented Mar 16, 2019

Hmm, can you check which extensions GLFW.getRequiredInstanceExtensions lists?
And then add VK_MVK_macos_surface to that list when creating a vulkan instance, e.g. in withVulkanInstanceExt?
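
The check suggested above amounts to a membership test on the list of extension names. A minimal C sketch, assuming the names have already been collected into an array (for example from glfwGetRequiredInstanceExtensions); the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `wanted` appears in the `names` array of length `count`, else 0. */
int has_extension(const char *const *names, size_t count, const char *wanted)
{
    assert(wanted != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (names[i] != NULL && strcmp(names[i], wanted) == 0) {
            return 1;
        }
    }
    return 0;
}
```

If `has_extension(names, count, "VK_MVK_macos_surface")` returns 0, the extension name would need to be appended to the list passed at instance creation.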

@o1lo01ol1o
Contributor

I’m on mobile, but there’s an issue on GLFW under mojave that throws a similar error. Check the issues on the repo — I remember there was a work around.

@cjay
Contributor Author

cjay commented Mar 17, 2019

VK_MVK_macos_surface is already in the output of getRequiredInstanceExtensions.

I checked the glfw issues and found similar problems. This one "solved" it by avoiding static linking to the vulkan frameworks, but I already don't link statically. I removed the frameworks and .a files to be sure.

Here someone had the analogous problem on Windows, also solved by dynamic linking.

I also found a related issue in Vulkan-Docs. Haven't read every comment there, but what I read didn't help. This comment points out "All remaining functions will be loaded by a delegate VkCreateInstance function, which loads everything else using GetInstanceProcAddr and a valid, fresh and not NULL instance." but as far as I can tell this is already happening. The call vkGetInstanceProcAddr(instance, "vkCreateMacOSSurfaceMVK") happens with a non-NULL instance.

I also tried using glfwGetInstanceProcAddress which has a fallback via _glfw_dlsym, and also tried _glfw_dlsym directly. This returned an address for vkCreateMacOSSurfaceMVK, however the call to vkCreateMacOSSurfaceMVK resulted in a segmentation fault every time I tried.

I also tried +useNativeFFI-1-1 once again with extra-libraries: vulkan in the cabal file, which despite using --extra-lib-dirs with the correct location resulted in:

: can't load .so/.DLL for: /Users/cjay/Projects/vulkan/vulkan-triangles/dist-newstyle/build/x86_64-osx/ghc-8.6.4/vulkan-api-1.1.3.0/build/libHSvulkan-api-1.1.3.0-inplace-ghc8.6.4.dylib (dlopen(/Users/cjay/Projects/vulkan/vulkan-triangles/dist-newstyle/build/x86_64-osx/ghc-8.6.4/vulkan-api-1.1.3.0/build/libHSvulkan-api-1.1.3.0-inplace-ghc8.6.4.dylib, 5): Library not loaded: @rpath/libvulkan.1.dylib
Referenced from: /Users/cjay/Projects/vulkan/vulkan-triangles/dist-newstyle/build/x86_64-osx/ghc-8.6.4/vulkan-api-1.1.3.0/build/libHSvulkan-api-1.1.3.0-inplace-ghc8.6.4.dylib
Reason: image not found)

This is absurd. And it can't even find the lib when it's in the working dir now.

I think the options are now:

  • reading the vulkan loader source or related source that is responsible for what vkGetInstanceProcAddr returns
  • taking a closer look at the segfault
  • trying to integrate latest glfw code into bindings-GLFW and possibly latest Vulkan headers into vulkan-api
  • investigating the linking problem with +useNativeFFI-1-1. Cabal bug?
  • trying Vulkan examples in C to see if something is wrong with the system or SDK

Please let me know if you can think of any other ideas, I think I'll take a look at the loader first 😕

@achirkin
Owner

That is a rather unfortunate situation :(

I may have messed up the cabal settings for the osx static linking a bit; but the only thing you can specify there is the framework name (and then make sure the framework is visible in your build environment).

I would try to go full dynamic: ask GLFW to link vulkan stuff dynamically, and disable all "useNativeFFI" flags and also disable all extension-related flags in vulkan-api/examples. This way, you are confident that all calls are done via vkGetInstanceProcAddr (which itself is loaded via dlsym) and that the version of the vulkan headers does not matter.
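
For reference, the "full dynamic" path described above looks roughly like this in C. It is a sketch, not the code vulkan-api or GLFW actually uses: the function name is hypothetical, and a generic function-pointer type stands in for PFN_vkGetInstanceProcAddr so that no Vulkan headers are needed.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Stand-in for PFN_vkGetInstanceProcAddr from the Vulkan headers. */
typedef void (*generic_fn)(void);

/* Opens `lib_name` and looks up vkGetInstanceProcAddr in it.
 * Returns NULL if the library or the symbol cannot be found. */
generic_fn load_get_instance_proc_addr(const char *lib_name)
{
    assert(lib_name != NULL);
    void *handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return NULL;
    }
    void *sym = dlsym(handle, "vkGetInstanceProcAddr");
    if (sym == NULL) {
        dlclose(handle);
        return NULL;
    }
    /* The handle is deliberately left open: the returned pointer is only
     * valid while the library stays loaded. */
    return (generic_fn)sym;
}
```

Every other Vulkan entry point would then be resolved through the returned function, so the library name passed here decides which implementation the whole program talks to.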

I would suggest trying two things to find out for sure whether the error is on the Vulkan, GLFW, or vulkan-api side:

  1. Try to build and execute vulkan/vulkan-examples/ve-01-CreateInstance (with/without useNativeFFI-1-0) -- this is the only example that does not use GLFW at all.
  2. Try some GLFW C examples.

@cjay
Contributor Author

cjay commented Mar 17, 2019

tl;dr: Making GLFW use libMoltenVK directly forgoes the Vulkan loader and displaying triangles works, but MVK alone is no full Vulkan implementation. The loader seems to mess it up despite finding the MoltenVK ICD.

Finally some success. ve-01-CreateInstance worked as it should, and I saw that vulkan-api dlopens libMoltenVK.dylib instead of libvulkan.1.dylib. This gave me the idea to change the glfw code in vulkan.c to load libMoltenVK.dylib instead of libvulkan.1.dylib. With that change the vulkan-triangles example works. And vulkan-api doesn't even need the flag +usePlatformMacosMvk for it to work. Which is weird, because according to the MoltenVK documentation -DVK_USE_PLATFORM_IOS_MVK is needed, which is activated by that flag.

However this is only a workaround. As the SDK documentation states:

The MoltenVK library takes on the role of the Installable Client Driver (ICD) from the point of view of the application and the Vulkan loader. It is NOT a fully-conforming Vulkan driver for macOS or iOS devices.

And the MoltenVK documentation states:

MoltenVK is a Layer-0 driver implementation of Vulkan 1.0. Since it takes on the role of a driver in the Vulkan architecture, it does not load Vulkan Layers on its own. In order to use Vulkan layers such as the validation layers, use the Vulkan loader and layers from the LunarG Vulkan SDK.

As I understand it, directly loading libMoltenVK.dylib forgoes the Vulkan loader, and because of that layers can't be used. As expected, I still need to comment out VK_LAYER_LUNARG_standard_validation in Instance.hs of vulkan-triangles for it to work.

I still need to try C examples to narrow down the location of the problem.

Other observations:

  • When using libvulkan, the MoltenVK ICD definitely is found. I used the ICD installation path ~/.local/share/vulkan/icd.d. When I remove the ICD json file, only 2 extensions (VK_EXT_debug_report and VK_EXT_debug_utils) get returned by vkEnumerateInstanceExtensionProperties instead of 5.
  • when using libMoltenVK, 29 extensions get returned by vkEnumerateInstanceExtensionProperties
  • when using libvulkan, it's only 5, but they do include VK_MVK_macos_surface and VK_KHR_surface. Though by using the env var VK_LOADER_DEBUG=all I can see that the loader finds all 29 extensions in the ICD (DEBUG: Build ICD instance extension list…)

@achirkin
Owner

A side note: the purpose of usePlatformMacosMvk is to expose some symbols from the vulkan.h headers on the Haskell side. Typically these are just some constants. Thus, if you use none of these symbols explicitly in your Haskell code, you don't need to enable this cabal flag (and you probably don't, because GLFW handles the platform-specific part).

So, do I understand correctly, that vulkan-triangles works if you bypass vulkan loader and load libMoltenVK directly, but ve-01-CreateInstance works in both cases? And that if you use vulkan loader, some extensions are missing (if you don't use debug env var)?

@cjay
Contributor Author

cjay commented Mar 18, 2019

So, do I understand correctly, that vulkan-triangles works if you bypass vulkan loader and load libMoltenVK directly

Yes, by changing this line of bindings-GLFW to use libMoltenVK.dylib.

but ve-01-CreateInstance works in both cases?

I had not tested both cases with 01-CreateInstance, but yes it does work with both.
I had to change this line of vulkan-api to test with libvulkan; as written, that line made vulkan-api load libMoltenVK.dylib.

Thanks to this question I found the solution though! GLFW loaded libvulkan and vulkan-api loaded libMoltenVK, and mixing both is bad. I had wrongly assumed that vulkan-api would just use whatever GLFW had loaded already. When both use libvulkan, everything works now, including the standard validation layer.

Can you change vulkan-api to detect which lib has already been loaded? I'm not sure if this is feasible. Otherwise the documentation should warn about this problem. The default lib to load should definitely be libvulkan though.
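
One possible way to do that detection in C is dlopen with RTLD_NOLOAD, which returns a handle only if the library is already mapped into the process. This is a sketch under that assumption; the function names and the candidate list are hypothetical, and nothing here is taken from vulkan-api.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `lib_name` is already loaded in this process, else 0. */
int is_already_loaded(const char *lib_name)
{
    assert(lib_name != NULL);
    /* RTLD_NOLOAD: do not load the library, only report whether it is resident. */
    void *handle = dlopen(lib_name, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL) {
        return 0;
    }
    dlclose(handle); /* drop the extra reference dlopen just took */
    return 1;
}

/* Returns the first candidate that is already loaded, or `fallback` if none is. */
const char *pick_vulkan_library(const char *const *candidates, size_t count,
                                const char *fallback)
{
    for (size_t i = 0; i < count; ++i) {
        if (candidates[i] != NULL && is_already_loaded(candidates[i])) {
            return candidates[i];
        }
    }
    return fallback;
}
```

With candidates such as "libvulkan.1.dylib" and "libMoltenVK.dylib" and the loader as fallback, this would reuse whatever GLFW loaded first, but only if GLFW has in fact been initialized before the call.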

@cjay cjay changed the title Missing instructions on how to get vulkan-examples to work with macOS vulkan-api on macOS should use libvulkan instead of libMoltenVK by default, and ideally the same lib as the one GLFW already loaded Mar 18, 2019
@achirkin
Owner

Aha, so this really is a vulkan-api issue in the end! Thank you so much for finding this out, I totally forgot about this hard-coded dynamic loader.

Even though a quick search suggests that the detection is possible, I think it's going to be more confusing. A user can technically make the first call to Vulkan earlier than the first call to GLFW or another windowing library (even though in our case we always call GLFW first to get the list of required extensions). Maybe I should add something like an environment variable to override the name of the Vulkan-providing library?..
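
The environment-variable idea could look roughly like this in C. The variable name VULKAN_API_LIBRARY and the function name are made up for illustration; vulkan-api does not define either.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variable name, used only for this sketch. */
#define VK_LIB_ENV_VAR "VULKAN_API_LIBRARY"

/* Returns the library name from the environment if it is set and non-empty,
 * otherwise `default_name`. */
const char *vulkan_library_name(const char *default_name)
{
    assert(default_name != NULL);
    const char *override = getenv(VK_LIB_ENV_VAR);
    if (override != NULL && override[0] != '\0') {
        return override;
    }
    return default_name;
}
```

The result would be passed to dlopen, so the default stays the Vulkan loader while a user can still point at libMoltenVK.dylib explicitly.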

Anyway, there seems to be universal agreement on loading the Vulkan loader rather than the ICD, so we definitely need to change the loader shim as you suggested. Could you please do that and also add some instructions on how to render triangles on macOS to the readme? Maybe also add something like stack-macos.yaml to the example projects, pointing to the updated GLFW Upgrade-3.3 on github?..

cjay added a commit to cjay/vulkan that referenced this issue Mar 20, 2019
cjay added a commit to cjay/vulkan that referenced this issue Mar 20, 2019
achirkin added a commit that referenced this issue Mar 21, 2019
Linking to the Vulkan loader on macOS, fixes #24