
Find-Module -Command need perf improvement #31

SteveL-MSFT opened this issue May 25, 2018 · 11 comments

@SteveL-MSFT
Member

As part of PowerShell/PowerShell#1982, if the user types a command that doesn't exist, we want to be able to search PSGallery and tell the user which module to install. However, Find-Module -Command foo currently takes ~4.5s; this needs to be closer to 100ms (or less).

@edyoung

edyoung commented May 25, 2018

We can look at the speed, but since people can register arbitrary repositories, which may sit on flaky networks and so on, the caller will definitely need to apply a timeout to ensure bounded latency.
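A minimal sketch of what such a caller-side timeout could look like, using only the built-in job cmdlets (Start-Job, Wait-Job, Receive-Job, Remove-Job). The function name Invoke-WithTimeout and the 2-second default are illustrative assumptions, not part of PowerShellGet. Note that starting a background job has its own startup cost, so this bounds the worst case rather than getting anywhere near 100ms.

```powershell
# Hypothetical helper: run a script block, but give up after a fixed time.
function Invoke-WithTimeout {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $TimeoutSeconds = 2
    )

    $job = Start-Job -ScriptBlock $ScriptBlock
    try {
        # Wait-Job returns the job only if it finished within the timeout.
        if (Wait-Job -Job $job -Timeout $TimeoutSeconds) {
            Receive-Job -Job $job
        }
    }
    finally {
        # -Force also stops a job that is still running.
        Remove-Job -Job $job -Force
    }
}

# Example call (performs a real network query, so it is left commented out):
# Invoke-WithTimeout -ScriptBlock { Find-Module -Command foo -Repository PSGallery } -TimeoutSeconds 2
```

When the timeout elapses, the function returns nothing, so the caller can treat an empty result as "no suggestion available".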

@SteveL-MSFT
Member Author

SteveL-MSFT commented May 30, 2018

We would probably limit the search to PSGallery by default and allow the user to add custom repositories if there's a need.

@iricigor

iricigor commented Jun 2, 2018

Not sure if this is a proper place to suggest it, but I think it is a related functionality.

I think PowerShellGet should provide functionality similar to apt-get update, which would simply download a list of modules and their basic properties (i.e. version, command list, etc.). Then Find-Module -Command or its equivalent could just look in the local cache and would not depend on the network.

Today, the Find-Module command with no arguments (or with -Repository PSGallery) provides the same data, but it is very slow (more than 2 minutes in my test). It returns about 30MB of data, which can be compressed to 1MB, so network transfer time should not be the issue. This should first be enabled on the repository side, and then clients can get commands for it.
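To illustrate the idea, here is a rough sketch of a command lookup against a locally cached module list. The sample data, the property names (Name, Version, Commands) and the function New-CommandIndex are all assumptions made for illustration; the real cache format would be whatever the repository ends up publishing.

```powershell
# Hypothetical sample of what a downloaded module list might contain.
$moduleList = @(
    [pscustomobject]@{ Name = 'ModuleA'; Version = '1.0.0'; Commands = @('Get-Foo', 'Set-Foo') }
    [pscustomobject]@{ Name = 'ModuleB'; Version = '2.1.0'; Commands = @('Get-Foo', 'Get-Bar') }
)

# Hypothetical helper: build a command name -> module names lookup table.
function New-CommandIndex {
    param([Parameter(Mandatory)] [object[]] $ModuleList)

    # A PowerShell hashtable literal compares string keys case-insensitively.
    $index = @{}
    foreach ($module in $ModuleList) {
        foreach ($command in $module.Commands) {
            if (-not $index.ContainsKey($command)) {
                $index[$command] = [System.Collections.Generic.List[string]]::new()
            }
            $index[$command].Add($module.Name)
        }
    }
    $index
}

$index = New-CommandIndex -ModuleList $moduleList

# The lookup is a single in-memory hashtable access; no network is involved.
$index['get-foo']
```

Once the index is built, each query costs one hashtable lookup, which is why a local cache can answer well inside the 100ms target.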

@jzabroski

@iricigor Finally, someone who understands how package managers are supposed to work, back when we had 56k modems and debian key rings for package security.

@jzabroski

We would probably limit the search to PSGallery by default and allow the user to add custom repositories if there's a need

@SteveL-MSFT Your ultimate design problem is going to be the fact that you are tightly coupled to nuspec and have no way to declare to nuspec specific sub-package artifacts like commands. You can see your design flaw right here, where you accumulate PowerShell Module-specific entities in a general purpose "Tags" field:

https://github.com/PowerShell/PowerShellGet/blob/f0728e01a5f972b2afbde0fd3acfee498d36ed21/PowerShellGet/private/functions/Publish-PSArtifactUtility.ps1#L287-L292

https://github.com/PowerShell/PowerShellGet/blob/f0728e01a5f972b2afbde0fd3acfee498d36ed21/PowerShellGet/private/functions/Publish-PSArtifactUtility.ps1#L302-L305

https://github.com/PowerShell/PowerShellGet/blob/f0728e01a5f972b2afbde0fd3acfee498d36ed21/PowerShellGet/private/functions/Publish-PSArtifactUtility.ps1#L314

Even with lucene on the backend, you will not be able to quickly search on an elasticsearch index. Plus, you're going to be decoding things, when you really just want a HDB document type that supports a list of commands as a field type.
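For context, a sketch of the kind of decoding this implies. It assumes the linked code encodes each command as a tag with a PSCommand_ prefix (alongside tags such as PSModule and PSIncludes_Function); the tag list below is made-up sample data, not output from a real package.

```powershell
# Hypothetical tag list for one package; only the PSCommand_ entries name commands.
$tags = 'PSModule', 'PSIncludes_Function', 'PSFunction_Get-Foo', 'PSCommand_Get-Foo', 'networking'

$prefix = 'PSCommand_'

# Recover command names by filtering on the prefix and stripping it.
$commands = $tags |
    Where-Object { $_.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase) } |
    ForEach-Object { $_.Substring($prefix.Length) }

$commands
```

Every consumer has to repeat this filter-and-strip step, because the general-purpose tag field has no notion of a command list.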

@SteveL-MSFT
Member Author

I like the suggestion to adopt the apt update model. However, my original thinking is that the backend opens up the nupkg and puts the metadata into something that can be queried quickly (like AzureTable). There should not be any decoding during search time.

@jzabroski

jzabroski commented Sep 25, 2018

Please see PowerShell/PowerShellGetv2#335 where I suggest breaking Publish-Module into smaller pieces. The smaller pieces can then be used to define a cleaner interface for metadata.

Have you used ElasticSearch? I ask because I don't understand how AzureTable solves your problems. To the best of my knowledge and searching, AzureTable doesn't support wildcard operators. My searching dates from 2013, so this may have changed, but if the currently supported operators are the ones listed here, then wildcards are still not supported and it's a really bad design choice. It looks like the best it can do is prefix matching.

Also, while AzureTable is highly available, I am unaware of it being used for fast querying.

Edit: To be clear, your SLA should be 10 milliseconds or effectively the time of your latency from pinging the remote server - ElasticSearch is effectively that fast.

@iricigor

iricigor commented Nov 1, 2018

I took the liberty of building a proof-of-concept module, psaptgetupdate. It finds information about commands, modules or scripts in about 20ms via a local cache.

You can test it with:

Install-Module psaptgetupdate -Scope CurrentUser
# test functionality
Find-ModuleFromCache PlatyPS
# check speed: it does 50 runs in under a second
Measure-Command { 1..50 | ForEach-Object { Find-ModuleFromCache PlatyPS } } | Select-Object TotalMilliseconds

The local cache can be updated when needed (via the psaptgetupdate command), and the initial update is done when the module is imported. The whole update takes less than 5 seconds. The update is served as a pre-prepared package from Azure, as explained on the design page.

This module also modifies the CommandNotFoundAction error handler to suggest how to install a missing command (see it in action). The module can also be used to update all modules on your system. It runs for 5 seconds total, of which 3-4s are spent querying the system for existing modules. It checks 20-30 modules per second! Try it with:
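# A rough sketch of how a CommandNotFoundAction hook of this kind can be wired up.
# This is not the psaptgetupdate implementation; the cache contents and the
# message text are assumptions for illustration. It overwrites any existing
# handler for the session; assigning $null removes it again.

```powershell
# Hypothetical cache: command name -> module that exports it.
$commandIndex = @{ 'Get-Foo' = 'ModuleA' }

$ExecutionContext.InvokeCommand.CommandNotFoundAction = {
    param($CommandName, $CommandLookupEventArgs)

    $module = $commandIndex[$CommandName]
    if ($module) {
        # Replace the failed lookup with a script block that prints a hint.
        $CommandLookupEventArgs.CommandScriptBlock = {
            Write-Warning "'$CommandName' was not found. Try: Install-Module $module -Scope CurrentUser"
        }.GetNewClosure()
        $CommandLookupEventArgs.StopSearch = $true
    }
}.GetNewClosure()
```

# Commands that are not in the cache fall through to the normal
# "command not found" error, since the handler leaves the event untouched.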

Update-ModuleFromCache -WhatIf

@edyoung

edyoung commented Nov 2, 2018

Hi Igor, thanks very much for showing how much this can help. There's a lot of potential! We'll need to give it a bit of thought to see how best to incorporate it into PSGet, including things like what to do with multiple repositories. I noticed that Find-CommandFromCache get-azvm returns all the commands which start with get-azvm rather than just 'get-azvm', which is likely an easy change.
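The difference between the two behaviors can be sketched as follows. This is only an illustration over a hard-coded list of names, not how Find-CommandFromCache works internally.

```powershell
# Sample command names, chosen for illustration.
$commandNames = 'Get-AzVM', 'Get-AzVMSize', 'Get-AzVMImage', 'New-AzVM'
$query = 'get-azvm'

# Prefix match: everything that starts with the query (-like is case-insensitive).
$prefixMatches = $commandNames | Where-Object { $_ -like "$query*" }

# Exact match: only the command with that name (-eq is case-insensitive for strings).
$exactMatches = $commandNames | Where-Object { $_ -eq $query }

"Prefix: $($prefixMatches -join ', ')"
"Exact:  $($exactMatches -join ', ')"
```

For the command-not-found scenario an exact match is probably what is wanted, while prefix matching may still be useful for interactive searches.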

FYI @SydneyhSmith

@iSazonov

iSazonov commented Nov 2, 2018

It would be nice to have a public API for fast search of the local cache. We could use that API in the PowerShell engine for the Help and IntelliSense subsystems.

@jzabroski

jzabroski commented Nov 2, 2018 via email
