Implement cheaper search #44475

Closed
cjyabraham opened this issue Dec 22, 2023 · 34 comments
Labels
area/localization: General issues or PRs related to localization
area/web-development: Issues or PRs related to the kubernetes.io's infrastructure, design, or build processes
kind/feature: Categorizes issue or PR as related to a new feature.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@cjyabraham
Contributor

CNCF pays for the current search on this site. It uses both Google Custom Search (for requests outside of China) and Bing Search API (for requests in China). Due to recent changes in Bing Search pricing, this is now costing CNCF too much money so we should implement a different search solution such as those suggested here.

These other search-related issues could be addressed at the same time:

@cjyabraham added the kind/feature label on Dec 22, 2023
@k8s-ci-robot added the needs-triage label (indicates an issue or PR lacks a `triage/foo` label and requires one) on Dec 22, 2023
@sftim
Contributor

sftim commented Dec 22, 2023

What prioritization would you put on this, @cjyabraham ?

/area web-development

@k8s-ci-robot added the area/web-development label on Dec 22, 2023
@cjyabraham
Contributor Author

I would say important-soon

@dims
Member

dims commented Dec 22, 2023

@cjyabraham How much is it costing us? (Maybe drop an email to steering-private if it's not public?)

@cjyabraham
Contributor Author

November cost $2400 for the Bing API alone. I don't know the Google cost offhand.

@natalisucks
Contributor

@cjyabraham Thanks for opening this issue. With my SIG Docs co-chair hat on, I'd like to know what extra resources the CNCF plan to allocate to this work, given that SIG Docs as a whole at the moment doesn't have the people to work on this. Let's chat about this in the new year at one of our bi-weekly community meetings so that we can understand the scope and give a better estimate of effort in partnership with you.

@sftim
Contributor

sftim commented Dec 23, 2023

/priority important-soon

@k8s-ci-robot added the priority/important-soon label on Dec 23, 2023
@cjyabraham
Contributor Author

(@natalisucks I don't know about official allocations of resources from CNCF but @caniszczyk may be able to help...)

As it is, I did a bit of research and got quite excited by this relatively new Hugo search option: Pagefind. I've tried it out on the kubernetes.io site on this PR with preview here: https://deploy-preview-44530--kubernetes-io-main-staging.netlify.app/

You can play with it there on the homepage just as a POC. LMK how that performs from your POV.

FYI you can learn more about Pagefind by watching this vid presenting how it works in general and what features are in the v1.0 release.

@sftim
Contributor

sftim commented Dec 26, 2023

The performance (latency) for #44530 looks fine to me. We'd want to tweak the UI. Also, the relevance of results is a lot worse than the Google search.

For example, try searching for “feature gates”.

@sftim
Contributor

sftim commented Dec 27, 2023

I suggest we focus on getting to a UI that works and won't provoke issues (right now, if we merged #44530, people would file bug reports). Then we can merge that and iterate on the quality of the search results.

Maybe there is a search API service that is more affordable and that we can use in combination with PageFind (or equivalent). That would be more work, but this is open source. Someone might actually relish the challenge.

@cjyabraham
Contributor Author

cjyabraham commented Dec 27, 2023

A few thoughts:

  • I suspect Google Custom Search, which powers all non-China search queries on kubernetes.io, is free, so we could keep that in place if it's better than Pagefind. What motivated this issue initially is that the cost of the Bing custom search (used for China users) has gone from $40/month to $2400/month since we had to switch to their new API and pricing model.
  • Pagefind does have a number of ways it can be fine-tuned which we could experiment with. That said, it may never be as good as Google.
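
To put the Bing price change in perspective, here is a rough back-of-the-envelope calculation. It uses only the two monthly figures quoted above; the annualized numbers assume the monthly cost stays flat all year, which is an assumption and not something stated in this thread.

```python
# Monthly Bing custom search cost before and after the pricing change
# (both figures are the ones quoted in this thread).
OLD_MONTHLY_USD = 40
NEW_MONTHLY_USD = 2400

# How many times more expensive the new pricing is.
increase_factor = NEW_MONTHLY_USD / OLD_MONTHLY_USD

# Annualized figures, ASSUMING the monthly cost stays flat all year.
old_annual_usd = OLD_MONTHLY_USD * 12
new_annual_usd = NEW_MONTHLY_USD * 12
extra_annual_usd = new_annual_usd - old_annual_usd

print(f"Increase: {increase_factor:.0f}x")
print(f"Extra cost per year: ${extra_annual_usd:,}")
```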

Open questions:

  1. Can someone verify if Google Custom Search is, in fact, free for kubernetes.io? I don't have access to its control panel; however, I run it for several other CNCF sites, and it's free for all of them since we're a nonprofit, etc.
  2. If Pagefind results are "good enough", should we use it for China requests instead of Bing? If so, I can refine the PR to improve the UI before submitting it for review.
  3. Would it be worth investigating if Bing would be willing to donate credits for this since it's Kubernetes? Not sure if anyone has contacts there...

@sftim
Contributor

sftim commented Dec 27, 2023

> If Pagefind results are "good enough", should we use it for China requests instead of Bing? If so, I can refine the PR to improve the UI before submitting it for review.

Seems legit. Help is welcome from people whose access to Google Search is blocked through national policy - or from anyone else who'd like to help.

Most people (≈ 67%) in China speak Mandarin Chinese as their main language, so we should check the search results in that language more than superficially. I know Russia has state censorship too, and the main language there is Russian; again, the quality of indexing for English is not much of a guide.

@sftim
Contributor

sftim commented Dec 27, 2023

On whether Kubernetes pays for the Google Search: I suspect it's easy to check, but hard to find contributors who know who to ask.

@cjyabraham
Contributor Author

@sftim
Contributor

sftim commented Dec 27, 2023

  • maybe we can rally some contributions into PageFind
  • maybe there is an alternative tool that is particularly good at indexing ideographic languages?
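
As a small illustration of why ideographic languages need special handling when indexing: splitting on whitespace finds no word boundaries in Chinese text. One common workaround is to index overlapping character bigrams. The sketch below is only an illustration of that general idea; it is not a description of how Pagefind or any other tool mentioned here segments text, and the Chinese phrase is just an example rendering of "feature gates".

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace, which works for space-delimited languages."""
    return text.split()


def char_bigrams(text: str) -> list[str]:
    """Overlapping two-character windows, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "feature gates"
chinese = "特性门控"  # illustrative Chinese rendering of "feature gates"

print(whitespace_tokens(english))  # two separate words
print(whitespace_tokens(chinese))  # a single undivided token
print(char_bigrams(chinese))       # three overlapping bigrams
```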

Help is most definitely welcome.

/area localization

@k8s-ci-robot added the area/localization label on Dec 27, 2023
@sftim
Contributor

sftim commented Dec 27, 2023

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Dec 27, 2023
@cjyabraham
Contributor Author

Algolia DocSearch is another option we could try. It's free; however, it would add a dependency on the Algolia crawler and search engine, and we'd need to display their branding.

@natalisucks
Contributor

@cjyabraham Our next SIG Docs community meeting is on January 9th at 18:30 UTC and I'll be adding this to our agenda for discussion. It would be great to see you there, and joining the SIG Docs Google Group will give you access to the calendar invitation for that call. We'll also post the link to our Zoom meeting in the #sig-docs channel on Kubernetes Slack on the same day.

@cjyabraham
Contributor Author

Hi @natalisucks, unfortunately that meeting would be at 1:30 AM my time, so I don't think I'll be able to make it. If you have any questions for me in the run-up to the meeting, please let me know.

@natalisucks
Contributor

natalisucks commented Jan 4, 2024

@cjyabraham Per my comment here, I'd really like a rep from the CNCF to come and chat with us so we can understand how to work in tandem with you given, once again, the limited resources available. Let me know who else I should reach out to. If @kubernetes/steering-committee members also wish to attend and discuss, alongside anyone else involved in the thread on Slack, that would be great.

@idvoretskyi
Member

Looping @jeefy @castrojo in here :)

@jeefy
Member

jeefy commented Jan 4, 2024

I unfortunately have a conflict during the next SIG Docs meeting; hopefully @onlydole will be able to join.

I think in terms of timing, we should try to target no later than KubeCon EU, but if we can all rally together sooner, dope.

Another thing to consider is potentially using a more local search engine. Example: Baidu's a member and also services the region. So... :) Those are my immediate two cents.

@caniszczyk

You may want to chat with @nate-double-u about experience using https://lunrjs.com on CNCF projects

@natalisucks
Contributor

@onlydole Let's chat at the meeting next Tuesday then so we can do some planning and resource work. Thanks!

@onlydole
Member

onlydole commented Jan 4, 2024

You betcha! I'll read up on some of these options and will be prepared to chat with our group next Tuesday.

@dipesh-rawat
Member

There are a few open-source and commercial search alternatives listed in the Hugo documentation (https://gohugo.io/tools/search).

@nate-double-u
Contributor

> You may want to chat with @nate-double-u about experience using https://lunrjs.com on CNCF projects

I'm just back from vacation now and am catching up on things, always happy to help tho, just need a bit to read the various discussions that have already happened.

@pacoxu
Member

pacoxu commented Jan 17, 2024

The Slack thread https://kubernetes.slack.com/archives/CPNFRNLTS/p1703270890381669 has some input.

Some possible choices:

  1. https://docsearch.algolia.com/ is another choice. It is free for open source projects. I am not sure if CNCF projects can use it.
     • Kevin: Some CNCF projects have already integrated https://docsearch.algolia.com/ on their websites, but we may need more research on search quality.
  2. Another choice is redirecting to bing.com directly, which is free.
  3. For now, can we just support local search for kubernetes.io? I am not sure if Hugo can support local search like VitePress, which supports fuzzy full-text search using an in-browser index.

One open question there: "Are we okay with ads on the Chinese side of things?"
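
For readers unfamiliar with the "local search" idea in the last option above, the following is a minimal sketch of fuzzy full-text search over a prebuilt index. It is written in Python purely to show the data structure; a real in-browser index would run in JavaScript, and neither VitePress nor Hugo is claimed to work this way. The page URLs and text are hypothetical.

```python
import difflib
import re
from collections import defaultdict

# HYPOTHETICAL pages; a real site would feed in its rendered content.
PAGES = {
    "/docs/feature-gates/": "Feature gates are a set of key=value pairs",
    "/docs/kubectl/": "kubectl is the command line tool for clusters",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page URLs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def fuzzy_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing an indexed word close to any query word."""
    results: set[str] = set()
    for word in re.findall(r"\w+", query.lower()):
        # get_close_matches tolerates small typos (similarity >= cutoff).
        for match in difflib.get_close_matches(word, list(index), n=3, cutoff=0.8):
            results |= index[match]
    return results


index = build_index(PAGES)
print(fuzzy_search(index, "featur gate"))
```

The index is built once, ahead of time, and each query only touches the words it resembles, which is what makes this approach workable without a search server.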

@dylantientcheu

Hey there!

At Algolia we love K8s. Indeed, DocSearch is open and we'll be happy to provide an awesome (and free) experience for your multi-language documentation website.

Here's a demo we've set up - https://7cnxmj.csb.app/

@sftim
Contributor

sftim commented Feb 25, 2024

What's the next step for this issue? I can't tell.

@natalisucks
Contributor

@sftim Hi Tim, as per our SIG Docs meeting on January 9th and the feedback shared there: @onlydole attended and has been tasked with the next step, which is for the CNCF to come up with a plan and further research to improve search. This is because the issue is a CNCF cost challenge rather than a strictly user/contributor-facing issue that SIG Docs leads would prioritize and lead work on. Taylor will hopefully update us on this issue when that plan and/or research is ready to share.

@onlydole
Member

onlydole commented Mar 5, 2024

Howdy, folks - we will be prioritizing this on the CNCF side after KubeCon + CloudNativeCon EU in Paris wraps up, and @nate-double-u will lead on that effort.

@nate-double-u
Contributor

Hi @dylantientcheu, Thanks for building that demo. Could you contact me (natew@cncf.io)? I'd like to chat with you about it.

@nate-double-u
Contributor

nate-double-u commented Jun 10, 2024

Sorry for the late update here. I updated the sig-docs meeting notes, but not this issue.

I've done some research and think that PageFind is the best candidate to replace Bing. Algolia was in the running, but between their fee structure and the fact that going with them would add another dependency, we're best off going with PageFind. PageFind is widely used around the CNCF and we have some experience supporting it. It's designed to build its index as a part of the regular site build process, making the search a part of the site and sidestepping any issue we may run into with firewalled locations.
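
As a rough sketch of what "building the index as part of the regular site build" can look like: Hugo renders the site, then the Pagefind CLI indexes the rendered HTML in the same output directory. The helper names below are hypothetical, and the actual kubernetes.io build pipeline is not described in this thread, so treat this as an assumption about the general shape, not the real setup.

```python
import subprocess


def build_commands(site_dir: str = "public") -> list[list[str]]:
    """Commands for a Hugo build followed by Pagefind indexing."""
    return [
        # Render the site into site_dir (Hugo's default output is "public").
        ["hugo", "--destination", site_dir],
        # Index the rendered HTML; Pagefind writes its index inside site_dir.
        ["npx", "-y", "pagefind", "--site", site_dir],
    ]


def run_build(site_dir: str = "public") -> None:
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(site_dir):
        subprocess.run(command, check=True)


# Usage (requires hugo and Node.js to be installed):
#   run_build("public")
```

Because the index is just static files shipped with the site, search requests never leave the site's own origin.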

  • Regarding Algolia: I spoke with Algolia; pricing depends on traffic. We'd probably need the Grow plan (production) level, which offers 10,000 free calls (50¢/1000 calls after).
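
To get a feel for the Algolia numbers quoted above, here is a small estimate. The free allowance and overage rate come from this comment; the traffic levels are hypothetical, and the sketch assumes overage is pro-rated per call, which the thread does not confirm.

```python
FREE_CALLS = 10_000          # free allowance quoted above
USD_PER_1000_EXTRA = 0.50    # 50 cents per 1000 calls beyond the allowance


def estimated_cost_usd(calls: int) -> float:
    """Estimated charge for `calls` search requests in one billing period.

    ASSUMPTION: overage is pro-rated per call rather than billed in
    whole blocks of 1000; the thread does not say which applies.
    """
    billable = max(0, calls - FREE_CALLS)
    return billable / 1000 * USD_PER_1000_EXTRA


# HYPOTHETICAL traffic levels, for illustration only.
for calls in (5_000, 10_000, 250_000):
    print(f"{calls:>7,} calls -> ${estimated_cost_usd(calls):,.2f}")
```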

We should change as little as possible here, so we should only replace Bing, affecting only the localizations with Bing as their search provider (i.e., I don't think we should remove or update any other search providers).
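
The "change as little as possible" rule can be sketched as a simple mapping update. The localization codes and provider names below are hypothetical placeholders, not the site's real configuration.

```python
# HYPOTHETICAL mapping of localization code -> search provider.
# The real site's configuration keys and values may differ.
CURRENT_PROVIDERS = {
    "en": "google",
    "zh-cn": "bing",
    "ja": "google",
}


def replace_bing(providers: dict[str, str], replacement: str = "pagefind") -> dict[str, str]:
    """Return a copy where only Bing-backed localizations change provider."""
    return {
        lang: replacement if provider == "bing" else provider
        for lang, provider in providers.items()
    }


print(replace_bing(CURRENT_PROVIDERS))
```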

@cjyabraham has made progress based on @sftim's work to implement PageFind for Chinese users and will soon open a PR for discussion/review.

@cjyabraham
Contributor Author

PageFind solution has been deployed.
