Search to accents not working with sqlite #18429

Closed
neszt opened this issue Jan 27, 2022 · 3 comments · Fixed by #18437

neszt commented Jan 27, 2022

Gitea Version

1.15.10

Git Version

git version 2.30.2

Operating System

Alpine Linux 3.13.7

How are you running Gitea?

Official docker image deployed with docker-compose:

$ cat docker-compose.yml
version: "3.3"
services:
  server:
    restart: always
    image: gitea/gitea:${GITEA_VERSION}
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /usr/share/zoneinfo:/usr/share/zoneinfo:ro
      - ${DATA}:/data
      - ${ENTRYPOINT}:/usr/bin/entrypoint
    ports:
      - "${HTTP_PORT}:3000"
      - "${SSH_PORT}:22"
$ cat .env
GITEA_VERSION=1.15.10
HTTP_PORT=3000
SSH_PORT=2222
DATA=./data
ENTRYPOINT=./custom/entrypoint

Database

SQLite

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Description

I use a custom indexer setting to search for word fragments as well.

bash-5.1# tail -2 /data/gitea/conf/app.ini
[indexer]
ISSUE_INDEXER_TYPE = db

Unfortunately, accented search doesn't work with this option, neither in the title nor in the content.

I have another Gitea instance installed with a PostgreSQL database, and accented search works correctly there, so I think SQLite is the cause of the malfunction.

Is there any sqlite specific language option I should set?

Screenshots

Without search:

image

With accent search:

image

With accent free search:

image

Gusted commented Jan 27, 2022

Hmm, it seems SQLite is the culprit here. We currently upper-case every character of the keyword (to avoid case-sensitive searches). Go upper-cases accented characters as well: https://go.dev/play/p/MBiTQq-3mAl, whereas SQLite's UPPER() only converts ASCII characters:
image
image

So the two will never match each other... I think the best option is a special ToUpper function for SQLite that upper-cases only ASCII characters and thus matches SQLite's behavior.

@zeripath
Contributor

We should probably let the DB do the UPPER() rather than expect Go's collation to be the same as the DB's.

(I would also note that we should probably consider Unicode-normalizing the strings...

For example, ë and ë are not the same character.)

Gusted commented Jan 27, 2022

@zeripath Unless I'm misunderstanding something, it doesn't seem like we can use the UPPER function on provided values (as in, the values for ?):

image

Gusted pushed a commit to Gusted/gitea that referenced this issue Jan 28, 2022
- Use `ToASCIIUpper` for SQLite database on issues search, this because
`UPPER(x)` on SQLite only transforms ASCII letters.
- Resolves go-gitea#18429
wxiaoguang pushed a commit that referenced this issue Feb 1, 2022
Use `ToASCIIUpper` for SQLite database on issues search, this because `UPPER(x)` on SQLite only transforms ASCII letters. Resolves #18429
Chianina pushed a commit to Chianina/gitea that referenced this issue Mar 28, 2022
Use `ToASCIIUpper` for SQLite database on issues search, this because `UPPER(x)` on SQLite only transforms ASCII letters. Resolves go-gitea#18429
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022