Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support For Postgres Full Text Search tsvector Data Type #2727

Closed
wants to merge 18 commits into from

Conversation

anshap1719
Copy link

@anshap1719 anshap1719 commented Sep 1, 2023

Closes #2705

Relates to #729

@anshap1719 anshap1719 marked this pull request as ready for review September 4, 2023 12:33
@anshap1719 anshap1719 changed the title Add Support For Postgres Full Text Search Add Support For Postgres Full Text Search Data Types Sep 4, 2023
Copy link
Collaborator

@abonander abonander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it appears that supporting full text search does not require enabling any optional dependencies, it doesn't need a feature flag. It's perfectly fine for it to be unconditionally compiled, like with the support for ltree and lquery.

sqlx-core/Cargo.toml Outdated Show resolved Hide resolved
@anshap1719
Copy link
Author

Hi @abonander, please approve the workflow run, it should now be fixed.

@anshap1719
Copy link
Author

Hi @abonander, Did you get a chance to look at this again yet?

@abonander
Copy link
Collaborator

I review PRs as I get to them. SQLx is only a small part of my full-time job. Pinging me just fills up my notifications, it won't get your PR reviewed any faster.

@anshap1719
Copy link
Author

@abonander I understand. Apologies for bothering you with those.

@abonander
Copy link
Collaborator

@anshap1719 can you please rebase and fix conflicts one last time? We're ready to merge this.

Note that the type mappings for the macros have moved to their corresponding database drivers:

impl_type_checking!(

@anshap1719
Copy link
Author

Thanks @abonander, I have updated this PR.

@abonander
Copy link
Collaborator

Looks like cargo fmt needs to be run.

@anshap1719
Copy link
Author

@abonander Right, my bad. Done now.

sqlx-postgres/src/types/ts_vector.rs Show resolved Hide resolved
sqlx-postgres/src/types/ts_vector.rs Outdated Show resolved Hide resolved
sqlx-postgres/src/types/ts_vector.rs Outdated Show resolved Hide resolved
sqlx-postgres/src/types/ts_vector.rs Outdated Show resolved Hide resolved
@anshap1719
Copy link
Author

@wyhaya Thanks for the review. Will make the changes over the weekend and will also use all the cases you described as test cases.

@anshap1719
Copy link
Author

@wyhaya Updated everything. Thanks for the code as well.

@anshap1719 anshap1719 requested a review from wyhaya April 8, 2024 12:54
sqlx-postgres/src/types/ts_vector.rs Outdated Show resolved Hide resolved
sqlx-postgres/src/types/ts_query.rs Outdated Show resolved Hide resolved
@anshap1719
Copy link
Author

Hi @abonander Just FYI, this PR no longer covers tsquery. I removed it since I realized it's not ready and needs a lot of work, and the tsvector is already done and we want to avoid blocking that.

@anshap1719 anshap1719 requested a review from wyhaya April 21, 2024 08:50
@anshap1719 anshap1719 changed the title Add Support For Postgres Full Text Search Data Types Add Support For Postgres Full Text Search tsvector Data Type Apr 21, 2024
@KirDontsov
Copy link

Oh, guys sorry, but I really need tsvector support, can you check and merge it? @abonander

Copy link
Collaborator

@abonander abonander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review comments plus a couple nice-to-haves:

  • At least a line or two of documentation on each public type.
  • Encode+decode test for Vec<TsVector>.

}

#[derive(Debug)]
pub struct TsVector {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: structs should be sorted in order of importance, it's weird for TsVector to be buried way down in the file. I would do TsVector, then Lexeme, then LexemeMeta.

Group the structs together, then the impls in the same order (this isn't a hard rule, but it's what I'm starting to prefer).

}
}

impl TryFrom<&[u8]> for TsVector {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a public trait impl. A static method is fine.

}
}

impl TryInto<Vec<u8>> for &TsVector {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This should be a private method that takes &mut Vec<u8> and writes to it.

Comment on lines +42 to +73
#[derive(Debug)]
pub struct ParseLexemeMetaError {
kind: IntErrorKind,
}

impl From<ParseIntError> for ParseLexemeMetaError {
fn from(value: ParseIntError) -> Self {
Self {
kind: value.kind().clone(),
}
}
}

#[allow(deprecated)]
impl Display for ParseLexemeMetaError {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
self.description().fmt(f)
}
}

impl Error for ParseLexemeMetaError {
fn description(&self) -> &str {
match self.kind {
IntErrorKind::Empty => "cannot parse integer from empty string",
IntErrorKind::InvalidDigit => "invalid digit found in string",
IntErrorKind::PosOverflow => "number too large to fit in target type",
IntErrorKind::NegOverflow => "number too small to fit in target type",
IntErrorKind::Zero => "number would be zero for non-zero type",
_ => "unknown",
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this can be replaced by the following:

Suggested change
#[derive(Debug)]
pub struct ParseLexemeMetaError {
kind: IntErrorKind,
}
impl From<ParseIntError> for ParseLexemeMetaError {
fn from(value: ParseIntError) -> Self {
Self {
kind: value.kind().clone(),
}
}
}
#[allow(deprecated)]
impl Display for ParseLexemeMetaError {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
self.description().fmt(f)
}
}
impl Error for ParseLexemeMetaError {
fn description(&self) -> &str {
match self.kind {
IntErrorKind::Empty => "cannot parse integer from empty string",
IntErrorKind::InvalidDigit => "invalid digit found in string",
IntErrorKind::PosOverflow => "number too large to fit in target type",
IntErrorKind::NegOverflow => "number too small to fit in target type",
IntErrorKind::Zero => "number would be zero for non-zero type",
_ => "unknown",
}
}
}
#[derive(Debug, thiserror::Error)]
#[error("error parsing lexeme metadata: {inner}")]
pub struct ParseLexemeMetaError {
#[error(from)]
inner: ParseIntError,
}

@@ -234,6 +234,8 @@ mod mac_address;
#[cfg(feature = "bit-vec")]
mod bit_vec;

mod ts_vector;
Copy link
Collaborator

@abonander abonander Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Lexeme and LexemeMeta are meant to be visible, this needs to be pub so their documentation gets rendered (I wouldn't reexport them here since it would be out of context).

use std::str::FromStr;

#[derive(Debug, Copy, Clone)]
pub struct LexemeMeta {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be pub since it's not used outside this module. Or if it's meant to be accessible then it needs some sort of getter from Lexeme.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also rather useless without at least getters for position and weight.

}

impl TsVector {
pub fn words(&self) -> &Vec<Lexeme> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn words(&self) -> &Vec<Lexeme> {
pub fn lexemes(&self) -> &[Lexeme] {

}

impl Lexeme {
pub fn word(&self) -> &str {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn word(&self) -> &str {
pub fn as_str(&self) -> &str {

Comment on lines +23 to +40
impl From<u16> for LexemeMeta {
fn from(value: u16) -> Self {
let weight = (value >> 14) & 0b11;
let position = value & 0x3fff;

Self { weight, position }
}
}

impl From<&LexemeMeta> for u16 {
fn from(LexemeMeta { weight, position }: &LexemeMeta) -> Self {
let mut lexeme_meta = 0u16;
lexeme_meta = (weight << 14) | (position & 0x3fff);
lexeme_meta = (position & 0xc00) | (weight & 0x3fff);

lexeme_meta
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These impls lack context. Maybe from_packed() and to_packed() as private methods.

@@ -251,6 +253,8 @@ pub use range::PgRange;
#[cfg(any(feature = "chrono", feature = "time"))]
pub use time_tz::PgTimeTz;

pub use ts_vector::TsVector;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A listing should also be added to the table at line 23 above.

@abonander
Copy link
Collaborator

Closing due to inactivity. Feel free to reopen as a new PR, but please address my review comments.

@abonander abonander closed this Jul 22, 2024
@anshap1719
Copy link
Author

@abonander I don't think I will. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Postgres TSVector Support
4 participants