Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add native support to bitstring #4328

Merged
merged 21 commits into from
Feb 27, 2024

Conversation

Gigitsu
Copy link
Contributor

@Gigitsu Gigitsu commented Nov 23, 2023

Recently, I had the need to store a variable-size bitstring in a PostgreSQL database. I was surprised to find that there isn't native support in Ecto for this data type, and the :string type doesn't work for me since I need to store a string where the number of bits is not a multiple of 8.

This PR is currently just a draft, but if this could be a desired feature, I would be happy to implement the entire feature set, including an Ecto.Query set of bitwise operators to work with bits and a corresponding Ecto schema type.

@josevalim
Copy link
Member

We will be happy to add this but I believe your first challenge will be adding support for bit strings in Postgrex itself, unless it already works? :)

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 23, 2023

I'm currently using a custom Ecto type with Postgrex, which does pretty much what I've already implemented in this PR, and everything works happily :D

If you agree, I can go on with this PR and make a sample project to end-to-end test this feature with Postgresx.

@josevalim
Copy link
Member

For testing, please add some integration tests in the integration_test folder. Check the README for more info! You can add some tags so we skip these tests on MySQL and other adapters.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 23, 2023

Great! Thank you.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 25, 2023

This is the ecto_sql companion PR elixir-ecto/ecto_sql#577

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 25, 2023

It's looking good, thank you. I think only a couple things are missing:

Ecto

  1. Integration tests. We can add them to this repo and tag them @tag :bitstring_type as Jose suggested. Then adapters that don't support it can ignore the tests

Ecto SQL

  1. I believe we can add support for bitwise functions band, bor, bxor, bnot, bsl, bsr. This would be the only way to expose the Postgres xor operator through Ecto. It would involve creating patterns like this for those functions https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L871. Care has to be taken about wrapping in parentheses
  2. Similar to point 2 we can create a pattern for the ~~~ operator to support negation

@greg-rychlewski
Copy link
Member

The integration tests can go here: https://github.com/elixir-ecto/ecto/blob/master/integration_test/cases/type.exs

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 25, 2023

I've just added the pattern for ~~~ :D and I'm working on ecto_sql to convert that into a SQL string.

I'll then add support for bitwise functions and integration tests.

Thank you for your support :) really appreciated.

@Gigitsu Gigitsu marked this pull request as ready for review November 25, 2023 18:52
@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 26, 2023

@Gigitsu Could you please revert the last commit? We wouldn't add arbitrary casting with size for any type. I think we would potentially add a parameterized type for bitstrings where the size is the parameter. But also please hold off on that until Jose has time to catch up with our conversations. Thank you.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 26, 2023

Everything reverted also here!

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Feb 17, 2024

@josevalim there was one other part we were discussing in the ecto_sql PR. it was how to handle literals. one suggestion was to handle it in the builder doing something like this

@always_tagged [:bitstring]

defp do_literal(value, _, current) when current in @always_tagged do
  type = 
    if current == :bitstring do
      quote do: if is_binary(unquote(value)), do: :binary, else: :bitstring
    else
      current
    end

  {:%, [], [Ecto.Query.Tagged, {:%{}, [], [value: value, type: type]}]}
end

and the other suggestion was to handle it in the adapter. Do you have any preference?

@josevalim
Copy link
Member

Let’s handle it in Ecto, I think it would be wrong to say it is a binary when it is actually a bitstring.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 18, 2024

@josevalim I've removed every implementation of bitwise operators and added support to bitstring in migrations too both here and it the ecto_sql companion PR elixir-ecto/ecto_sql#577

I'm also trying to implement integration tests but I could use some help here.

I've managed to add a bitstring test that works for Postgres, but it seems that I'm having some trouble making it work for MySQL.
It appears that the MySQL connector doesn't handle bitstrings in a where clause like Postgres does (I need to investigate this further). If MySQL can't actually handle bitstrings in a where clause, how can I manage these different behaviors between databases since the integration tests are the same?

@greg-rychlewski
Copy link
Member

@Gigitsu Are you able to rebase on master and resolve the conflicts? We'll be better able to help if we can see the test results here. Thank you.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 25, 2024

Hi @josevalim @greg-rychlewski , I've updated the Earthfile to make it compatible with ARM (Apple Silicon in my case) CPUs. However, I only have experience with Docker and haven't used Earthly before, so please take a look at it.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 25, 2024

Let’s handle it in Ecto, I think it would be wrong to say it is a binary when it is actually a bitstring.

To correctly tag bitstring literal, I have two possible ways:

  1. I'm trying to implement a function that detects a quoted bitstring like this:
  # Tagged
  def quoted_type({:<<>>, _, expr}, _vars), do: is_quoted_binary(expr) && :binary || :bitstring
  def quoted_type({:type, _, [_, type]}, _vars), do: type

  defp is_quoted_binary(args) when is_list(args) do
    Enum.all?(args, fn
      {:"::", _, [_, size]} -> rem(size, 8) == 0
      {:"::", _, [_, {spec, _, size}]} when spec in [:size] -> rem(size, 8) == 0
      _ -> true
    end)
  end

but I need to know every possible size specifier that can produce a bitstring that is not a binary

  1. The second approach is to use code eval as suggested by @greg-rychlewski here
  # Tagged
  def quoted_type({:<<>>, _, _} = expr, _vars), do: is_quoted_binary(expr) && :binary || :bitstring
  def quoted_type({:type, _, [_, type]}, _vars), do: type

  defp is_quoted_binary(expr), do: is_binary(Code.eval_quoted(expr))

Option 2 is easier and safer than option 1, but I'm unsure if there could be any issues when evaluating the code. Option 1, on the other hand, is harder to implement.

@josevalim @greg-rychlewski I could really use a second opinion on this :)

@greg-rychlewski
Copy link
Member

Hi @Gigitsu ,

The latest suggestion I had was to do something like this #4328 (comment).

WDYT?

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 25, 2024

The latest suggestion I had was to do something like this #4328 (comment).

Sorry, I forgot about this :(

So quoted_type would always return :bitstring and we will do the check at runtime. It could be the easiest and safest solution possible. It will also add a little bit of runtime complexity but I think it's not a big deal.

I don't have a strong preference for any of the solutions, though. Let's wait for @josevalim.

Thank you @greg-rychlewski

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 26, 2024

Maybe this is a bad idea, but we can further simplify things by always returning a third value (e.g. :maybe_binary or :binary_or_bitstring):

def quoted_type({:<<>>, _, expr}, _vars), do: :binary_or_bitstring

and then check if the literal is a binary or a bitstring in ecto_sql:

    defp expr(%Ecto.Query.Tagged{value: binary, type: type}, _sources, _query)
         when type in [:binary_or_bitstring, :binary] and is_binary(binary) do
      ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
    end

    defp expr(%Ecto.Query.Tagged{value: bitstring, type: type}, _sources, _query)
         when type in [:binary_or_bitstring, :bitstring] and is_bitstring(bitstring) do
      bitstring_literal(bitstring)
    end

In this way, we can avoid double-checking and postpone it until necessary, and the user is still able to explicitly tag a literal with either :bitstring or :binary, getting the expected behaviour.

@josevalim
Copy link
Member

This is a very cheap check, so we can check it more than once. But you are right, at compile-time we don't know the type, we need to move it to runtime. Although I would say at compile-time it should say :binary (and we treat bitstring as a specialization).

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 26, 2024

I've updated both PR and I think that they are ready now.

If you agree, I only have to change back the ecto_sql deps from my repo to the ecto one, ( here and here ) before merging.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Feb 26, 2024

I think we want to move the conversion from the adapters to Ecto similar to this: #4328 (comment)

But if I understand Jose right, he wanted @always_tagged to be :binary and :bitstring to be the special case rather than the reverse.

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Gigitsu! @greg-rychlewski, please ship it if you are also happy!

@greg-rychlewski
Copy link
Member

@josevalim Sorry just wanted to confirm one thing before shipping. Did you want to send non-binary bitstring literals to the adapters as :binary or did you want to convert them to :bitstring in Ecto?

@greg-rychlewski
Copy link
Member

There's a couple reasons I prefer doing it in Ecto:

  1. We don't send the wrong type to the adapter
  2. With the current change in ecto sql, if a user manually tags the :binary type and provides a value that is not binary, it will just be interpreted as bitstring. But this can silently allow an error to continue. Probably they should have to be explicit to use the new type?
defp expr(%Ecto.Query.Tagged{value: binary, type: :binary}, _sources, _query)
         when is_binary(binary) do
      ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
    end

    defp expr(%Ecto.Query.Tagged{value: bitstring, type: type}, _sources, _query)
         when type in ~w(bitstring binary)a and is_bitstring(bitstring) do
      bitstring_literal(bitstring)
    end

@josevalim
Copy link
Member

If we can do it in Ecto, it would be preferable. Perhaps during normalization?

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Feb 27, 2024

We could probably do it here, yeah: https://github.com/elixir-ecto/ecto/blob/master/lib/ecto/query/planner.ex#L1333

Though is the reason not to do something like this in the builder because the exact type might be needed somewhere else when building the query?

defp do_literal(value, _, current) when current in @always_tagged do
  type =  quote do: if is_binary(unquote(value)), do: :binary, else: :bitstring

  {:%, [], [Ecto.Query.Tagged, {:%{}, [], [value: value, type: type]}]}
end

@josevalim
Copy link
Member

The do_literal approach may work too.

@greg-rychlewski
Copy link
Member

@Gigitsu Sorry for all the back and forth. It's your choice:

  1. We can merge as is and I will try making it work in ecto in another PR
  2. You could try to see if the do_literal approach works in this PR

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 27, 2024

@greg-rychlewski no problem, I can do it in this PR. I'm just not sure which is the best approach but you know way better than me the Ecto internals so I trust your judgment.

Earthfile Outdated Show resolved Hide resolved
@greg-rychlewski
Copy link
Member

thanks @Gigitsu! I'm updated the ecto_sql pointer and will merge this. once it's merged could you please have the ecto sql PR point to the latest commit on Ecto and we will merge it.

@greg-rychlewski greg-rychlewski merged commit bed81b9 into elixir-ecto:master Feb 27, 2024
3 of 6 checks passed
@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 27, 2024

Sure!! Thank you

@Gigitsu Gigitsu deleted the feature/bitstring-support branch February 27, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants