Add the minus operation #45

Closed
wants to merge 3 commits into from
Changes from 1 commit
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
## 0.21.6 - 2024-05-14

* Add the `minus` operation (also known as set difference, or EXCEPT in SQL).

## 0.21.5 - 2024-04-24

* Add `ttl` (Time to live) to the available options when using the bmg-redis
3 changes: 2 additions & 1 deletion README.md
@@ -260,6 +260,7 @@ r.left_join(right, [:a, :b, ...], {...}) # left join with optional default r
r.left_join(right, {:a => :x, ...}, {...}) # left join after right reversed renaming
r.matching(right, [:a, :b, ...]) # semi join, aka where exists
r.matching(right, :a => :x, :b => :y, ...) # semi join, after right reversed renaming
r.minus(right) # set difference
r.not_matching(right, [:a, :b, ...]) # inverse semi join, aka where not exists
r.not_matching(right, :a => :x, ...) # inverse semi join, after right reversed renaming
r.page([[:a, :asc], ...], 12, page_size: 10) # paging, using an explicit ordering
@@ -276,7 +277,7 @@ r.transform(:foo => :upcase, ...) # specific-attrs transformation
r.transform([:to_s, :upcase]) # chain-transformation
r.ungroup([:a, :b, ...]) # ungroup relation-valued attributes within parent tuple
r.ungroup(:a) # shortcut over ungroup([:a])
-r.union(right) # relational union
+r.union(right) # set union
r.unwrap([:a, :b, ...]) # merge tuple-valued attributes within parent tuple
r.unwrap(:a) # shortcut over unwrap([:a])
r.where(predicate) # alias for restrict(predicate)
10 changes: 10 additions & 0 deletions lib/bmg/algebra.rb
@@ -203,6 +203,16 @@ def _union(type, other, options)
end
protected :_union

def minus(other)
  return self if other.is_a?(Relation::Empty)
  _minus self.type.minus(other.type), other
end

def _minus(type, other)
  Operator::Minus.new(type, [self, other])
end
protected :_minus

def unwrap(attrs)
_unwrap self.type.unwrap(attrs), attrs
end
1 change: 1 addition & 0 deletions lib/bmg/operator.rb
@@ -39,6 +39,7 @@ def inspect
require_relative 'operator/image'
require_relative 'operator/join'
require_relative 'operator/matching'
require_relative 'operator/minus'
require_relative 'operator/not_matching'
require_relative 'operator/page'
require_relative 'operator/project'
43 changes: 43 additions & 0 deletions lib/bmg/operator/minus.rb
@@ -0,0 +1,43 @@
module Bmg
  module Operator
    #
    # Minus operator.
    #
    # Returns all tuples which are in the left operand but not
    # in the right operand.
    #
    # This implementation is actually a NAry-Minus, since it handles
    # an arbitrary number of operands.
    #
    class Minus
      include Operator::Nary

      def initialize(type, operands)
        @type = type
        @operands = operands
      end

      public

      def all?
        false
      end
Contributor

why all?

Collaborator Author

I was under the impression that some part of the translation of n-ary operations expected it, but you're right, it's not needed. Will remove.


      def each(&bl)
        return to_enum unless block_given?
        initial = operands[0].to_a
        tuples = operands.drop(1).inject(initial) do |agg, op|
          agg - op.to_a
        end
Contributor

Definitely not the optimal implementation, but I suppose we don't really care about it (?)

Collaborator Author

Maybe using Ruby's Set? Which I presume uses object hashes? That should be reasonably good.
I'll do a perf test to check that it's actually faster (although the code will be very similar).

Collaborator Author

Fixed this. My benchmark went from ~25s to ~1s.

See: https://gist.github.com/felixyz/7151d6acc1ef3d0d674caeacc9538862

Slight worry: the reported remaining number of tuples went down from ~55k to ~42k.

The optimization commit is here: d75847b
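
One possible explanation for the drop from ~55k to ~42k tuples, offered as a guess and not as a diagnosis of the benchmark: Ruby's `Array#-` keeps duplicate elements of the left array, while a `Set` collapses them. The sketch below uses plain hashes as tuples. `array_minus` and `set_minus` are hypothetical helper names, not Bmg methods, and the data is made up for illustration.

```ruby
require 'set'

# Hypothetical helpers for illustration; neither is part of Bmg.

# Array-based difference: duplicates in the left operand survive.
def array_minus(left, *rights)
  rights.inject(left.to_a) { |agg, op| agg - op.to_a }
end

# Set-based difference: the left operand is deduplicated when the Set is built.
def set_minus(left, *rights)
  rights.inject(Set.new(left)) { |agg, op| agg.subtract(op) }.to_a
end

# Made-up tuples; note the duplicated { a: 1 } on the left.
left  = [{ a: 1 }, { a: 1 }, { a: 2 }, { a: 3 }]
right = [{ a: 3 }]

array_result = array_minus(left, right) # keeps both { a: 1 } tuples
set_result   = set_minus(left, right)   # keeps a single { a: 1 } tuple
```

If the benchmark input contains duplicate tuples, the two approaches would report different counts for this reason alone, which may be worth ruling out before looking for a bug.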

        tuples.each do |t|
          yield t
        end
      end

      def to_ast
        [ :except ] + operands.map(&:to_ast)
      end

    end # class Minus
  end # module Operator
end # module Bmg
10 changes: 10 additions & 0 deletions lib/bmg/sql/relation.rb
@@ -191,6 +191,16 @@ def _union(type, right, options)
end
end

def _minus(type, right)
  if right_expr = extract_compatible_sexpr(right)
    expr = before_use(self.expr)
    expr = Processor::Merge.new(:except, false, right_expr, builder).call(expr)
    _instance(type, builder, expr)
  else
    super
  end
end

# Build a new relation instance for some new type & expression
#
# This method can be overridden by subclasses to provide their
19 changes: 16 additions & 3 deletions lib/bmg/type.rb
@@ -287,20 +287,33 @@ def ungroup(attrlist)
}
end

-def union(other)
+def check_union_compatible(other, opname)
if typechecked? && knows_attrlist? && other.knows_attrlist?
missing = self.attrlist - other.attrlist
-raise TypeError, "Union incompatible: missing right attributes #{missing.join(', ')}" unless missing.empty?
+raise TypeError, "#{opname} requires compatible attribute lists, but the right operand is missing the following attributes: #{missing.join(', ')}" unless missing.empty?
extra = other.attrlist - self.attrlist
-raise TypeError, "Union incompatible: missing left attributes #{extra.join(', ')}" unless extra.empty?
+raise TypeError, "#{opname} requires compatible attribute lists, but the left operand is missing the following attributes: #{extra.join(', ')}" unless extra.empty?
end
end

def union(other)
check_union_compatible(other, "Union")
dup.tap{|x|
### attrlist stays the same
x.predicate = self.predicate | predicate
x.keys = self._keys.union(self, x, other) if knows_keys?
}
end

def minus(other)
check_union_compatible(other, "Minus")
dup.tap{|x|
### attrlist stays the same
x.predicate = self.predicate & predicate
Contributor

Not sure that's correct.

  • self.predicate is certainly correct and safe (but not very strong).
  • Intuitively, I would say self.predicate & !predicate but it should be further checked though...
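
A small counterexample may help with the check suggested above. It assumes, as a reading of this discussion and not something confirmed by the Bmg sources shown here, that a type predicate is a condition every tuple of the relation satisfies and not a condition that decides membership. Lambdas stand in for predicates; this is not Bmg's Predicate API, and the data is made up.

```ruby
# Made-up operands: plain arrays of hashes standing in for relations.
left  = [{ a: 1 }, { a: 2 }]
right = [{ a: 1 }]

# Assumed meaning of a type predicate: true for every tuple of the relation,
# but possibly also true for tuples that are not in it.
left_pred  = ->(t) { t[:a] > 0 }
right_pred = ->(t) { t[:a] > 0 }

difference = left - right # only { a: 2 } remains

# The left predicate still holds for every remaining tuple.
keeps_left_pred = difference.all? { |t| left_pred.call(t) }

# The stronger candidate (left predicate AND NOT right predicate) fails here,
# because { a: 2 } is absent from right yet still satisfies right's predicate.
keeps_negated = difference.all? { |t| left_pred.call(t) && !right_pred.call(t) }
```

Under that assumption, keeping only the left predicate looks like the safe choice; negating the right predicate would only be sound when that predicate exactly characterises the right operand's tuples.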

x.keys = self._keys.union(self, x, other) if knows_keys?
Contributor

I don't think that's correct. We simply keep the keys of the left operand I would say, and gain no other one.
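
The reasoning behind keeping the left keys can be sketched in plain Ruby: a difference only removes tuples from the left operand, so any attribute list that was unique on the left stays unique on the result. `key?` is a hypothetical helper and the supplier data is made up; neither comes from Bmg.

```ruby
# Hypothetical helper: true when no two tuples agree on all of attrs.
def key?(tuples, attrs)
  projected = tuples.map { |t| t.values_at(*attrs) }
  projected.uniq.size == projected.size
end

# Made-up data where :sid is a key of the left operand.
left = [
  { sid: 1, city: "London" },
  { sid: 2, city: "Paris" },
  { sid: 3, city: "Paris" }
]
right = [{ sid: 2, city: "Paris" }]

difference = left - right

left_has_key       = key?(left, [:sid])
difference_has_key = key?(difference, [:sid]) # a subset cannot break uniqueness
```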

}
end

def unwrap(attrlist)
known_attributes!(attrlist) if typechecked? && knows_attrlist?
dup.tap{|x|
2 changes: 1 addition & 1 deletion lib/bmg/version.rb
@@ -2,7 +2,7 @@ module Bmg
module Version
MAJOR = 0
MINOR = 21
-TINY = 5
+TINY = 6
Contributor

Strictly speaking we should upgrade to 0.22.0

end
VERSION = "#{Version::MAJOR}.#{Version::MINOR}.#{Version::TINY}"
end
45 changes: 45 additions & 0 deletions spec/integration/sequel/base/minus.yml
@@ -0,0 +1,45 @@
---
- bmg: |-
    suppliers.minus(suppliers)
  sqlite: |-
    SELECT
      `t1`.`sid`,
      `t1`.`name`,
      `t1`.`city`,
      `t1`.`status`
    FROM
      `suppliers` AS 't1'
    EXCEPT
    SELECT
      `t1`.`sid`,
      `t1`.`name`,
      `t1`.`city`,
      `t1`.`status`
    FROM
      `suppliers` AS 't1'
- bmg: |-
    suppliers.minus(suppliers).minus(suppliers)
  sqlite: |-
    SELECT
      `t1`.`sid`,
      `t1`.`name`,
      `t1`.`city`,
      `t1`.`status`
    FROM
      `suppliers` AS 't1'
    EXCEPT
    SELECT
      `t1`.`sid`,
      `t1`.`name`,
      `t1`.`city`,
      `t1`.`status`
    FROM
      `suppliers` AS 't1'
    EXCEPT
    SELECT
      `t1`.`sid`,
      `t1`.`name`,
      `t1`.`city`,
      `t1`.`status`
    FROM
      `suppliers` AS 't1'
36 changes: 36 additions & 0 deletions spec/unit/operator/test_minus.rb
@@ -0,0 +1,36 @@
require 'spec_helper'
module Bmg
  module Operator
    describe Minus do

      let(:r1) {
        [
          { a: 1 },
          { a: 2 },
          { a: 3 },
          { a: 4 },
          { a: 5 }
        ]
      }

      let(:r2) {
        [
          { a: 1 },
          { a: 4 }
        ]
      }

      let(:r3) {
        [
          { a: 3 }
        ]
      }

      it 'works' do
        difference = Minus.new(Type::ANY, [r1, r2, r3])
        expect(difference.to_a).to eql([{ a: 2 },{ a: 5 }])
      end

    end
  end
end
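
The n-ary behaviour exercised by this spec can be sketched without Bmg. Folding `Array#-` from the left over the operands gives the same tuples as chaining binary differences, or as subtracting the union of all right operands. The variable names mirror the spec; nothing below is Bmg API.

```ruby
# Same operands as the spec, as plain arrays of hashes.
r1 = [{ a: 1 }, { a: 2 }, { a: 3 }, { a: 4 }, { a: 5 }]
r2 = [{ a: 1 }, { a: 4 }]
r3 = [{ a: 3 }]

# Left fold over the right operands, in the style of Minus#each.
folded = [r2, r3].inject(r1) { |agg, op| agg - op }

# Chained binary differences, in the style of r1.minus(r2).minus(r3).
chained = (r1 - r2) - r3

# Subtracting the union of the right operands.
unioned = r1 - (r2 | r3)
```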