Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added reference_proxy and bitset_view #13

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,11 @@ OPTION(BUILD_TESTS "sparrow test suite" OFF)
# =====

set(SPARROW_HEADERS
${SPARROW_INCLUDE_DIR}/sparrow/array_data.hpp
${SPARROW_INCLUDE_DIR}/sparrow/buffer.hpp
${SPARROW_INCLUDE_DIR}/sparrow/data_type.hpp
${SPARROW_INCLUDE_DIR}/sparrow/dynamic_bitset.hpp
${SPARROW_INCLUDE_DIR}/sparrow/iterator.hpp
${SPARROW_INCLUDE_DIR}/sparrow/sparrow_version.hpp
)

Expand Down
187 changes: 187 additions & 0 deletions include/sparrow/array_data.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include <climits>
#include <cstdint>
#include <vector>

#include "sparrow/buffer.hpp"
#include "sparrow/dynamic_bitset.hpp"
#include "sparrow/data_type.hpp"

namespace sparrow
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some documentation for each one of these types?

{
struct array_data
{
data_descriptor type;
std::int64_t length = 0;
std::int64_t offset = 0;
using block_type = std::uint8_t;
using bitmap_type = dynamic_bitset<block_type>;
// bitmap buffer and null_count
bitmap_type bitmap;
// Other buffers
std::vector<buffer<block_type>> buffers;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define bitmap_type but not buffer_type? With or without aliases are both fine to me, although I would prefer if it was consistent.

std::vector<array_data> child_data;
Comment on lines +36 to +37
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be used instead?

Suggested change
std::vector<buffer<block_type>> buffers;
std::vector<array_data> child_data;
buffer<buffer<block_type>> buffers;
buffer<array_data> child_data;

};

struct null_type
{
};
constexpr null_type null;

template <class T>
class reference_proxy
{
public:

using self_type = reference_proxy<T>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference goes to this_type but ok

using value_type = T;
using reference = T&;
using const_reference = const T&;
using pointer = T*;
using size_type = std::size_t;

reference_proxy(reference ref, size_type index, array_data& ad);
reference_proxy(const self_type&) = default;
reference_proxy(self_type&&) = default;

self_type& operator=(const self_type&);
self_type& operator=(self_type&&);
self_type& operator=(null_type);

template <class U>
self_type& operator=(const U&);

template <class U>
self_type& operator+=(const U&);

template <class U>
self_type& operator-=(const U&);

template <class U>
self_type& operator*=(const U&);

template <class U>
self_type& operator/=(const U&);

operator const_reference() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be explicit.

Copy link
Collaborator Author

@JohanMabille JohanMabille Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires that you explicitly cast each time you want to compare the proxy to a value (for instance), which is paricularly ugly (and does not looks naturaly at all to me):

template <class T>
struct array_impl
{
    using reference = reference_proxy<std::uint8_t>;
    const_reference = const std::uint8_t&;

    using size_type = std::size_t;

    reference operator[](size_type i);
};

array<std::uint8_t> a = ....;
uint8_t expected = 8u;
// With explicit you need to
using const_reference = array<std::uint8_t>::const_reference;
bool b = (const_reference(a[i]) == expected);

// while without, you can simply write:
bool b = a[i] == expected;

How having this operator implicit can be harmful?

Copy link
Collaborator

@Klaim Klaim Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How having this operator implicit can be harmful?

See this guideline for the fundations of why it's a very bad idea. It goes in pair with the guideline about conversion operator that must also be explicit. The cases where this would not work as so rare I, so far, never saw a justifyiable example that isn't improved by explicit.

This requires that you explicitly cast each time you want to compare the proxy to a value (for instance), which is paricularly ugly (and does not looks naturaly at all to me):

That just means that the reference type needs comparison operators with objects of the type being referred to. That conversion is a shortctut that can cause more issues than explicitly stating these operators.

Also note that even like that "ugly" is not a reason to make the code potentially silently incorrect, in particular in a C++ library supposedly trying to make correctness easier than the C api. Worst scenario, static_casts are fine and help clarify where are the potential issues related to conversion. They are noisy for a reason after all.

See as example: https://godbolt.org/z/7jxcvjq3Y

I see no reason to make that operator implicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot a test in my example, here is the udpated version (I only added the test): https://godbolt.org/z/TYx7sGezj


private:

void update_value(value_type value);
void update_value(null_type);

pointer p_ref;
size_type m_index;
array_data* p_array_data;
};

/**********************************
* reference_proxy implementation *
**********************************/

template <class T>
reference_proxy<T>::reference_proxy(reference ref, size_type index, array_data& ad)
: p_ref(&ref)
, m_index(index)
, p_array_data(&ad)
{
}

template <class T>
auto reference_proxy<T>::operator=(const self_type& rhs) -> self_type&
{
update_value(*rhs.p_ref);
return *this;
}

template <class T>
auto reference_proxy<T>::operator=(self_type&& rhs) -> self_type&
{
update_value(*rhs.p_ref);
return *this;
}

template <class T>
auto reference_proxy<T>::operator=(null_type rhs) -> self_type&
{
update_value(std::move(rhs));
return *this;
}

template <class T>
template <class U>
auto reference_proxy<T>::operator=(const U& u) -> self_type&
{
update_value(u);
return *this;
}

template <class T>
template <class U>
auto reference_proxy<T>::operator+=(const U& u) -> self_type&
{
update_value(*p_ref + u);
return *this;
}


template <class T>
template <class U>
auto reference_proxy<T>::operator-=(const U& u) -> self_type&
{
update_value(*p_ref - u);
return *this;
}

template <class T>
template <class U>
auto reference_proxy<T>::operator*=(const U& u) -> self_type&
{
update_value(*p_ref * u);
return *this;
}

template <class T>
template <class U>
auto reference_proxy<T>::operator/=(const U& u) -> self_type&
{
update_value(*p_ref / u);
return *this;
}

template <class T>
reference_proxy<T>::operator const_reference() const
{
return *p_ref;
}

template <class T>
void reference_proxy<T>::update_value(value_type value)
{
auto& bitmap = p_array_data->bitmap;
bitmap.set(m_index, true);
*p_ref = value;
}

template <class T>
void reference_proxy<T>::update_value(null_type)
{
auto& bitmap = p_array_data->bitmap;
bitmap.set(m_index, false);
}
}

85 changes: 83 additions & 2 deletions include/sparrow/buffer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,23 @@ namespace sparrow
struct buffer_data
{
using value_type = T;
using reference = T&;
using const_reference = const T&;
using pointer = T*;
using size_type = std::size_t;

bool empty() const noexcept;
size_type size() const noexcept;

reference operator[](size_type);
const_reference operator[](size_type) const;

reference front();
const_reference front() const;

reference back();
const_reference back() const;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about begin and end? Is this not a range/container?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah they should be, and I want to refactor this hierarchy to be consistent with the dynamic bitset. We can add it in an incoming PR though, to keep this one quite "small" (it is actually already big)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding TODOs for these so that we dont forget in further passes.

template <class U = T>
U* data() noexcept;

Expand Down Expand Up @@ -62,6 +73,7 @@ namespace sparrow

buffer() = default;
explicit buffer(size_type size);
buffer(size_type size, value_type value);
buffer(pointer data, size_type size);

~buffer();
Expand All @@ -74,9 +86,14 @@ namespace sparrow

using base_type::empty;
using base_type::size;
using base_type::operator[];
using base_type::front;
using base_type::back;
using base_type::data;

void resize(size_type new_size);
void resize(size_type new_size, value_type value);
void clear();

void swap(buffer&) noexcept;
bool equal(const buffer& rhs) const;
Expand Down Expand Up @@ -109,6 +126,9 @@ namespace sparrow

using base_type::empty;
using base_type::size;
using base_type::operator[];
using base_type::front;
using base_type::back;
using base_type::data;

void swap(buffer_view&) noexcept;
Expand Down Expand Up @@ -136,6 +156,42 @@ namespace sparrow
return m_size;
}

template <class T>
auto buffer_data<T>::operator[](size_type pos) -> reference
{
return data()[pos];
}

template <class T>
auto buffer_data<T>::operator[](size_type pos) const -> const_reference
{
return data()[pos];
}

template <class T>
auto buffer_data<T>::front() -> reference
{
return data()[0];
}

template <class T>
auto buffer_data<T>::front() const -> const_reference
{
return data()[0];
}

template <class T>
auto buffer_data<T>::back() -> reference
{
return data()[m_size - 1];
}

template <class T>
auto buffer_data<T>::back() const -> const_reference
{
return data()[m_size - 1];
}

template <class T>
template <class U>
U* buffer_data<T>::data() noexcept
Expand Down Expand Up @@ -174,6 +230,13 @@ namespace sparrow
{
}

template <class T>
buffer<T>::buffer(size_type size, value_type value)
: base_type{allocate(size), size}
{
std::fill(data(), data() + size, value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative.

Suggested change
std::fill(data(), data() + size, value);
std::fill_n(data(), size, value);

}

template <class T>
buffer<T>::buffer(pointer data, size_type size)
: base_type{data, size}
Expand All @@ -188,7 +251,7 @@ namespace sparrow

template <class T>
buffer<T>::buffer(const buffer<T>& rhs)
: base_type{allocate(rhs.m_size), rhs.size()}
: base_type{allocate(rhs.size()), rhs.size()}
{
std::copy(rhs.data(), rhs.data() + rhs.size(), data());
}
Expand Down Expand Up @@ -226,11 +289,29 @@ namespace sparrow
if (n != size())
{
buffer<T> tmp(n);
std::copy(data(), data() + size(), tmp.data());
size_type copy_size = std::min(size(), n);
Klaim marked this conversation as resolved.
Show resolved Hide resolved
std::copy(data(), data() + copy_size, tmp.data());
swap(tmp);
}
}

template <class T>
void buffer<T>::resize(size_type n, value_type value)
{
size_type old_size = size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

resize(n);
if (old_size < n)
{
std::fill(data() + old_size, data() + n, value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, one can use:

Suggested change
std::fill(data() + old_size, data() + n, value);
std::fill_n(data() + old_size, n, value);

}
}

template <class T>
void buffer<T>::clear()
{
resize(size_type(0));
}

template <class T>
void buffer<T>::swap(buffer<T>& rhs) noexcept
{
Expand Down
Loading
Loading