Add and modify headers and documentation for BLOB support #80

Merged
merged 12 commits into from
Jan 16, 2025
96 changes: 96 additions & 0 deletions docs/internal/blob-api.md
@@ -0,0 +1,96 @@
# BLOB API

This document describes the APIs that are added to or changed in Limestone for BLOB support.

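This document is intended for coordinating with other modules and aligning specifications for BLOB support.
Its content reflects the specification at the time of writing, but it will not necessarily be updated
when the specification changes. Always refer to the source code for the latest specification.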

For each added or modified header, this document describes the corresponding APIs.
For API details, refer to the comments in each header.

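## Addition of blob_pool.h

**New items**

* blob_id_type
  * A type representing a BLOB reference.

* class blob_pool
  * A class for creating and discarding BLOB pools and for provisionally registering BLOB data.
  * A BLOB pool can be obtained with `datastore::acquire_blob_pool()`.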
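The provisional-registration contract of `blob_pool` can be illustrated with a small in-memory sketch. This is not Limestone's implementation: the hypothetical `in_memory_blob_pool` keeps BLOB data in memory, omits `register_file()`, and adds a hypothetical `mark_persistent()` hook that stands in for a BLOB being persisted through `log_channel::add_entry()`. Because the header comments name `std::invalid_state`, which is not a standard library type, the sketch throws `std::logic_error` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>

// Mirrors limestone::api::blob_id_type as declared in blob_pool.h.
using blob_id_type = std::uint64_t;

// Hypothetical in-memory model of the blob_pool contract (not Limestone's code).
class in_memory_blob_pool {
public:
    // Provisionally registers a copy of `data` and returns a fresh reference.
    [[nodiscard]] blob_id_type register_data(std::string_view data) {
        ensure_not_released();
        const blob_id_type id = next_id_++;
        blobs_.emplace(id, std::string{data});
        return id;
    }

    // Registers a copy of an already registered BLOB under a new reference.
    [[nodiscard]] blob_id_type duplicate_data(blob_id_type reference) {
        ensure_not_released();
        const auto it = blobs_.find(reference);
        if (it == blobs_.end()) {
            throw std::invalid_argument("unknown BLOB reference");
        }
        // std::map insertion does not invalidate it->second, so the view stays valid.
        return register_data(it->second);
    }

    // Hypothetical hook standing in for persistence via log_channel::add_entry().
    void mark_persistent(blob_id_type reference) { persistent_.insert(reference); }

    // Discards every BLOB that was never persisted; a second call does nothing.
    void release() {
        if (released_) {
            return;
        }
        for (auto it = blobs_.begin(); it != blobs_.end();) {
            if (persistent_.count(it->first) == 0) {
                it = blobs_.erase(it);
            } else {
                ++it;
            }
        }
        released_ = true;
    }

    [[nodiscard]] bool contains(blob_id_type reference) const {
        return blobs_.count(reference) != 0;
    }

private:
    // std::logic_error stands in for the exception named in the header comments.
    void ensure_not_released() const {
        if (released_) {
            throw std::logic_error("BLOB pool already released");
        }
    }

    std::map<blob_id_type, std::string> blobs_{};
    std::set<blob_id_type> persistent_{};
    blob_id_type next_id_ = 1;
    bool released_ = false;
};
```

In this model, `release()` keeps only the BLOBs marked persistent, a second `release()` is a no-op, and any registration after release throws.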

## Addition of blob_file.h

**New items**

* class blob_file
  * A class for accessing BLOB data.
  * An instance of a BLOB file can be obtained with `datastore::get_blob_file(blob_id_type reference)`.
  * The path of the file that stores the BLOB can be obtained from the BLOB file; the BLOB data is accessed by reading that file.
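Since `blob_file` exposes only a path and an availability check, reading a BLOB amounts to opening that path. The sketch below assumes a hypothetical `blob_file_view` class that mimics `path()` and `explicit operator bool()` using `std::filesystem::path` in place of `boost::filesystem::path`; `read_blob()` is likewise illustrative. Per the `get_blob_file()` documentation, the real BLOB file is available only during the transaction that provided the reference.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for limestone::api::blob_file, using std::filesystem
// instead of boost::filesystem.
class blob_file_view {
public:
    blob_file_view() = default;  // an unavailable BLOB file
    explicit blob_file_view(std::filesystem::path path)
        : path_(std::move(path)), available_(true) {}

    [[nodiscard]] const std::filesystem::path& path() const noexcept { return path_; }
    [[nodiscard]] explicit operator bool() const noexcept { return available_; }

private:
    std::filesystem::path path_{};
    bool available_ = false;
};

// Reads the whole BLOB from the file the BLOB file points to.
// Returns std::nullopt for an unavailable BLOB file (e.g. an invalid reference).
std::optional<std::string> read_blob(const blob_file_view& file) {
    if (!file) {
        return std::nullopt;
    }
    std::ifstream in(file.path(), std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open BLOB file: " + file.path().string());
    }
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```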


## Changes to datastore.h

**Added methods**

* `std::unique_ptr<blob_pool> datastore::acquire_blob_pool()`
  * Method for acquiring a BLOB pool.

* `blob_file datastore::get_blob_file(blob_id_type reference)`
  * Method for obtaining a BLOB file.

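* `void datastore::switch_available_boundary_version(write_version_type version)`
  * Before Limestone can delete BLOB data, no one may still reference any version of an entry that holds a reference to that BLOB.
    This method is how CC notifies the datastore of the information needed to make this determination.
  * When GC deletes BLOB data, Limestone uses the value notified through this method to determine which data can be deleted.
    * GC during compaction uses the value notified through this method.
    * GC at startup unconditionally deletes BLOB data that is not referenced from the persisted data.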
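The deletion criterion can be sketched as a reachability check over persisted entries. The model is an assumption, not Limestone's algorithm: versions are plain integers, there are no deletion (tombstone) entries, and an entry counts as still needed if its version is greater than the boundary or it is the newest version of its key at or below the boundary (the version the oldest accessible snapshot would read). Treating the boundary as inclusive is also an assumption. `entry`, `reachable_blobs()`, `all_referenced_blobs()` and `deletable_blobs()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using blob_id_type = std::uint64_t;
// Assumption: versions are plain integers here; Limestone's write_version_type is richer.
using version_type = std::uint64_t;

// Hypothetical model of a persisted entry and the BLOB references it holds.
struct entry {
    std::string key;
    version_type version;
    std::vector<blob_id_type> blobs;
};

// Compaction-time rule (assumed): an entry is still needed if it is newer than the
// boundary, or if it is the newest version of its key at or below the boundary.
std::set<blob_id_type> reachable_blobs(const std::vector<entry>& entries, version_type boundary) {
    std::map<std::string, version_type> visible;  // key -> newest version <= boundary
    for (const auto& e : entries) {
        if (e.version <= boundary) {
            auto [it, inserted] = visible.emplace(e.key, e.version);
            if (!inserted && it->second < e.version) {
                it->second = e.version;
            }
        }
    }
    std::set<blob_id_type> reachable;
    for (const auto& e : entries) {
        // visible.at() is only evaluated when e.version <= boundary, so the key exists.
        const bool needed = e.version > boundary || visible.at(e.key) == e.version;
        if (needed) {
            reachable.insert(e.blobs.begin(), e.blobs.end());
        }
    }
    return reachable;
}

// Startup rule: every BLOB referenced by any persisted entry is kept.
std::set<blob_id_type> all_referenced_blobs(const std::vector<entry>& entries) {
    std::set<blob_id_type> referenced;
    for (const auto& e : entries) {
        referenced.insert(e.blobs.begin(), e.blobs.end());
    }
    return referenced;
}

// Stored BLOBs that are not in the keep set are candidates for deletion.
std::set<blob_id_type> deletable_blobs(const std::set<blob_id_type>& stored,
                                       const std::set<blob_id_type>& keep) {
    std::set<blob_id_type> result;
    for (const blob_id_type id : stored) {
        if (keep.count(id) == 0) {
            result.insert(id);
        }
    }
    return result;
}
```

For example, with boundary version 5, an entry for key `k1` at version 1 is shadowed by the version-3 entry, so a BLOB referenced only by the version-1 entry becomes deletable at compaction, whereas startup GC keeps it because a persisted entry still refers to it.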

## Changes to log_channel.h

**Signature change of the add_entry method**

**Before**
```cpp
void add_entry(
    storage_id_type storage_id,
    std::string_view key,
    std::string_view value,
    write_version_type write_version,
    const std::vector<large_object_input>& large_objects
);
```

**After**
```cpp
void add_entry(
    storage_id_type storage_id,
    std::string_view key,
    std::string_view value,
    write_version_type write_version,
    const std::vector<blob_id_type>& large_objects
);
```
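On the caller's side, the new signature means that `add_entry()` receives only BLOB references, which the caller obtains beforehand from a `blob_pool`; the BLOB bytes themselves no longer pass through the log channel. The sketch below uses a hypothetical `recording_log_channel` that merely records calls, with `storage_id_type` and `write_version_type` simplified to integers as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-ins for the Limestone type names (assumptions for this sketch).
using storage_id_type = std::uint64_t;
using write_version_type = std::uint64_t;  // simplified to an integer here
using blob_id_type = std::uint64_t;

struct logged_entry {
    storage_id_type storage_id;
    std::string key;
    std::string value;
    write_version_type write_version;
    std::vector<blob_id_type> blob_refs;
};

// Hypothetical log channel that only records calls made with the new signature.
class recording_log_channel {
public:
    void add_entry(storage_id_type storage_id, std::string_view key, std::string_view value,
                   write_version_type write_version,
                   const std::vector<blob_id_type>& large_objects) {
        // Only the references are logged; the BLOB data was already registered in a pool.
        entries_.push_back({storage_id, std::string{key}, std::string{value},
                            write_version, large_objects});
    }

    [[nodiscard]] const std::vector<logged_entry>& entries() const noexcept { return entries_; }

private:
    std::vector<logged_entry> entries_{};
};
```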


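## Removal of large_object_input.h and large_object_view.h

Due to a change in the approach to BLOB support, the following classes have been removed, and their header files have been deleted accordingly.

* large_object_input class
* large_object_view class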


## Changes to cursor.h

Due to a change in the approach to BLOB support, the following method has been removed.

* `std::vector<large_object_view>& large_objects();`
@@ -1,5 +1,5 @@
/*
* Copyright 2022-2023 Project Tsurugi.
* Copyright 2022-2025 Project Tsurugi.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -15,16 +15,26 @@
*/
#pragma once

#include <istream>

namespace limestone::api {

class large_object_view {
/**
* @brief represents a BLOB file that can provide persistent BLOB data.
*/
class blob_file {
public:

std::size_t size();
/**
* @brief retrieves the path to the BLOB file.
* @returns BLOB file path
*/
[[nodiscard]] boost::filesystem::path const& path() const noexcept;

std::iostream open();
/**
* @brief returns whether this BLOB file is available.
* @return true if this is available
* @return false otherwise
*/
[[nodiscard]] explicit operator bool() const noexcept;
};

} // namespace limestone::api
} // namespace limestone::api
88 changes: 88 additions & 0 deletions include/limestone/api/blob_pool.h
@@ -0,0 +1,88 @@
/*
* Copyright 2022-2025 Project Tsurugi.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <boost/filesystem.hpp>

namespace limestone::api {

/// @brief BLOB reference type.
using blob_id_type = std::uint64_t;

/**
* @brief represents a pool for provisional registration of BLOB data.
*/
class blob_pool {
public:

/**
* @brief creates a new object.
*/
blob_pool() = default;

/**
* @brief destroys this object.
*/
virtual ~blob_pool() = default;

blob_pool(blob_pool const&) = delete;
blob_pool(blob_pool&&) = delete;
blob_pool& operator=(blob_pool const&) = delete;
blob_pool& operator=(blob_pool&&) = delete;

/**
* @brief Discards all BLOB data provisionally registered in this pool, except for data that has already been persisted.
* @note After this operation, this pool will be unusable.
* @note This operation is idempotent.
* @attention Undefined behavior if attempting to access the data of non-persistent BLOBs in this pool after this operation.
* When the BLOB data is actually removed depends on the implementation.
*/
virtual void release() = 0;

/**
* @brief registers a BLOB file provisionally into this BLOB pool.
* @param is_temporary_file true to allow removing the source file, or false to copy the source file
* @return the corresponding BLOB reference
* @attention This only acts as a provisional registration of the BLOB, which may be lost after release() is called.
* To avoid this, pass the BLOB references to log_channel::add_entry() to persist them.
* @throws std::invalid_state if this pool is already released
*/
[[nodiscard]] virtual blob_id_type register_file(
boost::filesystem::path const& file,
bool is_temporary_file) = 0;

/**
* @brief registers BLOB data provisionally into this BLOB pool.
* @param data the target BLOB data
* @return the corresponding BLOB reference
* @attention This only acts as a provisional registration of the BLOB, which may be lost after release() is called.
* To avoid this, pass the BLOB references to log_channel::add_entry() to persist them.
* @throws std::invalid_state if this pool is already released
*/
[[nodiscard]] virtual blob_id_type register_data(std::string_view data) = 0;

/**
* @brief duplicates the registered BLOB data, and registers the copy provisionally into this BLOB pool.
* @param reference the source BLOB reference
* @return the corresponding BLOB reference of the duplicated one
* @attention This only acts as a provisional registration of the BLOB, which may be lost after release() is called.
* To avoid this, pass the BLOB references to log_channel::add_entry() to persist them.
* @throws std::invalid_state if this pool is already released
*/
[[nodiscard]] virtual blob_id_type duplicate_data(blob_id_type reference) = 0;
};

} // namespace limestone::api
9 changes: 0 additions & 9 deletions include/limestone/api/cursor.h
@@ -23,7 +23,6 @@
#include <boost/filesystem/fstream.hpp>

#include <limestone/api/storage_id_type.h>
#include <limestone/api/large_object_view.h>


namespace limestone::internal {
@@ -80,17 +79,9 @@ class cursor {
*/
void value(std::string& buf) const noexcept;

/**
* @brief returns a list of large objects associated with the entry at the current cursor position
* @return a list of large objects associated with the current entry
*/
std::vector<large_object_view>& large_objects() noexcept;

private:
std::unique_ptr<internal::cursor_impl> pimpl;

std::vector<large_object_view> large_objects_{};

explicit cursor(const boost::filesystem::path& snapshot_file);
explicit cursor(const boost::filesystem::path& snapshot_file, const boost::filesystem::path& compacted_file);

35 changes: 35 additions & 0 deletions include/limestone/api/datastore.h
@@ -28,6 +28,8 @@
#include <boost/filesystem.hpp>

#include <limestone/status.h>
#include <limestone/api/blob_pool.h>
#include <limestone/api/blob_file.h>
#include <limestone/api/backup.h>
#include <limestone/api/backup_detail.h>
#include <limestone/api/log_channel.h>
@@ -245,6 +247,39 @@ class datastore {
*/
void compact_with_online();

/**
* @brief acquires a new empty BLOB pool.
* @details This pool is used for temporary registration of BLOBs,
* and all BLOBs that are not fully registered will become unavailable when the pool is destroyed.
* @return the created BLOB pool
* @see blob_pool::release()
* @attention the returned BLOB pool must be released by calling blob_pool::release() after use; otherwise, BLOB data may leak.
* @attention Undefined behavior if the pool is used after this datastore is destroyed.
*/
[[nodiscard]] std::unique_ptr<blob_pool> acquire_blob_pool();

/**
* @brief returns BLOB file for the BLOB reference.
* @param reference the target BLOB reference
* @return the corresponding BLOB file
* @return unavailable BLOB file if the ID is not valid
* @attention the returned BLOB file is only available
* during the transaction that provided the corresponding BLOB reference.
*/
[[nodiscard]] blob_file get_blob_file(blob_id_type reference);


/**
* @brief changes the available boundary version at which entries may still be read.
* @details This version comprises the oldest accessible snapshot, that is,
* the datastore may delete anything older than the version included in this snapshot.
* @param version the target boundary version
* @attention this function should be called after ready() is called.
* @see switch_safe_snapshot()
* @note the specified version must be smaller than or equal to the version passed to switch_safe_snapshot().
*/
void switch_available_boundary_version(write_version_type version);

protected: // for tests
auto& log_channels_for_tests() const noexcept { return log_channels_; }
auto epoch_id_informed_for_tests() const noexcept { return epoch_id_informed_.load(); }
42 changes: 0 additions & 42 deletions include/limestone/api/large_object_input.h

This file was deleted.

4 changes: 2 additions & 2 deletions include/limestone/api/log_channel.h
@@ -26,9 +26,9 @@
#include <boost/filesystem.hpp>

#include <limestone/status.h>
#include <limestone/api/blob_pool.h>
#include <limestone/api/storage_id_type.h>
#include <limestone/api/write_version_type.h>
#include <limestone/api/large_object_input.h>

namespace limestone::api {

@@ -101,7 +101,7 @@ class log_channel {
* Therefore, callers of this API must handle the exception properly as per the original design.
* @attention this function is not thread-safe.
*/
void add_entry(storage_id_type storage_id, std::string_view key, std::string_view value, write_version_type write_version, const std::vector<large_object_input>& large_objects);
void add_entry(storage_id_type storage_id, std::string_view key, std::string_view value, write_version_type write_version, const std::vector<blob_id_type>& large_objects);

/**
* @brief add an entry indicating the deletion of entries
4 changes: 0 additions & 4 deletions src/limestone/cursor.cpp
@@ -58,8 +58,4 @@ void cursor::value(std::string& buf) const noexcept {
pimpl->value(buf);
}

std::vector<large_object_view>& cursor::large_objects() noexcept {
return large_objects_;
}

} // namespace limestone::api
2 changes: 1 addition & 1 deletion src/limestone/log_channel.cpp
@@ -126,7 +126,7 @@ void log_channel::add_entry(storage_id_type storage_id, std::string_view key, st
TRACE_END;
}

void log_channel::add_entry([[maybe_unused]] storage_id_type storage_id, [[maybe_unused]] std::string_view key, [[maybe_unused]] std::string_view value, [[maybe_unused]] write_version_type write_version, [[maybe_unused]] const std::vector<large_object_input>& large_objects) {
void log_channel::add_entry([[maybe_unused]] storage_id_type storage_id, [[maybe_unused]] std::string_view key, [[maybe_unused]] std::string_view value, [[maybe_unused]] write_version_type write_version, [[maybe_unused]] const std::vector<blob_id_type>& large_objects) {
LOG_AND_THROW_EXCEPTION("not implemented");// FIXME
};
