
Gradient based sampling for external memory mode on GPU #5093

Merged Feb 4, 2020 — 70 commits.

Changes shown below are from 13 commits.

Commits (70)
51b50da
add skeleton gradient-based sampler
rongou Nov 20, 2019
ebd67c1
change gpu_hist to use sampling in external memory mode
rongou Nov 21, 2019
a2c446c
add failing tests
rongou Nov 21, 2019
9e91871
wip: poisson sampling
rongou Nov 21, 2019
ed322cc
sample and scale gradient pairs
rongou Nov 23, 2019
9377014
calculate max number of sample rows
rongou Dec 3, 2019
f26415c
add sampler constructor
rongou Dec 4, 2019
f2dd928
collect all pages in memory if they fit
rongou Dec 4, 2019
2e5494c
optimize finalize position
rongou Dec 5, 2019
7fca606
done with sampling
rongou Dec 6, 2019
f661df0
add some docs
rongou Dec 6, 2019
6513f8a
Merge branch 'master' into gradient-based-sampler
rongou Dec 6, 2019
14361af
formatting
rongou Dec 6, 2019
70276f1
explicit constructor
rongou Dec 6, 2019
d7770b4
no need for gmock
rongou Dec 6, 2019
6a29c38
test ellpackpage copy and compact
rongou Dec 6, 2019
967ff16
use subsample to control gradient sampler
rongou Dec 6, 2019
494b179
Merge branch 'master' into gradient-based-sampler
rongou Dec 9, 2019
ecd2419
Merge branch 'master' into gradient-based-sampler
rongou Dec 10, 2019
73ba5af
implement sequential poisson sampling
rongou Dec 10, 2019
d2f2f69
fix compact bug
rongou Dec 11, 2019
827c988
fix cpp test
rongou Dec 11, 2019
9350ec5
Merge branch 'master' into gradient-based-sampler
rongou Dec 11, 2019
4482181
Merge branch 'master' into gradient-based-sampler
rongou Dec 11, 2019
857c9c7
finally working
rongou Dec 12, 2019
7de1620
add uniform sampling
rongou Dec 13, 2019
d3a3dbf
better estimate of sample rows
rongou Dec 14, 2019
780f5e6
Merge branch 'master' into gradient-based-sampler
rongou Dec 16, 2019
f728623
more agressive memory allocation
rongou Dec 16, 2019
c206880
add some documentation
rongou Dec 16, 2019
5652b3b
use mvs
rongou Dec 20, 2019
8ba43fb
Merge branch 'master' into gradient-based-sampler
rongou Dec 20, 2019
a5b57b1
fix windows
rongou Dec 20, 2019
62a9ead
address review comments
rongou Dec 20, 2019
ca64413
Merge branch 'master' into gradient-based-sampler
rongou Jan 7, 2020
2409ecb
add sampling method param
rongou Jan 8, 2020
3b16e66
gradient-based sampling in in-memory mode
rongou Jan 9, 2020
0745686
Merge branch 'master' into gradient-based-sampler
rongou Jan 9, 2020
e41496b
fix clang tidy warning
rongou Jan 9, 2020
fd458c6
add tests for in-core
rongou Jan 10, 2020
6e3a7fa
remove unused code
rongou Jan 10, 2020
cb608a5
Merge branch 'master' into gradient-based-sampler
rongou Jan 11, 2020
4daffbf
relax tests
rongou Jan 11, 2020
5dd01e7
Merge branch 'master' into gradient-based-sampler
rongou Jan 13, 2020
35ac5a7
Merge branch 'master' into gradient-based-sampler
rongou Jan 14, 2020
9aaa9ba
add test to verify sampling
rongou Jan 16, 2020
83fb791
Merge branch 'master' into gradient-based-sampler
rongou Jan 16, 2020
0eccff9
Merge branch 'master' into gradient-based-sampler
rongou Jan 16, 2020
10fce05
inverse probability weighting estimation
rongou Jan 18, 2020
55bbe74
combine weight calculation and gpair scaling
rongou Jan 21, 2020
85d2ec9
Merge branch 'master' into gradient-based-sampler
rongou Jan 21, 2020
be163b7
fix tests
rongou Jan 21, 2020
3a0fd99
review feedback
rongou Jan 21, 2020
c9eb5c9
Merge branch 'master' into gradient-based-sampler
rongou Jan 22, 2020
9680a69
calculate threshold
rongou Jan 23, 2020
c93f20d
tweak test
rongou Jan 23, 2020
d684148
Merge branch 'master' into gradient-based-sampler
rongou Jan 23, 2020
09864ed
more accurate threshold
rongou Jan 24, 2020
7675a9e
Merge branch 'master' into gradient-based-sampler
rongou Jan 24, 2020
f8b7dbf
tweak test tolerance
rongou Jan 24, 2020
3aaae89
wip: refactor the code to disintangle sampling methods
rongou Jan 24, 2020
a83f13e
Merge branch 'master' into gradient-based-sampler
rongou Jan 27, 2020
5cbca76
done with refactoring
rongou Jan 27, 2020
7cf9110
fix tests
rongou Jan 28, 2020
71b21c6
Merge branch 'master' into gradient-based-sampler
rongou Jan 28, 2020
3a734a9
release device memory
rongou Jan 28, 2020
55b36f2
remove scaling in uniform sampling
rongou Jan 28, 2020
97ae33d
Merge branch 'master' into gradient-based-sampler
rongou Jan 28, 2020
df96394
Merge branch 'master' into gradient-based-sampler
rongou Jan 31, 2020
7fd7c31
revert rabit
rongou Jan 31, 2020
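The commit log references minimal variance sampling ("use mvs"), threshold calculation, and inverse probability weighting. A rough CPU sketch of the idea (not the PR's actual implementation — the helper names, the regularization constant `lambda`, and passing the threshold in directly are illustrative assumptions):

```cpp
#include <algorithm>
#include <cmath>

// Combined gradient magnitude used for MVS-style sampling; lambda is a
// regularization constant (assumed form: sqrt(g^2 + lambda * h^2)).
double CombinedGradient(double grad, double hess, double lambda) {
  return std::sqrt(grad * grad + lambda * hess * hess);
}

// Keep probability for a row given a threshold: rows with large combined
// gradients are always kept (p = 1); the rest are kept proportionally.
double SamplingProbability(double combined, double threshold) {
  return std::min(1.0, combined / threshold);
}

// Gradients of kept rows are scaled by 1/p so the expected gradient sum
// over the sample is unbiased (inverse probability weighting).
double ReweightedGradient(double grad, double probability) {
  return grad / probability;
}
```

Judging from the "calculate threshold" and "use subsample to control gradient sampler" commits, the threshold in the PR is derived from the gradient distribution so that the expected sample size matches the `subsample` parameter.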
18 changes: 18 additions & 0 deletions include/xgboost/base.h
@@ -193,6 +193,24 @@ class GradientPairInternal {
return g;
}

XGBOOST_DEVICE GradientPairInternal<T> operator*(float multiplier) const {
GradientPairInternal<T> g;
g.grad_ = grad_ * multiplier;
g.hess_ = hess_ * multiplier;
return g;
}

XGBOOST_DEVICE GradientPairInternal<T> operator/(float divider) const {
GradientPairInternal<T> g;
g.grad_ = grad_ / divider;
[Member] Be careful for 0 division.

[Contributor, author] Turns out we don't really need these. Removed.

g.hess_ = hess_ / divider;
return g;
}

XGBOOST_DEVICE bool operator==(const GradientPairInternal<T> &rhs) const {
return grad_ == rhs.grad_ && hess_ == rhs.hess_;
}

XGBOOST_DEVICE explicit GradientPairInternal(int value) {
*this = GradientPairInternal<T>(static_cast<float>(value),
static_cast<float>(value));
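The `operator*` added above lets a gradient pair's grad and hess be scaled together, e.g. by the inverse of a row's sampling probability. A minimal sketch with an assumed stand-in struct (the review thread notes the division operator was later removed, so only multiplication is modeled):

```cpp
// Minimal stand-in for GradientPairInternal<float>; assumption: only the
// grad/hess fields and the scaling operator from the diff are modeled.
struct GradientPair {
  float grad;
  float hess;
  GradientPair operator*(float multiplier) const {
    return {grad * multiplier, hess * multiplier};
  }
};

// Scale a sampled pair by 1/p so the expected gradient sum over the
// sample matches the full data set.
GradientPair InverseProbabilityScale(const GradientPair& pair, float probability) {
  return pair * (1.0f / probability);
}
```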
25 changes: 22 additions & 3 deletions src/common/compressed_iterator.h
@@ -63,7 +63,7 @@ class CompressedBufferWriter {
* \fn static size_t CompressedBufferWriter::CalculateBufferSize(int
* num_elements, int num_symbols)
*
- * \brief Calculates number of bytes requiredm for a given number of elements
+ * \brief Calculates number of bytes required for a given number of elements
* and a symbol range.
*
* \author Rory
@@ -74,7 +74,6 @@ class CompressedBufferWriter {
*
* \return The calculated buffer size.
*/

static size_t CalculateBufferSize(size_t num_elements, size_t num_symbols) {
const int bits_per_byte = 8;
size_t compressed_size = static_cast<size_t>(std::ceil(
@@ -83,6 +82,26 @@
return compressed_size + detail::kPadding;
}

/**
* \brief Calculates maximum number of rows that can fit in a given number of bytes.
* \param num_bytes Number of bytes.
* \param num_symbols Max number of symbols (alphabet size).
* \param row_stride Number of features per row.
* \param extra_bytes_per_row Extra number of bytes needed per row.
* \return The calculated number of rows.
*/
static size_t CalculateMaxRows(size_t num_bytes,
size_t num_symbols,
size_t row_stride,
size_t extra_bytes_per_row) {
const int bits_per_byte = 8;
size_t usable_bits = (num_bytes - detail::kPadding) * bits_per_byte;
size_t extra_bits = extra_bytes_per_row * bits_per_byte;
size_t symbol_bits = row_stride * detail::SymbolBits(num_symbols);
size_t num_rows = static_cast<size_t>(std::floor(usable_bits / (extra_bits + symbol_bits)));
return num_rows;
}

template <typename T>
void WriteSymbol(CompressedByteT *buffer, T symbol, size_t offset) {
const int bits_per_byte = 8;
@@ -188,7 +207,7 @@ class CompressedIterator {

public:
CompressedIterator() : buffer_(nullptr), symbol_bits_(0), offset_(0) {}
-  CompressedIterator(CompressedByteT *buffer, int num_symbols)
+  CompressedIterator(CompressedByteT *buffer, size_t num_symbols)
: buffer_(buffer), offset_(0) {
symbol_bits_ = detail::SymbolBits(num_symbols);
}
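The buffer-size arithmetic in `CalculateBufferSize`/`CalculateMaxRows` can be checked on the CPU with a standalone sketch. Assumptions: the `kPadding` value and `SymbolBits = ceil(log2(num_symbols))` mirror the `detail` helpers, which are not shown in this diff:

```cpp
#include <cmath>
#include <cstddef>

constexpr std::size_t kPadding = 5;  // assumption: stands in for detail::kPadding

// Bits needed to encode one symbol out of num_symbols (assumed to match
// detail::SymbolBits: ceil(log2(num_symbols))).
std::size_t SymbolBits(std::size_t num_symbols) {
  return static_cast<std::size_t>(std::ceil(std::log2(num_symbols)));
}

// Mirrors CalculateMaxRows from the diff: how many rows fit in num_bytes,
// given the per-row symbol bits plus any extra per-row bytes.
std::size_t CalculateMaxRows(std::size_t num_bytes, std::size_t num_symbols,
                             std::size_t row_stride,
                             std::size_t extra_bytes_per_row) {
  const std::size_t bits_per_byte = 8;
  std::size_t usable_bits = (num_bytes - kPadding) * bits_per_byte;
  std::size_t bits_per_row =
      extra_bytes_per_row * bits_per_byte + row_stride * SymbolBits(num_symbols);
  return usable_bits / bits_per_row;  // integer division already floors
}
```

For example, with 256 symbols (8 bits each) and a row stride of 10, each row costs 80 bits, so 1000 usable bytes hold 100 rows.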
94 changes: 87 additions & 7 deletions src/data/ellpack_page.cu
@@ -64,6 +64,20 @@ __global__ void CompressBinEllpackKernel(
wr.AtomicWriteSymbol(buffer, bin, (irow + base_row) * row_stride + ifeature);
}

// Construct an ELLPACK matrix with the given number of empty rows.
EllpackPageImpl::EllpackPageImpl(int device, EllpackInfo info, size_t n_rows) {
monitor_.Init("ellpack_page");
dh::safe_cuda(cudaSetDevice(device));

matrix.info = info;
matrix.base_rowid = 0;
matrix.n_rows = n_rows;

monitor_.StartCuda("InitCompressedData");
InitCompressedData(device, n_rows);
monitor_.StopCuda("InitCompressedData");
}

// Construct an ELLPACK matrix in memory.
EllpackPageImpl::EllpackPageImpl(DMatrix* dmat, const BatchParam& param) {
monitor_.Init("ellpack_page");
@@ -96,6 +110,76 @@
monitor_.StopCuda("BinningCompression");
}

// A functor that copies the data from one EllpackPage to another.
struct CopyPageFunction {
common::CompressedBufferWriter cbw;
common::CompressedByteT* dst_data_d;
common::CompressedIterator<uint32_t> src_iterator_d;
// The number of elements to skip.
size_t offset;

CopyPageFunction(EllpackPageImpl* dst, EllpackPageImpl* src, size_t offset)
: cbw{dst->matrix.info.NumSymbols()},
dst_data_d{dst->gidx_buffer.data()},
src_iterator_d{src->gidx_buffer.data(), src->matrix.info.NumSymbols()},
offset(offset) {}

__device__ void operator()(size_t i) {
cbw.AtomicWriteSymbol(dst_data_d, src_iterator_d[i], i + offset);
}
};

// Copy the data from the given EllpackPage to the current page.
size_t EllpackPageImpl::Copy(int device, EllpackPageImpl* page, size_t offset) {
monitor_.StartCuda("Copy");
size_t num_elements = page->matrix.n_rows * page->matrix.info.row_stride;
CHECK_EQ(matrix.info.row_stride, page->matrix.info.row_stride);
CHECK_EQ(matrix.info.NumSymbols(), page->matrix.info.NumSymbols());
CHECK_GE(matrix.n_rows * matrix.info.row_stride, offset + num_elements);
dh::LaunchN(device, num_elements, CopyPageFunction(this, page, offset));
monitor_.StopCuda("Copy");
return num_elements;
}

// A functor that compacts the rows from one EllpackPage into another.
struct CompactPageFunction {
common::CompressedBufferWriter cbw;
common::CompressedByteT* dst_data_d;
common::CompressedIterator<uint32_t> src_iterator_d;
common::Span<size_t> row_indexes;
size_t base_rowid;
size_t row_stride;

CompactPageFunction(EllpackPageImpl* dst, EllpackPageImpl* src, common::Span<size_t> row_indexes)
: cbw{dst->matrix.info.NumSymbols()},
dst_data_d{dst->gidx_buffer.data()},
src_iterator_d{src->gidx_buffer.data(), src->matrix.info.NumSymbols()},
row_indexes(row_indexes),
base_rowid{src->matrix.base_rowid},
row_stride{src->matrix.info.row_stride} {}

__device__ void operator()(size_t i) {
size_t row = base_rowid + i;
size_t row_index = row_indexes[row];
if (row_index == SIZE_MAX) return;
size_t dst_offset = row_index * row_stride;
size_t src_offset = i * row_stride;
for (size_t j = 0; j < row_stride; j++) {
cbw.AtomicWriteSymbol(dst_data_d, src_iterator_d[src_offset + j], dst_offset + j);
}
}
};

// Compacts the data from the given EllpackPage into the current page.
void EllpackPageImpl::Compact(int device, EllpackPageImpl* page, common::Span<size_t> row_indexes) {
[RAMitchell marked this conversation as resolved.]
monitor_.StartCuda("Compact");
CHECK_EQ(matrix.info.row_stride, page->matrix.info.row_stride);
CHECK_EQ(matrix.info.NumSymbols(), page->matrix.info.NumSymbols());
CHECK_LE(page->matrix.base_rowid + page->matrix.n_rows, row_indexes.size());
dh::LaunchN(device, page->matrix.n_rows, CompactPageFunction(this, page, row_indexes));
monitor_.StopCuda("Compact");
}

// Construct an EllpackInfo based on histogram cuts of features.
EllpackInfo::EllpackInfo(int device,
bool is_dense,
@@ -123,7 +207,7 @@ void EllpackPageImpl::InitInfo(int device,

// Initialize the buffer to stored compressed features.
void EllpackPageImpl::InitCompressedData(int device, size_t num_rows) {
-  size_t num_symbols = matrix.info.n_bins + 1;
+  size_t num_symbols = matrix.info.NumSymbols();

// Required buffer size for storing data matrix in ELLPack format.
size_t compressed_size_bytes = common::CompressedBufferWriter::CalculateBufferSize(
@@ -149,7 +233,6 @@ void EllpackPageImpl::CreateHistIndices(int device,

const auto& offset_vec = row_batch.offset.ConstHostVector();

-  int num_symbols = matrix.info.n_bins + 1;
// bin and compress entries in batches of rows
size_t gpu_batch_nrows = std::min(
dh::TotalMemory(device) / (16 * row_stride * sizeof(Entry)),
@@ -193,7 +276,7 @@
1);
dh::LaunchKernel {grid3, block3} (
CompressBinEllpackKernel,
-      common::CompressedBufferWriter(num_symbols),
+      common::CompressedBufferWriter(matrix.info.NumSymbols()),
gidx_buffer.data(),
row_ptrs.data().get(),
entries_d.data().get(),
@@ -254,11 +337,9 @@ void EllpackPageImpl::CompressSparsePage(int device) {

// Return the memory cost for storing the compressed features.
size_t EllpackPageImpl::MemCostBytes() const {
-  size_t num_symbols = matrix.info.n_bins + 1;

// Required buffer size for storing data matrix in ELLPack format.
size_t compressed_size_bytes = common::CompressedBufferWriter::CalculateBufferSize(
-      matrix.info.row_stride * matrix.n_rows, num_symbols);
+      matrix.info.row_stride * matrix.n_rows, matrix.info.NumSymbols());
return compressed_size_bytes;
}

@@ -280,5 +361,4 @@ void EllpackPageImpl::InitDevice(int device, EllpackInfo info) {

device_initialized_ = true;
}

} // namespace xgboost
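The `Copy`/`Compact` kernels above operate on the compressed ELLPACK buffer. Their row-mapping logic can be illustrated with a plain CPU version — a sketch under the assumption of a dense row-major `int` buffer instead of the bit-packed one, with `SIZE_MAX` marking rows that were sampled out:

```cpp
#include <cstdint>
#include <vector>

// CPU sketch of EllpackPageImpl::Compact's row mapping: row i of src is
// written to row row_indexes[i] of dst, or skipped when marked SIZE_MAX.
std::vector<int> CompactRows(const std::vector<int>& src,
                             const std::vector<std::size_t>& row_indexes,
                             std::size_t row_stride,
                             std::size_t num_dst_rows) {
  std::vector<int> dst(num_dst_rows * row_stride, 0);
  for (std::size_t i = 0; i < row_indexes.size(); ++i) {
    std::size_t row_index = row_indexes[i];
    if (row_index == SIZE_MAX) continue;  // row was sampled out
    for (std::size_t j = 0; j < row_stride; ++j) {
      dst[row_index * row_stride + j] = src[i * row_stride + j];
    }
  }
  return dst;
}
```

In the CUDA version each source element is handled by a thread rather than a loop over rows, but the index computation is the same.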
29 changes: 29 additions & 0 deletions src/data/ellpack_page.cuh
@@ -71,6 +71,10 @@ struct EllpackInfo {
size_t row_stride,
const common::HistogramCuts& hmat,
dh::BulkAllocator* ba);

inline size_t NumSymbols() const {
[Member] No need for inline.

[Contributor, author] Done.

return n_bins + 1;
}
};

/** \brief Struct for accessing and manipulating an ellpack matrix on the
@@ -200,6 +204,14 @@ class EllpackPageImpl {
*/
EllpackPageImpl() = default;

/*!
* \brief Constructor from an existing EllpackInfo.
*
* This is used in the sampling case. The ELLPACK page is constructed from an existing EllpackInfo
* and the given number of rows.
*/
explicit EllpackPageImpl(int device, EllpackInfo info, size_t n_rows);

/*!
* \brief Constructor from an existing DMatrix.
*
@@ -208,6 +220,23 @@ class EllpackPageImpl {
*/
explicit EllpackPageImpl(DMatrix* dmat, const BatchParam& parm);

/*! \brief Copy the elements of the given ELLPACK page into this page.
*
* @param device The GPU device to use.
* @param page The ELLPACK page to copy from.
* @param offset The number of elements to skip before copying.
* @returns The number of elements copied.
*/
size_t Copy(int device, EllpackPageImpl* page, size_t offset);

/*! \brief Compact the given ELLPACK page into the current page.
*
* @param device The GPU device to use.
* @param page The ELLPACK page to compact from.
* @param row_indexes Row indexes for the compacted page.
*/
void Compact(int device, EllpackPageImpl* page, common::Span<size_t> row_indexes);

/*!
* \brief Initialize the EllpackInfo contained in the EllpackMatrix.
*