[RFC] Possible DMatrix refactor #4354
Comments
I would suggest implementing lazy initialization, even if we were to provide an explicit interface for specifying the type of DMatrix, something like:

```cpp
std::vector<DMatrixType> types;
for (auto const& up : updaters) {
  types.emplace_back(up->PreferredFormat());
}
dmat->BuildData(types);
```

In an updater, say:

```cpp
DMatrixType PreferredFormat() const { return DMatrixType::Quantile; }
```

And lastly in …
I'll echo this, as I am facing several scenarios that really need low-latency prediction. Making DMatrix a wrapper, instead of actually allocating memory to store the data, would reduce the overhead involved in the whole code path.
Isn't it very hard to eliminate virtual functions in this case, whether in the iterator layer or the lower memory-format layer? Do you have other suggestions?
I would vote for stopping training in this case; silently doing something for the user (even with a warning that no one will read) seems risky to me.
Thanks for your feedback. I think the action on this is to produce a prototype demonstrating DMatrix as a thin wrapper over an existing data structure, and possibly to explore the idea of lazy DMatrix construction; then we will reevaluate. It may be some time before I get to this.
Hi @hcho3, could you post some references for feature grouping when time allows? Like:
@trivialfis Feature grouping is a heuristic to cut down the number of feature columns in a sparse, high-dimensional dataset. It reduces memory cost and improves performance on multi-core CPUs (by improving work balance among worker threads). Addressing your questions:
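The heuristic can be sketched roughly as follows. This is illustrative code, not xgboost's actual implementation, and all names are hypothetical: greedily assign each column to the first group it does not conflict with, where two columns conflict when they are both non-zero in the same row.

```cpp
// Illustrative sketch of greedy feature grouping (hypothetical code,
// not xgboost's implementation): bundle columns whose non-zero rows
// never overlap, so several sparse columns share one physical column.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

using Column = std::vector<std::size_t>;  // sorted row indices of non-zeros

// Two columns conflict if some row has a non-zero in both.
bool Conflicts(const Column& a, const Column& b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i] == b[j]) return true;
    if (a[i] < b[j]) ++i; else ++j;
  }
  return false;
}

// Returns the group index assigned to each feature column.
std::vector<std::size_t> GroupFeatures(const std::vector<Column>& columns) {
  std::vector<std::size_t> assignment(columns.size());
  std::vector<Column> group_rows;  // union of non-zero rows per group
  for (std::size_t f = 0; f < columns.size(); ++f) {
    std::size_t g = 0;
    while (g < group_rows.size() && Conflicts(group_rows[g], columns[f])) ++g;
    if (g == group_rows.size()) group_rows.emplace_back();  // open a new group
    // Merge this column's rows into the group's row set.
    Column merged;
    std::merge(group_rows[g].begin(), group_rows[g].end(),
               columns[f].begin(), columns[f].end(),
               std::back_inserter(merged));
    group_rows[g] = std::move(merged);
    assignment[f] = g;
  }
  return assignment;
}
```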
Absolutely. Thanks for the information.
Just a note on ongoing work for refactoring: we currently have a flow of data that looks like this: (a) external data → CSR DMatrix → internal representation. It has been proposed that we instead do this: (b) external data → internal representation directly, to save memory in the intermediate step. A consequence of this is that we must implement combinatorially many constructors, one for every possible pair of external data format and internal representation, and there are a lot of combinations. In the first example the CSR DMatrix works like an intermediate format, where we only need one constructor for each external data format and each internal representation. This doesn't make it impossible to implement example (b), but it is something to consider. Maybe it's possible to use iterators in the middle step to generalise data copying without extra storage? (See the sketch below.)
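One way the iterator idea could look, as a minimal sketch; all names here are illustrative and not actual xgboost API. Each external format supplies one adapter producing a stream of entries, and each internal representation has one constructor consuming any such stream, so M external formats and N internal formats need M + N pieces instead of M × N constructors:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

struct EntryT {           // one non-missing matrix element
  std::size_t row;
  std::size_t column;
  float value;
};

// Generic element stream: writes the next entry and returns false at end.
using EntryStream = std::function<bool(EntryT*)>;

// One adapter per external format, e.g. a dense row-major buffer.
EntryStream DenseAdapter(const float* data, std::size_t rows, std::size_t cols) {
  auto i = std::make_shared<std::size_t>(0);
  return [=](EntryT* out) {
    if (*i >= rows * cols) return false;
    out->row = *i / cols;
    out->column = *i % cols;
    out->value = data[*i];
    ++*i;
    return true;
  };
}

// One constructor per internal representation, e.g. CSR; no intermediate
// copy of the whole dataset is materialised.
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;
  std::vector<std::size_t> col_idx;
  std::vector<float> values;

  CsrMatrix(EntryStream next, std::size_t rows) : row_ptr(rows + 1, 0) {
    EntryT e;
    while (next(&e)) {    // assumes entries arrive in row order
      ++row_ptr[e.row + 1];
      col_idx.push_back(e.column);
      values.push_back(e.value);
    }
    for (std::size_t r = 0; r < rows; ++r) row_ptr[r + 1] += row_ptr[r];
  }
};
```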
Another thing I'm worried about is changing the updater during an experiment.
My goal is to make external memory work for GPUs. I think we likely need something along the lines of https://arxiv.org/abs/1901.09047, but tackling it directly may be too risky/disruptive to the existing code base, thus the need for refactoring. Possible next steps, in no particular order:
@rongou Thanks for the link. The paper looks quite interesting; let me read it.
Just to add a thing to the list: is it possible we turn the field …
@trivialfis looks like the …
Groups are used by ranking; qid seems to be intermediate storage that is used to construct the groups.
Added here as a convenient reference: we need to consider removing the strong reference between booster and DMatrix, see #4482.
Probably this line (and its effects): https://github.com/dmlc/xgboost/blob/master/src/learner.cc#L646
Hmm, sorry, currently I can't look into that. Putting it here as a reminder.
See #5044 for a proof of concept on refactoring DMatrix construction from external data.
@RAMitchell Do you have a roadmap or checklists for this?
As we add different types of algorithms to xgboost and optimise their performance, the DMatrix class may also need to evolve in order to improve performance and memory usage for these particular cases.
Current DMatrix
The current DMatrix class has two subclasses. The first is the primary in-memory representation, where the entire dataset is ingested and converted into a CSR format with values stored as 32-bit floats. The entire dataset may also be transposed in memory into a column format for exact-style algorithms.
The second is the external-memory representation, which constructs a number of binary page files (32 MB in size by default) that are streamed from disk. If a column format is requested, new transposed pages are generated and these are also streamed from disk.
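A rough sketch of the page-streaming idea (hypothetical names, not the actual implementation):

```cpp
// Stream fixed-size binary pages from a cache file on disk, one at a
// time, so only a single page need be resident in memory.
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

constexpr std::size_t kPageBytes = 32 << 20;  // 32 MB default page size

class PageStream {
 public:
  explicit PageStream(const std::string& path) : in_(path, std::ios::binary) {}

  // Reads the next page into `page`; returns false at end of file.
  bool Next(std::vector<char>* page) {
    page->resize(kPageBytes);
    in_.read(page->data(), kPageBytes);
    page->resize(static_cast<std::size_t>(in_.gcount()));
    return !page->empty();
  }

 private:
  std::ifstream in_;
};
```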
The end user does not directly choose the type of DMatrix, and these subclasses are not exposed via external APIs. The type of underlying DMatrix is selected automatically according to whether the user supplies a cache prefix in the constructor.
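For example, with the C API of the time, the selection looks roughly like this (error-code checks omitted); the `#` cache-prefix suffix in the file name is what switches to the external-memory implementation:

```cpp
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dmat;

  // No cache prefix: the in-memory DMatrix is used.
  XGDMatrixCreateFromFile("train.libsvm", /*silent=*/0, &dmat);
  XGDMatrixFree(dmat);

  // A '#<prefix>' suffix requests the external-memory DMatrix; binary
  // page files are written under the given cache prefix.
  XGDMatrixCreateFromFile("train.libsvm#dtrain.cache", /*silent=*/0, &dmat);
  XGDMatrixFree(dmat);
  return 0;
}
```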
Currently a batch of data is stored inside the DMatrix as a 'SparsePage' class. This uses HostDeviceVector in order to expose the data to both the CPU and the GPU.
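A simplified, self-contained sketch of that layout; the placeholder HostDeviceVector here stands in for xgboost's real class of the same name, which mirrors the data on the GPU:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

template <typename T>
class HostDeviceVector {  // placeholder: the real class syncs a device copy
 public:
  std::vector<T>& HostVector() { return host_; }
 private:
  std::vector<T> host_;
};

struct Entry {
  uint32_t index;  // feature index of this non-missing element
  float fvalue;    // feature value, stored as a 32-bit float
};

class SparsePage {
 public:
  // offset[i] .. offset[i + 1] delimits row i's entries within data.
  HostDeviceVector<std::size_t> offset;
  HostDeviceVector<Entry> data;
};
```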
Current fast histogram algorithms
The current fast histogram algorithms (hist, gpu_hist) convert the DMatrix object on the fly inside their respective TreeUpdater implementations, not as a part of the DMatrix implementation, although I believe this was eventually intended to be integrated with DMatrix (#1673). 'hist' converts the data into a CSR matrix with integers instead of floats. 'gpu_hist' converts the data into an ELLPACK format with integers instead of floats, and additionally applies bitwise compression at run time to both the matrix elements and the indices, commonly resulting in 4-5x compression over the standard CSR matrix. In gpu_hist we avoid ever copying the training DMatrix to the GPU if prediction caching is used.
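The bitwise compression can be illustrated with a small bit-packing sketch (illustrative only; the real implementation lives in xgboost's compressed-iterator utilities): bin indices drawn from n_symbols possible values need only ⌈log2(n_symbols)⌉ bits each, so they can be packed contiguously instead of stored in 32-bit words.

```cpp
// Illustrative bit-packing over 64-bit words: each symbol occupies
// ceil(log2(n_symbols)) bits. Assumes n_symbols >= 2 and bits per
// symbol < 64. Not the actual xgboost implementation.
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

class PackedBins {
 public:
  PackedBins(std::size_t n_symbols, std::size_t n_entries)
      : bits_(static_cast<std::size_t>(
            std::ceil(std::log2(static_cast<double>(n_symbols))))),
        buffer_((n_entries * bits_ + 63) / 64, 0) {}

  void Set(std::size_t i, uint64_t symbol) {
    std::size_t bit = i * bits_;
    buffer_[bit / 64] |= symbol << (bit % 64);
    if (bit % 64 + bits_ > 64) {  // value straddles a word boundary
      buffer_[bit / 64 + 1] |= symbol >> (64 - bit % 64);
    }
  }

  uint64_t Get(std::size_t i) const {
    std::size_t bit = i * bits_;
    uint64_t v = buffer_[bit / 64] >> (bit % 64);
    if (bit % 64 + bits_ > 64) {
      v |= buffer_[bit / 64 + 1] << (64 - bit % 64);
    }
    return v & ((uint64_t{1} << bits_) - 1);
  }

 private:
  std::size_t bits_;              // bits per stored symbol
  std::vector<uint64_t> buffer_;  // packed storage
};
```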
Some desirable features for DMatrix
Here I will list some features that we would like to have. Not all of these will be practical under the same design.
Notes on possible implementation
I realise the above is somewhat rambling and does not propose a concrete solution, but I hope to start discussion on these ideas.
@tqchen @hcho3 @trivialfis @CodingCat