-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADIOS2 User Side KVCache to accelerate the remote query access. #4210
Changes from 9 commits
ce2d56b
1e6e129
d7f5e27
9812eef
6d1bf17
e07c30b
398a0fb
2afdca6
5790fcd
22267e9
8079eac
ced5680
5f5c249
a4f572a
9414a33
8e3de8c
b336025
fbf1cf2
7c93c8a
4218e96
738e47e
52da2e2
27e0aab
49d4000
ac40ef6
1b67484
4168d6c
4352dcd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
#!/bin/bash | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A future idea is that this could be a docker-compose file |
||
|
||
# This script is to build the remote server with cache enabled. | ||
# Attention: hiredis cannot be installed by apt-get, since the default version is too old (libhiredis0.14 (= 0.14.1-2)). | ||
# We need to build it from source code. You can also use the following scripts to install hiredis (v1.2.0). | ||
|
||
# sample usage: in project home directory: | ||
# source scripts/build_scripts/build-adios2-kvcache.sh --build | ||
# source scripts/build_scripts/build-adios2-kvcache.sh --start | ||
# source scripts/build_scripts/build-adios2-kvcache.sh --stop | ||
|
||
if [ -z "${BUILD_DIR}" ] | ||
then | ||
BUILD_DIR="${PWD}"/build | ||
fi | ||
|
||
if [ ! -d "${BUILD_DIR}" ] | ||
then | ||
mkdir -p "${BUILD_DIR}" | ||
fi | ||
|
||
SW_DIR="${BUILD_DIR}"/sw | ||
if [ ! -d "${SW_DIR}" ] | ||
then | ||
mkdir -p "${SW_DIR}" | ||
fi | ||
|
||
build_cache() { | ||
# redis - in-memory data structure store | ||
redis_dir="${SW_DIR}"/redis-7.2.3 | ||
if [ ! -d "${redis_dir}" ] | ||
then | ||
cd "${SW_DIR}" || exit | ||
curl -L -o redis-7.2.3.tar.gz https://github.com/redis/redis/archive/refs/tags/7.2.3.tar.gz | ||
tar -xzf redis-7.2.3.tar.gz | ||
rm redis-7.2.3.tar.gz | ||
cd "${redis_dir}" || exit | ||
# cannot accleerate by 'make -j8'. It will cause error. | ||
make | ||
|
||
# hiredis - C client library to connect Redis server | ||
cd "${redis_dir}"/deps/hiredis || exit | ||
mkdir build && cd build || exit | ||
cmake .. -DCMAKE_INSTALL_PREFIX="${SW_DIR}"/hiredis | ||
make -j32 | ||
make install | ||
fi | ||
|
||
cd "${BUILD_DIR}" || exit | ||
cmake .. -DADIOS2_USE_Cache=ON \ | ||
-DADIOS2_USE_Python=ON \ | ||
-DCMAKE_PREFIX_PATH="${SW_DIR}/hiredis" \ | ||
-DCMAKE_INSTALL_PREFIX="${SW_DIR}"/adios2 | ||
|
||
make -j32 | ||
make install | ||
cd "${BUILD_DIR}"/../ || exit | ||
echo "Build completed." | ||
unset BUILD_DIR | ||
|
||
export DoRemote=1 | ||
export useKVCache=1 | ||
} | ||
|
||
start_services() { | ||
eisenhauer marked this conversation as resolved.
Show resolved
Hide resolved
|
||
echo "Starting redis server and setting environment variables..." | ||
export PYTHONPATH=${SW_DIR}/adios2/local/lib/python3.10/dist-packages/ | ||
nohup "${SW_DIR}"/redis-7.2.3/src/redis-server > "${SW_DIR}"/redis_server.log 2>&1 & | ||
nohup "${SW_DIR}"/adios2/bin/adios2_remote_server -v > "${SW_DIR}"/remote_server.log 2>&1 & | ||
sleep 5 | ||
nohup "${SW_DIR}"/redis-7.2.3/src/redis-cli monitor > "${SW_DIR}"/redis_monitor.log 2>&1 & | ||
echo "Services started and environment variables set." | ||
} | ||
|
||
# Function to stop services (optional, example purpose) | ||
stop_services() { | ||
echo "Stopping services..." | ||
pkill -f redis-server | ||
pkill -f redis-cli | ||
pkill -f remote_server | ||
unset DoRemote | ||
unset useKVCache | ||
unset PYTHONPATH | ||
echo "Services stopped." | ||
} | ||
|
||
# Parse command line options | ||
while [[ "$1" != "" ]]; do | ||
case "$1" in | ||
-b | --build ) | ||
build_cache | ||
;; | ||
-c | --start ) | ||
start_services | ||
;; | ||
-s | --stop ) | ||
stop_services | ||
;; | ||
-h | --help ) | ||
echo "Usage: $0 [OPTIONS]" | ||
echo "Options:" | ||
echo " -b, --build Build the software with cache enabled" | ||
echo " -c, --start Start redis server and set environment variables" | ||
echo " -s, --stop Stop the services" | ||
echo " -h, --help Display this help message" | ||
;; | ||
* ) | ||
echo "Invalid option: $1" | ||
echo "Use -h or --help for usage information." | ||
;; | ||
esac | ||
shift | ||
done |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My lurking worry here is that we have expanded a simple loop with a 2 line body to 100+ lines. Maybe we can break out the KVCache case to a different subroutine to keep the non-cache case simple to understand? Also you might use CoreDims/DimsArray rather than Dims. It's probably not as critical here, but Dims uses std::vector, which means lots of tiny mallocs whenever these are created, destroyed, passed by value, etc. This caused serious performance problems in metadata manipulation with BP4 at scale. As a result, BP5 tries to avoid Dims. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
However, what I can do is like: #ifdef ADIOS2_HAVE_KVCACHE do you think this is fine?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about? #ifdef ADIOS2_HAVE_KVCACHE
if (getenv("useKVCache"))
{
PerformLocalGetsWithKVCache();
}
else
#endif
PerformRemoteGets(); There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I prefer this version... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I could get rid of if (m_Remote)
{
#ifdef ADIOS2_HAVE_KVCACHE
PerformRemoteGetsWithKVCache();
#else
PerformRemoteGets();
#endif
}
else
{
PerformLocalGets();
} If the user builds with KVCache and wants to use remote Gets (DoRemote=1), KVCache will be used automatically. However, if the user prefers to use only the remote service without KVCache, they will need to rebuild (possibly in a separate folder). We could also consider checking whether the cache connection is active, and if it is not, default to using the pure remote service. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
So, let me tell you my worry. Your code in your cache stuff is in a lot of ways similar to the code in NdCopy (which looks at whether or not two blocks intersect, the extent of the overlap, etc.). We used to have some major performance problems in this when metadata was large (many thousand blocks per variable) because every time it did something like try to determine an intersection it ended up creating a destroying a dozen or more Dims vectors (doing things like converting a taking a StartEnd Box and creating a StartCount box from it, having all those temporary values get free'd when the intersection test returned, only to recreate them later, etc.). We traced the issue back to lots and lots of small malloc/free calls when those Dims were created, initialized and destroyed. It took a ton of effort to rewrite the worst parts of that code to use ArrayDims (stack allocated dimension array, so cheap to create/destroy, own their own data) or CoreDims (the equivalent of a "span", don't own their data but let us pass it in places and operate on it like a vector without a copy). With a lot of careful work kill all the object creation/destruction overhead in NdCopy so its only overhead was necessary control and data movement. (And if you go look at its source in helper/adiosMemory.cpp you'll see extensive use of ArrayDims and CoreDims.). The storm of vector construction and destruction in ADIOS metadata handling wasn't noticeable at small scales and it was easy to ignore the implicit overhead of casual use of Dims, but as we ran at higher scales it became seriously problematic. You're probably not seeing an issue at the scales you're testing caching at either, but we hope to use this in large-metadata situations, so it'll become an issue if it's not done well. There's just too much similarity in your processing and the NdCopy case for me not to be concerned. We obviously didn't eliminate Dims from all the ADIOS code, but we killed it where it was used most intensively. You might do the same. (And of course you do know the dimensionality of the variable in the place where you declare ReqInfo variable, it's in the Req variable that you're iterating over in that loop. Or it would be easy to add a default constructor and initialize the dimcount later. My concern is about places where there might be intensive vector usage.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test script we talked about just now is: https://github.com/Change72/adios-test/blob/master/TestCoreDims.cpp // will make a value copy of dimsPoint2 // will only copy the pointer There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. However, the following code will also give wrong results, but vector works fine. helper::NdCopy(reinterpret_cast<char *>(ReqInfo.Data), reinterpret_cast<helper::CoreDims&>(ReqInfo.ReqBox.Start), reinterpret_cast<helper::CoreDims&>(ReqInfo.ReqBox.Count), true, false, reinterpret_cast<char *>(Req.Data), Req.Start, Req.Count, true, false, ReqInfo.TypeSize); Vector code: std::vector<size_t> start;
std::vector<size_t> count;
ReqInfo.ReqBox.StartToVector(start);
ReqInfo.ReqBox.CountToVector(count);
helper::NdCopy(reinterpret_cast<char *>(ReqInfo.Data), start, count, true, false, reinterpret_cast<char *>(Req.Data), Req.Start, Req.Count, true, false, ReqInfo.TypeSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add as at adios2_option in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added