Run CubingByLayer on Cluster #20

TatianaJin opened this issue on Nov 29, 2018

Run the application in the same way as a Husky application:

  • Run the master: ./HuskyMaster -C <your conf file path>
  • Run the application: ./CubingByLayer -C <your conf file path>
    Alternatively, you may run the application in a distributed manner with ./exec.sh ./CubingByLayer -C <your conf file path> (see the launch sketch after the config example below).
    The exec.sh file is the one in Husky, and it looks like this:
# This points to a file, which should contain hostnames (one per line).
# E.g.,
#
# worker1
# worker2
# worker3
#
MACHINE_CFG=/data/opt/tmp/tati/husky/build/slaves

# This points to the directory where Husky binaries live.
# If Husky is running in a cluster, this directory should be available
# to all machines.
BIN_DIR=/data/opt/tmp/tati/husky/build
time pssh -t 0 -P -h ${MACHINE_CFG} \
    -x "-t -t" "cd $BIN_DIR && ls $BIN_DIR > /dev/null && ./$@"
  • The config file looks like this:
 master_host=w10
 master_port=56789
 comm_port=45678
 
 hdfs_namenode=proj99
 hdfs_namenode_port=9000
 
 serve=0
 
 meta_url=kylin_metadata@hdfs,path=hdfs://localhost:9000/kylin/kylin_metadata/metadata/69a4e318-c3ff-45d4-bfc3-2dcaeaa164d7
 hive_table=hdfs:///kylin/kylin_metadata/kylin-86dffb72-3bf9-4150-b9bd-52332d9a7af5/kylin_intermediate_simple_sales_model_69a4e318_c3ff_45d4_bfc3_2dcaeaa164d7
 table_format=ORC
 output_path=hdfs://proj99:9000/kylin/kylin_metadata/kylin-86dffb72-3bf9-4150-b9bd-52332d9a7af5/simple_sales_model/cuboid/
 
 [worker]
 info=w10:4
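
With the pieces above in place, an end-to-end launch from the master machine could look like the following. This is a minimal sketch, assuming the binaries, exec.sh, the slaves file, and the conf file all sit in the shared build directory, and that cube.conf is a placeholder name for your own conf file (the individual config parameters are explained below):

 # On the master machine (w10 in the sample config above)
 cd /data/opt/tmp/tati/husky/build

 # Start the Husky master in the background
 ./HuskyMaster -C ./cube.conf &

 # Launch one CubingByLayer process on every machine listed in the slaves file
 ./exec.sh ./CubingByLayer -C ./cube.conf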

The meta_url parameter is the same as the Kylin input to Spark/MR; hive_table is the HDFS path to the flat join table; table_format is the format of the flat join table; and output_path is the HDFS location where the cuboids will be written. The sample parameters above build the example cube shipped with Kylin, which is very small. If you want to try large-scale data, you may deploy your own Kylin instance on the cluster, import the TPC-H benchmark to get cube descriptions, and run the Kylin pipeline to create the flat join table.
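
Before launching, it may help to sanity-check the HDFS paths from any machine with a configured HDFS client. Below is a small sketch using the sample paths from the config above; hive --orcfiledump is standard Hive tooling for inspecting ORC files, and the part-file name 000000_0 is only an example of a typical Hive output file name:

 # Confirm the flat join table exists
 hdfs dfs -ls hdfs:///kylin/kylin_metadata/kylin-86dffb72-3bf9-4150-b9bd-52332d9a7af5/kylin_intermediate_simple_sales_model_69a4e318_c3ff_45d4_bfc3_2dcaeaa164d7

 # Inspect the schema of one ORC part file
 hive --orcfiledump hdfs:///kylin/kylin_metadata/kylin-86dffb72-3bf9-4150-b9bd-52332d9a7af5/kylin_intermediate_simple_sales_model_69a4e318_c3ff_45d4_bfc3_2dcaeaa164d7/000000_0

 # Check the output location (the job will write cuboids here)
 hdfs dfs -ls hdfs://proj99:9000/kylin/kylin_metadata/kylin-86dffb72-3bf9-4150-b9bd-52332d9a7af5/simple_sales_model/cuboid/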

TatianaJin added the FYI label on Nov 29, 2018