Skip to content

Latest commit

 

History

History
51 lines (34 loc) · 1.97 KB

README.md

File metadata and controls

51 lines (34 loc) · 1.97 KB

ServiceNow completed its acquisition of Element AI on January 8, 2021. All references to Element AI in the materials that are part of this project should refer to ServiceNow.

byteSteady

A fast classification and tagging tool using byte-level n-gram embeddings

Reference

Please read our [paper] for details on byteSteady.

@article{zhang2021bytesteady,
  title={byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings},
  author={Zhang, Xiang and Drouin, Alexandre and Li, Raymond},
  journal={arXiv preprint arXiv:2106.13302},
  year={2021}
}

Dependencies

  1. GNU/Linux (byteswap.h from glibc used in CityHash)
  2. C++17 compiler (::std::variant, ::std::filesystem, etc)
  3. Thunder (tensor math)
  4. Google googletest (unit tests)
  5. Google gflags (command-line option parsing)
  6. Google glog (logging and error handling)

Compile

Make sure all the dependencies are installed, and then simply make.

There will be a few outputs:

  • bytesteady/bytesteady: the byteSteady executable
  • bytesteady/libbytesteady.so: the byteSteady dynamic library
  • bytesteady/*_test: unit tests of different modules for byteSteady

Command-line options

byteSteady is built with Google gflags to support command-line flag parsing. The definition of all available flags can be found in bytesteady/flags.cpp. You can also query these flags by

$ bytesteady/bytesteady -helpon bytesteady/flags

The -helpon is provided by Google gflags to show help for flags only defined in some source code file. For full help information, including flags from the other parts of the program (such as Google glog), simply use -help.

Gene classification dataset

The gene classification dataset used in the paper can be downloaded at https://zenodo.org/record/5181235#.YWc4bG3MJb8.