
Releases: dje-dev/Ceres

v1.0.1

14 Nov 22:09

First official release of Ceres, including several major enhancements:

  • support for Ceres neural networks
  • full support for Chess960 (also known as Fischer Random Chess) and DFRC (Double Fischer Random Chess), with the "UCI_Chess960" option for mode selection (contribution by lepned)
  • support for ONNX neural networks via the CUDA or TensorRT execution providers, for both Ceres and Lc0 networks
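Chess960 and DFRC positions differ from standard chess only in the back-rank arrangement, so a short illustration of that arrangement may help. The sketch below is a hypothetical helper, not Ceres code. It assumes the widely used Scharnagl numbering (index 518 is the standard start position), and it treats DFRC as choosing an independent index for each side. In a GUI, the mode itself is normally enabled with the standard UCI command `setoption name UCI_Chess960 value true`.

```python
# Hypothetical helper (not Ceres code): maps a Chess960 start-position index
# (0-959) to White's back rank using Scharnagl's numbering scheme.
from typing import Tuple

# The 10 ways to place two knights on the 5 squares left after bishops and queen.
KNIGHT_TABLE = [
    (0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
    (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
]


def chess960_back_rank(index: int) -> str:
    if not 0 <= index < 960:
        raise ValueError("Chess960 index must be in the range 0..959")
    rank = [""] * 8  # files a..h; "" marks an empty square

    def empty_files():
        return [f for f in range(8) if not rank[f]]

    n, light = divmod(index, 4)
    rank[2 * light + 1] = "B"  # light-squared bishop on the b, d, f or h file
    n, dark = divmod(n, 4)
    rank[2 * dark] = "B"  # dark-squared bishop on the a, c, e or g file
    n, queen = divmod(n, 6)
    rank[empty_files()[queen]] = "Q"
    free = empty_files()
    for slot in KNIGHT_TABLE[n]:  # n is now in 0..9
        rank[free[slot]] = "N"
    # Filling the last three squares with R, K, R keeps the king between the rooks.
    for f, piece in zip(empty_files(), "RKR"):
        rank[f] = piece
    return "".join(rank)


def dfrc_back_ranks(white_index: int, black_index: int) -> Tuple[str, str]:
    """DFRC picks each side's back rank independently (960 * 960 combinations)."""
    return chess960_back_rank(white_index), chess960_back_rank(black_index).lower()


print(chess960_back_rank(518))  # RNBQKBNR, the standard start position
```

The sketch builds only the back ranks. Producing a complete FEN would also require a Chess960-aware castling-rights notation (such as X-FEN or Shredder-FEN), which is not shown here.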

Please see the step-by-step instructions for installation.

v0.97RC3

27 Mar 02:49
Pre-release

Pre-release of version 0.97.

  • Supports new networks with an attention policy head (such as T79 or ap-mish networks)
  • Adds a major new feature for visual analysis of chess positions or games. See https://github.com/dje-dev/Ceres/blob/main/Graph.md
  • Several small adjustments to play parameters (such as time control)
  • Tournament Manager now supports a variance reduction technique (forcing engine determinism)
  • Various small fixes for use with tools such as Nibbler

v0.96

07 Feb 22:25

Version 0.96 features a large set of small to medium-sized speed (5% to 20%) and play quality (5 to 20 Elo) enhancements and a few additional features.

* significantly improved time management and smart pruning logic
* support neural networks having 512 filters on Ampere-based GPUs (based on code by Ankan from the Lc0 project)
* dozens of performance optimizations, including better backend scaling with multiple GPUs
* corrected some failures to recognize draw by repetition in the search tree (thanks to Kovax for identifying the problem)
* removed the limitation that draws by repetition are recognized only within last 22 ply
* corrected a problem with search stopping in Chessbase/Fritz GUI
* support incomplete tablebase files (DTM but no DTZ) (thanks to lepned for identifying this prior limitation)
* tournament feature enhancements to support more than 2 players (thanks to lepned)
* add a new UCI verb "dump-info"
* add support for GPU backends with internally partitioned batch sizes (e.g. "GPU:0[266]" for max batch size 266)
* corrected possible tree overflow upon search continuation when running with MaxTreeSize
* significant internal architectural changes to engine code for efficiency
* significantly improved memory management
* expose sibling and uncertainty features in UCI options (default off)
* add Ceres.json setting LimitsManagerName to specify pluggable alternate limits manager
* ongoing efforts to improve code quality and documentation and extend the external API
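The partitioned-batch backend syntax can be illustrated with a short sketch. It assumes a grammar of `GPU:<device>` optionally followed by `[<max batch size>]`, based only on the single example `GPU:0[266]` above. The functions `parse_gpu_spec` and `partition_batch` are hypothetical names, not the Ceres API, and the real parser may accept more forms.

```python
import re
from typing import List, Optional, Tuple

# Assumed grammar for illustration: "GPU:<device>" optionally followed by
# "[<max batch size>]". The real Ceres parser may accept more forms.
SPEC_PATTERN = re.compile(r"^GPU:(\d+)(?:\[(\d+)\])?$")


def parse_gpu_spec(spec: str) -> Tuple[int, Optional[int]]:
    match = SPEC_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized backend spec: {spec!r}")
    device = int(match.group(1))
    max_batch = int(match.group(2)) if match.group(2) else None
    return device, max_batch


def partition_batch(num_positions: int, max_batch: Optional[int]) -> List[int]:
    """Split one logical batch into sub-batches of at most max_batch positions."""
    if max_batch is None or num_positions <= max_batch:
        return [num_positions]
    full, rest = divmod(num_positions, max_batch)
    return [max_batch] * full + ([rest] if rest else [])


device, max_batch = parse_gpu_spec("GPU:0[266]")
print(device, partition_batch(600, max_batch))  # 0 [266, 266, 68]
```

This sketch fills sub-batches greedily. An equally plausible design splits the batch evenly (for example 200/200/200), which keeps GPU work per launch balanced. The notes do not say which approach Ceres uses.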

v0.94

27 Sep 01:54

Ceres 0.94 incorporates 3 significant new features as well as numerous smaller tweaks which improve the user experience.

The Elo improvement compared to version 0.93 is believed to be very modest (perhaps 5 Elo) at shorter time controls. However, the root swapping enhancement definitely increases search speed by at least 7% with large searches. There are also theoretical reasons (e.g. the greater abundance of transpositions) to believe the other enhancements might be more impactful with larger search trees, but this has not been confirmed because of the time and computational resources needed to perform such tests.

Major Features

  • a "root swapping" technique greatly reduces the time spent reorganizing the search tree at the start of each search within a game. Previously this overhead decreased effective search speed by approximately 11% at long time controls; it is now reduced to about 3%. Ceres can now avoid rebuilding the tree in most cases (by swapping in the new root position and performing some associated data structure updates). In addition, nodes that fall outside the new search tree sometimes remain memory resident and available as an additional cache (reducing the number of positions actually sent to the neural network by approximately 3%). This technique does increase memory usage, but its aggressiveness is automatically tuned based on the physical RAM available in the computer. This feature can optionally be turned off via the ReducedMemoryMode option in Ceres.json or UCI, or the MaxTreeNodes setting can be used to place a hard limit on the number of nodes that any search is allowed to use.

  • a "virtual subtree" technique is used to reduce memory consumption. Ceres avoids materializing physical nodes in the tree when the positions they represent already exist elsewhere in the tree (transpositions). In prior versions of Ceres, this was limited to a single node having its child array virtualized; it has now been expanded so that up to 2 nodes and 3 child arrays can be virtualized. In many cases the subtree will not be visited further and there will never be a need to materialize the actual nodes/children, thereby further reducing memory consumption by approximately 10% (for smaller searches) to 25% (for very large searches). The semantics of the prior Ceres MCTS search algorithm are preserved.

  • a "sibling blending" technique sometimes blends information into newly visited nodes from their siblings that have not yet been visited (i.e. those having lower policy prior probabilities). Specifically, when a child of a node is first visited as a leaf node, the adjacent siblings (within 3 slots and 10% prior probability distance) are also scanned to see if an evaluation is already available (via transpositions or tablebase hits). If so, this value (taken from the subtree's Q evaluation in the case of transpositions) is blended into the value backed up higher in the tree. The weight used in the blending reflects a minimax consideration, giving more weight if the sibling was better than the node actually being visited (i.e. better than expected), and also the reliability of the sibling evaluation (e.g. a tablebase hit is definitive, and a transposition subtree with large N is highly reliable). In practice about 10% to 15% of visited nodes are found eligible for this blending. This approach attempts to exploit one of the major advantages that MCTS-based search algorithms possess: retaining a full tree of potentially useful information in memory. Currently the Elo gains from this feature seem very modest (e.g. 5 Elo) but are potentially larger at very long time controls and/or after future tuning optimizations. See MCTSNodeSiblingEval.cs for more detail in the code.
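The sibling-blending eligibility rule is concrete enough to sketch. The example below is a simplified illustration, not the logic in MCTSNodeSiblingEval.cs. The slot limit (3) and prior-distance limit (0.10) come from the description above. Everything else is an assumption: the names `Child`, `eligible_siblings`, and `blended_value`, the 0..1 reliability scale, the 0.5/0.1 weights, and the simplification that all values share one perspective (real MCTS values flip sign between plies).

```python
from dataclasses import dataclass
from typing import List, Optional

# Eligibility limits quoted in the release notes.
MAX_SLOT_DISTANCE = 3
MAX_PRIOR_DISTANCE = 0.10
# Hypothetical weights: siblings that look better than the visited child count more.
WEIGHT_IF_BETTER = 0.5
WEIGHT_IF_WORSE = 0.1


@dataclass
class Child:
    prior: float                 # policy prior P; children sorted by descending prior
    value: Optional[float]       # known value (transposition or tablebase hit), else None
    reliability: float = 0.0     # assumed 0..1 scale; 1.0 for a tablebase hit


def eligible_siblings(children: List[Child], visited: int) -> List[Child]:
    """Not-yet-visited siblings (later slots) within the slot and prior limits."""
    result = []
    last = min(len(children), visited + MAX_SLOT_DISTANCE + 1)
    for i in range(visited + 1, last):
        sib = children[i]
        if sib.value is not None and abs(sib.prior - children[visited].prior) <= MAX_PRIOR_DISTANCE:
            result.append(sib)
    return result


def blended_value(leaf_value: float, children: List[Child], visited: int) -> float:
    """Weighted average of the new leaf value and eligible siblings' known values.

    All values are from the perspective of the player choosing among the children.
    """
    total, weight_sum = leaf_value, 1.0
    for sib in eligible_siblings(children, visited):
        better = sib.value > leaf_value
        weight = sib.reliability * (WEIGHT_IF_BETTER if better else WEIGHT_IF_WORSE)
        total += weight * sib.value
        weight_sum += weight
    return total / weight_sum
```

A better-than-expected sibling pulls the backed-up value noticeably upward. This matches the minimax intuition that the player at the parent would prefer that sibling once it is explored. A worse sibling has only a small effect.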

Minor Enhancements

  • The initialization time to load network files is reduced by 60%, and memory consumption is also significantly reduced (for example, by approximately 400 MB with a 30b network).

  • The new option "MaxTreeNodes" can be set either in Ceres.json or via UCI interface and limits the maximum total number of physical nodes actually allocated to the specified value (terminating searches if they reach the specified value).

  • A bug with the searchmoves feature (when black was to play) is corrected.

  • Some minor search performance optimizations were made, focused on concurrency efficiency (eliminating some locks and reducing false sharing), yielding about a 3% speedup for large searches on high-end hardware.
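The MaxTreeNodes behavior (stop the search once the allocated node count would exceed the limit) can be sketched as a node budget. `NodeStore`, `try_allocate`, and `run_search` are hypothetical names, and allocating nodes in fixed-size batches is a simplifying assumption.

```python
from typing import Optional


class NodeStore:
    """Hypothetical node allocator enforcing an optional MaxTreeNodes-style cap."""

    def __init__(self, max_tree_nodes: Optional[int] = None):
        self.max_tree_nodes = max_tree_nodes
        self.allocated = 0

    def try_allocate(self, count: int) -> bool:
        if self.max_tree_nodes is not None and self.allocated + count > self.max_tree_nodes:
            return False  # refusing keeps the tree at or below the cap
        self.allocated += count
        return True


def run_search(store: NodeStore, nodes_per_batch: int, max_batches: int) -> int:
    """Runs up to max_batches batches; stops early once the node cap would be exceeded."""
    batches = 0
    while batches < max_batches and store.try_allocate(nodes_per_batch):
        batches += 1
    return batches
```

For example, with a cap of 1,000 nodes and 64 nodes per batch, the search stops after 15 batches (960 nodes), because a 16th batch would exceed the cap.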

v0.93

10 Aug 01:28

Update with improvements primarily related to speed and usability for games played at medium to long time controls:

  • the speed of tree rebuilding between moves is significantly improved (up to 40% for very large search trees),
    thereby increasing effective nodes per second (by about 5% at long time controls)
  • a new UCI option MaxTreeNodes is added to limit search trees to a specified maximum size
  • additional performance optimizations improve search speed by about 2%
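Tree rebuilding between moves essentially means keeping the subtree under the move actually played and dropping the rest. The sketch below is a simplified model using a dictionary of node ids. `reachable` and `reroot` are hypothetical names, and Ceres's actual array-based node store works differently. A breadth-first walk is used so that nodes shared through transpositions are kept only once.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Hypothetical tree: node id -> {move: child node id}; leaves map to {}.
Tree = Dict[str, Dict[str, str]]


def reachable(tree: Tree, root: str) -> Set[str]:
    """Breadth-first walk collecting every node reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in tree[queue.popleft()].values():
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def reroot(tree: Tree, root: str, move: str) -> Tuple[str, Tree, Set[str]]:
    """Make the child reached by `move` the new root.

    Returns the new root, the retained subtree, and the detached node ids
    (which a search could discard or keep around as a cache).
    """
    new_root = tree[root][move]
    keep = reachable(tree, new_root)
    retained = {node: dict(tree[node]) for node in keep}
    detached = set(tree) - keep
    return new_root, retained, detached
```

The cost of this copying approach grows with the size of the retained subtree. The v0.94 "root swapping" notes describe avoiding that rebuild in most cases.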

If installation issues are encountered, please note these most common causes:

v0.91b

21 Jul 00:12

Minor update with fixes for improved compatibility with other programs:

  • Values shown by Nibbler GUI when configured to show Q were incorrect
  • Tournaments run using Octagon manager would falsely report time forfeit for Ceres because they relied upon sometimes incorrect Ceres self-reported elapsed search time

v0.91a

17 Jul 00:38

Minor update to v0.91, fixing a serious bug that caused a more than 100x slowdown for searches run using certain time controls (e.g. seconds per move). Thanks to ribbit for reporting.

v0.91

16 Jul 14:13
73d1929

Ceres Release Notes - version 0.91

  • Includes the improvements from 0.90-rc1 (Linux support, new backend without LC0.DLL dependency, search improvements, new BENCHMARK and BACKENDBENCH command line modes)

Additionally:

  • Enhanced time management, especially against Alpha/Beta style opponents
  • Further simplified installation: includes a weight file (703810) and pre-populated Ceres.json file in the box to allow immediate running without any configuration (thanks to Chad for this suggestion)
  • The speed of loading weights files is somewhat improved, and multi-gpu weight sharing (to reduce GPU memory usage) is now functional
  • Resolves a number of minor bugs impacting play
  • Tests suggest circa 5 to 10 Elo improvement versus 0.90-rc1, with hopefully yet more significant gains versus Stockfish
  • Note that CUDA 11.3 or greater improves the backend speed by approximately 12% on higher-end (and possibly also mid-range) GPUs
  • Note that the NVIDIA 1060, 1070, and 1080 GPUs are not supported by the new Ceres backend because they do not support fast FP16. Instead, users of these cards can fall back to the prior LC0.DLL backend by adding the following line to Ceres.json:
  "UseLegacyLC0Evaluator": true

v0.90-rc1

05 Jul 15:32

Ceres Release Notes - version 0.90-rc1 (Release Candidate 1).

Community help and feedback in testing v0.90 would be welcomed. Primary areas of enhancement include:

  • Compatibility (Linux support, also GPUs with limited memory)
  • Installation (LC0.DLL no longer required)
  • Search speed faster by 10% to 25% (due to enhancements to CUDA backend and MCTS engine), especially on more recent NVIDIA hardware and CUDA 11.3+.

Because of the major backend changes (about 10,000 lines of new C# code) it is likely that issues will be identified relating to untested hardware and software configurations. Please feel free to open an issue if you are encountering difficulties, or post in #help of the Discord channel for Leela Chess Zero.

Internal testing suggests that play strength (with T60) is significantly improved, even on relatively modest hardware (Windows laptop with 2070 GPU) and moderate time controls (such as 60 seconds per game). However independent community assessment is necessary and would be welcomed.

Known minor issues:

  • loading of networks (at initialization) is a little slow
  • in multigpu configurations the new feature to reduce GPU memory consumption is disabled

Details

  • The same binaries support both Windows and Linux operating systems. To run on Linux, use: "dotnet ./Ceres.dll"

  • A native C# backend for CUDA instead of relying upon an external library LC0.DLL, yielding simplified installation and improved performance. This backend is largely a transliteration of the Leela Chess Zero CUDA backend in C++ (largely by Ankan) into C#
    This work leveraged open source ManagedCUDA project (by Michael Kunz) to provide the object-oriented bindings to the CUDA C API.
    The implementation also features several enhancements:

    • use of CUDA graphs to precompile the entire neural network into a single CUDA operation, yielding speedups of 5% to 20%
      (greater with smaller networks and smaller batch sizes), especially on more recent hardware and CUDA 11.3 or above.
    • a supplemental CUDA kernel which reduces GPU/host bandwidth requirements (copies policy only for legal moves)
    • reduced GPU memory consumption and network load times (by about 25%)
    • diagnostic and introspection features for developers, such as optionally capturing inner layer timings and activations
  • integrated native Syzygy tablebase probing (through 7-man) based upon a transliteration of the Fathom library (by Ronald de Man, basil, and Jon Dart) from C++ to C#

  • search speed optimizations, particularly for long searches on higher-end GPUs. For example, 425k nodes per second is achievable on 2017-vintage CPUs.

  • some adjustments and tuning to MCTS parameters yielding slightly improved play quality in most situations

  • two additional command line options (BACKENDBENCH and BENCHMARK) emulating LC0 functionality.
    See below for an example of the output from the benchmark command, which can also compare against LC0 (here v0.28rc1), e.g.
    "Ceres benchmark opponent=lc0 limit=10sm"

  • various bug fixes, including those involving use of MultiPV feature

  • ongoing work to improve source code clarity and documentation

  • ongoing work to make the Ceres code base (set of object-oriented classes for general chess operations, MCTS search, and neural network training) more flexible, performant, and well documented to facilitate research in chess programming.
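The supplemental CUDA kernel that "copies policy only for legal moves" reduces host traffic from a full policy vector to a handful of values per position. The Python sketch below models only that data reduction, not the CUDA kernel. The 1858-entry Lc0-style policy size, the FP16 transfer format, and the helper names are assumptions for illustration. The softmax over legal-move logits follows common Lc0-style practice, but the notes do not say where Ceres applies it.

```python
import math
from typing import List, Sequence

POLICY_SIZE = 1858    # flat Lc0-style policy vector length (assumed for this sketch)
BYTES_PER_VALUE = 2   # one FP16 value (assumed transfer format)


def gather_legal_policy(policy: Sequence[float], legal_indices: Sequence[int]) -> List[float]:
    """Keep only the entries for legal moves, as a device-side gather step would."""
    return [policy[i] for i in legal_indices]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gathered (legal-move) logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_bytes(num_values: int) -> int:
    return num_values * BYTES_PER_VALUE


print(transfer_bytes(POLICY_SIZE), transfer_bytes(30))  # 3716 60
```

Under these assumptions, a typical middlegame position with about 30 legal moves would need 60 bytes of policy data instead of 3,716.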

------ Windows Laptop with 2070, T75 network, 10 seconds each -------

Ceres Benchmark Results =======
Total time(ms)   :      339,721
Nodes searched   :   26,618,539
Avg nodes/sec    :       78,354
Median nodes/sec :       46,276
Positions faster :           32

LC0 Benchmark Results =========
Total time(ms)   :      340,487
Nodes searched   :   22,238,029
Avg nodes/sec    :       65,312
Median nodes/sec :       41,076
Positions faster :            2
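The "Avg nodes/sec" rows are consistent with total nodes divided by total time, and the relative speed follows directly from them. The check below uses only the figures printed above. The median rows are per-position statistics and cannot be recomputed from these totals.

```python
def avg_nodes_per_sec(nodes: int, total_ms: int) -> float:
    """Average throughput over the whole run: total nodes / total seconds."""
    return nodes / (total_ms / 1000.0)


ceres = avg_nodes_per_sec(26_618_539, 339_721)
lc0 = avg_nodes_per_sec(22_238_029, 340_487)
print(round(ceres), round(lc0), f"{ceres / lc0 - 1:.1%}")  # 78354 65312 20.0%
```

On this laptop run, then, Ceres averaged about 20% more nodes per second than LC0 over nearly identical total time.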

v0.89

01 Mar 01:14

Notable Changes

  • Ceres is more aggressive about sometimes choosing the "best Q" move at root (over the classic "best N"), with the incidence rising from approximately 4% to 8%. There is some evidence this improves play quality, but more testing is needed to quantify impact.
  • Filenames with embedded spaces are now accepted (e.g. for weights files)
  • Network files are autodetected as being in either zipped or unzipped format
  • Culture-insensitive numeric formats are now adopted as the standard (e.g. a decimal point is always used in numbers, regardless of locale)
  • Time management is adjusted to be slightly more adaptive and conservative
  • Errors with processing of rapid consecutive UCI go commands have been (at least partly) corrected
  • Two UCI setoptions are available for logging: LogFile is for UCI dialog with engine, and SearchLogFile is for detailed internal information on each search conducted (such as verbose move stats and principal variation dump)
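The "best Q" versus "best N" choice at the root can be illustrated with a simple rule. Normally the most-visited move is played, but a well-visited move with a clearly higher Q may be preferred instead. The thresholds and names below are hypothetical, because these notes do not describe Ceres's actual criterion.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the release notes do not give Ceres's actual rule.
MIN_N_FRACTION = 0.5     # candidate needs at least half the visits of the best-N move
MIN_Q_ADVANTAGE = 0.02   # and a Q at least this much higher


@dataclass
class RootChild:
    move: str
    n: int      # visit count
    q: float    # mean value from the side to move's perspective


def choose_move(children: List[RootChild]) -> RootChild:
    best_n = max(children, key=lambda c: c.n)
    # Only children with enough visits have a trustworthy Q.
    candidates = [c for c in children if c.n >= MIN_N_FRACTION * best_n.n]
    best_q = max(candidates, key=lambda c: c.q)
    if best_q.q - best_n.q >= MIN_Q_ADVANTAGE:
        return best_q
    return best_n
```

Raising the incidence of best-Q choices (from about 4% to 8%, per the note above) would correspond to loosening thresholds like these. The visit-count floor guards against trusting the high Q of a barely explored move.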