Merge pull request #331 from zevv/histogram
v1.5.0-rc1
l8gravely authored Sep 4, 2024
2 parents ffb83e6 + dc9c8c8 commit e66302f
Showing 18 changed files with 762 additions and 83 deletions.
23 changes: 22 additions & 1 deletion ChangeLog
@@ -1,4 +1,25 @@

1.5.0-rc1 (2024-09-03)
- new: added support for tkrzw backend DB and made it the default
- this DB is newer and under active support compared to
TokyoCabinet, KyotoCabinet, etc.
- also supports really large filesystems. Big thanks to
stuartthebruce for testing and debugging (Issue #300)
- new: added support for tracking the topN largest files in the
filesystem. (Issue #284)
- new: added '-T' option to duc index to change maximum default number of
topN files to track.
- new: added '-B' option to duc index to change number of histogram buckets.
- new: added 'topn' command to show topN files stored in
DB. defaults to 10 currently.
- new: added 't' key in duc ui to toggle between regular and topN
display mode. Initial support.
- new: added histogram report of filesizes found during
indexing to 'duc info'. (Issue #284)
- new: added 'H' or '--histogram' option to duc info
- new: added 'duc histogram' command (Issue #284)
- needs work still, especially CGI, UI and GUI output.
- fix:

1.4.5 (2022-07-29)

- new: added '-u' option to duc index to index by username
122 changes: 83 additions & 39 deletions INSTALL
@@ -16,20 +16,49 @@ Generate the configure script when it is not available (cloned git repo):
To get the required dependencies on Debian or Ubuntu, run:

$ sudo apt-get install libncursesw5-dev libcairo2-dev libpango1.0-dev \
build-essential libtokyocabinet-dev
build-essential libtkrzw-dev

On Debian 11 (bullseye), you need to have the following line in your
/etc/apt/sources.list file:

On RHEL or CentOS systems, you need to do:
deb http://deb.debian.org/debian bullseye-backports main

Then you would do:

$ sudo apt update

$ sudo apt-get install libncursesw5-dev libcairo2-dev libpango1.0-dev \
build-essential libtkrzw-dev tkrzw-doc tkrzw-utils


On older RHEL or CentOS systems, you need to do:

$ sudo yum install pango-devel cairo-devel tokyocabinet-devel


Duc comes with various user interfaces and a number of backends for database
access and graph drawing. You can choose which options should be used with the
./configure script to build Duc to fit best in your environment.
RHEL 8 & 9 / Rocky Linux 8 & 9 / Alma Linux 8 & 9

Install epel-release & update

$ sudo yum install epel-release
$ sudo yum update

Install tkrzw and other packages:

$ sudo yum install tkrzw tkrzw-devel tkrzw-doc tkrzw-libs pango-devel cairo-devel tokyocabinet-devel


Configuration Options
---------------------

Duc comes with support for various user interfaces and a number of
backends for database access and graph drawing. You can choose which
options should be used with the ./configure script to build Duc to fit
best in your environment.

This document describes the various options which can be passed to the
./configure script, and the impact these options have on Duc functionality.
./configure script, and the impact these options have on Duc
functionality. However, the output of ./configure --help is the
definitive source.


User interfaces
@@ -38,7 +67,7 @@ User interfaces
Duc comes with the following user interfaces:

- Command line interface (duc ls): This user interface has no external
dependencies and is always enabled
dependencies and is always enabled.

- Ncurses console interface (duc ui): an interactive console interface, which
depends on ncurses or ncursesw. This user interface is enabled by default. If
@@ -59,51 +88,64 @@ Duc comes with the following user interfaces:
--enable-opengl --disable-x11


Database backend
----------------
Database backends
-----------------

Duc supports various key-value database backends:
- Tokyo Cabinet: tokyocabinet
- LevelDB: leveldb
- Sqlite3: sqlite3
- Lightning Memory-Mapped Database: lmdb
- Kyoto Cabinet: kyotocabinet
- Tkrzw: tkrzw (default as of v1.5.0)

Duc uses Tokyo Cabinet by default: the performance is acceptable and generates
in the smallest database size.
Duc now uses Tkrzw by default: the performance is acceptable and it
handles extremely large databases of volumes with terabytes of storage
and millions of files.

--with-db-backend=ARG

If your system supports none of the above, contact the author to see if we can
add your favourite backend.
If your system supports none of the above, contact the authors to see
if we can add your favourite backend.

Please note: Not all database formats can be shared between machines
with different architectures. Notably, Tokyo Cabinet is built with
non-standard options which break compatibility with other linux
distributions, even on the same architecture [1]. If you are planning
to share databases between different platforms (index machine A,
display on machine B) we recommend using the sqlite3 backend.

Please note: Not all database formats can be shared between machines with
different architectures. Notably, Tokyo Cabinet is built with non-standard
options which break compatibility with other linux distributions, even on the
same architecture [1]. If you are planning to share databases between different
platforms (index machine A, display on machine B) we recommend using the
sqlite3 backend.
Note: Tokyo Cabinet, Kyoto Cabinet, LevelDB and LMDB are all being
deprecated in future versions because of the lack of development and
support for these libraries, especially for indexing very large
volumes.

1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667979

When picking a backend you probably need to choose between speed, size and
robustness. Some measurements on my system of a 372G directory with 1.6M files:

----------------------------------
Database Run time Db size
(s) (kB)
----------------------------------
tokyocabinet [*] 8.4 19.2
leveldb 7.1 31.5
sqlite3 13.5 71.1
lmdb 5.9 78.7
kyotocabinet 8.3 26.7
----------------------------------

[*] Tokyo Cabinet currenty is the default used by Duc because of the good
compression and reasonable performance. A problem is that Tokyo Cabinet is not
very stable and can create corrupt databases when interrupting the indexing. If
this is a problem for you, choose a different db backend.
When picking a backend you probably need to choose between speed, size
and robustness. Some (out of date) measurements on a system with a
372G directory containing 1.6M files:

----------------------------------
Database Run time Db size
(s) (kB)
----------------------------------
tokyocabinet 8.4 19.2
leveldb 7.1 31.5
sqlite3 13.5 71.1
lmdb 5.9 78.7
kyotocabinet 8.3 26.7
tkrzw [*] ??? ???
----------------------------------


[*] Tkrzw is currently the default used by Duc because of its active
development, good compression and reasonable performance.

Tokyo Cabinet is not very stable and can create corrupt databases when
interrupting the indexing. If this is a problem for you, choose a
different db backend.


Graphics
--------
@@ -137,7 +179,8 @@ embedded systems not all graphics libraries are available.
Testing
-------

Duc comes with a rudimentary test harness which can be run with
Duc comes with a rudimentary test harness which can be run at the top
level directory with:

./test.sh

@@ -148,5 +191,6 @@ If you have valgrind and you want to run the tests using it do:
It will complain if you try this and valgrind isn't installed. The
test harness still needs work and more tests, but should hopefully
help keep us from re-introducing bugs as they are fixed and checked
for.
for. We would love to see more tests and a better harness, patches
welcome!

8 changes: 6 additions & 2 deletions Makefile.am
@@ -7,6 +7,7 @@ duc_SOURCES := \
src/libduc/db.c \
src/libduc/db.h \
src/libduc/db-tokyo.c \
src/libduc/db-tkrzw.c \
src/libduc/db-kyoto.c \
src/libduc/db-leveldb.c \
src/libduc/db-sqlite3.c \
@@ -44,9 +45,11 @@ duc_SOURCES += \
src/duc/cmd-guigl.c \
src/duc/cmd.h \
src/duc/cmd.h \
src/duc/cmd-histogram.c \
src/duc/cmd-index.c \
src/duc/cmd-info.c \
src/duc/cmd-ls.c \
src/duc/cmd-topn.c \
src/duc/cmd-ui.c \
src/duc/cmd-xml.c \
src/duc/cmd-json.c \
@@ -56,11 +59,12 @@ duc_SOURCES += \


AM_CFLAGS := @CAIRO_CFLAGS@ @PANGO_CFLAGS@ @PANGOCAIRO_CFLAGS@
AM_CFLAGS += @TC_CFLAGS@ @SQLITE3_CFLAGS@ @GLFW3_CFLAGS@ @LMDB_CFLAGS@ @KC_CFLAGS@
AM_CFLAGS += @TC_CFLAGS@ @SQLITE3_CFLAGS@ @GLFW3_CFLAGS@ @LMDB_CFLAGS@
AM_CFLAGS += @KC_CFLAGS@ @TKRZW_CFLAGS@
AM_CFLAGS += -Isrc/libduc -Isrc/libduc-graph -Isrc/glad

duc_LDADD := @CAIRO_LIBS@ @PANGO_LIBS@ @PANGOCAIRO_LIBS@
duc_LDADD += @TC_LIBS@ @SQLITE3_LIBS@ @GLFW3_LIBS@ @LMDB_LIBS@ @KC_LIBS@
duc_LDADD += @TC_LIBS@ @SQLITE3_LIBS@ @GLFW3_LIBS@ @LMDB_LIBS@ @KC_LIBS@ @TKRZW_LIBS@

man1_MANS = \
doc/duc.1
6 changes: 6 additions & 0 deletions TODO.md
@@ -5,6 +5,12 @@ I'm still keeping the requests here for future reference or for if I ever get
bored and have a lot of time on my hands. Anybody is free to pick and implement
any of these tasks, of course!

### Edit database to remove path(s) from Index and do all cleanup

This should be a simple change to add, though it does require some hackery to
remove entries from the records[] array in the DB. Needs thought. Currently
only solution would be to index to a totally new DB file with only the path(s)
you want.

### Show increase since last index or time period

18 changes: 14 additions & 4 deletions configure.ac
@@ -7,7 +7,7 @@

AC_PREREQ([2.13])

AC_INIT([duc], [1.4.5], [duc@zevv.nl])
AC_INIT([duc], [1.5.0-rc1], [duc@zevv.nl])

LIB_CURRENT=1
LIB_REVISION=0
@@ -60,8 +60,8 @@ AC_ARG_ENABLE(

AC_ARG_WITH(
[db-backend],
[AS_HELP_STRING([--with-db-backend], [select database backend (tokyocabinet,leveldb,sqlite3,lmdb,kyotocabinet) @<:@default=tokyocabinet@:>@])], ,
[with_db_backend="tokyocabinet"]
[AS_HELP_STRING([--with-db-backend], [select database backend (tokyocabinet,leveldb,sqlite3,lmdb,kyotocabinet,tkrzw) @<:@default=tkrzw@:>@])], ,
[with_db_backend="tkrzw"]
)

AC_MSG_RESULT([Selected backend ${with_db_backend}])
@@ -75,6 +75,16 @@ case "${with_db_backend}" in
PKG_CHECK_MODULES([TC], [tokyocabinet])
AC_DEFINE([ENABLE_TOKYOCABINET], [1], [Enable tokyocabinet db backend])
;;
tkrzw)
LDFLAGS="$outer_LDFLAGS -ltkrzw"
AC_CHECK_LIB(tkrzw, tkrzw_get_last_status,
[
TKRZW_LIBS="-ltkrzw"
AC_DEFINE([ENABLE_TKRZW], [1], [Enable tkrzw db backend])
], [ AC_MSG_ERROR(Unable to find tkrzw) ])
AC_SUBST([TKRZW_LIBS])
AC_SUBST([TKRZW_CFLAGS])
;;
leveldb)
AC_CHECK_LIB([leveldb], [leveldb_open])
AC_DEFINE([ENABLE_LEVELDB], [1], [Enable leveldb db backend])
@@ -98,7 +108,7 @@ case "${with_db_backend}" in
AC_DEFINE([ENABLE_KYOTOCABINET], [1], [Enable kyotocabinet db backend])
;;
*)
AC_MSG_ERROR([Unsupported db-backend])
AC_MSG_ERROR([Unsupported db-backend "${with_db_backend}"])
esac

AC_DEFINE_UNQUOTED(DB_BACKEND, ["${with_db_backend}"], [Database backend])
2 changes: 1 addition & 1 deletion src/duc/cmd-gui.c
@@ -235,7 +235,7 @@ int gui_main(duc *duc, int argc, char *argv[])

int r = duc_open(duc, opt_database, DUC_OPEN_RO);
if(r != DUC_OK) {
duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
//duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
return -1;
}

88 changes: 88 additions & 0 deletions src/duc/cmd-histogram.c
@@ -0,0 +1,88 @@

#include "config.h"

#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <math.h>
#include <locale.h>

#include "cmd.h"
#include "duc.h"

static bool opt_apparent = false;
static bool opt_base = false;
static bool opt_bytes = false;
static char *opt_database = NULL;

static int histogram_db(duc *duc, char *file)
{
struct duc_index_report *report;
duc_size_type st = opt_apparent ? DUC_SIZE_TYPE_APPARENT : DUC_SIZE_TYPE_ACTUAL;
int i = 0;

int r = duc_open(duc, file, DUC_OPEN_RO);
if(r != DUC_OK) {
duc_log(duc, DUC_LOG_FTL, "%s", duc_strerror(duc));
return -1;
}

while(( report = duc_get_report(duc, i)) != NULL) {

printf("Path: %s\n%3s %10s %10s\n",report->path,"Bkt","Size","Count");

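size_t count;
size_t bucket_size = 0;
char pretty[32];
setlocale(LC_NUMERIC, "");
/* 'b' is the bucket index; a distinct name avoids shadowing the report index 'i' */
for (int b=0; b < report->histogram_buckets; b++) {
count = report->histogram[b];
bucket_size = pow(2, b);
humanize(bucket_size, 0, 1024, pretty, sizeof pretty);
/* count is a size_t, so print it with %zu; the ' flag adds locale digit grouping */
printf("%3d %10s %'10zu\n", b, pretty, count);
}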

duc_index_report_free(report);
printf("\n");
i++;
}

duc_close(duc);

return 0;
}


static int histogram_main(duc *duc, int argc, char **argv)
{
return(histogram_db(duc, opt_database));
}


static struct ducrc_option options[] = {
{ &opt_apparent, "apparent", 'a', DUCRC_TYPE_BOOL, "show apparent instead of actual file size" },
{ &opt_bytes, "bytes", 'b', DUCRC_TYPE_BOOL, "show bucket size in exact number of bytes" },
{ &opt_database, "database", 'd', DUCRC_TYPE_STRING, "select database file to use [~/.duc.db]" },
{ &opt_base, "base10", 't', DUCRC_TYPE_BOOL, "show histogram in base 10 bucket spacing, default base2 bucket sizes." },
{ NULL }
};


struct cmd cmd_histogram = {
.name = "histogram",
.descr_short = "Dump histogram of file sizes found.",
.usage = "[options]",
.main = histogram_main,
.options = options,
};


/*
* End
*/
