Recoll
Recoll is a desktop search tool that provides full-text search in a GUI with few mandatory external dependencies. It runs on many Unix-like operating systems and is mostly independent of the desktop environment.
Installing

sudo apt install recoll recollcmd libimage-exiftool-perl antiword unrtf poppler-utils libwpd-tools untex
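After installation, a quick check can confirm that the helper programs are on the PATH. The following is a minimal sketch; the command names (for example pdftotext for poppler-utils, exiftool for libimage-exiftool-perl, wpd2html for libwpd-tools) are assumptions about what each package installs and may differ between distributions.

```shell
#!/bin/sh
# Sketch: report which helper commands are available on PATH.
# The command names below are assumptions about what each package provides.
missing=0
for tool in recollindex exiftool antiword unrtf pdftotext wpd2html untex; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing helper command(s) missing"
```

A missing helper usually only means that the corresponding file type is not indexed, so the check is informational rather than a hard requirement.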
Configuring
First, create the directory that will hold the Recoll configuration and index, then copy the example configuration into it:
mkdir -p /var/www/recoll
cp -R /usr/share/recoll/examples/* /var/www/recoll/
Open the recoll.conf file in its new location and change the following two values.
topdirs is the location of the files to be indexed and, in turn, searched.
topdirs = /var/www/html
cachedir is the location where the indexes are actually stored on the system.
cachedir = /var/www/recoll
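The two values can also be set without opening an editor. The following is a minimal sketch, not an official Recoll tool: the helper set_conf and the variables CONF_DIR and CONF_FILE are hypothetical names, and CONF_DIR defaults to a scratch directory so that the sketch can be tried without touching a real configuration.

```shell
#!/bin/sh
# Sketch: set topdirs and cachedir in recoll.conf non-interactively.
# CONF_DIR defaults to a scratch directory (an assumption for safe testing);
# run with CONF_DIR=/var/www/recoll to change the real configuration.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/recoll.conf"

# set_conf KEY VALUE: rewrite an existing "KEY = ..." line, or append one.
# Assumes VALUE contains no '|', '&' or backslash characters.
set_conf() {
    key="$1"
    value="$2"
    touch "$CONF_FILE"
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$CONF_FILE"; then
        # Replace the existing assignment in place; a .bak copy is kept.
        sed -i.bak "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$CONF_FILE"
    else
        printf '%s = %s\n' "$key" "$value" >> "$CONF_FILE"
    fi
}

set_conf topdirs /var/www/html
set_conf cachedir /var/www/recoll

# Show the resulting values.
grep -E '^(topdirs|cachedir) ' "$CONF_FILE"
```

Commented-out assignments are left alone; if only a commented line exists, the helper appends a new uncommented one at the end of the file.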
Indexing
Now index the system. Depending on the size of the site, this may take a long time.
recollindex -c /var/www/recoll -z
The full usage list for recollindex can be displayed with the -h option, as seen below.
recollindex: Usage:
recollindex [-h]
    Print help
recollindex [-z|-Z] [-k]
    Index everything according to configuration file
    -z : reset database before starting indexing
    -Z : in place reset: consider all documents as changed. Can also
         be combined with -i or -r but not -m
    -k : retry files on which we previously failed
    --diagsfile <outputpath> : list skipped or otherwise not indexed documents to <outputpath>
       <outputpath> will be truncated
recollindex -m [-w <secs>] -x [-D] [-C]
    Perform real time indexing. Don't become a daemon if -D is set.
    -w sets number of seconds to wait before starting.
    -C disables monitoring config for changes/reexecuting.
    -n disables initial incremental indexing (!and purge!).
    -x disables exit on end of x11 session
recollindex -e [<filepath [path ...]>]
    Purge data for individual files. No stem database updates.
    Reads paths on stdin if none is given as argument.
recollindex -i [-f] [-Z] [<filepath [path ...]>]
    Index individual files. No database purge or stem database updates
    Will read paths on stdin if none is given as argument
    -f : ignore skippedPaths and skippedNames while doing this
recollindex -r [-K] [-f] [-Z] [-p pattern] <top>
    Recursive partial reindex.
    -p : filter file names, multiple instances are allowed, e.g.:
         -p *.odt -p *.pdf
    -K : skip previously failed files (they are retried by default)
recollindex -l
    List available stemming languages
recollindex -s <lang>
    Build stem database for additional language <lang>
recollindex -E
    Check configuration file for topdirs and other paths existence
recollindex --webcache-compact : recover wasted space from the Web cache
recollindex --webcache-burst <targetdir> : extract entries from the Web cache to the target
recollindex --notindexed [filepath [filepath ...]] : check if the file arguments are indexed
    will read file paths from stdin if there are no arguments
recollindex -S
    Build aspell spelling dictionary.>
Common options:
    -c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
    -d : call fadvise() with the POSIX_FADV_DONTNEED flag on indexed files
         (avoids trashing the page cache)
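A few of the modes from the usage text above can be combined into a small maintenance script. The following is a sketch under stated assumptions: the wrapper run_index, the DRY_RUN switch, and the paths /var/www/html/docs and /var/www/html/index.html are hypothetical and only illustrate the flags. By default the script only prints the commands; setting DRY_RUN=0 actually runs them.

```shell
#!/bin/sh
# Sketch: recollindex invocations built only from flags listed in the usage text.
CONF_DIR="${CONF_DIR:-/var/www/recoll}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# run_index ARGS...: print the command, and run it unless in dry-run mode.
run_index() {
    echo "recollindex -c $CONF_DIR $*"
    if [ "$DRY_RUN" = "0" ]; then
        recollindex -c "$CONF_DIR" "$@"
    fi
}

# Check that topdirs and the other configured paths exist.
run_index -E
# Recursive partial reindex of PDF files under a hypothetical subdirectory.
# The pattern is quoted so that the shell does not expand it.
run_index -r -p '*.pdf' /var/www/html/docs
# Index one individual (hypothetical) file.
run_index -i /var/www/html/index.html
# Retry files that failed on a previous run.
run_index -k
```

Keeping -c in the wrapper means every call uses the same configuration directory, which avoids accidentally indexing into the default per-user configuration.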
WebUI
Run standalone
Install the required dependencies.
apt install -y git python3-waitress
Then download the WebUI to a directory that is not accessible via the web server.
git clone https://framagit.org/medoc92/recollwebui.git /var/www/search
Run webui-standalone.py and connect to http://localhost:8080.
/var/www/search/webui-standalone.py -c /var/www/recoll
Several optional command-line arguments are available:
-h, --help            show this help message and exit
-a ADDR, --addr ADDR  address to bind to [127.0.0.1]
-p PORT, --port PORT  port to listen on [8080]
-c CONFDIR, --config CONFDIR
                      Recoll configuration directory to use
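The options above can be combined, for example to serve the WebUI on a different port while keeping it bound to the loopback address. The following is a minimal sketch: port 8081 and the variables WEBUI, ADDR, PORT, CONF_DIR and CMD are illustrative assumptions, and the script only prints the command if the WebUI is not installed at the expected path.

```shell
#!/bin/sh
# Sketch: start the standalone WebUI with explicit address, port and config.
WEBUI="${WEBUI:-/var/www/search/webui-standalone.py}"
CONF_DIR="${CONF_DIR:-/var/www/recoll}"
ADDR="${ADDR:-127.0.0.1}"   # loopback only, the documented default
PORT="${PORT:-8081}"        # hypothetical alternative to the default 8080

CMD="$WEBUI -a $ADDR -p $PORT -c $CONF_DIR"

if [ -x "$WEBUI" ]; then
    # The script exists and is executable: run it in the foreground.
    "$WEBUI" -a "$ADDR" -p "$PORT" -c "$CONF_DIR"
else
    echo "WebUI not found at $WEBUI; the command would be:"
    echo "$CMD"
fi
```

With these values the interface would be reachable at http://localhost:8081 from the same machine only.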
Nginx WSGI/CGI Setup
apt install -y pwgen git nginx xxd apache2-utils
git clone https://framagit.org/medoc92/recollwebui.git /var/www/html/search
sed -i 's/127.0.0.1/0.0.0.0/g' /var/www/html/search/webui-standalone.py