
Recoll

From MattWiki

Recoll is a desktop search tool that provides full-text search through a GUI and has few mandatory external dependencies. It runs on many Unix-like operating systems and is mostly independent of the desktop environment.

Installing

Note: This page was written with Ubuntu 24.04 LTS in mind and may not work correctly with other versions or distributions.
sudo apt install recoll recollcmd libimage-exiftool-perl antiword unrtf poppler-utils libwpd-tools untex
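Recoll extracts text from many document formats by calling external helper programs, which is why the extra packages above are installed alongside it. A small check like the one below can confirm that the helpers ended up on the PATH. It is a minimal sketch: `check_commands` and `RECOLL_HELPERS` are hypothetical names, and the command list assumes which program each package provides (for example `pdftotext` from poppler-utils, `exiftool` from libimage-exiftool-perl, `wpd2html` from libwpd-tools and `recollq` from recollcmd).

```shell
# Hypothetical list of commands assumed to come from the packages above.
RECOLL_HELPERS="recollindex recollq exiftool antiword unrtf pdftotext wpd2html untex"

# Print every command given as an argument that cannot be found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as check_commands $RECOLL_HELPERS; the variable is left unquoted on purpose so that it splits into separate command names. Each missing command is reported on standard error.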

Configuring

Start by creating the directory that will hold the Recoll configuration and index, then copy the example configuration into it:

mkdir -p /var/www/recoll
cp -R /usr/share/recoll/examples/* /var/www/recoll/

Open recoll.conf in its new location and change the following two values.

topdirs is the list of directories whose files are indexed, and in turn searched. Multiple directories are separated by spaces.

topdirs = /var/www/html

cachedir is the directory under which Recoll actually stores the index on the system.

cachedir = /var/www/recoll
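For scripted setups, the two values can also be set without opening an editor. The sketch below assumes the simple name = value line format shown above and is not part of Recoll; `set_recoll_option` is a hypothetical helper. Because recoll.conf can contain [directory] sections whose settings apply only to that subtree, it writes the new assignment at the top of the file and only drops existing assignments that appear before the first section header.

```shell
# Hypothetical helper (not part of Recoll): set "name = value" in the global
# part of a recoll.conf-style file. Existing global assignments of the same
# name are dropped and the new one is written as the first line, so it can
# never land inside a [directory] section.
set_recoll_option() {
  file="$1"; name="$2"; value="$3"
  tmp="$file.tmp.$$"
  awk -v name="$name" -v value="$value" '
    BEGIN { print name " = " value; in_global = 1 }
    # A line starting with "[" opens a per-directory section.
    /^[ \t]*\[/ { in_global = 0 }
    # Skip older global assignments of this option.
    in_global && $0 ~ ("^[ \t]*" name "[ \t]*=") { next }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: set_recoll_option /var/www/recoll/recoll.conf topdirs /var/www/html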

Indexing

Now index the site. Depending on its size, this may take a long time.

recollindex -c /var/www/recoll -z
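The -z flag resets the database, which is normally only wanted for the first run or a full rebuild. Running recollindex without -z updates the existing index incrementally, reindexing only files that changed since the previous run. The wrapper below is a minimal sketch of that split; `update_recoll_index`, `RECOLL_CONF_DIR` and the `RECOLLINDEX` override (useful for a dry run with RECOLLINDEX=echo) are hypothetical names, and the configuration directory defaults to the one used above.

```shell
# Hypothetical wrapper around recollindex. RECOLLINDEX can be overridden,
# e.g. RECOLLINDEX=echo for a dry run that only prints the arguments.
RECOLL_CONF_DIR="${RECOLL_CONF_DIR:-/var/www/recoll}"

update_recoll_index() {
  if [ "${1:-}" = "--full" ]; then
    # Full rebuild: -z resets the database before indexing starts.
    "${RECOLLINDEX:-recollindex}" -c "$RECOLL_CONF_DIR" -z
  else
    # Incremental update: only files changed since the last run are reindexed.
    "${RECOLLINDEX:-recollindex}" -c "$RECOLL_CONF_DIR"
  fi
}
```

For example, run update_recoll_index --full once, then update_recoll_index from a scheduled job such as cron.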

The full usage list for recollindex can be displayed with the -h option, as shown below.

recollindex: Usage:
recollindex [-h]
    Print help
recollindex [-z|-Z] [-k]
    Index everything according to configuration file
    -z : reset database before starting indexing
    -Z : in place reset: consider all documents as changed. Can also
         be combined with -i or -r but not -m
    -k : retry files on which we previously failed
    --diagsfile <outputpath> : list skipped or otherwise not indexed documents to <outputpath>
       <outputpath> will be truncated
recollindex -m [-w <secs>] -x [-D] [-C]
    Perform real time indexing. Don't become a daemon if -D is set.
    -w sets number of seconds to wait before starting.
    -C disables monitoring config for changes/reexecuting.
    -n disables initial incremental indexing (!and purge!).
    -x disables exit on end of x11 session
recollindex -e [<filepath [path ...]>]
    Purge data for individual files. No stem database updates.
    Reads paths on stdin if none is given as argument.
recollindex -i [-f] [-Z] [<filepath [path ...]>]
    Index individual files. No database purge or stem database updates
    Will read paths on stdin if none is given as argument
    -f : ignore skippedPaths and skippedNames while doing this
recollindex -r [-K] [-f] [-Z] [-p pattern] <top>
   Recursive partial reindex.
     -p : filter file names, multiple instances are allowed, e.g.:
        -p *.odt -p *.pdf
     -K : skip previously failed files (they are retried by default)
recollindex -l
    List available stemming languages
recollindex -s <lang>
    Build stem database for additional language <lang>
recollindex -E
    Check configuration file for topdirs and other paths existence
recollindex --webcache-compact : recover wasted space from the Web cache
recollindex --webcache-burst <targetdir> : extract entries from the Web cache to the target
recollindex --notindexed [filepath [filepath ...]] : check if the file arguments are indexed will read file paths from stdin if there are no arguments
recollindex -S
    Build aspell spelling dictionary.>
Common options:
    -c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
    -d : call fadvise() with the POSIX_FADV_DONTNEED flag on indexed files
          (avoids trashing the page cache)
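When only part of the site has changed, the -r and -p options listed above can reindex a single subtree instead of everything in topdirs. The helper below is a minimal sketch that turns each pattern argument into a separate -p option; `reindex_subtree`, `RECOLL_CONF_DIR` and `RECOLLINDEX` are hypothetical names, and /var/www/html/docs in the usage line is only an example path.

```shell
# Hypothetical helper: recursively reindex one subtree, optionally limited
# to file-name patterns (each one passed to recollindex as "-p PATTERN").
RECOLL_CONF_DIR="${RECOLL_CONF_DIR:-/var/www/recoll}"

reindex_subtree() {
  top="$1"
  shift
  # Rewrite the remaining arguments PAT1 PAT2 ... as -p PAT1 -p PAT2 ...
  for pat in "$@"; do
    shift
    set -- "$@" -p "$pat"
  done
  "${RECOLLINDEX:-recollindex}" -c "$RECOLL_CONF_DIR" -r "$@" "$top"
}
```

For example: reindex_subtree /var/www/html/docs '*.pdf' '*.odt'. Quote the patterns so that the shell passes them through to recollindex instead of expanding them against the current directory.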

WebUI

Run standalone

Install the required dependencies.

apt install -y git python3-waitress

Then download the WebUI to a directory that is not accessible through the web server.

git clone https://framagit.org/medoc92/recollwebui.git /var/www/search

Run webui-standalone.py with -c pointing at the Recoll configuration directory, then connect to http://localhost:8080.

/var/www/search/webui-standalone.py -c /var/www/recoll

The following optional command-line arguments are available:

-h, --help            show this help message and exit
-a ADDR, --addr ADDR  address to bind to [127.0.0.1]
-p PORT, --port PORT  port to listen on [8080]
-c CONFDIR, --config CONFDIR Recoll configuration directory to use
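These options can be combined, for example to make the standalone server reachable from other hosts on a different port. The launcher below is a minimal sketch; `webui_start`, `WEBUI` and `RECOLL_CONF_DIR` are hypothetical names, and it assumes the WebUI was cloned to /var/www/search as above.

```shell
# Hypothetical launcher for the standalone WebUI cloned to /var/www/search.
# Set WEBUI=echo to print the arguments instead of starting the server.
RECOLL_CONF_DIR="${RECOLL_CONF_DIR:-/var/www/recoll}"

webui_start() {
  addr="${1:-127.0.0.1}"  # passed as -a: address to bind to
  port="${2:-8080}"       # passed as -p: port to listen on
  "${WEBUI:-/var/www/search/webui-standalone.py}" \
    -a "$addr" -p "$port" -c "$RECOLL_CONF_DIR"
}
```

webui_start 0.0.0.0 8081 binds to all interfaces on port 8081; with no arguments it uses the same defaults as the help text (127.0.0.1 and 8080).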

Nginx WSGI/CGI Setup

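Install the dependencies and clone the WebUI into the web root:

apt install -y pwgen git nginx xxd apache2-utils
git clone https://framagit.org/medoc92/recollwebui.git /var/www/html/search
sed -i 's/127\.0\.0\.1/0.0.0.0/g' /var/www/html/search/webui-standalone.py

The sed command replaces 127.0.0.1 with 0.0.0.0 in webui-standalone.py, so the standalone server binds to all interfaces by default instead of only to localhost.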

Apache WSGI/CGI Setup

Command Line Interface