dupd(1)                                                                dupd(1)



NAME
       dupd - find duplicate files

SYNOPSIS
       dupd COMMAND [OPTIONS]

DESCRIPTION
       dupd scans all the files in the given path(s) to find files with dupli‐
       cate content.

       The sets of duplicate files are not displayed during a scan.   Instead,
       the  duplicate  info is saved into a database which can be queried with
       subsequent commands without having to scan all files again.

       Even though dupd can be used as a simple duplicate reporting tool simi‐
       lar  to  how  other duplicate finders work (by running dupd scan ; dupd
       report), the real power of dupd comes from interactively exploring  the
       filesystem  for  duplicates after the scan has completed. See the file,
       ls, dups, uniques and refresh commands.

       Additional documentation and examples  are  available  under  the  docs
       directory  in the source tree. If you don't have the source tree avail‐
       able, see https://github.com/jvirkki/dupd/blob/master/docs/index.md

COMMANDS
       As noted in the synopsis, the first argument to dupd must be  the  com‐
       mand to run.  The command is one of:

       scan - scan files looking for duplicates

       report - show duplicate report from last scan

       file - check for duplicates of one file

       ls  - list info about every file

       dups - list all duplicate files

       uniques - list all unique files

       refresh - remove deleted files from the database

       validate - revalidate all duplicates in database

       rmsh - create shell script to delete all duplicates (use with care!)

       help - show brief usage info

       usage - show this documentation

       man - show this documentation

       license - show license info

       version - show version and exit

OPTIONS
       scan - Perform the filesystem scan for duplicates.

       -p, --path PATH
              Recursively  scan the directory tree starting at this path.  The
              path option can be given  multiple  times  to  specify  multiple
              directory  trees  to  scan.   If  no  path  option is given, the
              default is to start scanning from the current directory.

       -m, --minsize SIZE
              Minimum size (in bytes) to include  in  scan.   By  default  all
              files  with  1 byte or more are scanned.  In practice duplicates
              in files that small are rarely interesting, so you can speed  up
              the scan by ignoring smaller files.

       --buflimit LIMIT
              Limit  read  buffer  size.  LIMIT may be an integer in bytes, or
              include a suffix of M for megabytes or G for gigabytes. The scan
              animation  shows  the  percentage  of  buffer space in use (%b).
              Unless that value goes up to 100% or beyond during a scan  there
              is  no  need  to adjust this limit.  Setting this limit to a low
              value will constrain dupd memory usage but possibly at a cost to
              performance (depends on the data set).

       -X, --one-file-system
              For each path scanned, do not cross over to a different filesys‐
              tem.  This is helpful, for example, if you want to  scan  /  but
              want  to  avoid any other mounted filesystems such as NFS mounts
              or external drives.

       --hidden
              Include hidden files (and hidden directories) in the  scan.   By
              default these are not included.

       --db PATH
              Override  the  default  database  file location.  The default is
              $HOME/.dupd_sqlite.  If  you  override  the  path  during  scan,
              remember  to  provide  this argument and the path for subsequent
              operations so the database can be found.

       -I, --hardlink-is-unique
              Consider hard links to the same  file  content  as  unique.   By
              default  hard  links  are  listed as duplicates.  See HARD LINKS
              section below.  Note that if this option is given  during  scan,
              it cannot be given during interactive operations.

       --stats-file FILE
              On  completion,  create  (or append to) FILE and save some stats
              from the run.  These are the same stats as get displayed in ver‐
              bose mode but are more suitable for programmatic consumption.

       report - Display the list of duplicates.

       --cut PATHSEG
              Remove  prefix PATHSEG from the file paths in the report output.
              This can reduce clutter in the output  text  if  all  the  files
              scanned share a long identical prefix.

       --minsize SIZE
              Report only duplicate sets which consume at least this much disk
              space.  Note this is the total size occupied by all  the  dupli‐
              cates in a set, not their individual file size.

       --format NAME
              Produce  the report in this output format.  NAME is one of text,
              csv, json.  The default is text.

       Note: The database format generated by scan is  not  guaranteed  to  be
       compatible  with  future  versions.  You should run report (and all the
       other commands below which access the database) using the same  version
       of dupd that was used to generate the database.

       file - Report duplicate status of one file.

       To check whether one given file still has known duplicates use the file
       operation.  Note that this does not do a new scan so it will  not  find
       new  duplicates.   This checks whether the duplicates identified during
       the previous scan still exist and verifies (by hash) whether  they  are
       still duplicates.

       --file PATH
              Required: The file to check

       --cut PATHSEG
              Remove prefix PATHSEG from the file paths in the report output.

       --exclude PATH
              Ignore  any  duplicates  under  PATH  when reporting duplicates.
              This is useful if you intend to delete  the  entire  tree  under
              PATH, to make sure you don't delete all copies of the file.

       --hardlink-is-unique
              Ignore  the  existence of hard links to the file for the purpose
              of considering whether the file is unique.

       ls, uniques, dups - List matching files.

       While the file command checks the duplicate status of  a  single  file,
       these commands do the same for all the files in a given directory tree.

       ls - List all files, show whether they have duplicates or not.

       uniques - List all unique files.

       dups - List all files which have known duplicates.

       --path PATH
              Start from this directory (default is current directory)

       --cut PATHSEG
              Remove prefix $PATHSEG from the file paths in the output.

       --exclude PATH
              Ignore any duplicates under PATH when reporting duplicates.

       --hardlink-is-unique
              Ignore  the  existence of hard links to the file for the purpose
              of considering whether the file is unique.

       refresh - Refreshing the database.

       As you remove duplicate files these are still listed in the dupd  data‐
       base.   Ideally you'd run the scan again to rebuild the database.  Note
       that re-running the scan after deleting some  duplicates  can  be  very
       fast because the files are in the cache, so that is the best option.

       However,  when dealing with a set of files large enough that they don't
       fit in the cache, re-running the scan may take a long time.  For  those
       cases the refresh command offers a much faster alternative.

       The  refresh  command checks whether all the files in the dupd database
       still exist and removes those which do not.

       Be sure to consider the limitations of this approach.  The refresh com‐
       mand  does  not  re-verify  whether  all files listed as duplicates are
       still duplicates.  It also, of course, does not detect any  new  dupli‐
       cates which may have appeared since the last scan.

       In  summary, if you have only been deleting duplicates since the previ‐
       ous scan, run the refresh command.  It will prune all the deleted files
       from the database and will be much faster than a scan.  However, if you
       have been adding and/or modifying files since the last scan, it is best
       to run a new scan.

       validate - Validating the database.

       The  validate operation is primarily for testing but is documented here
       as it may be useful if you want to reconfirm that all duplicates in the
       database are still truly duplicates.

       In  most  cases  you  will  be better off re-running the scan operation
       instead of using validate.

       Validate is fairly slow as it will fully hash every file in  the  data‐
       base.

       rmsh - Create shell scrip to remove duplicate files.

       As a policy dupd never modifies the filesystem!

       As  a convenience for those times when it is desirable to automatically
       remove files, this operation can create a shell script to do  so.   The
       output  is  a shell script (to stdout) which can you run to delete your
       files (if you're feeling lucky).

       Review the generated script carefully to see if it truly does what  you
       want!

       Automated  deletion is generally not very useful because it takes human
       intervention to decide which of the duplicates is the best one to  keep
       in  each  case.   While the content is the same, one of them may have a
       better file name and/or location.

       Optionally, the shell script can create either soft or hard links  from
       each  removed  file  to  the copy being kept.  The options are mutually
       exclusive.

       --link Create symlinks for deleted files.

       --hardlink
              Create hard links for deleted files.

       Additional global options

       -q     Quiet, suppress all output.

       -v     Verbose mode.  Can be repeated multiple times for ever  increas‐
              ing verbosity.

       -V, --verbose-level N
              Set the logging verbosity level directly to N.

       -h     Show brief help summary.

       --db PATH
              Override the default database file location.

       -F, --hash NAME
              Specify an different hash function.  This applies to any command
              which uses content hashing.  NAME is one  of:  md5  sha1  sha512
              xxhash

HARD LINKS
       Are  hard  links duplicates or not?  The answer depends on "what do you
       mean by duplicates?" and "what are you trying to do?"

       If your primary goal for removing duplicates is to save disk space then
       it  makes  sense to ignore hardlinks.  If, on the other hand, your pri‐
       mary goal is to reduce filesystem clutter then it makes more  sense  to
       think of hardlinks as duplicates.

       By  default dupd considers hardlinks as duplicates. You can switch this
       around with the --hardlink-is-unique option.  This option can be  given
       either  during scan or to the interactive reporting commands (file, ls,
       uniques, dups).

EXAMPLES
       Scan all files in your home directory and then show the sets of  dupli‐
       cates found:

              % dupd scan --path $HOME

              % dupd report

       Show  duplicate status (duplicate or unique) for all files in docs sub‐
       directory:

              % dupd ls --path docs

       I'm about to delete docs/old.doc but want to check one last  time  that
       it is a duplicate and I want to review where those duplicates are:

              % dupd file --file docs/old.doc -v

       Read  the documentation in the dupd 'docs' directory or online documen‐
       tation for more usage examples.

EXIT
       dupd exits with status code 0 on success, non-zero on error.

SEE ALSO
       sqlite3(1)

       https://github.com/jvirkki/dupd/blob/master/docs/index.md



                                                                       dupd(1)
