% arara: pdflatex
% arara: pdflatex
% arara: pdflatex
% arara: clean: { extensions: [ listing, out, aux, log, toc ] }
\documentclass[12pt,a4paper]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

\usepackage[english]{babel}
\usepackage{tgpagella}

\usepackage[margin=1in]{geometry}

\usepackage[svgnames]{xcolor}
\usepackage[colorlinks, linkcolor={blue}, urlcolor={blue}]{hyperref}

\usepackage[breakable]{tcolorbox}
\tcbuselibrary{listings}

\newcommand{\checkcites}{\texttt{checkcites}}
\newcommand{\email}[1]{\small\texttt{#1}}
\newcommand{\version}{Version 2.8 from December 14, 2024.}

\newenvironment{infoblock}[1]
  {\par\addvspace{\medskipamount}
   \begin{tcolorbox}[colframe=DarkTurquoise,coltitle=black,fonttitle=\bfseries,title=#1]}
  {\end{tcolorbox}\addvspace{\medskipamount}}

\newenvironment{terminal}
  {\par\addvspace{\medskipamount}
   \begin{tcolorbox}[colframe=DarkTurquoise]}
  {\end{tcolorbox}\par\addvspace{\medskipamount}}

\title{The \checkcites\footnote{\version}\ \ script}
\author{%
  Enrico Gregorio\\\email{Enrico.Gregorio@univr.it}\\[3ex]
  Island of \TeX\\\email{https://gitlab.com/islandoftex}%
}
\date{}

\begin{document}

\maketitle

\tableofcontents

\section{Introduction}
\label{sec:intro}

\checkcites\ is a Lua script written for the sole purpose of
detecting unused or undefined references by inspecting \LaTeX\
auxiliary and bibliography files. We use the term \emph{unused
reference} for a reference present in the bibliography file~--~with
the \verb|.bib| extension~--~but not cited in the \verb|.tex| file.
The term \emph{undefined reference} is exactly the opposite, i.e.,
an item cited in the \verb|.tex| file but not present in the
\verb|.bib| file.

The    original     idea    came     from    a     question    posted
in    the     \TeX\    community    at    Stack     Exchange    about
\href{http://tex.stackexchange.com/questions/43276}{how    to   check
which  bibliography entries  were not  used}. We  decided to  write a
script  to check  references. We  opted for  Lua, since it is a  very
straightforward language and it has an interpreter available on every
modern \TeX\ distribution.

\begin{infoblock}{Attention!}
From version 2.1 on, \checkcites\ relies on specific libraries
available in the \verb|texlua| ecosystem and is therefore not
supported in vanilla \verb|lua| interpreters. Please make sure to use
this script with an updated \verb|texlua| interpreter in order to
ensure the correct behaviour.
\end{infoblock}

\section{How the script works}
\label{sec:howto}

\checkcites\ uses the generated auxiliary files to start the
analysis. From version 2.0 on, the script supports two backends:

\begin{description}
\item[\texttt{bibtex}] This is the default backend. The script checks
\verb|.aux|   files   looking  for   citations,   in   the  form   of
\verb|\citation{a}|.   For   every   \verb|\citation|   line   found,
\checkcites\  will   extract  the  citations   and  add  them   to  a
table,  even  for  multiple   citations  separated  by  commas,  like
\verb|\citation{a,b,c}|.  The citation  table  contains no  duplicate
values. At  the same  time \checkcites\  also looks  for bibliography
data,  in  the  form  of  \verb|\bibdata{a}|.  Similarly,  for  every
\verb|\bibdata| line found, the  script will extract the bibliography
data and add them  to a table, even if they  are separated by commas,
like \verb|\bibdata{d,e,f}|. Again, no  duplicate values are allowed.
Stick with this backend if you  are using Bib\TeX\ or Bib\LaTeX\ with
the \verb|backend=bibtex| package option.

\item[\texttt{biber}]   With   this   backend,  the   script   checks
\verb|.bcf| files (which are XML-based) looking for citations, in the
form of  \verb|bcf:citekey| tags.  For every tag  found, \checkcites\
will extract  the corresponding values and  add them to a  table. The
citation  table  contains  no  duplicate values.  At  the  same  time
\checkcites\  also  looks  for  bibliography data,  in  the  form  of
\verb|bcf:datasource|  tags.  Similarly,  for every  tag  found,  the
script will  extract the bibliography data  and add them to  a table.
Again, no  duplicate values are  allowed. Stick with this  backend if
you  are  using Bib\LaTeX\  with  the  default  options or  with  the
\verb|backend=biber| option explicitly set. It  is important to note,
however, that the \verb|glob=true| option is not supported yet.
\end{description}
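
The extraction step of the \verb|bibtex| backend can be sketched in a
few lines. The listing below is only an illustration written in Python,
not the Lua code of \checkcites\ itself; the sample content, the
function name \verb|extract| and the regular expression are
assumptions made for this example.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of the citation lookup (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
import re

# Sample .aux content, assumed for illustration only.
AUX = r"""
\relax
\citation{a}
\citation{a,b,c}
\bibstyle{plain}
\bibdata{d,e}
"""

def extract(command, text):
    """Collect the unique arguments of the given command."""
    seen = []
    pattern = r"\\" + command + r"\{([^}]*)\}"
    for group in re.findall(pattern, text):
        # One line may hold several comma-separated values.
        for item in group.split(","):
            item = item.strip()
            if item and item not in seen:
                seen.append(item)
    return seen

citations = extract("citation", AUX)
bibdata = extract("bibdata", AUX)
print(citations)  # ['a', 'b', 'c']
print(bibdata)    # ['d', 'e']
\end{tcblisting}

A list with a membership test keeps the order in which the keys were
first seen while still discarding duplicates.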

\begin{infoblock}{Attention!}
If \verb|\citation{*}| (Bib\TeX) or simply \verb|*| (Bib\LaTeX)
is found, \checkcites\ will issue a message stating that
\verb|\nocite{*}| is in the \verb|.tex| document, but the script will
do the check nonetheless.
\end{infoblock}
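
For the \verb|biber| backend, the same idea applies to the XML tags.
The following Python listing is a rough sketch under stated
assumptions: the \verb|.bcf| fragment is heavily simplified (real
files carry more elements and attributes), and the helper
\verb|tag_values| is a hypothetical name. It also shows one possible
way to notice the \verb|*| key; how \checkcites\ handles it internally
is not covered here.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of the tag lookup (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
import re

# Simplified .bcf fragment, assumed for illustration only.
BCF = """
<bcf:datasource type="file">example.bib</bcf:datasource>
<bcf:citekey order="1">foo:2012a</bcf:citekey>
<bcf:citekey order="2">foo:2012c</bcf:citekey>
<bcf:citekey order="3">foo:2012a</bcf:citekey>
<bcf:citekey order="0">*</bcf:citekey>
"""

def tag_values(tag, text):
    """Return the unique contents of the given bcf tag."""
    pattern = r"<bcf:%s\b[^>]*>([^<]*)</bcf:%s>" % (tag, tag)
    values = []
    for value in re.findall(pattern, text):
        value = value.strip()
        if value and value not in values:
            values.append(value)
    return values

keys = tag_values("citekey", BCF)
cite_all = "*" in keys  # corresponds to \nocite{*}
citations = [k for k in keys if k != "*"]
sources = tag_values("datasource", BCF)

print(citations)  # ['foo:2012a', 'foo:2012c']
print(cite_all)   # True
print(sources)    # ['example.bib']
\end{tcblisting}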

Now,  \checkcites\ will  extract  all entries  from the  bibliography
files found  in the previous  steps, regardless of which  backend was
used. For  every element in  the bibliography data table,  the script
will look for entries like \verb|@BOOK|, \verb|@ARTICLE| and so
forth~--~we actually use pattern matching for this~--~and add their
identifiers to a table. No duplicate values are allowed.
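
As an illustration of this kind of pattern matching, here is a small
Python sketch. The regular expression, the \verb|SKIP| set and the
sample entries are assumptions made for this example; the expression
used by \checkcites\ itself may differ.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of the entry lookup (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
import re

# Hypothetical bibliography content.
BIB = """
@BOOK{foo:2012a,
  title = {My Title One}
}

@article{bar:2013,
  title = {Another Title}
}

@BOOK{foo:2012a,
  title = {Duplicate key}
}
"""

# An entry starts with @type{key, (assumed pattern).
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s}]+)\s*,")
# Special blocks that do not define references.
SKIP = {"comment", "string", "preamble"}

references = []
for kind, key in ENTRY.findall(BIB):
    if kind.lower() not in SKIP and key not in references:
        references.append(key)

print(references)  # ['foo:2012a', 'bar:2013']
\end{tcblisting}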

\begin{infoblock}{Attention!}
If \checkcites\ cannot  find a certain bibliography  file, the script
ends. Make sure  to put the correct name of  the bibliography file in
your \verb|.tex| file.
\end{infoblock}

Let $A$ and $B$ be the sets of citations and references,
respectively. In order to get all unused references in the
\verb|.bib| files, we compute the set difference:
\[
B - A = \{ x : x \in B, x \notin A \}.
\]
Similarly,  in  order   to  get  all  undefined   references  in  the
\verb|.tex| file, we compute the set difference:
\[
A - B = \{ x : x \in A, x \notin B \}.
\]
If there are either unused or undefined references, \checkcites\ will
print them in a list format. Section~\ref{sec:usage} explains in more
detail how to use the script.
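
The two set differences can be tried out directly. The Python sketch
below uses hypothetical single-letter keys for $A$ and $B$; it only
illustrates the computation and is not part of the script.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of the set differences (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
# Hypothetical citation and reference keys.
A = {"a", "c", "d", "f"}       # cited in the .tex file
B = {"a", "b", "c", "d", "e"}  # present in the .bib file

unused = sorted(B - A)     # in B but not in A
undefined = sorted(A - B)  # in A but not in B

print(unused)     # ['b', 'e']
print(undefined)  # ['f']
\end{tcblisting}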

\section{Usage}
\label{sec:usage}

\checkcites\ is  very easy to  use. First of  all, let us  define two
files that will be used here to explain the script usage. Here is our
sample  bibliography  file  \verb|example.bib|, with  five  fictional
entries.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Bibliography file, fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
@BOOK{foo:2012a,
  title = {My Title One},
  publisher = {My Publisher One},
  year = {2012},
  editor = {My Editor One},
  author = {Author One}
}

@BOOK{foo:2012b,
  title = {My Title Two},
  publisher = {My Publisher Two},
  year = {2012},
  editor = {My Editor Two},
  author = {Author Two}
}

@BOOK{foo:2012c,
  title = {My Title Three},
  publisher = {My Publisher Three},
  year = {2012},
  editor = {My Editor Three},
  author = {Author Three}
}

@BOOK{foo:2012d,
  title = {My Title Four},
  publisher = {My Publisher Four},
  year = {2012},
  editor = {My Editor Four},
  author = {Author Four}
}

@BOOK{foo:2012e,
  title = {My Title Five},
  publisher = {My Publisher Five},
  year = {2012},
  editor = {My Editor Five},
  author = {Author Five}
}
\end{tcblisting}

The second  file is  our main \LaTeX{}  document, \verb|document.tex|.
Observe that we will stick with  Bib\TeX\ for now and check Bib\LaTeX\
later on.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Main document, fonttitle=\bfseries,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
\documentclass{article}

\begin{document}

Hello world \cite{foo:2012a,foo:2012c},
how are you \cite{foo:2012f},
and goodbye \cite{foo:2012d,foo:2012a}.

\bibliographystyle{plain}
\bibliography{example}

\end{document}
\end{tcblisting}

Open a terminal and run \verb|checkcites|:

\begin{terminal}
\begin{verbatim}
$ checkcites
\end{verbatim}
\tcblower\small
\begin{verbatim}
     _           _       _ _
 ___| |_ ___ ___| |_ ___|_| |_ ___ ___
|  _|   | -_|  _| '_|  _| |  _| -_|_ -|
|___|_|_|___|___|_,_|___|_|_| |___|___|

checkcites.lua -- a reference checker script (v2.8)
Copyright (c) 2012, 2019, Enrico Gregorio, Paulo Cereda
Copyright (c) 2024, Enrico Gregorio, Island of TeX

--------------------------------------------------------------------------
I am sorry, but you have not provided any command line argument, including
files to check and options. Make sure to invoke the script with the actual
arguments. Refer to the user documentation if you are unsure of how this
tool works. The script will end now.
\end{verbatim}
\end{terminal}

If you do not have \checkcites\ installed with your \TeX\
distribution, you can run the standalone script \verb|checkcites.lua|
directly. Vanilla \verb|lua| interpreters are not supported from
version 2.1 on, so use \verb|texlua|, which is shipped with all
modern \TeX\ distributions:
\begin{terminal}
\begin{verbatim}
$ texlua checkcites.lua
\end{verbatim}
\end{terminal}

When you run \checkcites\ without providing any argument to it, an
error message will appear. Do not panic! Try again with the
\verb|--help| flag:
\begin{terminal}
\begin{verbatim}
$ checkcites --help
\end{verbatim}
\tcblower\small
\begin{verbatim}
     _           _       _ _
 ___| |_ ___ ___| |_ ___|_| |_ ___ ___
|  _|   | -_|  _| '_|  _| |  _| -_|_ -|
|___|_|_|___|___|_,_|___|_|_| |___|___|

checkcites.lua -- a reference checker script (v2.8)
Copyright (c) 2012, 2019, Enrico Gregorio, Paulo Cereda
Copyright (c) 2024, Enrico Gregorio, Island of TeX

Usage: checkcites.lua [ [ --all | --unused | --undefined ] [ --backend
<arg> ] <file> [ <file 2> ... <file n> ] | --help | --version ]

-a,--all           list all unused and undefined references
-u,--unused        list only unused references in your bibliography files
-U,--undefined     list only undefined references in your TeX source file
-c,--crossrefs     enable cross-reference checks (disabled by default)
-b,--backend <arg> set the backend-based file lookup policy
-j,--json <file>   export the generated report as a JSON file
-h,--help          print the help message
-v,--version       print the script version

Unless specified, the script lists all unused and undefined references by
default. Also, the default backend is set to "bibtex". Please refer to the
user documentation for more details.
\end{verbatim}
\end{terminal}

Since  we  are  using  Bib\TeX,  we   do  not  need  to  set  up  the
backend!  Simply  provide  the  auxiliary file~--~the  one  with  the
\verb|.aux|  extension~--~which is  generated when  you compile  your
main \verb|.tex|  file. For example,  if your main document  is named
\verb|foo.tex|, you probably  have a \verb|foo.aux| file  too. Let us
compile our sample document \verb|document.tex|:
\begin{terminal}
\begin{verbatim}
$ pdflatex document.tex
\end{verbatim}
\end{terminal}

After running \verb|pdflatex| on our \verb|.tex| file, there is now a
\verb|document.aux| file in our work directory.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Auxiliary file, fonttitle=\bfseries,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
\relax 
\citation{foo:2012a}
\citation{foo:2012c}
\citation{foo:2012f}
\citation{foo:2012d}
\citation{foo:2012a}
\bibstyle{plain}
\bibdata{example}
\end{tcblisting}

Now we can run \checkcites\ on the \verb|document.aux| file:

\begin{terminal}
\begin{verbatim}
$ checkcites document.aux
\end{verbatim}
\tcblower\small
\begin{verbatim}
     _           _       _ _
 ___| |_ ___ ___| |_ ___|_| |_ ___ ___
|  _|   | -_|  _| '_|  _| |  _| -_|_ -|
|___|_|_|___|___|_,_|___|_|_| |___|___|

checkcites.lua -- a reference checker script (v2.8)
Copyright (c) 2012, 2019, Enrico Gregorio, Paulo Cereda
Copyright (c) 2024, Enrico Gregorio, Island of TeX

Great, I found 4 citations in 1 file. I also found 1 bibliography file. Let
me check this file and extract the references. Please wait a moment.

Fantastic, I found 5 references in 1 bibliography file. Please wait a
moment while the reports are generated.

--------------------------------------------------------------------------
Report of unused references in your TeX document (that is, references
present in bibliography files, but not cited in the TeX source file)
--------------------------------------------------------------------------

Unused references in your TeX document: 2
=> foo:2012b
=> foo:2012e

--------------------------------------------------------------------------
Report of undefined references in your TeX document (that is, references
cited in the TeX source file, but not present in the bibliography files)
--------------------------------------------------------------------------

Undefined references in your TeX document: 1
=> foo:2012f
\end{verbatim}
\end{terminal}

As  we   can  see  in   the  script  output,   \checkcites\  analyzed
both  \verb|.aux|  and \verb|.bib|  files  and  managed to  find  two
unused  references   in  the   bibliography  file~--~\verb|foo:2012b|
and   \verb|foo:2012e|~--~and   one   undefined  reference   in   the
document~--~\verb|foo:2012f|.

\checkcites\ allows a couple of command  line flags that will tell it
how to behave. For example, check this command line:

\begin{terminal}
\begin{verbatim}
$ checkcites --unused document.aux
\end{verbatim}
\end{terminal}

The \verb|--unused| flag will make the script only look for unused
references in the \verb|.bib| file. The argument order does not
matter; you can also run:

\begin{terminal}
\begin{verbatim}
$ checkcites document.aux --unused
\end{verbatim}
\end{terminal}

The script will behave the same. Similarly, you can use:

\begin{terminal}
\begin{verbatim}
$ checkcites --undefined document.aux
\end{verbatim}
\end{terminal}

The  \verb|--undefined|   flag  will   make  the  script   only  look
for  undefined  references  in  the \verb|.tex|  file.  If  you  want
\checkcites\ to look for both unused and undefined references, run:

\begin{terminal}
\begin{verbatim}
$ checkcites --all document.aux
\end{verbatim}
\end{terminal}

If no  special argument is provided,  the \verb|--all| flag is  set as
default.

Observe  that  our  example  relied  on  the  default  backend,  which
uses  Bib\TeX.  Let   us  change  our  document  a  bit   to  make  it
Bib\LaTeX-compliant:

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Main document, fonttitle=\bfseries,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
\documentclass{article}

\usepackage{biblatex}
\addbibresource{example.bib}

\begin{document}

Hello world \cite{foo:2012a,foo:2012c},
how are you \cite{foo:2012f},
and goodbye \cite{foo:2012d,foo:2012a}.

\printbibliography

\end{document}
\end{tcblisting}

As usual, let's compile our sample document \verb|document.tex|:

\begin{terminal}
\begin{verbatim}
$ pdflatex document.tex
\end{verbatim}
\end{terminal}

After running \verb|pdflatex| on our \verb|.tex| file, there is now a
\verb|document.aux| file in our work directory, as expected. However,
since  we are  using Bib\LaTeX\  as well,  there is  another file  of
interest  in  our  working  directory, one  that  has  a  \verb|.bcf|
extension! In  order to  run \checkcites\ on  that specific  file, we
need to provide the \verb|biber| backend:

\begin{terminal}
\begin{verbatim}
$ checkcites --backend biber document.bcf
\end{verbatim}
\end{terminal}

We can even omit the file extension; the script will automatically
assign one based on the current backend:

\begin{terminal}
\begin{verbatim}
$ checkcites --backend biber document
\end{verbatim}
\end{terminal}
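
The idea of assigning an extension from the backend can be pictured
with the Python sketch below. The mapping \verb|EXTENSIONS| and the
function \verb|with_extension| are hypothetical names, and the rule
used here (append an extension only when the name has none) is an
assumption; the exact rule applied by \checkcites\ may differ.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of the extension lookup (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
import os

# Assumed mapping from backend to the file type it reads.
EXTENSIONS = {"bibtex": ".aux", "biber": ".bcf"}

def with_extension(name, backend="bibtex"):
    """Append the backend's extension when none is given."""
    root, ext = os.path.splitext(name)
    return name if ext else name + EXTENSIONS[backend]

print(with_extension("document", "biber"))      # document.bcf
print(with_extension("document.bcf", "biber"))  # document.bcf
print(with_extension("document"))               # document.aux
\end{tcblisting}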

Now, let  us run \checkcites\  on the \verb|.bcf| file,  providing the
\verb|biber| backend:

\begin{terminal}
\begin{verbatim}
$ checkcites --backend biber document.bcf
\end{verbatim}
\tcblower\small
\begin{verbatim}
     _           _       _ _
 ___| |_ ___ ___| |_ ___|_| |_ ___ ___
|  _|   | -_|  _| '_|  _| |  _| -_|_ -|
|___|_|_|___|___|_,_|___|_|_| |___|___|

checkcites.lua -- a reference checker script (v2.8)
Copyright (c) 2012, 2019, Enrico Gregorio, Paulo Cereda
Copyright (c) 2024, Enrico Gregorio, Island of TeX

Great, I found 4 citations in 1 file. I also found 1 bibliography file. Let
me check this file and extract the references. Please wait a moment.

Fantastic, I found 5 references in 1 bibliography file. Please wait a
moment while the reports are generated.

--------------------------------------------------------------------------
Report of unused references in your TeX document (that is, references
present in bibliography files, but not cited in the TeX source file)
--------------------------------------------------------------------------

Unused references in your TeX document: 2
=> foo:2012b
=> foo:2012e

--------------------------------------------------------------------------
Report of undefined references in your TeX document (that is, references
cited in the TeX source file, but not present in the bibliography files)
--------------------------------------------------------------------------

Undefined references in your TeX document: 1
=> foo:2012f
\end{verbatim}
\end{terminal}

If   you  rely   on  cross-references   in  your   bibliography  file,
\checkcites\  might complain  about  unused entries.  We  can try  the
experimental feature  available from version  2.3 on that  attempts to
check  cross-references through  the  \verb|--crossrefs| command  line
flag:

\begin{terminal}
\begin{verbatim}
$ checkcites --crossrefs document.aux
\end{verbatim}
\end{terminal}

This feature is disabled by default and  it is known to work with both
\verb|bibtex| and \verb|biber| backends. Please  report if you find an
issue.
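
To see why cross-references matter, consider the following Python
sketch. It is not the algorithm used by \checkcites; the entries, the
key names and the strategy of following \verb|crossref| links from
cited entries are all assumptions chosen to illustrate the problem.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of cross-reference handling (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
# Hypothetical entries: key -> crossref target (or None).
entries = {
    "paper:2012": "proceedings:2012",
    "proceedings:2012": None,
    "book:2010": None,
}
cited = {"paper:2012"}

# Without cross-reference handling, the parent looks unused.
plain_unused = sorted(set(entries) - cited)

# Follow crossref links from cited entries (assumed strategy).
used = set(cited)
pending = list(cited)
while pending:
    parent = entries.get(pending.pop())
    if parent and parent not in used:
        used.add(parent)
        pending.append(parent)
aware_unused = sorted(set(entries) - used)

print(plain_unused)  # ['book:2010', 'proceedings:2012']
print(aware_unused)  # ['book:2010']
\end{tcblisting}

In this sketch the parent entry is never cited directly, yet it is
needed by a cited entry, so only \verb|book:2010| remains unused once
the links are followed.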

From version 2.7 on, \checkcites\ can export the reference report to a
JSON file through the \verb|--json| command line flag:

\begin{terminal}
\begin{verbatim}
$ checkcites document.aux --json report.json
\end{verbatim}
\end{terminal}

The  script will  generate a  file named  \verb|report.json| with  the
following structure and content:

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
  title=JSON file, fonttitle=\bfseries, breakable,
  listing options={columns=fullflexible,basicstyle=\ttfamily}}
{
  "settings" : {
    "backend" : "bibtex",
    "operation" : "list all unused and undefined references",
    "crossrefs" : false
  },
  "project" : {
    "forcibly_cite_all" : false,
    "bibliographies" : [ "example" ],
    "citations" : [ "foo:2012a", "foo:2012c",
                    "foo:2012f", "foo:2012d" ],
    "crossrefs" : []
  },
  "results" : {
    "unused" : {
      "active" : true,
      "occurrences" : [ "foo:2012b", "foo:2012e" ]
    },
    "undefined" : {
      "active" : true,
      "occurrences" : [ "foo:2012f" ]
    }
  }
}  
\end{tcblisting}

Note  that   the  JSON   file  has  three   main  groups.   The  first
group  contains  the execution  settings  and  has the  backend  used,
a  description   of  the   operation  being  performed,   and  whether
cross-references  checks  were  enabled.  The  second  group  contains
relevant information  about the  project itself,  such as  whether all
references will  be cited (when  \verb|\nocite{*}| is found),  and the
list of bibliographies, citations and cross-references found. Finally,
the  third  group  contains  the  analysis  results,  with  a  special
\verb|active|  key that  indicates whether  that particular  check has
been performed, and a list of occurrences. That is all, folks!
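
For readers who want to post-process the report, here is a minimal
Python sketch that reads the \verb|results| group and keeps only the
checks marked as active. The report is inlined and abridged so that
the listing is self-contained, and \verb|summarize| is a hypothetical
helper name; in practice one would load the generated file with
\verb|json.load| instead.

\begin{tcblisting}{colframe=DarkTurquoise,coltitle=black,listing only,
                   title=Sketch of a report reader (Python), fonttitle=\bfseries, breakable,
                   listing options={columns=fullflexible,basicstyle=\ttfamily}}
import json

# Abridged inline copy of a report, for illustration only.
REPORT = """
{
  "results": {
    "unused": {"active": true,
               "occurrences": ["foo:2012b", "foo:2012e"]},
    "undefined": {"active": true,
                  "occurrences": ["foo:2012f"]}
  }
}
"""

def summarize(report):
    """Map each active check to its list of occurrences."""
    return {name: check["occurrences"]
            for name, check in report["results"].items()
            if check["active"]}

summary = summarize(json.loads(REPORT))
for name, keys in summary.items():
    print(name, len(keys), ", ".join(keys))
\end{tcblisting}

A continuous integration job could, for instance, fail the build when
the \verb|undefined| list is not empty.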

\section{License}
\label{sec:license}

This script is licensed under the
\href{http://www.latex-project.org/lppl/}{\LaTeX\ Project Public
License}. If you want to support \LaTeX{} development with a
donation, the best way is to donate to the
\href{http://www.tug.org/}{\TeX\ Users Group}.

\end{document}