%% This is part of the OpTeX project, see http://petr.olsak.net/optex

\_codedecl \hisyntax {Syntax highlighting of verbatim listings <2022-04-04>} % preloaded in format

   \_doc -----------------------------
   The macros `\replfromto` and `\replthis` manipulate the
   verbatim text that is already stored in the `\_tmpb` macro.

   \`\replfromto` `{<from>}{<to>}{<replacement>}` finds the first
   occurrence of `<from>` and the first occurrence of `<to>` following it. The
   `<text>` between them is packed into `#1` and is available to `<replacement>`,
   which ultimately replaces the whole `<from><text><to>` sequence.

   `\replfromto` then repeats the search for the next `<from>` and the next `<to>`
   over the whole verbatim text. If the verbatim text ends with an
   opening `<from>` that has no closing `<to>`, then `<to>` is appended to the
   verbatim text automatically and the last part of the verbatim text is replaced too.

   The first two parameters are expanded before `\replfromto` uses them, so you can
   write `\csstring\%` or a similar expandable construct there.
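
   For instance, a minimal sketch (not taken from a shipped syntax file) that
   marks everything from a percent sign to the end of the line as a comment.
   It assumes that the letter `C` has a color declared by `\hicolor`:
   \begtt
   \replfromto{\csstring\%}{^^J}{\z C{\csstring\%#1}^^J}
   \endtt
   The replacement writes both delimiters again around `#1` because they are
   removed from the verbatim text together with the text between them.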
   \_cod -----------------------------

\_def\_replfromto #1#2{\_edef\_tmpa{{#1}{#2}}\_ea\_replfromtoE\_tmpa}
\_def\_replfromtoE#1#2#3{% #1=from #2=to #3=replacement
   \_def\_replfrom##1#1##2{\_addto\_tmpb{##1}%
      \_ifx\_fin##2\_ea\_replstop \_else \_afterfi{\_replto##2}\_fi}%
   \_def\_replto##1#2##2{%
      \_ifx\_fin##2\_afterfi{\_replfin##1}\_else
        \_addto\_tmpb{#3}%
        \_afterfi{\_replfrom##2}\_fi}%
   \_def\_replfin##1#1\_fin{\_addto\_tmpb{#3}\_replstop}%
   \_edef\_tmpb{\_ea}\_ea\_replfrom\_tmpb#1\_fin#2\_fin\_fin\_relax
}
\_def\_replstop#1\_fin\_relax{}
\_def\_finrepl{}

   \_doc -----------------------------
   The \`\replthis` `{<pattern>}{<replacement>}` replaces each `<pattern>`
   by `<replacement>`.
   Both parameters of `\replthis` are expanded first.
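
   For instance, assuming that the letter `K` is declared by `\hicolor` for
   keywords, this sketch colors the whole word `while` and leaves words such
   as `meanwhile` alone, because the pattern requires the `\n` word boundary
   on both sides:
   \begtt
   \replthis{\n while\n}{\z K{while}}
   \endtt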
   \_cod -----------------------------

\_def\_replthis#1#2{\_edef\_tmpa{{#1}{#2}}\_ea\_replstring\_ea\_tmpb \_tmpa}

\_public \replfromto \replthis ;

   \_doc -----------------------------
   The patterns `<from>`, `<to>` and `<pattern>` are not found when they are
   hidden in braces `{...}`. E.g.
   \begtt
   \replfromto{/*}{*/}{\x C{/*#1*/}}
   \endtt
   replaces all C comments by `\x C{...}`. Patterns hidden inside `{...}` are
   not matched by subsequent uses of the `\replfromto` or `\replthis` macros.

   The \`\_xscan` macro replaces occurrences of `\x` by `\z` in the post-processing
   phase. The construct `\x <letter>{<text>}` expands to `\_xscan {<letter>}<text>^^J^`.
   If `#3` is `\_fin`, something went wrong: the `<from>` was not terminated
   by a legal `<to>` when `\replfromto` did its work.
   We must fix this by using the \`\_xscanR` macro.
   \_cod -----------------------------

\_def\_xscan#1#2^^J#3{\_ifx\_fin#3 \_ea\_xscanR\_fi
   \z{#1}{#2}%
   \_ifx^#3\_else ^^J\_afterfi{\_xscan{#1}#3}\_fi}
\_def\_xscanR#1\_fi#2^{^^J}

   \_doc -----------------------------
   The \`\hicolor` `<letter> <color>` defines `\_z:<letter>{<text>}`
   as `{<color><text>}`. It should be used in the context of
   `\x <letter>{<text>}` macros.
   \_cod -----------------------------

\_def\_hicolor #1#2{\_sdef{_z:#1}##1{{#2##1}}}

   \_doc -----------------------------
   \`\hisyntax``{<name>}` re-defines the default \^`\_prepareverbdata``<macro><verbtext>`
   so that it does several things:
   it saves `<verbtext>` to `\_tmpb`, adds `\n` around spaces and
   `^^J` characters in the pre-processing phase, and opens the `hisyntax-<name>.opm`
   file if `\_hisyntax<name>` is not defined. Then `\_the\_hisyntax<name>`
   is processed. Finally, the post-processing phase is realized by setting
   appropriate values to the `\x` and `\y` macros and doing
   `\_edef\_tmpb{\_tmpb}`.
   \_cod -----------------------------

\_def\_hisyntax#1{\_def\_prepareverbdata##1##2{%
   \_let\n=\_relax \_let\b=\_relax \_def\t{\n\_noexpand\t\n}\_let\_start=\_relax
   \_adef{ }{\n\_noexpand\ \n}\_edef\_tmpb{\_start^^J##2\_fin}%
   \_replthis{^^J}{\n^^J\b\n}\_replthis{\b\n\_fin}{\_fin}%
   \_let\x=\_relax  \_let\y=\_relax \_let\z=\_relax \_let\t=\_relax
   \_hicomments % keeps comments declared by \commentchars
   \_endlinechar=`\^^M
   \_lowercase{\_def\_tmpa{#1}}%
   \_ifcsname _hialias:\_tmpa\_endcsname \_edef\_tmpa{\_cs{_hialias:\_tmpa}}\_fi
   \_ifx\_tmpa\_empty \_else
      \_unless \_ifcsname _hisyntax\_tmpa\_endcsname
          \_isfile{hisyntax-\_tmpa.opm}\_iftrue \_opinput {hisyntax-\_tmpa.opm} \_fi\_fi
      \_ifcsname _hisyntax\_tmpa\_endcsname
          \_ifcsname hicolors\_tmpa\_endcsname
              \_cs{_hicolors\_tmpa}=\_cs{hicolors\_tmpa}%
          \_fi
          \_ea\_the \_csname _hisyntax\_tmpa\_endcsname % \_the\_hisyntax<name>
          \_the\_hicolors  % colors which have precedence
      \_else\_opwarning{Syntax "\_tmpa" undeclared (no file hisyntax-\_tmpa.opm)}
   \_fi\_fi
   \_replthis{\_start\n^^J}{}\_replthis{^^J\_fin}{^^J}%
   \_def\n{}\_def\b{}\_adef{ }{\_dsp}%
   \_bgroup \_lccode`\~=`\ \_lowercase{\_egroup\_def\ {\_noexpand~}}%
   \_def\w####1{####1}\_def\x####1####2{\_xscan{####1}####2^^J^}%
   \_def\y####1{\_ea \_noexpand \_csname ####1\_endcsname}%
   \_edef\_tmpb{\_tmpb}%
   \_def\z####1{\_cs{_z:####1}}%
   \_def\t{\_hskip \_dimexpr\_tabspaces em/2\_relax}%
   \_localcolor
}}
\_public \hisyntax \hicolor ;

   \_doc -----------------------------
   Aliases for languages can be declared like this.
   When `\hisyntax{xml}` is used then this is the same as `\hisyntax{html}`.
   \_cod -----------------------------

\_sdef{_hialias:xml}{html}
\_sdef{_hialias:json}{c}

\_endcode % -------------------------------------------

The user can write

\begtt \adef/{\_csstring\\}
\begtt \hisyntax{C}
...
/endtt
\endtt
to colorize the code using C syntax.
The user can also write `\everytt={\hisyntax{C}}` to have all verbatim listings
colorized.
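
The same setup can be given to a single `\verbinput`. This is a sketch which
assumes that `\verbinput` accepts setup tokens before the line range in the
same way as `\begtt` does, and that a source file named `program.c`
(a hypothetical name) exists:

\begtt
\verbinput \hisyntax{C} (-) program.c
\endtt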

\^`\hisyntax``{<name>}` reads the file `hisyntax-<name>.opm` where the
colorization is declared. The parameter `<name>` is case insensitive and the
file name must include it in lowercase letters. For example, the file
`hisyntax-c.opm` looks like this:

\printdoc hisyntax-c.opm

\OpTeX/ provides `hisyntax-{c,lua,python,tex,html,kt}.opm` files.
You can take inspiration from these files and declare more languages.

Users can re-declare default colors by \^`\hicolors``={<list of color declarations>}`.
This value has precedence over `\_hicolors<name>` values declared in the
`hisyntax-<name>.opm` file.
For example, \^`\hicolors``={`\^`\hicolor`` S \Brown}` causes all strings to be printed in brown.
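
A minimal sketch of a re-declaration of two colors at once. The letter `C`
for comments is an assumption here; check the letters actually used in the
syntax file of your language:

\begtt
\hicolors={\hicolor S \Brown  \hicolor C \Grey}
\endtt
Only the listed letters are re-declared, the remaining letters keep their
default colors.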

Another way to set non-default colors is to declare
`\newtoks\hicolors<name>` (without the `_` prefix) and set the color palette there.
It takes precedence over `\_hicolors<name>` (with the `_` prefix) declared in
the `hisyntax-<name>.opm` file. You must re-declare all colors used in
that file.
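
A sketch of this approach for the C language. The letters `K`, `S` and `C`
and the meaning given to them are assumptions made for illustration; the list
must be completed with every letter used in `hisyntax-c.opm`:

\begtt
\newtoks\hicolorsc
\hicolorsc={%
   \hicolor K \Blue    % keywords (assumed letter)
   \hicolor S \Brown   % strings
   \hicolor C \Grey    % comments (assumed letter)
   % continue with every remaining letter used in hisyntax-c.opm
}
\endtt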

\secl4 Notes for hi-syntax macro writers
The file `hisyntax-<name>.opm` is read only once and inside a \TeX/ group.
Any definitions made there must therefore be declared as global.

The file `hisyntax-<name>.opm` must (globally) declare `\_hisyntax<name>` token list
where the action over verbatim text is declared typically by using the `\replfromto` or
`\replthis` macros.
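
A minimal sketch of the body of such a file for a hypothetical language `mini`
(the file would be named `hisyntax-mini.opm`). The language name, the letters
`K` and `C`, the colors and the keyword list are assumptions made for
illustration only:

\begtt
\_newtoks \_hisyntaxmini  \_newtoks \_hicolorsmini
\_global\_hicolorsmini={% default colors of this language
   \_hicolor K \Red     % keywords
   \_hicolor C \Green   % comments
}
\_global\_hisyntaxmini={%
   \_the\_hicolorsmini
   \_replfromto {/*}{*/} {\x C{/*#1*/}}%  comments, possibly spanning lines
   \_foreach {if}{else}{while}
      \_do {\_replthis{\n#1\n}{\z K{#1}}}% whole words only
}
\endtt
Comments are processed first here, so keywords inside a comment are hidden in
braces and are not colored again by the keyword replacements.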

The verbatim text is prepared by the {\em pre-processing phase}, then
`\_hisyntax<name>` is applied and then the {\em post-processing phase} does final
corrections. Finally, the verbatim text is printed line by line.

The pre-processing phase does:

\begitems
* Each space is replaced by {\visiblesp`\n\ \n`}, so `\n<word>\n` is the pattern for
  matching whole words (no subwords). The `\n` control sequence is removed in
  the post-processing phase.
* Each end of line is represented by `\n^^J\n`.
* The `\_start` control sequence is added before the verbatim text and the `\_fin` control
  sequence is appended to the end of the verbatim text. Both are removed
  in the post-processing phase.
\enditems

The following special macros work only inside the group in which the verbatim
text is processed.


\begitems
* `\n` represents nothing but it should be used as a boundary of words as mentioned above.
* `\t` represents a tabulator. It is prepared as `\n\t\n` because it can be at
  a word boundary.
* `\x <letter>{<text>}` can be used as replacing text. Consider the example
  \begtt
  \replfromto{/*}{*/}{\x C{/*#1*/}}
  \endtt
  This replaces all C comments `/*...*/` by `\x C{/*...*/}`. But
  C comments may span multiple lines, i.e. `^^J` may occur inside them.

  The macro `\x <letter>{<text>}` is replaced by one or more occurrences of
  `\z <letter>{<text>}` in the post-processing phase, where each parameter `<text>` of
  `\z` comes from a single line. Parameters not crossing a line boundary are represented
  by `\x C{<text>}` and replaced by `\z C{<text>}` without any change.
  But:
  \begtt \catcode`\<=13
  \x C{<text1>^^J<text2>^^J<text3>}
  \endtt
  is replaced by
  \begtt \catcode`\<=13
  \z C{<text1>}^^J\z C{<text2>}^^J\z C{<text3>}
  \endtt
  `\z <letter>{<text>}` is expanded to `\_z:<letter>{<text>}` and if
  `\hicolor <letter> <color>` is declared then
  `\_z:<letter>{<text>}` expands to `{<color><text>}`. So the required color is
  activated for each line separately (e.g. for C comments spanning multiple lines).
* `\y {<text>}` is replaced by `\<text>` in the post-processing phase. It should
  be used for macros without parameters. You cannot use unprotected macros
  as replacement text before the post-processing phase, because the post-processing
  phase is based on the expansion of the whole verbatim text.
\enditems
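
A sketch of the `\y` mechanism, assuming a user macro `\myarrow`
(a hypothetical name) which should be printed instead of the characters `->`:

\begtt
\def\myarrow{$\rightarrow$}   % a macro without parameters, defined by the user
% inside the token list of the syntax file:
\replthis{->}{\y{myarrow}}    % \y{myarrow} becomes \myarrow in the post-processing
\endtt
Writing `\myarrow` directly into the replacement would expand it too early, in
the post-processing phase, while `\y{myarrow}` leaves the unexpanded control
sequence in the text until the line is printed.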

\endinput

2022-05-12 \hicolors can include only selected re-declarations.
2020-04-04 Introduced.