Discussion:
[tug-summer-of-code] A hyperlinked and highlighted version of sample2e.tex
Jonathan Fine
2009-03-22 17:37:17 UTC
Hi

This is about displaying and navigating TeX source with a web browser.

As you probably know, sample2e.tex is a brief introduction-by-example to
LaTeX, and I'm using it as my example file.

Today I gave myself a one-day project, which was to syntax highlight
this file, and provide hyperlinks for control sequences.

I'm finished now, and you can see what I've done at:
<http://pytex.svn.sourceforge.net/viewvc/pytex/trunk/pytex/sandbox/jfine/macroload/sample2e.html?revision=65>

You can see (highlighted) the Python script that produced this output at:
<http://pytex.svn.sourceforge.net/viewvc/pytex/trunk/pytex/sandbox/jfine/macroload/highlight.py?revision=65&view=markup>

BTW, this output has hyperlinks, but the targets don't exist yet. That
is another project.

You can see SourceForge's highlighted version of sample2e.tex at:
<http://pytex.svn.sourceforge.net/viewvc/pytex/trunk/pytex/sandbox/jfine/macroload/sample2e.tex?view=markup>

You might ask: Why don't you just use the same highlighting tool as
SourceForge? I have two main reasons for writing my own. First, I want
to put hyperlinks in the output, and Pygments (used by SourceForge)
doesn't do this (and can't easily be extended to do this). Second, I
want a simple and reliable tokenizer for TeX/LaTeX input that does not
change category codes.
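
A minimal sketch of the kind of tokenizer described above, not the code in highlight.py. It assumes default category codes (a control word is a backslash followed by ASCII letters) and reads the source purely as text. The names `TOKEN_RE`, `tokenize` and `to_html` are made up for illustration; the `cs/<name>.html` link shape and the `csname`/`chars` class names follow the HTML quoted later in this thread.

```python
import html
import re

# Assumes default category codes: a control word is a backslash plus ASCII
# letters; a control symbol is a backslash plus any one other character.
TOKEN_RE = re.compile(
    r"(?P<csname>\\(?:[A-Za-z]+|.))"  # control word or control symbol
    r"|(?P<comment>%[^\n]*)"          # comment up to the end of the line
    r"|(?P<chars>[^\\%]+)"            # run of ordinary characters
    r"|(?P<other>\\)",                # lone backslash at end of input
    re.DOTALL,
)


def tokenize(source):
    """Yield (kind, text) pairs whose texts concatenate back to source."""
    for match in TOKEN_RE.finditer(source):
        yield match.lastgroup, match.group()


def to_html(source, href_prefix="cs/"):
    """Wrap each token in a span, or in a link if it is a control word."""
    parts = []
    for kind, text in tokenize(source):
        escaped = html.escape(text)
        name = text[1:]
        if kind == "csname" and name.isascii() and name.isalpha():
            href = html.escape(href_prefix + name + ".html")
            parts.append('<a class="csname" href="%s">%s</a>' % (href, escaped))
        else:
            parts.append('<span class="%s">%s</span>' % (kind, escaped))
    return "".join(parts)
```

Because the tokenizer only reads characters and never interprets them, the input passes through unchanged apart from HTML escaping, and `\%` is kept as a control symbol rather than being taken for the start of a comment.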

I've copied the TUG Google Summer of Code mailing list because work in
this direction was one of the projects TUG was proposing. Sadly, TUG
did not this year accept us as a mentoring organisation.
--
Jonathan
Arthur Reutenauer
2009-03-22 19:25:39 UTC
Sadly, TUG did not
this year accept us as a mentoring organisation.
I hope you mean Google :-)

Arthur
Jonathan Fine
2009-03-22 20:01:20 UTC
Post by Arthur Reutenauer
Sadly, TUG did not
this year accept us as a mentoring organisation.
I hope you mean Google :-)
Yes, you're right here. I was wanting to say that "TUG was not accepted
this year ...", but changed construction in mid-phrase.

Arthur: Do you have any helpful or encouraging comment about the work I
reported in my message?
--
Jonathan
Arthur Reutenauer
2009-03-22 22:16:07 UTC
Post by Jonathan Fine
Arthur: Do you have any helpful or encouraging comment about the work I
reported in my message?
I wondered about how you handled things like:

\newcommand{\ip}[2]{(#1, #2)}

in the current code. Both control sequences are linked to an
independent page, which I think is not right: \newcommand should of
course have a page of its own (I understand it still needs to be
written; that's why the link is broken); but \ip certainly shouldn't,
since it's only defined in the document.

Maybe trying to do both syntax highlighting and command reference in
the same code is too much; do you have any plan on how to deal with
that?

Arthur
Jonathan Fine
2009-03-24 06:59:03 UTC
Post by Arthur Reutenauer
Post by Jonathan Fine
Arthur: Do you have any helpful or encouraging comment about the work I
reported in my message?
\newcommand{\ip}[2]{(#1, #2)}
in the current code. Both control sequences are linked to an
independent page, which I think is not right: \newcommand should of
course have a page of its own (I understand it still needs to be
written; that's why the link is broken); but \ip certainly shouldn't,
since it's only defined in the document.
Arthur: Thank you for this.

Yes. This is a little bit thorny. However, I'm confident that the
hyperlinks will be useful even if some of them are broken, and so
I'm inclined to be guided by the experience of early users.

Here are some possible solutions:

1. Have the hyperlinker understand \newcommand.
2. Link only to commands that are in a dictionary.
3. Have the document state up front that such-and-such are the
commands it provides. (This is like 1, except easier for me).
4. Do nothing.

I'm inclined to have simple constructs like \newcommand understood, but
implementing this is not yet a priority. However, (2) might also be
sensible, if the dictionary is large enough.
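
Options (1) and (2) can be combined in a few lines. The following is only a sketch under stated assumptions: `DEFINITION_RE`, `KNOWN_COMMANDS`, `locally_defined` and `link_target` are hypothetical names, the dictionary contents are placeholders, and only the simplest forms of `\newcommand` (plus `\renewcommand` and `\providecommand`) are recognised.

```python
import re

# Simple cases only: \newcommand{\name}..., \newcommand\name..., and the
# \renewcommand / \providecommand variants, with an optional star.
DEFINITION_RE = re.compile(
    r"\\(?:new|renew|provide)command\*?\s*\{?\s*\\([A-Za-z]+)"
)

# Placeholder dictionary of commands assumed to have a reference page.
KNOWN_COMMANDS = {"newcommand", "documentclass", "begin", "end"}


def locally_defined(source):
    """Return the names of commands the document defines for itself."""
    return set(DEFINITION_RE.findall(source))


def link_target(name, local_names, known=KNOWN_COMMANDS):
    """Return an href for name, or None if it should not be linked."""
    if name in local_names:
        return None  # defined in the document, so no page of its own
    if name in known:
        return "cs/" + name + ".html"
    return None  # not in the dictionary: leave unlinked
```

With the line Arthur quoted, `locally_defined` finds `ip`, so `\ip` is left unlinked while `\newcommand` still gets its link. A definition that appears inside a comment would also be picked up, which is one reason this is a sketch rather than a solution.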
Post by Arthur Reutenauer
Maybe trying to do both syntax highlighting and command reference in
the same code is too much; do you have any plan on how to deal with
that?
I've written only about 200 lines of fairly simple code, so it's not
yet time for refactoring.

I think my next priority is to write a 'dtx2html' translator, so that I
can produce some pages to link to.

My other priority is to write regular expressions that will highlight and
hyperlink style files (rather than tex documents).
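
One difference such expressions would need to handle: LaTeX reads style files with `@` treated as a letter, so `\@maketitle` is a single control word there. A small sketch of the two patterns side by side, with hypothetical names `DOCUMENT_CS`, `STYLE_CS` and `control_sequences`:

```python
import re

# In documents, only ASCII letters make up a control word.
DOCUMENT_CS = re.compile(r"\\(?:[A-Za-z]+|.)", re.DOTALL)
# In style files, '@' is assumed to count as a letter as well.
STYLE_CS = re.compile(r"\\(?:[A-Za-z@]+|.)", re.DOTALL)


def control_sequences(source, style_file=False):
    """List the control sequences in source, in order of appearance."""
    pattern = STYLE_CS if style_file else DOCUMENT_CS
    return pattern.findall(source)
```

Applied to `\def\@maketitle{\newpage}`, the document pattern sees `\@` as a control symbol followed by ordinary text, whereas the style-file pattern returns `\@maketitle` whole.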
--
Jonathan
Ross Moore
2009-03-22 22:40:11 UTC
Post by Jonathan Fine
Post by Arthur Reutenauer
Post by Jonathan Fine
Sadly, TUG
did not this year accept us as a mentoring organisation.
I hope you mean Google :-)
Yes, you're right here. I was wanting to say that "TUG was not
accepted this year ...", but changed construction in mid-phrase.
Arthur: Do you have any helpful or encouraging comment about the
work I reported in my message?
Surely there should be some extra blank lines (via <br/>)
Post by Jonathan Fine
Post by Arthur Reutenauer
Post by Jonathan Fine
The ends of words and sentences are marked
by spaces. It doesn't matter how many
spaces you type; one is as good as 100. The
end of a line counts as a space.
One or more blank lines denote the end
of a paragraph.
Otherwise the intention is lost.


Also, it's interesting that the HTML coding you generate
has no DOCTYPE, so cannot be validated, except in the most
generic sense.
Post by Jonathan Fine
Post by Arthur Reutenauer
Post by Jonathan Fine
<p class=tex><a class=csname href="cs/end.html">\end</a><span
class=chars>{document}
I'd prefer to see home-grown attribute values properly quoted; viz,
Post by Jonathan Fine
Post by Arthur Reutenauer
Post by Jonathan Fine
<p class="tex"><a class="csname" href="cs/end.html">\end</a><span
class="chars">{document}
so as to be more consistent with XHTML usage.
That way you'd be able to Copy/Paste, or otherwise include, such
automatically-generated HTML coding within Wikis, etc.


I have other comments too, regarding source-code layout.
But these aren't generally adopted.


e.g., I really hate the use of \ip as a macro name.

1- and 2-letter (lowercase) names for home-grown macros
should be completely discouraged, as there is too great
a chance of a clash with the macro for a letter or special
character that can occur in a foreign script or language.
(Just think of what this would do to someone's name that
may need that letter, in a bibliographic entry, say.)


When producing materials that are intended to teach people
how to use LaTeX, then some thought should be given to
avoiding such undesirable practices.
Post by Jonathan Fine
--
Jonathan
Hope this helps,

Ross


------------------------------------------------------------------------
Ross Moore ross at maths.mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
Jonathan Fine
2009-03-24 07:12:50 UTC
Post by Ross Moore
Surely there should be some extra blank lines (via <br/>)
Post by Jonathan Fine
The ends of words and sentences are marked
by spaces. It doesn't matter how many
spaces you type; one is as good as 100. The
end of a line counts as a space.
One or more blank lines denote the end
of a paragraph.
If you look at the HTML closely enough, I think you'll see that there is
extra vertical space between paragraphs. (That's what I get on FF2.)

This sort of thing is controlled by a style-sheet, which is at present
hard-coded.
Post by Ross Moore
Also, it's interesting that the HTML coding you generate
has no DOCTYPE, so cannot be validated, except in the most
generic sense.
I ran out of time. Like the style-sheet, the top and tail of the page come
from a couple of simple string templates. So this is easy to fix, or to
make configurable.
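
As a rough illustration of how string templates could carry a configurable DOCTYPE and style-sheet, here is a sketch using Python's `string.Template`. The names `PAGE_TOP`, `PAGE_TAIL`, `wrap_page` and the one-rule default style-sheet are assumptions made for the example; the HTML 4.01 Strict declaration is just one possible default.

```python
import html
from string import Template

# One possible default; any other declaration can be passed in instead.
HTML401_STRICT = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)

# Placeholder rule giving vertical space between source paragraphs.
DEFAULT_STYLESHEET = "p.tex { margin: 1em 0; }"

PAGE_TOP = Template(
    "$doctype\n"
    "<html>\n<head>\n<title>$title</title>\n"
    '<style type="text/css">\n$stylesheet\n</style>\n'
    "</head>\n<body>\n"
)
PAGE_TAIL = "</body>\n</html>\n"


def wrap_page(body, title, doctype=HTML401_STRICT,
              stylesheet=DEFAULT_STYLESHEET):
    """Put the configurable top and the fixed tail around the body."""
    top = PAGE_TOP.substitute(
        doctype=doctype, title=html.escape(title), stylesheet=stylesheet
    )
    return top + body + PAGE_TAIL
```

Keeping the DOCTYPE and style-sheet as parameters means a site can swap in its own without touching the highlighting code.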
Post by Ross Moore
Post by Jonathan Fine
<p class=tex><a class=csname href="cs/end.html">\end</a><span
class=chars>{document}
I'd prefer to see home-grown attribute values properly quoted; viz,
Post by Jonathan Fine
<p class="tex"><a class="csname" href="cs/end.html">\end</a><span
class="chars">{document}
Well, I suppose that depends on the DOCTYPE. I can see benefits in
being able to apply a parser to validate the output, and if the
validation requires the quotes then we supply them.

The href, and perhaps also the class names, need to be configurable.
I'm not sure yet what the best way of solving this problem is, in part
because I've not yet had to configure the system.
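
One way this could look, purely as a sketch: a small configuration object holds the href pattern and class name, and a helper always emits double-quoted, escaped attribute values. `LinkConfig`, `tag_open` and `csname_link` are hypothetical names, and the defaults simply mirror the `cs/end.html` and `csname` values seen in the generated HTML.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkConfig:
    """Hypothetical settings for how control sequences are linked."""
    href_template: str = "cs/{name}.html"
    csname_class: str = "csname"


def tag_open(name, **attributes):
    """Return an opening tag with every attribute value double-quoted.

    A trailing underscore is dropped from keys, so class_ becomes class.
    """
    rendered = "".join(
        ' %s="%s"' % (key.rstrip("_"), html.escape(value, quote=True))
        for key, value in attributes.items()
    )
    return "<%s%s>" % (name, rendered)


def csname_link(name, config=LinkConfig()):
    """Return the link markup for the control word \\name."""
    href = config.href_template.format(name=name)
    opening = tag_open("a", class_=config.csname_class, href=href)
    return opening + html.escape("\\" + name) + "</a>"
```

A site with a different URL layout would pass its own `LinkConfig`, for example `LinkConfig(href_template="/ref/{name}")`, and the quoting stays correct whatever values are supplied.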
Post by Ross Moore
so as to be more consistent with XHTML usage.
That way you'd be able to Copy/Paste, or otherwise include, such
automatically-generated HTML coding within Wikis, etc.
Copy-and-paste is really important, as is being able to incorporate
snippets into another page.

At some point I should sit down and see how Pygments (the widely used
highlighter coded in Python) does these various things. One of my goals
is to write something that will work 'drop in' on sites that use Pygments.
Post by Ross Moore
I have other comments too, regarding source-code layout.
But these aren't generally adopted.
e.g., I really hate the use of \ip as a macro name.
The sample file comes from the LaTeX2e distribution, and I'm not going
to change it. But I can process some other sample files.
Post by Ross Moore
1- and 2-letter (lowercase) names for home-grown macros
should be completely discouraged, as there is too great
a chance of a clash with the macro for a letter or special
character that can occur in a foreign script or language.
(Just think of what this would do to someone's name that
may need that letter, in a bibliographic entry, say.)
When producing materials that are intended to teach people
how to use LaTeX, then some thought should be given to
avoiding such undesirable practices.
Perhaps there's a call for a TeX-focussed site similar to
http://www.djangosnippets.org/
--
Jonathan