[aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

PyroPeter abi1789 at googlemail.com
Thu Sep 30 14:56:56 EDT 2010

On 09/30/2010 06:38 PM, Lukas Fleischer wrote:
> On Thu, Sep 30, 2010 at 06:18:24PM +0200, PyroPeter wrote:
>>> +  $url = str_replace('&','&', $url);
>>> +  $url = str_replace('&', '&', $url);
>> What about the occurrences of "&(html-entity-code-here);" you
>> produced the line before?
> Nothing? Any occurrence of an HTML entity code is correctly encoded as
> "&". People shouldn't be able to manually insert HTML entities in
> comments. The first line is actually even superfluous as I realized just
> now since ampersands should already have been replaced by
> htmlspecialchars() before at the time this line is executed (didn't
> check that before, this part of code has been extracted from the
> DokuWiki plugin).

Well, but you are encoding existing entities, that are not "&" as
"&foo;". See the example below.

>>> +  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
>>> +  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
>> I am not that experienced with PHP, but this looks like the $patterns
>> array got replaced instead of extended.
> Nope, it doesn't. Check [1].

I see, "$var[] = foo" creates the array $var if necessary and appends

>>> +  $comment = htmlspecialchars($comment);
>> Won't this render the next instruction useless if there are
>> html-characters in a link?
> Nope. Links need to be escaped as well. Not sure what happens if a link
> contains quotes or "<"/">". This shouldn't happen too often tho.

Imo, you should split the message at the link boundaries.
( "foo ", "http://foo.bar.tld", " baz")
Then you should encode the html-entities in all elements, wrap the links
in <a>'s, and then join all that together.

I can not think of a way to connect <a>-tags with proper encoding of
the user input using just "normal" string functions like replace, or

I would be happy if you could prove the opposite, and will help by
providing you with input that breaks your system.

== example 1 ==

input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"

If I am not mistaken, $regex would be
(are the "/" correctly escaped? I will assume they are.)

Then, $regex would be:

$comment would be set by htmlspecialchars() to:
"foo http://foo.tld/iLikeToUseApersands/foo&amp;bar.html baz"

=> preg_replace_callback() would not match, as & got replaced.

>> Generally I would not make hostnames ("www.foo.tld") clickable.
>> If people are not able to provide proper URL's, they have a serious
>> problem. (there is also the technical argument that the hostname is not
>> a good indicator for the kind of service the host provides.)
> Why not? What if you explicitly want to link to a project's home page?
> It'll also just convert hostnames if they start with a "www" or "ftp"
> subdomain, so comments refering to domains in other ways won't be
> converted.

You can also link to a homepage using valid URL's. The additional
"feature" may be nice, but makes the code more complex. It also
trains users to omit the "http://" and produces more work for devs,
as they all now have to parse this invalid hostname+path stuff.

Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
PHP's \w include 0-9 and language-dependent letters? What about

Why does the <a>'s content only include the Path of the URL?

Regards, PyroPeter
freenode/pyropeter                          "12:50 - Ich drücke Return."

More information about the aur-dev mailing list