ZF-7323: Zend_Validate_Hostname not properly validating .com & .net hostnames.

Description

I initially caught this through an issue with email validation in Zend_Validate_EmailAddress, but the root of the issue turns out to be Zend_Validate_Hostname not properly validating hostnames. It allows hostnames such as ya#hoo.com and p!ace.com to validate.

I've tested this with .com, .net, .org, .it and .ly domain names. .com and .net are the only ones that claim "ya#hoo" is valid, which makes sense given my explanation of the possible cause below.

Sample code:

    require_once 'Zend/Loader/Autoloader.php';
    $al = Zend_Loader_Autoloader::getInstance();

    $host = 'ya#hoo.com';
    $validator = new Zend_Validate_Hostname();

    if ($validator->isValid($host)) {
        echo "Valid";
    } else {
        echo "Invalid";
    }

Expected Results: Invalid
Actual Results: Valid

Possible issue: Within Zend_Validate_Hostname, during the $domainParts validation, "ya#hoo" is actually passing the $regexKey 5 (index 5 under $regexChars) validation (which is far too long to paste in here) on line number 497 in Zend/Validate/Hostname.php. The regex itself is over 8k characters long, and a little daunting... I think I'll yield this to the original creator of that regex.

Edit: Update to issue; These values, in their entirety (including the local-part and the @ sign), are validating as valid hostnames: 'place@y*ahoo.com', 'place@ya&hoo.com'

Comments

Aren't those valid hostnames with IDN?

I'm sure the original authors can shed more light on this. I did my research before posting this, and all I could find is RFC 1035 stating that a domain is any combination of letters (A-Z, a-z), digits (0-9), and the hyphen (-).
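To make the letters-digits-hyphen rule concrete, here is a minimal sketch in plain PHP. The function names isLdhLabel and isLdhHostname are hypothetical and are not part of Zend Framework. The sketch assumes the RFC 1123 relaxation that lets a label start with a digit, and the 63-character label limit.

```php
<?php
// Hypothetical helper, not part of Zend Framework.
// Checks a single DNS label against the letters-digits-hyphen rule:
// first and last characters are a letter or digit, interior characters
// may also be hyphens, and the label is at most 63 characters long.
function isLdhLabel(string $label): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/D', $label) === 1;
}

// Hypothetical helper: applies the label check to every dot-separated part.
function isLdhHostname(string $host): bool
{
    foreach (explode('.', $host) as $label) {
        if (!isLdhLabel($label)) {
            return false;
        }
    }
    return true;
}
```

Under this check, 'ya#hoo.com' and 'p!ace.com' are rejected, while 'yahoo.com' and an already-encoded name such as 'xn--myrt-3nak1c.com' pass. Any non-ASCII IDN input is rejected as well, so this is only a sketch of the ASCII rule and not a replacement for the validator.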

Are there more RFCs out there that define the domain differently under other situations?

It's also worth noting these domains are validating as "Valid" as well: 'place@yah&oo.com'; 'place@y*ahoo.com';

Added a couple additional values that are validating as "Valid" hostnames.

Then take a look at RFC-3490, RFC-3491, RFC-3492, and the COM registrar: http://verisign.com/domain-name-services/…

Non-English characters have also been allowed since 2002. RFC-1035 has been outdated for more than 7 years.

Okay, my analytical hat is on.

I'm looking over the RFCs and I'm still not seeing how they would allow non-word characters within domain names. RFC-3490 still references 1034 and 1035. As far as I understand RFCs, if a newer RFC still references a 7-year-old RFC, the older one is still current. Only when an RFC is deemed obsolete is it "invalid." I don't see 1035 or 1034 being deemed obsolete anywhere.

RFC-3490 is also about Unicode mapping. It's a domain name application layer that converts Unicode domains into their ASCII counterparts so they can be used within the current DNS structure, defined in 4343, which references 1034 and 1035 for domain name make-up.

Now, I understand that ZFW can't simply take a Unicode domain name and convert it into its ASCII counterpart; that appears to go through VeriSign. But that shouldn't stop ZFW from rejecting invalid domain names that are already ASCII-based.

Everything I'm reading still says that "place@ya*hoo.com" is an invalid domain name. Am I still missing something here?

Search at Verisign for the supported charset tables within COM and NET domains. In fact, Verisign, as the main COM registrar, has allowed the usage of IDN characters since 2001/2.

This means that domain names like "MyÉrâtà.com" are also supported, which would be denied by RFC-1035. Note that IDN support extends RFC-1035.

As several TLDs do not accept IDN, the "old" RFC is still active for them.

I see how "MyÉrâtà.com" would be denied within RFC-1035. That's where RFC-3490 comes in. RFC-3490 defines the mapping of "MyÉrâtà.com" to "xn--myrt-3nak1c.com" (that's the exact mapping, btw), which is supported within RFC-1035. That's what I see RFC-3490 being used for. It technically doesn't define additional character sets for domain names. It defines how to convert the extended character sets to the ASCII character set to be used within the current DNS systems, under RFC-1034/5.
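The Unicode-to-ASCII mapping described here can be tried out with PHP's intl extension, which is an assumption of this sketch and may not be installed. The wrapper name toAsciiHostname is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical wrapper, not part of Zend Framework.
// Returns the ASCII (punycode) form of a UTF-8 hostname, or null when the
// intl extension is missing or the conversion fails.
function toAsciiHostname(string $host): ?string
{
    // idn_to_ascii() is provided by the optional intl extension.
    if (!function_exists('idn_to_ascii')) {
        return null;
    }
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}
```

A name that is already plain ASCII, such as 'yahoo.com', should come back unchanged, and the result of a successful conversion can then be checked against the RFC-1034/5 character rules.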

There is one big difference...

In your browser you will probably enter "MyÉrâtà.com" and not "xn--myrt-3nak1c.com". Or do you think that a user will be able to convert from IDN to punycode manually?

Your browser does the IDN to punycode conversion internally in the background. But this does not mean that the validator has to work this way.

For simplicity and usability, it accepts all characters which are also accepted by IANA and their TLD registrars.

Changing this as you propose, to deny all non-RFC 1034 chars, also means denying all IDN hostnames.

You are right. I'm not suggesting to deny Unicode domains, but merely suggesting an additional check. "place@ya&hoo.com" is clearly an invalid domain per RFC 1034/5, and is not superseded by 3490.

Would a potential solution be to first check whether the domain part, or the domain as a whole, is ASCII or Unicode and then perform the validations specific to that? I seem to recall discussions somewhere about determining whether a string is Unicode or ASCII at one point in time. I do not recall if that was possible or not.
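As a rough illustration of that two-branch idea, the sketch below classifies a label by looking for bytes outside the 7-bit ASCII range. The function name classifyLabel and its return values are hypothetical. The check only separates pure 7-bit ASCII from everything else; it does not determine which encoding a non-ASCII label uses.

```php
<?php
// Hypothetical sketch of the proposed two-branch check, not Zend Framework code.
// Returns 'non-ascii', 'ascii-valid', or 'ascii-invalid' for one label.
function classifyLabel(string $label): string
{
    // Any byte above 0x7F means the label is not pure 7-bit ASCII;
    // such labels would be handed to TLD-specific IDN rules instead.
    if (preg_match('/[^\x00-\x7F]/', $label) === 1) {
        return 'non-ascii';
    }
    // Pure ASCII labels are held to the letters-digits-hyphen rule.
    $isLdh = preg_match('/^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/D', $label) === 1;
    return $isLdh ? 'ascii-valid' : 'ascii-invalid';
}
```

With this split, "ya&hoo" is classified as 'ascii-invalid', while a label containing "É" is passed along as 'non-ascii' without being judged by the ASCII rule.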

No, this would not be a solution.

Chars like "ä", "ü", "ß" are not accepted by RFC1034 but are still valid extended-ASCII codes. Additionally, you can't expect that a user gives an IDN domain encoded in Unicode.

Also, this whole discussion is moot, as the mentioned chars "#", "@" and so on will NOT be accepted for hostnames as of the next minor release.

You may have noted that this issue is marked as FIXED and not as WON'T FIX or NOT REPRODUCIBLE.

Point taken. My apologies, I did forget about the extended-ASCII character set. I'll look for the fix in the next release. Is there a place I can browse to see the actual patch, so that I can implement it on my side (it's actually affecting some public sites of ours when users type in "place@#yahoo.com" by accident)? I thought there would be a patch attached to this issue, but I don't see it.

As always, all changes are available within SVN. See the wiki for details. Patches are only created by users when they don't have SVN access and want others to add their patch to ZF.