Validation in Depth – a retort to using just regular expressions

I’ve noticed that Richard Heyes, who professes himself to be a PHP guru, deleted my comment on his “Some common regular expressions” posting. The comment simply pointed out that his expressions didn’t quite do the job, and suggested a few PEAR packages to use instead of the expressions he proffered for the following:

  • Email addresses
  • Usernames
  • Telephone numbers
  • Postal codes
  • IP addresses
  • An SQL date
  • A domain
  • A UK sort code

Why he deleted it is anybody’s guess – he deleted a few others too.

Anyway, for the record, I thought I’d reproduce my comment from memory (I didn’t think to make a backup copy, for obvious reasons, but hey, nobody expects the Spanish Inquisition).

The problem with relying on just a regular expression for validating data is that there is no “defense in depth” to that solution. Sure, the expression might catch the bulk of the bad data entered, but some invalid data is always going to get through.

For example, a simple regular expression for validating phone numbers won’t catch area codes or country codes that don’t actually exist, and another used for validating entered dates might not catch leap-year-based exceptions.

  • Email addresses – use the PEAR Validate package for email address validation
  • Usernames
  • Telephone numbers – use Validate_UK; this package will also validate UK specific details such as:
    • SSN (National Insurance/NI)
    • Postal Code
    • Sort Code
    • Bank AC
    • Car registration numbers
    • Passports
    • Driver license
  • Postal codes – use Validate_UK or counterpart as appropriate.
  • IP addresses – use Net_CheckIP2, the PHP 5 port of Net_CheckIP, or the original Net_CheckIP for PHP 4 if you really have to.
  • An SQL date – what Richard provided validates the form of a date in yyyy-mm-dd format but not that the entered value is a date; one could enter 2008-13-42. Again, I’d suggest using the Validate package.
  • A domain – you could, in theory, use the Validate package’s uri method, prefixing the domain with ‘http://’.
  • A UK sort code – Validate_UK.

If you follow these suggestions, your input validation should be more robust than it would be relying on regular expressions and nothing more.
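
To make the SQL date point concrete, here is a minimal sketch of layering a calendar check on top of the regular expression. It uses PHP’s built-in checkdate() rather than the Validate package, and is_valid_sql_date() is a hypothetical helper name, not something from Richard’s post or from PEAR.

```php
<?php
// Hypothetical helper: check the yyyy-mm-dd shape first, then the calendar.
function is_valid_sql_date(string $value): bool
{
    // Layer 1: the regular expression only confirms the shape of the string.
    // \z anchors at the very end, so a trailing newline is rejected too.
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})\z/', $value, $m)) {
        return false;
    }
    // Layer 2: checkdate(month, day, year) knows month lengths and leap years.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(is_valid_sql_date('2008-13-42')); // bool(false): right shape, not a date
var_dump(is_valid_sql_date('2008-02-29')); // bool(true): 2008 was a leap year
var_dump(is_valid_sql_date('2009-02-29')); // bool(false): 2009 was not
```

The regular expression still earns its keep as a cheap first filter; the point is only that it should not be the last one.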

    12 Responses to “Validation in Depth – a retort to using just regular expressions”

    1. I saw your comment Ken, it didn’t go unheard!

    2. I’m seeing a few of these slowly creep into the Zend Framework as proposals – very few for now though. Have you looked into the possibility of discussing porting some of these to the ZF? At the moment some of the ZF validation I use wraps the PEAR Validate classes rather than running off to write my own for no good reason.

    3. kenguest says:

      Pádraic, I’m not sure that I would port them to ZF – is it not possible to simply import/reference third party packages, such as PEAR packages in this instance, in ZF? I suppose this is precisely what you are doing, come to think of it.

      At least that way there is no divergence involved, and bug fixes as well as additional functionality would have to be coded only once.

    4. Robin Mehner says:

      Hi there,

      why not use ip2long / long2ip for IP validation? It’s built into PHP, and if you only want to validate IPv4 addresses it should be sufficient.

      The only catch I can see is that you don’t want to accept “127.0.0” (which will be converted to 127.0.0.0) as a valid address.
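
      A minimal sketch of that round-trip idea, assuming you want to reject anything that does not come back from long2ip() exactly as it went in. is_canonical_ipv4() is a hypothetical helper name, and how ip2long() treats short forms such as “127.0.0” has varied between PHP versions, which is why the sketch compares the round trip instead of trusting the conversion alone.

```php
<?php
// Hypothetical helper: accept only strings that survive an
// ip2long() / long2ip() round trip unchanged.
function is_canonical_ipv4(string $ip): bool
{
    $long = ip2long($ip);
    // ip2long() returns false when it cannot parse the string at all.
    if ($long === false) {
        return false;
    }
    // If parsing padded or normalised the input, the round trip will differ.
    return long2ip($long) === $ip;
}

var_dump(is_canonical_ipv4('127.0.0.1')); // bool(true)
var_dump(is_canonical_ipv4('127.0.0'));   // bool(false)
var_dump(is_canonical_ipv4('256.1.1.1')); // bool(false)
```

      Another built-in option is filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4), which returns false for anything that is not a dotted-quad IPv4 address.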

    5. kenguest says:

      Hi Robin,
      that’s true and as far as I can tell the Net_CheckIP packages validate IPv4 addresses, not IPv6 ones.

    6. Within my models I define a validation array, and you can define whether it’s a required field, whether you want to do a simple regex validation or something more in depth. You can pass in a third parameter to the field, e.g.

      array('field_name' => array(REQUIRED, ADVANCED))

      ^^ pseudo code

      Setting the advanced param to true makes it check the model for a function called verify_field_name and pass in the $value. This allows for more advanced validation. That is where I can then perform application logic like checking that foreign keys exist, and pass values on to more complex validation routines like the PEAR packages you suggested :)
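
      A minimal sketch of that kind of setup, assuming one rule array per field and a verify_<field> method on the model. UserModel, the rule keys and the error strings are all hypothetical, and checkdate() stands in for whatever deeper routine (a PEAR package, a foreign key lookup) the real model would call.

```php
<?php
// All names here are hypothetical illustrations of the approach described above.
class UserModel
{
    // One rule set per field: required flag, optional regex, advanced flag.
    protected $rules = [
        'birth_date' => [
            'required' => true,
            'pattern'  => '/^\d{4}-\d{2}-\d{2}\z/',
            'advanced' => true,
        ],
    ];

    // Returns an array of field => error code; empty means the data passed.
    public function validate(array $data): array
    {
        $errors = [];
        foreach ($this->rules as $field => $rule) {
            $value = $data[$field] ?? '';
            if ($value === '') {
                if ($rule['required']) {
                    $errors[$field] = 'required';
                }
                continue;
            }
            // Simple layer: the regular expression checks the shape.
            if ($rule['pattern'] !== null && !preg_match($rule['pattern'], $value)) {
                $errors[$field] = 'format';
                continue;
            }
            // Advanced layer: delegate to verify_<field> if the model defines it.
            $method = 'verify_' . $field;
            if ($rule['advanced'] && method_exists($this, $method) && !$this->$method($value)) {
                $errors[$field] = 'invalid';
            }
        }
        return $errors;
    }

    // Deeper check for birth_date: is it a real calendar date?
    protected function verify_birth_date(string $value): bool
    {
        [$year, $month, $day] = array_map('intval', explode('-', $value));
        return checkdate($month, $day, $year);
    }
}

$model = new UserModel();
var_dump($model->validate(['birth_date' => '2008-13-42'])); // birth_date => invalid
```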

      Thanks for the heads up on Validate_UK btw ;)

    7. Dave says:

      Richard Heyes comes across as a bit of a clown. After his amazing blog entry filling us in on one of the best-kept secrets of PHP (http://www.phpguru.org/#295), I asked if he would mind not spamming the Planet PHP feed with trivial articles or pricing changes for his commercial services (although I would be surprised if he has any customers), but my suggestions were met with total derision.

      I’m not at all surprised he would do something like deleting your posts.

    8. Well, there goes mine too.

      Ken; do you think we could make a:
      Validate::assertBlogCommentViewPointIsAcceptable($my_world_view, $comment)

      ?

    9. Aisling says:

      Well done, Ken! You are the true guru of PHP!!

    10. kenguest says:

      @Aisling:
      very kind of you to think so ;-)

    11. scott says:

      There are multiple levels of IP address validation.

      Sadly, PEAR lacks an equivalent to the Perl CPAN module Data::Validate::IP.

      Filling this void is something I’d like to do, except I’m swamped right now, and while my PHP is acceptable, I know I’m not exactly writing PEAR/CPAN module-quality code.

    12. kenguest says:

      I’ve only just now had a look at the CPAN equivalent – it certainly seems a lot more comprehensive, with tests for private/public/loopback/testnet/linklocal addresses. This certainly leaves Net_CheckIP[2] and Validate for dust!
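
      As a rough sketch of what a couple of those extra levels could look like in PHP without any new package, the built-in filter extension can already tell private and reserved ranges apart. classify_ipv4() is a hypothetical name; it covers only a fraction of what Data::Validate::IP offers, and loopback addresses simply fall under the reserved flag here.

```php
<?php
// Hypothetical helper: classify an IPv4 address using PHP's built-in filter flags.
function classify_ipv4(string $ip): string
{
    // Level 1: is it a well-formed IPv4 address at all?
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return 'invalid';
    }
    // Level 2: private ranges (10/8, 172.16/12, 192.168/16) fail this flag.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE) === false) {
        return 'private';
    }
    // Level 3: reserved ranges, which include loopback 127/8, fail this flag.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 | FILTER_FLAG_NO_RES_RANGE) === false) {
        return 'reserved';
    }
    return 'public';
}

var_dump(classify_ipv4('192.168.1.1')); // string(7) "private"
var_dump(classify_ipv4('127.0.0.1'));   // string(8) "reserved"
var_dump(classify_ipv4('8.8.8.8'));     // string(6) "public"
```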

      I wouldn’t be too worried about whether your current standard of code/development is up to scratch; speaking as a member of the PEAR community, we’re generally quick to give a friendly pointer on how something could be improved, and for major gaffes there’s always the PHP_CodeSniffer package, which you can use to check that your code adheres to the PEAR (and other) Coding Standards.

      Your contributions don’t even have to be code – you could, for example, log suggestions against a likely package asking for better functionality, or help out with writing documentation.

      Thanks for the pointer ;-)