Regular expressions are great. Anyone who delves into scripting should pick up a little knowledge of how they work. For me, and for the development of our recent products like Unify and Just Made My Day, they have become invaluable. They are supported in almost every scripting language (JS, PHP, ASP, etc.), and they will save you a lot of time.
To understand their importance, think about the last time you tried to describe a TV show or song to someone but couldn’t remember its exact name. You probably said something like “It starts with a ‘P’”, or maybe you hummed a few bars. And then… Eureka! They knew what you were talking about. Regular expressions are the equivalent of humming a few bars to your script: you describe the shape of the text you want, and the script finds it.
Now, there are plenty of great resources out there to help you get started with these crafty little expressions. I won’t cover the basics here, but if you need your bearings, check out these links:
The following are the most valuable and extensible regular expressions that I use:
Disclaimer – These work for me in the context of code that I have written. They may not work in your code, but give them a whirl anyway. I hope they save you some time!
Tag Open w/ Specific Class
Find any HTML tag with a specific class; in this example, that class is “unicorns”. The match is case insensitive (the i flag) for IE, and the quotes around the attribute value are optional.
/<[^>]*class=(?:\"|\'|)[^>]*\bunicorns\b[^>]*?(?:\"|\'|)[^>]*>/i
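As a rough JavaScript illustration, the sketch below adds the g flag so that String.prototype.match returns every matching opening tag. The sample markup is made up; its last tag mimics the uppercase, unquoted style that old versions of IE could produce.

```javascript
// The "unicorns" pattern, with g added so match() returns all hits.
const unicornTag = /<[^>]*class=(?:\"|\'|)[^>]*\bunicorns\b[^>]*?(?:\"|\'|)[^>]*>/gi;

// Hypothetical sample markup.
const html =
  '<div class="big unicorns">x</div>' +
  '<p class="rainbows">y</p>' +
  '<SPAN CLASS=unicorns>z</SPAN>';

const found = html.match(unicornTag) || [];
console.log(found); // the <div class="big unicorns"> and <SPAN CLASS=unicorns> tags
```

Because [^>]* after class= can run past the class attribute, a tag such as <div class="a" title="unicorns"> also matches, so check the hits if that matters.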
PHP include
Find PHP includes on your pages. Returns the include path in variable 3.
/(include|include_once|require|require_once)\s*\(*(\"|\')([^\"\'<>]*)(\"|\')\)*;/
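In JavaScript, one way to pull every include path out of a PHP source string is String.prototype.matchAll, which requires the g flag. The source string is a made-up example.

```javascript
// PHP include pattern, with g added for matchAll.
const phpInclude = /(include|include_once|require|require_once)\s*\(*(\"|\')([^\"\'<>]*)(\"|\')\)*;/g;

const source = `<?php include 'header.php'; require_once("lib/db.php"); ?>`;

// Capture group 3 ("variable 3") holds the path.
const paths = [...source.matchAll(phpInclude)].map((m) => m[3]);
console.log(paths); // [ 'header.php', 'lib/db.php' ]
```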
SSI – Apache
Find Apache Server-Side Includes. Returns the include path in variable 2.
/<!--#include\svirtual\=(\"|\')([^\"\'<>]*)(\"|\')\s*-->/
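A short JavaScript sketch of reading variable 2 with RegExp.prototype.exec. The pattern only covers the virtual= form; Apache also accepts file=, which would need virtual replaced with (?:virtual|file).

```javascript
const ssiInclude = /<!--#include\svirtual\=(\"|\')([^\"\'<>]*)(\"|\')\s*-->/;

// Hypothetical page fragment containing one SSI directive.
const page = '<body><!--#include virtual="/inc/footer.html" --></body>';

const match = ssiInclude.exec(page);
const includePath = match ? match[2] : null; // variable 2
console.log(includePath); // /inc/footer.html
```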
PHP on the page
Find any on-page PHP in the midst of your HTML. Handy for removing scripts while cleaning up code.
/<\?(.|\n|\r)+?\?>/
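For the clean-up use case, a JavaScript sketch: add the g flag and replace every PHP block with an empty string. The lazy +? stops each match at the first ?>, so separate blocks are removed one at a time instead of swallowing the HTML between them. It is not a PHP parser, though: a ?> inside a PHP string would end the match early.

```javascript
const phpBlock = /<\?(.|\n|\r)+?\?>/g;

// Hypothetical template mixing HTML and a multi-line PHP block.
const template = '<p>Hello</p>\n<?php\n  echo "hi";\n?>\n<p>Bye</p>';

const cleaned = template.replace(phpBlock, '');
console.log(JSON.stringify(cleaned)); // "<p>Hello</p>\n\n<p>Bye</p>"
```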
PHP only file
Check the contents of a file to see if it contains only PHP, with nothing before the opening <?php or after the closing ?>.
/^<\?php([^?]|\?[^>])*\?>$/
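One practical wrinkle in JavaScript: without the m flag, $ matches only at the very end of the string, so a file ending in a newline after ?> fails the test. The hypothetical isPhpOnly helper below trims the contents first. Files that omit the closing ?> (a common convention for PHP-only files) will still not match.

```javascript
const phpOnly = /^<\?php([^?]|\?[^>])*\?>$/;

// Hypothetical helper: ignore whitespace around the PHP block.
function isPhpOnly(contents) {
  return phpOnly.test(contents.trim());
}

console.log(isPhpOnly('<?php\necho "hi";\n?>\n'));        // true
console.log(isPhpOnly('<p>Hi</p>\n<?php echo "hi"; ?>')); // false
```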
Any Specific Tag + Contents
Find specific tags and return them along with their contents. In this case, I have used the invalid and little-known “spaghetti” tag. Again, the match is case insensitive for IE.
/<spaghetti\b[^>]*>[\s\S]*?<\/\s*spaghetti\s*>/i
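A JavaScript sketch that strips the tag and its contents. It uses [\s\S]*? so the match also spans Windows (\r\n) line breaks, which . does not match in JavaScript, and \b so a longer tag name such as <spaghettini> is not caught. Nested spaghetti tags are not handled.

```javascript
const spaghettiTag = /<spaghetti\b[^>]*>[\s\S]*?<\/\s*spaghetti\s*>/gi;

// Hypothetical markup with mixed-case tags and CRLF line endings.
const menu = 'Dinner: <SPAGHETTI sauce="red">\r\n  meatballs\r\n</spaghetti> and salad.';

const result = menu.replace(spaghettiTag, '[removed]');
console.log(result); // Dinner: [removed] and salad.
```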
Meta – Charset
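Find the charset declared in a page’s <meta> tag, in either the <meta charset="…"> form or the older http-equiv Content-Type form, with double, single or no quotes. Returns the charset in variable 1. Case insensitive.
/<meta[^>]*charset=[\"\']?([^\"\'\s;>\/]+)/i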
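A JavaScript sketch running a charset pattern against a few hypothetical <meta> tags. It uses a version whose capture stops at quotes, whitespace, semicolons, > and /, so single-quoted and unquoted values come back clean.

```javascript
const metaCharset = /<meta[^>]*charset=[\"\']?([^\"\'\s;>\/]+)/i;

const samples = [
  '<meta charset="utf-8">',
  "<meta charset='ISO-8859-1' />",
  '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">',
];

const charsets = samples.map((tag) => {
  const m = metaCharset.exec(tag);
  return m ? m[1] : null; // variable 1 holds the charset name
});
console.log(charsets); // [ 'utf-8', 'ISO-8859-1', 'windows-1252' ]
```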
DocType
Find the DOCTYPE declaration in the page contents. Case insensitive.
/<!DOCTYPE[^>]*>/i
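In JavaScript, String.prototype.match returns the whole declaration, or null when the page has none. findDoctype is a hypothetical helper and the page strings are made up.

```javascript
const doctypePattern = /<!DOCTYPE[^>]*>/i;

// Hypothetical helper: returns the declaration text or null.
function findDoctype(page) {
  const m = page.match(doctypePattern);
  return m ? m[0] : null;
}

console.log(findDoctype('<!doctype html>\n<html></html>')); // <!doctype html>
console.log(findDoctype('<html></html>'));                  // null
```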
TinyMCE Attributes
Working with TinyMCE can yield unexpected attributes (those starting with mce_). This will help you filter them out. Returns the attribute plus its value in variable 1; the value may be double-quoted, single-quoted or unquoted. Case insensitive.
/<[^>]*(mce_[^=]+=(?:\"[^\"]*\"|\'[^\']*\'|[^\s>]*)\s*)/i
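Because the pattern is not global and captures one attribute per match, one way to strip them all is to loop: find a match, cut variable 1 out of the string, and search again until nothing is left. This sketch uses a version whose value part stops at the matching quote (or at whitespace or > for unquoted values), so it cannot run past the end of the tag. stripMceAttributes is a hypothetical helper name.

```javascript
const mceAttr = /<[^>]*(mce_[^=]+=(?:\"[^\"]*\"|\'[^\']*\'|[^\s>]*)\s*)/i;

// Hypothetical helper: removes mce_* attributes one at a time.
function stripMceAttributes(html) {
  let m;
  while ((m = mceAttr.exec(html)) !== null) {
    // Variable 1 sits at the very end of the match, so it starts here:
    const start = m.index + m[0].length - m[1].length;
    html = html.slice(0, start) + html.slice(start + m[1].length);
  }
  return html;
}

const dirty = '<img src="a.png" mce_src="a.png" alt="A"><a href=\'/x\' mce_href=\'/x\'>x</a>';
const cleanedHtml = stripMceAttributes(dirty);
console.log(cleanedHtml); // <img src="a.png" alt="A"><a href='/x' >x</a>
```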
Valid File Types
A simple whitelist of file extensions. It is anchored to the end of the name and case insensitive, so “photo.JPG” passes but “report.pdf.exe” does not. Start here and eliminate/add what you want.
/\.(psd|pdf|swf|sit|tar|tgz|zip|gzip|bmp|gif|jpeg|jpg|jpe|png|txt|doc|docx|xl|xls|flv|mov|qt|mpg|mpeg|mp3|aiff|aif|aac|wav|ppt|rtf|html|shtml|htm|php|cfm|phtml)$/i
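Since the list is meant to be edited, one option in JavaScript is to keep the extensions in an array and build an anchored, case-insensitive pattern with the RegExp constructor. allowedExtensions and isAllowedFile are hypothetical names; the extensions contain no regex metacharacters, so joining them with | is safe.

```javascript
// Edit this list to add or remove file types.
const allowedExtensions = [
  'psd', 'pdf', 'swf', 'sit', 'tar', 'tgz', 'zip', 'gzip', 'bmp', 'gif',
  'jpeg', 'jpg', 'jpe', 'png', 'txt', 'doc', 'docx', 'xl', 'xls', 'flv',
  'mov', 'qt', 'mpg', 'mpeg', 'mp3', 'aiff', 'aif', 'aac', 'wav', 'ppt',
  'rtf', 'html', 'shtml', 'htm', 'php', 'cfm', 'phtml',
];

// $ forces the extension to end the name; i ignores case.
const allowedFile = new RegExp('\\.(' + allowedExtensions.join('|') + ')$', 'i');

function isAllowedFile(name) {
  return allowedFile.test(name);
}

console.log(isAllowedFile('holiday.JPG'));    // true
console.log(isAllowedFile('notes.docx'));     // true
console.log(isAllowedFile('report.pdf.exe')); // false
```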
Inline Elements
A list of inline elements, for testing a bare tag name such as an element’s tagName. A negative match gives you everything else, which is mostly block-level elements. Case insensitive.
/^(?:a|abbr|acronym|b|basefont|bdo|big|br|cite|code|dfn|em|font|i|img|input|kbd|label|q|s|samp|select|small|span|strike|strong|sub|sup|textarea|tt|u|var)$/i
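A JavaScript sketch of the negative match: filter a list of tag names down to the ones that are not inline. The names are made up; in a browser they would typically come from element.tagName, which is uppercase for HTML elements, hence the i flag. This version is anchored with ^ and $ so a name must equal a whole list entry, and it includes img and sup.

```javascript
const inlineTag = /^(?:a|abbr|acronym|b|basefont|bdo|big|br|cite|code|dfn|em|font|i|img|input|kbd|label|q|s|samp|select|small|span|strike|strong|sub|sup|textarea|tt|u|var)$/i;

const tagNames = ['SPAN', 'div', 'IMG', 'table', 'em', 'p'];

// Negative match: everything that is not on the inline list.
const notInline = tagNames.filter((name) => !inlineTag.test(name));
console.log(notInline); // [ 'div', 'table', 'p' ]
```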
Email
This is not my own. It has had a long life on the internet before now, but here it is again. Match a proper email pattern.
/^[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+)*@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
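A quick JavaScript check of an email pattern against a few made-up addresses. This sketch uses a version that allows uppercase letters after a dot in the local part, leaves the comma out of the local part, and accepts top-level domains of two or more letters (such as .museum).

```javascript
const emailPattern = /^[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+)*@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const addresses = ['John.Smith@example.com', 'jo@example.museum', 'a,b@example.com', 'no-at-sign.com'];
const results = addresses.map((addr) => emailPattern.test(addr));
console.log(results); // [ true, true, false, false ]
```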
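Multiple Emails
I adapted this one from the pattern above. Make sure every email in a comma-separated list matches the proper pattern; whitespace around the commas is allowed.
/^\s*[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+)*@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(\s*,\s*[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+\/=\?\^_`\{\|}~-]+)*@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})*\s*$/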
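A JavaScript sketch that builds the list pattern from a single-address string, so the long address pattern is written only once. It is anchored at both ends (without the leading ^, text that merely ends in a valid address would pass) and allows whitespace around each comma. atom, oneEmail and emailList are hypothetical names; the inputs are made up.

```javascript
// One address, without ^ or $.
const atom = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+";
const oneEmail = atom + '(?:\\.' + atom + ')*@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}';

// The whole string must be addresses separated by commas.
const emailList = new RegExp('^\\s*' + oneEmail + '(?:\\s*,\\s*' + oneEmail + ')*\\s*$');

const lists = ['a@b.com, c.d@e.org', 'a@b.com', 'junk a@b.com', 'a@b.com,'];
const listResults = lists.map((s) => emailList.test(s));
console.log(listResults); // [ true, true, false, false ]
```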
Strong Password
Five or more characters, with at least one lowercase letter, one uppercase letter and one numeral. Only letters, digits and underscores are allowed.
/^\w*(?=\w*\d)(?=\w*[a-z])(?=\w*[A-Z])\w{5,}$/
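A JavaScript sketch testing the pattern against a few made-up passwords. Each (?=...) lookahead checks one requirement without consuming characters, and \w{5,}$ then checks the length and the allowed characters, which is why the last candidate, containing “!”, is rejected.

```javascript
const strongPassword = /^\w*(?=\w*\d)(?=\w*[a-z])(?=\w*[A-Z])\w{5,}$/;

const candidates = ['Abc12', 'abc12', 'ABC12', 'Ab1', 'Abc1!'];
const verdicts = candidates.map((pw) => strongPassword.test(pw));
console.log(verdicts); // [ true, false, false, false, false ]
```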
Punctuation – Sentence End
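Text too dull? Make every sentence an EXCLAMATION! This matches the run of question marks, exclamation marks or periods at the end of a sentence, ready to be replaced with “!”.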
/(\?|\!|\.)+$/
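In JavaScript, one way to apply it is to strip any closing punctuation and append a single “!”. Each string is treated as one sentence, since the pattern has no g or m flag and only looks at the end of the string. excite is a hypothetical helper and the sentences are made up.

```javascript
const endPunctuation = /(\?|\!|\.)+$/;

// Hypothetical helper: swap closing punctuation for a single "!".
function excite(sentence) {
  return sentence.replace(endPunctuation, '') + '!';
}

const sentences = ['This is fine.', 'Is it?', 'Wait...', 'No punctuation'];
const excited = sentences.map(excite);
console.log(excited); // [ 'This is fine!', 'Is it!', 'Wait!', 'No punctuation!' ]
```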
Valid Domain
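Matches a proper domain pattern: dot-separated labels of letters, digits and hyphens (no label starting or ending with a hyphen), ending in an alphabetic top-level domain of at least two letters.
/^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/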
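A JavaScript sketch of checking host names with an anchored domain pattern, in which each label starts and ends with a letter or digit and the name ends in an alphabetic top-level domain of two or more letters. The host names are made up. This pattern rejects IP addresses and internationalized (punycode) top-level domains.

```javascript
const domainPattern = /^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/;

const hosts = ['example.com', 'sub.example.co.uk', '-bad.com', 'example', 'a..com'];
const hostResults = hosts.map((h) => domainPattern.test(h));
console.log(hostResults); // [ true, true, false, false, false ]
```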
Sub-Domain
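Returns the first label of a domain, plus its trailing dot, in variable 1 (for example, “www.” from “www.example.com”). It simply takes everything up to the first dot, so on a bare domain like “example.com” it returns “example.”, which is not a sub-domain.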
/^([^\.]+\.)/
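In JavaScript, variable 1 is index 1 of the match array. The second call shows the caveat: a bare domain still yields a result. leadingLabel is a hypothetical helper.

```javascript
const firstLabel = /^([^\.]+\.)/;

// Hypothetical helper: returns variable 1, or null when there is no dot.
function leadingLabel(host) {
  const m = host.match(firstLabel);
  return m ? m[1] : null;
}

console.log(leadingLabel('www.example.com')); // www.
console.log(leadingLabel('example.com'));     // example.
console.log(leadingLabel('localhost'));       // null
```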
No Special Characters
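Make sure that a string contains only letters, digits, periods, hyphens, commas and whitespace, and no other special characters. Note that spaces are still not URL safe; encode or replace them before the string goes into a URL.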
/^[a-zA-Z0-9\.\-,\s]+$/
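A JavaScript sketch of the check, followed by encodeURIComponent to show what still happens to the allowed spaces and commas once the string is encoded for a URL. The input strings are made up.

```javascript
const plainText = /^[a-zA-Z0-9\.\-,\s]+$/;

const inputs = ['my-page.html', 'Hello, world', 'caf\u00e9', 'a/b?c'];
const plainResults = inputs.map((s) => plainText.test(s));
console.log(plainResults); // [ true, true, false, false ]

// Passing the check does not make a string URL-safe as-is:
const encoded = encodeURIComponent('Hello, world');
console.log(encoded); // Hello%2C%20world
```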
Do you have any regular expressions you find useful?