Regular Expression Code Snippets

Posted by Shaun Geisert Wednesday, June 29, 2011 2:51:00 PM

I know I'm duplicating another page on this site, but it's easier to get here from the home page and I'm lazy. More regexes are at http://regexlib.com

Zip Code

Matches 80523 and 80523-1000

^\d{5}([\-]\d{4})?$
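
Here's a rough sketch of using this pattern from C#. The class and method names (ZipCodeCheck, IsZipCode) are made up for illustration and don't come from any library:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ZipCodeCheck
{
    // Same pattern as above. The verbatim string (@"...") keeps the backslashes literal.
    private static readonly Regex ZipPattern = new Regex(@"^\d{5}([\-]\d{4})?$");

    // True for a five-digit ZIP, optionally followed by a hyphen and four more digits.
    public static bool IsZipCode(string input)
    {
        return input != null && ZipPattern.IsMatch(input);
    }
}
```

With this in place, ZipCodeCheck.IsZipCode("80523-1000") should come back true, while a four-digit value such as "8052" should not.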

Campus Delivery Code

Matches four digits

^\d{4}$

Phone Number

Matches phone numbers with or without extensions, e.g. (555) 555-5555 and (555) 555-5555 123

^\+?\(?\d+\)?(\s|\-|\.)?\d{1,3}(\s|\-|\.)?\d{4}[\s]*[\d]*$
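
If the extension is needed separately, one option is to wrap the trailing digits in a named group. This is only a sketch: the group name "ext" and the PhoneExtension class are assumptions added for illustration, and the rest of the pattern is unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names and the "ext" group are illustrative only.
public static class PhoneExtension
{
    // The pattern above, with the trailing digits wrapped in a named group.
    private static readonly Regex PhonePattern = new Regex(
        @"^\+?\(?\d+\)?(\s|\-|\.)?\d{1,3}(\s|\-|\.)?\d{4}[\s]*(?<ext>[\d]*)$");

    // Returns the extension ("" if there is none), or null if the input does not match.
    public static string GetExtension(string input)
    {
        if (input == null) return null;
        Match m = PhonePattern.Match(input);
        return m.Success ? m.Groups["ext"].Value : null;
    }
}
```

A number without an extension gives back an empty string, so callers can tell "no extension" apart from "not a phone number" (null).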

Email

Matches an email address

([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})

Here's another that conforms to RFC 3696 specs:

^(?:(?:[^@,"\[\]\x5c\x00-\x20\x7f-\xff\.]|\x5c(?=[@,"\[\]\x5c\x00-\x20\x7f-\xff]))(?:[^@,"\[\]\x5c\x00-\x20\x7f-\xff\.]|(?<=\x5c)[@,"\[\]\x5c\x00-\x20\x7f-\xff]|\x5c(?=[@,"\[\]\x5c\x00-\x20\x7f-\xff])|\.(?=[^\.])){1,62}(?:[^@,"\[\]\x5c\x00-\x20\x7f-\xff\.]|(?<=\x5c)[@,"\[\]\x5c\x00-\x20\x7f-\xff])|"(?:[^"]|(?<=\x5c)"){1,62}")@(?:(?:[a-z0-9][a-z0-9-]{1,61}[a-z0-9]\.?)+\.[a-z]{2,6}|\[(?:[0-1]?\d?\d|2[0-4]\d|25[0-5])(?:\.(?:[0-1]?\d?\d|2[0-4]\d|25[0-5])){3}\])$

Website URL

Matches a URL (loosely; URLs are hard to predict)

((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&]*)

Another that follows RFC Guidelines

(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?

FTP URL

Matches an FTP address

((ftp):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&]*)

CSU ID

Matches a valid CSU ID (nine digits, starting with 8)

[8]\d{8}
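
One thing to watch: this pattern has no ^ and $ anchors, so Regex.IsMatch will also accept a longer string that merely contains a nine-digit run starting with 8. (I believe the ASP.NET RegularExpressionValidator requires the whole value to match, so it may not matter there.) A sketch of an anchored variant follows; the CsuIdCheck class and the anchored pattern are my assumptions about the intent.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class CsuIdCheck
{
    // Unanchored, as written above.
    public const string Unanchored = @"[8]\d{8}";

    // Anchored variant (assumed intent: exactly nine digits, the first being 8).
    public const string Anchored = @"^8\d{8}$";

    public static bool IsCsuId(string input)
    {
        return input != null && Regex.IsMatch(input, Anchored);
    }
}
```

For a ten-digit input such as "8123456789", the unanchored pattern still finds a match inside the string, while IsCsuId rejects it.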


Image Validation (eg, using FileUpload control)

Matches a valid jpg, gif, or png file

^.+\.((jpg)|(JPG)|(gif)|(GIF)|(jpeg)|(JPEG)|(png)|(PNG))$
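
The pattern above lists each extension in all-lowercase and all-uppercase form, so a mixed-case name like photo.Jpg slips through. A possible alternative, sketched here with made-up names, is to list each extension once and pass RegexOptions.IgnoreCase:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ImageFileCheck
{
    // Each extension listed once; IgnoreCase covers JPG, Jpg, jpg, and so on.
    private static readonly Regex ImagePattern =
        new Regex(@"^.+\.(jpg|jpeg|gif|png)$", RegexOptions.IgnoreCase);

    // Checks the file name only; it says nothing about the file's actual contents.
    public static bool IsImageFileName(string fileName)
    {
        return fileName != null && ImagePattern.IsMatch(fileName);
    }
}
```

With a FileUpload control, the value passed in would presumably be the uploaded file's name.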


GPA

Matches a valid GPA in US format (0.0 - 4.0)

^([0-3](\.\d{1,2})?|4(\.00?)?)$
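
A note on alternation: ^ and $ bind more tightly than |, so in a pattern with top-level alternatives the anchors only apply to the first and last branch unless the branches are grouped. The sketch below contrasts an ungrouped GPA pattern with a grouped one. The GpaCheck class is hypothetical, and the grouped pattern is my assumption about what counts as a valid GPA (0 through 3 with up to two decimals, or 4, 4.0, 4.00).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class GpaCheck
{
    // Ungrouped alternation, kept only for comparison: ^ applies to the first
    // branch and $ to the last, so a string like "0abc" matches.
    public const string Ungrouped = @"^[0]|[0-3]\.(\d?\d?)|[4].[0]$";

    // Grouped variant (assumed intent): the anchors now apply to every branch.
    public const string Grouped = @"^([0-3](\.\d{1,2})?|4(\.00?)?)$";

    public static bool IsGpa(string input)
    {
        return input != null && Regex.IsMatch(input, Grouped);
    }
}
```

The grouped version accepts values like 3.75 and 4.0 and rejects 4.1 and "0abc"; the ungrouped one lets "0abc" through.
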

Tools

Here's a link to an easy-to-use (and free) regex tool: http://www.codeproject.com/KB/dotnet/expresso.aspx

Here's another, to an online regex library: http://regexlib.com

Basic HTML Tag Stripper:

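        using System.Text.RegularExpressions;

        /// <summary>
        /// Strip all html tags, even the clean ones
        /// </summary>
        /// <param name="text">The HTML to strip</param>
        /// <returns>The text with every tag removed</returns>
        public static string stripHTML(string text)
        {
            // (.|\n) matches any character, including newlines; *? stops at the first ">"
            return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
        }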
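
For what it's worth, a negated character class should do the same job without the per-character alternation. This is a sketch, not a drop-in replacement; the TagStripSketch class and StripTags method are made-up names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class TagStripSketch
{
    // "<", then any run of characters that are not ">", then ">".
    // [^>] also matches newlines, so tags that span lines are still removed.
    public static string StripTags(string text)
    {
        return Regex.Replace(text, @"<[^>]*>", string.Empty);
    }
}
```

Either pattern treats any "<" followed later by ">" as a tag, so plain text such as "1 < 2 and 3 > 2" loses everything between the two symbols. That's fine for a basic stripper, but it's worth knowing before running this over arbitrary text.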

"Dirty" Word HTML (C# example):

        using System.Collections.Specialized;
        using System.Text.RegularExpressions;

        /// <summary>
        /// Strip dirty Word HTML
        /// </summary>
        /// <param name="text">The Word-generated HTML to clean</param>
        /// <returns>The cleaned HTML</returns>
        public static string stripWordHTML(string text)
        {
            // industrial grade Word HTML cleaner
            // courtesy of http://www.codinghorror.com/blog/2006/01/cleaning-words-nasty-html.html

            StringCollection sc = new StringCollection();
            // get rid of unnecessary tag spans (comments and title)
            sc.Add(@"<!--(\w|\W)+?-->");
            sc.Add(@"<title>(\w|\W)+?</title>");
            // Get rid of classes and styles
            sc.Add(@"\s?class=\w+");
            sc.Add(@"\s+style='[^']+'");
            // Get rid of unnecessary tags
            sc.Add(
            @"<(meta|link|/?o:|/?style|/?div|/?st\d|/?head|/?html|body|/?body|/?span|!\[)[^>]*?>");
            // Get rid of empty paragraph tags
            sc.Add(@"(<[^>]+>)+&nbsp;(</\w+>)+");
            // remove bizarre v: element attached to <img> tag
            sc.Add(@"\s+v:\w+=""[^""]+""");
            // remove extra blank lines (Windows line endings are \r\n, not \n\r)
            sc.Add(@"(\r\n){2,}");
            foreach (string s in sc)
            {
                text = Regex.Replace(text, s, "", RegexOptions.IgnoreCase);
            }

            return text;
        }
