Simplify Pattern Matching
Use java.util.regex
By: Anant Athale
Apr. 7, 2005 12:00 AM
Pattern matching with regular expressions can automate many text-processing operations, such as search and replace, input validation, text conversion, and filtering. Work that would otherwise require a significant amount of code can be done in just a few lines, because the regular expression engine does the heavy lifting. Some programming languages, such as Perl, and operating system utilities, such as grep, have supported regular expressions for years. Before J2SE 1.4, however, Java (J2SDK) didn't support them, and one had to use external packages such as Jakarta Regexp or IBM's commercial package (com.ibm.regex). Thankfully, that changed with the introduction of the java.util.regex package, which provides standard implementations for specifying and handling regular expressions. This article shows how you can quickly use the package to implement pattern-based search features. It starts by reviewing some important regular expression fundamentals and then dives into the details of the package. The embedded examples demonstrate the important constructs through simple use cases.
A regular expression is a mechanism for specifying a textual pattern and detecting the presence of that pattern in a given character sequence. In other words, it's a pattern language. A regular expression pattern is typically specified as a combination of two types of characters: literals and meta-characters. Literals are normal text characters (a, b, c, 1, 2), while meta-characters (for example, * and $) convey a special meaning to the regular expression engine; they are discussed in the next few sections. The regular expression engine understands the pattern language: it interprets the regular expression, performs the pattern match, and processes the results. The language and the engine together make regular expressions a powerful tool that simplifies pattern matching. Implementations such as java.util.regex and JRegex also provide query and utility functions (replace, split, etc.) that are useful for modifying the target text. For details about other Java implementations and implementations available in other languages, please consult the references section.
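The distinction between literals and meta-characters, and the replace and split utilities mentioned above, can be illustrated with a minimal sketch. The class name RegexBasics, its method names, and the sample patterns are hypothetical and chosen only for illustration; they do not come from the article's listings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; all names and patterns here are illustrative.
class RegexBasics {
    // Only literals: the characters c, a, t must appear exactly as written.
    static boolean containsCat(String text) {
        return Pattern.compile("cat").matcher(text).find();
    }

    // Literals plus meta-characters: '*' means "zero or more of the
    // preceding character" and '$' anchors the match at the end of the input.
    static boolean endsWithCats(String text) {
        return Pattern.compile("cats*$").matcher(text).find();
    }

    // Utility function: replace every run of spaces with a single space.
    static String collapseSpaces(String text) {
        return text.replaceAll(" +", " ");
    }

    // Utility function: split on a comma followed by optional spaces.
    static String[] splitOnCommas(String text) {
        return text.split(", *");
    }

    public static void main(String[] args) {
        System.out.println(containsCat("concatenate"));
        System.out.println(endsWithCats("two cats"));
        System.out.println(endsWithCats("cat food"));
        System.out.println(collapseSpaces("a   b"));
        System.out.println(Arrays.toString(splitOnCommas("a, b,c")));
    }
}
```

Both String.replaceAll() and String.split() take a regular expression as their first argument, so the same pattern language applies to searching and to modifying text.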
Meta-Characters
Anchors
Character Classes, Class Shorthands and Alternation
Inside a character class, the hyphen (-) is a special class meta-character that specifies a range of values, so the class [a-z] matches any single lowercase letter from a through z. A class shorthand is a compact notation for a commonly used class, such as digit (\d), word character (\w), and whitespace (\s). The class shorthands available in Java are shown in Listing 1.
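As a rough illustration of ranges, class shorthands, and alternation, consider the following sketch. The class CharacterClassDemo and its method names are hypothetical, and the patterns are examples rather than the contents of Listing 1. Note that each backslash in a shorthand must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and patterns are illustrative only.
class CharacterClassDemo {
    // Range inside a character class: one or more lowercase letters a to z.
    static boolean isLowercaseWord(String s) {
        return Pattern.matches("[a-z]+", s);
    }

    // Class shorthand \d: one or more digits.
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // Class shorthand \w: letters, digits, and the underscore.
    static boolean isWordChars(String s) {
        return Pattern.matches("\\w+", s);
    }

    // Alternation: the whole input must be either "yes" or "no".
    static boolean isYesOrNo(String s) {
        return Pattern.matches("yes|no", s);
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseWord("regex"));
        System.out.println(isDigits("2005"));
        System.out.println(isWordChars("user_42"));
        System.out.println(isYesOrNo("maybe"));
    }
}
```

Pattern.matches() succeeds only if the entire input matches the pattern, which is why no anchors are needed in these examples.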
Quantifiers
Mode Modifiers
Example 1: Input Validation
Listings 2 and 3 show two possible solutions to the same problem. The first approach (Listing 2) uses the built-in regular expression support in the java.lang.String matches() method. The second approach (Listing 3) uses the classes provided by the java.util.regex package. The underlying mechanics are the same in either case and are discussed next; the API specifics are left to the next section.
Let's see how the solution meets the specified requirements. The regular expression pattern on line 3 (Listing 2) is the same as the Pattern pContent (line 5, Listing 3). The pattern combines several meta-characters: the character class [a-z], the class shorthand \d (short for the character class [0-9]), and the greedy quantifiers * and +. In the solution context, the pattern "\\b(?i)([a-z]*\\d+[a-z]*)\\b" succeeds if, between the word boundaries, there are zero or more letters followed by one or more digits followed by zero or more letters. The mode modifier (?i) indicates that the search is case-insensitive.
Notice that there are a couple of differences between the regular expressions in the two listings. The obvious one is the use of comments in Listing 3. The other difference is more subtle but important. Did you find it? Check the next section (Capturing, Grouping) to verify your answer.
The pattern on line 4 (Listing 2) addresses the password-length requirement. It uses the {min,max} quantifier, which imposes minimum and maximum limits on the number of successful matches. In this case, the pattern "\\b(?i)([a-z0-9]){6,32}\\b" succeeds if there are between six and 32 alphanumeric characters between the word boundaries. Notice that in Listing 3 the case-insensitive option is specified using the final variables in the class Pattern, which makes the expression more readable. The variables are discussed further in the following sections.
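Since Listings 2 and 3 are not reproduced here, the following is a minimal sketch of the two approaches under stated assumptions, not the author's code. It reuses the two patterns quoted in the text; the class name PasswordCheck, the method names, and the decision to require both patterns to match are assumptions made for illustration. The name pContent follows the text, while pLength is a hypothetical counterpart.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction; not the article's Listing 2 or Listing 3.
class PasswordCheck {
    // Patterns quoted in the text, with the (?i) mode modifier embedded.
    static final String CONTENT = "\\b(?i)([a-z]*\\d+[a-z]*)\\b";
    static final String LENGTH = "\\b(?i)([a-z0-9]){6,32}\\b";

    // Approach 1: String.matches(), which must match the whole string.
    static boolean isValidWithString(String password) {
        return password.matches(CONTENT) && password.matches(LENGTH);
    }

    // Approach 2: precompiled Pattern objects, with case-insensitivity
    // requested through the Pattern.CASE_INSENSITIVE constant instead of (?i).
    static final Pattern pContent =
            Pattern.compile("\\b([a-z]*\\d+[a-z]*)\\b", Pattern.CASE_INSENSITIVE);
    static final Pattern pLength =
            Pattern.compile("\\b([a-z0-9]){6,32}\\b", Pattern.CASE_INSENSITIVE);

    static boolean isValidWithPattern(String password) {
        return pContent.matcher(password).matches()
                && pLength.matcher(password).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"abc123", "Pass1word", "abcdef", "ab1"}) {
            System.out.println(candidate + ": "
                    + isValidWithString(candidate) + " / "
                    + isValidWithPattern(candidate));
        }
    }
}
```

Compiling the patterns once and reusing them avoids recompiling the expression on every call, which is what String.matches() does internally. That is one reason to prefer the second approach when the same check runs many times.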