Java supports regular expressions, as do many other programming languages, and the syntax is very similar across them. The table below lists the special characters used in regular expressions.
Special Characters |
|
. character | matches any single character except a line terminator (unless the DOTALL flag is set); the combination .* is greedy and tries to match as much as possible |
+ character | matches one or more occurrences of the preceding character |
[ ] character | enables you to define patterns that match one of a group of alternatives; you can also use ranges such as [0-9] or [a-zA-Z] (no comma between ranges, since a comma inside the brackets would itself be matched) |
* character | match zero or more occurrences of the preceding character |
? character | match zero or one occurrence of the preceding character |
Pattern anchor | there are a number of pattern anchors: match at the beginning of a string (^ or \A), match at the end of a string ($ or \Z), match on a word boundary (\b) and match inside a word (\B, the opposite of \b) |
Escape sequence | if you want to include a character that is normally treated as a special character, precede it with a backslash; you can also use \Q to treat everything after it as literal text until \E is seen. In a Java string literal each backslash must itself be doubled, for example "\\+salary" |
Excluding | you can exclude characters by using ^ as the first character inside square brackets, as in [^abc] |
Character-Range escape sequences | there are special character range escape sequences such as any digit (\d), anything other than a digit (\D) |
Specified number of occurrences | you can define how many occurrences you want to match using {<minimum>,<maximum>}; a single number such as {3} means exactly that many occurrences |
specify choice | the special character | (pipe) enables you to specify two or more alternatives to choose from when matching a pattern |
Portion reuse | sometimes you want to store what has been matched; you can do this by using (). The first set is stored in \1 (used inside the pattern) or $1 (used in a replacement string, for example with replaceAll()), the second set in \2 or $2, and so on. In Java the captured text is also available through Matcher.group(1), Matcher.group(2), etc. |
Different delimiter | this is a Perl feature; Java patterns are ordinary strings passed to Pattern.compile(), so there is no delimiter to change |
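Several rows in the table above (escape sequences and portion reuse in particular) are easier to follow with a little Java in front of you. The following is only a minimal sketch: the class name, the method names and the sample strings are made up for illustration, and it uses only String.replaceAll(), Pattern.quote() and the Matcher group methods from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, written only to illustrate the table rows above.
class GroupAndEscapeSketch {

    // Collapses a doubled word ("is is") into a single word.
    // \1 refers to the first group inside the pattern, $1 inside the replacement.
    // Every regex backslash is doubled because it sits in a Java string literal.
    static String collapseDoubledWords(String text) {
        return text.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // Reads the captured groups through the Matcher API instead of $1/$2.
    static String swapNameOrder(String fullName) {
        Matcher m = Pattern.compile("(\\w+) (\\w+)").matcher(fullName);
        if (!m.matches()) {
            return fullName; // not "first last", leave it unchanged
        }
        return m.group(2) + ", " + m.group(1);
    }

    // Pattern.quote() turns arbitrary text into a literal pattern,
    // which has the same effect as surrounding it with \Q and \E by hand.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubledWords("this is is a test")); // this is a test
        System.out.println(swapNameOrder("Ada Lovelace"));             // Lovelace, Ada
        System.out.println(containsLiteral("rate: **++", "**++"));     // true
    }
}
```

Using Pattern.quote() rather than typing \Q and \E yourself is a matter of taste; it is mostly handy when the literal text comes from a variable.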
Special Characters Examples |
|
. character | d.f # could match words like def, dif, duf |
| d.*f # could match words like deaf, deef, def, dzzf, etc. |
+ character | de+f # could match words like def, deef, deeef, deeeef, etc. |
| " +" # a space followed by +; matches one or more consecutive spaces between words |
[ ] character | d[eE]f # matches def or dEf |
| d[a-z]f # matches words like daf, def, dzf, dsf, etc. |
* character | de*f # match words like df, def, deef, deeef, etc |
? character | de?f # matches only df and def (not deef, since ? allows at most one occurrence) |
Pattern anchors | ^hello # matches only if the line starts with hello |
| \Bdef # matches the def in abcdef (opposite of \b) |
Escape sequence | \+salary # matches the text +salary; the + (plus) is treated as a normal character because of the \ |
| \Q**++\E # matches **++ |
Excluding | d[^eE]f # 1st character is d, 2nd character is anything other than e or E, last character is f |
Character-Range escape sequences | \d # matches any single digit |
| \d+ # matches one or more digits |
Specified number of occurrences | de{3}f # matches only deeef; {3} means exactly three of the preceding e |
| de{1,3}f # matches only def, deef and deeef (minimum = 1, maximum = 3 occurrences) |
specify choice | def|ghi # match either def or ghi |
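A quick way to convince yourself what a pattern does is to run it. The sketch below feeds a few of the patterns from the table to java.util.regex; the class and the two helper methods (wholeMatch and foundIn) are hypothetical names chosen for this illustration. wholeMatch uses Pattern.matches(), which requires the entire input to match, while foundIn uses Matcher.find(), which looks for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

// Hypothetical checker class, written only to try out the table's patterns.
class TableExamplesCheck {

    // True only if the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("d.f", "dif"));          // true
        System.out.println(wholeMatch("de*f", "df"));          // true
        System.out.println(wholeMatch("de?f", "deef"));        // false
        System.out.println(wholeMatch("de{3}f", "deeef"));     // true, exactly three e's
        System.out.println(wholeMatch("de{3}f", "deeeef"));    // false, four e's
        System.out.println(wholeMatch("de{1,3}f", "def"));     // true
        System.out.println(wholeMatch("d[^eE]f", "dEf"));      // false
        System.out.println(wholeMatch("def|ghi", "ghi"));      // true
        System.out.println(wholeMatch("\\d+", "2024"));        // true
        System.out.println(wholeMatch("\\Q**++\\E", "**++"));  // true
        System.out.println(foundIn("^hello", "hello world"));  // true
        System.out.println(foundIn("\\Bdef", "abcdef"));       // true
        System.out.println(foundIn("\\Bdef", "abc def"));      // false, def starts a word
    }
}
```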
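The simplest way to learn regular expressions in Java is to work through some examples, as they are not very difficult. The examples below are fragments that belong inside a method such as main() and use the Pattern and Matcher classes from the java.util.regex package.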
Basic Example |
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The statements below belong inside a method such as main().
    Pattern pat;
    Matcher mat;
    boolean found;

    pat = Pattern.compile("Java");
    mat = pat.matcher("Java");
    found = mat.matches(); // matches() succeeds only if the entire input matches

    System.out.println("Testing Java against Java.");
    if (found) System.out.println("Matches");
    else System.out.println("No Match");

    System.out.println();

    System.out.println("Testing Java against Java 8.");
    mat = pat.matcher("Java 8"); // create a new matcher
    found = mat.matches(); // false here, because " 8" is left over

    if (found) System.out.println("Matches");
    else System.out.println("No Match");
|
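Using a quantifier |
    // W+ matches one or more consecutive W characters.
    Pattern pat = Pattern.compile("W+");
    Matcher mat = pat.matcher("W WW WWW");

    // find() moves on to the next match each time it is called.
    while (mat.find())
        System.out.println("Match: " + mat.group());
|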
Using a wildcard and quantifier |
    // .+ is greedy, so the match runs from the first e to the last d it can reach.
    Pattern pat = Pattern.compile("e.+d");
    Matcher mat = pat.matcher("extend cup end table");

    while (mat.find())
        System.out.println("Match: " + mat.group());
|
Using the ? quantifier |
    // Use reluctant matching behavior: .+? stops at the first d it can reach.
    Pattern pat = Pattern.compile("e.+?d");
    Matcher mat = pat.matcher("extend cup end table");

    while (mat.find())
        System.out.println("Match: " + mat.group());
|
Using replaceAll() |
    String str = "Jon Jonathan Frank Ken Todd";

    // Matches Jon, then as few characters as possible up to the next space.
    Pattern pat = Pattern.compile("Jon.*? ");
    Matcher mat = pat.matcher(str);

    System.out.println("Original sequence: " + str);
    str = mat.replaceAll("Eric ");
    System.out.println("Modified sequence: " + str);
|
Using split() |
    // Split the input wherever a space, comma, period or exclamation mark occurs.
    Pattern pat = Pattern.compile("[ ,.!]");
    String[] strs = pat.split("one two,alpha9 12!done.");

    for (String token : strs)
        System.out.println("Next token: " + token);
|
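To see what the examples above actually print, they can be gathered into one small class and run. This is only a sketch: the class name and the findAll helper are invented here for convenience and are not part of java.util.regex. The patterns and input strings are the ones used in the examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class that reruns the examples above in one place.
class RegexExamplesSketch {

    // Collects every match that find() reports, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            matches.add(mat.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "extend cup end table";

        // matches() needs the whole input to match, find() only part of it.
        Pattern java = Pattern.compile("Java");
        System.out.println(java.matcher("Java 8").matches()); // false
        System.out.println(java.matcher("Java 8").find());    // true

        System.out.println(findAll("W+", "W WW WWW")); // [W, WW, WWW]
        System.out.println(findAll("e.+d", text));     // [extend cup end]  (greedy)
        System.out.println(findAll("e.+?d", text));    // [extend, end]     (reluctant)

        String names = "Jon Jonathan Frank Ken Todd";
        System.out.println(names.replaceAll("Jon.*? ", "Eric ")); // Eric Eric Frank Ken Todd

        String[] tokens = Pattern.compile("[ ,.!]").split("one two,alpha9 12!done.");
        System.out.println(Arrays.toString(tokens)); // [one, two, alpha9, 12, done]
    }
}
```

Note how the greedy e.+d swallows everything up to the last d, while the reluctant e.+?d yields two separate matches, and how split() drops the empty string that would otherwise follow the final period.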