Chapter 14 - Review

Output

What function is used to match a regular expression? What function is used to find all matches of a regular expression? What function is used to replace matches of a regular expression?

preg_match() finds the first match of a regular expression in a subject. It returns 1 if the pattern matches, 0 if it does not, or false on error (use the === operator to tell 0 from false).
preg_match_all() finds all matches of a regular expression in a subject. It returns the number of full matches (0 if none, false on error). Either function, if passed a third argument, stores the matches in it as an array.
preg_replace() replaces every part of the subject that matches the pattern with a replacement. It returns the subject (a string or an array) with the substitutions made; subjects without a match are returned unchanged.
preg_filter() is similar to preg_replace(), but it returns only the subjects in which a match was found and replaced.
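
As a minimal sketch of the four functions side by side (the subject strings and variable names here are made up for illustration):

```php
<?php
$subject = 'cat bat rat';

// preg_match() stops at the first match and returns 1, 0, or false.
$found = preg_match('/[cbr]at/', $subject, $first);

// preg_match_all() returns the number of full matches and collects them all.
$count = preg_match_all('/[cbr]at/', $subject, $all);

// preg_replace() returns the subject with every match substituted.
$replaced = preg_replace('/[cbr]at/', 'dog', $subject);

// With an array of subjects, preg_replace() keeps unmatched elements,
// while preg_filter() drops them (original keys are preserved).
$words = ['cat', 'cow', 'rat'];
$kept = preg_replace('/at$/', 'AT', $words);
$filtered = preg_filter('/at$/', 'AT', $words);
```

Here $first[0] holds 'cat', $all[0] holds all three words, and $filtered contains only the two elements that were actually changed.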

What characters can you use and not use to delineate a regular expression?

When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Often used delimiters are forward slashes (/), hash signs (#) and tildes (~). The following are all examples of valid delimited patterns.
/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%
If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.
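
The readability point can be sketched with a URL, where the usual slash delimiter collides with the slashes in the pattern (the URL is a made-up example):

```php
<?php
$url = 'http://example.com/docs/index.php';

// With "/" as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match('/^http:\/\/[^\/]+\/docs\//', $url);

// With "#" as the delimiter, the same pattern needs no escaping.
$withHashes = preg_match('#^http://[^/]+/docs/#', $url);
```

Both calls match the same text; only the second is easy to read.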

How do you match a literal character or string of characters?

Most characters stand for themselves in a pattern, and match the corresponding characters in the subject.

What are meta-characters? How do you escape a meta-character?

Meta-characters are encoded in the pattern and do not stand for themselves but instead are interpreted in some special way. They include:

\ general escape character with several uses
^ assert start of subject (or line, in multiline mode)
$ assert end of subject or before a terminating newline (or end of line, in multiline mode)
. match any character except newline (by default)
[ start character class definition
] end character class definition
| start of alternative branch
( start subpattern
) end subpattern
? extends the meaning of (, also 0 or 1 quantifier, also makes greedy quantifiers lazy (see repetition)
* 0 or more quantifier
+ 1 or more quantifier
{ start min/max quantifier
} end min/max quantifier

To escape a meta-character, prepend the backslash.
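
A short sketch of escaping, both by hand and with preg_quote(), which escapes meta-characters in arbitrary text (the price string is an assumed example):

```php
<?php
$price = 'Total: $5.00 (approx.)';

// Escaped by hand: "\$" and "\." match a literal dollar sign and period.
$manual = preg_match('/\$5\.00/', $price);

// Left unescaped, "." matches any character, so "5x00" matches too.
$loose = preg_match('/5.00/', '5x00');

// preg_quote() escapes the meta-characters for us; the second argument
// names the delimiter so that it is escaped as well.
$quoted = preg_quote('$5.00 (approx.)', '/');
$viaQuote = preg_match('/' . $quoted . '/', $price);
```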

What meta-character do you use to bind a pattern to the beginning of a string? To the end?

The ^ marks the beginning of a string and the $ is used to mark the end.
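
For instance, a small sketch with made-up subjects:

```php
<?php
// ^ requires the match to start at the beginning of the subject.
$startsWith = preg_match('/^php/', 'php rocks');
$notAtStart = preg_match('/^php/', 'I like php');

// $ requires the match to end at the end of the subject.
$endsWith = preg_match('/php$/', 'I like php');

// Together they require the whole subject to match.
$whole = preg_match('/^php$/', 'php');

// By default $ also matches just before a final newline.
$beforeNewline = preg_match('/php$/', "I like php\n");
```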

How do you create subpatterns (aka groupings)?

Subpatterns are contained within parentheses, which can be nested. Subpatterns allow for localizing a set of alternative strings. Parentheses also make the subpattern capturing, which sends the matching substring back to the calling function as a numbered match. For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)), the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
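
The "red king" example can be checked directly; this sketch only adds delimiters and a variable name ($m) to the pattern above:

```php
<?php
// Groups are numbered by the position of their opening parenthesis.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $m);

// $m[0] is the full match; $m[1], $m[2], $m[3] are the captured groups.
$fullMatch = $m[0];
$outer     = $m[1];
$colour    = $m[2];
$piece     = $m[3];
```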

What are the quantifiers? How do you require 0 or 1 of a character or string? 0 or more? 1 or more? Precisely X occurrences? A range of occurrences? A minimum of occurrences?

The general repetition quantifier specifies a minimum and maximum number of permitted matches, by giving the two numbers in curly brackets (braces), separated by a comma. Common quantifiers are:

? 0 or 1
* 0 or more
+ 1 or more
{x} Exactly x occurrences
{x,y} Between x and y (inclusive)
{x,} At least x occurrences
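
Each quantifier in the list can be tried out as follows. This is only a sketch; matchesPattern() is a hypothetical helper written for this example, not a PHP built-in:

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function matchesPattern(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// ?     0 or 1: the "u" is optional.
$zeroOrOne = matchesPattern('/^colou?r$/', 'color') && matchesPattern('/^colou?r$/', 'colour');

// *     0 or more: "b" may be absent or repeated.
$zeroOrMore = matchesPattern('/^ab*c$/', 'ac') && matchesPattern('/^ab*c$/', 'abbbc');

// +     1 or more: at least one "b" is required.
$oneOrMore = matchesPattern('/^ab+c$/', 'abc') && !matchesPattern('/^ab+c$/', 'ac');

// {3}   exactly three digits.
$exactly = matchesPattern('/^\d{3}$/', '123') && !matchesPattern('/^\d{3}$/', '1234');

// {2,4} between two and four digits, inclusive.
$between = matchesPattern('/^\d{2,4}$/', '123') && !matchesPattern('/^\d{2,4}$/', '12345');

// {2,}  at least two digits.
$atLeast = matchesPattern('/^\d{2,}$/', '123456') && !matchesPattern('/^\d{2,}$/', '1');
```

Every variable above evaluates to true.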

What are character classes?

A character class is a set of characters contained within square brackets. Use the ^ to negate the character class. A character class matches a single character in the subject; the character must be in the set of characters defined by the class. For example, the character class [aeiou] matches any lower case vowel, while [^aeiou] matches any character that is not a lower case vowel.
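
A small sketch of the vowel example, using an arbitrary subject word:

```php
<?php
// [aeiou] matches one lower case vowel at a time.
preg_match_all('/[aeiou]/', 'regular', $vowels);

// [^aeiou] matches any single character that is not a lower case vowel.
preg_match_all('/[^aeiou]/', 'regular', $others);
```

$vowels[0] contains e, u, a and $others[0] contains r, g, l, r.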

What meta-characters still have meaning within character classes?

The only meta-characters that are useful within a character class are:

\ general escape character
^ negate the class, but only if the first character
- indicates character range
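
The three characters above can be seen at work in this sketch (the subject string is made up):

```php
<?php
$subject = 'a-b^c';

// A hyphen between two characters defines a range: a, b, or c.
preg_match_all('/[a-c]/', $subject, $range);

// A backslash makes "^" and "-" literal inside the class.
preg_match_all('/[\^\-]/', $subject, $escaped);

// "^" only negates when it comes first; elsewhere it is a literal caret.
preg_match_all('/[c^]/', $subject, $literalCaret);
```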

What shortcut represents the “any digit” character class? The “any white space” class? “Any word”? What shortcuts represent the opposite of these?

The most common character classes have shorthand escapes:

\d any digit, [0-9]
\w any word character, [A-Za-z0-9_]
\s any whitespace character, [ \f\r\t\n\v]
\D any non-digit, [^0-9]
\W any non-word character, [^A-Za-z0-9_]
\S any non-whitespace character, [^ \f\r\t\n\v]
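
A quick sketch of the shortcuts, written with backslashes as PCRE requires (the subject string is an assumed example):

```php
<?php
$subject = "Room 42,\tfloor_3";

// \d matches single digits.
preg_match_all('/\d/', $subject, $digits);

// \w+ matches runs of letters, digits, and underscores.
preg_match_all('/\w+/', $subject, $words);

// \s matches the space and the tab.
preg_match_all('/\s/', $subject, $spaces);

// The upper case forms are the opposites: strip everything that is not a digit.
$onlyDigits = preg_replace('/\D/', '', $subject);

// \S+ matches runs of non-whitespace, so the comma stays attached to "42".
preg_match_all('/\S+/', $subject, $chunks);
```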

What are boundaries? How do you create boundaries in patterns?

A word boundary can be matched with the \b shortcut. It matches the position between a word character (as defined by the \w shortcut) and a non-word character or the start or end of the subject, which makes it possible to isolate whole words. For non-word boundaries use \B. For example, the pattern \bfor\b matches they've come for you but doesn’t match force or forebode, and \bfor\B would match force but not informal.
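
The same examples, sketched as preg_match() calls:

```php
<?php
// \bfor\b only matches "for" as a whole word.
$wholeWord  = preg_match('/\bfor\b/', "they've come for you");
$inForce    = preg_match('/\bfor\b/', 'force');
$inForebode = preg_match('/\bfor\b/', 'forebode');

// \bfor\B matches "for" at the start of a longer word.
$prefixForce    = preg_match('/\bfor\B/', 'force');
$prefixInformal = preg_match('/\bfor\B/', 'informal');
```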

How do you make matches “lazy”? And what does that mean anyway?

By default, quantifiers in regular expressions are greedy, meaning they match as much of the subject as possible. To make them lazy (match as little as possible), put a ? after the repetition quantifier, or use a negated character class to limit what the quantifier can consume.

For example:
<em>test</em> using <.+> will return <em>test</em>
Instead use either <.+?> or <[^>]+> to return <em> and </em> as separate matches
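
A minimal sketch comparing the greedy pattern, the lazy pattern, and a negated character class such as <[^>]+> on the same subject:

```php
<?php
$html = '<em>test</em>';

// Greedy: .+ runs to the last ">" in the subject.
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first ">" it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: [^>]+ can never run past a ">", so each tag matches separately.
preg_match_all('/<[^>]+>/', $html, $tags);
```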

What are the pattern modifiers?

Pattern modifiers control the universal behavior for the pattern. They are placed after the closing delimiter. Some examples include:

A Anchors the pattern to the beginning of the string
i Enables case-insensitive mode
m Enables multiline matching
s Has the period match every character, including newline
x Ignores most white space in the pattern
U Inverts greediness, so quantifiers are lazy by default
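
Each modifier in the list can be sketched with a one-line test (all subjects are made up):

```php
<?php
// i: case-insensitive matching.
$caseInsensitive = preg_match('/php/i', 'PHP');

// m: ^ and $ match at the start and end of every line.
$lineCount = preg_match_all('/^\w+$/m', "one\ntwo", $lines);

// s: the period also matches a newline.
$dotDefault = preg_match('/a.b/', "a\nb");
$dotAll     = preg_match('/a.b/s', "a\nb");

// x: white space in the pattern is ignored, which allows spacing it out.
$extended = preg_match('/ \d{3} - \d{4} /x', '555-1234');

// U: quantifiers become lazy by default.
preg_match('/<.+>/U', '<em>test</em>', $ungreedy);

// A: the match must start at the beginning of the subject.
$unanchored = preg_match('/php/', 'I like php');
$anchored   = preg_match('/php/A', 'I like php');
```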

What is back referencing? How does it work?

Backreferences match the same text as previously matched by a capturing group. Suppose you want to match a pair of opening and closing HTML tags, and the text in between. By putting the tag name into a capturing group, we can reuse the name of the tag for the closing tag with a backreference.
Here's how: <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>.
This regex contains only one pair of parentheses, which capture the string matched by [A-Z][A-Z0-9]*. The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group. The / before it is a literal character. It is simply the forward slash in the closing HTML tag that we are trying to match.
You can reuse the same backreference more than once. ([a-c])x\1x\1 matches axaxa, bxbxb and cxcxc.
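
A sketch of both patterns in PHP. The # delimiter and the i modifier are choices made for this example: the first avoids escaping the slash in the closing tag, and the second lets [A-Z] match lower case tag names.

```php
<?php
$pattern = '#<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>#i';

// \1 must repeat whatever group 1 captured, here the tag name "b".
preg_match($pattern, '<b>bold</b> and <i>italic</i>', $m);

// Mismatched opening and closing tags do not match.
$mismatch = preg_match($pattern, '<b>oops</i>');

// The same backreference can be used more than once.
$repeated = preg_match('/([a-c])x\1x\1/', 'bxbxb');
$mixed    = preg_match('/([a-c])x\1x\1/', 'axbxc');
```

$m[0] holds the whole first element and $m[1] holds the captured tag name.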

Source

<?php
$review = array(
	1 => array(
		'q' =>'What function is used to match a regular expression? What function is used to find all matches of
		 a regular expression? What function is used to replace matches of a regular expression?',
		'a' =>'<p><b>preg_match()</b> finds the first match of a regular expression in a subject. It returns 1 if the pattern 
		matches, 0 if it does not, or false on error (use the === operator to tell 0 from false).<br/><b>preg_match_all()</b>
		 finds all matches of a regular expression in a subject. It returns the number of full matches (0 if none, false on error).
		  Either function, if passed a third argument, stores the matches in it as an array.
		  <br/><b>preg_replace()</b> replaces every part of the subject that matches the pattern with a replacement. It returns the 
		  subject (a string or an array) with the substitutions made; subjects without a match are returned unchanged.<br/><b>preg_filter()</b> is similar to preg_replace(), but it returns only 
		  the subjects in which a match was found and replaced.</p>'
	),
	2 => array(
		'q' =>'What characters can you use and not use to delineate a regular expression?',
		'a' =>'<p> When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any
		 non-alphanumeric, non-backslash, non-whitespace character. Often used delimiters are forward slashes (/), hash signs (#) 
		 and tildes (~). The following are all examples of valid delimited patterns.<br />/foo bar/<br />#^[^0-9]$#<br />+php+<br />%[a-zA-Z0-9_-]%<br />
		If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears 
		often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.</p>'
	),
	3 => array(
		'q' =>'How do you match a literal character or string of characters?',
		'a' =>'<p>Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. </p>'
	),
	4 => array(
		'q' =>'What are meta-characters? How do you escape a meta-character?',
		'a' =>'<p>Meta-characters are encoded in the pattern and do not stand for themselves but instead are interpreted
		 in some special way. They include:<ul>
		<li><b>\</b>	general escape character with several uses</li>
		<li><b>^</b>	assert start of subject (or line, in multiline mode)</li>
		<li><b>$</b>	assert end of subject or before a terminating newline (or end of line, in multiline mode)</li>
		<li><b>.</b>	match any character except newline (by default)</li>
		<li><b>[</b>	start character class definition</li>
		<li><b>]</b>	end character class definition</li>
		<li><b>|</b>	start of alternative branch</li>
		<li><b>(</b>	start subpattern</li>
		<li><b>)</b>	end subpattern</li>
		<li><b>?</b>	extends the meaning of (, also 0 or 1 quantifier, also makes greedy quantifiers lazy (see repetition)</li>
		<li><b>*</b>	0 or more quantifier</li>
		<li><b>+</b>	1 or more quantifier</li>
		<li><b>{</b>	start min/max quantifier</li>
		<li><b>}</b>	end min/max quantifier</li></ul> To escape a meta-character, prepend the backslash.</p>'
	),
	5 => array(
		'q' =>' What meta-character do you use to bind a pattern to the beginning of a string? To the end?',
		'a' =>'<p>The <b>^</b> marks the beginning of a string and the <b>$</b> is used to mark the end.</p>'
	),
	6 => array(
		'q' =>'How do you create subpatterns (aka groupings)?',
		'a' =>'<p>Subpatterns are contained within parentheses, which can be nested. Subpatterns allow for localizing a set of alternative strings.
		Parentheses also make the subpattern capturing, which sends the matching substring back to the calling function as a numbered match.
		 For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)), the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.</p>'
	),
	7 => array(
		'q' =>'What are the quantifiers? How do you require 0 or 1 of a character or string? 0 or more? 1 or more? 
		Precisely X occurrences? A range of occurrences? A minimum of occurrences?',
		'a' =>'<p>The general repetition quantifier specifies a minimum and maximum number of permitted matches, 
		by giving the two numbers in curly brackets (braces), separated by a comma. Common quantifiers are:<ul>
		<li><b>?</b> 0 or 1</li>
		<li><b>*</b> 0 or more</li>
		<li><b>+</b> 1 or more</li>
		<li><b>{x}</b> Exactly x occurrences</li>
		<li><b>{x,y}</b> Between x and y (inclusive)</li>
		<li><b>{x,}</b> At least x occurrences</li></ul></p>'
	),
	8 => array(
		'q' =>'What are character classes? ',
		'a' =>'<p>A character class is a set of characters contained within square brackets. Use the <b>^</b> to negate the character class.
		A character class matches a single character in the subject; the character must be in the set of characters defined by the class.
		For example, the character class <b>[aeiou]</b> matches any lower case vowel, while <b>[^aeiou]</b> matches any character that is not a lower case vowel. </p>'
	),
	9 => array(
		'q' =>'What meta-characters still have meaning within character classes?',
		'a' =>'<p>The only meta-characters that are useful within a character class are: <ul>
		<li><b>\</b>	general escape character</li>
		<li><b>^</b>	negate the class, but only if the first character</li>
		<li><b>-</b>	indicates character range</li></ul></p>'
	),
	10 => array(
		'q' =>'What shortcut represents the “any digit” character class? The “any white space” class? “Any word”?
		 What shortcuts represent the opposite of these?',
		'a' =>'<p>The most common character classes have shorthand escapes:<ul>
		<li><b>\d</b> any digit, [0-9]</li>
		<li><b>\w</b> any word character, [A-Za-z0-9_]</li>
		<li><b>\s</b> any whitespace character, [ \f\r\t\n\v]</li>
		<li><b>\D</b> any non-digit, [^0-9]</li>
		<li><b>\W</b> any non-word character, [^A-Za-z0-9_]</li>
		<li><b>\S</b> any non-whitespace character, [^ \f\r\t\n\v]</li>
		</ul></p>'
	),
	11 => array(
		'q' =>' What are boundaries? How do you create boundaries in patterns?',
		'a' =>'<p>A word boundary can be matched with the <b>\b</b> shortcut. It matches the position between a word character (as defined by
		the \w shortcut) and a non-word character or the start or end of the subject, which makes it possible to isolate whole words. 
		For non-word boundaries use <b>\B</b>. For example, the pattern <b>\bfor\b</b> matches they\'ve come for you 
		but doesn’t match force or forebode, and <b>\bfor\B</b> would match force but not informal.</p>'
	),
	12 => array(
		'q' =>'How do you make matches “lazy”? And what does that mean anyway?',
		'a' =>'<p>By default, quantifiers in regular expressions are greedy, meaning they match as much of the subject as possible. To make them 
		lazy (match as little as possible), put a ? after the repetition quantifier, or use a negated character class to limit what the quantifier can consume.
		</p>
		<p>For example:<br/>&lt;em&gt;test&lt;/em&gt; using &lt;.+&gt; will return &lt;em&gt;test&lt;/em&gt;<br/>Instead use either &lt;.+?&gt; or &lt;[^&gt;]+&gt; to return &lt;em&gt; and &lt;/em&gt; as separate matches</p>'
	),
	13 => array(
		'q' =>' What are the pattern modifiers?',
		'a' =>'<p>Pattern modifiers control the universal behavior for the pattern. They are placed after the closing delimiter. Some examples include:
		<ul><li><b>A</b> Anchors the pattern to the beginning of the string</li>
		<li><b>i</b> Enables case-insensitive mode</li>
		<li><b>m</b> Enables multiline matching</li>
		<li><b>s</b> Has the period match every character, including newline</li>
		<li><b>x</b> Ignores most white space in the pattern</li>
		<li><b>U</b> Inverts greediness, so quantifiers are lazy by default</li></ul></p>'
	),
	14 => array(
		'q' =>'What is back referencing? How does it work?',
		'a' =>'<p>Backreferences match the same text as previously matched by a capturing group. 
		Suppose you want to match a pair of opening and closing HTML tags, and the text in between. By putting the tag 
		name into a capturing group, we can reuse the name of the tag for the closing tag with a backreference. <br />Here\'s how: 
		&lt;([A-Z][A-Z0-9]*)\b[^&gt;]*&gt;.*?&lt;/\1&gt;. <br/>This regex contains only one pair of parentheses, which capture the string matched 
		by [A-Z][A-Z0-9]*. The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text
		 that was matched by the first capturing group. The / before it is a literal character. It is simply the forward slash in the
		 closing HTML tag that we are trying to match.<br/>You can reuse the same backreference more than once. ([a-c])x\1x\1 matches axaxa, bxbxb and cxcxc.</p>'
	)
);
include('templates/review.php');
?>
