Regular Expressions: Charsets
Searching for Specific Strings
- Use grep 'string' <file> to search for an exact match.
- To find patterns rather than exact strings, Regular Expressions (regex) are used.
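To make the idea of an exact-match search concrete, here is a minimal sketch in Python rather than grep itself. The helper name grep_fixed and the sample lines are invented for illustration; it only mimics the basic "print lines containing this string" behaviour.

```python
def grep_fixed(needle, lines):
    """Hypothetical helper: keep the lines that contain needle as a plain substring."""
    return [line for line in lines if needle in line]


sample = ["error: disk full", "all good", "another error here"]
found = grep_fixed("error", sample)
print(found)  # ['error: disk full', 'another error here']
```

A plain substring test like this cannot express "any of several characters" or "start of line", which is where regex comes in.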
Charsets in Regex
- Definition: Enclosed in [ ], a charset matches any single character listed inside.
- Basic Examples:
- [abc] → Matches any occurrence of ‘a’, ‘b’, or ‘c’.
- [abc]zz → Matches ‘azz’, ‘bzz’, and ‘czz’.
- [a-c]zz → Equivalent to [abc]zz.
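The basic examples above can be checked with a short sketch. It uses Python's re module as a stand-in for grep (an assumption of this example, not something the notes prescribe), and the word list is made up.

```python
import re

words = ["azz", "bzz", "czz", "dzz", "zz"]

# [abc]zz: exactly one character from the set {a, b, c}, followed by "zz"
listed = [w for w in words if re.fullmatch(r"[abc]zz", w)]

# [a-c]zz: the same set written as a range
ranged = [w for w in words if re.fullmatch(r"[a-c]zz", w)]

print(listed)            # ['azz', 'bzz', 'czz']
print(ranged == listed)  # True
```

Note that ‘zz’ on its own is rejected: the charset must consume one character.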
Using Ranges
Inside [ ], a hyphen between two characters defines a range:
- a-z → Matches lowercase letters.
- A-Z → Matches uppercase letters.
- 0-9 → Matches any digit.
- [a-cx-z]zz → Matches ‘azz’, ‘bzz’, ‘czz’, ‘xzz’, ‘yzz’, ‘zzz’.
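A quick way to confirm how combined ranges behave is to run them against a few candidates. This is only a sketch with Python's re module and invented test strings.

```python
import re

# Two ranges in one charset: a-c and x-z
pattern = re.compile(r"[a-cx-z]zz")
candidates = ["azz", "bzz", "czz", "dzz", "mzz", "xzz", "yzz", "zzz"]
in_ranges = [c for c in candidates if pattern.fullmatch(c)]
print(in_ranges)  # ['azz', 'bzz', 'czz', 'xzz', 'yzz', 'zzz']

# [0-9] matches one digit at a time
digits = re.findall(r"[0-9]", "a1b22c")
print(digits)  # ['1', '2', '2']
```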
Matching and Excluding Patterns
- [a-zA-Z] → Matches any single letter (lowercase or uppercase).
- file[1-3] → Matches ‘file1’, ‘file2’, ‘file3’.
- [^k]ing → Matches ‘ring’, ‘sing’, ‘$ing’, but NOT ‘king’.
- [^a-c]at → Matches ‘fat’, ‘hat’, but NOT ‘bat’ or ‘cat’.
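The matching and excluding patterns above can be tried out as follows. The sketch assumes Python's re module in place of grep, and the input lists are illustrative only.

```python
import re

names = ["file1", "file2", "file3", "file4", "fileA"]
numbered = [n for n in names if re.fullmatch(r"file[1-3]", n)]
print(numbered)  # ['file1', 'file2', 'file3']

# [^k] matches any single character except 'k', including symbols like '$'
ing_words = ["ring", "sing", "$ing", "king"]
not_king = [w for w in ing_words if re.fullmatch(r"[^k]ing", w)]
print(not_king)  # ['ring', 'sing', '$ing']

# [^a-c] excludes a whole range
at_words = ["fat", "hat", "bat", "cat"]
not_a_to_c = [w for w in at_words if re.fullmatch(r"[^a-c]at", w)]
print(not_a_to_c)  # ['fat', 'hat']
```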
Important Notes
- Charset vs. String Matching: [abc] matches any occurrence of ‘a’, ‘b’, or ‘c’ in a string, not necessarily “abc” in order.
- Order Matters: When specifying charsets, list the characters in the order given in the question.
- Efficiency in Regex:
- Be specific when possible (e.g., [a-c] instead of [a-z] if only ‘a’ to ‘c’ is needed).
- Avoid unnecessary complexity (e.g., [a-z] is preferable to listing many scattered characters one by one).
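The difference between a charset and a literal string is easy to miss, so here is a small sketch of it. It assumes Python's re module, and the test strings are invented.

```python
import re

# The charset [abc] matches each of the three letters individually, in any order
single_hits = re.findall(r"[abc]", "cab")
print(single_hits)  # ['c', 'a', 'b']

# The literal pattern abc needs the three letters together, in that order
print(re.search(r"abc", "cab"))                # None
print(re.search(r"abc", "xabcx") is not None)  # True
```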
Regular Expressions: Wildcards and Optional Characters
Wildcard Matching (. Dot)
- . (dot) matches any single character (except line breaks).
- Example: a.c matches: aac, abc, a0c, a!c, etc.
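The wildcard can be illustrated with a short sketch. It uses Python's re module (an assumption of this example), where the dot also skips line breaks by default.

```python
import re

samples = ["aac", "abc", "a0c", "a!c", "ac", "a\nc"]

# a.c needs exactly one character between 'a' and 'c', and that character
# may not be a line break
dot_matches = [s for s in samples if re.fullmatch(r"a.c", s)]
print(dot_matches)  # ['aac', 'abc', 'a0c', 'a!c']
```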
Optional Characters (? Question Mark)
- ? makes the preceding character optional.
- Example: abc? matches:
- ab (without c)
- abc (with c)
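A minimal sketch of the optional character, again with Python's re module and made-up inputs. With grep, an unescaped ? generally needs extended syntax (grep -E).

```python
import re

samples = ["ab", "abc", "abcc", "a"]

# abc? means 'a', then 'b', then zero or one 'c'
optional_matches = [s for s in samples if re.fullmatch(r"abc?", s)]
print(optional_matches)  # ['ab', 'abc']
```

Only the character directly before ? becomes optional, so ‘a’ alone does not match.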
Matching a Literal Dot (\.)
- . is a special character, so to match a literal dot (.), escape it with a backslash: \.
- Example:
- a.c matches abc, a@c, a#c, etc.
- a\.c matches only a.c.
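The effect of escaping the dot can be seen side by side in this sketch (Python's re module, illustrative strings).

```python
import re

samples = ["abc", "a@c", "a#c", "a.c"]

# Unescaped dot: any single character is accepted in the middle
wildcard = [s for s in samples if re.fullmatch(r"a.c", s)]
print(wildcard)  # ['abc', 'a@c', 'a#c', 'a.c']

# Escaped dot: only a literal '.' is accepted
literal = [s for s in samples if re.fullmatch(r"a\.c", s)]
print(literal)  # ['a.c']
```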
Regular Expressions: Line Anchors and Grouping
Line Anchors
- ^ → Matches the start of a line.
- Example: ^abc matches lines starting with “abc”.
- $ → Matches the end of a line.
- Example: xyz$ matches lines ending with “xyz”.
Important Note:
- ^ has two meanings:
- Inside [] brackets (as the first character): Excludes characters ([^abc] means “not a, b, or c”).
- Outside brackets: Specifies the start of a line.
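Both anchors, and the two meanings of ^, can be sketched as below. The example assumes Python's re module applied line by line, roughly the way grep processes a file, and the sample lines are invented.

```python
import re

lines = ["abc first", "second abc", "ends with xyz", "xyz starts"]

# ^abc: the line must begin with "abc"
starts_with_abc = [line for line in lines if re.search(r"^abc", line)]
print(starts_with_abc)  # ['abc first']

# xyz$: the line must end with "xyz"
ends_with_xyz = [line for line in lines if re.search(r"xyz$", line)]
print(ends_with_xyz)  # ['ends with xyz']

# Inside brackets, a leading ^ negates the set instead of anchoring
not_abc = re.findall(r"[^abc]", "abcde")
print(not_abc)  # ['d', 'e']
```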
Grouping and Either/Or (|)
- Grouping with (): Used to group patterns or repeat patterns.
- Either/Or (|): Works like an “OR” condition.
- Example: during the (day|night) matches:
- “during the day”
- “during the night”
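The either/or example can be checked with the following sketch. It uses Python's re module. With grep, unescaped ( ) and | generally need extended syntax (grep -E). The sentences are made up.

```python
import re

pattern = re.compile(r"during the (day|night)")
sentences = ["during the day", "during the night", "during the dawn"]

either = [s for s in sentences if pattern.fullmatch(s)]
print(either)  # ['during the day', 'during the night']

# The group also records which alternative matched
chosen = pattern.search("We met during the night shift").group(1)
print(chosen)  # night
```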
Repeating Groups
- (pattern){n} repeats the pattern n times.
- Example: (no){5} matches “nonononono”.
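A short sketch of a repeated group, using Python's re module. With grep, the unescaped ( ) and { } forms generally need grep -E.

```python
import re

repeat = re.compile(r"(no){5}")

five = repeat.fullmatch("no" * 5) is not None
four = repeat.fullmatch("no" * 4) is not None

print(five)  # True:  "nonononono" is "no" exactly five times
print(four)  # False: four repetitions are not enough
```

Without the parentheses, no{5} would repeat only the ‘o’, matching ‘nooooo’ instead.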