This entry is part 8 of 4 in the series Splunk 101

Regular Expressions: Charsets

Searching for Specific Strings

Use grep 'string' <file> to search for an exact match.
To find patterns rather than exact strings, Regular Expressions (regex) are used.

Charsets in Regex

Definition: A charset is enclosed in [ ] and matches any single character listed inside the brackets.
Basic Examples:
[abc] → Matches any occurrence of ‘a’, ‘b’, or ‘c’.
[abc]zz → Matches ‘azz’, ‘bzz’, and ‘czz’.
[a-c]zz → Equivalent to [abc]zz.
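
As a quick check of the charset examples above, here is a minimal sketch using Python's re module. The word list is made up for illustration, and fullmatch is used so that the whole word has to fit the pattern.

```python
import re

# Hypothetical sample words, chosen only for illustration.
words = ["azz", "bzz", "czz", "dzz", "zz"]

# [abc] matches exactly one character from the set, followed by the literal "zz".
matched = [w for w in words if re.fullmatch(r"[abc]zz", w)]
print(matched)  # ['azz', 'bzz', 'czz']

# [a-c] is a range that covers the same three letters.
same = [w for w in words if re.fullmatch(r"[a-c]zz", w)]
print(same == matched)  # True
```

Note that "zz" on its own is not matched: the charset always consumes one character.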

Using Ranges

[a-z] → Matches any single lowercase letter.
[A-Z] → Matches any single uppercase letter.
[0-9] → Matches any single digit.
[a-cx-z]zz → Matches ‘azz’, ‘bzz’, ‘czz’, ‘xzz’, ‘yzz’, ‘zzz’.
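
The combined range above can be tried out with a short Python sketch. The candidate strings are assumptions picked to show both hits and misses.

```python
import re

# Hypothetical candidates: some inside the ranges, some outside.
candidates = ["azz", "czz", "dzz", "xzz", "zzz", "Azz", "1zz"]

# One charset can hold several ranges: here a-c and x-z.
pattern = re.compile(r"[a-cx-z]zz")
hits = [c for c in candidates if pattern.fullmatch(c)]
print(hits)  # ['azz', 'czz', 'xzz', 'zzz']
```

Ranges are case-sensitive, so "Azz" is rejected unless A-Z is added to the charset.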

Matching and Excluding Patterns

[a-zA-Z] → Matches any single letter (lowercase or uppercase).
file[1-3] → Matches ‘file1’, ‘file2’, ‘file3’.
[^k]ing → Matches ‘ring’, ‘sing’, ‘$ing’, but NOT ‘king’.
[^a-c]at → Matches ‘fat’, ‘hat’, but NOT ‘bat’ or ‘cat’.
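
The following Python sketch runs the matching and excluding patterns above against a few made-up words. It is an illustration only, not part of the original exercise.

```python
import re

def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

# [^k] means "any single character except k".
not_k = full_matches(r"[^k]ing", ["ring", "sing", "$ing", "king", "ing"])
print(not_k)  # ['ring', 'sing', '$ing']

# A negated range excludes every character from a to c.
not_a_to_c = full_matches(r"[^a-c]at", ["fat", "hat", "bat", "cat"])
print(not_a_to_c)  # ['fat', 'hat']

# A digit range after a literal prefix.
files = full_matches(r"file[1-3]", ["file1", "file3", "file4"])
print(files)  # ['file1', 'file3']
```

A negated charset still has to match one character, which is why the bare string "ing" is not in the first result.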

Important Notes

Charset vs. String Matching: [abc] matches any occurrence of ‘a’, ‘b’, or ‘c’ in a string, not necessarily “abc” in order.
Order Matters: The order of characters inside a charset does not change what it matches, but when answering an exercise, write the charset in the order given in the question.
Efficiency in Regex:

Be specific when possible (e.g., [a-c] instead of [a-z] if only ‘a’ to ‘c’ is needed).
Avoid unnecessary complexity (e.g., prefer [a-z] over listing many scattered letters one by one when most of the alphabet is needed).

Regular Expressions: Wildcards and Optional Characters

Wildcard Matching (`.` Dot)

. (dot) matches any single character (except line breaks).
Example: a.c matches:
aac, abc, a0c, a!c, etc.
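
To see the dot in action, here is a small Python sketch. The sample strings are assumptions, and the last one contains a line break to show the exception mentioned above.

```python
import re

# Hypothetical samples; the last one has a newline between a and c.
samples = ["aac", "abc", "a0c", "a!c", "ac", "a\nc"]

# . stands for exactly one character, but not a line break by default.
dot_hits = [s for s in samples if re.fullmatch(r"a.c", s)]
print(dot_hits)  # ['aac', 'abc', 'a0c', 'a!c']
```

In Python, passing the re.DOTALL flag makes the dot match line breaks as well.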

Optional Characters (`?` Question Mark)

? makes the preceding character optional.
Example: abc? matches:
ab (without c)
abc (with c)
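
A short Python sketch of the optional character, using made-up inputs:

```python
import re

# Hypothetical inputs for illustration.
samples = ["ab", "abc", "abcc", "a"]

# c? means "zero or one c", so only the last character is optional.
optional_hits = [s for s in samples if re.fullmatch(r"abc?", s)]
print(optional_hits)  # ['ab', 'abc']
```

The ? applies only to the character right before it, so "a" (missing the b) and "abcc" (two c characters) do not fit.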

Matching a Literal Dot (`\.`)

The dot (.) is a special character, so to match a literal dot, escape it with a backslash: \.
Example:
a.c matches abc, a@c, a#c, etc.
a\.c matches only a.c.
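
The difference between the wildcard and the escaped dot can be sketched in Python as follows. The sample strings are assumptions.

```python
import re

# Hypothetical samples for illustration.
samples = ["a.c", "abc", "a@c", "a#c"]

# Unescaped dot: any single character fits in the middle.
wildcard = [s for s in samples if re.fullmatch(r"a.c", s)]
print(wildcard)  # ['a.c', 'abc', 'a@c', 'a#c']

# Escaped dot: only a literal "." fits.
literal = [s for s in samples if re.fullmatch(r"a\.c", s)]
print(literal)  # ['a.c']

# re.escape adds the backslash for you when the text comes from a variable.
escaped = re.escape("a.c")
print(escaped)  # a\.c
```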

Regular Expressions: Line Anchors and Grouping

Line Anchors

^ → Matches the start of a line.
Example: ^abc matches lines starting with “abc”.
$ → Matches the end of a line.
Example: xyz$ matches lines ending with “xyz”.
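
Here is a minimal Python sketch of both anchors. The sample lines are invented, and re.search is used because it looks anywhere in the string unless an anchor pins the position.

```python
import re

# Hypothetical log-like lines for illustration.
lines = [
    "abc at the start",
    "ends with xyz",
    "has abc and xyz inside it",
]

starts_with_abc = [ln for ln in lines if re.search(r"^abc", ln)]
ends_with_xyz = [ln for ln in lines if re.search(r"xyz$", ln)]
print(starts_with_abc)  # ['abc at the start']
print(ends_with_xyz)    # ['ends with xyz']

# In a single multi-line string, Python anchors ^ at each line
# only when the re.MULTILINE flag is set.
text = "first line\nabc second line"
plain = re.search(r"^abc", text)
multiline = re.search(r"^abc", text, re.MULTILINE)
print(plain is None)          # True
print(multiline is not None)  # True
```

Tools such as grep work line by line, so they behave like the first half of the sketch without any extra flag.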

Important Note:

^ has two meanings:
Inside [] brackets: Excludes characters ([^abc] means “not a, b, or c”).
Outside brackets: Specifies the start of a line.

Grouping and Either/Or (|)

Grouping with (): Used to group patterns or repeat patterns.
Either/Or (|): Works like an “OR” condition.
Example: during the (day|night) matches:
- “during the day”
- “during the night”
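
The either/or example can be sketched in Python like this. The third sentence is an assumption added to show a non-match, and the ungrouped pattern is included to show why the parentheses matter.

```python
import re

grouped = re.compile(r"during the (day|night)")
sentences = ["during the day", "during the night", "during the evening"]

# Collect which alternative matched, or None when neither did.
choices = []
for s in sentences:
    m = grouped.search(s)
    choices.append(m.group(1) if m else None)
print(choices)  # ['day', 'night', None]

# Without parentheses, | splits the whole pattern:
# "during the day" OR "night".
ungrouped = re.compile(r"during the day|night")
print(ungrouped.fullmatch("night") is not None)  # True
print(grouped.fullmatch("night") is not None)    # False
```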

Repeating Groups

(pattern){n} repeats the pattern n times.
Example: (no){5} matches “nonononono”.
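
A small Python sketch of group repetition follows. The contrast with the ungrouped pattern is an added illustration, not part of the original example.

```python
import re

# The group (no) is repeated five times as a unit.
five = re.fullmatch(r"(no){5}", "nonononono")
four = re.fullmatch(r"(no){5}", "nononono")
print(five is not None)  # True
print(four is None)      # True

# Without the group, {5} applies only to the "o".
ungrouped = re.fullmatch(r"no{5}", "nooooo")
print(ungrouped is not None)  # True
```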

Sometimes it’s very useful to specify that we want to search by a certain pattern in the beginning or the end of a line. We do that with these characters:
^ – starts with
$ – ends with

So for example, if you want to search for a line that starts with abc, you can use ^abc.
If you want to search for a line that ends with xyz, you can use xyz$.

Note: The ^ hat symbol is used to exclude a charset when enclosed in [square brackets], but when it is not, it is used to specify the beginning of a line.

You can also define groups by enclosing a pattern in (parentheses). Groups have many uses that are outside the scope of this tutorial. We will use them to define an either/or pattern and to repeat patterns. To say “or” in regex, we use the | pipe.

For an “either/or” pattern example, the pattern during the (day|night) will match both of these sentences: during the day and during the night.
For a repetition example, the pattern (no){5} will match the sentence nonononono.

Metacharacters

There are easier ways to match bigger charsets. For example, \d is used to match any single digit. Here’s a reference:
\d matches a digit, like 9
\D matches a non-digit, like A or @
\w matches an alphanumeric character, like a or 3
\W matches a non-alphanumeric character, like ! or #
\s matches a whitespace character (spaces, tabs, and line breaks)
\S matches everything else (alphanumeric characters and symbols)

Note: Underscores _ are included in the \w metacharacter and not in \W. That means that \w will match every single character in test_file.
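
The metacharacters can be tried out with a short Python sketch. The sample string is an assumption. Also note that Python's \d and \w are Unicode-aware by default, which makes no difference for the plain ASCII text used here.

```python
import re

# Hypothetical sample text for illustration.
sample = "test_file 42!"

digits = re.findall(r"\d", sample)
non_word = re.findall(r"\W", sample)
whitespace = re.findall(r"\s", sample)
print(digits)      # ['4', '2']
print(non_word)    # [' ', '!']
print(whitespace)  # [' ']

# \w includes the underscore, so every character of test_file matches.
word_chars = re.findall(r"\w", "test_file")
print(len(word_chars) == len("test_file"))  # True
```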

Often we want a pattern that matches many characters of a single type in a row, and we can do that with repetitions. For example, {2} is used to match the preceding character (or metacharacter, or charset) two times in a row. That means that z{2} will match exactly zz.

Here’s a reference for each repetition along with how many times it matches the preceding pattern:

{12} – exactly 12 times.
{1,5} – 1 to 5 times.
{2,} – 2 or more times.
* – 0 or more times.
+ – 1 or more times.
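
Each repetition in the reference above can be checked with a minimal Python sketch. The test strings are made up, and fullmatch is used so that extra characters count as a failure.

```python
import re

def fits(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r"z{2}", "zz"))         # True: exactly two
print(fits(r"z{2}", "zzz"))        # False: one too many
print(fits(r"\d{1,5}", "12345"))   # True: between 1 and 5 digits
print(fits(r"\d{1,5}", "123456"))  # False: six digits
print(fits(r"a{2,}", "a"))         # False: needs at least two
print(fits(r"a{2,}", "aaaa"))      # True
print(fits(r"ab*", "a"))           # True: * allows zero b characters
print(fits(r"ab+", "a"))           # False: + needs at least one b
print(fits(r"ab+", "abbb"))        # True
```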
