Regular Expressions

This entry is part 8 of 4 in the series Splunk 101

Regular Expressions: Charsets

Searching for Specific Strings

  • Use grep 'string' <file> to find the lines in a file that contain a specific string.
  • To match patterns rather than fixed strings, use Regular Expressions (regex).

Charsets in Regex

  • Definition: A charset is written inside [ ] and matches exactly one character from the set it lists.
  • Basic Examples:
    • [abc] → Matches any single ‘a’, ‘b’, or ‘c’.
    • [abc]zz → Matches ‘azz’, ‘bzz’, and ‘czz’.
    • [a-c]zz → Equivalent to [abc]zz.
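
The charset examples above can be checked with a short script. This is a minimal sketch using Python's built-in re module; the sample word list and the test string "cabbage" are made up for illustration.

```python
import re

# [abc] matches one character at a time: every 'a', 'b', or 'c' it finds.
single_chars = re.findall(r"[abc]", "cabbage")
print(single_chars)  # ['c', 'a', 'b', 'b', 'a']

# Hypothetical sample words, checked against the whole string.
words = ["azz", "bzz", "czz", "dzz", "zz"]
listed = [w for w in words if re.fullmatch(r"[abc]zz", w)]
ranged = [w for w in words if re.fullmatch(r"[a-c]zz", w)]
print(listed)  # ['azz', 'bzz', 'czz']
print(listed == ranged)  # True: [a-c] is shorthand for [abc]
```

Note that "zz" on its own is rejected: the charset must consume exactly one character.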

Using Ranges

  • Ranges are written inside [ ]; outside brackets, a-z is just the literal text "a-z".
  • a-z → Matches any lowercase letter.
  • A-Z → Matches any uppercase letter.
  • 0-9 → Matches any digit.
  • [a-cx-z]zz → Matches ‘azz’, ‘bzz’, ‘czz’, ‘xzz’, ‘yzz’, ‘zzz’.
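
As a rough illustration of ranges, the following Python sketch combines two ranges in one charset and shows that a range only has its special meaning inside brackets. The candidate words and the strings "port 8080" and "range 0-9" are illustrative assumptions.

```python
import re

# Two ranges in one charset: a to c, plus x to z.
pattern = re.compile(r"[a-cx-z]zz")
candidates = ["azz", "bzz", "czz", "dzz", "wzz", "xzz", "yzz", "zzz"]
range_hits = [w for w in candidates if pattern.fullmatch(w)]
print(range_hits)  # ['azz', 'bzz', 'czz', 'xzz', 'yzz', 'zzz']

# Inside brackets, 0-9 means "any digit".
digits = re.findall(r"[0-9]", "port 8080")
print(digits)  # ['8', '0', '8', '0']

# Outside brackets, 0-9 is the literal three-character text "0-9".
literal_text = re.findall(r"0-9", "range 0-9")
print(literal_text)  # ['0-9']
```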

Matching and Excluding Patterns

  • [a-zA-Z] → Matches any single letter (lowercase or uppercase).
  • file[1-3] → Matches ‘file1’, ‘file2’, ‘file3’.
  • [^k]ing → Matches ‘ring’, ‘sing’, ‘$ing’, but NOT ‘king’.
  • [^a-c]at → Matches ‘fat’, ‘hat’, but NOT ‘bat’ or ‘cat’.
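
The matching and excluding patterns above can be sketched in Python as follows. The extra test words "ing" and "aat" are added here for illustration; they show that a negated charset still has to consume one character.

```python
import re

# [^k] matches any single character except 'k'.
words = ["ring", "sing", "$ing", "king", "ing"]
not_k = [w for w in words if re.fullmatch(r"[^k]ing", w)]
print(not_k)  # ['ring', 'sing', '$ing']  ("ing" has no character for [^k])

# [^a-c] excludes the whole range a to c.
more_words = ["fat", "hat", "bat", "cat", "aat"]
not_a_to_c = [w for w in more_words if re.fullmatch(r"[^a-c]at", w)]
print(not_a_to_c)  # ['fat', 'hat']

# file[1-3] accepts only the digits 1, 2, and 3 after "file".
files = re.findall(r"file[1-3]", "file1 file2 file3 file4")
print(files)  # ['file1', 'file2', 'file3']
```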

Important Notes

  1. Charset vs. String Matching: [abc] matches a single ‘a’, ‘b’, or ‘c’ wherever it occurs in a string, not the sequence “abc”.
  2. Order Matters: The order of characters inside a charset does not change what it matches ([abc] and [cba] are equivalent), but when answering an exercise, list them in the order given in the question.
  3. Efficiency in Regex:
    • Be specific when possible (e.g., [a-c] instead of [a-z] if only ‘a’ to ‘c’ is needed).
    • Avoid unnecessary complexity (e.g., a single range such as [a-z] is preferable to listing many scattered characters one by one).

Regular Expressions: Wildcards and Optional Characters

Wildcard Matching (. Dot)

  • . (dot) matches any single character (except line breaks).
  • Example: a.c matches aac, abc, a0c, a!c, etc.
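
A small Python sketch of the dot wildcard follows. The sample strings are illustrative; the re.DOTALL flag is specific to Python and is shown only to make the line-break exception visible.

```python
import re

# The dot must consume exactly one character between 'a' and 'c'.
samples = ["aac", "abc", "a0c", "a!c", "ac", "abbc", "a\nc"]
dot_hits = [s for s in samples if re.fullmatch(r"a.c", s)]
print(dot_hits)  # ['aac', 'abc', 'a0c', 'a!c']

# By default the dot does not match a line break...
newline_default = re.fullmatch(r"a.c", "a\nc") is not None
# ...unless the DOTALL flag is set (Python-specific behaviour).
newline_dotall = re.fullmatch(r"a.c", "a\nc", flags=re.DOTALL) is not None
print(newline_default, newline_dotall)  # False True
```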

Optional Characters (? Question Mark)

  • ? makes the preceding character optional: it may appear once or not at all.
  • Example: abc? matches:
    • ab (without c)
    • abc (with c)
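
The optional-character rule can be tried out with a sketch like this one. The "colou?r" pattern and its test sentence are a hypothetical extra example, not taken from the text above.

```python
import re

# c? allows zero or one 'c' after "ab", but not two.
samples = ["ab", "abc", "abcc", "a"]
optional_hits = [s for s in samples if re.fullmatch(r"abc?", s)]
print(optional_hits)  # ['ab', 'abc']

# Hypothetical example: one pattern for both spellings of a word.
spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']
```

Only the character directly before ? becomes optional; the rest of the pattern is still required.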

Matching a Literal Dot (\.)

  • . is a special character, so to match a literal dot, escape it with a backslash: \.
  • Example:
    • a.c matches abc, a@c, a#c, etc. (and also a.c itself).
    • a\.c matches only a.c.
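
The difference between the wildcard and an escaped dot can be seen side by side in this Python sketch. The call to re.escape is a Python convenience for building such patterns and is not something the text above relies on.

```python
import re

samples = ["a.c", "abc", "a@c", "a#c"]

# Unescaped dot: any single character is accepted in the middle.
wildcard_hits = [s for s in samples if re.fullmatch(r"a.c", s)]
print(wildcard_hits)  # ['a.c', 'abc', 'a@c', 'a#c']

# Escaped dot: only a real '.' is accepted.
literal_hits = [s for s in samples if re.fullmatch(r"a\.c", s)]
print(literal_hits)  # ['a.c']

# re.escape adds the backslash for us when the text comes from elsewhere.
escaped = re.escape("a.c")
print(escaped)  # a\.c
```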

Regular Expressions: Line Anchors and Grouping

Line Anchors

  • ^ → Matches the start of a line.
    • Example: ^abc matches lines starting with “abc”.
  • $ → Matches the end of a line.
    • Example: xyz$ matches lines ending with “xyz”.
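
The anchors can be sketched in Python as shown here. One assumption to keep in mind: tools such as grep work line by line, whereas Python treats the input as one string, so the re.MULTILINE flag is needed for ^ and $ to apply to every line. The sample log text is made up.

```python
import re

# Hypothetical four-line input.
log = "abc starts here\nmiddle abc middle\nends with xyz\nxyz is first"

# ^abc: only lines that begin with "abc".
starts = re.findall(r"^abc.*", log, flags=re.MULTILINE)
print(starts)  # ['abc starts here']

# xyz$: only lines that end with "xyz".
ends = re.findall(r".*xyz$", log, flags=re.MULTILINE)
print(ends)  # ['ends with xyz']

# Without MULTILINE, ^ anchors to the start of the whole string only.
no_flag = re.search(r"^middle", log) is not None
with_flag = re.search(r"^middle", log, flags=re.MULTILINE) is not None
print(no_flag, with_flag)  # False True
```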

Important Note:

  • ^ has two meanings:
    • Inside [ ] brackets, as the first character: Negates the charset ([^abc] means “not a, b, or c”).
    • Outside brackets: Specifies the start of a line.
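
Both meanings of ^ can appear in the same pattern. The sketch below, using a made-up word list, contrasts "starts with a, b, or c" against "starts with anything except a, b, or c".

```python
import re

lines = ["apple", "banana", "cherry", "date"]

# ^ outside brackets: anchor to the start; [abc] then allows a, b, or c.
starts_with_abc = [s for s in lines if re.search(r"^[abc]", s)]
print(starts_with_abc)  # ['apple', 'banana', 'cherry']

# First ^ is the anchor, second ^ (inside brackets) negates the charset.
starts_with_other = [s for s in lines if re.search(r"^[^abc]", s)]
print(starts_with_other)  # ['date']
```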

Grouping and Either/Or (|)

  • Grouping with (): Groups part of a pattern so that | or a repetition applies to the whole group.
  • Either/Or (|): Works like an “OR” condition.
  • Example: during the (day|night) matches:
    • “during the day”
    • “during the night”
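
The either/or example can be reproduced with the following Python sketch. The test sentence is invented, and the last check illustrates why the parentheses matter: without them, | splits the entire pattern.

```python
import re

text = "during the day we work, during the night we sleep, during the dusk we rest"

# Full matches: the group limits the alternation to "day" or "night".
full_matches = [m.group(0) for m in re.finditer(r"during the (day|night)", text)]
print(full_matches)  # ['during the day', 'during the night']

# With one group in the pattern, findall returns just the group's contents.
which = re.findall(r"during the (day|night)", text)
print(which)  # ['day', 'night']

# Without parentheses the pattern means "during the day" OR "night".
ungrouped = re.fullmatch(r"during the day|night", "night") is not None
print(ungrouped)  # True
```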

Repeating Groups

  • (pattern){n} repeats the pattern exactly n times.
  • Example: (no){5} matches “nonononono”.
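
A brief Python sketch of group repetition follows. The comparison with the ungrouped pattern no{5} is an added illustration of what the parentheses change.

```python
import re

# (no){5}: the two-character group "no" must appear exactly five times.
five = re.fullmatch(r"(no){5}", "nonononono") is not None
four = re.fullmatch(r"(no){5}", "nononono") is not None
print(five, four)  # True False

# Without the group, {5} applies only to the 'o': one 'n' then five 'o's.
ungrouped = re.fullmatch(r"no{5}", "nooooo") is not None
print(ungrouped)  # True
```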