Regular expressions
http://yourbasic.org/golang/regexp-cheat-sheet/
A regular expression is a sequence of characters that define a search pattern.
- Basics
- Compile
- Raw strings
- Cheat sheet
- Choice and grouping
- Repetition
- Character classes
- Special characters
- Text boundary anchors
- Code examples
- First match
- Location
- All matches
- Replace
- Split
- Implementation
Basics
The regular expression a.b matches any string that starts with an a, ends with a b, and has a single character in between (the period matches any character).
To check if there is a substring matching a.b, use the regexp.MatchString function.
matched, err := regexp.MatchString(`a.b`, "aaxbb")
fmt.Println(matched) // true
fmt.Println(err) // nil (regexp is valid)
To check if a full string matches a.b, anchor the start and the end of the regexp:
- the caret ^ matches the beginning of a text (or of a line, when the m flag is set),
- the dollar sign $ matches the end of a text (or of a line, when the m flag is set).
matched, _ := regexp.MatchString(`^a.b$`, "aaxbb")
fmt.Println(matched) // false
Similarly, we can check if a string starts with or ends with a pattern by using only the start or end anchor.
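Wrapping the single-anchor idea in small helpers is one way to reuse it. A minimal sketch: `startsWith` and `endsWith` are hypothetical names, not part of the regexp package, and the pattern is wrapped in a non-capturing group `(?:...)` so that an alternation is anchored as a whole.

```go
package main

import (
	"fmt"
	"regexp"
)

// startsWith reports whether s begins with a match of pattern.
// The pattern is wrapped in a non-capturing group (?:...) so that an
// alternation such as `ab|cd` is anchored as a whole.
func startsWith(pattern, s string) bool {
	matched, err := regexp.MatchString(`^(?:`+pattern+`)`, s)
	return err == nil && matched
}

// endsWith reports whether s ends with a match of pattern.
func endsWith(pattern, s string) bool {
	matched, err := regexp.MatchString(`(?:`+pattern+`)$`, s)
	return err == nil && matched
}

func main() {
	fmt.Println(startsWith(`a.b`, "axbcc")) // true
	fmt.Println(startsWith(`a.b`, "caxb"))  // false
	fmt.Println(endsWith(`a.b`, "ccaxb"))   // true
	fmt.Println(startsWith(`ab|cd`, "xcd")) // false: the group keeps ^ on both branches
}
```

Without the group, `^ab|cd` would mean `(^ab)|cd` and would match "xcd".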
Compile
For more complicated queries, you should compile a regular expression to create a Regexp object. There are two options:
re1, err := regexp.Compile(`regexp`) // error if regexp invalid
re2 := regexp.MustCompile(`regexp`) // panic if regexp invalid
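In practice, Compile suits patterns that come from user input, where an error is a normal outcome. MustCompile suits constant patterns, which are often stored in a package-level variable so they are compiled only once. Here is a sketch of both. The names `isValid` and `axb` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// axb is compiled once, when the package is initialized. A constant
// pattern that fails to compile panics at startup, so the bug is caught early.
var axb = regexp.MustCompile(`a.b`)

// isValid reports whether pattern compiles. Compile is the right choice
// for patterns from user input, where an error is expected.
func isValid(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(isValid(`a.b`)) // true
	fmt.Println(isValid(`a(b`)) // false: missing closing parenthesis

	// The compiled Regexp is reused for every input.
	for _, s := range []string{"aaxbb", "abc", "a-b"} {
		fmt.Println(s, axb.MatchString(s))
	}
}
```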
Raw strings
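It’s convenient to use raw strings, delimited by backquotes, when writing regular expressions. Both ordinary string literals and regular expressions use backslashes for special characters. In a raw string a backslash has no special meaning, so the regexp \d can be written as `\d` instead of "\\d".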
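The following small check shows that both spellings produce the same pattern. The names `digits` and `digitsQuoted` are illustrative only. The `String` method returns the source text of a compiled regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits matches a run of ASCII digits. Written as a raw string, the
// backslash in \d reaches the regexp compiler unchanged.
var digits = regexp.MustCompile(`\d+`)

// digitsQuoted is the same pattern as an interpreted string literal,
// where the backslash itself has to be escaped.
var digitsQuoted = regexp.MustCompile("\\d+")

func main() {
	fmt.Println(digits.String() == digitsQuoted.String()) // true
	fmt.Println(digits.FindString("order 66"))            // 66
}
```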
Cheat sheet
Choice and grouping
Repetition
Character classes
Special characters
To match a special character \^$.|?*+-[]{}() literally, escape it with a backslash. For example, \{ matches an opening brace symbol.
Other escape sequences are:
Text boundary anchors
Code examples
First match
Use the FindString method to find the text of the first match. If there is no match, the return value is an empty string.
re := regexp.MustCompile(`foo.?`)
fmt.Printf("%q\n", re.FindString("seafood fool")) // "food"
fmt.Printf("%q\n", re.FindString("meat")) // ""
Location
Use the FindStringIndex method to find loc, the location of the first match, in a string s. The match is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
re := regexp.MustCompile(`ab?`)
fmt.Println(re.FindStringIndex("tablett")) // [1 3]
fmt.Println(re.FindStringIndex("foo") == nil) // true
All matches
Use the FindAllString method to find the text of all matches. A return value of nil indicates no match.
The method takes an integer argument n; if n >= 0, the function returns at most n matches, otherwise it returns all of them.
re := regexp.MustCompile(`a.`)
fmt.Printf("%q\n", re.FindAllString("paranormal", -1)) // ["ar" "an" "al"]
fmt.Printf("%q\n", re.FindAllString("paranormal", 2)) // ["ar" "an"]
fmt.Printf("%q\n", re.FindAllString("graal", -1)) // ["aa"]
fmt.Printf("%q\n", re.FindAllString("none", -1)) // [] (nil slice)
Replace
Use the ReplaceAllString method to replace the text of all matches. It returns a copy, replacing all matches of the regexp with a replacement string.
re := regexp.MustCompile(`ab*`)
fmt.Printf("%q\n", re.ReplaceAllString("-a-abb-", "T")) // "-T-T-"
Split
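Use the Split method to slice a string into substrings separated by the regexp. It returns a slice of the substrings between those expression matches. If there is no match, the result holds the whole string as its only element.
The method takes an integer argument n; if n > 0, the function returns at most n substrings, the last of which is the unsplit remainder. If n == 0, the result is nil; if n < 0, all substrings are returned.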
a := regexp.MustCompile(`a`)
fmt.Printf("%q\n", a.Split("banana", -1)) // ["b" "n" "n" ""]
fmt.Printf("%q\n", a.Split("banana", 0)) // [] (nil slice)
fmt.Printf("%q\n", a.Split("banana", 1)) // ["banana"]
fmt.Printf("%q\n", a.Split("banana", 2)) // ["b" "nana"]
zp := regexp.MustCompile(`z+`)
fmt.Printf("%q\n", zp.Split("pizza", -1)) // ["pi" "a"]
fmt.Printf("%q\n", zp.Split("pizza", 0)) // [] (nil slice)
fmt.Printf("%q\n", zp.Split("pizza", 1)) // ["pizza"]
fmt.Printf("%q\n", zp.Split("pizza", 2)) // ["pi" "a"]
More functions
There are 16 Regexp methods following the naming pattern
Find(All)?(String)?(Submatch)?(Index)?
For example: Find, FindAllString, FindStringIndex, …
Implementation
- The regexp package implements regular expressions with RE2 syntax.
- It supports UTF-8 encoded strings and Unicode character classes.
- The implementation is very efficient: the running time is linear in the size of the input.
- Backreferences are not supported since they cannot be efficiently implemented.
Further reading
Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, …) by Russ Cox.
Updated 2018-03-20 14:00