A quick comparison between different Go file walk implementations

http://www.boyter.org/2018/03/quick-comparison-go-file-walk-implementations/

What's the fastest way to get the names of all files in a directory using Go? I had a feeling that the native walk might not be the fastest way to do it, and a quick search showed that several projects claim to be faster. Since the application I am currently working on needs a high-performance scanner, I thought I would try the main ones out.

Note that I have updated the code and the results based on feedback from Reddit. First, each program now just counts the files rather than printing them, to avoid measuring output buffering. I had tried this before and found that it made no difference when running under hyperfine, but I updated it anyway so that it would not be called into question again. Second, following feedback from the godirwalk author, I set the Unsorted option to true, which pulls another ~150ms of speed out of the bag and is perfect for me. Since the goroutine implementations have the same sorting issue (as far as I can see), it seemed fair to turn it on.

nativewalk, using filepath.Walk from the standard library:

    package main

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
    )

    func main() {
        count := 0
        // filepath.Walk calls the function once for every file and directory.
        err := filepath.Walk("./", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err // abort the walk on the first error
            }
            count++
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(count)
    }

walk, using github.com/MichaelTJones/walk:

    package main

    import (
        "fmt"
        "log"
        "os"
        "sync/atomic"

        "github.com/MichaelTJones/walk"
    )

    func main() {
        var count int64
        // walk.Walk runs the callback on several goroutines, so the counter
        // is incremented atomically.
        err := walk.Walk("./", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            atomic.AddInt64(&count, 1)
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(count)
    }

cwalk, using github.com/iafan/cwalk:

    package main

    import (
        "fmt"
        "log"
        "os"
        "sync/atomic"

        "github.com/iafan/cwalk"
    )

    func main() {
        var count int64
        // cwalk.Walk also runs the callback concurrently, so the counter
        // is incremented atomically.
        err := cwalk.Walk("./", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            atomic.AddInt64(&count, 1)
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(count)
    }

godirwalk, using github.com/karrick/godirwalk:

    package main

    import (
        "fmt"
        "log"

        "github.com/karrick/godirwalk"
    )

    func main() {
        count := 0
        err := godirwalk.Walk("./", &godirwalk.Options{
            // Skip sorting directory entries; order does not matter for a count.
            Unsorted: true,
            Callback: func(osPathname string, de *godirwalk.Dirent) error {
                count++
                return nil
            },
            // Skip entries that cannot be read instead of aborting the walk.
            ErrorCallback: func(osPathname string, err error) godirwalk.ErrorAction {
                return godirwalk.SkipNode
            },
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(count)
    }
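The goroutine-based walkers invoke the callback concurrently, so any shared state such as a counter has to be synchronised. The following is a minimal standard-library sketch of that idea. It is not how walk or cwalk are implemented; the names countEntries and countSequential and the limit of 16 concurrent directory reads are illustrative assumptions, and it needs Go 1.16 or later for os.ReadDir.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
)

// countEntries counts every file and directory below root (root included,
// and assumed to exist), reading each directory in its own goroutine.
func countEntries(root string) int64 {
	var count int64 = 1 // the root itself
	var wg sync.WaitGroup
	sem := make(chan struct{}, 16) // cap on concurrent ReadDir calls (arbitrary)

	var visit func(dir string)
	visit = func(dir string) {
		defer wg.Done()
		sem <- struct{}{}
		entries, err := os.ReadDir(dir)
		<-sem
		if err != nil {
			return // skip directories that cannot be read
		}
		for _, e := range entries {
			// Several goroutines update count, so the increment is atomic.
			atomic.AddInt64(&count, 1)
			if e.IsDir() { // false for symlinks, so links are not followed
				wg.Add(1)
				go visit(filepath.Join(dir, e.Name()))
			}
		}
	}

	wg.Add(1)
	go visit(root)
	wg.Wait()
	return atomic.LoadInt64(&count)
}

// countSequential does the same count with filepath.Walk, for comparison.
func countSequential(root string) int64 {
	var count int64
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err == nil {
			count++
		}
		return nil // keep walking past unreadable entries
	})
	return count
}

func main() {
	fmt.Println(countEntries("./"), countSequential("./"))
}
```

On a readable tree the two counts should agree. Replacing the atomic increment with a plain count++ would make the concurrent version a data race, which go run -race reports.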

And the results. All were run under WSL (Windows Subsystem for Linux) on a Surface Book 2 against a recent checkout of the Linux kernel, which contained 67359 files.

    $ hyperfine './cwalk' && hyperfine './godirwalk' && hyperfine './nativewalk' && hyperfine './walk'
    Benchmark #1: ./cwalk
      Time (mean ± σ):     1.812 s ± 0.059 s    [User: 368.4 ms, System: 6545.8 ms]
      Range (min … max):   1.753 s … 1.934 s
    Benchmark #1: ./godirwalk
      Time (mean ± σ):     695.9 ms ± 16.7 ms    [User: 73.0 ms, System: 619.2 ms]
      Range (min … max):   671.2 ms … 725.6 ms
    Benchmark #1: ./nativewalk
      Time (mean ± σ):     3.896 s ± 0.489 s    [User: 153.0 ms, System: 3757.4 ms]
      Range (min … max):   3.560 s … 5.034 s
    Benchmark #1: ./walk
      Time (mean ± σ):     1.674 s ± 0.071 s    [User: 399.7 ms, System: 6383.3 ms]
      Range (min … max):   1.571 s … 1.769 s
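To make the "mean ± σ" figures concrete, here is a rough sketch of how such a summary could be computed in-process. The helper names meanStddev and timeRuns, the choice of 10 runs, and the use of the sample standard deviation (dividing by n-1) are assumptions for illustration, not a description of hyperfine's internals. In-process timings also exclude process start-up, so they are not directly comparable to the numbers above.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"path/filepath"
	"time"
)

// meanStddev returns the mean and the sample standard deviation
// (dividing by n-1) of the samples.
func meanStddev(samples []float64) (mean, sd float64) {
	n := float64(len(samples))
	if n == 0 {
		return 0, 0
	}
	for _, s := range samples {
		mean += s
	}
	mean /= n
	if n < 2 {
		return mean, 0
	}
	var sumSq float64
	for _, s := range samples {
		sumSq += (s - mean) * (s - mean)
	}
	return mean, math.Sqrt(sumSq / (n - 1))
}

// timeRuns calls f the given number of times and returns each
// duration in seconds.
func timeRuns(runs int, f func()) []float64 {
	out := make([]float64, 0, runs)
	for i := 0; i < runs; i++ {
		start := time.Now()
		f()
		out = append(out, time.Since(start).Seconds())
	}
	return out
}

func main() {
	samples := timeRuns(10, func() {
		filepath.Walk("./", func(path string, info os.FileInfo, err error) error {
			return nil // ignore errors; only the traversal cost matters here
		})
	})
	mean, sd := meanStddev(samples)
	fmt.Printf("Time (mean ± σ): %.3f s ± %.3f s\n", mean, sd)
}
```

Swapping the function passed to timeRuns for one of the other walkers gives a like-for-like comparison within a single process.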

For comparison, ripgrep, which is probably the fastest disk scanner, comes in at ~600ms. That is not a fair comparison, as it ignores certain directories, but it gives you an idea of the upper bound of useful performance.

It turns out that the native implementation that ships with Go is indeed the slowest. The fastest by a long shot is godirwalk: at 695.9 ms it is about 2.4x as fast as the next quickest implementation, walk (1.674 s), and about 5.6x as fast as the native walk (3.896 s). So if raw performance matters, godirwalk looks like the best option. If, however, you want a drop-in replacement with some additional speed, I would suggest going with cwalk or walk. Of course, if you aren't scanning the Linux kernel, it's hard to go wrong with even the native implementation, which is fast enough for most cases.
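The relative speeds can be checked with a few lines of arithmetic on the mean times reported by hyperfine. This is only a sketch; the result type and the speedup helper are names made up for the example.

```go
package main

import "fmt"

// result pairs a benchmark name with its mean wall-clock time in seconds.
type result struct {
	name string
	mean float64
}

// speedup reports how many times faster the baseline is than other.
func speedup(baseline, other float64) float64 {
	return other / baseline
}

func main() {
	godirwalk := 0.6959 // mean from the hyperfine run, in seconds
	others := []result{
		{"cwalk", 1.812},
		{"nativewalk", 3.896},
		{"walk", 1.674},
	}
	for _, r := range others {
		fmt.Printf("godirwalk is %.1fx as fast as %s\n", speedup(godirwalk, r.mean), r.name)
	}
}
```

This prints 2.6x for cwalk, 5.6x for nativewalk and 2.4x for walk.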

Author: admin
Created: 2018-03-20 13:34
Updated: 2018-03-20 13:36