Setting Up Your Own Go Runtime Metrics Environment

https://tonybai.com/2017/07/04/setup-go-runtime-metrics-for-yourself/
July 4, 2017

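Since Go 1.5, with every Go release, gopher Brian Hatfield has posted on Twitter the runtime performance data he collected for the new version, compared against earlier Go versions. Even Go team staff cite Brian's charts in their slides when speaking around the world. Later, Brian Hatfield packaged the code he uses to measure runtime performance into a library and open-sourced it on GitHub, so we can now use it to build our own Go runtime metrics setup. This post briefly walks through the steps.

I. Environment and How It Works

Brian Hatfield's go-runtime-metrics library is implemented very simply: its runtime data comes from MemStats, NumGoroutine, NumCgoCall and similar facilities in Go's runtime package. The target program only needs to import the library to start emitting runtime stats:

    import _ "github.com/bmhatfield/go-runtime-metrics"

The go-runtime-metrics library starts a separate goroutine that reports runtime data periodically. At present the library only supports output to statsD; users can configure statsD to forward the data to graphite and view it in graphite-web. The flow is shown in the figure below: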

img{512x368}
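To make the mechanism described above more concrete, here is a minimal sketch of a reporter goroutine that samples a few values from the runtime package on a ticker and writes them as statsD gauge lines. It is not the library's source: the metric names, the "go.demo" prefix, the 200ms interval, and the gaugeLines and report helpers are all assumptions made for this illustration, and it writes to stdout rather than to a statsD socket.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"runtime"
	"time"
)

// gaugeLines samples the runtime once and formats the values as statsD
// gauges ("name:value|g"). The metric names are illustrative only.
func gaugeLines(prefix string) []string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return []string{
		fmt.Sprintf("%s.goroutines:%d|g", prefix, runtime.NumGoroutine()),
		fmt.Sprintf("%s.cgo.calls:%d|g", prefix, runtime.NumCgoCall()),
		fmt.Sprintf("%s.mem.heap.alloc:%d|g", prefix, m.HeapAlloc),
		fmt.Sprintf("%s.mem.gc.count:%d|g", prefix, m.NumGC),
	}
}

// report writes one batch of gauges to w every interval until done is closed.
func report(w io.Writer, prefix string, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, line := range gaugeLines(prefix) {
				fmt.Fprintln(w, line)
			}
		case <-done:
			return
		}
	}
}

func main() {
	done := make(chan struct{})
	// Run the reporter in its own goroutine, as a metrics library would.
	go report(os.Stdout, "go.demo", 200*time.Millisecond, done)
	time.Sleep(time.Second)
	close(done)
}
```

Passing an io.Writer keeps the sampling logic independent of the transport; a real reporter would hand it a UDP connection to statsD instead of os.Stdout.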

The environment for this experiment is Ubuntu 16.04.1:

    $ uname -rmn
    tonybai-ThinkCentre-M6600s-N000 4.4.0-83-generic x86_64

II. Setup Steps

1. Installing the go-runtime-metrics library

We can download the go-runtime-metrics library directly with go get:

    $ go get github.com/bmhatfield/go-runtime-metrics

Next we write a target program:

    // main.go
    package main

    import (
        "flag"
        "log"
        "net/http"
        "os"

        // Imported only for its side effect: it starts the metrics reporter.
        _ "github.com/bmhatfield/go-runtime-metrics"
    )

    func main() {
        flag.Parse()

        // Serve the current working directory over HTTP.
        cwd, err := os.Getwd()
        if err != nil {
            log.Fatal(err)
        }

        srv := &http.Server{
            Addr:    ":8000", // Normally ":443"
            Handler: http.FileServer(http.Dir(cwd)),
        }
        log.Fatal(srv.ListenAndServe())
    }

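My Ubuntu host has four Go versions installed: go 1.5.4, go 1.7.4, go 1.8.3 and go1.9beta2. We build the server with each of the four versions and use the resulting binaries as the programs under test, so that the reported Go runtime data can be compared.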

    $ GOROOT=~/.bin/go154 ~/.bin/go154/bin/go build -o server-go154 main.go
    $ GOROOT=~/.bin/go174 ~/.bin/go174/bin/go build -o server-go174 main.go
    $ GOROOT=~/.bin/go183 ~/.bin/go183/bin/go build -o server-go183 main.go
    $ GOROOT=~/.bin/go19beta2 ~/.bin/go19beta2/bin/go build -o server-go19beta2 main.go
    $ ls -l
    -rwxr-xr-x 1 tonybai tonybai 6861176 Jul  4 13:49 server-go154
    -rwxrwxr-x 1 tonybai tonybai 5901876 Jul  4 13:50 server-go174
    -rwxrwxr-x 1 tonybai tonybai 6102879 Jul  4 13:51 server-go183
    -rwxrwxr-x 1 tonybai tonybai 6365648 Jul  4 13:51 server-go19beta2

2. Installing, configuring and running statsD

statsD collects statistics and sends the aggregated data to a backend service such as graphite. statsD is implemented in JavaScript, so we need to install nodejs, npm and the related modules:

    $ sudo apt-get install nodejs
    $ sudo apt-get install npm

Next, we clone the statsD project locally and, using exampleConfig.js as a template, create our own goruntimemetricConfig.js (essentially keeping the default settings):

    // goruntimemetricConfig.js
    {
      graphitePort: 2003
    , graphiteHost: "127.0.0.1"
    , port: 8125
    , backends: [ "./backends/graphite" ]
    }

Start statsD:

    $ nodejs stats.js goruntimemetricConfig.js
    3 Jul 11:14:20 - [7939] reading config file: goruntimemetricConfig.js
    3 Jul 11:14:20 - server is up INFO

It started successfully!
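One way to check that statsD is reachable is to send it a single gauge by hand. The sketch below assumes statsD listens on UDP port 8125 as configured above; the metric name "go.demo.test" and the helpers formatGauge and sendGauge are hypothetical names chosen for this example. A statsD gauge datagram has the form "name:value|g".

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// formatGauge builds a statsD gauge datagram: "<name>:<value>|g".
func formatGauge(name string, value int64) string {
	return fmt.Sprintf("%s:%d|g", name, value)
}

// sendGauge sends one gauge to the statsD address over UDP.
func sendGauge(addr, name string, value int64) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(formatGauge(name, value)))
	return err
}

func main() {
	// 8125 is the port set in goruntimemetricConfig.js.
	if err := sendGauge("127.0.0.1:8125", "go.demo.test", 42); err != nil {
		log.Fatal(err)
	}
	fmt.Println("sent", formatGauge("go.demo.test", 42))
}
```

Because UDP is connectionless, a successful send does not prove that statsD received the datagram; it only shows that the packet was handed to the network stack.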

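3. Installing, configuring and running graphite

graphite is a tool that stores time-series monitoring data and renders it as graphs on demand. It consists of three components:

whisper is a file-based time-series database format; it also provides commands and an API that other components call to operate on the time-series database.

carbon receives metrics pushed from outside, aggregates them and writes them to the database. It also caches hot data to improve access efficiency.

graphite-web is the user-facing graphical system, used to build customized views of the monitoring data.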

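Installing and configuring Graphite is slightly tedious, so let's take it one step at a time.

a) Installing graphite

    $ sudo apt-get install graphite-web graphite-carbon

whisper is installed automatically as a dependency.

b) local_settings.py

graphite's main configuration file is /etc/graphite/local_settings.py. The file contains many settings; only the relevant ones that are in effect for this setup are listed here: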

    # /etc/graphite/local_settings.py
    TIME_ZONE = 'Asia/Shanghai'
    LOG_RENDERING_PERFORMANCE = True
    LOG_CACHE_PERFORMANCE = True
    LOG_METRIC_ACCESS = True
    GRAPHITE_ROOT = '/usr/share/graphite-web'
    CONF_DIR = '/etc/graphite'
    STORAGE_DIR = '/var/lib/graphite/whisper'
    CONTENT_DIR = '/usr/share/graphite-web/static'
    WHISPER_DIR = '/var/lib/graphite/whisper'
    LOG_DIR = '/var/log/graphite'
    INDEX_FILE = '/var/lib/graphite/search_index'  # Search index file
    DATABASES = {
        'default': {
            'NAME': '/var/lib/graphite/graphite.db',
            'ENGINE': 'django.db.backends.sqlite3',
            'USER': '',
            'PASSWORD': '',
            'HOST': '',
            'PORT': ''
        }
    }

c) Syncing the database

Next, run the following two commands to sync the database:

    $ sudo graphite-manage migrate auth
    .. ....
    Operations to perform:
      Apply all migrations: auth
    Running migrations:
      Rendering model states... DONE
      Applying contenttypes.0001_initial... OK
      Applying contenttypes.0002_remove_content_type_name... OK
      Applying auth.0001_initial... OK
      Applying auth.0002_alter_permission_name_max_length... OK
      Applying auth.0003_alter_user_email_max_length... OK
      Applying auth.0004_alter_user_username_opts... OK
      Applying auth.0005_alter_user_last_login_null... OK
      Applying auth.0006_require_contenttypes_0002... OK

    $ sudo graphite-manage syncdb
    Operations to perform:
      Synchronize unmigrated apps: account, cli, render, whitelist, metrics, url_shortener, dashboard, composer, events, browser
      Apply all migrations: admin, contenttypes, tagging, auth, sessions
    Synchronizing apps without migrations:
      Creating tables...
        Creating table account_profile
        Creating table account_variable
        Creating table account_view
        Creating table account_window
        Creating table account_mygraph
        Creating table dashboard_dashboard
        Creating table events_event
        Creating table url_shortener_link
        Running deferred SQL...
      Installing custom SQL...
    Running migrations:
      Rendering model states... DONE
      Applying admin.0001_initial... OK
      Applying sessions.0001_initial... OK
      Applying tagging.0001_initial... OK
    You have installed Django's auth system, and don't have any superusers defined.
    Would you like to create one now? (yes/no): yes
    Username (leave blank to use 'root'):
    Email address: xx@yy.com
    Password:
    Password (again):
    Superuser created successfully.

Here we create a superuser named root, which is used to log in to graphite-web.

d) Configuring carbon

The configuration files that concern carbon are listed below; we leave the defaults untouched:

    /etc/carbon/carbon.conf (too long to list here)

    /etc/carbon/storage-schemas.conf
    [carbon]
    pattern = ^carbon\.
    retentions = 60:90d

    [default_1min_for_1day]
    pattern = .*
    retentions = 60s:1d

    [stats]
    pattern = ^stats.*
    retentions = 10s:6h,1min:6d,10min:1800d
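Each entry in a retentions line has the form precision:history, meaning "keep one data point per precision for this long". The sketch below works out how many points each archive holds for the three retentions values above. It is a simplified reading of the format for illustration only: the seconds and points helpers are hypothetical, only the units that appear in this file (s, min, h, d) are handled, and a bare number is treated as seconds.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// seconds converts a duration such as "10s", "1min", "6h" or "90d" to
// seconds. A bare number is treated as seconds. Only the units used in
// the storage-schemas.conf shown in this post are handled (assumption).
func seconds(s string) (int, error) {
	units := []struct {
		suffix string
		factor int
	}{{"min", 60}, {"s", 1}, {"h", 3600}, {"d", 86400}}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.Atoi(strings.TrimSuffix(s, u.suffix))
			return n * u.factor, err
		}
	}
	return strconv.Atoi(s)
}

// points returns how many data points each archive in a retentions
// string holds, e.g. "10s:6h" -> 6h / 10s = 2160 points.
func points(retentions string) ([]int, error) {
	var out []int
	for _, archive := range strings.Split(retentions, ",") {
		parts := strings.Split(strings.TrimSpace(archive), ":")
		if len(parts) != 2 {
			return nil, fmt.Errorf("bad archive %q", archive)
		}
		step, err := seconds(parts[0])
		if err != nil {
			return nil, err
		}
		keep, err := seconds(parts[1])
		if err != nil {
			return nil, err
		}
		if step <= 0 {
			return nil, fmt.Errorf("bad precision in %q", archive)
		}
		out = append(out, keep/step)
	}
	return out, nil
}

func main() {
	for _, r := range []string{"60:90d", "60s:1d", "10s:6h,1min:6d,10min:1800d"} {
		p, err := points(r)
		fmt.Println(r, p, err)
	}
}
```

Note that carbon uses the first section whose pattern matches a metric name, so the order of the sections in the file matters.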

carbon has a cache feature, which we can turn on with the following steps.

Enable the carbon-cache switch:

    $ sudo vi /etc/default/graphite-carbon
    CARBON_CACHE_ENABLED=true

Start carbon-cache:

    $ sudo cp /usr/share/doc/graphite-carbon/examples/storage-aggregation.conf.example /etc/carbon/storage-aggregation.conf
    $ sudo systemctl start carbon-cache

e) Starting graphite-web

graphite-web supports several mainstream web servers. Here we use apache2 as an example, with graphite-web deployed under apache2 via mod-wsgi:

    $ sudo apt-get install apache2 libapache2-mod-wsgi
    $ sudo service apache2 start
    $ sudo a2dissite 000-default
    Site 000-default disabled.
    $ sudo service apache2 reload
    $ sudo cp /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available
    $ sudo a2ensite apache2-graphite
    Enabling site apache2-graphite.
    To activate the new configuration, you need to run:
      service apache2 reload
    $ sudo systemctl reload apache2

apache2's worker processes run as www-data:www-data by default, but the database file is owned by _graphite:_graphite:

    $ ll /var/lib/graphite/graphite.db
    -rw-r--r-- 1 _graphite _graphite 72704 Jul  3 13:48 /var/lib/graphite/graphite.db

So we need to change the user that the apache workers run as:

    $ sudo vi /etc/apache2/envvars
    export APACHE_RUN_USER=_graphite
    export APACHE_RUN_GROUP=_graphite

Restart apache2 for the change to take effect. Then open http://127.0.0.1 in a browser; if all goes well, you will see the graphite-web home page:

img{512x368}

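III. Running the Benchmark

Here I use wrk, an HTTP benchmarking tool, to benchmark each of the four target programs built earlier (server-go154, server-go174, server-go183, server-go19beta2). Each target program receives requests for 10 minutes. Since all four listen on port 8000, start them one at a time, run wrk against the running server from a second terminal, and stop it before starting the next:

    $ ./server-go154
    $ wrk -t12 -c400 -d10m http://127.0.0.1:8000

    $ ./server-go174
    $ wrk -t12 -c400 -d10m http://127.0.0.1:8000

    $ ./server-go183
    $ wrk -t12 -c400 -d10m http://127.0.0.1:8000

    $ ./server-go19beta2
    $ wrk -t12 -c400 -d10m http://127.0.0.1:8000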

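IV. Results

Open graphite-web in a browser and, under the tree tab on the left, expand the tree in this order: Metrics -> stats -> gauges -> go -> YOUR_HOST_NAME -> mem -> gc -> pause. If everything went well, you will see a line chart in the Graphite Composer window. We take GC pause as the example here, since it is also the metric gophers care about most: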

img{512x368}
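The same series can also be fetched without the Composer UI through graphite-web's render endpoint. The sketch below only builds the URL. It assumes graphite-web is served at http://127.0.0.1 as set up above; "myhost" stands in for YOUR_HOST_NAME, and the renderURL helper and the 10-minute window are choices made for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// renderURL builds a graphite-web render URL that returns the given
// metric as JSON, starting at the relative time `from` (e.g. "-10min").
func renderURL(base, target, from string) string {
	q := url.Values{}
	q.Set("target", target)
	q.Set("from", from)
	q.Set("format", "json")
	return base + "/render?" + q.Encode()
}

func main() {
	// "myhost" is a placeholder for YOUR_HOST_NAME in the metric tree.
	fmt.Println(renderURL("http://127.0.0.1", "stats.gauges.go.myhost.mem.gc.pause", "-10min"))
}
```

The dotted target mirrors the tree path in the Composer: each level of the tree becomes one segment of the metric name.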

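From this chart (the unit of the left axis is nanoseconds), we can roughly see that:

Go 1.5.4's GC pause is around 600μs;
Go 1.7.4's GC pause is around 300μs;
Go 1.8.3's and Go 1.9beta2's GC pauses are both mostly below 100μs. Go 1.9 does not seem to improve the GC much, although my test program is not particularly representative either.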

Other results:

Number of goroutines:

img{512x368}

GC count:

img{512x368}

Memory allocations:

img{512x368}

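Besides viewing the curve of a single metric, you can also use the dashboard feature of graphite-web to build a custom panel of the metrics you want to monitor; I won't go into that here.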

© 2017, bigwhite. All rights reserved.
