Prometheus监控有所思:多标签埋点及Mbean
使用 grafana+prometheus+jmx 作为普通的监控手段,是比较有用的。我之前的文章介绍了相应的实现办法。https://www.cnblogs.com/yougewe/p/11140129.html
但是,按照之前的实现,我们更多的只能是监控 单值型的数据,如请求量,tps 等等,对于复杂组合型的指标却不容易监控。
这种情况一般带有一定的业务属性,比如想监控mq中的每个topic的消费情况,每类产品的实时订单情况等等。当然,对于看过完整的 prometheus 的监控数据的同学来说,会觉得很正常,因为你会看到如下的数据:
# HELP java_lang_MemoryPool_PeakUsage_max java.lang.management.MemoryUsage (java.lang<type=MemoryPool, name=Metaspace><PeakUsage>max)# TYPE java_lang_MemoryPool_PeakUsage_max untypedjava_lang_MemoryPool_PeakUsage_max{name="Metaspace",} -1.0java_lang_MemoryPool_PeakUsage_max{name="PS Old Gen",} 1.415053312E9java_lang_MemoryPool_PeakUsage_max{name="PS Eden Space",} 6.96778752E8java_lang_MemoryPool_PeakUsage_max{name="Code Cache",} 2.5165824E8java_lang_MemoryPool_PeakUsage_max{name="Compressed Class Space",} 1.073741824E9java_lang_MemoryPool_PeakUsage_max{name="PS Survivor Space",} 5242880.0
这里面的 name 就是普通标签嘛,同理于其他埋点咯。应该是可以实现的。
是的,prometheus 是方便实现这玩意的,但是我们之前不是使用 jmx_exportor 作为导出工具嘛,使用的埋点组件是 io.dropwizard.metrics:metrics-core 。
而它则是重在单值的监控,所以,用它我们是实现不了带指标的数据的监控了。
那怎么办呢?三个办法!
1. 直接替换原有的 metrics-core 组件为 prometheus 的client 组件,因为官方是支持这种操作的;
2. 使用 prometheus-client 组件与 metrics-core 组件配合,各自使用各自的功能;
3. 自行实现带标签的埋点,这可能是基于 MBean 的;
以上这几种方案,各有优劣。方案1可能改动太大,而且可能功能不兼容不可行; 方案2可能存在整合不了或者功能冲突情况,当然如果能整合,绝对是最好的; 方案3实现复杂度就高了,比如监控值维护、线程安全、MBean数据吐出方式等等。
好吧,不管怎么样,我们还是都看看吧。
一、 使用 prometheus-client 埋点实现带标签的监控
1. 引入 pom 依赖
<dependency><groupId>io.prometheus</groupId><artifactId>simpleclient</artifactId><version>0.8.0</version></dependency><dependency><groupId>io.prometheus</groupId><artifactId>simpleclient_hotspot</artifactId><version>0.8.0</version></dependency><dependency><groupId>io.prometheus</groupId><artifactId>simpleclient_servlet</artifactId><version>0.8.0</version></dependency>
2. 框架注册监控
@Configurationpublic class PrometheusConfig {@Beanpublic ServletRegistrationBean servletRegistrationBean(){// 将埋点指标吐出到 /metrics 节点return new ServletRegistrationBean(new MetricsServlet(), "/metrics");}}
3. 业务埋点数据
// 注册指标实例io.prometheus.client.Counter c = io.prometheus.client.Counter.build().name("jmx_test_abc_ffff").labelNames("topic").help("topic counter usage.").register();public void incTopicMetric(String topic) {// c.labels("test").inc(); // for test}
4. 获取埋点数据信息
curl http://localhost:8080/metrics# 对外暴露http接口调用,结果如下# HELP jmx_test_abc_ffff counter usage.# TYPE jmx_test_abc_ffff counterjmx_test_abc_ffff{topic="bbb",} 1.0jmx_test_abc_ffff{topic="2",} 2.0jmx_test_abc_ffff{topic="test",} 1.0
可以看出,效果咱们是实现了。但是,对于已经运行的东西,要改这玩意可能不是那么友好。主要有以下几点:
1. 暴露数据方式变更,原来由javaagent进行统一处理的数据,现在可能由于应用端口的不一,导致收集的配置会变更,不一定符合运维场景;
2. 需要将原来的埋点进行替换;
二、 prometheus-client 与 metrics-core 混合埋点
不处理以前的监控,将新监控带标签数据吐入到 jmx_exportor 中。
我们试着使用如上的埋点方式:
// 注册指标实例io.prometheus.client.Counter c = io.prometheus.client.Counter.build().name("jmx_test_abc_ffff").labelNames("topic").help("topic counter usage.").register();public void incTopicMetric(String topic) {// c.labels("test").inc(); // for test}
好像数据是不会进入的到 jmx_exportor 的。这也不奇怪,毕竟咱们也不了解其原理,难道想靠运气取胜?? 细去查看 metrics-core 组件的埋点实现方案,发现其是向 MBean 中吐入数据,从而被 jmx_exportor 抓取的。
// com.codahale.metrics.jmx.JmxReporter.JmxListener#onCounterAdded@Overridepublic void onCounterAdded(String name, Counter counter) {try {if (filter.matches(name, counter)) {final ObjectName objectName = createName("counters", name);registerMBean(new JmxCounter(counter, objectName), objectName);}} catch (InstanceAlreadyExistsException e) {LOGGER.debug("Unable to register counter", e);} catch (JMException e) {LOGGER.warn("Unable to register counter", e);}}// 向 mBeanServer 注册监控实例// 默认情况下 mBeanServer = ManagementFactory.getPlatformMBeanServer();private void registerMBean(Object mBean, ObjectName objectName) throws InstanceAlreadyExistsException, JMException {ObjectInstance objectInstance = mBeanServer.registerMBean(mBean, objectName);if (objectInstance != null) {// the websphere mbeanserver rewrites the objectname to include// cell, node & server info// make sure we capture the new objectName for unregistrationregistered.put(objectName, objectInstance.getObjectName());} else {registered.put(objectName, objectName);}}
而 prometheus-client 则是通过 CollectorRegistry.defaultRegistry 进行注册实例的。
// io.prometheus.client.SimpleCollector.Builder#register()/*** Create and register the Collector with the default registry.*/public C register() {return register(CollectorRegistry.defaultRegistry);}/*** Create and register the Collector with the given registry.*/public C register(CollectorRegistry registry) {C sc = create();registry.register(sc);return sc;}
所以,好像原理上来讲是不同的。至于到底为什么不能监控到数据,那还不好说。至少,你可以学习 metrics-core 使用 MBean 的形式将数据导出。这是我们下一个方案要讨论的事。
这里我可以给到一个最终简单又不失巧合的方式,实现两个监控组件的兼容,同时向 jmx_exportor 进行导出。如下:
1. 引入 javaagent 依赖包
<!-- javaagent 包,与 外部使用的 jmx_exportor 一致 --><dependency><groupId>io.prometheus.jmx</groupId><artifactId>jmx_prometheus_javaagent</artifactId><version>0.12.0</version></dependency>
2. 使用 agent 的工具类进行埋点
因为 javaagent 里面提供一套完整的 client 工具包,所以,我们可以使用。
// 注册指标实例// 将 io.prometheus.client.Counter 包替换为 io.prometheus.jmx.shaded.io.prometheus.client.Counterio.prometheus.client.Counter c = io.prometheus.client.Counter.build().name("jmx_test_abc_ffff").labelNames("topic").help("topic counter usage.").register();public void incTopicMetric(String topic) {// c.labels("test").inc(); // for test}
3. 原样使用 jmx_exportor 就可以导出监控数据了
为什么换一个包这样就可以了?
因为 jmx_exportor 也是通过注册 CollectorRegistry.defaultRegistry 来进行收集数据的,我们只要保持与其实例一致,就可以做到在同一个jvm内共享数据了。
三、 基于 MBean自行实现带标签的埋点
// 测试类public class PrometheusMbeanMetricsMain {private static ConcurrentHashMap<String, AtomicInteger> topicContainer = new ConcurrentHashMap<>();private static MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();public static void main(String[] args) throws Exception {// 模拟某个topicString commingTopic = "test_topic";AtomicInteger myTopic1Counter = getMetricCounter(commingTopic);System.out.println("jmx started!");while(true){System.out.println("---");// 计数增加myTopic1Counter.incrementAndGet();Thread.sleep(10000);}}private static AtomicInteger getMetricCounter(String topic) throws MalformedObjectNameException, NotCompliantMBeanException, InstanceAlreadyExistsException, MBeanRegistrationException {AtomicInteger myTopic1Counter = topicContainer.get(topic);if(myTopic1Counter == null) {myTopic1Counter = new AtomicInteger(0);Hashtable<String, String> tab = new Hashtable<>();tab.put("topic", topic);// 占位符,虽然不知道什么意思,但是感觉很厉害的样子tab.put("_", "_value");ObjectName objectName = new ObjectName("mydomain_test", tab);// 注册监控实例 到 MBeanServer 中ObjectInstance objectInstance = mBeanServer.registerMBean(new JmxCounter(myTopic1Counter, objectName), objectName);}return myTopic1Counter;}}// JmxCounter, MBean 要求: 1. 接口必须定义成Public的; 2. 接口命名规范符合要求, 即接口名叫 XYZMBean ,那么实现名就必须一定是XYZ;// DynamicMBeanpublic interface JmxCounterMBean {public Object getCount() throws Exception;}public class JmxCounter implements JmxCounterMBean {private AtomicInteger metric;private ObjectName objectName;public JmxCounter(AtomicInteger metric, ObjectName objectName) {this.objectName = objectName;this.metric = metric;}@Overridepublic Object getCount() throws Exception {// 返回监控结果return metric.get();}}
最后,见证奇迹的时刻。结果如下:
# HELP mydomain_test_value_Count Attribute exposed for management (mydomain_test<_=_value, topic=b_topic><>Count)# TYPE mydomain_test_value_Count untypedmydomain_test_value_Count{topic="b_topic",} 1.0mydomain_test_value_Count{topic="a_topic",} 88.0
很明显,这是一个糟糕的实现,不要学他。仅为了演示效果。
所以,总结下来,自然是使用方案2了。两个组件兼容,实现简单,性能也不错。如果只是为了使用,到此就可以了。不过你得明白,以上方案有取巧的成分在。
四、 原理: jmx_exportor 是如何获取数据的?
jmx_exportor 也是可以通过 http_server 暴露数据。
// io.prometheus.client.exporter.HTTPServer/*** Start a HTTP server serving Prometheus metrics from the given registry.*/public HTTPServer(InetSocketAddress addr, CollectorRegistry registry, boolean daemon) throws IOException {server = HttpServer.create();server.bind(addr, 3);// 使用 HTTPMetricHandler 处理请求HttpHandler mHandler = new HTTPMetricHandler(registry);// 绑定到 /metrics 地址上server.createContext("/", mHandler);server.createContext("/metrics", mHandler);executorService = Executors.newFixedThreadPool(5, DaemonThreadFactory.defaultThreadFactory(daemon));server.setExecutor(executorService);start(daemon);}/*** Start a HTTP server by making sure that its background thread inherit proper daemon flag.*/private void start(boolean daemon) {if (daemon == Thread.currentThread().isDaemon()) {server.start();} else {FutureTask<Void> startTask = new FutureTask<Void>(new Runnable() {@Overridepublic void run() {server.start();}}, null);DaemonThreadFactory.defaultThreadFactory(daemon).newThread(startTask).start();try {startTask.get();} catch (ExecutionException e) {throw new RuntimeException("Unexpected exception on starting HTTPSever", e);} catch (InterruptedException e) {// This is possible only if the current tread has been interrupted,// but in real use cases this should not happen.// In any case, there is nothing to do, except to propagate interrupted flag.Thread.currentThread().interrupt();}}}
所以,可以主要逻辑是 HTTPMetricHandler 处理。来看看。
// io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler#handlepublic void handle(HttpExchange t) throws IOException {String query = t.getRequestURI().getRawQuery();ByteArrayOutputStream response = this.response.get();response.reset();OutputStreamWriter osw = new OutputStreamWriter(response);// 主要由该 TextFormat 进行格式化输出// registry.filteredMetricFamilySamples() 进行数据收集TextFormat.write004(osw,registry.filteredMetricFamilySamples(parseQuery(query)));osw.flush();osw.close();response.flush();response.close();t.getResponseHeaders().set("Content-Type",TextFormat.CONTENT_TYPE_004);if (shouldUseCompression(t)) {t.getResponseHeaders().set("Content-Encoding", "gzip");t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0);final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody());response.writeTo(os);os.close();} else {t.getResponseHeaders().set("Content-Length",String.valueOf(response.size()));t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size());// 写向客户端response.writeTo(t.getResponseBody());}t.close();}}
五、 原理: jmx_exportor 是如何获取Mbean 的数据的?
jmx_exportor 有一个 JmxScraper, 专门用于处理 MBean 的值。
// io.prometheus.jmx.JmxScraper#doScrape/*** Get a list of mbeans on host_port and scrape their values.** Values are passed to the receiver in a single thread.*/public void doScrape() throws Exception {MBeanServerConnection beanConn;JMXConnector jmxc = null;// 默认直接获取本地的 jmx 信息// 即是通过共享 ManagementFactory.getPlatformMBeanServer() 变量来实现通信的if (jmxUrl.isEmpty()) {beanConn = ManagementFactory.getPlatformMBeanServer();} else {Map<String, Object> environment = new HashMap<String, Object>();if (username != null && username.length() != 0 && password != null && password.length() != 0) {String[] credent = new String[] {username, password};environment.put(javax.management.remote.JMXConnector.CREDENTIALS, credent);}if (ssl) {environment.put(Context.SECURITY_PROTOCOL, "ssl");SslRMIClientSocketFactory clientSocketFactory = new SslRMIClientSocketFactory();environment.put(RMIConnectorServer.RMI_CLIENT_SOCKET_FACTORY_ATTRIBUTE, clientSocketFactory);environment.put("com.sun.jndi.rmi.factory.socket", clientSocketFactory);}// 如果是远程获取,则会通过 rmi 进行远程通信获取jmxc = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl), environment);beanConn = jmxc.getMBeanServerConnection();}try {// Query MBean names, see #89 for reasons queryMBeans() is used instead of queryNames()Set<ObjectName> mBeanNames = new HashSet<ObjectName>();for (ObjectName name : whitelistObjectNames) {for (ObjectInstance instance : beanConn.queryMBeans(name, null)) {mBeanNames.add(instance.getObjectName());}}for (ObjectName name : blacklistObjectNames) {for (ObjectInstance instance : beanConn.queryMBeans(name, null)) {mBeanNames.remove(instance.getObjectName());}}// Now that we have *only* the whitelisted mBeans, remove any old ones from the cache:jmxMBeanPropertyCache.onlyKeepMBeans(mBeanNames);for (ObjectName objectName : mBeanNames) {long start = System.nanoTime();scrapeBean(beanConn, objectName);logger.fine("TIME: " + (System.nanoTime() - start) + " ns for " + objectName.toString());}} finally {if (jmxc != null) {jmxc.close();}}}// io.prometheus.jmx.JmxScraper#scrapeBeanprivate void scrapeBean(MBeanServerConnection beanConn, ObjectName mbeanName) {MBeanInfo info;try {info = beanConn.getMBeanInfo(mbeanName);} catch (IOException e) {logScrape(mbeanName.toString(), "getMBeanInfo Fail: " + e);return;} catch (JMException e) {logScrape(mbeanName.toString(), "getMBeanInfo Fail: " + e);return;}MBeanAttributeInfo[] attrInfos = info.getAttributes();Map<String, MBeanAttributeInfo> name2AttrInfo = new LinkedHashMap<String, MBeanAttributeInfo>();for (int idx = 0; idx < attrInfos.length; ++idx) {MBeanAttributeInfo attr = attrInfos[idx];if (!attr.isReadable()) {logScrape(mbeanName, attr, "not readable");continue;}name2AttrInfo.put(attr.getName(), attr);}final AttributeList attributes;try {// 通过 MBean 调用对象,获取所有属性值,略去不说attributes = beanConn.getAttributes(mbeanName, name2AttrInfo.keySet().toArray(new String[0]));} catch (Exception e) {logScrape(mbeanName, name2AttrInfo.keySet(), "Fail: " + e);return;}for (Attribute attribute : attributes.asList()) {MBeanAttributeInfo attr = name2AttrInfo.get(attribute.getName());logScrape(mbeanName, attr, "process");// 处理单个key的属性值, 如 topic=aaa,ip=1 将会进行再次循环处理processBeanValue(mbeanName.getDomain(),// 获取有效的属性列表, 我们可以简单看一下过滤规则, 如下文jmxMBeanPropertyCache.getKeyPropertyList(mbeanName),new LinkedList<String>(),attr.getName(),attr.getType(),attr.getDescription(),attribute.getValue());}}// 处理每个 mBean 的属性,写入到 receiver 中// io.prometheus.jmx.JmxScraper#processBeanValue/*** Recursive function for exporting the values of an mBean.* JMX is a very open technology, without any prescribed way of declaring mBeans* so this function tries to do a best-effort pass of getting the values/names* out in a way it can be processed elsewhere easily.*/private void processBeanValue(String domain,LinkedHashMap<String, String> beanProperties,LinkedList<String> attrKeys,String attrName,String attrType,String attrDescription,Object value) {if (value == null) {logScrape(domain + beanProperties + attrName, "null");}// 单值情况,数字型,字符串型,可以处理else if (value instanceof Number || value instanceof String || value instanceof Boolean) {logScrape(domain + beanProperties + attrName, value.toString());// 解析出的数据存入 receiver 中,可以是 jmx, 或者 控制台this.receiver.recordBean(domain,beanProperties,attrKeys,attrName,attrType,attrDescription,value);}// 多值型情况else if (value instanceof CompositeData) {logScrape(domain + beanProperties + attrName, "compositedata");CompositeData composite = (CompositeData) value;CompositeType type = composite.getCompositeType();attrKeys = new LinkedList<String>(attrKeys);attrKeys.add(attrName);for(String key : type.keySet()) {String typ = type.getType(key).getTypeName();Object valu = composite.get(key);processBeanValue(domain,beanProperties,attrKeys,key,typ,type.getDescription(),valu);}}// 更复杂型对象else if (value instanceof TabularData) {// I don't pretend to have a good understanding of TabularData.// The real world usage doesn't appear to match how they were// meant to be used according to the docs. I've only seen them// used as 'key' 'value' pairs even when 'value' is itself a// CompositeData of multiple values.logScrape(domain + beanProperties + attrName, "tabulardata");TabularData tds = (TabularData) value;TabularType tt = tds.getTabularType();List<String> rowKeys = tt.getIndexNames();CompositeType type = tt.getRowType();Set<String> valueKeys = new TreeSet<String>(type.keySet());valueKeys.removeAll(rowKeys);LinkedList<String> extendedAttrKeys = new LinkedList<String>(attrKeys);extendedAttrKeys.add(attrName);for (Object valu : tds.values()) {if (valu instanceof CompositeData) {CompositeData composite = (CompositeData) valu;LinkedHashMap<String, String> l2s = new LinkedHashMap<String, String>(beanProperties);for (String idx : rowKeys) {Object obj = composite.get(idx);if (obj != null) {// Nested tabulardata will repeat the 'key' label, so// append a suffix to distinguish each.while (l2s.containsKey(idx)) {idx = idx + "_";}l2s.put(idx, obj.toString());}}for(String valueIdx : valueKeys) {LinkedList<String> attrNames = extendedAttrKeys;String typ = type.getType(valueIdx).getTypeName();String name = valueIdx;if (valueIdx.toLowerCase().equals("value")) {// Skip appending 'value' to the nameattrNames = attrKeys;name = attrName;}processBeanValue(domain,l2s,attrNames,name,typ,type.getDescription(),composite.get(valueIdx));}} else {logScrape(domain, "not a correct tabulardata format");}}} else if (value.getClass().isArray()) {logScrape(domain, "arrays are unsupported");} else {// 多半会返回不支持的对象然后得不到jmx监控值// mydomain_test{3=3, topic=aaa} java.util.Hashtable is not exportedlogScrape(domain + beanProperties, attrType + " is not exported");}}// 我们看下prometheus 对 mbeanName 的转换操作,会将各种特殊字符转换为 属性列表// io.prometheus.jmx.JmxMBeanPropertyCache#getKeyPropertyListpublic LinkedHashMap<String, String> getKeyPropertyList(ObjectName mbeanName) {LinkedHashMap<String, String> keyProperties = keyPropertiesPerBean.get(mbeanName);if (keyProperties == null) {keyProperties = new LinkedHashMap<String, String>();// 转化为 string 格式String properties = mbeanName.getKeyPropertyListString();// 此处为 prometheus 认识的格式,已经匹配上了Matcher match = PROPERTY_PATTERN.matcher(properties);while (match.lookingAt()) {keyProperties.put(match.group(1), match.group(2));properties = properties.substring(match.end());if (properties.startsWith(",")) {properties = properties.substring(1);}match.reset(properties);}keyPropertiesPerBean.put(mbeanName, keyProperties);}return keyProperties;}// io.prometheus.jmx.JmxMBeanPropertyCache#PROPERTY_PATTERNprivate static final Pattern PROPERTY_PATTERN = Pattern.compile("([^,=:\\*\\?]+)" + // Name - non-empty, anything but comma, equals, colon, star, or question mark"=" + // Equals"(" + // Either"\"" + // Quoted"(?:" + // A possibly empty sequence of"[^\\\\\"]*" + // Greedily match anything but backslash or quote"(?:\\\\.)?" + // Greedily see if we can match an escaped sequence")*" +"\"" +"|" + // Or"[^,=:\"]*" + // Unquoted - can be empty, anything but comma, equals, colon, or quote")");
六、 原理: jmx_exportor 为什么输出的格式是这样的?
prometheus 的数据格式如下,如何从埋点数据转换?
# HELP mydomain_test_value_Count Attribute exposed for management (mydomain_test<_=_value, topic=b_topic><>Count)# TYPE mydomain_test_value_Count untypedmydomain_test_value_Count{topic="b_topic",} 1.0mydomain_test_value_Count{topic="a_topic",} 132.0
是一个输出格式问题,也是一协议问题。
// io.prometheus.client.exporter.common.TextFormat#write004public static void write004(Writer writer, Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {/* See http://prometheus.io/docs/instrumenting/exposition_formats/* for the output format specification. */while(mfs.hasMoreElements()) {Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement();writer.write("# HELP ");writer.write(metricFamilySamples.name);writer.write(' ');writeEscapedHelp(writer, metricFamilySamples.help);writer.write('\n');writer.write("# TYPE ");writer.write(metricFamilySamples.name);writer.write(' ');writer.write(typeString(metricFamilySamples.type));writer.write('\n');for (Collector.MetricFamilySamples.Sample sample: metricFamilySamples.samples) {writer.write(sample.name);// 带 labelNames 的,依次输出对应的标签if (sample.labelNames.size() > 0) {writer.write('{');for (int i = 0; i < sample.labelNames.size(); ++i) {writer.write(sample.labelNames.get(i));writer.write("=\"");writeEscapedLabelValue(writer, sample.labelValues.get(i));writer.write("\",");}writer.write('}');}writer.write(' ');writer.write(Collector.doubleToGoString(sample.value));if (sample.timestampMs != null){writer.write(' ');writer.write(sample.timestampMs.toString());}writer.write('\n');}}}

腾讯、阿里、滴滴后台面试题汇总总结 — (含答案)
面试:史上最全多线程面试题 !
最新阿里内推Java后端面试题
JVM难学?那是因为你没认真看完这篇文章

关注作者微信公众号 —《JAVA烂猪皮》
了解更多java后端架构知识以及最新面试宝典


看完本文记得给作者点赞+在看哦~~~大家的支持,是作者源源不断出文的动力
作者:等你归去来
出处:https://www.cnblogs.com/yougewe/p/11891405.html
