avatarYitaek Hwang

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

2536

Abstract

uster to host Vault or ChartMuseum). Prometheus Operators also come with a ton of default rules (e.g. etcd, kube-api-server) that you may be getting via another tool (e.g. Stackdriver for GKE) so you may spend a bit more time refining what you need initially.</p><h2 id="7159">Multi-Cloud or Multi-Cluster Setup</h2><p id="d466">As Kubernetes clusters scale, there may arise a need for a centralize multi-cluster monitoring solution. It could be to tie together clusters deployed in different regions; a multi-cloud setup to pull CloudWatch, Stackdriver, GKE, and EKS metrics; or a hierarchical architecture to pull distributed job metrics across different job-specific clusters. Prometheus is flexible in that a Prometheus server in one cluster can also act as a target for another Prometheus. In other words, Prometheus supports federated monitoring.</p><p id="783d">Instead of exposing Prometheus via an ingress, Prometheus can authenticate via API server using standard Kubernetes TLS authentication and authorization via cluster roles. You can see an example setup on this <a href="https://readmedium.com/monitoring-multiple-kubernetes-clusters-88cf34442fa3">blog post from THG</a>.</p><p id="9361">Another option is to use <a href="https://thanos.io/">Thanos</a> (another CNCF project) to take advantage of its long-term storage and querying capabilities. Thanos uses the sidecar pattern to connect to Prometheus and export it to a more scalable cloud storage. One of the advantages of using Thanos is its querying engine that can partition the data and downsample to quickly run queries spanning multiple months or years.</p><p id="d7e0">To add Thanos to existing Prometheus deployment:</p><ul><li><a href="https://github.com/coreos/prometheus-operator/tree/master/example/thanos">Adding to Prometheus Operator</a></li><li><a href="https://github.com/thanos-io/thanos/blob/f7802edbf8306c07cc51435425d8f61b7fb94716/tutorials/kubernetes-helm/README.md">Adding to Prometheus Helm Chart</a></li></ul><h1 id="7d9c">Installing Prometheus & Grafana</h1><p id="0df1">For the sake of simplicity, the rest of the series will assume a single cluster Prometheus and Grafana setup using the official Helm Chart. However, the instructions can be easily modified to work with operators or Thanos addons.</p><p id="b040">Without further ado, let’s deploy both charts! Assuming Helm is already installed onto your cluster (if not you can use <a href="https://cloud.google.com/blog/products/devops-sre/deploying-a-production-grade-helm-release

Options

-on-gke-with-terraform">Google’s Terraform module</a> or follow <a href="https://docs.bitnami.com/kubernetes/get-started-gke/">Bitnami’s guide</a>), it’s as simple as:</p><div id="bd5a"><pre>helm install prometheus stable/prometheus </pre></div><div id="76fd"><pre><span class="hljs-string">//</span> or using Helm 2 <span class="hljs-string">//</span> helm upgrade <span class="hljs-params">--install</span> prometheus stable/prometheus</pre></div><p id="343f">and for Grafana</p><div id="6b96"><pre>helm install grafana stable/grafana</pre></div><div id="45b5"><pre><span class="hljs-string">//</span> or using Helm 2 <span class="hljs-string">//</span> helm upgrade <span class="hljs-params">--install</span> grafana stable/grafana</pre></div><p id="f857">Now you should have Prometheus server, alertmanager, pushgateway, kube-metrics, node exporter, and Grafana deployed. Prometheus Helm chart provides default scrape configs to collect metrics from Kubernetes apps inside your cluster.</p><figure id="64c0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*2OiFV7kGE_wMjpL4.png"><figcaption>Prometheus & Grafana Architecture — <a href="https://sysdig.com/blog/kubernetes-monitoring-with-prometheus-alertmanager-grafana-pushgateway-part-2/">Image Credit: Sysdig</a></figcaption></figure><p id="4705">In the <a href="https://readmedium.com/practical-monitoring-with-prometheus-grafana-part-ii-5020be20ebf6">next post</a>, we will explore setting up blackbox monitoring to run a simple uptime check. This is a simple alternative to running Pingdom or uptime bots and a great candidate to host in a centralized Vault, ChartMuseum, or cross-cloud Prometheus to ping across clouds (i.e. check AWS endpoints from GCP).</p><p id="ee82">You can find other posts in this series here:</p><ul><li>Part I: <a href="https://readmedium.com/practical-monitoring-with-prometheus-grafana-part-i-22d0f172f993">Installing Prometheus + Grafana via Helm in 5 Minutes</a></li><li>Part II: <a href="https://readmedium.com/practical-monitoring-with-prometheus-grafana-part-ii-5020be20ebf6">Using Prometheus blackbox exporter for free uptime checks</a></li><li>Part III: <a href="https://towardsdatascience.com/practical-monitoring-with-prometheus-grafana-part-iii-81f019ecee19">Applying simple statistics for anomaly detection using Prometheus</a></li><li>Part IV: <a href="https://readmedium.com/practical-monitoring-with-prometheus-grafana-part-iv-d4f3f995cc78">Securing Grafana with Identity-Award Proxy</a></li></ul></article></body>

Practical Monitoring with Prometheus & Grafana (Part I)

Installing Prometheus + Grafana via Helm in 5 Minutes

Prometheus at Scale: Architecture Considerations

In August 2018, Prometheus joined Kubernetes as the second project to graduate from CNCF and solidified itself as the de facto standard for open source monitoring tool for Kubernetes. From commercial offerings from Sysdig and Weaveworks to Prometheus operator charts from CoreOS and Bitnami, users now have more choices than before to install and deploy Prometheus onto Kubernetes. With so many options, how should one deploy Prometheus for scale?

Operators vs. Helm

If you aren’t familiar with operators, they are software extensions to Kubernetes that package together application-specific custom resources and configurations. For a monitoring bundle that often includes Prometheus (server, alertmanager, push gateway) and Grafana, using a Prometheus operator from CoreOS or Bitnami provides preconfigured alerts and dashboards. It’s a matter of preference to use CoreOS’s kube-prometheus operator, which uses ksonnet or the prometheus-operator Helm chart.

There are some known issues for using the prometheus-operator on a private GKE cluster, so if you don’t want to change firewall settings or prefer Bitnami’s charts (perhaps you run their Redis or Postgres charts), then Bitnami’s prometheus-operator chart works great as well.

Personally, I found the operator pattern to be bit heavy on small or workload-specific clusters (e.g. a central cluster to host Vault or ChartMuseum). Prometheus Operators also come with a ton of default rules (e.g. etcd, kube-api-server) that you may be getting via another tool (e.g. Stackdriver for GKE) so you may spend a bit more time refining what you need initially.

Multi-Cloud or Multi-Cluster Setup

As Kubernetes clusters scale, there may arise a need for a centralize multi-cluster monitoring solution. It could be to tie together clusters deployed in different regions; a multi-cloud setup to pull CloudWatch, Stackdriver, GKE, and EKS metrics; or a hierarchical architecture to pull distributed job metrics across different job-specific clusters. Prometheus is flexible in that a Prometheus server in one cluster can also act as a target for another Prometheus. In other words, Prometheus supports federated monitoring.

Instead of exposing Prometheus via an ingress, Prometheus can authenticate via API server using standard Kubernetes TLS authentication and authorization via cluster roles. You can see an example setup on this blog post from THG.

Another option is to use Thanos (another CNCF project) to take advantage of its long-term storage and querying capabilities. Thanos uses the sidecar pattern to connect to Prometheus and export it to a more scalable cloud storage. One of the advantages of using Thanos is its querying engine that can partition the data and downsample to quickly run queries spanning multiple months or years.

To add Thanos to existing Prometheus deployment:

Installing Prometheus & Grafana

For the sake of simplicity, the rest of the series will assume a single cluster Prometheus and Grafana setup using the official Helm Chart. However, the instructions can be easily modified to work with operators or Thanos addons.

Without further ado, let’s deploy both charts! Assuming Helm is already installed onto your cluster (if not you can use Google’s Terraform module or follow Bitnami’s guide), it’s as simple as:

helm install prometheus stable/prometheus 
// or using Helm 2
// helm upgrade --install prometheus stable/prometheus

and for Grafana

helm install grafana stable/grafana
// or using Helm 2
// helm upgrade --install grafana stable/grafana

Now you should have Prometheus server, alertmanager, pushgateway, kube-metrics, node exporter, and Grafana deployed. Prometheus Helm chart provides default scrape configs to collect metrics from Kubernetes apps inside your cluster.

Prometheus & Grafana Architecture — Image Credit: Sysdig

In the next post, we will explore setting up blackbox monitoring to run a simple uptime check. This is a simple alternative to running Pingdom or uptime bots and a great candidate to host in a centralized Vault, ChartMuseum, or cross-cloud Prometheus to ping across clouds (i.e. check AWS endpoints from GCP).

You can find other posts in this series here:

Monitoring
Prometheus
Grafana
Kubernetes
DevOps
Recommended from ReadMedium