Hey folks,
I have Alertmanager integrated with Opsgenie and it works well, but I'm running into a problem when more than one alert fires for the same rule.
For example, I have a PrometheusRule that monitors Kubernetes pods in a crash/pending state. If more than one pod is having problems, the description annotation below does not appear in Opsgenie; only runbook and dashboard do. If only one pod is having problems, I can see the description in Opsgenie as expected.
This is the annotation that does not appear in Opsgenie when more than one pod is firing:
description: Pod {{ $labels.pod }} in the namespace {{ $labels.namespace }}
I guess it is something related to arrays, but I'm not sure where or how to fix it.
Alertmanager template config
config:
  global: {}
  receivers:
    - name: opsgenie
      opsgenie_configs:
        - api_key: ${opsgenie_key}
          description: |-
            {{ range .CommonAnnotations.SortedPairs }}
            - {{ .Name }} = {{ .Value }}
            {{- end }}
          message: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.alertname }}'
          priority: '{{ if .GroupLabels.priority }}{{ .GroupLabels.priority }}{{ else }}p2{{ end }}'
          responders:
            - name: '{{ if .GroupLabels.responders }}{{ .GroupLabels.responders }}{{ else }}platform{{ end }}'
              type: team
Prometheus Rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    release: kube-prometheus-stack
  name: kube-pod-crash-looping-platform
  namespace: platform
spec:
  groups:
    - name: eks
      rules:
        - alert: KubePodCrashLooping
          annotations:
            description: Pod {{ $labels.pod }} in the namespace {{ $labels.namespace }}
            runbook: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping
            dashboard: https://my-grafana-url
          expr: max_over_time(kube_pod_container_status_waiting_reason{pod=~"liftbridge-.*|nats-.*|redis-.*|consul-server-.*|vault-0|vault-1|vault-2|vault-agent-injector-.*|argocd-.*|argo-rollouts-.*|coredns-.*|istio-.*|istiod-.*|hubbble-.*|external-.*|keda-.*", reason="CrashLoopBackOff"}[10m]) >= 1
          for: 10m
          labels:
            env: dev
            priority: p2
            responders: platform
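A possible pointer, not verified against this setup: the description template in the Alertmanager config above ranges over .CommonAnnotations, which as far as I understand holds only the annotations whose values are identical across every alert in the group. With a single pod firing, description is trivially common. With several pods, each alert renders a different description (different pod name), so it drops out of .CommonAnnotations, while runbook and dashboard stay because they are the same for all alerts. A minimal sketch of a description field that also lists the per-alert description, assuming the rest of the opsgenie_configs entry stays unchanged:

```yaml
# Sketch only: replaces the description field of the opsgenie_configs entry.
description: |-
  {{ range .CommonAnnotations.SortedPairs }}
  - {{ .Name }} = {{ .Value }}
  {{- end }}
  {{ range .Alerts.Firing }}
  - description = {{ .Annotations.description }}
  {{- end }}
```

With only one alert in the group, description would show up twice (once from .CommonAnnotations and once from the per-alert loop), so the first loop may need adjusting if that matters.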
Hi @daniel.rosa,
This is Darryl. I am here to help. 😃
I understand that you would like to know why some details from Prometheus Alertmanager were not rendered on Opsgenie alerts.
In order to dive deeper into the logs, we will need your consent to access your Opsgenie account, and it would be much more efficient to communicate over a support request.
Please consider raising a support request to our team via this link.
Thanks.
Kind regards,
Darryl Lee
Support Engineer, Atlassian