Opsgenie Edge Connector from a Developer’s Perspective
Opsgenie Edge Connector (OEC) is an “edge” application in Opsgenie’s terminology, meaning that it runs in the customer’s environment and works together with the Opsgenie product. OEC is also the successor of Marid. In some ways it enhances Marid’s features, and in others it drops features that did not serve the primary purpose. That purpose is to listen for Opsgenie alert action events and then run a mapped action (a script or executable) for each event in the customer’s on-premise environment. To do this, OEC relies on two AWS services and a couple of abstractions. This article covers the main concepts behind OEC and the problems we faced while developing it.
OEC needs to process many SQS messages that represent Opsgenie events, and Golang is well suited to a multi-threaded program that has to handle lots of concurrent jobs at a very low cost. Memory usage is small, and the program starts and stops quickly, unlike a JVM-based program. Marid was developed as a Java application, and in contrast to that experience, Golang lets the application run efficiently even on relatively low-end systems.
Also, unlike Marid, OEC has no Java dependency. You don’t need a Java environment; a single binary executable is enough to run it. We expect this approach to cause customers less trouble when installing and running an on-premise program.
Lastly, the simplicity of writing and reading code, along with fast compile and startup times, were other good reasons to go with Go.
As stated before, Golang is a good choice for concurrent programming. Besides memory efficiency, Golang focuses especially on making concurrency easy to construct and manage. “Goroutines” are Golang’s lightweight threads. Starting a goroutine is as easy as calling a function with the “go” keyword in front of it. Communicating across goroutines is also simple and efficient: sending or receiving a value on a “channel” takes one line of code.
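As a small illustration of goroutines and channels, the sketch below starts one goroutine per input and collects the results over a channel. It is not taken from OEC; the function names square and sumOfSquares are made up for this example.

```go
package main

import "fmt"

// square sends the square of n on the results channel.
func square(n int, results chan<- int) {
	results <- n * n
}

// sumOfSquares starts one goroutine per input and adds up what they send back.
func sumOfSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go square(n, results) // the "go" keyword starts a new goroutine
	}
	total := 0
	for range nums {
		total += <-results // receive exactly one result per goroutine
	}
	return total
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3})) // prints 14
}
```

The channel is unbuffered, so each goroutine waits until the collecting loop is ready to receive its value.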
At this point, Golang provides most of what is needed to write a concurrent program. But we also needed a dynamic worker pool to manage the concurrency. Each worker is a goroutine that does a job related to an Opsgenie event. The worker pool should scale the number of workers up when the incoming event load increases, and down when it decreases. Although there are a few libraries that handle this, it was hard to find a generally accepted one, so we wrote our own worker pool for our needs.
The customer can configure the maximum and the minimum number of workers in the pool, and the pool dynamically scales the number of workers between them as the load increases and decreases. The interval for monitoring pool metrics and how long workers are kept alive before scaling down are also configurable. All these settings come with predefined values for customers who don’t want to customize the pool.
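One way such a pool could work is sketched below. This is a simplified illustration under stated assumptions, not OEC’s actual worker pool: the names Pool, NewPool, Submit, Close, and runDemo are hypothetical, the sketch assumes max is at least 1, and it scales on demand and on idle timeouts instead of using a separate metrics monitor.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Pool is a hypothetical dynamic worker pool, not OEC's actual implementation.
type Pool struct {
	min, max  int           // bounds on the number of workers (max >= 1 assumed)
	keepAlive time.Duration // how long an extra worker may stay idle
	jobs      chan func()
	mu        sync.Mutex // guards workers
	workers   int
	wg        sync.WaitGroup
}

// NewPool starts min workers; more are added on demand, up to max.
func NewPool(min, max int, keepAlive time.Duration) *Pool {
	p := &Pool{min: min, max: max, keepAlive: keepAlive, workers: min, jobs: make(chan func())}
	for i := 0; i < min; i++ {
		p.wg.Add(1)
		go p.work(func() {})
	}
	return p
}

// Submit hands the job to an idle worker, starts a new worker if the pool
// is below max, or otherwise waits for a worker to become free.
func (p *Pool) Submit(job func()) {
	for {
		select {
		case p.jobs <- job: // an idle worker picked it up
			return
		default:
		}
		p.mu.Lock()
		if p.workers < p.max { // scale up
			p.workers++
			p.mu.Unlock()
			p.wg.Add(1)
			go p.work(job)
			return
		}
		p.mu.Unlock()
		select {
		case p.jobs <- job: // a busy worker became free
			return
		case <-time.After(p.keepAlive): // re-check: workers may have scaled down
		}
	}
}

// work runs its first job, then serves the queue until it idles out or the pool closes.
func (p *Pool) work(first func()) {
	defer p.wg.Done()
	first()
	for {
		select {
		case job, ok := <-p.jobs:
			if !ok {
				return // pool closed
			}
			job()
		case <-time.After(p.keepAlive):
			p.mu.Lock()
			if p.workers > p.min { // scale down
				p.workers--
				p.mu.Unlock()
				return
			}
			p.mu.Unlock()
		}
	}
}

// Close waits for running jobs and stops the workers. Call it after the last Submit has returned.
func (p *Pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

// runDemo pushes n jobs through a pool of 1 to 4 workers and returns how many ran.
func runDemo(n int) int {
	p := NewPool(1, 4, 50*time.Millisecond)
	results := make(chan int, n) // buffered so jobs never block
	for i := 0; i < n; i++ {
		p.Submit(func() { results <- 1 })
	}
	p.Close()
	return len(results)
}

func main() { fmt.Println(runDemo(20)) }
```

Because the job channel is unbuffered, Submit returns only once a worker owns the job, which is what lets Close wait for all submitted work.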
Like many other examples in Opsgenie, Amazon Simple Queue Service (SQS) is used for communication between Opsgenie and OEC. In terms of reliability and avoiding the overhead of running our own messaging, SQS is the simplest way to receive Opsgenie callbacks. Standard queues, with short polling to get messages immediately, are suitable for our case. OEC also lets customers optionally set some SQS configurations.
To call the SQS endpoint, OEC needs credentials. Another Amazon service lets the OEC client receive queue messages: using AWS Security Token Service (STS), an Opsgenie endpoint returns a token that grants the OEC client a few action permissions on SQS. Each token works for one unique queue, and Opsgenie can return multiple tokens so that OEC can listen to different regions at the same time.
Tokens expire in 1 hour, and OEC asks for a token refresh at 1-minute intervals to avoid being denied by AWS. When Opsgenie decides, by checking the expiry time, that the client’s token should be refreshed, it returns a new token to the client. In this way, the behavior of OEC clients is controlled by Opsgenie. Both durations (the expiry time and the refresh request interval) are configured on the Opsgenie side and can change in the future for different needs.
Token renewal cycles create the need for concurrency-safe token refreshing logic, because one SQS client is used by several actions, such as receiving and deleting messages, at the same time. Each queue listener keeps a connection through one SQS client to perform its queue actions. So when OEC receives a new token, it refreshes the SQS client and temporarily blocks the queue actions that use this client while doing so.
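A common way to get this behavior in Go is a read/write lock: queue actions share the read lock, and a refresh takes the write lock. The sketch below assumes that approach; OEC’s real code may differ. The client and provider types stand in for a real SQS client and are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// client stands in for an SQS client built from a temporary token (hypothetical).
type client struct{ token string }

// receive pretends to perform a queue action with the client's token.
func (c *client) receive() string { return "received with " + c.token }

// provider guards the shared client with a read/write lock.
type provider struct {
	mu sync.RWMutex
	c  *client
}

// do runs a queue action with the current client.
// Many actions can hold the read lock at the same time.
func (p *provider) do(action func(*client) string) string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return action(p.c)
}

// refresh swaps in a client built from the new token. It waits for
// in-flight actions to finish, and new actions wait until the swap is done.
func (p *provider) refresh(token string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.c = &client{token: token}
}

func main() {
	p := &provider{c: &client{token: "token-1"}}
	fmt.Println(p.do((*client).receive)) // received with token-1
	p.refresh("token-2")
	fmt.Println(p.do((*client).receive)) // received with token-2
}
```

With this design, receiving and deleting do not block each other; they are paused only for the short moment when the client is replaced.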
OEC needed logging capabilities such as log levels, good formatting, and rotation. Log files come to our rescue when a customer has an issue with the product. For this purpose, we had to find suitable libraries. Logrus was one of the choices that fit those needs. It offers log levels, level colors, timestamps, and other configurable options.
Lumberjack is the complementary part of OEC’s logging. It is used mainly for rotating the log file. Maximum file size and age are the important settings that decide when log files are split.
Clear and detailed logging is critical for debugging, especially for an on-premise tool. Understanding and reproducing an error there is much harder than in our cloud environment. When a problem arises on the customer side, OEC’s logging capability pays back all the effort put into it. With well-explained and traceable logs, we expect fewer support issues that need a technical eye, and quicker problem solving.
Previously, Marid ran Groovy scripts that depended on libraries belonging to the Marid codebase. This restricted script diversity and forced users to use Groovy. We wanted to remove this obstacle to increase flexibility. Thus, OEC essentially uses Golang’s “os” library to execute terminal commands.
OEC has several options for running scripts. Executable binary files and sh scripts can be used natively without any dependency. Cmd and PowerShell can be used in Windows environments. Python, Go, and Groovy can be used when their setup requirements are satisfied in the customer’s environment. These alternatives are fixed for now, but new ones can be added easily by modifying OEC. Additionally, custom configuration support for running other types of scripts may be provided in the future.
While taking this new approach, OEC had to keep supporting Groovy and the scripts existing customers had written in it. Running those requires some dependencies from the Marid codebase. An easy solution was to write a new intermediary Groovy script that depends on jar files extracted from the old code. As a result, backward compatibility is preserved, and old scripts are still usable with existing configurations.
On the other hand, Python is our new preferred language for the official scripts that come with the default OEC packages. Each OEC package that is coupled with an integration contains Python scripts that perform actions related to that integration. These out-of-the-box scripts are supported for customers who don’t need custom behavior.
Lastly, arguments and environment variables are all set in the configuration file, so the desired information can be passed to scripts easily.
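How a mapped script might be launched with its arguments and environment variables can be sketched with Go’s standard os/exec package. This is a minimal sketch, not OEC’s actual code: buildCommand, the interpreters table, the script name, and the API_KEY variable are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// interpreters maps a file extension to the program that runs it.
// The entries are illustrative; OEC's real mapping may differ.
var interpreters = map[string]string{
	".sh":     "sh",
	".py":     "python",
	".groovy": "groovy",
}

// buildCommand prepares the command for a mapped action. Scripts with a
// known extension run through their interpreter; anything else is
// treated as an executable and started directly.
func buildCommand(path string, args, env []string) *exec.Cmd {
	var cmd *exec.Cmd
	if interp, ok := interpreters[filepath.Ext(path)]; ok {
		cmd = exec.Command(interp, append([]string{path}, args...)...)
	} else {
		cmd = exec.Command(path, args...)
	}
	// The script sees the parent environment plus the configured variables.
	cmd.Env = append(os.Environ(), env...)
	return cmd
}

func main() {
	cmd := buildCommand("restart.py", []string{"--alert", "42"}, []string{"API_KEY=secret"})
	fmt.Println(cmd.Args) // prints [python restart.py --alert 42]
	// cmd.CombinedOutput() would start the script and collect its output.
}
```

Keeping command construction separate from execution makes the mapping easy to test without starting any process.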
OEC supports a git feature for using configuration and script files stored in any git repository. A user can map their scripts to a git repository, so mapping is not restricted to local files. This feature allows customers to automate numerous OEC clients running in different environments with the same or similar configurations.
Also, while one script is mapped to one git repository, another can be mapped to a different repository, so files can be fetched from multiple repositories. Each repository is pulled at 1-minute intervals to pick up any changes. After a push to a repository, all OEC clients that use scripts from it are updated without being restarted.
Golang does not have a generally accepted mocking framework or mocking capability for Go “structs”. Some packages, such as httptest, help us mock certain behaviors. But overall, writing tests is different from other languages like Java. If you want to mock an object, it should implement an interface. Then you manually provide a mock struct that simulates the actual one.
On the other hand, closures and function variables are powerful features for manipulating scope and for changing the behavior of code that calls those function variables. OEC tests use them intensively for mocking. Abilities that dedicated test libraries offer, such as capturing arguments, verifying method calls, and mocking functions, can be achieved easily as long as the code depends on “interfaces”.
While using Golang, at one point we faced a cross-compilation problem caused by packages that use Cgo. Cgo is the way to use C libraries in Go code, and enabling it causes problems when running the executable in different OS environments. The simplest solution is to get rid of packages that require Cgo if possible. For example, OEC was using the “os/user” package to get the home directory of the current user, and when the problem arose, we removed this package. The home directory is now simply read from an environment variable. Our case was not complicated and we had alternatives, but it is good to be cautious with Cgo.
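Reading the home directory from the environment might look like the sketch below. The variable names HOME and USERPROFILE are the conventional ones on Unix-like systems and Windows; which variables OEC actually reads is an assumption here, and homeDirFrom and homeDir are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// homeDirFrom picks the home directory variable for the given OS.
// The lookup function is a parameter so the logic can be tested
// without changing the real environment.
func homeDirFrom(goos string, getenv func(string) string) string {
	if goos == "windows" {
		return getenv("USERPROFILE")
	}
	return getenv("HOME")
}

// homeDir resolves the current user's home directory without os/user,
// so the binary does not need Cgo for this lookup.
func homeDir() string {
	return homeDirFrom(runtime.GOOS, os.Getenv)
}

func main() {
	fmt.Println(homeDir())
}
```

Since Go 1.12, the standard library’s os.UserHomeDir performs a similar environment-based lookup, so newer code can call it instead of a custom helper.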
Running OEC as a background process was the last step of the development phase. For Linux environments, “systemd” is a suitable service manager and no extra effort is needed apart from creating an installation package. But on the Windows side, things are more confusing. Service support for Windows was a bit painful because little information and community support is available on the Internet. The solution was another Golang library called service, plus an extra piece of code to run OEC as a Windows service. As a result, the OEC Windows package contains one more executable, written in Golang, that installs, starts, and stops the service.
Developing OEC was a different experience in several areas. Using Golang was the most attractive part because of its concise and clear coding conventions and its different point of view on OOP. The “Effective Go” guide is worth a look if you are thinking of developing a Go program.
Also, the whole development cycle showed us the importance of the design phase once more. Well-defined designs always shorten development time and clear up confusion in the middle of the process. Using cloud services is definitely part of this cycle, helping to reach maximum efficiency in minimum time.
Metehan Öztürk