Imagine for a moment you're lighting a candle and you're so focused on starting the wick, you don't realize how close your finger is to the flame. Suddenly, you feel pain running through your hand, and you quickly pull away and let go.
All of this happens in a fraction of a second. When your finger stays too close to the flame for too long, small receptors in your skin pick up the rising temperature and your body fires a nervous impulse, an electric signal that carries an urgent message to the rest of your body: "Danger! Move fast!" Before you even consciously notice that something's wrong, your body has already reacted.
What does this have to do with event-driven microservices?
To answer this, we need to first understand that the definition of data has changed over time. Several years ago, it was very common to focus mostly on storing transactional data about users, products, orders, and other related information in relational databases. This state-centric view of the world dominated for many years.
Now, it is very common to also include event data that records what happens. In other words, we store information about user activity, system responses, and much more. We no longer focus on simply updating snapshots of the current state of our systems. Event-level data points are to our companies what electrical impulses are to our bodies.
Storing information in relational databases is still very useful and powerful, but limited in many ways. We’ll go through some of these limitations throughout the post. The main idea we want to share is that it's possible for information to flow in real-time through our organizations, just like impulses travel through living organisms.
Let’s start to develop a shared understanding by defining a common language. Alexander Dean, CEO of the Behavioral Data Platform Snowplow, clearly defines an event as “anything that we can observe occurring at a particular point in time”. For example:
User 123 (Fernanda Valdés) invests in fund XYZ at 13:46:53 on 28/07/2021
Traveler AB123 (Juan Manuel Cortés) buys flight tickets from Colombia at 11:23:21 on 13/06/2021
Machine 88-C breaks down at 23:54:31 on 13/09/2021
Events can usually be tied to a specific point in time. An ongoing state like "the system is down" is not an event; "the system breaks down at 13:21 on Monday" is.
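To make this concrete, here is a minimal sketch of how one of the events above might be recorded as data. The field names (`event_type`, `subject`, `occurred_at`) are illustrative, not a standard schema; the key point is that every event carries a specific timestamp.

```python
# An illustrative event record for "Machine 88-C breaks down
# at 23:54:31 on 13/09/2021". Field names are hypothetical.
event = {
    "event_type": "machine_breakdown",
    "subject": "machine-88-C",
    # Tied to a specific point in time (ISO 8601, UTC):
    "occurred_at": "2021-09-13T23:54:31Z",
}

# An event describes something observed at a moment in time,
# not an ongoing state like "the machine is broken".
print(event["event_type"], "at", event["occurred_at"])
```
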
One of the main benefits of using events is how they simplify collaboration. Imagine if the brain constantly polled the rest of the body for information, regardless of whether anything interesting was happening. That would be a huge waste of energy.
With event collaboration, you don’t need one component (software or organic) asking another to do anything. Instead, each component simply signals that something has changed and other components listening can react however they want.
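The mechanics of event collaboration can be sketched in a few lines of Python. This is a deliberately minimal in-process event bus, not a real streaming platform; the `EventBus` name and handlers are illustrative. The point is that the producer only signals that something happened, and each listener decides for itself how to react.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process event bus: components subscribe to event
    types and react when something is published. No component ever
    polls another for information."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer doesn't know (or care) who is listening.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
reactions = []

# Two independent listeners react to the same signal in their own way.
bus.subscribe("finger_burned", lambda e: reactions.append(f"pull away from {e['source']}"))
bus.subscribe("finger_burned", lambda e: reactions.append("drop the candle"))

bus.publish("finger_burned", {"source": "candle"})
# reactions now holds: ["pull away from candle", "drop the candle"]
```

Notice that adding a third listener would require no change to the producer at all; that is the decoupling event collaboration buys us.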
The data integration problem
Event data comes with a wide range of challenges. For example, traditional data integration approaches usually cannot manage event data because it is often orders of magnitude larger than transactional data.
When data is stored in different tables and databases, we often find developers creating connections from their applications directly to the databases to get the data they need. Over time, this creates a tangled web of connections that is extremely difficult to understand and makes it almost impossible to modify the data without breaking the applications that depend on it.
This type of complexity usually makes the data unreliable. Data discrepancies are very difficult and time-consuming to identify, and finding the source of the discrepancies is even more difficult. This means that any report or data store derived from this system becomes questionable at best.
To solve this problem, we can turn to an often underrated or underestimated hero: the log. Many of us only care about logs when trying to identify why our system stopped working. But, according to Jay Kreps (co-founder and CEO of Confluent, the company behind the popular Apache Kafka streaming platform), it turns out that the log is actually a natural data structure for handling data flow between systems.
The main idea is to put all of the organization’s data into a central log that different systems and agents can subscribe to in real time.
What is a log?
A log as a data structure is extremely simple: it is an append-only sequence or stream of immutable facts.
These facts are records of the events described earlier in this post, along with when they happened. The stream is ordered by time, and new records can only be appended (added to the tail of the sequence); existing records are never updated or deleted, which is what makes them immutable.
Events can be stored indefinitely, at large scale, and can be consumed over and over again. This is one of the main benefits of the data structure: we have, in essence, a persistent and replayable record of history.
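The log really is this simple. Below is a toy Python sketch of the data structure (class and field names are ours, not any particular platform's API): records go in at the tail only, and reading from an offset never removes anything, so history can be replayed any number of times.

```python
import time

class Log:
    """Append-only sequence of immutable event records."""

    def __init__(self):
        self._records = []

    def append(self, event):
        # New records can only be added at the tail; each gets the
        # next sequential offset and a timestamp.
        offset = len(self._records)
        self._records.append((offset, time.time(), dict(event)))
        return offset

    def read(self, from_offset=0):
        # Reading is non-destructive: the log can be consumed over
        # and over, from any offset.
        return self._records[from_offset:]

log = Log()
log.append({"type": "machine_breakdown", "machine": "88-C"})
log.append({"type": "machine_repaired", "machine": "88-C"})

offsets = [record[0] for record in log.read()]  # [0, 1], in append order
</```

Note there is no `update` or `delete` method: immutability falls out of the structure itself, not out of discipline.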
Martin Fowler has shared interesting use cases around events stored in this manner. One of these use cases is known as “Event sourcing”. If we have a replayable record of history, we can recreate the state of our application whenever we wish. If the application crashes, we can simply re-run the events and recreate the current state without loss of information. If we were to lose the current state of a user profile, we could reconstruct that user profile by running all the events produced by that user.
Fowler takes this idea even further: “Since our application state is entirely defined by the events we've processed, we can build alternative application states - which I call Parallel Models”. This allows us to go back in time to understand why someone did what they did or even create alternate realities for counterfactual analysis.
For example, we could try to ask ourselves “what would our operations look like by the end of 2020 if the pandemic didn’t happen or if it had lasted only a few months?”. To answer this we could re-run all the stored events from 2020 with our assumptions and compare the alternative end state to the actual state.
Another interesting use case is to counteract the consequences of incorrect information. Imagine someone made a mistake when designing the algorithm for a specific metric. If we only have access to the resulting metric month after month it becomes very difficult to know how the numbers would look if the algorithm had been correct from the start. With event sourcing we could go back to when the faulty algorithm was introduced and re-run the events using the correct algorithm.
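Both of these use cases (replaying history and correcting a faulty algorithm) boil down to the same operation: folding the event stream into a state. A minimal sketch, using a hypothetical account with deposit and withdrawal events; the `fee` parameter stands in for the "corrected algorithm" we might want to re-run history with:

```python
# A hypothetical, replayable history of events for one account.
events = [
    {"type": "deposit", "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit", "amount": 50},
]

def replay(events, withdrawal_fee=0):
    """Fold the event stream into the current state. Because the log
    is replayable, we can re-run it under different rules."""
    balance = 0
    for e in events:
        if e["type"] == "deposit":
            balance += e["amount"]
        elif e["type"] == "withdraw":
            balance -= e["amount"] + withdrawal_fee
    return balance

current_state = replay(events)                       # 100 - 30 + 50 = 120
corrected_state = replay(events, withdrawal_fee=1)   # same history, fixed rule: 119
```

The current state is entirely derived from the events, so losing it costs nothing; and "what would the numbers look like under the correct algorithm?" is just another replay.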
Now that we understand the power of tracking events and the central idea of the immutable log, we can start adding more sources and agents into the mix to describe event-driven architectures.
In an event-driven architecture, the central log has a well-defined API for adding data, and systems communicate with each other by sending and consuming events stored in the log. Consuming an event does not destroy it; it remains in the log so that other systems can consume it as well.
Each data source can send events to the central log or can be modeled as its own log before converging in the central log. This central log naturally becomes the single source of truth for all related applications.
Finally, it is important to note that this event-streaming approach decouples the production and ownership of data from the access to it. Producers concentrate on creating and sending well-defined data, while data access and modeling requirements are shifted to the consumer.
We have shared multiple resources in the past about the benefits of decoupling functionality on the front end with micro-frontends, and decoupling your systems on the back end with microservices. So, we won’t focus on the benefits of microservices here.
Using language borrowed from Domain-Driven Design, a microservice is a small application built to fulfill a specific bounded context.
Event-driven microservices can also be separated into producers and consumers. Producer microservices send events to the streams for other services to consume, while consumer microservices consume and process events from one or more input event streams. It is common for microservices to be both producers and consumers.
One great benefit is that instead of using synchronous calls and callbacks, each microservice can write directly to the event stream. The stream of user information, for example, is easily available to any other downstream service that is listening and has an interest in user accounts. This allows us to create modular and composable services.
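A small sketch of this producer/consumer pattern, with each consumer tracking its own read position in a shared stream (the `Stream` class and consumer names are illustrative, not a real streaming API). Because consuming never removes events, any number of downstream services can independently read the same history:

```python
class Stream:
    """Shared event stream: producers append events, and each
    consumer keeps its own offset, so consuming is non-destructive."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer id -> next offset to read

    def produce(self, event):
        self.events.append(event)

    def consume(self, consumer_id):
        # Return everything this consumer hasn't seen yet and
        # advance only this consumer's offset.
        pos = self.offsets.get(consumer_id, 0)
        new_events = self.events[pos:]
        self.offsets[consumer_id] = len(self.events)
        return new_events

stream = Stream()

# A producer microservice writes to the stream, with no knowledge
# of who will consume the event downstream.
stream.produce({"type": "user_signed_up", "user_id": 123})

crm_batch = stream.consume("crm")              # the CRM sees the event...
warehouse_batch = stream.consume("warehouse")  # ...and so does the warehouse
```

Both consumers receive the same event without any synchronous call between services, and a new consumer added next week could still replay the full history from offset zero.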
In the diagram below, data from the tracker of user activity is sent to the central unified log. Different databases and services are listening to the stream and act on the data accordingly. For example, the CRM might store information about users that have used the financial simulator and shown interest in a loan.
Data stored in the data warehouse can also be used to train a machine learning model or to feed rule-based engines designed to produce next-best-action or next-best-offer marketing to users. The predictions or recommendations of these models can also be sent to the stream, which would be picked up by the recommendations widget in the digital product. This would result in personalized recommendations for the user when they visit the site.
As salespeople log information from direct communication with a user, the CRM can also produce events that are shared to the stream. The Users database and data warehouse will also be updated accordingly without having to create direct coupling between all of these services.
Use case: North American Bank
One of the largest banks in the world that serves over 16 million clients and has 85,000+ employees wanted to improve customer experience while staying compliant with industry regulations. They decided to migrate to an event-based architecture and stream data in real-time.
As is the case for most financial institutions, the bank has to be very careful with how it handles customer data. As the bank grew over the years, leveraging data across different business functions became increasingly complex.
They started this transformation by slicing their large monolithic applications into small cloud-based microservices. Allowing these microservices to synchronize in real time through event processors, instead of connecting them to a shared database or through ETL pipelines, also made it possible to create more flexible and decoupled systems without having to rewrite everything from scratch.
These changes have resulted in increased efficiency when building or updating apps, and data can now be shared easily across teams. The bank was also able to reduce anomaly detection time from weeks to real time after migrating to an event-based architecture.
This last decade, hype around machine learning and artificial intelligence has left companies focusing on how to use algorithms and computations to generate value. It turns out that data itself is a powerful source of value. Incredible things can happen when we start putting together different pieces of data that were previously locked up in different silos.
Event-based architectures and microservices are not so much about a new type of technology (the underlying ideas are not new); they're about measuring things that previously weren't even recorded, and bringing together data that was previously kept apart.
Sending messages instantaneously throughout the organization opens up new possibilities for collaboration that were not possible before. Just like how brain activity increases when a person wakes up from a deep sleep, we can see processes start to speed up and new ideas and functions can develop when organizations wake up to the possibility of a central nervous system.