Mark Papadakis
6 min read · Aug 17, 2018

A practical and interesting application of events processing

This blog post is slightly out of date; since it was written, a separate overseer service has been added that matches users with “personalized deals alerts”, and another that processes events about those pairings and builds notifications. The post will be updated soon to reflect those changes.

Our company, among other things, builds and operates a product discovery, research, and comparison platform, BestPrice.gr (it has since been spun off to a separate corporate entity). There are some fun challenges involved in building and running such e-commerce services, spanning IR (search, recommendations, etc.), ‘big data’ processing (ML, probabilistic models, etc.), user experience, and more.

I thought it might be useful to describe how one of the many features we provide works, because it highlights how decoupling and streams can help organizations build and scale applications quickly with little effort.

Users can subscribe to receive notifications for products on BestPrice that are classified as ‘deals’. They choose which categories of products they are interested in (video games, mobile phones, apparel, shoes, books, etc.), the price drop floor as a percentage (i.e., notify only if a product’s price drops by 20% or more), and how to be notified of those events (via email, Slack, or Google Chrome push notifications). It’s a pretty useful feature, if I may say so, and it’s been quite popular with our users.

Configuring deals notifications options
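To make that configuration concrete, here is a minimal sketch of what such a subscription record could look like. The field names and types are illustrative assumptions, not our actual schema:

```python
# Illustrative shape of a per-user deals-alerts subscription.
# Field names are assumptions, not the actual BestPrice schema.
from dataclasses import dataclass


@dataclass
class DealsSubscription:
    user_id: int
    categories: set      # e.g. {"video-games", "books"}
    min_drop_pct: float  # notify only for drops of at least this percentage
    channels: set        # subset of {"email", "slack", "chrome_push"}


sub = DealsSubscription(
    user_id=42,
    categories={"video-games", "books"},
    min_drop_pct=20.0,
    channels={"email", "chrome_push"},
)
```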

We partner with almost 1,700 merchants, and we periodically retrieve product feeds from them (the frequency varies from 15 minutes to 2 hours, depending on per-merchant configuration, other business rules, and various exceptions) via a system purpose-built to download feeds concurrently and store them locally for later processing. Every 15 minutes or so, another service runs that checks for any available updates, processes them, merges the updates with the current data sets, updates Trinity indices, and more. Among other things, it emits events for all products whose properties (price, availability, etc.) changed since the last indexing session. We use TANK for persisting events in partitions, organized into topics.
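To sketch what producing those events might look like (TANK’s actual client API is not shown here; produce() below is a hypothetical stand-in for the client call that appends a message to a topic’s partition):

```python
import json
import time

# Hypothetical sketch: emit one event per product whose properties changed
# during an indexing session. produce() stands in for a TANK client call.
def emit_product_updates(produce, changed_products, partitions=16):
    for p in changed_products:
        event = {
            "product_id": p["id"],
            "changed": p["changed"],   # e.g. {"price": [79.9, 59.9]}
            "ts": int(time.time()),
        }
        # partition by product id so all updates for a product stay ordered
        produce("products.updates", p["id"] % partitions,
                json.dumps(event).encode())

# stand-in producer that just prints what would be published
emit_product_updates(
    produce=lambda topic, partition, msg: print(topic, partition, msg.decode()),
    changed_products=[{"id": 1031, "changed": {"price": [79.9, 59.9]}}],
)
```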

Whenever events about product property changes are produced into a specific TANK topic, many (dozens of) different services consume those updates and act on them in all kinds of different and useful ways. The same holds for other types of events on other topics. We have maybe 100 or so different services, each of which usually monitors one topic for updates, but often as many as 10; those services can, in turn, produce (emit) more events, to be processed by more ‘consumers’ later, and so on.
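All of those services share the same basic shape: tail one or more topics, react to each message, possibly publish downstream. A sketch of that loop, with consume() and produce() as hypothetical stand-ins for the TANK client:

```python
# The common shape of an 'overseer' service, sketched in Python.
# The real services are not Python; consume and produce are stand-ins.
def run_overseer(consume, produce, handlers):
    """
    consume:  iterator yielding (topic, offset, message)
    produce:  callable(downstream_topic, message)
    handlers: {topic: fn(message) -> iterable of (downstream_topic, message)}
    """
    for topic, offset, message in consume:
        for out_topic, out_msg in handlers[topic](message):
            produce(out_topic, out_msg)

# usage: one handler that forwards every properties update as a deal candidate
events = iter([("products.updates", 0, b'{"product_id": 1031}')])
run_overseer(
    consume=events,
    produce=lambda topic, msg: print("->", topic, msg),
    handlers={"products.updates": lambda msg: [("deals.candidates", msg)]},
)
```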

One of the services that monitor the properties-updates topic is responsible for identifying deals. It considers all updated products and, based on a model that relies on various heuristics and takes into account a product’s price history and the forecast of its price trajectory, it may transition a product into a ‘deal’, or transition existing deals out (i.e., a product is no longer considered a deal). The computed transitions (product id, deal/no-deal) are produced into a different TANK topic, which is used to track deal transitions.
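The actual model is more involved than anything I can show here, but a deliberately simplified toy heuristic conveys the key idea of emitting transitions rather than states:

```python
from statistics import median

# Toy stand-in for the deal-detection model: a product is a 'deal' when its
# current price sits well below the median of its recent price history.
def is_deal(price_history, current_price, threshold=0.20):
    if not price_history:
        return False
    return current_price <= median(price_history) * (1.0 - threshold)

# Only state flips become events; steady state produces nothing.
def transition(product_id, was_deal, price_history, current_price):
    now_deal = is_deal(price_history, current_price)
    if now_deal != was_deal:
        return {"product_id": product_id, "deal": now_deal}
    return None

print(transition(1031, False, [79.9, 81.0, 80.5], 59.9))
# {'product_id': 1031, 'deal': True}
```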

This product is a ‘deal’, at 25% off the regular price

Another service monitors that topic and processes all those updates. For products no longer marked as deals, it identifies users who were notified about them and deletes the respective notifications from their notification timelines (BestPrice has all kinds of social features, and also displays all deals you were notified about on a dedicated page and in a notifications tray).
For all products newly marked as deals, it identifies all users who may be interested in them based on their settings, generates timeline events for them, and also produces more TANK events, one for each supported notification method (email, Slack, push). All of those new events are produced atomically to yet another TANK topic. The events effectively capture the notification context (user information (recipient), product information, notification method, etc.).
Interestingly, this service caches users’ settings and delivery options in memory, and keeps that cache fresh by reacting to events produced into TANK topics whenever users update their options.
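A sketch of that matching and fan-out step, with produce_batch() standing in for an atomic multi-message produce (the event shapes are assumptions):

```python
# Sketch: for a new deal, find interested subscribers in the in-memory
# settings cache and fan out one event per notification method.
def fan_out(deal, subscriptions, produce_batch):
    batch = []
    for sub in subscriptions:
        if deal["category"] not in sub["categories"]:
            continue
        if deal["drop_pct"] < sub["min_drop_pct"]:
            continue
        for channel in sub["channels"]:          # email / slack / chrome_push
            batch.append({
                "user_id": sub["user_id"],
                "product_id": deal["product_id"],
                "method": channel,
            })
    if batch:
        produce_batch("notifications.pending", batch)   # all-or-nothing

fan_out(
    deal={"product_id": 1031, "category": "video-games", "drop_pct": 25.0},
    subscriptions=[{"user_id": 42, "categories": {"video-games"},
                    "min_drop_pct": 20.0, "channels": ["email", "chrome_push"]}],
    produce_batch=lambda topic, msgs: print(topic, msgs),
)
```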

Price drops in the notifications tray

There is another service that consumes from that topic and is responsible for handling scheduled notifications. For Slack events, it interfaces directly with the Slack API via concurrent HTTP requests. For push notifications, it interfaces with a cluster of node servers we run locally, which in turn interfaces with Google’s push notification APIs.
For emails, though, it gets a bit more interesting. This service interfaces (via RPC/concurrent HTTP requests) with a cluster of node services responsible for generating email messages. Our engineers built this service to accept JSON payloads that describe the notification type and other specific properties, and to generate the email message to be delivered to the user. Maybe someday someone from our front-end engineering team will describe how it works in practice. It’s quite elegant.
Once the ‘notifications overseer’ (we call all of those services overseers) collects the generated email messages, it packs each of them, together with the subject, body, recipient email address, and more, into a new event; those events are produced to yet another TANK topic.
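A sketch of the email leg, with a hypothetical endpoint and JSON shapes standing in for the real email-generation service:

```python
import json
import urllib.request

# Hypothetical sketch: ask the (node-based) email-generation service to render
# a message, then pack the result into an event for the outgoing-emails topic.
# The URL and payload shapes are assumptions, not the real service's API.
def render_and_schedule(notification, produce):
    req = urllib.request.Request(
        "http://email-render.internal/render",   # hypothetical endpoint
        data=json.dumps(notification).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        rendered = json.load(resp)   # assumed: {"subject": ..., "body": ...}
    produce("emails.outgoing", {
        "recipient": notification["email"],
        "subject": rendered["subject"],
        "body": rendered["body"],
        "context": notification,     # kept so later stages can correlate
    })
```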

Another overseer monitors the TANK topic we use for scheduling outgoing email messages. As soon as it consumes such a message, it unpacks it and, using our mail client library, interfaces with another service (described in the next paragraph) to queue it for delivery. Each email queued for delivery is assigned a distinct ID by the mail client library, which associates that outgoing mail message ID with the BestPrice email event (so that we know that an email scheduled on BestPrice of a specific type, at a specific timestamp, to a specific recipient, etc., corresponds to that mail ID).
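Sketched, with queue_email() standing in for the mail client library call:

```python
# Sketch of queueing one consumed email event for delivery. queue_email() is a
# stand-in for our mail client library; id_index is whatever persistent store
# maps mail IDs back to the originating BestPrice event.
def schedule_email(event, queue_email, id_index):
    mail_id = queue_email(event["recipient"], event["subject"], event["body"])
    id_index[mail_id] = event["context"]   # type, timestamp, recipient, ...
    return mail_id

index = {}
schedule_email(
    {"recipient": "a@b.gr", "subject": "Deal!", "body": "...",
     "context": {"type": "deal-alert", "user_id": 42}},
    queue_email=lambda to, subj, body: "m-7421",
    id_index=index,
)
print(index)   # {'m-7421': {'type': 'deal-alert', 'user_id': 42}}
```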

The outgoing-mails scheduler service is responsible for managing all outgoing emails of our company. It speaks SMTP and interfaces with our Postfix MTAs to schedule emails. It understands the various responses and supports retries, graceful backoff, and other useful features.
It will, in turn, emit events that include the mail message ID and a status (failed, successfully queued, bounced, etc.). When a mail message is queued for delivery with Postfix, the Postfix-specific message ID is associated with our mail message ID.
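Those status events could look roughly like this (field names are assumed):

```python
# Sketch of the scheduler's status events: keyed by our mail ID, with the
# Postfix queue ID attached once Postfix accepts the message.
def report_status(produce, mail_id, status, postfix_id=None):
    event = {"mail_id": mail_id, "status": status}   # queued / failed / bounced
    if postfix_id is not None:
        event["postfix_id"] = postfix_id   # ties the two ID spaces together
    produce("emails.status", event)

report_status(lambda topic, e: print(topic, e),
              mail_id="m-7421", status="queued", postfix_id="4B9C21A3F0")
```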

Another service monitors Postfix’s log for issues related to mail delivery (bounces, etc.). It attempts to match the Postfix message ID to a mail message ID, and if it succeeds, it emits a new event of (mail message ID, failure context) into a TANK topic we use for tracking mail delivery issues.
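A sketch of that log-watching step; the regex below covers the common Postfix “status=” lines, though real logs vary:

```python
import re

# Sketch: parse Postfix delivery lines, map the Postfix queue ID back to our
# mail ID, and emit a delivery-issue event. The stores are stand-ins.
LINE = re.compile(
    r"postfix/\w+\[\d+\]: (?P<qid>[0-9A-F]+): "
    r".*status=(?P<status>\w+) \((?P<detail>.*)\)"
)

def handle_log_line(line, postfix_to_mail_id, produce):
    m = LINE.search(line)
    if not m or m.group("status") not in ("bounced", "deferred"):
        return
    mail_id = postfix_to_mail_id.get(m.group("qid"))
    if mail_id:
        produce("emails.delivery_issues",
                {"mail_id": mail_id, "failure": m.group("detail")})

handle_log_line(
    'Oct  5 12:00:01 mx1 postfix/smtp[1234]: 4B9C21A3F0: '
    'to=<a@b.gr>, status=bounced (unknown user)',
    postfix_to_mail_id={"4B9C21A3F0": "m-7421"},
    produce=lambda topic, e: print(topic, e),
)
```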

Yet another service (tired yet? :)) monitors that topic and processes those events. For example, if we get a bounce and can associate the mail message ID with a BestPrice email notification, it flags the email address as unreachable, so that we stop sending emails to it in the future; it will also warn the user the next time she uses BestPrice, so that she can perhaps try a different email address.
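Sketched, with the lookups and stores standing in for whatever persistence the real overseer uses:

```python
# Sketch of the last hop: on a bounce, flag the address as unreachable and
# leave a warning for the user's next visit.
def handle_delivery_issue(event, mail_id_to_user, unreachable, pending_warnings):
    user = mail_id_to_user.get(event["mail_id"])
    if user is None:
        return                       # not tied to a BestPrice notification
    unreachable.add(user["email"])   # stop emailing this address
    pending_warnings[user["user_id"]] = (
        "We could not deliver email to %s; please check your address."
        % user["email"]
    )

bad, warn = set(), {}
handle_delivery_issue(
    {"mail_id": "m-7421", "failure": "unknown user"},
    mail_id_to_user={"m-7421": {"user_id": 42, "email": "a@b.gr"}},
    unreachable=bad, pending_warnings=warn,
)
print(bad, warn)
```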

There are actually more services involved in the deals feature, like the one that captures interaction and engagement metrics (e.g., mail open rates and clicks within the email message), persisting them to yet another TANK topic, to be processed by yet another overseer that builds reports and, you guessed it, emits more events.

There are features that involve a lot more overseers. One in particular monitors over 15 topics, manages the inventory of all 7 or so million products, and generates events that are processed by two dozen other overseers. It can sound overwhelming, but it’s really very straightforward because of how they are all decoupled and isolated from each other.

There are many benefits to events processing. We can troubleshoot issues by replaying events from a specific point in time, and we can capture events even when we have no immediate need for them, then use them later once we think of something useful to do with them. You can read more about such benefits here.
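Replay, in particular, is trivial with a persistent log: point a consumer at an old offset and re-run the same handler over history. consume_from() below is a hypothetical stand-in for a TANK fetch from a (topic, partition, offset):

```python
# Sketch of replay: re-consume history from a chosen offset with the same
# handler used in production. consume_from() is a hypothetical stand-in.
def replay(consume_from, topic, partition, start_offset, handler):
    for offset, message in consume_from(topic, partition, start_offset):
        handler(offset, message)

# usage against an in-memory stand-in log
log = [b"e0", b"e1", b"e2", b"e3"]
replay(
    consume_from=lambda t, p, o: enumerate(log[o:], start=o),
    topic="products.updates", partition=0, start_offset=2,
    handler=lambda off, msg: print(off, msg),
)
```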

If you are interested in any of that, we are hiring.
