Challenges & Solutions for Production Recommendation Systems

How to deal with unseen data, optimise response times, and update models frequently.


There are lots of articles about training and evaluating recommenders, but few explain how to overcome the challenges involved in setting up a full-scale system.

Most libraries don’t support scalable production systems out of the box. The challenges are usually:

  • Predicting dynamically – When you have a very large user/item dimensionality, it can be very inefficient – or impossible – to precompute all the recommendations.
  • Optimising response times – When you create predictions dynamically, the time you need to retrieve them is very important.
  • Frequently updating models – When the system needs to incorporate new data as it becomes available, it’s crucial to frequently update the models.
  • Predicting based on unseen data – This means dealing with unseen users or items and continuously changing features.

This post will tell you how you can modify a model to extend its functionality for a full-scale production environment.

Hybrid Recommender Models Deal with Real-World Challenges Better

We use LightFM, a very popular Python recommendation library that implements a hybrid model. It’s best suited for small- to middle-sized recommender projects – where you don’t need distributed training.

Short Recap of Different Recommender Approaches

There are two basic approaches to recommendation:

Collaborative models use only collaborative information – implicit or explicit interactions of users with items (like movies watched, rated, or liked). They don’t use any information on the actual items (like movie category, genre, etc.).

Collaborative models can achieve high precision with little data, but they can’t handle unknown users or items (the cold start problem).

Content-based models work purely on the available data about items or users – entirely ignoring interactions between users and items. So they approach recommendations very differently than collaborative models.

Content-based models usually:

  • Require much more training data (you need to have user/item examples available for almost every single user/item combination), and
  • Are much harder to tune than collaborative models.

But they can make predictions for unseen items and usually have better coverage compared to collaborative models.

Hybrid Recommenders – like LightFM – combine both approaches and overcome a lot of the challenges of each individual approach.

They can deal with new items or new users:

When you deploy a collaborative model to production, you’ll often run into the problem that you need to predict for unseen users or items – like when a new user registers or visits your website, or your content team publishes a new article.

Usually you have to wait at least until the next training cycle, or until the user interacts with some item, to be able to make recommendations for these users.

But the hybrid model can make predictions even in this case: It will simply use the partially available features to compute the recommendations.

Hybrid models can also deal with missing features:

Sometimes features are missing for some users and items (simply because you haven’t been able to collect them yet), which is a problem if you’re relying on a content-based model.

Hybrid recommenders perform for returning users (those who are known from training) as well as new users/items, as long as you have features about them. This is especially useful for items, but also for new users (you can ask users what they’re interested in when they visit your site for the first time).

System Components

This system assumes that there are far fewer items than users, since it always retrieves predictions for all items. But it can serve as the basis for more complex recommenders.

The core of the system is a Flask app that receives a user ID and returns the relevant items for that user. It will (re)load the LightFM model and query a Redis instance for item and/or user features.

We’ll assume that user and item features are stored and serialised in a Redis database and can be retrieved by the Flask app at any time.

All applications will be deployed as microservices via Docker containers.

How LightFM Makes Predictions

But how does it work?

The LightFM paper is very informative for an academic reader, but maybe a little brief for someone who isn’t very familiar with the domain. I’ll outline the LightFM model prediction process more simply below.

Explanation of the formulas:

  • Lowercase letters refer to vectors, and uppercase letters refer to matrices.
  • The subscript u refers to a single user, and U refers to the complete set of all users. Items are referred to in the same way.

Most of the naming here is consistent with the LightFM paper.

Model Components

So LightFM combines the best of the collaborative and the content-based approaches. You might say it models one component for each of the two approaches. Both are necessary to give us the properties we want from the recommender.

Collaborative component

The collaborative component allows you to fall back on a collaborative filtering algorithm in case you don’t have any features – or the features aren’t informative.

State-of-the-art collaborative filtering algorithms are implemented using matrix factorisation. They estimate two latent (unobserved) matrix representations which, when multiplied together, reproduce the matrix of interactions for each item and user the model saw during training. Of course, there’s an error term to allow for noise and avoid overfitting.

A simple analogy: Try to factorise 12. We can do this with 2 and 6, 3 and 4, 1 and 12, etc. It’s similar for matrices.
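To make the analogy concrete, here’s a minimal numpy sketch (not LightFM’s actual training code – the toy interaction matrix, learning rate, and iteration count are made up for illustration) that factorises a small interaction matrix into two latent matrices with plain gradient descent:

```python
import numpy as np

# Toy interaction matrix: 4 users x 3 items (1 = interacted, 0 = not)
R = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)

rng = np.random.default_rng(42)
d = 2  # number of latent components
P = rng.normal(scale=0.1, size=(4, d))  # latent user matrix
Q = rng.normal(scale=0.1, size=(3, d))  # latent item matrix

lr = 0.05
for _ in range(2000):
    err = R - P @ Q.T          # reconstruction error
    P += lr * err @ Q          # gradient step for the user matrix
    Q += lr * err.T @ P        # gradient step for the item matrix

reconstruction = P @ Q.T       # approximately reproduces R
```

With d = 2 components the product P·Qᵀ can only approximate R (just like 3 × 4 only approximates 11), which is exactly the compression that makes the latent representations useful.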

We’ll call those matrices latent representations, since they’re a compressed form of our interaction data.

Content-based component

The content-based component allows you to get predictions even if you have no interaction data.

LightFM incorporates user and item features by associating the features with the latent representations. The assumption is that features and latent representations are linearly related. So in vector form:

qu = fu EU + bu

Here qu is the latent user representation, fu is a single user’s feature row vector, EU is the estimated user embedding matrix, and bu are the biases for the user embeddings. (For simplicity, we’ll leave the biases out from now on.)

Looks similar to linear regression, right? Except EU is a matrix, as opposed to β, which is usually a vector. In fact, this actually performs multiple regressions: one for each component. Again, it’s analogous for items.

During training, both the user embeddings and the item embeddings are estimated with the help of gradient descent algorithms. The embedding matrix will have a row for each feature. The columns of the embedding matrix are called components. The number of components is set as a model hyperparameter, which we’ll refer to from now on as d.

The above image recaps this process for all users and all items. So in Step I, we have a matrix multiplication of the user feature matrix of shape Nusers×Nuserfeatures with the embedding matrix of shape Nuserfeatures×d. The same applies for the second multiplication of the item features by the item embeddings. The result from Step I is two matrices of shape Nusers×d and Nitems×d, respectively. So each user/item is represented as a latent vector of size d.

In the final step, these two matrices are multiplied, resulting in the final score for each user and item, of shape Nusers×Nitems.
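The two steps can be sketched in numpy (shapes and values here are arbitrary placeholders, not real model output):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 5, 4
n_user_feats, n_item_feats = 3, 6
d = 2  # number of components (hyperparameter)

# Binary feature matrices (rows: users/items, columns: features)
F_U = rng.integers(0, 2, size=(n_users, n_user_feats)).astype(float)
F_I = rng.integers(0, 2, size=(n_items, n_item_feats)).astype(float)

# Estimated embedding matrices (one row per feature, d columns)
E_U = rng.normal(size=(n_user_feats, d))
E_I = rng.normal(size=(n_item_feats, d))

# Step I: latent representations
Q_U = F_U @ E_U          # shape (n_users, d)
Q_I = F_I @ E_I          # shape (n_items, d)

# Step II: scores for every user/item pair
S = Q_U @ Q_I.T          # shape (n_users, n_items)

# A single user's scores are just one row: q_u @ Q_I.T
assert np.allclose(S[0], Q_U[0] @ Q_I.T)
```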

Now you can easily get all the predictions for a single user with the term qu QIᵀ, where qu is a row vector of the user’s latent representations, and QI is a matrix of the latent representations of all items.

Enable Fallback to Collaborative Mode with Indicator Matrices

LightFM can also generate models using only collaborative information.

It uses a very effective trick: If no user or item features are used at all, the model will accept an identity matrix of size Nusers or Nitems, respectively. This is very effective, since the model then learns a d-component embedding for each user. This way, the model can always fall back on the best pure collaborative method. You can think of these embeddings as the model’s memory of the users and items it has already seen during training.

You can also force the model to fall back to collaborative mode – even when you do have features: You can modify the feature matrix by appending an identity matrix to it. Sometimes you’ll need this to get your model to converge. However, this usually means your features are too noisy or don’t carry enough information for the model to converge to a minimum by itself.
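Appending the indicator columns is a one-liner with scipy.sparse (a sketch with made-up dimensions):

```python
from scipy import sparse

n_users, n_user_feats = 4, 3

# A random sparse user feature matrix standing in for real features
user_features = sparse.random(n_users, n_user_feats, density=0.5,
                              format="csr", random_state=0)

# Append an identity matrix so each user also gets its own indicator column
indicators = sparse.identity(n_users, format="csr")
hybrid_features = sparse.hstack([user_features, indicators], format="csr")
```

The resulting matrix has one column per feature plus one indicator column per user, which is exactly what lets the model fall back on collaborative information when the features are uninformative.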

Finally, using this trick increases the effort needed to take the model to production: During training, the user’s index is used to retrieve the correct row of the corresponding feature/identity matrix – and this information might no longer be available in a production environment; plus the LightFM model hands this responsibility off to the user.

Interesting fact

The latent representations you can obtain by using only indicator features will be close in Euclidean space for similar items/users (in terms of collaborative information). Since the model estimates them purely from collaborative information, you can use them to find similarities between your items or users.
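As a sketch (with random placeholder embeddings standing in for a trained model’s item_embeddings), finding the nearest items in Euclidean space looks like this:

```python
import numpy as np

# Placeholder for learned latent item representations (n_items x d),
# e.g. what you'd get from a model trained on indicator features only
rng = np.random.default_rng(1)
item_embeddings = rng.normal(size=(10, 4))

def most_similar(item_id, embeddings, k=3):
    """Return the k items closest in Euclidean space."""
    dists = np.linalg.norm(embeddings - embeddings[item_id], axis=1)
    return np.argsort(dists)[1:k + 1]  # index 0 is the item itself

similar = most_similar(0, item_embeddings)
```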

Recreating Indicators and Features on the Fly

Now let’s implement a model that can fall back on collaborative mode, keeps track of IDs, and is thus able to reconstruct the correct features and indicators.

We’ll focus on implementing a complete approach – one that can give predictions in most situations – which makes it quite complex. We’ll subclass the LightFM class and add a special predict_online method, which is intended to be used during production.

This way, we can still use LightFM’s cythonised prediction functions and avoid handling user and item ID mappings separately.

It should satisfy these requirements:

  1. Reconstruct the indicator feature if the user/item was seen during training;
  2. Make online predictions no matter what data is available on a certain user;
  3. Make those predictions as quickly as possible.
ID Mappings

To achieve the first requirement, you’ll have to use the same class during training as well. You also need to adjust your subclass so it only accepts SparseFrame objects (from the sparsity library) during training, and therefore creates and saves ID mappings.

Reconstructing Features

In order to achieve the second requirement, you need to check the available data every time a request comes in. There are 16 cases you’ll have to handle:

In cases IV, VIII, and XII, we simply return our baseline predictions. For cases XIII through XVI, we can’t give any predictions, because we don’t know enough about the items.

To summarise: We basically want to create a row vector which contains the user features, if they’re available. Otherwise it’s all zeros at the respective indices. It will also contain the user indicator feature set at the correct index, if the user was seen during training.

The item features are analogous to the user features, except we expect them to fit into memory easily to allow for a cache. You might consider using a different caching strategy (like a TTLCache) based on your use case, or not caching at all.

We also want to support not adding indicators, or only adding them to user or item features, which might make the implementation a little more complex. Still, we’ve tried to keep it as simple as possible.

Below you’ll find a sample implementation of the approach described above. This implementation should handle all cases up to VIII correctly. But it’s possible that not all item cases are implemented, since our application didn’t require it. So predicting known items without item features isn’t possible, but it should be very easy to add.

Part II of this post uses this class, connects it to a Redis database, and serves its predictions dynamically with Flask. We’ll also show you how to update the model without downtime, using a background thread started from within the Flask application.


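The full listing lived in the original post’s code block; as a stand-in, here is a simplified, self-contained sketch of the core feature-reconstruction logic. The class, method, and attribute names (build_user_row, user_id_map, and so on) are our own assumptions, not LightFM API; in the real subclass, the row built here would be handed to LightFM’s predict method inside predict_online.

```python
from scipy import sparse

class OnlinePredictorMixin:
    """Sketch of the feature/indicator reconstruction described above.

    Expects the training step to have stored:
      - self.user_id_map: external user ID -> training row index
      - self.n_user_features: number of "real" user feature columns
      - self.add_user_indicators: whether indicator columns were appended
    """

    def build_user_row(self, user_id, feature_values=None):
        n_users = len(self.user_id_map)
        width = self.n_user_features + (n_users if self.add_user_indicators else 0)
        row = sparse.lil_matrix((1, width))

        # Fill in whatever features are available for this user;
        # missing features simply stay zero at their indices
        if feature_values is not None:
            for idx, value in feature_values.items():
                row[0, idx] = value

        # Reconstruct the indicator if the user was seen during training
        if self.add_user_indicators and user_id in self.user_id_map:
            row[0, self.n_user_features + self.user_id_map[user_id]] = 1.0

        return row.tocsr()


# Minimal usage sketch with made-up IDs and features
predictor = OnlinePredictorMixin()
predictor.user_id_map = {"alice": 0, "bob": 1}
predictor.n_user_features = 3
predictor.add_user_indicators = True

known = predictor.build_user_row("alice", {0: 1.0})   # features + indicator
unseen = predictor.build_user_row("carol", {2: 1.0})  # features only
```

The same pattern applies on the item side; the item-feature cache mentioned above would wrap a lookup equivalent to build_user_row.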

Energy Transmission

Anticipating and Preventing Power Grid Failure

Massive power outages cause chaos for the general public, and they cost utility providers roughly $49 billion a year.

This wouldn’t be much of a problem if massive power outages were rare, but outages affecting more than 50,000 people have increased dramatically in recent years. This means utility companies need to find new ways of anticipating and managing these outages.

These days, smart grids are producing massive amounts of data, which means predicting and managing outages is easier than ever. Unlike traditional power grids, which are one-directional (meaning they only transmit power in one direction), smart grids are two-directional. They can capture data from every possible source in the grid at the same time as they’re providing electricity. They collect and monitor data from sources like smart meters, IoT devices, and power generation stations, providing a clear, real-time look at power usage.

Machine learning can use this data to anticipate and prevent massive power outages in the grid. Machine learning helps identify non-obvious patterns in the data that can be a precursor to grid failure, which helps maintenance teams preempt failure.

Balancing the Grid

Balancing the grid — making sure energy supply matches energy demand — is one of the most important jobs a transmission operator has. But renewable energy sources depend heavily on the weather, making them harder to predict.

Transmission operators spend millions each year fixing planning mistakes that lead to producing too much or too little power. In hybrid systems — which rely on both renewable energy sources and fossil fuels to generate electricity — these mistakes have to be corrected at the last minute by buying more energy or compensating power plants for the excess.

Machine learning is the most accurate method available to forecast the output of renewable energy. Advanced methods, like Long Short-Term Memory networks (LSTMs), can weigh the many factors involved – wind, temperature, sunlight, and humidity forecasts – and make the best predictions. This saves money for operators and preserves resources for power plants.

Preventing Blackouts and Brownouts With Real-time Monitoring and AI Prediction

Power grids have a lot of obstacles to overcome in providing continuous energy to customers. Weather patterns, usage, internal failure, even wildcard incidents like lightning strikes and interference from wild animals can all affect power delivery.

Machine learning is increasingly being used to help predict potential brownout and blackout conditions. By feeding historical data into the AI and running Monte Carlo simulations to predict potential outcomes, grid operators can use machine learning to identify conditions that could lead to grid failure. And they can act accordingly.
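As an illustration of the idea (with entirely made-up per-component failure probabilities), a Monte Carlo estimate of how often several grid components fail in the same hour might look like:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical hourly failure probabilities per component,
# e.g. estimated from historical data (numbers invented here)
failure_probs = np.array([0.001, 0.004, 0.002, 0.008])

def simultaneous_failure_rate(probs, n_sims=100_000, threshold=2):
    """Estimate how often `threshold` or more components fail together."""
    failures = rng.random((n_sims, probs.size)) < probs
    return (failures.sum(axis=1) >= threshold).mean()

rate = simultaneous_failure_rate(failure_probs)
```

A grid operator could flag conditions whose simulated rate exceeds an acceptable risk level and act before an outage occurs.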

Sensors like phasor measurement units (PMUs) and smart meters can provide usage information in real time. When combined with both historical and simulation data, AI can help mitigate potential grid failure, using techniques like grid balancing and demand response optimization. Incidents that would otherwise have affected millions of people can be contained to a smaller area and fixed faster for less money.

Differentiate Power System Disturbances from Cyber Attacks

Cyber attacks are increasingly used to target important infrastructure, like shutting down hospitals with ransomware attacks (when attackers break into the system and lock legitimate users out until a ransom is paid). With utility grids, a cyber attack can have widespread consequences and affect millions of users.

Detecting these attacks is critical.

Developers are using machine learning to differentiate between a fault (a short-circuit, for example) or a disturbance (such as line maintenance) in the grid and an intelligent cyber attack (like a data injection).

Since deception is a huge component of these attacks, the model needs to be trained to look for suspicious activity – things like malicious code or bots – that get left behind after the deception has occurred.

One such method uses feature extraction with Symbolic Dynamic Filtering (an information theory-based pattern recognition tool) to discover causal interactions between the subsystems, without overburdening computer systems. In testing, it accurately detected 99% of cyber attacks, with a true-positive rate of 98% and a false-positive rate of less than 2%. This low false-positive rate is significant because false alarms are one of the biggest concerns in detecting cyber attacks.

Balance Supply and Demand

Utility providers are looking for ways to better predict power usage while maintaining energy supply at all times. This becomes critical when renewable power sources (like solar or wind) are introduced into the grid.

Because these renewable power sources rely on elements beyond human control (like the weather), utility providers know they can’t always rely on renewables for continuous production. Knowing precisely when demand levels will peak allows utility providers to connect to secondary power sources (like conventionally generated electricity) to bolster the available resources and ensure constant service provision.

More and more utility providers are turning to machine learning for help. We can feed historical data into machine learning algorithms – like Support Vector Machines (SVM) – to accurately forecast energy usage and ensure sufficient levels and constant supply.
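A minimal sketch of this approach with scikit-learn’s SVR, trained on synthetic demand data (the feature set and numbers are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic history: hour of day and temperature -> demand (made up)
hours = rng.integers(0, 24, size=200)
temps = rng.normal(20, 8, size=200)
demand = (100 + 3 * temps
          + 10 * np.sin(hours / 24 * 2 * np.pi)
          + rng.normal(0, 2, size=200))

X = np.column_stack([hours, temps])
model = SVR(kernel="rbf", C=100.0).fit(X, demand)

# Forecast demand for a hot mid-afternoon hour
prediction = model.predict([[15, 32]])
```

A production forecaster would of course use far richer features (calendar effects, weather forecasts, past demand), but the pipeline shape is the same.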

Detect Power Grid Faults

Current methods for detecting faults in the grid consume a lot of unnecessary time and resources. This creates a situation where power transmission is interrupted and customers are without electricity while faults are first located, then fixed.  

Machine learning can find faults quickly and more accurately, helping you minimize service interruption for your customers. Support Vector Machines (SVM) can be combined with Discrete Wavelet Transformation (DWT) to locate faults in the lines using a traveling wave-based location method.

When we apply DWT (a form of numerical and functional analysis that captures both frequency and location information) to the transient voltage recorded on the transmission line, we can determine the location of the fault by calculating aerial and ground mode voltage wavelets. So far, this method has detected fault inception angles, fault locations, loading levels, and non-linear high-impedance faults for both aerial and underground transmission lines.

Detect Non-Technical Power Grid Losses

In the energy world, “non-technical losses” means energy theft or fraud from the system.

There are two common types of non-technical losses. The first is when a customer uses more energy than the meter reports. The second involves rogue connections stealing energy from paying customers. To pull off this theft or fraud, bad actors can bypass smart meters completely or insert chips into the system that change how meters track energy use. Meter readers can also be bribed to report lower numbers (though thanks to smart meters, this is increasingly hard to do).

Because these non-technical losses cost $96 billion annually, utility providers are turning to machine learning to combat the problem.

We can help utility providers mine historical customer data to discover irregularities that indicate theft or fraud. These can be things like unusual spikes in usage, differences between reported and actual usage, and even evidence of equipment tampering.

Energy Distribution

Better Predict Energy Demand

Accurately predicting customers’ energy needs is critical for any utility provider. To date, we haven’t found an adequate solution for bulk energy storage, which means energy needs to be transmitted and consumed almost as soon as it’s produced.

We're using machine learning to increase the accuracy of these predictions. Historical energy use data, weather forecasts, and the types of businesses or buildings operating on a given day all play a role in determining how much energy is used.

For example, a hot summer day mid-week means more energy usage because office buildings run air conditioning at a high capacity. Weather forecasts and historical data can help identify those patterns in time to prevent rolling blackouts caused by air conditioners in the summer.

Machine learning finds complicated patterns in the various influencing factors (such as day, time, predicted wind and solar radiation, major sports events, past demand, mean demand, air temperature, moisture and pressure, wind direction, day of the week, etc.) to explain the development of demand. Because machine learning finds more intricate patterns, its predictions are more accurate. This means energy distributors can increase efficiency and decrease costs when they buy energy – without having to make expensive adjustments.

Energy Generation

Predict Turbine Malfunction

Wind is a great renewable energy source, but wind turbine maintenance is notoriously expensive. It accounts for up to 25% of the cost per kWh. And fixing problems after they occur can be even more expensive.

Machine learning can help you get ahead of this problem. The goal is to reduce maintenance costs by catching problems before the turbine malfunctions. This is particularly important when wind farms are located in hard-to-access places, such as the middle of the ocean, which makes repair costs even higher.

Real-time data gathered with Supervisory Control and Data Acquisition (SCADA) can help identify possible malfunctions in the system far enough in advance to prevent failure.

For example, data from sensors found within the turbines – such as oil, grease, and vibration sensors – have been used to train machine learning models to identify precursors to failure, such as low levels of lubricant.

This method can train machine learning models to predict failures up to 60 days in advance.

Consumption / Retail

Accurately Predict Energy Prices

As personal power generation (using solar or wind power) gets easier and cheaper, consumers and businesses are increasingly producing their own power.

Personal power generation allows people to make, consume, and store their own energy. Depending on where they live, they may even be able to sell surplus power back to the local power utility.

Machine learning can help find the best time to produce, store, or sell this energy. Ideally, energy should be consumed or stored when prices are low and sold back to the grid when prices are high.

By looking at historical data, usage trends, and weather forecasts, machine learning models have made accurate predictions on an hourly basis. People with personal and business energy generation systems can use these predictions to make strategic decisions about whether to use, store, or sell their energy.

For example, Adaptive Neural Fuzzy Inference System (ANFIS) has been used to predict short-term wind patterns for wind power generation. This allows producers to maximize energy production and sell it when energy prices are at their peak.

Reduce Customer Churn

In open energy markets, where customers have a choice of utility providers, understanding which customers are going to churn can be critical. Churn rates – the percentage of customers who stop using your service in a year – can be as high as 25%. Being able to predict churn and stay ahead of it is essential to survival.

Machine learning is helping utility owners predict when a customer is getting ready to churn. By using techniques such as the Cross-Industry Standard Process for Data Mining (CRISP-DM), AdaBoost, and Support Vector Machines, as well as historical usage data, utility providers can identify key indicators of whether or not a customer is going to churn. These indicators include things like customer satisfaction, employment status, energy consumption, and home ownership or rental status. A change in any of these can indicate a customer is getting ready to terminate their service.
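A toy sketch of such a churn classifier with scikit-learn’s AdaBoost (the features and the churn rule are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(3)

# Synthetic customer records: [satisfaction (1-5), monthly kWh, homeowner]
X = np.column_stack([
    rng.integers(1, 6, size=300),
    rng.normal(500, 150, size=300),
    rng.integers(0, 2, size=300),
])
# Made-up ground truth: unhappy customers churn far more often
churned = (X[:, 0] <= 2) & (rng.random(300) < 0.8)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, churned)
probs = clf.predict_proba(X[:5])[:, 1]  # churn probability per customer
```

Customers with a high predicted churn probability can then be contacted proactively to resolve whatever problems they’re experiencing.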

When these indicators are identified far enough in advance, it’s possible to avoid churn by working with customers to solve any problems they’re experiencing.

Energy Trading

Predict Energy Prices

Just like natural gas and oil, wholesale energy is a market commodity. So naturally it's important for traders to be aware of market fluctuations and pricing when it comes to buying and selling energy.

To help make sense of the massive amounts of data used to make trading decisions, traders are increasingly turning to machine learning.

A mix of statistical analysis and machine learning can help commodity traders make better predictions. Classical statistical analysis techniques like time series analysis, Seasonal Autoregressive Integrated Moving Average (SARIMA), and regression models are used to deal with the data. And machine learning makes connections between the various data points.

What’s more, machine learning trains itself to make increasingly accurate predictions using the constant flow of real-time data.
