We caught up with Or Hiltch, co-founder and CTO at Skyline AI, to learn more about how machine learning can be used to increase the value of real estate portfolios. He tells us about the common mistakes companies make, and how to avoid them: you need to understand what machine learning is, focus on the data, and think twice before building your own machine learning team in-house.
Or taught himself to code at the age of 13 and never looked back. Although he started his career working at well-established corporations, he was soon founding his own. He raised funds from Sequoia Capital for a previous venture, StreamRail, and again for his current company, Skyline AI, where he leads machine learning teams in developing intelligent investment-management software for the real estate industry.
Or has been recognized as an AWS Community Hero for his blog posts and community involvement, and he openly shared his knowledge with us too, covering everything from the mistakes he’s seen people make to common misconceptions about machine learning to how he verifies whether machine learning solutions are actually working.
Common trip points for companies adopting machine learning
No one has (yet) written a playbook for companies just starting out with building machine learning solutions. Having worked with many managers, technical and non-technical alike, Or has an excellent overview of the common mistakes they make due to lack of guidance:
“There is really no textbook on how to adopt AI at the company level. There are lots of good tutorials on getting into machine learning as an engineer or a scientist, but not so much as a company.”
Luckily, Or has some expert advice to help you avoid the most common mistakes.
Trip point 1: Machine learning vs AI – What’s the difference?
The terms machine learning (ML) and artificial intelligence (AI) are often used interchangeably, but strictly speaking, machine learning is a subset of AI. The fact that different people use the terminology in different ways doesn’t help when it comes to adoption.
As Or notes, people tend to claim they’re “using AI” in order to generate hype and excitement about their product. But it isn’t always useful to describe basic automation as AI. Think about autopilot on an airplane: a standard, off-the-shelf feature that’s been around for decades.
“The autopilot does most of the work nowadays, even landing the airplane. But for many years, it’s been an off-the-shelf product. People used to refer to this as a heuristic algorithm, or simply as automation. But if an autopilot solutions company was established today, they’d refer to it as AI.”
The confusion between these terms is so common that Or’s team has a joke:
“What's the difference between machine learning and AI? If it's code, it's machine learning. If it's in PowerPoint, it's AI.”
Trip point 2: Proprietary datasets are usually not big enough for machine learning
Most companies Or has worked with have their own collection of internal proprietary data, and they’re interested in using machine learning to extract value from that data. In most cases, datasets collected by a single company are not big enough to use machine learning for predictive analysis (though you can do other useful analysis on smaller datasets).
“Consider a big asset manager with hundreds of properties. Hundreds of properties is actually pretty useful for all sorts of analysis, but we can’t do machine learning with hundreds of properties. You need hundreds of thousands of properties to be able to leverage some of these algorithms.”
Even if you add data from larger, public datasets, it can be hard to use it effectively. While machine learning algorithms are often considered “smart,” they still need humans to draw the initial connections between different data sources.
Trip point 3: Datasets need to be clean and well-structured, not just large
Size isn’t everything, and machine learning practitioners often talk about “GIGO” – “Garbage in, garbage out” – to emphasize that you can’t get good results from bad data.
According to Or, companies often underestimate how much manual cleaning and preprocessing work they’ll need to do on their datasets before they’re suitable for machine learning. If the data isn’t consistently labelled and structured, then machine learning algorithms can’t draw meaningful connections. As Or says:
“In real estate, this is a huge challenge. There’s no single dataset that has all this data. So before you even get to analysis – let alone machine learning – you need to construct this data set.”
Or also notes that using different conventions in datasets presents challenges for machine learning. Often the same property is labelled in two different ways, which means the algorithm analyzes it as two distinct properties.
But this doesn’t mean you can just add a few months to your timeline to cover cleaning the data. It took Or and a team of 15 engineers almost 2 years to clean and structure their data.
“We were 15 engineers, and we spent almost 2 years just building this data warehouse. That means developing what we call an entity resolution algorithm, which considers different streams of data that we have from the properties we manage. We have to combine those resources on the individual property level to make sure the combination is accurate. The same property may have a different address on different sources, or a different name.”
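To make “entity resolution” concrete, here is a minimal, hypothetical sketch of the idea: records for the same property arrive from different sources with slightly different names and addresses, and the goal is to group them. This is not Skyline AI's algorithm. The source names, the abbreviation table, the matching rule, and the thresholds are all illustrative assumptions, and a production system would need far more robust normalization and matching.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real address normalization needs many more rules.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd",
                 "court": "ct", "apartments": "apts"}

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse common abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def resolve_entities(records, name_threshold=0.85, address_threshold=0.6):
    """Group records that likely describe the same property.

    Two records are linked if their normalized addresses are identical, or if
    their names are very similar AND their addresses are reasonably similar.
    A small union-find makes the links transitive. Thresholds are assumptions.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps chains short
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            same_address = normalize(a["address"]) == normalize(b["address"])
            similar_name = similarity(a["name"], b["name"]) >= name_threshold
            close_address = similarity(a["address"], b["address"]) >= address_threshold
            if same_address or (similar_name and close_address):
                parent[find(i)] = find(j)

    # Collect the source of each record under its cluster representative.
    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record["source"])
    return list(clusters.values())

# Illustrative records from three hypothetical data sources.
records = [
    {"source": "listing_feed", "name": "Maple Court Apartments", "address": "12 Main Street, Austin"},
    {"source": "tax_records", "name": "Maple Ct Apts", "address": "12 Main St., Austin"},
    {"source": "broker_report", "name": "Riverside Lofts", "address": "400 River Road, Dallas"},
]

clusters = resolve_entities(records)
print(clusters)
```

The first two records collapse into one property despite the different spellings, while the third stays separate. Comparing every pair is quadratic, so at the scale of hundreds of thousands of properties, real systems typically add a “blocking” step (for example, only comparing records in the same city) before pairwise matching.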
And even if you have clean, large datasets, you still need the necessary expertise to use this data, which takes us to:
Trip point 4: Building machine learning teams in-house is more difficult than managers expect
Another common mistake companies make is hiring only PhDs. While people who’ve spent years researching novel algorithms create huge amounts of value at companies like Google, Facebook, and Amazon, they do this in collaboration with engineers, not on their own. In fact, the majority of companies need machine learning engineers, not data scientists.
Building a machine learning team
Or describes the ideal machine learning team as consisting of machine learning engineers, data scientists, frontend engineers, designers, UX experts, and product leads.
As a starting point, you need engineers who know how to build scalable, predictable infrastructure around your data.
“You need machine learning engineers: data engineers. They specialize in creating data pipelines or data applications that reliably and robustly load data on schedule. These pipelines need to be monitored, and they need to be highly scalable.”
Frontend engineers and design experts
Even if your system will be mainly internal, ease of use is still important. Many companies ignore this aspect and end up with a powerful solution that nobody wants to use, and which therefore produces no value.
“You also need engineers on the frontend. You need designers to design the system. And you need user-experience people to design the functionality of the platform.”
Some theoretical work is usually a requirement, too. It’s not that data scientists aren’t important, it’s just that they can often only create value with the support of good engineers. As Or says:
“You also need data scientists to step in once the data is already in the data warehouse. After the data engineers have done their work, you have this layer of data that’s accessible to the data science team, and then they can start having an impact.”
While engineers are responsible for building something well, it’s also important to make sure you’re building the correct thing. This is where product people come into the picture:
“Sometimes the CTO, the VP of R&D, or the CEO can play this role. But typically there are one or two people who are ultimately responsible for this. Their job is to ask what should be developed.”
Because building your own machine learning team is hard, sometimes it’s better to find an external consultancy instead: work with people who have done it before.
Choosing whether to build an in-house machine learning team
Although it’s becoming more common for non-technical companies to have their own in-house software engineering team, it doesn’t always make sense to build your own machine learning team. Expertise in machine learning can be valuable on its own, but if it’s not core to your actual offering, it’s often an expensive distraction.
For example, many companies need some kind of search function, but Or points out that, unlike Google, they don’t usually try to build their own from scratch:
“It wouldn't make sense for most companies to develop a search algorithm to compete with Google. That would be completely nuts. In the same way, trying to develop your own in-house data warehouse and then build the model on top of that is a pretty huge risk if that's not your core business.”
Especially since there’s no playbook for adopting machine learning, it’s often beneficial to rely on experts who’ve already made the common mistakes at least once and know how to avoid them.
Trip point 5: Machine learning is fast, but industries can be slow
Managers often assume that once they have a machine learning model, they can throw data into it and get instant value out. While this is true in fast-paced industries such as finance, it’s not the case for “slower” industries like real estate. A house may only be sold once or twice over many decades, which is a strong contrast to something like the stock market, where assets may change hands many times per day. Since Or used to work in the finance industry, he knows how this works:
“If you look at the stock market, you see it's a perfect market. You have all the data, and you can pretty much just start. You can fire up a notebook and code predictive models from day one, because you already have everything you need.”
“Internet industry” companies like Facebook are similarly fast-paced:
“A data scientist at Facebook might be tasked with the job of reordering the newsfeed so that the ‘likes per second’ metric increases. The goal is to make it more interesting so there are more likes. A data scientist can do that and then immediately measure its performance.”
But real estate is very different. It takes longer to get value from the models, you need to be more careful about the data you feed in, and it’s harder to verify how good the results are.
To combat this, Or uses three different verification approaches.
- Verification by returns: This is the strongest form of verification. If using the machine learning solution leads to better investment returns, that’s a great indication that it’s doing what it’s meant to. But this kind of verification can take decades.
- Verification by backtesting: If you have enough data, you can train the model using only part of the data, and then the decisions it makes can be verified by testing it on data it hasn’t seen. As Or explains: “Let's say you use data from up until the last three years to train the model, and then you ask the model to predict transactability for the last year. That way, you see how accurate the model actually is.” But Or cautions that this method isn’t always reliable: “It's always possible that you messed up and the model is overfitted, or you had some issues with your benchmark. That's why you need to be very careful. You actually need really smart data scientists doing this job.”
- Verification by consensus: If you don’t have time or a lot of data, you need to use human experts to evaluate the predictions the algorithm generates. But according to Or, this isn’t easy either: “In these cases – and these are the hardest algorithms to verify – you really need to work closely with your team to try to reach a consensus concerning your results.”
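The backtesting approach hinges on splitting the data by time rather than at random, so the model never sees the period it is evaluated on. Below is a minimal sketch of that split under stated assumptions: the records are invented, and the “model” is a deliberately trivial per-city base rate standing in for a real transactability model. None of these names come from Skyline AI.

```python
from collections import defaultdict

# Hypothetical records: (year, city, transacted), where transacted is 1 if the
# property changed hands that year. Illustrative data only.
RECORDS = [
    (2017, "austin", 1), (2017, "dallas", 0),
    (2018, "austin", 1), (2018, "dallas", 0),
    (2019, "austin", 0), (2019, "dallas", 0),
    (2020, "austin", 1), (2020, "dallas", 1),
]

def time_split(records, test_year):
    """Train on everything strictly before test_year; test on test_year only.

    Splitting by time keeps information from the evaluation period out of
    training, which mimics how the model would actually be used.
    """
    train = [r for r in records if r[0] < test_year]
    test = [r for r in records if r[0] == test_year]
    return train, test

def fit_base_rate(train):
    """Stand-in model: each city's historical transaction rate in the training data."""
    counts = defaultdict(lambda: [0, 0])  # city -> [transactions, observations]
    for _, city, transacted in train:
        counts[city][0] += transacted
        counts[city][1] += 1
    return {city: t / n for city, (t, n) in counts.items()}

def backtest(records, test_year, threshold=0.5):
    """Fit on the past, predict the held-out year, and return accuracy."""
    train, test = time_split(records, test_year)
    rates = fit_base_rate(train)
    correct = 0
    for _, city, transacted in test:
        predicted = 1 if rates.get(city, 0.0) >= threshold else 0
        correct += int(predicted == transacted)
    return correct / len(test)

accuracy = backtest(RECORDS, test_year=2020)
print(f"held-out accuracy for 2020: {accuracy:.2f}")
```

With so few held-out examples, a single accuracy number is noisy, which echoes Or’s caution: a backtest can look good or bad for reasons that have nothing to do with the model, so the setup itself needs careful review.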
The post-COVID world: Can machine learning solutions adapt?
Machine learning rose to fame because of its versatility. The same algorithms can be used to detect cancer, translate languages, and give traders an edge in financial markets. That versatility matters now: the COVID-19 crisis had a major impact on the property market, Or points out, causing large investment deals to dry up completely in many parts of the world.
But this doesn’t mean it’s time for machine learning in real estate to go into hibernation mode. Instead, the solutions initially created to help analysts handle vast quantities of deals efficiently can be adapted to source rare opportunities in an illiquid market.
Or describes how this transition is happening:
“The low-hanging fruit for technology – and for AI technology specifically – is in creating scale. If you can build something that lets you use five analysts to do the work of 500 analysts, then you have a pretty big advantage, because there are a lot of properties on the market.”
“But now, in the standard channels, the flow of incoming deals has really dried up. And that's where AI can actually provide the cure, because you can adapt it to proactively source off-market opportunities.”
“You could be the one approaching specific owners, based on automated analysis, and still create a pipeline of deals, even though listings are basically down to nothing.”
This is comparable to the restaurant delivery business. Without delivery services, restaurants would have no customers during lockdown or shelter-in-place periods; delivery creates demand where it wouldn’t otherwise exist. As Or says:
“In some sense, machine learning is like restaurant delivery, because it enables you to say, ‘Okay, maybe nothing is listed, but I can still create a fairly interesting pipeline of deals to look at and then transact.’”
“We do see these types of transactions happening, and that's a really interesting concept in this era.”
Building machine learning solutions can be daunting, but if it’s done well, these solutions can also help you stay ahead of the curve. Even in periods of economic instability and global crisis, a well-built machine learning algorithm and a good dataset can be adapted to create value in unexpected ways.