How is Airbnb optimising Search and Discovery using Knowledge Graphs? 🎯
Insight into the graph tech that is making Airbnb the go-to travel platform
Presenting the first edition of ProductX in 2023, and welcoming the 20+ folks who have joined the community since the last story, which covered Swiggy and its use of ML to process payments.
If you are new here, please consider subscribing to ProductX on Substack so that these weekly stories reach your inbox directly.
Well, the holiday season is coming to an end, and what better story to draw insights from than Airbnb’s?
Like Google, Airbnb is essentially a company that optimizes search and discovery for its users, in a niche category: homestays. To cater to the aspirations of travelers, it needs to go the extra mile. Travel is a category that brings with it a host of questions.
Rewind to the time when you were planning your last trip. The very first question you had was “where to start?” Further down the rabbit hole, you begin asking what and where to eat, where to visit, and how to make the most of the experience at the destination. Honestly, none of it is very clear at the outset.
Airbnb is solving these woes of travelers and, in the process, is making its move to become an end-to-end travel platform rather than just a place for discovering homestays at an affordable cost. Here’s how…
Quick Context 👀
Discovering what you want and need to know about a destination is crucial to the overall trip experience, especially when you are traveling to a place you have never been to before.
To provide relevant context to users, Airbnb needs a way to represent relationships between distinct but related entities on its platform.
Enter Knowledge Graphs…
Knowledge graphs are Airbnb’s solution to define these abstract relationships flexibly while ensuring technical scalability to power all of Airbnb’s verticals. But before we get into what knowledge graphs are and how they serve Airbnb’s purpose, let’s understand the information essential for traveling.
Information types needed for Traveling 🧳
According to Airbnb’s Brian Chesky, the key to creating an 11-star experience is going beyond just providing a welcoming homestay on a trip, which leads us back to Product 101: understanding the users and their needs.
Information needs can be categorized into two phases:
The first phase is all about figuring out the destination for the trip and points of interest (POIs) in that destination. It includes popular/trending destinations amongst people, activities, and neighborhoods that best match the user’s interests.
The second phase is about what you would want to do and see. This includes the food, activities, POIs, etc.
So, now that we have context about the information, how do we surface all this to people in a generalizable and scalable manner?
Knowledge Graph 🕸️
A knowledge graph represents a network of real-world entities (objects, events, situations, or concepts) and illustrates the relationships between them.
Google has used knowledge graphs successfully to power its search engine and surface relevant context for particular queries.
As I mentioned earlier, it is all about optimizing the search and discovery for Airbnb (just like Google).
Why is a graph structure scalable? 📈
Graph structures are used to store and organize data on relationships between entities, rather than just individual rows of data as in a traditional relational structure.
This allows for more flexibility and scalability in categorizing and organizing data, as new relationships and entities can easily be added to the graph.
The graph structure allows for easy traversal of these relationships, making it quick to access and retrieve data.
The graph can also be hierarchical, with high-level concepts branching down to more specific details, allowing for a streamlined data organization.
As the graph grows and reaches a critical mass of data, it can automatically infer new relationships between entities, reducing the need for manual categorization.
For instance:
Surfing, which is an “Experience”, has to be associated with “Hawaii”.
Because the same structure represents everything in the world, there is no operational overhead of redefining the world whenever a new product is introduced on the platform.
How is the “Graph” structured? 🧐
A knowledge graph is a structure that hierarchically organizes data and consists of nodes and edges.
Nodes => Entities (such as restaurants or experiences)
Edges => Relationships between these entities
The goal of the knowledge graph is to be mutually exclusive and collectively exhaustive, meaning it aims to cover all relevant information without duplication.
It can be queried using an API to surface relevant information, and inventory items can be indexed by their unique identifiers in the knowledge graph.
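To make the node-and-edge idea concrete, here is a minimal sketch in Python. It is not Airbnb's implementation: the class names, the identifier format, and the "available_in" edge type are all assumptions made up for illustration, reusing the Surfing and Hawaii example from earlier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str    # unique identifier (hypothetical format, e.g. "exp:surfing")
    node_type: str  # e.g. "experience" or "place"
    name: str


@dataclass(frozen=True)
class Edge:
    edge_type: str  # the relationship, e.g. "available_in" (hypothetical name)
    src: str        # node_id the edge starts from
    dst: str        # node_id the edge points to


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Both endpoints must already exist as nodes.
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError("edge endpoints must be existing nodes")
        self.edges.append(edge)

    def neighbors(self, node_id: str, edge_type: str) -> list:
        """Return the nodes reachable from node_id over edges of edge_type."""
        return [self.nodes[e.dst] for e in self.edges
                if e.src == node_id and e.edge_type == edge_type]


graph = KnowledgeGraph()
graph.add_node(Node("exp:surfing", "experience", "Surfing"))
graph.add_node(Node("place:hawaii", "place", "Hawaii"))
graph.add_edge(Edge("available_in", "exp:surfing", "place:hawaii"))

print([n.name for n in graph.neighbors("exp:surfing", "available_in")])
```

Adding a new product type here only means adding new nodes and edges; nothing about the structure itself has to change.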
So, now that we know what Knowledge Graphs are, let’s understand the infrastructure and the way Airbnb implements it.
The Infrastructure 🏗️
The Knowledge Graph Server can be divided into 3 parts:
Graph Storage
Graph Query API
Storage Mutator
Graph Storage:
Airbnb built its knowledge graph infrastructure around a graph storage module that sits on an in-house relational database.
A node store and an edge store were implemented on top of this database to perform CRUD operations. In addition, a globally unique identifier is assigned to each node and edge.
Nodes in the graph storage are divided into different types, each with a unique schema.
For example, a place node is defined by its name and GPS coordinates, while an event node is defined by its name, date, and venue.
Edges also come in different types, such as landmark-in-city and language-spoken-in-country.
Each edge type has a configurable constraint on the types of nodes it starts from and connects to.
For example, a landmark-in-city edge has to connect from a landmark node to a city node.
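The typed schemas and edge constraints described above could be checked along these lines. This is a sketch under assumptions: the field names ("gps", "venue"), the dictionary layout, and the two validation functions are hypothetical, and only the node and edge type names come from the text.

```python
# Hypothetical schemas: the fields each node type must carry.
NODE_SCHEMAS = {
    "place": {"name", "gps"},
    "event": {"name", "date", "venue"},
    "landmark": {"name", "gps"},
    "city": {"name"},
    "language": {"name"},
    "country": {"name"},
}

# Hypothetical constraints: edge type -> (source node type, target node type).
EDGE_CONSTRAINTS = {
    "landmark-in-city": ("landmark", "city"),
    "language-spoken-in-country": ("language", "country"),
}


def validate_node(node_type: str, payload: dict) -> None:
    """Raise ValueError if the payload lacks a field required by the schema."""
    if node_type not in NODE_SCHEMAS:
        raise ValueError(f"unknown node type: {node_type}")
    missing = NODE_SCHEMAS[node_type] - set(payload)
    if missing:
        raise ValueError(f"{node_type} node is missing {sorted(missing)}")


def validate_edge(edge_type: str, src_type: str, dst_type: str) -> None:
    """Raise ValueError if the edge connects node types it is not allowed to."""
    if edge_type not in EDGE_CONSTRAINTS:
        raise ValueError(f"unknown edge type: {edge_type}")
    expected = EDGE_CONSTRAINTS[edge_type]
    if (src_type, dst_type) != expected:
        raise ValueError(
            f"{edge_type} must connect {expected[0]} -> {expected[1]}, "
            f"got {src_type} -> {dst_type}"
        )


# A valid event node and a valid landmark-in-city edge pass silently.
validate_node("event", {"name": "Jazz Night", "date": "2023-01-15", "venue": "Town Hall"})
validate_edge("landmark-in-city", "landmark", "city")
```

Keeping the constraints in plain configuration like this is one way to make them "configurable": a new edge type is a new dictionary entry, not new code.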
The graph storage is designed to store edges from multiple data sources and can store additional payload for edges.
Hence, each edge also stores its source and a confidence score.
An example of additional payload is the distance between the Home listing and the landmark on a home-near-landmark edge.
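As a rough sketch, an edge carrying its source, confidence score, and extra payload might look like the following. The field names, the source names, and the numbers are invented for illustration; the text only tells us that edges store a source, a confidence score, and optional payload such as distance.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    edge_type: str
    src: str
    dst: str
    source: str        # which data source asserted this edge (hypothetical names below)
    confidence: float  # assumed here to be a score between 0 and 1
    payload: dict = field(default_factory=dict)  # extra data, e.g. distance


# The same relationship asserted by two made-up data sources.
edges = [
    Edge("home-near-landmark", "home:123", "landmark:tower",
         source="geo_pipeline", confidence=0.95, payload={"distance_km": 1.2}),
    Edge("home-near-landmark", "home:123", "landmark:tower",
         source="host_description", confidence=0.60, payload={"distance_km": 2.0}),
]


def edges_from_sources(all_edges, allowed_sources):
    """Keep only the edges asserted by one of the allowed data sources."""
    return [e for e in all_edges if e.source in allowed_sources]


def most_confident(candidates):
    """Pick the edge with the highest confidence score."""
    return max(candidates, key=lambda e: e.confidence)


best = most_confident(edges)
print(best.source, best.payload["distance_km"])
```

Picking the highest-confidence edge is just one simple way to choose between sources; it is not a description of how Airbnb reconciles them.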
A daily snapshot of the nodes and edges is dumped into a data warehouse for offline usage and machine learning purposes.
The decision to use a relational database was made due to the reliability and features of the existing database, as well as the overhead of setting up and debugging a new graph database.
Graph Query API:
Airbnb implemented a knowledge graph API with CRUD endpoints and a graph query endpoint to support product needs.
The graph query endpoint allows users to traverse the knowledge graph by specifying a path of edge types and data filters and receiving the traversed subgraph in a structured format.
The graph query API has a recursive interface, allowing for multiple steps in traversing the knowledge graph.
Consider Airbnb’s product detail page: the knowledge graph is queried to display points of interest (POIs) near the Home listing, along with photos of each restaurant, museum, or landmark mentioned.
In graph terms, this query needs to traverse:
(1) all place nodes that are connected to a specific Home listing node
(2) photo nodes connected with the place nodes fetched in the previous step.
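The two-step traversal above can be sketched with a small recursive function. Everything here is illustrative: the edge type names ("home-near-place", "place-has-photo"), the node ids, and the nested-dictionary result format are assumptions, not Airbnb's actual API.

```python
from collections import defaultdict

# Adjacency index: (source node id, edge type) -> list of target node ids.
adjacency = defaultdict(list)


def add_edge(src, edge_type, dst):
    adjacency[(src, edge_type)].append(dst)


# Made-up sample data: one Home, two nearby places, three photos.
add_edge("home:123", "home-near-place", "place:museum")
add_edge("home:123", "home-near-place", "place:cafe")
add_edge("place:museum", "place-has-photo", "photo:1")
add_edge("place:museum", "place-has-photo", "photo:2")
add_edge("place:cafe", "place-has-photo", "photo:3")


def traverse(start_ids, path):
    """Follow a path of edge types and return the traversed subgraph.

    The result is a nested dict: each node id maps to the subgraph reached
    from it by following the remaining edge types in the path.
    """
    if not path:
        return {node_id: {} for node_id in start_ids}
    edge_type, rest = path[0], path[1:]
    return {
        node_id: traverse(adjacency.get((node_id, edge_type), []), rest)
        for node_id in start_ids
    }


# Step 1: places near the Home. Step 2: photos of those places.
subgraph = traverse(["home:123"], ["home-near-place", "place-has-photo"])
print(subgraph)
```

Because each step simply calls the same function on the nodes found in the previous step, adding a third hop is just a longer path list.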
Users can specify which data sources to query in a graph query.
Airbnb is also working on a data reconciliation layer to provide a consistent view of data from multiple sources and resolve conflicts.
Some use cases cannot be directly supported with a graph query, and Airbnb is incorporating metadata and personalization signals through machine learning to address these "fuzzy" queries.
One example is fetching the most popular landmark around a Home.
Let’s take a look at an example:
Suppose one wants to find all place nodes connected to the city node “Beijing” by edges of type “contains_location” that (1) have more than 5,000 listings around them and (2) belong to the “scenic” category. A graph query can express this as a single traversal step over “contains_location” edges with two data filters on the place nodes.
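A query of this shape could be sketched as follows. To be clear, this is not Airbnb's query syntax: the query dictionary, the field names ("listing_count", "category"), the node ids, and all the sample numbers are made up to show how a path step and data filters combine.

```python
import operator

# Made-up sample data; the listing counts are invented for illustration.
nodes = {
    "city:beijing": {"type": "city", "name": "Beijing"},
    "city:other": {"type": "city", "name": "Another City"},
    "place:a": {"type": "place", "category": "scenic", "listing_count": 8000},
    "place:b": {"type": "place", "category": "scenic", "listing_count": 3000},
    "place:c": {"type": "place", "category": "nightlife", "listing_count": 9000},
    "place:d": {"type": "place", "category": "scenic", "listing_count": 9000},
}
edges = [
    ("city:beijing", "contains_location", "place:a"),
    ("city:beijing", "contains_location", "place:b"),
    ("city:beijing", "contains_location", "place:c"),
    ("city:other", "contains_location", "place:d"),
]

# Hypothetical query format: a start node, one edge type, and data filters.
query = {
    "start": "city:beijing",
    "edge_type": "contains_location",
    "filters": [
        ("listing_count", ">", 5000),
        ("category", "==", "scenic"),
    ],
}

OPS = {">": operator.gt, "==": operator.eq}


def run_query(q):
    """Return ids of nodes reached from q['start'] that pass every filter."""
    results = []
    for src, edge_type, dst in edges:
        if src != q["start"] or edge_type != q["edge_type"]:
            continue
        node = nodes[dst]
        if all(OPS[op](node[name], value) for name, op, value in q["filters"]):
            results.append(dst)
    return results


print(run_query(query))
```

Only "place:a" survives here: "place:b" has too few listings, "place:c" is in the wrong category, and "place:d" belongs to a different city.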
Storage Mutator:
To import data to the knowledge graph, Airbnb implemented a storage mutator that allows data pipelines to send mutation requests through a Kafka message bus rather than directly through the knowledge graph API.
This pattern simplifies the process of writing data to the knowledge graph from various pipelines and is now the primary way for Airbnb to import data.
The storage mutator also includes a mutation publisher to propagate data mutations to the Kafka message bus for use by downstream pipelines.
One example of a downstream pipeline using this method is the search index pipeline, which uses the knowledge graph to populate categorization data into the search index.
For instance, consider a search such as:
“Which Homes are best for families and which allow 24-hour check-in?”
To support searches like this, a rich taxonomy in the knowledge graph is applied to categorize all of Airbnb’s inventory.
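The write path through the storage mutator can be sketched roughly like this. Two in-memory queues stand in for the Kafka message bus so the example runs on its own; the message format, the function names, and the category labels are all assumptions, and a real setup would use an actual Kafka client instead.

```python
import json
import queue

# In-memory stand-ins for two Kafka topics (an assumption for this sketch).
mutation_requests = queue.Queue()  # data pipelines -> storage mutator
mutation_events = queue.Queue()    # storage mutator -> downstream pipelines

graph_store = {}   # node id -> payload; stands in for graph storage
search_index = {}  # node id -> categories; stands in for the search index


def send_mutation(op, node_id, payload=None):
    """Called by a data pipeline: enqueue a request instead of calling the API."""
    message = {"op": op, "id": node_id, "payload": payload}
    mutation_requests.put(json.dumps(message))


def run_storage_mutator():
    """Apply every pending request to storage, then publish it downstream."""
    while not mutation_requests.empty():
        message = json.loads(mutation_requests.get())
        if message["op"] == "upsert":
            graph_store[message["id"]] = message["payload"]
        elif message["op"] == "delete":
            graph_store.pop(message["id"], None)
        else:
            raise ValueError(f"unknown operation: {message['op']}")
        mutation_events.put(json.dumps(message))  # the mutation publisher


def run_search_index_pipeline():
    """Downstream consumer: copy categorization data into the search index."""
    while not mutation_events.empty():
        event = json.loads(mutation_events.get())
        if event["op"] == "upsert":
            search_index[event["id"]] = event["payload"].get("categories", [])
        else:
            search_index.pop(event["id"], None)


# Hypothetical category labels for a Home.
send_mutation("upsert", "home:123", {"categories": ["family_friendly", "24h_checkin"]})
run_storage_mutator()
run_search_index_pipeline()
print(search_index)
```

The appeal of this pattern is that pipelines only need to know the message format, not the knowledge graph API, and downstream consumers such as the search index pick up changes from the same stream.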
Where is it live? 📲
Airbnb uses the knowledge graph to provide contextual travel insights to users through several product features:
The knowledge graph stores hundreds of destination photos for use in inspiring users to select a destination.
The knowledge graph is used to surface context and insight about Homes in a destination, such as popular amenities and top landmarks or neighborhoods, to help users choose a Home to book.
The knowledge graph is also used to provide more context about a specific Home, including its proximity to key landmarks, on the product detail page.
Lessons that can be learned from the Airbnb Case Study ✅
Airbnb's work on the knowledge graph has helped it improve search, supply groupings, and content delivery, though there have been challenges with data quality and online performance.
The use of knowledge graphs provides a consistent interface to clean, current, and complete structured data about Airbnb inventory and the world of travel, and helps them improve the guest and host experience through the delivery of connected and high-quality data.
That was the Airbnb Knowledge Graph breakdown ✨
References for the article:
Airbnb Engineering Blog: https://medium.com/airbnb-engineering