GenAI spurs demand for vector search startups, but database giants are also taking note
Vector databases are all the rage, judging by the number of startups entering the space and the investors ponying up for a piece of the pie. The proliferation of large language models (LLMs) and the generative AI (GenAI) movement have created fertile ground for vector database technologies to flourish.
While traditional relational databases such as Postgres or MySQL are well-suited to structured data (predefined data types that can be filed neatly in rows and columns), they work less well for unstructured data such as images, videos, emails, social media posts, and any other data that doesn’t adhere to a predefined data model.
Vector databases, on the other hand, store and process data in the form of vector embeddings, which convert text, documents, images, and other data into numerical representations that capture the meaning of, and relationships between, the different data points. This is well suited to machine learning, as the database stores data spatially according to how relevant each item is to the others, making it easier to retrieve semantically similar data.
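To make the idea of retrieving "semantically similar" data concrete, here is a minimal sketch of a similarity lookup over embeddings. It assumes cosine similarity as the distance measure and uses hand-written three-dimensional vectors with made-up item names; real embeddings come from a model and have hundreds or thousands of dimensions, and production vector databases typically rely on approximate indexes rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: invented for illustration only).
EMBEDDINGS = {
    "red running shoes": [0.9, 0.1, 0.0],
    "blue trail sneakers": [0.8, 0.2, 0.1],
    "garden hose": [0.0, 0.1, 0.9],
}


def most_similar(query_vector, embeddings, k=2):
    """Rank every stored item by similarity to the query and keep the top k."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# A query vector pointing in the "footwear" direction of this toy space.
results = most_similar([1.0, 0.0, 0.0], EMBEDDINGS)
print(results)
```

In this toy space the two footwear items sit close to the query and the garden hose sits far away, so a search for one kind of shoe surfaces the other even though the product names share no words.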
This is particularly useful for LLMs, such as OpenAI’s GPT-4, as it allows the AI chatbot to better understand the context of a conversation by analyzing previous similar conversations. Vector search is also useful for all manner of real-time applications, such as content recommendations in social networks or e-commerce apps, as it can look at what a user has searched for and retrieve similar items in a heartbeat.
Vector search can also help reduce “hallucinations” in LLM applications by supplying additional information that might not have been available in the original training dataset.
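One common way this plays out is to retrieve the most relevant stored passages and place them in the prompt ahead of the user's question. The sketch below is a rough illustration under stated assumptions: the two-dimensional vectors, the `build_prompt` helper, and the prompt wording are all hypothetical, and the step that would send the prompt to an LLM is left out.

```python
def dot(a, b):
    """Dot product, used here as a simple relevance score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical passages paired with toy 2-D embeddings (assumption: invented
# vectors; a real system would compute them with an embedding model).
PASSAGES = [
    ("Qdrant secured $28 million in funding in January.", [1.0, 0.0]),
    ("Marqo was formed in 2021.", [0.0, 1.0]),
]


def build_prompt(question, question_vector, passages, k=1):
    """Pick the k highest-scoring passages and prepend them as context."""
    ranked = sorted(
        passages,
        key=lambda passage: dot(question_vector, passage[1]),
        reverse=True,
    )
    context = "\n".join(f"- {text}" for text, _ in ranked[:k])
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "How much funding did Qdrant raise?",
    [0.9, 0.1],  # hypothetical embedding of the question
    PASSAGES,
)
print(prompt)
```

Because the model is handed the relevant passage directly, it has less reason to guess at a figure that was missing from its training data.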
“Without using vector similarity search, you can still develop AI/ML applications, but you would need to do more retraining and fine-tuning,” Andre Zayarni, CEO and co-founder of vector search startup Qdrant, explained to TechCrunch. “Vector databases come into play when there’s a large dataset, and you need a tool to work with vector embeddings in an efficient and convenient way.”
In January, Qdrant secured $28 million in funding to capitalize on growth that has led it to become one of the top 10 fastest growing commercial open source startups last year. And it’s far from the only vector database startup to raise cash of late — Vespa, Weaviate, Pinecone, and Chroma collectively raised $200 million last year for various vector offerings.
Since the turn of the year, we’ve also seen Index Ventures lead a $9.5 million seed round into Superlinked, a platform that transforms complex data into vector embeddings. And a few weeks back, Y Combinator (YC) unveiled its Winter ’24 cohort, which included Lantern, a startup that sells a hosted vector search engine for Postgres.
Elsewhere, Marqo raised a $4.4 million seed round late last year, swiftly followed by a $12.5 million Series A round in February. The Marqo platform provides a full gamut of vector tools out of the box, spanning vector generation, storage, and retrieval, allowing users to circumvent third-party tools from the likes of OpenAI or Hugging Face, and it offers everything via a single API.
Marqo co-founders Tom Hamer and Jesse N. Clark previously worked in engineering roles at Amazon, where they realized the “huge unmet need” for semantic, flexible searching across different modalities such as text and images. And that is when they jumped ship to form Marqo in 2021.
“Working with visual search and robotics at Amazon was when I really looked at vector search. I was thinking about new ways to do product discovery, and that very quickly converged on vector search,” Clark told TechCrunch. “In robotics, I was using multi-modal search to search through a lot of our images to identify if there were errant things like hoses and packages. This was otherwise going to be very challenging to solve.”