2021 was quite an exciting year in terms of funding for both data startups and established companies. We tracked more than a hundred data-related funding events during the year.
Several vendors raised more than once, most notably Airbyte, which went through three rounds ($5.2M, $26M and $150M). Databricks was the largest fundraiser, with a total of $2.6 billion across two rounds.
General data platforms and infrastructure
This category mostly includes data companies that do not belong to any of the more specific subcategories. Databricks leads with two mega-rounds. Honorable mentions include Dremio (data lakes), Anyscale (scaling Python with Ray) and Coiled (scaling Python with Dask).
- Databricks - $1.6B series H in August and $1B series G in February 2021
- Dremio - $135M series D in January 2021
- Anyscale - $100M series C in December 2021
- StreamNative - $23M series A in September 2021
- Treeverse - $23M series A in July 2021
- Coiled - $21M series A in June 2021
Databases and SQL engines
This is our largest category due to the many different flavours of databases and SQL and NoSQL engines. The graph database segment was quite active, with Neo4j and TigerGraph raising large rounds, followed by ArangoDB and Memgraph. Funding for analytic SQL engines and cloud DW platforms included ClickHouse, Firebolt, Starburst (Presto), Imply (Druid), StarTree (Apache Pinot) and Ahana (Presto). Other honorable mentions include Redis Labs (in-memory DB), distributed SQL vendors Yugabyte and Cockroach Labs, and time-series specialists Timescale and QuestDB.
- Neo4j - $325M series F in June 2021
- ClickHouse - $250M series B in October 2021 and $50M series A in September 2021
- Yugabyte - $188M series C in October 2021 and $48M series B in March 2021
- Cockroach Labs - $160M series E in January 2021
- Firebolt - $127M series B in June 2021
- Redis Labs - $110M series G in April 2021
- TigerGraph - $105M series C in February 2021
- Starburst - $100M series C in January 2021
- SingleStore - $80M series F in September 2021
- Imply - $70M series C in June 2021
- Timescale - $40M series B in May 2021
- ArangoDB - $27.8M series B in October 2021
- StarTree - $24M series A in May 2021
- Ahana - $20M series A in August 2021
- QuestDB - $12M series A in November 2021
- Memgraph - $9.3M seed in October 2021
Data integration, ETL and reverse ETL
Cloud-based ETL was all the rage, with Fivetran raising the largest round, followed by Matillion and Hevo Data. In the open source segment, Airbyte went from a $5.2M seed round in March to a $150M series B in December, all within ten months, and Meltano spun out of GitLab and raised a seed round. Reverse ETL vendors Hightouch and Census raised early rounds. Prefect and Elementl (the company behind Dagster) focus on data orchestration and workflow management.
- Fivetran - $565M series D in September 2021
- Airbyte - $5.2M seed in March 2021, $26M series A in May 2021, $150M series B in December 2021
- dbt Labs - $150M series C in June 2021
- Matillion - $100M series D in February 2021
- Hightouch - $12M series A in July 2021, $40M series B in November 2021
- Prefect - $11.5M series A in February 2021, $32M series B in June 2021
- Hevo Data - $30M series B in December 2021
- Census - $16M series A in February 2021
- Elementl - $14M series A in November 2021
- Meltano - $4.2M seed in June 2021
Data quality and observability
Data observability, data reliability and data quality vendors attracted a lot of attention in 2021. Monte Carlo and Bigeye each raised twice within the year, and several other startups raised their first rounds.
- Monte Carlo - $60M series C in August 2021 and $25M series B in February 2021
- Bigeye - $45M series B in September 2021 and $17M series A in April 2021
- AccelData - $35M series B in September 2021
- Anomalo - $33M series A in October 2021
- Datafold - $20M series A in November 2021
- Soda - $13M series A in February 2021
Data governance, metadata, data catalogs
Collibra and Alation, both well-established players in the data governance segment, raised significant rounds. There were also new entrants raising their first rounds of capital: Stemma works on Amundsen, which originated at Lyft, while Acryl Data focuses on the DataHub project, which came out of LinkedIn.
- Collibra - $250M series F in November 2021
- Alation - $110M series D in June 2021
- Atlan - $16M series A in May 2021
- Acryl Data - $9M seed in June 2021
- Stemma - $4.8M seed in June 2021
BI and data visualization
Grafana (operational dashboards), Jedox (financial planning and EPM) and ThoughtSpot (search-based BI) are well-known vendors in the space that raised significant amounts of additional capital. Up-and-coming startups Metabase and Preset (working on Apache Superset) focus on open source data visualization software. Noteable offers collaborative data visualization notebooks.
- Grafana - $220M series C in August 2021
- Jedox - $100M in January 2021
- ThoughtSpot - $100M series F in November 2021
- Preset - $35.9M series B in August 2021
- Metabase - $30M series B in August 2021
- Noteable - $21M series A in November 2021
Data science, ML, AI
This is a very broad and active category, so the list is just a sample from a few selected subcategories: general data science and ML platforms (Dataiku, DataRobot, H2O.ai), MLOps (Weights & Biases, Comet), data labeling and annotation (Scale AI, Snorkel, Sama) and synthetic data generation (Gretel, Tonic). Streamlit and Hex provide productivity tools for data scientists, and Iterative works on DVC.
- Dataiku - $400M series E in August 2021
- Scale AI - $325M series E in April 2021
- DataRobot - $300M series G in July 2021
- Weights & Biases - $135M series C in October 2021
- Snorkel - $85M series C in August 2021 and $35M series B in April 2021
- H2O.ai - $100M series E in November 2021
- Sama - $70M series B in November 2021
- Comet - $50M series B in November 2021
- Gretel - $50M series B in October 2021
- Streamlit - $35M series B in April 2021
- Tonic.ai - $35M series B in September 2021
- Hex - $5.5M seed in March 2021 and $16M series A in October 2021
- Iterative - $20M series A in July 2021
Errors or omissions? Please drop me an email (email@example.com) or reach out on LinkedIn.