
Fundraising by data companies in 2021

2021 was quite an exciting year in terms of funding for both data startups and established companies. We tracked more than a hundred data-related funding events during the year.

Several vendors raised more than once, most notably Airbyte, which went through three rounds ($5.2M, $26M and $150M). Databricks was the largest fundraiser, with a total of $2.6 billion across two rounds.



General data platforms and infrastructure
This category mostly includes data companies that do not belong to any of the more specific subcategories. Databricks leads with two mega-rounds; honorable mentions include Dremio (data lakes), Anyscale (scaling Python with Ray) and Coiled (scaling Python with Dask).

  • Databricks - $1.6B series H in August and $1B series G in February 2021
  • Dremio - $135M series D in January 2021
  • Anyscale - $100M series C in December 2021
  • StreamNative - $23M series A in September 2021
  • Treeverse - $23M series A in July 2021
  • Coiled - $21M series A in June 2021

Databases and SQL engines
This is our largest category due to the many different flavours of databases and SQL & NoSQL engines. The graph database segment was quite active, with Neo4j and TigerGraph raising large rounds, followed by ArangoDB and Memgraph. Funding for analytic SQL engines and cloud DW platforms included ClickHouse, Firebolt, Starburst (Presto), Imply (Druid), StarTree (Apache Pinot) and Ahana (Presto). Other honorable mentions include Redis Labs (in-memory DB), distributed SQL vendors Yugabyte and Cockroach Labs, and time-series specialists Timescale and QuestDB.

  • Neo4j - $325M series F in June 2021
  • ClickHouse - $250M series B in October 2021 and $50M series A in September 2021
  • Yugabyte - $188M series C in October 2021 and $48M series B in March 2021
  • Cockroach Labs - $160M series E in January 2021
  • Firebolt - $127M series B in June 2021
  • Redis Labs - $110M series G in April 2021
  • TigerGraph - $105M series C in February 2021
  • Starburst - $100M series C in January 2021
  • SingleStore - $80M series F in September 2021
  • Imply - $70M series C in June 2021
  • Timescale - $40M series B in May 2021
  • ArangoDB - $27.8M series B in October 2021
  • StarTree - $24M series A in May 2021
  • Ahana - $20M series A in August 2021
  • QuestDB - $12M series A in November 2021
  • Memgraph - $9.3M seed in October 2021

Data Integration, ETL and Reverse ETL

Cloud-based ETL was all the rage: Fivetran raised the largest round, followed by Matillion and Hevo Data. In the open source segment, Airbyte went from a $5.2M seed round in March to a $150M series B in December, all in ten months, and Meltano spun out of GitLab and raised a seed round. Reverse ETL vendors Hightouch and Census raised early rounds. Prefect and Elementl (the company behind Dagster) focus on data orchestration and workflow management.



  • Fivetran - $565M series D in September 2021
  • Airbyte - $5.2M seed in March 2021, $26M series A in May, $150M series B in December
  • dbt Labs - $150M series C in June 2021
  • Matillion - $100M series D in February 2021
  • Hightouch - $12M series A in July 2021, $40M series B in November 2021
  • Prefect - $11.5M series A in February 2021, $32M series B in June 2021
  • Hevo Data - $30M series B in December 2021
  • Census - $16M series A in February 2021
  • Elementl - $14M series A in November 2021
  • Meltano - $4.2M seed in June 2021

Data Quality and observability

Data observability, data reliability and data quality vendors attracted a lot of attention in 2021. Monte Carlo and Bigeye each raised twice within the year, and several other startups raised their first rounds.

  • Monte Carlo - $60M series C in August and $25M series B in February
  • Bigeye - $45M series B in September 2021 and $17M series A in April 2021
  • Acceldata - $35M series B in September 2021
  • Anomalo - $33M series A in October 2021
  • Datafold - $20M series A in November 2021
  • Soda - $13M series A in February 2021

Data governance, metadata, data catalogs

Collibra and Alation, both well-established players in the data governance segment, raised significant rounds. There were also new entrants raising their first rounds of capital: Stemma works on Amundsen, which originated at Lyft, while Acryl Data focuses on the DataHub project that came out of LinkedIn.

  • Collibra - $250M series F in November 2021
  • Alation - $110M series D in June 2021
  • Atlan - $16M series A in May 2021
  • Acryl Data - $9M seed in June 2021
  • Stemma - $4.8M seed in June 2021

BI and data visualization

Grafana (operational dashboards), Jedox (financial planning & EPM) and ThoughtSpot (search-based BI) are well-known vendors in the space that raised significant amounts of additional capital. Up-and-coming startups Metabase and Preset (working on Apache Superset) focus on open source data visualization software. Noteable offers collaborative data visualization notebooks.

  • Grafana - $220M series C in August 2021
  • Jedox - $100M in January 2021
  • ThoughtSpot - $100M series F in November 2021
  • Preset - $35.9M series B in August 2021
  • Metabase - $30M series B in August 2021
  • Noteable - $21M series A in November 2021

Data Science, ML, AI

This is a very broad and active category, so the list is just a sample from a few selected subcategories: general data science and ML platforms (Dataiku, DataRobot, H2O.ai), MLOps (Weights & Biases, Comet), data labeling and annotation (Scale AI, Snorkel, Sama) and synthetic data generation (Gretel, Tonic). Streamlit and Hex provide productivity tools for data scientists, and Iterative works on DVC.

  • Dataiku - $400M series E in August 2021
  • Scale AI - $325M series E in April 2021
  • DataRobot - $300M series G in July 2021
  • Weights & Biases - $135M series C in October 2021
  • Snorkel - $85M series C in August 2021 and $35M series B in April 2021
  • H2O.ai - $100M series E in November 2021
  • Sama - $70M series B in November 2021
  • Comet - $50M series B in November 2021
  • Gretel - $50M series B in October 2021
  • Streamlit - $35M series B in April 2021
  • Tonic.ai - $35M series B in September 2021
  • Hex - $5.5M seed in March 2021 and $16M series A in October
  • Iterative - $20M series A in July 2021


Errors or omissions? Please drop me an email (bence@adat.blog) or reach out on LinkedIn.

Online data conferences in March and April


Microsoft Ignite
March 2-4, 2021. Microsoft's annual conference for developers and IT professionals. Attendance is free but requires registration.

Reinforce Conference
March 3-5, 2021. Ericsson's international online AI conference, featuring numerous Hungarian speakers. Tickets start at 195 euros.

Women in Data Science CEE
March 8, 2021. Held on International Women's Day, this conference is organized by more than 60 women working in data science. Attendance is free.

Business Intelligence Conference
March 9, 2021. IIR Hungary's online BI conference. The fee is 99,000 HUF + VAT.

Great Lakes Data & Analytics Summit 2021
March 9-11, 2021. A conference organized by WIT, a US consulting firm. Attendance is free.

AI Accelerator Festival
March 16-19, 2021. The AI Accelerator Institute's online conference. The fee is $99.

DataOps Unleashed
March 17, 2021. The conference focuses on data-driven systems and also features AI/ML talks. Attendance is free.

Data Festival Online
March 24-25, 2021. BARC's online data conference. Attendance is free.

April 2021

Wrangle Summit
April 7-9, 2021. The first joint data conference of Google Cloud and Trifacta, focused on data engineering. Attendance is free.

Python for ML and AI GLOBAL SUMMIT'21
April 8-9, 2021. Geekle's Python-themed event on machine learning and AI. The talks can also be watched for free.

April 8-9, 2021. Graphic Hunters' online data visualization conference. The fee is 45 euros + 3.38 euros.

Informatica World 2021
April 13-15, 2021. Informatica's online data conference on AI, cloud and data. Attendance is free.

Budapest ML Fórum
April 15, 2021. A conference on the applications and technology of data science, machine learning and AI. Early bird tickets cost 28,000 HUF + VAT.

Online data conferences in January and February

January 2021

January 21, 2021. RStudio's online conference. Attendance is free.

Subsurface Live
January 27-28, 2021. Dremio's online cloud data conference. Attendance is free.

February 2021

February 3-4, 2021. MicroStrategy's first online conference. General admission tickets are free, but a Premium Pass is also available for purchase.

February 4-5, 2021. The Data Visualization Society's data visualization conference. Standard ticket prices range from $49 to $299, and free passes are available on request.

February 9-10, 2021. An event by Starburst Data, known for the Presto distributed SQL engine. Attendance is free.

AWS Innovate
February 24, 2021. A joint virtual conference by Amazon and Intel on AI & machine learning. Attendance is free.