Fundraising by data companies in 2021

2021 was quite an exciting year in terms of funding for both data startups and established companies. We tracked more than a hundred data-related funding events during the year.

Several vendors raised more than once, most notably Airbyte, which went through three rounds ($5.2M, $26M and $150M). Databricks was the largest fundraiser with a total of $2.6 billion in two rounds.



 

General data platforms and infrastructure
This category mostly includes data companies that do not belong to any of the more specific subcategories. Databricks leads with two mega-rounds. Honorable mentions include Dremio (data lakes), Anyscale (scaling Python with Ray) and Coiled (scaling Python with Dask).

  • Databricks - $1.6B series H in August and $1B series G in February 2021
  • Dremio - $135M series D in January 2021
  • Anyscale - $100M series C in December 2021
  • StreamNative - $23M series A in September 2021
  • Treeverse - $23M series A in July 2021
  • Coiled - $21M series A in June 2021

Databases and SQL engines
This is our largest category due to the many different flavours of databases and SQL and NoSQL engines. The graph database segment was quite active, with Neo4j and TigerGraph raising large rounds, followed by ArangoDB and Memgraph. Funding for analytic SQL engines and cloud data warehouse platforms went to ClickHouse, Firebolt, Starburst (Presto), Imply (Druid), StarTree (Apache Pinot) and Ahana (Presto). Other honorable mentions include Redis Labs (in-memory DB), distributed SQL vendors Yugabyte and Cockroach Labs, and time-series specialists Timescale and QuestDB.


  • Neo4j - $325M series F in June 2021
  • ClickHouse - $250M series B in October 2021 and $50M series A in September 2021
  • Yugabyte - $188M series C in October 2021 and $48M series B in March 2021
  • Cockroach Labs - $160M series E in January 2021
  • Firebolt - $127M series B in June 2021
  • Redis Labs - $110M series G in April 2021
  • TigerGraph - $105M series C in February 2021
  • Starburst - $100M series C in January 2021
  • SingleStore - $80M series F in September 2021
  • Imply - $70M series C in June 2021
  • Timescale - $40M series B in May 2021
  • ArangoDB - $27.8M series B in October 2021
  • StarTree - $24M series A in May 2021
  • Ahana - $20M series A in August 2021
  • QuestDB - $12M series A in November 2021
  • Memgraph - $9.3M seed in October 2021

Data Integration, ETL and Reverse ETL

Cloud-based ETL was all the rage: Fivetran raised the largest round, followed by Matillion and Hevo Data. In the open source segment, Airbyte went from a $5.2M seed round in March to a $150M series B in December, all within ten months, and Meltano spun out of GitLab, raising a seed round. Reverse ETL vendors Hightouch and Census raised early rounds. Prefect and Elementl (the company behind Dagster) focus on data orchestration and workflow management.

 

 

  • Fivetran - $565M series D in September 2021
  • Airbyte - $5.2M seed in March 2021, $26M series A in May, $150M series B in December
  • dbt Labs - $150M series C in June 2021
  • Matillion - $100M series D in February 2021
  • Hightouch - $12M series A in July 2021, $40M series B in November 2021
  • Prefect - $11.5M series A in February 2021, $32M series B in June 2021
  • Hevo Data - $30M series B in December 2021
  • Census - $16M series A in February 2021
  • Elementl - $14M series A in November 2021
  • Meltano - $4.2M seed in June 2021

Data Quality and observability

Data observability, data reliability and data quality vendors attracted a lot of attention in 2021. Monte Carlo and Bigeye each raised twice within the year, and several other startups raised their first rounds.

  • Monte Carlo - $60M series C in August and $25M series B in February
  • Bigeye - $45M series B in September 2021 and $17M series A in April 2021
  • AccelData - $35M series B in September 2021
  • Anomalo - $33M series A in October 2021
  • Datafold - $20M series A in November 2021
  • Soda - $13M series A in February 2021

Data governance, metadata, data catalogs

Collibra and Alation, both well-established players in the data governance segment, raised significant rounds. There were also new entrants raising their first rounds of capital: Stemma works on Amundsen, which originated at Lyft, while Acryl Data focuses on the DataHub project, which came out of LinkedIn.

  • Collibra - $250M series F in November 2021
  • Alation - $110M series D in June 2021
  • Atlan - $16M series A in May 2021
  • Acryl Data - $9M seed in June 2021
  • Stemma - $4.8M seed in June 2021

BI and data visualization

Grafana (operational dashboards), Jedox (financial planning and EPM) and ThoughtSpot (search-based BI) are well-known vendors in the space, and each raised a significant amount of additional capital. Up-and-coming startups Metabase and Preset (working on Apache Superset) focus on open source data visualization software. Noteable offers collaborative data visualization notebooks.

  • Grafana - $220M series C in August 2021
  • Jedox - $100M in January 2021
  • ThoughtSpot - $100M series F in November 2021
  • Preset - $35.9M series B in August 2021
  • Metabase - $30M series B in August 2021
  • Noteable - $21M series A in November 2021

Data Science, ML, AI

It's a very broad and active category, so the list is just a sample from a few selected subcategories: general data science and ML platforms (Dataiku, DataRobot, H2O.ai), MLOps (Weights & Biases, Comet), data labeling and annotation (Scale AI, Snorkel, Sama) and synthetic data generation (Gretel, Tonic). Streamlit and Hex provide productivity tools for data scientists, and Iterative works on DVC.

  • Dataiku - $400M series E in August 2021
  • Scale AI - $325M series E in April 2021
  • DataRobot - $300M series G in July 2021
  • Weights & Biases - $135M series C in October 2021
  • Snorkel - $85M series C in August 2021 and $35M series B in April 2021
  • H2O.ai - $100M series E in November 2021
  • Sama - $70M series B in November 2021
  • Comet - $50M series B in November 2021
  • Gretel - $50M series B in October 2021
  • Streamlit - $35M series B in April 2021
  • Tonic.ai - $35M series B in September 2021
  • Hex - $5.5M seed in March 2021 and $16M series A in October
  • Iterative - $20M series A in July 2021

 

Errors or omissions? Please drop me an email (bence@adat.blog) or reach out on LinkedIn.

Fundraising by data companies in 2021 – July and August

July and August were again quite active months, with 13 notable fundraising events. The list includes the $1.6 billion raise by Databricks and large rounds for Dataiku ($400M), DataRobot ($300M) and Grafana Labs ($220M).

It's interesting to see that the two major open source data visualization companies both raised funds during the period: Preset (creators of Apache Superset, $35.9M) and Metabase ($30M).

Two other events are worth mentioning: the launch of Voltron Data by Wes McKinney and Josh Patterson to further develop the Apache Arrow ecosystem, and the $200M Couchbase IPO.

 



Data Platforms and Query Engines:

BI/Dataviz:

ETL, Data Quality and Observability:

AI  and Machine Learning:

 

Fundraising by data companies in 2021 – Q1 & Q2

There was a huge influx of capital into the data technology sector in the first half of 2021. Databases, platforms, query engines, data quality and governance tools were all popular.

The orange color denotes the top 5 raises (Databricks, Neo4j, Cockroach Labs, dbt Labs (formerly Fishtown Analytics) and Dremio) and the small markers show the (rudimentary) category for each company.

The chart now contains 29 fundraising events with $2.83 billion in total funding, not including the $828M Confluent IPO and the $5.3 billion Cloudera buyout.

 

Data Platforms and Query Engines:

Databases:

Metadata and datacatalogs:

Data Quality and Observability:

ETL and workflow:

Other: 

Fundraising by data companies in 2021 from July

The Python Dataviz Landscape talk at EuroPython 2020

I gave a talk on the main Python data visualization libraries at the EuroPython 2020 Online conference.

The (slightly updated) presentation is here and the code examples are in this Google Colab notebook.

 

PyData Budapest #5 – Dataviz Evolution

We are running the second event in our PyData Evolution meetup series on June 30, covering the quickly changing Python data visualization landscape.

Meetup event page: www.meetup.com/PyData-Budapest/events/270866403

1) Philipp Rudiger: Holoviews/hvPlot & Panel

Materials

Useful links:

2) Nicolas Kruchten: Plotly Express & Dash

Materials

Useful links:

3) Maarten Breddels and Martin Renou: Voilà

Materials

More useful links: