New York-based startup Materialize on Monday unveiled a streaming, distributed database as a managed service, offering the software to existing customers prior to general availability.
The company launched the initial version of its namesake software two years ago as a single binary designed to ingest data from Kafka, letting users query and join streaming data with standard SQL.
Now the company—which was founded in 2019 and has raised about $100 million from investors such as Lightspeed, Kleiner Perkins and Redpoint—says it has incorporated a scalable storage layer into the software and is offering it on a database-as-a-service (DBaaS) model. The revamped software is available to current customers; the company has not yet announced a timeframe for general availability.
A distributed database is one that executes on multiple clusters in multiple data centers, yet acts as one logical database.
What is a streaming database?
A streaming database, according to Materialize, captures streamed data from different sources and runs compute to answer different queries.
The idea is that Materialize is making it easy for enterprise users to connect the database to a data stream or streams, said IDC research vice president Carl Olofson.
“Streaming database is a bit of a misnomer since the database itself doesn’t stream, but it executes quickly enough to be able to capture streaming data as it arrives,” Olofson said.
The announcement comes as enterprises analyze ever more data to chart strategies for resilience in the face of economic headwinds and geopolitical uncertainty. That has driven an increase in online analytical processing (OLAP) queries, which Materialize claims its database can support at lower cost than batch processing systems.
The reduction in cost is made possible by two computational frameworks within the database, said Seth Wiesman, director of field engineering at Materialize. These are Timely Dataflow, a framework for managing and executing data-parallel dataflow computations, and Differential Dataflow, another data-parallel programming framework, designed to efficiently process and respond to changes in large volumes of data.
Latency, and cost advantage over batch processing
Typically, to answer a query, a batch processing system runs through all the data that has been fed into it, which is expensive in terms of compute and makes the query less of a real-time process.
By contrast, Materialize, using its computational frameworks, can run a query (or “view” in database parlance), cache the result as a materialized view, detect any incremental change to the user’s data set, and update the query result without re-analyzing the entire data set, Wiesman explained.
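The difference between the two approaches can be illustrated with a minimal Python sketch. It is not Materialize's implementation and does not use Timely Dataflow or Differential Dataflow; the names `recompute_totals` and `IncrementalTotals` are hypothetical, chosen only to contrast a full rescan with applying individual changes to a cached result.

```python
from collections import defaultdict


def recompute_totals(rows):
    """Batch style: scan every row to rebuild the per-key totals."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalTotals:
    """Keeps per-key totals current by applying only the changes.

    Hypothetical illustration: each change carries a diff of +1
    (row inserted) or -1 (row deleted).
    """

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)

    def apply(self, key, amount, diff):
        # Work done here is proportional to the change, not the data set.
        self._sums[key] += diff * amount
        self._counts[key] += diff
        if self._counts[key] == 0:
            # No rows left for this key, so drop it from the result.
            del self._sums[key]
            del self._counts[key]

    def result(self):
        return dict(self._sums)


rows = [("a", 5), ("b", 3), ("a", 2)]
view = IncrementalTotals()
for key, amount in rows:
    view.apply(key, amount, +1)

# Deleting one row touches a single key instead of rescanning everything.
view.apply("b", 3, -1)
```

After the deletion, the incrementally maintained result matches what a full recomputation over the remaining rows would return, but only one key had to be touched to get there.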
As users create tables, sources, and materialized views, and introduce data to them, the DBaaS version of Materialize will record and maintain that data, and make both snapshots and update streams immediately available to all computers subscribing to the service, according to the company.
“Enterprise users may either query the results for fast, high-concurrency reads, or subscribe to changes for pure event-driven architectures,” said Wiesman.
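The two access patterns Wiesman describes, reading the current result or reacting to changes, can be sketched in a few lines of Python. This is an in-process analogy rather than Materialize's client interface; the `MaintainedView` class and its `query`, `subscribe` and `update` methods are hypothetical names used for illustration.

```python
class MaintainedView:
    """Hypothetical stand-in for a continuously maintained query result."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []

    def subscribe(self, callback):
        # Push model: the callback runs whenever a result row changes.
        self._subscribers.append(callback)

    def query(self, key):
        # Pull model: read the already-computed result, no recomputation.
        return self._rows.get(key)

    def update(self, key, value):
        old = self._rows.get(key)
        if old == value:
            return  # nothing changed, so nothing to publish
        self._rows[key] = value
        for callback in self._subscribers:
            callback(key, old, value)


events = []
view = MaintainedView()
view.subscribe(lambda key, old, new: events.append((key, old, new)))

view.update("open_orders", 10)
view.update("open_orders", 10)  # duplicate value, no event emitted
view.update("open_orders", 12)
```

A dashboard would call `query` on demand, while an event-driven service would register a callback once and act only when the result actually changes.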
The managed distributed database service, in its present iteration, uses Amazon Web Services (AWS) S3 for storage, the company said, adding that support for the native object stores of other major cloud providers is expected soon.
Support for PostgreSQL
Materialize’s interface, according to the company, is PostgreSQL-compatible and comes with full ANSI SQL support.
In contrast to generic data systems that need programming for data capture, Materialize’s DBaaS comes with a dataflow engine that requires little or no functional programming, the company said.
Enterprise users can model a SQL query as a dataflow that can take in a change data capture stream, apply a set of transformations to it, and then display the final results, it added.
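As a rough illustration of that idea, the Python sketch below treats a query equivalent to `SELECT region, COUNT(*) FROM orders WHERE status = 'paid' GROUP BY region` as a small dataflow over change events. The event layout (an `op` field with `before` and `after` row images), the `orders` columns and the function names are assumptions made for this example, not a format or API documented by Materialize.

```python
from collections import Counter


def cdc_to_diffs(event):
    """Turn one change event into (row, diff) pairs.

    Assumed event shape: {"op": ..., "before": row, "after": row}.
    """
    op = event["op"]
    if op == "insert":
        return [(event["after"], +1)]
    if op == "delete":
        return [(event["before"], -1)]
    if op == "update":
        # An update retracts the old row and adds the new one.
        return [(event["before"], -1), (event["after"], +1)]
    raise ValueError(f"unknown operation: {op}")


def run_dataflow(events):
    """Filter -> map -> count, applied change by change."""
    counts = Counter()
    for event in events:
        for row, diff in cdc_to_diffs(event):
            if row["status"] != "paid":  # filter: WHERE status = 'paid'
                continue
            region = row["region"]       # map: keep only the grouping key
            counts[region] += diff       # reduce: COUNT(*) GROUP BY region
            if counts[region] == 0:
                del counts[region]
    return dict(counts)


events = [
    {"op": "insert", "after": {"id": 1, "region": "eu", "status": "paid"}},
    {"op": "insert", "after": {"id": 2, "region": "us", "status": "pending"}},
    {"op": "update",
     "before": {"id": 2, "region": "us", "status": "pending"},
     "after": {"id": 2, "region": "us", "status": "paid"}},
    {"op": "delete", "before": {"id": 1, "region": "eu", "status": "paid"}},
]

paid_orders_by_region = run_dataflow(events)
```

Each change event flows through the same filter, map and count steps, so the final counts reflect every insert, update and delete without rereading the underlying table.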
Redis, the most common data system used for streaming data capture, puts the burden of programming on the enterprise user because it comes with no schema or query language, according to Olofson.
“There are two products to look at as potential competitors: SingleStore (which is a memory-optimized relational database used for streaming data capture among other things) and CockroachDB,” Olofson said, adding that Hazelcast can also be considered a rival because its in-memory data sharing platform has been adding query capabilities to its feature list.
Materialize said it follows the Snowflake pricing model: companies purchase credits to pay for the software on a usage basis. The price of credits is based on where users are located, Wiesman said.
Copyright © 2022 IDG Communications, Inc.