Real-time analytics is analytics run on your live operational database, so the results are current to the moment. ClustrixDB gives you real-time insight into your business and fast, up-to-date reports for both your business and your self-serve customers.
You might be an e-commerce company that wants to know which offers are increasing your revenue during the Black Friday sale.
Or you might be an ad company trying to tune your advertising strategies, where day-old results from data warehouses such as Hadoop put you at a competitive disadvantage.
Industry Trend: In-Memory Analytics
A wave of in-memory analytics products, such as SAP HANA and MemSQL, is hitting the market, claiming that putting all your data in memory is the way to go. Oracle has entered the fray as well with Oracle 12c.
Clustrix believes that in-memory computing gives you a great advantage. However, we realize that putting all your data into memory can be very expensive. Also, if all your data is in memory, you need a second, durable database and an ETL process to load the in-memory system. Engineers at Clustrix have noticed that most queries, even in demanding areas such as real-time bidding, take tens of milliseconds, while SSD reads take microseconds. That lets us keep the hot data in memory and the less frequently used data a few microseconds away on SSDs. This approach gives our customers the durability and speed they need at the right cost, allowing them to use a single database for data ingestion and analytics.
High Performance, Real-Time Analytics: Reality or Myth?
Why ClustrixDB Excels at Fast Real-Time Analytics
ClustrixDB is built with two key features that allow it to run fast real-time analytics on the database while ingesting massive volumes of data.
1. Massively Parallel Processing (MPP)
Clustrix brings the massively parallel processing used in data warehouses to the primary database. ClustrixDB uses multiple cores on a single node and multiple nodes in parallel to make your analytic queries and reports run faster. The more nodes you add, the faster your analytics get. ClustrixDB also distributes the processing of joins and aggregates. When a query over tables with billions of rows involves, for example, a 15-table join and a 6-way aggregate, pulling all the data to a single node is just not feasible. ClustrixDB evaluates joins on all nodes in parallel and performs partial, distributed aggregation of the data on each node. This capability minimizes data movement and maximizes parallel processing, allowing ClustrixDB to get faster as you add nodes and to scale as you run more analytic queries.
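To make the idea of partial distributed aggregation concrete, here is a minimal Python sketch. It is an illustration only, not ClustrixDB's implementation: the function names, the offer keys, and the sample rows are all hypothetical, and the "nodes" are simulated as plain lists processed one after another rather than in parallel.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Runs on one node: collapse local (key, value) rows into per-key (sum, count)."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {key: (s, c) for key, (s, c) in partial.items()}


def merge_partials(partials):
    """Runs once: combine the small per-node summaries into (sum, count, average)."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: (s, c, s / c) for key, (s, c) in merged.items()}


# Hypothetical data: each inner list stands in for the rows stored on one node.
node_rows = [
    [("offer_a", 10.0), ("offer_b", 5.0)],
    [("offer_a", 30.0)],
    [("offer_b", 15.0), ("offer_a", 20.0)],
]

# Each node summarizes its own rows; only the summaries are moved and merged.
partials = [partial_aggregate(rows) for rows in node_rows]
totals = merge_partials(partials)
```

The point of the sketch is the data movement: each node ships one small summary per key instead of every row, so the final merge stays cheap however many rows each node holds.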
2. Distributed Multi-Version Concurrency Control (MVCC)
ClustrixDB uses distributed multi-version concurrency control so that your reads and writes do not interfere with each other. Your reads and analytics see a consistent snapshot of the database as of the moment they start. Writes create newer versions of the data, so they do not have to wait for the reads and analytics to finish. This approach removes the interference between reads and writes, allowing both to scale. The data seen by analytic queries is always consistent, so your reports and analytics are reliable.
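The snapshot behavior described above can be sketched in a few lines of Python. This is a toy, single-process model under simplifying assumptions, not how ClustrixDB implements distributed MVCC: the `VersionedStore` class, its method names, and the integer clock are all hypothetical, and there is no concurrency, garbage collection of old versions, or distribution across nodes.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick one by snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter, assumed for illustration

    def write(self, key, value):
        """Append a new version instead of overwriting; never waits on readers."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return the timestamp a new reader should use for all of its reads."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("revenue", 100)

report_ts = store.snapshot()   # a long-running report starts here
store.write("revenue", 250)    # a later write proceeds without waiting

seen_by_report = store.read("revenue", report_ts)
seen_by_new_reader = store.read("revenue", store.snapshot())
```

In this sketch the report keeps reading the value that was current when it started, while a reader that starts afterward sees the newer version, which is the property that keeps reports consistent while writes continue.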
With the move from expensive and limited scale-up to scale-out, more resources such as processors, memory and storage can be added cheaply to a cluster, using commodity off-the-shelf hardware. Your primary operational database can now have the resources not just for simple reads, writes, and updates, but also for increasingly complex analytics.