PeekAnalytics Scales Social Media Analytics with ClustrixDB

PeekAnalytics Chooses Clustrix for Scaling Social Media Analytics

Most social media analytics companies focus on unique presentations of a platform’s own data. When it comes to Twitter, only PeekAnalytics can analyze your network and provide additional insights about your followers, or even about people tweeting specific types of content, beyond what Twitter itself can provide. PeekAnalytics can correlate age, gender, income level, hobbies and interests, job title, employment, residence, school, daily online usage, network of friends and followers, and much more to specific Twitter events and trends.

PeekAnalytics analyzes most of this information about millions of Twitter users individually, but it also aggregates the results, because customers need to see the bigger picture. The outcome is comprehensive reporting that partitions social audiences into demographic, firmographic, and psychographic segments.
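As a rough illustration of that roll-up from individual profiles to audience segments, here is a minimal Python sketch. It is not PeekAnalytics code: the profile fields, the sample values, the age-band edges, and the function names `age_band` and `segment_audience` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical follower profiles; field names and values are illustrative only.
profiles = [
    {"handle": "a", "age": 24, "gender": "F", "interests": ["music", "travel"]},
    {"handle": "b", "age": 37, "gender": "M", "interests": ["travel"]},
    {"handle": "c", "age": 41, "gender": "F", "interests": ["finance"]},
    {"handle": "d", "age": 29, "gender": "M", "interests": ["music"]},
]


def age_band(age):
    """Map an age to a coarse demographic band (band edges are assumed)."""
    if age < 25:
        return "under 25"
    if age < 35:
        return "25-34"
    if age < 45:
        return "35-44"
    return "45+"


def segment_audience(profiles):
    """Roll individual profiles up into counts per segment dimension."""
    report = defaultdict(Counter)
    for profile in profiles:
        report["age_band"][age_band(profile["age"])] += 1
        report["gender"][profile["gender"]] += 1
        for interest in profile["interests"]:
            report["interest"][interest] += 1
    # Convert to plain dicts so the report is easy to print or serialize.
    return {dimension: dict(counts) for dimension, counts in report.items()}


report = segment_audience(profiles)
for dimension, counts in report.items():
    print(dimension, counts)
```

Each profile is still examined on its own, and the counters provide the aggregate view. A production system would presumably run this kind of grouping inside the database rather than in application code.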

PeekAnalytics data is powered by proprietary technology developed over the past seven years. The data capture programs analyze billions of web pages, carefully matching the content of each to the identity of the person who created it or of the person it is about. For every Twitter account in your social audience, PeekAnalytics checks what is known about its author from up to 60 other social networks and every major blog hosting platform. In the aggregate, this yields the most well-rounded view of a person’s or firm’s social audience.

Why ClustrixDB?

Real-time data availability is key. PeekAnalytics constantly adds, updates, and reorganizes the information belonging to a particular profile, and that new information is made available immediately for querying. These profiles are constantly updated with information not only from Twitter, but also from up to sixty other social sites. Queries do not run against a static set of data that updates once per day, for example; that simply would not be acceptable to support the business.

Pavel Baranov, CTO at PeekAnalytics, said they chose ClustrixDB because:

  • Drop-in replacement for regular MySQL. It is really more like a MySQL Cluster replacement.
  • Ability to shard at the database level. This allowed us to scale beyond one server without rewriting code or adjusting the schema.
  • Ease of deployment and administration.
  • Personalized support.

Baranov states, “We tested multiple storage engines including MySQL Cluster, Cassandra, and some others – none of them worked for our needs/purposes. While MySQL Cluster claims to be production ready, we encountered numerous problems including nodes completely failing and going offline with 500+ concurrent connections. We submitted multiple bugs to MySQL two months ago – still no reply. Cassandra was a very strong candidate but its CQL v3 limitations eventually made us turn away from it. We kept ‘hacking’ around the limitations of the engine but in the end it was not worth the administration time that would’ve been added.”