In this overview and demo, we will show you what sentiment analysis is and how to build a quick mashup that combines real-time access to multiple data sources using tools from Microsoft.
Sentiment analysis is one of the hottest topics in the Big Data space. Sentiment analysis is the process of analyzing customer comments and feedback from Facebook, Twitter, Email, and more. The purpose of the analysis is to understand the overall sentiment the customer is trying to convey. This could be a negative sentiment, when the customer is unhappy with a company or its product. Neutral sentiment, when the customer is only mentioning a company or product, in passing, without a good or a bad feeling. The last is positive sentiment, when a customer is happy or excited about a company or its product.
Traditionally sentiment analysis was complicated because it required a mixture of very complex platforms and tools. Each component required for sentiment analysis was offered from a different company and required a large amount of custom work. The difficulty is further exasperated by hard-to- achieve business requirements. When we discuss sentiment analysis there are 3 key business requirements we see repeated:
- Real-time access
- Full granular data set (structured & unstructured)
- BI and SQL front-end
Real-time Access
In the case of real-time access, business users need access to fresh data. In the world of social media, customer sentiment can change rapidly. With images and videos quickly being posted with re-tweets and Facebook ‘like’ capabilities, a good or bad aspect of a company’s product can go viral in minutes. Business users need to have the ability to analyze data as it comes in, in real-time. We will show in our overview video and demo, how we can utilize Microsoft’s StreamInsight technology for real-time data analysis and complex-event processing.
Full Granular Data Set
In the case of full granular data, in practice we have seen that using a traditional database system can hinder development. This is because a lot of the data that comes in for sentiment analysis such as email, is in a semi-structured or unstructured format. This means the data is not easily modeled into a database. The data does not come in a simple row/column format. Thus we utilize our Big Data technology that is meant for this type of data: HDInsight (Hadoop). HDInsight is essentially Hortonworks Data Platform running on Windows. In our case we utilize HDInsight to land all of the data, in its raw original format, into the distributed file system HDFS. This allows us to ingest any kind of data, regardless of structure, and store that data online for further analysis at low cost. The Hadoop software is open-source and readily available.
BI and SQL Front-End
The most important area around delivering sentiment analysis to the business is access, making sure we are able to provide the data both in real-time (and high-fidelity) within the tools that our business users know and love. Previously when our customers were doing sentiment analysis on Hadoop systems, BI and SQL access was not available. This was not because the tools could not integrate with Hadoop systems. This was because they could not scale or have the same level of functionality. Some BI users have chosen Hive ODBC in Hadoop, which many claim to be slow and ‘buggy’. Instead here we utilize one of our flagship technologies: PolyBase. With PolyBase we expose the data in Hadoop, and relational SQL Server, with one T-SQL query. What this means is users can use BI tools like Excel, SSAS, or other 3rd party tools. They can then utilize PolyBase within Analytics Platform System (APS) to query that data either in Hadoop, or Parallel Data Warehouse (SQL Server), or mash up the data from both systems!
How It Works
Now we will show you how to use all of the tools from the SQL Server data platform to achieve sentiment analysis. This will allow you to quickly deploy and meet all 3 business requirements through a set of tools and platforms that are very easy to use, fully integrated, and ‘just work’ together.
Let’s get started with the first video (~5 minutes) where we present sentiment analysis using Microsoft technologies. We show you how sentiment analysis works, and how the Microsoft products fit. We then follow up by discussing the architecture in detail surrounding StreamInsight, HDInsight, and Analytics Platform System.
Watch the overview video:
Demo
In the second video (~7 minutes), we show you sentiment analysis in action. The demo will include a full sentiment-analysis engine running in real-time against Twitter data along with a web dashboard. We then stream Twitter data to both HDInsight and Parallel Data Warehouse. Finally, we end the demo by showcasing PolyBase, our flagship technology. With Polybase we can do data mashups combining data from relational and non-relational systems. We will use Polybase to write standard T-SQL queries against this data to determine tweet analytics and how social sentiment is fairing for our marketing campaigns and products.
Watch the demo video: