Improved Datamining using Azure Machine Learning #AzureML – Part 1 – Introduction
This is the English version of my German article on Microsoft Tech Hub: https://www.microsoft.com/germany/technet/case-studies/verbesserte-informationsbeschaffung-durch-azure-machine-learning.aspx
Sepago is a technology and consulting company located in Cologne, Hamburg and Munich that stands out for its expertise in the automation of cloud and application infrastructure. Sepago’s IT infrastructure, which serves more than 70 employees as well as external partners, is provided through Microsoft Azure and additional on-premises servers. These make desktops and applications available to notebooks, tablets and smartphones (Windows Phone, Android and Apple) via Citrix technologies. Besides the LoB applications, Office 365 and Visual Studio are the main applications.
To detect IT trends, Sepago uses an analysis procedure running in Azure that analyses specific Tweets. This project, called “Twends”, has already been described on the IT Pro hub (German): http://www.microsoft.com/germany/technet/case-studies/informationsbeschaffung-mithilfe-von-microsoft-azure.aspx. “Twends” is now to be extended so that it directly highlights and shows interesting/important tweets from the flow of information on Twitter.
This extension is meant to achieve the following goals:
- Development of a model for machine-based evaluation (AI = artificial intelligence)
- Automatic evaluation of incoming tweets with regard to “interestingness/importance”
- Provision of Tweets marked as particularly interesting/important for all user groups
- Reduction of information overflow
The biggest challenge is the evaluation of individual Tweets. A human can do this in a more or less unbiased way, but cannot process the flood of information in real time. Moreover, automated processing of the data is itself an important goal of the project.
Directly after it has been created, a Tweet consists only of information about the author and the text itself; information such as retweets and favorites is not yet available. For this reason the automation system must use AI (artificial intelligence) and make its classification decision based on text and author only. Luckily, Microsoft offers a solution for building AI systems with Azure Machine Learning (AzureML). AzureML provides administration, AI software and the corresponding compute resources as a cloud service.
The incoming tweets are passed to the AzureML model using a web job. If AzureML classifies a Tweet as “positive”, it gets displayed on the project web site.
The prediction model created in AzureML is trained on 30,000 records of classified tweets. Based on this data, it is able to classify new tweets.
The classification “positive” or “negative” is based on the count of favorites and retweets. “Knowing” what is positive, the prediction model then calculates the probability of a “positive” classification for new tweets, which serves as an indicator for an important/interesting Tweet.
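To make the labeling step more concrete, here is a minimal Python sketch of how historic tweets could be turned into training records. The article does not state the actual rule, so the cut-off `ENGAGEMENT_THRESHOLD`, the `HistoricTweet` type and the function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cut-off: the real rule used in the project is not stated.
ENGAGEMENT_THRESHOLD = 5


@dataclass
class HistoricTweet:
    """A tweet that is old enough to have collected retweets and favorites."""
    author: str
    text: str
    retweets: int
    favorites: int


def label_tweet(tweet: HistoricTweet, threshold: int = ENGAGEMENT_THRESHOLD) -> str:
    """Return "positive" if retweets plus favorites reach the threshold."""
    engagement = tweet.retweets + tweet.favorites
    return "positive" if engagement >= threshold else "negative"


def build_training_rows(tweets: List[HistoricTweet]) -> List[Tuple[str, str, str]]:
    """Keep only the features known at creation time (author, text) plus the label."""
    return [(t.author, t.text, label_tweet(t)) for t in tweets]
```

The counts are used only to derive the label. They are deliberately left out of the training features, because a freshly created Tweet does not have them yet.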
Only Tweets classified as positive are displayed on the web page of the project.
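The filtering step can be sketched in a few lines of Python. This is not the project's web job: the `score` callable stands in for the call to the AzureML model, and the `cutoff` of 0.5 is an assumed value that the article does not specify.

```python
from typing import Callable, Iterable, List, Tuple

Tweet = Tuple[str, str]  # (author, text): all that is known at creation time


def select_tweets_to_display(
    tweets: Iterable[Tweet],
    score: Callable[[str, str], float],
    cutoff: float = 0.5,  # assumed value, not taken from the article
) -> List[Tweet]:
    """Return the tweets whose "positive" probability reaches the cutoff.

    `score` is a placeholder for the prediction model: it takes author and
    text and returns a probability between 0 and 1.
    """
    selected = []
    for author, text in tweets:
        probability = score(author, text)
        if probability >= cutoff:
            selected.append((author, text))
    return selected
```

Passing the scorer in as a parameter keeps the filter independent of how the model is reached, so it can be tried out with a simple stand-in function before a real prediction service is connected.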