Improved Datamining using Azure Machine Learning #AzureML – Part 3 – Build the prediction model

Build the prediction model

Using “Set Up Web Service / Predictive Web Service (Recommended)”, AzureML creates the predictive experiment based on the training model.

“Web Service Input” and “Web Service Output” are also added to the predictive experiment in order to handle the input (Tweets) and the output (forecast).

Between “Score Model” and “Web Service Output”, the “Project Columns” module is inserted and configured so that only the classification ‘Yes’ or ‘No’ (the label) and the probability value (the score) appear in the prediction result. Without it, the result would contain all columns of the hash table as well as the input values.

The predictive experiment must be run by clicking “RUN” and then deployed using “DEPLOY WEB SERVICE”.

Test the Web service

A prediction can be started by clicking “Test” in the web GUI. The model expects the fields ID, FROM, DATE, RETWEETS, FAVORS and TWEET. That may be confusing, but it follows from the structure of the training model. Only the Tweeter (FROM) and the text (TWEET) are required; all other fields are irrelevant to the predictive experiment.

The result of the prediction will be shown at the bottom of the studio. For this test, the tweet is classified as not positive. The probability (Score) for the Tweet being positive is only 0.37%.

AzureML provides a Web API. Using this API, you can send Tweets to the experiment in order to classify them. Sample code is available in AzureML Studio (click on REQUEST/RESPONSE). The API key displayed there is used to authenticate the Web job against the Web service.
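As a rough illustration of such a call, the following C# sketch builds the JSON body and the authenticated request. The body layout (“Inputs” / “input1” / “ColumnNames” / “Values” / “GlobalParameters”) and the Bearer header follow the sample code that AzureML Studio generated for request/response services at the time; treat both as assumptions and compare them with your own REQUEST/RESPONSE page. The class name, the method names and the placeholder values for the unused columns are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class PredictionRequest
{
    // Column order as listed in the test dialog; assumed to match the service schema.
    static readonly string[] Columns = { "ID", "FROM", "DATE", "RETWEETS", "FAVORS", "TWEET" };

    // Builds the JSON body for one Tweet. Only FROM and TWEET carry real data;
    // the remaining columns get placeholder values (an assumption of this sketch).
    public static string BuildBody(string tweeter, string tweet)
    {
        var row = new[] { "0", tweeter, "", "0", "0", tweet };
        var payload = new
        {
            Inputs = new Dictionary<string, object>
            {
                ["input1"] = new { ColumnNames = Columns, Values = new[] { row } }
            },
            GlobalParameters = new Dictionary<string, string>()
        };
        return JsonSerializer.Serialize(payload);
    }

    // Wraps the body in a POST request that carries the API key as a Bearer token.
    public static HttpRequestMessage Build(Uri endpoint, string apiKey, string tweeter, string tweet)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(BuildBody(tweeter, tweet), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

The resulting message can be sent with `HttpClient.SendAsync`; the endpoint URI is the request URI shown next to the API key.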

Web job

The Web job is built with Visual Studio 2013/2015 as a C# console application. It runs every hour and sends all unclassified Tweets from the project “Twends” to the predictive experiment. The results are written to the “Twends” database.

 The Web job contains the following relevant methods for machine learning:

PredictLastTweets(Int64 Since, int Last)

Starts the prediction for all Tweets with ID > Since. At most “Last” Tweets are processed.

GetTweetFromDB(Int64 TweetID)

Extracts data out of a Tweet identified by TweetID (for example: Tweeter, date, text contained in the Tweet).

GetPrediction(string Tweeter, string Tweet)

Connects to the AzureML API and sends the name of the Tweeter and the text. The classification and its probability are returned. Sample code for C# is provided by Microsoft in AzureML Studio.

SetPredictionToDB(Int64 TweetID, sPrediction Prediction)

Stores the result of the prediction into the database.
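How the four methods fit together might be sketched as follows. This is not the original Web job: database and API access are passed in as delegates so the flow can be read (and run) in isolation, the type `TweetPrediction` is a hypothetical stand-in for `sPrediction`, and the choice to process the lowest matching IDs first is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the sPrediction type used by the Web job.
public struct TweetPrediction
{
    public string Label;
    public double Probability;
}

public static class WebJobSketch
{
    // Predicts at most `last` Tweets whose ID is greater than `since`
    // and returns the number of Tweets that were processed.
    public static int PredictLastTweets(
        long since,
        int last,
        IEnumerable<long> allTweetIds,
        Func<long, (string Tweeter, string Text)> getTweetFromDb,
        Func<string, string, TweetPrediction> getPrediction,
        Action<long, TweetPrediction> setPredictionToDb)
    {
        var ids = allTweetIds
            .Where(id => id > since)   // only Tweets newer than `since`
            .OrderBy(id => id)         // assumption: oldest unclassified first
            .Take(last)                // cap the batch size
            .ToList();

        foreach (var id in ids)
        {
            var tweet = getTweetFromDb(id);
            var prediction = getPrediction(tweet.Tweeter, tweet.Text);
            setPredictionToDb(id, prediction);
        }
        return ids.Count;
    }
}
```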

Note on using the AzureML API (in a non US/EN environment):

The transmission of data is secured via TLS. Since the data is transmitted in JSON format and AzureML expects and returns values in US notation, data has to be converted at various points if the application does not run under a US locale. For example, I have to use CultureInfo.InvariantCulture.NumberFormat when converting the returned probability (score).
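A minimal sketch of such a conversion is shown below. The class and method names are made up for this example; the only part taken from the text is the use of CultureInfo.InvariantCulture.NumberFormat.

```csharp
using System;
using System.Globalization;

public static class ScoreFormat
{
    // Parses a score such as "0.37" independently of the machine's locale.
    // NumberStyles.Float accepts a decimal point and an exponent, but no group separators.
    public static double ParseScore(string raw) =>
        double.Parse(raw, NumberStyles.Float, CultureInfo.InvariantCulture.NumberFormat);

    // Formats a number with "." as the decimal separator for use in the JSON request.
    public static string FormatNumber(double value) =>
        value.ToString(CultureInfo.InvariantCulture);
}
```

Without the explicit format provider, a thread running under a German locale can interpret the “.” in “0.37” as a group separator and return a wrong value without raising an error.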

Azure SQL database

The Azure SQL database is the central data storage for the project “Twends”.

To store the predictions of relevant Tweets, another table (Prediction) was added. It contains the prediction results of the analyzed Tweets.

The prediction result is associated with the unique ID of its Tweet. ScoredLabel is its classification and ScoredProbability is the probability that the result is “Yes” (a value < 0.5 results in “No”).
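The relationship between the two columns can be written down in a few lines. This sketch only mirrors the rule stated above; the service returns the label itself, so the Web job does not need to compute it. The names are made up, and treating exactly 0.5 as “Yes” is an assumption derived from “< 0.5 results in No”.

```csharp
public static class ScoredLabels
{
    // Decision threshold taken from the text: below 0.5 the label is "No".
    public const double Threshold = 0.5;

    public static string FromProbability(double scoredProbability) =>
        scoredProbability < Threshold ? "No" : "Yes";
}
```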

Azure Web App

The presentation tier was expanded by a new web page to display the positively rated Tweets.

The new web page “Prediction.aspx” was created using Visual Studio 2013/2015 and uses the following data controls to access the Azure SQL database:

  • SQLDataSource
  • DataList
  • Repeater

It displays the 30 most recent positively rated Tweets:

SELECT TOP 30 Prediction.ID, Tweets.[From], Tweets.[Date], Tweets.Tweet, Tweets.Retweets, Tweets.Favors, [ScoredLabel], [ScoredProbability] FROM [Prediction] LEFT JOIN [Tweets] ON Prediction.ID = Tweets.ID WHERE ScoredLabel = 'Yes' ORDER BY Tweets.ID DESC;
