Part 1. API data acquisition. Goal: Get Twitter data for weather search keyword
1. If you have not done so, register for a Twitter account.
2. Complete RapidMiner Twitter mining exercise: http://docs.rapidminer.com/studio/how-to/cloud-connectivity/twitter.html.
Make sure you are able to view search results.
Part 2. Iterate Twitter search
3. The main problem with this process is that the search runs only once. Let us modify the process so that it runs every minute. We will use the Loop operator for that.
For example, these parameters will run the loop 100,000 times, and the results of each search will be saved.
Double-click “Loop”. Now you will specify what should be done on each iteration of the loop.
Search Twitter – searches Twitter for the keyword you need.
Append – appends the results of the current search to the results of the previous searches.
Remove Duplicates – some results of the new search will be the same as in the previous search; this operator removes those duplicates. ID uniquely identifies a tweet, making it the best attribute for duplicate removal: all tweets with repeated IDs will be removed.
Write CSV – saves the results to a file.
Delay – stops execution for a preset number of milliseconds (60,000 ms = 1 minute).
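The loop contents above can be sketched in plain Python. This is only an analogue of the RapidMiner operators, not the lab process itself; the search function is a stub standing in for the Search Twitter operator, and the demo uses a zero delay (a real run would use 60,000 ms):

```python
import csv
import os
import tempfile
import time

def run_search_loop(search_fn, iterations, delay_ms, out_path):
    """Sketch of the loop contents: search, append to earlier results,
    remove duplicates by tweet id, write a CSV, then delay."""
    collected = {}  # id -> tweet; dict keys implement Remove Duplicates
    for _ in range(iterations):
        for tweet in search_fn():           # Search Twitter
            collected[tweet["id"]] = tweet  # Append + Remove Duplicates
        with open(out_path, "w", newline="") as f:  # Write CSV: save file
            writer = csv.DictWriter(f, fieldnames=["id", "text"])
            writer.writeheader()
            writer.writerows(collected.values())
        time.sleep(delay_ms / 1000)         # Delay between searches
    return list(collected.values())

# Demo with stubbed search results: consecutive searches overlap,
# and the duplicate tweet (id 2) is kept only once.
batches = iter([
    [{"id": 1, "text": "rainy"}, {"id": 2, "text": "sunny"}],
    [{"id": 2, "text": "sunny"}, {"id": 3, "text": "windy"}],
])
path = os.path.join(tempfile.gettempdir(), "weather_demo.csv")
results = run_search_loop(lambda: next(batches), iterations=2,
                          delay_ms=0, out_path=path)
print([t["id"] for t in results])  # -> [1, 2, 3]
```

Note how keeping tweets in a dictionary keyed by ID gives duplicate removal for free: writing the same ID twice simply overwrites the earlier copy.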
Make sure your code runs. Do not return results.
Part 3. Improve Twitter search code
The search program we just completed has multiple shortcomings:
1. Important search parameters are scattered across different operators. It will be hard to find them should you want to change your search;
2. If the search fails for some reason, the program will simply stop. You will be losing data until you notice it has stopped;
3. Data is written to the same file again and again, and the file becomes too large. We need to change the file name every few iterations.
Solution:
1. Use macros to keep the parameters you are likely to change frequently in one place;
2. Use exception handling to specify what to do if an error occurs;
3. Change the file name every few iterations.
Build the NESTED process for the Twitter search. We want to scrape from Twitter the tweets with hashtag #weather as they arrive. For that, every minute we will scrape the 1000 latest tweets with this hashtag. Note that 1 minute is far too short an interval; we are using it only for this lab. In a real-life search you would use such a short period only for a highly popular event – e.g., during a Gators game. Additionally, we want to ensure that the process will not crash in case of an error. Because of that, our process will have several nested levels.
Import the searchTwitter.xml process from today’s lab and follow the explanations:
Level 1: set macros and initiate loop
Macros:
- delay time between search calls (ms)
- keyword for the search
- where to write search results
- how often to change the file name (here, every 100 iterations)
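In plain code, macros correspond to keeping every tunable parameter in one place instead of scattering them across operators. A sketch of that idea (the names below are illustrative, not the exact macro names in the lab process):

```python
# One place for everything you are likely to tweak (RapidMiner macros).
# Macro names and values here are illustrative only.
MACROS = {
    "delay": 60000,          # ms to wait between search calls
    "keyword": "#weather",   # keyword for the search
    "outputFile": "tweets",  # base name for the result files
    "fileWriteN": "100",     # change the file name every 100 iterations;
                             # stored as text, hence parse() in the condition
}

print(int(MACROS["fileWriteN"]))  # -> 100
```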
Level 2: double-click Loop
The Handle Exception operator will contain all the functionality of the search. After it, we just delay until the next search.
Level 3: double-click Handle exception. The main functionality is hidden here.
Notice that there are two panes: Try and Catch.
The Try part is what you try to do; Catch is what should happen if Try fails. Normally the program would stop on an error – but we do not want that; we want to just skip this iteration and try again. That is why you simply connect In and Out in Catch.
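The Try/Catch behavior maps onto an ordinary try/except: if one iteration fails, report it and move on instead of stopping the whole process. A minimal sketch, with the search represented by an arbitrary callable:

```python
def safe_iteration(search_fn):
    """Handle Exception analogue: try the search; on any failure,
    skip this iteration instead of letting the whole process stop."""
    try:
        return search_fn()        # Try pane: the actual work
    except Exception as err:      # Catch pane: connect In to Out, i.e.
        print(f"Search failed ({err}); skipping this iteration.")
        return []                 # hand back an empty result and move on

# A successful call returns its results; a failing one returns [].
print(safe_iteration(lambda: ["#weather tweet"]))  # -> ['#weather tweet']
```

The key point is that the except branch returns something (an empty result) rather than re-raising, exactly like connecting In to Out in the Catch pane.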
Try part:
Branch controls when results are saved. It uses the following condition:
mod(parse(%{iteration}),parse(%{fileWriteN}))!=0 - We will look into it later.
You know Search Twitter already.
Append will append the new results to the old ones – but after you switch to writing to a new file, the old results will have to be thrown away. This is done inside Branch.
Remove Duplicates will remove duplicated results.
Level 4. Double-click Branch
This is a very simple operator. If the condition is true, the input is simply connected to the output; this channels the results of the previous search to the Append operator.
If the condition is false – notice that there are no connectors inside – nothing is returned. There is, however, a Generate Macro operator. It changes the name of the output file by adding the iteration number to it, plus the “.csv” extension; the data is then written to this file. After that the data is discarded: once the data has been saved to a file, nothing goes to the Append operator on the previous level.
Let us return to the condition of branch.
mod(parse(%{iteration}),parse(%{fileWriteN}))!=0
mod is the function that returns the remainder of dividing two numbers. Here those numbers are the iteration number and the macro we set at the beginning that tells how often we want to change the file name. Those macros’ data type is text; parse converts them to a numerical type.
!= means “not equal”
That is, mod(…) equals 0 whenever the iteration number is divisible by fileWriteN; the whole condition is then false, and the false branch (which writes the file) executes. Otherwise the condition is true and the true branch executes.
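The same branch logic in plain Python, with illustrative function names (in RapidMiner the macros are text, hence parse(); in Python the values are already integers):

```python
def output_filename(base, iteration):
    """Generate Macro analogue: append the iteration number and '.csv'."""
    return f"{base}{iteration}.csv"

def keep_appending(iteration, file_write_n):
    """Branch condition mod(parse(%{iteration}), parse(%{fileWriteN})) != 0.
    True  -> keep accumulating (pass results on to Append);
    False -> remainder is 0, so write a fresh file and start over."""
    return iteration % file_write_n != 0

print(keep_appending(99, 100))         # -> True  (keep appending)
print(keep_appending(100, 100))        # -> False (rotate the file)
print(output_filename("tweets", 100))  # -> tweets100.csv
```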
Modify the process to search for #Gators hashtag. Run the process. Return the first 10 lines of Twitter search results.
Part 4. Sentiment analysis with Lexicon approach.
Make sure you have the Text Processing extension; if not, download it from the RapidMiner site.
You will notice two sentiment analysis operators in different toolboxes: one in /Operator Toolbox/Text Processing, another in /Text Processing. These toolboxes are different. One apparent difference is that the first Extract Sentiment operator works with examples (the data format you have been working with so far), while the second works with documents. Think of documents as examples with text. A special operator, Data to Documents, is required to transform examples into documents.