In class, you have learned about distributed representations, which encode each word as a vector in a multi-dimensional Euclidean vector space. Answer the following questions:
(a) A naive encoding uses a |V|-dimensional representation for each word, where V is the vocabulary. As shown in the lecture, you can construct such representations by counting either co-existence (binary 0/1) or co-occurrence (frequency) with other words in the data. Explain the benefits and limitations of these approaches.
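For concreteness, the two count-based representations in (a) can be built in a few lines. This is a minimal sketch, assuming a symmetric context window of size 2 and a made-up toy corpus; both are illustrative choices, not part of the assignment data.

```python
from collections import Counter, defaultdict

# Toy corpus and window size (illustrative assumptions only).
corpus = [["the", "man", "runs"], ["the", "woman", "runs"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
cooccur = defaultdict(Counter)  # raw co-occurrence counts
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooccur[w][sent[j]] += 1

# |V|-dimensional vectors: co-occurrence (frequency) vs. co-existence (0/1).
freq_vec = {w: [cooccur[w][u] for u in vocab] for w in vocab}
bin_vec = {w: [1 if cooccur[w][u] > 0 else 0 for u in vocab] for w in vocab}
print(vocab)
print(freq_vec["man"], bin_vec["man"])
```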
(b) Suppose another encoding gives you v_man = (0.3, 0.1, 0.4) and v_woman = (0.3, 0.1, −0.6). Compute the cosine similarity between man and woman as learned in class. Interpret some of the three dimensions by comparing the lexical semantics of these two words.
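For reference, the computation in (b) can be checked numerically. This is a minimal NumPy sketch assuming the standard definition cos(u, v) = (u · v) / (‖u‖ ‖v‖) from class.

```python
import numpy as np

v_man = np.array([0.3, 0.1, 0.4])
v_woman = np.array([0.3, 0.1, -0.6])

# cos(u, v) = (u . v) / (||u|| * ||v||)
cos = v_man @ v_woman / (np.linalg.norm(v_man) * np.linalg.norm(v_woman))
print(round(cos, 3))
```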
(c) Recall the two word vectors given in (b). Assuming v_boy = (−0.7, −0.9, 0.3), guess the best word vector for v_girl. Come up with vector operations that evaluate v_girl in terms of addition/subtraction(s) of v_boy, v_man, and v_woman, thereby justifying your guess.
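One common way to sanity-check analogy-style vector arithmetic (the king − man + woman pattern from the word2vec literature) is to combine the given vectors and compare candidates by cosine similarity. The sketch below only illustrates the mechanics; the particular combination shown is an example, not a claim about the intended answer.

```python
import numpy as np

v_man = np.array([0.3, 0.1, 0.4])
v_woman = np.array([0.3, 0.1, -0.6])
v_boy = np.array([-0.7, -0.9, 0.3])

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# One candidate combination in the style of word2vec analogies (illustrative only).
candidate = v_boy - v_man + v_woman
print(candidate, cos(candidate, v_boy))
```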
Word2vec is the most popular word-embedding method and has driven innovations across various applications in Natural Language Processing. The following questions test your basic understanding of word2vec's theoretical foundations.
(d) Given a target word t ∈ V and a context word c ∈ V, the skip-gram model defines the conditional probability p(c|t) by the following formula:

p(c | t; u_c, v_t) = exp(u_c · v_t) / Σ_{c′∈V} exp(u_{c′} · v_t)

where v_t is the target word vector and u_c is the context word vector.
Explain why the exponential function is necessary and how this formula properly converts relationships in the vector space into a probability distribution.
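As a quick illustration of the normalization in (d), here is a minimal NumPy sketch of a softmax over the dot products u_{c′} · v_t; the vocabulary size, dimensionality, and vectors are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                     # toy vocabulary size and embedding dimension
U = rng.normal(size=(V, d))     # context ("output") vectors u_c
v_t = rng.normal(size=d)        # target ("input") vector v_t

scores = U @ v_t                # dot products can be any real number
p = np.exp(scores) / np.exp(scores).sum()  # exp makes them positive; normalization makes them sum to 1
print(p, p.sum())               # a valid probability distribution over the vocabulary
```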
(e) In class, we derived the partial derivative of the log-likelihood version of the above equation with respect to v_t. Evaluate the partial derivative with respect to u_c; in other words, compute ∂ log p(c|t; u_c, v_t) / ∂u_c. Then explain how to learn word-vector embeddings.
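Whatever closed form you derive for (e), a finite-difference check against the log-probability is a useful sanity test. The sketch below assumes the softmax form written in (d) and ends with a toy gradient-ascent step; the sizes, vectors, and learning rate are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, c = 5, 3, 2               # toy sizes and an arbitrary context-word index
U = rng.normal(size=(V, d))     # context vectors u_c
v_t = rng.normal(size=d)        # target vector v_t

def log_p(U, v_t, c):
    scores = U @ v_t
    return scores[c] - np.log(np.exp(scores).sum())

# Numerical gradient of log p(c|t) with respect to u_c via central differences.
eps, num_grad = 1e-6, np.zeros(d)
for k in range(d):
    U_plus, U_minus = U.copy(), U.copy()
    U_plus[c, k] += eps
    U_minus[c, k] -= eps
    num_grad[k] = (log_p(U_plus, v_t, c) - log_p(U_minus, v_t, c)) / (2 * eps)

# One gradient-ascent step on u_c (learning rate chosen arbitrarily).
U[c] += 0.1 * num_grad
print(num_grad)
```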
Problem 2: Part-Of-Speech Tagging and Parsing [25 points]
POS tagging generally requires a supervised dataset. Each example in the data consists of a sentence instance and its true label: the tags that mark the true POS of each word. Once a model has learned from the data, it can predict the most likely POS tags of each word in unseen examples. The Penn Treebank is the most popular supervised dataset.
(a) Machine learning models predict the labels of instances well if those instances were already seen during training. Construct a simple sentence that is part of the supervised dataset but whose POS tags the learned tagger could still predict incorrectly when tested on that same sentence.
(b) Every occurrence of the word to in the Penn Treebank is tagged simply as TO rather than with a more precise POS. Explain the potential problems by making your own examples that include to. If you wanted to build a more elaborate POS tagger that can distinguish the different syntactic roles of to in your examples, what could you do?
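To see the behavior described in (b) in practice, a Penn-Treebank-style tagger such as NLTK's default tagger can be run on two sentences where to plays different roles. The sentences below are made up; the required NLTK data downloads are noted in a comment.

```python
import nltk

# May require NLTK data, e.g. nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") (resource names vary by NLTK version).
infinitival = "I want to run the marathon."
prepositional = "She walked to the station."

for sent in (infinitival, prepositional):
    tokens = nltk.word_tokenize(sent)
    print(nltk.pos_tag(tokens))  # under the Penn tagset, both uses of "to" are tagged TO
```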
Answer the following questions about parsing, given the 9 rules: 1) S → NP VP; 2) S → VP; 3) NP → Det NP; 4) NP → Proper-Noun Noun; 5) VP → Verb NP; 6) Det → the; 7) Noun → run | marathon; 8) Verb → run; 9) Proper-Noun → Chicago.
(c) Show a possible bottom-up parse of the sentence: “Run the Chicago marathon”.
(d) Show a possible top-down parse of the sentence: “Run the Chicago marathon”.
(e) If you draw only the parse tree of the sample sentence used in (c) and (d), can you tell which derivation approach you used, top-down or bottom-up?
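As a reference point for (c) and (d), the same grammar can be encoded in NLTK and run through a bottom-up chart parser and a top-down (recursive-descent) parser. In this sketch the hyphen in Proper-Noun is dropped purely to keep the grammar string simple, and the first token is lowercased to match rule 8.

```python
from nltk import CFG
from nltk.parse import BottomUpChartParser, RecursiveDescentParser

# Rules 1-9 from above, with Proper-Noun written as ProperNoun for simplicity.
grammar = CFG.fromstring("""
S -> NP VP | VP
NP -> Det NP | ProperNoun Noun
VP -> Verb NP
Det -> 'the'
Noun -> 'run' | 'marathon'
Verb -> 'run'
ProperNoun -> 'Chicago'
""")

tokens = ["run", "the", "Chicago", "marathon"]  # "Run" lowercased to match the lexicon

for tree in BottomUpChartParser(grammar).parse(tokens):      # bottom-up, as in (c)
    print(tree)
for tree in RecursiveDescentParser(grammar).parse(tokens):   # top-down, as in (d)
    print(tree)
```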
Problem 3: Programming Project [100+20 points]
Word Sense Disambiguation (WSD) is the task of finding the correct meaning of a word given its context, and it can serve as a building block for various higher-level NLP tasks. Since many words have more than a single meaning, humans perform WSD using various verbal and non-verbal signals. In this problem, you are going to implement a WSD system using two different models: an ontological model and a supervised model. To start, read the English Lexical Sample Task paper by Mihalcea, Chklovski and Kilgarriff at the following link.1
The data files are lightly preprocessed for the class project. They consist of training, validation, and test data, provided with an XML-formatted dictionary that describes the commonly used senses of each word. Every lexical element in the dictionary contains multiple sense items, with one integer id assigned per sense.
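As a starting point for reading the sense dictionary, here is a minimal ElementTree sketch. The file name dictionary.xml and the element/attribute names lexelt, item, sense, and id are assumptions based on the Senseval lexical-sample format and should be adjusted to match the provided class files.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <lexelt item="word.pos"> <sense id="1" .../> ... </lexelt>
# Adjust the file name and tag/attribute names to the actual dictionary.
root = ET.parse("dictionary.xml").getroot()

senses = {}
for lexelt in root.iter("lexelt"):
    word = lexelt.get("item")
    senses[word] = [int(sense.get("id")) for sense in lexelt.iter("sense")]

print(len(senses))  # number of lexical elements found
```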