Second-language (L2) speakers struggle to learn the conventional ways of language use in their second language. Among the hardest linguistic properties to learn in an L2 is the proper use of prepositions, such as for, to, and in. In this project, you will model the knowledge of prepositions as a k-nearest-neighbors (KNN) classification problem. The goal of the project is to compare three language models: English as a native language (English-L1), Mandarin as a native language (Mandarin-L1), and Mandarin as a native language with English as a second language (Mandarin-English-L2). The three models will be trained on data that represents their learning scenario and will be compared on their ability to choose the correct preposition for an English sentence.
To do so, you will implement the following steps:
1. Pre-process training sentences.
2. Pre-process testing sentences.
3. Extract and combine semantic vector representations for the training and testing data.
4. Train a KNN model for three language models using the training sentences (English-L1, Mandarin-L1, and Mandarin-English-L2).
5. Test the model by predicting the correct preposition for a set of test sentences.
6. Report your results using statistical analysis methods.
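Steps 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: it assumes that word vectors are combined into a sentence vector by averaging (only one possible way to combine them), that neighbors are ranked by cosine similarity, and that the prediction is a majority vote. The function names and the tiny toy_embeddings dictionary are hypothetical; the dictionary merely stands in for the provided Word2Vec embeddings.

```python
import math
from collections import Counter

def sentence_vector(words, embeddings):
    """Average the vectors of the words that have an embedding (assumed combination rule)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None  # no known word in this sentence
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_predict(query, training, k=3):
    """training is a list of (vector, preposition) pairs.
    Return the most frequent preposition among the k most similar vectors."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "sit": [1.0, 0.1], "chair": [0.9, 0.2], "table": [0.8, 0.3],
    "go": [0.1, 1.0], "school": [0.2, 0.9], "store": [0.3, 0.8],
}
training = [
    (sentence_vector(["sit", "chair"], toy_embeddings), "on"),
    (sentence_vector(["sit", "table"], toy_embeddings), "on"),
    (sentence_vector(["go", "school"], toy_embeddings), "to"),
    (sentence_vector(["go", "store"], toy_embeddings), "to"),
]
query = sentence_vector(["chair", "table"], toy_embeddings)
prediction = knn_predict(query, training, k=3)
print(prediction)
```

In this sketch, each of the three language models would differ only in the training list it is given, so the same knn_predict function could be reused for English-L1, Mandarin-L1, and Mandarin-English-L2.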
You will be provided with four files:
1. Sentences for English training.
2. Sentences for Mandarin training (Mandarin data will be provided as English text, already translated).
3. Word2Vec embeddings to represent the words and sentences in the data.
4. Test data: multiple-choice questions with the correct choice and the choice probability based on a human-subject study.
The data for the project was obtained through a research agreement. It is protected by privacy agreements and cannot later be used for industrial or personal purposes. You can use any code written in the course to complete the project. You are allowed to use any code freely available through web search to assist you with any of the steps.
The training data is taken from the CHILDES database (MacWhinney & Snow, 1990). CHILDES includes child-directed speech recorded, transcribed, and annotated using data from caregivers, researchers, and children interacting in naturalistic settings. The training data has the form presented in Table 1.
In both language types, each sentence is written on a single line. Sentences are separated by an empty line. Words in a sentence are separated by a space, or a ‘ ’. The words in the sentence are given with their part-of-speech and their lemmatized form. The part-of-speech indicates the syntactic role of the word in the sentence. The lemmatized form removes any inflections from the word to get a canonical form, e.g. saw, seen, and seeing would all be lemmatized into see. The part-of-speech is separated from the word using a delimiter. The word may contain lexical information separated by ‘&’ or ‘-’, and in Mandarin, the translation separated by a ‘=’. For example, n toe PL stands for toes, v see&PAST stands for saw, and v zhao4xiang4=take a picture stands for take a picture.
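The token format described above can be parsed along the following lines. This is only a sketch: the delimiter between the part-of-speech and the word is not shown in this text, so the code assumes a ‘|’ character (a common convention in CHILDES annotations) and exposes it as a parameter that you should set to whatever the provided files actually use. The function name parse_token and the underscore-joined translation in the usage example are hypothetical.

```python
def parse_token(token, pos_delimiter="|"):
    """Split one annotated token into (part_of_speech, lemma).

    pos_delimiter is an ASSUMPTION; check the provided data files.
    """
    pos, _, rest = token.partition(pos_delimiter)
    # In Mandarin data, keep the English translation that follows '='.
    if "=" in rest:
        rest = rest.split("=", 1)[1]
    # Drop lexical information that follows '&' or '-'.
    for marker in ("&", "-"):
        rest = rest.split(marker, 1)[0]
    return pos, rest

print(parse_token("n|toe-PL"))
print(parse_token("v|see&PAST"))
print(parse_token("v|zhao4xiang4=take_a_picture"))
```

Under these assumptions, the three calls yield the lemmas toe, see, and take_a_picture, each paired with its part-of-speech tag.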
Each sentence represents one input item for a specific preposition. For training, you will need to identify the preposition represented by the sentence and extract all the relevant words. Notice that a sentence can include more than one preposition. You can identify the prepositions because they start with the ‘prep’ tag. You will need to use each sentence as a training example for all the prepositions in it. The cleaning process includes:
1. Identify all prepositions in the sentence.
2. Extract the individual words from the sentence using the delimiters ‘ ‘, and ‘ ’.
3. From the list of extracted words, remove words with any of the following parts-of-speech: ‘pro:rel’, ‘co’, ‘det:art’, ‘det:poss’, ‘neg’, ‘aux’, ‘mod’, ‘cop’, ‘cl’, and ‘cm’ (think about why these tags are not helpful in training).
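The cleaning steps above can be sketched as follows. This is a rough illustration under several assumptions: the part-of-speech is assumed to be separated from the word by ‘|’ (adjust pos_delimiter to match the provided files), words are split on whitespace only (the second word delimiter from step 2 would need to be added), and prepositions are kept out of the context words, which is a design choice rather than a stated requirement. The function names and the example line are hypothetical.

```python
EXCLUDED_POS = {"pro:rel", "co", "det:art", "det:poss", "neg",
                "aux", "mod", "cop", "cl", "cm"}

def clean_sentence(line, pos_delimiter="|"):
    """Return (prepositions, context_words) for one annotated sentence line."""
    prepositions, context = [], []
    for token in line.split():
        pos, _, word = token.partition(pos_delimiter)
        if not word:
            continue  # token without a POS annotation, such as punctuation
        if pos.startswith("prep"):
            prepositions.append(word)
        elif pos not in EXCLUDED_POS:
            context.append(word)
    return prepositions, context

def training_examples(line):
    """Use the sentence as one training example per preposition it contains."""
    prepositions, context = clean_sentence(line)
    return [(context, prep) for prep in prepositions]

# Invented example line, not taken from the project data.
line = ("pro:sub|I v|put det:art|the n|book prep|on "
        "det:art|the n|table prep|in det:art|the n|room")
for words, label in training_examples(line):
    print(label, words)
```

With the invented line, the sentence produces two training examples, one labeled on and one labeled in, and both share the same context words after the articles are removed.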