CSCI 544 — Applied Natural Language Processing

Final paper: NLP prototype design proposal

Updates

[2020-11-21] Due date extended by two days to November 26.
[2020-11-09] Added some notes about the assignment.

Due: November 26, 2020 (extended from November 24)

Every student will receive a personalized submission link through Crowdmark. Do not share the link with others: it is linked to your email. The completed assignments will be accepted only through the online system.

Overview

The prototype design proposal is an in-depth activity in which students describe the design for a future natural language processing application. The proposal can be on any aspect of natural language processing, for any human language. You will formulate a research question, identify potential resources and methods to address the question, and describe a procedure for evaluating the prototype.

The design proposal must be a new effort, conducted specifically for this class. You may build on and extend previous research, but this proposal needs to add to that research, not just reuse old material.
The design proposal is an individual assignment. You may not work in teams, or collaborate with other students or authors. You must be the sole author of 100% of the work you turn in.

Proposal structure

The proposal should be a document of about 1500 words, written in English in good academic style. Proposals that substantially exceed this length (above 1600 words) will be penalized. The structure of the document should be as follows.

Title for the proposal.
Name, USC ID, and USC email of the author.
Introduction. Motivate a specific problem that you will consider, which involves the use of real-world human language data. Describe the problem you are trying to solve, why it is interesting or challenging, and what applications it might have. If possible, explain the linguistic insights that the proposal is trying to capture and make use of. Note: There is no need to motivate Natural Language Processing in general, but rather your specific application.
Method.
- Materials. Identify a potential source of data that you may use, such as a specific corpus that you can get access to or collect yourself. Describe the data in some detail, including the source, the amount of data, what kinds of annotation it has or needs, and what effort it might take to obtain the data and adapt it for the purpose of the prototype.
- Procedure. Describe the experimental procedure, that is, what models and methods can be used to process the data. This may include algorithms, features, and tools, and any required annotations. Well-known methods (such as Naive Bayes or Convolutional Neural Networks) do not need to be explained, but you do need to explain how you use them, for example the features you choose.
- Evaluation. Describe a process for evaluating the system’s performance, and the annotation procedure (if needed). What measures can be used? What baseline can the system be compared to?
Discussion. Discuss implications of your design: how it might compare to other possible solutions (either from the literature or an alternative design), what advantages your proposed design has, what shortcomings, and how these affect potential use.
References cited. There is no need to cite sources for well-known methods. You should cite sources from which you borrow specific ideas, but the focus of the paper should be your original design, not a literature survey (if your design does not use specific ideas from existing papers, you may not need to cite any references).
Word count for the document, excluding references.
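
As an illustration of the Evaluation item above, the following is a minimal sketch of comparing a system against a baseline, assuming a classification-style task. The labels are made up for the example, and the function names (accuracy, macro_f1, majority_baseline) are hypothetical; a proposal may well call for different measures and a different baseline.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every one of n test items."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


# Toy, made-up labels (assumption: a binary sentiment-style task).
train_labels = ["neg", "neg", "neg", "pos"]
gold = ["pos", "neg", "neg", "pos", "neg"]
system = ["pos", "neg", "pos", "pos", "neg"]
baseline = majority_baseline(train_labels, len(gold))

for name, pred in [("system", system), ("baseline", baseline)]:
    print(name, accuracy(gold, pred), macro_f1(gold, pred))
```

Reporting the system next to a simple baseline on the same test items shows how much of the score comes from the proposed design rather than from the label distribution.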

If you need to give examples of text in languages other than English, please use the following multi-line format, to make the examples readable to English speakers. Below is an example for how to present a sentence in Hindi.

किस	ने	दवाई	को	खरीदा	(the original text in its native script)
kis	ne	davaaii	ko	khariidaa	(a transcription into Latin script)
who	ERG	medicine	ACC	bought	(a word-by-word gloss)
‘Who bought the medicine?’					(a translation into English)

The explanations on the right (in parentheses) are part of the instructions: they do not need to be repeated with the example. The second line (transcription into Latin script) is not needed if the language natively uses a version of the Latin script.
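
If you have many examples, the layout above can be produced programmatically. The following is a minimal sketch, assuming each example is stored as word lists of equal length; the function name format_gloss is hypothetical and not part of any required tooling.

```python
def format_gloss(original, gloss, translation, transcription=None):
    """Build a multi-line example: original, optional transcription, gloss, translation.

    original, gloss, and transcription are lists of words; words on each line
    are separated by tabs so that they line up word by word.
    """
    lines = [original]
    if transcription is not None:
        # Skip this line for languages that natively use a Latin script.
        lines.append(transcription)
    lines.append(gloss)
    for line in lines:
        if len(line) != len(original):
            raise ValueError("every line must have one entry per original word")
    body = ["\t".join(line) for line in lines]
    body.append("‘" + translation + "’")
    return "\n".join(body)


example = format_gloss(
    original=["किस", "ने", "दवाई", "को", "खरीदा"],
    transcription=["kis", "ne", "davaaii", "ko", "khariidaa"],
    gloss=["who", "ERG", "medicine", "ACC", "bought"],
    translation="Who bought the medicine?",
)
print(example)
```

The length check catches the most common mistake in glossed examples, a gloss line that has drifted out of step with the original words.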

Grading

The grade for the assignment will be broken down as follows.

10% Originality and innovativeness.
15% Motivation: a clearly articulated problem with real-world implications.
15% Materials: appropriate choice of data that is feasible to acquire or collect, and can help with a solution to the problem.
15% Model: appropriate choice of methods and models to reason or learn from the data.
15% Evaluation: appropriate choice of a procedure to assess the proposed solution.
15% Analysis: Critical evaluation of the proposal, alternative solutions, and potential implications.
15% Quality and clarity of writing.

The prototype design proposal counts for 20% of the overall course grade.

Notes

The following are edited versions of responses to student questions about the assignment.

The application does not need to fall into the research areas discussed in class; it can be any problem that involves the use of real-world human language data.
The constraint is on the problem (human language processing), not the model. Any suitable model is fine, whether discussed in class or not.
Originality and innovativeness refer to the solution, not the problem. There are many problems that nobody has tried before, but if an approach is standard (for example, label some data and train an LSTM to replicate these labels), then the paper will not score high on originality and innovativeness. Conversely, a new and creative solution to a known problem could be considered original and innovative. An important aspect is for the proposed solution to capture some insight into the nature of the problem, and offer a way to operationalize that insight.
For certain applications, there may be practical and ethical considerations about using an automated model for the proposed application in the real world; such considerations should be highlighted in the discussion section.
The starting point should be the motivation for the problem that the proposal is trying to address: What is the proposal trying to do? Why would this be a useful thing? The motivation should guide the development of the rest of the paper. How can we tell if an application is good for what it is trying to achieve? What data, procedure, and measurements are needed to assess if the idea is suitable for the purpose? What are the advantages and disadvantages of the proposed design for this particular purpose? All of these should go into the paper.
A proposal for an evaluation method, which takes the output of an NLP process and provides a quality assessment, should follow the same structure as outlined above: proposing a model (in this case an evaluation model), and using some language data, an experimental procedure, and an evaluation method to assess the model. The discussion section should talk about expected advantages and disadvantages of the proposed evaluation model.