University of Southern California

CSCI 544 — Applied Natural Language Processing

Written paper: NLP prototype design proposal

Updates

Due: April 11, 2021

Students with presentations on April 12 or 14 may submit the written paper by April 18.

Overview

The prototype design proposal is an in-depth activity, where students describe the design for a future natural language processing application. The proposal can be on any aspect of natural language processing, for any human language. You will formulate a research question, identify potential resources and methods to address the question, and describe a procedure for evaluating the prototype.

Proposal structure

The proposal should be a document of about 1500 words, written in English in good academic style. Proposals that substantially exceed this length (above 1600 words) will be penalized. The structure of the document should be as follows.

Submission

Submit the paper as a PDF file using the “Written Paper” assignment on Blackboard. Use a style with clear section headings and a single-column format (two-column formats are designed to save paper, but are difficult to read on a computer screen because of all the up-and-down movement).

Grading

The grade for the assignment will be broken down as follows.

The prototype design proposal counts for 16% of the overall course grade.

Notes

The following notes are based on questions and answers from students in past semesters.

Does the paper need to be on one of the research areas covered in class?
The application can address any problem that involves the use of real-world human language data; it does not need to fall under the research areas discussed in class.
Does the paper need to use one of the methods discussed in class?
The constraint is on the problem (human language processing), not the model. Any suitable model is fine, whether discussed in class or not.
Is it OK to submit a paper that uses some of the methods covered in class for an application in domains other than human language?
The paper must be on the processing of human language. This is a class on Natural Language Processing, and the papers must model an application for human language.
Is it OK to model human language together with some other domain?
It is fine to model human language together with another domain, as long as the focus is on the language part. For example, an application that creates text descriptions of financial data, or predicts financial trends from human language texts, can be appropriate for this assignment if most of the modeling goes into the language aspects. If most of the work is on modeling the financial data, the application may be very interesting, but it is not appropriate for this class.
Why is the structure of the paper so rigid?
The constraints on the format are designed to help students with their writing. My experience, based on many student proposals and project reports, is that when students stray from the prescribed guidelines, the papers often end up missing important information.
Should I include graphics in the paper?
Graphics may be used to help clarify an idea. However, it is important to always explain the ideas in words, so graphics should not substitute for a textual description.
How can I make the paper fit within the word limit?
Try to focus on your own ideas, and give less prominence to ideas by other people. Identify the main idea (or ideas) in your proposal, and make the entire paper support that idea, while trimming parts that are only tangential to supporting that idea. If you feel you don't have enough space for a particular section, the solution is to cut somewhere else. One way to achieve this is to write out the section the way you would like it to be, see how much this brings you over the limit, then identify the least essential parts in the entire paper, and trim them.
How should I format the references?
Use any common citation format; make sure it is consistent, and includes all the information needed to locate the reference. Where possible, include links to the cited papers/articles.
How do I express the motivation for a paper?
The starting point should be the problem that the proposal is trying to address: What is the proposal trying to do? Why would this be a useful thing? The motivation should guide the development of the rest of the paper. How can we tell if an application is good for what it is trying to achieve? What data, procedure, and measurements are needed to assess if the idea is suitable for the purpose? What are the advantages and disadvantages of the proposed design for this particular purpose? All of these should go into the paper, in the appropriate sections.
What counts as creativity? Is taking an existing model and changing some settings considered creative? How about fine-tuning a pre-trained model, or using a mix of NLP techniques?
It depends. If the change to an existing model is minor or obvious, or if the fine-tuning just feeds different data to a process, then it is not very creative. But if a change to a model requires some thought or is specifically suited for the problem, if there is something interesting about how new data relate to the problem or how they are used in tuning, or if the argument is made that a particular mix of techniques is a good way to operationalize the problem, then the solution can be considered creative.
Should the paper consider moral and ethical issues?
For certain applications, there may be practical and ethical considerations about using an automated model for the proposed application in the real world; such considerations should be highlighted in the discussion section.
Can the paper propose a new evaluation method?
A proposal for an evaluation method, which takes the output of an NLP process and provides a quality assessment, should follow the same structure as outlined above: proposing a model (in this case an evaluation model), and using some language data, an experimental procedure, and an evaluation method to assess the model. The discussion section should talk about expected advantages and disadvantages of the proposed evaluation model.
Do we need to report accuracy and other measures for our proposed model?
This is a design project, not an implementation project. The design needs to include a procedure for evaluation, as explained above, including descriptions of the measures that should be taken. If you have certain expectations about performance, these can be included as well; but since there is no implementation, there is no way to measure actual performance.
Is it OK to also implement the proposal and report on that?
I would advise against reporting on an implementation, because this takes up space that could be used for describing the design in more detail.
How should I present data in a foreign language?
If you need to give examples of text in languages other than English, please use the following multi-line format, to make the examples readable to English speakers. Below is an example of how to present a sentence in Hindi.
किसने दवाई को खरीदा (the original text in its native script)
kisne davaaii ko khariidaa (a transcription into Latin script)
who.ERG medicine ACC bought (a word-by-word gloss)
‘Who bought the medicine?’ (a translation into English)
The explanations on the right (in parentheses) are part of the instructions: they do not need to be repeated with the example. The second line (transcription into Latin script) is not needed if the language natively uses a version of the Latin script.
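The multi-line format above can also be produced programmatically, which keeps each gloss lined up under the word it explains. The following is only a minimal sketch: the function name `format_gloss`, its parameters, the two-space column gap, and the word boundaries used for the Hindi example (`ko khariidaa`, `who.ERG`) are assumptions made for illustration, not part of the assignment requirements.

```python
def format_gloss(original, gloss, translation, transcription=None):
    """Build an interlinear example: original, transcription, gloss, translation.

    If `transcription` is None (the language already uses a Latin script),
    the transcription line is dropped and the gloss is aligned with the
    original text instead.
    """
    # The line that the gloss is aligned against, word by word.
    source = transcription if transcription is not None else original
    words = source.split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")

    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]

    def pad(tokens):
        cells = (t.ljust(n) for t, n in zip(tokens, widths))
        return "  ".join(cells).rstrip()

    lines = []
    if transcription is not None:
        lines.append(original)  # native script, left as written
    lines.append(pad(words))
    lines.append(pad(glosses))
    lines.append("\u2018" + translation + "\u2019")  # single quotation marks
    return "\n".join(lines)


example = format_gloss(
    original="किसने दवाई को खरीदा",
    transcription="kisne davaaii ko khariidaa",
    gloss="who.ERG medicine ACC bought",
    translation="Who bought the medicine?",
)
print(example)
```

The column alignment relies on a monospaced font. In a word processor, a borderless table with one word per cell gives the same effect.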