CSCI 544 — Applied Natural Language Processing
Final paper: NLP prototype design proposal
Due: November 26, 2020 (extended from November 24)
Every student will receive a personalized submission link through
Crowdmark. Do not share the link with others: it is linked to your
email. The completed assignments will be accepted only through the
online system.
Overview
The prototype design proposal is an in-depth activity, where
students describe the design for a future natural language processing
application. The proposal can be on any aspect of natural language
processing, for any human language. You will formulate a research
question, identify potential resources and methods to address the
question, and describe a procedure for evaluating the prototype.
- The design proposal must be a new effort, conducted specifically
for this class. You may build on and extend previous research, but this
proposal needs to add to that research, not just reuse old material.
- The design proposal is an individual assignment. You may not
work in teams, or collaborate with other students or authors. You
must be the sole author of 100% of the work you turn in.
The proposal should be a document of about
1500 words, written in English in good academic
style. Proposals that substantially exceed this length (above
1600 words) will be penalized. The structure of the document
should be as follows.
- Title for the proposal.
- Name, USC ID, and USC email of the author.
- Introduction. Motivate a specific problem that you will consider,
which involves the use of real-world human language data. Describe
the problem you are trying to solve, why it is interesting or
challenging, and what applications it might have.
If possible, explain the linguistic insights that the proposal
is trying to capture and make use of.
Note: There is no need to motivate Natural Language Processing in
general; motivate your specific application.
- Method.
- Materials. Identify a potential source of data that you may
use, such as a specific corpus that you can get access to or
collect yourself.
Describe the data in some detail, including the source, the amount
of data, what kinds of annotation it has or needs, and what effort
it might take to obtain the data and adapt it for the purpose of
the prototype.
- Procedure. Describe the experimental procedure, that is,
which models and methods can be used to process the data.
This may include algorithms, features, and tools, and any
required annotations. Well-known methods (such as Naive Bayes or
Convolutional Neural Networks) do not need to be explained, but you do
need to explain how you use them, for example the features you
choose.
- Evaluation. Describe a process for evaluating the
system’s performance, and the annotation procedure (if
needed). What measures can be used? What baseline can the system
be compared to?
- Discussion. Discuss the implications of your
design: how it might compare to other possible solutions (either
from the literature or an alternative design), what advantages and
shortcomings your proposed design has, and how these affect
potential use.
- References cited.
There is no need to cite sources for well-known methods.
You should cite sources from which you borrow specific ideas, but
the focus of the paper should be your original design, not a
literature survey (if your design does not use specific ideas from
existing papers, you may not need to cite any references).
- Word count for the document, excluding references.
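To illustrate the Evaluation component listed above, the following is a minimal sketch of comparing a system against a baseline with two common measures, accuracy and macro-averaged F1. The labels, the system output, and the majority-class baseline are all hypothetical toy data chosen for illustration; a real proposal should choose the measures and baseline that suit its own task.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train_labels, n_items):
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_items


# Hypothetical toy data, for illustration only.
train_labels = ["pos", "pos", "neg", "pos"]
gold = ["pos", "neg", "pos", "pos", "neg", "pos"]
system = ["pos", "neg", "neg", "pos", "neg", "pos"]
baseline = majority_baseline(train_labels, len(gold))

for name, output in [("baseline", baseline), ("system", system)]:
    print(name, round(accuracy(gold, output), 3), round(macro_f1(gold, output), 3))
```

In this toy case the majority baseline looks reasonable on accuracy but poor on macro F1, which is one reason to report more than one measure.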
If you need to give examples of text in languages other than English,
please use the following multi-line format, to make the examples
readable to English speakers. Below is an example of how to present a
sentence in Hindi.
किस | ने | दवाई | को | खरीदा | (the original text in its native script)
kis | ne | davaaii | ko | khariidaa | (a transcription into Latin script)
who | ERG | medicine | ACC | bought | (a word-by-word gloss)
‘Who bought the medicine?’ | (a translation into English)
The explanations on the right (in parentheses) are part of the
instructions: they do not need to be repeated with the example.
The second line (transcription into Latin script) is not needed if the
language natively uses a version of the Latin script.
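If the examples are generated from data, the multi-line format above can be produced mechanically. The sketch below is one possible way to do it, not a required tool; the function name `format_gloss` and its parameters are hypothetical. It joins the words of each row with " | ", checks that every row has one entry per word, and skips the transcription row when none is given.

```python
def format_gloss(original, gloss, translation, transcription=None):
    """Return the lines of one example in the multi-line format.

    original, gloss, transcription: lists with one entry per word.
    translation: the free English translation of the whole sentence.
    """
    rows = [original]
    if transcription is not None:
        # Only needed when the language does not use a Latin script.
        rows.append(transcription)
    rows.append(gloss)
    for row in rows:
        if len(row) != len(original):
            raise ValueError("each row needs one entry per word of the original")
    lines = [" | ".join(row) for row in rows]
    lines.append("‘" + translation + "’")
    return lines


# The Hindi sentence from the example above.
lines = format_gloss(
    original=["किस", "ने", "दवाई", "को", "खरीदा"],
    transcription=["kis", "ne", "davaaii", "ko", "khariidaa"],
    gloss=["who", "ERG", "medicine", "ACC", "bought"],
    translation="Who bought the medicine?",
)
print("\n".join(lines))
```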
Grading
The grade for the assignment will be broken down as follows.
- 10% Originality and innovativeness.
- 15% Motivation: a clearly articulated problem with real-world implications.
- 15% Materials: appropriate choice of data that is feasible to
acquire or collect, and can help with a solution to the problem.
- 15% Model: appropriate choice of methods and models to reason
or learn from the data.
- 15% Evaluation: appropriate choice of a procedure to assess the
proposed solution.
- 15% Analysis: critical evaluation of the proposal, alternative
solutions, and potential implications.
- 15% Quality and clarity of writing.
The prototype design proposal counts for 20% of the overall course grade.
The following are edited versions of responses to student questions
about the assignment.
- The application does not need to fall into the research areas
discussed in class; it can be any problem that involves the use of
real-world human language data.
- The constraint is on the problem (human language processing),
not the model. Any suitable model is fine, whether discussed in
class or not.
- Originality and innovativeness refer to the solution, not the
problem. There are many problems that nobody has tried before, but
if an approach is standard (for example, label some data and train
an LSTM to replicate these labels), then the paper will not score
high on originality and innovativeness. Conversely, a new and
creative solution to a known problem could be considered original
and innovative. An important aspect is for the proposed solution to
capture some insight into the nature of the problem, and offer a way
to operationalize that insight.
- For certain applications, there may be practical and ethical
considerations about using an automated model for the proposed
application in the real world; such considerations should be
highlighted in the discussion section.
- The starting point should be the motivation for the
problem that the proposal is trying to address: What is the proposal
trying to do? Why would this be a useful thing? The
motivation should guide the development of the rest of the
paper. How can we tell if an application is good for what it is
trying to achieve? What data, procedure, and measurements are needed
to assess if the idea is suitable for the purpose? What are the
advantages and disadvantages of the proposed design for this
particular purpose? All of these should go into the paper.
- A proposal for an evaluation method, which takes the output of
an NLP process and provides a quality assessment, should follow the
same structure as outlined above:
proposing a model (in this case an evaluation model), and using some
language data, an experimental procedure, and an evaluation method
to assess the model. The discussion section should talk about
expected advantages and disadvantages of the proposed evaluation
model.