CSCI 544 — Applied Natural Language Processing
Written paper: NLP prototype design proposal
Updates
- [2021-04-26] Removed Sample paper.
- [2021-04-02] Added Sample paper.
Due: April 11, 2021
Students with presentations on April 12 or 14 may submit
the written paper by April 18.
Overview
The prototype design proposal is an in-depth activity, where
students describe the design for a future natural language processing
application. The proposal can be on any aspect of natural language
processing, for any human language. You will formulate a research
question, identify potential resources and methods to address the
question, and describe a procedure for evaluating the prototype.
- The design proposal must be a new effort, conducted specifically
for this class. You may build on and extend previous research, but this
proposal needs to add to that research, not just reuse old material.
- The design proposal is an individual assignment. You may not
work in teams, or collaborate with other students or authors. You
must be the sole author of 100% of the work you turn in.
The proposal should be a document of about
1500 words, written in English in good academic
style. Proposals that substantially exceed this length (above
1600 words) will be penalized. The structure of the document
should be as follows.
- Title for the proposal.
- Introduction. Motivate a specific problem that you will consider,
which involves the use of real-world human language data. Describe
the problem you are trying to solve, why it is interesting or
challenging, and what applications it might have.
If possible, explain the linguistic insights that the proposal
is trying to capture and make use of.
Note: There is no need to motivate Natural Language Processing in
general; motivate only your specific application.
- Method.
- Materials. Identify a potential source of data that you may
use, such as a specific corpus that you can get access to or
collect yourself.
Describe the data in some detail, including the source, the amount
of data, what kinds of annotation it has or needs, and what effort
it might take to obtain the data and adapt it for the purpose of
the prototype.
- Procedure. Describe the experimental procedure, that is,
which models and methods can be used to process the data.
This may include algorithms, features, and tools, and any
required annotations. Well-known methods (such as Naive Bayes or
Convolutional Neural Networks) do not need to be explained, but you do
need to explain how you use them, for example the features you
choose.
- Evaluation. Describe a process for evaluating the
system’s performance, and the annotation procedure (if
needed). What measures can be used? What baseline can the system
be compared to?
- Discussion. Discuss implications of your
design: how it might compare to other possible solutions (either
from the literature or an alternative design), what advantages and
shortcomings your proposed design has, and how these affect
potential use.
- References cited.
There is no need to cite sources for well-known methods.
You should cite sources from which you borrow specific ideas, but
the focus of the paper should be your original design, not a
literature survey (if your design does not use specific ideas from
existing papers, you may not need to cite any references).
- Word count for the document, excluding references.
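The length limits and the word count item above can be checked with a short script. The following is a minimal sketch, not an official counting tool: it assumes the paper is available as plain text, that words are whitespace-separated tokens, and that the reference list starts at a line consisting only of the word "References". The function names and the heading convention are hypothetical, and a word processor may count slightly differently.

```python
TARGET_WORDS = 1500    # the proposal should be "about 1500 words"
PENALTY_ABOVE = 1600   # proposals above this length are penalized


def count_words(text: str, references_heading: str = "References") -> int:
    """Count whitespace-separated words that appear before the references.

    Assumption: the reference list begins at a line that contains only
    `references_heading` (case-insensitive). Everything from that line
    onward is excluded from the count.
    """
    body_lines = []
    for line in text.splitlines():
        if line.strip().lower() == references_heading.lower():
            break
        body_lines.append(line)
    return len(" ".join(body_lines).split())


def length_status(word_count: int) -> str:
    """Report whether a word count exceeds the penalty threshold."""
    if word_count > PENALTY_ABOVE:
        return "over the limit"
    return "within the limit"


if __name__ == "__main__":
    sample = "A Title\nSome body text here.\nReferences\nAuthor (2020). A paper."
    n = count_words(sample)
    print(n, length_status(n))
```

The count reported by a script like this can be written at the end of the paper as the required word count.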
Submission
Submit the paper as a PDF file using the “Written
Paper” assignment on Blackboard. Use a style with clear section
headings and a single-column format (two-column
formats are designed to save paper, but are difficult to read on a
computer screen because of all the up-and-down movement).
Grading
The grade for the assignment will be broken down as follows.
- 12.5% Motivation: a clearly articulated problem with real-world implications.
- 12.5% Materials: appropriate choice of data that are feasible to
acquire or collect, and can help with a solution to the problem.
- 12.5% Model: appropriate choice of methods and models to reason
or learn from the data.
- 12.5% Evaluation: appropriate choice of a procedure to assess the
proposed solution.
- 12.5% Analysis: Critical evaluation of the proposal, alternative
solutions, and potential implications.
- 12.5% Creativity.
- 12.5% Thoroughness.
- 12.5% Quality and clarity of writing.
The prototype design proposal counts for 16% of the overall course grade.
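The weights above can be checked with simple arithmetic: eight criteria at 12.5% each sum to 100% of the assignment, and since the assignment is 16% of the course grade, each criterion corresponds to 2% of the course grade. The sketch below only illustrates that arithmetic. The fractional score scale (0 to 1 per criterion) and the function name are assumptions made for the example, not a description of how papers are actually scored.

```python
CRITERIA = [
    "Motivation",
    "Materials",
    "Model",
    "Evaluation",
    "Analysis",
    "Creativity",
    "Thoroughness",
    "Quality and clarity of writing",
]
CRITERION_WEIGHT = 12.5    # percent of the assignment grade, per criterion
ASSIGNMENT_WEIGHT = 16.0   # percent of the overall course grade

# Eight criteria at 12.5% each account for the whole assignment.
total_percent = CRITERION_WEIGHT * len(CRITERIA)

# Share of the overall course grade carried by a single criterion.
per_criterion_course_percent = CRITERION_WEIGHT / 100 * ASSIGNMENT_WEIGHT


def course_percent(scores: dict[str, float]) -> float:
    """Convert per-criterion scores into percent of the course grade.

    Assumption: each score is a fraction between 0 and 1 of the credit
    available for that criterion (a hypothetical scale for this sketch).
    """
    return sum(scores[name] * per_criterion_course_percent for name in CRITERIA)


if __name__ == "__main__":
    print(total_percent, per_criterion_course_percent)
    print(course_percent({name: 1.0 for name in CRITERIA}))
```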
The following notes are based on questions and answers from
students in past semesters.
- Does the paper need to be on one of the research areas covered
in class?
- The application can address any problem that involves the
use of real-world human language data; it does not need to fall
under the research areas discussed in class.
- Does the paper need to use one of the methods discussed in
class?
- The constraint is on the problem (human language processing),
not the model. Any suitable model is fine, whether discussed in
class or not.
- Is it OK to submit a paper that uses some of the methods covered
in class for an application in domains other than human language?
- The paper must be on the processing of human language. This is a
class on Natural Language Processing, and the papers must model an
application for human language.
- Is it OK to model human language together with some other domain?
- It is fine to model human language together with another domain,
as long as the focus is on the language part. For example,
an application that creates text descriptions of financial data,
or predicts financial trends from human language texts, can be
appropriate for this assignment if most of the modeling goes into
the language aspects.
If most of the work goes into modeling the financial data, however,
the application may be very interesting, but it is not
appropriate for this class.
- Why is the structure of the paper so rigid?
- The constraints on the format are designed to help students with
their writing. My experience, based on many student proposals and
project reports, is that when students stray from the prescribed
guidelines, the papers often end up missing important
information.
- Should I include graphics in the paper?
- Graphics may be used to help clarify an idea. However, it is
important to always explain the ideas in words, so graphics should
not substitute for a textual description.
- How can I make the paper fit within the word limit?
- Try to focus on your own ideas, and give less prominence to
ideas by other people. Identify the main idea (or ideas) in your
proposal, and make the entire paper support that idea, while
trimming parts that are only tangential to supporting that idea.
If you feel you don't have enough space for a particular section,
the solution is to cut somewhere else. One way to achieve this is
to write out the section the way you would like it to be, see how
much this brings you over the limit, then identify the least
essential parts in the entire paper, and trim them.
- How should I format the references?
- Use any common citation format; make sure it is consistent, and
includes all the information needed to locate the reference. Where
possible, include links to the cited papers/articles.
- How do I express the motivation for a paper?
- The starting point should be the problem that the proposal is
trying to address: What is the proposal trying to do?
Why would this be a useful thing? The
motivation should guide the development of the rest of the
paper. How can we tell if an application is good for what it is
trying to achieve? What data, procedure, and measurements are needed
to assess if the idea is suitable for the purpose? What are the
advantages and disadvantages of the proposed design for this
particular purpose? All of these should go into the paper, in the
appropriate sections.
- What counts for creativity? Is taking an existing model and
changing some settings considered creative? How about fine-tuning a
pre-trained model, or using a mix of NLP techniques?
- It depends. If the change to an existing model is minor or
obvious, or if the fine-tuning just feeds different data to a
process, then it is not very creative. But if
a change to a model requires some thought or is specifically suited to the
problem, if there is something interesting about how new data
relate to the problem or how they are used in tuning, or if the argument is
made that a particular mix of techniques is a good way to
operationalize the problem, then the solution can
be considered creative.
- Should the paper consider moral and ethical issues?
- For certain applications, there may be practical and ethical
considerations about using an automated model for the proposed
application in the real world; such considerations should be
highlighted in the discussion section.
- Can the paper propose a new evaluation method?
- A proposal for an evaluation method, which takes the output of
an NLP process and provides a quality assessment, should follow the
same structure as outlined above:
proposing a model (in this case an evaluation model), and using some
language data, an experimental procedure, and an evaluation method
to assess the model. The discussion section should talk about
expected advantages and disadvantages of the proposed evaluation
model.
- Do we need to report accuracy and other measures for
our proposed model?
- This is a design project, not an implementation project. The
design needs to include a procedure for evaluation, as explained
above, including descriptions of the measures that should be
taken. If you have certain expectations about performance, these
can be included as well; but since there is no implementation,
there is no way to measure actual performance.
- Is it OK to also implement the proposal and report on that?
- I would advise against reporting on an implementation, because
this takes up space that could be used for describing the design
in more detail.
- How should I present data in a foreign language?
- If you need to give examples of text in languages other than English,
please use the following multi-line format, to make the examples
readable to English speakers. Below is an example of how to present a
sentence in Hindi.
किस | ने | दवाई | को | खरीदा | (the original text in its native script)
kis | ne | davaaii | ko | khariidaa | (a transcription into Latin script)
who | ERG | medicine | ACC | bought | (a word-by-word gloss)
‘Who bought the medicine?’ | (a translation into English)
The explanations on the right (in parentheses) are part of the
instructions: they do not need to be repeated with the example.
The second line (transcription into Latin script) is not needed if the
language natively uses a version of the Latin script.
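The multi-line format above can be produced by padding each word to a common column width so that the transcription and the gloss line up word by word. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the width estimate (ignoring non-spacing marks, counting East Asian wide characters as two columns) is only an approximation, and alignment of the native-script line ultimately depends on the font used in the paper.

```python
import unicodedata


def display_width(token: str) -> int:
    """Approximate the number of text columns a token occupies.

    Non-spacing and enclosing marks are ignored, and East Asian wide or
    fullwidth characters count as two columns. This is an estimate only.
    """
    width = 0
    for ch in token:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def format_gloss(rows: list[list[str]], translation: str) -> str:
    """Align word-by-word rows in columns and append the free translation.

    `rows` holds the lines of the example (native script, transcription,
    gloss); every row must contain the same number of words.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same number of words")
    # Each column is as wide as its widest token across all rows.
    widths = [max(display_width(tok) for tok in col) for col in zip(*rows)]
    lines = []
    for row in rows:
        cells = [tok + " " * (w - display_width(tok)) for tok, w in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    lines.append(f"‘{translation}’")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_gloss(
        [
            ["किस", "ने", "दवाई", "को", "खरीदा"],
            ["kis", "ne", "davaaii", "ko", "khariidaa"],
            ["who", "ERG", "medicine", "ACC", "bought"],
        ],
        "Who bought the medicine?",
    ))
```

For a language that natively uses a version of the Latin script, the transcription row can simply be left out of `rows`.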