CSCI 544 — Applied Natural Language Processing

Written paper: NLP prototype design proposal

Updates

[2021-04-26] Removed sample paper.
[2021-04-02] Added sample paper.

Due: April 11, 2021

Students with presentations on April 12 or 14 may submit the written paper by April 18.

Overview

The prototype design proposal is an in-depth activity in which students describe the design for a future natural language processing application. The proposal can be on any aspect of natural language processing, for any human language. You will formulate a research question, identify potential resources and methods to address the question, and describe a procedure for evaluating the prototype.

The design proposal must be a new effort, conducted specifically for this class. You may build on and extend previous research, but this proposal needs to add to that research, not just reuse old material.
The design proposal is an individual assignment. You may not work in teams or collaborate with other students or authors. You must be the sole author of 100% of the work you turn in.

Proposal structure

The proposal should be a document of about 1500 words, written in good academic English. Proposals that substantially exceed this length (above 1600 words) will be penalized. The document should be structured as follows.

Title for the proposal.
Introduction. Motivate a specific problem that you will consider, which involves the use of real-world human language data. Describe the problem you are trying to solve, why it is interesting or challenging, and what applications it might have. If possible, explain the linguistic insights that the proposal is trying to capture and make use of. Note: there is no need to motivate natural language processing in general, only your specific application.
Method.
- Materials. Identify a potential source of data that you may use, such as a specific corpus that you can get access to or collect yourself. Describe the data in some detail, including the source, the amount of data, what kinds of annotation it has or needs, and what effort it might take to obtain the data and adapt it for the purpose of the prototype.
- Procedure. Describe the experimental procedure, that is, what models and methods can be used to process the data. This may include algorithms, features, tools, and any required annotations. Well-known methods (such as Naive Bayes or Convolutional Neural Networks) do not need to be explained, but you do need to explain how you use them, for example the features you choose.
- Evaluation. Describe a process for evaluating the system’s performance, and the annotation procedure (if needed). What measures can be used? What baseline can the system be compared to?
Discussion. Discuss the implications of your design: how it might compare to other possible solutions (either from the literature or an alternative design), what advantages and shortcomings your proposed design has, and how these affect potential use.
References cited. There is no need to cite sources for well-known methods. You should cite sources from which you borrow specific ideas, but the focus of the paper should be your original design, not a literature survey (if your design does not use specific ideas from existing papers, you may not need to cite any references).
Word count for the document, excluding references.

Submission

Submit the paper as a PDF file using the “Written Paper” assignment on Blackboard. Use a style with clear section headings and a single-column format (two-column formats are designed to save paper, but are difficult to read on a computer screen because of all the up-and-down movement).

Grading

The grade for the assignment will be broken down as follows.

12.5% Motivation: a clearly articulated problem with real-world implications.
12.5% Materials: appropriate choice of data that are feasible to acquire or collect, and can help with a solution to the problem.
12.5% Model: appropriate choice of methods and models to reason or learn from the data.
12.5% Evaluation: appropriate choice of a procedure to assess the proposed solution.
12.5% Analysis: Critical evaluation of the proposal, alternative solutions, and potential implications.
12.5% Creativity.
12.5% Thoroughness.
12.5% Quality and clarity of writing.

The prototype design proposal counts for 16% of the overall course grade.

Notes

The following notes are based on questions and answers from students in past semesters.

Does the paper need to be on one of the research areas covered in class?

The application can address any problem that involves the use of real-world human language data; it does not need to fall under the research areas discussed in class.

Does the paper need to use one of the methods discussed in class?

The constraint is on the problem (human language processing), not the model. Any suitable model is fine, whether discussed in class or not.

Is it OK to submit a paper that uses some of the methods covered in class for an application in domains other than human language?

The paper must be on the processing of human language. This is a class on Natural Language Processing, and the papers must model an application for human language.

Is it OK to model human language together with some other domain?

It is fine to model human language together with another domain, as long as the focus is on the language part. For example, an application that creates text descriptions of financial data, or predicts financial trends from human language texts, can be appropriate for this assignment if most of the modeling goes into the language aspects. If most of the work is on modeling the financial data, the application may still be very interesting, but it is not appropriate for this class.

Why is the structure of the paper so rigid?

The constraints on the format are designed to help students with their writing. My experience, based on many student proposals and project reports, is that when students stray from the prescribed guidelines, the papers often end up missing important information.

Should I include graphics in the paper?

Graphics may be used to help clarify an idea. However, it is important to always explain the ideas in words, so graphics should not substitute for a textual description.

How can I make the paper fit within the word limit?

Try to focus on your own ideas, and give less prominence to ideas by other people. Identify the main idea (or ideas) in your proposal, and make the entire paper support that idea, while trimming parts that are only tangential to supporting that idea. If you feel you don't have enough space for a particular section, the solution is to cut somewhere else. One way to achieve this is to write out the section the way you would like it to be, see how much this brings you over the limit, then identify the least essential parts in the entire paper, and trim them.

How should I format the references?

Use any common citation format; make sure it is consistent, and includes all the information needed to locate the reference. Where possible, include links to the cited papers/articles.

How do I express the motivation for a paper?

The starting point should be the problem that the proposal is trying to address: What is the proposal trying to do? Why would this be a useful thing? The motivation should guide the development of the rest of the paper. How can we tell if an application is good for what it is trying to achieve? What data, procedure, and measurements are needed to assess if the idea is suitable for the purpose? What are the advantages and disadvantages of the proposed design for this particular purpose? All of these should go into the paper, in the appropriate sections.

What counts for creativity? Is taking an existing model and changing some settings considered creative? How about fine-tuning a pre-trained model, or using a mix of NLP techniques?

It depends. If the change to an existing model is minor or obvious, or if the fine-tuning just feeds different data to an existing process, then it is not very creative. But the solution can be considered creative if a change to a model requires some thought or is specifically suited to the problem, if there is something interesting about how the new data relate to the problem or how they are used in tuning, or if the paper argues convincingly that a particular mix of techniques is a good way to operationalize the problem.

Should the paper consider moral and ethical issues?

For certain applications, there may be practical and ethical considerations about using an automated model for the proposed application in the real world; such considerations should be highlighted in the discussion section.

Can the paper propose a new evaluation method?

A proposal for an evaluation method, which takes the output of an NLP process and provides a quality assessment, should follow the same structure as outlined above: proposing a model (in this case an evaluation model), and using some language data, an experimental procedure, and an evaluation method to assess that model. The discussion section should address the expected advantages and disadvantages of the proposed evaluation model.

Do we need to report accuracy or other measures for our proposed model?

This is a design project, not an implementation project. The design needs to include a procedure for evaluation, as explained above, including descriptions of the measures that should be taken. If you have certain expectations about performance, these can be included as well; but since there is no implementation, there is no way to measure actual performance.

Is it OK to also implement the proposal and report on that?

I would advise against reporting on an implementation, because this takes up space that could be used for describing the design in more detail.

How should I present data in a foreign language?

If you need to give examples of text in languages other than English, please use the following multi-line format, to make the examples readable to English speakers. Below is an example of how to present a sentence in Hindi.

किस	ने	दवाई	को	खरीदा	(the original text in its native script)
kis	ne	davaaii	ko	khariidaa	(a transcription into Latin script)
who	ERG	medicine	ACC	bought	(a word-by-word gloss)
‘Who bought the medicine?’					(a translation into English)

The explanations on the right (in parentheses) are part of the instructions: they do not need to be repeated with the example. The second line (transcription into Latin script) is not needed if the language natively uses a version of the Latin script.