CSCI 544 — Applied Natural Language Processing
Research Project
- Proposals are due March 29, 2016, at 23:59 Pacific Time (11:59 PM)
- Code demonstrations will take place on Monday, April 18, in
SAL 213. Timed sessions were arranged individually with each team.
- Final reports are due April 29, 2016, at 23:59 Pacific Time (11:59 PM)
Updates (April 18):
- Specific instructions on reporting new tools and giving foreign
language examples.
(April 17):
- Updated time and location for code demos (previously announced
over email).
- More details on final report format.
Overview
The research project is an in-depth activity that will be carried
out in teams of four. The project can be on any aspect of natural
language processing. You will formulate a research question, identify
resources and tools to address the question, implement and evaluate a
system that uses these resources and tools, demonstrate the system,
and write up a report.
- The research must involve a language other than English.
- The project must be a new effort, conducted specifically for
this class. You may build on and extend previous research, but this
project needs to add to that research, not just reuse old material.
- You must submit your code by putting it on a publicly available
repository (such as GitHub or Bitbucket) and making the link
available, at least for the duration of the class.
- Your code may use external, publicly
available tools and resources, but only ones that you are able to
provide when you submit your code.
Procedure
- Form teams of four students; you may use the Blackboard forum to
team up based on shared interests such as language and topic. Each
team should have strengths in all of the following areas: theory,
data, coding, and writing.
- Once you have formed your team, add your information on the team
spreadsheet, and create a thread on the forum to discuss your project
ideas with the instructor and the TAs.
In the discussion, identify the language and problem you’re working
on, the data and tools you will use, and the effort you will put
in. You should receive feedback before writing up
your project proposal.
- Submit your project proposal by the deadline.
- Implement and evaluate your project.
- Demonstrate your code to the instructor and TAs (scheduled
individually), and submit your code through a public repository.
- Submit your final report by the deadline.
Proposal structure
The proposal describes your plan for the research project, and will
serve as the skeleton for the final report. As a plan it is subject to
change and does not represent a firm commitment, but it should show
that you’ve thought through the relevant aspects of your research.
The proposal should be a document of about 500 words, written in
English in good academic style. The structure of the document should
be as follows.
- Title for the project.
- Names, USC IDs, and USC emails of the team members.
- Introduction. Motivate a specific problem that you will consider,
which involves the use of real-world natural language data. Describe
the problem you are trying to solve, why it is interesting or
challenging, what existing work has been done, and how your
contribution relates to that.
- Method.
- Materials. Identify the source data that you use, such as a
specific corpus that you can get access to or collect yourself.
Describe the data in some detail, including the source, the amount
of data, and what kinds of annotation it has or needs.
- Procedure. Describe the methods you will use to process your
data, including algorithms, features, and tools, and what annotations
you will make (if needed).
- Evaluation. Describe how you will evaluate your system’s
performance, and your annotation procedure (if needed). What
measures will you use? What baseline system will you compare to?
- References cited.
- Division of labor between the teammates.
- Word count for the document.
The proposal should be written after you have received some
feedback about the general direction of your project on the Blackboard
thread. You will receive written feedback about your proposal, which
should help you with writing the final report; however, feedback on
the proposal might take some time, so don’t delay
collecting your data and implementing your system while waiting for
comments on your proposal. For feedback on specific issues that arise
with the project, use the Blackboard thread.
Code demonstrations
Each team will meet with the instructor and the TAs for a brief
demonstration to show how their code works.
Code demonstrations take place on Monday, April 18, in
SAL 213. Timed sessions were arranged individually with each team.
Final report
The final report describes the research you have done, reporting on
the method and results, relating the research to other work in the
field, and offering conclusions and directions for future work. The
report should be about 2000 words long, not counting the
references; reports that
substantially exceed this length will be penalized. The structure is
similar to the proposal, but with more detail, and two additional
sections following the method section.
- Title for the project.
- Names, USC IDs, and USC emails of the team members.
- Introduction. Motivate the problem that you have worked on,
which involves the use of real-world natural language data.
There is no need to motivate Natural Language Processing in general;
motivate your specific application instead.
Describe the problem, why it is interesting or
challenging, what existing work has been done, and how your
contribution relates to that.
- Method.
- Materials. Identify the source data that you
use, such as a specific corpus that you accessed or collected
yourself.
Describe the data in some detail, including the source, the amount
of data, what kinds of annotation it has, and what annotations
needed to be added (if any).
- Procedure. Describe the experimental
procedure, that is, the methods you used to process the data.
This may include algorithms, features, and tools, and any
annotations you made. Well-known methods (such as Naive Bayes or
Conditional Random Fields) do not need to be explained, but you do
need to explain how you use them, for example the features you
choose. If you created a tool (such as an annotation or
visualization tool), describe it in some detail.
- Evaluation. Describe your method for
evaluating the system’s performance, and for evaluating your
annotation procedure (if any).
Include a description of the specific measures you used,
and the baseline to which you compare your system.
- Results. Report how your system performs, and how it compares to
the baseline or to other comparable work. Discuss what it gets right,
what it gets wrong, and why.
- Discussion. Discuss conclusions that can be drawn from the
research, implications of your findings, the overall contribution to
the general NLP community, and directions for future research.
- References cited. You may choose your preferred style for
in-text citations (for example, numerical or author-year) and for
the reference listing, but please keep it consistent across the
document. The reference listing should contain all the information
required for accessing the reference – author(s), year,
title, and publication information (such as the conference, journal,
volume etc.).
- Division of labor between the teammates.
- Word count for the document, excluding references.
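As an illustration of the Evaluation item above, the sketch below shows one common way to report measures against a baseline: precision, recall, and F1 for a single class, compared with a majority-class baseline. The function names, the labels, and the toy data are invented for this example; your project may call for different measures and a different baseline.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, positive):
    """Return (precision, recall, F1) for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for all n test items."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


# Toy data, invented for illustration only.
train_labels = ["POS", "POS", "NEG", "POS"]
gold = ["POS", "NEG", "POS", "POS", "NEG"]
system = ["POS", "NEG", "POS", "POS", "POS"]
baseline = majority_baseline(train_labels, len(gold))

for name, output in [("system", system), ("baseline", baseline)]:
    p, r, f = precision_recall_f1(gold, output, positive="POS")
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Reporting the same measures for both the system and the baseline makes the comparison in the Results section direct.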
The six main content sections (introduction, materials, procedure,
evaluation, results, and discussion) carry equal weight. Therefore,
they should be of similar lengths – this means reserving
about 300–350 words for each section. This is only a general
guideline, as you may find that some sections require more text than
others. However, if you find you have more to say than fits within the
length requirement, then you’ll need to concentrate on the more
important aspects of your project.
When giving examples of text in languages other than English,
please use the following multi-line format, to make the examples
readable to English speakers. Below is an example of how to present a
sentence in Hindi.
1. The original text in its native script: | किस | ने | दवाई | को | खरीदा |
2. A transcription into Latin script: | kis | ne | davaaii | ko | khariidaa |
3. A word-by-word gloss: | who | ERG | medicine | ACC | bought |
4. A translation into English: | ‘Who bought the medicine?’ |
Line 2 is not needed if the language natively uses a version of the
Latin script. Also, the line numbers and explanations on the left are
not needed in the report.
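If you prepare many examples, a small script can keep the columns aligned. The following is a minimal sketch, not a required tool; the function name format_gloss is hypothetical. It pads by character count, so alignment is only approximate for scripts with combining marks (such as Devanagari) and may need manual adjustment.

```python
def format_gloss(rows, translation):
    """Align parallel token rows column by column and append a translation.

    rows: list of token lists (native script, transcription, gloss),
          each with the same number of tokens.
    """
    n = len(rows[0])
    if any(len(row) != n for row in rows):
        raise ValueError("all rows must have the same number of tokens")
    # Column width = longest token in that column, measured in code points.
    widths = [max(len(row[i]) for row in rows) for i in range(n)]
    lines = [
        "  ".join(tok.ljust(w) for tok, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    lines.append(f"'{translation}'")
    return "\n".join(lines)


# The Hindi example from the text above.
print(format_gloss(
    [
        ["किस", "ने", "दवाई", "को", "खरीदा"],
        ["kis", "ne", "davaaii", "ko", "khariidaa"],
        ["who", "ERG", "medicine", "ACC", "bought"],
    ],
    "Who bought the medicine?",
))
```

For a language written in Latin script, pass only the original text and the gloss rows.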
Grading
The grade for the assignment will be broken down as follows.
- 10% Discussion of the project with the instructor and TAs.
- 10% On-time submission of a coherent proposal.
- 10% Difficulty, creativity and originality.
- 20% Code demonstration and code quality.
- 50% Final report.
The research project counts for 30% of the overall course grade.