Your cart is currently empty!
In this assignment, you will implement the CKY algorithm for English and apply it to the word recognition and parsing problem. You can use the NLTK modules for representing context-free grammars and parse trees, but you should implement the parser from scratch. Ingredients. We provide the grammar and the test sentences. The grammar stems…
In this assignment, you will implement the CKY algorithm for English and apply it to the word recognition and parsing problem. You can use the NLTK modules for representing context-free grammars and parse trees, but you should implement the parser from scratch.
Ingredients. We provide the grammar and the test sentences. The grammar stems from the
Airline Travel Information System (ATIS), a project working on spoken dialog systems for air
travel. The ATIS CFG is available in the NLTK data package, together with 98 test sentences.
You can initialize the resources this
way:
# load the grammar
grammar = nltk.data.load(“grammars/large_grammars/atis.cfg”)
# load the raw sentences
s = nltk.data.load(“grammars/large_grammars/atis_sentences.txt”, “raw”)
# extract the test sentences
t = nltk.parse.util.extract_test_sentences(s)
NLTK already implements a number of parsing algorithms (see nltk.parse for the list). You can try one to see if you loaded the grammar correctly:
# initialize the parser
parser = nltk.parse.BottomUpChartParser(grammar)
# parse all test sentences
for sentence in t:
parser.chart_parse(sentence[0])
However, the NLTK version of the ATIS grammar is not in Chomsky normal form (CNF), which you will need for your CKY parser. Feel free to implement a conversion module. You can then read the grammar from the file using nltk.data.load() and utilize the nice features of the nltk.grammar module on the resulting object.
[Note that chart parse() throws an error when it encounters an unknown word, which is undesirable behavior for any parser. If you want to test the grammars with the pre implemented parser (for example, to check whether the CNF version is in fact weakly equivalent), you may need to catch this error]
Problem Statement: Parsing
Implement the CKY algorithm for parsing. Also implement a function that extracts the
set of all parse trees from the backpointers in the chart. Feel free to use the NLTK module
nltk.tree for this purpose; notice that only Immutable Trees can be used as elements of Python sets, whereas raw Trees cannot.
Please submit your code, outputs, and a README file containing instructions for running your recognizer and parser. The outputs should consist of at least:
You can visualize an NLTK tree using its draw method.