Project 3 (Autocomplete Me) Solution

$30.00

Description

This document only contains the description of the project and the project problems. For the programming exercises on concepts needed for the project, please refer to the project checklist .

The purpose of this assignment is to write a program to implement autocomplete for a given set of N strings and nonnegative weights. That is, given a pre x, nd all strings in the set that start with the pre x, in descending order of weight.

Autocomplete is an important feature of many modern applications. As the user types, the program predicts the complete query (typically a word or phrase) that the user intends to type. Autocomplete is most e ective when there are a limited number of likely queries. For example, the Internet Movie Database uses it to display the names of movies as the user types; search engines use it to display suggestions as the user enters web search queries; cell phones use it to speed up text input.

In these examples, the application predicts how likely it is that the user is typing each query and presents to the user a list of the top-matching queries, in descending order of weight. These weights are determined by historical data, such as box o ce revenue for movies, frequencies of search queries from other Google users, or the typing history of a cell phone user. For the purposes of this assignment, you will have access to a set of all possible queries and associated weights (and these queries and weights will not change).

The performance of autocomplete functionality is critical in many systems. For example, consider a search engine which runs an autocomplete application on a server farm. According to one study, the application has only about 50ms to return a list of suggestions for it to be useful to the user. Moreover, in principle, it must perform this computation for every keystroke typed into the search bar and for every user!

In this assignment, you will implement autocomplete by sorting the queries in lexicographic order; using binary search to nd the set of queries that start with a given pre x; and sorting the matching queries in descending order by weight.

Problem 1. (Autocomplete Term) Implement an immutable comparable data type Term in Term.java that represents an autocomplete term: a string query and an associated real-valued weight. You must implement the following API, which supports comparing terms by three di erent orders: lexicographic order by query string (the natural order); in descending order by weight (an alternate order); and lexicographic order by query string but using only the rst r characters (a family of alternate orderings). The last order may seem a bit odd, but you will use it in Problem 3 to nd all terms that start with a given pre x (of length r).

method

description

Term(String query)

initialize a term with the given query string and zero weight

Term(String query, long weight)

initialize a term with the given query string and weight

static Comparator<Term> byReverseWeightOrder()

for comparing terms in descending order by weight

static Comparator<Term> byPrefixOrder(int r)

for comparing terms in lexicographic order but using only the

rst r characters of each query

int compareTo(Term that)

compare the terms in lexicographic order by query

String toString()

a string representation of the term

1 of 3

CS210 Project 3 (Autocomplete Me) Swami Iyer

Corner cases. The constructor should throw a java.lang.NullPointerException if query is null

and a java.lang.IllegalArgumentException if weight is negative. The byPrefixOrder() method should throw a java.lang.IllegalArgumentException if r is negative.

Performance requirements. The string comparison functions should take time proportional to the number of characters needed to resolve the comparison.

  • java Term data / cities . txt 5 Top 5 by l e x i c o g r a p h i c order :

2200’s Gravenmoer , N e t h e r l a n d s

19190’s – Gravenzande , N e t h e r l a n d s 134520 ’s – Hertogenbosch , N e t h e r l a n d s

3628’t Hofke , N e t h e r l a n d s

246056

A C o r u a , Spain

Top 5 by reverse – weight order :

14608512

Shanghai , China

13076300

Buenos Aires , Ar ge n ti na

12691836

Mumbai , India

12294193

Mexico City , Distrito Federal , Mexico

11624219

Karachi , Pakistan

Problem 2. (Binary Search Deluxe) When binary searching a sorted array that contains more than one key equal to the search key, the client may want to know the index of either the rst or the last such key. Accordingly, implement a library of static methods BinarySearchDeluxe.java with the following API:

method

description

the index of the rst key

static int firstIndexOf(Key[] a, Key key, Comparator<Key> c)

in a[] that equals the search key,

or -1 if no such key

the index of the last key

static int lastIndexOf(Key[] a, Key key, Comparator<Key> c)

in a[] that equals the search key,

or -1 if no such key

Corner cases. Each static method should throw a java.lang.NullPointerException if any of its arguments is null. You should assume that the argument array is in sorted order (with respect to the supplied comparator).

Performance requirements. The firstIndexOf() and lastIndexOf() methods should make at most 1 + dlog Ne compares in the worst case, where N is the length of the array. In this context, a compare is one call to comparator.compare().

$ java B i n a r y S e a r c h D e l u x e data / w i k t i o n a r y . txt cook

3

Problem 3. (Autocomplete) In this part, you will implement a data type that provides autocomplete functionality for a given set of string and weights, using Term and BinarySearchDeluxe. To do so, sort the terms in lexicographic order; use binary search to nd the set of terms that start with a given pre x; and sort the matching terms in descending order by weight. Organize your program by creating an immutable data type Autocomplete in Autocomplete.java with the following API:

method

description

Autocomplete(Term[] terms)

initialize the data structure from the given array of terms

Term[] allMatches(String prefix)

all terms that start with the given pre x, in descending order of weight

int numberOfMatches(String prefix)

the number of terms that start with the given pre x

Corner cases. The constructor and each method should throw a java.lang.NullPointerException if its argument is null.

Performance requirements. The constructor should make proportional to N log N compares (or better) in the worst case, where N is the number of terms. The allMatches() method should make proportional to log N +M log M compares (or better) in the worst case, where M is the number of matching terms. The numberOfMatches() method should make proportional to log N compares (or better) in the worst case. In this context, a compare is one call to any of the compare() or compareTo() methods de ned in Term.

2 of 3

CS210

Project 3 (Autocomplete Me)

Swami Iyer

$ java A u t o c o m p l e t e

data / cities . txt 7

M

< enter >

12691836

Mumbai ,

India

12294193

Mexico City , Distrito Federal , Mexico

10444527

Manila , P h i l i p p i n e s

10381222

Moscow ,

Russia

3730206 Melbourne , Victoria , Au s tr al ia

3268513

M o n t r a l ,

Quebec , Canada

3255944

Madrid ,

Spain

Al M

< enter >

431052

Al

M a a l l a h

al

K u b r , Egypt

420195

Al

M a n

r a h , Egypt

290802

Al Mubarraz , Saudi Arabia

258132

Al

M u k a l l

,

Yemen

227150

Al

M i n y

,

Egypt

128297

Al

M a n q i l ,

Sudan

99357

Al

M a a r y a h ,

Egypt

< ctrl -d >

Acknowledgements This project is an adaptation of the Autocomplete Me assignment developed at Princeton University by Matthew Drabick and Kevin Wayne.

3 of 3