Median Heaps Solution

$35.00

Description

Assignment

Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.

Your assignment is to implement the MedianHeap data as described above. No header files are provided so you are free to design your own C++ classes, subject to the requirements below.

The name of the class must be MedianHeap.

You must also have a separate class called Heap.

The MedianHeap class must use two Heap data members: a min heap and a max heap.

The Heap implementation must be array based and start at index 1. (Just leave index 0 unused.)

Both MedianHeap and Heap must be templated and work with base C++ types (e.g., int and float) and any well-implemented classes.

The Heap class must store function pointers to comparison functions and use the function pointers to invoke functions that compare the data items. In particular, it should not assume that the template parameters are instantiated with types that have <, <=, ==, >= and > defined.

No STL container classes may be used in this programming project. In particular, you may not use vector instead of an array in your Heap class.

Specific Requirements

Your MedianHeap class must have these member functions. There are additional requirements for your Heap class (other than the ones already specified above), since external code should not be invoking the Heap class member functions directly.

Your class definitions must be in a single file called MedianHeap.h (case sensitive). Client programs that use the MedianHeap class should be able to do so simply using:

#include “MedianHeap.h”

The MedianHeap class must have a constructor with the signature:

template <typename T>

MedianHeap<T>::MedianHeap( bool (*lt) (const T&, const T&), bool (*gt) (const T&, const T&), int cap=100 ) ;

First two parameters are function pointers that compare items of type T. The first function returns true if the first parameter is less than the second parameter. The second function returns true if the first parameter is greater than the second. The constructor must create a MedianHeap object capable of holding cap items. The object created by the constructor must be useable immediately.

The MedianHeap class must have a copy constructor with the signature:

template <typename T>

MedianHeap<T>::MedianHeap(const MedianHeap<T>& otherH) ;

The constructor must create a copy of the MedianHeap object given in the parameter. The copied object must have its own allocated memory.

It is possible that your class design allows your MedianHeap to use the compiler-supplied copy constructor. In that case, you do not have to write a explicit copy constructor, but it is still your responsibility to make sure that the compiler-supplied copy constructor works correctly.

The MedianHeap class must have a destructor with the signature:

template <typename T>

MedianHeap<T>::~MedianHeap() ;

The destructor must deallocate any dynamically allocated memory.

It is possible that your class design allows your MedianHeap to use the compiler-supplied destructor. In that case, you do not have to write a explicit destructor, but it is still your responsibility to make sure that the compiler-supplied destructor works correctly.

The MedianHeap class must have an overloaded assignment operator with the signature:

template <typename T>

const MedianHeap<T>& MedianHeap<T>::operator=(const MedianHeap<T>& rhs) ;

The assignment operator must deallocate memory in the host object and copy the rhs MedianHeap into the host.

It is possible that your class design allows your MedianHeap to use the compiler-supplied assignment operator. In that case, you do not have to write a explicit assignment operator, but it is still your responsibility to make sure that the compiler-supplied assignment operator works correctly.

The MedianHeap class must have a member function called size() that returns the total number of items in the MedianHeap. The size() function must have the signature:

template <typename T>

int MedianHeap<T>::size() ;

The size() function must run in constant time to receive full credit.

The MedianHeap class must have a member function called capacity() that returns the maximum number of items that can be stored in the MedianHeap. The capacity() function must have the signature:

template <typename T>

int MedianHeap<T>::capacity() ;

The capacity() function must run constant time to receive full credit.

The MedianHeap class must have a member function called insert() that adds the item given in the parameter to the MedianHeap. The insert() function must have the signature:

template <typename T>

void MedianHeap<T>::insert(const T& item) ;

If the MedianHeap cannot hold any more items (i.e., it has reached capacity), then insert() should throw an out_of_range exception.

The insert() function must run in O(log n) time to receive full credit.

The MedianHeap class must have member functions called getMedian(), getMin() and getMax() that returns a copy of an object with the median, minimum and maximum keys, respectively. These functions should have the signatures:

template <typename T>

T MedianHeap<T>::getMedian() ;

template <typename T>

T MedianHeap<T>::getMin() ;

template <typename T>

T MedianHeap<T>::getMax() ;

These functions should throw an out_of_range exception if the MedianHeap is empty. For full credit, these functions must run in constant time.

Note that in computer science, the median value is always one of the stored values. In case the number of items in the MedianHeap is even, either one of the two objects closest to the middle is acceptable as the median value. Do not average these two values. (If the type T does not have the + operator defined, you cannot add them anyway.)

The MedianHeap class must have a member function called deleteItem with the following signature:

template <typename T>

bool MedianHeap<T>::deleteItem(T& givenItem, bool (*equalTo) (const T&, const T&) ) ;

The deleteItem() function should look for an item stored in the MedianHeap that is equal to the givenItem parameter according to the equalTo() function. The equalTo() function does not have to determine equality based on the key value used by the MedianHeap. For example, in the City class given in the test programs below, the MedianHeap might be keyed on the population of the cities, but we may invoke the deleteItem() function to remove a city with a particular name. In that case, the equalTo() function might compare the name and state of the city. You may not assume that givenItem has a valid entry in every field. If we are removing a city based on its name, the population field might have a garbage value. So, you will need to check both the min heap and the max heap to see if there is an item that matches.

If deleteItem() successfully finds and removes an item, it should return true, and copy the values of the found item in the givenItem parameter. (That is why the parameter is passed by reference.) In the City class example, you should be able to delete “Anchorage, AK” from the MedianHeap and obtain its statistics (latitude, longitude, population and elevation) from the reference parameter.

If deleteItem() does not find an item to remove, it should return false. The MedianHeap data structure should not be modified in this case. The parameter givenItem should also not be modified.

If the MedianHeap is empty, the deleteItem() function should throw an out_of_range exception.

The deleteItem() function must run in O(n) time to receive full credit.

The MedianHeap class must have a member function called dump that prints out the contents of the MedianHeap including the positions of each key in the max heap and the min heap. See formatting examples in the sample output below. The dump() function must have the signature:

template <typename T>

void MedianHeap<T>::dump() ;

Calling dump() should not modify the data structure in any way.

The MedianHeap class must have four member functions used for grading with the following signatures:

template <typename T>

int MedianHeap<T>::maxHeapSize() ;

template <typename T>

int MedianHeap<T>::minHeapSize() ;

template <typename T>

T MedianHeap<T>::locateInMaxHeap(int pos) ;

template <typename T>

T MedianHeap<T>::locateInMinHeap(int pos) ;

These functions are used for grading purposes. So, you should make sure that they are implemented in any submission.

The functions maxHeapSize() and minHeapSize() should return the number of items in the max heap and min heap. It should be the case that the sum of these two sizes equals the value returned by MedianHeap::size(). (I.e., don’t have any items, such as the median item, set aside and not stored in one of the two heaps.)

The functions locateInMaxHeap() and locateInMinHeap() returns a copy of the item in position pos in the max heap and min heap, respectively. Remember that you are required to store the items starting at index 1. (The slot in index 0, is unused.)

If the pos parameter has an invalid index, these functions should throw an out_of_range exception.

Your code must run without segmentation faults and without memory leaks. For grading purposes, memory leaks are considered as bad as segmentation faults. This is because many segmentation faults are cause by poorly written destructors. A program with an empty destructor might avoid some segmentation faults but will leak memory horribly. Thus, not implementing a destructor or not deleting unused memory must incur a penalty that is equivalent to a segmentation fault.

Test Programs

Here are some test programs for checking the correctness of your code. As usual, you should also do your own testing. Your programs should compile and run with the provided tests without seg faulting or leaking memory. However, good runs with these programs do not guarantee that your code is bug free.

Some of the code use the City class, defined and implemented: City.h and City.cpp.

A simple test using MedianHeap<int>:

p4test1.cpp and p4test1.txt.

A bigger test using MedianHeap<int>. This version also runs the sanityCheck() function:

p4test2.cpp and p4test2.txt.

A simple test using MedianHeap<City> (compile with City.cpp):

p4test3.cpp and p4test3.txt.

This test shows that your MedianHeap class can handle function pointers. One of the MedianHeaps is keyed on the population of the cities, while the second one is keyed on the elevation.

Similar to the previous case, except this program uses City pointers (i.e., MedianHeap<City *>:

p4test4.cpp and p4test4.txt.

A bigger test using the City class. This program expects to receive data in standard input. Test data are in:

CityData10.txt, CityDataMD.txt and CityDataUS.txt.

p4test5.cpp and p4test5.txt.

You can run the program in Unix this way:

linux1% g++ City.cpp p4test5.cpp -o t5.out

linux1% ./t5.out < CityData10.txt

linux1% ./t5.out < CityDataMD.txt

linux1% ./t5.out < CityDataUS.txt

Testing the copy constructor, destructor and assignment operator. You should run this version in valgrind to check for memory leaks.

p4test6.cpp and p4test6.txt.

Timing runs. This program takes a command line argument that specifies the number of repeititions of the timing runs.

p4test7.cpp and p4test7.txt.

Implementation Notes

Here we list some recommendations and point out some traps and pitfalls.

Template implementations can either be in a separate file called MedianHeap.cpp or they can be placed in MedianHeap.h after the templated class definitions. (There are pros and cons for each style.) Choose whichever style you prefer, but do recall that in the first method you have to #include the .cpp file at the bottom of the .h file and that the .cpp file has to be guarded.

Please review how to implement templates. Do this before you start coding, and certainly do this before you post questions on Piazza.

Do not define Heap as a nested class within MedianHeap. Yes, it actually makes sense, but the C++ syntax for templated inner classes are horrendous. Just don’t.

You should think about which data members and member functions belong to MedianHeap and which ones belong to Heap. Good design will make your coding much simpler. Simpler code is easier to debug.

Remember that we are not using the array slot in index 0. Make sure that you allocate the correct amount of space for the min heap and the max heap arrays. Also, note that one of the arrays will be bigger than the other one if there’s an odd number of items in the MedianHeap.

Potential Complex Edge Case to Watch For: If the dataset consists of a set of items with the same key value (which is completely allowed in our project), a situation can arise that might cause you to report the min item and max item incorrectly:

Suppose that we are using the City class and we have data from some county in Kansas where all of the cities have the same elevation. Depending on how you keep track of the min elevation and how you break ties, it is possible that the min item was the root of the max heap which then gets moved from the max heap to the min heap (to rebalance heap sizes) and then gets removed from the min heap by the deleteItem() function.

In that situation, you might end up reporting a city as having the min elevation that is not a city that is still in the MedianHeap. This is a very unusual situation because the entire max heap will have to have the same elevation in order for the max item to have the min elevation.

You can fix this situation by making “deleteRoot” of the max heap check to see if happens to equal the min item and make the last item of the heap the new min item, because the only way that can happen is if the entire heap has the same elevation.

Here’s an example program that illustrates this situation. It is tuned to expose the bug in a particular implementation. If you want to see if you have the same bug, you have to change the program around.

Main program that adds 10 cities, all with eleveation 50: obscure.cpp.

In the buggy version, the output shows that “Town 1” gets moved from the min heap to the max heap and then back to the min heap. Town 1 eventually gets deleted, but is still reported as the min and max items: buggy.txt.

The fixed version detects when the root of a max heap happens to be the min value (and when the root of a min heap happens to be the max value): debugged.txt.

What to Submit

Before submitting, remove all extraneous output statements from your program. (Just comment them out.) Your typescript file should have only a few hundred lines of output; if they have significantly more than that, you’re doing something wrong, and We will likely not look at the typescript file (and you will therefore not get credit for it!).

You must submit the following files to the proj4 directory.

MedianHeap.h

MedianHeap.cpp

If you put the implementation of the MedianHeap class in the .h file, then you only have to submit MedianHeap.h.

If you followed the instructions in the Project Submission page to set up your directories, you can submit your code using this Unix command command.

cp MedianHeap.h MedianHeap.cpp ~/cs341proj/proj4/

Your code should compile with the test programs using these Unix commands on GL:

linux1% g++ -I . ../../00Proj4/p4test1.cpp -o t1.out

linux1% g++ -I . ../../00Proj4/p4test2.cpp -o t2.out

linux1% g++ -I ../../00Proj4/ -I . ../../00Proj4/City.cpp ../../00Proj4/p4test3.cpp -o t3.out

linux1% g++ -I ../../00Proj4/ -I . ../../00Proj4/City.cpp ../../00Proj4/p4test4.cpp -o t4.out

linux1% g++ -I ../../00Proj4/ -I . ../../00Proj4/City.cpp ../../00Proj4/p4test5.cpp -o t5.out

linux1% g++ -I ../../00Proj4/ -I . ../../00Proj4/City.cpp ../../00Proj4/p4test6.cpp -o t6.out

linux1% g++ -I . ../../00Proj4/p4test7.cpp -o t7.out

Use the Unix script command to record yourself compiling these programs. Then, use p4test7.cpp to perform some timing runs:

linux1% time ./t7.out 15000

1.038u 0.006s 0:01.04 99.0% 0+0k 0+0io 0pf+0w

linux1% time ./t7.out 30000

2.237u 0.014s 0:02.25 99.5% 0+0k 0+0io 0pf+0w

linux1% time ./t7.out 60000

4.365u 0.023s 0:04.39 99.7% 0+0k 0+0io 0pf+0w

Finally, remember to exit from script.