CS385 -- Lab 4: Compressing
Fall 2004
Objectives Addressed
- Be able to apply asymptotic time complexity analysis to choose among
competing algorithms.
- Understand how graph and tree structures are implemented.
- Be familiar with engineering applications for many of the fundamental
algorithms discussed in the course.
Purpose
This lab assignment is designed to provide exposure to an example
of a greedy algorithm and to develop more advanced software design skills.
Assignment
You are to design a C++ program that will compress and uncompress the
words.txt file. Your program
should allow for two forms of compression:
- Optimal fixed length codeword (FLC) compression
- Optimal variable length codeword (VLC) compression using Huffman
Coding
You may assume that the file consists of a total of 28 symbols (newline,
end of file, and the twenty-six lowercase letters of the English alphabet).
If you choose to make your program capable of handling a larger set of
symbols, be sure to mention this in your report. Although not required,
you may wish to consider using command line arguments for the user
interface.
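With 28 symbols, a fixed length code needs 5 bits per symbol, since
2^4 = 16 < 28 <= 32 = 2^5. The following is a minimal sketch of one way
to pack such codewords into bytes, not a required design. The names
flcBits, symbolIndex, and packFlc, the symbol numbering, and the
most-significant-bit-first packing order are all assumptions made for
illustration; the sketch also assumes an ASCII-compatible character set
in which 'a' through 'z' are contiguous.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol numbering (an assumption, not part of the handout):
// 'a'..'z' -> 0..25, newline -> 26, end-of-file marker -> 27.
const int kNumSymbols = 28;
const int kEofIndex = 27;

// Smallest number of bits b such that 2^b >= symbolCount.
int flcBits(int symbolCount) {
    assert(symbolCount > 0);
    int bits = 0;
    while ((1 << bits) < symbolCount) {
        ++bits;
    }
    return bits;
}

// Map a character to its index; returns -1 for unsupported characters.
int symbolIndex(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c == '\n') return 26;
    return -1;
}

// Pack the text, followed by the end-of-file marker, into bytes.
// Bits are written most significant first; the last byte is zero padded.
std::vector<std::uint8_t> packFlc(const std::string& text) {
    const int bits = flcBits(kNumSymbols);
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // low bitCount bits are not yet written
    int bitCount = 0;

    auto emit = [&](int index) {
        buffer = (buffer << bits) | static_cast<std::uint32_t>(index);
        bitCount += bits;
        while (bitCount >= 8) {
            out.push_back(
                static_cast<std::uint8_t>((buffer >> (bitCount - 8)) & 0xFF));
            bitCount -= 8;
        }
    };

    for (char c : text) {
        int index = symbolIndex(c);
        if (index >= 0) emit(index);  // unsupported characters are skipped
    }
    emit(kEofIndex);

    if (bitCount > 0) {  // flush the final partial byte
        out.push_back(
            static_cast<std::uint8_t>((buffer << (8 - bitCount)) & 0xFF));
    }
    return out;
}
```

For example, packFlc("ab\n") encodes four codewords (a, b, newline, end
of file) in 20 bits, which occupy three bytes after padding.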
You should consider the following questions when preparing your
report:
- Which method produces a smaller file? Is this what you would expect?
- Is it possible that an input file could be constructed that would
cause the better of the two methods in this case to produce the larger
output?
- Would sorting the words.txt file before compression
improve the results of either compression scheme?
- What is the longest codeword needed when Huffman compression is applied to the
words.txt file?
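Questions about output size and codeword length can be explored from the
symbol frequency counts alone, before any compressed file is written. The
sketch below shows one possible way to do this: it applies the greedy
Huffman step (repeatedly merging the two lightest subtrees) and reports
the codeword length of each symbol. The function names huffmanLengths and
totalBits and the index-based tree layout are assumptions made for
illustration, not a prescribed design.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the Huffman codeword length (in bits) for each symbol, given its
// frequency count. Symbols with a count of zero get length 0 (no codeword).
std::vector<int> huffmanLengths(const std::vector<long>& freq) {
    struct Node { long weight; int left; int right; };  // -1 means no child
    std::vector<Node> nodes;
    std::vector<int> leafOf(freq.size(), -1);

    // Min-heap of (weight, node index): the greedy step always merges the
    // two lightest subtrees.
    typedef std::pair<long, int> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    for (std::size_t s = 0; s < freq.size(); ++s) {
        if (freq[s] > 0) {
            leafOf[s] = static_cast<int>(nodes.size());
            nodes.push_back(Node{freq[s], -1, -1});
            heap.push(Entry(freq[s], leafOf[s]));
        }
    }

    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        int parent = static_cast<int>(nodes.size());
        nodes.push_back(Node{a.first + b.first, a.second, b.second});
        heap.push(Entry(a.first + b.first, parent));
    }

    // A parent is always created after its children, so it has a larger
    // index. Walking from the root (last node) downward sets every depth.
    std::vector<int> depth(nodes.size(), 0);
    for (int i = static_cast<int>(nodes.size()) - 1; i >= 0; --i) {
        if (nodes[i].left >= 0) {
            depth[nodes[i].left] = depth[i] + 1;
            depth[nodes[i].right] = depth[i] + 1;
        }
    }

    std::vector<int> lengths(freq.size(), 0);
    for (std::size_t s = 0; s < freq.size(); ++s) {
        if (leafOf[s] >= 0) {
            // A lone symbol sits at the root; give it a 1-bit codeword.
            lengths[s] = depth[leafOf[s]] > 0 ? depth[leafOf[s]] : 1;
        }
    }
    return lengths;
}

// Total payload size in bits: sum of count * codeword length.
long totalBits(const std::vector<long>& freq, const std::vector<int>& lengths) {
    assert(freq.size() == lengths.size());
    long bits = 0;
    for (std::size_t s = 0; s < freq.size(); ++s) {
        bits += freq[s] * lengths[s];
    }
    return bits;
}
```

Comparing totalBits against the fixed length payload (bits per codeword
times the number of symbols in the file) gives the two payload sizes.
Neither figure includes whatever header a real file format would need to
store the code table.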
Design Decisions
While the concepts implemented in this assignment are not difficult,
the details involved in implementing them are not trivial. Be sure to spend
adequate time in the design stages before moving on. I have intentionally
left many of those design decisions for you to discover.
Week 10 demo 1 (due beginning of week 10, lecture 1)
You will be required to demonstrate your FLC algorithm for compressing and
uncompressing a text file.
Week 10 demo 2 (due beginning of week 10, lecture 3)
You will be required to demonstrate your VLC algorithm for compressing and
uncompressing a text file.
Lab report (due 4:00pm, Friday of week 10)
Each pair should submit one lab report. Your report should include:
- Discussion containing:
- An analysis of appropriate patterns
- An analysis of the compression rates (do they make sense?)
- Answers to the questions
- Your experience/impressions of the CSP
- How you and your partner worked (or didn't work) together
- Any problems you encountered
- Etc...
- Program output -- include the following:
- The frequency count of each letter found in words.txt (within
<code><![CDATA[ and ]]></code> formatting commands)
- The VLC table which indicates the binary codeword used for each symbol
found in the input file (also within appropriate formatting
commands)
- The binary VLC compressed file of the words.txt file (as a
separate file that is linked within the XML report following the
naming convention 385MSOEloginL4.vlc).
- CSP spreadsheet (continue with the spreadsheet you used for
lab 3, but use the naming convention:
385MSOEloginL4.xls).
- Documented source code
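The frequency count requested in the program output can be gathered in a
single pass over the input. The following is a minimal sketch under
stated assumptions: the names countSymbols and printCounts, the slot
layout, and the output format are hypothetical, and characters outside
the assumed alphabet are simply ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Slots 0..25 hold the counts for 'a'..'z'; slot 26 holds the newline count.
// Any other character is ignored (an assumption of this sketch).
std::array<long, 27> countSymbols(std::istream& in) {
    std::array<long, 27> counts{};  // value-initialized to all zeros
    char c;
    while (in.get(c)) {
        if (c >= 'a' && c <= 'z') {
            ++counts[static_cast<std::size_t>(c - 'a')];
        } else if (c == '\n') {
            ++counts[26];
        }
    }
    return counts;
}

// Print one "symbol count" pair per line; newline is shown as \n.
void printCounts(const std::array<long, 27>& counts, std::ostream& out) {
    for (std::size_t i = 0; i < 26; ++i) {
        out << static_cast<char>('a' + i) << ' ' << counts[i] << '\n';
    }
    out << "\\n " << counts[26] << '\n';
}
```

Passing a std::ifstream opened on words.txt to countSymbols, and
std::cout to printCounts, would produce a table suitable for pasting
into the report.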
As with any report you submit, correct spelling and grammar are
required. In addition, your report should be submitted electronically
following the Electronic Submission
Guidelines. (You may wish to consult the
sample report before submitting your
report.) Be sure to keep copies of all your files, in case something
gets lost.
If you have any questions, consult the instructor.