CS286 -- Lab 3: Huffman Compression




Spring Quarter 2000

Purpose

This lab assignment is designed to give you experience with a greedy algorithm and to develop more advanced software design skills.

Assignment

You are to design a C++ program that will compress and uncompress the words.txt file. Your program should allow for two forms of compression:

  • Optimal fixed length codeword (FLC) compression
  • Optimal variable length codeword (VLC) compression using Huffman Coding

You may assume that the file consists of a total of 28 symbols (newline, end of file, and the twenty-six lowercase letters of the English alphabet). If you choose to make your program capable of handling a larger set of symbols, be sure to mention this in your report. Although not required, you may wish to consider using command line arguments for the user interface.
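As a rough illustration of the fixed length case, 28 symbols need 5 bits each, since 2^4 = 16 is too few and 2^5 = 32 is enough. The sketch below is one possible approach, not a required design. The symbol numbering ('a'..'z' as 0..25, newline as 26, end of file as 27), the `BitWriter` class, and the function names are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Smallest number of bits that can distinguish symbolCount symbols,
// i.e. ceil(log2(symbolCount)).
unsigned bitsPerSymbol(unsigned symbolCount) {
    assert(symbolCount > 0);
    unsigned bits = 0;
    while ((1u << bits) < symbolCount) ++bits;
    return bits;
}

// Hypothetical numbering: 'a'..'z' -> 0..25, newline -> 26, end of file -> 27.
const unsigned kNewline = 26;
const unsigned kEndOfFile = 27;

unsigned symbolIndex(char c) {
    if (c == '\n') return kNewline;
    assert(c >= 'a' && c <= 'z');
    return static_cast<unsigned>(c - 'a');
}

// Collects bits into bytes, most significant bit first.
// The final byte is padded with zero bits.
class BitWriter {
public:
    // Appends the low `width` bits of `value`.
    void write(unsigned value, unsigned width) {
        for (unsigned i = width; i-- > 0;) {
            if (bitCount_ % 8 == 0) bytes_.push_back(0);
            if ((value >> i) & 1u) {
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bitCount_ % 8));
            }
            ++bitCount_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bitCount_ = 0;
};

// Encodes every character of `text` with the same width, then appends
// the end-of-file symbol so a decoder knows where the padding starts.
std::vector<std::uint8_t> compressFixed(const std::string& text) {
    const unsigned width = bitsPerSymbol(28);
    BitWriter writer;
    for (char c : text) writer.write(symbolIndex(c), width);
    writer.write(kEndOfFile, width);
    return writer.bytes();
}
```

With this numbering, `compressFixed("ab\n")` writes four 5-bit codes (20 bits), which occupy three bytes after padding. Writing an explicit end-of-file code is one way to let the decoder ignore the padding bits; storing the symbol count in a header would be another.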

You may wish to consider the following questions when preparing your report:

  • Which method produces a smaller file? Is this what you would expect?
  • Could an input file be constructed for which the method that does better on words.txt produces the larger output?
  • Would sorting the words.txt file before compression improve the results of either compression scheme?
  • What is the length of the longest codeword needed using Huffman compression?
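To explore the last question, it can help to compute codeword lengths directly from a table of symbol frequencies. The following is a minimal sketch of the greedy Huffman construction (repeatedly merge the two lowest-weight nodes), assuming the frequencies have already been counted. The function names `huffmanCodeLengths` and `longestCodeword` and the parent-index tree layout are choices made for this example only; your own class design may look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the Huffman codeword length of each symbol given its frequency.
// Symbols with frequency zero get length 0 (no codeword is assigned).
std::vector<unsigned> huffmanCodeLengths(const std::vector<std::uint64_t>& freq) {
    struct Node { std::uint64_t weight; int parent; };
    std::vector<Node> nodes;
    std::vector<int> leafOf(freq.size(), -1);

    // Min-heap of (weight, node index); ties are broken by node index.
    typedef std::pair<std::uint64_t, std::size_t> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    for (std::size_t s = 0; s < freq.size(); ++s) {
        if (freq[s] == 0) continue;
        leafOf[s] = static_cast<int>(nodes.size());
        heap.push(Entry(freq[s], nodes.size()));
        nodes.push_back(Node{freq[s], -1});
    }

    // Greedy step: merge the two lightest nodes until one tree remains.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        const std::size_t parent = nodes.size();
        nodes.push_back(Node{a.first + b.first, -1});
        nodes[a.second].parent = static_cast<int>(parent);
        nodes[b.second].parent = static_cast<int>(parent);
        heap.push(Entry(a.first + b.first, parent));
    }

    // A codeword's length equals the depth of its leaf in the tree.
    std::vector<unsigned> lengths(freq.size(), 0);
    for (std::size_t s = 0; s < freq.size(); ++s) {
        if (leafOf[s] < 0) continue;
        unsigned depth = 0;
        for (int n = nodes[leafOf[s]].parent; n != -1; n = nodes[n].parent) ++depth;
        lengths[s] = (depth == 0) ? 1 : depth;  // a lone symbol still needs one bit
    }
    return lengths;
}

// Length of the longest codeword, or 0 if no symbol occurs.
unsigned longestCodeword(const std::vector<unsigned>& lengths) {
    assert(!lengths.empty());
    return *std::max_element(lengths.begin(), lengths.end());
}
```

For the frequencies 45, 13, 12, 16, 9, 5 this yields lengths 1, 3, 3, 3, 4, 4, so the longest codeword is 4 bits. The lengths for words.txt depend on its actual frequency counts, which you will need to measure yourself.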

Design Decisions

While the concepts implemented in this assignment are not difficult, the details involved in implementing them are not trivial. Be sure to spend adequate time in the design stage before moving on. I have intentionally left many of the design decisions for you to discover.

Preliminary deliverable (due 11:00pm, the day prior to week 7 lab)

By the end of the first week you should have the FLC method implemented. Each team should email (as an attachment) the binary file corresponding to the FLC compression of the words.txt file. Submitting this deliverable on time is worth 20% of the total assignment grade. You will be asked to demonstrate the creation of this file at the beginning of week 7 lab.

Lab report (due 11:00pm, the day prior to week 8 lab)

The lab report should be self-contained. That is, it should be possible for someone to understand what you did and why without seeing anything other than your report. Your report should include:

  • Purpose
  • Problem Statement
  • Procedure -- what approach you used to solve the problem
  • Documented source code
  • Discussion (your analysis of the compression rates, answers to the questions, the reasons for defining the classes as you did, any problems you encountered, etc.)
  • Program output -- include the following:
    1. The frequency count of each letter found in words.txt (within \begin{verbatim} and \end{verbatim} formatting commands)
    2. The VLC table which indicates the binary codeword used for each symbol found in the input file (also within appropriate formatting commands)
    3. The binary VLC compressed file of the words.txt file (as an email attachment).
  • CSP level 2.0 documentation (this may be submitted in hardcopy at the beginning of lab)
  • Conclusions (what you learned, suggestions of how the lab could have been better, things you would have done differently, further reactions to the CSP, etc.)
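
The frequency count requested under Program output could be gathered with something like the sketch below. It assumes the 28-symbol alphabet described in the assignment and the same hypothetical numbering used for illustration here ('a'..'z' as 0..25, newline as 26, end of file as 27); the function names and the output layout are assumptions, not a required format.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const std::size_t kSymbolCount = 28;
typedef std::array<unsigned long, kSymbolCount> SymbolCounts;

// Counts each symbol read from `in`. Characters outside the assumed
// alphabet are ignored in this sketch.
SymbolCounts countSymbols(std::istream& in) {
    SymbolCounts counts{};  // all zero
    char c;
    while (in.get(c)) {
        if (c >= 'a' && c <= 'z') ++counts[static_cast<std::size_t>(c - 'a')];
        else if (c == '\n') ++counts[26];
    }
    counts[27] = 1;  // end of file occurs exactly once
    return counts;
}

// Convenience wrapper for trying the counter on an in-memory string.
SymbolCounts countSymbolsInString(const std::string& text) {
    std::istringstream in(text);
    return countSymbols(in);
}

// Prints one "symbol count" pair per line, suitable for a verbatim block.
void printCounts(const SymbolCounts& counts, std::ostream& out) {
    for (std::size_t i = 0; i < 26; ++i) {
        out << static_cast<char>('a' + i) << ' ' << counts[i] << '\n';
    }
    out << "\\n " << counts[26] << '\n';
    out << "EOF " << counts[27] << '\n';
}
```

To count the real file, pass a `std::ifstream` opened on words.txt to `countSymbols`, then call `printCounts(counts, std::cout)`.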

As with any report you submit, correct spelling and grammar are required. In addition, your report should be submitted electronically following the Electronic Submission Guidelines. (You may wish to consult the sample report before submitting your report.) Be sure to keep copies of all your files, in case something gets lost. It may be wise to keep a diskette backup as well.

If you have any questions, consult the instructor.

Office: CC-27C, Phone: 277-7339
Last Updated: April 24, 2000
© 2000 Dr. Christopher C. Taylor