CS285 -- Lab 2: HTML/XML Validator




Winter 2001-2002

Overview

In this lab, you will write a program that validates the placement of HTML/XML markup commands.

Acknowledgment

This laboratory was developed by Dr. Chris Taylor.

Procedures

Text markup languages like HTML and XML make use of markup commands that are enclosed in less than/greater than symbols (e.g., <P>). All of the markup commands in XML and nearly all of the markup commands in HTML require an opening and a closing command. The closing command is identical to the opening command with the addition of a forward slash immediately following the less than symbol. For example, in HTML, <B>Data Structures</B> will cause an HTML browser to display "Data Structures" in bold text.
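As a starting point, each markup command can be classified as opening or closing by checking for a slash right after the less than symbol. The following is one possible C++ approach, not a required design. `TagInfo` and `parseTag` are hypothetical names, and the sketch assumes the caller has already extracted one complete `<...>` span.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type: the command name and whether it closes.
struct TagInfo {
    std::string name;   // command name, e.g. "B"
    bool closing;       // true for "</B>", false for "<B>"
};

// Classify one markup command, given with its '<' and '>'.
// Assumes `tag` is a complete "<...>" span located by the caller.
TagInfo parseTag(const std::string& tag) {
    TagInfo info{"", false};
    std::size_t pos = 1;                        // skip the leading '<'
    if (pos < tag.size() && tag[pos] == '/') {  // closing commands start with "</"
        info.closing = true;
        ++pos;
    }
    // The command name runs until whitespace or the closing '>'.
    while (pos < tag.size() && tag[pos] != '>' &&
           !std::isspace(static_cast<unsigned char>(tag[pos]))) {
        info.name += tag[pos];
        ++pos;
    }
    return info;
}
```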

It should be noted that the opening markup command may contain modifiers. For example, <FONT COLOR="#CC0000"> issues a font change command and indicates that the font should be colored red. All modifiers (there may be more than one) should be separated by spaces. The corresponding closing command should contain only the markup command and no modifiers. For this example, the closing markup command should be </FONT>.
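A modifier value may itself contain spaces (FACE="Helvetica, Arial" in the examples below), so splitting an opening command on every space is not quite enough. This is a minimal sketch. It assumes modifier values are enclosed in double quotes, as in this lab's examples, and `splitCommand` is a hypothetical helper name.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split an opening command such as
//   <FONT COLOR="#CC0000" FACE="Helvetica, Arial">
// into its command name followed by its modifiers.
// Assumption: spaces inside double quotes do not separate modifiers.
std::vector<std::string> splitCommand(const std::string& tag) {
    std::vector<std::string> parts;
    std::string current;
    bool inQuotes = false;
    // Skip the leading '<' and stop before a trailing '>'.
    std::size_t end = tag.size();
    if (end > 0 && tag[end - 1] == '>') --end;
    for (std::size_t i = 1; i < end; ++i) {
        char c = tag[i];
        if (c == '"') inQuotes = !inQuotes;   // toggle on each quote mark
        if (!inQuotes && std::isspace(static_cast<unsigned char>(c))) {
            // Unquoted whitespace ends the current token.
            if (!current.empty()) { parts.push_back(current); current.clear(); }
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```

Only the first element, the command name, needs to be compared with the matching closing command.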

It is required that the markup commands be cleanly nested. That is, all markup commands must be terminated in the reverse of the order in which they were begun. Here are some examples:

  • <TT><STRONG>The</STRONG></TT>
    looks like:
    The
  • cow <TT>was <STRONG>very </STRONG>tired</TT>.
    looks like:
    cow was very tired.
  • cow <TT>was </TT>very <FONT COLOR="#CC0000" FACE="Helvetica, Arial">tired</FONT>.
    looks like:
    cow was very tired.
  • <TT><STRONG>The</TT></STRONG> -- illegal
  • cow <TT>was </STRONG>very <STRONG>tired</TT>. -- illegal
  • cow <TT>was </FONT>very <TT>tired</FONT>. -- illegal
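The stack-based check required below can be tried on a single string first. In this sketch each opening command is pushed, and each closing command must match the command on top of the stack. It assumes every `<...>` is a markup command that needs a matching close, so none of the optional extensions are handled. `isCleanlyNested` is a hypothetical name.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Return true if every markup command in `text` is cleanly nested.
// Assumes every "<...>" requires a matching closing command.
bool isCleanlyNested(const std::string& text) {
    std::stack<std::string> open;   // names of commands not yet closed
    std::size_t pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        std::size_t end = text.find('>', pos);
        if (end == std::string::npos) return false;  // unterminated command
        std::size_t i = pos + 1;
        bool closing = (i < end && text[i] == '/');
        if (closing) ++i;
        // The command name stops at whitespace, so modifiers are ignored.
        std::string name;
        while (i < end && !std::isspace(static_cast<unsigned char>(text[i])))
            name += text[i++];
        if (!closing) {
            open.push(name);                 // remember the opening command
        } else if (open.empty() || open.top() != name) {
            return false;                    // does not close the most recent open
        } else {
            open.pop();
        }
        pos = end + 1;
    }
    return open.empty();                     // leftovers were never closed
}
```

Applied to the six examples above, it accepts the first three and rejects the three marked illegal.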
Your program should make use of the stack data structure to ensure that all markup commands are cleanly nested. The program should
  • Accept a filename or list of filenames from the command line.
  • Indicate the validity of each file.
    If the file is not valid, display the unmatched opening markup command and the line number on which it was found, and the unmatched closing markup command and the line number on which it was found.
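One way to meet these requirements is to read each file character by character. That way the line number on which every markup command begins can be stored on the stack alongside its name. This sketch works under the same assumptions as a basic validator, with no optional extensions. `Report`, `validate`, `validateText`, and `reportFile` are hypothetical names, and the output wording is an assumption, since the lab only specifies what information to display.

```cpp
#include <cctype>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <stack>
#include <string>
#include <utility>

// Hypothetical result type for one file.
struct Report {
    bool valid = true;
    std::string openName;   // unmatched opening command ("" if none)
    int openLine = 0;
    std::string closeName;  // unmatched closing command ("" if none)
    int closeLine = 0;
};

// Scan a stream, recording the line on which each markup command starts.
// Assumes every "<...>" needs a matching close (no optional extensions).
Report validate(std::istream& in) {
    std::stack<std::pair<std::string, int>> open;  // (name, line) of open commands
    Report r;
    int line = 1;
    char c;
    while (in.get(c)) {
        if (c == '\n') { ++line; continue; }
        if (c != '<') continue;
        int tagLine = line;                 // line on which this command starts
        std::string body;                   // text between '<' and '>'
        while (in.get(c) && c != '>') {
            if (c == '\n') ++line;          // a command may span lines
            body += c;
        }
        bool closing = !body.empty() && body[0] == '/';
        std::size_t i = closing ? 1 : 0;
        std::string name;
        while (i < body.size() && !std::isspace(static_cast<unsigned char>(body[i])))
            name += body[i++];
        if (!closing) {
            open.emplace(name, tagLine);
        } else if (!open.empty() && open.top().first == name) {
            open.pop();                     // cleanly closed
        } else {
            r.valid = false;                // mismatch, or close with nothing open
            if (!open.empty()) { r.openName = open.top().first; r.openLine = open.top().second; }
            r.closeName = name;
            r.closeLine = tagLine;
            return r;
        }
    }
    if (!open.empty()) {                    // an opening command was never closed
        r.valid = false;
        r.openName = open.top().first;
        r.openLine = open.top().second;
    }
    return r;
}

// Convenience wrapper, handy for testing without files.
Report validateText(const std::string& text) {
    std::istringstream in(text);
    return validate(in);
}

// Print the validity of one file.
void reportFile(const std::string& filename, std::ostream& out) {
    std::ifstream in(filename);
    if (!in) { out << filename << ": cannot open file\n"; return; }
    Report r = validate(in);
    if (r.valid) { out << filename << ": valid\n"; return; }
    out << filename << ": NOT valid\n";
    if (!r.openName.empty())
        out << "  unmatched opening <" << r.openName << "> on line " << r.openLine << '\n';
    if (!r.closeName.empty())
        out << "  unmatched closing </" << r.closeName << "> on line " << r.closeLine << '\n';
}
```

A main function could call reportFile(argv[i], std::cout) for each i from 1 through argc - 1. A single run would then process every file named on the command line.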

Although not required, you may wish to add functionality to your program to accommodate the following:

  • In HTML, there are a few markup commands that do not require a closing command (e.g., <BR> does not have a corresponding </BR>).
  • In XML, markup commands that do not require a closing command should contain a slash immediately before the greater than symbol (e.g., <BR/>).
  • In HTML and XML, comments are specified with a beginning markup command, <!--, and an ending markup command, -->.
  • In XML, anything between <![CDATA[ and ]]> should be ignored.
  • In XML, markup commands that begin and end with a question mark should be ignored, e.g., ignore markup commands like: <?xml version="1.0"?>.
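If you attempt these optional extensions, one approach is to classify each piece of markup before it reaches the stack. Comments, CDATA sections, and `<?...?>` commands are skipped, and commands that need no closing command are never pushed. Comments and CDATA sections are scanned to their own terminators, because they may contain a `>` character. This is a sketch under stated assumptions. The names are hypothetical, and the list of HTML commands without a closing command is only a partial sample. That list is consulted only when `html` is true, since XML uses the `/>` form instead.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Kinds of markup a validator might meet once the extensions are added.
enum class Markup { Open, Close, Empty, Ignored, Error };

// Hypothetical, partial list of HTML commands with no closing command.
bool isHtmlEmptyCommand(std::string name) {
    static const std::set<std::string> commands = {"BR", "HR", "IMG", "INPUT", "META", "LINK"};
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return commands.count(name) > 0;
}

// Classify the markup starting at text[pos] (which must be '<'), set `next`
// to the index just past it, and store the command name in `name`.
Markup classify(const std::string& text, std::size_t pos, bool html,
                std::size_t& next, std::string& name) {
    // Skip to a multi-character terminator such as "-->".
    auto skipTo = [&](const std::string& terminator) {
        std::size_t end = text.find(terminator, pos);
        if (end == std::string::npos) return Markup::Error;
        next = end + terminator.size();
        return Markup::Ignored;
    };
    if (text.compare(pos, 4, "<!--") == 0) return skipTo("-->");
    if (text.compare(pos, 9, "<![CDATA[") == 0) return skipTo("]]>");
    if (text.compare(pos, 2, "<?") == 0) return skipTo("?>");
    std::size_t end = text.find('>', pos);
    if (end == std::string::npos) return Markup::Error;
    next = end + 1;
    bool closing = text[pos + 1] == '/';
    std::size_t i = pos + (closing ? 2 : 1);
    name.clear();
    // The name stops at whitespace or at the '/' of "<BR/>".
    while (i < end && text[i] != '/' && !std::isspace(static_cast<unsigned char>(text[i])))
        name += text[i++];
    if (closing) return Markup::Close;
    if (text[end - 1] == '/' || (html && isHtmlEmptyCommand(name)))
        return Markup::Empty;               // <BR/> in XML, or <BR> in HTML
    return Markup::Open;
}
```

Only Open results would be pushed and only Close results compared against the stack. Ignored and Empty results are simply skipped by continuing the scan at `next`.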

You should test your program by running it from the DOS command prompt on the following four files: sample1.xml, sample1.htm, sample2.xml, and this webpage (lab2.htm). You should run your program only once, naming all four files on the command line. Be sure to include the results of your program in your report.

Lab report (due 4:00pm, Friday of week 4)

Your report should include:

  • A discussion of how you approached the problem. This should contain a sufficient level of detail to convince your instructor that you were thinking as you worked on this assignment.
  • Documented source code for your program.
  • Sample program output.
  • A brief description of any problems you encountered or questions you have regarding the lab.
  • A summary of your activity log indicating how much time you spent on the assignment. In addition to the total time spent on the project, please report the time in the following categories:
    • Design
    • Coding
    • Debug (before you think it's working)
    • Test (after you think it's working)
    • Documentation
    • Other
  • Any suggestions you have for how the lab could be improved.

As with any report you submit, correct spelling and grammar are required. In addition, your report should be submitted electronically following the Electronic submission guidelines. (You may wish to consult the sample report before submitting your report.) Be sure to keep copies of all your files, in case something gets lost. It may be wise to keep a diskette backup as well.

Your grade will depend on the quality of your design, the clarity of your code and documentation, and whether your program produces the correct results. If you have any questions, consult your instructor.

© 2001-2002 Dr. Christopher C. Taylor Office: CC-27C Phone: 277-7339 Last Updated: January 6, 2002
I am responsible for all content posted on these pages; MSOE is welcome to share these opinions but may not want to.