KAIST
EE 209: Programming Structures for EE

Assignment 1: A Word Counting Program

(This assignment borrows some statements and examples from Princeton COS 217's "De-Comment" Program Assignment)

 

Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) a portion of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/Unix programming tools, especially bash, emacs, and gcc209.

Rules

Make sure you study the course Policy web page before doing this assignment or any of the EE 209 assignments. In particular, note that you may consult with the course instructors, lab TAs, mailing list, etc. while doing assignments, as prescribed by that web page. However, there is one exception...

Throughout the semester, each assignment will have an "on your own" part. You must do that part of the assignment completely on your own, without consulting with the course instructors, lab TAs, mailing list, etc., except for clarification of requirements. You might think of the "on your own" part of each assignment as a small take-home exam.

For this assignment, "detecting and reporting unterminated comments" (as described below) is the "on your own" part. That part is worth 10% of this assignment.

The Task

Your task is to write a C program named wc209 that prints the number of lines, words, and characters in the input text fed from standard input to standard ouput. The program behaves similarly to Linux wc, but wc209 skips "commented text"(e.g., text in /* ... */) and does not count such text in the output.

Functionality

Your program should read characters from the standard input stream, and writes the output to the standard output stream and possibly to the standard error stream. Specifically, your program should (1) read text from the standard input stream, (2) writes the number of lines, words, and characters in the input text to the standard output stream with each comment replaced by a space, and (3) writes error and warning messages as appropriate to the standard error stream. A typical execution of your program from a shell might look like this:

./wc209 < somefile.txt 2> errorandwarningmessages
3 13 300
The output (3 13 300) indicates that there are 3 lines, 13 words, and 300 characters in the file, somefile.txt.

Here are a few rules.

In the following examples a space character is shown as "s" and a newline character as "n".

Your program should internally replace each comment with a space. Examples:

Standard Input Stream Internal Representation After Decommenting Standard Output Stream Standard Error Stream
abc/*def*/ghin abcsghin 2s2s8n
abc/*def*/sghin abcssghin 2s2s9n
abcs/*def*/ghin abcssghin 2s2s9n

Your program should define "comment" as in the C99 standard. In particular, your program should consider text of the form (/*...*/) to be a comment. It should not consider text of the form (//...) to be a comment. Example:

Standard Input Stream Internal Representation After Decommenting Standard Output Stream Standard Error Stream
abc//defn abc//defn 2s1s9n

Your program should allow a comment to span multiple lines. That is, your program should allow a comment to contain newline characters. Your program should add blank lines as necessary to preserve the original line numbering. Also, each newline character in comment is counted as one character. Examples:

Standard Input Stream Internal Representation After Decommenting Standard Output Stream Standard Error Stream
abc/*defnghi*/jklnmnon abcsnjklnmnon 4s3s13n
abc/*defnghinjkl*/mnonpqrn abcsnnmnonpqrn 5s3s14n

Your program should not recognize nested comments. Example:

Standard Input Stream Internal Representation After Decommenting Standard Output Stream Standard Error Stream
abc/*def/*ghi*/jkl*/mnon abcsjkl*/mnon 2s2s13n

Your program should detect an unterminated comment. If your program detects end-of-file before a comment is terminated, it should write the message "Error: line X: unterminated comment" to the standard error stream. "X" should be the number of the line on which the unterminated comment begins. Examples:

Standard Input Stream Standard Output Stream Standard Error Stream
abc/*defnghin Error:slines1:sunterminatedscommentn
abcdefnghi/*n Error:slines2:sunterminatedscommentn
abc/*def/ghinjkln Error:slines1:sunterminatedscommentn
abc/*def*ghinjkln Error:slines1:sunterminatedscommentn
abc/*defnghi*n Error:slines1:sunterminatedscommentn
abc/*defnghi/n Error:slines1:sunterminatedscommentn

Your program (more precisely, its main function) should return EXIT_FAILURE if it was unsuccessful, that is, if it detects an unterminated comment and so was unable to remove comments properly. Otherwise it should return EXIT_SUCCESS or, equivalently 0. Note that EXIT_SUCCESS and EXIT_FAILURE are defined in the standard header file, stdlib.h.

Your program should work for standard input lines of any length whose number of characters is less than 2 billion characters. (You don't have to consider any test cases over 2 billion characters.)

Design

Design your program as a deterministic finite state automaton (DFA, alias FSA). The DFA concept is described in lectures, and in Section 5.1 of the book Introduction to Programming (Sedgewick and Wayne). That book section is available through the web at http://introcs.cs.princeton.edu/java/51language/.

We suggest that your program use the standard C getchar function to read characters from the standard input stream.

Logistics

You should create your program on the lab machines cluster using bash, emacs, and gcc209.

Step 1: Design a DFA

Express your DFA using the traditional "ovals and labeled arrows" notation. More precisely, use the same notation as is used in the examples from Section 7.3 of the Sedgewick and Wayne book. Let each oval represent a state. Give each state a descriptive name. Let each arrow represent a transition from one state to another. Label each arrow with the character, or class of characters, that causes the transition to occur. We encourage (but do not require) you also to label each arrow with action(s) that should occur (e.g. "print the character") when the corresponding transition occurs.

Express as much of the program's logic as you can within your DFA. The more logic you express in your DFA, the better your grade on the DFA will be.

To properly report unterminated comments, your program must contain logic to keep track of the current line number of the standard input stream. You need not show that logic in your DFA.

Step 2: Create Source Code

Use emacs to create source code in a file named wc209.c that implements your DFA.

Step 3: Preprocess, Compile, Assemble, and Link

Use the gcc209 command to preprocess, compile, assemble, and link your program. Perform each step individually, and examine the intermediate results to the extent possible.

Step 4: Execute

Execute your program multiple times on various input files that test all logical paths through your code.

We have provided several files below.

samplewc209 wc209 Sample Tests

(1) Download all files to your project directory. You will find samplewc209 and 10 test C source files. You need to make samplewc209 executable (by changing file permission) by

chmod u+x samplewc209 

(2) samplewc209 is an executable version of a correct assignment solution. Your program should write exactly (character for character) the same data to the standard output stream and the standard error stream as samplewc209 does. You should test your program using commands similar to these:

./samplewc209 < somefile.c > output1 2> errors1
./wc209 < somefile.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2

The Unix diff command finds differences between two given files. diff output1 output2 produces output, then samplewc209 and your program have written different characters to the standard output stream. Similarly, if the command diff errors1 errors2 produces output, then samplewc209 and your program have written different characters to the standard error stream.

You also should test your program against its own source code using a command sequence such as this:

./samplewc209 < wc209.c > output1 2> errors1
./wc209 < wc209.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2

Step 5: Create a readme File and an Ethics document

Use emacs to create a text file named readme (not readme.txt, or README, or Readme, etc.) that contains:

Descriptions of your code should not be in the readme file. Instead they should be integrated into your code as comments.

Your readme file should be a plain text file. Don't create your readme file using Microsoft Word, Hangul (HWP) or any other word processor.

For every assignment submission, you must submit your own Ethics document that pledges that you did not violate any rules of course policy or any rules of ethics enforced by KAIST while doing this assignment.

Please edit an Ethics document for assignment 1 and submit it along with other files. Please write the assignment number, your name, sign on it, and make it into a PDF file (you can convert it into the PDF format in the FILE menu of MS Word).

Ethics document

Step 6: Submit

Your submission should include your wc209.c file, the files that gcc209 generated from it, and your readme file.

Also submit your DFA. Create your "labeled ovals and labeled arrows" DFA and make it in a PDF file. A DFA drawn using drawing software (e.g. Microsoft PowerPoint) would be best. But it is sufficient to submit a photo of a neatly hand-drawn DFA. Make sure you convert the file into a PDF file.

Please name the DFA file dfa.pdf (not dfa.txt, DFA, DFA.jpg, DFA.png,etc.) We cannot accept your DFA via e-mail.

Create a local directory named 'YourID_assign1' and place all your files in it. Then, tar your submission file by issuing the following command on a lab machine (assuming your ID is 20191234):

mkdir 20191234_assign1
mv wc209.c wc209.i wc209.s wc209.o wc209 readme EthicsOath.pdf dfa.pdf 20191234_assign1
cd 20191234_assign1
tar -zcf 20191234_assign1.tar.gz *

Upload your submission file (20191234_assign1.tar.gz) to our KLMS assignment submission page. We do not accept e-mail submission (unless our course KLMS page is down).

Please follow the same procedure for the future assignments.

Your submission file should look like this:

20191234_assign1.tar.gz
wc209.c
wc209.i
wc209.s
wc209.o
wc209
dfa.pdf
readme
EthicsOath.pdf
  • You can use this checker before you submit.
  • The name of the files must be exact. Note that file name is case sensitive. You might not get the full credit if the names are not correct.

    Late Submission

    You can use the late submission (late pass; as known as a token) which can be late up to three days without penalty for the first four programming assignments. That is, you can apply your late submission days (within 3 days in total) spread over the first four programming assignments. The minimum granulaity is one day: if you are 1 hour late, that's still counted as one day late. If you're going to spend your free late days, please say so in your readme file.

    You need to notify your token usage in your readme file in following format; [TOKEN=n] where n is the number of tokens that you want to use. For example, if you want to use 2 tokens, following text MUST be included in your readme.

    [TOKEN=2]

    You need to strictly follow this format. Otherwise, TAs will not consider your late token usage.

    Grading

    If your code cannot be compiled at eelab5 with gcc209, we cannot give you any points. Please double check before you submit.

    We will grade your work on two kinds of quality: quality from the user's point of view, and quality from the programmer's point of view. To encourage good coding practices, we will deduct points if gcc209 generates warning messages.

    From the user's point of view, a program has quality if it behaves as it should. The correct behavior of your program is defined by the previous sections of this assignment specification, and by the behavior of the given samplewc209 program.

    From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, style is defined by the rules given in The Practice of Programming (Kernighan and Pike), as summarized by the Rules of Programming Style document. For this assignment we will pay particular attention to rules 1-24. These additional rules apply:

    Please note that you might not get a full credit even if you pass the test with given test case. TAs might use another test cases to test functionality and robustness of your implementation.

    Special Note

    As prescribed by Kernighan and Pike style rule 25, generally you should avoid using global variables. Instead all communication of data into and out of a function should occur via the function's parameters and its return value. You should use ordinary call-by-value parameters to communicate data from a calling function to your function. You should use your function's return value to communicate data from your function back to its calling function. You should use call-by-reference parameters to communicate additional data from your function back to its calling function, or as bi-directional channels of communication.

    However, call-by-reference involves using pointer variables, which we have not discussed yet. So for this assignment you may use global variables instead of call-by-reference parameters. (But we encourage you to use call-by-reference parameters.)

    In short, you should use ordinary call-by-value function parameters and function return values in your program as appropriate. But you need not use call-by-reference parameters; instead you may use global variables. In subsequent assignments you should use global variables only when there is a good reason to do so.

    FAQ

    Question

    I got following error message: gcc command not found.

    Answer

    Try sudo apt install build-essential to install gcc.

    Question

    Will I get deduction for writing unnecessary codes(not used for grading main requirements) in my source code in terms of programmer’s point of view?

    Answer

    TAs will not strictly deduct points for writing unnecessary codes. However, please try to refactor your codes as possible as you can before submission so that your source code is readable for grading.

    Question

    Does this mean points can be deducted if we do not use any functions? It doesn't seem very necessary.

    Function modularity: Your program should not consist of one large main function. Instead your program should consist of multiple small functions, each of which performs a single well-defined task. For example, you might create one function to implement each state of your DFA.

    Answer

    We encourage you to modularize your code into functions, even if doing so does not seem necessary. It is a good programming practice.

    Question

    I am retaking the course. Is it allowed to use my own previous codes for the assignment?

    Answer

    As long as the previous code is something that you wrote, I don't see a problem with it. However, it might be easier to write from beginning, instead of trying to remember what you wrote back then.

    Note that, the content of the assignment might be different from the previous assignments.

    Question

    When is the final due date if we use all three tokens?

    Answer

    You can submit your assignment until 72 hours after to original due date. If you choose to take late deduction points, you can still submit your assignment until 48 hours after to your own due date.

    Question

    Which files should I submit?

    Answer

    We clarified the submission file format and files that you should include in your submission. Please follow the instructions above, otherwise you will not get full credit.

    Question

    Can we use other editors (e.g., vim) except emacs?

    Answer

    Any editor is fine.

    Question

    What should we do in the case of input like this: abc"def/*abc*/def". Should we consider the inner /*abc*/ just as characters or should we see them as a comment?

    Answer

    We provide a sample program "samplewc209". So, please use the program to confirm your thought. The correct behavior of your program is defined by the assignment document, and by the behavior of the given samplewc209 program.

    Question

    Can I use global variable or pointer in the Assignment 1?

    Answer

    Yes.

    Question

    In order to recognize space, do I have to use isspace() function?

    Answer

    You don't need to use isspace() as long as your program works properly.

    Question

    Do I need to produce an actual de-commented string within my code, as well as line/word/char output result?

    Answer

    You don't need to produce de-commented string. We only need the line/word/char output result.

    Question

    How can I return error message as Standard error stream?

    Answer

    You can use fprintf(stderr, "message");

    Question

    Do I have to use state for my assignment? Is it fine to use {if, else} conditional statement instead?

    Answer

    We encourage you to do your assignment, by referring to your DFA state graph. If you use conditional statement, your source code will not be efficient and not readable.

    Question

    How can I check if my output is correct with solution output?

    Answer

    You can use diff to compare the result.

    Question

    As I know, the gcc209 compiles the code as C99. However, in assignment it mentions that our code follows the syntax of C99. Which syntax should we follow writing our code?

    Answer

    It means that the syntax of comment for assignment input follows that of the C language (C99). It does not mean you code should follow C99.