unixcommands: a Moodle question type for mastering Unix/Linux commands

Keywords: Moodle, question type, Unix/Linux commands, exact and fuzzy matching

This question type has been designed to help students master Unix/Linux commands and to provide a means of exact or fuzzy comparison of two strings or two groups of strings.

The answer form contains the following three fields for every possible correct answer: Command, Command options, Grade options.

Suppose that a student has to answer the following simple question:

Q: List the contents of the /etc directory in long format, including hidden files.

The fields should be filled in the following way:

Command:          ls
Command options:  -a;; -l;; -- /etc
Grade options:    p

All options must be separated by the “;;” token, and any arguments must be separated from the options by the double-dash token “--”. Since the options may be given in any order, the grade options field must contain the grade option “p”. The following responses will therefore be treated as correct:

ls -al /etc
ls -la /etc
ls -a -l /etc
ls -l -a /etc

Of course, ls -al /etc/ and similar variants are also correct responses; they can be accounted for by defining a second correct answer in the form.

Note that if a multicharacter option is used it must be composed solely of letters or digits.
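The option handling described above can be illustrated with a minimal Python sketch. It is not the plugin's code. The helper names parse_options_field, split_flags and options_match are hypothetical. The sketch assumes flag-style options without values (such as -a and -l) and expands bundled single-letter flags like -al in the response. Valued options such as -n 20 would need extra parsing.

```python
import re
from typing import List, Tuple


def parse_options_field(field: str) -> Tuple[List[str], List[str]]:
    """Split a Command options field such as '-a;; -l;; -- /etc'
    into (options, arguments). Hypothetical helper for illustration."""
    # Arguments follow a standalone "--" token.
    parts = re.split(r"(?:^|\s)--(?:\s|$)", field.strip(), maxsplit=1)
    # Options are separated by the ";;" token; empty pieces are dropped.
    options = [opt.strip() for opt in parts[0].split(";;") if opt.strip()]
    arguments = parts[1].split() if len(parts) > 1 else []
    return options, arguments


def split_flags(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Separate response tokens into flags and arguments,
    expanding bundled single-letter flags: '-al' -> ['-a', '-l']."""
    flags, args = [], []
    for tok in tokens:
        if tok == "--":
            continue  # end-of-options marker, not an option itself
        if tok.startswith("-") and len(tok) > 2 and tok[1:].isalpha():
            flags.extend("-" + ch for ch in tok[1:])
        elif tok.startswith("-") and len(tok) > 1:
            flags.append(tok)
        else:
            args.append(tok)
    return flags, args


def options_match(field: str, response: str, permute: bool) -> bool:
    """True if the response's flags and arguments match the reference field.
    With permute=True (grade option "p") the order of flags is ignored."""
    ref_opts, ref_args = parse_options_field(field)
    _command, *rest = response.split()
    flags, args = split_flags(rest)
    if permute:
        same_opts = sorted(flags) == sorted(ref_opts)
    else:
        same_opts = flags == ref_opts
    return same_opts and args == ref_args


field = "-a;; -l;; -- /etc"
for r in ["ls -al /etc", "ls -la /etc", "ls -a -l /etc", "ls -l -a /etc"]:
    print(r, options_match(field, r, permute=True))  # True for all four
print(options_match(field, "ls -la /etc", permute=False))  # False: order matters
```

With permute=False only the exact order -a, -l passes, which is why the “p” grade option is needed here. The response ls -al /etc/ is rejected because its argument differs, in line with the need for a second answer.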

Let’s take another example.

Q: Output the first 20 lines of the /etc/hosts file.

In this case the fields should be filled in as follows:

Command:          head
Command options:  -n 20;; -- /etc/hosts
Grade options:

Note that a space must separate the option from its value. Note also that the command options could equally be given as -n 20 -- /etc/hosts.

In this case the response head -20 /etc/hosts is also correct and should be added as another possible answer, namely:

Command:          head
Command options:  -20 -- /etc/hosts
Grade options:

The following grade options can be used:

i    - ignore case
p    - ignore the order of options
f<n> - fuzzy match: strings treated as single words
F<n> - fuzzy match: strings treated as arrays of words

c<r> - character penalty
t<r> - transpose penalty
w<r> - word penalty

where <n> denotes an integer in the range 0..100 and <r> a real number, 0 <= r <= 1.0. The meaning and usage of the penalty and fuzzy-match options are explained below.

By default the following grade options are used: c0.1; f75; t0.2; w0.2. The grade options must be separated by semicolons (spaces, if any, are ignored).
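A grade options string can be parsed roughly as follows. This Python sketch is hypothetical: parse_grade_options and the "multiword" key (set when F<n> is used) are illustrative names. The default values c0.1, f75, t0.2 and w0.2 are taken from the worked examples in this document rather than from the plugin's source.

```python
DEFAULTS = {"c": 0.1, "f": 75, "t": 0.2, "w": 0.2}


def parse_grade_options(text: str) -> dict:
    """Parse a grade options string such as 'i;F50' or 'p; c0.2'.
    Hypothetical sketch; unknown or out-of-range options raise ValueError."""
    opts = {"i": False, "p": False, "multiword": False, **DEFAULTS}
    # Options are separated by ';' and spaces are ignored.
    for item in text.replace(" ", "").split(";"):
        if not item:
            continue
        key, value = item[0], item[1:]
        if key in ("i", "p") and value == "":
            opts[key] = True                  # ignore case / permute options
        elif key in ("f", "F"):
            n = int(value)
            if not 0 <= n <= 100:
                raise ValueError(f"threshold out of range: {item!r}")
            opts["f"] = n
            opts["multiword"] = key == "F"    # F<n>: compare word by word
        elif key in ("c", "t", "w"):
            r = float(value)
            if not 0.0 <= r <= 1.0:
                raise ValueError(f"penalty out of range: {item!r}")
            opts[key] = r
        else:
            raise ValueError(f"unknown grade option: {item!r}")
    return opts


print(parse_grade_options("i;F50"))
print(parse_grade_options("p; c0.2"))
```

Rejecting a value such as c20 follows from the stated range 0 <= r <= 1.0 for penalty options.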

Note that curly braces cannot be used within the command options field. For example, -f -- a{b,c}.mp3 will not work, since the Moodle scripts remove the closing brace when parsing the content of this field.

The grading of the response is done in three steps.

  1. The command is extracted from the response and compared with the string given in the Command field of the answer form. The compare2Words function is used for this purpose with the grading option “f100”, which means that a perfect match between the two single-word strings is expected. Any mismatch renders the whole response incorrect and the score is set to 0. If the grading option “f<n>” with 0 < n < 100 is used, fuzzy matching is performed instead (see below).

  2. If any arguments are present, they are compared next. A single argument is compared with the reference one using the compare2Words function. If two or more arguments are present, the compareMultWordStrings function is used and the corresponding strings are compared word by word using compare2Words.

  3. The options are compared. The grade option “p” should be used if their order is irrelevant, i.e. if the options may be permuted. The options must match exactly: any mistake or typo renders the response incorrect and the final score is set to zero.

Response and reference words are of equal lengths

Let’s examine how the fuzzy matching is performed by the compare2Words function.

If the two words are of equal lengths and identical, the match is perfect and the function returns a score equal to the number of matching characters, i.e. the length of the words.

However, the response may contain some typos, i.e. it may match the answer modulo these typos. Instead of rejecting such a response, one can accept it but lower the grade accordingly. When Nt transpositions are spotted, the score is given by

S = ( l - Nt Pt )

where l is the length of the word and Pt is the transposition penalty. By default it is set to 0.2, but it can be adjusted with the transpose grading option “t<r>”.

By default the grading option “f75” is used, i.e. the response is accepted when its percent score reaches at least 75% of the perfect-match score. Suppose that the expected answer is parted and the command part of the response reads partde, i.e. there is one transposition. According to the above formula the score is not 6 but 5.8, since by default every transposition incurs a 0.2 penalty. The percent score is thus 5.8/6 = 0.97, which is above the 75/100 threshold. If two transpositions were spotted, the percent score would be (6 - 2 × 0.2)/6 = 0.93 and the test would also pass.
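The transposition rule S = l - Nt Pt can be sketched in Python as follows. The text does not describe how the plugin detects a transposition. This sketch therefore assumes that only swaps of two adjacent characters count and that every other position must match. Both function names are hypothetical.

```python
from typing import Optional


def count_transpositions(answer: str, response: str) -> Optional[int]:
    """Count adjacent-character swaps that turn response into answer.
    Returns None if the words differ in length or in any other way.
    Hypothetical helper; the plugin's detection rule may differ."""
    if len(answer) != len(response):
        return None
    nt, i = 0, 0
    while i < len(answer):
        if answer[i] == response[i]:
            i += 1
        elif (i + 1 < len(answer)
              and answer[i] == response[i + 1]
              and answer[i + 1] == response[i]):
            nt += 1        # one adjacent swap, e.g. "ed" -> "de"
            i += 2
        else:
            return None    # a mismatch that is not a transposition
    return nt


def transposition_score(answer: str, response: str,
                        pt: float = 0.2) -> Optional[float]:
    """S = l - Nt * Pt, with l the word length and Pt the transposition penalty."""
    nt = count_transpositions(answer, response)
    if nt is None:
        return None
    return len(answer) - nt * pt


for resp in ["parted", "partde", "aprtde"]:
    s = transposition_score("parted", resp)
    print(resp, round(s / 6, 2))   # 1.0, 0.97, 0.93
```

With the default Pt = 0.2 the same helper gives 2.8/3 = 0.93 for tra or atr against tar, the value listed for those responses in the tar example.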

If there are no transpositions, the maximum number of consecutive characters that match in both words, Nm, is determined and the score is calculated as

S = Nm ( 1 - Pc)

where Pc is the character penalty. By default the “c0.1” option is used; the larger the value, the higher the cost of a single character mismatch and the smaller the score. If a single character mismatch is encountered, e.g. the response reads partes instead of parted, the longest matching run is parte, so Nm = 5, the score is 5 × (1 - 0.1) = 4.5 and the percent score is 4.5/6 = 0.75. If the “c0.2” grade option is used, the percent score drops to 5 × 0.8/6 = 0.67 and the test fails.

It may happen that the response is correct except for spurious words following the correct ones. In such a case the score is decreased by the number of these extra words.
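A minimal sketch of the consecutive-match score S = Nm (1 - Pc). It assumes that Nm is the longest run of positions at which two equal-length words agree, which is one reading of the description above. The function names are hypothetical.

```python
def longest_aligned_run(answer: str, response: str) -> int:
    """Nm: longest run of consecutive positions where both words agree.
    Assumes equal-length words; this reading of 'consecutive characters
    that match' is an assumption, not the plugin's definition."""
    best = run = 0
    for a, b in zip(answer, response):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def mismatch_score(answer: str, response: str, pc: float = 0.1) -> float:
    """S = Nm * (1 - Pc), with Pc the character penalty."""
    return longest_aligned_run(answer, response) * (1 - pc)


for pc in (0.1, 0.2):
    s = mismatch_score("parted", "partes", pc)
    print(pc, round(s / 6, 2))   # 0.1 -> 0.75, 0.2 -> 0.67
```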

Response and reference words are of unequal lengths

If the two words are of unequal lengths, the following cases have to be considered.

  1. la = lr - 1

    The reference word, $answer (of length la), is one character shorter than the response word, $response (of length lr), and is contained in it, i.e. strpos($response, $answer) !== false. In this case the score is calculated as

    S = la - Pt

    where the single spurious character incurs the same Pt penalty as a single transposition.

  2. la = lr + 1

    The response word is one character shorter than the reference word and is contained in it, i.e. strpos($answer, $response) !== false. In this case the score is again calculated as

    S = la - Pt

  3. When the response word is longer than the answer, the score is computed from lm, the length of the longest substring of the response that is also part of the answer, and the penalty factor (1 - Pc).

    If lm = la, the final score is equal to

    S = [ lm - (lr - la) ] ( 1 - Pc)

    Otherwise

    S = lm ( 1 - Pc)

    For example, partedd (case 1) produces a score of 5.8/6 = 0.97, whereas parteddx produces only (6 - 2)(1 - 0.1)/6 = 0.60, and the test fails.
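The three cases for words of unequal lengths can be sketched as follows. The Python `in` operator plays the role of the strpos containment test. The function names are hypothetical. The sketch returns None for a response shorter by more than one character, a case the text does not describe. The defaults Pt = 0.2 and Pc = 0.1 are those used in the examples.

```python
from typing import Optional


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def unequal_length_score(answer: str, response: str,
                         pt: float = 0.2, pc: float = 0.1) -> Optional[float]:
    la, lr = len(answer), len(response)
    # Case 1: one spurious character, answer contained in response.
    if la == lr - 1 and answer in response:
        return la - pt
    # Case 2: one missing character, response contained in answer.
    if la == lr + 1 and response in answer:
        return la - pt
    # Case 3: longer response; lm = longest part of the response found in the answer.
    if lr > la:
        lm = longest_common_substring(response, answer)
        if lm == la:
            return (lm - (lr - la)) * (1 - pc)
        return lm * (1 - pc)
    return None  # other shorter-response cases are not described in the text


for resp in ["partedd", "parteddx"]:
    s = unequal_length_score("parted", resp)
    print(resp, round(s / 6, 2))   # partedd 0.97, parteddx 0.6
```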

Response and reference strings are composed of two or more words

When strings consisting of two or more words have to be compared, the compareMultWordStrings function is used. It takes every word of the reference string and compares it with the corresponding word of the response (i.e. of its argument part). Assume that the reference string consists of Nw words with lengths l1, l2, etc. The score is calculated according to the formula:

S = S1 + S2 + … + SNw + (Nw - 1)

where Si are the values returned by the compare2Words function. If i > 1 and a mismatch is encountered, Si is multiplied by (1 - Pw), where Pw is the word penalty. By default it is set to 0.2 and it can be changed via the “w<r>” grade option. Note that if Pw is set to 1, any word except the first one with even a single character mismatch is treated as missing. The last term accounts for the spaces separating the words.
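The multiword formula can be sketched as follows. Because compare2Words itself is not reproduced here, the word scorer is passed in as a parameter. aligned_matches is a deliberately simple, hypothetical stand-in that counts position-wise equal characters. The sketch assumes that missing response words score 0. Spurious trailing words reduce the score by their number, as described earlier for single words.

```python
from typing import Callable


def multiword_score(reference: str, response: str,
                    word_score: Callable[[str, str], float],
                    pw: float = 0.2) -> float:
    """S = S1 + ... + SNw + (Nw - 1), with word penalty Pw on mismatched words i > 1.
    word_score stands in for compare2Words (hypothetical parameter)."""
    ref_words, resp_words = reference.split(), response.split()
    total = 0.0
    for i, ref in enumerate(ref_words):
        if i >= len(resp_words):
            break                     # missing words contribute nothing
        s = word_score(ref, resp_words[i])
        if i > 0 and resp_words[i] != ref:
            s *= 1 - pw               # word penalty for words after the first
        total += s
    total += len(ref_words) - 1       # spaces separating the words
    # Spurious trailing words decrease the score by their number.
    total -= max(0, len(resp_words) - len(ref_words))
    return total


def aligned_matches(ref: str, resp: str) -> float:
    """Toy stand-in for compare2Words: count position-wise equal characters."""
    return float(sum(a == b for a, b in zip(ref, resp)))


ref = "mkpart primary 1000"   # N = 6 + 7 + 4 + 2 spaces = 19
for resp in ["mkpart primary 1000", "mkpart primary 1001", "mkpart primary 1000 x"]:
    print(resp, round(multiword_score(ref, resp, aligned_matches) / 19, 2))
    # 1.0, 0.92, 0.95
```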

The score returned by the compare2Words or compareMultWordStrings function is used to calculate the final percent score S/N, where N is the length of the reference string (either a single-word or a multiword one). If this percent score is at least the threshold n/100 set by the corresponding f<n> (or F<n>) option, the response is treated as (partially) correct.

If the response also contains an argument part, the command and argument scores are combined as follows:

Sa / Na - ( 1 - Sc / Nc )

where Sc/Nc is the percent score of the command and Sa/Na that of the argument part.
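The final steps reduce to simple arithmetic, sketched below with hypothetical function names. The threshold is taken as n/100 for the f<n> option, matching the 75/100 threshold in the parted example. The combined score follows the formula Sa/Na - (1 - Sc/Nc).

```python
def percent(score: float, length: int) -> float:
    """Percent score S/N, with N the length of the reference string."""
    return score / length


def combined(sc: float, nc: int, sa: float, na: int) -> float:
    """Command and argument scores combined as Sa/Na - (1 - Sc/Nc)."""
    return sa / na - (1 - sc / nc)


def passes(fraction: float, n: int = 75) -> bool:
    """Accept when the percent score reaches the f<n> threshold n/100."""
    return fraction >= n / 100


# tra instead of tar (2.8 out of 3) with a perfectly matched 4-character argument.
final = combined(2.8, 3, 4.0, 4)
print(round(final, 2), passes(final))   # 0.93 True
```

A perfect command leaves the argument score unchanged. Any loss in the command part is subtracted from it.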

Examples of usage

Let’s examine a couple of examples.

  1. tar -xzf etc.tgz

    Command:          tar
    Command options:  -x;; -z;; -f etc.tgz
    Grade options:    p
    

    Responses and their scores (given as comments):

    tar -x -z -f etc.tgz      # 1.00
    tar -xz -f etc.tgz        # 1.00
    tar -xzf etc.tgz          # 1.00
    tar -zxf etc.tgz          # 1.00
    ...
    tra -xzf etc.tgz          # 0.93
    atr -xzf etc.tgz          # 0.93
    tar -xZf etc.tgz          # 0.00
    tar  -zf etc.tgz          # 0.00
    tar -xzf etc.tgzz         # 0.00
    
  2. parted -s /dev/sda mkpart primary 1000 -1000

    Answer1:

    Command:          parted
    Command options:  -s -- /dev/sda mkpart primary 1000 -1000
    Grade options:    p
    

    Answer2:

    Command:          parted
    Command options:  -s -- /dev/sda mkpart primary 1000MB -1000MB
    Grade options:    p
    

    Correct or partially correct responses:

    parted -s /dev/sda mkpart primary 1000 -1000         # 1.00
    parted -s /dev/sda mkpart primary 1000MB -1000MB     # 0.85
    

    Note that the second response is perfectly correct, yet its score is not 1.00. This is because the first answer that matches the response correctly or partially (here Answer1) wins: Moodle does not try to find a better match. When the “f100” grade option is used, however, the second response no longer passes Answer1 and is graded against Answer2 instead, so both responses score 1.00.
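This first-match behaviour can be modelled as follows. It is a hypothetical sketch, not Moodle's grading code. Answers are tried in the order they appear in the form. The scores come from a lookup table holding the values reported in the parted example above.

```python
from typing import Callable, Sequence, Tuple


def first_match_grade(response: str,
                      answers: Sequence[Tuple[str, float]],
                      score: Callable[[str, str], float]) -> float:
    """Return the score of the first answer (in form order) whose score
    reaches its threshold; later, possibly better, answers are not tried."""
    for answer, threshold in answers:
        s = score(answer, response)
        if s >= threshold:
            return s
    return 0.0


# Scores reported in the parted example for the ...1000MB -1000MB response.
REPORTED = {"Answer1": 0.85, "Answer2": 1.00}


def lookup(answer: str, _response: str) -> float:
    """Stand-in scorer that returns the reported score for each answer."""
    return REPORTED[answer]


response = "parted -s /dev/sda mkpart primary 1000MB -1000MB"
print(first_match_grade(response, [("Answer1", 0.75), ("Answer2", 0.75)], lookup))  # 0.85
print(first_match_grade(response, [("Answer1", 1.00), ("Answer2", 1.00)], lookup))  # 1.0
```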

Fuzzy matching of two strings

Fuzzy matching can also be used to compare two arbitrary strings, either as single words or as multiword strings. The former mode is selected with the “f<n>” grade option, the latter with the “F<n>” grade option.

Suppose we ask for the BIOS abbreviation to be expanded and expect “Basic Input Output System” as the correct answer.

Let’s compare the effects of using “f50” and “F50” grade options.

Example 1. Grade option: i;f50

Command:          Basic Input Output System
Command options:
Grade options: i;f50

These are the gradings for a few response strings:

Basic Input Output ystem      # 0.86
basic input utput system      # 0.86
basic input output syste      # 0.99
basci input output systme     # 0.98

Example 2. Grade option: i;F50

Answer1

  Command:          Basic Input Output System
  Command options:
  Grade options: i;F50
Answer2

  Command:          Basic Input/Output System
  Command options:
  Grade options: i;F50

Now the grading results are the following:

BASIC INPUT OUTPUT SYSTEM     # 1.00
basic input/output system     # 1.00
basic input utput system      # 0.99
basci input output systme     # 0.94
basic inut autput system      # 0.85
basic inut autput systen      # 0.75
basik inut autput systen      # 0.67