
67272

Lab 3: Writing Regular Expressions

Due Date: February 16

Objectives

  • Teach students to write regular expressions
  • Practice creating classes in Ruby
  • Review using irb in development


Important!

To understand all aspects of this lab, please work through the regex tutorial available at:
https://www.rubyguides.com/2015/06/ruby-regex/

You can also use Rubular, an online Ruby regular expression editor, if needed: https://rubular.com/


  1. We will begin by writing a simple Ruby program to test regular expressions, and then modify it several times. To begin, create a new file called regex_tester.rb using your preferred editor or IDE.

  2. Create another file called test_arrays.rb and add to it the arrays listed below.

    # TEST ARRAYS FOR REGEX TESTER
    # ----------------------------
    %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
    
    %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
    
    # INITIAL REGEX PATTERN FOR REGEX TESTER
    # --------------------------------------
    pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
    
  3. If you have not done so already, install the cheat gem. (If you are unsure whether it is installed, type gem list and look for it. To install it, type gem install cheat.) Once the gem is installed, type cheat regex on the command line to get a useful cheat sheet for regular expressions.

  4. In regex_tester.rb, create a new class called RegexTester. This class should have a constructor (initialize) that takes two arguments called pattern and statement. Within this method, add two lines of code that set the instance variable @pattern to the argument pattern and @statement to the argument statement. Finally, so that the pattern and statement can be read and changed after the object has been created, generate a getter and a setter for each with the simple line: attr_accessor :pattern, :statement.
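A minimal sketch of the class as described so far, assuming nothing beyond a constructor and attr_accessor (the comments are illustrative):

```ruby
# regex_tester.rb: first version of the class
class RegexTester
  # Generates pattern, pattern=, statement and statement=
  attr_accessor :pattern, :statement

  def initialize(pattern, statement)
    # Copy the constructor arguments into instance variables
    @pattern = pattern
    @statement = statement
  end
end
```

The test code in the next step exercises the getters this generates.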

By the way, have you been using Git to save all your code? If not, I suggest you start immediately and commit your code to Git regularly.

  5. Time to test these methods. At the end of the file, after the class definition has ended, add the following code:
  pattern =  /^(http:\/\/)?www\.\w+\.(com|edu|org)$/  # from test_arrays.rb
  statement = "http://www.google.com"
  regex = RegexTester.new(pattern, statement)
  puts regex.pattern
  puts regex.statement

Run the code and check that the getters return what we expect. The statement output matches what we set it to be, and the pattern is printed in the form (?-mix:...), which is how Ruby converts a regular expression to a string.

  6. Next, we are going to create a method called test for the RegexTester class, which will check whether the statement matches the regex pattern. The code is below. Note that we are calling a method that is yet to be defined, pattern_matches?, and that we pass it the argument @statement without using parentheses. Also note that when there is no match we use Ruby's standard error output (STDERR) to print the message. Finally, remember that #{} inside a double-quoted string evaluates the Ruby code inside it and converts the result to a string.
  def test
    if pattern_matches? @statement
      puts "MATCH: #{@statement}"
    else
      STDERR.puts "NO MATCH: #{@statement}"
    end
  end 
  7. Now it's time to add the pattern_matches? method. For this lab, we will make it a private method. To do this, we simply write the keyword private on the line before the method; this method (and any method defined after the private keyword) will then be private. The code for this method is very simple:
  private

  # Returns true when statement matches @pattern (String#match returns nil on no match)
  def pattern_matches? statement
    statement.match(@pattern) != nil
  end
  8. Now it is time to test this. Assuming you still have the test code you added earlier at the end of the file, just add the lines below and run it. Notice that the first statement is a match, but the second statement fails and is printed in red (if your editor or terminal colors standard error output).
  puts "------"
  regex.test
  regex.statement = "apidock.com"
  regex.test
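Assembled from the snippets above, regex_tester.rb might look roughly like the sketch below at this point. The main thing to check is that private comes after test, so that test stays public while pattern_matches? does not.

```ruby
class RegexTester
  attr_accessor :pattern, :statement

  def initialize(pattern, statement)
    @pattern = pattern
    @statement = statement
  end

  # Prints MATCH on standard output, or NO MATCH on standard error
  def test
    if pattern_matches? @statement
      puts "MATCH: #{@statement}"
    else
      STDERR.puts "NO MATCH: #{@statement}"
    end
  end

  private

  # Methods below this point can only be called from inside the class
  def pattern_matches? statement
    statement.match(@pattern) != nil
  end
end

pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
regex = RegexTester.new(pattern, "http://www.google.com")
regex.test                       # MATCH: http://www.google.com
regex.statement = "apidock.com"
regex.test                       # NO MATCH: apidock.com (on standard error)
```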
  9. Now we'd really like to test a batch of statements all at once, so we will have to modify this code a tad. Before jumping into that, let's take a quick break and whip up a solution in irb first. (This will also remind you how to use irb.) Fire up irb on the command line and test with the block:
arr = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
arr.each { |item| puts "MATCH #{item}" if item.match(pattern) != nil }

Notice how %w[] creates a regular array of strings (you saw this in an earlier lab). Also notice that a regex literal is written between forward slashes, as in pattern = /.../.
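A quick irb check of both points. Besides slash literals, Ruby also accepts %r{...} and Regexp.new for building a regular expression; they are shown here only for comparison. The %r form can be handy for URL patterns because the slashes inside it do not need escaping.

```ruby
# %w[] splits on whitespace and returns an ordinary Array of Strings
words = %w[http://www.google.com apidock.com]
p words          # prints ["http://www.google.com", "apidock.com"]
p words.class    # prints Array

# Three ways to write the same URL pattern
slashes     = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
percent_r   = %r{^(http://)?www\.\w+\.(com|edu|org)$}
from_string = Regexp.new('^(http://)?www\.\w+\.(com|edu|org)$')

# All three forms accept and reject the same strings
words.each do |w|
  results = [slashes, percent_r, from_string].map { |re| w.match(re) != nil }
  puts "#{w}: #{results.inspect}"
end
```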

  10. In this case, there should be four matches (google, microsoft, kli, cmu) from the array of possibilities. We should get the same result after we modify our RegexTester class.

  11. Continuing on, to convert our previous code so it handles many statements, begin by renaming the getter method statement to statements and the instance variable @statement to @statements. Next, rename the setter statement= to statements= and modify it so that it raises a TypeError if anything other than an array is passed to it and a RuntimeError if the array is empty. When either exception is raised, the user should get a helpful message explaining the problem, followed by a final insult (the same insult for either failure), followed by program termination. The revised method is below:

  def statements=(arr)
    begin
      raise TypeError unless arr.class == Array
      raise RuntimeError if arr.empty?
      @statements = arr

    rescue RuntimeError
      STDERR.puts "You need to have at least one statement to test against the pattern."
      add_insult
      exit
    rescue TypeError
      STDERR.puts "You must enter an ARRAY of statements to use this regex_tester." 
      add_insult
      exit  
    end
  end
  12. Test the setter method to make sure these exceptions are handled properly. If you did not create an add_insult method in advance, these tests will fail with a NameError, because add_insult does not exist yet. Add the add_insult method to the private section of the class. An example of this method appears below, but feel free to write any appropriate insult you prefer (i.e., one without vulgar, sexist or racist language).
  def add_insult
    STDERR.puts "-------------------------------------"
    STDERR.puts "As a coding infidel, you are hereby sentenced to death.  The firing squad will be here shortly to carry out the execution.  Please remain seated until they arrive. Thank you for your cooperation."   
  end
  13. Next, we need to change the test method so that it iterates through the array and tests each item. This is similar to what we just did in irb, except that our block spans multiple lines, so by Ruby convention we will use do ... end rather than {} for it.

  14. Test the revisions with the first test array provided in Part 1 Step 2. Similar to what we saw in irb, we should get the following results:

  MATCH: http://www.google.com
  MATCH: www.microsoft.com
  MATCH: http://www.kli.org
  MATCH: http://www.cmu.edu
  NO MATCH: apidock.com
  NO MATCH: http://www.heimann-family.org
  NO MATCH: http://www.acac.net
  NO MATCH: http://is.hss.cmu.edu
  NO MATCH: www.amazon.co.uk

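Depending on how your terminal or editor buffers the two streams, the standard error lines may appear after the regular output (as above) or interleaved with it in array order, and many editors display standard error in red. (Don't panic if it is not red on your machine.)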
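Putting the multi-statement changes together, the class might look like the sketch below. Two choices here are assumptions, not something the steps above specify: initialize passes its argument through the validating statements= setter (via self.statements =), and statements is exposed with attr_reader because the setter is written by hand. The add_insult text is a hypothetical placeholder.

```ruby
class RegexTester
  attr_accessor :pattern
  attr_reader :statements   # assumption: hand-written setter below

  def initialize(pattern, statements)
    @pattern = pattern
    self.statements = statements   # assumption: reuse the setter's checks
  end

  def statements=(arr)
    begin
      raise TypeError unless arr.class == Array
      raise RuntimeError if arr.empty?
      @statements = arr
    rescue RuntimeError
      STDERR.puts "You need to have at least one statement to test against the pattern."
      add_insult
      exit
    rescue TypeError
      STDERR.puts "You must enter an ARRAY of statements to use this regex_tester."
      add_insult
      exit
    end
  end

  # The block spans several lines, so do ... end is used instead of { }
  def test
    @statements.each do |statement|
      if pattern_matches? statement
        puts "MATCH: #{statement}"
      else
        STDERR.puts "NO MATCH: #{statement}"
      end
    end
  end

  private

  def pattern_matches? statement
    statement.match(@pattern) != nil
  end

  def add_insult
    STDERR.puts "-------------------------------------"
    STDERR.puts "(hypothetical placeholder: put your own insult here)"
  end
end

urls = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
RegexTester.new(/^(http:\/\/)?www\.\w+\.(com|edu|org)$/, urls).test
```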

  15. Comment out these test lines for now and let's switch gears slightly. We want to build a regex pattern for validating credit card numbers. To simplify matters for lab purposes, we assume that every valid credit card number has 16 digits, optionally broken into four groups of four by spaces or dashes. The second test array has seven test statements: the first three are valid credit card numbers and the last four are invalid. We will begin this part of the lab by creating a new set of testing code:
  pattern = /^\d{16}$/
  # %w[] already returns an array, so it must not be wrapped in another [ ]
  statement_array = %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
  cc = RegexTester.new(pattern, statement_array)
  cc.test

Remember to copy in the second test array from Part 1 Step 2 of this lab. We will build the pattern slowly, in the steps listed below.

  16. Allow for just 16 digits by adding
  cc.pattern = /^\d{16}$/

and then test with

  cc.test

Running this code should result in only the first statement passing and the rest failing.

  17. Now modify the pattern:
  cc.pattern = /^\d{16}$|^(\d{4}-\d{4}-\d{4}-\d{4})$/

Rerun the code and the second statement should now pass as well. Be sure you understand why this works before proceeding (look at cheat regex).

  18. We want to shorten this pattern and get rid of some of the repetition. Change the pattern to
  cc.pattern = /^\d{16}$|^((\d{4}-){4})$/

and see what happens. Why does it fail? Be sure to understand why (look at cheat regex) before proceeding.
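Once you have an explanation, you can check it in irb by comparing two strings against the second branch of this pattern. This is a small sketch using String#match? (available since Ruby 2.4), which returns true or false.

```ruby
# (\d{4}-){4} repeats "four digits followed by a dash" exactly four times,
# so this branch needs a dash after every group, including the last one
pattern = /^\d{16}$|^((\d{4}-){4})$/
puts "1234-5678-9012-3456".match?(pattern)    # false
puts "1234-5678-9012-3456-".match?(pattern)   # true
```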

  19. Let's fix that problem by changing the pattern above ever so slightly to
  cc.pattern = /^\d{16}$|^((\d{4}-?){4})$/

What is the difference and why does it matter? Again, be sure to understand why before proceeding.

  20. We also want spaces to be recognized as separators, so we will modify the pattern again to
  cc.pattern = /^\d{16}$|^(\d{4}[ -]?){4}$/

Confirm that this works and be sure you know why.
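To compare the patterns built so far side by side, a small standalone script (hypothetical, not part of RegexTester) can run each one over the second test array and list the statements it accepts:

```ruby
cards = %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]

# The credit card patterns from the steps above, in order
patterns = [
  /^\d{16}$/,
  /^\d{16}$|^(\d{4}-\d{4}-\d{4}-\d{4})$/,
  /^\d{16}$|^((\d{4}-){4})$/,
  /^\d{16}$|^((\d{4}-?){4})$/,
  /^\d{16}$|^(\d{4}[ -]?){4}$/
]

patterns.each do |re|
  # String#match? (Ruby 2.4+) returns true or false without building MatchData
  accepted = cards.select { |card| card.match?(re) }
  puts re.inspect
  puts "  accepts: #{accepted.inspect}"
end
```

None of the test statements ends with a separator. You may want to add one such as "1234-5678-9012-3456-" to cards, because the optional [ -]? after the fourth group lets the latest pattern accept it.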

  21. Wait! You just had a great epiphany and realized that the 'or' is no longer needed! Why, you could just shorten this regex to /^(\d{4}[ -]?){4}$/ and it would still work. Try this, confirm your insight, and be sure you fully understand why it works.

  22. Now that you can see how our little regex_tester helps us create viable regular expressions, use it to revisit the original URL pattern and the first test array. Revise the pattern so that it allows only proper URLs, but also accepts every item in the test array (all of which are in fact valid, working URLs). Keep correcting the pattern until all statements pass the test.


Submission:

This lab is to be submitted by Sunday February 16, 2020 at 11:59PM.