Lab 3: Writing Regular Expressions
Due Date: February 16
Objectives
- Teach students to write regular expressions
- Practice creating classes in Ruby
- Review using irb in development
To follow every part of this lab, please first work through the regex tutorial available at:
https://www.rubyguides.com/2015/06/ruby-regex/
Also, you can use Rubular, the Ruby online regular expression editor, if needed: https://rubular.com/
We are going to begin by writing a simple Ruby program to test regular expressions and then modify it several times. To begin, create a new file called regex_tester.rb using your preferred editor/IDE.
Create another file called test_arrays.rb and add to it the arrays listed below.
# TEST ARRAYS FOR REGEX TESTER
# ----------------------------
%w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
%w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
# INITIAL REGEX PATTERN FOR REGEX TESTER
# --------------------------------------
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
If you did not do this earlier, install the cheat gem. (If you are unsure, type gem list and see if it is there. To install, type gem install cheat.) Once this gem is installed, type cheat regex on the command line to get a useful cheat sheet for regular expressions.
In regex_tester.rb, create a new class called RegexTester. This class should have a constructor (initialize) that takes two arguments called pattern and statement. Within this method, add two lines of code that set the instance variable @pattern to the local variable pattern and @statement to the local variable statement. Finally, so that these values can be read or changed after the object is created, we will generate a setter (and a getter) for each with the simple line: attr_accessor :pattern, :statement.
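The description above can be sketched roughly as follows. This is only one plausible reading of the step, containing nothing beyond the constructor and accessors named in the text:

```ruby
# Sketch of the class as described so far: a constructor plus accessors.
class RegexTester
  # Generates pattern, pattern=, statement and statement= methods.
  attr_accessor :pattern, :statement

  def initialize(pattern, statement)
    @pattern = pattern      # the regular expression to test with
    @statement = statement  # the string to test against the pattern
  end
end
```

Because attr_accessor also generates the setters, a later line such as regex.statement = "apidock.com" works without any further code.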
By the way, have you been using Git to save all code? If not, I suggest you start immediately and commit your code to git on a regular basis.
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/ # from test_arrays.rb
statement = "http://www.google.com"
regex = RegexTester.new(pattern, statement)
puts regex.pattern
puts regex.statement
Run the code and see that the output of the getter statements is what we'd expect. The statement output matches what we set it to be.
Next, create a method called test for the RegexTester class which will test the statement to see if it matches the regex pattern. The code is below. Note in this case that we are calling a method yet to be defined (pattern_matches?) and that this method takes the argument @statement, but that no parentheses are used. Also note that for mismatches we are using Ruby's standard error output to print the message. Finally, remember that #{} within double quotes will evaluate the Ruby code inside and convert the result to a string.
def test
  if pattern_matches? @statement
    puts "MATCH: #{@statement}"
  else
    STDERR.puts "NO MATCH: #{@statement}"
  end
end
Now write the pattern_matches? method. For this lab, we will make it a private method. To do this, we simply write the keyword private on the line ahead of the method; now this method (and any that follow the private declaration) will be private. The code for this method is very simple:
private
def pattern_matches? statement
  statement.match(@pattern) != nil
end
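To see why the comparison with nil works, here is a small standalone sketch (not part of the class) of what String#match returns. The last line uses String#match?, which assumes Ruby 2.4 or later and is offered only as an alternative, not as what the lab requires:

```ruby
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/

hit  = "http://www.google.com".match(pattern)  # a MatchData object on success
miss = "apidock.com".match(pattern)            # nil when nothing matches

puts hit.class     # MatchData
puts miss.inspect  # nil

# Alternative (Ruby 2.4+): match? returns true or false directly.
puts "www.microsoft.com".match?(pattern)  # true
```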
puts "------"
regex.test
regex.statement = "apidock.com"
regex.test
Before changing the class to test an array of statements, try the idea in irb first. (This will remind you about how to use irb as well.) Fire up irb on the command line and test with the block:
arr = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
arr.each { |item| puts "MATCH #{item}" if item.match(pattern) != nil }
Notice how it converts %w[] to a regular array of strings (we saw this in an earlier lab). Also notice that a regex literal is written between forward slashes, thus pattern = /.../.
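Both notations can be checked with a short sketch such as the one below. The %r{} form is an addition not used elsewhere in this lab; it is shown only because it avoids escaping the slashes in http://:

```ruby
# %w[] builds an ordinary array of strings, split on whitespace.
words = %w[apidock.com www.cmu.edu]
puts words.inspect                            # ["apidock.com", "www.cmu.edu"]
puts words == ["apidock.com", "www.cmu.edu"]  # true

# Two ways to write the same kind of regex literal.
slash   = /^(http:\/\/)?www\./   # slashes inside must be escaped
percent = %r{^(http://)?www\.}   # no escaping needed for /

url = "http://www.cmu.edu"
puts url.match(slash) != nil     # true
puts url.match(percent) != nil   # true
```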
In this case, there should be four matches (google, microsoft, kli, cmu) from the array of possibilities. The same should hold after we modify our RegexTester class.
Continuing on, to convert our class we will begin by renaming the getter method for statement to statements and converting the instance variable to @statements. Next, we want to change the setter method from statement to statements so that it raises a TypeError if anything other than an array is passed to the method and a RuntimeError if the array is empty. When either exception is raised, the user should get a helpful message explaining the problem, followed by a final insult (the same insult for either failure), followed by program termination. This revised method can be seen below:
def statements=(arr)
  begin
    raise TypeError unless arr.class == Array
    raise RuntimeError if arr.empty?
    @statements = arr
  rescue RuntimeError
    STDERR.puts "You need to have at least one statement to test against the pattern."
    add_insult
    exit
  rescue TypeError
    STDERR.puts "You must enter an ARRAY of statements to use this regex_tester."
    add_insult
    exit
  end
end
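To see which input triggers which exception without terminating the program, the two checks can be tried in isolation. The helper name check_statements below is hypothetical and exists only for this illustration; it is not part of the lab's class:

```ruby
# Hypothetical helper mirroring the two checks in the setter above.
def check_statements(arr)
  raise TypeError, "expected an Array" unless arr.class == Array
  raise RuntimeError, "array is empty" if arr.empty?
  arr
end

["not an array", [], %w[a b]].each do |input|
  begin
    check_statements(input)
    puts "#{input.inspect} accepted"
  rescue TypeError, RuntimeError => e
    # Report the exception class instead of exiting.
    puts "#{input.inspect} rejected with #{e.class}"
  end
end
```

The type check has to come first: calling empty? on something that is not a collection could raise a different error before the TypeError is ever reached.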
Because we did not create the add_insult method in advance, the tests erred. Add the add_insult method to the private section of the class. An example of this method appears below, but feel free to create any appropriate insult (i.e., one without vulgar, sexist or racist language) that you prefer.
def add_insult
  STDERR.puts "-------------------------------------"
  STDERR.puts "As a coding infidel, you are hereby sentenced to death. The firing squad will be here shortly to carry out the execution. Please remain seated until they arrive. Thank you for your cooperation."
end
Next, we need to change the test method so that it iterates through the array and tests each item. This is not unlike what we just did in irb, except that our test code is multiple lines, so we will have to use do ... end rather than {} for our block.
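One possible shape for the revised method is sketched below. To keep the example short it assumes the statements are stored directly by the constructor and leaves out the validating setter and add_insult; treat it as an illustration of the do ... end block, not as the required solution:

```ruby
# Simplified sketch: validation of the statements array is omitted here.
class RegexTester
  attr_accessor :pattern
  attr_reader :statements

  def initialize(pattern, statements)
    @pattern = pattern
    @statements = statements
  end

  # Test every statement in turn; a multi-line block uses do ... end.
  def test
    @statements.each do |statement|
      if pattern_matches? statement
        puts "MATCH: #{statement}"
      else
        STDERR.puts "NO MATCH: #{statement}"
      end
    end
  end

  private

  def pattern_matches?(statement)
    statement.match(@pattern) != nil
  end
end
```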
Test the revisions with the first test array provided in Part 1 Step 2. Similar to what we saw in irb, we should get the following results:
MATCH: http://www.google.com
MATCH: www.microsoft.com
MATCH: http://www.kli.org
MATCH: http://www.cmu.edu
NO MATCH: apidock.com
NO MATCH: http://www.heimann-family.org
NO MATCH: http://www.acac.net
NO MATCH: http://is.hss.cmu.edu
NO MATCH: www.amazon.co.uk
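Notice that in this output the standard error lines come after the regular output, and that standard error is displayed in red in many environments. (Don't panic if it is not in red on your machine; the exact ordering can also vary with how your terminal or IDE buffers the two streams.)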
p = /^\d{16}$/
# %w[] already produces an array, so it must not be wrapped in another [].
statement_array = %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
cc = RegexTester.new(p, statement_array)
cc.test
Remember to copy in the second test array from Part 1 Step 2 of this lab. We will build the pattern slowly, in the steps listed below.
cc.pattern = /^\d{16}$/
and then test with
cc.test
Running this code should result in only the first statement passing and the rest failing.
cc.pattern = /^\d{16}$|^(\d{4}-\d{4}-\d{4}-\d{4})$/
Rerun the code and the second statement should now pass. Be sure you understand why this works before proceeding (look at cheat regex).
cc.pattern = /^\d{16}$|^((\d{4}-){4})$/
and see what happens. Why does it fail? Be sure to understand why (look at cheat regex) before proceeding.
cc.pattern = /^\d{16}$|^((\d{4}-?){4})$/
What is the difference and why does it matter? Again, be sure to understand why before proceeding.
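If the difference is hard to spot, a quick experiment like the following may help. It isolates the grouped half of each pattern, and the string with a trailing hyphen is an extra test value that is not in the lab's array:

```ruby
dashed = "1234-5678-9012-3456"

required = /^((\d{4}-){4})$/   # a hyphen must follow every group of four digits
optional = /^((\d{4}-?){4})$/  # the ? makes each hyphen optional

puts dashed.match(required) != nil          # false: the last group has no hyphen
puts "#{dashed}-".match(required) != nil    # true: the trailing hyphen satisfies it
puts dashed.match(optional) != nil          # true
```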
cc.pattern = /^\d{16}$|^(\d{4}[ -]?){4}$/
Confirm that this works and be sure you know why.
Wait! You just had a great epiphany and realize that the 'or' is no longer needed! Why, you could just shorten this regex to /^(\d{4}[ -]?){4}$/ and it would work. Try this and confirm your insight, and be sure you fully understand why it works.
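One way to confirm the insight outside the class is to compare the long and short patterns on every item in the second test array. This is a sketch only; the variable names are chosen for this example:

```ruby
cards = ['1234567890123456', '1234-5678-9012-3456', '1234 5678 9012 3456',
         '1234567890', '#1234567890123456', '1234|5678|9012|3456',
         '12345678901234567']

long_form  = /^\d{16}$|^(\d{4}[ -]?){4}$/
short_form = /^(\d{4}[ -]?){4}$/

# Split the array into the statements that match and those that do not.
matches, misses = cards.partition { |card| card.match(short_form) }
puts "MATCH:    #{matches.inspect}"
puts "NO MATCH: #{misses.inspect}"

# Both patterns should agree on every statement in the array.
same = cards.all? { |card| card.match(long_form).nil? == card.match(short_form).nil? }
puts same  # true
```

Note that because each separator is optional and chosen independently, the short pattern also accepts mixed forms such as 1234-5678 90123456. Whether that is acceptable depends on how strict the validation needs to be.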
Now that we can see how our little regex_tester will help us create viable regular expressions, use it to go back to the original pattern and first test array. Revise the pattern so that it only allows proper URLs but also accepts all the items in the test array (which are in fact valid and working URLs). Keep making corrections to the pattern until all statements pass the test.
This lab is to be submitted by Sunday February 16, 2020 at 11:59PM.