Solved-Refactor a text-matching regular expressions engine to use a more strongly-typed interface 2– Solution

$30.00 $19.00

Marking Total of 20 marks (20% of practical component) Late Penalty The maximum available mark is reduced by 10% if the assignment is one day late, by 25% if it is 2 days late and by 50% if it is 3 days late. Assignments that are late 4 days or more will be awarded zero…

You’ll get a: . zip file solution

 

 
Categorys:
Tags:

Description

5/5 – (2 votes)

Marking Total of 20 marks (20% of practical component)

Late Penalty The maximum available mark is reduced by 10% if the assignment is one day late, by 25% if it is 2 days late and by 50% if it is 3 days late. Assignments that are late 4 days or more will be awarded zero marks. So if your assignment is worth 88% and you submit it one day late you still get 88%, but if you submit it two days late you get 75%, three days late 50%, and four days late zero.

Submission Instructions The assignment can be submitted using the ‘give’ system.

To submit from a CSE terminal, type:

$ give cs3141 Hare Hare.hs

Overview

In this assignment, you will refactor a text-matching regular expressions engine to use a more strongly-typed interface. These stronger types not only reduce the possibility for bugs, but also improve the readability and clarity of the code base. On top of the strongly-typed version, you will implement a number of additional convenience combi-nators for working with regular expressions. This assignment will give you experience using Haskell type system extensions (speci cally GADTs), experience programming with monads and applicative functors, and introduce you to concepts such as alterna-tives.

1

Provided Code

The provided code consists of a number of modules:

Hare.hs contains the stubs of the code you should implement.

Untyped.hs contains an untyped, unextended version of the regex engine.

HareMonad.hs contains the monadic type used to write the matching algorithm.

Tests.hs contains the main function for running tests.

Tests/Support.hs contains the support code for running tests.

Tests/UnitTests.hs contains properties for the basic regular expressions.

Tests/Transcript.hs contains some acceptance tests for your combinators, for analysing a UNSW transcript.

Tests/Examples.hs contains all examples from this spec, as unit tests.

Note: The only le you can submit is ‘Hare.hs’, so make sure your submission compiles with the original versions of all other les.

Regular Expressions

Regular expressions are a popular way to describe and manipulate patterns of strings. Often they are used to search for a substring that matches a particular pattern, and then extracting information from that substring.

Many languages include regular expressions as a built in feature or in their standard library. If you have done COMP2041, you will have been exposed to several such lan-guages. Unfortunately, Haskell is not such a language. To remedy this, Helen the Hare started to design H.A.R.E, a regular expressions library for Haskell.

She started by encoding the basic building blocks of regular expressions as a Haskell data type.

data RE = Empty — Matches the empty string

| Fail — Matches no strings

  • Char [Char] — Matches a single character from the list

  • Seq RE RE— Matches the two expressions in sequence

  • Choose RE RE — Matches either the first, or the second

  • Star RE— Matches the expression zero or more times

Note: If you have seen regular expressions before, you may be used to more features being available than the ones above. In general, those features can be translated into the features given above.

Some simple examples of these regular expressions are given in the table below.

2

Regular Expression

Matches

Star (Char [ 0 .. 9 ]) Seq Char [ 0 .. 9 ]

“1234”,“039”,“0”,

Star (Char [ a ]) Choose Star (Char [ b ])

“”,“a”,“b”,“aa”,“bb”..

Fail

— nothing

Empty

“”

In order to de ne a matching algorithm for regular expressions, Helen de ned a type called Hare (found in HareMonad.hs), which is similar to State String except that the return type is wrapped in a monad f:

newtype Hare f a = Hare { runHare :: String -> f (String, a) }

The hare function can be used to \run” a Hare computation on a string:

hare :: Functor f => Hare f a -> String -> f a hare a s = fmap snd (runHare a s)

Helen has parameterised the Hare type by f so that it could be instantiated to Maybe if the user only wishes to nd the rst match for a particular regular expression, or to the list type constructor [] if the user wishes to nd all possible matches of the regular expression. This requires us to make use of the Alternative type class, which has two built in methods:

class Applicative f => Alternative f where empty :: f a

(<|>) :: f a -> f a -> f a

The empty method denotes failure. For the list instance, it is the empty list [], whereas for the Maybe instance it is Nothing. The (<|>) method denotes alternation. For the list instance it is the same as concatentation (++), whereas for the Maybe instance, it is a left-biased choice, like so:

instance Alternative Maybe where

empty = Nothing

Just a <|> b = Just a

Nothing <|> b = b

3

To avoid confusing empty with the Empty constructor for regular expressions, Helen also de ned an alias for empty, called failure:

failure :: (Alternative f, Monad f) => Hare f a failure = empty

The readCharacter action is also of note, which removes a character from the String state and returns it. If there are no characters left in the string, it will fail:

readCharacter :: (Alternative f, Monad f) => Hare f Char readCharacter = Hare $ \s -> case s of

[] -> empty

(x:xs) -> pure (xs,x)

Armed with just readCharacter and the Monad, Applicative and Alternative in-stances for Hare, Helen de nes the matching algorithm, in a function called match (found in Untyped.hs):

match :: (Monad f, Alternative f) => RE -> Hare f Results

Here the Results type describes the string that was matched by each part of the regular expression:

data Results = None

  • Character Char

  • Tuple Results Results

  • Repetition [Results]

Seeing as the Empty expression matches any string, Helen simply returns None in that case:

match Empty = pure None

Conversely, because Fail matches no strings at all, Helen always calls failure if at-tempting to match this expression:

match Fail = failure

To de ne the case for Char, Helen uses the built-in function guard, which will fail if the given condition evaluates to false:

match (Char cs) = do

x <- readCharacter

guard (x elem cs)

pure (Character x)

To de ne the case for Seq, Helen makes use of the Monad instance, rst matching on the rst expression a, then the second b:

4

match (Seq a b) = do

ra <- match a

rb <- match b

pure (Tuple ra rb)

Because there is no dependencies between ra and rb, this could also have been de ned using the Applicative instance, as follows:

match (Seq a b) = Tuple <$> match a <*> match b

For the case for Choose, Helen makes use of the Alternative (<|>) method:

match (Choose a b) =

match a

<|> match b

For Star, Helen observed the following bijection1:

Star r === (r Seq Star r) Choose Empty

For this reason, Helen implements Star a by rst matching the expression a, then matching on Star a recursively. If either match fails, it falls back to returning the empty match:

match (Star a) =

addFront <$> match a <*> match (Star a)

<|> pure (Repetition [])

where

addFront x (Repetition xs) = Repetition (x:xs) addFront _ _ = error “(should be) impossible!”

Note that Helen de ned Star to choose the longer alternative rst so that if used with Maybe, the match function will return the longest match rather than the empty string.

Having de ned match, Helen can now de ne the regex matching operator (=~), which searches through a string for a match to the expression:

(=~) :: (Monad f, Alternative f) => String -> RE -> f Results str =~ re = hare matchAnywhere str

where matchAnywhere = match re <|> (readCharacter >> matchAnywhere)

If we look at some examples, we can see that invoking it with the Maybe monad returns just the rst result, whereas the list version returns all results:

*> “COMP3141” =~ (Char [ 0 .. 9 ]) :: Maybe Results Just (Character 3 )

*> “COMP3141” =~ (Char [ 0 .. 9 ]) :: [Results]

[Character 3 ,Character 1 ,Character 4 ,Character 1 ]

  • While they return di erent results, there is a bijection between them

5

A Typed Implementation (12 Marks)

Helen’s untyped implementation works, but it’s not very clean. For example, we know that the Results returned by the regular expression

Char [ 0 .. 9 ] Seq Star (Char [ 0 .. 9 ])

will be of the format:

Tuple (Character c) (Repetition [Character x, Character y, ])

but any function designed to process the Results from this regular expression would have to be partial, because we did not encode the format of the results returned into the type system.

Instead of having a Results type, your rst task will be to de ne a type-indexed version of the RE type and associated match function, in Hare.hs.

data RE :: * -> * where

Empty :: RE ()

Fail :: RE a

— Your constructors here

Each constructor has an additional type argument which speci es the type of the result returned from each expression. We have de ned the rst two constructors for you. You will have to nd appropriate de nitions for the rest of them.

For implementing match, feel free to look at Helen’s original implementation in the Untyped.hs module as a guide. The test cases provided may also be instructive in getting your implementation type-checking and correct.

Marking Criteria

Marks Description

  • Empty type correct; match implementation correct.

  • Fail type correct; match implementation correct.

  • Char type correct; match implementation correct.

  • Seq type correct; match implementation correct.

  • Choose type correct; match implementation correct.

  • Star type correct; match implementation correct.

12 Total

6

Adding Actions (1 Marks)

Next, we will add an additional constructor to RE that does not change the set of strings matched, but allows us to run an arbitrary transformation on the results returned from the match:

Action :: (a -> b) -> RE a -> RE b

Your task in this part is to implement the match function for this constructor. You may nd the Functor instance for Hare useful.

Examples

*> “***” =~ Action length (Star (Char [ * ])) :: [Int] [3,2,1,0,2,1,0,1,0,0]

*> “AXax” =~ Action isUpper (Char [ a , A ]) :: [Bool] [True, False]

*> let f (x,y) = [x,y]

*> let atoz = Char [ a .. z ]

*> “ab01cd20” =~ Action f (atoz Seq atoz) :: [String] [“ab”,“cd”]

Marking Criteria

Marks Description

  • Action clause in match correct.

  • Total

7

Utility Combinators (7 Marks)

Using our newly minted Action constructor, you must now de ne a series of combinator functions that allow us to de ne a lot of the features people normally expect to nd in a regular expressions system.

cons function

The cons function matches a regular expression for an element of type a, followed by a regular expression for a list [a], and returns that element prepended to the list.

cons :: RE a -> RE [a] -> RE [a]

Examples:

*> “10100” =~ cons (Char [ 1 ]) (Star (Char [ 0 ])) :: [String] [“10”,“1”,“100”,“10”,“1”]

*> “10100” =~ cons (Char [ 1 ]) (Action (const []) Empty) :: [String] [“1”,“1”]

plus function

The plus function is like Star, but requires the expression given to occur at least once.

Thus, this function can be de ned using Action, Seq and Star.

plus :: RE a -> RE [a]

Examples:

*> “10100” =~ plus (Char [ 0 ]) :: [String] [“0”,“00”,“0”,“0”]

*> let atoz = Char [ a .. z ]

*> let digits = Char [ 0 .. 9 ]

*> “ab1c3” =~ plus (atoz Seq digits) :: [[(Char,Char)]] [[( b , 1 ),( c , 3 )],[( b , 1 )],[( c , 3 )]]

string function

The string function produces a regex that only matches the given string, and returns it. You should be able to implement it using Char, cons, Empty and Action.

string :: String -> RE String

Examples:

*> let comp3141 = string “COMP3141”

*> “My favourite subject is COMP3141” =~ comp3141 :: Maybe String Just “COMP3141”

*> “My favourite subject is MATH1141” =~ comp3141 :: Maybe String Nothing

8

choose function

The choose function is like the Choose constructor, but generalised to lists. That is, choose [a, b] is equivalent to a ‘Choose‘ b. What should choose [] be?

choose :: [RE a] -> RE a

Examples:

*> let re = choose [string “COMP”, string “MATH”, string “PHYS”] *> “COMP3141, MATH1081, PHYS1121, COMP3121” =~ re :: [String] [“COMP”,“MATH”,“PHYS”,“COMP”]

*> “abc” =~ choose [] :: Maybe String

Nothing

option function

The option function matches the given expression zero or one times. In other words, if the expression matches, it returns Just that match, otherwise Nothing. You can implement this combinator with Action, Choose and Empty.

option :: RE a -> RE (Maybe a)

Examples:

*> let

digits =

Char [ 0 .. 9 ]

*> let

sign =

Action (fromMaybe

+ ) (option (Char [])

*> “-30 5 3” =~

(sign cons plus

digits) :: [String]

[“-30”,“-3”,“+30”,“+3”,“+0”,“+5”,“+3”]

*> “foo” =~ option (Char [ a ]) :: [Maybe Char] [Nothing,Nothing,Nothing,Nothing]

Note: As with Star, prefer longer matches over shorter ones, so that the results appear as in the above example.

rpt function

The rpt combinator allows a regular expression to be repeated a xed number of times. The expression rpt n x is equivalent to repeating x in sequence n times, returning the results in a list.

rpt :: Int -> RE a -> RE [a]

Examples:

9

*> let digits = Char [ 0 .. 9 ]

*> let programs = choose [string “COMP”, string “PHYS”, string “MATH”] *> let courseCode = programs Seq rpt 4 digits

*> “COMP3141, MATH1081, and PHYS1121” =~ courseCode :: [(String,String)] [(“COMP”,“3141”),(“MATH”,“1081”),(“PHYS”,“1121”)]

*> “foo” =~ rpt 0 (Char [ a ]) :: Maybe [Char] Just “”

rptRange function

Lastly, the rptRange function takes a range of numbers (x; y). You may assume that x y. It will match the given regex at least x times and at most y times.

rptRange :: (Int, Int) -> RE a -> RE [a]

Examples:

*> “1234” =~ rptRange (2,4) (Char [ 0 .. 9 ]) :: [String] [“1234”,“123”,“12”,“234”,“23”,“34”]

*> “1234” =~ rptRange (3,3) (Char [ 0 .. 9 ]) :: [String] [“123”,“234”]

Note: As with Star and option, prefer longer matches to shorter ones.

Marking Criteria

Marks

Description

1

cons implementation correct.

1

plus implementation correct.

1

string implementation correct.

1

choose implementation correct.

1

option implementation correct.

1

rpt implementation correct.

1

rptRange implementation correct.

7

Total

10

Compiling and Building

In addition to the standard library, this project depends on the QuickCheck and HUnit testing libraries and the test framework called tasty. For CSE machines, we have already a con gured a package database on the course account that should build the assignment without di culty using the standard Haskell build tool cabal. For students using their own computer, we instead recommend the alternative build tool stack, available from the Haskell Stack website at https://www.haskellstack.org. We have provided a stack con guration le that xes the versions of each of our dependencies to known-working ones. If you use stack to set up your toolchain, you should have minimal compatibility di culties regardless of the platform you are using. If you are using versions of GHC or Haskell build tools supplied by your Linux distribution, these are commonly out of date or incorrectly con gured. We cannot provide support for these distributions.

Detailed, assignment-speci c instructions for each build tool are presented below.

On CSE Machines

Enter a COMP3141 subshell by typing 3141 into a CSE terminal:

$ 3141

newclass starting new subshell for class COMP3141…

From there, if you navigate to the directory containing the assignment code, you can build the assignment by typing:

$ cabal build

To run the program from ‘Tests.hs’, which contains all the unit, acceptance, and prop-erty tests, type

$ ./dist/build/HareTests/HareTests

To start a ghci session, type:

$ cabal repl

Lastly, if for whatever reason you want to remove all build artefacts, type:

$ cabal clean

For stack users

Firstly, ensure that GHC has been setup for this project by typing, in the directory that contains the assignment code:

$ stack setup

If stack reports that it has already set up GHC, you should be able to build the assign-ment with:

11

$ stack build

This build command will, on rst run, download and build the library dependencies as well, so be sure to have an internet connection active. To run the program from ‘Tests.hs’, which contains all the unit, acceptance, and property tests, type

$ stack exec HareTests

To start a ghci session,type:

$ stack repl

Lastly, if for whatever reason you want to remove all build artefacts, type:

$ stack clean

Marking and Testing

Testing

A number of di erent test suites are provided:

The Tests/Transcript.hs le contains some high-level acceptance tests that ex-tract various bits of information from an anonymised UNSW student transcript, located in Tests/transcript.txt.

The Tests/UnitTests.hs le contains some QuickCheck properties that, while not complete speci cations, will help you to gain some assurance for your imple-mentation of match for each constructor.

The Tests/Examples.hs le contains each example presented in this spec docu-ment as a unit test.

The Tests.hs le contains a tasty test suite containing all of the above tests.

Marking

All marks for this assignment are awarded based on automatic marking scripts, which are comprised of QuickCheck properties, acceptance tests, and unit tests. Marks are not awarded subjectively, and are allocated according to the criteria presented in each section.

Barring exceptional circumstances, the marks awarded by the automatic marking script are nal. For this reason, please make sure that your submission compiles and runs correctly on CSE machines. We will use similar machines to mark your assignment.

A dry-run script that runs the tests provided in the assignment code will be provided. When you submit the assignment, please make sure the script does not report any problems.

12

Late Submissions

Unless otherwise stated if you wish to submit an assignment late, you may do so, but a late penalty reducing the maximum available mark applies to every late assignment. The maximum available mark is reduced by 10% if the assignment is one day late, by 25% if it is 2 days late and by 50% if it is 3 days late. Assignments that are late 4 days or more will be awarded zero marks. So if your assignment is worth 88% and you submit it one day late you still get 88%, but if you submit it two days late you get 75%, three days late 50%, and four days late zero.

Extensions

Assignment extensions are only awarded for serious and unforeseeable events. Having the u for a few days, deleting your assignment by mistake, going on holiday, work commitments, etc do not qualify. Therefore aim to complete your assignments well before the due date in case of last minute illness, and make regular backups of your work.

Plagiarism

Many students do not appear to understand what is regarded as plagiarism. This is no defense. Before submitting any work you should read and understand the UNSW plagiarism policy https://student.unsw.edu.au/plagiarism.

All work submitted for assessment must be entirely your own work. We regard un-acknowledged copying of material, in whole or part, as an extremely serious o ence. In this course submission of any work derived from another person, or solely or jointly writ-ten by and or with someone else, without clear and explicit acknowledgement, will be severely punished and may result in automatic failure for the course and a mark of zero for the course. Note this includes including unreferenced work from books, the internet, etc.

Do not provide or show your assessable work to any other person. Allowing another student to copy from you will, at the very least, result in zero for that assessment. If you knowingly provide or show your assessment work to another person for any reason, and work derived from it is subsequently submitted you will be penalized, even if the work was submitted without your knowledge or consent. This will apply even if your work is submitted by a third party unknown to you. You should keep your work private until submissions have closed.

If you are unsure about whether certain activities would constitute plagiarism ask us before engaging in them!

13