From the course Introduction to Ruby.

Regular Expressions

  • Regular Expressions are used to match a pattern against strings
  • RDoc: http://www.ruby-doc.org/core-1.9.3/Regexp.html
  • Building a regular expression involves four elements: metacharacters, bracket expressions, quantifiers, and anchors
  • Metacharacters match classes of characters such as word characters, digits, hex digits, and whitespace
  • Quantifiers match a character a set number of times, optionally bounded by a minimum and a maximum
  • Anchors are metacharacters that match the zero-width positions between characters
  • Anchors tie a match to a specific position, such as the start or end of a line or string
  • Regular expression modifier characters allow you to control how a pattern can match

Table of Contents: Regular Expressions

  • Intro
    • Regular Expressions
    • How to create a regular expression
    • What goes inside
    • Metacharacters
    • Bracket expressions
    • Quantifiers
    • Anchors
  • Metacharacters
    • word and non-word characters
    • digit and non-digit characters
    • hex digit and non-hex-digit characters
    • whitespace and non-whitespace characters
    • Examples
  • POSIX Bracket Expressions
  • Non-POSIX Bracket Expressions
  • Bracket Expression Examples
  • Quantifiers
    • Examples
  • Character Properties
    • Similar to POSIX bracket classes
    • More Character Properties
    • Examples
  • Anchors
    • Examples
  • Regular Expression Matching: Regexp Object
    • match
  • Regular Expression Matching: String Object
    • match
  • Regular Expression Modifier Characters
    • pat
    • Example
  • Regular Expression Modifier Objects
    • Example
    • Regexp RDoc

Transcription: Regular Expressions

Welcome back to Educator.com.

Today's lesson is on regular expressions.

What are regular expressions? They are used to match patterns against strings.

In Ruby, the class you will see is called Regexp, and it is the main way to create regular expressions.

As I just said, it is used to match a pattern against strings.

How do you create a regular expression?

There are three ways. The first uses forward slashes, /.../. This is the most popular way, and it follows the same syntax as many other languages, so if you are coming from another language, this is the form you will see most often and find most familiar.

Another way is the %r literal syntax.

This one is specific to Ruby; it uses a percent sign, the letter r, and curly braces: %r{...}.

The third way is the constructor: Regexp.new creates the regular expression for you.
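Here is a minimal sketch of the three forms side by side; the constant names and the sample sentence are made up for illustration. Each form builds the same "two words" pattern, so all three find the same match.

```ruby
# Three ways to build the same "two words" pattern (names are illustrative).
SLASH_FORM       = /\w+ \w+/              # forward-slash literal
PERCENT_R_FORM   = %r{\w+ \w+}            # %r literal; handy when the pattern contains "/"
CONSTRUCTOR_FORM = Regexp.new('\w+ \w+')  # constructor; single quotes keep each backslash

SENTENCE = "ruby regexps are fun"

[SLASH_FORM, PERCENT_R_FORM, CONSTRUCTOR_FORM].each do |re|
  puts re.match(SENTENCE)[0]  # prints "ruby regexps" each time
end
```

With Regexp.new, a double-quoted string would need each backslash doubled ("\\w+ \\w+"), which is why single quotes are used here.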

On that slide, each form has a placeholder, dot, dot, dot, showing where the pattern goes; you can put different kinds of patterns in there.

For example, for two words I have the regular expression /\w+ \w+/, which says, "Take one or more word characters, a space, and then one or more word characters again."

That is what that regular expression is matching.

Another one I have here is a run of digits, /\d+/, which takes one or more digit characters, 0 through 9.

The last one is just dot, dot, dot, and this one is for you to be creative.

If you left it as ... literally, it would match three wildcard characters: any three characters except a newline.

So be creative and replace it with your own regular expression; that is what I want you to do.

I'm going to write that down: Make your own.
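A sketch of how the slide's patterns might look as Ruby literals, run against made-up sample strings; swap in your own pattern for the third one.

```ruby
TWO_WORDS   = /\w+ \w+/  # one or more word characters, a space, one or more again
DIGIT_RUN   = /\d+/      # one or more digits, 0 through 9
THREE_CHARS = /.../      # any three characters except a newline

puts TWO_WORDS.match("hello big world")[0]   # hello big
puts DIGIT_RUN.match("room 101, floor 3")[0] # 101
puts THREE_CHARS.match("ab\ncdef")[0]        # cde (the dots cannot cross "\n")
```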

To create regular expressions, we need to look at four key elements:

metacharacters, bracket expressions, quantifiers, and anchors.

First, we are going to look at metacharacters.

The first one is the dot; it matches any character except a newline.

The next one is multiline mode.

You add an m after the closing slash, as in /./m, and then the dot can match newlines too.
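A small sketch of the difference, using an assumed two-line string. Note that Ruby's m flag is what some other languages call "dotall" (s); in Ruby, ^ and $ always work per line regardless of m.

```ruby
ONE_TWO = "line one\nline two"

# Without m, "." refuses to match the newline, so there is no match.
p ONE_TWO[/one.line/]   # nil
# With m, "." also matches "\n".
p ONE_TWO[/one.line/m]  # "one\nline"
```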

Backslash lowercase w, \w, matches a word character.

A word character is a lowercase a through z, a capital A through Z, a digit 0 through 9, or the underscore.

With a capital W, \W, we match a non-word character.

That is any character that is not a through z, A through Z, 0 through 9, or the underscore.

\d matches a digit character: 0 through 9.

A capital \D matches a non-digit character.

\h matches a hex digit character: 0 through 9, a through f, or A through F.

A capital \H matches a non-hex-digit character.

A lowercase \s matches a whitespace character: spaces, tabs, newlines, and so on.

A capital \S matches a non-whitespace character, so anything other than those.
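Each metacharacter and its uppercase negation, tried on small made-up strings. String#[] with a regular expression returns the first matching substring, or nil.

```ruby
p "snake_case99!"[/\w+/]  # "snake_case99": letters, digits, underscore
p "snake_case99!"[/\W+/]  # "!"
p "Room 42B"[/\d+/]       # "42"
p "Room 42B"[/\D+/]       # "Room " (includes the space)
p "cafe42 zz"[/\h+/]      # "cafe42": 0-9, a-f, A-F
p "cafe42 zz"[/\H+/]      # " zz"
p "a\tb"[/\s/]            # "\t"
p "  hi  "[/\S+/]         # "hi"
```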

Here, we have a few examples.

For the example on top, I have this sentence, and I'm using a regular expression literal that gets two words.

It says, "Match the first two words you see."

So it matched the first part of the sentence.

In the next section, I have another string, and this pattern matches one or more digits, a space, and then a word, which is one or more word characters.

Again, it matches at the beginning, so it returns "1000 bottles".

Notice that for this one we are calling a method on the string and passing it a regular expression, instead of passing a quoted string value.

Here is another way to write regular expressions.

This defines the regular expression with the constructor: Regexp.new creates the Regexp object, and you can pass that object in.

My Regexp.new pattern matches a word, a space, another word, a space, and another word.

I'm just going to call this regular expression 'Three words.'

I'm telling the string, "Match a pattern that has three words," and it finds and matches "1000 bottles of".

We can also do the same thing with a regular expression literal.

You will notice that it returns the same value.
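A sketch of these matches. The full sentence isn't given in the lesson, so the string below is an assumed stand-in that starts with "1000 bottles of"; String#[] is used as one plausible way to pass a regular expression to a string method.

```ruby
BOTTLES = "1000 bottles of soda on the wall"  # assumed sample string

# One or more digits, a space, then one or more word characters.
p BOTTLES[/\d+ \w+/]                 # "1000 bottles"

# The same three-word pattern, built with the constructor and as a literal.
THREE_WORDS_NEW = Regexp.new('\w+ \w+ \w+')
p BOTTLES.match(THREE_WORDS_NEW)[0]  # "1000 bottles of"
p BOTTLES.match(/\w+ \w+ \w+/)[0]    # "1000 bottles of"
```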

The next thing we are looking at is bracket expressions.

First, we have the POSIX bracket expressions; these give you more refined, named categories of characters to match against.

The first one, alnum, matches an alphabetic or numeric character.

Notice that to do this match, it uses two sets of square brackets: [[:alnum:]].

The second one, alpha, matches alphabetic characters.

The third one, blank, matches a space or a tab.

The fourth one, cntrl, matches a control character.

The fifth, digit, matches a digit: remember, that is 0 through 9.

The sixth, graph, matches a non-blank character; it excludes spaces, control characters, and similar characters.

The next one, lower, matches lowercase alphabetical characters.

As you can see, these are just names, but each one matches a different group of characters.

They cover much of the same ground as the metacharacters, but the named categories let you match more specifically.

The next one is print: it's like graph, but it also includes the space character.

Then punct matches punctuation characters.

There are more bracket expressions than this; I'm just showing you some of the more popular ones.

space is easy to understand: it matches a whitespace character.

upper matches uppercase alphabetical characters, and xdigit matches a digit allowed in a hexadecimal number: 0 through 9, a through f, and A through F.

There are also non-POSIX bracket expressions, which Ruby supports too.

If I use word, it matches "a character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation."

If you use ascii, it matches a character in the ASCII character set; that one is probably one you will use more often.
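A sketch of several bracket expressions on made-up strings. The "na\u00EFve" string (naïve) is written with a \u escape so the example doesn't depend on the source file's encoding; it shows [[:word:]] accepting a non-ASCII letter that [[:ascii:]] rejects.

```ruby
SAMPLE = "Hello, World 2024!"

p SAMPLE[/[[:upper:]][[:lower:]]+/]   # "Hello"
p SAMPLE[/[[:digit:]]+/]              # "2024"
p SAMPLE[/[[:punct:]]/]               # ","
p SAMPLE[/[[:blank:]]/]               # " " (first space or tab)
p "beef stew"[/[[:xdigit:]]+/]        # "beef" (all four are hex digits)

NAIVE = "na\u00EFve"                  # "naïve"
p NAIVE[/[[:ascii:]]+/]               # "na" (stops at the non-ASCII letter)
p NAIVE[/[[:word:]]+/] == NAIVE       # true: Unicode letters count as word characters
```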

Let's look at some examples of bracket expressions.

We could write these patterns without bracket expressions, but Ruby gives us the flexibility to do both, so you can use whichever style you prefer.

Here, we have our string: "A fool thinks himself to be wise, but a wise man knows himself to be a fool."

Here, we are matching three words using a bracket expression.

One word, one word, one word. I haven't told you yet, but this plus is a quantifier.

We will go over quantifiers next, but the plus means "one or more," so it says "one or more of these characters."

Notice that in the next example I save the regular expression to a variable and pass the variable in, and I get the same value.

In the last example, I'm telling it to match a digit in the string, but since the string contains no digits, it just returns nil.

Whenever the pattern finds no match, you get nil, just like that.
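A sketch of the three examples with the lesson's sentence. The slide's exact bracket expression isn't shown in the transcript, so [[:alpha:]] is assumed here for "a word".

```ruby
FOOL = "A fool thinks himself to be wise, but a wise man knows himself to be a fool."

# Three runs of letters separated by single spaces ([[:alpha:]] is an assumption).
p FOOL[/[[:alpha:]]+ [[:alpha:]]+ [[:alpha:]]+/]  # "A fool thinks"

# Saving the pattern in a variable gives the same result.
three_words = /[[:alpha:]]+ [[:alpha:]]+ [[:alpha:]]+/
p FOOL.match(three_words)[0]                      # "A fool thinks"

# The sentence contains no digits, so there is no match.
p FOOL[/[[:digit:]]/]                             # nil
```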

Now, looking at the quantifiers: you have already seen the plus, which says "one or more times."

Another popular one is the star, *, which says "zero or more times."

The question mark, ?, says "zero or one time."

Then we have four forms with curly braces. {n} matches exactly n times: if I put 10 there, it matches exactly 10 times.

{n,} matches n or more times: with 10, that would be 10 or more times.

{,m}, a comma followed by a number, matches m or fewer times.

{n,m}, with both numbers, matches "at least n and at most m times."
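Each quantifier on a small made-up input. Note the star case: because * allows zero repetitions, it can succeed with an empty match at the very first position.

```ruby
p "color"[/colou?r/]     # "color"   ?     : zero or one "u"
p "colour"[/colou?r/]    # "colour"
p "xyz"[/\d*/]           # ""        *     : zero or more, so an empty match counts
p "2024-06-01"[/\d{4}/]  # "2024"    {n}   : exactly n
p "12345"[/\d{2,}/]      # "12345"   {n,}  : n or more
p "12345"[/\d{,3}/]      # "123"     {,m}  : m or fewer
p "12345"[/\d{2,3}/]     # "123"     {n,m} : between n and m
```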

Let's go through some examples.

Here, I've made a regular expression called namefinder. It looks for two words: an uppercase character followed by lowercase characters, then a space, and then another uppercase character followed by lowercase characters.

I pass namefinder to the string, and you will notice it finds that pattern right there.

For the next example, I'm making a regular expression that looks for the letter a.

Before and after the a, it allows zero or more word characters.

I pass that pattern in, and it finds the first word that contains the letter a, which is name.

This regular expression says, "The a could be at the beginning of the word, in the middle, or at the end."

It even matches an a on its own: if the word is just the single letter a, this expression matches that too.

Whichever of these scenarios comes up first is what it returns.

The first one it sees is here, in the third word, so it returns that word as the match.
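A sketch of both patterns as described, run against made-up sentences (the lesson's own string isn't shown).

```ruby
NAMEFINDER = /[A-Z][a-z]+ [A-Z][a-z]+/  # capitalized word, a space, capitalized word
CONTAINS_A = /\w*a\w*/                   # any word with an "a" somewhere in it

p "call my friend John Smith today"[NAMEFINDER]  # "John Smith"
p "Hello my name is Bob"[CONTAINS_A]             # "name"
p "I see a cat"[CONTAINS_A]                      # "a" (a lone "a" counts too)
```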

Next, let's look at those curly braces.

We have our string; it has three a's, four b's, and five c's.

For this one, I have a followed by curly braces with one, comma, four: a{1,4}.

It matches all three a's.

This says that a appears at least one time and at most four times.

That is what the one and the four mean.

Here we have the same for b, but with at least one and at most two times, b{1,2}, and it returns two b's, which is the most it allows.

And then we have c{5,}, which means "c five or more times," and it returns all five c's, ccccc.
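A sketch of the three bounded quantifiers. The exact layout of the lesson's string isn't shown, so three a's, four b's, and five c's separated by spaces is an assumption.

```ruby
LETTERS = "aaa bbbb ccccc"  # assumed layout: three a's, four b's, five c's

p LETTERS[/a{1,4}/]  # "aaa"   at least 1, at most 4
p LETTERS[/b{1,2}/]  # "bb"    stops at 2 even though there are 4
p LETTERS[/c{5,}/]   # "ccccc" 5 or more
```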

The next thing we're looking at is character properties.

This is another way to do matching with regular expressions.

It uses a backslash, p, and curly braces: \p{...}.

You will notice it has many of the same classes as the POSIX bracket expressions.

You still have Alnum for alphanumeric characters, Alpha for alphabetic characters, Blank for spaces or tabs, Cntrl for control characters, Digit, and Graph.

Graph is your non-blank character again.

Notice that I'm still writing a regular expression pattern, but I'm using this new \p with curly braces, and I put the name of the property I want to find inside the braces.

So it's \p, and you enter the property name in here.

Here are some more.

You can use Lower for lowercase alphabetical characters, Punct for punctuation characters, Space for a whitespace character, and Upper for uppercase alphabetical characters.

With \p, you can also match a character's Unicode script.

If you look at the RDoc, there are other character properties in there.

We will look at that, too.

Here is an example with Digit.

I'm using my "one or more" quantifier.

I have my Digit property in the regular expression.

It takes the first run of one or more digits and matches the whole thing, so it sees 5000, and that is what it returns.

The next one looks for alphanumeric characters, then a space, and then more alphanumeric characters.

The first place that pattern appears is "apples in," so that is the match it returns.
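A sketch of a few character properties, using the capitalized property names from the Regexp RDoc. The sample sentence is made up; \p{Greek} illustrates matching by Unicode script, with the Greek letters written as \u escapes.

```ruby
APPLES = "I have 5000 apples!"

p APPLES[/\p{Digit}+/]             # "5000"
p APPLES[/\p{Upper}/]              # "I"
p APPLES[/\p{Punct}/]              # "!"
p APPLES[/\p{Alpha}+ \p{Alpha}+/]  # "I have"

GREEK = "alpha \u03B1\u03B2"       # ends with the Greek letters alpha and beta
p GREEK[/\p{Greek}+/] == "\u03B1\u03B2"  # true: only the Greek-script letters match
```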

The next thing we're looking at is anchors.

Anchors are important, because you often want to tie your regular expression to a position: not always the beginning; sometimes you want the end.

Anchors are metacharacters that match the zero-width positions between characters.

They anchor the match to a specific position.

Here we have ^, which matches the beginning of a line, and $, which matches the end of a line.

\A matches the beginning of the string. \Z matches the end of the string, but if the string ends with a newline, it matches just before that newline.

Lowercase \z matches the very end of the string.

There are many other anchors in the RDoc, but these are the popular ones.
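A sketch contrasting the line anchors with the string anchors on an assumed two-line string that ends in a newline. The =~ operator returns the index where the match starts, or nil.

```ruby
NL_LINES = "first line\nsecond line\n"

p NL_LINES.scan(/^\w+/)   # ["first", "second"]  ^ : start of every line
p NL_LINES[/\A\w+/]       # "first"              \A: start of the string only
p(NL_LINES =~ /line$/)    # 6    $ : the end of the first line already matches
p(NL_LINES =~ /line\Z/)   # 18   \Z: end of string, just before the final "\n"
p(NL_LINES =~ /line\z/)   # nil  \z: the string really ends with "\n"
```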

Let's look at some examples of anchors.

For this one, I'm using the dollar sign, which matches at the end of a line; this string is a single line, so that is also the end of the string.

The engine still scans from left to right, but the anchor means a match only counts if it ends at that position; if it finds one, it returns that substring.

Here, what I want to do with this string is get the last sentence.

There are two sentences here; I want the last one.

In my regular expression, I first anchor the match to the end.

It looks for a period or a question mark, and then for a group of words.

That matches my last sentence.

When I pass that into my string, it does get "What is your name?"

The problem is, it also matched the space in front of it.

So I chain another method here: strip, one of the String methods, which removes that leading whitespace.

That gets rid of the space, and I get just the sentence.
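A sketch of the "last sentence" idea. The slide's exact pattern isn't in the transcript, so the one below is an assumption: a run of characters that are not sentence terminators, followed by ".", "?" or "!" right at the end of the line. Like the lesson's pattern, it picks up the leading space, which strip then removes.

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

# Assumed pattern: non-terminator characters, then a terminator, anchored with $.
LAST_SENTENCE = /[^.?!]+[.?!]$/

raw = GREETING[LAST_SENTENCE]
p raw        # " What is your name?"  (note the leading space)
p raw.strip  # "What is your name?"
```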

Next, we will look at regular expression matching with the Regexp object.

For this, we use the match method.

It's part of the Regexp class, and it matches the regular expression against a string.

For this one, I have the string, "Hello, my name is Mr. Smith. What is your name?"

This object here is a regular expression.

I call the regular expression's match method and pass in the string, and it looks for Mr. in that string.

It finds it here and returns a MatchData object, which says, "I found a match, and here is the matched text."

Now, if I use a pattern like \w+ instead, it matches "Hello" at the beginning.

And if I start with a regular expression whose pattern doesn't appear in the string, match just returns nil.
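A sketch of Regexp#match on the lesson's string. The dot is escaped here (Mr\.) so it only matches a literal period; an unescaped dot would match any character.

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

md = /Mr\./.match(GREETING)  # escape the dot so it means a literal "."
p md.class                   # MatchData
p md[0]                      # "Mr."
p md.begin(0)                # 18, the index where the match starts

p(/\w+/.match(GREETING)[0])  # "Hello"
p(/Dr\./.match(GREETING))    # nil, no such text
```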

Next, let's look at matching with the String object; this does the same thing, but we start with the string.

String also has a match method; it takes a regular expression as its argument, so it is the mirror image of the Regexp match method.

Here I have a string: "Hello, my name is Mr. Smith. What is your name?"

We start with the string and call its match method.

It does the exact same thing: it returns the MatchData.

It's flexible: you can start with the regular expression or the string, depending on how your code reads.

Here I call string.match and pass a regular expression that isn't in the string, and it returns nil.
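The same lookups starting from the string. String#match? is included as an aside: it is available in Ruby 2.4 and later and returns only true or false, without building a MatchData.

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

p GREETING.match(/Mr\./)[0]  # "Mr."
p GREETING.match(/Dr\./)     # nil
p GREETING.match?(/Smith/)   # true (Ruby 2.4+)
```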

Next, we're going to look at regular expression modifier characters.

These go after the closing delimiter, as one or more single-letter options.

They control how the pattern can match.

For example, let's say pat is my regular expression pattern, whatever you put between the slashes; it could be, say, a digit character, a space, and another digit character.

That is our regular expression.

So pat just stands for some pattern.

What is important here is the modifiers.

i means ignore case: it doesn't matter whether a letter is uppercase or lowercase; it still matches because of that modifier.

m makes the dot match newlines, so a period in the pattern will also match a newline.

x says to ignore whitespace and comments in the pattern.

o says to perform #{} interpolation only once, the first time the literal is evaluated.
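A sketch of the four modifier characters on made-up inputs. The phone-number pattern is a hypothetical example showing how x lets you spread a pattern over several lines with comments.

```ruby
p "MR. SMITH"[/mr\. smith/i]  # "MR. SMITH"  i: ignore case
p "a\nb"[/a.b/m]              # "a\nb"       m: "." also matches newlines

PHONE = /
  \d{3}   # first three digits
  -       # literal hyphen
  \d{4}   # last four digits
/x                            # x: whitespace and comments are ignored
p "call 555-1234 now"[PHONE]  # "555-1234"

# o: the #{} interpolation runs only the first time this literal is evaluated,
# so every iteration reuses the regexp built when i was 0.
ONCE = 3.times.map { |i| /#{i}/o }
p ONCE.map(&:source)          # ["0", "0", "0"]
```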

Let's take a look at this. We have our example again, but notice that in my match I used a lowercase n.

It still matches, and the reason is this i, which says to ignore case.

With the same setup, if I create the regular expression with the Regexp object and the i modifier, I get the same MatchData.

The last thing I want to talk about is the regular expression modifier objects: constants on the Regexp class.

We have Regexp::IGNORECASE, Regexp::EXTENDED, and Regexp::MULTILINE.

You can pass these as options to the constructor.

We have Regexp.new with the pattern as the first argument, and you can pass the options as a second argument, which is optional.

For example, I use that same sentence (we've been really pummeling this sentence with regular expression matches) and I pass IGNORECASE.

That is how you set the modifier through the constructor.

And then it matches the string, even though this m is lowercase.
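A sketch of the constructor with an options argument. The pattern is passed as a single-quoted string so that the backslash before the dot survives.

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

CASELESS = Regexp.new('mr\. smith', Regexp::IGNORECASE)
p CASELESS.casefold?  # true: the regexp ignores case
p GREETING[CASELESS]  # "Mr. Smith", even though the pattern is lowercase
```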

Let me show you in the terminal how you can do this.

You can pass several options at once. Let's use the example again; for the options, I'm going to use Regexp::IGNORECASE, Regexp::EXTENDED, and Regexp::MULTILINE.

These are constants that are part of the Regexp class; notice that I've passed in all three options.

These constants are just integer bit flags, so you combine them into one Integer with the bitwise OR operator, |, rather than passing them as an array.

Then, let's create the string I want to work with.

"Hello, my name is Mr. Smith. What is your name?"

Then I create my regular expression, pass in the options, and match it against the string.

I'm using the constructor, calling new, and the token I'm looking for is Mr.

I could also write it as a regular expression pattern.

Then I pass those options in and match the result against that string.

At first it didn't find a match; when I put back the quotes around the pattern, it does find it.

That is the reason: because we're using the constructor here, the pattern goes in as a quoted string; we are not using the literal syntax.

That is how to pass those arguments in.
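A sketch of combining all three constants. The Regexp.new documentation describes the options argument as an Integer made by OR-ing these constants together; other truthy values, such as an array, are not read as a list of flags and are treated only as "ignore case" (recent Rubies also accept a string such as "mix").

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

p Regexp::IGNORECASE  # 1
p Regexp::EXTENDED    # 2
p Regexp::MULTILINE   # 4

ALL_FLAGS = Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE
p ALL_FLAGS           # 7

re = Regexp.new('mr\.', ALL_FLAGS)       # pattern passed as a quoted string
p((re.options & ALL_FLAGS) == ALL_FLAGS) # true: all three bits are set
p GREETING[re]                           # "Mr."
```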

Let me show you the RDoc for regular expressions.

We have gone through a lot of these examples already.

Here is the forward-slash form, the Ruby regular expression literal.

Notice the equal-tilde operator, =~: you put a regular expression on one side and the string you are searching on the other.

It tells you the position where the match starts; this one starts at 0.

You can also use the match method that we talked about.
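A quick sketch of =~ on a made-up string: it works with the regular expression on either side and returns the starting index, or nil when nothing matches.

```ruby
p("haystack" =~ /st/)   # 3, index where the match starts
p(/st/ =~ "haystack")   # 3, works from either side
p("haystack" =~ /xyz/)  # nil
```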

We already went through some of this; we talked about metacharacters.

Escapes: we didn't actually go over those, but you can escape special characters, like the plus and the question mark, with a backslash.

If you're matching those characters literally, make sure to escape them.
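A sketch of escaping on a made-up string. Regexp.escape adds the backslashes for you, which helps when the text to match comes from a variable.

```ruby
MATH = "Is 2+2=4?"

p MATH[/2\+2/]             # "2+2"  the escaped plus is a literal "+"
p MATH[/2+2/]              # nil    unescaped: one or more "2", then another "2"
p MATH[/4\?/]              # "4?"
p Regexp.escape("2+2=4?")  # "2\\+2=4\\?" (inspect shows each backslash doubled)
```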

Also notice that you can use Unicode; here is an example with some Unicode characters.

That lets you match international characters.

We talked about how a character class uses square brackets.

Over here, notice [aeiou] in the second position: if the second letter is one of those vowels, the pattern matches, so word matches and ward matches.

The same goes for a character class like [0-9a-f]: if a character falls in that range, it matches.

Here, matching against 9f, it matches the 9.

We already saw the caret: [^a-eg-z] means not a through e and not g through z, so it matches f, because f doesn't fall in either range.
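A sketch of the character classes described above, on made-up inputs; the ^ inside the brackets negates the whole class.

```ruby
p "ward"[/w[aeiou]rd/]   # "ward"
p "word"[/w[aeiou]rd/]   # "word"
p "wyrd"[/w[aeiou]rd/]   # nil, y is not in the class
p "9f"[/[0-9a-f]/]       # "9"  (one character)
p "9f"[/[0-9a-f]+/]      # "9f" (with a quantifier)
p "f"[/[^a-eg-z]/]       # "f", outside both excluded ranges
p "e"[/[^a-eg-z]/]       # nil
```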

We already went through these metacharacters.

You will see that there is quite a bit more here.

We went through repetition, and grouping is also supported; notice these parentheses.
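A sketch of grouping with the lesson's sentence: parentheses capture parts of the match (numbered, or named with ?<name>), and a quantifier after a group repeats the whole group. The group names are illustrative.

```ruby
GREETING = "Hello, my name is Mr. Smith. What is your name?"

md = /(\w+)\. (\w+)/.match(GREETING)
p md[1]  # "Mr"
p md[2]  # "Smith"

named = /(?<title>\w+)\. (?<surname>\w+)/.match(GREETING)
p named[:title]    # "Mr"
p named[:surname]  # "Smith"

p "hahaha!"[/(ha)+/]  # "hahaha": the + applies to the whole group
```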

I'll let you read through this RDoc for more details on regular expressions.

That's the end of the regular expressions lesson. Thank you for watching Educator.com!