Andrew Zuo

Photo by Mika Baumeister on Unsplash

What ChatGPT Has Taught Me About Regexes

ChatGPT is so helpful. When I don’t know how to do something I usually ask ChatGPT for a solution before trying to think of one myself. Especially with anything about Unicode or XML. Because have you read the XML specification? It’s like a novel. No one’s got time for that.

Unicode is hard and most of the time ChatGPT is very helpful. Not always, but usually it is. And one thing ChatGPT has been teaching me recently is regular expressions. I know what you’re thinking: what is there to know about regular expressions? Well:

1. Named Groups

Regular expressions don’t just match text. They can also pull out parts of the match, and named groups make that easy.

They allow you to assign a name to a specific portion of a match. By doing so, you can easily reference and manipulate these specific parts of the match in your code. To create a named group, you use the syntax (?<name>pattern).

For example, suppose you want to extract the date from a string in the format of YYYY-MM-DD. You can use a regular expression with named groups to capture the year, month, and day as separate groups. Here’s an example regular expression:

RegExp(r'^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$')

In this regular expression, each pair of parentheses captures one part of the date, and the ?<name> prefix labels it as year, month, or day.

With this regular expression, we can easily extract the year, month, and day using the named groups like this:

final match = RegExp(r'^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$')
    .firstMatch('2023-03-15');

final year = int.parse(match!.namedGroup('year')!);
final month = int.parse(match.namedGroup('month')!);
final day = int.parse(match.namedGroup('day')!);

print('$year-$month-$day'); // Output: 2023-3-15
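
One thing to watch out for: firstMatch returns null when the string doesn’t match the pattern, so match! throws on bad input. Here is a minimal sketch of a safer version. The names parseIsoDate and _datePattern are made up for this example, and returning a DateTime is just one possible choice.

```dart
// Hypothetical helper (not from Litany): returns null instead of throwing
// when the input is not in YYYY-MM-DD form.
final _datePattern =
    RegExp(r'^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$');

DateTime? parseIsoDate(String input) {
  final match = _datePattern.firstMatch(input);
  if (match == null) return null; // no match, so there are no groups to read

  // All three groups are mandatory in the pattern, so they are non-null here.
  final year = int.parse(match.namedGroup('year')!);
  final month = int.parse(match.namedGroup('month')!);
  final day = int.parse(match.namedGroup('day')!);
  return DateTime(year, month, day);
}

void main() {
  print(parseIsoDate('2023-03-15')); // a DateTime for 15 March 2023
  print(parseIsoDate('20230315')); // null, because the hyphens are missing
}
```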

This is used in my language learning app Litany. I ask GPT-3.5 to give me synonyms in the form of ‘Synonyms: a, b, c’ but sometimes it gives them to me in the form:

Synonyms
  • a
  • b
  • c

OK, whatever. Just use a regex to convert it. Actually, I’m not using the named part of the named groups. I’m just using groups. It’s the same idea though. You just omit the ?<name> section and get it using .group(1).
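
Here is roughly what that conversion could look like with a plain numbered group. This is a sketch, not the actual Litany code: normaliseSynonyms and _bulletLine are hypothetical names, and the set of bullet characters it accepts (-, *, •) is my assumption.

```dart
// Hypothetical helper: turns a bulleted reply into 'Synonyms: a, b, c'.
// Matches one bullet line at a time; group 1 is the text after the bullet.
final _bulletLine =
    RegExp(r'^[ \t]*[-*•][ \t]*(.+?)[ \t]*$', multiLine: true);

String normaliseSynonyms(String reply) {
  final items = _bulletLine
      .allMatches(reply)
      .map((m) => m.group(1)!) // unnamed groups are read by index
      .toList();
  // No bullet lines found: assume the reply is already in the wanted form.
  if (items.isEmpty) return reply;
  return 'Synonyms: ${items.join(', ')}';
}

void main() {
  print(normaliseSynonyms('Synonyms\n- a\n- b\n- c')); // Synonyms: a, b, c
}
```

The multiLine flag is what lets ^ and $ match at the start and end of every line rather than only at the ends of the whole string.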

2. Unicode Properties

Also in Litany I have a list of punctuation characters so I can strip them out of the text. But apparently there is a regex feature that does this for you, called Unicode properties.

Unicode properties are a set of special character classes in regular expressions that match characters based on their Unicode properties. Using Unicode properties, you can match characters based on their script, general category, or specific properties like whether they are a letter, digit, or whitespace character.

To use Unicode properties in a regular expression, you use the syntax \p{prop} or \P{prop}, where prop is the name of the Unicode property. \p{prop} matches characters that have the property and \P{prop} matches characters that don’t. Also you have to set unicode to true (in Flutter, not sure about other languages).

I use this to fix a duplicate hyphen issue in my hyphenation package.

Specifically I use this.

final regex = RegExp(r'\p{P}', unicode: true);
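
To show what that pattern does in practice, here is a small sketch that strips punctuation from a string. stripPunctuation is a hypothetical helper for illustration, not the code from my package.

```dart
// \p{P} matches any character in the Unicode "Punctuation" general category.
final _punctuation = RegExp(r'\p{P}', unicode: true);

// Hypothetical helper: removes every punctuation character from the text.
String stripPunctuation(String text) => text.replaceAll(_punctuation, '');

void main() {
  print(stripPunctuation('Hello, world! ¿Qué tal?')); // Hello world Qué tal
  // Symbols such as $ and + are in category S, not P, so they are kept.
  print(stripPunctuation(r'$5 + tip')); // $5 + tip
}
```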

There are tons of these available. There are ones for separators, upper case, lower case, math symbols, currency symbols, and even different scripts like Greek, Lao, and Tagalog. There’s a lot you can play with.
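
A few of these in action, as a rough sketch. Dart’s RegExp follows JavaScript’s rules here as far as I know, so general categories use short names like Lu and Sc, while scripts need the Script= prefix. The variable names are just for this example.

```dart
final upper = RegExp(r'\p{Lu}', unicode: true); // upper case letters
final currency = RegExp(r'\p{Sc}', unicode: true); // currency symbols
final greek = RegExp(r'\p{Script=Greek}+', unicode: true); // runs of Greek

void main() {
  print(upper.allMatches('Dart And Flutter').length); // 3
  print(currency.hasMatch('Price: €10')); // true
  print(greek.firstMatch('abc αβγ xyz')?.group(0)); // αβγ
}
```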

Final Thoughts

There are some other tricks ChatGPT showed me. Like ‘look ahead’ and ‘look behind’, non-capturing groups, and new lines. But these are pretty niche. I don’t see myself using them.

The two tricks above are useful though. I’m already using them. And maybe you can too.

THE ABOVE INFORMATION IS FOR ENTERTAINMENT PURPOSES ONLY. THE AUTHOR OF THIS BLOG POST ASSUMES NO LIABILITY FOR ANY DAMAGES ARISING FROM THE USE OF REGULAR EXPRESSIONS INCLUDING, BUT NOT LIMITED TO, INCORRECT OR MISSING RESULTS, DATA LOSS, DATA CORRUPTION, OR THERMONUCLEAR WAR. USE OF THE TECHNIQUES IN THIS BLOG POST IS ENTIRELY AT YOUR OWN RISK.

If you liked this article, be sure to give it a few claps. It helps out a lot with the algorithm. Also consider subscribing; I made an RSS reader that makes it very easy to do. It’s available on iOS and Android. ChatGPT contributed to this post.
