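Basic symbols

Symbol	Meaning	Usage
.	Any character except a newline.	.
?	Matches 0 or 1 times; placed after another quantifier, it makes that quantifier stop at the shortest match (non-greedy).	a? a*?
*	Matches 0 or more times, taking as many as possible (greedy).	a*
+	Matches 1 or more times.	a+
[]	Character class: matches any one of the listed characters. A ^ right after the opening bracket negates the class.	[abc] [^abc]
{}	Specifies how many times to match.	a{1,5} a{1,} a{,5}
()	Groups part of the pattern.	(abc)
^	Start of the string.	^abc
$	End of the string (or just before a trailing newline).	abc$
\d	A digit from 0 to 9; uppercase \D matches anything \d does not.	\d \D
\w	Any letter, digit, or the underscore _; uppercase \W matches anything \w does not.	\w \W
\s	A whitespace character, including space, tab, and newline; uppercase \S matches anything \s does not.	\s \S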
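To make the table concrete, the sketch below runs a few of these symbols through re.findall. The sample strings and variable names are invented for illustration; only the symbols themselves come from the table.

```python
import re

# "." matches any character except a newline, so "a\nc" is skipped.
dot = re.findall(r'a.c', "abc a-c a\nc")

# "?" makes the preceding item optional (0 or 1 times).
optional = re.findall(r'colou?r', "color colour")

# "*" is greedy by default; a "?" after it stops at the shortest match.
greedy = re.findall(r'<.*>', "<a><b>")
lazy = re.findall(r'<.*?>', "<a><b>")

# "[^abc]" matches any single character that is not a, b, or c.
negated = re.findall(r'[^abc]', "abcd1")

# "{2,3}" repeats the preceding item two or three times.
counted = re.findall(r'a{2,3}', "a aa aaaa")

# "\d+" matches one or more digits.
digits = re.findall(r'\d+', "tel 02 8888")

print(dot)       # ['abc', 'a-c']
print(optional)  # ['color', 'colour']
print(greedy)    # ['<a><b>']
print(lazy)      # ['<a>', '<b>']
print(negated)   # ['d', '1']
print(counted)   # ['aa', 'aaa']
print(digits)    # ['02', '8888']
```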

Using regular expressions

In Python, regular expressions live in the re module, which has to be imported first:

import re

Use compile to build a regular expression object from a pattern:

Regex = re.compile(r'\d{2}-\d{4}-\d{4}')

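The r prefix makes the string a raw string, so backslashes reach the regex engine unchanged instead of being interpreted by Python as escape sequences.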
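A small sketch of why the r prefix matters, using \b as the example. The variable names and the sample sentence are made up for illustration.

```python
import re

# In a normal string, "\b" is the single backspace character.
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + "b", which re reads as a word boundary.
raw = r"\bcat\b"

lengths = (len(normal), len(raw))
raw_hits = re.findall(raw, "a cat sat")        # word-boundary pattern
normal_hits = re.findall(normal, "a cat sat")  # looks for backspace characters

print(lengths)      # (5, 7)
print(raw_hits)     # ['cat']
print(normal_hits)  # []
```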

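The Regex object offers several methods for different kinds of searches:

Method	Purpose
match	Matches only at the beginning of the string (roughly like adding ^ to the front of the pattern)
search	Finds the first match anywhere in the string
findall	Finds all matches and returns them as a list
finditer	Finds all matches and returns an iterator of match objects

findall returns the matched values as a list, or an empty list when nothing is found.
match and search return a match object, or None when nothing is found. finditer returns an iterator of match objects, which is simply empty when nothing is found. Read values from a match object with the following methods: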

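Method	Purpose
group	The matched text
span	The start and end positions of the match (returned as a tuple)
start	The start position of the match
end	The end position of the match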
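Because match and search can return None, calling group() directly on the result raises AttributeError when nothing is found. A minimal sketch of a guard, where first_phone is a hypothetical helper name and the sample sentences are invented:

```python
import re

phone_re = re.compile(r'\d{2}-\d{4}-\d{4}')  # same pattern as in the article

def first_phone(text):
    """Return the first phone number in text, or None if there is none."""
    m = phone_re.search(text)
    return m.group() if m is not None else None

hit = first_phone("Call 02-8888-1688 today.")
miss = first_phone("No number here.")
# finditer never returns None; with no matches the iterator is just empty.
no_matches = list(phone_re.finditer("No number here."))

print(hit)         # 02-8888-1688
print(miss)        # None
print(no_matches)  # []
```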

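Using search

Regex = re.compile(r'\d{2}-\d{4}-\d{4}')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."

# search returns a match object for the first match anywhere in the string
result = Regex.search(text)
print("group():", result.group())
print("span():", result.span())
print("start():", result.start())
print("end():", result.end())

Output:

group(): 02-8888-1688
span(): (21, 33)
start(): 21
end(): 33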

Using match

Regex = re.compile(r'\d{2}-\d{4}-\d{4}')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."

# the number is not at the start of the string, so match returns None
result = Regex.match(text)
print(result)

Output:

None

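Regex = re.compile(r'\d{2}-\d{4}-\d{4}')

text = "02-9888-9898 is his office number."

# here the number is at the start of the string, so match succeeds
result = Regex.match(text)
print(result)

Output (Python 3.7 or later):

<re.Match object; span=(0, 12), match='02-9888-9898'>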

Using findall

Regex = re.compile(r'\d{2}-\d{4}-\d{4}')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."

# findall returns every match as a list of strings
result = Regex.findall(text)
print(result)

Output:

['02-8888-1688', '02-9888-9898']

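Using finditer

Regex = re.compile(r'\d{2}-\d{4}-\d{4}')
text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."

# finditer yields one match object per match
result = Regex.finditer(text)
for match in result:
  print("group():", match.group())
  print("span():", match.span())
  print("start():", match.start())
  print("end():", match.end())
  print()

Output:

group(): 02-8888-1688
span(): (21, 33)
start(): 21
end(): 33

group(): 02-9888-9898
span(): (45, 57)
start(): 45
end(): 57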

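Exercises

Check whether a string is a Taiwanese phone number (0911111111, 886911111111)
Check whether a string is a Taiwanese national ID number (one uppercase letter + 1 or 2 + eight digits)
Check whether a string matches this format: 1 to 3 groups of digits, two digits per group, separated by commas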
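One possible set of answers, offered as a sketch rather than the intended solution. It assumes the phone number is a mobile number starting with 09 (as in the two samples), checks only the format of the ID number (no checksum), and uses fullmatch, which the article does not cover, so the whole string must match. All names are hypothetical.

```python
import re

# Assumption: prefix "0" or "886", then 9, then eight more digits.
MOBILE_RE = re.compile(r'(?:0|886)9[0-9]{8}')
# Format only: one uppercase letter, then 1 or 2, then eight digits.
ID_RE = re.compile(r'[A-Z][12][0-9]{8}')
# One to three two-digit groups separated by commas.
PAIRS_RE = re.compile(r'[0-9]{2}(?:,[0-9]{2}){0,2}')

def is_tw_mobile(s):
    return MOBILE_RE.fullmatch(s) is not None

def is_tw_id(s):
    return ID_RE.fullmatch(s) is not None

def is_pairs(s):
    return PAIRS_RE.fullmatch(s) is not None

print(is_tw_mobile("0911111111"), is_tw_mobile("886911111111"))  # True True
print(is_tw_id("A123456789"), is_tw_id("a123456789"))            # True False
print(is_pairs("12,34,56"), is_pairs("12,34,56,78"))             # True False
```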

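Advanced

Flags

Flag	Purpose
ASCII, A	Restricts \w, \b, \d, \s and their uppercase counterparts to ASCII characters
UNICODE, U	Uses the Unicode character set (already the default for str patterns in Python 3)
DOTALL, S	Lets . match any character, including a newline
IGNORECASE, I	Ignores case
LOCALE, L	Makes \w, \W, \b, and \B depend on the current locale. Not recommended
MULTILINE, M	Makes ^ and $ also match at the start and end of each line
VERBOSE, X	Ignores whitespace in the pattern and comments starting with #, so a long pattern can be split across lines for readability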

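Flag examples

To use a flag, pass re.<flag name> as the second argument to compile, separated by a comma.
Either the full name or the one-letter abbreviation works.

Regex = re.compile(r'[ABC]',re.IGNORECASE)
or
Regex = re.compile(r'[ABC]',re.I)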
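A short sketch of what two of these flags change in practice, and of combining flags with the | operator. The sample text and variable names are invented for illustration.

```python
import re

text = "abc\nABC"

# IGNORECASE: the class [ABC] also matches lowercase a, b, c.
ignore_case = re.compile(r'[ABC]', re.IGNORECASE).findall(text)

# Without DOTALL, "." does not match the newline, so nothing is found.
no_dotall = re.compile(r'c.A').search(text)

# With DOTALL, "." matches the newline too.
with_dotall = re.compile(r'c.A', re.DOTALL).search(text)

# Several flags can be combined with the | operator.
combined = re.compile(r'^abc.abc$', re.I | re.S).search(text)

print(ignore_case)                # ['a', 'b', 'c', 'A', 'B', 'C']
print(no_dotall)                  # None
print(repr(with_dotall.group()))  # 'c\nA'
print(combined is not None)       # True
```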

MULTILINE

Regex = re.compile(r'^\w+',re.M)

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."

# with re.M, ^ also matches right after each newline
result = Regex.findall(text)
print(result)

Output:

['Please', '02']

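VERBOSE

The following uses a regular expression for email addresses as the example.
Without the VERBOSE flag:

Regex = re.compile(r"^([\w!#$%&'*+\-/=?^_`{|}~]+)(\.[\w!#$%&'*+\-/=?^_`{|}~]+)*@[\w-]+(\.[\w-]+)+$")

With the VERBOSE flag:

Regex = re.compile(r"""
                   ^
                   ([\w!#$%&'*+\-/=?^_`{|}~]+)    # the text before the first dot
                   (\.[\w!#$%&'*+\-/=?^_`{|}~]+)* # any number of dot-plus-text groups
                   @
                   [\w-]+                         # one run of text
                   (\.[\w-]+)+                    # at least one dot-plus-text group
                   $
                   """,re.X)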

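As a smaller sketch of the same idea, the phone-number pattern used earlier can be written both ways and checked to behave identically. The comment labels inside the pattern are my own reading of the three digit runs, not something the article states, and the last lines show that a literal space has to be escaped once VERBOSE is on.

```python
import re

compact = re.compile(r'\d{2}-\d{4}-\d{4}')
verbose = re.compile(r"""
    \d{2}   # first run: two digits (assumed to be the area code)
    -
    \d{4}   # second run: four digits
    -
    \d{4}   # third run: four digits
    """, re.VERBOSE)

text = "Please call David at 02-8888-1688 by today."
same = compact.findall(text) == verbose.findall(text)

# Under VERBOSE, unescaped whitespace is ignored, so a real space needs "\ ".
spaced = re.compile(r"at\ \d{2}", re.X).search(text)

print(same)            # True
print(spaced.group())  # at 02
```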
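Groups

Anything enclosed in () is a group. When you need finer details out of a match, grouping lets you pull them out.

In the example below, three pairs of parentheses are added to the original pattern, one around each run of digits; passing a group number to the match methods then returns that part.
group(0) is the same as group(): the whole string matched by the full expression.
You can also pass several group numbers separated by commas, which returns a tuple.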

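Regex = re.compile(r'(\d{2})-(\d{4})-(\d{4})')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.search(text)
print(result.group(0), result.span(0))  # the whole match
print(result.group(1), result.span(1))  # first group
print(result.group(2), result.span(2))  # second group
print(result.group(3), result.span(3))  # third group
print(result.group(0, 1, 2, 3))         # several groups at once, as a tuple

Output:

02-8888-1688 (21, 33)
02 (21, 23)
8888 (24, 28)
1688 (29, 33)
('02-8888-1688', '02', '8888', '1688')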

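When the pattern contains groups, findall returns only the contents of the groups. To get the whole match as well, wrap the entire pattern in one more pair of parentheses.

Regex = re.compile(r'(\d{2})-(\d{4})-(\d{4})')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.findall(text)
print(result)

Output:

[('02', '8888', '1688'), ('02', '9888', '9898')]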

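If you do not want the contents of a particular pair of parentheses to show up in the findall result, put ?: right after its opening parenthesis (a non-capturing group):

Regex = re.compile(r'((\d{2})-(\d{4})-(?:\d{4}))')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.findall(text)
print(result)

Output:

[('02-8888-1688', '02', '8888'), ('02-9888-9898', '02', '9888')]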

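There are many more constructs of the (? ) kind, for example:

a(?=b) a must be followed by b

Regex = re.compile(r'\d{4}(?=-1688)')
text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.findall(text)
print(result)

Output:

['8888']

Note that the condition inside (?= ) does not appear in the result.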

a(?!b) a must not be followed by b

Regex = re.compile(r'\d{4}(?!-1688)')
text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.findall(text)
print(result)

Output:

['1688', '9888', '9898']

(?P<first>…) names the group first

Besides serving as an argument to methods such as group, the name also becomes a key in the dictionary returned by groupdict.

Regex = re.compile(r'(?P<first>\d{2})-(?P<second>\d{4})-(?P<third>\d{4})')

text = "Please call David at 02-8888-1688 by today.\r\n02-9888-9898 is his office number."
result = Regex.search(text)
print("group()\t\t\t:",result.group())
print("group(\"first\")\t:",result.group("first"))
print("group(\"second\")\t:",result.group("second"))
print("group(\"third\")\t:",result.group("third"))
print("groupdict()\t\t:",result.groupdict())

Output:

group()			: 02-8888-1688
group("first")	: 02
group("second")	: 8888
group("third")	: 1688
groupdict()		: {'first': '02', 'second': '8888', 'third': '1688'}

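More symbols

Symbol	Meaning
|	Alternation: matches any one of several patterns; the leftmost alternative that matches is the result
\A	Start of the string; like ^, but unaffected by the MULTILINE flag
\Z	End of the string; like $, but unaffected by the MULTILINE flag
\b	A word boundary, i.e. the start or end of a word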
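The article gives no code for these four symbols, so here is a brief sketch of each. The sample text and variable names are invented for illustration.

```python
import re

text = "cat\ncatalog"

# "|" tries alternatives left to right, so "cat" wins before "catalog" is tried.
short_first = re.findall(r'cat|catalog', text)
long_first = re.findall(r'catalog|cat', text)

# With MULTILINE, ^ matches at every line start, but \A only at the string start.
caret = re.findall(r'^cat', text, re.M)
anchor_a = re.findall(r'\Acat', text, re.M)

# With MULTILINE, $ matches at every line end, but \Z only at the string end.
dollar = re.findall(r'\w+$', text, re.M)
anchor_z = re.findall(r'\w+\Z', text, re.M)

# "\b" requires a word boundary on both sides, so "catalog" is not matched.
whole_word = re.findall(r'\bcat\b', text)

print(short_first)  # ['cat', 'cat']
print(long_first)   # ['cat', 'catalog']
print(caret)        # ['cat', 'cat']
print(anchor_a)     # ['cat']
print(dollar)       # ['cat', 'catalog']
print(anchor_z)     # ['catalog']
print(whole_word)   # ['cat']
```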

Regular Expressions (Python)

Monosparta Lab



All Rights Reserved © 2022
