
Scraping post titles from PTT 🔗 Link: https://www.ptt.cc/bbs/Gossiping/index.html

Packages

  1. requests
  2. beautifulsoup4
pip3 install requests beautifulsoup4

Getting started

First request

import requests

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
response = requests.get(url)  # plain GET, no cookies yet

print(response.text)

Observe the result

<!DOCTYPE html>

...

    <body>

<div class="bbs-screen bbs-content">
    <div class="over18-notice">
        <p>本網站已依網站內容分級規定處理</p>

        <p>警告︰您即將進入之看板內容需滿十八歲方可瀏覽。</p>

        <p>若您尚未年滿十八歲,請點選離開。若您已滿十八歲,亦不可將本區之內容派發、傳閱、出售、出租、交給或借予年齡未滿18歲的人士瀏覽,或將本網站內容向該人士出示、播放或放映。</p>
    </div>
</div>

<div class="bbs-screen bbs-content center clear">
    <form action="/ask/over18" method="post">
        <input type="hidden" name="from" value="/bbs/Gossiping/index.html">
        <div class="over18-button-container">
            <button class="btn-big" type="submit" name="yes" value="yes">我同意,我已年滿十八歲<br><small>進入</small></button>
        </div>
        <div class="over18-button-container">
            <button class="btn-big" type="submit" name="no" value="no">未滿十八歲或不同意本條款<br><small>離開</small></button>
        </div>
    </form>
</div>

...
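
The response above is the age-confirmation page rather than the board index. A scraper can detect this case before trying to parse anything. The following is a minimal sketch, assuming the page can be recognised by the over18-notice class and the /ask/over18 form action seen in the output; the function name looks_like_age_gate is hypothetical.

```python
def looks_like_age_gate(html: str) -> bool:
    """Return True if the HTML looks like PTT's age-confirmation page.

    Assumption: the page is recognised by the "over18-notice" class or the
    "/ask/over18" form action, both taken from the output shown above.
    """
    return 'class="over18-notice"' in html or 'action="/ask/over18"' in html


# A trimmed copy of the response above, used here as sample input.
sample = '''
<div class="bbs-screen bbs-content">
    <div class="over18-notice">
        <p>本網站已依網站內容分級規定處理</p>
    </div>
</div>
<form action="/ask/over18" method="post"></form>
'''

print(looks_like_age_gate(sample))
print(looks_like_age_gate('<div class="r-ent"></div>'))
```

A plain substring check is enough here because the goal is only to notice that the wrong page came back, not to extract anything from it.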

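Sending the cookie

The page returned above is an age-confirmation prompt, not the board index. Sending an over18=1 cookie with the request gets past it.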

import requests

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
# Tell the site the age prompt has already been accepted.
cookies = {
    "over18": "1"
}

response = requests.get(url, cookies=cookies)

print(response.text)

Observe the result

<!DOCTYPE html>

...

<div id="topbar-container">
        <div id="topbar" class="bbs-content">
                <a id="logo" href="/bbs/">批踢踢實業坊</a>
                <span>&rsaquo;</span>
                <a class="board" href="/bbs/Gossiping/index.html"><span class="board-label">看板 </span>Gossiping</a>
                <a class="right small" href="/about.html">關於我們</a>
                <a class="right small" href="/contact.html">聯絡資訊</a>
        </div>
</div>

<div id="main-container">
        <div id="action-bar-container">
                <div class="action-bar">
                        <div class="btn-group btn-group-dir">
                                <a class="btn selected" href="/bbs/Gossiping/index.html">看板</a> 
                                <a class="btn" href="/man/Gossiping/index.html">精華區</a>        
                        </div>
                        <div class="btn-group btn-group-paging">
                                <a class="btn wide" href="/bbs/Gossiping/index1.html">最舊</a>    
                                <a class="btn wide" href="/bbs/Gossiping/index38935.html">&lsaquo; 上頁</a>
                                <a class="btn wide disabled">下頁 &rsaquo;</a>
                                <a class="btn wide" href="/bbs/Gossiping/index.html">最新</a>     
                        </div>
                </div>
        </div>

        <div class="r-list-container action-bar-margin bbs-screen">
                <div class="search-bar">
                        <form type="get" action="search" id="search-bar">
                                <input class="query" type="text" name="q" value="" placeholder="搜尋文章&#x22ef;">
                        </form>
                </div>

                <div class="r-ent">
                        <div class="nrec"><span class="hl f3">10</span></div>
                        <div class="title">

                                <a href="/bbs/Gossiping/M.1679238802.A.D7B.html">[問卦] Dcard熱門 :我的BP門票疑似被政府收走</a>

                        </div>
                        <div class="meta">
                                <div class="author">newpasta</div>
                                <div class="article-menu">

                                        <div class="trigger">&#x22ef;</div>
                                        <div class="dropdown">
                                                <div class="item"><a href="/bbs/Gossiping/search?q=thread%3A%5B%E5%95%8F%E5%8D%A6%5D&#43;Dcard%E7%86%B1%E9%96%80%EF%BC%9A%E6%88%91%E7%9A%84BP%E9%96%80%E7%A5%A8%E7%96%91%E4%BC%BC%E8%A2%AB%E6%94%BF%E5%BA%9C%E6%94%B6%E8%B5%B0">搜尋同標題文章</a></div>

                                                <div class="item"><a href="/bbs/Gossiping/search?q=author%3Anewpasta">搜尋看板內 newpasta 的文章</a></div>

                                        </div>

                                </div>
                                <div class="date"> 3/19</div>
                                <div class="mark"></div>
                        </div>
                </div>

...
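
The paging buttons in the output above include a link to the previous index page, which is what a scraper would follow to read older posts. Here is a rough sketch of pulling that link out, assuming the button keeps the label "上頁" and the btn-group-paging class shown above. The sample HTML is a trimmed copy of that output, and previous_page_url and BASE_URL are names made up for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ptt.cc"

# Trimmed copy of the paging buttons from the output above.
sample_html = """
<div class="btn-group btn-group-paging">
    <a class="btn wide" href="/bbs/Gossiping/index1.html">最舊</a>
    <a class="btn wide" href="/bbs/Gossiping/index38935.html">&lsaquo; 上頁</a>
    <a class="btn wide disabled">下頁 &rsaquo;</a>
    <a class="btn wide" href="/bbs/Gossiping/index.html">最新</a>
</div>
"""


def previous_page_url(html):
    """Return the absolute URL of the previous page, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.select("div.btn-group-paging a"):
        # The button is identified by its label; a disabled button has no href.
        if "上頁" in link.get_text() and link.get("href"):
            return urljoin(BASE_URL, link["href"])
    return None


print(previous_page_url(sample_html))
```

Matching on the label rather than on position keeps the lookup working if the order of the buttons changes, though it would break if the label itself changed.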

Parsing the HTML

import requests
from bs4 import BeautifulSoup

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
cookies = {
    "over18": "1"
}

response = requests.get(url, cookies=cookies)
# Parse the response body with Python's built-in HTML parser.
soup = BeautifulSoup(response.text, 'html.parser')

print(soup)
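
The snippets above pass the same cookies dictionary on each call. When several pages are fetched, a requests.Session can carry the cookie instead. This is only a sketch of that alternative, not something the original steps require; make_ptt_session, fetch_board, and the 10-second timeout are assumptions made for illustration.

```python
import requests

BOARD_URL = "https://www.ptt.cc/bbs/Gossiping/index.html"


def make_ptt_session() -> requests.Session:
    """Build a session that sends over18=1 with every request."""
    session = requests.Session()
    session.cookies.set("over18", "1")
    return session


def fetch_board(session: requests.Session, url: str = BOARD_URL) -> str:
    """Fetch a board page and return its HTML; raises on HTTP errors."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text


session = make_ptt_session()
# No request is made here; this only confirms the cookie is stored.
print(session.cookies.get("over18"))
```

Calling fetch_board(session) would then return the same HTML as response.text in the snippet above, and raise_for_status turns an HTTP error into an exception instead of parsing an error page.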

Extracting every div with class title

import requests
from bs4 import BeautifulSoup

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
cookies = {
    "over18": "1"
}

response = requests.get(url, cookies=cookies)
soup = BeautifulSoup(response.text, 'html.parser')
# Each post title on the index page sits in a <div class="title">.
divs = soup.find_all('div', class_="title")

print(divs)

Observe the result

[<div class="title">
<a href="/bbs/Gossiping/M.1679239678.A.20C.html">[問卦] 高雄怎麼不一直辦演唱會?</a>
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1679239867.A.EA3.html">[問卦] 越南貨幣動不動就幾十萬幾百萬為什麼不換</a>
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1679239891.A.370.html">[問卦] 台中可以隨便壓人脖子合法?</a>
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1677600392.A.D12.html">[公告] 八卦板板規(2023.03.01)</a>
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1678097203.A.63D.html">[協尋] 3/3 16:50-17:20大寮區新厝路和新三</a>   
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1678780409.A.A18.html">[協尋] 台2濱海事故</a>
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1678924835.A.FA2.html">[協尋] 3/15 7:44高雄大寮行車記錄器(更新地點)</a> 
</div>, <div class="title">
<a href="/bbs/Gossiping/M.1678447616.A.4DB.html">[公告] 代PO 政黑進板圖徵選 5/23截止(每推100P</a> 
</div>]
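
Each title div in the list above wraps an a element whose href points at the post itself, so the link can be collected along with the title. Below is a small sketch, assuming that some entries may lack an a element (the second sample div is an invented placeholder for that case) and that skipping them is acceptable. extract_posts is a hypothetical helper.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# First div is copied from the output above; the second is an invented
# placeholder standing in for an entry that has no link.
sample_html = """
<div class="title">
<a href="/bbs/Gossiping/M.1678780409.A.A18.html">[協尋] 台2濱海事故</a>
</div>
<div class="title">
(placeholder entry without a link)
</div>
"""


def extract_posts(html, base_url="https://www.ptt.cc"):
    """Return (title, absolute_url) pairs for every title div that has a link."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for div in soup.find_all("div", class_="title"):
        link = div.find("a")
        if link is None:
            # Assumed case: no <a> inside the div, so there is nothing to follow.
            continue
        posts.append((link.get_text(strip=True), urljoin(base_url, link["href"])))
    return posts


for title, url in extract_posts(sample_html):
    print(title, url)
```

The hrefs on the page are relative, so urljoin is used to turn them into full URLs that can be passed straight to requests.get.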

Extracting the title text

import requests
from bs4 import BeautifulSoup

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
cookies = {
    "over18": "1"
}
response = requests.get(url, cookies=cookies)
soup = BeautifulSoup(response.text, 'html.parser')
divs = soup.find_all('div', class_="title")

for div in divs:
    # div.text keeps the newlines around the link, hence the blank lines below.
    print(div.text)

Observe the result

[問卦] 高雄怎麼不一直辦演唱會?

[問卦] 越南貨幣動不動就幾十萬幾百萬為什麼不換

[問卦] 台中可以隨便壓人脖子合法?

[問卦] 有紅超過15年的韓團嗎?

Re: [問卦] 今年會有台日劇超越黑暗榮耀嗎?

[問卦] 高雄會讓BP覺得台灣很落後嗎?

Re: [問卦] 菲律賓有缺雞蛋嗎

[問卦] 伍佰究竟有什麼魅力?

[公告] 八卦板板規(2023.03.01)

[協尋] 3/3 16:50-17:20大寮區新厝路和新三

[協尋] 台2濱海事故

[協尋] 3/15 7:44高雄大寮行車記錄器(更新地點)

[公告] 代PO 政黑進板圖徵選 5/23截止(每推100P
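
The blank lines in the output above come from the newlines that surround each link inside its div. If a clean list of titles is wanted, get_text(strip=True) removes that whitespace. The sketch below runs on two divs copied from the earlier output rather than on a live response.

```python
from bs4 import BeautifulSoup

# Two title divs copied from the earlier output, used as sample input.
sample_html = """
<div class="title">
<a href="/bbs/Gossiping/M.1679239678.A.20C.html">[問卦] 高雄怎麼不一直辦演唱會?</a>
</div>
<div class="title">
<a href="/bbs/Gossiping/M.1678780409.A.A18.html">[協尋] 台2濱海事故</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# strip=True trims the whitespace around each text fragment before joining.
titles = [div.get_text(strip=True) for div in soup.find_all("div", class_="title")]

for title in titles:
    print(title)
```

Collecting the titles into a list first also makes them easy to reuse, for example to write them to a file instead of printing them.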
