re.search examples

Extract the URL of the last post-list page on CSDN

    import logging
    import re

    # "尾页" is the link text for "last page"; the raw string keeps \s, \w and \d intact
    foundLastListPageUrl = re.search(r'<a\s+?href="(?P<lastListPageUrl>/\w+?/article/list/\d+)">尾页</a>', homeRespHtml, re.I)
    logging.debug("foundLastListPageUrl=%s", foundLastListPageUrl)
    if foundLastListPageUrl:
        lastListPageUrl = foundLastListPageUrl.group("lastListPageUrl")

For details, see:

https://github.com/crifan/BlogsToWordpress/blob/master/libs/crifan/blogModules/BlogCsdn.py

Given the content

    <a href="/chenglinhust/article/list/22">尾页</a>

this extracts

    /chenglinhust/article/list/22
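
The extraction above can be tried in isolation. The following is a minimal sketch, not the code from BlogCsdn.py: the helper name `find_last_list_page_url` and the variable `sample_html` are made up for illustration, and the sample string is the input line shown above.

```python
import re

# Same pattern as above; "尾页" is the "last page" link text.
LAST_PAGE_PATTERN = r'<a\s+?href="(?P<lastListPageUrl>/\w+?/article/list/\d+)">尾页</a>'


def find_last_list_page_url(html):
    """Hypothetical helper: return the last list-page URL, or None if absent."""
    found = re.search(LAST_PAGE_PATTERN, html, re.I)
    # re.search returns None when nothing matches, so check before calling group()
    return found.group("lastListPageUrl") if found else None


sample_html = '<a href="/chenglinhust/article/list/22">尾页</a>'
print(find_last_list_page_url(sample_html))
```

Because `re.search` returns `None` when the link is missing, the helper checks the match object before reading the named group, which mirrors the `if` test in the original snippet.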

Extract the title of a CSDN post

    import logging
    import re

    # "[置顶]" is the optional "pinned" badge; re.S lets . match newlines
    foundTitle = re.search(r'<span class="link_title"><a href="[\w/]+?">\s*(<font color="red">\[置顶\]</font>)?\s*(?P<titleHtml>.+?)\s*</a>\s*</span>', html, re.S)
    logging.debug("foundTitle=%s", foundTitle)
    if foundTitle:
        titleHtml = foundTitle.group("titleHtml")
        logging.debug("titleHtml=%s", titleHtml)

For details, see:

https://github.com/crifan/BlogsToWordpress/blob/master/libs/crifan/blogModules/BlogCsdn.py

Given content such as

    <span class="link_title"><a href="/v_july_v/article/details/6543438">
    <font color="red">[置顶]</font>
    程序员面试、算法研究、编程艺术、红黑树4大系列集锦与总结
    </a></span>

    <span class="link_title"><a href="/chdhust/article/details/7252155">
    windows编程中wParam和lParam消息
    </a>
    </span>

this extracts, respectively,

    程序员面试、算法研究、编程艺术、红黑树4大系列集锦与总结

    windows编程中wParam和lParam消息
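
Both cases, with and without the pinned badge, can be checked with one small script. This is a sketch under assumptions rather than the code from BlogCsdn.py: `extract_title`, `pinned_html` and `normal_html` are illustrative names, and the line breaks in the sample strings are assumed to match the listing above.

```python
import re

# Same pattern as above, split across lines for readability.
TITLE_PATTERN = re.compile(
    r'<span class="link_title"><a href="[\w/]+?">\s*'
    r'(<font color="red">\[置顶\]</font>)?\s*'   # optional "pinned" badge
    r'(?P<titleHtml>.+?)\s*</a>\s*</span>',
    re.S,  # let . match newlines
)


def extract_title(html):
    """Hypothetical helper: return the title text, or None if no title is found."""
    found = TITLE_PATTERN.search(html)
    return found.group("titleHtml") if found else None


pinned_html = (
    '<span class="link_title"><a href="/v_july_v/article/details/6543438">\n'
    '<font color="red">[置顶]</font>\n'
    '程序员面试、算法研究、编程艺术、红黑树4大系列集锦与总结\n'
    '</a></span>'
)
normal_html = (
    '<span class="link_title"><a href="/chdhust/article/details/7252155">\n'
    'windows编程中wParam和lParam消息\n'
    '</a>\n'
    '</span>'
)

print(extract_title(pinned_html))
print(extract_title(normal_html))
```

The lazy `.+?` followed by `\s*</a>` keeps the surrounding whitespace and line breaks out of the captured title, and the optional group consumes the pinned badge when it is present so it does not end up in `titleHtml`.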