re.finditer examples

Extracting the information of each dialog from text that follows a fixed dialog pattern

Input string:

    Place: School canteen
    Topic: food
    Tittle:Have lunch
    Age: 3-4
    J: What did you have for lunch?
    L: I ate rice, fish and bread.
    J: Do you like rice?
    L: Yes, I do.
    J: Do you like fish?
    L: Yes, I do.
    J: Do you like bread?
    L: No, I don’t.
    J: What did you drink?
    L: I drank milk.
    J: Do you like milk?
    L: Yes, I do.

    Place: home
    Topic: house
    Tittle: Doing housework
    Age: 4-5
    J: Do you like cooking, mom?
    M: Yes, I do a lot. What about you?
    J: Mom, you know me. I can’t cook.
    M: But can you help me wash dishes?
    J: Yes, I can help you.
    M: Let’s make a deal, ok?
    J: What kind of deal?
    M: I’m going to cook.
    J: And then?
    M: Then you wash the dishes after the meal.
    J: That’s ok. I’d like to help you mom.
    M: You are a good boy.

    ...

Code:

    import re

    # allLine holds the whole input string shown above.
    # Each dialog must be followed by at least one blank line (\n{2,1000}).
    # "tittle" is spelled this way on purpose: it has to match the field name used in the input.
    singleScriptPattern = r"(?P<singleScript>place:(?P<place>.+?)\ntopic:(?P<topic>.+?)\ntittle:(?P<title>.+?)\nage:(?P<age>.+?)\n(?P<content>.+?))\n{2,1000}"
    # re.I: case-insensitive field names; re.DOTALL: "." also matches newlines inside content
    matchIterator = re.finditer(singleScriptPattern, allLine, flags=re.I | re.M | re.DOTALL)
    print("matchIterator=%s" % matchIterator)
    # re.finditer always returns an iterator, so no "if matchIterator:" check is needed
    for scriptNum, eachScriptMatch in enumerate(matchIterator):
        print("[%d] eachScriptMatch=%s" % (scriptNum, eachScriptMatch))
        singleScript = eachScriptMatch.group("singleScript")
        print("singleScript=%s" % singleScript)
        place = eachScriptMatch.group("place")
        print("place=%s" % place)
        topic = eachScriptMatch.group("topic")
        print("topic=%s" % topic)
        title = eachScriptMatch.group("title")
        print("title=%s" % title)
        age = eachScriptMatch.group("age")
        print("age=%s" % age)
        content = eachScriptMatch.group("content")
        print("content=%s" % content)

Log output:

    matchIterator=<callable_iterator object at 0x10e3f7b70>
    [0] eachScriptMatch=<_sre.SRE_Match object; span=(1, 309), match='Place: School canteen\nTopic: food\nTittle:Have l>
    singleScript=Place: School canteen
    Topic: food
    Tittle:Have lunch
    Age: 3-4
    J: What did you have for lunch?
    L: I ate rice, fish and bread.
    J: Do you like rice?
    L: Yes, I do.
    J: Do you like fish?
    L: Yes, I do.
    J: Do you like bread?
    L: No, I don’t.
    J: What did you drink?
    L: I drank milk.
    J: Do you like milk?
    L: Yes, I do.
    place= School canteen
    topic= food
    title=Have lunch
    age= 3-4
    content=J: What did you have for lunch?
    L: I ate rice, fish and bread.
    J: Do you like rice?
    L: Yes, I do.
    J: Do you like fish?
    L: Yes, I do.
    J: Do you like bread?
    L: No, I don’t.
    J: What did you drink?
    L: I drank milk.
    J: Do you like milk?
    L: Yes, I do.
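
The same idea can be tried end to end with a shortened input. The following is only a sketch under a few assumptions: the sample text is abbreviated, the names `all_lines`, `script_pattern` and `scripts` are made up for illustration, and the `\s*\Z` alternative is an addition (not part of the original pattern) so that the last dialog still matches when it is not followed by a blank line.

```python
import re

# Abbreviated, hypothetical sample: two dialogs separated by one blank line
all_lines = """
Place: School canteen
Topic: food
Tittle:Have lunch
Age: 3-4
J: What did you have for lunch?
L: I ate rice, fish and bread.

Place: home
Topic: house
Tittle: Doing housework
Age: 4-5
J: Do you like cooking, mom?
M: Yes, I do a lot. What about you?
"""

script_pattern = re.compile(
    r"place:(?P<place>.+?)\n"
    r"topic:(?P<topic>.+?)\n"
    r"tittle:(?P<title>.+?)\n"   # "tittle" matches the spelling used in the input
    r"age:(?P<age>.+?)\n"
    r"(?P<content>.+?)"
    r"(?:\n{2,}|\s*\Z)",         # a blank line, or (assumption) the end of the text
    flags=re.I | re.DOTALL,
)

scripts = []
for each_match in script_pattern.finditer(all_lines):
    # groupdict() returns all named groups at once; strip the space after each colon
    scripts.append({name: value.strip() for name, value in each_match.groupdict().items()})

for script in scripts:
    print("%s | %s | %s | %s" % (script["place"], script["topic"], script["title"], script["age"]))
```

Using `groupdict()` avoids one `group("...")` call per field, at the cost of losing the per-field print statements of the original code.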


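Matching idioms that follow specific patterns

Background and requirements

Problem description: use Python regular expressions to search for Chinese idioms by pattern.

Where to search: a Chinese idiom dictionary.

What to search for: idioms that follow certain special patterns, for example:

(1) the xxyy pattern, e.g. 高高兴兴, 快快乐乐

(2) the number pattern, e.g. 三心二意, 一泻千里

(3) the animal pattern, e.g. 鸡鸣狗盗, 狐假虎威

(4) ...

Prepare the Chinese idiom file first, then search it with regular expressions. The results are printed to the screen and also saved to a result file.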
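
The "print and also save" requirement can be sketched as a small helper. This is a minimal illustration, not the original author's code: the helper name `search_and_save`, the three-idiom sample text and the output file name are all assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path


def search_and_save(pattern, text, result_path):
    """Print every match of pattern in text and write the same lines to result_path."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    lines = ["[%d] %s" % (num, idiom) for num, idiom in enumerate(matches, start=1)]
    for line in lines:
        print(line)
    # Save exactly what was printed, one result per line
    Path(result_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return matches


# Hypothetical mini "dictionary"; a real run would read the prepared idiom file instead
idiom_text = "高高兴兴\n快乐至上\n欢欢喜喜\n"
result_file = Path(tempfile.gettempdir()) / "xxyy_idioms.txt"  # assumed output location
found = search_and_save(r"(\S)\1(\S)\2", idiom_text, result_file)
```

Building the output lines once and reusing them for both the screen and the file keeps the two outputs identical.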

Code:

    # Function:
    #   How to use Python regular expressions to search Chinese idioms by pattern
    #   请教怎样使用Python正则表达式,进行汉语成语模式搜索-CSDN论坛
    #   https://bbs.csdn.net/topics/396860414
    # Author: Crifan
    # Update: 20200619

    import re

    seperator = "-"

    # Sample idiom list, one idiom per line
    idiomStr = """高高兴兴
    快快乐乐
    快乐至上
    欢欢喜喜
    欢天喜地
    一心一意
    三心二意
    一泻千里
    三番五次
    一鼓作气
    以一敌万
    鸡鸣狗盗
    狐假虎威
    兔死狐悲
    狗急跳墙
    """

    def printSeperatorLine(curTitle):
        """Print a section title surrounded by separator characters."""
        print("%s %s %s" % (seperator*30, curTitle, seperator*30))

    def printEachMatchGroup(someIter):
        """Print the whole matched string of every match, numbered from 1."""
        for curIdx, eachMatch in enumerate(someIter):
            curNum = curIdx + 1
            eachMatchWholeStr = eachMatch.group(0)
            print("[%d] %s" % (curNum, eachMatchWholeStr))

    # "xxyy模式成语" = idioms of the xxyy pattern
    printSeperatorLine("xxyy模式成语")
    # Use a raw string: in a normal string "\1" is the control character chr(1),
    # not a backreference, so it would have to be written as "\\1".
    xxyyP = r"(\S)\1(\S)\2"
    # re.finditer yields match objects, so group(0) gives the whole idiom;
    # re.findall would only return the captured groups, re.search only the first match.
    foundAllXxyyIter = re.finditer(xxyyP, idiomStr)
    printEachMatchGroup(foundAllXxyyIter)

    # "数字模式成语" = idioms of the number pattern
    printSeperatorLine("数字模式成语")
    # refer:
    # 个,十,百,千,万……兆 后面是什么?-作业-慧海网 (what comes after 个, 十, 百, 千, 万 ... 兆?)
    # https://www.ajpsp.com/zuoye/4174539
    zhcnDigitList = [
        "一", "二", "三", "四", "五", "六", "七", "八", "九", "十",
        "百", "千", "万", "亿", "兆", "京", "垓", "秭", "穰", "沟",
        "涧", "正", "载",
    ]
    zhcnDigitListGroup = "|".join(zhcnDigitList)
    # number + any character + number + any character
    zhcnDigitP = r"(%s)\S(%s)\S" % (zhcnDigitListGroup, zhcnDigitListGroup)
    zhcnDigitIter = re.finditer(zhcnDigitP, idiomStr)
    printEachMatchGroup(zhcnDigitIter)

    # "动物模式成语" = idioms of the animal pattern
    printSeperatorLine("动物模式成语")
    animalList = [
        "鸡", "鸭", "猫", "狗", "猪", "兔", "狐", "狼", "虎", "豹", "狮",
        # TODO: add more common animals
    ]
    animalGroup = "|".join(animalList)
    # animal + any character + animal + any character
    animalP = r"(%s)\S(%s)\S" % (animalGroup, animalGroup)
    animalIter = re.finditer(animalP, idiomStr)
    printEachMatchGroup(animalIter)
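
Why `re.finditer` rather than `re.findall` for this task can be seen with a small comparison. The snippet below is an illustrative sketch on a three-idiom sample (the variable names are made up for the example): when the pattern contains capturing groups, `re.findall` returns only the groups, while each match object from `re.finditer` still gives access to the whole matched idiom and its position.

```python
import re

xxyy_p = r"(\S)\1(\S)\2"
sample = "高高兴兴\n快乐至上\n欢欢喜喜\n"

# findall: one tuple of captured groups per match, the full idiom is not returned
groups_only = re.findall(xxyy_p, sample)

# finditer: one match object per match, so group(0) and span() are available
whole_idioms = [m.group(0) for m in re.finditer(xxyy_p, sample)]
positions = [m.span() for m in re.finditer(xxyy_p, sample)]

print(groups_only)   # [('高', '兴'), ('欢', '喜')]
print(whole_idioms)  # ['高高兴兴', '欢欢喜喜']
print(positions)     # [(0, 4), (10, 14)]
```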

Output:

    ------------------------------ xxyy模式成语 ------------------------------
    [1] 高高兴兴
    [2] 快快乐乐
    [3] 欢欢喜喜
    ------------------------------ 数字模式成语 ------------------------------
    [1] 一心一意
    [2] 三心二意
    [3] 一泻千里
    [4] 三番五次
    ------------------------------ 动物模式成语 ------------------------------
    [1] 鸡鸣狗盗
    [2] 狐假虎威
    [3] 兔死狐悲
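
The patterns above are not anchored, so they can also match four characters in the middle of a longer line. If the dictionary file may contain lines that are not exactly one four-character idiom, a possible variant (an assumption about the input, not something the original code does) is to anchor the pattern to whole lines with `^`, `$` and `re.M`. The shortened digit set and the sample text below are made up for illustration.

```python
import re

# Shortened digit set for illustration; a character class replaces the "a|b|c" alternation
digits = "一二三四五六七八九十百千万"
sample = "一心一意\n三心二意\n以一敌万\n他三心二意地\n"

# Unanchored: also finds 三心二意 inside the longer last line
loose_p = re.compile(r"[%s]\S[%s]\S" % (digits, digits))
loose = [m.group(0) for m in loose_p.finditer(sample)]

# Anchored: with re.M, ^ and $ match at the start and end of every line,
# so only lines that consist of exactly the four-character pattern are accepted
strict_p = re.compile(r"^[%s]\S[%s]\S$" % (digits, digits), re.M)
strict = [m.group(0) for m in strict_p.finditer(sample)]

print(loose)   # ['一心一意', '三心二意', '三心二意']
print(strict)  # ['一心一意', '三心二意']
```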