PyReader

  • class paddle.fluid.io.PyReader(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False)

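Creates a reader object for feeding data from Python. A Python thread prefetches the data and pushes it into a queue asynchronously. When Executor.run(…) is called, data is pulled from the queue automatically.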
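
The prefetching behaviour described above can be pictured with a plain-Python sketch. It is not PaddlePaddle's implementation; it only assumes a background thread that fills a bounded queue whose size plays the role of capacity. The names prefetch, fake_batches and _END are hypothetical.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of the data


def prefetch(batch_generator, capacity):
    """Yield batches that a background thread loads ahead of the consumer."""
    buffer = queue.Queue(maxsize=capacity)  # holds at most `capacity` batches

    def producer():
        for batch in batch_generator():
            buffer.put(batch)  # blocks while the queue is full
        buffer.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until the producer has a batch ready
        if batch is _END:
            return
        yield batch


def fake_batches():
    # Stand-in data source: five batches of three integers each.
    for i in range(5):
        yield [i, i + 1, i + 2]


batches = list(prefetch(fake_batches, capacity=2))
```

A larger capacity lets a fast producer run further ahead of the consumer, which is the trade-off the capacity parameter controls.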

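  • Parameters:
    • feed_list (list(Variable)|tuple(Variable)) - The list of feed variables, created by fluid.layers.data().
    • capacity (int) - The capacity of the queue maintained inside the PyReader object, measured in batches. If the reader produces data quickly, a larger capacity is recommended.
    • use_double_buffer (bool) - Whether to use double_buffer_reader. If use_double_buffer=True, PyReader prefetches the next batch asynchronously. This speeds up data feeding at the cost of a small amount of extra CPU/GPU memory, namely the storage for one batch of input data.
    • iterable (bool) - Whether the created PyReader object is iterable.
    • return_list (bool) - Whether the data on each device is returned as a list. Only takes effect when iterable=True. If return_list=False, the data returned on each device is a str -> LoDTensor map whose keys are the names of the input variables. If return_list=True, the data returned on each device is a list(LoDTensor). return_list=False is recommended in static graph mode and return_list=True in dygraph (dynamic graph) mode.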

Returns: the created reader object

Return type: reader (Reader)
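
To make the two return_list layouts concrete, the sketch below mimics the shape of one device's data, with nested Python lists standing in for LoDTensor. It only illustrates the parameter description above; the helper pack_batch is hypothetical and not part of PaddlePaddle.

```python
def pack_batch(feed_names, tensors, return_list):
    """Arrange one device's batch the way the return_list flag describes."""
    if return_list:
        # List layout: values in the same order as the feed variables.
        return list(tensors)
    # Mapping layout: input variable name -> value.
    return dict(zip(feed_names, tensors))


feed_names = ["image", "label"]
tensors = [[[0.1, 0.2], [0.3, 0.4]], [[1], [0]]]  # stand-ins for LoDTensor

as_dict = pack_batch(feed_names, tensors, return_list=False)
as_list = pack_batch(feed_names, tensors, return_list=True)
```

The mapping form is keyed by variable name, which suits Executor.run(feed=…); the list form can be unpacked positionally, as in `for image, label in py_reader()`.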

Code examples

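1. If iterable=False, the created PyReader object is almost identical to fluid.layers.py_reader(). Operators are inserted into the program. The user should call start() before each epoch and catch the fluid.core.EOFException thrown by Executor.run() when the epoch ends. Once the exception is caught, the user should call reset() to reset the reader manually.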

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 5
    BATCH_SIZE = 3

    def network(image, label):
        # User-defined network; softmax regression is used here as an example.
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    def reader_creator_random_image_and_label(height, width):
        def reader():
            for i in range(ITER_NUM):
                fake_image = np.random.uniform(low=0,
                                               high=255,
                                               size=[height, width])
                fake_label = np.ones([1])
                yield fake_image, fake_label
        return reader

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    reader = fluid.io.PyReader(feed_list=[image, label],
                               capacity=4,
                               iterable=False)

    user_defined_reader = reader_creator_random_image_and_label(784, 784)
    reader.decorate_sample_list_generator(
        paddle.batch(user_defined_reader, batch_size=BATCH_SIZE))

    loss = network(image, label)
    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())
    for i in range(EPOCH_NUM):
        # Start the feeding thread at the beginning of each epoch.
        reader.start()
        while True:
            try:
                executor.run(feed=None)
            except fluid.core.EOFException:
                # The epoch is exhausted; reset before the next start().
                reader.reset()
                break

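2. If iterable=True, the created PyReader object is decoupled from the program. No operators are inserted into the program. In this case, the created reader is a Python generator and is iterable. The user should feed the data yielded by the PyReader object into Executor.run(feed=…).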

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 5
    BATCH_SIZE = 10

    def network(image, label):
        # User-defined network; softmax regression is used here as an example.
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    def reader_creator_random_image(height, width):
        def reader():
            for i in range(ITER_NUM):
                # No trailing comma here: fake_image must be an ndarray, not a tuple.
                fake_image = np.random.uniform(low=0, high=255, size=[height, width])
                fake_label = np.ones([1])
                yield fake_image, fake_label
        return reader

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    reader = fluid.io.PyReader(feed_list=[image, label], capacity=4, iterable=True, return_list=False)

    user_defined_reader = reader_creator_random_image(784, 784)
    reader.decorate_sample_list_generator(
        paddle.batch(user_defined_reader, batch_size=BATCH_SIZE),
        fluid.CPUPlace())
    loss = network(image, label)
    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())

    for _ in range(EPOCH_NUM):
        for data in reader():
            executor.run(feed=data, fetch_list=[loss])
3. If return_list=True, the returned values are presented as a list instead of a dict. This is typically used in dygraph mode.

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 5
    BATCH_SIZE = 10

    def reader_creator_random_image(height, width):
        def reader():
            for i in range(ITER_NUM):
                # randint excludes `high`, so labels fall in [0, 9]; it replaces
                # the deprecated np.random.random_integers(low=0, high=9).
                yield np.random.uniform(low=0, high=255, size=[height, width]), \
                    np.random.randint(low=0, high=10, size=[1])
        return reader

    place = fluid.CPUPlace()
    with fluid.dygraph.guard(place):
        py_reader = fluid.io.PyReader(capacity=2, return_list=True)
        user_defined_reader = reader_creator_random_image(784, 784)
        py_reader.decorate_sample_list_generator(
            paddle.batch(user_defined_reader, batch_size=BATCH_SIZE),
            place)
        for image, label in py_reader():
            relu = fluid.layers.relu(image)
  • start()

Starts the data feeding thread. Can only be called when the reader object is not iterable.

Code example

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    BATCH_SIZE = 10

    def generator():
        for i in range(5):
            # The trailing comma makes each sample a one-element tuple.
            yield np.random.uniform(low=0, high=255, size=[784, 784]),

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    reader = fluid.io.PyReader(feed_list=[image], capacity=4, iterable=False)
    reader.decorate_sample_list_generator(
        paddle.batch(generator, batch_size=BATCH_SIZE))

    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())
    for i in range(3):
        reader.start()
        while True:
            try:
                executor.run(feed=None)
            except fluid.core.EOFException:
                reader.reset()
                break
  • reset()

Resets the reader object after fluid.core.EOFException is thrown. Can only be called when the reader object is not iterable.

Code example

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    BATCH_SIZE = 10

    def generator():
        for i in range(5):
            # The trailing comma makes each sample a one-element tuple.
            yield np.random.uniform(low=0, high=255, size=[784, 784]),

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    reader = fluid.io.PyReader(feed_list=[image], capacity=4, iterable=False)
    reader.decorate_sample_list_generator(
        paddle.batch(generator, batch_size=BATCH_SIZE))

    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())
    for i in range(3):
        reader.start()
        while True:
            try:
                executor.run(feed=None)
            except fluid.core.EOFException:
                reader.reset()
                break
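
The start() / EOFException / reset() cycle used in the non-iterable examples can be reduced to a small state machine. The sketch below is a plain-Python mock, assuming only that reading past the last batch raises an end-of-data exception. MockReader, MockEOFException and read() are hypothetical stand-ins, not PaddlePaddle classes or methods.

```python
class MockEOFException(Exception):
    """Hypothetical stand-in for fluid.core.EOFException."""


class MockReader:
    """Hypothetical non-iterable reader: start(), read until EOF, reset()."""

    def __init__(self, batch_generator):
        self._batch_generator = batch_generator
        self._iterator = None

    def start(self):
        # Begin a new pass over the data source.
        self._iterator = iter(self._batch_generator())

    def read(self):
        # Plays the role of Executor.run() pulling one batch.
        try:
            return next(self._iterator)
        except StopIteration:
            raise MockEOFException()

    def reset(self):
        # Drop the exhausted iterator so start() can be called again.
        self._iterator = None


def three_batches():
    for i in range(3):
        yield [i] * 2


reader = MockReader(three_batches)
batches_per_epoch = []
for epoch in range(2):
    reader.start()
    count = 0
    while True:
        try:
            reader.read()
            count += 1
        except MockEOFException:
            reader.reset()
            break
    batches_per_epoch.append(count)
```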
  • decorate_sample_generator(sample_generator, batch_size, drop_last=True, places=None)

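Sets the data source of the PyReader object.

The provided sample_generator should be a Python generator that yields each sample as a list(numpy.ndarray).

places must be set when the PyReader object is iterable.

If none of the inputs has LoD, this method is faster than decorate_sample_list_generator(paddle.batch(sample_generator, …)).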

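  • Parameters:
    • sample_generator (generator) – A Python generator that yields data of type list(numpy.ndarray).
    • batch_size (int) – The batch size. Must be greater than 0.
    • drop_last (bool) – Whether to drop the last batch when it contains fewer than batch_size samples.
    • places (None|list(CUDAPlace)|list(CPUPlace)) – The list of places. Must be provided when the PyReader is iterable.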
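
The effect of batch_size and drop_last can be sketched in plain Python. The function batch_samples below is a hypothetical illustration of grouping samples into batches; it is not the code PyReader runs internally.

```python
def batch_samples(sample_generator, batch_size, drop_last=True):
    """Group samples from `sample_generator` into lists of `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")

    def batched():
        batch = []
        for sample in sample_generator():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        # A final, smaller batch is kept only when drop_last is False.
        if batch and not drop_last:
            yield batch

    return batched


def seven_samples():
    for i in range(7):
        yield i


dropped = list(batch_samples(seven_samples, batch_size=3, drop_last=True)())
kept = list(batch_samples(seven_samples, batch_size=3, drop_last=False)())
```

With seven samples and a batch size of three, drop_last=True discards the single leftover sample, while drop_last=False emits it as a third, shorter batch.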

Code example

    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 15
    BATCH_SIZE = 3

    def network(image, label):
        # User-defined network; softmax regression is used here as an example.
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    def random_image_and_label_generator(height, width):
        def generator():
            for i in range(ITER_NUM):
                fake_image = np.random.uniform(low=0,
                                               high=255,
                                               size=[height, width])
                fake_label = np.array([1])
                yield fake_image, fake_label
        return generator

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    reader = fluid.io.PyReader(feed_list=[image, label], capacity=4, iterable=True)

    user_defined_generator = random_image_and_label_generator(784, 784)
    # Batching is done by PyReader itself, so the generator yields single samples.
    reader.decorate_sample_generator(user_defined_generator,
                                     batch_size=BATCH_SIZE,
                                     places=[fluid.CPUPlace()])
    loss = network(image, label)
    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())

    for _ in range(EPOCH_NUM):
        for data in reader():
            executor.run(feed=data, fetch_list=[loss])
  • decorate_sample_list_generator(reader, places=None)

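Sets the data source of the PyReader object.

The provided reader should be a Python generator that yields batched data of type list(numpy.ndarray).

places must be set when the PyReader object is iterable.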

  • Parameters:
    • reader (generator) – A Python generator that yields batched data of type list(numpy.ndarray).
    • places (None|list(CUDAPlace)|list(CPUPlace)) – The list of places. Must be provided when the PyReader is iterable.

Code example

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 15
    BATCH_SIZE = 3

    def network(image, label):
        # User-defined network; softmax regression is used here as an example.
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    def random_image_and_label_generator(height, width):
        def generator():
            for i in range(ITER_NUM):
                fake_image = np.random.uniform(low=0,
                                               high=255,
                                               size=[height, width])
                fake_label = np.ones([1])
                yield fake_image, fake_label
        return generator

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    reader = fluid.io.PyReader(feed_list=[image, label], capacity=4, iterable=True)

    user_defined_generator = random_image_and_label_generator(784, 784)
    # paddle.batch groups single samples into lists of BATCH_SIZE samples.
    reader.decorate_sample_list_generator(
        paddle.batch(user_defined_generator, batch_size=BATCH_SIZE),
        fluid.CPUPlace())
    loss = network(image, label)
    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())

    for _ in range(EPOCH_NUM):
        for data in reader():
            executor.run(feed=data, fetch_list=[loss])
  • decorate_batch_generator(reader, places=None)

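Sets the data source of the PyReader object.

The provided reader should be a Python generator that yields batched data of type list(numpy.ndarray) or LoDTensor.

places must be set when the PyReader object is iterable.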

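  • Parameters:
    • reader (generator) – A Python generator that yields batched data of type numpy.ndarray or LoDTensor.
    • places (None|list(CUDAPlace)|list(CPUPlace)) – The list of places. Must be provided when the PyReader is iterable.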
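
The input expected here differs from the sample lists used by decorate_sample_list_generator mainly in layout: a sample list holds one tuple per sample, while a batch holds one array per input variable. The sketch below shows that regrouping, with nested Python lists standing in for numpy.ndarray. The helper to_batch is hypothetical and not a PaddlePaddle function.

```python
def to_batch(sample_list):
    """Regroup [(image, label), ...] into ([image, ...], [label, ...])."""
    # zip(*...) transposes the list: one tuple per field instead of per sample.
    return tuple(list(field) for field in zip(*sample_list))


# Three samples, each an (image, label) pair; images are 2-pixel rows here.
sample_list = [
    ([0.0, 1.0], [1]),
    ([2.0, 3.0], [0]),
    ([4.0, 5.0], [1]),
]

batch_image, batch_label = to_batch(sample_list)
```

With NumPy, the same regrouping would typically give one array of shape [batch_size, ...] per input variable, which is the form the generator in this method's code example yields directly.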

Code example

    import paddle.fluid as fluid
    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 15
    BATCH_SIZE = 3

    def network(image, label):
        # User-defined network; softmax regression is used here as an example.
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    def random_image_and_label_generator(height, width):
        def generator():
            for i in range(ITER_NUM):
                # Each yield is a whole batch: the leading dimension is BATCH_SIZE.
                batch_image = np.random.uniform(low=0,
                                                high=255,
                                                size=[BATCH_SIZE, height, width])
                batch_label = np.ones([BATCH_SIZE, 1])
                batch_image = batch_image.astype('float32')
                batch_label = batch_label.astype('int64')
                yield batch_image, batch_label
        return generator

    image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    reader = fluid.io.PyReader(feed_list=[image, label], capacity=4, iterable=True)

    user_defined_generator = random_image_and_label_generator(784, 784)
    reader.decorate_batch_generator(user_defined_generator, fluid.CPUPlace())

    loss = network(image, label)
    executor = fluid.Executor(fluid.CPUPlace())
    executor.run(fluid.default_startup_program())

    for _ in range(EPOCH_NUM):
        for data in reader():
            executor.run(feed=data, fetch_list=[loss])
  • next()

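Gets the next item of data. Users should not call this method directly. It is used internally by the PaddlePaddle framework to implement the Python 2.x iterator protocol.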
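
For context, Python 2 iterators expose next() while Python 3 iterators expose __next__(). A class that supports both usually defines one and aliases the other, as in this minimal sketch. CountingReader is a hypothetical class and does not reflect PaddlePaddle's internals.

```python
class CountingReader:
    """Hypothetical iterator that works under both iterator protocols."""

    def __init__(self, num_batches):
        self._num_batches = num_batches
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 protocol
        if self._index >= self._num_batches:
            raise StopIteration
        self._index += 1
        return self._index - 1

    next = __next__  # Python 2 protocol: same method under the old name


values = list(CountingReader(3))
first = CountingReader(3).next()
```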