5. Selecting scalars quickly

    # Assign the row label to a variable, then select with .loc
    In[37]: college = pd.read_csv('data/college.csv', index_col='INSTNM')
            cn = 'Texas A & M University-College Station'
            college.loc[cn, 'UGDS_WHITE']
    Out[37]: 0.66099999999999992
    # .at does the same thing
    In[38]: college.at[cn, 'UGDS_WHITE']
    Out[38]: 0.66099999999999992
    # Compare their speed with the %timeit magic command
    In[39]: %timeit college.loc[cn, 'UGDS_WHITE']
    Out[39]: 9.93 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In[40]: %timeit college.at[cn, 'UGDS_WHITE']
    Out[40]: 6.69 µs ± 223 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

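.iat and .at accept only scalar values. They are designed to replace .iloc and .loc when selecting a single scalar. In the timings shown here, .at is roughly 3 microseconds faster per lookup than .loc.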
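The comparison can be reproduced without IPython or the college dataset. The following is a minimal sketch, assuming a synthetic DataFrame (the labels `school_0` ... `school_999` and columns `a`, `b`, `c` are made up for illustration) and the standard-library `timeit` module in place of the `%timeit` magic. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the college data: 1,000 labeled rows, 3 columns.
rng = np.random.default_rng(0)
labels = [f"school_{i}" for i in range(1000)]
df = pd.DataFrame(rng.random((1000, 3)), index=labels, columns=["a", "b", "c"])

row, col = "school_500", "b"

# Both accessors return the same scalar for a single row/column label pair.
value_loc = df.loc[row, col]
value_at = df.at[row, col]
assert value_loc == value_at

# Time each accessor and report the average cost of one lookup.
n = 10_000
t_loc = timeit.timeit(lambda: df.loc[row, col], number=n) / n
t_at = timeit.timeit(lambda: df.at[row, col], number=n) / n
print(f".loc: {t_loc * 1e6:.2f} µs per lookup")
print(f".at:  {t_at * 1e6:.2f} µs per lookup")
```

A difference of a few microseconds only matters when scalar lookups run many times, for example inside a loop.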

    # Use get_loc to find the integer positions, then compare speed again
    In[41]: row_num = college.index.get_loc(cn)
            col_num = college.columns.get_loc('UGDS_WHITE')
    In[42]: row_num, col_num
    Out[42]: (3765, 10)
    In[43]: %timeit college.iloc[row_num, col_num]
    Out[43]: 11.1 µs ± 426 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In[44]: %timeit college.iat[row_num, col_num]
    Out[44]: 7.47 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In[45]: %timeit college.iloc[5, col_num]
    Out[45]: 10.8 µs ± 467 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In[46]: %timeit college.iat[5, col_num]
    Out[46]: 7.12 µs ± 297 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
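The label-to-position step can be checked on a small frame. This is a sketch under stated assumptions: the frame, its labels (`School A` and so on), and its values are hypothetical. It shows that `get_loc` on the index and on the columns yields integer positions that `.iat` can use to reach the same cell that `.at` reaches by label.

```python
import pandas as pd

# Hypothetical toy frame; labels and values are made up for illustration.
df = pd.DataFrame(
    {"state": ["TX", "CA", "IL"], "white": [0.661, 0.364, 0.512]},
    index=["School A", "School B", "School C"],
)

# get_loc returns a plain integer only when the label is unique in the index;
# with duplicate labels it returns a slice or a boolean mask instead.
assert df.index.is_unique

row_num = df.index.get_loc("School B")
col_num = df.columns.get_loc("white")

# Label-based and position-based scalar access reach the same cell.
by_label = df.at["School B", "white"]
by_position = df.iat[row_num, col_num]
print(row_num, col_num, by_label, by_position)
```

Checking `is_unique` first is a defensive choice: `.iat` needs integer positions, so a non-integer result from `get_loc` would make the later lookup fail.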

There's more

    # Series objects can also use .iat and .at to select scalars
    In[47]: state = college['STABBR']
    In[48]: state.iat[1000]
    Out[48]: 'IL'
    In[49]: state.at['Stanford University']
    Out[49]: 'CA'