Build a simple stock analysis Python App with MatrixOne

This tutorial shows you how build a simple Python application with MatrixOne.

About the demo

This demo will store the historical stock data of the China stock market and make a simple analysis to find the best stocks to buy.

The basic idea is that we track the P/E (Price-to-Earnings), P/B (Price-to-Book) level of each stock, which can represent a company’s market capitalization. If you are not familiar with these two concepts, please refer to P/E and P/B.

Every day after the market closes, we compare the P/E and P/B value with the historical lowest P/E and P/B. If the current P/E or P/B is even lower than the historical lowest, the stock is very likely under-estimated. We can further investigate its reason of falling and judge if it’s a good time to buy it.

Before you start

Before you start, you need to have a Tushare account and get an API token. If you haven’t signed up for Tushare, you can sign it up at https://tushare.pro/ and you can get your API token at https://tushare.pro/user/token#.

Info

Tushare is an open source utility for crawling historical data of China stocks. It has a complete dataset, but pay attention to their rules of API access limit and frequency.

Besides the data source account, you need to at least have these two basic software installed:

  • Python 3.x
  • MatrixOne

You may refer to Python 3 installation tutorial and MatrixOne installation for more details.

Moreover, we need to install dependant Tushare and pymysql python libraries to use Tushare and access MatrixOne.

  1. pip3 install -r requirements.txt

Info

Pymysql is the only ORM tool supported by MatrixOne. The other python MySQL ORM tools as SQLAlchemy, mysql-connector, MySQLdb are not supported yet.

Step1: prepare and load historical dataset

At first, we need to load the historical stock data in MatrixOne.

As Tushare interface only allows a fetch of 5000 rows of data at once, we just collect the latest 5000 trade days data for each stock. Each year there are roughly 250 trade dates. 5000 makes almost 20 years of data, which largely satisfies our demo.

The below code will set up tushare interface and get the list of stocks.

  1. import tushare as ts
  2. import time
  3. # Set Tushare data source
  4. ts.set_token('YOUR_TUSHARE_API_TOKEN')
  5. pro = ts.pro_api()
  6. # Get the list of stocks
  7. pool = pro.stock_basic(exchange = '',
  8. list_status = 'L',
  9. adj = 'qfq',
  10. fields = 'ts_code,symbol,name,area,industry,fullname,list_date, market,exchange,is_hs')

As we only need the P/E and P/B information, we call the daily_basic method and get each stock with data frames with ts_code, trade_date, pe and pb fields. Without any specification of start date and end date, Tushare will automatically output 5000 lastest records.

  1. j = 1
  2. for i in pool.ts_code:
  3. print('Getting %d stock,Stock Code %s.' % (j, i))
  4. #The interface is limited to be queried 200 times/minute, some little delays are necessary
  5. time.sleep(0.301)
  6. j+=1
  7. #Get stock P/E and P/B data frames
  8. df = pro.daily_basic(**{
  9. "ts_code": i,
  10. "trade_date": "",
  11. "start_date": "",
  12. "end_date": "",
  13. "limit": "",
  14. "offset": ""
  15. }, fields=[
  16. "ts_code",
  17. "trade_date",
  18. "pe",
  19. "pb"
  20. ])

With the data ready, we need to load them into MatrixOne. We use pymysql as the python-MatrixOne connector to run SQLs in MatrixOne. Prior to the below code, the MatrixOne server must be launched and a database called stock needs to be created first.

MatrixOne:

  1. mysql> CREATE DATABASE stock.

Python:

  1. import pymysql
  2. # Open a MatrixOne connection
  3. db = pymysql.connect(host='127.0.0.1',
  4. port=6001,
  5. user='dump',
  6. password='111',
  7. database='stock')
  8. # Create a cursor object
  9. cursor = db.cursor()
  10. # Create PE table
  11. cursor.execute('CREATE TABLE IF NOT EXISTS pe(ts_code VARCHAR(255), trade_date VARCHAR(255), pe FLOAT, pb FLOAT)')

Load each 5000 records of data into MatrixOne:

  1. if df.empty == False:
  2. # Fill P/E and P/B values which are NaN
  3. df = df.fillna(0.0)
  4. val_to_insert = df.values.tolist()
  5. cursor.executemany(" insert into pe (ts_code, trade_date,pe,pb) values (%s, %s,%s,%s)", val_to_insert)

Step2: find the historical lowest P/E or P/B stock

After we load the historical stock data into MatrixOne, we have about 11 millions records.

  1. mysql> select count(*) from pe;
  2. +----------+
  3. | count(*) |
  4. +----------+
  5. | 11233508 |
  6. +----------+
  7. 1 row in set (0.16 sec)

Now we run a sql to calculate the lowest P/E and P/B of each stock.

  1. # Find stocks that the current P/E is even lower than the historical lowest
  2. cursor.execute('select ts_code,min(pe) from pe where pe>0 group by ts_code order by ts_code')
  3. # Fetch the result as python object
  4. value = cursor.fetchall()
  5. # Find stocks that the current P/B is even lower than the historical lowest
  6. cursor.execute('select ts_code,min(pb) from pe where pb>0 group by ts_code order by ts_code')
  7. # Fetch the result as python object
  8. value = cursor.fetchall()

Then we call Tushare daily_basic interface again to get the current P/E and P/B level and make a search to locate the stock whose current P/E or P/B is even lower than the historical lowest. You could of course switch any trade date you want for comparison. We take P/E as an example.

  1. df = pro.daily_basic(**{
  2. "ts_code": "",
  3. "trade_date": sys.argv[1],
  4. "start_date": "",
  5. "end_date": "",
  6. "limit": "",
  7. "offset": ""
  8. }, fields=[
  9. "ts_code",
  10. "pe"
  11. ])
  12. df = df.fillna(0.0)
  13. for i in range(0,len(value)):
  14. ts_code, min_pe = value[i]
  15. for j in range(0, len(df.ts_code)):
  16. if ts_code == df.ts_code[j] and min_pe > df.pe[j] > 0:
  17. logging.getLogger().info("ts_code: %s", ts_code)
  18. logging.getLogger().info("history lowest PE : %f", min_pe)
  19. logging.getLogger().info("current PE found: %f", df.pe[j])

This will print every stock with the lowest ever P/E or P/B. They are usually good choices to buy if they are not experiencing big trouble.

Step3: Update our dataset

We already have a lot of historical data stored in MatrixOne, each time we run a n analysis, we usually have been through a new trading day. The dataset need to be updated with the latest data, otherwise next time you will still compare yourself with the old data.

We don’t need to insert all data again, we just insert the data with the lastest trading dates.

  1. # get updating trading dates by user argument inputs
  2. df = pro.daily_basic(**{
  3. "ts_code": i,
  4. "trade_date": "",
  5. "start_date": self.startDate,,
  6. "end_date": self.endDate,
  7. "limit": "",
  8. "offset": ""
  9. }, fields=[
  10. "ts_code",
  11. "trade_date",
  12. "pe",
  13. "pb"
  14. ])
  15. # append update to MatrixOne
  16. if df.empty == False:
  17. df = df.fillna(0.0)
  18. val_to_insert = df.values.tolist()
  19. print(val_to_insert)
  20. cursor.executemany(" insert into pe (ts_code, trade_date,pe,pb) values (%s, %s,%s,%s)", val_to_insert)

Source Code

You can find the source code of this demo at matrix_one_app.