# Deep Learning Trading Strategy from the Beginning to Production. Part II.

Hello everybody, my name is Denis, and here is the second part of our interesting journey of creating a trading system. In the previous video we defined our goals, and today we will prepare and label our dataset. Let's start.

Here I have defined all my functions. First of all, we need to load our data. Today I will use a small dataset, because the code is not optimized and some functions may take a really long time to run. As you can see, we have tick data here: Date, Time, Price, Volume, Bid and Ask. I will use only one year of data.

Let's check some statistics, just to get an idea of what kind of data we have. Here you can see our minimum and maximum price. It is around \$100, which is good, because it will be easy to calculate our profit in percent.
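For readers who want to reproduce this step, here is a minimal pandas sketch of loading the ticks and checking basic statistics. It assumes a CSV file with the columns listed above (Date, Time, Price, Volume, Bid, Ask); the sample rows, the date format and the year 2019 are hypothetical, and an in-memory string stands in for the real file. `load_ticks` is an illustrative helper, not the function used in the video.

```python
import io

import pandas as pd

# Hypothetical tick data in the column layout described above.
CSV = """Date,Time,Price,Volume,Bid,Ask
2018-12-31,15:59:59,99.80,200,99.79,99.81
2019-01-02,09:30:01,100.10,300,100.09,100.11
2019-01-02,09:30:02,100.12,150,100.11,100.13
2019-06-14,11:00:00,101.50,500,101.49,101.51
"""


def load_ticks(source) -> pd.DataFrame:
    """Read ticks and build one datetime index from the Date and Time columns."""
    df = pd.read_csv(source)
    df.index = pd.to_datetime(df["Date"] + " " + df["Time"])
    return df.drop(columns=["Date", "Time"]).sort_index()


ticks = load_ticks(io.StringIO(CSV))
ticks = ticks.loc["2019"]  # keep one year of data only
print(ticks["Price"].describe())  # count, mean, std, min, max, ...
```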
Also, I added one very important parameter to the dataset: the size of the spread, Spread = Ask - Bid. Why is the spread size important? A simple example: if the bid equals \$100 and the ask equals \$101, our spread equals \$1. If we buy and immediately sell our stock, we will always lose the spread, which is \$1 in this example.

I have calculated the mean spread over all days for every second of the trading day. As you can see, the spread is normally 1 or 2 cents, but sometimes it is a little bigger. We can decide to trade only when the spread is less than 3 cents. On the graph we can see that the biggest spread occurs in the first few minutes after the market opens. For this reason, it makes sense to skip the first 10-15 minutes after the open when implementing an intraday strategy.
Now, after we have set the time period, we can try to generate labels for our model, which should predict the direction of the price move. How will we generate the labels? As I said in the previous video, we don't have any triggers for opening a position. Because of that, we need to generate a label for each bar based on the expected return. In the last video we decided to use the window method and generate labels when the price crosses the window barriers:

- 1: the price goes up and crosses the top window barrier;
- -1: the price goes down and crosses the bottom window barrier;
- 0: the price stays inside the window.

The length of our window will be equal to n bars, and the upper and lower barriers will be set according to our expected return in percent. We will slide this window over all bars, one bar at a time. If the price leaves the window through one of the barriers, the first bar in the window gets the corresponding label. I hope you got my idea.
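A minimal sketch of this sliding-window labeling on a series of bar close prices. The function name `window_labels`, measuring returns against the close of the window's first bar, and leaving the last n bars (which have no complete window) unlabeled are assumptions of this illustration, not necessarily the exact rules used in the video.

```python
import numpy as np


def window_labels(close: np.ndarray, n: int, expected_return: float) -> np.ndarray:
    """Label each bar 1, -1 or 0 depending on which barrier is crossed first.

    For bar i the window covers the next n bars. The top barrier is
    close[i] * (1 + expected_return), the bottom one close[i] * (1 - expected_return).
    Bars without a complete window are left as np.nan.
    """
    labels = np.full(len(close), np.nan)
    for i in range(len(close) - n):
        # Returns of the following n bars relative to the window's first bar.
        rets = close[i + 1 : i + n + 1] / close[i] - 1.0
        up = np.flatnonzero(rets >= expected_return)
        down = np.flatnonzero(rets <= -expected_return)
        first_up = up[0] if up.size else n
        first_down = down[0] if down.size else n
        if first_up < first_down:
            labels[i] = 1
        elif first_down < first_up:
            labels[i] = -1
        else:
            labels[i] = 0  # neither barrier crossed inside the window
    return labels


close = np.array([100.0, 100.1, 100.4, 100.2, 99.8, 99.6, 99.9])
print(window_labels(close, n=3, expected_return=0.003))
```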
Now, before generating the labels, let us define the window parameters. The idea is pretty simple, but selecting the best window size and barrier levels is a challenge. That's where I have actually been stuck for some time, and I don't think I have found a good solution yet. To approach this task, I am going to calculate the historical volatility within each day and also over the whole period of our data. For example, this is the price change on each tick during one day, and here is the volatility of this day. Let's check the volatility over the whole period of our data. As you can see, the mean volatility is only 0.003 (0.3%), which is around 30 cents at the current price of about \$100.
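One way to compute these numbers, sketched under assumptions: here volatility means the standard deviation of tick-to-tick percentage price changes, computed separately for each trading day, and the whole-period figure is the mean of the daily values. The video does not show the exact formula, and the random-walk prices below are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic ticks for three trading days (hypothetical data).
idx = pd.DatetimeIndex(np.concatenate([
    pd.date_range(f"2019-01-0{d} 09:45", periods=500, freq="30s").values
    for d in (2, 3, 4)
]))
price = 100 * np.exp(np.cumsum(rng.normal(0, 1e-4, len(idx))))
ticks = pd.DataFrame({"Price": price}, index=idx)


def daily_volatility(prices: pd.Series) -> pd.Series:
    """Std of tick-to-tick percentage changes, computed separately for each day."""
    day = prices.index.date
    prev = prices.groupby(day).shift(1)  # previous tick of the same day (NaN at the first tick)
    returns = prices / prev - 1.0
    return returns.groupby(day).std()  # NaNs are skipped


vol_per_day = daily_volatility(ticks["Price"])
mean_vol = vol_per_day.mean()  # volatility over the whole period
print(vol_per_day)
print(f"mean volatility: {mean_vol:.6f}, about {mean_vol * ticks['Price'].mean():.4f} in price units")
```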
Before we define our window, let's generate volume bars from our tick data. It is easier to define the window length in bars than in ticks, because each bar contains a similar amount of volume, which gives us more stable conditions. To generate a volume bar, we go through the ticks and accumulate their volume. For a target volume of 1000, we keep adding the volume of each tick until the accumulated volume is greater than or equal to 1000. All these ticks form one bar. Let's generate bars for one day. You can see that we got 179 bars on the selected day, and this is how the price graph looks now. We can also calculate the percentage change of each bar and the daily volatility, using the close price.
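A minimal sketch of the volume-bar construction described above, assuming each bar records its open, high, low and close prices, its total volume and the timestamp of its last tick. The function name `volume_bars`, the bar columns and dropping the unfinished final bar are illustrative choices, not the code from the video.

```python
import numpy as np
import pandas as pd


def volume_bars(ticks: pd.DataFrame, target_volume: int = 1000) -> pd.DataFrame:
    """Group consecutive ticks into bars holding at least `target_volume` shares."""
    bar_ids = np.empty(len(ticks), dtype=int)
    bar, acc = 0, 0
    for i, vol in enumerate(ticks["Volume"].to_numpy()):
        bar_ids[i] = bar
        acc += vol
        if acc >= target_volume:  # this tick completes the bar; start a new one
            bar, acc = bar + 1, 0
    complete = bar_ids < bar  # the trailing, unfinished bar is dropped
    t = ticks.loc[complete].rename_axis("time").reset_index()
    t["bar"] = bar_ids[complete]
    return t.groupby("bar").agg(
        time=("time", "last"),
        open=("Price", "first"),
        high=("Price", "max"),
        low=("Price", "min"),
        close=("Price", "last"),
        volume=("Volume", "sum"),
    )


ticks = pd.DataFrame(
    {"Price": [100.0, 100.1, 100.2, 100.1, 100.3, 100.4],
     "Volume": [400, 700, 300, 500, 900, 100]},
    index=pd.date_range("2019-01-02 09:45", periods=6, freq="s"),
)
bars = volume_bars(ticks, target_volume=1000)
bars["pct_change"] = bars["close"].pct_change()  # percentage change of each bar
print(bars)
```

Because a bar closes on the tick that pushes the total past the target, bar volumes are at least 1000 rather than exactly 1000; splitting the crossing tick between two bars would be an alternative design.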
Well, but I don't want to use a window equal to the whole day. To select the window length, I generated 100 windows of random size and checked the mean volatility in each window. This graph shows the mean volatility for windows of different lengths. We can see that if we select, for example, a window of 50 bars, we can expect a volatility of around 0.001 (0.1%). This value could be useful for defining our minimum expected return and for calculating the size of our stop loss. Let's calculate the volume bars and random windows for the whole dataset. Here we can see our window volatility over the whole period.
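The random-window experiment could be sketched like this. It assumes that the volatility of a window is the standard deviation of the cumulative returns of its bars measured from the window's first close, and that both the start and the length of each of the 100 windows are drawn uniformly at random. The function name, the length range, the seed and the synthetic closes are hypothetical.

```python
import numpy as np
import pandas as pd


def random_window_volatility(close: np.ndarray, n_windows: int = 100,
                             min_len: int = 10, max_len: int = 200,
                             seed: int = 42) -> pd.Series:
    """Mean volatility of randomly placed windows, grouped by window length."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_windows):
        length = int(rng.integers(min_len, max_len + 1))
        start = int(rng.integers(0, len(close) - length))
        window = close[start : start + length + 1]
        cum_returns = window[1:] / window[0] - 1.0  # moves relative to the first bar
        rows.append((length, cum_returns.std()))
    df = pd.DataFrame(rows, columns=["length", "volatility"])
    return df.groupby("length")["volatility"].mean()


rng = np.random.default_rng(1)
close = 100 * np.exp(np.cumsum(rng.normal(0, 1e-3, 5000)))  # synthetic bar closes
vol_by_length = random_window_volatility(close)
print(vol_by_length.head())
```

Measuring moves from the window's first close mirrors how the barriers are placed, so a value read off this curve can be compared directly with a candidate expected return.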
Now we will generate the labels. I will define my window with a size of 50 bars, and my expected return will be 0.003 (0.3%), which is around 30 cents at the mean price. After the labeling process is finished, we get a lot of similar labels, so-called crossing labels. We don't want many labels for the same event, so we keep only the label with the shortest distance between the first bar of the window and the bar where the price crosses the window barrier. Let's check how many labels we have. Well, we have around 700 labels, and they are distributed almost equally.
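A sketch of the de-duplication step, under one possible reading of it: windows starting at neighboring bars often cross a barrier at the same bar, so for every crossing bar we keep only the window whose start is closest to it. The helpers `first_touch` and `dedup_crossing_labels` are illustrative assumptions; windows where the price stays inside the barriers (label 0) have no crossing bar and are left out of this sketch. The 50-bar window and the 0.003 return match the values above, and the random-walk prices are synthetic.

```python
import numpy as np
import pandas as pd


def first_touch(close: np.ndarray, n: int, r: float) -> pd.DataFrame:
    """For every window start, find the first barrier crossing and its label."""
    rows = []
    for i in range(len(close) - n):
        rets = close[i + 1 : i + n + 1] / close[i] - 1.0
        hit = np.flatnonzero(np.abs(rets) >= r)
        if hit.size:  # windows where the price stays inside are skipped
            j = hit[0]
            rows.append((i, i + 1 + j, 1 if rets[j] > 0 else -1))
    return pd.DataFrame(rows, columns=["start", "touch", "label"])


def dedup_crossing_labels(events: pd.DataFrame) -> pd.DataFrame:
    """Keep one label per crossing bar: the window whose start is closest to it."""
    events = events.assign(distance=events["touch"] - events["start"])
    keep = events.groupby("touch")["distance"].idxmin()
    return events.loc[keep].sort_values("start").reset_index(drop=True)


rng = np.random.default_rng(7)
close = 100 * np.exp(np.cumsum(rng.normal(0, 1e-3, 2000)))  # synthetic bar closes
events = first_touch(close, n=50, r=0.003)
labels = dedup_crossing_labels(events)
print(len(events), len(labels))
print(labels["label"].value_counts())
```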
OK, now I will save our dataset. Actually, we will have two files: one will contain the volume bar dataset, and the other will contain the tick information of each bar. Who knows, maybe we will need this information in our model.
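A possible way to write the two files with pandas, assuming each tick already carries the id of the volume bar it belongs to. The file names, the CSV format, the `bar` column and the tiny example frames are hypothetical choices for this sketch.

```python
import tempfile
from pathlib import Path

import pandas as pd

bars = pd.DataFrame({"close": [100.1, 100.3], "volume": [1100, 1700], "label": [1, -1]})
ticks = pd.DataFrame({"Price": [100.0, 100.1, 100.2, 100.1, 100.3],
                      "Volume": [400, 700, 300, 500, 900],
                      "bar": [0, 0, 1, 1, 1]})  # id of the volume bar each tick belongs to

out_dir = Path(tempfile.mkdtemp())
bars.to_csv(out_dir / "volume_bars.csv", index_label="bar")  # file 1: one row per bar
ticks.to_csv(out_dir / "bar_ticks.csv", index=False)         # file 2: ticks with their bar id

# The ticks of any bar can later be recovered through the shared bar id.
restored = pd.read_csv(out_dir / "bar_ticks.csv")
print(restored.groupby("bar")["Volume"].sum())
```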
Well, let me stop at this point. I think it is enough for today. If you want to learn more about data labeling, you may check Chapters 3 and 4 of Marcos López de Prado's book *Advances in Financial Machine Learning*. They are really informative. Our next step will be feature engineering and running everything with a TFX pipeline. I hope I will have time to create a new episode soon. See you in the next video. Bye.

### Loaii abdalslam

December 11, 2019

I'm waiting for you, man <3!

### People_are_Awesome

December 11, 2019

Waiting :) Keep up the good work
