Hello everybody, my name is Denis, and here is the second part

of our interesting journey of creating a trading system. In the previous video we defined our goals,

and today we will prepare and label our dataset. Let’s start. Here I defined all my functions. First of all we need to load our data. Today, I will use a small dataset, because

the code is not optimized and some functions may take a really long time to run. As you can see, we have tick data here: Date, Time, Price, Volume, Bid, and Ask. I will use only the data from one year here. Let’s check some statistics, just to have

an idea of what kind of data we have. You can see here our minimum and maximum price. It’s around $100, which is good, because it

will be easy to calculate our profit as a percentage. Also, I added one very important parameter

to the dataset: the size of the spread, where Spread = Ask – Bid. Why is the spread size important? A simple example: if we have a bid equal to 100 and an ask equal

to 101, our spread equals $1. And if we buy and immediately sell our stock,

we will always lose the spread, which is equal to $1 in this example. I have calculated the mean spread over all

days for every second. As you can see, normally we have a spread

that equals 1 or 2 cents, but sometimes the spread is a little bigger. We can decide to trade only if our

spread is less than 3 cents. On the graph we can see that the biggest spread

occurs in the first few minutes after the market opens. For this reason it makes sense to skip the

first 10-15 minutes after the market opens when implementing an intraday strategy. Now, after we have set the time period, we

can try to generate labels for our model, which should predict the direction of the price move. How will we generate the labels? As I said in the previous video, we don’t

have any triggers for opening a position. Because of that we need to generate labels

for each bar based on expected return. In the last video we defined that we want

to use the window method and generate labels when prices cross the window barriers:

1 – the price goes up and crosses the top window barrier

-1 – the price goes down and crosses the bottom window barrier

0 – the price stays in the window
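A minimal sketch of this labeling rule in Python, not the notebook code. The function name `window_labels`, the `close` list of bar close prices, and the choice to check the `n_bars` bars that follow each bar are assumptions made for illustration; `expected_return` is a fraction here, so 0.003 means 0.3%.

```python
from typing import List, Sequence


def window_labels(close: Sequence[float], n_bars: int, expected_return: float) -> List[int]:
    """Label each bar with 1, -1 or 0 using a sliding window.

    close           -- bar close prices (hypothetical input)
    n_bars          -- window length in bars
    expected_return -- barrier distance as a fraction (0.003 means 0.3%)
    """
    labels = []
    for i, start_price in enumerate(close):
        upper = start_price * (1 + expected_return)  # top window barrier
        lower = start_price * (1 - expected_return)  # bottom window barrier
        label = 0  # default: the price stays inside the window
        # Check the bars that follow bar i, at most n_bars of them.
        for price in close[i + 1 : i + 1 + n_bars]:
            if price >= upper:
                label = 1
                break
            if price <= lower:
                label = -1
                break
        labels.append(label)  # the first bar of the window gets the label
    return labels


# Made-up prices, only to show the call.
prices = [100.0, 100.1, 100.4, 100.2, 99.8, 99.6, 99.7]
print(window_labels(prices, n_bars=3, expected_return=0.003))
# prints [1, 0, -1, -1, 0, 0, 0]
```

With these made-up prices the first bar gets 1 because the price reaches the top barrier two bars later, and the last bars get 0 because nothing crosses a barrier before the data ends.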

The length of our window will be equal to n bars, and the higher and the lower window

barriers will be set up according to our expected return in percentage. We will slide this window over all bars, with

one step equal to one bar. If the price goes out of the window, the first

bar in the window will get the label. I hope you got my idea. Now, before generating the labels, let us

define the window parameters. The idea is pretty simple, but selecting the

best window size and barrier levels is a challenge. That’s where I’ve actually been stuck

for some time, and I don’t think that I have found a good solution for it. To solve this task, I am going to calculate

historical volatility over the day and also historical volatility over the whole period

for our data. For example, this is the price change on each

tick in one day. And here is the volatility of this day. Let’s check the volatility over the whole

period of our data. As you can see, the mean volatility is only

0.003 (that is, 0.3%), which is equal to around 30 cents at the current

price. Before we define our window, let’s generate

volume bars from our tick data. It will be easy to calculate the length of

the window using bars instead of ticks, because we will have a similar amount of volume in

one bar and it will provide us with stable conditions. To generate a volume bar, we will go through

the ticks and accumulate their volume. For a target volume of 1000, we will

add up the volume of each tick until the total is greater than or equal to 1000. All these ticks will represent one bar. Let’s generate bars for one day. You can

see that we got 179 bars for the selected day. So, that is how the price graph looks now. We can also calculate the percentage change in

each bar and the daily volatility, using the close price. However, I don’t want to use a window equal

to the whole day. To select the window length, I generated

100 windows of random size and checked the mean volatility in each window. This graph shows the mean volatility in

windows of different lengths. We can see that if we select, for example, the size of the

window equal to 50 bars, we can expect volatility of around 0.001 (that is, 0.1%). This value could be useful to define our minimum

expected return and calculate the size of our stop loss price. Let’s calculate the volume bars and random

windows for the whole dataset. Here we can see our window volatility over

the whole period. Now we will generate the labels. I will

define my window with a size of 50 bars, and my expected return will be 0.003 (that is, 0.3%), which

is around 30 cents at the mean price. After the labeling process is finished, we

will get a lot of similar labels, so-called crossing labels. We don’t want many labels that describe the same crossing, so

in our case we will keep only the labels with the shortest distance

between the first bar of the window and the bar where the price crosses the window barrier. Let’s check how many labels we have. Well, we have around 700 labels and they are distributed

almost equally. OK, now I will save our dataset. Actually, we will have two files. One will contain the volume bar dataset, and the other

file will contain the tick information for each bar; who knows, maybe we will need to use this

information in our model. Well, let me stop at this point. I think it is enough for today and if you

want more information about data labeling, you may check Chapters

3 and 4 of the book by Marcos Lopez de Prado. They are really informative. Our next step will be the feature engineering

and running everything with the TFX pipeline. I hope I will have time to create a new episode

soon. See you in the next video. Bye.
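For reference, the volume bars built earlier in this video can be sketched like this. It is a simplified illustration, not the notebook code: the `volume_bars` name, the `(price, volume)` tuple layout, the open/high/low/close fields, and dropping the leftover ticks at the end are all assumptions.

```python
from typing import Dict, List, Tuple


def volume_bars(ticks: List[Tuple[float, int]], target_volume: int = 1000) -> List[Dict[str, float]]:
    """Group (price, volume) ticks into bars holding at least target_volume."""
    bars = []
    current: List[Tuple[float, int]] = []  # ticks of the bar being built
    accumulated = 0  # volume collected so far for this bar
    for price, volume in ticks:
        current.append((price, volume))
        accumulated += volume
        # Close the bar once the total is greater than or equal to the target.
        if accumulated >= target_volume:
            prices = [p for p, _ in current]
            bars.append({
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": accumulated,
            })
            current, accumulated = [], 0
    # Assumption: ticks left over after the last full bar are dropped.
    return bars


# Made-up ticks, only to show the call.
ticks = [(100.00, 400), (100.02, 300), (100.01, 500), (100.05, 1000), (100.03, 200)]
for bar in volume_bars(ticks, target_volume=1000):
    print(bar)
```

Here the first three ticks form one bar with a volume of 1200, the fourth tick forms a bar on its own, and the last tick is dropped because it never reaches the target.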

## Loaii abdalslam

December 11, 2019

I'm waiting for you, man <3!

## People_are_Awesome

December 11, 2019

Waiting :) Keep it up

## CloseToAlgoTrading

December 11, 2019

I have added a link to the notebook in the description.

## Regele IONESCU

December 13, 2019

Hi! Great video but very poor sound. Try recording the sound in a closet or under a blanket to reduce the annoying echo. Keep up the good work!