When we define a statistical model focused on a dependent variable, we attempt to explain the relationship between that dependent variable and other independent variable(s).
To compute these models, we need to vectorize our variables and align them so that every column is a (discretized) variable and every row is an observation of those variables at a given point on a common axis. This is mandatory for applications (such as machine learning) that require a tidy dataset as input.
To show it in a simple example:
Above, two variables are both graphed and vectorized sharing a common axis…
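The alignment described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the variable names (`sales`, `temperature`), the monthly axis and all values are made up for the example, and the helper `align` is hypothetical. In practice a dataframe library would usually handle this step.

```python
# Hypothetical data: two variables observed along a shared (monthly) axis.
# All names and values are invented for illustration.
sales = {"2020-01": 120.0, "2020-02": 135.0, "2020-03": 128.0}
temperature = {"2020-01": 4.5, "2020-02": 6.1, "2020-03": 9.8, "2020-04": 13.2}


def align(variables):
    """Return (columns, rows) where each row is one observation.

    Only axis points present in every variable are kept, so each row
    holds a value for each column.
    """
    # Points of the common axis shared by all variables, in sorted order.
    common_axis = sorted(set.intersection(*(set(v) for v in variables.values())))
    # First column is the axis itself, followed by one column per variable.
    columns = ["axis"] + list(variables)
    rows = [
        [point] + [variables[name][point] for name in variables]
        for point in common_axis
    ]
    return columns, rows


columns, rows = align({"sales": sales, "temperature": temperature})
print(columns)
for row in rows:
    print(row)
```

Here "2020-04" is dropped because only one of the two variables has an observation at that point; keeping it would require deciding how to fill the missing value, which is a separate modeling choice.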
In this article, we’ll create a forecasting model to predict housing prices in Seattle. We will first make a model using the properties’ attributes such as sqft, rooms, bedrooms, bathrooms, view, etc.
Then we’ll significantly improve that model by generating features from external data, such as proximity to cultural spaces, parks, public art spots, golf courses, swimming beaches, picnic tables, etc., measuring the improvement from each added feature.
A simple approach.
“You will either step forward into growth or you will step backward into safety.”
Thousands of companies around the world, from small startups to global corporations, find great value in improving the performance of their supervised or unsupervised ML models, whether it’s a sales or demand forecast, a market basket analysis recommender, a customer classifier, a sales optimizer, a chatbot, an algorithmic trading pipeline, a document labeler, an elections forecast, a spam filter, a medical diagnosis solution, a route optimizer, a face recognizer or a self-driving car. …
“Simplicity is the keynote of all true elegance.”
*Disclaimer: I am assuming that whoever has the ability to comprehend and execute the content of this article is savvy enough to perform a robust back-testing in every corner of their trading pipeline before actually running it in production. However, there are some considerations that this article doesn’t take into account (spread, slippage and transaction costs, among others) and for that matter, this article is not to be considered financial advice. It is to be considered an educational step towards better performing results.
“People who are crazy enough to think they can change the world are the ones who do.”
Over the last decade, work on improving models has advanced in many directions, driven by worldwide demand for visible performance gains. Decision makers don’t need to be statisticians to understand the value of increasing revenue or decreasing costs.
A step-by-step tutorial in Python.
Sales or demand forecasts are a priority for the Data Science/Analytics departments of a huge number of companies, from startups to global corporations. Experts in the subject are in short supply, to say the least. Reducing the error even by a small amount can make a huge difference in revenue or savings.
In this article, we’ll build a simple sales forecast model with real data and then improve it by finding relevant features using Python.
Simplicity is key.
In this tutorial we’ll build a Machine Learning pipeline that takes news as input and applies NLP to generate predictions for Amazon’s stock price, retraining over time.
We’ll also measure how profitable it would be in real life.
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
In today’s harsh global economic conditions, traditional indicators and techniques can perform poorly (to say the least).
In this tutorial we’ll search for useful information in the news and transform it into a numerical format using NLP to train a Machine Learning model which will predict the rise or fall of any given cryptocurrency…
The campaigns for the 2020 US Presidential Election have started, and roughly 4 months from now (November 3rd is Election Day) a head of state will be selected by the voters.
Which candidate will leverage the concepts that influence the voters’ behavior the most?
Due to the US Electoral College system, it’s highly likely that the election will be defined by the swing states. Currently, the polls are reasonably unanimous on the following:
There are 21 states that lean Democrat
There are 24 states that lean Republican
And there are 6 swing states where voting preference is still up in…
“To better understand the marketplace, it is incumbent for organizations to look beyond their own four walls for data sources.”
Douglas Laney (VP, Gartner Research)
Thousands of companies around the world, from small startups to global corporations, find great value in being able to accurately predict sales, and it’s almost always one of the priorities for their Data Science / Analytics teams.
However, all of them seem to attempt to increase accuracy (reduce error) by focusing mainly on two things:
1) Feature engineering (getting the most out of your features)
2) Model/parameter optimization (choosing the best model & best parameters)
An attempt to separate signal from noise. | MSDS