The meaningful features your model is missing may be a one-liner away.

1. Theory

When we define a statistical model focused on a dependent variable, we attempt to explain the relationship between that dependent variable and other independent variable(s).

To compute these models, we need to vectorize our variables and align them so that every column is a (discretized) variable and every row is an observation of those variables at a given point on a common axis. This is mandatory for applications, such as machine learning, that require a tidy dataset as input.

To show it in a simple example:

Above, two variables are both graphed and vectorized sharing a common axis…
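As a rough illustration of the alignment described in this section, here is a minimal sketch in plain Python. The variable names (`temperature`, `sales`), their values, and the `align` helper are hypothetical and not taken from the article; in practice a dataframe library such as pandas would likely handle this step.

```python
# Hypothetical observations of two variables, keyed by a shared axis (dates).
temperature = {"2020-01-01": 21.0, "2020-01-02": 23.5, "2020-01-03": 19.0}
sales = {"2020-01-02": 130, "2020-01-03": 95, "2020-01-04": 110}


def align(**variables):
    """Return (columns, rows): one column per variable, one row per axis point."""
    # Keep only the axis points where every variable was observed.
    common_axis = sorted(set.intersection(*(set(v) for v in variables.values())))
    columns = ["axis", *variables]
    rows = [
        [point, *(values[point] for values in variables.values())]
        for point in common_axis
    ]
    return columns, rows


columns, rows = align(temperature=temperature, sales=sales)
for row in rows:
    print(dict(zip(columns, row)))
```

Only the dates present in both variables survive, which is one possible design choice (an inner join); keeping every date and filling the gaps would be an equally valid alternative.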

Boosting Performance by Generating Features from External Data with Python.


In this article, we’ll create a forecasting model to predict housing prices in Seattle. We will first make a model using the properties’ attributes such as sqft, rooms, bedrooms, bathrooms, view, etc.

Then we’ll significantly improve that model by generating features from external data, such as proximity to cultural spaces, parks, public art spots, golf courses, swimming beaches, picnic tables, etc., measuring the improvement from each added feature.

What we’ll do

  • Step 1: Explore the Seattle Housing Prices Data
  • Step 2: Create a Price Prediction Model
  • Step 3: Add Features from External Data
  • Step 4: Compare and Analyze Results
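One way a proximity feature like those mentioned above could be computed is the great-circle distance from each property to the nearest point of interest. The sketch below assumes latitude/longitude pairs are available; the coordinates and the helper names (`haversine_km`, `distance_to_nearest`) are hypothetical, and the article itself may derive its features differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(home, places):
    """Distance from a property to the closest place in a list of (lat, lon)."""
    return min(haversine_km(*home, *place) for place in places)


# Hypothetical coordinates, for illustration only.
home = (47.6205, -122.3493)
parks = [(47.6297, -122.3600), (47.5480, -122.2570)]

feature = distance_to_nearest(home, parks)
print(f"distance to nearest park: {feature:.2f} km")
```

The resulting number can be appended as a new column to each property row, one column per type of place.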

Step 1. Explore the Seattle Housing Prices Data

To make the model…

A simple approach.

You will either step forward into growth or you will step backward into safety.

-Abraham Maslow-


Thousands of companies around the world, from small startups to global corporations, find great value in improving the performance of their supervised or unsupervised ML models, whether it’s a sales or demand forecast, a market basket analysis recommender, a customer classifier, a sales optimizer, a chatbot, an algorithmic trading pipeline, a document labeler, an elections forecast, a spam filter, a medical diagnosis solution, a route optimizer, a face recognizer or a self-driving car. …

Simplicity is key

Simplicity is the keynote of all true elegance.

-Coco Chanel-

*Disclaimer: I am assuming that anyone able to comprehend and execute the content of this article is savvy enough to perform robust back-testing in every corner of their trading pipeline before actually running it in production. However, there are some considerations that this article doesn’t take into account (spread, slippage and transaction costs, among others), and for that reason this article is not to be considered financial advice. It is to be considered an educational step towards better-performing results.


The goal of this article is to…

It’s simpler than you think.

People who are crazy enough to think they can change the world are the ones who do.

-Rob Siltanen-


Over the last decade, advances in model improvement have accelerated in many directions, because the demand for visible performance gains now exists on a global scale. Decision makers don’t need to be statisticians to understand the value of increasing revenue or decreasing costs.

Thousands of companies around the world, from small startups to global corporations, find great value in improving the performance of their supervised or unsupervised ML models, whether it’s a sales or demand forecast, a market basket analysis…

A step-by-step tutorial in Python.

Sales or demand forecasts are a priority for the Data Science/Analytics departments of a huge number of companies, from startups to global corporations. Experts in the subject are in short supply, to say the least. Reducing the error even by a small amount can make a huge difference in revenue or savings.

In this article, we’ll build a simple sales forecast model with real data and then improve it by finding relevant features using Python.

What we’ll do

  • Step 1: Define and understand Data and Target
  • Step 2: Make a Simple Forecast Model
  • Step 3: Improve it by…

Simplicity is key.


In this tutorial we’ll build a machine learning pipeline that takes news as input and applies NLP to generate predictions for the Amazon stock price, re-training through time.

We’ll also measure how profitable it would be in real life.

What we’ll do

  • Step 1: Set up technical prerequisites
  • Step 2: Get the data for daily Amazon Candles since 2017
  • Step 3: Define and understand target for ML
  • Step 4: Blend business news into our data and understand tokens
  • Step 5: Prepare our data and apply ML
  • Step 6: Measure and analyze results
  • Step 7: Break the data and train/test through time
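Step 7 above suggests splitting the data chronologically and re-training as time advances. A minimal sketch of one such scheme, an expanding-window walk-forward split, is shown below. The function name and its parameters are hypothetical, and the article may split the data differently.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs with an expanding train window.

    Each test block starts right after its training block, so the model
    is never evaluated on observations older than its training data.
    """
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Example: 10 daily observations, 4 used for the first fit, 2 per test block.
splits = [
    (list(train), list(test))
    for train, test in walk_forward_splits(10, initial_train=4, test_size=2)
]
for train, test in splits:
    print("train:", train, "test:", test)
```

A fixed-length (sliding) window is a common alternative when older observations are believed to be less relevant.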

Step 1. Prerequisites

  • Have Python…

A step-by-step tutorial using Python.

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

In today’s harsh global economic conditions, traditional indicators and techniques can perform poorly (to say the least).

In this tutorial we’ll search for useful information in news and transform it into a numerical format using NLP to train a machine learning model which will predict the rise or fall of any given cryptocurrency…
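To give a sense of what transforming news into a numerical format can look like, here is a minimal bag-of-words sketch in plain Python. The headlines are invented for illustration and the helper names are hypothetical; the tutorial may rely on a dedicated NLP library and a different representation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(headlines):
    """Return (vocabulary, matrix) with one count vector per headline."""
    counts = [Counter(tokenize(headline)) for headline in headlines]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[token] for token in vocabulary] for count in counts]
    return vocabulary, matrix


# Invented headlines, for illustration only.
headlines = [
    "Bitcoin surges as adoption grows",
    "Regulators warn as Bitcoin falls",
]

vocabulary, matrix = bag_of_words(headlines)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a numeric vector that could be paired with a rise/fall label and passed to a classifier.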

A Study of Each Swing State’s Key Drivers Using Machine Learning.


The campaigns for the 2020 US Presidential Election have started and roughly 4 months from now (November 3rd is Election Day) a head of state will be selected by the voters.

Which candidate will leverage the concepts that influence the voters’ behavior the most?

Due to the US Electoral College system, it’s highly likely that the election will be decided by the swing states. Currently, the polls largely agree on the following:

There are 21 states that lean Democrat

There are 24 states that lean Republican

And there are 6 swing states where voting preference is still up in…

Create a Record Breaking Demand Sales Forecast on a Step-by-Step Tutorial Using Python

“To better understand the marketplace, it is incumbent for organizations to look beyond their own four walls for data sources.”

Douglas Laney (VP, Gartner Research)


Thousands of companies around the world, from small startups to global corporations, find great value in being able to accurately predict sales, and it’s almost always one of the priorities for their Data Science / Analytics teams.

However, all of them seem to attempt to increase accuracy (reduce error) by focusing mainly on two things:

1) Feature engineering (getting the most out of your features)

2) Model/parameter optimization (choosing the best model & best parameters)

Federico Riveroll

An attempt to separate signal from noise. | MSDS
