August 20, 2025 / Last updated: August 20, 2025 fujifuji Column

A Complete Guide to Store Sales Forecasting (Demand Forecasting) with Machine Learning: Python Implementation Examples Included

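Introduction

Store sales forecasting is one of the most important challenges in retail. Accurate forecasts can improve many parts of the business, including inventory management, staff scheduling, and marketing strategy.

This article explains machine-learning approaches to store sales forecasting in a way that is accessible to beginners. It covers both theory and implementation, with Python code examples along the way.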

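What Is Store Sales Forecasting?

Store sales forecasting (demand forecasting) is the practice of predicting future sales from historical sales data and external factors. It is mainly used for the following purposes:

Inventory optimization: minimize the risk of overstock and stockouts
Efficient staffing: schedule staff in line with expected demand
Marketing strategy: optimize the timing and scale of promotions
Financial planning: build budgets on more accurate sales forecasts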

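Main Factors That Affect Forecasts

Many factors influence store sales:

Internal factors

Historical sales
Product prices
Promotion activity
Store size and location

External factors

Seasonality and day-of-week effects
Weather data
Public holidays and events
Competitor activity
Economic indicators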

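Choosing a Machine Learning Method

Various machine learning methods can be applied to store sales forecasting. The sections below look at the characteristics of each and when to use it.

1. Linear Regression

Characteristics

Simple and easy to interpret
Fast to compute
Effective when the relationship between the features and sales is linear

When to use

As an initial exploration or baseline model
When sales fluctuations are relatively stable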
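
As a point of reference, a linear regression baseline might look like the sketch below. The synthetic series and the feature names `trend` and `is_weekend` are assumptions made for illustration and are not part of the article's dataset. The split is chronological, so the model is always evaluated on days that come after its training data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic daily sales (illustrative assumption): linear trend + weekend uplift + noise
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=200, freq="D")
is_weekend = (dates.dayofweek >= 5).astype(int)
trend = np.arange(len(dates))
sales = 100 + 0.1 * trend + 20 * is_weekend + rng.normal(0, 5, len(dates))

X = pd.DataFrame({"trend": trend, "is_weekend": is_weekend})
y = pd.Series(sales)

# Chronological split: first 160 days for training, last 40 days for testing
X_train, X_test = X.iloc[:160], X.iloc[160:]
y_train, y_test = y.iloc[:160], y.iloc[160:]

baseline = LinearRegression().fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

# Coefficients are directly interpretable: change in sales per day and per weekend flag
coef = dict(zip(X.columns, baseline.coef_))
print(f"Baseline MAE: {baseline_mae:.2f}")
print(coef)
```

Any more complex model can then be judged by how much it improves on this baseline MAE.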

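2. Random Forest

Characteristics

Can capture nonlinear relationships
Relatively robust to overfitting, because predictions are averaged over many trees
Provides feature importance scores

When to use

When there are many features
Sales data with complex patterns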

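3. LSTM (Long Short-Term Memory)

Characteristics

Learns long-term dependencies in time series data
Can learn seasonality and trends directly from the series, given enough training data

When to use

When there is clear seasonality or trend
When long-term sales patterns need to be taken into account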

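Data Preparation and Preprocessing

The performance of a machine learning model depends heavily on data quality. The basic preprocessing steps are shown below.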

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Create sample data
def create_sample_data():
    dates = pd.date_range('2020-01-01', '2023-12-31', freq='D')
    np.random.seed(42)
    
    # Baseline sales trend
    trend = np.linspace(100, 150, len(dates))
    
    # Seasonality (annual cycle)
    seasonal = 30 * np.sin(2 * np.pi * np.arange(len(dates)) / 365.25)
    
    # Day-of-week effect (higher on Saturdays and Sundays)
    day_of_week = pd.to_datetime(dates).dayofweek
    weekend_effect = np.where((day_of_week == 5) | (day_of_week == 6), 20, 0)
    
    # Noise
    noise = np.random.normal(0, 10, len(dates))
    
    # Sales
    sales = trend + seasonal + weekend_effect + noise
    
    return pd.DataFrame({
        'date': dates,
        'sales': np.maximum(sales, 0),  # clip negative values to 0
        'day_of_week': day_of_week,
        'month': pd.to_datetime(dates).month,
        'is_weekend': (day_of_week >= 5).astype(int)
    })

# Prepare the data
df = create_sample_data()
print("Data overview:")
print(df.head())

Feature Engineering

To improve forecast accuracy, new features are derived from the raw data:

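def create_features(df):
    df = df.copy()
    
    # Moving average of the previous 7 days.
    # shift(1) excludes the current day so the target does not leak into its own feature.
    df['sales_ma7'] = df['sales'].shift(1).rolling(window=7).mean()
    
    # Mean sales per calendar month (an average, not a year-over-year ratio).
    # Note: computed over the whole dataset; for strict evaluation,
    # compute it on the training period only.
    df['month_avg'] = df.groupby('month')['sales'].transform('mean')
    
    # Lag features (sales on the previous day and one week earlier)
    df['sales_lag1'] = df['sales'].shift(1)
    df['sales_lag7'] = df['sales'].shift(7)
    
    # Cyclical encoding of the month (seasonality features)
    df['sin_month'] = np.sin(2 * np.pi * df['month'] / 12)
    df['cos_month'] = np.cos(2 * np.pi * df['month'] / 12)
    
    return df.dropna()  # drop rows with missing values

# Create the features
df_features = create_features(df)
print("Feature examples:")
print(df_features[['date', 'sales', 'sales_ma7', 'sales_lag1']].head(10))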
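
Public holidays and events were listed earlier as external factors, and they can be turned into features in a similar way. The sketch below assumes a hypothetical, hand-written holiday list; in practice the dates would come from whatever calendar source is available. The column names `is_holiday` and `is_pre_holiday` are also illustrative.

```python
import pandas as pd

# Hypothetical holiday list (assumption): replace with a real calendar source
holidays = pd.to_datetime(["2023-01-01", "2023-01-09", "2023-02-11"])

calendar = pd.DataFrame({"date": pd.date_range("2023-01-01", "2023-02-28", freq="D")})

# 1 if the date is a holiday, else 0
calendar["is_holiday"] = calendar["date"].isin(holidays).astype(int)

# 1 if the following day is a holiday (demand on the day before may differ)
calendar["is_pre_holiday"] = calendar["is_holiday"].shift(-1, fill_value=0)

print(calendar[calendar["is_holiday"] == 1])
```

Both flags are known in advance for future dates, so they can be used at prediction time without leaking information.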

Implementing the Forecasting Models

1. Sales Forecasting with Random Forest

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt

def train_random_forest_model(df):
    # Separate the features and the target
    feature_columns = ['day_of_week', 'month', 'is_weekend', 
                      'sales_ma7', 'sales_lag1', 'sales_lag7',
                      'sin_month', 'cos_month']
    
    X = df[feature_columns]
    y = df['sales']
    
    # Split into training and test sets in chronological order (shuffle=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, shuffle=False
    )
    
    # Train the model
    rf_model = RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        random_state=42
    )
    rf_model.fit(X_train, y_train)
    
    # Predict
    y_pred = rf_model.predict(X_test)
    
    # Compute the evaluation metrics
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    print("Random forest model performance:")
    print(f"MAE: {mae:.2f}")
    print(f"RMSE: {rmse:.2f}")
    
    return rf_model, X_test, y_test, y_pred

# Train and evaluate the model
model, X_test, y_test, y_pred = train_random_forest_model(df_features)
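
One of the random forest characteristics mentioned earlier is that feature importance can be analyzed. The following is a minimal sketch on synthetic data, where the columns `is_weekend` and `random_noise` are assumptions chosen so that one feature clearly matters and the other does not. It reads the fitted model's `feature_importances_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic example (assumption): sales depend on is_weekend, not on random_noise
rng = np.random.default_rng(42)
n = 500
X_demo = pd.DataFrame({
    "is_weekend": rng.integers(0, 2, n),
    "random_noise": rng.normal(0, 1, n),
})
y_demo = 100 + 30 * X_demo["is_weekend"] + rng.normal(0, 5, n)

rf_demo = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; larger values mean the feature
# contributed more to reducing error across the trees' splits
importances = pd.Series(
    rf_demo.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

The same two lines at the end can be applied to the trained model above by using its own feature column names as the index.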

2. Time Series Forecasting with LSTM

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler

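def prepare_lstm_data(sales_data, lookback_days=30, train_fraction=0.8):
    # Normalize the data. The scaler is fitted on the leading training portion
    # only, so statistics from the test period do not leak into training.
    scaler = MinMaxScaler()
    values = sales_data.reshape(-1, 1)
    n_fit = int(len(values) * train_fraction)
    scaler.fit(values[:n_fit])
    scaled_data = scaler.transform(values)
    
    # Build sliding windows: the previous lookback_days values predict the next day
    X, y = [], []
    for i in range(lookback_days, len(scaled_data)):
        X.append(scaled_data[i-lookback_days:i, 0])
        y.append(scaled_data[i, 0])
    
    return np.array(X), np.array(y), scaler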

def create_lstm_model(lookback_days=30):
    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(lookback_days, 1)),
        Dropout(0.2),
        LSTM(50, return_sequences=False),
        Dropout(0.2),
        Dense(25),
        Dense(1)
    ])
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# Example LSTM training routine
def train_lstm_model(df):
    sales_data = df['sales'].values
    
    # Prepare the data
    X, y, scaler = prepare_lstm_data(sales_data)
    
    # Split into training and test sets (chronological)
    train_size = int(len(X) * 0.8)
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]
    
    # Reshape to (samples, timesteps, features)
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
    
    # Build and train the model
    lstm_model = create_lstm_model()
    lstm_model.fit(X_train, y_train, 
                  batch_size=32, epochs=50, verbose=0)
    
    # Predict and convert back to the original scale
    predictions = lstm_model.predict(X_test)
    predictions = scaler.inverse_transform(predictions)
    y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
    
    # Evaluate
    mae = mean_absolute_error(y_test_actual, predictions)
    rmse = np.sqrt(mean_squared_error(y_test_actual, predictions))
    
    print("LSTM model performance:")
    print(f"MAE: {mae:.2f}")
    print(f"RMSE: {rmse:.2f}")
    
    return lstm_model, scaler

# Note: this example requires TensorFlow
# model_lstm, scaler = train_lstm_model(df_features)
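
The sliding-window preparation is easier to follow on a tiny series. The sketch below uses a 10-point stand-in series and a lookback of 3 (both are assumptions for illustration) and needs only NumPy, so it runs without TensorFlow.

```python
import numpy as np

series = np.arange(10, dtype=float)  # stand-in for a scaled sales series
lookback = 3

# Each row holds the previous `lookback` values; the target is the value that follows
windows = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
targets = series[lookback:]

# Keras LSTM layers expect input shaped (samples, timesteps, features)
windows_3d = windows.reshape(windows.shape[0], lookback, 1)
print(windows_3d.shape, targets.shape)
print(windows[0], targets[0])
```

A series of length n yields n minus lookback training samples, which is why the first `lookback_days` observations never appear as targets.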

Model Evaluation and Improvement

Choosing Evaluation Metrics

The following metrics are commonly used to evaluate sales forecasting models:

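def evaluate_model(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    
    # MAPE is undefined when actual sales are 0, so those days are excluded
    nonzero = y_true != 0
    mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100
    
    print("Evaluation metrics:")
    print(f"MAE (mean absolute error): {mae:.2f}")
    print(f"RMSE (root mean squared error): {rmse:.2f}")
    print(f"MAPE (mean absolute percentage error): {mape:.2f}%")
    
    return mae, rmse, mape

# Run the evaluation
mae, rmse, mape = evaluate_model(y_test, y_pred)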
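
To make the three metrics concrete, here is a small worked example with three hand-picked values (the numbers are assumptions chosen so the arithmetic can be checked by hand).

```python
import numpy as np

y_true_demo = np.array([100.0, 200.0, 400.0])
y_pred_demo = np.array([110.0, 180.0, 400.0])

errors = y_true_demo - y_pred_demo  # [-10, 20, 0]

# MAE: (10 + 20 + 0) / 3 = 10.0
mae_demo = np.mean(np.abs(errors))

# RMSE: sqrt((100 + 400 + 0) / 3), roughly 12.91
rmse_demo = np.sqrt(np.mean(errors ** 2))

# MAPE: (10% + 10% + 0%) / 3, roughly 6.67%
mape_demo = np.mean(np.abs(errors / y_true_demo)) * 100

print(mae_demo, round(rmse_demo, 2), round(mape_demo, 2))
```

RMSE is larger than MAE here because squaring gives the 20-unit miss more weight, while MAPE treats the 10-unit and 20-unit misses as equal because both are 10% of the actual value.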

Model Improvement Techniques

1. Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def optimize_random_forest(X_train, y_train):
    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [5, 10, 15, None],
        'min_samples_split': [2, 5, 10]
    }
    
    rf = RandomForestRegressor(random_state=42)
    # Time-ordered folds: each validation fold comes after its training data
    grid_search = GridSearchCV(
        rf, param_grid, cv=TimeSeriesSplit(n_splits=5), 
        scoring='neg_mean_absolute_error'
    )
    
    grid_search.fit(X_train, y_train)
    
    print(f"Best parameters: {grid_search.best_params_}")
    return grid_search.best_estimator_
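
Because sales data are ordered in time, plain k-fold cross-validation can end up validating on days that precede the training data. scikit-learn's `TimeSeriesSplit` is one way to avoid this. The sketch below prints the folds it produces for 12 time-ordered observations (the array `X_ordered` is a stand-in assumed for illustration).

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_ordered = np.arange(12).reshape(-1, 1)  # 12 time-ordered observations

tscv = TimeSeriesSplit(n_splits=3)
folds = [
    (train_idx.tolist(), val_idx.tolist())
    for train_idx, val_idx in tscv.split(X_ordered)
]

# In every fold the training indices all precede the validation indices
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

The training window grows with each fold, which mirrors how a forecasting model is retrained as more history accumulates.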

2. Ensemble Methods

from sklearn.linear_model import LinearRegression

def create_ensemble_model(X_train, y_train):
    # Create multiple models
    models = {
        'rf': RandomForestRegressor(n_estimators=100, random_state=42),
        'lr': LinearRegression()
    }
    
    # Train each model
    for name, model in models.items():
        model.fit(X_train, y_train)
    
    return models

def ensemble_predict(models, X_test):
    predictions = []
    for model in models.values():
        pred = model.predict(X_test)
        predictions.append(pred)
    
    # Average the predictions of all models
    ensemble_pred = np.mean(predictions, axis=0)
    return ensemble_pred

Practical Deployment Tips

1. Real-Time Prediction System

import joblib
from datetime import datetime, timedelta

class SalesPredictionSystem:
    def __init__(self, model_path):
        self.model = joblib.load(model_path)
        self.feature_columns = ['day_of_week', 'month', 'is_weekend', 
                               'sales_ma7', 'sales_lag1', 'sales_lag7',
                               'sin_month', 'cos_month']
    
    def predict_next_day(self, historical_data):
        # Build the features for the next day from the latest data
        features = self.extract_features(historical_data)
        
        # Run the prediction (a DataFrame keeps the feature names used in training)
        X = pd.DataFrame([features], columns=self.feature_columns)
        prediction = self.model.predict(X)[0]
        
        return max(0, prediction)  # clip negative values to 0
    
    def extract_features(self, data):
        # Requires at least 7 days of history, sorted by date.
        # Calendar features describe the day being predicted,
        # i.e. the day after the last observation.
        target_date = data['date'].iloc[-1] + timedelta(days=1)
        features = [
            target_date.weekday(),  # day_of_week
            target_date.month,      # month
            1 if target_date.weekday() >= 5 else 0,  # is_weekend
            data['sales'].tail(7).mean(),  # sales_ma7 (mean of the last 7 observed days)
            data['sales'].iloc[-1],        # sales_lag1
            data['sales'].iloc[-7],        # sales_lag7
            np.sin(2 * np.pi * target_date.month / 12),  # sin_month
            np.cos(2 * np.pi * target_date.month / 12)   # cos_month
        ]
        
        return features

# Usage example
# system = SalesPredictionSystem('sales_model.pkl')
# next_day_sales = system.predict_next_day(historical_data)
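
The usage example loads `sales_model.pkl`, so the trained model has to be saved first. A minimal sketch of the save and load round trip with `joblib` follows. The tiny stand-in model and the temporary directory are assumptions made so the example runs on its own; in practice the trained forecasting model would be saved to a persistent location.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny stand-in model (assumption): in practice, use the trained forecasting model
rng = np.random.default_rng(0)
X_small = rng.normal(size=(50, 3))
y_small = X_small[:, 0] * 2 + rng.normal(0, 0.1, 50)
trained = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_small, y_small)

# Save to a temporary directory under the file name used in the usage example
model_path = os.path.join(tempfile.mkdtemp(), "sales_model.pkl")
joblib.dump(trained, model_path)

# Later, for example in the serving process
restored = joblib.load(model_path)
same_output = np.allclose(trained.predict(X_small), restored.predict(X_small))
print(same_output)
```

Files saved this way should only be loaded from trusted sources and with a compatible scikit-learn version, since they are pickle-based.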

2. Continuous Model Updates

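from sklearn.base import clone

def update_model_periodically(all_data, existing_model, feature_columns):
    # Build features from the full history (existing data plus newly collected data)
    features = create_features(all_data)
    X = features[feature_columns]
    y = features['sales']
    
    # RandomForest does not support incremental learning, so a fresh copy
    # with the same hyperparameters is retrained on all of the data
    updated_model = clone(existing_model)
    updated_model.fit(X, y)
    
    return updated_model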

3. Confidence Intervals for Predictions

def predict_with_confidence(model, X_test):
    # Each tree in a random forest is trained on a bootstrap sample, so the
    # spread of the per-tree predictions gives a rough uncertainty band.
    # Note: this reflects model variance only, not the noise in actual sales.
    X_values = np.asarray(X_test)
    predictions = np.array([tree.predict(X_values) for tree in model.estimators_])
    
    # 2.5th and 97.5th percentiles across trees, computed for each test row
    lower_bound = np.percentile(predictions, 2.5, axis=0)
    upper_bound = np.percentile(predictions, 97.5, axis=0)
    mean_pred = np.mean(predictions, axis=0)
    
    return mean_pred, lower_bound, upper_bound
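
Another option, shown here as a sketch and not as the article's method, is quantile regression: one model is trained per quantile, and the pair of predictions forms an interval. The example uses scikit-learn's `GradientBoostingRegressor` with `loss="quantile"` on synthetic data. The data, the variable names, and the choice of the 5th and 95th percentiles are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption): linear signal plus Gaussian noise
rng = np.random.default_rng(1)
X_q = rng.uniform(0, 10, size=(400, 1))
y_q = 50 + 5 * X_q[:, 0] + rng.normal(0, 5, 400)

# One model per quantile: the 5th and 95th percentiles give a nominal 90% interval
lower_model = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0).fit(X_q, y_q)
upper_model = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0).fit(X_q, y_q)

lower = lower_model.predict(X_q)
upper = upper_model.predict(X_q)

# Share of training points that fall inside the interval
coverage = np.mean((y_q >= lower) & (y_q <= upper))
print(f"Share of training points inside the interval: {coverage:.2f}")
```

Coverage measured on the training data tends to be optimistic, so it is worth checking on a held-out period before relying on the interval.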

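Summary

Store sales forecasting with machine learning is an important technology in the digital transformation of retail. When applying the methods introduced in this article, keep the following points in mind:

Keys to Success

Ensure data quality: collect accurate and consistent data
Design appropriate features: use business knowledge in feature engineering
Choose the right model: select a method that suits the problem
Improve continuously: update the model regularly and monitor its performance
Integrate with operations: build the forecasts into actual business processes

More accurate sales forecasts can deliver business value such as lower inventory costs, higher customer satisfaction, and improved profitability. The key to success is to start with a small pilot and expand the system step by step.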
