Data Release

ZOZO RESEARCH hosts the following datasets collected from our web services, e.g., ZOZOTOWN and WEAR.
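We release data collected from ZOZOTOWN, WEAR, and other services so that universities, other public research institutions, and researchers can use it for research purposes.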

Index

Open Bandit Dataset

Open Bandit Dataset is a public dataset of real-world logged bandit feedback. The dataset is provided by ZOZO, Inc., the largest Japanese fashion e-commerce company, with a market capitalization of over 5 billion USD (as of May 2020). The company uses multi-armed bandit algorithms to recommend fashion items to users on ZOZOTOWN, a large-scale fashion e-commerce platform.

This dataset is released along with the paper:

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita.
Large-scale Open Dataset, Pipeline, and Benchmark for Bandit Algorithms https://arxiv.org/abs/2008.07146

When using this dataset, please cite the paper with the following BibTeX:

@article{saito2020large,
    title={Large-scale Open Dataset, Pipeline, and Benchmark for Bandit Algorithms},
    author={Saito, Yuta and Aihara, Shunsuke and Matsutani, Megumi and Narita, Yusuke},
    journal={arXiv preprint arXiv:2008.07146},
    year={2020}
}

Data description

Open Bandit Dataset was constructed from an A/B test of two multi-armed bandit policies on ZOZOTOWN, a large-scale fashion e-commerce platform. It currently consists of a total of 26M rows, each representing a user impression with some feature values, the selected item as the action, the true propensity score, and a click indicator as the outcome. The dataset is especially suitable for evaluating off-policy evaluation (OPE) methods, which attempt to estimate the counterfactual performance of a hypothetical algorithm using data generated by a different algorithm that was actually in use.
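As a rough illustration of what OPE involves, the sketch below implements the standard inverse propensity weighting (IPW) estimator, one common OPE method. It is not taken from the paper's pipeline: the function name `ipw_estimate`, its arguments, and the toy numbers are assumptions made for this example. Each logged click is reweighted by the ratio between the probability that the hypothetical policy would choose the logged item and the logged propensity score.

```python
from typing import Sequence


def ipw_estimate(
    clicks: Sequence[float],
    logged_propensity: Sequence[float],
    evaluation_propensity: Sequence[float],
) -> float:
    """Estimate the click rate a hypothetical policy would obtain.

    clicks[i]                -- observed click indicator (0 or 1) for impression i
    logged_propensity[i]     -- probability with which the logging (behavior)
                                policy chose the logged item
    evaluation_propensity[i] -- probability with which the hypothetical policy
                                would have chosen that same item
    """
    if not (len(clicks) == len(logged_propensity) == len(evaluation_propensity)):
        raise ValueError("all inputs must have the same length")
    if len(clicks) == 0:
        raise ValueError("at least one impression is required")

    total = 0.0
    for click, p_logged, p_eval in zip(clicks, logged_propensity, evaluation_propensity):
        if p_logged <= 0.0:
            raise ValueError("logged propensities must be positive")
        # Importance weight: how much more (or less) likely the hypothetical
        # policy is to take the logged action than the behavior policy was.
        total += (p_eval / p_logged) * click
    return total / len(clicks)


# Toy data, made up for illustration and not drawn from the dataset.
toy_clicks = [1, 0, 0, 1]
toy_logged = [0.5, 0.5, 0.5, 0.5]
toy_eval = [0.25, 0.75, 0.25, 0.75]
estimate = ipw_estimate(toy_clicks, toy_logged, toy_eval)
print(estimate)
```

The estimator is unbiased only when the logged propensities are the true ones and are positive wherever the hypothetical policy has positive probability, which is why the dataset's true propensity scores matter.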

Fields

Here is a detailed description of the fields. Values are comma-separated in the CSV files.

{behavior_policy}/{campaign}.csv (behavior_policy is one of bts or random; campaign is one of all, men, or women)

  • timestamp: timestamps of impressions.
  • item_id: index of items as arms (the index ranges over 0-80 in the "All" campaign, 0-33 in the "Men" campaign, and 0-46 in the "Women" campaign).
  • position: the position at which an item is recommended (1, 2, and 3 correspond to the left, center, and right positions of the ZOZOTOWN recommendation interface, respectively).
  • click: target variable that indicates if an item was clicked (1) or not (0).
  • propensity_score: the probability of an item being recommended at each position.
  • user feature 0-4: user-related feature values.
  • user-item affinity 0-: user-item affinity scores induced by the number of past clicks observed between each user-item pair.

item_context.csv

  • item_id: index of items as arms (the index ranges over 0-80 in the "All" campaign, 0-33 in the "Men" campaign, and 0-46 in the "Women" campaign).
  • item feature 0-3: item-related feature values.
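The two file layouts above can be combined on `item_id`. The sketch below parses a log file and an item context file with Python's standard `csv` module, attaches item features to each impression, and computes the click rate per position. The header names (for example `item_feature_0`) and every row are assumptions made up for illustration; check the headers of the downloaded files before relying on them.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for {behavior_policy}/{campaign}.csv and item_context.csv.
# Header names are assumed from the field list and may differ in the real files.
LOG_CSV = """timestamp,item_id,position,click,propensity_score
t0,0,1,1,0.0125
t1,1,2,0,0.0125
t2,0,1,0,0.0125
t3,2,3,0,0.0125
"""
ITEM_CSV = """item_id,item_feature_0
0,a
1,b
2,a
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_item_features(log_rows, item_rows):
    """Join each impression with the features of the recommended item."""
    features_by_item = {row["item_id"]: row for row in item_rows}
    return [{**row, **features_by_item[row["item_id"]]} for row in log_rows]


def click_rate_by_position(log_rows):
    """Return {position: fraction of impressions that were clicked}."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in log_rows:
        position = int(row["position"])
        impressions[position] += 1
        clicks[position] += int(row["click"])
    return {pos: clicks[pos] / impressions[pos] for pos in sorted(impressions)}


joined = attach_item_features(load_rows(LOG_CSV), load_rows(ITEM_CSV))
rates = click_rate_by_position(joined)
print(rates)
```

To read files from disk instead, pass an open file handle to `csv.DictReader` in place of `io.StringIO(text)`.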

Please visit the examples to learn how to use the data.

Google Group

The project is ongoing. The project team plans to expand the data and release new versions of the dataset in the near future. If you are interested, you can follow updates in our Google Group: https://groups.google.com/g/open-bandit-project

Contact

For any questions, feel free to contact:

The authors of the paper: saito@hanjuku-kaso.com
ZOZO Research: zozo-research@zozo.com

Download

SHIFT15M Dataset

The main motivation of the SHIFT15M project is to provide a dataset that contains naturally occurring dataset shifts, collected from IQON, a web service that was in operation for a decade. In addition, the SHIFT15M dataset contains several types of dataset shift, which makes it possible to evaluate the robustness of a model to different types of shift (e.g., covariate shift and target shift).

This dataset is released along with the paper:

Masanari Kimura, Takuma Nakamura, and Yuki Saito.
SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts. https://arxiv.org/abs/2108.12992

Tasks

The following tasks are now available:

Task                       | Task type           | Shift type                    | Input dimensions       | Output dimensions
NumLikesRegression         | regression          | target shift                  | (N, 25)                | (N, 1)
SumPricesRegression        | regression          | covariate shift, target shift | (N, 1)                 | (N, 1)
ItemCategoryClassification | classification      | target shift                  | (N, 4096)              | (N, 7)
Set2SetMatching            | set-to-set matching | covariate shift               | (N, 4096) x (M, 4096)  | (1)
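
For readers unfamiliar with the shift types listed above, the usual textbook definitions are sketched below. They are given as background and are not quoted from the SHIFT15M paper; x denotes inputs, y denotes targets, and the subscripts tr and te (training and test) are notation introduced for this note.

```latex
\begin{align*}
\text{Covariate shift:}\quad & p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \qquad p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x) \\
\text{Target shift:}\quad & p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y), \qquad p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y)
\end{align*}
```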

Dataset Structure

We maintain the original dataset in JSON format, where each row corresponds to an outfit that a user posted on the IQON web service. An example of the structure is as follows:

{
  "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
  "like_num":"xx",
  "set_id":"xxx",
  "items":[
    {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
    ...
  ],
  "publish_date":"yyyy-mm-dd"
}

You can find the details for the above fields and the dataset collection process here.

Note that the whole dataset has been anonymized and that fashion items are distributed as features extracted from their images.
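
In the structure example above, every value is a quoted string, so numeric fields would need converting before use. The sketch below parses one record and derives its item count, total price, and like count. The record values are invented, the helper name `summarize_outfit` is hypothetical, and the assumption that each record can be parsed on its own with `json.loads` should be checked against the actual files.

```python
import json

# Invented record following the structure shown above; not real data.
RECORD_TEXT = """
{
  "user": {"user_id": "u1", "fav_brand_ids": "10,20"},
  "like_num": "7",
  "set_id": "s1",
  "items": [
    {"price": "1200", "item_id": "i1", "category_id1": "11", "category_id2": "101"},
    {"price": "3400", "item_id": "i2", "category_id1": "12", "category_id2": "102"}
  ],
  "publish_date": "2015-01-31"
}
"""


def summarize_outfit(record):
    """Convert the string-typed fields of one outfit into numeric summaries."""
    prices = [int(item["price"]) for item in record["items"]]
    brands = record["user"]["fav_brand_ids"]
    return {
        "set_id": record["set_id"],
        "num_items": len(prices),
        "sum_prices": sum(prices),
        "num_likes": int(record["like_num"]),
        # Brand IDs arrive as one comma-separated string; split into a list.
        "fav_brand_ids": brands.split(",") if brands else [],
        # publish_date is formatted yyyy-mm-dd, so the year is the first 4 chars.
        "publish_year": int(record["publish_date"][:4]),
    }


summary = summarize_outfit(json.loads(RECORD_TEXT))
print(summary)
```

Grouping records by `publish_year` in this way is one simple starting point for inspecting how the data changes over time.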

Contact

For any questions, feel free to contact the authors directly or contact us.
ZOZO Research: zozo-research@zozo.com

Downloads

Please note that downloading the complete SHIFT15M dataset requires more than 160 GB of free space.
Instead, you can download each task's data using the scripts provided in the repository.

When using this dataset, please cite the paper with the following BibTeX:

@misc{kimura2021shift15m,
  title={SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts},
  author={Masanari Kimura and Takuma Nakamura and Yuki Saito},
  year={2021},
  eprint={2108.12992},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

To download the files, please refer to the guidance in our repository below.

Download