.. _batch:

Batch-Running Pipelines
=======================

.. py:currentmodule:: lenskit.batch
.. highlight:: python

Offline recommendation experiments require *batch-running* a pipeline over a set
of test users, sessions, or other recommendation requests.  LensKit supports this
through the facilities in the :py:mod:`lenskit.batch` module.

By default, the batch facilities operate in parallel over the test users; this
can be controlled by environment variables (see :ref:`parallel-config`) or
through an ``n_jobs`` keyword argument to the various functions and classes.

.. admonition:: Import Protection
    :class: important

    Scripts using batch pipeline operations must be *protected*; see
    :ref:`parallel-protecting`.

Simple Runs
~~~~~~~~~~~

If you have a pipeline and want to simply generate recommendations for a batch
of test users, you can do this with the :py:func:`recommend` function.

For an example, let's start with importing things to run a quick batch:

    >>> from lenskit.basic import PopScorer
    >>> from lenskit.pipeline import topn_pipeline
    >>> from lenskit.batch import recommend
    >>> from lenskit.data import load_movielens
    >>> from lenskit.splitting import sample_users, SampleN
    >>> from lenskit.metrics import RunAnalysis, RBP

Load and split some data:

    >>> data = load_movielens('data/ml-100k.zip')
    >>> split = sample_users(data, 150, SampleN(5, rng=1024), rng=42)

Configure and train the model:

    >>> model = PopScorer()
    >>> pop_pipe = topn_pipeline(model, n=20)
    >>> pop_pipe.train(split.train)

Generate recommendations:

    >>> recs = recommend(pop_pipe, split.test.keys(), n_jobs=1)
    >>> recs.to_df()
          user_id  item_id  score  rank
    0     ...
    1     ...
    [3000 rows x 4 columns]

And measure their results:

    >>> ra = RunAnalysis()
    >>> ra.add_metric(RBP())
    >>> scores = ra.measure(recs, split.test)
    >>> scores.list_summary()  # doctest: +ELLIPSIS
              mean   median      std
    metric
    RBP    0.06...  0.02...  0.07...

The :py:func:`predict` function works similarly, but for rating predictions.
Instead of a simple list of user IDs, it takes a dictionary mapping user IDs to
lists of test items (as :py:class:`~lenskit.data.ItemList`).

General Batch Pipeline Runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :py:func:`recommend` and :py:func:`predict` functions are convenience
wrappers around a more general facility, the :py:class:`BatchPipelineRunner`.

.. _batch-queries:

Batch Queries
~~~~~~~~~~~~~

.. py:currentmodule:: lenskit.data

The batch inference functions and methods (:func:`~lenskit.batch.recommend`,
:meth:`~lenskit.batch.BatchPipelineRunner.run`, etc.) accept multiple types of
input to specify the set of users or test items.

*   An iterable (e.g. list) of recommendation queries (as :class:`RecQuery`
    objects).  The queries must have at least one of :attr:`RecQuery.query_id`
    and :attr:`RecQuery.user_id` set, so that the output can be properly
    indexed.  Queries should all use the same identification method (i.e., all
    queries have a ``query_id``, or all queries have only a ``user_id``).

*   An iterable of 2-element ``(query, items)`` tuples.  The query is a
    :class:`RecQuery` as in the previous method, and the items are an
    :class:`ItemList` containing the candidate items (for recommendation) or
    the items to score (for prediction and scoring).  This is the most general
    form of input.

*   An iterable (e.g. list) of user IDs.  These are passed as
    :attr:`RecQuery.user_id`, and the resulting outputs are indexed by user ID.

*   An :class:`ItemListCollection`.  At least one field of the collection key
    should be ``user_id``, and these user IDs are used as the query user IDs.
    The item lists themselves are used as in the tuple method above.  Results
    are indexed by the entire key.

*   A mapping (dictionary) of IDs to item lists.  This behaves like the item
    list collection; the IDs are taken to be user IDs.

*   A :class:`pandas.DataFrame`, which is converted to an item list collection.

.. deprecated:: 2025.6
    Mappings and data frames are deprecated in favor of other input types.
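
To make the relationship between these input forms concrete, the following is a
minimal, self-contained sketch of how several of them could be reduced to the
general ``(query, items)`` form.  It is an illustration only and is not
LensKit's implementation: ``Query`` and ``normalize`` are hypothetical stand-ins
(not :class:`RecQuery` or any LensKit function), item lists are modelled as
plain Python lists, and item list collections and data frames are left out.

.. code-block:: python

    from collections.abc import Mapping
    from dataclasses import dataclass
    from typing import Hashable, Optional


    @dataclass(frozen=True)
    class Query:
        """Hypothetical stand-in for a recommendation query."""

        user_id: Optional[Hashable] = None
        query_id: Optional[Hashable] = None

        @property
        def key(self):
            # Prefer an explicit query ID; otherwise fall back to the user ID.
            return self.query_id if self.query_id is not None else self.user_id


    def normalize(inputs):
        """Reduce several input shapes to a list of (query, items) pairs.

        ``items`` is ``None`` when the input carries no item list.
        """
        pairs = []
        if isinstance(inputs, Mapping):
            # Mapping of user IDs to item lists.
            for uid, items in inputs.items():
                pairs.append((Query(user_id=uid), list(items)))
        else:
            for entry in inputs:
                if isinstance(entry, Query):
                    # Bare query without an item list.
                    pairs.append((entry, None))
                elif (
                    isinstance(entry, tuple)
                    and len(entry) == 2
                    and isinstance(entry[0], Query)
                ):
                    # (query, items) tuple, the most general form.
                    pairs.append((entry[0], list(entry[1])))
                else:
                    # Anything else is treated as a bare user ID.
                    pairs.append((Query(user_id=entry), None))

        # Every query needs some identifier so results can be indexed.
        for query, _items in pairs:
            if query.key is None:
                raise ValueError("each query needs a query_id or a user_id")
        return pairs


    # A list of user IDs and a mapping both end up as (query, items) pairs.
    by_user = normalize([1, 2, 3])
    by_mapping = normalize({5: [10, 20]})

In this sketch the results would be indexed by ``Query.key``, which mirrors the
requirement above that every query carries a ``query_id`` or a ``user_id``.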