Locality calibration + text generation #129

Open

wants to merge 104 commits into base: xinyili/development
Conversation

imrecommender (Contributor)

The main logic in the code includes:

  1. an option to enable or disable text generation
  2. the ability to choose whether topic or other filters are considered when finding similar articles from the click history
  3. an option to apply a time-decay weight when identifying similar articles

I tested different config combinations using the uploaded basic request JSON. No bugs have surfaced so far, but the details of the output still need to be checked.
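For illustration, a minimal sketch of how these three options might look as a config object, plus the kind of time-decay weight described in item 3. The field and function names here are hypothetical, not taken from the actual pipeline code:

```python
from dataclasses import dataclass

@dataclass
class LocalityCalibrationConfig:
    generate_text: bool = True         # option 1: enable/disable text generation
    filter_by_topic: bool = True       # option 2: consider topic (or other) filters
    use_time_decay: bool = False       # option 3: weight similar articles by recency
    time_decay_half_life_days: float = 7.0

def decay_weight(age_days: float, half_life_days: float) -> float:
    """Exponential time-decay weight: more recent clicks count more."""
    return 0.5 ** (age_days / half_life_days)
```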

karlhigley and others added 9 commits October 29, 2024 11:50
This copies embeddings from a candidate set to a selected set of
articles, so that the embeddings can be used in downstream analysis for
metrics like ILD (intra-list diversity).
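For reference, ILD is commonly computed as the mean pairwise distance between the embeddings of the items in a list; a minimal sketch, assuming cosine distance and embeddings in a NumPy array (one row per item):

```python
import numpy as np

def intra_list_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance over a list of item embeddings."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    # Normalize rows so dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Average cosine *distance* over the off-diagonal pairs only.
    mask = ~np.eye(n, dtype=bool)
    return float((1.0 - sims[mask]).mean())
```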
Since we're using the pipeline implementation from the `lkpipeline`
module now, it's confusing to have this earlier implementation still
hanging around in the code base. Removing for clarity.
…-POPROX#124)

This grabs the candidate embeddings from each pipeline execution and
builds up a cache of embeddings over the course of the eval run. Once
the eval run finishes, it writes them to a two-column Parquet file where
the first column is the article id and the second is the corresponding
embedding.
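A minimal sketch of writing that kind of two-column file with Pandas (the column names here are assumptions, not necessarily those used in the eval code):

```python
import pandas as pd

# Embedding cache accumulated over the eval run: article id -> vector.
cache = {"a1": [0.1, 0.2], "a2": [0.3, 0.4]}

df = pd.DataFrame(
    {"article_id": list(cache.keys()), "embedding": list(cache.values())}
)
# The list-valued embedding column is stored as an Arrow list type.
df.to_parquet("embeddings.parquet")
```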
…RI-POPROX#127)

This brings us **parallel generation** of recommendations for offline
evaluation, and uses that to regenerate our offline recommendation
outputs (and update our reported timings).

This is a big PR, unfortunately, for a couple of reasons:

- Parallelism requires quite a bit of refactoring to encapsulate the
parallel operation in a worker function.
- `ipyparallel` doesn't work well with code defined in the script being
run, so most of the code (in particular the class definitions it uses,
but also various functions) is refactored into imported modules, so the
`ipyparallel` workers can find it correctly.

Highlights:

- Parallelize generation with the `ipyparallel` package (a minimal
worker sketch follows this list). I have found it to work well without
some of the intermittent bugs I've encountered with multiprocessing or
`ProcessPoolExecutor`, and it also makes it easy to run initialization
and finalization tasks on all workers to set up outputs or finish
writing them.
- Refactor `generate` into a package (with `__main__.py`, so we can
still run it with `python -m poprox_recommender.evaluation.generate`),
with various modules implementing its pieces.
- `recommendations` is now a directory instead of a single Parquet file;
software that reads Parquet (including Pandas `read_parquet` and
DuckDB) generally accepts directories of Parquet files and concatenates
the files they contain, to support sharding. We use this to shard the
output by worker process; each worker writes its own Parquet output.
Otherwise, sending data back to the parent and collecting or writing it
there becomes a bottleneck that severely impairs parallelism (I have
learned this through long experience with LensKit).
- Since each worker has its own output, we can no longer de-duplicate
embeddings while we write them. The solution I implemented: each worker
de-duplicates embeddings within its own output and writes the results to
a shared temporary directory; the parent process then collects all those
embeddings, de-duplicates them once more, and writes the result to a
single Parquet file.
- To support the per-worker writing, and to avoid slurping all outputs
into memory, this adds a batched Parquet writer that accumulates batches
of rows and writes them incrementally to the Parquet file (a sketch
follows this list).
- We can still run in a single process by passing `-j 1` to the
evaluation script. The default is to use the `POPROX_NUM_CPUS`
environment variable, falling back to the minimum of the CPU count and 4.
- This adds `recommend-mind-subset` and `measure-mind-subset` tasks to
generate and measure recommendations for a small subset (the first 1K
rows) of MIND, to facilitate quick testing of the offline eval code.
- The `poprox_recommender.evaluation.generate.outputs` module defines
the layout of the outputs from recommendation generation, so the layout
is defined in one place instead of having to keep the different pieces
of the generator consistent by hand.
- **Renames** the “MRR” user metric to “RR”: the individual per-list
metric is the reciprocal rank, and the mean of those values across lists
is the mean reciprocal rank (see the short sketch below).
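For readers unfamiliar with `ipyparallel`, here is a minimal sketch of the pattern described above, including the default worker-count logic; the module and function names are hypothetical, and the real code lives in `poprox_recommender.evaluation.generate`:

```python
import os

import ipyparallel as ipp

# Default worker count: POPROX_NUM_CPUS if set, else min(CPU count, 4).
n_workers = int(os.environ.get("POPROX_NUM_CPUS", min(os.cpu_count() or 1, 4)))

def main(requests):
    # The worker function must live in an importable module (not in the
    # launching script) so the ipyparallel engines can find it.
    from poprox_eval.worker import recommend_for_request  # hypothetical module

    with ipp.Cluster(n=n_workers) as client:
        view = client.load_balanced_view()
        # Each engine runs independently; per-worker setup/teardown
        # (e.g. opening a sharded Parquet output) can be run on all
        # engines before dispatching the work.
        view.map_sync(recommend_for_request, requests)
```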
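And a minimal sketch of a batched Parquet writer of the kind described above, assuming PyArrow; the actual implementation in this PR may differ in names and details:

```python
import pyarrow as pa
import pyarrow.parquet as pq

class BatchedParquetWriter:
    """Accumulate rows and write them to a Parquet file in batches."""

    def __init__(self, path: str, schema: pa.Schema, batch_size: int = 5000):
        self.writer = pq.ParquetWriter(path, schema)
        self.schema = schema
        self.batch_size = batch_size
        self.rows: list[dict] = []

    def write_row(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            table = pa.Table.from_pylist(self.rows, schema=self.schema)
            self.writer.write_table(table)
            self.rows = []

    def close(self) -> None:
        self.flush()
        self.writer.close()
```

Because each worker writes its own shard into the output directory, reading the results back is then just something like `pd.read_parquet("recommendations/")` (or DuckDB's `read_parquet` over the directory), which concatenates the shards.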
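Finally, to make the RR/MRR distinction concrete, a small sketch assuming binary relevance from clicks:

```python
def reciprocal_rank(recs: list[str], clicked: set[str]) -> float:
    """RR for one list: 1/rank of the first relevant item, else 0."""
    for rank, item in enumerate(recs, start=1):
        if item in clicked:
            return 1.0 / rank
    return 0.0

# MRR is then simply the mean of the per-list RR values across users.
```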
@sophiasun0515 sophiasun0515 changed the base branch from xinyili/development to main November 6, 2024 18:46
@sophiasun0515 sophiasun0515 changed the base branch from main to xinyili/development November 6, 2024 18:51
Zentavious and others added 17 commits November 8, 2024 09:17
Add generic poprox export support to generate.py
Several relatively small improvements to our Docker config:

- configure Serverless to build for `linux/amd64` explicitly
- update deploy.sh to run Serverless properly with npx, and only pull
the necessary DVC data
- bump the Pixi version in the Docker images
- clean up the Docker image a bit, removing DNF caches (saves 60MB in
the final image)
- add `outputs/` and `.pixi/` to `.dockerignore`, speeding up Docker
build startup considerably
imrecommender and others added 30 commits December 4, 2024 01:09
Remove stray reference to SentenceTransformers
Adding support for date and pipeline splitting to generate
Bake the MiniLM model into the Docker container for deployment
Aggregate Metrics by Recommender, Locality Theta, and Topic Theta for Locality-Cali Pipeline
Unifying evals