Weekly Status Log: Caching Project #1

Open
domoritz opened this issue Sep 11, 2024 · 13 comments
@domoritz
Member

domoritz commented Sep 11, 2024

This is for uwdata#385. Code is in https://github.com/cmudig/mosaic/tree/cache.

@domoritz
Member Author

domoritz commented Sep 11, 2024

September 11, 2024

  • Discussed plan for setting up experiment
  • We need to log: query, size (use the Apache Arrow blob size), query time/latency, hit rate, ...
  • Eventually we want a cache that works well across different scenarios: latency, workload, cache size, etc.
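
A minimal sketch of what one log record could look like, assuming the result is available as Arrow bytes (an `ArrayBuffer` or typed array, both of which expose `byteLength`). The function name `makeLogEntry` and the field names are hypothetical and not part of Mosaic.

```javascript
// Hypothetical log-entry shape; names are illustrative assumptions.
function makeLogEntry(query, resultBytes, startMs, endMs, hit) {
  return {
    query: String(query),              // the SQL text that was issued
    sizeBytes: resultBytes.byteLength, // size of the Arrow blob in bytes
    latencyMs: endMs - startMs,        // elapsed time for the query
    hit                                // true if served from the cache
  };
}

// Example: a 16-byte result that took 15 ms and missed the cache.
const entry = makeLogEntry('SELECT 1', new Uint8Array(16), 10, 25, false);
```

Keeping the hit flag per entry means the hit rate can be derived later from the log rather than stored separately.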

@audilin

audilin commented Sep 12, 2024

September 12, 2024

current goal: create a script to collect data on current LRU caching strategy
the script will:

  • run commands on dataset locally
  • keep track of cache & queries → log important data somehow

in order to do this, we first:

  • need to figure out what happens on the server side
  • need to figure out how queries work

side note: need to figure out how to measure latency

questions:

  • should the script be inside pre-existing files or should it be separate?
  • how does the logging work, i.e. can we write to a txt file or something other than the default JS log?

@audilin

audilin commented Sep 18, 2024

September 18, 2024

what we have:

  • logging integrated into current cache file, gets current log: queries, latency, cache size, etc.

next steps:

  • make download button for current log (write a new, custom logger), only saves if key is Query
  • separate folder/webapp for analysis, maybe use observable framework
  • figure out how to download to another file using JSON object (tutorial) (example)
  • see what pre-existing "Log Queries" feature does, and determine how to best merge it into what we're doing
  • write instructions in the README on how to collect logs, and add features such as a reset button
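
One possible way to save a log as a JSON file from the browser, sketched under assumptions: `logToJson` and `downloadJson` are hypothetical helper names, and the default filename is only a placeholder. The serialization step is pure; the download step uses the standard `Blob`, `URL.createObjectURL`, and anchor `download` mechanisms and only works in a browser.

```javascript
// Pure step: turn recorded entries into pretty-printed JSON text.
function logToJson(entries) {
  return JSON.stringify(entries, null, 2);
}

// Browser-only step: offer the text as a file download.
// The filename default is a placeholder assumption.
function downloadJson(text, filename = 'cache.json') {
  const blob = new Blob([text], { type: 'application/json' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename; // suggests the name of the saved file
  a.click();
  URL.revokeObjectURL(url); // release the temporary object URL
}
```

A button's click handler could then call `downloadJson(logToJson(entries))`, where `entries` is whatever array the logger maintains.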

@AllllenLuo

AllllenLuo commented Sep 25, 2024

September 25th

What We Have:

  • A web interface that allows the user to upload a JSON file and displays all the logs in a table view.

What's Next:

  • Download the cache as a JSON file; show a download button when the user checks the "Log Query" checkbox.
  • Move the webapp to the package folder and allow both uploading new files and reading existing files. Checkboxes can be used to select and compare data across several log files.
  • Build a cache interface that allows adding an item and checking for an item. It should be generic and apply to different cache algorithms.
  • On the webapp page, compute the hit rate and miss rate over items (or, more accurately, over size) and create corresponding plots.
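
A rough sketch of the generic interface and the hit-rate computation described above, under stated assumptions: every strategy is assumed to expose `has(key)` and `add(key, size)`, log entries are assumed to look like `{ query, size }`, and the names `LRUCache` and `replay` are hypothetical. LRU is shown only as one strategy that fits the interface.

```javascript
// Size-bounded LRU cache. A Map keeps insertion order, so the first key
// is always the least recently used one.
class LRUCache {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.items = new Map(); // key -> size in bytes
  }
  // Check for a key; a hit also marks the key as most recently used.
  has(key) {
    if (!this.items.has(key)) return false;
    const size = this.items.get(key);
    this.items.delete(key);
    this.items.set(key, size);
    return true;
  }
  // Add an item, evicting least recently used items until it fits.
  add(key, size) {
    if (size > this.maxBytes) return; // can never fit, so skip it
    if (this.items.has(key)) {
      this.usedBytes -= this.items.get(key);
      this.items.delete(key);
    }
    while (this.usedBytes + size > this.maxBytes) {
      const [oldKey, oldSize] = this.items.entries().next().value;
      this.items.delete(oldKey);
      this.usedBytes -= oldSize;
    }
    this.items.set(key, size);
    this.usedBytes += size;
  }
}

// Replay a log against any cache with has/add; report the hit rate
// counted per item and weighted by size.
function replay(log, cache) {
  let hits = 0, hitBytes = 0, totalBytes = 0;
  for (const { query, size } of log) {
    totalBytes += size;
    if (cache.has(query)) { hits++; hitBytes += size; }
    else cache.add(query, size);
  }
  return {
    hitRate: log.length ? hits / log.length : 0,
    byteHitRate: totalBytes ? hitBytes / totalBytes : 0
  };
}
```

Because `replay` only depends on `has` and `add`, another algorithm can be compared on the same log by passing a different cache object. The miss rate is one minus the corresponding hit rate.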

@AllllenLuo

AllllenLuo commented Oct 2, 2024

October 2nd

What's New This Week:

  • A cache webapp that allows the user to upload a single JSON file and enter a custom cache size.
  • Displays the hit rate based on the input (currently supports an LRU cache)
    image

Questions:

  • Do we need a hit rate vs. cache size plot? If so, how should we pick the range of cache sizes?

What's Next:

  • To be discussed during the regular meeting
  • Remove the get/set distinction; depend on whether the query is executed on the server side
  • Create a plot of cache size vs. hit rate; calculate the sum of all distinct query sizes, so that the hit rate reaches 100% at the end of the plot.
  • Create a plot of what percentage of the cache is occupied.
  • Create another branch for a cache that depends on size.
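
One way to pick the x-axis range for the cache size vs. hit rate plot, sketched under the assumption that log entries look like `{ query, size }`; `totalDistinctSize` and `cacheSizeRange` are hypothetical names. A cache as large as the sum of all distinct query sizes never needs to evict, so larger sizes cannot improve the hit rate further.

```javascript
// Sum of result sizes over distinct queries (last seen size wins).
function totalDistinctSize(log) {
  const sizes = new Map();
  for (const { query, size } of log) sizes.set(query, size);
  let total = 0;
  for (const size of sizes.values()) total += size;
  return total;
}

// Evenly spaced cache sizes, ending at the no-eviction size.
function cacheSizeRange(log, steps = 10) {
  const max = totalDistinctSize(log);
  const sizes = [];
  for (let i = 1; i <= steps; i++) {
    sizes.push(Math.round((max * i) / steps));
  }
  return sizes;
}
```

Note that if the hit rate is defined as hits divided by all queries, the curve levels off below 100% at the largest size, because the first occurrence of each query is always a miss.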

mhli1260 pushed a commit that referenced this issue Oct 2, 2024
@audilin

audilin commented Oct 8, 2024

October 8, 2024

What we have:

  • Created a working download button for cache.json
    Image
  • Created two graphs on the cache webapp: one for cache size vs. hit rate, and another for log index vs. cache usage rate
    Image
    Image

Questions:

  • what do the pre-existing buttons do? specifically "Log Queries, Query Cache, Query Consolidation"
  • how should these buttons affect the downloaded cache

Next steps:

  • what other information about the cache should be collected, and how do I do this
  • will need to update what information is downloaded based on pre-existing buttons

@domoritz
Member Author

domoritz commented Oct 8, 2024

I think Log queries and log cache should be the same.

Also, shouldn't the hit rate be 1 at some cache size?

@AllllenLuo

Please correct me if I'm wrong, but I think hit rate = number of hit queries / total number of queries. And for example when we are doing the first query, we are guaranteed to miss that query since nothing is in the cache, so I think the hit rate should not be 1.

@domoritz
Member Author

domoritz commented Oct 8, 2024

Maybe we should measure the hit rate relative to repeated queries so that 1 == perfect cache. But maybe we should measure both?
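
A small sketch of measuring both, assuming each log entry carries the query text and a `hit` flag (the entry shape and the name `hitRates` are hypothetical). The overall rate divides hits by all queries; the repeated rate divides hits by only those queries that had been seen before, so a cache that never misses a repeat scores exactly 1.

```javascript
// Compute both hit-rate definitions from a list of { query, hit } entries.
function hitRates(entries) {
  const seen = new Set();
  let hits = 0;
  let repeats = 0;
  for (const { query, hit } of entries) {
    if (seen.has(query)) repeats++; // this query appeared earlier
    else seen.add(query);           // first occurrence, a guaranteed miss
    if (hit) hits++;
  }
  return {
    overall: entries.length ? hits / entries.length : 0,
    repeated: repeats ? hits / repeats : 0
  };
}
```

For a two-query log where the second query repeats the first and hits, `overall` is 0.5 while `repeated` is 1.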

@AllllenLuo

AllllenLuo commented Oct 9, 2024

What's next week:

  • Separate the different graphs by sections, highlight the correlation between the input & graph

  • Drop down for file selection

  • Add checkbox for the log queries, show whether each query is hit/miss

  • UI improvement: use a slider for cache size instead of a text input

  • instead of looking at what’s in the cache, look at what’s queried and submitted in the query manager (QueryManager.js)

    • can use pre-existing record() function
  • log queries checkbox: prints out queries that are sent to the back end

  • record queries checkbox: save all queries in an array

  • add download button to download recorded queries

@AllllenLuo

AllllenLuo commented Oct 22, 2024

October 23, 2024

What's New:

  • File dropdown selection
  • New plot showing the hit rate for repeated queries specifically (the maximum is 100% for the right graph). For the plot on the left, we keep using the formula hit rate = hit query count / total query count
    Image
  • Separated by sections: all the plots related to the custom cache size are placed in the section called "Plots Based On Cache Size"
  • Improved UI: the user can now set the cache size through a slider
  • The log queries table now indicates whether each query is a hit or a miss
    Image

What's Next:

  • Add interaction between plots (e.g., create lines in the hit rate graph that can change the cache size in the later section)
  • Implement different cache algorithms
  • Open a pull request to make the cache size-based in the main branch

@audilin

audilin commented Nov 12, 2024

November 12, 2024

  • add a new record-result function to the query manager that records when the result of a query comes back
  • we want the result, the time it took to run the query, and the size of the result
  • alternate solution: write proxy for connector?? since we only care about the consolidated queries sent
    • something like this: Image

so many questions about the pre-existing record function, its purpose, and why there are multiple recorders

@domoritz
Member Author

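// Assumes socketConnector, restConnector, wasmConnector, coordinator, and a
// module-level `wasm` variable are already in scope where this is defined.
export async function setDatabaseConnector(type, addLogger) {
  let connector;
  switch (type) {
    case 'socket':
      connector = socketConnector();
      break;
    case 'rest':
      connector = restConnector();
      break;
    case 'rest_https':
      connector = restConnector('https://localhost:3000/');
      break;
    case 'wasm':
      // Reuse the existing WASM connector if one was already created.
      connector = wasm || (wasm = wasmConnector());
      break;
    default:
      throw new Error(`Unrecognized connector type: ${type}`);
  }
  console.log('Database Connector', type);

  // Optionally wrap the connector so every query sent to the database is logged.
  if (addLogger) {
    connector = loggerConnector(connector);
  }

  coordinator.databaseConnector(connector);
}

// Proxy connector: forwards each query unchanged and records it with its result.
export function loggerConnector(connector) {
  const logs = [];

  return {
    snapshot() {
      return logs;
    },
    async query(query) {
      const result = await connector.query(query);
      // Store one entry per query so each result stays paired with its query.
      logs.push({ query, result });
      return result;
    }
  };
}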
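
The same proxy idea could also capture the query time and result size mentioned earlier in the thread. This is only a sketch under assumptions: `timingConnector` is a hypothetical name, the wrapped connector is assumed to expose an async `query(query)` method as in the snippet above, and the result is assumed to expose `byteLength` (otherwise the size is recorded as `null`). The clock and size function are parameters so they can be swapped out.

```javascript
// Wrap a connector and record latency and result size for each query.
function timingConnector(
  connector,
  now = () => performance.now(),            // clock in milliseconds
  sizeOf = (r) => r?.byteLength ?? null     // assumed way to get result size
) {
  const logs = [];
  return {
    // Return a copy so callers cannot mutate the internal log.
    snapshot() {
      return logs.slice();
    },
    async query(query) {
      const start = now();
      const result = await connector.query(query);
      logs.push({
        query,
        latencyMs: now() - start,
        sizeBytes: sizeOf(result)
      });
      return result;
    }
  };
}
```

Unlike storing the full result, this keeps only summary numbers per query, which keeps the downloaded log small.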
