-
-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exploring a possible ungroup operation #110
Comments
I don't think there is any equivalent to I'd probably do it like this:
User 09:00 10:00 11:00 12:00 13:00 etc... ====================================================================== A 1 1 0 1 1 B 0 1 1 0 1
There are so many ways you could do this depending on the outcome you want. Please feel free to share the code with and I can help you with it. |
Apologies. I think I phrased my comment poorly but I added this to make a case for adding an ungroup function. It was an example meant to show the type of operation that might require an ungroup function. It has been useful for specific schedule operations I've had in the past, especially for cases where I need to group, do an operation, and then group that group with another operation before ungrouping them all. With regards to your response above, I think the first step can't be filter out breaks for all users because that first operation actually involves flattening a users' specific schedule into work intervals, not removing the breaks. It combines all the overlapping intervals into a single non-overlapping interval based on precedence. I probably could have better represented it by showing an intermediate step
The original work interval has the break and lunch interval removed. The end result is the time when an employee is doing productive work (and not on break or lunch). In reality, that part needs to happen on a per-user basis because there may be multiple kinds of overlap (eg a team meeting overlapping lunch doesn't result in -2 headcount...but you have to look at the specific user's schedule to determine if it is overlapping another break in addition to the work interval. |
I was thinking a bit more about this...perhaps the existing groupBy function can do something similar. I'm not experienced enough with it but perhaps
I really need to play around with data-forge more to understand this part. Honestly, there is no immediate need to get hung up on it so feel free to punt this or close it for a later visit. Just tossing around an idea. |
Yeah groupBy is probably close to what you need. If you find any good examples of how this kind of thing works in Pandas or R please share them here. I'll keep this ticket open and reconsider it again in the future. |
This came up in the data forge 2 discussion (Plans for Data Forge version 2).
I suspect you may be able to do this already in Data Forge but let me lay out my use case. In this example, I want to represent employee schedules in a DataFrame. A schedule for a single user consists of a several rows. Each row contains an interval with a start, end, type (such as work, lunch, break), and user to tie them all together.
Here's an example. I'm representing the interval by using a line such as |------| where the first | is the start (such as 9am) and the second | is the end (such as 5pm).
Example DataFrame - showing a schedule
The operation I want to do is take the schedules for all users, and determine the number of productive headcount I have in a given interval. So in the case of a big department store, I'm trying to determine the number of people working at a given time which I would likely use to ensure that we were staffed for peak demand.
So there are two operations I need to create custom code for. One is to convert a user's schedule into productive time. This removes all the breaks such that I only have the time when they are physically working.
operation: toProductiveTime
The second operation needs to aggregate all these productive intervals into a headcount number
operation: toAggregateChunks
So in order to do this, I want to do the following
I imagine the code might look like
Is this a crazy? To me, it feels very natural to group, do some operations on each subgroup, and then ungroup. It feels easy to read and the operations themselves are separately testable (which is good because they are complicated). I've implemented something just like the above in the past that didn't require loading all the data into memory...but I'm not sure if the way I think about this problem is overly complicated compared to how others might approach it.
The text was updated successfully, but these errors were encountered: