Skip to content

Tags and deps #551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 39 commits into from
Closed

Tags and deps #551

wants to merge 39 commits into from

Conversation

gbaz
Copy link
Contributor

@gbaz gbaz commented Oct 23, 2016

This is a cleanup of #514 that should supersede it. Changed revdeps in the display to give more useful numbers (after nub applied), made maintainers actually maintainers and not metadata uploaders, made display of already made votes work, improved table styling, added deprecation info to table, and other small UI improvements and cleanups. I'm pretty happy with the result.

I think there's more work to be done on the deploy branch with regards to improving the frontpage and some other docs, but I'm relatively happy here.

Soorya Narayan and others added 20 commits October 9, 2016 21:53
Cateories are parsed into tags. Helper functions to allow for tags to be proposed and accepted set up

Cleaned up the interface
Added Author and Maintainer fields to PackageItem
jQuery DataTables used for the page
Added votes as a queryable parameter
UI changes to propose/accept tags
A clickable UI for accepting/rejecting tags
Added get method to /package/:pkgname/tags
shifted forms to a separate template
When a trustee aliases Tag abcd -> Tag abc, all packages that were earlier
tagged `abcd` get tagged to `abc` and any new packages tagged `abcd` get
retagged `abc` on upload
Added backup functions for tag alias
Fixed tag aliasing bug on startup
Add link-to-search on "Package not found" page
display the names of the executables that a package can build
Limit revdeps to keep track of latest 5 versions
Represent all the dep stats as ReverseCount
HTML views for Revdeps by PackageID
Popularity Metric based on RevDeps/Votes/Downloads
Tag pages with packages by popularity
Add maintainers link to dataTables interface
Fixes category parsing error.
"a,b," now parsed to ["a","b"] as opposed to ["a","b",""]
Minor UI fixes
Fixes tag-edit.html error (caused by rebase)
multiple maintainers (if present) show up in the search tables as
opposed to just the latest uploader
adaptive doubling for rev deps graph
reverse dependencies and voting don't redirect
spacing and UI fixes as pointed
A simple related packages view
(currently based on bm25f results)
A visualization of the package database
Edited the output of /packages/search to use the datatables interface
@gbaz gbaz mentioned this pull request Oct 23, 2016
@gbaz gbaz mentioned this pull request Oct 23, 2016
It doesn't need to be in the interface for the core
We were writing changes into a channel but never reading from it.
Looks like dead code. It was being updated but never read.
These days we pretty much have to have all GETS provide some sane cache
control headers.
Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a few changes directly. This will take a bit more review yet.

let revs = constructReverseIndex index
update $ ReplaceReverseIndex revs
}
initReverseIndex :: IO ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being run every time, so we're not actually benefiting from this as a persistent DB. We could just as well use a MemState and not have the complexities of persistent state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we keep this and get rid of the persistent component, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

index <- queryGetPackageIndex
let ReverseIndex _ revdeps nodemap = constructReverseIndex index
assoc = takeWhile (\(a,_) -> a < Bimap.size nodemap) $ Arr.assocs . Gr.transposeG $ revdeps
forM_ (map (\(a,b) -> (nodemap Bimap.!> a, map (nodemap Bimap.!>) b)) assoc) $ uncurry setReverse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost certainly quadratic since it's doing each package one by one rather than in bulk.

packageNodeIdMap = nodemap
}

changePackage :: PackageId -> [PackageId] -> [PackageId]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused. Should it be used?


arrDouble :: RevDeps -> RevDeps
arrDouble rev =
Gr.buildG (0, 2*revSize rev) (Gr.edges rev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how immutable graphs work. Doubling the size helps nothing, since updating the graph has to copy anyway. That is, insEdges is already O(n), and if needs to add more nodes then that can be done in one pass at the same time, using accumArray with the new graph bounds.

let index = PackageIndex.insert pkgid pindex
rd' | revSize revs < Bimap.size nodemap = arrDouble revs | otherwise = revs
npm = Bimap.tryInsert pkgid (Bimap.size nodemap) nodemap
rd = insEdges (map (\d -> (npm ! d, [npm ! pkgid])) deps) rd'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This graph enlarging ought to be done differently. Array size doubling is bogus for pure immutable arrays.

resourceGet = [("html", serveReverseList)]
}
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly these all need cacheControl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto as above -- not quite sure what to do for these, though happy to consult.

putAliasEdit :: DynamicPath -> ServerPartE Response
putAliasEdit dpath = do
tagname <- tagInPath dpath
mergeTags (Tag tagname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mergeTags isn't at the right abstraction level. It includes front-end things like looking up query params and then performing some state update. Instead all the front-end bits should live here and the code in the underlying feature should just deal with validation (if any) and the state/cache updates.

We can fix this by moving parts of the mergeTags code out to here.

addns <- optional $ look "addns"
delns <- optional $ look "delns"
raddns <- optional $ look "raddns"
rdelns <- optional $ look "rdelns"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These front-end bits don't really belong here. putTags should accept all these things as function args and the html layer should get them from the POST params and do the parsing.

@@ -195,20 +230,78 @@ tagsFeature CoreFeature{ queryGetPackageIndex
pkgMap <- liftM packageTags $ queryState tagsState GetPackageTags
return $ Map.fromDistinctAscList . map (\pkg -> (pkg, Map.findWithDefault Set.empty pkg pkgMap)) $ Set.toList pkgs

mergeTags :: Tag -> ServerPartE ()
mergeTags deprTag = do
tags <- optional $ look "tags"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These front-end bits don't really belong here. mergeTags should accept the tags as function args and the html layer should do the lookup and parsing.

@@ -1519,26 +1599,50 @@ mkHtmlTags HtmlUtilities{..}
pkgname <- packageInPath dpath
_ <- lookupPackageName pkgname -- TODO: necessary?
putTags pkgname
return $ toResponse $ Resource.XHtml $ hackagePage "Set tags"
[toHtml "Put tags for ", packageNameLink pkgname]
currTags <- queryTagsForPackage pkgname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point about mergeTags applies to putTags here.

-- The main reverse dependencies map
reverseDependencies :: RevDeps,
data ReverseIndex = ReverseIndex {
duplicatedIndex :: !(PackageIndex PackageId),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this genuinely is redundant. It is used during lookups, but these are cases where we could grab the PackageIndex from the core feature and use that in combination. This will be a tad easier if we switch to using a MemState. Then the query functions will just need to take the ReverseIndex and the core PackageIndex to do their stuff.

It's not small so it'd be better if we can eliminate it.

where forComponent (_, iftree, elsetree) = harvestDependencies iftree ++ maybe [] harvestDependencies elsetree
constructRevDeps :: PackageIndex PkgInfo -> Bimap PackageId NodeId -> RevDeps
constructRevDeps index nodemap =
let allPackages = concatMap (take 5) $ PackageIndex.allPackagesByName index
Copy link
Contributor

@dcoutts dcoutts Nov 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what this take 5 is about? Is it trying to take the 5 latest versions? Are we sure it's not actually getting the 5 oldest?

Also this is not consistent. This is only used on startup, but once we're in a steady state then we get all new versions not just the last 5.

addns = toStr $ fst revTags
delns = toStr $ snd revTags
trustainer <- authorisedAsMaintainerOrTrustee pkgname
user <- authorisedAsAnyUser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't likad adding these new ad-hoc auth functions (and particular the new ones they require in turn), and it's better not to add new auth functions to reduce possible confusion. We can just fail immediately if they're not authorised as any user, so overall we can use:

uid <- guardAuthorised [AnyKnownUser]
trustainer <- checkPriviledged uid [InGroup (maintainersGroup pkgname), InGroup trusteesGroup]

now ok having said lets not add new auth functions there's a reasonable argument that we should provide a wrapper for checkPriviledged in the Users feature that does what the underlying checkPriviledged does but supplies the Users db appropriately (which is what I'm implicitly using above). This at least isn't ad-hoc since it follows the existing pattern.


authorisedAsAnyUser :: ServerPartE Bool
authorisedAsAnyUser =
guardAuthorised' [AnyKnownUser]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to add these new auth functions, as mentioned above, but adding a wrapper here for checkPriviledged here is ok. e.g.

checkPriviledged :: UserId -> [PrivilegeCondition] -> ServerPartE Bool
checkPriviledged uid privconds = do
    users <- queryGetUserDb
    Auth.checkPriviledged users uid privconds

@@ -412,6 +421,11 @@ userFeature templates usersState adminsState
overrideResponse <- msum <$> runHook authFailHook err
throwError (fromMaybe defaultResponse overrideResponse)

-- Check if there is an authenticated userid, and return info, if so.
getAuthentication :: ServerPartE (Maybe (UserId, UserInfo))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called checkAuthenticated since that's what it wraps and that's the pattern here.

instance MemSize a => MemSize (IntMap a) where
memSize m = sum [ 8 + memSize v | v <- IntMap.elems m ]

instance MemSize Graph where
memSize m = sum [ 6 + memSize v | v <- Gr.edges m ]
Copy link
Contributor

@dcoutts dcoutts Nov 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just have an instance for ordinary boxed Arrays then this would be automatic. We've got one for unboxed arrays, boxed should be easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the right instance offhand, but feel free to go for it... (shouldn't be a blocker either way)

@gbaz
Copy link
Contributor Author

gbaz commented Nov 26, 2016

Ok, so this should fix everything except for the cache control stuff, which I'll leave to @dcoutts (though I'm happy to consult). Everything with the revdep index is now created directly at startup with no persistence, but it should still be relatively fast I think. Worth testing with a full hackage index just to be sure. Also redid the auth stuff as requested, using getAuthorised' from the users feature, which already existed and was good enough...

@gbaz
Copy link
Contributor Author

gbaz commented Dec 5, 2016

Any chance of reviewing this soon?

@gbaz
Copy link
Contributor Author

gbaz commented Dec 19, 2016

?

@gbaz
Copy link
Contributor Author

gbaz commented Mar 22, 2018

@hvr I think that with the new revdep branch I made there's nothing left on here that hasn't found its way elsewhere?

@gbaz
Copy link
Contributor Author

gbaz commented Jan 1, 2023

everything has now been merged in pieces i believe.

@gbaz gbaz closed this Jan 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants