Home
Welcome to the Perl-Quiz wiki! test thing.
Here is a conversation we had; we wanted to save some of the ideas from it.
[9:35:48 PM] Max: working on it.
[9:35:56 PM] Max: my mic is a little messed up
[9:36:03 PM] Max: so i'll be able to in a sec
[9:36:05 PM] Alex: kk. dont worry about it
[9:36:08 PM] Max: once i get that figured out
[9:46:44 PM] Max: so i can't figure it out
[9:46:53 PM] Max: do you want to just use a phone?
[9:46:59 PM] Alex: yeah sure. skype was to show you my screen
[9:47:19 PM] Max: ok
[9:47:24 PM] Max: then we can skype without sound
[10:09:52 PM] Max: if ($years)
{
    # build one big "or" regex from the time preposition dictionary (one word per line)
    open(TIMEPREPS, "<time_preps.txt") or die "Can't find time preposition dictionary time_preps.txt\n";
    foreach (<TIMEPREPS>)
    {
        chomp;
        $timeprepregex .= "( ".$_." )|"; # this way they can be a regex of "or" expressions
                                         # like ( in )|( during )|...
    }
    chop($timeprepregex); # to take the last "|" off
    close(TIMEPREPS);
}
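
The same alternation could also be built from a list of words with `join`, which avoids the trailing "|" altogether. This is only a sketch: `build_prep_regex`, the sample words, and the sample sentence are made up for illustration and are not part of the project.

```perl
use strict;
use warnings;

# Hypothetical helper: build the same "( in )|( during )|..." alternation
# from a list of words instead of reading a file.
sub build_prep_regex {
    my (@words) = @_;
    # quotemeta keeps any regex metacharacters in a word literal.
    return join '|', map { '( ' . quotemeta($_) . ' )' } @words;
}

# Sample words and sentence are assumptions for the example only.
my $timeprepregex = build_prep_regex(qw(in during since));
my $sentence      = 'She lived there during 1999.';
print "found a time preposition\n" if $sentence =~ /$timeprepregex/;
```

The spaces inside each group are literal, so a word only matches when it has a space on both sides.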
[11:21:08 PM] Max: i just had a really good idea.
[11:22:16 PM] Alex: hit me
[11:22:35 PM] Max: true/false questions
[11:22:51 PM] Max: automatically replace words at a .5 frequency
[11:23:03 PM] Max: with candidate answers
[11:23:06 PM] Max: same curation process
[11:23:13 PM] Max: so that you can change the word that was replaced
[11:23:30 PM] Alex: yeah. not getting the .5 freq part
[11:23:36 PM] Alex: replace what words?
[11:23:46 PM] Max: just saying that half the time we don't need to change anything
[11:24:03 PM] Max: any words of relative high frequency
[11:24:10 PM] Alex: oh okays
[11:24:13 PM] Max: we need a way to talk about words "of relative high frequency"
[11:24:22 PM] Max: we can call them "important" or something of the like
[11:24:36 PM] Alex: the .5 freq part confused me with high freq
[11:24:39 PM] Alex: important works
[11:24:44 PM] Alex: important words are important
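
A rough sketch of the true/false idea, assuming we already have a true sentence and a table of "important" words with wrong candidate answers. The name `make_true_false`, the `%candidates` table, and the sample sentence are all hypothetical; how candidates get picked is not decided in the chat.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a true sentence into a true/false question.
# $sentence   - a statement assumed to be true
# $candidates - hashref mapping an "important" word to a wrong replacement
# $swap       - true to swap in a wrong word, false to leave the sentence alone
sub make_true_false {
    my ($sentence, $candidates, $swap) = @_;
    return ($sentence, 'true') unless $swap;

    # Try the important words in sorted order so the result is repeatable.
    for my $word (sort keys %$candidates) {
        my $statement = $sentence;
        if ($statement =~ s/\b\Q$word\E\b/$candidates->{$word}/) {
            return ($statement, 'false');
        }
    }
    return ($sentence, 'true');    # nothing to replace, so it stays true
}

# About half the time (the ".5 frequency"), leave the sentence unchanged.
my %candidates = (Paris => 'Lyon');
my ($statement, $answer) =
    make_true_false('Paris is the capital of France.', \%candidates, rand() < 0.5);
print "True or false: $statement ($answer)\n";
```

Passing the coin flip in as `$swap` keeps the helper easy to test, and the curation step could edit `%candidates` by hand before questions are generated.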
[11:31:02 PM] Max: also
[11:31:06 PM] Max: questions of the form
[11:31:10 PM] Max: which of the following is true
[11:31:25 PM] Max: I II and IV, IV only
[11:31:31 PM] Max: those questions that everyone hates
[11:31:58 PM] Alex: like the one with I and II as an answer
[11:32:17 PM] Alex: I, II, IV
[11:32:23 PM] Alex: II, III
[11:32:58 PM] Alex: er anywho, write these ideas down somewhere
[11:33:09 PM] Alex: as guys, we dont like to write ideas down
[11:33:16 PM] Alex: but we must
[11:33:29 PM] Max: there's a wiki running that my brother puput
[11:33:33 PM] Max: put up*
[11:33:43 PM] Max: it's somewhere... i haven't used it in a year
[11:33:53 PM] Max: because this idea started about a year ago
[11:33:58 PM] Alex: check box answer questions? i.e. check all of the following that are true
[11:34:05 PM] Alex: okays
[11:34:47 PM] Alex: or the check box thingy would be good for implementing the most highly correlated words with given subject idea you had
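
A possible sketch for the "which of the following is true" and check box style: number the statements with Roman numerals and collect the true ones into an answer key like "I, III". The helper name `build_multi_true` and the sample statements are assumptions for illustration, and the numeral list only covers five statements.

```perl
use strict;
use warnings;

my @numerals = qw(I II III IV V);

# Hypothetical helper: given [statement, is_true] pairs (at most five),
# return the numbered lines and the answer key, e.g. "I, III".
sub build_multi_true {
    my (@items) = @_;
    my (@lines, @true);
    for my $i (0 .. $#items) {
        my ($text, $is_true) = @{ $items[$i] };
        push @lines, "$numerals[$i]. $text";
        push @true, $numerals[$i] if $is_true;
    }
    my $key = @true ? join(', ', @true) : 'none';
    return (\@lines, $key);
}

# Sample statements are made up for the example.
my ($lines, $key) = build_multi_true(
    ['Paris is the capital of France', 1],
    ['Berlin is the capital of Spain', 0],
    ['Rome is the capital of Italy',   1],
);
print "Which of the following are true?\n";
print "$_\n" for @$lines;
print "Answer: $key\n";
```

The false statements could come from the same word replacement used for true/false questions.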
[11:37:32 PM] Max: right
[11:37:45 PM] Max: i feel like once a true/false generator is built
[11:37:53 PM] Max: there are a lot of other things that are just a step away
[11:38:06 PM] Max: and i don't really feel like the true false generator is really that far-fetched
[11:38:08 PM] Max: oh!
[11:38:13 PM] Max: we can use the wiki on github if you want
[11:39:42 PM] Alex: sure, ill look for it in a bit
[11:39:57 PM] Alex: trying to get some progress with country stuff
[11:42:34 PM] Max: have a minute to chat about the common words?
[11:42:42 PM] Max: it will just take a second
[11:42:49 PM] Alex: sure
[11:43:46 PM] Max: okay
[11:43:49 PM] Max: i was trying to email you the file
[11:44:03 PM] Max: but then the poop hit the fan
[11:44:09 PM] Max: and my browser crashed
[11:44:11 PM] Max: anywho
[11:44:18 PM] Alex: awe
[11:44:39 PM] Max: the file is formatted word, number of files out of 1000 it occurred in, overall frequency
[11:44:56 PM] Max: and it's sorted by the second entry
[11:45:14 PM] Alex: okays
[11:45:24 PM] Max: so i was wondering if we could come up with some sort of magic formula using the second two to figure out if it needed manual intervention
[11:45:25 PM] Max: like
[11:46:50 PM] Max: number of files it occurred in divided by (overall frequency - 1000)
[11:46:54 PM] Max: email sent.
[11:49:03 PM] Alex: didnt get it yet
[11:49:34 PM] Alex: so is the idea to remove common words?
[11:49:40 PM] Alex: from the dictionary
[11:51:02 PM] Max: the idea is to kind of moderate words which have an unusually high frequency
[11:51:10 PM] Max: solely because we're using wikipedia pages
[11:52:52 PM] Alex: so if number of files its in divided by overall freq is a number greater than a third you could think it would be article or something
[11:53:12 PM] Max: one idea is to look at words which have an overall frequency very close to the frequency of the articles they appear in
[11:53:19 PM] Alex: still no email btw
[11:53:55 PM] Alex: that, and like close within a small multiple of the number articles
[11:54:04 PM] Max: yeah i was thinking that too
[11:54:11 PM] Max: like look at "stubshidden"
[11:54:12 PM] Alex: for example article might appear 5 time per wiki page
[11:54:29 PM] Max: there's a weird one
[11:54:30 PM] Alex: i would if i had the email :P
[11:54:35 PM] Max: i sent it man
[11:54:39 PM] Max: like 10 minutes ago
[11:54:42 PM] Max: i even sent another one
[11:54:43 PM] Alex: i know. it doesnt like me
[11:56:56 PM] Max: did it go through yet
[11:57:41 PM] Alex: nope
[11:57:53 PM] Max: anyway here are the words that i came up with
[11:57:54 PM] Max: create 1000 1034
trademark 1000 1003
null 1000 2000
free 1000 2136
available 1000 1142
true 1000 6976
commons 1000 1060
foundation 1000 1044
page 1000 1276
links 1000 1190
registered 1000 1027
wikimedia 1000 1006
terms 1000 2039
modified 1000 1019
bookdownload 1000 1000
creative 1000 1016
wikipedia 1000 2576
pdfprintable 1000 1000
last 1000 1159
cache 1000 2001
article 648 1093
articles 549 1882
external 450 808
stub 418 466
stubshidden 197 197
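
One way the "magic formula" might look, using Alex's ratio of files to overall frequency and a few rows from the list above. The 1/3 cutoff comes from the chat; the `$MIN_FILES` floor and the name `needs_review` are assumptions added so that rare ordinary words are not flagged.

```perl
use strict;
use warnings;

# The 1/3 ratio is from the conversation; the file floor is an assumed value.
my $MIN_FILES = 100;
my $MIN_RATIO = 1 / 3;

# Hypothetical helper: flag a word for manual review when it shows up in many
# files but only about once (or a small multiple of once) per file.
sub needs_review {
    my ($files, $freq) = @_;
    return 0 if $freq == 0 || $files < $MIN_FILES;
    return $files / $freq > $MIN_RATIO ? 1 : 0;
}

# Rows in the file's format: word, files out of 1000, overall frequency.
my @rows = (
    'create 1000 1034',
    'true 1000 6976',
    'article 648 1093',
    'articles 549 1882',
    'stubshidden 197 197',
);
for my $row (@rows) {
    my ($word, $files, $freq) = split ' ', $row;
    printf "%-12s %s\n", $word, needs_review($files, $freq) ? 'review' : 'keep';
}
```

Dividing by the overall frequency instead of (overall frequency - 1000) avoids a division by zero for rows like "bookdownload 1000 1000". Words such as "true" fall under the ratio, so they would still need to be added by hand.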
[11:58:57 PM] Max: here's another, different, question.
[11:59:53 PM] Max: nvm
[11:59:55 PM] Max: i answered it
Wednesday, April 13, 2011
[12:00:00 AM] Alex: those are good words to grab, except true i think
[12:00:13 AM] Alex: haha okay
[12:00:21 AM] Max: if you look at the stripped file, there are lots of trues and falses
[12:00:40 AM] Max: that come incorrectly from the HTML
[12:00:41 AM] Alex: okay
[12:00:47 AM] Max: that's why i kept those
[12:00:52 AM] Alex: okays
[12:01:10 AM] Alex: we dont want to give anything freq it doesnt deserve :P
[12:01:21 AM] Alex: gotta be fair dictionary parents
[12:01:24 AM] Max: right
[12:04:28 AM] Max: so i'm gonna call it a night
[12:04:29 AM] Max: but
[12:04:42 AM] Max: you're gonna have trouble when you want to push back to the server
[12:04:50 AM] Max: so what you're gonna want to do is
[12:04:54 AM] Max: git add
[12:05:03 AM] Max: oh.
[12:05:05 AM] Max: no.
[12:05:09 AM] Max: you're going to want to git pull first
[12:05:17 AM] Alex: why pull?
[12:05:21 AM] Max: because that will fetch and merge the files that i just pushed
[12:05:32 AM] Max: which are unrelated to yours, so there shouldn't be any problems
[12:05:38 AM] Alex: okays
[12:05:48 AM] Max: it's going to die when you try to push because you're behind the master
[12:06:01 AM] Max: master *branch
[12:06:05 AM] Max: so it's git pull
[12:06:08 AM] Max: then git add
[12:06:11 AM] Max: git commit
[12:06:13 AM] Max: git push
[12:06:17 AM] Max: then you should be good.
[12:06:27 AM] Max: and if you can't, we can just take a look at it tomorrow
[12:06:31 AM] Alex: okay. i might not push tonight
[12:06:38 AM] Max: okay
[12:06:52 AM] Alex: i might push just so if we are working elsewhere tomorrow you can grab the half done stuff
[12:06:55 AM] Max: even if it's kinda broken , it's probably a good idea to push
[12:06:57 AM] Max: yeah
[12:07:05 AM] Max: it's okay if it's broken.
[12:07:11 AM] Max: just say that in your comment for the git commit
[12:07:16 AM] Alex: kk
[12:07:22 AM] Max: alright
[12:07:23 AM] Max: night
[12:07:25 AM] Alex: night night