Merge develop into master #408

Open: wants to merge 25 commits into base: master
841b5fe
First fortifications against phrases longer than 60 characters in a …
xmacex Oct 1, 2018
d3ae0fb
Explode out the phrases into a variable before inspecting their lengths
xmacex Oct 2, 2018
540d385
Added a small test suite, which now also relies on the create_new_bin…
xmacex Oct 2, 2018
35a8b0b
Added a small test suite, which now also relies on the create_new_bin…
xmacex Oct 2, 2018
f5fb3d7
Merge branch 'phrases_must_not_exceed_60chrs_each' of https://github.…
xmacex Oct 2, 2018
25e4baa
Fixed a minor typo of a test data set name. I'll be honest, this is a…
xmacex Oct 2, 2018
e11cafe
Added checks against long query phrases to the query modification fun…
xmacex Oct 2, 2018
27aa81f
Documentation update for the capture form to inform the user of the 6…
xmacex Nov 10, 2018
0b02847
Merge branch 'master' into phrases_must_not_exceed_60chrs_each
xmacex Nov 10, 2018
026ea8a
A number of renames. This is basically me procrastinating before gett…
xmacex Nov 13, 2018
4f987a2
Nope, stick to the file names of the codebase when naming tests. This…
xmacex Nov 13, 2018
9a01571
A test harness for starting work towards maximum keyword length valid…
xmacex Nov 13, 2018
14e332b
Added the scenario where a list of keywords is at surface level too l…
xmacex Nov 13, 2018
df582ee
Ok validate_capture_phrases() should now catch phrases which are over…
xmacex Nov 13, 2018
2ccf102
Dropped the test suite of this branch of changes. I'll keep using the…
xmacex Nov 13, 2018
dbcba7f
Whoops I had misunderstood what search.php takes as input; it take th…
xmacex Nov 13, 2018
4d76576
Attempt to make bin removal and info lookup more robust.
mikesname Jan 3, 2019
f3859dc
Merge branch 'master' into phrases_must_not_exceed_60chrs_each
Feb 23, 2020
0ba9a0e
Removed recursive call to search
eeftychiou Mar 11, 2020
7f515ce
Consider all DB-collations beginning with 'utf8mb4' to be utf8mb4 col…
tmantynen Jun 22, 2020
36ea4c8
Fix issue #401 - Google chart not displayed.
tmantynen Jun 23, 2020
0319896
Merge pull request #343 from mikesname/query_lookup_errors into develop
Jul 8, 2020
484a16a
Merge pull request #402 from tmantynen/master
Jul 8, 2020
0d4d4e8
Merge pull request #389 from eeftychiou/searchfix
Jul 8, 2020
41204fd
Merge pull request #335 from xmacex/phrases_must_not_exceed_60chrs_each
Jul 8, 2020
2 changes: 1 addition & 1 deletion analysis/common/functions.php
@@ -866,7 +866,7 @@ function current_collation() {
$rec = $dbh->prepare($sql);
$rec->execute();
while ($res = $rec->fetch(PDO::FETCH_ASSOC)) {
-if (array_key_exists('Collation', $res) && ($res['Collation'] == 'utf8mb4_unicode_ci' || $res['Collation'] == 'utf8mb4_general_ci')) {
+if (array_key_exists('Collation', $res) && substr($res['Collation'], 0, 7) === 'utf8mb4') {
$is_utf8mb4 = true;
break;
}
3 changes: 3 additions & 0 deletions capture/common/form.trackphrases.php
@@ -19,6 +19,9 @@
<li> exact phrases: ['global warming'] will get only tweets with the exact phrase. Beware, however, that due to how the streaming API works, tweets are captured in the same way as in 2, but tweets that do not match the exact phrase are thrown away. This means that you will request many more tweets from the Twitter API than you will see in your query bin - thus increasing the possibility that you will hit a <a href='https://dev.twitter.com/docs/faq#6861' target='_blank'>rate limit</a>. E.g. if you specify a query like ['are we'] all tweets matching both [are] and [we] are retrieved, while DMI-TCAT only retains those with the exact phrase ['are we'].</li>
</ol>

+The phrases between commas can be at most 60 characters long.
+<br/>
+
You can track a maximum of 400 queries at the same time (for all query bins combined) and the total volume should never exceed 1% of global Twitter volume, at any specific moment in time.
<br/>
Example bin: globalwarming,global warming,'climate change'
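The exact-phrase behaviour described in the form text of capture/common/form.trackphrases.php can be illustrated with a small sketch. This is not DMI-TCAT code: the function names `parseTrackQuery`, `matchesAllWords` and `matchesExactPhrase` are hypothetical, and the word matching is a simplification of whatever the streaming API actually does. It only shows why more tweets are received than retained.

```javascript
// Hypothetical helpers for illustration only; none of these exist in DMI-TCAT.

// Split a track query on commas and mark phrases wrapped in single quotes as exact.
function parseTrackQuery(query) {
  return query
    .split(',')
    .map(p => p.trim())
    .filter(p => p.length > 0)
    .map(p => {
      const exact = p.length >= 2 && p.startsWith("'") && p.endsWith("'");
      return { text: exact ? p.slice(1, -1) : p, exact };
    });
}

// Broad match (assumed behaviour): every word of the phrase occurs somewhere in the tweet.
function matchesAllWords(tweet, phrase) {
  const words = tweet.toLowerCase().split(/\W+/);
  return phrase.toLowerCase().split(/\s+/).every(w => words.includes(w));
}

// Exact match: the phrase occurs as one contiguous string.
function matchesExactPhrase(tweet, phrase) {
  return tweet.toLowerCase().includes(phrase.toLowerCase());
}

const tweets = ["We are here", "Are we there yet?", "They are gone"];

// What would be requested from the API for the query ['are we'] ...
const received = tweets.filter(t => matchesAllWords(t, "are we"));
// ... and what would be kept in the query bin afterwards.
const retained = received.filter(t => matchesExactPhrase(t, "are we"));
```

In this toy data set two tweets are received but only one is retained, which is the gap the form text warns about when it mentions rate limits.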
5 changes: 5 additions & 0 deletions capture/index.php
@@ -762,6 +762,11 @@ function validateQuery(query,type) {
return false;
}
if(type == 'track') {
+if(query.split(',').some(subq => subq.length > 60)) {
+alert("Query phrases should not exceed 60 characters each. Please shorten your query phrases.");
+return false;
+};
+
// if literal phrase, there should be no commas in between
if(query.indexOf("'")==-1) {
return true;
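One thing to keep in mind about the check in `validateQuery`: it measures `subq.length` on untrimmed substrings, while the PHP checks in this pull request appear to trim each phrase first and use `strlen`, which counts bytes rather than characters. A minimal sketch of a client-side check that would agree with the server side is shown here, under the assumption that the 60 limit is meant in UTF-8 bytes. `findTooLongPhrases` and `MAX_PHRASE_BYTES` are hypothetical names, not part of the codebase.

```javascript
// Limit taken from the diff; treating it as a byte limit is an assumption.
const MAX_PHRASE_BYTES = 60;

// Return the comma-separated phrases that are still too long after trimming.
// TextEncoder encodes to UTF-8, so the measured length is in bytes, like PHP's strlen().
function findTooLongPhrases(query) {
  const encoder = new TextEncoder();
  return query
    .split(',')
    .map(p => p.trim())
    .filter(p => encoder.encode(p).length > MAX_PHRASE_BYTES);
}

// Example use: an empty result means every phrase fits.
const offenders = findTooLongPhrases("globalwarming,global warming,'climate change'");
```

With this variant, padding around a phrase does not count against the limit, and a phrase of 31 two-byte characters is rejected even though its `length` is only 31.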
172 changes: 103 additions & 69 deletions capture/query_manager.php
@@ -52,6 +52,17 @@ function create_new_bin($params) {
echo '{"msg":"This capturing type is not defined in the config file"}';
return;
}
+if($type == 'track') {
+$phrases = explode(",", $params["newbin_phrases"]);
+$phrases = array_trim_and_unique($phrases);
+foreach($phrases as $phrase) {
+if(strlen($phrase) > 60) {
+echo '{"msg":"Cannot add query because a phrase is too long."}';
+throw new LengthException('A query phrase exceeds 60 chrs.');
+return;
+}
+}
+}
$comments = sanitize_comments($params['newbin_comments']);

// check whether the main query management tables are there, if not, create
@@ -182,71 +193,77 @@ function remove_bin($params) {
$bin_name = $results['querybin'];
}

-// delete tcat_query_bin table
-$sql = "DELETE FROM tcat_query_bins WHERE id = :id";
-$delete_querybin = $dbh->prepare($sql);
-$delete_querybin->bindParam(':id', $bin_id, PDO::PARAM_INT);
-$delete_querybin->execute();
-
-// delete periods associated with the query bin
-$sql = "DELETE FROM tcat_query_bins_periods WHERE querybin_id = :id";
-$delete_querybin_periods = $dbh->prepare($sql);
-$delete_querybin_periods->bindParam(':id', $bin_id, PDO::PARAM_INT);
-$delete_querybin_periods->execute();
-
-// delete phrase references associated with the query bin
-$sql = "DELETE FROM tcat_query_bins_phrases WHERE querybin_id = :id";
-$delete_query_bins_phrases = $dbh->prepare($sql);
-$delete_query_bins_phrases->bindParam(":id", $bin_id, PDO::PARAM_INT);
-$delete_query_bins_phrases->execute();
-
-// delete orphaned phrases
-$sql = "DELETE FROM tcat_query_phrases where id not in ( select phrase_id from tcat_query_bins_phrases )";
-$delete_query_phrases = $dbh->prepare($sql);
-$delete_query_phrases->execute();
-
-// delete user references associated with the query bin
-$sql = "DELETE FROM tcat_query_bins_users WHERE querybin_id = :id";
-$delete_query_bins_users = $dbh->prepare($sql);
-$delete_query_bins_users->bindParam(":id", $bin_id, PDO::PARAM_INT);
-$delete_query_bins_users->execute();
-
-// delete orphaned users
-$sql = "DELETE FROM tcat_query_users where id not in ( select user_id from tcat_query_bins_users )";
-$delete_query_users = $dbh->prepare($sql);
-$delete_query_users->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_tweets";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_mentions";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_hashtags";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_urls";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_withheld";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_places";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-$sql = "DROP TABLE " . $bin_name . "_media";
-$delete_table = $dbh->prepare($sql);
-$delete_table->execute();
-
-echo '{"msg":"Query bin [' . $bin_name . ']has been deleted"}';
-
-$dbh = false;
+$dbh->beginTransaction();
+try {
+// delete tcat_query_bin table
+$sql = "DELETE FROM tcat_query_bins WHERE id = :id";
+$delete_querybin = $dbh->prepare($sql);
+$delete_querybin->bindParam(':id', $bin_id, PDO::PARAM_INT);
+$delete_querybin->execute();
+
+// delete periods associated with the query bin
+$sql = "DELETE FROM tcat_query_bins_periods WHERE querybin_id = :id";
+$delete_querybin_periods = $dbh->prepare($sql);
+$delete_querybin_periods->bindParam(':id', $bin_id, PDO::PARAM_INT);
+$delete_querybin_periods->execute();
+
+// delete phrase references associated with the query bin
+$sql = "DELETE FROM tcat_query_bins_phrases WHERE querybin_id = :id";
+$delete_query_bins_phrases = $dbh->prepare($sql);
+$delete_query_bins_phrases->bindParam(":id", $bin_id, PDO::PARAM_INT);
+$delete_query_bins_phrases->execute();
+
+// delete orphaned phrases
+$sql = "DELETE FROM tcat_query_phrases where id not in ( select phrase_id from tcat_query_bins_phrases )";
+$delete_query_phrases = $dbh->prepare($sql);
+$delete_query_phrases->execute();
+
+// delete user references associated with the query bin
+$sql = "DELETE FROM tcat_query_bins_users WHERE querybin_id = :id";
+$delete_query_bins_users = $dbh->prepare($sql);
+$delete_query_bins_users->bindParam(":id", $bin_id, PDO::PARAM_INT);
+$delete_query_bins_users->execute();
+
+// delete orphaned users
+$sql = "DELETE FROM tcat_query_users where id not in ( select user_id from tcat_query_bins_users )";
+$delete_query_users = $dbh->prepare($sql);
+$delete_query_users->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_tweets";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_mentions";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_hashtags";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_urls";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_withheld";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_places";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$sql = "DROP TABLE " . $bin_name . "_media";
+$delete_table = $dbh->prepare($sql);
+$delete_table->execute();
+
+$dbh->commit();
+
+echo '{"msg":"Query bin [' . $bin_name . ']has been deleted"}';
+} catch (PDOException $e) {
+error_log("Unable to remove bin '" . $bin_name . "': " . $e->getMessage());
+$dbh->rollBack();
+}
}

function pause_bin($params) {
@@ -438,6 +455,19 @@ function modify_bin_comments($querybin_id, $params) {
function modify_bin($params) {
global $captureroles, $now;

+$type = $params['type'];
+if($type == 'track') {
+$phrases = explode(",", $params["newphrases"]);
+$phrases = array_trim_and_unique($phrases);
+foreach($phrases as $phrase) {
+if(strlen($phrase) > 60) {
+echo '{"msg":"Cannot add query because a phrase is too long."}';
+throw new LengthException('A query phrase exceeds 60 chrs.');
+return;
+}
+}
+}
+
if (!table_id_exists($params["bin"])) {
echo '{"msg":"The bin ' . $params['bin'] . ' does not seem to exist"}';
return;
@@ -446,7 +476,6 @@ function modify_bin($params) {

if (array_key_exists('comments', $params) && $params['comments'] !== '') return modify_bin_comments($querybin_id, $params);

-$type = $params['type'];
if (array_search($type, $captureroles) === false && ($type !== 'geotrack' || array_search('track', $captureroles) === false)) {
echo '{"msg":"This capturing type is not defined in the config file"}';
return;
@@ -785,9 +814,14 @@ function getBins() {
$querybins[$bin->id]->nrOfTweets = 0;
$sql = "SELECT count(id) AS count FROM " . $bin->name . "_tweets";
$res = $dbh->prepare($sql);
-if ($res->execute() && $res->rowCount()) {
-$result = $res->fetch();
-$querybins[$bin->id]->nrOfTweets = $result['count'];
+try {
+if ($res->execute() && $res->rowCount()) {
+$result = $res->fetch();
+$querybins[$bin->id]->nrOfTweets = $result['count'];
+}
+} catch (PDOException $e) {
+error_log("Error retrieving tweet info for bin '" . $bin->name . "': " . $e->getMessage());
+unset($querybins[$bin->id]);
}
}
$dbh = false;
11 changes: 6 additions & 5 deletions capture/search/search.php
@@ -52,7 +52,7 @@

queryManagerCreateBinFromExistingTables($bin_name, $querybin_id, $type, explode("OR", $keywords));

-search($keywords);
+while(search($keywords));
if ($tweetQueue->length() > 0) {
$tweetQueue->insertDB();
}
@@ -61,9 +61,10 @@

// TODO: see timeline.php for an improvement making it easier for users to start a bin immediately after running a CLI script, and adapt the method for this script

-function search($keywords, $max_id = null) {
+function search($keywords) {
global $twitter_keys, $current_key, $ratefree, $bin_name, $dbh, $tweetQueue;

+static $max_id = null;

$ratefree--;
if ($ratefree < 1 || $ratefree % 10 == 0) {
$keyinfo = getRESTKey($current_key, 'search', 'tweets');
@@ -122,12 +123,12 @@ function search($keywords, $max_id = null) {
return false;
}
sleep(1);
-search($keywords, $max_id);
+return true;
} else {
echo $tmhOAuth->response['response'] . "\n";
if ($tmhOAuth->response['response']['errors']['code'] == 130) { // over capacity
sleep(1);
-search($keywords, $max_id);
+return true;
}
}
}
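The change in capture/search/search.php replaces a recursive call with a `while` loop whose cursor (`$max_id`) survives between calls. The same pattern can be sketched in isolation. The sketch uses JavaScript and a closure variable where the PHP code uses `static`. `makeSearch` and `fakeFetchPage` are hypothetical stand-ins, and the ids are small integers for illustration rather than real tweet ids.

```javascript
// Build a search function that remembers its paging cursor between calls,
// playing the role of `static $max_id` in the PHP diff (assumed equivalent).
function makeSearch(fetchPage, onResults) {
  let maxId = null; // null means "start from the newest item"
  return function search() {
    const ids = fetchPage(maxId);
    if (ids.length === 0) {
      return false; // nothing left: tells the caller to stop looping
    }
    onResults(ids);
    maxId = Math.min(...ids) - 1; // next call only asks for older items
    return true; // tells the caller to call again
  };
}

// Fake data source: ids 25 down to 1, served 10 per page.
const allIds = Array.from({ length: 25 }, (_, i) => 25 - i);
function fakeFetchPage(maxId) {
  return allIds.filter(id => maxId === null || id <= maxId).slice(0, 10);
}

const collected = [];
const search = makeSearch(fakeFetchPage, ids => collected.push(...ids));

// Same shape as `while(search($keywords));` in the diff: loop instead of recursing.
let pages = 0;
while (search()) {
  pages++;
}
```

Because each call returns instead of calling itself, the call stack stays flat no matter how many pages are fetched, which is presumably the motivation for the "Removed recursive call to search" commit.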
12 changes: 10 additions & 2 deletions common/functions.php
@@ -79,13 +79,21 @@ function controller_restart_roles($logtarget = "cli", $wait = false) {
* Validates a given list of keywords, as entered as a parameter in capture/search/search.php for example
*/
function validate_capture_phrases($keywords) {
+$valid = true;
$illegal_chars = array( "\t", "\n", ";", "(", ")" );
foreach ($illegal_chars as $c) {
if (strpos($keywords, $c) !== FALSE) {
-return FALSE;
+$valid = false;
}
}
-return TRUE;
+foreach ((explode(' OR ', $keywords)) as $keyword) {
+$keyword = trim($keyword);
+$keyword = preg_replace('/\s+/', ' ', $keyword);
+if (strlen($keyword) > 60) {
+$valid = false;
+}
+}
+return $valid;
}

/**