The Source app uses Google Cloud's AI tools to give journalists instant access to an image's public history, so they can sort and analyze its provenance, including any manipulation. Source goes a step further by helping to detect and translate text within images. It was developed by Storyful in partnership with the Google News Initiative. More information on the project can be found here: https://blog.google/around-the-globe/google-asia/new-tool-helping-asian-newsrooms-detect-fake-images/
In practical terms, the application allows a user to perform a reverse image search on an uploaded image. The user may optionally crop, flip or rotate the image within the application before uploading it.
On completion of the reverse image search, the application displays the results, which the user can then filter and/or sort by age. The application is also intended to support extraction of text found within an uploaded image, and translation of that text from a selection of supported languages.
This project is available for use under the Apache 2.0 license. There may be some fragmented references (logos and links) to Storyful within the code, which should be removed if you wish to build and host the app yourself.
Source is a Ruby on Rails app running on GKE.
It uses Redis as its main data store and Google Cloud Storage for the uploaded images.
It uses Sidekiq to perform the full analysis of an uploaded image asynchronously.
It is powered by the Google Vision and Google Translate APIs.
A simple diagram of the architecture: Google Slide
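To make the moving parts concrete, here is a hedged sketch of the kind of requests such an analysis makes against those two APIs, expressed as raw REST calls rather than the app's actual Ruby code; the API key, image URL and query text are placeholders:

```sh
# Reverse image search plus OCR in a single Vision API call
curl -s -X POST "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
        "requests": [{
          "image": { "source": { "imageUri": "https://example.com/photo.jpg" } },
          "features": [{ "type": "WEB_DETECTION" }, { "type": "TEXT_DETECTION" }]
        }]
      }'

# Translate the extracted text with the Translate API
curl -s -X POST "https://translation.googleapis.com/language/translate/v2?key=${API_KEY}" \
  -d 'q=<text detected above>' -d 'target=en'
```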
It shouldn't be necessary, but if you need to generate a new service account key, this is how to do it:
- Open Google Cloud Console
- Go to IAM & admin > Service accounts
- Click on the menu button related to the service account created for this project (called source)
- Click on create key, select JSON as the type and click create
- This should download the key file to your machine
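Equivalently, the key can be created from the CLI (the service account name `source` and the project ID placeholder are assumptions based on the steps above):

```sh
gcloud iam service-accounts keys create credentials.json \
  --iam-account=source@<project id>.iam.gserviceaccount.com
```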
Set the following environment variables (for example in an env file consumed by docker-compose):

```sh
SECRET_KEY_BASE="<rails generated key>"
GOOGLE_APPLICATION_CREDENTIALS=/root/.ssh/<name of the file>
MASTER_PASSWORD="<random string>"
BUCKET_NAME="<google cloud bucket name>"
GOOGLE_SHEET_PASSCODES_ID='<id of the google sheet containing passcodes>'
GOOGLE_SHEET_WHITELIST_URLS_ID='<id of the google sheet containing whitelisted urls>'
REDIS_HOST=redis
REDIS_PORT=6379
```
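If you need to generate a value for `SECRET_KEY_BASE`, Rails can produce one; run either of these inside the app container or any checkout with the bundle installed:

```sh
bundle exec rails secret   # Rails 5+
bundle exec rake secret    # older Rails versions
```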
- Duplicate `docker-compose.yml` and rename it `docker-compose-mac.yml`
- Change the line in `app -> volumes` from `- /usr/bin/docker:/usr/bin/docker` to `- /usr/local/bin/docker:/usr/bin/docker`
- When running docker-compose, pass that file explicitly: `docker-compose -f docker-compose-mac.yml`
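For example, a full build-and-run cycle on macOS would then look like:

```sh
docker-compose -f docker-compose-mac.yml build
docker-compose -f docker-compose-mac.yml up
```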
To build and run the application:

```sh
docker-compose build
docker-compose up
```

To create the test database and run the test suite under guard:

```sh
docker-compose -f docker-compose.rspec.test.yml run -e 'RAILS_ENV=test' app rake db:create db:migrate
docker-compose -f docker-compose.rspec.test.yml run -e 'RAILS_ENV=test' app guard
```
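If you want a single non-interactive test run instead of guard's watch mode, something like the following should work (assuming rspec is available in the bundle):

```sh
docker-compose -f docker-compose.rspec.test.yml run -e 'RAILS_ENV=test' app bundle exec rspec
```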
Google Cloud Build has been configured for this repository to enable automated builds and deployment. The build script will build, test and push the nginx and rails images before finally deploying the new images to the target environment.
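Builds are normally triggered automatically, but one can usually be submitted by hand from the repository root (assuming the build configuration lives in the conventional `cloudbuild.yaml`):

```sh
gcloud builds submit --config=cloudbuild.yaml .
```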
Skip this step if the cluster already exists!
Note: the manifest templates documented below are also stored in the /kubernetes directory.
- Go to the Google Cloud console and create a cluster.
- Select a region (the region should be the same as the db instance's).
- Choose the number of nodes.
- Once the cluster is created, connect to it and create a file called credentials.json.
- Copy in the content of the service account JSON file you should have on your machine. (gcloud equivalents are sketched below.)
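The same cluster setup can be scripted with gcloud; the cluster name, region placeholder and node count here are illustrative assumptions:

```sh
gcloud container clusters create source --region <region> --num-nodes 3
gcloud container clusters get-credentials source --region <region>
```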
Create a ConfigMap for the credentials:

```sh
kubectl create configmap config --namespace credentials --from-file credentials.json
```
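Note that the deployment template below mounts the credentials from a Secret named `cloudsql-instance-credentials`, not from this ConfigMap, so you will most likely also need to create that Secret from the same file:

```sh
kubectl create secret generic cloudsql-instance-credentials --from-file=credentials.json
```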
From the Google Cloud console:

- Select the workload and click edit.
- Edit the YAML file using the template below.
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: <name>
  name: <name>
  namespace: default
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: <name>
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: <name>
    spec:
      containers:
        - name: sidekiq
          image: <source app>
          command: ['/bin/bash']
          args: ['-c', 'bundle exec sidekiq']
        - name: source-nginx
          image: <source nginx image tag>
          env:
            - name: BACKEND_HOST
              value: localhost
            - name: BACKEND_PORT
              value: "3000"   # env values must be strings
            - name: LOG_FORMAT
              value: text
            - name: PROJECT_ID
              value: <project ID>
        - name: <name>
          image: <gcr image name>
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 2000m
              memory: 2000Mi
            requests:
              cpu: 1500m
              memory: 1000Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /secrets/cloudsql
              name: cloudsql-instance-credentials
              readOnly: true
          env:
            - name: PROJECT_ID
              value: <project id>
            - name: BUCKET_NAME
              value: <bucket name>
            - name: BUCKET_FOLDER
              value: <bucket folder>
            - name: GOOGLE_SHEET_PASSCODES_ID
              value: <google sheet id>
            - name: GOOGLE_SHEET_WHITELIST_URLS_ID
              value: <google sheet id>
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /secrets/cloudsql/credentials.json
            - name: RAILS_SERVE_STATIC_FILES
              value: "true"   # env values must be strings
            - name: GET_HOSTS_FROM
              value: dns
            - name: REDIS_HOST
              value: <ip redis master>
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: cloudsql-instance-credentials
          secret:
            secretName: cloudsql-instance-credentials
```
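Once the placeholders are filled in, the manifest can be applied and watched with the usual kubectl commands (the file path assumes the copy kept in the /kubernetes directory):

```sh
kubectl apply -f kubernetes/<deployment manifest>.yaml
kubectl rollout status deployment/<name>
```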
- Go to the Google Cloud console.
- Select Kubernetes Engine > Workloads.
- Select the deployment.
- In the Exposing services section, click the 'Expose' button.
- Select type 'ClusterIP'.
- Edit the port forwarding: port = 80, targetPort = 80.
- After creating the service, edit the YAML file following this template:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: source
  name: source-service
  namespace: default
spec:
  clusterIP: <cluster ip>
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: source-staging
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-realm: Authentication Required # Only for staging
    nginx.ingress.kubernetes.io/auth-secret: basic-auth # Only for staging
    nginx.ingress.kubernetes.io/auth-type: basic # Only for staging
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.org/client-max-body-size: 100m
  name: <ingress name>
  namespace: <namespace>
spec:
  ingressClassName: nginx
  rules:
    - host: <subdomain>.organisation.com
      http:
        paths:
          - backend:
              service:
                name: <service name>
                port:
                  number: 80
            path: /
            pathType: ImplementationSpecific
```
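As with the deployment, the service and ingress can be applied and checked from the command line:

```sh
kubectl apply -f kubernetes/<service and ingress manifest>.yaml
kubectl get service source-service
kubectl get ingress <ingress name>
```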
Follow this guide: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis
    role: master
    tier: backend
  name: redis-master
  namespace: default
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: redis
      role: master
      tier: backend
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: redis
        role: master
        tier: backend
    spec:
      containers:
        - image: k8s.gcr.io/redis:e2e
          imagePullPolicy: IfNotPresent
          name: master
          ports:
            - containerPort: 6379
              protocol: TCP
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /data
              name: redis-data
              subPath: redis-data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-disk   # NB: must match the PersistentVolumeClaim name below; adjust one or the other
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-prod-disk
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```
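A quick way to confirm that Redis came up healthy once these manifests are applied (deployment name taken from the template above; this assumes redis-cli is available in the image):

```sh
kubectl get pods -l app=redis
kubectl exec deploy/redis-master -- redis-cli ping   # should answer PONG
```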
From the Google Cloud console:

- Go to Kubernetes Engine > Clusters.
- Select the cluster (e.g. source) and click Connect.
- Run the following commands:
```sh
# Install helm
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
chmod +x get_helm.sh
./get_helm.sh

# Helm 2 only: initialise Tiller (skip these steps on Helm 3, which has no Tiller)
helm init
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade
```
```sh
helm repo add nginx-stable https://helm.nginx.com/stable
helm repo update

# Latest version:
helm install controller nginx-stable/nginx-ingress

# Specific version:
helm install controller nginx-stable/nginx-ingress --version <chart version number>
helm upgrade controller nginx-stable/nginx-ingress --version <chart version number>
```

Refer to https://artifacthub.io/packages/helm/nginx/nginx-ingress for the chart version number.
Take note of the IP address of the nginx-controller service, which you can obtain by running `kubectl get services | grep nginx`.

In Route53, create an A record with the IP of the nginx controller as its value.
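If grep turns up more than one candidate, the external IP can also be read directly from the controller service; the service name `controller-nginx-ingress` here is an assumption based on the helm release name used above, so check `kubectl get services` first:

```sh
kubectl get service controller-nginx-ingress \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```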
- Log in with `gcloud auth login`
- Create a file called cors.json:
```json
[
  {
    "origin": ["http://example.com"],
    "responseHeader": ["Content-Type"],
    "method": ["GET", "HEAD", "DELETE"],
    "maxAgeSeconds": 3600
  }
]
```
Then run:

```sh
gsutil cors set cors.json gs://<bucket name>
```
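You can confirm the policy took effect with:

```sh
gsutil cors get gs://<bucket name>
```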