<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>shisaa.be</title><link>http://shisaa.be/</link><description>A blog about Programming, Unix, Japan and Photography</description><atom:link href="http://shisaa.be/rss.xml" type="application/rss+xml" rel="self"></atom:link><language>en</language><lastBuildDate>Mon, 05 Jan 2015 12:42:50 GMT</lastBuildDate><generator>http://getnikola.com/</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Postgis and PostgreSQL in Action - Timezones</title><link>http://shisaa.be/postset/postgis-and-postgresql-in-action-timezones.html</link><dc:creator>Tim van der Linden</dc:creator><description><div><h3>Preface</h3>
<p>Recently, I was lucky to be part of an <em>awesome</em> project called the <a href="http://breakingboundariestour.com">Breaking Boundaries Tour</a>.</p>
<p>This project is about two brothers, Omar and Greg Colin, who take their Stella scooters to make a full round trip across the United States.
And, while they are at it, they try to raise funding for <a href="http://surfershealing.org/">Surfer's Healing Folly Beach</a>, an organization that does great work enhancing the lives of children with autism through surfing.
To accommodate this trip, they wished to have a site where visitors could follow their trail <em>live</em>, as it happened.
A marker would travel across the map, with them, 24/7.</p>
<p>Furthermore, they needed the ability to jump off their scooters, snap a few pictures, edit a video, write some side info and push it on the net, for the whole world to see.
Immediately after they made their post, it had to appear on the exact spot they were at when snapping their moments of beauty.</p>
<p>To aid in the live tracking of their global position, they acquired a dedicated GPS tracking device which sends a latitude/longitude coordinate via a mobile data network every 5 minutes.</p>
<p>Now, this (short) post is not about how I built the entire application, but rather about how I used PostGIS and PostgreSQL for a rather peculiar matter: deducing timezone information.</p>
<p>For those who are interested though: the site is entirely built in Python using the Flask "micro framework" and, of course, PostgreSQL as the database.</p>
<h3>Timezone information?</h3>
<p>Yes. Time, dates, timezones: hairy worms in hairy cans which many developers hate to open, but have to sooner or later.</p>
<p>In the case of Breaking Boundaries Tour, we had one major occasion where we needed the correct timezone information: where did the post happen?</p>
<h3>Where did it happen?</h3>
<p>A feature we wanted to implement was one to help visitors get a better view of when a certain post was written.
Being able to see when a post was written in your local timezone is much more convenient than seeing the post time in some foreign zone.</p>
<p>We are lazy and do not wish to count back- or forward to figure out when a post popped up in our frame of time.</p>
<p>The reasoning is simple: always calculate all the times involved back to simple UTC (GMT). Then figure out the client's timezone using JavaScript, apply the time difference and done!</p>
<p>Simple eh?</p>
<p>Correct, except for one small detail in the feature request: in what zone was the post actually made?</p>
<p>Well...damn.</p>
<p>While your heart might be in the right place while thinking: "Simple, just look at the locale of the machine (laptop, mobile phone, ...) that was used to post!", this information is just too fragile. Remember, the brothers are <em>crossing</em> the USA, riding through at least three major timezones.
You simply cannot expect all the devices involved in posting to always adjust their locale automatically depending on where they are.</p>
<p>We need a more robust solution. We need PostGIS.</p>
<p>But, how can a spatial database help us to figure out the timezone?</p>
<p>Well, thanks to the hard labor delivered to us by Eric Muller from <a href="http://efele.net">efele.net</a>, we have a <em>complete</em> and <em>maintained</em> shapefile of the entire world, containing polygons that represent the different timezones accompanied by the official timezone declarations.</p>
<p>This enables us to use the latitude and longitude information from the dedicated tracking device to pinpoint which timezone they were in while writing their post.</p>
<p>So let me take you on a short trip to show you how I used the above data in conjunction with PostGIS and PostgreSQL.</p>
<h3>Getting the data</h3>
<p>The first thing to do, obviously, is to download the shapefile data and load it into our PostgreSQL database.
Navigate to the <a href="http://efele.net/maps/tz/world/">Timezone World</a> portion of the efele.net site and download the "tz_world" shapefile.</p>
<p>This will give you a zip which you can extract:</p>
<pre class="code literal-block"><span class="nv">$ </span>unzip tz_world.zip
</pre>
<p>Unzipping will create a directory called "world" in which you can find the needed shapefile package files. The load command further down assumes you run it from inside that directory.</p>
<p>Next you will need to make sure that your database is PostGIS ready. Connect to your desired database (let us call it <em>bar</em>) <em>as a superuser</em>:</p>
<pre class="code literal-block"><span class="nv">$ </span>psql -U postgres bar
</pre>
<p>And create the PostGIS extension:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="n">EXTENSION</span> <span class="n">postgis</span><span class="p">;</span>
</pre>
<p>Now go back to your terminal and load the shapefile into your database using the original owner of the database (here called <em>foo</em>):</p>
<pre class="code literal-block"><span class="nv">$ </span>shp2pgsql -S -s <span class="m">4326</span> -I tz_world <span class="p">|</span> psql -U foo bar
</pre>
<p>As you might remember from the PostGIS series, this loads in the geometry from the shapefile using only simple geometry (not "MULTI..." types) with a SRID of 4326.</p>
<h3>What have we got?</h3>
<p>This will take a couple of seconds and will create one table and two indexes. If you describe your database (assuming you have not made any tables yourself), you get:</p>
<pre class="code literal-block">public <span class="p">|</span> geography_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> geometry_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> raster_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> raster_overviews <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> spatial_ref_sys <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> tz_world <span class="p">|</span> table <span class="p">|</span> foo
public <span class="p">|</span> tz_world_gid_seq <span class="p">|</span> sequence <span class="p">|</span> foo
</pre>
<p>You will see the standard PostGIS bookkeeping and you will find the <em>tz_world</em> table together with a <em>gid</em> sequence.</p>
<p>Let us describe the table:</p>
<pre class="code literal-block"><span class="err">\</span><span class="n">d</span> <span class="n">tz_world</span>
</pre>
<p>And get:</p>
<pre class="code literal-block">Column <span class="p">|</span> Type <span class="p">|</span> Modifiers
--------+------------------------+--------------------------------------------------------
gid <span class="p">|</span> integer <span class="p">|</span> not null default nextval<span class="o">(</span><span class="s1">'tz_world_gid_seq'</span>::regclass<span class="o">)</span>
tzid <span class="p">|</span> character varying<span class="o">(</span>30<span class="o">)</span> <span class="p">|</span>
geom <span class="p">|</span> geometry<span class="o">(</span>Polygon,4326<span class="o">)</span> <span class="p">|</span>
Indexes:
<span class="s2">"tz_world_pkey"</span> PRIMARY KEY, btree <span class="o">(</span>gid<span class="o">)</span>
<span class="s2">"tz_world_geom_gist"</span> gist <span class="o">(</span>geom<span class="o">)</span>
</pre>
<p>So we have:</p>
<ul><li><em>gid</em>: an arbitrary id column</li>
<li><em>tzid</em>: holding the standards compliant textual timezone identification</li>
<li><em>geom</em>: holding polygons in <em>SRID</em> 4326.</li>
</ul><p>Also notice we have two indexes made for us:</p>
<ul><li><em>tz_world_pkey</em>: a simple B-tree index on our gid</li>
<li><em>tz_world_geom_gist</em>: a GiST index on our geometry</li>
</ul><p>This is a rather nice set, would you not say?</p>
<h3>Using the data</h3>
<p>So how do we go about using this data?</p>
<p>As I have said above, we need to figure out in which polygon (timezone) a certain point resides.</p>
<p>Let us take an arbitrary point on the earth:</p>
<ul><li>latitude: 35.362852</li>
<li>longitude: 140.196131</li>
</ul><p>This is a spot in the Chiba prefecture, central Japan.</p>
<p>Using the <em>Simple Features functions</em> we have available in PostGIS, it is trivial to find out in which polygon a certain point resides:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">tzid</span>
<span class="k">FROM</span> <span class="n">tz_world</span>
<span class="k">WHERE</span> <span class="n">ST_Intersects</span><span class="p">(</span><span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POINT(140.196131 35.362852)'</span><span class="p">,</span> <span class="mi">4326</span><span class="p">),</span> <span class="n">geom</span><span class="p">);</span>
</pre>
<p>And we get back:</p>
<pre class="code literal-block"> tzid
------------
Asia/Tokyo
</pre>
<p><em>Awesome!</em></p>
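<p>If you run this lookup from application code, it pays to isolate the one detail that is easy to get wrong: WKT expects longitude first, then latitude. Below is a minimal sketch in Python, not the code the site actually uses. The helper names <em>point_wkt</em> and <em>timezone_query_args</em> are made up for illustration, and the <em>%s</em> placeholder assumes a psycopg2-style driver.</p>

```python
# Parameterised version of the lookup query shown above.
# The %s placeholder style is an assumption (psycopg2-like driver).
TIMEZONE_QUERY = (
    "SELECT tzid FROM tz_world "
    "WHERE ST_Intersects(ST_GeomFromText(%s, 4326), geom)"
)


def point_wkt(latitude, longitude):
    """Return the WKT for a point. WKT wants x (longitude) before y (latitude)."""
    return "POINT({} {})".format(longitude, latitude)


def timezone_query_args(latitude, longitude):
    """Hypothetical helper: return (sql, params) for a DB-API cursor.execute()."""
    return TIMEZONE_QUERY, (point_wkt(latitude, longitude),)


sql, params = timezone_query_args(35.362852, 140.196131)
print(params[0])
```

<p>The returned pair can be handed to a cursor as <em>cursor.execute(sql, params)</em>, which leaves the quoting of the WKT string to the driver.</p>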
<p>In the above query I used the function <em>ST_Intersects</em> which checks if a given piece of geometry (our point) <em>shares any space</em> with another piece.
If we check the execution plan of this query:</p>
<pre class="code literal-block"><span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="n">tzid</span>
<span class="k">FROM</span> <span class="n">tz_world</span>
<span class="k">WHERE</span> <span class="n">ST_Intersects</span><span class="p">(</span><span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POINT(140.196131 35.362852)'</span><span class="p">,</span> <span class="mi">4326</span><span class="p">),</span> <span class="n">geom</span><span class="p">);</span>
</pre>
<p>We get back:</p>
<pre class="code literal-block"> QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using tz_world_geom_gist on tz_world <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.28..8.54 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>15<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.591..0.592 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Index Cond: <span class="o">(</span><span class="s1">'0101000020E61000006BD784B446866140E3A430EF71AE4140'</span>::geometry <span class="o">&amp;&amp;</span> geom<span class="o">)</span>
Filter: _st_intersects<span class="o">(</span><span class="s1">'0101000020E61000006BD784B446866140E3A430EF71AE4140'</span>::geometry, geom<span class="o">)</span>
Total runtime: 0.617 ms
</pre>
<p>That is not bad at all: a runtime of a little over 0.6 milliseconds, and it is using our GiST index.</p>
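<p>The long hexadecimal string in the plan is nothing mysterious: it is our point, written in PostGIS's extended well-known binary (EWKB) form. As a small sketch, the function below (the name <em>decode_ewkb_point</em> is my own, and it only handles 2D points) unpacks it with Python's standard <em>struct</em> module.</p>

```python
import struct


def decode_ewkb_point(hex_string):
    """Decode a hex-encoded 2D EWKB point into (srid, x, y).

    Illustrative only: other geometry types are rejected.
    """
    raw = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 means little endian, 0 big endian.
    byte_order = "<" if raw[0] == 1 else ">"
    (type_word,) = struct.unpack_from(byte_order + "I", raw, 1)
    has_srid = bool(type_word & 0x20000000)  # EWKB flag: an SRID follows
    if type_word & 0xFF != 1:
        raise ValueError("not a point geometry")
    offset = 5
    srid = None
    if has_srid:
        (srid,) = struct.unpack_from(byte_order + "I", raw, offset)
        offset += 4
    x, y = struct.unpack_from(byte_order + "dd", raw, offset)
    return srid, x, y


srid, x, y = decode_ewkb_point(
    "0101000020E61000006BD784B446866140E3A430EF71AE4140"
)
print(srid, x, y)
```

<p>The decoded values are the SRID 4326 followed by the longitude and latitude of our spot in Chiba, in that order.</p>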
<p>But, if a lookup is using our GiST index, a small alarm bell should go off inside your head. Remember my last chapter of the PostGIS series?
I kept on babbling about index usage and how geometry functions or operators can only use GiST indexes when they perform <em>bounding box</em> calculations.</p>
<p>The latter might pose a problem in our case, for bounding boxes are <em>very</em> rough approximations of the actual geometry.
This means that when we arrive near timezone borders, our calculations might just give us the wrong timezone.</p>
<p>So how can we fix this?</p>
<p>This time, we do not need to.</p>
<p><em>ST_Intersects</em> is one of the few <em>blessed</em> functions that both makes use of an index <em>and</em> is very accurate.</p>
<p><em>ST_Intersects</em> first uses the index to perform bounding box calculations. This filters out the majority of available geometry.
Then it performs a more expensive, but more accurate calculation (on a small subset) to check if the given point is <em>really</em> inside the returned matches.</p>
<p>We can thus simply use this function without any more magic...life is simple!</p>
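<p>To make the two stages concrete, here is a toy sketch in plain Python. It is not how PostGIS is implemented internally: the zones are invented shapes, the cheap stage is a plain bounding box comparison instead of a GiST index, and the exact stage is a simple ray casting test.</p>

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounding_box(point, box):
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Ray casting: count the edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[i - 1]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            crossing_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < crossing_x:
                inside = not inside
    return inside


def bbox_candidates(point, zones):
    """Cheap first stage: keep every zone whose bounding box holds the point."""
    return sorted(name for name, polygon in zones.items()
                  if in_bounding_box(point, bounding_box(polygon)))


def find_zones(point, zones):
    """Expensive second stage, run only on the survivors of the first."""
    return [name for name in bbox_candidates(point, zones)
            if in_polygon(point, zones[name])]


# Invented zones: "west" is L-shaped, "east" sits in its notch.
zones = {
    "west": [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)],
    "east": [(2, 2), (4, 2), (4, 4), (2, 4)],
}
print(bbox_candidates((3, 3), zones))
print(find_zones((3, 3), zones))
```

<p>The point (3, 3) falls inside the bounding box of both zones, which is exactly the border problem described above, yet only "east" survives the exact test.</p>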
<h3>Implementation</h3>
<p>Now it is fair to say that we do not wish to perform this calculation every time a user views a post. That would be neither efficient nor smart.</p>
<p>Rather, it is a good idea to generate this information at post time, and save it for later use.</p>
<p>The way I have set things up to save this information is twofold:</p>
<ul><li>I only save a UTC (GMT) generalized timestamp of when the post was made.</li>
<li>I made an extra column in my so-called "posts" table where I only save the string that represents the timezone (Asia/Tokyo in the above case).</li>
</ul><p>This keeps the date/time information in the database naive of any timezone and makes for easier calculations to give the time in either the client's timezone or in the timezone the post was originally written in.
You simply have one "root" time which you can move around timezones.</p>
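<p>As a sketch of what moving that "root" time around looks like, here is a small Python example using the standard <em>zoneinfo</em> module (Python 3.9 or later, with timezone data available on the machine). The function name <em>to_zone</em>, the timestamp and the second zone are made up for illustration; on the real site this step happens client side.</p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_zone(naive_utc, tzid):
    """Interpret a naive timestamp as UTC and express it in the zone `tzid`."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tzid))


# What the database holds: a naive UTC timestamp plus the tzid string.
posted_at = datetime(2014, 8, 20, 10, 0)
post_tzid = "Asia/Tokyo"

in_post_zone = to_zone(posted_at, post_tzid)
in_reader_zone = to_zone(posted_at, "America/Chicago")  # a hypothetical reader
print(in_post_zone.isoformat())
print(in_reader_zone.isoformat())
```

<p>Both results describe the same instant. Only the wall clock reading differs, which is why one stored timestamp plus one timezone string is enough.</p>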
<p>I have created a trigger that, on every insert of a new post, fetches the timezone and stores it in the designated column.
You could also fetch the timezone and update the post record using Python, but opting for an in-database solution saves you a few extra, unneeded round trips and is most likely a lot faster.</p>
<p>Let us see how we could create such a trigger.</p>
<p>A trigger in PostgreSQL is an event you can set to fire when certain conditions are met. The event(s) that fire have to be encapsulated inside a PostgreSQL function.
Let us thus first start by creating the function that will insert our timezone string.</p>
<h3>Creating functions</h3>
<p>In PostgreSQL you can write functions in either <em>C</em>, <em>Procedural</em> languages (PgSQL, Perl, Python) or plain <em>SQL</em>.</p>
<p>Creating functions with plain SQL is the most straightforward and easiest way. However, since we want to write a function that is to be used inside a trigger, we have an even better option.
We could employ the power of the embedded PostgreSQL procedural language to easily access and manipulate our newly inserted data.</p>
<p>First, let us see which query we would use to fetch the timezone and update our post record:</p>
<pre class="code literal-block"><span class="k">UPDATE</span> <span class="n">posts</span>
<span class="k">SET</span> <span class="n">tzid</span> <span class="o">=</span> <span class="n">timezone</span><span class="p">.</span><span class="n">tzid</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">tzid</span>
<span class="k">FROM</span> <span class="n">tz_world</span>
<span class="k">WHERE</span> <span class="n">ST_Intersects</span><span class="p">(</span>
<span class="n">ST_SetSRID</span><span class="p">(</span>
<span class="n">ST_MakePoint</span><span class="p">(</span><span class="mi">140</span><span class="p">.</span><span class="mi">196131</span><span class="p">,</span> <span class="mi">35</span><span class="p">.</span><span class="mi">362852</span><span class="p">),</span>
<span class="mi">4326</span><span class="p">),</span>
<span class="n">geom</span><span class="p">))</span> <span class="k">AS</span> <span class="n">timezone</span>
<span class="k">WHERE</span> <span class="n">pid</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</pre>
<p>This query will fetch the timezone string using a subquery and then update the correct record (a post with "pid" 1 in this example).</p>
<p>How do we pour this into a function?</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">FUNCTION</span> <span class="n">set_timezone</span><span class="p">()</span> <span class="k">RETURNS</span> <span class="k">TRIGGER</span> <span class="k">AS</span> <span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="k">UPDATE</span> <span class="n">posts</span>
<span class="k">SET</span> <span class="n">tzid</span> <span class="o">=</span> <span class="n">timezone</span><span class="p">.</span><span class="n">tzid</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="n">tzid</span>
<span class="k">FROM</span> <span class="n">tz_world</span>
<span class="k">WHERE</span> <span class="n">ST_Intersects</span><span class="p">(</span>
<span class="n">ST_SetSRID</span><span class="p">(</span>
<span class="n">ST_MakePoint</span><span class="p">(</span><span class="k">NEW</span><span class="p">.</span><span class="n">longitude</span><span class="p">,</span> <span class="k">NEW</span><span class="p">.</span><span class="n">latitude</span><span class="p">),</span>
<span class="mi">4326</span><span class="p">),</span>
<span class="n">geom</span><span class="p">))</span> <span class="k">AS</span> <span class="n">timezone</span>
<span class="k">WHERE</span> <span class="n">pid</span> <span class="o">=</span> <span class="k">NEW</span><span class="p">.</span><span class="n">pid</span><span class="p">;</span>
<span class="k">RETURN</span> <span class="k">NEW</span><span class="p">;</span>
<span class="k">END</span> <span class="err">$$</span>
<span class="k">LANGUAGE</span> <span class="n">PLPGSQL</span><span class="p">;</span>
</pre>
<p>First we use the syntax <em>CREATE OR REPLACE FUNCTION</em> to indicate we want to create (or replace) a custom function.
Then we tell PostgreSQL that this function will return type <em>TRIGGER</em>.</p>
<p>You might notice that we do not give this function any arguments. The reasoning here is that this function is "special".
Functions which are used as triggers automatically get access to information about the inserted data.</p>
<p>Inside the function you can see we access our latitude and longitude prefixed with <em>NEW</em>. The keywords <em>NEW</em> and <em>OLD</em> refer to the <em>record</em> as it looks after and before the triggering statement.
In our case only <em>NEW</em> is usable: an insert has no previous version of the row, so <em>OLD</em> holds no data in an <em>INSERT</em> trigger.
There are more keywords available (<em>TG_NAME</em>, <em>TG_RELID</em>, <em>TG_NARGS</em>, ...) which refer to properties of the trigger itself, but that is beyond today's scope.</p>
<p>The body of the function is wrapped between double dollar signs (<em>$$</em>). This is called <em>dollar quoting</em> and is the preferred way to quote a function body (as opposed to using single quotes).
Inside the quotes, the body, which in our case is mostly the SQL statement, is surrounded with a <em>BEGIN</em> and <em>END</em> keyword.</p>
<p>A trigger function always needs a <em>RETURN</em> statement, and it has to reside in the body of the function. In a <em>BEFORE</em> row trigger the returned record is the one that gets stored; in an <em>AFTER</em> trigger such as ours the return value is ignored, so returning <em>NEW</em> is simply a safe convention.</p>
<p>Near the end of our function we need to declare in which language this function was written, in our case <em>PLPGSQL</em>.</p>
<p>Finally, notice that we do not add a volatility keyword such as <em>IMMUTABLE</em> or <em>STABLE</em>. Those keywords promise the PostgreSQL planner that a function never modifies the database, and PostgreSQL refuses to run an <em>UPDATE</em> inside a function that carries them.
Our function writes to the "posts" table, so it has to keep the default volatility, <em>VOLATILE</em>.</p>
<h3>Creating triggers</h3>
<p>Now that we have this functionality wrapped into a tiny PLPGSQL function, we can go ahead and create the trigger.</p>
<p>First you have the events on which a trigger can execute:</p>
<ul><li>INSERT</li>
<li>UPDATE</li>
<li>DELETE</li>
<li>TRUNCATE</li>
</ul><p>Next, for each event you can specify at what timing your trigger has to fire:</p>
<ul><li>BEFORE</li>
<li>AFTER</li>
<li>INSTEAD OF</li>
</ul><p>The last one is a special timing, available only for triggers on views, by which you can replace the default behavior of the mentioned events.</p>
<p>For our use case, we are interested in executing our function <em>AFTER INSERT</em>.</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">TRIGGER</span> <span class="n">set_timezone</span>
<span class="k">AFTER</span> <span class="k">INSERT</span> <span class="k">ON</span> <span class="n">posts</span>
<span class="k">FOR</span> <span class="k">EACH</span> <span class="k">ROW</span>
<span class="k">EXECUTE</span> <span class="k">PROCEDURE</span> <span class="n">set_timezone</span><span class="p">();</span>
</pre>
<p>This will set up the trigger that fires after the insert of a new record.</p>
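<p>If you want to watch an <em>AFTER INSERT</em> trigger fill a column without setting up PostGIS first, the idea can be imitated with Python's built-in <em>sqlite3</em> module. Mind that this is a stand-in, not the PostgreSQL setup from this post: SQLite writes the trigger body inline instead of calling a function, and the "tz_boxes" table with its crude rectangles is invented to replace the real polygon lookup.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented stand-in for tz_world: one rough rectangle per timezone.
CREATE TABLE tz_boxes (
    tzid TEXT, min_lon REAL, max_lon REAL, min_lat REAL, max_lat REAL
);
INSERT INTO tz_boxes VALUES ('Asia/Tokyo', 129.0, 146.0, 30.0, 46.0);
INSERT INTO tz_boxes VALUES ('America/Chicago', -100.0, -85.0, 29.0, 49.0);

CREATE TABLE posts (
    pid INTEGER PRIMARY KEY, latitude REAL, longitude REAL, tzid TEXT
);

-- Same shape as the trigger above: after each inserted row,
-- look up the zone and store it on that row.
CREATE TRIGGER set_timezone
AFTER INSERT ON posts
FOR EACH ROW
BEGIN
    UPDATE posts
    SET tzid = (SELECT b.tzid
                FROM tz_boxes AS b
                WHERE NEW.longitude BETWEEN b.min_lon AND b.max_lon
                  AND NEW.latitude BETWEEN b.min_lat AND b.max_lat
                LIMIT 1)
    WHERE pid = NEW.pid;
END;
""")

conn.execute(
    "INSERT INTO posts (latitude, longitude) VALUES (?, ?)",
    (35.362852, 140.196131),
)
row = conn.execute("SELECT pid, tzid FROM posts").fetchone()
print(row)
```

<p>The insert itself never mentions the timezone, yet the stored row carries it, which is the same division of labor the PostgreSQL trigger gives us.</p>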
<h3>Wrapping it up</h3>
<p>Good, that is all there is to it.</p>
<p>We use a query, wrapped in a function, triggered by an insert event to inject the official timezone string which is deduced by PostGIS's spatial abilities.</p>
<p>Now you can use this information to get the exact timezone of where the post was made and use this to present the surfing client both the post timezone time and their local time.</p>
<p>For the curious ones out there: I used the <a href="http://momentjs.com/" title="MomentJS JavaScript library">MomentJS</a> library for the client side time parsing. This library offers a timezone extension which accepts these official timezone strings to calculate offsets. A lifesaver, so go check it out.</p>
<p>Also, be sure to follow the bros while they scooter across the States!</p>
<p>And as always...thanks for reading!</p></div></description><category>postgis</category><category>postgresql</category><category>timezone</category><guid>http://shisaa.be/postset/postgis-and-postgresql-in-action-timezones.html</guid><pubDate>Wed, 20 Aug 2014 10:00:00 GMT</pubDate></item><item><title>Postgis, PostgreSQL's spatial partner - Part 3</title><link>http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-3.html</link><dc:creator>Tim van der Linden</dc:creator><description><div><p>You have arrived at the final chapter of this PostGIS introduction series. Before continuing, I recommend you read <a href="http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-1.html" title="Part one of this series.">chapter one</a> and <a href="http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-2.html" title="Part two of this series.">chapter two</a> first.</p>
<p>In the last chapter we finished by doing some real world distance measuring and we saw how different projections produced different results.</p>
<p>Today I would like to take this practical approach a bit further and continue our work with real world data by showing you around the town of Kin in Okinawa. The town where I live.</p>
<h3>A word before we start</h3>
<p>In this chapter I want to do a few experiments together with you on real world data.
To gather this data, I would like to use OpenStreetMap because it is not only <em>open</em> but also gives us handy tools to export map information.</p>
<p>We will use a tool called <em>osm2pgsql</em> to load our OSM data into PostGIS-enabled tables.</p>
<p>However, it is more common to import and export real world GIS data by using the semi-closed ESRI standard <em>shapefile</em> format.
OpenStreetMap does not support exporting to this shapefile format directly, but exports to a more open XML file (.osm) instead.</p>
<p>Therefore, near the end of this post, we will briefly cover these shapefiles as well and see how we could import them into our PostgreSQL database.
But for the majority of our work today, I will focus on the OpenStreetMap approach.</p>
<h3>The preparation</h3>
<p>Let us commence with this adventure by first getting all the GIS data related to the whole of Okinawa.
We will only be interested in the data related to Kin town, but I need you to pull in a data set that is large enough (but still tiny in PostgreSQL terms) for us to experiment with indexing.</p>
<p>Hop online and download the file being served at the following URL: <a href="http://overpass-api.de/api/map?bbox=126.079,25.596,130.852,28.898">openstreetmap.org Okinawa island</a>.
It is a file of roughly 180 MB and covers most of the Okinawan main island. Save the presented "map" file.</p>
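<p>The four numbers in that URL form a bounding box, listed as west, south, east, north, so once again longitude comes before latitude. If you would like to grab a different area, a tiny helper like the sketch below (the function name <em>overpass_map_url</em> is my own invention) keeps that order straight.</p>

```python
def overpass_map_url(min_lon, min_lat, max_lon, max_lat):
    """Build the map download URL for a bounding box.

    The order is west, south, east, north: longitude before latitude.
    """
    bbox = ",".join(str(value) for value in (min_lon, min_lat, max_lon, max_lat))
    return "http://overpass-api.de/api/map?bbox=" + bbox


url = overpass_map_url(126.079, 25.596, 130.852, 28.898)
print(url)
```

<p>Called with the Okinawa values, it reproduces the URL linked above.</p>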
<p>Next we will need to install a third party tool which is specifically designed to import this OSM file into PostGIS.
This tool is called <em>osm2pgsql</em> and is available in many Linux distributions.</p>
<p>On a Debian system:</p>
<pre class="code literal-block">apt-get install osm2pgsql
</pre>
<h3>Loading foreign data</h3>
<p>Now we are ready to load in this data. But first, let us clean our "gis" database we used before.</p>
<p>Since all these import tools will create their own PostGIS enabled tables, we can delete our "shapes" table. Connect to your "gis" database and drop this table:</p>
<pre class="code literal-block"><span class="k">DROP</span> <span class="k">TABLE</span> <span class="n">shapes</span><span class="p">;</span>
</pre>
<p>Using this new tool, repopulate the "gis" database with the data you just downloaded:</p>
<pre class="code literal-block">osm2pgsql -s -U postgres -d gis map
</pre>
<p>If everything went okay, you will get a small report containing the information about all the tables <em>and</em> indexes that were created.</p>
<p>Let us see what we just did.</p>
<p>First we ran <em>osm2pgsql</em> with the <em>-s</em> flag. This flag enables <em>slim</em> mode, which means it will use a database on disk, rather than processing all the GIS data in RAM.
Processing everything in RAM not only potentially slows down your machine for larger data sets, it also makes fewer features available.</p>
<p>Next we tell the tool to connect as the user "postgres" and load the data into the "gis" database. The final argument is the "map" file you just downloaded.</p>
<h3>What do we have now?</h3>
<p>Open up a database console and let us describe our database to see what this tool just did:</p>
<pre class="code literal-block"><span class="err">\</span><span class="n">d</span>
</pre>
<p>As you can see, it inserted 7 new tables:</p>
<pre class="code literal-block">Schema <span class="p">|</span> Name <span class="p">|</span> Type <span class="p">|</span> Owner
--------+--------------------+-------+----------
public <span class="p">|</span> geography_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> geometry_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_line <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_nodes <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_point <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_polygon <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_rels <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_roads <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> planet_osm_ways <span class="p">|</span> table <span class="p">|</span> postgres
public <span class="p">|</span> raster_columns <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> raster_overviews <span class="p">|</span> view <span class="p">|</span> postgres
public <span class="p">|</span> spatial_ref_sys <span class="p">|</span> table <span class="p">|</span> postgres
</pre>
<p>The other 5 views and tables are the good old PostGIS bookkeeping.</p>
<p>It is also important, yet less relevant for our work here today, to know that these tables, or rather the way <em>osm2pgsql</em> imports them, are optimized to work with <em>Mapnik</em>.
Mapnik is an open-source map rendering software package used for both web and offline usage.</p>
<p>The tables that are imported contain many different types of information. Let me quickly go over them to give you a basic feeling of how the import happened:</p>
<ul><li>planet_osm_line: holds all non-closed pieces of geometry (called <em>ways</em>) at a high resolution. They mostly represent actual roads and are used when looking at a small, zoomed-in detail of a map.</li>
<li>planet_osm_nodes: an intermediate table that holds the raw point data (points in lat/long) with a corresponding "osm_id" to map them to other tables</li>
<li>planet_osm_point: holds all points-of-interest together with their OSM tags - tags that describe what they represent</li>
<li>planet_osm_polygon: holds all closed pieces of geometry (also called <em>ways</em>) like buildings, parks, lakes, areas, ...</li>
<li>planet_osm_rels: an intermediate table that holds extra connecting information about polygons</li>
<li>planet_osm_roads: holds lower resolution, non-closed pieces of geometry in contrast with "planet_osm_line". This data is used when looking from a greater distance, covering a large area and thus showing little detail about smaller, local roads.</li>
<li>planet_osm_ways: an intermediate table which holds non-closed geometry in raw format</li>
</ul><p>We will now continue working with a small subset of this data.</p>
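<p>If you want a rough feeling for how much data ended up in each of these tables, the statistics views can tell you. This is only a sketch: it assumes the tables carry the default "planet_osm" prefix, and the <em>n_live_tup</em> column holds an estimate, not an exact count.</p>
<pre class="code literal-block">-- Approximate row counts for the tables osm2pgsql created
-- (assumes the default "planet_osm" table prefix).
SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname LIKE 'planet_osm_%'
ORDER BY relname;
</pre>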
<p>Let us take a peek at the polygon table, for example. First, let us see what we have available:</p>
<pre class="code literal-block"><span class="err">\</span><span class="n">d</span> <span class="n">planet_osm_polygon</span>
</pre>
<p>That is quite a big list, but the major part of these columns are of mere TEXT type and contain human-readable information about the geometry stored.
These columns correspond with the way OpenStreetMap categorizes its data and with the way you could use the Mapnik software described above.</p>
<p>Let us do a targeted query:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">building</span><span class="p">,</span> <span class="n">ST_AsText</span><span class="p">(</span><span class="n">way</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span>
<span class="k">WHERE</span> <span class="n">building</span> <span class="o">=</span> <span class="s1">'industrial'</span><span class="p">;</span>
</pre>
<p>Notice that I use the <em>output</em> function <em>ST_AsText()</em> to convert to a human readable WKT string.
Also, I am only interested in some of the industrial buildings, so I set the building type to <em>industrial</em>.</p>
<p>The result:</p>
<pre class="code literal-block"> name <span class="p">|</span> building <span class="p">|</span> st_astext
----------------------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------
沖ハム <span class="o">(</span>Okiham<span class="o">)</span> <span class="p">|</span> industrial <span class="p">|</span> POLYGON<span class="o">((</span>14221927.83 3049797.01,14222009.77 3049839.68,14222074.84 3049714.68,14222028.9 3049690.76,14221996.33 3049753.33,14221960.34 3049734.58,14221927.83 3049797.01<span class="o">))</span>
Kin Thermal Power Plant Coal storage building <span class="p">|</span> industrial <span class="p">|</span> POLYGON<span class="o">((</span>14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72<span class="o">))</span>
Kin Thermal Power Plant Exhaust tower <span class="p">|</span> industrial <span class="p">|</span> POLYGON<span class="o">((</span>14240167.1 3054497.14,14240172.26 3054507.93,14240176.04 3054515.82,14240195.76 3054506.39,14240186.84 3054487.7,14240167.1 3054497.14<span class="o">))</span>
</pre>
<p>We get back three records containing one industrial building each, described with a closed polygon. Cool.</p>
<p>Now, I can assure you that Okinawa has more than three industrial buildings, but do remember that we are looking at a rather rural island.
OpenStreetMap relies greatly on user generated content and there simply are not many users who have felt the need to index the industrial buildings here in this neck of the woods.</p>
<p>The <em>planet_osm_polygon</em> table does contain a little over 6000 buildings of various types, which is still a small number, but for our purpose today I am only interested in the latter two, which both lie here in Kin town.</p>
<p>Also, if you would, for example, take a chunk of Tokyo, where there are hundreds of active OpenStreetMap contributors, you will find that many buildings are present and are sometimes even more accurately represented than in some other online proprietary mapping solutions offered by some famous search engines. Ahum.</p>
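<p>To see for yourself how the buildings in your own extract are spread over the different types, a simple aggregate will do. A small sketch, assuming that polygons which are not buildings simply carry NULL in the "building" column:</p>
<pre class="code literal-block">-- Count the buildings per type, most common type first.
SELECT building, count(*) AS total
FROM planet_osm_polygon
WHERE building IS NOT NULL
GROUP BY building
ORDER BY total DESC;
</pre>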
<p>Before continuing, though, I would like to delete two GiST indexes that "osm2pgsql" made for us, purely to be able to demonstrate the importance of an index.</p>
<p>For now, just take my word and delete the indexes on all the geometry columns of the tables we will use today:</p>
<pre class="code literal-block"><span class="k">DROP</span> <span class="k">INDEX</span> <span class="n">planet_osm_line_index</span><span class="p">;</span>
<span class="k">DROP</span> <span class="k">INDEX</span> <span class="n">planet_osm_polygon_index</span><span class="p">;</span>
</pre>
<p>Then perform a VACUUM:</p>
<pre class="code literal-block"><span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_line</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_polygon</span><span class="p">;</span>
</pre>
<p><em>VACUUM</em> cleans up dead rows and <em>ANALYZE</em> refreshes the statistics the query planner relies on, so together they make sure PostgreSQL has an up-to-date picture of both tables now that we removed the indexes.</p>
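<p>If you want to double check that the indexes are really gone, you can ask the <em>pg_indexes</em> system view. Keep in mind that the index names used above are the ones "osm2pgsql" chose on my machine; depending on your version they could be named differently, and other indexes (a primary key, for example) may still be listed.</p>
<pre class="code literal-block">-- List whatever indexes are left on the two tables we work with.
-- The two GiST indexes we dropped should no longer show up.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('planet_osm_line', 'planet_osm_polygon');
</pre>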
<p>The first thing I would like to find out is how large these buildings actually are.
We cannot measure how tall they are, for we are working with two dimensional data here, but we can measure their footprint on the map.</p>
<p>Since PostGIS makes all of our work easy, we could simply employ a function to tell us this information:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">ST_Area</span><span class="p">(</span><span class="n">way</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span>
<span class="k">WHERE</span> <span class="n">building</span> <span class="o">=</span> <span class="s1">'industrial'</span><span class="p">;</span>
</pre>
<p>And get back:</p>
<pre class="code literal-block"> st_area
------------------
10155.3935499731
33381.1043500491
452.9464999972
</pre>
<p>As we know from the previous chapter, to be able to know what these numbers mean, we have to find out in which SRID this data was saved.
You could either describe the table again and look at the geometry column description, or use an <em>accessor</em> function <em>ST_SRID()</em>, to find it:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">ST_SRID</span><span class="p">(</span><span class="n">way</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span>
<span class="k">WHERE</span> <span class="n">building</span> <span class="o">=</span> <span class="s1">'industrial'</span><span class="p">;</span>
</pre>
<p>And get back:</p>
<pre class="code literal-block"> <span class="n">st_srid</span>
<span class="c1">---------</span>
<span class="mi">900913</span>
<span class="mi">900913</span>
<span class="mi">900913</span>
</pre>
<p>You could also query the PostGIS bookkeeping directly and look in the <em>geometry_columns</em> view:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">f_table_name</span><span class="p">,</span> <span class="n">f_geometry_column</span><span class="p">,</span> <span class="n">coord_dimension</span><span class="p">,</span> <span class="n">srid</span><span class="p">,</span> <span class="k">type</span>
<span class="k">FROM</span> <span class="n">geometry_columns</span><span class="p">;</span>
</pre>
<p>This view holds information about all the geometry columns in our PostGIS enabled database.
Our above query will return a list containing all the GIS describing information we saw in the previous chapter.</p>
<p>Nice. Both our buildings are stored in a geometry column and have an SRID of <em>900913</em>. We can now use our <em>spatial_ref_sys</em> table to look up this ID:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">srid</span><span class="p">,</span> <span class="n">auth_name</span><span class="p">,</span> <span class="n">auth_srid</span><span class="p">,</span> <span class="n">srtext</span><span class="p">,</span> <span class="n">proj4text</span>
<span class="k">FROM</span> <span class="n">spatial_ref_sys</span>
<span class="k">WHERE</span> <span class="n">srid</span> <span class="o">=</span> <span class="mi">900913</span><span class="p">;</span>
</pre>
<p>As you can see, this is basically a Mercator projection used by OpenStreetMap.
In the "proj4text" column we can see that its units are meters.</p>
<p>This means that the areas we got back are in <em>square meters</em>.</p>
<p>In this map (only looking at the latter two Kin buildings) we thus have a building with a total area of roughly 33,381 <em>square meters</em> (about 0.033 <em>square kilometers</em>) and a more modest building of around 452 <em>square meters</em>.
The former is a coal storage facility belonging to the <em>Kin Thermal Power Plant</em> and is indeed <em>huge</em>.
The second building represents the exhaust tower of that same plant.</p>
<p>You have just measured the area these buildings occupy, very neat right?</p>
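<p>One caveat: a Mercator projection stretches geometry the further you move away from the equator, so the numbers above overstate the footprint on the ground somewhat. As a sketch, and assuming your <em>spatial_ref_sys</em> table holds a usable definition for SRID 900913, you could transform the geometry to WGS 84 (SRID 4326) and cast it to <em>geography</em>, which makes <em>ST_Area()</em> calculate on the spheroid and return square meters:</p>
<pre class="code literal-block">-- Compare the area in projected (Mercator) units with the area
-- measured on the spheroid. Assumes SRID 900913 can be transformed
-- to 4326 in your database.
SELECT name,
       ST_Area(way) AS projected_area,
       ST_Area(ST_Transform(way, 4326)::geography) AS ground_area
FROM planet_osm_polygon
WHERE building = 'industrial';
</pre>
<p>No results are quoted here, since they depend on your import, but you should expect "ground_area" to come out smaller than "projected_area" at Okinawa's latitude.</p>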
<p>Now, let us find out which road runs next to this power plant, just in case we wish to drive there.
It is important to note that OSM (like many other mapping solutions) divides roads into different types.</p>
<p>You have trunk roads, highways, secondary roads, tertiary roads, etc.
I am now interested to find the nearest <em>secondary</em> road.</p>
<p>To get a list of all the secondary roads in Okinawa, simply query the <em>planet_osm_line</em> table:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="k">ref</span><span class="p">,</span> <span class="n">ST_AsText</span><span class="p">(</span><span class="n">way</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span><span class="p">;</span>
</pre>
<p>We now get back all the linestring objects together with their reference inside of OSM.
The reference refers to the actual route number each road has.</p>
<p>The total count should be around <em>3215</em> pieces of geometry, which is already a nice list to work with.</p>
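<p>To check the count on your own import, a plain aggregate is enough. Your number will most likely differ from mine if your extract is newer:</p>
<pre class="code literal-block">-- Number of secondary road pieces in the line table.
SELECT count(*)
FROM planet_osm_line
WHERE highway = 'secondary';
</pre>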
<p>Let us now see which of these roads is closest to our coal storage building.</p>
<p>To find out how far away something is (a nearest neighbor search) we could use the <em>ST_Distance()</em> function from the previous chapter and perform the following lookup:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span><span class="p">,</span> <span class="n">road</span><span class="p">.</span><span class="k">ref</span><span class="p">,</span> <span class="n">ST_Distance</span><span class="p">(</span><span class="n">road</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="n">building</span><span class="p">.</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="n">distance</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span> <span class="k">AS</span> <span class="n">building</span><span class="p">,</span> <span class="n">planet_osm_line</span> <span class="k">AS</span> <span class="n">road</span>
<span class="k">WHERE</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span> <span class="k">AND</span> <span class="n">building</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">distance</span><span class="p">;</span>
</pre>
<p>This will bring us:</p>
<pre class="code literal-block"> highway <span class="p">|</span> ref <span class="p">|</span> distance
-----------+-----+------------------
secondary <span class="p">|</span> <span class="m">329</span> <span class="p">|</span> 417.374986575458
secondary <span class="p">|</span> <span class="m">104</span> <span class="p">|</span> 2258.90394593648
secondary <span class="p">|</span> <span class="m">104</span> <span class="p">|</span> 2709.00178089638
secondary <span class="p">|</span> <span class="m">104</span> <span class="p">|</span> 2745.76782385198
secondary <span class="p">|</span> <span class="m">234</span> <span class="p">|</span> 5897.78205314507
...
</pre>
<p>Cool, secondary route 329 is the closest to our coal storage building with a distance of <em>417 meters.</em></p>
<p>While this will return quite accurate results, there is one problem with this query. Indexes are not being used.
And every time an index is potentially left alone, you should start to worry, especially with larger data sets.</p>
<p>How do I know they are ignored? Simple, we did not make any indexes (and we deleted the ones made by "osm2pgsql")...which makes me pretty sure we cannot use them.</p>
<p>I refer you to <a href="http://shisaa.be/postset/postgresql-full-text-search-part-3.html">chapter three</a> of my PostgreSQL Full Text series where I talk a bit more about GiST and B-Tree index types.
And, as I also say in that chapter, I highly recommend reading Markus Winand's <a href="http://use-the-index-luke.com/" title="Use The Index, Luke series written by Markus Winand.">Use The Index, Luke</a> series, which explains in great detail how database indexes work.</p>
<p>The first thing to realize is that an index will only be used if the data set on which it is built is of sufficient size.
PostgreSQL has a cost-based decision maker built in, called the <em>query planner</em>, which decides whether or not to use an index.</p>
<p>If your data set is small enough, a more traditional <em>Sequential Scan</em> will be just as fast or faster.</p>
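<p>The planner bases this decision on statistics it keeps about each table. You can peek at the most basic ones, the estimated number of rows and the number of disk pages, in the <em>pg_class</em> catalog. A small sketch; both values are estimates that get refreshed by <em>VACUUM</em> and <em>ANALYZE</em>:</p>
<pre class="code literal-block">-- Estimated row count and size in pages, as seen by the planner.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('planet_osm_line', 'planet_osm_polygon');
</pre>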
<p>To know what is going on <em>exactly</em> and to know <em>how fast</em> our query runs, we have the <em>EXPLAIN</em> command at our disposal.</p>
<h3>Speeding things up</h3>
<p>Let us <em>EXPLAIN</em> the query we have just run:</p>
<pre class="code literal-block"><span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span><span class="p">,</span> <span class="n">road</span><span class="p">.</span><span class="k">ref</span><span class="p">,</span> <span class="n">ST_Distance</span><span class="p">(</span><span class="n">road</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="n">building</span><span class="p">.</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="n">distance</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span> <span class="k">AS</span> <span class="n">building</span><span class="p">,</span> <span class="n">planet_osm_line</span> <span class="k">AS</span> <span class="n">road</span>
<span class="k">WHERE</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span> <span class="k">AND</span> <span class="n">building</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">distance</span><span class="p">;</span>
</pre>
<p>We simply put the keyword <em>EXPLAIN</em> right in front of our normal query, together with <em>ANALYZE</em>, which actually executes the query and reports the real timings and the total runtime.</p>
<p>The result:</p>
<pre class="code literal-block">Sort <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>5047.50..5055.32 <span class="nv">rows</span><span class="o">=</span><span class="m">3129</span> <span class="nv">width</span><span class="o">=</span>391<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>41.481..41.815 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Sort Key: <span class="o">(</span>st_distance<span class="o">(</span>road.way, building.way<span class="o">))</span>
Sort Method: quicksort Memory: 348kB
-&gt; Nested Loop <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..4309.34 <span class="nv">rows</span><span class="o">=</span><span class="m">3129</span> <span class="nv">width</span><span class="o">=</span>391<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>1.188..38.617 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
-&gt; Seq Scan on planet_osm_polygon building <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..279.01 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>207<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.981..1.409 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Filter: <span class="o">(</span><span class="nv">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>::text<span class="o">)</span>
Rows Removed by Filter: 6320
-&gt; Seq Scan on planet_osm_line road <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..3216.79 <span class="nv">rows</span><span class="o">=</span><span class="m">3129</span> <span class="nv">width</span><span class="o">=</span>184<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.166..26.524 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Filter: <span class="o">(</span><span class="nv">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>::text<span class="o">)</span>
Rows Removed by Filter: 73488
Total runtime: 42.153 ms
</pre>
<p>That is a lot of output, but it shows you how the internal planner executes our query and which decisions it makes along the way.</p>
<p>To fully interpret a query plan (this is still a simple one), a lot more knowledge is needed and this would easily deserve its own <em>series</em>.
I am by far not an expert in the query planner (though it is an interesting study topic), but I will do my best to extract the important bits we need for our direct performance tuning.</p>
<p>A query plan is always made up of nested nodes, the parent node containing all the accumulated information (costs, rows, ...) of its child nodes.</p>
<p>Inside the nested loop parent node we see above, we can find that the planner decided to use two filters, which correspond to the <em>WHERE</em> clause conditions of our query (building.name and road.highway).
You can see that both child nodes are of <em>Seq Scan</em> type, which means <em>Sequential Scan</em>. These types of nodes read the whole table, simply from top to bottom, from disk or from PostgreSQL's cache.</p>
<p>Another important thing to note is the total time this query costs, which is <em>42.153 ms</em>.
The time reported here is the time on my local machine. Depending on how decent your computer is, this time could vary.</p>
<p>A detail not to forget when looking at this timing, is the fact that it is slightly skewed if compared to real-world application use:</p>
<ul><li>We neglect network/client traffic. This query now runs internally and does not need to communicate with a client driver (which almost always brings extra overhead)</li>
<li>The time measurement itself also introduces overhead.</li>
</ul><p>The total runtime from our above plan does not sound like a big number, but we are working with a rather small data set - the area of Okinawa is large, but the geometry is rather sparse.</p>
<p>So our first reaction should be: this can be better.</p>
<p>First, let us try to get rid of these sequential scans, for they are a clear indication that the planner does not use an index.</p>
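<p>If you are curious how much of that scanning actually hits the disk and how much is served from PostgreSQL's own cache, <em>EXPLAIN</em> accepts a <em>BUFFERS</em> option (available since PostgreSQL 9.0). A sketch using the same query as before:</p>
<pre class="code literal-block">-- Same query, now also reporting buffer usage per plan node.
-- "shared hit" pages came from PostgreSQL's cache, "read" pages
-- had to be requested from the operating system.
EXPLAIN (ANALYZE, BUFFERS)
SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
FROM planet_osm_polygon AS building, planet_osm_line AS road
WHERE road.highway = 'secondary' AND building.name = 'Kin Thermal Power Plant Coal storage building'
ORDER BY distance;
</pre>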
<h4>Creating indexes</h4>
<p>In our case we want to make two types of indexes:</p>
<ul><li>Indexes on our "meta" data, the names and other attributes describing our geometrical data</li>
<li>Indexes that actually index our geometrical data itself</li>
</ul><p>Let us start with our attributes columns.</p>
<p>These are all simple VARCHAR, TEXT or INT columns, so the good old Balanced Tree or <em>B-Tree</em> can be used here.
In our query above we use "road.highway" and "building.name" in our lookup, so let us make a couple of indexes that adhere to this query.
Remember, an index only makes sense if it is built the same way your queries question your data.</p>
<p>First, the "highway" column of the "planet_osm_line" table:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">planet_osm_line_highway_index</span> <span class="k">ON</span> <span class="n">planet_osm_line</span><span class="p">(</span><span class="n">highway</span><span class="p">);</span>
</pre>
<p>The syntax is trivial. You simply tell PostgreSQL to create an index, give it a name, and tell it on which column(s) of which table you want it to be built.
PostgreSQL will always default to the <em>B-Tree</em> index type.</p>
<p>Next, the name column:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">planet_osm_polygon_name_index</span> <span class="k">ON</span> <span class="n">planet_osm_polygon</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
</pre>
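<p>An index is not free: it takes up disk space and has to be maintained on every write. A quick sketch to see how large the two new indexes are, using the names we just gave them:</p>
<pre class="code literal-block">-- On-disk size of the two B-Tree indexes created above.
SELECT pg_size_pretty(pg_relation_size('planet_osm_line_highway_index')) AS highway_index,
       pg_size_pretty(pg_relation_size('planet_osm_polygon_name_index')) AS name_index;
</pre>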
<p>Now perform another <em>VACUUM ANALYZE</em> on both tables:</p>
<pre class="code literal-block"><span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_line</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_polygon</span><span class="p">;</span>
</pre>
<p>Let us run <em>EXPLAIN ANALYZE</em> again on the exact same query:</p>
<pre class="code literal-block"><span class="n">Sort</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">4058</span><span class="p">.</span><span class="mi">73</span><span class="p">..</span><span class="mi">4066</span><span class="p">.</span><span class="mi">56</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3129</span> <span class="n">width</span><span class="o">=</span><span class="mi">394</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">20</span><span class="p">.</span><span class="mi">817</span><span class="p">..</span><span class="mi">21</span><span class="p">.</span><span class="mi">149</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3215</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Sort</span> <span class="k">Key</span><span class="p">:</span> <span class="p">(</span><span class="n">st_distance</span><span class="p">(</span><span class="n">road</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="n">building</span><span class="p">.</span><span class="n">way</span><span class="p">))</span>
<span class="n">Sort</span> <span class="k">Method</span><span class="p">:</span> <span class="n">quicksort</span> <span class="n">Memory</span><span class="p">:</span> <span class="mi">348</span><span class="n">kB</span>
<span class="o">-&gt;</span> <span class="n">Nested</span> <span class="n">Loop</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">72</span><span class="p">.</span><span class="mi">95</span><span class="p">..</span><span class="mi">3310</span><span class="p">.</span><span class="mi">07</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3129</span> <span class="n">width</span><span class="o">=</span><span class="mi">394</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">1</span><span class="p">.</span><span class="mi">356</span><span class="p">..</span><span class="mi">17</span><span class="p">.</span><span class="mi">743</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3215</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="o">-&gt;</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">planet_osm_polygon_name_index</span> <span class="k">on</span> <span class="n">planet_osm_polygon</span> <span class="n">building</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">28</span><span class="p">..</span><span class="mi">8</span><span class="p">.</span><span class="mi">30</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">207</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">054</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">056</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="o">-&gt;</span> <span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">planet_osm_line</span> <span class="n">road</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">72</span><span class="p">.</span><span class="mi">67</span><span class="p">..</span><span class="mi">2488</span><span class="p">.</span><span class="mi">23</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3129</span> <span class="n">width</span><span class="o">=</span><span class="mi">187</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">1</span><span class="p">.</span><span class="mi">258</span><span class="p">..</span><span class="mi">4</span><span class="p">.</span><span class="mi">661</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3215</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="o">-&gt;</span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">planet_osm_line_highway_index</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">71</span><span class="p">.</span><span class="mi">89</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3129</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">864</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">864</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3215</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="n">Total</span> <span class="n">runtime</span><span class="p">:</span> <span class="mi">21</span><span class="p">.</span><span class="mi">527</span> <span class="n">ms</span>
</pre>
<p>You can see that we now traded our <em>Seq Scan</em> for <em>Index Scan</em> and <em>Bitmap Heap Scan</em>, which indicates that our attribute indexes are being used, yay!</p>
<p>The so-called <em>Bitmap Heap Scan</em>, instead of a <em>Sequential Scan</em>, is performed when the planner decides it can use the index to gather all the rows it thinks it needs, sort them in the physical order in which they are stored and then fetch the data from the table on disk in the most optimized way possible (trying to open each disk page only once).</p>
<p>The order by which the <em>Bitmap Heap Scan</em> arranges the data is directed by the child node aka the <em>Bitmap Index Scan</em>. This latter type of node is the one doing the actual searching <em>inside</em> the index. Because in our <em>WHERE</em> clause we have a condition which tells PostgreSQL to limit the rows to the ones of "highway" type "secondary", the <em>Bitmap Index Scan</em> fetches the needed rows from our <em>B-Tree</em> index we just made and passes them to its parent, the <em>Bitmap Heap Scan</em>, which then goes on to order the geometry rows to be fetched.</p>
<p>This already helped a lot: our query runtime dropped to half. Now, let us make the indexes for our actual geometry, and see the effect:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">planet_osm_line_way</span> <span class="k">ON</span> <span class="n">planet_osm_line</span> <span class="k">USING</span> <span class="n">gist</span><span class="p">(</span><span class="n">way</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">planet_osm_polygon_way</span> <span class="k">ON</span> <span class="n">planet_osm_polygon</span> <span class="k">USING</span> <span class="n">gist</span><span class="p">(</span><span class="n">way</span><span class="p">);</span>
</pre>
<p>Creating a <em>GiST</em> index is quite similar to creating a normal <em>B-Tree</em> index. The only difference is that you specify that the index is to be built with <em>GiST</em>.</p>
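<p>To double-check that the new indexes are in place and really are <em>GiST</em> indexes, you could query the standard PostgreSQL <em>pg_indexes</em> view. This is a small sketch that is not part of the original walkthrough; the "indexdef" column shows the access method each index was built with:</p>
<pre class="code literal-block">-- List all indexes on our two tables, including their definitions
SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('planet_osm_line', 'planet_osm_polygon')
    ORDER BY tablename, indexname;
</pre>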
<p>Vacuum:</p>
<pre class="code literal-block"><span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_line</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_polygon</span><span class="p">;</span>
</pre>
<p>Now poke it again with the same query and see our new plan:</p>
<pre class="code literal-block">Sort <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>4038.82..4046.54 <span class="nv">rows</span><span class="o">=</span><span class="m">3089</span> <span class="nv">width</span><span class="o">=</span>395<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>21.137..21.479 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Sort Key: <span class="o">(</span>st_distance<span class="o">(</span>road.way, building.way<span class="o">))</span>
Sort Method: quicksort Memory: 348kB
-&gt; Nested Loop <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>72.64..3299.76 <span class="nv">rows</span><span class="o">=</span><span class="m">3089</span> <span class="nv">width</span><span class="o">=</span>395<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>1.382..17.858 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
-&gt; Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.28..8.30 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>207<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.041..0.044 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Index Cond: <span class="o">(</span><span class="nv">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>::text<span class="o">)</span>
-&gt; Bitmap Heap Scan on planet_osm_line road <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>72.36..2488.32 <span class="nv">rows</span><span class="o">=</span><span class="m">3089</span> <span class="nv">width</span><span class="o">=</span>188<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>1.297..4.726 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Recheck Cond: <span class="o">(</span><span class="nv">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>::text<span class="o">)</span>
-&gt; Bitmap Index Scan on planet_osm_line_highway_index <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..71.59 <span class="nv">rows</span><span class="o">=</span><span class="m">3089</span> <span class="nv">width</span><span class="o">=</span>0<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.866..0.866 <span class="nv">rows</span><span class="o">=</span><span class="m">3215</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Index Cond: <span class="o">(</span><span class="nv">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>::text<span class="o">)</span>
Total runtime: 21.873 ms
</pre>
<p>Hmm, the plan did not change at all, and our runtime is roughly identical. Why is our performance still the same?</p>
<p>The culprit here is <em>ST_Distance()</em>.</p>
<p>As it turns out, this function is unable to use the <em>GiST</em> index and is therefore not a good candidate to set loose on your whole result set. The same goes for the <em>ST_Area()</em> function, by the way.</p>
<p>So we need a way to limit the number of records we do this expensive calculation on.</p>
<h4>ST_DWithin()</h4>
<p>We introduce a new function: <em>ST_DWithin()</em>. This function could be our savior in this case, for it does use the <em>GiST</em> index.</p>
<p>Whether or not a function (or operator) can use the <em>GiST</em> index depends on whether it uses <em>bounding boxes</em> when performing calculations.
The reason is that <em>GiST</em> indexes mainly store bounding box information and not the exact geometry itself.</p>
<p><em>ST_DWithin()</em> checks whether a given piece of geometry is within a radius of another piece of geometry and simply returns <em>TRUE</em> or <em>FALSE</em>.
We can thus use it in our <em>WHERE</em> clause to filter out geometry for which it returns <em>FALSE</em> (and which thus does not fall within the radius).
It starts this check with bounding boxes, and is thus able to retrieve that information from our <em>GiST</em> index.</p>
<p>Let me present you with a query that limits the result set based on what <em>ST_DWithin()</em> finds:</p>
<pre class="code literal-block"><span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span><span class="p">,</span> <span class="n">road</span><span class="p">.</span><span class="k">ref</span><span class="p">,</span> <span class="n">ST_Distance</span><span class="p">(</span><span class="n">road</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="n">building</span><span class="p">.</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="n">distance</span>
<span class="k">FROM</span> <span class="n">planet_osm_polygon</span> <span class="k">AS</span> <span class="n">building</span><span class="p">,</span> <span class="n">planet_osm_line</span> <span class="k">AS</span> <span class="n">road</span>
<span class="k">WHERE</span> <span class="n">road</span><span class="p">.</span><span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>
<span class="k">AND</span> <span class="n">building</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>
<span class="k">AND</span> <span class="n">ST_DWithin</span><span class="p">(</span><span class="n">road</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="n">building</span><span class="p">.</span><span class="n">way</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">distance</span><span class="p">;</span>
</pre>
<p>As you can see we simply added one more condition to our <em>WHERE</em> clause to limit the returned geometry by radius.
This will result in the following plan:</p>
<pre class="code literal-block">Sort <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>45.66..45.67 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>395<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>6.048..6.052 <span class="nv">rows</span><span class="o">=</span><span class="m">27</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Sort Key: <span class="o">(</span>st_distance<span class="o">(</span>road.way, building.way<span class="o">))</span>
Sort Method: quicksort Memory: 27kB
-&gt; Nested Loop <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>4.63..45.65 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>395<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>3.157..6.005 <span class="nv">rows</span><span class="o">=</span><span class="m">27</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
-&gt; Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.28..8.30 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>207<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.051..0.054 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Index Cond: <span class="o">(</span><span class="nv">name</span> <span class="o">=</span> <span class="s1">'Kin Thermal Power Plant Coal storage building'</span>::text<span class="o">)</span>
-&gt; Bitmap Heap Scan on planet_osm_line road <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>4.34..37.09 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>188<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>3.090..5.771 <span class="nv">rows</span><span class="o">=</span><span class="m">27</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Recheck Cond: <span class="o">(</span>way <span class="o">&amp;&amp;</span> st_expand<span class="o">(</span>building.way, 10000::double precision<span class="o">))</span>
Filter: <span class="o">((</span><span class="nv">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>::text<span class="o">)</span> AND <span class="o">(</span>building.way <span class="o">&amp;&amp;</span> st_expand<span class="o">(</span>way, 10000::double precision<span class="o">))</span> AND _st_dwithin<span class="o">(</span>way, building.way, 10000::double precision<span class="o">))</span>
Rows Removed by Filter: 4838
-&gt; Bitmap Index Scan on planet_osm_line_way <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..4.34 <span class="nv">rows</span><span class="o">=</span><span class="m">8</span> <span class="nv">width</span><span class="o">=</span>0<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>1.978..1.978 <span class="nv">rows</span><span class="o">=</span><span class="m">4865</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Index Cond: <span class="o">(</span>way <span class="o">&amp;&amp;</span> st_expand<span class="o">(</span>building.way, 10000::double precision<span class="o">))</span>
Total runtime: 6.181 ms
</pre>
<p>Good. We have just gone down to only <em>6.181 ms</em>. That seems to be much more efficient.</p>
<p>As you can see, our query plan got a few new rows. The main thing to notice is that our <em>Bitmap Heap Scan</em> got a new <em>Recheck Cond</em>: our expanded <em>ST_DWithin()</em> condition.
Further down, you can see that the condition is being pulled from the <em>GiST</em> index:</p>
<pre class="code literal-block">Index Cond: <span class="o">(</span>way <span class="o">&amp;&amp;</span> st_expand<span class="o">(</span>building.way, 10000::double precision<span class="o">))</span>
</pre>
<p>This seems to be a much more desirable and scalable query.</p>
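<p>To make the two-step nature of <em>ST_DWithin()</em> more tangible, here is a hand-written sketch of roughly the same filter, modelled on the conditions that show up in the plan above. The <em>&amp;&amp;</em> operator against an <em>ST_Expand()</em>-ed bounding box makes the rough, index-friendly cut and <em>ST_Distance()</em> performs the exact check on the remaining candidates. It is meant as an illustration only; in practice you would simply call <em>ST_DWithin()</em>:</p>
<pre class="code literal-block">SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary'
    AND building.name = 'Kin Thermal Power Plant Coal storage building'
    -- rough cut: compares bounding boxes only, so the GiST index can be used
    AND road.way &amp;&amp; ST_Expand(building.way, 10000)
    -- exact check on the candidates that survive the rough cut
    AND ST_Distance(road.way, building.way) &lt;= 10000
    ORDER BY distance;
</pre>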
<p>But there is a drawback: though <em>ST_DWithin()</em> makes for speedy results, it only works when you give it a fixed radius.</p>
<p>As you can see from our usage, we call the function as follows: ST_DWithin(road.way, building.way, 10000).
The last argument, "10000", tells the function how big the search radius is. In this case our geometry is in meters, so this means we search in a radius of 10 km.</p>
<p>This static radius number is quite arbitrary and might not always be desirable. What other options do we have without compromising performance too much?</p>
<h4>Operators</h4>
<p>Another addition of PostGIS we have not talked about much up until now is the set of spatial <em>operators</em> we have available.
You have a total of 16 operators you can use to perform matches on your GIS data.</p>
<p>You have straightforward operators like <em>&amp;&amp;</em>, which returns <em>TRUE</em> if one piece of geometry intersects with another (bounding box calculation), or <em>&lt;&lt;</em>, which returns <em>TRUE</em> if one object is fully to the left of another object.</p>
<p>But there are more interesting ones like the <em>&lt;-&gt;</em> and the <em>&lt;#&gt;</em> operators.</p>
<p>The first operator, <em>&lt;-&gt;</em>, returns the distance between two points. If you feed it other types of geometry (like a linestring or polygon) it will first draw a bounding box around that geometry and perform a point calculation by using the bounding box <em>centroids</em>. A centroid is the calculated center of a piece of geometry (the drawn bounding box in our case).</p>
<p>The second, <em>&lt;#&gt;</em>, acts completely the same, but works directly on bounding boxes of given geometry. In our case, since we are not working with points, it would make more sense to use this operator.</p>
<p>The big advantage of this distance calculation operator is, once more, the fact that it too calculates using a bounding box and is thus able to use a <em>GiST</em> index.
The <em>ST_Distance()</em> function, however, calculates distances by finding the two points on the given geometries that are closest to each other, which gives the most <em>accurate</em> result.
The <em>&lt;#&gt;</em> operator, as said before, stretches a <em>bounding box</em> around each piece of geometry and therefore deforms our objects, making for less accurate distance measuring.</p>
<p>It is therefore not wise to use <em>&lt;#&gt;</em> to calculate accurate distances, but it is a life saver to <em>sort away</em> geometry that is too far away for our interest.</p>
<p>So a proper usage would be to first <em>roughly</em> limit the result set using the <em>&lt;#&gt;</em> operator and then more accurately measure the distance of, say, the first 50 matches with our famous <em>ST_Distance()</em>.</p>
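<p>A tiny sketch of that difference, using two made-up geometries rather than our Okinawa data: a diagonal line and a point to its lower right. The bounding box of the line reaches further towards the point than the line itself does, so <em>&lt;#&gt;</em> should report a shorter distance than <em>ST_Distance()</em>:</p>
<pre class="code literal-block">-- Both geometries are invented for this illustration
SELECT ST_GeomFromText('LINESTRING(0 0, 10 10)') &lt;#&gt; ST_GeomFromText('POINT(20 0)') AS bbox_distance,
       ST_Distance(ST_GeomFromText('LINESTRING(0 0, 10 10)'), ST_GeomFromText('POINT(20 0)')) AS true_distance;
</pre>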
<p>Before we can continue, it is important to point out that both the <em>&lt;-&gt;</em> and <em>&lt;#&gt;</em> operators can only use the <em>GiST</em> index when either the left or right hand side of the operator is a <em>constant</em> or <em>fixed</em> piece of geometry. This means we have to provide actual geometry using a constructor function.</p>
<p>There are ways around this limitation. For example, as Alexandre Neto points out on the PostGIS mailing list, you could provide your own function which converts our "dynamic" geometry into a constant.</p>
<p>But this would make this post run way past its initial focus.
Let us simply try by providing a fixed piece of geometry.
The fixed piece is, of course, still our "Kin Thermal Power Plant Coal storage building", but converted into WKT:</p>
<pre class="code literal-block"><span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">WITH</span> <span class="n">distance</span> <span class="k">AS</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">way</span> <span class="k">AS</span> <span class="n">road</span><span class="p">,</span> <span class="k">ref</span> <span class="k">AS</span> <span class="n">route</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="n">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))'</span><span class="p">,</span> <span class="mi">900913</span><span class="p">)</span> <span class="o">&lt;#&gt;</span> <span class="n">way</span>
<span class="k">LIMIT</span> <span class="mi">50</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="n">ST_Distance</span><span class="p">(</span><span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))'</span><span class="p">,</span> <span class="mi">900913</span><span class="p">),</span> <span class="n">road</span><span class="p">)</span> <span class="k">AS</span> <span class="n">true_distance</span><span class="p">,</span> <span class="n">route</span>
<span class="k">FROM</span> <span class="n">distance</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">true_distance</span>
<span class="k">LIMIT</span> <span class="mi">1</span><span class="p">;</span>
</pre>
<p>This query uses a <em>Common Table Expression</em> or <em>CTE</em> (you could also use a simpler subquery) to first get a rough result set of about 50 rows based on what <em>&lt;#&gt;</em> finds.
Then <em>only</em> on those 50 rows do we perform our more expensive, index-agnostic distance calculation.</p>
<p>This results in the following plan and runtime:</p>
<pre class="code literal-block">Limit <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>274.57..274.57 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">width</span><span class="o">=</span>64<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>11.236..11.237 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
CTE distance
-&gt; Limit <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.28..260.82 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">width</span><span class="o">=</span>173<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.389..10.764 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
-&gt; Index Scan using planet_osm_line_way on planet_osm_line <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.28..16362.19 <span class="nv">rows</span><span class="o">=</span><span class="m">3140</span> <span class="nv">width</span><span class="o">=</span>173<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.389..10.745 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Order By: <span class="o">(</span>way &lt;<span class="c">#&gt; '010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'::geometry)</span>
Filter: <span class="o">(</span><span class="nv">highway</span> <span class="o">=</span> <span class="s1">'secondary'</span>::text<span class="o">)</span>
Rows Removed by Filter: 4562
-&gt; Sort <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>13.75..13.88 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">width</span><span class="o">=</span>64<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>11.234..11.234 <span class="nv">rows</span><span class="o">=</span><span class="m">1</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Sort Key: <span class="o">(</span>st_distance<span class="o">(</span><span class="s1">'010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'</span>::geometry, distance.road<span class="o">))</span>
Sort Method: top-N heapsort Memory: 25kB
-&gt; CTE Scan on distance <span class="o">(</span><span class="nv">cost</span><span class="o">=</span>0.00..13.50 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">width</span><span class="o">=</span>64<span class="o">)</span> <span class="o">(</span>actual <span class="nb">time</span><span class="o">=</span>0.412..11.188 <span class="nv">rows</span><span class="o">=</span><span class="m">50</span> <span class="nv">loops</span><span class="o">=</span>1<span class="o">)</span>
Total runtime: 11.268 ms
</pre>
<p>As you can see, we are now using the <em>GiST</em> index "planet_osm_line_way", which was what we were after.</p>
<p>This yields a runtime in the same ballpark as our <em>ST_DWithin()</em> query (<em>11.268 ms</em> versus <em>6.181 ms</em>), but without the arbitrary distance setting.
We do have a somewhat arbitrary limiter of 50, but this is much less severe than a distance limiter.</p>
<p>Even if the closest secondary road is 100 km from our building, the above query would still find it whereas our previous query would return nothing.</p>
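<p>As mentioned, the <em>CTE</em> is not essential. Below is a sketch of the same two-step approach written with a plain subquery in the <em>FROM</em> clause. The alias "nearest" is just a name picked for this example, and the plan and timing may differ from the <em>CTE</em> version shown above:</p>
<pre class="code literal-block">SELECT ST_Distance(ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913), nearest.road) AS true_distance, nearest.route
    FROM (
        -- rough, index-assisted pre-selection of the 50 nearest bounding boxes
        SELECT way AS road, ref AS route
            FROM planet_osm_line
            WHERE highway = 'secondary'
            ORDER BY ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913) &lt;#&gt; way
            LIMIT 50
    ) AS nearest
    -- exact distance, calculated on those 50 rows only
    ORDER BY true_distance
    LIMIT 1;
</pre>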
<h3>One more for the road home</h3>
<p>Let us do a few more fun calculations on our Okinawa data, before I let you off the island.</p>
<p>Next I would like to find the longest <em>trunk</em> road that runs through this prefecture:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="k">ref</span><span class="p">,</span> <span class="n">highway</span><span class="p">,</span> <span class="n">ST_Length</span><span class="p">(</span><span class="n">way</span><span class="p">)</span> <span class="k">as</span> <span class="k">length</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="n">highway</span> <span class="o">=</span> <span class="s1">'trunk'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">length</span> <span class="k">DESC</span><span class="p">;</span>
</pre>
<p>We have a new function <em>ST_Length()</em> which simply returns the length, given that the geometry is a linestring or multilinestring.
The only index that will be used is our "planet_osm_line_highway_index" <em>B-Tree</em> index to perform our <em>Bitmap Index Scan</em>.</p>
<p><em>ST_Length()</em> obviously does not work with bounding boxes and therefore cannot use the geometrical <em>GiST</em> index. This is yet another function you should use carefully.</p>
<p>When looking at the result set that was returned to us, you will see that some routes show up multiple times.
Take route <em>58</em>, which is the longest and most famous route in Okinawa. It shows up around <em>769</em> times. Why?</p>
<p>This is because, especially for a database prepared for mapping, these pieces of geometry are divided over different tiles.</p>
<p>We thus need to accumulate the length of all the linestrings we find that represent pieces of route 58.
First, we could try to accomplish this with plain SQL:</p>
<pre class="code literal-block"><span class="k">WITH</span> <span class="n">road_pieces</span> <span class="k">AS</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">ST_Length</span><span class="p">(</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="k">ref</span> <span class="o">=</span> <span class="s1">'58'</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="k">sum</span><span class="p">(</span><span class="k">length</span><span class="p">)</span> <span class="k">AS</span> <span class="n">total_length</span>
<span class="k">FROM</span> <span class="n">road_pieces</span><span class="p">;</span>
</pre>
<p>This will return:</p>
<pre class="code literal-block">536468.804010367
</pre>
<p>Meaning a total length of <em>536.469 Kilometers</em>. This query will run in about <em>19.375 ms</em>.
Let us add an index to our "ref" column:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">planet_osm_line_ref_index</span> <span class="k">ON</span> <span class="n">planet_osm_line</span><span class="p">(</span><span class="k">ref</span><span class="p">);</span>
</pre>
<p>Perform vacuum:</p>
<pre class="code literal-block"><span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">planet_osm_line</span><span class="p">;</span>
</pre>
<p>This index creation will speed up the query and make it run in a little over <em>3.524 ms</em>. Nice runtime.</p>
<p>You could also perform almost the exact same query, but instead of using an SQL sum() function, you could use <em>ST_Collect()</em>, which creates collections of geometry out of all the separate pieces you feed it.
In our case we feed it separate linestrings, which will make this function output a single <em>multilinestring</em>. We would then only have to perform one length calculation.</p>
<pre class="code literal-block"><span class="k">WITH</span> <span class="n">road_pieces</span> <span class="k">AS</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">ST_Collect</span><span class="p">(</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="n">geom</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="k">ref</span> <span class="o">=</span> <span class="s1">'58'</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="n">ST_Length</span><span class="p">(</span><span class="n">geom</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span>
<span class="k">FROM</span> <span class="n">road_pieces</span><span class="p">;</span>
</pre>
<p>This query will run around <em>1 ms</em> faster than the former and it returns <em>the exact</em> same distance of <em>536.469 Kilometers</em>.</p>
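<p>Coming back to the original question of which <em>trunk</em> route is the longest: the same collecting idea can be applied to every route at once by grouping on the "ref" column. The following is a sketch only; the column aliases are picked for this example and rows without a route reference are left out:</p>
<pre class="code literal-block">SELECT ref,
       count(*) AS pieces,                          -- number of separate linestrings per route
       ST_Length(ST_Collect(way)) AS total_length   -- length of the collected multilinestring
    FROM planet_osm_line
    WHERE highway = 'trunk'
    AND ref IS NOT NULL
    GROUP BY ref
    ORDER BY total_length DESC;
</pre>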
<p>Now that we have this one multilinestring which represents route 58, we could check how close this route comes to our famous Kin building (which we will statically feed):</p>
<pre class="code literal-block"><span class="k">WITH</span> <span class="n">road_pieces</span> <span class="k">AS</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">ST_Collect</span><span class="p">(</span><span class="n">way</span><span class="p">)</span> <span class="k">AS</span> <span class="n">geom</span>
<span class="k">FROM</span> <span class="n">planet_osm_line</span>
<span class="k">WHERE</span> <span class="k">ref</span> <span class="o">=</span> <span class="s1">'58'</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="n">ST_Distance</span><span class="p">(</span><span class="n">geom</span><span class="p">,</span> <span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))'</span><span class="p">,</span><span class="mi">900913</span><span class="p">))</span> <span class="k">AS</span> <span class="n">distance</span>
<span class="k">FROM</span> <span class="n">road_pieces</span><span class="p">;</span>
</pre>
<p>Which would give us:</p>
<pre class="code literal-block"> 7900.58662432767
</pre>
<p>In other words: Route 58 is, at its closest point, <em>7.9 Kilometers</em> from our coal storage building.
This query now took about <em>5 ms</em> to complete. A rather nice throughput.</p>
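<p>The polygon we keep feeding statically does not have to be typed in by hand. As a sketch, assuming the building name is unique in the table, the <em>WKT</em> and the SRID to pass to <em>ST_GeomFromText()</em> could be pulled from the database like this:</p>
<pre class="code literal-block">-- ST_AsText() returns the geometry as WKT, ST_SRID() returns its spatial reference id
SELECT ST_AsText(way) AS wkt, ST_SRID(way) AS srid
    FROM planet_osm_polygon
    WHERE name = 'Kin Thermal Power Plant Coal storage building';
</pre>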
<p>Okay, enough exploring for today.</p>
<p>We took a brief look at indexing our spatial data, and what benefits we could gain from it.
And, as you can imagine, a lack of indexes and improper use of the GIS functions could lead to dramatic slow-downs, certainly on larger data sets.</p>
<h3>Shapefiles</h3>
<p>Before I let you go I want to take a brief look at another mechanism of carrying around GIS data: the <em>shapefile</em>.
Probably more used than the OSM XML format, but less open. It is almost the GIS standard way of exchanging data between GIS systems.</p>
<p>We can import shapefiles by using a tool called "shp2pgsql" which comes shipped with PostGIS.
This tool will attempt to upload <em>ESRI</em> shape data into your PostGIS-enabled database.</p>
<h4>ESRI?</h4>
<p><em>ESRI</em> stands for <em>Environmental Systems Research Institute</em> and is yet another organization that taps into the world of digital cartography.</p>
<p>They have defined a (somewhat open) file format standard that allows the GIS world to save their data in a so-called <em>shapefile</em>.
These files hold GIS primitives (polygons, linestrings, points, ...) together with a bunch of descriptive information that tells us what each primitive represents.</p>
<p>It was once developed for ESRI's own, proprietary software package (ArcGIS), but was quickly picked up by the rest of the GIS community.
Today, almost all serious GIS packages have the ability to read and/or write to such shapefiles.</p>
<h4>Shapefile build-up</h4>
<p>Let us take a peek at the guts of such a shapefile.</p>
<p>First, contrary to what the name suggests, a shapefile is not a single file. To be spec compliant, it is a bundle of at least three files:</p>
<ul><li><em>.shp</em>: the first mandatory file has the extension <em>.shp</em> and holds the GIS primitives themselves.</li>
<li><em>.shx</em>: the second mandatory file is an index of the geometry</li>
<li><em>.dbf</em>: the last mandatory file is a database file with the geometry attributes</li>
</ul><h3>Getting shapefile data</h3>
<p>There are many organizations who offer shapefiles of all areas of the globe, either free or for a small fee.
But since we already have data in our database we are familiar with, we could create our own shapefiles.</p>
<h4>Exporting with pgsql2shp</h4>
<p>Besides "shp2pgsql", which is used to import or <em>load</em> shapefiles, PostGIS also ships a reverse tool called "pgsql2shp", which can export or <em>dump</em> shapefiles based on geometry in your database.</p>
<p>So let us, as an experiment, create a shapefile containing all secondary roads of Okinawa.</p>
<p>First we need to prepare an empty directory where this tool can dump our data. Since it will create multiple files, it is best to put them in their own spot.
Open up a terminal window and go to your favorite directory-making place and create a directory called "okinawa-roads":</p>
<pre class="code literal-block"><span class="nv">$ </span>mkdir okinawa-roads
</pre>
<p>Next enter that directory.</p>
<p>The "pgsql2shp" tool needs a few parameters to be able to successfully complete. We will be using the following flags:</p>
<ul><li>-f, tells the tool which file name to use for the output</li>
<li>-u, the database user to connect with</li>
</ul><p>After these flags we need to input the database we wish to take a chunk out of and the query which will determine the actual data to be dumped.</p>
<p>The above will result in the following command:</p>
<pre class="code literal-block"><span class="nv">$ </span> pgsql2shp -f secundairy_roads -u postgres gis <span class="s2">"select way, ref from planet_osm_line where highway = 'secondary';"</span>
</pre>
<p>As you can see we construct a query which only gets the road reference and the geometry "way" column from the secondary road types.</p>
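<p>If you would rather drive the export from a script than type it by hand, the same call can be assembled as an argument list. Below is a minimal Python sketch of that idea. The helper name <em>build_pgsql2shp_command</em> is my own invention; only the two flags and the positional arguments come from the command above.</p>

```python
import shlex

def build_pgsql2shp_command(filename, user, database, query):
    """Return the argument list for the pgsql2shp call shown above."""
    # -f and -u are the two flags discussed in the text; the database name
    # and the query are positional arguments that follow them.
    return ["pgsql2shp", "-f", filename, "-u", user, database, query]

cmd = build_pgsql2shp_command(
    "secundairy_roads",
    "postgres",
    "gis",
    "select way, ref from planet_osm_line where highway = 'secondary';",
)

# shlex.join quotes the query so that a shell sees it as one single argument.
print(shlex.join(cmd))

# To actually run the export you could hand the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

<p>Passing the query as its own list element sidesteps the quoting headaches you get when a query containing single quotes has to survive a shell.</p>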
<p>After some processing it will have created 4 files, the 3 mandatory ones mentioned above, and a new one called a <em>projection</em> file.
This file contains the coordinate system and other projection information in WKT format.</p>
<p>This bundle of 4 files is now our shapefile format which you could easily exchange between GIS aware software packages.</p>
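<p>Before handing such a bundle to someone else, it can be handy to verify that nothing got lost along the way. The following is a small Python sketch, not an official tool: it takes a list of file names and reports which of the three mandatory parts are missing and whether the optional projection file is there. The function name <em>check_bundle</em> is made up for this example.</p>

```python
from pathlib import PurePath

MANDATORY = (".shp", ".shx", ".dbf")   # the three files the spec requires
OPTIONAL = (".prj",)                   # the projection file pgsql2shp added

def check_bundle(filenames, basename):
    """Report which parts of the shapefile bundle `basename` are present."""
    found = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == basename
    }
    return {
        "missing": [ext for ext in MANDATORY if ext not in found],
        "optional": [ext for ext in OPTIONAL if ext in found],
    }

listing = [
    "secundairy_roads.shp",
    "secundairy_roads.shx",
    "secundairy_roads.dbf",
    "secundairy_roads.prj",
]
print(check_bundle(listing, "secundairy_roads"))
# {'missing': [], 'optional': ['.prj']}
```

<p>To check a real directory, feed it the result of <em>os.listdir("okinawa-roads")</em> instead of the hard-coded list.</p>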
<h4>Importing with shp2pgsql</h4>
<p>Let us now import these shapefiles back into PostgreSQL and see what happens.</p>
<p>For this we will ignore our "gis" database and simply create a new database to keep things separated.
Connect to a PostgreSQL terminal, create the database and make it PostGIS aware:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">gisshape</span><span class="p">;</span>
<span class="err">\</span><span class="k">c</span> <span class="n">gisshape</span>
<span class="k">CREATE</span> <span class="n">EXTENSION</span> <span class="n">postgis</span><span class="p">;</span>
</pre>
<p>Now go back to your terminal window to do some importing.</p>
<p>The import tool works by writing the SQL statements to <em>stdout</em>, or to a SQL dump file if you redirect that output.
If you do not wish to work with such a dump file, you have to pipe the output to the <em>psql</em> command to be able to load in the data.</p>
<p>From the directory where you saved the shapefile dump, run the "shp2pgsql" tool:</p>
<pre class="code literal-block"><span class="nv">$ </span>shp2pgsql -S -s <span class="m">900913</span> -I secundairy_roads <span class="p">|</span> psql -U postgres gisshape
</pre>
<p>Let me go over the flags we used:</p>
<ul><li>-S: is used to keep the geometry <em>simple</em>. The tool otherwise will convert all geometry to its <em>MULTI...</em> counterpart</li>
<li>-s: is needed to set the correct SRID</li>
<li>-I: specifies that we wish the tool to create <em>GiST</em> indexes on the geometry columns</li>
</ul><p>Note that the <em>-S</em> flag will only work if all of your geometry is actually simple and does not contain true MULTI... types of geometry with multiple linestrings, points or polygons in them.</p>
<p>An annoying fact is that you <em>have</em> to tell the loader which SRID your geometry is in. There is a <em>.prj</em> file in our shapefile bundle, but it only contains the WKT projection information, not the SRID.
One trick to find the SRID based on the information in the projection file is to use <em>OpenGeo</em>'s <a href="http://prj2epsg.org">Prj2EPSG</a> website, which does quite a good job of looking up the EPSG ID (which most of the time is the SRID). However, it fails to find the SRID of our OSM projection.</p>
<p>Another way of finding out about the SRID is by using the PostGIS <em>spatial_ref_sys</em> table itself:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">srid</span> <span class="k">FROM</span> <span class="n">spatial_ref_sys</span> <span class="k">WHERE</span> <span class="n">srtext</span> <span class="o">=</span> <span class="s1">'PROJCS["Popular Visualisation CRS / Mercator (deprecated)",GEOGCS["Popular Visualisation CRS",DATUM["Popular_Visualisation_Datum",SPHEROID["Popular Visualisation Sphere",6378137,0,AUTHORITY["EPSG","7059"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6055"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4055"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],AUTHORITY["EPSG","3785"],AXIS["X",EAST],AXIS["Y",NORTH]]'</span><span class="p">;</span>
</pre>
<p>This will give us:</p>
<pre class="code literal-block">900913
</pre>
<p>Perfect!</p>
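<p>The query above also shows nicely why the EPSG ID and the SRID are not always the same thing. As a rough Python sketch (my own helper, not part of PostGIS), we can pull the outermost <em>AUTHORITY</em> clause out of that very same WKT string. Keep in mind that whether a given <em>.prj</em> file contains such clauses at all depends on the tool that wrote it, in which case this sketch simply returns <em>None</em>.</p>

```python
import re

def top_level_authority(wkt):
    """Return (name, code) from the outermost AUTHORITY clause, or None."""
    depth = 0
    in_quotes = False
    for i, ch in enumerate(wkt):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue                      # ignore anything inside a quoted name
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 1 and wkt.startswith("AUTHORITY[", i):
            # Depth 1 means: directly inside the outermost PROJCS[...] object.
            match = re.match(r'AUTHORITY\["([^"]+)","([^"]+)"\]', wkt[i:])
            if match:
                return match.group(1), match.group(2)
    return None

# The srtext value used in the query above, split over several lines.
SRTEXT = (
    'PROJCS["Popular Visualisation CRS / Mercator (deprecated)",'
    'GEOGCS["Popular Visualisation CRS",'
    'DATUM["Popular_Visualisation_Datum",'
    'SPHEROID["Popular Visualisation Sphere",6378137,0,AUTHORITY["EPSG","7059"]],'
    'TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6055"]],'
    'PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],'
    'UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],'
    'AUTHORITY["EPSG","4055"]],'
    'UNIT["metre",1,AUTHORITY["EPSG","9001"]],'
    'PROJECTION["Mercator_1SP"],'
    'PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],'
    'PARAMETER["false_easting",0],PARAMETER["false_northing",0],'
    'AUTHORITY["EPSG","3785"],AXIS["X",EAST],AXIS["Y",NORTH]]'
)

print(top_level_authority(SRTEXT))
# ('EPSG', '3785')
```

<p>The text itself names EPSG 3785 as its authority, while the row it lives in carries SRID 900913. So here the two numbers differ, and only the lookup table can tell you which SRID to hand to the loader.</p>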
<p>If you now connect to your database and query its structure:</p>
<pre class="code literal-block"><span class="err">\</span><span class="k">c</span> <span class="n">gisshape</span>
<span class="err">\</span><span class="n">d</span>
</pre>
<p>You will see we have a new table called "secundairy_roads", named after the shapefile we imported (misspelling included). This table now holds only the information we dumped into the shapefile, being our road route numbers and their geometry. Neat!</p>
<h3>The end</h3>
<p>Good.</p>
<p>We are done folks. I hope I have given you enough firepower to be able to commence with your own GIS work, using PostGIS.
As I said at the beginning of this series, the past three chapters form merely an introduction to the capabilities of PostGIS, so as I expect you will do every time: go out and explore!</p>
<p>Try to load in different areas of the world, either with OpenStreetMap or by using shapefiles. Experiment with all the different GIS functions and operators that PostGIS makes available.</p>
<p>And above all, have fun!</p>
<p>And as always...thanks for reading!</p>
<!-- LocalWords: PostGIS PostgreSQL GIS OpenStreetMap
--></div></description><category>postgis</category><category>postgresql</category><guid>http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-3.html</guid><pubDate>Wed, 25 Jun 2014 10:00:00 GMT</pubDate></item><item><title>Postgis, PostgreSQL's spatial partner - Part 2</title><link>http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-2.html</link><dc:creator>Tim van der Linden</dc:creator><description><div><p>Welcome to the second part of our spatial story. If you have not done so, I advise you to go and read <a href="http://shisaa.be/postset/postgis-postgresqls-spatial-partner-part-1.html" title="Part one of this series.">part one</a> first.</p>
<p>The first part of this series gives you some basic knowledge about the GIS world (GIS Objects, WKT, Projections, ...).
This knowledge will come in handy in this chapter.</p>
<p>Today we will finally take an actual peek at PostGIS and do some database work:</p>
<ul><li>We will see how we can create valid GIS objects and insert them into our database</li>
<li>Next we will let PostGIS retrieve information about these inserted GIS objects</li>
<li>Further down the line we will manipulate these objects a bit more</li>
<li>Then we will leap from geometry into geography</li>
<li>Finally we will be doing some real world measurements</li>
</ul><p>Let us get started right away!</p>
<h3>Creating the database</h3>
<p>Before we can do anything else, we need to make sure that we have the PostGIS extension installed.
PostGIS is usually packaged as a PostgreSQL contribution package.
On a Debian system, it can be installed as follows:</p>
<pre class="code literal-block">apt-get install postgresql-9.3-postgis-2.1
</pre>
<p>This will install PostGIS version 2.1 for the PostgreSQL 9.3 database.</p>
<p>Next, fire up your database console and let us first create a new user and database:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">user</span> <span class="n">gis</span> <span class="k">WITH</span> <span class="n">PASSWORD</span> <span class="s1">'10gis10'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">gis</span> <span class="k">WITH</span> <span class="k">OWNER</span> <span class="n">gis</span><span class="p">;</span>
</pre>
<p>Not very original names, I know, but they state their purpose.
Next, connect to the <em>gis</em> database and enable the PostGIS extension:</p>
<pre class="code literal-block"><span class="err">\</span><span class="k">c</span> <span class="n">gis</span>
<span class="k">CREATE</span> <span class="n">EXTENSION</span> <span class="n">postgis</span><span class="p">;</span>
</pre>
<p>Now our database is PostGIS aware, and we are ready to get our hands dirty!</p>
<p>Notice that if you now describe your database:</p>
<pre class="code literal-block"><span class="err">\</span><span class="n">d</span>
</pre>
<p>PostGIS has created a new table and a few new views. This is PostGIS's own bookkeeping and it will store which tables contain geometry or geography columns.</p>
<h3>Fun with Polygons</h3>
<p>Let us begin this adventure with creating a polygon that has one interior ring, similar to the one we saw in the previous chapter.</p>
<p>Before we can create them, though, we have to create a table that will hold their geometrical data:</p>
<pre class="code literal-block"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">shapes</span> <span class="p">(</span>
<span class="n">name</span> <span class="nb">VARCHAR</span>
<span class="p">);</span>
</pre>
<p>Now we have a table named "shapes" with only a column to store its name. But where do we store the geometry?</p>
<p>Because of the new data types that PostGIS introduces (geometry and geography) and to keep its bookkeeping up to date, you can create this column with a PostGIS function named <em>AddGeometryColumn()</em>:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">AddGeometryColumn</span><span class="p">(</span><span class="s1">'shapes'</span><span class="p">,</span> <span class="s1">'shape'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'POLYGON'</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
</pre>
<p>Let us do a breakdown.</p>
<p>First, all the functions that PostGIS makes available to us are divided in groups that define their area of use. <em>AddGeometryColumn()</em> falls in the "Management Functions" group.</p>
<p>It is a function that creates a geometry column in a table of choice and adds a reference to this column to PostGIS's bookkeeping. It accepts a number of arguments:</p>
<ul><li>The table name to where you wish to add the column</li>
<li>The actual column name you wish to have</li>
<li>The SRID</li>
<li>The geometry type you wish to store, written as its WKT name</li>
<li>The number of coordinate dimensions you desire (2 means XY)</li>
</ul><p>In the above case we thus wish to add a geometry column to the "shapes" table. The column will be named "shape". The geometry inserted there will get an SRID of 0 and will be of object type POLYGON and have a normal, two dimensional coordinate layout.</p>
<h4>SRID?</h4>
<p>One thing that you might not yet know from the above function definition is the <em>SRID</em> or <em>Spatial Reference ID</em>, a <em>very</em> important number when working with spatial data.
Remember in the last chapter I kept on yapping about different projections we had and that each projection would yield different results?
Well, this is where all this information comes together: the SRID.</p>
<p>Our famous OGC has created a lookup table containing a whopping <em>3911</em> entries, each entry with a unique ID, the SRID.
This table is called <em>spatial_ref_sys</em> and is, by default, installed into your PostgreSQL database when you enable PostGIS.</p>
<p>But hold on, there is something I neglected to tell you in the previous chapter: the European Petroleum Survey Group or EPSG.
The following is something that confuses many people and makes them mix-and-match SRIDs and EPSG IDs. I will try my best not to add to that confusion.</p>
<h4>EPSG</h4>
<p>The EPSG, now called the OGP, is a group of organizations that, among other things, concern themselves with cartography.
They are the world's number one authority that <em>defines</em> how spatial coordinates (projected or real world) should be calculated.
All the definitions they make get an accompanying ID called the EPSG ID.</p>
<p>The OGC maintains a list to be used inside databases (GIS systems). They give all their entries a unique SRID.
These entries refer to <em>defined</em> and <em>official</em> projections, primarily maintained by the <em>EPSG</em> which have their own EPSG ID and unique name.
Other projections (not maintained by the EPSG) are also accepted into the OGC SRID list as are your own projections (if you would feel the need).</p>
<p>Let us poke the spatial reference table and see if we can get a clearer picture.</p>
<p>If we would query our table (sorry for the wildcard) and ask for a famous SRID (more on this one later):</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">spatial_ref_sys</span> <span class="k">WHERE</span> <span class="n">srid</span> <span class="o">=</span> <span class="mi">4326</span><span class="p">;</span>
</pre>
<p>We would get back one row containing:</p>
<ul><li>srid, which is the famous id</li>
<li>auth_name, the name of the authority organization, in most cases EPSG</li>
<li>auth_srid, the EPSG ID the authority organization introduced</li>
<li>srtext, tells us how the spatial reference is built using WKT</li>
<li>proj4text, commands that drive the proj4 library which is used to make the actual projections</li>
</ul><p>And as you can see, both the "srid" column and the "auth_srid" are identical. This will be the case with many entries.</p>
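<p>To get a feeling for what lives in that last column, here is a tiny Python sketch that splits a proj4 definition into its parameters. The string I feed it is an assumption on my part: it is a typical definition for a longitude/latitude system on the WGS 84 datum, and the exact text in your own table may differ.</p>

```python
def parse_proj4(text):
    """Split a proj4 definition string into a {parameter: value} dict."""
    params = {}
    for token in text.split():
        key, _, value = token.lstrip("+").partition("=")
        # Flags such as +no_defs carry no value; store them as True.
        params[key] = value if value else True
    return params

# Assumed example value, not copied from the table in this article.
print(parse_proj4("+proj=longlat +datum=WGS84 +no_defs"))
# {'proj': 'longlat', 'datum': 'WGS84', 'no_defs': True}
```

<p>Each <em>+key=value</em> pair is one instruction to the projection library, which is why this column is what actually drives a reprojection.</p>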
<p>I should also tell you that this huge list of SRID entries mostly consists of dead or localized projections.
Many of the projections listed are not used anymore but were popular at some time in history (they are marked deprecated), or are very localized.
In the previous chapter I mentioned that the general UTM system, for example, could be used as a framework for more localized UTM projections.
There are hundreds of these local projections that only make sense when used in the area they are intended for.</p>
<h4>Simple Features Functions</h4>
<p>As I have told you before, the functions that PostGIS makes available are divided into several defined groups. The functions themselves are also defined, not by PostGIS but by the Simple Features standard maintained by the <em>OGC</em> (as we saw in the previous chapter).</p>
<p>There are a total of 8 major categories available:</p>
<ul><li>Management functions: functions which can manipulate the internal bookkeeping of PostGIS</li>
<li>Geometry constructors: functions that can create or construct geometry and geography objects</li>
<li>Geometry accessors: functions that let us access and ask questions about the GIS objects</li>
<li>Geometry editors: functions that let us manipulate GIS objects</li>
<li>Geometry outputs: functions that give us various means by which to transform and "export" GIS objects</li>
<li>Operators: various SQL operators to query our geography and geometry</li>
<li>Spatial relationships and measurements: functions that let us do calculations between different GIS objects</li>
<li>Geometry processing: functions to perform basic operations on GIS objects</li>
</ul><p>I have left a few categories out for they are either not part of the Simple Features standard (such as three dimensional manipulations) or beyond the scope.
To see a list of all of the functions and their categories, I advise you to visit the PostGIS reference, <a href="http://postgis.net/docs/reference.html">section 8</a>.</p>
<p>Let us now do some fun manipulations and use some of the functions from these categories, just to get a bit more familiar with how it all works together.</p>
<p>If you ran the last SQL command, which creates the geometry column, you should have gotten back the following result:</p>
<pre class="code literal-block"><span class="k">public</span><span class="p">.</span><span class="n">shapes</span><span class="p">.</span><span class="n">shape</span> <span class="n">SRID</span><span class="p">:</span><span class="mi">0</span> <span class="k">TYPE</span><span class="p">:</span><span class="n">POLYGON</span> <span class="n">DIMS</span><span class="p">:</span><span class="mi">2</span>
</pre>
<p>This tells us we created the "shape" column in the "shapes" table and set the SRID to 0.
SRID 0 is a convention used to tell a GIS system that you currently do not care about the SRID and simply want to store geometry with an arbitrary X and Y value.</p>
<p>Let us now insert the shape of our square. To insert a polygon into your column, you could use various functions. One of these functions is <em>ST_GeomFromText()</em>:</p>
<pre class="code literal-block"><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">shapes</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">shape</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span>
<span class="s1">'Square with hole'</span><span class="p">,</span>
<span class="n">ST_GeomFromText</span><span class="p">(</span><span class="s1">'POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">);</span>
</pre>
<p>This now inserts our polygon and gives it a name. The <em>ST_GeomFromText()</em> function enables us to enter our polygon object using WKT.
This function also accepts a second, optional parameter which is the SRID by which we wish to work.
The category of this function is called <em>Geometry Constructors</em>.</p>
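<p>To make the structure of that WKT string a bit more tangible, here is a small Python sketch (plain string handling, nothing PostGIS specific) that splits the polygon into its rings. By convention the first ring is the exterior one and every following ring is a hole. The function name <em>parse_polygon_rings</em> is my own and it only understands simple two dimensional polygons.</p>

```python
import re

def parse_polygon_rings(wkt):
    """Return the rings of a 2D POLYGON WKT string as lists of (x, y) tuples."""
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is handled in this sketch")
    rings = []
    # Each ring is one parenthesised, comma-separated list of "x y" pairs.
    for ring_text in re.findall(r"\(([^()]+)\)", body):
        points = [tuple(float(n) for n in pair.split())
                  for pair in ring_text.split(",")]
        rings.append(points)
    return rings

rings = parse_polygon_rings(
    "POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))"
)
exterior, holes = rings[0], rings[1:]

# A valid ring is closed: its first and last points are identical.
print(all(ring[0] == ring[-1] for ring in rings))   # True
print(len(exterior), len(holes))                    # 5 1
```

<p>Five points for a square may look odd, but the closing point has to repeat the starting point, which is exactly what the check in the sketch confirms.</p>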
<p>You know this polygon has two rings, the exterior and the interior. Let us now ask PostGIS to return only the line that represents the exterior ring:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">ST_ExteriorRing</span><span class="p">(</span><span class="n">shape</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">shapes</span>
<span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Square with hole'</span><span class="p">;</span>
</pre>
<p>And we get back:</p>
<pre class="code literal-block"><span class="mi">0102000000050000000000000000002040000000000000</span><span class="n">F03F00000000000020400000000000002040000000000000F03F0000000000002040000000000000F03F000000000000F03F0000000000002040000000000000F03F</span>
</pre>
<p>Oh my...that is not what we expected. This is the binary form in which PostGIS hands back geometry/geography, shown as a hexadecimal string.
The result is correct, yet unreadable to us humans.</p>
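<p>Unreadable does not mean magic, though. As far as I understand the layout, such a string starts with one byte for the byte order, then four bytes for the geometry type (with a flag bit that tells whether an SRID follows), then optionally the SRID, the number of points and finally the coordinates as doubles. The Python sketch below decodes a linestring under those assumptions. It is an illustration only and skips everything a real parser would need (other geometry types, Z and M values).</p>

```python
import struct

SRID_FLAG = 0x20000000  # set in the type word when an SRID is embedded

def decode_linestring(hex_wkb):
    """Decode a 2D LINESTRING from a hex string; returns (srid, points)."""
    data = bytes.fromhex(hex_wkb)
    order = "little" if data[0] == 1 else "big"   # first byte: byte order

    def read_int(offset):
        return int.from_bytes(data[offset:offset + 4], order)

    def read_double(offset):
        raw = data[offset:offset + 8]
        # "!d" reads a big-endian double, so flip little-endian bytes first.
        return struct.unpack("!d", raw if order == "big" else raw[::-1])[0]

    geom_type = read_int(1)
    pos = 5
    srid = None
    if geom_type & SRID_FLAG:
        srid = read_int(pos)
        pos += 4
    if (geom_type & 0xFF) != 2:
        raise ValueError("this sketch only handles LINESTRING (type 2)")
    count = read_int(pos)
    pos += 4
    points = []
    for _ in range(count):
        points.append((read_double(pos), read_double(pos + 8)))
        pos += 16
    return srid, points

EIGHT, ONE = "0000000000002040", "000000000000F03F"   # 8.0 and 1.0 as doubles
ring_hex = "01" + "02000000" + "05000000" + "".join(
    [EIGHT, ONE, EIGHT, EIGHT, ONE, EIGHT, ONE, ONE, EIGHT, ONE]
)
print(decode_linestring(ring_hex))
# (None, [(8.0, 1.0), (8.0, 8.0), (1.0, 8.0), (1.0, 1.0), (8.0, 1.0)])
```

<p>The example string is the exterior ring of our square without an embedded SRID. When the flag bit is set, the same function returns the SRID as the first element instead of <em>None</em>.</p>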
<p>If we wish to get back a readable WKT string, we have to convert it using one of the conversion functions:</p>
<pre class="code literal-block"><span class="k">SELECT</span> <span class="n">ST_AsText</span><span class="p">(</span><span class="n">ST_ExteriorRing</span><span class="p">(</span><span class="n">shape</span><span class="p">))</span>
<span class="k">FROM</span> <span class="n">shapes</span>
<span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Square with hole'</span><span class="p">;</span>
</pre>
<p>And we get:</p>
<pre class="code literal-block"><span class="n">LINESTRING</span><span class="p">(</span><span class="mi">8</span> <span class="mi">1</span><span class="p">,</span><span class="mi">8</span> <span class="mi">8</span><span class="p">,</span><span class="mi">1</span> <span class="mi">8</span><span class="p">,</span><span class="mi">1</span> <span class="mi">1</span><span class="p">,</span><span class="mi">8</span> <span class="mi">1</span><span class="p">)</span>
</pre>
<p>Aha, that is more like it! This we can read!</p>
<p>We used the <em>ST_ExteriorRing()</em> which falls under the <em>Geometry Accessors</em> category and the <em>ST_AsText()</em> function which resides in the category <em>Geometry Outputs</em>.</p>