
Missing Christmas trees and potted plants #219

Open
martincollignon opened this issue Dec 22, 2024 · 1 comment

Comments

@martincollignon
Owner

martincollignon commented Dec 22, 2024

Problem:

When trying to join the pesticide data with the field data, around 4-5% of the CVRs in the pesticide data could not be matched to a CVR in the field data. This means their pesticide consumption can't be located geographically.
A large part of the unmatched CVRs relate to Christmas trees and potted plants, most likely because these crops are not necessarily grown on agricultural fields. It would be great to investigate other data sources.
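The matching problem described above is a left anti-join: keep the pesticide rows whose CVR has no counterpart in the field data. A minimal pandas sketch, assuming a shared key column called `CVR` and toy rows (both hypothetical, not the project's actual schema):

```python
import pandas as pd

# Hypothetical toy data; the real column names and values will differ.
pesticide = pd.DataFrame({
    "CVR": ["111", "222", "333", "444"],
    "Crop": ["Vårbyg", "Juletræer og pyntegrønt", "Potteplanter", "Vinterhvede"],
})
fields = pd.DataFrame({"CVR": ["111", "444", "555"]})


def unmatched_cvrs(pesticide: pd.DataFrame, fields: pd.DataFrame, key: str = "CVR") -> pd.DataFrame:
    """Return pesticide rows whose key has no match in the field data (left anti-join)."""
    # Deduplicate the right-hand keys so the merge cannot multiply left rows.
    right = fields[[key]].drop_duplicates()
    merged = pesticide.merge(right, on=key, how="left", indicator=True)
    # indicator=True adds '_merge', which is 'left_only' when no field row matched.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


missing = unmatched_cvrs(pesticide, fields)
share = missing["CVR"].nunique() / pesticide["CVR"].nunique()
print(missing)
print(f"{share:.1%} of pesticide CVRs could not be located")
```

Deduplicating the right side first keeps the row count of the result equal to the number of unmatched pesticide rows, which matters given the duplicate keys reported further down.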

Current ideas that may be relevant to explore (could be combined):

  • A property can be registered as an agricultural property without a permit under the rules of the Danish Agricultural Holdings Act (landbrugsloven) if it meets the following conditions, cf. landbrugsloven § 4, stk. 1 (section 4, subsection 1):

it is 2 ha or larger,
it is used for agriculture, forestry, horticulture (including flower growing), fruit orchards, plant nurseries or similar land-based production,
it has a residential building.
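The three conditions are listed together as requirements the property must meet, so they can be read as one predicate, for example to reason about which pesticide-only CVRs might fall outside agricultural registers. A minimal sketch, assuming a hypothetical `Property` record and English land-use labels of our own choosing (neither comes from the field or pesticide data). The open-ended "or similar land-based production" cannot be captured by a fixed set, so a real check would need a broader mapping:

```python
from dataclasses import dataclass

# Hypothetical labels for the uses named in landbrugsloven § 4, stk. 1.
# "Similar land-based production" is deliberately not modelled here.
QUALIFYING_USES = {
    "agriculture", "forestry", "horticulture", "flower growing",
    "fruit orchard", "plant nursery",
}


@dataclass
class Property:
    """Hypothetical property record, for illustration only."""
    area_ha: float
    use: str
    has_dwelling: bool


def can_be_registered_as_agricultural(p: Property) -> bool:
    """True only if all three conditions hold at the same time."""
    return p.area_ha >= 2 and p.use in QUALIFYING_USES and p.has_dwelling


print(can_be_registered_as_agricultural(Property(12.0, "forestry", True)))
print(can_be_registered_as_agricultural(Property(1.5, "plant nursery", True)))
```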

Supporting analysis, matching 2023 field data against 2023 pesticide data:

1. Analysis of Missing Farms:
Number of farms only in pesticide data: 1166

2. Crop Types for Missing Farms:
Code   Name                                             CVRs (nunique)  AcreageSize sum  AcreageSize mean   English name
583.0  Juletræer og pyntegrønt                                     284     42999.363800         45.357979   Christmas trees and decorative greenery
545.0  Potteplanter                                                 71      1975.857556          4.082350   potted plants
1.0    Vårbyg                                                       45     16683.180000         73.494185   spring barley
11.0   Vinterhvede                                                  45     21170.810000         85.023333   winter wheat
580.0  Anden skovdrift                                              41       425.870000          7.743091   other forestry
581.0  Skovdrift med fjernelse af ved                               25       895.790000         24.883056   forestry with wood removal
22.0   Vinterraps                                                   25     11294.550000         81.844565   winter rapeseed
497.0  Planteskolekulturer, vedplanter, til videresalg              24       362.862000          4.371831   nursery crops, woody plants, for resale
900.0  Øvrige afgrøder                                              22        72.711860          1.547061   other crops
499.0  Lukket system                                                21       270.847850          2.555168   closed system
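A table of this shape (unique CVRs plus area sum and mean per crop) is what a pandas `groupby` with a mixed aggregation dict produces. A minimal sketch, assuming the missing-farm rows carry the `CompanyRegistrationNumber`, `Code`, `Name` and `AcreageSize` columns shown in the output; the rows themselves are made up:

```python
import pandas as pd

# Made-up pesticide rows for CVRs that are absent from the field data.
missing = pd.DataFrame({
    "CompanyRegistrationNumber": ["A", "A", "B", "C", "C"],
    "Code": [583.0, 583.0, 583.0, 545.0, 583.0],
    "Name": ["Juletræer og pyntegrønt"] * 3 + ["Potteplanter", "Juletræer og pyntegrønt"],
    "AcreageSize": [10.0, 5.0, 30.0, 2.0, 15.0],
})

# Mixing a string and a list in the dict yields MultiIndex columns such as
# ("CompanyRegistrationNumber", "nunique") and ("AcreageSize", "sum").
crop_summary = (
    missing.groupby(["Code", "Name"])
    .agg({"CompanyRegistrationNumber": "nunique", "AcreageSize": ["sum", "mean"]})
    .sort_values(("CompanyRegistrationNumber", "nunique"), ascending=False)
)
print(crop_summary.head(10))
```

Note that `mean` here is per pesticide row, not per CVR; dividing the sum by the unique-CVR count gives a per-farm figure instead.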

3. Area Statistics for Missing Farms:
count     1166.000000
mean       104.932561
std        674.343058
min          0.000000
25%          0.000000
50%          0.000000
75%         25.607500
max      11493.960000
Name: AcreageSize, dtype: float64

4. Top Postal Codes for Missing Farms:
PostCodeIdentifier
5270    135
5450     99
6500     84
8920     83
5672     69
9320     65
5290     57
9700     54
5250     53
8600     47
Name: count, dtype: int64

5. Size Distribution of Missing Farms vs Matching Farms:
Missing farms area statistics:
count     1166.000000
mean       104.932561
std        674.343058
min          0.000000
25%          0.000000
50%          0.000000
75%         25.607500
max      11493.960000
Name: AcreageSize, dtype: float64
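Section 5 promises a comparison, but the output above only contains the missing-farm statistics (identical to section 3). One way to print both groups side by side is `describe()` per group. A minimal sketch, assuming a boolean `matched` flag per CVR (hypothetical) alongside `AcreageSize`, with made-up values:

```python
import pandas as pd

# Made-up per-CVR areas; `matched` marks CVRs that also appear in the field data.
farms = pd.DataFrame({
    "AcreageSize": [0.0, 0.0, 25.6, 120.0, 80.0, 45.0, 300.0],
    "matched": [False, False, False, False, True, True, True],
})

# describe() per group, transposed so each group becomes one column.
comparison = farms.groupby("matched")["AcreageSize"].describe().T
# groupby sorts its keys, so False (missing) comes before True (matching).
comparison.columns = ["missing", "matching"]
print(comparison)
```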
@martincollignon
Owner Author

FYI, this seems to affect the fertiliser data as well:

DUPLICATE CHECK
Duplicates in field data keys: 18655
Duplicates in fertiliser data keys: 2513

WARNING: Duplicates found! This will affect the matching statistics.
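The report does not say how duplicates were counted, and pandas offers two readings: `duplicated()` with the default `keep="first"` counts only the surplus copies, while `keep=False` flags every row of a duplicated group. A minimal sketch, assuming a hypothetical composite key of `CVR` and `FieldNo`:

```python
import pandas as pd

# Hypothetical key columns; the real field identifier may differ.
KEYS = ["CVR", "FieldNo"]

fertiliser = pd.DataFrame({
    "CVR": ["1", "1", "1", "2"],
    "FieldNo": ["1-0", "1-0", "2-0", "1-0"],
    "Areal": [2.0, 2.0, 3.5, 1.0],
})

# Default keep="first": the first row of each key is not counted, only extra copies.
n_dupes = fertiliser.duplicated(subset=KEYS).sum()
# keep=False: every row that shares its key with another row.
dupe_rows = fertiliser[fertiliser.duplicated(subset=KEYS, keep=False)]
# One row per key, e.g. before merging so match counts are not inflated.
deduped = fertiliser.drop_duplicates(subset=KEYS)
print(n_dupes, len(dupe_rows), len(deduped))
```

Whether to drop or aggregate duplicates depends on why they exist (e.g. split fields versus repeated reports), which the output above does not show.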

==================================================

1. OVERALL DATASET STATISTICS
Unique CVRs in field data: 29050
Unique CVRs in fertiliser data: 26241
Total unique CVRs: 29772
CVRs in both datasets: 25519
CVRs only in field data: 3531
CVRs only in fertiliser data: 722
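These overall counts are plain set arithmetic on the unique CVRs of each dataset, and the reported numbers satisfy the expected identities (25519 + 3531 + 722 = 29772, 25519 + 3531 = 29050, 25519 + 722 = 26241). A minimal sketch with made-up CVR sets; in practice they might come from something like `set(df["CVR"].dropna())`, where the column name is an assumption:

```python
# Made-up CVR sets standing in for the unique CVRs of each dataset.
field_cvrs = {"1", "2", "3", "4", "5"}
fert_cvrs = {"3", "4", "5", "6"}

stats = {
    "unique_field": len(field_cvrs),
    "unique_fertiliser": len(fert_cvrs),
    "total_unique": len(field_cvrs | fert_cvrs),
    "in_both": len(field_cvrs & fert_cvrs),
    "only_field": len(field_cvrs - fert_cvrs),
    "only_fertiliser": len(fert_cvrs - field_cvrs),
}

# Identities that hold for any two sets; useful as sanity checks on the report.
assert stats["total_unique"] == stats["in_both"] + stats["only_field"] + stats["only_fertiliser"]
assert stats["unique_field"] == stats["in_both"] + stats["only_field"]
assert stats["unique_fertiliser"] == stats["in_both"] + stats["only_fertiliser"]
print(stats)
```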

2. FIELD COUNTS AND MATCHING
Total unique fields in field data: 591435
Total unique fields in fertiliser data: 582339
Unmatched fertiliser fields: 11929 (2.0%)
Matched fertiliser fields: 572923 (98.4%)
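The two percentages add up to 100.4% because the matched and unmatched counts include duplicate rows: 572923 + 11929 = 584852, which exceeds the 582339 unique fertiliser fields by exactly 2513, the reported number of duplicate fertiliser keys. Computing the shares on deduplicated keys keeps them complementary. A minimal sketch, assuming the same hypothetical `CVR` + `FieldNo` key:

```python
import pandas as pd

KEYS = ["CVR", "FieldNo"]  # hypothetical field key

fields = pd.DataFrame({"CVR": ["1", "1", "2"], "FieldNo": ["1-0", "2-0", "1-0"]})
# ("1", "1-0") appears twice in the fertiliser data.
fertiliser = pd.DataFrame({"CVR": ["1", "1", "2", "3"],
                           "FieldNo": ["1-0", "1-0", "1-0", "1-0"]})


def match_rates(left: pd.DataFrame, right: pd.DataFrame, keys: list[str]) -> tuple[float, float]:
    """Matched and unmatched shares of the unique `left` keys, looked up in `right`."""
    left_keys = left[keys].drop_duplicates()
    right_keys = right[keys].drop_duplicates()
    flag = left_keys.merge(right_keys, on=keys, how="left", indicator=True)["_merge"]
    matched = float((flag == "both").mean())
    return matched, 1.0 - matched


matched, unmatched = match_rates(fertiliser, fields, KEYS)
print(f"matched {matched:.1%}, unmatched {unmatched:.1%}")
```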

3. AREA ANALYSIS
Total area in field data: 2657702.32 ha
Total area in fertiliser data: 2641507.12 ha
Area in unmatched fertiliser fields: 41323.50 ha (1.6%)

4. N KVOTE (NITROGEN QUOTA) ANALYSIS
Total N Kvote: 402020038.43
N Kvote in unmatched fields: 4179711.86 (1.0%)
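Both shares are "column total in unmatched rows divided by column total" (41323.50 / 2641507.12 ≈ 1.6% and 4179711.86 / 402020038.43 ≈ 1.0%). A minimal sketch, assuming a boolean `matched` flag per fertiliser row and a hypothetical `NKvote` column name next to `Areal`, with made-up values:

```python
import pandas as pd

# Made-up fertiliser rows; `NKvote` and `matched` are hypothetical column names.
fertiliser = pd.DataFrame({
    "Areal": [4.0, 2.5, 10.0, 1.5],
    "NKvote": [600.0, 300.0, 1500.0, 100.0],
    "matched": [True, True, True, False],
})


def unmatched_share(df: pd.DataFrame, col: str) -> float:
    """Fraction of the column total that sits in rows without a field match."""
    return df.loc[~df["matched"], col].sum() / df[col].sum()


for col in ["Areal", "NKvote"]:
    print(f"{col}: {unmatched_share(fertiliser, col):.1%} in unmatched fields")
```

In the real output, unmatched fields are 2.0% of fields but only 1.6% of area and 1.0% of the N quota, which is consistent with their smaller mean size in the distribution below (3.46 ha versus 4.52 ha).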

5. SIZE DISTRIBUTION ANALYSIS

All fertiliser fields area distribution:
count    584852.000000
mean          4.516539
std           6.785569
min           0.010000
25%           0.780000
50%           2.190000
75%           5.640000
max         714.270000
Name: Areal, dtype: float64

Unmatched fertiliser fields area distribution:
count    11929.000000
mean         3.464121
std          9.377051
min          0.010000
25%          0.490000
50%          1.330000
75%          3.250000
max        317.180000
Name: Areal, dtype: float64

6. CVR ANALYSIS
CVRs with unmatched fields: 1405 (5.4%)
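The 5.4% appears to use the 26241 fertiliser CVRs as denominator (1405 / 26241 ≈ 5.4%), i.e. it counts CVRs with at least one unmatched field. A minimal sketch, assuming a boolean `matched` flag per field row (hypothetical):

```python
import pandas as pd

# Made-up fertiliser field rows; CVR "1" has one matched and one unmatched field.
fertiliser = pd.DataFrame({
    "CVR": ["1", "1", "2", "3", "3"],
    "matched": [True, False, True, True, True],
})

# A CVR counts once as soon as any of its fields is unmatched.
cvrs_with_gaps = fertiliser.loc[~fertiliser["matched"], "CVR"].nunique()
share = cvrs_with_gaps / fertiliser["CVR"].nunique()
print(cvrs_with_gaps, f"{share:.1%}")
```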

7. FIELD SIZE CATEGORIES
Unmatched fields by size category:
size_cat
0-0.1 ha       733
0.1-0.5 ha    2328
0.5-1 ha      1933
1-5 ha        5016
5-10 ha       1198
>10 ha         721
Name: count, dtype: int64

Percentage distribution:
size_cat
0-0.1 ha       6.1
0.1-0.5 ha    19.5
0.5-1 ha      16.2
1-5 ha        42.0
5-10 ha       10.0
>10 ha         6.0
Name: count, dtype: float64
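Size categories like these come from binning `Areal` with `pd.cut`. A minimal sketch: the bin edges are inferred from the category labels, and treating each bin as right-closed (so exactly 1 ha falls in "0.5-1 ha") is an assumption, as are the sample areas:

```python
import numpy as np
import pandas as pd

# Edges inferred from the labels; right-closed intervals are an assumption.
BINS = [0, 0.1, 0.5, 1, 5, 10, np.inf]
LABELS = ["0-0.1 ha", "0.1-0.5 ha", "0.5-1 ha", "1-5 ha", "5-10 ha", ">10 ha"]

areal = pd.Series([0.05, 0.3, 0.49, 0.8, 1.33, 3.25, 7.0, 317.18], name="Areal")
# include_lowest=True makes the first bin [0, 0.1] instead of (0, 0.1].
size_cat = pd.cut(areal, bins=BINS, labels=LABELS, include_lowest=True)

counts = size_cat.value_counts(sort=False)         # keep bin order, not frequency order
percent = (counts / counts.sum() * 100).round(1)   # share of all unmatched fields
print(counts, percent, sep="\n\n")
```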

8. CROP TYPE ANALYSIS

Most common crop types in unmatched fields:
Hovedafgrøde
583.0    5384
581.0    1201
310.0     614
1.0       511
252.0     414
11.0      394
276.0     306
499.0     282
218.0     197
260.0     174
Name: count, dtype: int64

Percentage distribution:
Hovedafgrøde
583.0    45.1
581.0    10.1
310.0     5.1
1.0       4.3
252.0     3.5
11.0      3.3
276.0     2.6
499.0     2.4
218.0     1.7
260.0     1.5
Name: count, dtype: float64
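The percentages are shares of all unmatched fields rather than of the top 10 only (5384 / 11929 ≈ 45.1% for code 583, which the first comment's table names "Juletræer og pyntegrønt", i.e. Christmas trees). A minimal sketch with made-up crop codes in a `Hovedafgrøde` (main crop) series:

```python
import pandas as pd

# Made-up main-crop codes of unmatched fields.
hovedafgroede = pd.Series([583.0, 583.0, 583.0, 581.0, 1.0, 583.0, 581.0, 11.0],
                          name="Hovedafgrøde")

top_counts = hovedafgroede.value_counts().head(10)
# normalize=True divides by all rows, so the top-10 percentages need not sum to 100.
top_percent = (hovedafgroede.value_counts(normalize=True) * 100).round(1).head(10)
print(top_counts, top_percent, sep="\n\n")
```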
