
This package allows you to extract data from the Companies House API and create interlocking directorates networks.


MatthewSmith430/CompaniesHouse


Companies House

This is an R package for extracting data from Companies House: https://www.gov.uk/government/organisations/companies-house

In particular, it provides a way to search for companies and extract their company numbers. These company numbers can then be used to identify company directors.

This package also provides functions for building a network of interlocking directorates, that is, a network of individuals and companies linked by board membership. Two related networks can also be created: director networks, in which individuals are linked if they sit on (at least one of) the same company boards, and company networks, in which companies are linked if they share (at least one of) the same directors.
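To make the three network types concrete, here is a minimal sketch built with igraph alone, assuming igraph is installed. It uses made-up company and director ids rather than Companies House data, and it is not how the package's own functions are implemented; it only illustrates the same idea: a two-mode network of board appointments and its two one-mode projections.

```r
library(igraph)

# Each row is one board appointment: a director sitting on a company board.
# The ids are made up for illustration.
appointments <- data.frame(
  company  = c("C1", "C1", "C2", "C2", "C3"),
  director = c("D1", "D2", "D2", "D3", "D3"),
  stringsAsFactors = FALSE
)

# Two-mode (interlocking directorates) network: companies and directors
interlock <- graph_from_data_frame(appointments, directed = FALSE)
# type = FALSE for companies, TRUE for directors
V(interlock)$type <- V(interlock)$name %in% appointments$director

# One-mode projections: proj1 links companies (type FALSE) that share a
# director, proj2 links directors (type TRUE) that share a board.
# With multiplicity = TRUE (the default) the edge weight counts shared nodes.
proj <- bipartite_projection(interlock)
company_net  <- proj$proj1
director_net <- proj$proj2
```

Here C1 and C2 are tied through D2, and C2 and C3 through D3, while C1 and C3 share nobody and stay unconnected in the company network.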

To install the package from GitHub (this requires the devtools package), run:

#Install CompaniesHouse:
library(devtools)
devtools::install_github("MatthewSmith430/CompaniesHouse")
library(CompaniesHouse)

Authorisation Key

To extract data from Companies House (with the API), you will need an authorisation key.

The instructions on how to obtain your key can be found at: https://developer.company-information.service.gov.uk/

When using the package, save your key as mkey; the examples below pass it to each function:

mkey<-"ENTER YOUR AUTHORISATION KEY"
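Hard-coding the key in a script makes it easy to share by accident. A common alternative is to keep it in an environment variable, for example in ~/.Renviron, and read it with Sys.getenv(). The variable name COMPANIES_HOUSE_KEY and the helper get_ch_key() below are hypothetical choices for this sketch, not part of the package.

```r
# Hypothetical helper: read the Companies House key from an environment
# variable (COMPANIES_HOUSE_KEY is an illustrative name, not a package default)
get_ch_key <- function(var = "COMPANIES_HOUSE_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable (e.g. in ~/.Renviron)")
  }
  key
}

# Usage, after adding COMPANIES_HOUSE_KEY=... to ~/.Renviron and restarting R:
# mkey <- get_ch_key()
```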

Company Search

The following functions search for companies in Companies House (using the API). You supply a search term and your authorisation key, and they return a data frame of companies that match the search term, including the Companies House company number, the company address and other details. The company number is important, as it identifies the firm and is used by many of the other package functions. There are three versions of this function:
1. CompanySearch_limit_first: returns the first company from the search results
2. CompanySearch_limit: returns the first page of search results
3. CompanySearch: returns all search results

In the following example I use CompanySearch_limit (only the first three results are shown, to save space).

#Search for a "COMPANY SEARCH TERM"
#In this example we use "unilever"

CompanySearchList<-CompanySearch_limit("unilever",mkey)

| id.search.term | company.name | company.number | Date.of.Creation | company.type | company.status | address | Locality | postcode |
|---|---|---|---|---|---|---|---|---|
| unilever | UNILEVER PLC | 00041424 | 1894-06-21 | plc | active | Port Sunlight, Wirral, Merseyside, CH62 4ZD | Merseyside | CH62 4ZD |
| unilever | UNILEVER AUSTRALIA INVESTMENTS LIMITED | 00137659 | 1914-09-12 | ltd | active | Unilever House, 100 Victoria Embankment, London, EC4Y 0DY | London | EC4Y 0DY |
| unilever | UNILEVER AUSTRALIA PARTNERSHIP LIMITED | 00315312 | 1936-06-17 | ltd | active | Unilever House, 100 Victoria Embankment, London, EC4Y 0DY | London | EC4Y 0DY |
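Company numbers should be handled as character strings: Unilever Plc's number is "00041424", and reading such numbers from a spreadsheet or CSV as numeric values silently drops the leading zeros. The hypothetical helper below is a sketch for restoring them before the numbers are passed to other package functions; it assumes purely numeric numbers should be zero-padded to eight digits and leaves prefixed numbers (such as Scottish "SC" numbers) unchanged.

```r
# Hypothetical helper (not part of CompaniesHouse): restore leading zeros
# on purely numeric company numbers; prefixed numbers are left as they are
pad_company_number <- function(x) {
  x <- trimws(as.character(x))
  numeric_only <- grepl("^[0-9]+$", x)
  x[numeric_only] <- formatC(as.integer(x[numeric_only]), width = 8, flag = "0")
  x
}

pad_company_number(c(41424, "137659", "SC123456"))
# [1] "00041424" "00137659" "SC123456"
```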

Extract Directors Data

The company_ExtractDirectorsData function extracts director information for a company number. It returns a data frame listing the company's directors and their details. In this example, I output a small selection of the directors of Unilever Plc.

#We continue to use Unilever as an example; we know that the company
#number for Unilever Plc is "00041424".

#Therefore we can extract the director information
#for Unilever Plc
DirectorInformation<-company_ExtractDirectorsData("00041424", mkey)

| company.id | director.id | directors | start.date | end.date | occupation | role | residence | postcode | nationality | birth.year | birth.month | former.name | download.date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 00041424 | i-a1nTc06VZikEBTLGW9DYwuANM | SOTAMAA, Ritva | 2018-01-01 | NA | NA | secretary | NA | EC4Y 0DY | NA | NA | NA | NULL | 2020-10-11 |
| 00041424 | ZI4TtLjPrlcnIckJGNlqCLV2s_Y | ANDERSEN, Nils Smedegaard | 2015-04-30 | NA | None | director | Denmark | EC4 0DY | Danish | 1958 | 7 | NULL | 2020-10-11 |
| 00041424 | E3FTMwTYyFn9_AXshohmRCws23c | CHA, Laura May Lung | 2013-05-15 | NA | Deputy Chairman Hsbc Asia Pacific | director | Hong Kong | EC4Y 0DY | Chinese | 1949 | 12 | NULL | 2020-10-11 |
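The director data frame can include past as well as current appointments, and secretaries as well as directors. Below is a minimal base R sketch for keeping only current directors, assuming (as in the table above) that end.date is NA for appointments that have not ended and that the role column uses the value "director". The data frame is a made-up stand-in containing a subset of the columns.

```r
# Made-up stand-in for part of the company_ExtractDirectorsData output
directors_df <- data.frame(
  company.id = c("00041424", "00041424", "00041424"),
  directors  = c("SMITH, Anna", "JONES, Ben", "BROWN, Cara"),
  start.date = as.Date(c("2018-01-01", "2015-04-30", "2010-05-15")),
  end.date   = as.Date(c(NA, NA, "2019-05-01")),
  role       = c("secretary", "director", "director"),
  stringsAsFactors = FALSE
)

# Keep appointments that have not ended and whose role is "director"
current_board <- subset(directors_df, is.na(end.date) & role == "director")
current_board$directors
# [1] "JONES, Ben"
```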

Company Sector Code

The CompanySIC function finds the sector a company operates in by returning its SIC (Standard Industrial Classification) code. The function requires the company number.

#Again we use Unilever Plc as an example, using its company number
UnileverSIC<-CompanySIC("00041424", mkey)

UnileverSIC

| x |
|---|
| 70100 |
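SIC 70100 is "Activities of head offices" in the UK SIC 2007 classification. If broader sectors are needed, a five-digit code can be mapped to its section letter (A to U) from its first two digits (the division). The helper below is a hypothetical sketch, not a package function; its division ranges follow the SIC 2007 section structure.

```r
# Hypothetical helper: map 5-digit UK SIC 2007 codes to section letters.
# Intervals are [lower, upper) on the two-digit division, e.g. 69-75 -> M.
sic_section <- function(code) {
  division <- as.integer(substr(as.character(code), 1, 2))
  breaks   <- c(1, 5, 10, 35, 36, 41, 45, 49, 55, 58, 64, 68, 69, 77,
                84, 85, 86, 90, 94, 97, 99, 100)
  sections <- LETTERS[1:21]   # sections A to U
  as.character(cut(division, breaks = breaks, labels = sections, right = FALSE))
}

sic_section("70100")
# [1] "M"
```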

Director Information

In CompaniesHouse you can also examine the boards that a director sits on, if you have the director id. The indiv_ExtractDirectorsData function returns a list of firms where the individual has served as a member of the board.

You can also search for directors by name, in the same way as company searches. As with the company search, there are three options:
1. DirectorSearch_limit_first: returns the first director from the search results
2. DirectorSearch_limit: returns the first page of search results
3. DirectorSearch: returns all search results

Example

As an example, the director search function can be used to examine Boris Johnson and the firms where he has previously served as a director.

The first step is to use the function to identify his director id. His date of birth is June 1964, and this information can be used to identify the correct person and id.

##Use the tidyverse package to filter the data frame
library(tidyverse)
boris_search<-DirectorSearch_limit("boris johnson",mkey) %>%
  filter(month.of.birth==6 & 
           year.of.birth==1964)

| id.search.term | director.id | person.name | addess.snippet | locality | month.of.birth | year.of.birth |
|---|---|---|---|---|---|---|
| boris johnson | EZWa9WI6ur100VnMhfHT6EP4twA | Alexander Boris De Pfeffel JOHNSON | 13 Furlong Road, London, N7 8LS | London | 6 | 1964 |

Now that we have his id, we can extract the list of firms where he has been a director.

boris_info<-indiv_ExtractDirectorsData(boris_search$director.id,mkey)

| company.id | comapny.name | director.id | directors | director.forename | director.surname | start.date | end.date | occupation | role | residence | postcode | nationality | birth.year | birth.month | appointment.kind | download.date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 05774105 | FINLAND STATION LIMITED | EZWa9WI6ur100VnMhfHT6EP4twA | Alexander Boris De Pfeffel JOHNSON | Alexander | JOHNSON | 2006-04-07 | 2008-05-23 | Editor/Politician | director | NA | N7 8LS | British | 1964 | 6 | personal-appointment | 11/10/2020 19:09:53 |
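The start.date and end.date columns make it easy to work out how long each appointment lasted. A base R sketch using the Finland Station appointment from the table above, assuming dates are stored as "YYYY-MM-DD" strings and that open appointments have end.date NA (these are measured up to today's date):

```r
appointments <- data.frame(
  company.id = "05774105",
  start.date = "2006-04-07",
  end.date   = "2008-05-23",
  stringsAsFactors = FALSE
)

# Open appointments (end.date NA) are measured up to today
end <- as.Date(appointments$end.date)
end[is.na(end)] <- Sys.Date()

# Subtracting Dates gives a difference in days
appointments$tenure.days <- as.numeric(end - as.Date(appointments$start.date))
appointments$tenure.days
# [1] 777
```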

Networks

The package can be used to create a set of networks.
- Interlocking directorates network: a set of companies and individuals, where individuals are tied to companies where they sit on the board of directors.
- Director network: a set of directors, where they are linked if they sit on the same company board.
- Company network: a set of companies, where they are linked if they share a director.

Create Networks

The following functions create the various networks from a list of company numbers.

Interlocking Directorates Network

When creating the interlock network, you need to specify the years that you want to cover (a start year and an end year). There are two ways to create the interlocking directorates network:

1.) From a list of company numbers

INTERLOCKS1<-InterlockNetwork(List.of.company.numbers,mkey,start = 2015,end = 2020)

##Example for all company numbers associated with the 
##Unilever search term for 2015-2020
##The first step is to remove companies that are no longer active from the list, then create the interlock network 
library(tidyverse)
COMP_LIST<-CompanySearchList%>%
  filter(company.status=="active")

INTERLOCKS1<-InterlockNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)

2.) From a data frame produced using the indiv_ExtractDirectorsData function. This dataframe can be edited manually to use company names (or perhaps another id system) in the network.

INTERLOCKS2<-make_interlock(DataFrame)

##Example for all company numbers associated with the 
##Unilever search term, using a data frame created with indiv_ExtractDirectorsData
INTERLOCKS2<-make_interlock(MultipleDirectorInfo)
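How the start and end years restrict the network is not spelled out here. One plausible rule, shown below as an illustrative sketch rather than the package's actual implementation, is to keep an appointment if it overlaps the window: it began on or before 31 December of the end year and had not ended before 1 January of the start year. The function name active_in_window() is hypothetical.

```r
# Hypothetical overlap rule for a start/end year window; NA end dates are
# treated as appointments that are still running
active_in_window <- function(start.date, end.date, start, end) {
  window_start <- as.Date(paste0(start, "-01-01"))
  window_end   <- as.Date(paste0(end, "-12-31"))
  s <- as.Date(start.date)
  e <- as.Date(end.date)
  s <= window_end & (is.na(e) | e >= window_start)
}

active_in_window(
  start.date = c("2006-04-07", "2013-05-15", "2021-03-01"),
  end.date   = c("2008-05-23", NA, NA),
  start = 2015, end = 2020
)
# [1] FALSE  TRUE FALSE
```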

Company Network

The next network that can be created with the CompaniesHouse package is the company network. This is a one-mode projection of the interlocking directorates network: a set of companies that are linked when they share a director.

CompanyNET<-CompanyNetwork(List.of.company.numbers,mkey,start = 2015,end = 2020)

##Example for all company numbers associated with the 
##Unilever search term:
CompanyNET<-CompanyNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)
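The projection can be illustrated with matrix algebra. If B is the company-by-director incidence matrix (1 when a director sits on a company's board), then B %*% t(B) gives, for every pair of companies, the number of directors they share, which can serve as a tie weight. The minimal igraph sketch below uses made-up ids; it illustrates the idea of a one-mode projection and is not necessarily how CompanyNetwork computes it.

```r
library(igraph)

# Company-by-director incidence matrix (made-up ids)
B <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("C1", "C2", "C3"), c("D1", "D2", "D3")))

shared <- B %*% t(B)   # diagonal = board size, off-diagonal = shared directors
diag(shared) <- 0      # drop self-ties

company_net <- graph_from_adjacency_matrix(shared, mode = "undirected",
                                           weighted = TRUE)
E(company_net)$weight
```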

Director Network

The next network that can be created with the CompaniesHouse package is the director network. This is also a one-mode projection of the interlocking directorates network, but for directors instead of companies: a set of directors that are linked when they sit on the same board.

DirNET<-DirectorNetwork(List.of.company.numbers,mkey,start = 2015,end = 2020)

##Example for all company numbers associated with the 
##Unilever search term:
DirNET<-DirectorNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)
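One consequence of this projection is that every board becomes a clique in the director network: a board with k directors contributes k(k-1)/2 director pairs. Summing over boards gives an upper bound on the number of ties, because a pair of directors who sit together on several boards is counted once per board. A short base R sketch with made-up board sizes:

```r
board_sizes <- c(3, 4, 2)              # made-up board sizes
ties_per_board <- choose(board_sizes, 2)
ties_per_board
# [1] 3 6 1
sum(ties_per_board)                    # upper bound on distinct director ties
# [1] 10
```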

Network Analysis

These functions take a network (an igraph object) as input, created with the commands from the “Create Networks” section.

Centrality

For each network we can calculate a range of centrality measures. The director and company networks are one-mode networks, so a wider range of centrality measures can be calculated.

INTERLOCKcent<-InterlockCentrality(INTERLOCKS1)

|  | NAMES | Degree.Centrality |
|---|---|---|
| 00041424 | 00041424 | 20 |
| 00137659 | 00137659 | 4 |
| 00315312 | 00315312 | 4 |

COMPANYcent<-one_mode_centrality(CompanyNET)

|  | name | Weighted.Degree | Binary.Degree | Betweenness | Closeness | Eigenvector | NAMES |
|---|---|---|---|---|---|---|---|
| 00041424 | 00041424 | 2 | 2 | 0.0000 | 0.0110 | 0.0018 | 00041424 |
| 00137659 | 00137659 | 25 | 11 | 3.1333 | 0.0152 | 1.0000 | 00137659 |
| 00315312 | 00315312 | 25 | 11 | 3.1333 | 0.0152 | 1.0000 | 00315312 |

DIRcent<-one_mode_centrality(DirNET)

|  | name | Weighted.Degree | Binary.Degree | Betweenness | Closeness | Eigenvector | NAMES |
|---|---|---|---|---|---|---|---|
| p9NWLNpKrF1rsf9hRuxo6j0YbJQ | p9NWLNpKrF1rsf9hRuxo6j0YbJQ | 19 | 19 | 0 | 0.0012 | 0.9103 | p9NWLNpKrF1rsf9hRuxo6j0YbJQ |
| bJK4sl0SPT-Zxzq88lC1ouqrtl8 | bJK4sl0SPT-Zxzq88lC1ouqrtl8 | 19 | 19 | 0 | 0.0012 | 0.9103 | bJK4sl0SPT-Zxzq88lC1ouqrtl8 |
| -dJ_v_xnd71ByCzbr1g-uLqafak | -dJ_v_xnd71ByCzbr1g-uLqafak | 19 | 19 | 0 | 0.0012 | 0.9103 | -dJ_v_xnd71ByCzbr1g-uLqafak |
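The package computes these measures itself; for readers who want to reproduce or extend them, the sketch below shows the standard igraph functions for the same quantities on a made-up star graph (one central company tied to four others). Whether one_mode_centrality uses edge weights in each measure is not stated here, so the choice below is an assumption: weights are used for strength and eigenvector centrality, but not for betweenness and closeness, because igraph treats weights as distances in path-based measures.

```r
library(igraph)

g <- make_star(5, mode = "undirected")   # vertex 1 is the centre
E(g)$weight <- c(2, 1, 1, 1)             # e.g. number of shared directors per tie

centrality <- data.frame(
  Weighted.Degree = strength(g),
  Binary.Degree   = degree(g),
  Betweenness     = betweenness(g, weights = NA),  # unweighted shortest paths
  Closeness       = closeness(g, weights = NA),    # unweighted shortest paths
  Eigenvector     = eigen_centrality(g)$vector     # uses the weight attribute
)
centrality
```

The centre has degree 4, strength 5 and betweenness 6 (it lies on the only path between each of the six pairs of leaves).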

Network properties

We can calculate the properties of the director and company networks.

COMPANYprop<-CompanyNetworkProperties(CompanyNET)

|  | One-Mode Company Network |
|---|---|
| Size | 17.0000 |
| Density | 0.4338 |
| Diameter | 5.0000 |
| Average.path.lenth | 1.7905 |
| Average.node.stregnth | 6.1176 |
| Average.Degree | 3.4706 |
| Betweenness.Centralisation | 0.2502 |
| Closeness.Centralisation | 0.1160 |
| Eigenvector.Centralisation | 0.4333 |
| Degree.Centralisation | 0.2537 |
| Clustering.coefficent.transitivity | 0.8491 |
| Clustering.Weighted | 0.9158 |

DIRprop<-DirectorNetworkProperties(DirNET)

|  | One-Mode Director Network |
|---|---|
| Size | 66.0000 |
| Density | 0.1716 |
| Diameter | 6.0000 |
| Average.path.lenth | 2.6284 |
| Average.node.stregnth | 6.0606 |
| Average.Degree | 5.5758 |
| Betweenness.Centralisation | 0.3776 |
| Closeness.Centralisation | 0.0298 |
| Eigenvector.Centralisation | 0.7157 |
| Degree.Centralisation | 0.3669 |
| Clustering.coefficent.transitivity | 0.8848 |
| Clustering.Weighted | 0.8231 |
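As a cross-check on how these properties are defined, the sketch below compares igraph's edge_density() with the textbook formula 2m/(n(n-1)) for an undirected network with n nodes and m ties, whose mean degree is 2m/n. Applied to the company network above (Size 17, Density 0.4338), the formula implies about 59 ties, and 59/17 ≈ 3.47 matches the reported Average.Degree; that column therefore appears to report m/n rather than 2m/n. This is an inference from the printed numbers, not documented behaviour, so check the convention before comparing with other software.

```r
library(igraph)

g <- make_ring(6)                 # toy network: 6 nodes, 6 ties
n <- vcount(g)
m <- ecount(g)

c(edge_density = edge_density(g), formula = 2 * m / (n * (n - 1)))  # both 0.4
c(mean_degree = mean(degree(g)), two_m_over_n = 2 * m / n)         # both 2
m / n                                                              # 1
```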

Plot Networks

The following functions create plots of the various networks. The TRUE/FALSE argument indicates whether node labels should be included in the plots. These plots are created directly from a list of company numbers and are intended for a quick inspection of the networks; other commands and packages can be used to create high-quality network visualisations from the network objects in R. You can also specify the node size; the following examples use size 6.

InterlockNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)

#Directors Plot
DirectorNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)

#Company Plot
CompanyNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)
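As one example of those other options, igraph's own plot method can draw a two-mode network with companies and directors on separate rows using layout_as_bipartite(). This sketch uses a made-up toy network rather than the package's plotting functions; the colours and shapes are arbitrary choices.

```r
library(igraph)

# Toy two-mode network: companies C1-C3, directors D1-D3 (made-up ids)
edges <- data.frame(company  = c("C1", "C1", "C2", "C2", "C3"),
                    director = c("D1", "D2", "D2", "D3", "D3"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = FALSE)
V(g)$type <- V(g)$name %in% edges$director

# Companies on one row, directors on the other; colour and shape by mode
lay <- layout_as_bipartite(g)
plot(g, layout = lay,
     vertex.size  = 15,
     vertex.color = ifelse(V(g)$type, "tomato", "steelblue"),
     vertex.shape = ifelse(V(g)$type, "circle", "square"),
     vertex.label = NA)
```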

You can also create grid plots - showing a plot of all three networks on a single grid using the cowplot library. In the example below we plot the networks in a grid, setting node size to degree centrality.

library(cowplot)
library(CompaniesHouse)
##Create plot objects with node size based on centrality
interlock.plot<-InterlockNetworkPLOT(COMP_LIST$company.number,
                                     mkey,FALSE,NodeSize = "CENTRALITY",
                                     start = 2015,end = 2020)
director.plot<-DirectorNetworkPLOT(COMP_LIST$company.number,
                                   mkey,FALSE,NodeSize = "CENTRALITY",
                                   start = 2015,end = 2020)
company.plot<-CompanyNetworkPLOT(COMP_LIST$company.number,
                                 mkey,FALSE,NodeSize = "CENTRALITY",
                                 start = 2015,end = 2020)

##Plot as a grid
plot_grid(interlock.plot,director.plot,company.plot,
          labels=c("Interlocks","Directors","Companies"))

Additional useful functions

Gender Information

If your research requires examining the gender of directors, and how patterns of interlocking directorates differ for men and women, you will need additional information, as Companies House does not provide gender information. However, there are a number of R packages that estimate the likelihood that an individual is male or female based on their first name. Although this mainly works for English first names (see the note at the end of the example below for other languages), it remains a useful proxy for gender.

The available packages include gender and genderizeR. In the following example, we use the genderizeR package. We extract gender information for all actors in the example Unilever director network, and then plot this network with the gender information.

This example shows how you can identify gender for a network of directors where the director name is present. These commands cannot be applied directly to the object created by the CompaniesHouse package, as the directors in that network are identified by their id only.

##Load the relevant packages
library(igraph)
library(magrittr)
library(intergraph)
library(network)
library(GGally)
#devtools::install_github("kalimu/genderizeR")
library(genderizeR)

##Create name dataframe from director network
directornames<-V(DirNET)$name%>%as.data.frame(.,stringsAsFactors=FALSE)
colnames(directornames)<-"Names"

##Split the names into first and last names
names.split <- strsplit(unlist(directornames$Names), ",")
name.last <- sapply(names.split, function(x) x[1])
name.first <- sapply(names.split, function(x)
  # this deals with empty name slots in your original list, returning NA
  if(length(x) == 0) {
    NA
    } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr",
                                 "Dr", " Dr", " Baron","Dr.", " Dr.",
                                 "Professor", " Professor")) {
    gsub("[[:punct:]]", "", x[length(x) - 1])
    
  } else {
    x[length(x)]
  })

##Create new names dataframe
nameDF<-data.frame(id=1:length(name.first),
                   name.first=name.first,
                   name.last=name.last)

nameDF$name.first<-as.character(nameDF$name.first)
nameDF$name.first<-trimws(nameDF$name.first)
nameDF$name.last<-as.character(nameDF$name.last)

##Extract first word of first name vectors (as this can also include middle/multiple names etc)
##This will be used as matching key later.
name1<-gsub(" .*", '', nameDF$name.first)
nameDF<-cbind(nameDF,name1)
nameDF$name1<-as.character(nameDF$name1)%>%tolower()
  
##Implement genderizeR
xPrepared = textPrepare(nameDF$name.first)
givenNames = findGivenNames(xPrepared, progress = FALSE) %>% as.data.frame(.,stringsAsFactors=FALSE)

##From this create a gender-name key
nameKEY<-givenNames
nameKEY$probability<-NULL
nameKEY$count<-NULL
colnames(nameKEY)<-c("name1","gender")

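##Merge this with the director name dataframe
nameDF <- merge(nameDF, nameKEY, by = "name1",all.x = TRUE) 
##merge() sorts rows by name1, so restore the original vertex order
nameDF<-nameDF[order(nameDF$id),]
nameDF$name1<-NULL
nameDF[is.na(nameDF)]<-"na"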

##Add these as igraph network attributes
V(DirNET)$gender<-nameDF$gender
numericgender<-as.factor(nameDF$gender)%>%as.numeric() #Add numeric attribute
V(DirNET)$gendernumeric<-numericgender

##plot director network with gender information
DIRnetwork<-asNetwork(DirNET)

ggnet2(DIRnetwork,color.palette="Set1",
       node.size=4,color.legend = "Gender",
       node.color = get.vertex.attribute(DIRnetwork,"gender"),
       label = FALSE,edge.color =  "grey50",arrow.size=0)

####NOTE
##This can be implemented for other languages (not just English names)
##by setting the locale first, for example:
Sys.setlocale("LC_ALL", "Polish") #Polish example (Windows locale name)
##see the genderizeR documentation for further details
