Internet Censorship in Iran: A First Look (FOCI 2013) #226

Open
wkrp opened this issue Mar 12, 2023 · 12 comments
Labels
Iran reading group summaries and discussions of research papers and other publications

Comments

@wkrp
Member

wkrp commented Mar 12, 2023

The previous live reading group was good fun and I'd like to schedule another session to talk about a noteworthy paper from the past. This time I want to do a paper about censorship in Iran from 2013.

"Internet Censorship in Iran: A First Look"
PDF, slides and video

The time:
Sunday, 2023-03-26 13:00–14:00 UTC

If you want to participate, just read the paper in advance and show up to the gathering, location TBA. As before, I'll make a video for those who cannot attend live.

The paper is an early effort to document censorship mechanisms in Iran, such as DNS injection, HTTP filtering, and bandwidth throttling, using the classic methodology of a controlled vantage point inside the censor's network. There is a noteworthy analysis based on traceroute to understand the network topology of the censorship system.

It will be ideal if we can have participation from someone who is familiar with current censorship conditions in Iran, so we can talk about what has changed and what has not since 2013.

@wkrp wkrp added the Iran label Mar 12, 2023
@wkrp wkrp added the reading group summaries and discussions of research papers and other publications label Mar 25, 2023
@wkrp
Member Author

wkrp commented Mar 25, 2023

We'll start the reading group for "Internet Censorship in Iran" on Sunday at 13:00 UTC, about 20 hours from now.

https://meet.jit.si/moderated/2dc56baf31ed6b05581d4de75787544466acc45c4ac8230c35c4b3540e138c25

As before, I'll try to get the meeting running about 20 minutes early to give time to debug any connection issues. You can join using a pseudonym. I will set up the meeting so that participants' webcam and microphone are disabled when they join. I will make a video, but the video will not include the browser window that shows the meeting. To get an idea of how the video will look, see last month's reading group.

@wkrp
Member Author

wkrp commented Mar 25, 2023

Here's a summary of the paper in advance of the meeting. I have written to the authors to see if they think the summary I have written is accurate, but they have not yet replied. I don't want to post a permanent summary when there's a chance the authors might disagree with its characterization, so here's a temporary link. When I hear back, I'll edit this comment to contain the summary. Update: I did not hear back from the authors, but here is the summary for posterity:

Internet Censorship in Iran: A First Look
Simurgh Aryan, Homa Aryan, J. Alex Halderman
https://censorbib.nymity.ch/#Aryan2013a
Slides & video

This is among the earliest efforts to formally document Internet censorship practices in Iran. The researchers used a single vantage point in Iran to make measurements of foreign destinations; in some experiments they connected to their own cooperating server outside Iran, in order to see both sides of the connection. The study covers a time span shortly before and after presidential elections in June 2013. It found evidence of HTTP Host header filtering, HTTP path keyword filtering, DNS interception, and bandwidth throttling. Traceroute tests suggest that censorship activity is centralized at a single chokepoint on paths that leave the country.

The largest set of experiments consisted of HTTP GET requests to the top 500 sites in each of 18 Alexa categories. The most censored category was "Adult", at more than 95%; the next most censored was simply the top 500 by popularity, at around 50%. In a few cases the requests timed out, and some of these were certainly cases of geoblocking, where it was the destination server refusing to provide service, rather than interference by a middlebox. The form of blocking most commonly observed was triggered by the HTTP Host header: requesting a blocked domain name results in a false HTTP response with a "403 Forbidden" status code that redirects first to a censorship block page at http://10.10.34.34/ (only accessible from inside Iran) and then to http://peyvandha.ir/ (2013 archive here). The TCP handshake gets through the firewall to the true destination web server, but the packet containing the GET request and Host header is blocked (never reaches the destination), and instead the censorship node itself returns the false response, meanwhile also sending 5 RST packets to the destination with various sequence numbers and a spoofed source address. Besides the Host header, certain keywords in URL paths result in similar blocking: the researchers demonstrate by requesting "sex.htm". There were no packet-level changes that would indicate the use of a transparent HTTP proxy.

False DNS responses were also observed, but only for a small set of domains: facebook.com, youtube.com, and plus.google.com. The IP address in fake DNS responses was the same as in the fake HTTP responses, 10.10.34.34. The DNS query is stopped at the censorship node and never makes it to the intended resolver; this is different from how DNS injection works in China, where the query gets to the resolver and the user gets two DNS responses, one injected and one genuine. Bizarrely, the censorship node sent TCP RST packets to the destination DNS resolver, as with the HTTP-based blocking, in spite of the fact that the DNS exchange used UDP. They tested TCP-based DNS as well, and found no interference with it.
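The difference between the two injection styles can be made concrete with a minimal Python sketch that labels all UDP DNS responses collected for a single query. It assumes two things beyond the paper: that forged answers fall in 10.10.34.0/24 (this thread shows 10.10.34.34 and 10.10.34.36), and that responses arriving within some listening window have already been collected. The function names are hypothetical. It only recognizes the 10.10.34.X address, so forgeries pointing anywhere else would be missed.

```python
import ipaddress

# Assumption: forged answers point into 10.10.34.0/24 (10.10.34.34 and
# 10.10.34.36 are the addresses seen in this thread).
BLOCK_NET = ipaddress.ip_network("10.10.34.0/24")


def is_forged(answer_ips):
    """True if any A record in one response points into BLOCK_NET."""
    return any(ipaddress.ip_address(ip) in BLOCK_NET for ip in answer_ips)


def classify_dns_answers(responses):
    """Label every response received for ONE UDP DNS query.

    `responses` holds one entry per response packet, in arrival order;
    each entry is the list of IPv4 address strings in its Answer section.
    """
    if not responses:
        return "no-response"
    if not any(is_forged(r) for r in responses):
        return "no-injection-detected"
    if len(responses) == 1:
        # Query stopped at the censor; only the forged reply comes back
        # (the behaviour the paper describes for Iran).
        return "single-forged"
    # A forged reply plus at least one other reply, as when an on-path
    # injector races the genuine resolver (the pattern seen in China).
    return "forged-plus-other"
```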

The researchers tested throttling by downloading a file over HTTP, HTTPS, and SSH. The HTTP and HTTPS transfers went about as fast as expected, but SSH transfers were throttled down to about 15% of available speed. The throttling was effected by dropping TCP packets. SSH obfuscated by XORing with a constant key was throttled almost to zero, as was obfs2. The authors' interpretation of these results is that throttling is based on a protocol whitelist: a few enumerated protocols are permitted, and all others are throttled. Tests after the election in June 2013 showed no evidence of throttling.
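To see why a protocol whitelist would catch constant-key XOR obfuscation, here is a toy Python sketch. The signature table and the allowlist are hypothetical and far simpler than any real DPI classifier; the allowlist just mirrors the paper's observation that HTTP and HTTPS ran at full speed while SSH did not. XORing with a fixed key removes recognizable plaintext such as the `SSH-` banner, so the stream matches no signature and, under the authors' interpretation, falls into the throttled default. The sketch does not model the different degrees of throttling (about 15% for SSH versus almost zero for the obfuscated streams).

```python
from itertools import cycle

# Hypothetical first-bytes signatures; real classifiers are far richer.
SIGNATURES = {
    "http": (b"GET ", b"POST ", b"HEAD "),
    "tls": (b"\x16\x03",),  # TLS handshake record header
    "ssh": (b"SSH-",),      # SSH identification string
}
# Hypothetical allowlist matching the paper's observations.
PERMITTED = {"http", "tls"}


def xor_obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating constant key (applying it twice undoes it)."""
    if not key:
        raise ValueError("key must be non-empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def identify(first_bytes: bytes):
    """Return the protocol whose signature matches the first bytes, or None."""
    for name, prefixes in SIGNATURES.items():
        if first_bytes.startswith(prefixes):
            return name
    return None


def whitelist_policy(first_bytes: bytes) -> str:
    """Allowlist model: enumerated protocols pass, everything else is throttled."""
    return "permitted" if identify(first_bytes) in PERMITTED else "throttled"
```

Constant-key XOR hides the plaintext signature but adds no cryptographic protection; it only defeats a classifier that looks for fixed byte patterns.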

There is evidence that network censorship is enforced at a centralized location. Traceroutes to IP addresses in various foreign countries all passed through a certain 10.10.*.* IP address. Traceroutes in the reverse direction did not see the 10.10.*.* address. TTL-limited HTTP GET requests in the manner of Xu et al. 2011 show that the node at this IP address is the same one responsible for HTTP interception. The address did not respond to port scans.
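The reasoning in this paragraph can be expressed as a few small checks, sketched in Python under stated assumptions: traceroute output has already been reduced to a list of hop addresses (with `None` for unanswered hops), and each TTL-limited GET has been recorded as whether a block page came back. If the smallest TTL that still triggers the block page equals the hop number of the 10.10.*.* address, that supports the conclusion that this hop is the censor. The function names are hypothetical.

```python
import ipaddress

CENSOR_NET = ipaddress.ip_network("10.10.0.0/16")  # the paper's 10.10.*.*


def first_censor_hop(hops):
    """Return (hop_number, address) of the first hop in 10.10.0.0/16, else None.

    `hops` lists hop addresses in TTL order (hop 1 first); None marks a
    hop that did not answer ("*" in traceroute output).
    """
    for ttl, addr in enumerate(hops, start=1):
        if addr is not None and ipaddress.ip_address(addr) in CENSOR_NET:
            return ttl, addr
    return None


def common_censor_hops(traces):
    """Addresses in 10.10.0.0/16 that appear on every trace in `traces`."""
    per_trace = [
        {a for a in hops if a is not None and ipaddress.ip_address(a) in CENSOR_NET}
        for hops in traces
    ]
    return set.intersection(*per_trace) if per_trace else set()


def min_trigger_ttl(results):
    """Smallest TTL at which a TTL-limited GET still drew a block page.

    `results` maps TTL -> True if the injected block page came back.
    """
    triggering = [ttl for ttl, blocked in results.items() if blocked]
    return min(triggering) if triggering else None
```

For example, if `first_censor_hop` places the 10.10.*.* address at hop 3 and `min_trigger_ttl` also returns 3, the two observations point at the same node.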

@cross-hello

cross-hello commented Mar 25, 2023 via email

@ftfws

ftfws commented Mar 27, 2023

@wkrp Followup about DNS: I did not observe any RSTs when trying to reproduce what you said in the meeting (doing normal DNS over UDP requests with clients behind the old and also new DPI).

I did, however, see one TCP RST+ACK on the remote server when testing DNS over TCP with a client behind the new(?) DPI. The client just received an ACK followed by the malformed response and nothing else. Both forged responses had random IP IDs and no TCP options (the header was exactly 20 bytes). When the client tried to close the flow with a FIN+ACK, it surprisingly arrived at its destination, and the remote server replied with a RST, as it should, since the connection had already been torn down. (I'm still not sure whether they're just allowing FIN packets or whether the flow is not blocked at all. This needs more testing.)

The old DPI still allows DNS over TCP and does not alter the response.


Example of the Location header redirect I mentioned in the meeting:

HTTP/1.1 301 Moved Permanently
Location: http://10.10.34.34
Content-Type: text/html
Content-Length: 156

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Moved</title>
</head>
<body>
<h1>Redirected</h1>
</body>
</html>
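A minimal Python sketch for recognizing this Location-based redirect in a captured response, assuming the raw response bytes are available and that the block-page address falls within 10.10.34.0/24 (an assumption generalized from the addresses in this thread). The helper names are hypothetical. It only inspects the HTTP layer; the packet-level hint discussed in this thread (random IP IDs) needs the PCAP.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCK_NET = ipaddress.ip_network("10.10.34.0/24")  # assumption
REDIRECT_CODES = {301, 302, 303, 307, 308}


def parse_response_head(raw: bytes):
    """Split a raw HTTP/1.x response into (status, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def is_location_block(raw: bytes) -> bool:
    """True if the response is a redirect whose target host is inside BLOCK_NET."""
    status, headers, _ = parse_response_head(raw)
    location = headers.get("location")
    if status not in REDIRECT_CODES or not location:
        return False
    host = urlsplit(location).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host) in BLOCK_NET
    except ValueError:  # a hostname rather than an IP literal
        return False
```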

I also have PCAPs of both the client and the server. I can share them via a private communication channel if possible.


I should also note that what I'm referring to as the "new" DPI might not be that new, but I've never seen such behavior (random IP ID and RST after a forbidden SNI) before, and I am seeing it slowly deployed on more networks.

@wkrp
Member Author

wkrp commented Mar 27, 2023

Link to video

Here is the video of the discussion. Thanks to all who participated.

Some quite interesting points came up in the discussion, comparing the current censorship situation with what this paper described ten years ago:

  • The paper describes an HTTP blocking technique that injects an HTML 403 response with an iframe enclosing http://10.10.34.X:80/. Documented examples of this block page are here and here. But apparently there is another (newer?) kind of censorship node that uses a Location header to redirect to http://10.10.34.X:80/ rather than an iframe. This type of node uses random IP IDs, instead of copying the IP ID of the request. You can see what the Location-based block page looks like in another post in this thread.
  • The paper found interference with UDP-based DNS, but none with TCP-based DNS (Section 4.3). There is now a new type of censorship node that tries to interfere with TCP-based DNS; however its injected responses are malformed. It wasn't clear from the discussion whether the new type of TCP-based DNS blocker is the same as the Location-based HTTP blocker.
  • These days, of course, TLS SNI–based blocking is prevalent. One of the participants said that after a forbidden SNI is seen, all subsequent packets are dropped (not just the one containing the SNI). I didn't hear in the discussion whether the packet dropping expires after a timeout, or whether it affects all packets between the same source and destination IP addresses or just ones with the same port numbers.
  • It is reported that all ISPs route through AS49666, the Telecommunication Infrastructure Company, or TIC. This is consistent with what was reported in #188 (how Iran is filtering the v2ray traffic). (Note that TIC, the Telecommunication Infrastructure Company, is not the same as TCI, the Telecommunication Company of Iran, a major ISP.) TIC may be where centralized censorship is implemented. However, some ISPs, especially mobile ISPs, may implement their own layer of censorship before packets even get to TIC. As one commenter said, things like forbidden UDP packets get lost "too soon".
  • The Communications Regulatory Authority of Iran (CRA) mentioned in Section 2 is the same as in the January 2023 Citizen Lab report about a telephony intercept system.
  • SSH throttling still occurs, but does not affect all foreign IP address ranges equally. Amazon IP addresses are hard to get, but are not blocked.
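The IP ID difference in the first point suggests a simple heuristic for sorting injected packets by node type. Below is a Python sketch, assuming the IP ID of the triggering request and of each injected packet have already been extracted from a capture; the function name and labels are hypothetical. A "not-copied" result is only consistent with the newer node's random IP IDs, not proof of it, because a genuine server's packets would not copy the request's IP ID either.

```python
def ip_id_behaviour(request_ip_id, injected_ip_ids):
    """Compare injected packets' IP IDs with that of the request that triggered them.

    Returns "copied" if every injected packet reuses the request's IP ID
    (the older node's behaviour), "not-copied" if none do (consistent with
    the newer, random-IP-ID node), and "mixed" otherwise.
    """
    if not injected_ip_ids:
        raise ValueError("need at least one injected packet")
    copied = [ip_id == request_ip_id for ip_id in injected_ip_ids]
    if all(copied):
        return "copied"
    if not any(copied):
        return "not-copied"
    return "mixed"
```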

We talked about ways to reproduce the paper's Figure 2, which shows the blocking rate of different website categories, using contemporary censorship measurement systems. This is Figure 2 from the paper:

Figure 2 from the paper: "Effects of Iranian Internet censorship on the top 500 websites for 18 Alexa categories"

You can get a similar chart straight from OONI MAT. Go to https://explorer.ooni.org/chart/mat, enter Country: Iran and Columns: Website Categories, then click Show Chart. The columns aren't all the same size, and the result categories are stacked rather than side by side, but you can still see that, e.g., the PORN category is proportionally almost completely blocked:

https://explorer.ooni.org/chart/mat?probe_cc=IR&since=2023-02-24&until=2023-03-27&time_grain=day&axis_x=category_code&test_name=web_connectivity
OONI MAT: Iran, Web Connectivity Test

I could not figure out how to get such a chart directly from the Censored Planet dashboard. But the data is there: you can export a CSV and visualize it yourself. Select Country: Iran, then clear the Network, Subnetwork, Site Category, and Domain fields. Click the icon next to "Top 50 – Domain" to expose the site categories, then click and Export as CSV.

"OK" versus "Unexpected" counts in site categories from Censored Planet

The R/Tidyverse script to produce the above Censored Planet graph is:

Censored Planet Dashboard_HTTPS Analysis_Pivot table.csv

library("tidyverse")
ggplot(read_csv("Censored Planet Dashboard_HTTPS Analysis_Pivot table.csv") %>%
	filter(Domain != "CONTROL" & `Site Category` != "Uncategorized") %>%
	mutate(
		OK = `Probe Count` * (1 - `Unexpected Rate`),
		Unexpected = `Probe Count` - OK
	) %>%
	group_by(`Site Category`) %>%
	summarize(`Probe Count` = sum(`Probe Count`), OK = sum(OK), Unexpected = sum(Unexpected)) %>%
	mutate(`Site Category` = fct_reorder(`Site Category`, OK / `Probe Count`)) %>%
	pivot_longer(cols = c(OK, Unexpected), names_to = "result", values_to = "count")
) +
geom_bar(
	aes(x = `Site Category`, y = count / `Probe Count`, fill = result),
	width = 0.7,
	stat = "identity",
	position = "dodge"
) +
scale_fill_manual(values = c(OK = "lightgreen", Unexpected = "red")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
labs(x = NULL, y = NULL)
ggsave("censoredplanet-categories.png", width = 8, height = 5)

Links to references that came up during the discussion:

@wkrp wkrp changed the title Live reading group 2023-03-26 – Internet Censorship in Iran: A First Look (FOCI 2013) Internet Censorship in Iran: A First Look (FOCI 2013) Mar 27, 2023
@wkrp
Member Author

wkrp commented Mar 27, 2023

@ftfws: thanks so much for the tests and the information.

You said that the responses injected by DNS-over-TCP censorship are malformed. In what way are they malformed?

@ftfws

ftfws commented Mar 27, 2023

@wkrp Here is an example for google.com. I have redacted the transaction ID but it did match the question's transaction ID.

0000   00 2a xx xx 81 a0 00 01 00 01 00 00 00 01 06 67   .*.............g
0010   6f 6f 67 6c 65 03 63 6f 6d 00 00 01 c0 0c 00 01   oogle.com.......
0020   00 01 00 00 01 a2 00 04 d8 ef 26 78               ..........&x

I don't know much about DNS over TCP's wire protocol but it looks like it's missing the question class (0x0001 in this case for IN) before starting the answer section and the pointer (0xc00c).

@ftfws

ftfws commented Mar 27, 2023

And about HTTP blocking, the new(?) DPI still injects iframes too but the injected response differs slightly (there are more headers now). The packets still show signs that they're injected from the new DPI (random IP ID for example). I have no idea why they do both.

Example of iframe injection (the title is redacted as I thought it might contain identifiable information):

HTTP/1.1 403 Forbidden
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 337

<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1256"><title>XXXX</title>
</head><body><iframe src="http://10.10.34.36/?type=Invalid Keyword&policy=MainPolicy " style="width: 100%; height: 100%" scrolling="no" marginwidth="0" marginheight="0" frameborder="0" vspace="0" hspace="0"></iframe></body></html>

Notice that the body now spans 2 lines (the first ends at </title>). There are also 2 new headers (Content-Type and Content-Length). Another difference is that the charset in the Content-Type header (utf-8) differs from the one specified in the meta tag (windows-1256).
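The differences listed above can be pulled out of a captured block page with the Python standard library. This is a sketch, assuming the Content-Type header value and the decoded body are already in hand; the class and function names are hypothetical. It reports the iframe's target host, its query parameters (here `type` and `policy`), and whether the header and meta-tag charsets disagree.

```python
import re
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

CHARSET_RE = re.compile(r"charset=([\w-]+)", re.IGNORECASE)


class BlockPageParser(HTMLParser):
    """Collect the first iframe src and the <meta http-equiv> charset."""

    def __init__(self):
        super().__init__()
        self.iframe_src = None
        self.meta_charset = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and self.iframe_src is None:
            self.iframe_src = attrs.get("src")
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-type":
            m = CHARSET_RE.search(attrs.get("content") or "")
            if m:
                self.meta_charset = m.group(1).lower()


def inspect_iframe_block(content_type_header, body):
    """Summarize an iframe-style block page: target host, params, charsets."""
    parser = BlockPageParser()
    parser.feed(body)
    parser.close()
    m = CHARSET_RE.search(content_type_header or "")
    header_charset = m.group(1).lower() if m else None
    src = urlsplit(parser.iframe_src or "")
    # Strip values: the captured src ends with a stray space.
    params = {k: v[0].strip() for k, v in parse_qs(src.query).items()}
    return {
        "iframe_host": src.hostname,
        "params": params,
        "header_charset": header_charset,
        "meta_charset": parser.meta_charset,
        "charset_mismatch": None not in (header_charset, parser.meta_charset)
        and header_charset != parser.meta_charset,
    }
```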


You can also trigger the Location-based redirect by making an HTTP GET request from MCCI-AS (AS197207) to any domain that has been used for a proxy. This was the only widely used ISP I could think of that shows these signs. I don't know if this is regional or not.
Some ISPs in some cities block any domain in the .site TLD if it's behind Cloudflare (the destination is in AS13335). This also triggers the redirect if the ISP is behind this DPI.

@wkrp
Member Author

wkrp commented Mar 28, 2023

I don't know much about DNS over TCP's wire protocol but it looks like it's missing the question class (0x0001 in this case for IN) before starting the answer section and the pointer (0xc00c).

Yes, your analysis is correct. The response is missing the QCLASS field of the Question section. The NAME field of the Answer section is being interpreted as the QCLASS, and then parsing gets desynchronized. The response is malformed in another way: it says ARCOUNT=1 but the Additional section is empty.

But the IP address in the response seems to be a valid one for google.com, 216.239.38.120. Here is a manual dissection:

--- Header ---
xx xx        ID
81 a0        QR=1 OPCODE=0 AA=0 TC=0 RD=1 RA=1 Z=0 AD=1 CD=0 RCODE=0
00 01        QDCOUNT=1
00 01        ANCOUNT=1
00 00        NSCOUNT=0
00 01        ARCOUNT=1
--- Question ---
06 google 03 com 00  QNAME
00 01        QTYPE=1 (A)
<missing>    QCLASS=?
--- Answer ---
c0 0c        NAME (points back to QNAME)
00 01        TYPE=1 (A)
00 01        CLASS=1 (IN)
00 00 01 a2  TTL=418
00 04        RDLENGTH=4
d8 ef 26 78  RDATA=216.239.38.120
--- Authority ---
--- Additional ---
<missing>
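The desynchronization can be reproduced with a strict parser. In this Python sketch, the transaction ID (redacted as "xx xx") is replaced by 0x0000 as a placeholder, and the helper names are hypothetical. The 2-byte TCP length prefix (0x002a = 42) matches the payload. Read as a standard parser would, the compression pointer `c0 0c` is taken as QCLASS (0xc00c). The answer record then begins two bytes late, and its bogus RDLENGTH (0x04d8 = 1240) runs past the end of the 42-byte message. The sketch does not follow compression pointers.

```python
import ipaddress
import struct

# DNS-over-TCP message from the hex dump above; 0x0000 replaces the
# redacted transaction ID.
RAW = bytes.fromhex(
    "002a" "0000" "81a0" "0001" "0001" "0000" "0001"  # length prefix, header
    "06676f6f676c6503636f6d00" "0001"                  # QNAME, QTYPE (no QCLASS)
    "c00c" "0001" "0001" "000001a2" "0004" "d8ef2678"  # answer RR
)


def read_name(msg, off):
    """Read an uncompressed domain name at `off`; return (name, next_offset)."""
    labels = []
    while msg[off] != 0:
        n = msg[off]
        if n & 0xC0:
            raise ValueError("compression pointer; not handled in this sketch")
        labels.append(msg[off + 1:off + 1 + n].decode("ascii"))
        off += 1 + n
    return ".".join(labels), off + 1


def parse_question(framed):
    """Check the TCP length prefix, then parse header and first question strictly."""
    (length,) = struct.unpack_from("!H", framed, 0)
    msg = framed[2:]
    if length != len(msg):
        raise ValueError("length prefix does not match payload size")
    _id, _flags, qd, an, ns, ar = struct.unpack_from("!6H", msg, 0)
    qname, off = read_name(msg, 12)
    qtype, qclass = struct.unpack_from("!2H", msg, off)
    return msg, {"qname": qname, "qtype": qtype, "qclass": qclass,
                 "counts": (qd, an, ns, ar), "answer_offset": off + 4}


def parse_rr(msg, off):
    """Parse one resource record; raise if RDLENGTH overruns the message."""
    name, off = read_name(msg, off)
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, off)
    off += 10
    if off + rdlength > len(msg):
        raise ValueError(f"RDLENGTH {rdlength} overruns message")
    return {"name": name, "type": rtype, "class": rclass, "ttl": ttl,
            "rdata": msg[off:off + rdlength]}
```

With `msg, q = parse_question(RAW)`, `q["qclass"]` is 0xc00c and `parse_rr(msg, q["answer_offset"])` raises a ValueError about RDLENGTH 1240; reading the last four bytes directly, as in the manual dissection, gives 216.239.38.120.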

It is strange, if this interference is intended for censorship, that it returns what appears to be a good IP address, rather than a 10.10.34.X one. Does google.com get interference with UDP-based DNS? Are all domain names interfered with when using DNS over TCP, or only some of them? It may be that this ISP's DNS-over-TCP resolver just doesn't work properly.

ARCOUNT=1 may be something that is (wrongly) copied from the query. It is a little reminiscent of the faulty response construction that used to happen in Turkmenistan. ARCOUNT=1 is characteristic of EDNS, which uses the Additional section to store extension information. I suggest trying a query that does not use EDNS and seeing whether ARCOUNT=0 in the response when you do that, e.g. dig +tcp +noedns google.com. You can get other ideas for experiments to try at #80.
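The same experiment can be run with hand-built queries. This Python sketch assumes only the standard DNS and EDNS(0) wire formats; the helper names are hypothetical. It builds the same query with and without an OPT pseudo-record, so the only header difference is ARCOUNT (1 versus 0). It is not byte-for-byte what dig sends: dig may add further EDNS options, and the advertised UDP payload size of 1232 is an arbitrary choice here.

```python
import struct


def encode_name(name):
    """Encode a domain name as uncompressed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype=1, edns=True, txid=0x1234):
    """Build a recursive query, optionally with an EDNS(0) OPT record."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 1 if edns else 0)
    question = encode_name(name) + struct.pack("!2H", qtype, 1)  # QCLASS=IN
    if not edns:
        return header + question
    # OPT pseudo-RR: root owner name, TYPE=41, CLASS=UDP payload size,
    # TTL=extended RCODE/version/flags (all zero), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
    return header + question + opt


def tcp_frame(msg):
    """Prefix a DNS message with the 2-byte length used over TCP."""
    return struct.pack("!H", len(msg)) + msg


def arcount(msg):
    """ARCOUNT from an unframed DNS message header."""
    return struct.unpack_from("!H", msg, 10)[0]
```

Sending `tcp_frame(build_query("google.com", edns=False))` over TCP to port 53 of a resolver outside the country, and checking ARCOUNT in any forged reply, would test whether the injector copies that field from the query.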

@ftfws

ftfws commented Mar 28, 2023

But the IP address in the response seems to be a valid one for google.com, 216.239.38.120.

This is forcesafesearch.google.com and is not a normal IP address for google.com.

It is strange, if this interference is intended for censorship, that it returns what appears to be a good IP address, rather than a 10.10.34.X one.

The censor has the ability to inject any IP address and not just 10.10.34.3x. In this case it is used for forcing safe-search to be enabled for everyone. I don't know if this was previously documented here but it's a well-known thing inside of Iran.

Does google.com get interference with UDP-based DNS?

This kind of interference, like all kinds of DNS censorship, happens on all ISPs over UDP (and TCP if the ISP has the new DPI).

Are all domain names interfered with when using DNS over TCP, or only some of them? It may be that this ISP's DNS-over-TCP resolver just doesn't work properly.

Not all domains of course. I am also using a DNS server outside of Iran to avoid the ISP's DNS and focus on the firewall itself.

ARCOUNT=1 may be something that is (wrongly) copied from the query. It is a little reminiscent of #80 (comment). ARCOUNT=1 is characteristic of EDNS, which uses the Additional section to store extension information.

The command used was a simple dig google.com +tcp @theServerOutsideOfIran, which indeed uses EDNS by default. I will try another one without EDNS and post the result. Edit: I can confirm that not using EDNS sets ARCOUNT to zero.

@wkrp
Member Author

wkrp commented Mar 29, 2023

This is forcesafesearch.google.com and is not a normal IP address for google.com.

Wow, TIL. Thanks for the information.

https://support.google.com/websearch/answer/186669

Map google domains to forcesafesearch.google.com

This method leverages SafeSearch VIP to force all users on your network to use SafeSearch on Google Search while still allowing a secure connection via HTTPS. The VIP in SafeSearch VIP refers to a Virtual IP, which is an IP address that can be routed internally to multiple Google servers. We will serve SafeSearch results for all requests that we receive on this VIP, which includes Google search, image search, and video search.

This would make a good small research project. Process past OONI and Censored Planet measurements, looking for ones where DNS queries for google.com get a forcesafesearch.google.com IP address, as well as looking for empty or NXDOMAIN responses to queries for use-application-dns.net, which is a canary domain that DNS operators can use to disable DNS over HTTPS in browsers.
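As a starting point for that project, here is a Python sketch of the per-measurement check. It assumes each OONI or Censored Planet record has already been reduced to a query name, an RCODE string, and a list of answer addresses; it does not model either project's actual data format, and the function name and flag labels are hypothetical. The only forcesafesearch address it knows is the one seen in this thread, 216.239.38.120, so the set is likely incomplete.

```python
import ipaddress

# forcesafesearch.google.com address observed in this thread; Google may
# use others (including IPv6), so this set is likely incomplete.
FORCESAFESEARCH = {ipaddress.ip_address("216.239.38.120")}
CANARY = "use-application-dns.net"


def flag_measurement(query_name, rcode, answers):
    """Return flags for one DNS measurement.

    query_name: the queried domain; rcode: e.g. "NOERROR" or "NXDOMAIN";
    answers: list of IP address strings from the Answer section.
    """
    flags = []
    name = query_name.rstrip(".").lower()
    if any(ipaddress.ip_address(a) in FORCESAFESEARCH for a in answers):
        flags.append("forcesafesearch")
    # An empty or NXDOMAIN answer for the canary tells browsers to disable DoH.
    if name == CANARY and (rcode == "NXDOMAIN" or not answers):
        flags.append("doh-canary-triggered")
    return flags
```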

@ftfws

ftfws commented Mar 30, 2023

Just some more info as I'm sifting through OONI data.

I've found several measurements that show the new Location-based redirect. They are very rare because almost all of them are first blocked by DNS and never make it to the HTTP request.

OONI Explorer Links for Location header redirects:

An example of RST after forbidden SNI (I've not yet found a good way to filter this reliably): https://www.cpj.org/ on AS42337 (Respina Networks & Beyond PJSC)
