RIPE 91
21 October 2025
Main hall, 4 p.m.
MAT Working Group
MASSIMO CANDELA: Hello. Welcome, good afternoon everybody, welcome to this edition of the MAT Working Group, which stands for Measurement, Analysis and Tools. Welcome to Bucharest, of course. Together with me chairing this session I have my co-chairs Nina and Stephen, who are both here in person. And I would like to give you a quick reminder of what this session is about.
This is a space where researchers or people doing experiments, both from academia and from industry, can communicate with operators and get feedback. We want to bridge the gap between the researchers and the operators: for the researchers to share their research and maybe get a reality check from the operators, and at the same time for the operators to benefit from the analysis, the datasets, the statistics and the tooling that the researchers are able to provide, and maybe shape the future of the Internet based on this.
Actually, in the last edition in Lisbon, my co-chair Stephen presented statistics that we compiled together about how the Working Group is doing, and also about how we are doing with the feedback that we received; even in this session we are addressing some feedback we got in Lisbon. All of this to say that it's really important that, at the end of the session, you approach us and provide your opinion. And also rate the presentations, so that we know what content is appealing to you.
We received a lot of submissions and we accepted 38 percent, so more or less as usual, but this time we also had a few people concerned about why they didn't get accepted. We only have an hour and a half, though, and we already packed in five presentations, so I would say that it's time to go straight to today's agenda.
The first presenter, and the only remote one, is Geoff Huston, presenting network measurement in the dark. Then we have Raffaele presenting disrupting the Internet in the name of copyright, an Italian story. Then Remi Hendriks presenting an empirical evaluation of longitudinal Anycast catchment stability. And they are all here now.
I would say that we should go directly to the first presenter, who doesn't really need any introduction, but I am going to do one anyway.
So: Geoff Huston orchestrated the construction of the first Internet network between Australian universities. He is APNIC's chief scientist and an Internet Hall of Fame inductee. At APNIC he undertakes research on Internet infrastructure, IP technologies and address distribution policies.
So Geoff, the stage is yours, even if only virtually, to present your talk.
GEOFF HUSTON: Thank you very much for that, and I'm sorry I can't be with you in person, budgets, time, whatever, I am sorry, I am just not here.
What I am trying to do is present my slides, and that's not working, so, in some ways just to keep things rolling, I am going to actually press on with a verbal presentation of the slides, and if the meeting team can actually get something running that would be great. But let me press on.
Networks have always had a very privileged position inside our communications environment. Networks could observe what was going on inside them and what users were saying to each other; they were a one-stop shop, if you will, for observing user behaviour.
Users expected those networks to maintain privacy on their behalf; sometimes that was reinforced with regulatory measures, sometimes not, but nevertheless, networks were privy to what users did. The last, I don't know, 20-odd years of the Internet have seen an incredible erosion of that trust model. What we are seeing is that advertising revenue as a means of funding the Internet has acted as a major incentive to look at user behaviour, and not in a productive way, not in a gentle way, but to figure out more about you to make the advertising work better. The better the profile that the network operator, or indeed the advertiser, is able to build of the users, the higher the value to the advertiser.
Quite frankly, that was an erosion that was always going to get perverted, and the perversion occurred in the Snowden era, when it emerged that some US agencies were performing wholesale surveillance of users. It all ended badly, it all ended very badly, when that was made public.
The IETF came out in May 2014 with RFC 7258, which said that mass surveillance was an attack on the privacy of Internet users and organisations, and the reaction was that we were going to mitigate that by designing protocols that effectively made observation impossible, or so expensive that it was infeasible.
So, I am trying to get a slide deck set up and it's just not happening; I don't know what's going on there, I thought the tech team actually had my slides. Time is moving on, so I am going to just press on with this presentation. If they can get the slides up, that would be great, but I don't know: 'select your slides', it says on the screen; 'no document is available' is the warning there. I can't make this go away, I am sorry.
I am sorry about this, this is not meant to be happening. I am afraid you are going to have to share this yourself or it's not going to happen.
What does all that mean? Well, let me continue on. What it meant was that we started making encryption a default. These days, for example, around 97% of all web traffic is encrypted using HTTPS; that's from a Cloudflare Radar report. We then started hiding the metadata, and we put work into encrypting the DNS: DNS over TLS, DNS over QUIC, DNS over HTTPS. The majority of the DNS from the user perspective, that is, you talking to your closest recursive resolver, is still in the clear, in the open over UDP; that's about 70%, but the other 30% is DNS over encrypted channels these days. Little by little, we're encrypting much of the DNS. Can we go further? Yes, we can go a lot further, and part of this is hiding all of the data by using things like Apple's Private Relay, or MASQUE as another technique, where we use double encryption on all of the traffic: the first relay hides the identity of the user, and a second relay hides the content from the first relay. So no single point can see both who is asking and what they are asking in a single transaction. So, someone is bringing up slides; I am up to about slide 20, if you could press on. I am now moving through; I have gone through all this.
So now we're getting back to: well, what's left to actually look at? It's the peepholes. It's the Server Name Indication, and that's being sealed as we speak. And now all that's left is the whole issue of the web PKI and revocation.
That's actually a tricky problem, and I'm not sure it's going to get solved easily, because if you want to revoke a certificate, the user actually has to ask somebody: is the certificate that I am being presented with good? And the problem is that the somebody you ask now sees everything you are doing. These days revocation, particularly in the web PKI, is kind of a yesterday concept, because it leaks like a sieve, and our substitute is to use very short-lived certificates, because obviously real bad people take months to set up their attacks, so that's just fine. The alternative is that there is no revocation: once a certificate goes bad, users are trapped, and that's a very, very bad thing. But we don't have a better answer. Oops!
So why are we doing all this? That's a really good question, because the original assumption was that users cared about privacy. But if you actually look at the world a little bit harder, you find that privacy is dead, and quite frankly most people don't care; that's the more general observation. Users will happily use Gmail for their mail, they'll happily use third-party servers, they will happily tell Meta all their secrets on Facebook, because they don't care. They'll happily trade away privacy for free access to search and everything else. So it's not the users we are doing this for. Who is it? Well, I tend to suspect it's the applications themselves. It's the folk who are competing with each other to set up profiles of users. This is all about each individual actor protecting their core asset, the individual profiles, from each other. If I'm an application running on an Apple or other mobile platform, it's the platform that's the enemy; if I'm co-resident in the operating system, or at least in a large system with some kind of web application etc., it's them that's the enemy; and it's the network too. So the thing about privacy is that the application is trying to obscure itself from everything around it: from every other app on the host, from the host itself, from the network and everything else.
And how do you do that? Well, you basically lift everything up into the encrypted envelope. In the case of TCP transport, that actually means lifting TCP out of the kernel, out of the common infrastructure of protocol 6, bypassing kernel handling, the lot. What does that mean? Well, it's QUIC. QUIC is now relatively common out there on the Internet. It's been adopted by many folk, and part of the reason is that what it does is hide the entire transport from literally everything else. It's the new TCP for this paranoid world. Cloudflare reports that about 32% of the traffic they see is now running over QUIC. A Cisco report a couple of years ago suggested extraordinary numbers coming out of YouTube, Instagram, Facebook, etc., which all make extensive use of QUIC.
Today's networking space is a different space as a result. The money that used to be in networks is now in the application space. What does that mean? You lift everything out. QUIC is now the dominating transport and becoming more prevalent. What does that mean for the network? Nothing, it's all encrypted. There is a huge amount of work on relays and proxies. What does that mean for the network? There is nothing to see. Because that relationship between applications and the network has soured into distrust and suspicion, and the application is defending itself by wrapping everything up in as much encryption as it can manage. For the network operator there is no coming back here; there is nothing to see.
What can you do? Well, as that Cisco presentation said two years ago, it's got worse, not better. You can try and look at the profile of packets and sort of deploy the power of AI to understand what's going on in your network. You can probably do that, but there is not much left. You can't figure out the control signals and the meta signals that are happening in your network to actually understand what's going on, and your regulatory obligations regarding traffic interception are currently useless: all you see is a massive amount of traffic that is encrypted, and not much else can be gleaned from it. What does that leave for us as measurement people? What's left for measurement? Well, you can't just collect all the packets and infer what's going on. Those of us who have been around a long time might remember a few massive studies in the late eighties and early nineties where someone collected all the packets and tried to infer what was happening based on a huge collection. It doesn't work anymore. You can't do whole-of-network views. What's going on is that control over the network is now irrelevant; control over the application is everything. And the network is being pushed into basic commodity behaviour.
So each service, particularly with QUIC, now has the opportunity to define its own behaviour. What's interoperability in such a world? What about standards? It actually says to me that the whole area around standards and openness is dying a death under this inexorable pressure for applications to hide what they are doing from each other and from everything else. Measurement is sitting in this terrible space of trying to figure out what's going on without any clues being provided. And, you know, the depressing view from all that is: is network measurement a contradiction? It's a bit like the presentation yesterday from Measurement Lab, where they are actually trying to go into hosts at the edge of the network and do the measurement from the edge, where encryption is not happening, in order to see what's going on in the network. It's the same attitude with SLAs, and it's exactly the same attitude with APNIC's ad-based measurement. You now have to be the user, and sit inside the device and inside the application, in order to gain any insights, to see what they see from the inside looking out.
And to my mind, that's the big issue here about network measurement: it's now defunct. We have to think about ways to do application- and endpoint-based measurement.
So, that is my time. I know that the agenda is packed. I have left a couple of minutes for questions if there are any. But otherwise, I'll hand it back to you, Massimo, and my thanks for your tolerance while we got started with this slide pack. Thank you.
(Applause)
STEPHEN STROWES: Okay. Thank you Geoff. Also apologies for the little bit of chaos at the start. I assure you there was a team of people trying to get your slides up on the screen, I am glad that we got there. Thank you.
Do we have any questions for Geoff?
AUDIENCE SPEAKER: Hi, this is Andrei. I was thinking more from the point of view of the implementers, the small implementers. QUIC is good for big players, because you need to implement everything in the application, and the library space is just horrible, horrible at the moment. It's not possible to reasonably implement QUIC in an application if you are not a big player, unless you are playing with different weird implementations, because there is no support in OpenSSL. So this whole move to QUIC kills your measurements, and it also doesn't care about the small people.
GEOFF HUSTON: I sympathise with your position; you are absolutely right. As this game gets played and the applications become more of a complete operating system, an entire environment, because they are borrowing nothing from a common kernel, then this becomes a game only for the very largest. Now, if you are one of the very largest, this is fine, this is perfect. If you are not, you are playing a game that's more and more marginal, and we, the rest of the environment, seem to be tolerating this destruction. We should not be doing that. We should not be tolerating this perversion of what we thought of as the most valuable part of the Internet: the very core of openness and open cooperation and common standards. They are all being killed before your very eyes, and quite frankly, if you let this happen, then I suspect it's your own fault. 'We saw it, we could have stopped it', I think, is what we'll be saying to ourselves in less than five years' time, and that's really, really depressing.
Now I'll go sobbing in the corner over here.
STEPHEN STROWES: We have a question in Meetecho from... The question is: "If DNS is encrypted, you can use DANE to provide a certificate. If you also have DANE specify the certificate's expiration date, wouldn't revocation be achieved by setting a TTL and removing the records when compromised?"
GEOFF HUSTON: So close, so close. But encryption is not authentication. If the DNS is encrypted and you are using DANE, then you can securely map an association from the DNS to the authenticity of the name. Once you do that, yes, you can walk away from this entire web PKI revocation mess. We tried this around 10 to 12 years ago; DANE gave it a shot. But again, as we have already said, DANE didn't work in the interests of the larger players and it got killed. Part of the issue, and it's a really big issue, is that if you want DNSSEC to work, you have to bring DNSSEC down to the stub resolver; setting the bit that says "hi, I am your recursive resolver and I have validated the answer" and then sending it in the clear is not security, and it's not DNSSEC. How we get DNSSEC to stub resolvers is perhaps the really tough question that DANE needs to answer, among a few others, if we want to complete that loop. So yes, we can do a whole lot better, but we need to solve a few fundamental issues with DNSSEC and stub resolvers before we get there. Thank you for the question. And we are straying away from network measurement, which is really, really bad: we need to do measurement out there at the edge. Sorry!
AUDIENCE SPEAKER: Coming back to measurements, I want to disagree slightly with you on two points. First of all, endpoint measurements have always been part of the mix. You mentioned RIPE Atlas, and RIPE Atlas was always endpoint-based active measurement, right; you need different measurement tools to measure different things. Yes, encryption decreases visibility, and yes, as researchers we see that we cannot measure things in the same way and need to come up with new solutions. But measuring things at an endpoint also provides new opportunities, because the endpoint actually has good visibility into what it's doing; when you look at it from the network, you are always guessing. What we need is to really work on making measurement a first-class citizen and adding things to protocols that make it still possible to do measurement in a privacy-preserving way. My second point: I don't really want to agree with you that users don't care about privacy. I mean, if you go and ask users, it's a very abstract concept to them, but I do think we care about privacy. I care about privacy, and I want to continue working on this.
GEOFF HUSTON: The problem with inside measurement is that when we try and enrol users into it, we tend to get bias. I remember looking at APNIC's website and seeing there was a lot of v6; the answer was that only the v6 folk came to us to look at the way v6 worked, and everyone else was ignoring it. Even RIPE Atlas is biased: it's biased towards the technically competent, who know what they are doing and want to host one of these probes. Our ad-based measurement system tries to look behind that by conscripting folk. Now, there is a huge ethical issue in basically forcing a user to run an ad while they are unaware that measurement is going on, and I can see I am down at the far end of a line of ethics that asks: is this right? But in some ways, how else do you get data that reflects the broad majority of users, rather than that specialised core, some of whom are in the room with you, who are competent in what they are doing and will happily give you measurement data, but who are not the full spectrum of what we need to understand the bigger network? So there are a few tough calls going on inside that system; it's not as easy as just saying let's measure at the edge and we're all done. Not quite the same.
AUDIENCE SPEAKER: No, but I don't think these problems are new and these are challenges that we all need to work on.
GEOFF HUSTON: Yeah, yeah, absolutely. I am running into other people's speaking time.
STEPHEN STROWES: You are good. But we will take one last question fairly quickly.
AUDIENCE SPEAKER: I just wanted to say that users actually do care about privacy, and I'll give a little analogy here. You all, on your way to this meeting, passed over a bridge, right? Did you care about the security of the bridge? Well, it's the security of your life, so you did care. Did any of you ask the authorities if the bridge is secure? We don't do that, because we expect the security; and what users expect from the Internet is privacy and security. They just don't ask for it, because it's expected. I think that is something we should keep in mind, and not just say: oh, they don't ask for it.
GEOFF HUSTON: There is a difference between 'expect' and 'is willing to pay for'. One of the more interesting measurements, and it is a measurement that's available right now, is measuring the extent to which Apple users are willing to pay for Apple's Private Relay. It's not a hundred percent, it's not even 20%, it's a lot lower than that. And it's available to anyone with a Mac who is willing to tick the box and pay for that service. That's the sort of disappointing part: we're willing to expect many things and assume many things, as long as we don't have to pay for them. But is privacy something that users, on the whole, are actually willing to fund? And the depressing answer is: well, the evidence doesn't really suggest that that's the case. So there is a lot to do, and possibly a lot more to talk about at the same time. Thank you very much.
STEPHEN STROWES: Thank you Geoff.
(Applause)
We appreciate you joining in the middle of the night.
Our next speaker today is Raffaele. His current work explores strategies to increase transparency in the DNS ecosystem, aiming to prevent abuse and malicious activity. Today, he is talking to us about disrupting the Internet in the name of copyright, from the Italian perspective.
RAFFAELE SOMMESE: Welcome everybody, good afternoon. The story that I'm going to tell you today is a very Italian story: it's a story of how people in Italy are trying to, sort of, disrupt the Internet in the name of copyright. They are trying to do it with a platform called Piracy Shield, which was introduced in 2023 in Italy to allow copyright holders, mostly of football streaming, to request, within 30 minutes, the blocking of IP addresses (in practice IPv4, though the platform also supports IPv6) or domain names involved in illegal football streaming. So these people can request a block, and within those 30 minutes that resource is blocked nationwide.
What this platform amounts to is unvetted blocking power granted to private entities, with no clear expiration or review process for the block orders; with a lack of transparency, because there is no public registry of the blocked resources, and Freedom of Information requests have been made and not granted; and with a lot of collateral damage, with incidents involving services like Google Drive and Cloudflare that made it to the news. For more details about these, you can check Max Stucchi's presentation at RIPE 89 on this topic.
Today I want to talk about the fact that many people claim that this platform has been a success. Despite what appears to be a recipe for disaster, they say: oh, but we blocked many, many IP addresses and many, many domains. If you think about it, that's like saying a justice system is effective because we put a lot of people in jail; that's not a good metric.
So what's really behind these 10,000 IPv4 addresses and 40,000 FQDNs that have been blocked? Can we shed a bit of light on this?
Well, the problem is that there is no public list, so my presentation should end here, right? Unfortunately, I am a bit stubborn, so I didn't accept that. There was an unverified leaked list of the blocks on GitHub, and there was a web page where you can verify whether a resource was blocked or not. The only problem is that there was a CAPTCHA on it. So I went through it one night, solving the CAPTCHAs, and I validated the list. So we're back on track. What do we have? We found that between February 2024 and June 2025, 10,000 IPs and 40,000 FQDNs were blocked in Italy. These originated from 3,782 blocking requests from copyright holders, and 98% of these IPs and 44% of these FQDNs were still blocked as of June 2025, the last data point that we collected for this study. And of course, it's football, so blocking peaks during the weekend.
So, while this dataset is not complete, because we miss data from before 2024 that was not in the leaked GitHub repo, it still provides a good basis for understanding what this platform is doing. Let's try to understand what this platform is doing.
The first thing we looked at were the pirates, and the interesting thing is that pirates like a new flag on their boat. Out of all these blocked IPs, which span 2,134 /24s and 262 ASNs, 77% were geolocated within the European Union and 38% in the Netherlands. A single hoster alone hosted around 10% of all the blocked IPs. But there are also some legitimate companies up there, and the interesting thing for these companies is that we see a different behaviour: more scattered resources; the more IPs blocked, the more scattered across /24s. It means illegal streamers try to abuse these resources for their streaming. OVH has the highest number of unblocked IPs, that is, IPs that have been blocked and later released by the platform; that probably means they were released because they started being used for benign purposes. The other interesting part is that only 51% of the blocked IPs are still responsive to probes nowadays.
FQDNs also show a lot of release activity, but this is more to overcome a platform limitation: the platform initially established a maximum of 20K FQDNs that could be blocked, so they started removing old entries.
In general, we also see that illegal streamers of course abandon resources over time: they abuse them for a while and then abandon them later. The problem is that the blocking is almost forever, so if your resource gets blocked, it's very hard to get off the list. As for collateral damage, there is IP leasing, of course. Out of the 10,000 IPs, 24% were leased, and we don't know if streamers deliberately use leased addresses to cycle over different IPs to avoid blocking, or if they just use hosting providers that themselves use leasing. But at the end of the day, this creates a potential for collateral damage, and in fact 4% of all these IPs were leased out to new users after the block was implemented. With the help of IPXO, we identified 250 IPs that were released to different companies after the blocking date, meaning that these new companies were basically acquiring resources that were unusable in the Italian market.
The second collateral damage is, of course, something I don't need to explain to this room: an IP address can be used for multiple purposes and websites at the same time. You have the concept of virtual hosts, meaning that multiple domains can point to the same IP address. What we did was use OpenINTEL and CT logs to get as much information as we could: how many domains pointed to these resources, and what collateral damage could this blocking have done? We identified a total of 7,000 FQDNs pointing to these resources as potential collateral damage. Among these, around 2,000 responded to HTTPS, and with manual verification we went through the entire list to see which were streaming websites and which were not. Actually, 508 were non-streaming-related websites or legitimate businesses, and 131 blocked IPs were responsible for these 508 collateral blockings.
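To make the method concrete: a minimal sketch of this kind of intersection, assuming a leaked block list and FQDN-to-IP mappings from sources like OpenINTEL or CT logs (the data and names here are illustrative, not the study's actual code):

    # Illustrative inputs: blocked IPv4 addresses (from the leaked list) and
    # (fqdn, ip) pairs derived from OpenINTEL A records or CT-log certificates.
    blocked_ips = {"203.0.113.7", "198.51.100.21"}
    dns_records = [
        ("stream-site.example", "203.0.113.7"),
        ("shop.example", "203.0.113.7"),      # non-streaming site sharing the IP
        ("mail.example", "198.51.100.21"),
    ]

    # Every FQDN whose A record points into the blocked set is potential
    # collateral damage: it shares the fate of the blocked IP.
    collateral = {}
    for fqdn, ip in dns_records:
        if ip in blocked_ips:
            collateral.setdefault(ip, set()).add(fqdn)

    for ip, fqdns in sorted(collateral.items()):
        print(ip, "->", sorted(fqdns))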
The interesting thing is that most of the affected websites were European: French, Spanish, German, and nine were actually Italian, meaning national damage as well. One notable case involved 19 Albanian websites hosted on a single IP assigned to WIIT Cloud, and these sites, I checked recently, are still unreachable from Italy. You ask why?
Well, looking at the historical data: that was what is not reachable at this moment, but you can also look at what was not reachable at a certain period of time in the past, and we found another 7,000 FQDNs that were impacted by the blocking, 600 of which were definitely non-streaming websites.
Another interesting case concerns mail delivery. Looking at historical data, we ended up at an IP range rented by a Portuguese hosting company; the block basically disrupted the operation of 300 websites, and 169 of these websites used these IPs for e-mail as well as web hosting. We reached out to the company, and they told us they were completely unaware of the block; they only realised after a while, because they were unable to send e-mails to Italy, and so were unable to reach Italian customers. They didn't know that this was due to Piracy Shield.
Another collateral damage is the fact that the platform ended up also blocking addresses that are anycast. We found 176 anycast addresses that were blocked. The problem here is that anycast is used a lot, for example, by DDoS protection services, and we found cases involving StormWall and DDoS-Guard. These are services that a website may use for a really short period of time, only when it is under DDoS; and then this website becomes unreachable from Italy, because its address is in the Piracy Shield block list. The operators then think they are unreachable because they are under DDoS. That's not true; that's not the case.
We also found a very nice case of collateral damage involving Google: an address belonging to Google, and this address was anycast. I looked it up and tried to open the web page at this address, and I was in the Netherlands, and I got back the Piracy Shield blocking page. I thought: I am in the Netherlands, what's going on here? Well, it turns out that this IP was the IP used by Telecom Italia to serve the Piracy Shield blocking page itself. So the platform managed to block itself.
Another thing we tried to investigate is how streamers react to this blocking. The platform theoretically supports the blocking of IPv6, but they don't really block IPv6: there is zero IPv6 blocking now. So let's look at OpenINTEL data to see whether, after FQDNs get blocked on IPv4, they migrate to IPv6, and whether migrated FQDNs get blocked again. Yes, they do migrate: 1,500 of them started being served over IPv6 after the block, and 5,000 of them migrated to a new IPv4 address, but only 1,000 were caught again. This basically means that the streamers are getting ahead of the platform by migrating.
So, to quickly conclude, what I want you to take away is that I think AGCOM and Italian policymakers should revise this platform. IP blocking should never be used, because it can cause collateral damage. FQDN blocking, if used, should only be applied for a limited period of time, when the abuse happens, and not forever. They need to release the list, because we need to be able to vet these activities. And, in general, operators should be informed if one of their resources gets blocked.
All these hosts are still out there. Can we do something better? Can we get to them without harming the Internet?
Thanks. This work is being presented next week; here is the reference to the full study.
If you have questions, I'm happy to take them.
(Applause)
IGNAS BAGDONAS: Thank you very much. That was interesting. We have questions, it seems; in the middle, I think.
AUDIENCE SPEAKER: Hello. The question is: is somebody fighting this, maybe some non-governmental organisation, or somebody going to court or something like that, or do people just accept this situation and that's all?
RAFFAELE SOMMESE: The problem is that you need to be aware that you are blocked. In the case of this Portuguese hosting company, they didn't know, because there is no public list; there is no way to know that your resource has been blocked for that reason, unless you check every single address that you have through the CAPTCHA, like I did. But that's not feasible, it's not scalable. So it's really hard for someone to claim that there is collateral damage, and I think this study is the first that really tries to do this.
AUDIENCE SPEAKER: Have you contacted someone in such an organisation, for example Access Now, who fight against censorship?
RAFFAELE SOMMESE: I hope that someone will react to this.
AUDIENCE SPEAKER: Because I know the censorship situation in Russia and Ukraine, and if nobody cares, the situation will only become worse.
RAFFAELE SOMMESE: I agree with you, and I think this should be picked up. That's the reason why I did this.
AUDIENCE SPEAKER: Mick O'Donovan. I know this is the MAT Working Group; I think this is probably something that could similarly be cast in the Security Working Group, in a security context. I don't know if you thought about presenting at the Security Working Group as well, but I know the two of them are happening right now, so it's pretty impossible to do that, but...
RAFFAELE SOMMESE: I know that this is a cross between measurements and security. This was like...
AUDIENCE SPEAKER: I think it's relevant to at least make them aware of this talk on the mailing list.
RAFFAELE SOMMESE: Definitely.
RUDIGER VOLK: I just escaped from the security session, and I am kind of thinking about who should be concerned. I would question: well, isn't this something for the Cooperation Working Group?
RAFFAELE SOMMESE: I think this has a place in many venues, but it's really a problem that should be solved, let's put it this way. So I hope it will get solved somehow.
AUDIENCE SPEAKER: Regarding cooperation: no cooperation is possible. We as Italian ISPs really did not manage to influence much of the whole process, even if we partially tried. By the way, the Russian regulator actually warns me before blocking my domains or IP addresses, but the Italian regulator does not.
NINA BARGISEN: Thank you. All right, any other questions? Thank you so much.
(Applause)
Next up we have Remi Hendriks, who is going to talk about an empirical evaluation of longitudinal Anycast catchment stability.
He is a Ph.D. student at the University of Twente. His research is focused on measuring the performance and resilience of Anycast deployments, with a particular interest in deploying Anycast in under-researched regions of the world. He is open to collaborations with Anycast operators that have a global footprint, and he is currently looking for an internship position. So, think about that, operators.
REMI HENDRIKS: The title is "An empirical evaluation of longitudinal Anycast catchment stability"; I'll explain what it means.
First, I would just like to say that this is based on a paper, and it's a joint effort between these authors. I am a Ph.D. student, as already mentioned. So let me just get into it.
First, I'll introduce what this complicated title means. Then I will go over the methodology that we used to measure this. I'll share the results, and then finally conclude with the Q&A.
What is Anycast? Anycast is announcing an IP prefix at multiple locations; when somebody connects to that IP, they get routed to the nearest location by BGP. Example usages are content delivery networks, Cloudflare, Akamai, but also the domain name system: the resolvers, the root servers and the authoritatives. In the bottom right, for example, you see K-root's Anycast deployment. If you are a client in New Zealand, you will likely reach the K-root site in Auckland; if you are here in Bucharest, you'll likely reach the nearby point of presence in Bucharest.
What is a catchment mapping? Each point of presence of an Anycast deployment catches a set of Internet prefixes that route to it, and those prefixes contain clients. To know which client goes where, you need to make these catchment mappings. There are multiple ways to do so: you can use passive data, seeing in your traffic logs who routes where; you can use a measurement platform like RIPE Atlas; and you can also use Verfploeter, which measures catchments using anycast ping, basically. In this image you have an Anycast deployment, which is the red dots: one in the middle of the US, one in the northwest, and one on the east coast. What you do is send out pings with an anycast source address to ping-responsive hosts on the Internet, which are the light blue dots. You send these pings, the red arrows, and then each host sends back a ping reply to the anycast destination address, which routes to the catching point of presence. So, for example, on the west coast you see Salt Lake City, Las Vegas and Phoenix; their replies go back to Seattle, you see the point. Using this, you can infer that Salt Lake City and Las Vegas are in the catchment of Seattle. The advantage is that you can do this actively and on demand, and there is a large number of ping-responsive hosts on the Internet.
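In code, the core of that anycast-ping trick is small; here is a minimal sketch with scapy, assuming raw-socket privileges and placeholder addresses (the real Verfploeter tool is considerably more elaborate):

    from scapy.all import IP, ICMP, send, sniff

    ANYCAST_IP = "192.0.2.1"                    # the anycast service address (placeholder)
    TARGETS = ["198.51.100.10", "203.0.113.5"]  # one ping-responsive host per /24

    # Prober (run from one site): echo requests with the anycast address as
    # *source*, so each echo reply is routed by BGP to whichever point of
    # presence catches that target.
    for dst in TARGETS:
        send(IP(src=ANYCAST_IP, dst=dst) / ICMP(), verbose=False)

    # Collector (run at every point of presence): a reply arriving here means
    # the sender's /24 is in this site's catchment.
    sniff(filter=f"icmp[icmptype] = icmp-echoreply and dst host {ANYCAST_IP}",
          prn=lambda pkt: print("caught", pkt[IP].src))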
These catchment mappings are really useful for Anycast operators: they can give you a lot of insight into the performance of your Anycast network, and they also allow you, for example, to troubleshoot suboptimal routing, where a client does not reach a nearby point of presence but routes to a fairly distant one.
But the problem is that Internet routing is dynamic, and Anycast routing is also dynamic. So you need to repeat these catchment mappings, because they are not stable over time. If you do a catchment mapping today, how long is it valid? Is it still valid tomorrow, or next week, or next month?
So, how did we measure the stability of Anycast catchment over time?
So, we did daily Verfploeter mappings for six months. As targets we used the USC hitlist for IPv4, and for IPv6 we used the public IPv6 hitlist in combination with OpenINTEL AAAA addresses. We wanted to ping a single target in each /24; this has limitations, but it is a deliberate consideration, so that we do not spam the Internet with too many pings. In total we have three and a half million /24s that were responsive throughout the entire period, and for IPv6 half a million.
The Anycast deployment that we measure is our own testbed, deployed on Vultr across 32 locations spanning 20 countries and 6 continents, and we perform our daily mappings at midnight. We feel this is representative of a small to medium-sized Anycast deployment; for larger deployments like Cloudflare, these results might be very different.
So let's go over the results.
This plot shows the catchment distribution. So each colour represents a different Anycast site. So the pink on the top is our site in Frankfurt which has roughly half a million /24s that route to it.
And we plot this longitudinally, so you can see how it develops over time. There are some events visible. First, in July, we see that Tokyo loses a lot of traffic, and Sydney, the blue one at the top, expands fivefold.
Next, in August, we see that Seoul in South Korea increases by 200,000 and Tokyo shrinks even further. So the main takeaways from this graph are that there is a large difference in catchment sizes: you saw that Frankfurt is very large, but the orange one all the way at the top, Melbourne, is really small. This is due to topological differences: our site in Frankfurt is close to DE-CIX, which has good connectivity, but our site in Melbourne does not have a really good Internet Exchange nearby.
You also see large catchment shifts, as I showed with the events, and they suggest large rerouting events.
So, let's zoom in a bit more on Tokyo, which lost a lot of its traffic. These dots represent networks that were routing to Tokyo on the 1st of April, and then we coloured them by where they routed to on the 1st of July, three months later.
So let's zoom in on the areas where most prefixes are.
Here in northeastern China, we see networks that were routing to Tokyo and then suddenly shifted to Frankfurt. Here in Vietnam, we see networks that switched to Seoul in South Korea.
To put the performance impact into perspective: the ones that switched to Frankfurt, which were 125,000 /24s, saw an increase of 250% in round-trip time. The ones that switched to Seoul, for example, only increased a little bit, which makes sense because Tokyo and Seoul are close to each other. This just shows why it's important to measure this longitudinally, because it has a major impact on the performance of your deployment.
These plots show the catchment stability over time. We compare the number of prefixes that still route to the same Anycast point of presence that we detected on the 1st of April. On the top left you see that at the beginning, a hundred percent of prefixes route to where we measured them to be. Then, 75 days later, this drops to roughly 0.86, which means that 14% of prefixes no longer route to the same catching point of presence.
On the right, we plot the same for IPv6. So, what are the takeaways?
We see that IPv4 mappings remain valid for two months if you accept an error rate of 10%, that is, if you accept that your catchment mapping has false entries for 10% of your clients. We see that there is a gradual decline with accelerations: around 60 days, I believe, it dropped suddenly, and those are large rerouting events.
We also see that IPv6 is far less stable: if you look at the scale, it drops to 0.5, so only 50% were still routing to the same point of presence. But I would like to say that the hitlist we used has biases, so I cannot attribute this to IPv6 versus IPv4; it's a poor comparison.
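The stability metric behind these plots reduces to a simple comparison; a sketch, assuming each daily mapping is a dict from /24 prefix to the point of presence that caught it (a simplification of the paper's method):

    # mappings: {day: {prefix: pop}} from the daily catchment measurements.
    def stability(mappings, ref_day):
        """Fraction of prefixes still caught by the same PoP as on ref_day."""
        ref = mappings[ref_day]
        result = {}
        for day, mapping in sorted(mappings.items()):
            common = ref.keys() & mapping.keys()   # prefixes responsive on both days
            same = sum(1 for p in common if ref[p] == mapping[p])
            result[day] = same / len(common) if common else 0.0
        return result

    # e.g. stability(mappings, "2025-04-01")["2025-06-15"] ~= 0.86 would mean
    # 14% of prefixes moved to another point of presence in those 75 days.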
So, this plot shows the same as the previous except we break it down by month. So each coloured line is a separate month. On the right we compare the average of all months for both v4 and v6.
Let me just go straight to the takeaways, for time.
Mappings remain stable on average for an entire month, but there are two outliers: the red one and the purple one, July and August, which quickly deteriorated after the first week. If you think back to the events I showed, these are the Tokyo events where a lot of traffic shifted.
Again, we see that IPv4 appears more stable, but due to the hitlist, I'm not confident in saying that IPv6 itself is less stable.
So let's conclude.
These catchment mappings that we discussed, and this sort of analysis, help operators improve their Anycast services, but since routing is dynamic, the analysis needs to be repeated. Our findings show that these mappings are stable in the short term, so they are fine for day-to-day operations. We saw that IPv4 is more stable than IPv6, with the hitlist caveat. And our suggested frequency, which is of course only a guideline because results differ a lot per deployment, is to reassess your mappings weekly for IPv4, or whenever you see large rerouting events in your passive traffic logs, for example.
Future work is to understand why these catchment shifts happen. Is it transit providers making a decision, or is it changes at Internet Exchange points? We could look at BGP data or perform traceroutes to see where on the path the change happens.
We also want to investigate whether you can use these catchment analyses to detect spoofed traffic: if you send spoofed traffic to an Anycast deployment, it will not arrive at the expected point of presence, because it's spoofed, and you can use this to flag it.
That's it. So if there are any questions, I'd be happy to answer them.
(Applause)
AUDIENCE SPEAKER: Wolfgang Tremmel. Thanks for showing that a good Internet Exchange has a good influence. Have you seen similar results with other IXPs around the world?
REMI HENDRIKS: Yes, there are a few large points of presence with a large catchment, and I think AMS-IX is also attracting a lot of traffic. One thing that I would like to mention is that we also saw the case with China, where a lot of prefixes routed to Frankfurt, and when we looked into it further, we saw that it was China Unicom, I think, or another large Chinese telecom, that had a remote peering link with DE-CIX; that's why they were all routing to Frankfurt.
AUDIENCE SPEAKER: Shane Kerr. Sorry if I missed it, or if you already went over this: you put these sites into different locations, but how is all the peering managed? How is your connectivity arranged?
REMI HENDRIKS: This is largely in the control of Vultr, on which we deployed it; we did not do any traffic engineering. I do want to mention that we also run an Anycast census, and we see that there are more operators that use Vultr, and they can expect to see the same connectivity.
MASSIMO CANDELA: Okay, so thank you very much for your presentation. And we go to the next presenter, which is Ties from RIPE NCC.
TIES DE KOCK: What I want to show you today is something that started off as a tool that we needed internally when working on RIS and debugging issues in RIS; we realised it's a useful tool for everybody. I won't explain RIS here, because everybody knows what it is, but it's important to remember that a route collector project is a passive BGP speaker: it collects BGP traffic and it stores it. The collectors, what we call RRCs, are just an infrastructure artefact, because we need them, otherwise it becomes too big to handle. We think, I think, that our users care about getting the relevant information and about being able to process it.
We run the infrastructure for the route collector project, which is a collaboration with the community, because without the peering sessions, especially the most relevant and interesting ones, we do not have something interesting to offer you.
The data from RIS has always been available in MRT format. MRT files contain the raw BGP messages; an MRT parser then parses the attributes in there to make sense of what's in each message. RIS, like all route collector projects, dumps the global table every few hours and also stores the flow of BGP updates. For RIS we have the bviews every eight hours and an update file per collector every five minutes; with 23 collectors, that's 276 files per hour.
Then, when you want to use these files, and I think the friction is already visible, there are many ways to do so. I'll show a few, and I'll end with what I think is nice.
Some web tools, such as RIPEstat, use the route collector data and provide tools based on it. For example, there is upstream visibility, where you see the number of RIS peers that see a route disappear over time. When you look at the second hop from the origin, you see that the upstream changed when the RPKI status changed, and that an upstream that wasn't visible in the beginning became the dominant upstream when the route was RPKI invalid.
When people use MRT data themselves, which is probably what the researchers here are interested in, the workflows vary. The simplest one that I have seen is basically shell-based: you download a file, you parse it with a tool, and you grep it to get the relevant information. As a software engineer I think: why would you do that? But when I tried it, to get the prefixes, the announcements for K-root here, it actually worked for me. It wasn't that bad, actually. So I understand why people do that.
Now, if you want to write a program for this, it has a much higher barrier to entry, because I need to write the code, for example what I show here, to do an analysis. Here I download all the dumps from RIS for the same point in time, I process all these files and all the routes in them, and I look at the AS path and the first upstream of that prefix. So here I count the first upstream of the VeriSign prefix, and you can see how that is divided; I think in their setup, this shows which PoP you end up at. It might be that deployment, it might be another one that behaves that way.
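For the curious, one way to express that analysis in code, as a hedged sketch using pybgpstream rather than whatever was on the slide; the prefix and timestamps are placeholders:

    import pybgpstream
    from collections import Counter

    # Read the RIB dumps from the RIS collectors for one point in time and
    # count the first upstream (the AS right before the origin) per route.
    stream = pybgpstream.BGPStream(
        project="ris",
        from_time="2025-10-01 00:00:00",
        until_time="2025-10-01 00:00:00",
        record_type="ribs",
        filter="prefix exact 192.0.2.0/24",    # placeholder prefix
    )

    upstreams = Counter()
    for elem in stream:
        path = elem.fields["as-path"].split()
        if len(path) >= 2:
            upstreams[path[-2]] += 1           # second-to-last hop = origin's upstream

    for asn, routes in upstreams.most_common():
        print(asn, routes)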
The workflow I want to show, which is already in use, which we recently used in a RIPE Labs article, and which is actually also kind of interesting, is to parse the data and put it in a database. We showed that you can just use these previous tools that dump this output, go through the steps of parsing the values, take a database, ClickHouse in my case, pipe it in there, wait a bit, use some disc space, and then you can query it. What I show here, and it's nicely on the screen, is the same query I did before: we look at the first upstream for a prefix and how it's distributed over all the RIS peers that see that prefix, but this time it's for K-root. Here you see that, looking at K-root from RIS, you mostly see Hurricane Electric, then you see CERN, and at the bottom you have SURFnet in the Netherlands.
Thinking about this, you can see that the tools available online that index the data for you limit what you can do with this dataset. The reason I chose the upstream from the origin is that it's a very inefficient one to index: it's hard to do, not always available, and there are more creative analyses that you want to do.
Then, if you looked at the example where I used code to process the MRT files, you saw that the files are quite slow to process: 2,000 seconds is over half an hour. In practice, it means that when academics analyse these files, they take a selection: they often take a dataset over time, and then they take a selection of route collectors from which they take the information. So this graph from a paper: they selected three out of the 23 collectors, and by picking these three they implicitly made a selection whose criteria I don't understand; I think it's mostly that it contains some interesting data that they know of, and it has a manageable size. And when I tried to do a more complicated analysis myself, I ended up with a workflow where I parse the files first and process them in a next script; I take the 20-minute hit of data extraction once, and then there is something that's manageable for me.
Finally, when I was working with a database like ClickHouse or Postgres, either it was still quite heavy to query, or it became very expensive storage-wise to store the data there. I think one set of all the full tables from RIS and Route Views together took close to 200 gigabytes of space in a database once I added the nice indexes. And I love the prefix index (inet_ops) that Postgres has, because I can select more and less specifics, but it becomes rather big. So if I want to have data for multiple days, it is not laptop-sized anymore, with my laptop at least.
So, after this slightly long introduction, I have arrived at what we ended up using ourselves and what we're now exposing publicly, which is a set of Parquet files that contain the RIS data. We liked this internally, but we don't know if it has value for you, so we will publish it for twelve months and see if we see adoption. If nobody uses it, at least we tried; if it is usable, let us know, because we think it's usable but we need confirmation to keep producing it. As with the other RIS dumps, we have one file with a full table, one a day for now, at midnight, and we have hourly files with all the updates from one hour; you can use a wildcard in your query and your database reads all the files from disc. Also, because we don't want to make something big unless it's necessary, we have 14 days of data available. And actually, I need to say we have 14 days of data available because it's not automated yet, but that will happen next week, I hope.
This contains a lot of columns from the BGP messages, kind of similar to what's in bgpreader output. What's most relevant for you is that the prefixes are in there, the AS paths are in there, and the communities are in there. The file for the RIBs is about 2.4 gigabytes for one point in time, which is slightly smaller than all the MRT RIB dumps for that point in time together. If you take the data for one hour, it's around 400 megabytes, but it depends on BGP volume, and BGP volume is kind of peaky, so it's something like 300 to 600 megabytes for an hour.
We can add more attributes, but we want to capture what's relevant and what we needed. What I'm publishing today contains fewer columns than it used to, because we never used them. We did want to build something that's not at the BGP message level, where you would have one row containing many prefixes, but really at the update and announcement level, because that's easier to query. But it also needs to be correct for academic use: I would really love the AS path to be a list of integers, but AS-SETs exist and they are still around, so the AS path is unfortunately a list of strings. And we tried to build something that's good for queries but also kind of compact.
If I go back to my previous example of the gTLD server prefix, where I looked at all the upstreams: this is the SQL query which collects all the announcements for this prefix from the Parquet file that contains the data from all the MRTs at one point in time.
Now, if I extend it slightly (I am used to SQL by now, so I think this is a very easy query), I index into the array of the AS path, I take the first upstream and I count again; it's the same thing. I see the distribution of the routes to the gTLD server prefix again, and I count how often each AS is the upstream. At the bottom you can see that when the file is on your disc and you have a nice MacBook, it takes 1.7 seconds to do this query. Suddenly, instead of waiting a long time, you can iterate a lot and explore a lot of ideas, instead of being stuck with what you have.
Then, if you think about how you usually use SQL, you have multiple tables and you do joins. So when we look at RIS, we want to see how many prefixes we see from each of the Tier 1 providers; that's something you can do with a join in SQL. It's slightly heavier, but you can do more difficult analyses this way. And you can also spot that we lack a full table via AS701.
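As an illustration of that workflow, a sketch of such a query with DuckDB from Python; the file name and the column names (prefix, aspath) are assumptions and may differ from the published schema:

    import duckdb

    # How often does each AS appear as the origin's first upstream for one
    # prefix, across all announcements in a daily Parquet dump?
    query = """
        SELECT aspath[-2] AS first_upstream,   -- second-to-last hop in the path
               count(*)   AS routes
        FROM 'ris-rib-2025-10-21.parquet'      -- placeholder file name
        WHERE prefix = '192.0.2.0/24'          -- placeholder prefix
        GROUP BY first_upstream
        ORDER BY routes DESC
    """
    print(duckdb.sql(query))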
So, this was just some SQL, but you can also take this into your data science or academic workflow where you work with data frames, etc. It fits nicely into the data science ecosystem because it's Parquet-based, and all the libraries and clients are really efficient because it's columnar: if you just use the prefix column, it only reads the prefix column. So if you do it over HTTP, it doesn't download 2.4 gigabytes, it downloads 12 megabytes; even a query over HTTP is possible. You can do statistics, you can do aggregations, and if you want to see how noisy BGP is, or which prefixes cause the noise, you can just ask an LLM for the SQL and it scales; you can see that some prefixes are really much noisier than others. When we had the first prototype of this in February, we also helped an operator realise that they had a very noisy Anycast deployment, which they then fixed after we mentioned it at NANOG.
You can also do this type of analysis as a prototype. You may have seen the publications about the cable cuts, where we realised that an AS adjacency was only visible in the paths after the cable was cut. If you look at the files of the updates over time, at that point in time you see the change. And you can really explore the data this way; for me it becomes much more interactive for data analysis than when I was grepping, or trying to write a script and waiting 20 minutes.
The first version of these files (I am saying the first version because I think we need to iterate on the schema over time) is available at these URLs.
As I said, the updates are not automated yet, so I am running a manual job every few days now to motivate myself to automate this. But let us know if you like it, because then we will prioritise it. We think this is useful.
I also have a few Python notebooks that use this data and do some other analyses. You can use the data from any tool that reads Parquet.
In summary, I think this is something very useful; I think it's very nice for anybody who works with MRT data day to day. But if a column that you need is not in there, let us know; we can adjust it. We want to know how useful this is for you and what you think of it. We think it's useful, but we don't want to keep producing something that's not actually useful.
(Applause)
STEPHEN STROWES: I see a question.
AUDIENCE SPEAKER: More a comment. Gert Döring. I have been collecting IPv6 BGP data since the ancient days, and it is all lying around in text files, and I was wondering how to make a proper database out of it. This is giving me lots of material to think about, thank you.
STEPHEN STROWES: Anybody else? I don't see anything in the queue. So, thank you.
(Applause)
Our next and final speaker today is Willem Toorop. Today, he is going to be talking to us about which shadowy organisations are really running e-mail.
WILLEM TOOROP: The goal for this presentation is to expose the consolidation and centralisation of e-mail hosting better than the research which I will refer to on the next slide. Even more, it is to showcase the unique position that DNS resolvers have, and their value as a source of insightful data, especially for state-of-deployment metrics.
And also, for me personally I guess, to collaborate and have fun.
So the research that I mentioned, for which this is a follow-up, is the research that Tobias did in the context of his master's in Security and Network Engineering at the University of Amsterdam. I supervised this work. He looked into the consolidation and centralisation of e-mail hosting in European ccTLDs; the ccTLDs that he measured were .ch for Switzerland, .ee for Estonia, .fr for France, .se for Sweden and .sk for Slovakia. He also wanted to study the historical trend, to see if consolidation was increasing or decreasing, and to look into national sovereignty, that is, whether hosting was moving abroad or not. He used OpenINTEL data; OpenINTEL is a very valuable database of DNS responses. DNS queries are executed daily against lists of DNS names, like top lists, but also against publicly available and private zone files, for example these ccTLD zone files. Moreover, if the list is public data, then the OpenINTEL data will also be public. So OpenINTEL sends daily MX queries to the domain names in these lists, and from there you can see which SMTP servers are receiving e-mail for the registered names in these ccTLDs. This is one of his more interesting graphs from the report: the colours are the different countries, and the columns are the different e-mail hosting providers.
We can see that the big columns are divided into smaller columns, which are the years of the measurements. It's clear from his study that some e‑mail hosting providers are increasing in the number of domain names for which they serve e‑mail, especially outlook.com and google.com, but not all of them. What is also obvious from this graph, and interesting, is that the number one e‑mail provider for each country is actually a local provider. Quoting from his report: in each ccTLD studied, the top provider is consistently a local one from the respective country.
In his conclusion he says the degree of centralisation, and the providers involved, are less non‑European than hypothesised. But under future work he notes: what this method fails to take into consideration is the actual traffic to these MX servers.
But it would be impossible to actually measure all the traffic to all those SMTP servers, right? Or not.
Well, as you might know, almost everything on the Internet happens or starts with a DNS lookup, and that goes for sending e‑mail too. In this picture you see an SMTP server sending an e‑mail to someone@ripe.net. The first thing it does is look up which SMTP server it needs to contact to deliver this e‑mail. Only then, in step two, will it contact that e‑mail server and send it.
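As an illustration of step one, this is roughly the lookup a sending server performs; a minimal sketch using the dnspython library (my example, not part of the talk):

    # Minimal sketch of the MX lookup that precedes e-mail delivery,
    # using the dnspython library.
    import dns.resolver

    answers = dns.resolver.resolve("ripe.net", "MX")
    for rr in sorted(answers, key=lambda r: r.preference):
        # The target with the lowest preference value is tried first.
        print(rr.preference, rr.exchange)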
Right. I also want to highlight that this is not only useful for looking into mail consolidation, or MX deployment, but also for other things, for example the state of DNSSEC deployment. We made this observation in an earlier report looking into the state of DNSSEC deployment: the total number of signed domain names and the number of validating resolvers give a distorted view if widely used domain names are not protected and popular resolvers are not validating. We therefore proposed a metric focussed on the number of transactions protected with DNSSEC, which is something you can get from resolvers.
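As a sketch of what such a transaction‑based metric could look like when computed on resolver data, with an entirely hypothetical input format:

    # Hypothetical sketch: measure the share of resolver transactions whose
    # answers were DNSSEC-validated, rather than counting signed zones.
    def dnssec_transaction_share(records):
        total = validated = 0
        for qname, validated_ok in records:   # (query name, validated?) pairs
            total += 1
            validated += bool(validated_ok)
        return validated / total if total else 0.0

    sample = [("example.com", True), ("popular-cdn.example", False)]
    print(f"{dnssec_transaction_share(sample):.0%} of transactions protected")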
So, to illustrate, here is a DNSSEC deployment web page. You see that the percentage of signed names in the root zone is close to a hundred percent and growing, second level domains are currently at 7% and growing, and validating resolvers are at 37%, measured by APNIC by the way, and also growing. So everything is going great with DNSSEC, right? Or not.
In the report we say: ideally, we recommend collecting this metric directly on recursive resolvers. Now, there is a party that actually did this for DNSSEC deployment, which is APNIC: they lent the 1.1.1.1 address out to the Cloudflare resolver, because they own that address space, and in return they get a portion of the query data. So they can actually measure how many queries are DNSSEC validated.
Currently the last measurement was 3.5 or 3.6 percent, but it's not growing; it has been higher, it went up, it went down. It's a more realistic, but not as pretty, picture of the state of DNSSEC deployment, I guess.
Unfortunately, we at NLnet Labs don't have that kind of address space to lend out to a resolver, and we don't have this special relationship with Cloudflare. But luckily there is also the Quad9 resolver, and one of their goals is to help researchers, if the research benefits the security and performance of the DNS.
So I reached out to them and we came to an agreement that I needed to sign. There are a few conditions, or limitations I should say: you are not allowed to share the data with anyone else; you can use it for research only; you don't combine it with other data to enhance demographics in that data; and you don't do anything to deanonymise it.
This is a very simple example of an entry of the data that we get from Quad9. All the samples we get are from a single POP, which is in Amsterdam. We get the domain name that is queried, the RDATA of the answer, and the time. We get not all queries but a sample; one out of how many I don't know to be honest, maybe someone else can tell me.
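To make the shape of such an entry concrete, here is a hypothetical record type; the field names are inferred from the fields just mentioned, not taken from the actual feed format:

    # Hypothetical record type for one sampled Quad9 entry; field names
    # are inferred from the talk, not from the actual feed format.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class MXSample:
        pop: str          # point of presence; in this data set always Amsterdam
        qname: str        # the queried domain name
        rdata: str        # the MX target returned in the response
        time: datetime    # when the sampled query was seen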
So, here are some of the results. If you look at all the responses, not taking into account the TLDs of the query names, then number one is Outlook, number two is Google, and number three is Yahoo. So the top three are all American cloud e‑mail providers, and only then do we move into marketing and that kind of thing.
Okay, so I also introduced a new category: if the MX record in the response overlaps with the query name at the second level domain, then I consider this self e‑mail hosting (a small sketch of this heuristic follows the per‑country results below). Self hosting comes in at number two, which is not bad I suppose.
This is the response data sorted by target locality, that is, the TLD of the query name. Surprisingly, this POP is in Amsterdam but only 2.7% of the queries are for .nl domains.
Comparing with Tobias's results, on the next five slides I have the ccTLDs that Tobias measured. On the left‑hand side you see Tobias's results by number of registrations, and on the right‑hand side what we measure by looking at MX responses. For Switzerland, the number one seen in the response data is bluewin.ch, which is associated with Swisscom, the telecom provider, but Outlook is a good second. For Estonia, a local one is also number one, but Outlook is way more prominent than in the number of registered names in the .ee zone. For .fr, Outlook is number one, Orange, the telephone and Internet provider, is number two, and only then comes OVH, which was the big number one by number of registered domain names.
For Sweden, this is surprisingly similar to the registered domain names in the .se zone, except that Outlook and One.com seem to be swapped; Outlook is again the most prominent in Sweden. In Slovakia, Outlook is also the winner, with the local ones after that.
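The self‑hosting heuristic mentioned earlier can be sketched as follows; note that taking the last two labels is a simplification, and a real implementation would use the Public Suffix List:

    # Rough sketch of the self-hosting heuristic: the query name and the MX
    # target share the same second-level domain. Taking the last two labels
    # is a simplification; real code should use the Public Suffix List.
    def registered_domain(name: str) -> str:
        labels = name.rstrip(".").lower().split(".")
        return ".".join(labels[-2:])

    def is_self_hosted(qname: str, mx_target: str) -> bool:
        return registered_domain(qname) == registered_domain(mx_target)

    print(is_self_hosted("example.nl", "mail.example.nl"))  # True: self-hosted
    print(is_self_hosted("example.nl", "mx.outlook.com"))   # False: outsourced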
So, talking about the measurement method, some observations that I had for follow‑up research: e‑mail servers for a locality like a country may be registered under any TLD, and conversely, registrations in a country's TLD are not necessarily from parties in that respective country.
So Whois, or RDAP, on the MX target or the query name might be better; that might be interesting for further research.
So that was the last slide, and I am open to questions or comments.
(Applause)
AUDIENCE SPEAKER: Marco. Unless I missed something big, I don't believe that your methodology is sound, because I do not expect that Quad9 actually sees a representative cross section of queries from mail servers about mail servers. Just one of the reasons is that all of the big public resolvers are blocked by Spamhaus, so if you cannot use them for DNSBLs, they will probably not be used by a mail server.
WILLEM TOOROP: Yes, I see your point. I am also proposing that other resolvers be used for this kind of data; for example, the new DNS4EU would be a good one.
AUDIENCE SPEAKER: I don't believe that any public resolver will ever be ‑‑ in my experience all decently sized mail servers use local resolvers.
WILLEM TOOROP: Okay.
AUDIENCE SPEAKER: Sebastian. We have been tracking this kind of information for Ireland for three or four years, and what you are seeing on the local zones, with a local provider being the most relevant one, is a cookie‑cutter set‑up of domains. They are parked domains: they get default names, default MX records, and that's it, but they are not active domains.
Second, a comment following on the relevance of resolver data: you can use the authoritative data. In the Netherlands, and also in Ireland, we have most of the traffic from the authoritative side stored and available, and it can be queried for historical research. It will give you a sense of which domains are set up and actively used, and that will give you a different picture. So if you want to talk about it...
WILLEM TOOROP: I know but you also have to take into account the ccTLD and the caching properties of the resolvers querying the...
AUDIENCE SPEAKER: There are ways to do it. We can talk about it later.
WILLEM TOOROP: Maybe it should be a combination of resolvers and authoritative data, yeah.
AUDIENCE SPEAKER: Jim Reid, speaking for myself. This is fascinating work and I think you and your colleagues are doing some useful activities here, although I have to disagree with the methodologies you are using at the moment; I think they are probably not going to yield the kind of results you are hoping to get out of them. Looking first of all at the issue of DNSSEC: I think you have proved, in a way, that DNSSEC is just a turkey, it's never ever going to fly, and the fact that you are only getting 3 or 4% uptake in terms of signed domain delegations I think tells its own story. We really have to switch it off; it's never ever going to amount to anything. I am being provocative here. You also mentioned using the DNS as the basis of your metrics. You have got this problem of caching. You might only see one lookup for, let's say, a Hotmail MX record, and that will be cached for a day or whatever the TTL happens to be. So that will distort the measurements you are going to make, and how you get around that I think is going to be quite difficult.
WILLEM TOOROP: Not if you look at the queries towards the resolver, because then the caching is not there.
JIM REID: But yes, where is the caching taking place? If someone is talking to their local resolving server, that is never going to be visible to you, and that may well be the case in lots of organisations, particularly companies. They are not outsourcing it to the public resolvers, they are doing it themselves. You are not going to get reasonable data there. And even if you were getting reasonable data, you would still have to make allowances for caching and the time‑to‑live values of those MX records. I think there is an even bigger problem, though, which is not something, in fairness, that you can actually tackle, but I think it needs to be looked at: some of these mail providers are probably using back‑end cloud providers for the storage, and we saw for example the problem we had with AWS going offline earlier this week. We may have the situation where some of these mail providers are relying on these big backing titans, as it were, to provide all the infrastructure, and they just provide the front end to it. Who is really doing the mail provision then: is it Jim Reid's mail hosting service, or is it the thing it is relying on to provide that hosting service? How you measure that I think is going to be really, really hard.
WILLEM TOOROP: Definitely, there are some nuances to be made here. I think that is also a possibility for follow‑up research, for example by looking into the IP addresses in use by those MX targets.
JIM REID: There was just one other quick point I had to make, sorry to drag things out. Lots of organisations, or governments, will have requirements to keep all that data and mail hosting inside their borders, but many of these providers might need to move the data elsewhere for operational reasons, for example if a data centre catches fire. How you measure that is also difficult, because you are not going to see the internal dynamics of how these organisations do their stuff.
NINA BARGISEN: Thank you. It was interesting.
So now, before coffee or whatever you would prefer at this hour, we have the closing, and for the closing of the Working Group I just have to remind you: rate the talks, please. If you have any additional comments, there is room in the survey, but you can also write to us, the Working Group Chairs, and please join the mailing list. Sometimes we actually do have discussions there.
And then I have some service messages from the big conference. Voting has opened for the Programme Committee election, so make sure that you vote by Thursday at 5 p.m., or 17:00. If you are registered to vote in the NRO NC election, you can vote from today at 17:00, and voting will close on Friday at 9 a.m. Finally, the buses to the networking event tonight will leave from the hotel entrance at 20:15 and 20:30. And that was it for today. Thank you very much.
See you next time.
(Applause)
(Coffee break)