
Open Source Transcript

Chaired By:
Marco d'Itri, Marcos Sanz, Sasha Romijn
Session:
Open Source
Date:
Time:
(UTC +0300)
Room:
Side Room

RIPE 91.

Open Source Working Group

Wednesday, 22nd October 2025. Side room, 9 a.m.

MARCOS SANZ: OK, good morning everybody. Welcome to the Open Source Working Group session. If you are here, you are doing it consciously, waking up at 9am and coming to the correct working group. My name is Marcos Sanz, I am one of the co-chairs of the working group; the other one is Marco d'Itri over here, and our dear Sasha was not able to join in person. So the three of us welcome you. A couple of words first: we have a scribe from the RIPE NCC taking notes, thank you very much, sir.

And we have our link: you can join via Meetecho or watch the live stream. Thank you to our stenographer. We also want to make you aware that we have a code of conduct; there is a RIPE document, and you can scan the first QR code if you want to take a look at it, and we invite you all to do so. Remember to rate the talks, that is very useful for us indeed: when the RIPE meeting is over, we get those ratings, analyse them and give feedback to the presenters, so please do rate the talks.

We sent a link to the minutes of the last meeting to the list; we didn't receive any comments, and unless someone objects now, we will consider those minutes approved, which we are doing now.

If you want to take a look at the minutes, it's the second QR code. Finally, let's finalise the agenda; this is our agenda for today: three interesting presentations and interesting discussions. Is there any other business that we should allocate some time for at the end of the meeting? If not, let's start with the first presentation, and I would ask Ondrej Zajicek to come up. He didn't fill in the presenter profile in Pretalx, so normally I would read your profile, but I know him because he has been here for a very long time. I think your first presentation about BIRD at RIPE was around 2013, so you have been working on the project for at least 12 years.

ONDREJ ZAJICEK: I have been working on it since 2008, and I think my first RIPE meeting was in 2008, in Ireland I think.

MARCOS SANZ: Oh wow. The floor is yours.

ONDREJ ZAJICEK: Hello, is this working? Hello. So I would like to speak about scope creep in open source, because that is something that everybody who works on many software tools has plenty of interaction with, and as a developer of the BIRD routing daemon, so do we.

What is scope creep? A nice definition is a gradual expansion of a project beyond its natural boundaries by adding new features. Some features fit well with what the project should implement, some don't fit well. You can expand your scope horizontally, to get more features at the same level of abstraction in your software, or you can expand your scope vertically, to have features at a completely different level. For example, if you have a routing daemon, you can expand vertically to add a graphical user interface for it, or something like that.

Expanding horizontally would be, for example, adding handling of a completely different protocol.

So what are the causes of scope creep? I noticed two basic causes, but there are probably many more; I restrict myself to the causes specific to open source, so there are probably other causes in, say, commercial software, where you have marketing pressures. From the open source perspective, one of the most important reasons is that users ask for features that are related but not really in scope: we use this software and we would also need this, which may be commonly associated with it but is not in scope. The second reason is that sometimes we need to implement in-scope features that have dependencies on out-of-scope functionality, and once the out-of-scope functionality is there and implemented, why not just use it for other features anyway? So this kind of chaining of scope creep can happen.

So why accept scope creep? I would like to talk from the perspective that we are thinking of some new feature and there is already an existing implementation outside the project: why implement it inside the project? Well, the most important reason is to have a consistent user experience, because every tool is different and people get used to one tool. Another is to reuse existing code and infrastructure, because if I know my project, implementing something new inside it is probably an order of magnitude easier than starting a completely new project. And it also enables tight integration, because having these kinds of boundaries between projects can bring plenty of issues.

But why reject scope creep? If there is already an implementation, implementing something new just dilutes developer resources and adds maintenance burden, and people who specialise in that other thing can build a much better product than we can when it is outside our focus. As I said, it can keep expanding; there is a kind of slippery slope: if I accept some out-of-scope feature, the expanded scope makes more out-of-scope features look like they are on the boundary. And if a successful project keeps expanding its scope, that can lead to centralisation instead of an alternative of loosely coupled components. So let me step back and use some examples from my own project, the BIRD internet routing daemon. You probably know it; it is a traditional UNIX daemon, it has routing protocols and management of routing tables, and it has a simple configuration and a simple command-line interface.

So where are the scope creep risks in BIRD?

Well, there are many accompanying protocols that are not necessarily related to routing but that you need on your router anyway: RA, LLDP, link-level protocols... DHCPv6. Some of these we implemented in BIRD, and some of these we are still talking about. Second, there is a feature that has been asked for many times on our mailing list and which we rejected for scope creep reasons: interface and IP address configuration. On servers, people usually use the default ifupdown scripts, which are kind of horrible, and having something better would be nice. I think there are some newer tools now that are more focused on servers, but in the past they were not there, so people wanted to expand BIRD to cover these kinds of features too, and sometimes it feels natural. But once we added support for, say, simple IP configuration, then what about a DHCP client, what about configuration of wireless interfaces? So yeah, it is a can of worms we didn't really want to open.

And the third point is BIRD configuration management. This is kind of an example of vertical scope creep: we have a simple configuration file, but some people are used to dynamic reconfiguration, so it would mean adding completely new functionality at a level that can already be handled outside our project by generating configuration files.

And fourth, an interesting example is EVPN; it brings many features, and it is an example of the second cause: in-scope features sometimes require out-of-scope functionality. EVPN, in case not everybody knows it, is something like a distributed layer 2 switch. It uses BGP, so it is definitely in scope, and it brings layer 2 features that could also be used for other things. But there are some things it requires: a very specific setup of kernel bridge and VXLAN interfaces. So there are plenty of questions: should these interfaces be created by BIRD, or should we assume that the user uses existing interface management software to set up these interfaces in the exact state we need them? There are arguments that users expect BIRD not to set up interfaces, because BIRD is not in the business of configuring interfaces, and it would be strange or unexpected if BIRD made an exception for these specific interfaces. There is also the question of setting up VLANs on these interfaces: you would have to have the VLAN configuration in the BIRD configuration file and also in the interface configuration, but if we do not configure it, then users have to configure these things in two different places. So we finally decided on a kind of hybrid approach: we do not set up these interfaces, but we verify that they are configured in the exact way we need them and report that to users, and we do configure the VLANs on them, because that is something that can be dynamically changed based on the BIRD configuration.

As I mentioned, BIRD is not really a full-fledged routing stack. It is essentially just one level, one part of a full routing stack, and like most things in open source, a full-fledged routing stack is a loosely coupled collection of components, and BIRD is just one of these components.

So let's move from the point of view of scope creep and fully integrated components to loosely coupled components. There are advantages to loosely coupled components: they bring maximum flexibility, users can get what they want, users don't need to learn all the components, they can select the components that they already know. It also eliminates some communication complexity between developers: there is a clear separation of concerns, every developer can focus on their own project, and different implementations can compete. But there are some problems with this model. Sometimes it is too much flexibility: users don't want to choose between four different implementations of the same thing, because if you need to choose between four different implementations, you probably need to test which is better, so you have plenty of work just to make an informed decision about which one to use. And loosely coupled components bring an inconsistent user experience: every tool is different, different configuration files, inconsistent quality. For example, BIRD handles very well the situation where an interface disappears and appears on the fly; many UNIX tools do not handle these cases, so there is inconsistent coverage of corner cases. You can also have version and compatibility issues: for example, we recently broke things that depend on us when we moved some functionality and changed the output format of our CLI. So yeah, there is plenty of complexity; every component has different error reporting, and you can have cascading failures where one component fails and nobody handles that.

And there is something specific to the integration of components: if you have a loose collection of components, setting all these components up together requires non-trivial work and knowledge. It essentially becomes development at a new level. Instead of just being a user who does not need to know much, you are essentially moving into the position of a developer: choosing and selecting components, putting them together, knowing their limitations and knowing how to connect them. If you have plenty of tools that have to be connected, the number of possible connections grows quickly, and each of these connections has specific details you need to know. And integration is often left to users, so as I said, integrating independent loosely coupled components is often such a complex task that people develop tools just to avoid it, such as Docker: something where you pack loosely coupled components into a package and distribute it as a whole.

A counter-example: dynamic libraries. Dynamic libraries are somewhat independent components, but they are tightly coupled, in contrast to multiple programs working together, because of the type system of the language used to implement them. And there are some differences: if you have two programs that you have to connect together, you usually need to connect them manually, but if you have a program and a dynamic library, or two dynamic libraries, this connection is completely automatic from the user's point of view. If running a program with dynamic libraries worked like running multiple programs together, you would have to start every dynamic library independently and configure it in a config file of your program, and that would be horrible. But while this works for dynamic libraries, in loosely coupled user-space programs the integration is much worse. So perhaps my last point: make software more like libraries. Define stable interfaces and API versions, so we could have some kind of broker that finds the required component, configures it and wires it to the final endpoint, and let programs ask for the capabilities they need without necessarily requiring manual interaction from the user.
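
To make the broker idea he sketches here a little more concrete, below is a minimal, invented illustration in Rust; nothing like this exists in BIRD, and the trait, the version scheme and all the names are assumptions made purely for illustration. Components register a named, versioned interface with a broker, and a program asks the broker for a capability instead of being wired to a specific tool by hand.

```rust
use std::collections::HashMap;

// An invented, minimal "stable interface": a name plus a major version.
trait RouteExporter {
    fn export(&self, prefix: &str) -> String;
}

struct DummyExporter;

impl RouteExporter for DummyExporter {
    fn export(&self, prefix: &str) -> String {
        format!("exporting {prefix}")
    }
}

// The broker maps (interface name, major version) to an implementation.
#[derive(Default)]
struct Broker {
    exporters: HashMap<(String, u32), Box<dyn RouteExporter>>,
}

impl Broker {
    fn register(&mut self, name: &str, major: u32, imp: Box<dyn RouteExporter>) {
        self.exporters.insert((name.to_string(), major), imp);
    }

    // A program asks for a capability by name and version instead of being
    // pointed at a specific daemon, socket or config file by hand.
    fn resolve(&self, name: &str, major: u32) -> Option<&dyn RouteExporter> {
        self.exporters.get(&(name.to_string(), major)).map(|b| b.as_ref())
    }
}

fn main() {
    let mut broker = Broker::default();
    broker.register("route-export", 1, Box::new(DummyExporter));

    if let Some(exporter) = broker.resolve("route-export", 1) {
        println!("{}", exporter.export("192.0.2.0/24"));
    }
}
```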

So that's probably all I would like to say. Questions?

ONDREJ SURY: I have so many things to say, but I will keep them to myself and we can discuss it later; I am totally with you on this. When we get asked "do you want this?", we ask "who is going to maintain this?" and there are usually crickets. But the other thing you said, about products with stable interfaces: it is so complicated for future development to keep interfaces stable. So my question is: what is the path you are going to choose? Because these are general guidelines. Is there anything specific you are going to try in the BIRD project that you can report on in three years?

ONDREJ ZAJICEK: Well, we are trying to have API generators based on YANG models that could be used to automatically generate the interface code. Some projects have a nice approach to API stability where you can request a specific version of each API. So yeah, it is a question of code generators for API descriptions.

ONDREJ SURY: You went with NETCONF?

ONDREJ ZAJICEK: We are planning to use YANG models, based on something like NETCONF; I don't know if it will be NETCONF or something with a CBOR format.

ONDREJ SURY: Thanks.

MARCOS SANZ: I have a question, Ondrej. Did you ever find a situation where you had a component which was so independent, so loosely coupled, that you said: I am going to open a new project for this, because it is not BIRD any more?

ONDREJ ZAJICEK: Yes, I think so. One thing I did not mention is that we think that perhaps in the future we should split some features out of BIRD, things like the CLI and the UI, into a different package or component and let it communicate with the core functionality through an API, but that is more aspirational than a specific plan.

MARCOS SANZ: OK, I see. Good. Can I ask for a round of applause? Thank you very much.

(APPLAUSE.) Our next speaker is Annika, and she writes open source software. She is the maintainer of the Alice Looking Glass and B3scale, she has profiles on GitHub and Mastodon, and I am not going to read out the URLs.

ANNIKA HANNIG: OK. Thank you. Hey everyone. I should take some water first. So, welcome to "when to actually rewrite it in Rust".

This talk has a bit of a background. Last time, in Lisbon, there was quite some bickering, or some lighthearted comments, about this trend in Rust to reinvent software, to rewrite software for the sake of the rewrite, and I took this a bit personally. So while this was going on between the audience and the presenters, I was thinking: OK, so when do you actually rewrite it in Rust? Do I have a nice example?

And so. First, hi, I am Annika. I write software. I maintain the Alice BGP Looking Glass, which I will use as the case study for this.

I also maintain B3scale, which is quite a popular load balancer for BigBlueButton. Yes. And I love writing software; I am very passionate about programming languages and open source in general. Yeah, I really love open source: it opened an entire world of software development to me when I was really young, and I basically stayed with it.

So, when to actually rewrite it in Rust. This is a very multi-layered question, so I think we first need to tackle the "but why" in general. When we have a rewrite, we first need to take a look at some source code, I guess. We have these often enough historically grown sources: Alice is a project that has been going on for almost 10 years now. The birdwatcher component of this Looking Glass is the component that communicates with BIRD, ingests all the CLI output and converts it into a structured form, right now JSON, in order to be digested by the Looking Glass. This is software that has been around for a while and, yeah, it has rough edges, I guess. And I should maybe have added a code warning for the regular expressions here. So yeah, I guess this might require some work.

So this was kind of always my motivation: there is this piece of software, it has grown over time, and I wanted to maybe rewrite it.

But why Rust, why explicitly Rust? I could have done the rewrite in Go; it is written in Go, and Go was performing fine, so far.

I want to motivate this with a realisation I had earlier this year, which is that memory safety is actually one of the least interesting parts of Rust. Rust comes with a good typing system. I say good, I think it is really great, but I mean, there are opinions about typed programming languages and typing in general, and so, yeah.

But this typing system has some interesting properties. It helps guide you through the code, and it even manages to enforce a couple of properties you need for what they call fearless concurrency. That is, you actually have a check at compile time that, for example, a certain data structure cannot be shared between threads without, for example, wrapping it in a mutex or some other way to facilitate this concurrent data access.
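
A minimal sketch of what that compile-time check looks like in practice; the `Stats` type and the counter are invented for illustration and are not from her code. Handing a plain mutable structure to several threads would be rejected by the compiler; wrapping it in `Arc<Mutex<...>>` satisfies the Send and Sync requirements and makes the shared mutation explicit.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Invented shared state; any plain struct behaves the same way.
#[derive(Default)]
struct Stats {
    routes_parsed: usize,
}

fn main() {
    // Sharing `&mut Stats` across threads directly would not compile;
    // Arc<Mutex<...>> is one way to make the concurrent access explicit.
    let stats = Arc::new(Mutex::new(Stats::default()));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                // Each worker locks before touching the shared counter.
                stats.lock().unwrap().routes_parsed += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("parsed by workers: {}", stats.lock().unwrap().routes_parsed);
}
```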

So I guess the best evidence is another anecdote. Earlier this year I got a new window manager. I finally figured out how to run Wayland on Nvidia hardware, and I thought I would try a different window manager; it is written in Rust and it is really cool, but it was lacking a feature, a key binding for me to switch directly between monitors. What I did was basically: I added this configuration option to the configuration part of the software, the parser that ingests the configuration, and then I just followed the error messages, and this feature I built for switching between monitors worked the first time it compiled, and I found this quite impressive. A couple of days later, I also challenged myself to a small hackathon day, so to speak, with a non-trivial feature: wouldn't it be great if I could just define another output in my configuration which would act as a virtual monitor that I could then stream, for example, over WebRTC, in order to have my phone as a virtual second screen? One thing about Wayland is: yes, it is very nicely componentised, however a lot of these compositors implement features again and again and again, and this virtual output feature is one of those that I had actually seen in action, I think with GNOME, where it was added in their compositor. However, if you are not using GNOME, one of the drawbacks is that you need to really implement the functionality there yourself. So, long story short: in a couple of hours I was able to wire up a prototype which worked with a rendering back-end, well, with a testing rendering back-end, because the software provided really nice testing facilities, and by the afternoon I actually had this implementation up and running, basically rendering a virtual output into an off-screen buffer and streaming it to my phone. I am not a graphics programmer. I have no prior real experience with the direct rendering infrastructure and all these things you come in touch with there. However, Rust and the properties of this language, together with the awesome documentation of the source, gave me enough information to add this non-trivial feature to a non-trivial code base with really not much friction, and with quite some confidence that it works when it compiles. So this is my take-away from working with unknown Rust code bases: this language and the compiler and all these things give you structure and the tools and the empowerment to contribute with confidence. And I hope that this will inspire a couple more people to actually take some random source code in Rust and just dig into it, maybe even contribute to the project.

So, back to my original topic, the birdwatcher. With all of this, and with me glazing Rust to no end, how well did the rewrite actually go?

I think we have talked about this for quite a long time. It was at least three or four years ago that I started circulating the idea of, yeah, I am doing this rewrite and I am coming up with a functional replacement. So why did this take so long? I dug a bit into my file system and I unearthed a very early form of this from 2020 that I had totally forgotten about. It was my initial try to implement a parser for the BIRD output for the protocols, but it was very much incomplete, the Rust code looked horrible, and I was thinking: OK, no, I am going to start from scratch.

And this time for real. And this actually worked quite well. I pretty quickly came up with my initial parser for the neighbours and for the routes, and then I ran into my first obstacle, which was concurrency, because we are dealing with a lot of routes, and to ingest these routes from the CLI and convert them very quickly into the structured JSON output, we need some kind of parallelisation. The first intuition I had was: OK, I am going to try to outsource this to a library, and the Rayon library for Rust, for example, seemed to be quite fitting, and I maybe needed to do some work converting the parser for the routes to an async interface, because I think the library required async functions there. And yeah, this did not work. Maybe it was a skill issue on my side and my Rust skills were just not there yet, but it was very frustrating. I ran into problems, and at some point I just decided: OK, I am starting from scratch again. I took the parser code, reverted all the async stuff I had added, abandoned that idea, and went with a pool of worker threads which get the CLI data over channels and return the responses into a collection channel, roughly.
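
A rough sketch of that worker-pool-over-channels pattern, assuming a placeholder `Route` type and a toy parser; this is not the birdwatcher code, just the shape she describes, with workers receiving blocks of CLI output over one channel and sending parsed results into a collection channel. The shared receiver behind a mutex is the standard-library way of doing this without extra dependencies.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Placeholder for a parsed route; the real structure carries far more detail.
#[derive(Debug)]
struct Route {
    prefix: String,
}

// Toy parser for one block of CLI output.
fn parse_block(block: &str) -> Vec<Route> {
    block
        .lines()
        .filter(|line| line.contains('/'))
        .map(|line| Route {
            prefix: line.split_whitespace().next().unwrap_or("").to_string(),
        })
        .collect()
}

fn main() {
    let (work_tx, work_rx) = mpsc::channel::<String>();
    let (result_tx, result_rx) = mpsc::channel::<Vec<Route>>();

    // The single work receiver is shared by all workers behind a mutex.
    let work_rx = Arc::new(Mutex::new(work_rx));

    let workers: Vec<_> = (0..4)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // Take the next block of CLI output, or stop when the channel closes.
                let block = work_rx.lock().unwrap().recv();
                match block {
                    Ok(block) => result_tx.send(parse_block(&block)).unwrap(),
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(result_tx); // the workers hold the remaining senders

    // Feed two fake blocks of output, then close the work channel.
    work_tx.send("192.0.2.0/24 via 198.51.100.1".to_string()).unwrap();
    work_tx.send("2001:db8::/32 via fe80::1".to_string()).unwrap();
    drop(work_tx);

    // Drain the collection channel, then join the workers.
    let routes: Vec<Route> = result_rx.iter().flatten().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    println!("{routes:?}");
}
```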

Interestingly, this is pretty much what the birdwatcher already did. So I kind of copied the concept that I was already employing in Go, but at the same time it felt a bit more low level, which was nice.

So, fast forward to 2024: I managed to build this, it seemed to work, and then this year in Lisbon I was like: hey everyone, remember this birdwatcher replacement I was working on, I am pretty confident it is almost done. Yeah. No.

I had forgotten about all the gritty details. I call this "the rest of the owl": having a parser that seems to work is only part of it; you need to expose it to actual real-world data, because even if you have nice data structures and nice parsing and so on, that was not sufficient.

OK. So, what was still missing? Cache limits, rate limiting, concurrency limits and so on. I think I skipped a slide earlier which outlined my overall strategy for how I tackled this project. I was thinking: OK, let's do this with a really small scope, let's not do the scope creep, let's focus on the easiest thing, which is the single-table route server. I will define the minimal data structures that are required for Alice to work. So a lot of the data that was parsed by the birdwatcher ended up just being thrown away; it was to a degree exposed through the Alice API, but within the Alice UI this data was often just ignored.

I will need to get some feedback later on about how much of the extended data might actually be relevant for people, because in Alice there is this concept that when you access the API, you get the structured form that the UI is also ingesting, but below that you get some more extended raw data. I know of some people who have scripts that actually work with some of those fields; some of those fields have been merged back, but yeah. I also kind of wanted to get rid of all the regexes, not all of them, but keep them to a minimum, and try to parse the output using more of a state machine approach: keep track of where you are, basically which route section you are in, and then employ specialised parsing functions for, for example, ingesting all the BGP attributes and so on.
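
A hedged sketch of what such a state machine parser could look like; the states, field names and the `BGP.as_path` handling are illustrative simplifications, not her actual parser, and real `birdc` output has more sections and attributes.

```rust
// Illustrative parser states for walking route output line by line.
enum State {
    // Waiting for the next route header line.
    Header,
    // Inside the attribute block of the current route.
    Attributes,
}

#[derive(Debug, Default)]
struct Route {
    network: String,
    as_path: Vec<u32>,
}

fn parse_routes(output: &str) -> Vec<Route> {
    let mut routes = Vec::new();
    let mut state = State::Header;
    let mut current = Route::default();

    for line in output.lines() {
        match state {
            State::Header => {
                // A route header starts in column 0 and contains a prefix.
                if !line.starts_with(char::is_whitespace) && line.contains('/') {
                    current = Route::default();
                    current.network = line.split_whitespace().next().unwrap_or("").to_string();
                    state = State::Attributes;
                }
            }
            State::Attributes => {
                let trimmed = line.trim_start();
                if let Some(path) = trimmed.strip_prefix("BGP.as_path:") {
                    // Specialised parsing function for a single attribute.
                    current.as_path = path.split_whitespace().filter_map(|t| t.parse().ok()).collect();
                } else if trimmed.is_empty() {
                    // A blank line ends the attribute block; emit the route.
                    routes.push(std::mem::take(&mut current));
                    state = State::Header;
                }
            }
        }
    }
    if !current.network.is_empty() {
        routes.push(current);
    }
    routes
}

fn main() {
    let sample = "192.0.2.0/24 unicast [peer1] * (100)\n\tBGP.as_path: 64500 64501\n\n";
    println!("{:?}", parse_routes(sample));
}
```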

And lastly, I was thinking it would be really nice to interface with BIRD directly. I mean, the thing is, the CLI interface you see in BIRD is pretty much what is on the socket, minus control markers.
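
A small sketch of that idea: connect to BIRD's control socket, send a command, and strip the reply markers. The socket path, and the convention that each reply line starts with a four-digit code followed by '-' for continuation lines and a space for the final line, are assumptions to check against the BIRD documentation for your version.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // The socket path varies per installation; this is a common default.
    let mut stream = UnixStream::connect("/run/bird/bird.ctl")?;
    let mut reader = BufReader::new(stream.try_clone()?);

    // Read the greeting line, something like "0001 BIRD ... ready.".
    let mut line = String::new();
    reader.read_line(&mut line)?;

    // Send a command exactly as the CLI would.
    stream.write_all(b"show status\n")?;

    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // connection closed
        }
        if line.len() >= 5 && line.starts_with(|c: char| c.is_ascii_digit()) {
            // Assumed convention: four-digit code, then '-' (more lines follow)
            // or ' ' (final line of the reply).
            let (marker, payload) = line.split_at(5);
            println!("{}", payload.trim_end());
            if marker.ends_with(' ') {
                break;
            }
        } else {
            // Continuation line without a code prefix.
            println!("{}", line.trim_end());
        }
    }
    Ok(())
}
```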

So yeah, this worked quite well, and I was then confronted with the rest of the owl: caching, rate limiting, concurrency and so on. And I was thinking to myself: can I vibe code this?

So my first approach was: let's do this with the rate limiting feature. I fired up a coding agent and typed in a prompt not very unlike this one. I am actually mixing languages and all of these things, because these models don't care; they are just keywords at this stage. So I kind of tried to figure out a way to minimise the amount of fluff around the keywords, to get this mechanical squirrel to run in the right direction towards the peanut.

Also planning ahead really works.

So to my surprise this worked. And that's the end of my talk, I am done, I stopped writing software. See you. It was nice.

So yeah, this worked. And it felt so good, and it looks like with Rust, remember all the properties of this language I told you about, this works way too well for my taste.

However, the code still has six fingers. What do I mean by "the code has six fingers"? It is, for example, a situation where you could do a type conversion of a variable, and you do the type conversion before the loop, because you need the converted type within the loop and you only want to convert it once. Code that has six fingers will do the type conversion inside the loop. And the thing is, I have seen code like this in the wild, not written by an LLM, so, well, yeah. OK, let's do this better, let's do it the right way: this is software that is running on route servers and interacting with infrastructure that needs careful work. So I abandoned this approach and did it all myself. And this took me another couple of days, and isn't it beautiful?
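
Her six-fingers example, expressed as a hypothetical snippet rather than anything from the generated code: the conversion belongs before the loop, because the value never changes inside it.

```rust
fn count_below(values: &[u64], limit: u32) -> usize {
    // "Six fingers" version: converting `limit` on every iteration.
    // values.iter().filter(|&&v| v < limit as u64).count()

    // Converted once, before the loop, because the value never changes.
    let limit = u64::from(limit);
    values.iter().filter(|&&v| v < limit).count()
}

fn main() {
    println!("{}", count_below(&[1, 5, 10, 20], 10)); // prints 2
}
```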

So these are actually the couple of regular expressions I have left; I did not eliminate them all. And I am pretty sure that if you look long and closely enough, you will also spot six fingers here which I introduced myself, maybe. And yeah, I actually think this is way more maintainable and way more approachable than the regex hell I showed you earlier.

Another thing that motivated me in the first place to do the rewrite was that there were some memory management issues with Go. With a memory-managed language you do not have that much control over garbage collection and the memory management cycle; you can fine-tune a lot of things to a degree, and I did a lot of tuning work on the birdwatcher, but it still ran into issues. So it was quite nice to see that the Rust implementation seemed to be really faster here. This is the same routing data: for the birdwatcher benchmark I put it through the BIRD CLI, but it is the same socket data, and the ingestion of the routes took about half the time in Rust, which is nice. Sure, I am not making an entirely fair comparison here, because remember when I said that a lot of the data ingested by the birdwatcher is not structured and is basically glued together with dynamic maps, so you have a lot of key-value pairs there, whereas the Rust variant is very much focused on the specific attributes and the specific fields you need, so it is a bit unfair. But we were talking about rewrites, so, OK, I would hope that if I had rewritten it in Go, I could present you a similar slide where I would say: yeah, my new Go version is way better and more memory efficient and you should all try it.

Which brings me to what's next.

So, BCIX: my friends there were already testing this, and I really want to thank Alex and Andre for doing this and for also backing this project financially, thank you so much for that.

And I kept working on the birdwatcher replacement, and I really have to say: back in Lisbon I said the entire thing was practically done; I can now say here that multi-table support is practically done, and there are only about two interfaces, two endpoints, I might still need to add; I really need to check this again. What I would really be interested in now is your experience with this software, maybe in a lab environment. I was testing this against, for example, the BIRD netlab set-up, where you can fire up a BIRD testing environment with various configurations, and this worked quite fine, so I am waiting for the bug reports.

And also don't be afraid to dig into the source code and maybe be inspired to contribute. So. Thank you all.

(APPLAUSE.)

MARCOS SANZ: Thank you, Annika. Any questions from the room?

SPEAKER: Hello, I have one question. I am Ondrej Zajicek. Did you consider using ‑‑ (Inaudible)

ANNIKA HANNIG: These are hand crafted.

ONDREJ ZAJICEK: Bespoke.

ANNIKA HANNIG: Once we have some formal definition of a BIRD API, I am pretty sure we can do a lot with code generation and so on, but the thing is, we have been talking about this for almost as long as I have been working on the birdwatcher replacement, so it will come. At some point we will not need the birdwatcher any more and we can do everything with nice, well-defined APIs, which will split our software into components that we can nicely wire together like libraries. OK.

SPEAKER: Hi, thank you very much for the presentation, Annika. I was curious about your thoughts, from more of a community perspective, on what the rewrite in Rust might mean for the project. It is my understanding that Rust has a bit of a steeper learning curve than most programming languages; do you think it might slightly limit the number of potential contributors to the open source project, and what kind of effects might that have?

ANNIKA HANNIG: Yeah. So, of course there is a steep learning curve, and sometimes these error messages can be a bit, well, idiosyncratic. For example, you need to know what it means that a data structure cannot be sent; I am not going into detail there. But yes, I think there is quite some learning curve, though I think the learning curve is manageable. A lot of people are using this programming language, and maybe it is even a good thing that it has some barrier, I don't know; sorry, I have a hang-up there. The thing is, I really think it enables people to contribute with confidence, but you first need to get to a level where you have enough of a grasp of the language to actually write in it and contribute. But when you do contribute, I can actually be quite confident that your code will work if you add some kind of unit test and it compiles. I don't know if that answers your question, sorry.

SPEAKER: Thank you, I would love to discuss this further.

ONDREJ SURY: Hi, this is Ondrej. Part of your presentation was about when to rewrite things in Rust; do you have an answer to that after this presentation?

ANNIKA HANNIG: I was waiting for this question: was it actually worth it? For me it was absolutely worth it, because I learned a lot; for me it was quite a good learning experience. And this is what I was already hinting at: I am not sure whether a rewrite in Go, for example, would also have yielded very good performance and memory results, which is why I marked that slide as "take it with a grain of salt". But yeah, I think it was worth it, and I guess time will tell. Time will tell when I hopefully see contributions coming in, and that will test this hypothesis that yes, you can contribute with confidence.

SPEAKER: I am hearing there is no clear answer, and the answer might be: when it makes sense, right? For the project, for you? OK. Thank you.

SPEAKER: I am Marco d'Itri, a Debian developer. I want to stress one of the big issues that I found with Rust, not the Rust language but the ecosystem: Rust programmers have picked up a terrible habit of depending on very specific versions of libraries and not having stable library APIs, and this makes life really, really hard for distributions packaging Rust software, because if we have multiple packages which want to depend on multiple versions of the same library, then that is a big problem: the Debian security team will kill me if I propose either vendoring too many libraries or duplicating versions of a library. And that is, as far as I know, still an unsolved problem in this ecosystem. Thanks.

ANNIKA HANNIG: Yeah, you are totally right. I agree the Rust ecosystem has a bit of a dependency problem, and I hope the future will bring better tooling and better ways to actually deal with libraries and so on, but yeah, you are absolutely right, and I think this is one of the major drawbacks of Rust right now.

MARCOS SANZ: Any questions from online, from remote? No? Thank you very much.

(APPLAUSE.) Our last speaker is Ondrej Sury. He is director of DNS engineering at ISC, so he might know a thing or two about DNS. He was involved in Knot DNS, Knot Resolver and others; he wrote a lot of stuff here. He enjoys smart algorithms to solve problems in promising new innovative ways in the RIPE community ‑‑; in the DNS community he is one of the DNS working group chairs; at ICANN he is a key share holder for the ‑‑; and outside of the RIPE community you may know Ondrej as the developer who maintains ‑‑ in Debian.

ONDREJ SURY: Hi. I need to make this shorter, because I didn't know that Marcos would read it; sorry for making you suffer through that. Right now I am not going to talk about algorithms or anything, this is just my pet peeve. There is one thing you need to understand: BIND 9 is an old project and there is a lot of inertia in it, and what we are trying to do is make things go smoother and not spend time on things that we don't need to spend time on. So this is about making a complex project's release process go smoothly. There is this everlasting clash between the software engineer, the QA engineer and the user, even when all three are just one person. Software engineers want to release fast and fix bugs, and you need to use the nightly version; users want: I want my software to never change so I don't have to upgrade, but we need new features and bug fixes, just don't change the software. And QA engineers, they want tests, and they are unhappy: their words, not mine.

So what is a release? What is release engineering? We take the source and package it, right? But what about documentation? What about packages? What about containers? What about signing? That is a huge thing as well: if you have a complex team, it is not one developer who has the signing key, it is more people who have access to the key, and there needs to be a little bit of process. And during the release you also want the tests to pass and so on; this is all part of the release process.

So what if there are multiple releases? In BIND we have multiple trains, like 9.18, 9.20 and the development version, so we have at least three releases every time, and you need to do all of this for all three. This is the actual current checklist; it is very detailed, and we usually just tick most of the boxes, but the release engineer, which in our case is one of our QA engineers, is silently screaming into the void.

And then comes CRA compliance, which will require even more stuff, even for open source, and now the release engineers are loudly screaming. So, does it have to be this complex?

So what is a release, then? We already talked about it, but it contains source code, documentation and the build system on one side, and then there are artefacts: source code tarballs, binary packages, release notes, documentation, test results. All of this is part of the release.

Ideally we want reviewed and tested source code; again, what does that mean? Formally, we just changed the thing. Software engineers are usually like: I don't want to do reviews, I want to write code, new stuff. I don't want to document this, just read the source. It works, you have to believe me. Nothing can break; that's my favourite phrase.

Before we changed the process, we had a merge window and a one-week code freeze, when we could continue writing code but couldn't integrate it into the release branches, so it just stopped the flow of development. Right now (we changed this last month, actually) there is a release checkpoint and the releases get branched off into their own release branches, but the point is that the main branches where we do development should always be stable, always reviewed, always tested to the extent possible, and always documented, which means always ready for release. If I want to make a release, it should be: I click, and there is a release.

Simplified. Well, I was told that the release engineers and the QA people are still unhappy, but it is getting better. I don't think they will ever be happy; it is their job to be unhappy.

But this is much better than what we had before. So what is testing, anyway? There is source code testing: we have unit tests, system testing, cross-version testing, interoperability testing, performance testing, and user deployment testing. That last one is the best: you give the users the thing and they report the bugs back, but users are usually not happy about finding the bugs for us.

So we try to prevent this.

But it is not just functional testing; for quality assurance it is also style and formatting. My recommendation is to just agree on a single source code style and formatting and use tools for this, like clang-format, clang-tidy or Coccinelle; for Python there is black and Pylint. Just make an agreement inside the team, use one style and enforce it, and I am emphasising: use tools to help you, because they also enforce the style. For shell there are tools too; for Go I know there is gofmt or something, and Rust probably has something like this. Just use the tools so you don't have to think about this: thinking about style and formatting takes away energy you already don't have.

It is also documentation, and I don't mean the user documentation: commit messages for the poor soul that comes after you, comments in the code, and the source itself is also documentation. As Don Knuth, with his literate programming, pointed out: computer programming is an art. All of these are art. Not everybody is da Vinci, but it is art; it should be, at least for you. You should feel emotions looking at code; they don't have to be good emotions, but looking at code, your own code, other people's code, always evokes emotions, right? It was like the presentation from Annika: the reason to rewrite was emotions, in a way, because I see the regex and I feel like this could be written better, right? So we are all artists, the software engineers and the other engineers here in this room who don't do software, so approach it with passion. That was my passionate interjection about source code.

But there is also the documentation for end users. We have a reference manual, which is the big book for long winter evenings if you don't have anything else to read and you read manuals; we have a knowledge base, which is snippets of "how do I do this"; there are manual pages, more local to the individual tools; and there are release notes, which you can read to know what to do if you are upgrading, and also why something does not work any more: you go to the release notes and, ah, this is the reason.

Then we have quite a big support team at ISC for supporting our customers, helping them with BIND and Kea, and they also use the reference manual, they use and write the knowledge base, and they use the release notes. And there is also something called the changelog, which is more detailed than the release notes, so the software engineers can look up what changed. The border between the release notes, which are meant for users, and the changelog, which is more detailed and meant for technical staff, is a very thin line, and I mostly think that most of the stuff that is important should somehow be in the changelog, but not everything is suitable for it.

So, the release notes process, which is what I originally wanted to talk about here. Before, it was, well, horrible; now it is reStructuredText. But before the change, which we made about a year ago, we modified the source code, we modified the release notes (there was a single file per release), we created a merge request and fixed the spelling, and then we had to rebase the branch, because merge requests are not always merged in order, resolve the rebase conflicts in the release notes, and repeat and repeat until it was fixed. It was workable, but really annoying.

But here is how we changed this: there is a tool called gitchangelog, which we customised a little bit for our use case. We modify the source code, we create a merge request and write a solid merge request description in GitLab or your favourite forge; GitLab is going through enshittification right now, I have been seeing ads in GitLab, but it still works.

But the point is: write it in the merge request, the description and the title, spell check it and merge it, and that's it. And when the release comes, it gets automatically converted into release notes. Here are some examples: this one is tagged as a new feature meant for users, and as you can see (does this work? no) the top left is the merge request description, and it ultimately gets converted into the release note.

This is a removed feature, again meant for users. This is a bug fix. And they get converted into their own sections in the release notes automatically. Not everything is ideal, because it is easy to make a typo, and it gets some manual massaging before the release. But my point is that it is the job of the software engineers, together with QA and the support team, to make it good at the very beginning, so there is less work at the end.

This is our syntax, and it is mostly what we had before: there is an action and there is the audience, and that just defines where it gets put in the end.

It is not that complicated, and I quite like that it is automated and produces really nice release notes that end users can read and understand. And then there are security releases, and this is where everything gets different. The security releases are the same but different, because everything needs to be done in private. So it is more complicated: you need to have a different repository that is not open to the public, there is only internal testing, and it helps to have a small group of outside testers, like customers under NDA; thank you to these customers, whom I can't name. They found, well, we had to postpone a release by a week: it will be released today, but it should have been out last week, because one of the customers deployed it in production and hit an assertion failure. There is some shame here, but it happens. You can't just cover everything with tests, because you don't know the DNS; it is a wild, wild, wild world, and it will eat you.

So we found the bug and then we had to spin everything up again, but much worse are the coordinated releases, and this was one of those: we forced PowerDNS and NLnet Labs to also postpone their releases because of us. I apologise for that, but things like this happen.

And the release engineer is just weeping at this point. So, sorry: keep the release branches always in good shape. Make the release engineering automated; ideally it should be one click and the magic happens. It takes a lot of time to automate, but in the end it saves time. It should save time; not always the case, but it should. For the release notes, again, use tools, not a manual process. And one thing that is important is that all parts of the team should work together: software engineers, QA engineers and support should all be on the same page and not fight. Fixing engineering conflicts is always annoying and error prone.

And I was really fast this time. So.

MARCOS SANZ: Thanks, that's awesome, thank you very much. (APPLAUSE.)

Thanks for sharing your wisdom, it was very nice. Any questions from the room?

ONDREJ ZAJICEK: I have a question about the release branches. Do you rebase the release branch and just have one final patch in the release branch that you put back into master? Because I would prefer an approach like moving the development from the main branch to some "next" branch, doing the releasing on the main branch and merging the next branch back after the release, so that the main branch has tags for each release.

ONDREJ SURY: That's a good question. I thought about including this, but I already had 27 slides for 20 minutes, so I decided not to. But we have some time.

So, the way we do the development: we have a main branch for development and then we have branches like bind-9.20 and bind-9.18. For the most part, you create a merge request, you tag it with 9.21, 9.20, and when you merge the main merge request, there is a job that automatically tries to backport the whole merge request onto the release branches. It is all automated, but of course it breaks from time to time, for example when there was some refactoring in the meanwhile, and then there is a manual process for how to apply the patches.

But it is mostly automatic (thank you, our QA, thank you) and it works quite well: you create one merge request and it gets automatically backported to the other release branches.

ANNIKA HANNIG: Hi, a quick question about the tooling. I saw you recommending code formatting tools, for example gofmt or rustfmt, and I personally like them a lot, they work really well, especially for Go and Rust. However, I think black is horrible. The thing is, I wrote a lot of PEP 8 compliant code myself and I always employed linters for this, but I was once forced on a project to basically pass everything through black and it completely destroyed the readability of the source code. So my question would be: when would you consider this actually being a benefit over something like linters or other tooling?

ONDREJ SURY: So, with Python: BIND is written in C, so the Python is just in the tests, we use pytest for it, so we don't have to publish it anywhere. Basically this is just to help us not think about the stuff.

I totally understand that if this is a proper Python package or something, then it might be more complicated than this. So the recommendation is: use it when it makes sense, but use something, so you can off-load it to a tool; the part of the brain that would think about formatting should not be busy with it, just off-load it to some tool. Use what suits you, but use something that is coordinated with other people, and the other people are not just the team, it is also other contributors, because for a project in C you can include a .clang-format file, and editors, even inferior editors like vi, can help you format the code. I am also a contributor to libuv and other projects, and I just created a patch for OpenSSL, and it was just so painful to get the formatting right without these tools. As a contributor, I just want to run clang-format and be done; I don't want to think about what the current formatting convention is for the project I am contributing to. For gofmt it is easy because there is one style, but for C there are a lot of options for where you put the curly brace and things like that.

ANNIKA HANNIG: It is similar with Python; this is why I especially hinted at this, because in Python you can easily write compliant code which will not be left intact by black, and I actually found it detrimental to the overall readability of the source code, and I am using this tool, yeah.

ONDREJ SURY: I understand, I understand. Maybe we can't change it now, but maybe there should have been one style for the whole ecosystem from the very beginning, and then this wouldn't be a problem.

ANNIKA HANNIG: OK.

SHANE KERR: I am Shane Kerr. Many years ago I worked at ISC and was on the BIND team. At that time we had a stage in the release process (I think I got rid of it while I was there) where we handed the release over to the people running the name servers and said: run this on F-root for a couple of days, just to see what happens. And it was a really worthless step for us at the time; we didn't have a structured process, it was just "see if anything breaks", and I really hated it. The operations guys hated it too, because it made them very nervous and they weren't really sure what to do, so it was bad. But fast forward a few decades and here we are, and I think BIND is a lot more solid than it was then. Do you do anything like that now, where you ask internally for people to run pre-release versions in production? No matter how good your testing is, there is always weird shit in the real world; you had the last step as "the users test it", but you can also be the user. Is that the kind of thing you do?

ONDREJ SURY: Well, that's a good question. The problematic part in DNS is the resolver, right? The authoritative side is like ‑‑ I can do that in my sleep now. But the resolver, that's where the demons and dragons are. Some of the developers run it on their home networks all the time, and that helps a little bit. And when there is a security release, we have a couple of good customers that I can't name who are very good at reporting back with core dumps and traces, so it is not all question marks and zeroed addresses, and that helps a lot. Actually, last year, end of January or something, we had a customer who reported problems in what we thought was solid. I wish there was a better way, but what we did (and I don't have a timeline here) was this: previously there were alpha releases, beta releases and so on, and I don't think anyone ran those any more. So what we did instead was a higher cadence of releases: we do a release every month if there is stuff, because then, if there is something that is just a little bit broken, you don't have to wait half a year to get it fixed; you know it will get fixed in the next release. It also lessens the pressure on the developers, because if you don't get your nice new feature into this release, you will get it into the next release, and that is just a month between these two points in time. So there is less pressure to get stuff in at the last possible moment ("there's a big release, I need to merge this in five minutes"); you don't, it will be in the next release and it's fine.

SPEAKER: Everybody should switch to Arch Linux and then you can always have the latest BIND?

ONDREJ SURY: Yeah.

SPEAKER: I am part of the OpenVPN team. Thanks for bringing this up, it gives me something to think about, and it actually relates to the first talk of today in a good way. Because what we strive for is that every commit that goes into the official tree passes all the tests: it compiles on all the platforms and runs all the client-server tests we have. And still it breaks for users, because we have too many of them. With all the feature creep of years of software development in that particular package, there is no way to test every combination, as hard as we try, because we keep finding new users that have combinations that are not officially supported, but they still expect them to keep working. So yes, it's complicated.

ONDREJ SURY: Do you want to hack?!

SPEAKER: Well we don't do DNS so we have it on the easy side.

ONDREJ SURY: We can talk about this later; we have some more stuff that is more automated, like pairwise testing of the options, and Hypothesis is also nice, it can generate test cases in some cases. So if you want, I am available for the rest of the week.

SPEAKER: I will grab you in the coffee break.

MARCOS SANZ: Any questions from remote, Marco? OK, then thanks for making us feel like artists.

(APPLAUSE.)

And with that, we are done. See you all in Edinburgh or on the mailing list. Thank you very much for coming.

(APPLAUSE.)

Coffee break.