FOSDEM: open-source, waffles, and beer (2024-02-17)
https://saransh-cpp.github.io/blog/2024/fosdem

I have experienced conferences before; I have experienced developer workshops; I have also experienced small meetups; what I had never experienced was FOSDEM, or, the way I see it, Comic-Con for open-source people. FOSDEM (Free and Open Source Software Developers’ European Meeting) is an annual event organized by and for people involved with open-source software in some capacity. The event attracts thousands of developers from around the world to enable free-flowing discussions on open-source software over a pint of beer or some Belgian waffles. I was privileged enough to be able to travel to Belgium to take part in these discussions. The whole conference offers talks, booths, devrooms, fringe events (which are conferences of their own), and even “after-parties” (post-conference socials) (at this point, I wouldn’t be surprised if there were after-parties for the after-parties).

I planned on attending two fringe events, the main conference, and two post-conference socials, but things don’t always go as planned, do they? I arrived in Belgium on the evening of the 29th and reached my hotel at 6ish PM after fighting with Vodafone UK for an hour at the train station. Three people from ARC (David, Matthew, and I) planned on attending CHAOSSCon, a conference discussing open-source project health metrics and tools. David and I were on the same train, and Matthew was arriving a bit late, which meant David and I had some free time to kill before dinner. As any non-Belgian human in Belgium would do, we started hunting for waffles, which became our first meal in Belgium. We still had some free time on our plate, and to keep ARC’s social hour traditions alive, we spent all of it playing foosball in the hotel lobby. To end the day, we chatted over a “distributed” dinner (we found a food court) and some Belgian beer once Matthew arrived, and this recharged me for the next day - CHAOSSCon.

Funky VLC postcards (FOSDEM) and CHAOSSCON posters.

CHAOSSCon, or the Community Health Analytics in Open Source Software Conference, was a one-day-long focused event highlighting the importance of metrics and tools used to measure the health of open-source ecosystems, including ways of preventing and fixing damage. The conference brought together the most diverse set of people to discuss open-source under one roof. I met people from big tech like Google and IBM, banks like ING, independent consultants and activists, and people from academia like CERN’s Open Source Program Office (OSPO) and us, Research Software Engineers from UCL’s ARC. The conference made me realize that open-source requires much more than just developers or programmers to sustain itself in the long run. I absolutely enjoyed Katie McLaughlin’s “Who does the dev? or: How we failed to make a taxonomy of open source contribution.” The talk focused on how non-code contributions are often swept under the rug by popular metric systems (such as the contributors dashboard on GitHub) and how that is harmful to open-source ecosystems. Apart from the fantastic topic, Katie’s delivery style was very engaging, and I genuinely enjoy talks that are presented in a unique, non-traditional way. The last unique, non-traditional talk I attended was at a FOSSUnited meetup in New Delhi, where a speaker took us through her journey in open-source from the perspective of her pet cat. I equally enjoyed Georg Link’s “Analyzing Risk from an OSS Project Health Perspective - Tools, Traces, and Threats,” a talk showcasing how industries use health metrics to decide whether they want to depend on a piece of open-source software. Needless to say, I met some incredible people, but the ARC people and I were particularly interested in learning how industries sustain the culture of “giving back”, or working on open-source software, both financially and socially. My main highlights from the coffee table discussions came from conversations with Tim Bonnemann from IBM and Giacomo Tenaglia from CERN regarding their organizations’ strategies to promote open-source contributions.

CHAOSSCon group picture.

CHAOSSCon exhausted me. The discussions, talks, and breakout sessions were incredible, but there was no space for rest. Given that the conference had no parallel activities, it felt natural to sit in one room for the entire day and only move out for breakout sessions. This decision to sit in one place throughout the day made me leave the post-conference social early and go straight to my hotel room. To recharge for FOSDEM, I decided to skip the second planned fringe event, the EU Open Source Policy Summit, the next day and instead work on my ARC projects while chilling in my hotel room, but something wasn’t right. I couldn’t just sit in my hotel and work, given that I was in a new country, but I didn’t want to go out and roam during work hours. I found an interesting hack to solve my dilemma: go to another country. Luxembourg, the capital of Luxembourg, a small (but RICH) landlocked European country, is apparently just 3.5 hours (and €10 one way) away from Brussels. Going to Luxembourg meant that the round trip would take 7ish hours, precisely how many hours I work (do you see where I am going with this?). I decided to take my laptop on the train, work, step into a new country for lunch, roam around, get back on the train, and work. Everything went according to plan, and I saw some of Luxembourg. My short review of the country would be that it is clean, efficient, beautiful, and rich, but boring.

Luxembourg.

I also had a side quest for this trip - visiting a Burmese noodle place in Antwerp. You see, Burmese food places are rare (for perspective: the entirety of Switzerland has just one Burmese restaurant; you can only find 4-5 Burmese places even in big metropolitan cities like London and New Delhi), and my Burmese-Chinese girlfriend (who recommended this place to me) would have revolted against me if I skipped this opportunity. Considering that the next two days would be busy, I decided to visit Noung Inle, a traditional Burmese noodle place 20 minutes from Antwerpen-Centraal, on the evening of the 31st. Once back in Brussels (from Luxembourg), I immediately left for Antwerp, and over the next 3 hours, I had the best Burmese food I have ever had (sorry, Mel). I won’t bore you with the details, so here is a picture of their Shan noodle soup and coconut pudding (along with pictures of Antwerp at night) -

Shan noodle soup, coconut pudding, and Antwerp at night.

I do know some Burmese phrases, but I was too scared to order in Burmese, so I ended up saying bye in Burmese (and leaving them in a state of confusion - they probably remember me as the guy who talked and ordered in English and said bye in Burmese?).

D-Day was here, the actual FOSDEM conference, and, being a very responsible person, I overslept. I had everything planned perfectly, but I missed one parameter: the conference was on a weekend, and my alarms don’t ring on a weekend. After a late start, I took a long and packed bus ride to the venue, heard people talk about software for the entire journey, and got off to witness open-source in all its glory. My first impression? This does not look like a conference. The venue was really spread out. The conference was distributed across different buildings, and every floor of a building had different booths and devrooms. It took me some time to process all of the information, and a free cookie from Mozilla definitely helped (see below). I went through the entire venue once, made a mental map, and then pulled out the talks I wanted to attend. First up on my list was the Tool the Docs devroom. I have always appreciated good documentation. I know how hard it is to write good documentation and how often the skillset of a technical writer is overlooked by developers, so I decided to attend a few talks in the Docs devroom, which included my highlight of the day - “Patterns for maintainer and tech writer collaboration.” I spent most of the remaining first day in the Open Research devroom, but I would occasionally go out and wander around. I roamed the entire venue multiple times on both days, but I am sure I would have missed a few things because there was so much to do and so little time. My highlights from the open-research devroom were “Preserving LHC Analyses with Rivet: A Foundation for Reproducible and Reusable Particle Physics Research,” “How Open-Source Software is Shaping the Future of Healthcare,” and “Research Software, Sustainability, and RSEs.” Besides the talks, the sheer number of stickers and booths amused me. My favorite booths were CERN OSPO (who gave me a cute alum pin after I told them I worked there briefly), Software Heritage (with whom I had some pleasant conversations), and Weblate (with whom I discussed Scientific Python’s ongoing efforts to translate Python documentation). Unfortunately, there was no Google Summer of Code booth this year, but I met Stephanie Taylor and talked about how the program was a pivotal moment in my programming journey and how I have been part of it for the past 4 years (always good to appreciate people for the work they do)! I had planned to go to 2 “after-parties” - the Linux Foundation drinks and the Google Summer of Code meetup - but given that they were far from the venue, I had socialized enough, and the ARC team was getting pizzas from a place nearby, I decided to switch my plans and have a pizza.

Mozilla handing out free cookies at FOSDEM and the ARC team with pizzas (picture credits: Christian Gutschow).

The second day was much more relaxed as I already knew my way around. The more conferences I attend, the more I prefer talking to people over attending talks. Moreover, most of the talks I attend are the ones where I want to talk to the speaker after their talk. FOSDEM did not disappoint in terms of networking at all. It is unbelievable how, at FOSDEM, you can bump into strangers with whom you have so much in common. I bumped into Abhishek Dasgupta from Oxford University’s RSE group on my second day. It was funny how I was walking, saw a familiar face, stopped, and asked him if he worked at Oxford. We had a delightful conversation and then parted ways to attend more talks. Subsequently, while aimlessly looking at the booths, I saw someone wearing a CSCS (Swiss National Supercomputing Centre) hoodie, and in the exact same fashion, I asked if they worked at CSCS. It turned out that they were Nur Aiman Fadel, the head of Scientific Computing at CSCS. They recognized my CMS hoodie and pointed out how I was the first person at the conference to acknowledge their CSCS hoodie. It was intriguing how our work had so many intersections (including common colleagues!), but we had never met each other. Apart from talking to people, eating waffles, and collecting more stickers, I spent most of the second day in the HPC, Big Data & Data Science devroom, where my favorite talks were one on the EuroHPC Federated Platform’s infrastructure by Henrik Nortamo and an introduction to HPSF by Gregory Becker. After the talk, we discussed the idea of UK universities getting access to the platform using federated identity and single sign-on. Though they were very open to such an idea, I haven’t followed up yet.

FOSDEM did change my perspective on open-source, making it more holistic and inclusive. I would encourage everyone involved, or wanting to be involved, with open-source software to attend FOSDEM at least once. With the end of FOSDEM, I concluded my Brussels trip (with side quests to Antwerp and Luxembourg), adding two new (13th and 14th) countries to my travel journal. I have now returned to my normal life, and I hope to attend the conference again next year.

Brussels skyline.

Bonus: People here might not know (how will you? I wrote the last blog in 2023), but I have developed a sweet tooth after moving to Europe. The sweets in India are great, but they are a bit too sweet for my liking. On the other hand, European sweets hit the right spot (especially tangy-sweet desserts, like one of my favorites, Tarte au Citron). Given that I was in Belgium, I obviously couldn’t resist having waffles - choco chip waffle, choco stick waffle, vanilla waffle, waffle with syrup, waffle on a stick (?) - you name it and they have it. Don’t tell my mom, but my lunch on the second day of FOSDEM was a bag of waffles (how can I say no to a bag of 7-8 randomly picked waffles being sold for €6.5).

A few of the waffles I devoured.

My adventures in Brussels and spontaneous side quests in Luxembourg + Antwerp.

Canadian Summer, Type Theory, and Letters to Div: Part 1 (2023-11-12)
https://saransh-cpp.github.io/blog/2023/canadian-summer-type-theory-and-letters-to-div-1

I have to tell you so much about Canada, but God knew we would be too powerful if we lived our lives on the same continent. The moment I left for Canada, you were preparing to return to India; the moment I landed back in India, you were preparing to leave for Europe; the moment you return, I will be preparing to leave for Europe, and of course, you won’t be here when I return. We meet once a year (twice if the timelines match) to catch up, but how much can we share in a single day, especially when one of us is jet-lagged? I received your last letter via email, and I am so so so proud of you and your adventures across Europe. I have wanted to write about my time in Canada for quite some time now, and what’s better than writing it in the form of a letter addressed to you.

(Let me turn on Brahmastra while I write this - it’s Diwali, and I haven’t seen that movie yet.)

I obviously remember where it all started: September 2022, the horrifying REU application season. I was prepared for this application season, as this was my second year applying to such programs, but I still remember the utter stress I was under. Like every nerdy undergrad kid interested in pursuing research, I wanted to apply to international programs. Don’t get me wrong, top Indian institutes like IISc and IITs are excellent at what they do, but funded research opportunities outside of AI and ML in India for undergrads are too few. I know, I know, “AI is the future” - average tech-bro from Twitter, but AI alone does not interest me, and I’ll take this opportunity to explain my research interests to you (knowing that you will forget them the next time we meet).

I am interested in using Computer Science to advance the Natural Sciences. I write code for Science - I want my code running at places like CERN or NASA to study the unknown Universe. I think this interest stems from my school-level interest in Physics and Mathematics, which I could not pursue because of Indian society. I am broadly interested in Scientific Computing and Research Software Engineering. I write code, but I don’t write code for the general public (like fun apps on Play Store or dating sites); instead, I write code for scientists. I am also interested in Formal Methods and Theory of Computation (spoiler, my work in Canada). This side of my interest is more theoretical, with no direct or visible real-world impact, for example, developing proof assistants for Mathematicians or studying finiteness and enumerability of Mathematical Sets (I know you hate Maths). I am also interested in using AI, but not in creating a recommendation system in Netflix or the next ChatGPT; instead, I want to use AI to help scientists, for instance, modeling the solutions of a Partial Differential Equation using Physics-Informed Neural Networks. Finally, I want to make Science and Code accessible to all, so I am passionate about Open-Science and Open-Source. I voluntarily work on and maintain a few Open-Science and Open-Source codebases on the web to keep this passion alive.

Okay, enough weird-looking jargon, and back to the REU season. I applied to ~4-5 proper programs and cold-emailed ~5-6 professors. Now that I look back, I should have applied to more programs because I was rejected from almost all the programs (all but one) I applied to. On the other hand, nearly all of my cold emails worked, and I received remote internship offers from UPenn and Cornell. By the way, I maintain a detailed list of the programs I have applied to, been rejected from, skipped, or missed on my website. Anyway, one of the few research programs that I applied to was the Mitacs Globalink Research Internship Program. Quick info about how the program works -

  1. Select and rank 7 projects and write an application (CV, SOP, QnA, etc.)
  2. Mitacs screens your application and forwards it to individual professors if it is good.
  3. Professors make the final judgment.
  4. Mitacs mails you if you are selected, waitlisted, or rejected.

After submitting my application in September, I waited and waited and waited. I received an interview call from one of the professors in December, and I absolutely bombed the interview; I can’t even describe how bad it was. After waiting for an eternity, on a very random day in February, I received an email saying I was offered an internship position at McMaster University through Mitacs. I was shocked because 1. it had been almost 6 months since I submitted the application, and I had given up at this point, and 2. the professor made an offer but never contacted me for an interview.

At this point, I was already working remotely under a Professor at UPenn, and I contacted several people (including you) for advice. Taking up the position at McMaster would mean giving up on my UPenn project, which I liked very much. Furthermore, I had never worked with mathematical proof assistants or functional programming languages before, making the work at McMaster entirely new for me (another point to add to my initial acceptance shock). Ultimately, I accepted the McMaster offer and wrote a big sad email to the professor at UPenn. Who knew that accepting Mitacs’ offer would turn out to be one of the best decisions of my life. I would now work under Prof. Jacques Carette on “Formalising Mathematics and Computing in Agda” at McMaster University, Hamilton!!!

As it turned out, accepting the offer was the easiest part of the timeline because I was headed to visa hell next. I love how we weirdly bond over our weak passports, like the last thing people bond over is their passport privileges. I received a Schengen Visa rejection from the Italian Embassy last year (PyCon Italia), and I was super worried about this application. Surprisingly, the Canadian visa application was entirely online, and I only had to visit VFS to submit my biometric information. This was already so much better than the Schengen application; at least I did not have to print a book of documents only to realize I had forgotten a printout and look for printing shops near VFS in an absolute panic. I submitted my application and waited for another eternity. They finally decided to process my application 56 days after submission. My application was approved!! I had to wait ~10 more days to get the visa stamped on my passport. I remember I was at Rajiv Chowk, returning from college with Nikunj, when I received the passport collection mail. I was so freaking happy that I took an immediate U-turn and went back to the New Delhi metro station to take the airport express line to VFS. That was also my first happy visit to VFS.

After reaching home, I rushed to my laptop and booked the cheapest (super costly because of late booking) flights. I was going to fly out of India through Lufthansa AC9587 to Munich and then catch the connecting AirCanada AC837 flight to Toronto, after which I had planned to take a bus to Hamilton. Everything was set - visa, flights, insurance, initial accommodation. The insurance and housing had their own problems, but I managed to fix all the loose ends in the end. Now I just had to face my end-semester examinations, pack 2 big bags, and board a flight to Canada!

I will write about my first steps in Canada in the following blog/letter. Let me call you quickly because you are roaming on a ship near the equator with only 100 MB of data on Diwali. So dreamy, scary, and cool!

With love
Saransh

My time in Canada, written in the form of letters to my best friend.

Logistic regression in Julia (from scratch and with FluxML) (2023-04-01)
https://saransh-cpp.github.io/blog/2023/logistic-regression
Constructing a logistic regression model from scratch and then with Flux.jl in pure Julia.

My experience working as a Technical Writer with FluxML - Part 2 (2022-10-28)
https://saransh-cpp.github.io/blog/2022/fluxml-tw-pt2

“One thing that open-source can’t get enough of is documentation”

— Anonymous

I have finally completed my official period as a Technical Writer at FluxML, and I have enjoyed every second of it! This blog serves as a summary of the work done during the second half of my time at FluxML. You’ll notice that the contributions listed in this blog cover the entire FluxML ecosystem and not just Flux.jl. Furthermore, I will also list some additional documentation contributions made to the much larger Julia ecosystem during this period!

A quick introduction to FluxML, quoting their documentation -

Flux is a 100% pure-Julia stack and provides lightweight abstractions on top of Julia’s native GPU and AD support. It makes the easy things easy while remaining fully hackable.

Flux is a library for machine learning geared towards high-performance production pipelines. It comes “batteries-included” with many useful tools built in, but also lets you use the full power of the Julia language where you need it.

Honestly, the nature of my work at FluxML branched out so quickly that summarising it is a difficult task, but I will try my best! Buckle up! Psst, you can find more about the first half of my time at FluxML here.

Getting started section

Let’s start from where I left the last blog - the new “Getting Started” section. The good news is that Flux has a better structured “Getting Started” section now! The bad news is that I did not work on it, but the second good news is that it turned out better than the one I had in mind!

Flux had a lot of tutorials and examples, but they were scattered around and very hard to navigate through. The old “Getting Started” section, “Overview” section, and “Basics” section had valuable information for beginners, but the information was scattered among these three sections. Now, all of these sections have been combined (a huge thanks to @mcabbott) in a single “Getting Started” section in Flux’s docs! We still have a Getting Started page on Flux’s website, but it will be either migrated or scrapped very soon!

Have a look at the new “Getting Started” section here.

Tutorials section

The new (and different from the one I imagined) “Getting Started” section prompted a discussion on revamping the existing “Tutorials” section on FluxML’s website. This section is currently heavily outdated and is not updated very regularly, which often misleads or scares away newcomers.

We are planning to migrate these tutorials to Flux’s documentation to keep them up-to-date using doctests. The contents I initially planned for the new “Getting Started” section are now planned to go into the revamped “Tutorials” section.

Have a look at the new “Tutorials” section here.

Independent docs for NNlib.jl

NNlib now has standalone docs! NNlib’s documentation used to live as a page under Flux’s documentation, which was not helpful, as NNlib is an independent Julia package. This was not convenient for either the users or the developers of NNlib, as one had to constantly refer to that section of Flux’s documentation to get NNlib working.

My work here started with migrating the existing page from Flux’s documentation to NNlib, which was then iterated on through helpful reviews! This page was then included in a brand new documentation infrastructure, which was merged a while ago, but the documentation did not appear live until a few days ago. The delay in the deployment of the documentation was caused by a few minor Documenter.jl and DOCUMENTER_KEY-related bugs, which have since been resolved.
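For readers unfamiliar with how standalone package documentation is wired up, the heart of a Documenter.jl setup is a small docs/make.jl script. The sketch below is illustrative only; the page layout and options are assumptions, not NNlib's actual configuration.

# docs/make.jl: a minimal Documenter.jl setup for a standalone package (illustrative)
using Documenter, NNlib

makedocs(;
    modules = [NNlib],
    sitename = "NNlib.jl",
    pages = ["Home" => "index.md", "Reference" => "reference.md"],
)

# deploydocs pushes the built site to the gh-pages branch; in CI, this step
# needs a DOCUMENTER_KEY (an SSH deploy key) configured for the repository.
deploydocs(; repo = "github.com/FluxML/NNlib.jl.git")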

Have a look at NNlib’s updated and independent documentation here.

Revamped docs for Metalhead.jl

Metalhead.jl saw a great amount of development this summer, thanks to a GSoC project led by @theabhirath and @darsnack. With all of this development, it became necessary to migrate the old Publish.jl infrastructure to a modern Documenter.jl infrastructure. This invited numerous discussions among FluxML maintainers, given that both packages have their ups and downs, but in the end, everyone agreed to migrate Metalhead’s docs to Documenter.jl.

I aimed for a 1:1 port, not adding any information and not removing any information in the porting process, and the new documentation looks beautiful!

Have a look at Metalhead’s revamped documentation here.

Better community health

I have noticed that Julia’s ecosystem can be a bit daunting for new contributors because of missing “community health” files. Most of the repositories under FluxML lacked issue and PR templates, making the review and triage process harder than it should be. FluxML also lacked an organization-wide README, a feature recently introduced by GitHub.

FluxML now has default issue and PR templates, which can be overridden by every repository! The organization-wide README is also in place, guiding newcomers better than ever!

Have a look at the default issue templates here.

Have a look at the organization-wide README here.

Create a PR in a repository under FluxML to see the default PR template 😉

Better docs for OneHotArrays.jl

Flux.jl maintainers recently decided to move the one-hot encoding functionality of Flux into a separate, independent package. As you might have guessed, this called for some documentation audits!

I updated the docstrings of the OneHot* structs and added them to OneHotArrays’ manual, making them accessible to users. I also debugged and helped build the missing v0.1.0 documentation of OneHotArrays.jl!

Have a look at OneHotArrays’ v0.1.0 documentation here.

Migrating FluxML’s website to Franklin.jl

Franklin.jl is a modern static site generator written in Julia! Quoting their documentation -

Franklin is a simple, customisable static site generator oriented towards technical blogging and light, fast-loading pages.

FluxML maintainers decided to port the existing FluxML website (built using Jekyll) to Franklin, and this was primarily carried out by @logankilpatrick and @darsnack. Where do I come in? I helped set up the infrastructure of the new site! I primarily worked on enabling PR previews using GitHub Actions, something not covered in Franklin’s documentation (and which had not been done before). I also fixed some minor bugs revolving around relative paths and production deployment!

Have a look at the migrated website here.

Adding a new section to Franklin.jl’s docs 😉

Referring to the section above, I decided to update Franklin.jl’s documentation to include a section on deploying PR previews using GitHub Actions. This eliminated the need for extra infrastructure and also led to restructuring Franklin’s original “deploy” documentation. The revamped “deploy” documentation provides many more options to a user and is easier to navigate.

I also took the liberty to update outdated documentation of Franklin, including outdated GitHub Actions and variable definitions!

Have a look at the revamped deployment page here.

Misc

I cannot possibly write this section without missing out on something. I worked extensively on miscellaneous issues, not limited to Flux.jl, but related to FluxML as a whole. These miscellaneous issues ranged from refining FluxML’s existing logos to developing and fixing the CI/CD pipelines of FluxML packages.

Additionally, while working with Franklin.jl, I discovered that Julia’s website is open-source and written in Franklin! I took this opportunity to update some outdated Actions and syntax on the website. Currently, I am also looking at migrating Julia’s website to the new GitHub Pages infrastructure, or the new PR preview infrastructure (mentioned in the section above).

Refer to my GitHub for all the “misc” work carried out by me during the past six months 🙂

Minor code contributions 😉

Surprisingly, I also contributed to Flux’s code. The code contributions weren’t in the form of feature additions; instead, I worked on minor bugs and refined the public API.

For instance, I deprecated rng_from_array() in favor of default_rng_value(), and then marked it as @non_differentiable. These contributions weren’t too big, but adding code to Flux’s repository did feel good. I look forward to writing more code for FluxML in the future!
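For readers curious what such a change looks like in Julia, the pattern is roughly the following. This is a simplified sketch, not the exact Flux PR: the function bodies are stand-ins, and @non_differentiable comes from ChainRulesCore.

using ChainRulesCore  # provides the @non_differentiable macro
using Random

# New helper returning the default RNG (simplified stand-in for Flux's version)
default_rng_value() = Random.default_rng()

# Keep the old name around as a deprecation that forwards to the new one
Base.@deprecate rng_from_array() default_rng_value()

# Tell the AD machinery not to try to differentiate through RNG selection
@non_differentiable default_rng_value()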

Final words

I will be forever grateful to the whole FluxML community for giving me such a wonderful time! I do feel like I have accomplished a bit in the last 6 months, and the documentation of FluxML is at a better place than the point when I started.

A special thanks to @DhairyaLGandhi, @ToucheSir, @mcabbott, and @darsnack for bearing with me through my untidy PRs and helping me out as always!

Oh, I also joined FluxML’s GitHub organisation a few weeks back! 🥳

Tweets

Follow me on Twitter :)


Appendix

PRs and issues

Either I created these PRs/issues, or I had some significant involvement in them 😄

Flux.jl

Metalhead.jl

NNlib.jl

OneHotArrays.jl

fluxml.github.io

Franklin.jl

www.julialang.org

.github

Linear regression in Julia (from scratch and with FluxML) (2022-08-21)
https://saransh-cpp.github.io/blog/2022/linear-regression
Constructing a linear regression model from scratch and then with Flux.jl in pure Julia.

My experience working as a Technical Writer with FluxML - Part 1 (2022-07-27)
https://saransh-cpp.github.io/blog/2022/fluxml-tw-pt1

“One thing that open-source can’t get enough of is documentation”

— Anonymous

This summer, I started working as a technical writer with FluxML under Julia Season of Contributions, and as expected, this experience was very different from writing code.

During the beginning of the summer, I decided to take up a technical writer’s job that involved writing documentation and tutorials for Machine Learning. At the same time, I was learning Julia, and the FluxML ecosystem sounded like a perfect place for me. I applied for the position through Google Season of Docs but unfortunately couldn’t get in because of limited openings. Fortunately, the Julia Language decided to fund me for the next few months to work on FluxML under Julia Season of Contributions!

In the following blog, I will share my experience and the work I have done so far as a part of Julia Season of Contributions!

Doctests

Flux’s documentation lacked doctests, letting its code examples go stale after each release. Further, many existing examples, thought to be covered by doctests, were already outdated.

Some of the doctests were straightforward to add, but most of them required in-depth discussions with the mentors and the community members. In addition, the documentation written in markdown format also required periodic testing to ensure that the changing API does not break the examples.

Flux now has doctests written for every single public API, and most of the markdown examples are also covered by these doctests. Some of these changes haven’t been merged yet, but they are under review!
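For readers who haven't seen them, Documenter.jl doctests are jldoctest blocks inside docstrings (or markdown pages) whose printed output is re-checked against a fresh Julia session on every documentation build, so examples cannot silently go stale. A minimal, made-up example:

"""
    double(x)

Return `2x`.

# Examples

```jldoctest
julia> double(3)
6
```
"""
double(x) = 2x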

Missing docstrings

Some of Flux’s public API was undocumented or had relatively unclear documentation. For example, most of the neural network layers provided by Flux have two constructors: one for initializing layers with pre-defined weights and biases and another for generating weights and biases from a distribution. In most cases, only one constructor was documented, and the other was not.

Another such instance of missing docstrings was Flux’s manual. Here, the docstrings were present in the codebase, but they were not included in the manual. Such cases were solved by adding the docstrings to the manual or creating a new section for them.
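To make the "two constructors" point concrete, here is a rough sketch of what documenting both constructors of a layer can look like. The layer below is a made-up toy, not Flux's actual implementation:

"""
    MyDense(W::AbstractMatrix, bias, σ)
    MyDense(in => out, σ = identity; init = randn)

A toy dense layer computing `σ.(W * x .+ bias)`. The first constructor wraps
pre-defined weights and biases; the second generates them using `init`.
Documenting only one of these forms leaves half of the API undiscoverable.
"""
struct MyDense{M, B, F}
    W::M
    bias::B
    σ::F
end

# The default struct constructor already covers MyDense(W, bias, σ);
# this method builds the parameters from a size specification instead.
function MyDense((in, out)::Pair{<:Integer, <:Integer}, σ = identity; init = randn)
    return MyDense(init(out, in), zeros(out), σ)
end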

In addition to Flux, other packages under FluxML had a similar problem. Some of these packages, like NNlib.jl, Zygote.jl, Optimisers.jl, Functors.jl, and MLUtils.jl (under JuliaML), were also referenced in Flux’s documentation. The missing docstrings in these packages propagated to Flux’s docs, resulting in even more missing docstrings.

All the missing docstrings have now been added to the respective manuals and functions, including that of other FluxML packages!

Broken documentation

Documentation is incomplete without cross-references, and Julia’s wonderful documentation package makes these super smooth. Flux had cross-references in place, but some of them led nowhere.

Additionally, some of the docstrings were not rendered correctly in the manual, driving newcomers away from the ecosystem. These instances of broken documentation have been fixed, and once the PR is merged, users should not see 404 pages or un-rendered docstrings anymore!
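For context, Documenter.jl cross-references use the @ref syntax; when a referenced docstring or page heading is renamed or removed, the link either fails the docs build or silently leads nowhere. A small illustrative docstring (the function names here are hypothetical):

"""
    my_normalise(x; dims = 1)

Normalise `x` along `dims` so that the slices sum to one.
See also [`my_standardise`](@ref) and the [Loss Functions](@ref) section of
the manual; Documenter flags these `@ref` links when their targets go missing.
"""
my_normalise(x; dims = 1) = x ./ sum(x; dims = dims)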

CI/CD

The user-facing documentation of Flux is facilitated by a CI/CD service, which keeps this documentation deployed and available to users at all times. It is common for open-source projects to open-source their deployment recipes and work on them collaboratively.

FluxML’s ecosystem had some minor issues in this CI service. For instance, Zygote.jl had doctests running twice, using twice the CI time and resources. On the other hand, Flux specified different versions of Julia in its CI configuration and its make.jl file, making the CI environment ambiguous.

Further, Documenter.jl allows a user to generate documentation previews for pull requests, and Flux had a bot set up to facilitate this, but it was down for a long time. These documentation previews are collected in the gh-pages branch, which was getting very bulky and had to be cleaned up with an automated workflow.
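For readers unfamiliar with the mechanism: the previews come from the push_preview option of Documenter's deploydocs, which publishes each pull request's build under a previews/PR<number> folder on the gh-pages branch, and those folders accumulate unless something cleans them up. Roughly (a sketch, not Flux's exact make.jl):

# docs/make.jl (sketch): the deploy call that enables PR previews
using Documenter, Flux

makedocs(; modules = [Flux], sitename = "Flux.jl")

# With push_preview = true, Documenter also builds docs for pull requests and
# pushes them to gh-pages under previews/PR<number>, hence the branch bloat.
deploydocs(; repo = "github.com/FluxML/Flux.jl.git", push_preview = true)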

I have restarted this bot and have added a workflow for cleaning these generated previews periodically! I have also fixed some issues in Optimisers.jl‘s documentation that were causing its CI to fail.

FluxML’s ecosystem

The FluxML ecosystem, in addition to Flux.jl, also had some documentation problems. For instance, Optimisers.jl and Zygote.jl had shortcomings in their CI suites, which have been discussed above. Similarly, Functors.jl and MLUtils.jl had broken documentation, missing docstrings, and incomplete manuals, which have also been discussed in great detail above!

Lastly, the “ecosystem” pages present on Flux‘s website and documentation were out of sync and had redundant information. The “ecosystem” page was completely revamped and updated with the latest advancements related to Machine Learning in Julia!

Minor code changes

Yes! The documentation changes were also accompanied by minor code changes!

Flux’s public and internal APIs are very hard to tell apart if you are new to the codebase; hence, newcomers used to find it hard to navigate. The distinction was made clearer by removing internal functions from the documentation and prepending an underscore to internal names.

Furthermore, I also stumbled upon a bug while writing doctests for the Tversky loss. The Tversky loss has two parameters, α and β, and Flux internally calculates the value of α as 1-β. The loss is mathematically defined as 1 - Tversky index, and the Tversky index is defined as:

S(P, G; α, β) = |P ∩ G| / ( |P ∩ G| + α|P \ G| + β|G \ P| )

where α and β control the magnitude of penalties for FPs and FNs, respectively.

Flux implements the loss in the following way -

1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)

with the following code -

"""
    tversky_loss(ŷ, y; β = 0.7)

Return the [Tversky loss](https://arxiv.org/abs/1706.05721).
Used with imbalanced data to give more weight to false negatives.
Larger β weigh recall more than precision (by placing more emphasis on false negatives)
Calculated as:
    1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
"""
function tversky_loss(ŷ, y; β = ofeltype(ŷ, 0.7))
    _check_sizes(ŷ, y)
    #TODO add agg
    num = sum(y .* ŷ) + 1
    den = sum(y .* ŷ + β * (1 .- y) .* ŷ + (1 - β) * y .* (1 .- ŷ)) + 1
    1 - num / den
end

Notice how the term (1 .- y) .* ŷ (False Positives) is multiplied by β, whereas it should be multiplied with α (which is 1-β). Similarly, the term y .* (1 .- ŷ) is multiplied with α (that is 1-β), whereas it should be multiplied with β.

This detail makes the loss function behave in a manner opposite to its documentation. For example -

julia> y = [0, 1, 0, 1, 1, 1];

julia> ŷ_fp = [1, 1, 1, 1, 1, 1]; # 2 false positive -> 2 wrong predictions

julia> ŷ_fnp = [1, 1, 0, 1, 1, 0]; # 1 false negative, 1 false positive -> 2 wrong predictions

julia> Flux.tversky_loss(ŷ_fnp, y)
0.19999999999999996

julia> Flux.tversky_loss(ŷ_fp, y) # should be smaller than tversky_loss(ŷ_fnp, y), as FN is given more weight
0.21875

Here, the loss for ŷ_fnp, y should have been larger than the loss for ŷ_fp, y, as the loss should give more weight to, i.e., penalize more heavily, the false negatives (the default β is 0.7; hence it should weigh FNs more), but the exact opposite happens.
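Concretely, the fix is to swap the two coefficients in the denominator so that β multiplies the false-negative term. The snippet below is a sketch of my proposed change (still under review), not the final merged code:

function tversky_loss(ŷ, y; β = ofeltype(ŷ, 0.7))
    _check_sizes(ŷ, y)
    num = sum(y .* ŷ) + 1
    # β now multiplies the false negatives y .* (1 .- ŷ), and α = 1 - β
    # multiplies the false positives (1 .- y) .* ŷ, matching the definition.
    den = sum(y .* ŷ + (1 - β) * (1 .- y) .* ŷ + β * y .* (1 .- ŷ)) + 1
    1 - num / den
end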

Changing the implementation of the loss yields the following results -

julia> y = [0, 1, 0, 1, 1, 1];

julia> ŷ_fp = [1, 1, 1, 1, 1, 1]; # 2 false positive -> 2 wrong predictions

julia> ŷ_fnp = [1, 1, 0, 1, 1, 0]; # 1 false negative, 1 false positive -> 2 wrong predictions

julia> Flux.tversky_loss(ŷ_fnp, y)
0.19999999999999996

julia> Flux.tversky_loss(ŷ_fp, y) # should be smaller than tversky_loss(ŷ_fnp, y), as FN is given more weight
0.1071428571428571

which looks right!

(This bug has not been confirmed by anyone other than me yet, and the fix is still under review.)

Getting started section

Flux has a lot of tutorials and examples, but they are scattered around and are very hard to navigate through. The current “Getting Started” section, “Overview” section, and “Basics” section have valuable information for beginners, but the information is scattered among these three sections.

Additionally, one of these three sections is on Flux’s website, and two are available on the documentation website, making it difficult for newcomers to navigate between these hackable yet basic examples. Instead, these three tutorials could be moved out of their current places and combined under a single section named “Getting Started”, which could then be added to the documentation and linked on the website.

I have started working on this section, and two extensive tutorials have already been added as PRs: one on linear regression (with and without Flux) and one on logistic regression (with and without Flux). These PRs are currently under review and should be merged in the upcoming weeks.


I have had an incredible time contributing to Flux and its neighbor repositories, and I hope to continue these contributions with the same momentum. I have also learned a lot, including that documentation additions require more discussions than code additions.

This work wouldn’t have been possible without my mentor @DhairyaLGandhi and a lot of other FluxML’s maintainers (@ToucheSir, @mcabbott, @CarloLucibello, @darsnack). They have been very patient with my questions and messy PRs 😆


Tweets

Follow me on Twitter :)


Appendix

Pull requests

Flux.jl

Zygote.jl

Optimisers.jl

Functors.jl

MLUtils.jl

Issues / discussions

Flux.jl

Zygote.jl


Bonus


(Same post, but on medium (I am migrating my blogs from medium to my website))
“execvp” system call in Python - Everything you need to know! (2022-03-06)
https://saransh-cpp.github.io/blog/2022/execvp

The exec family of system calls is used to run a command or a code file as a new process image. In Linux, this is done by replacing the process that made the exec call, but in Windows, things are a bit different. In this blog, we will be covering the execvp system call in Python. We will also solve an interesting problem using the things learned throughout this blog!

Problem statement and the motivation

I was recently given an assignment for Operating Systems, which had an interesting question dealing with the execvp system call. Now, there are some implementations for this question available on the internet, but all of them are written in C. In pursuit of a Pythonic implementation, I went on a journey of exploration, and I will use it here as motivation! We will apply the theoretical knowledge of execvp to solve this particular question at the end of this blog! The question -

Write a collection of programs p1, p2 and p3 such that they execute sequentially with the same PID and each program should also print its PID. The user should be able to invoke any combination of these programs to achieve the required functionality. For example - Consider three programs twice, half and square which accept only one integer as argument and does some specific operation. These operations may be like -

$ twice 10 prints 20 and some number which is its PID

$ half 10 prints 5 and some number which is its PID

$ square 10 prints 100 and some number which is its PID

Now the user should be able to combine these programs in any combination to achieve the desired result.

For example -

$ twice square half twice half 10

should calculate half(twice(half(square(twice(10))))) and print 200 as result. It should also print the process ids of each program as it executes. Note that the process-id printed by each of these programs should be the same, in this case.

$ square twice 2

should calculate twice(square(2)) and print 8 as result, and the process id of square and twice, which should be the same. The evaluation order is from left to right

Note that the last argument is integer, and the remaining arguments are the programs to be invoked.

This should be generally applicable to any n number of processes, all of which are written by you.

Documentation

Let us start by going through the documentation of execvp -

os.execvp(file, args)

These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller. Errors will be reported as OSError exceptions.

Key takeaways -

  1. All the exec calls are implemented in the os library.
  2. os.execvp takes in 2 arguments, file and args.
  3. The new process is not a child process; rather, it substitutes for the current process.
  4. No return value.
  5. On UNIX systems, the process ID (PID) remains the same as the caller.
  6. To save you some time: I discovered that file is supposed to be the name of an executable, and args is supposed to include the executable file’s name too. We will look into this in more detail below!

A minimal example

Now that we know some stuff about execvp, let us try using it in our code. The tradition is to start with a “Hello world” program, and we cannot go against tradition. Further, we will need 2 Python files: a caller, and a file that prints “Hello world” -

caller.py

import os

command = ["python", "hello.py"]
os.execvp(command[0], command)

hello.py

print("Hello world")

Notice how execvp in caller.py takes in the name of an executable file (python) as the first argument and the complete command (including the executable file’s name) as the second argument. Running caller.py gives us “Hello world”, everything as expected!

Internals of execvp and playing with process IDs

According to the documentation, execvp should replace the caller process with the new process instead of creating a subprocess in UNIX systems. This means that the process ID should not change when the system call is made. Let us try this out, but first, let us create 3 different files for 3 different functions — square, half, and double

square.py

import os


def square(num):
    print("SQUARE PID:", os.getpid(), "| RESULT:", num ** 2)
    return num ** 2


if __name__ == "__main__":
    square(10)

half.py

import os


def half(num):
    print("HALF PID:", os.getpid(), "| RESULT:", num / 2)
    return num / 2


if __name__ == "__main__":
    half(10)

double.py

import os


def double(num):
    print("DOUBLE PID:", os.getpid(), "| RESULT:", num * 2)
    return num * 2


if __name__ == "__main__":
    double(10)

The only new thing in these files is os.getpid(), which returns the process ID of the process running the file. Let us also modify our caller to execute one of these files —

import os
import sys


print("CALLER PID:", os.getpid())
command = [sys.executable, "double.py"]
os.execvp(command[0], command)

We now check the process ID in our caller too. Additionally, to avoid making a mistake in the executable file’s name, we will be using sys.executable now! Executing caller.py on Windows gives us the following output -

CALLER PID: 3156
DOUBLE PID: 20940 | RESULT: 20

The results are definitely weird as the process IDs are not the same. Let us try the same code in WSL (Windows Subsystem for Linux) -

CALLER PID: 34
DOUBLE PID: 34 | RESULT: 20

The results match the documentation! As the process IDs are the same, the caller process must’ve been replaced with a new process through the execvp system call. Note that we did not have to change the executable file’s name from python to python3, as sys.executable automatically picked it up!

Thus, os.execvp behaves differently on Windows and on UNIX systems. On Windows, it spawns or creates a new process (a child or a subprocess), whereas on a UNIX system, it replaces the original process with the new one!

Developing a CLI

Now that we know how to use execvp in Python, let us move on to our original question! Let us create a file to control everything through the command line —

import os
import sys
import argparse
import subprocess
from half import half
from double import double
from square import square


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-l", "--list", nargs="+", help="<Required> Set flag", required=True
    )
    args = parser.parse_args()

    print("CLI PID:", os.getpid())
    print(args.list)
    print()

    init_val = args.list[-1]
    args.list = [file + ".py" for file in args.list[:-1]]
    args.list.append(init_val)

    for i in range(len(args.list) - 1):
        if args.list[i] == "half.py":
            cmd = [sys.executable] + args.list
            os.execvp(cmd[0], cmd)
        elif args.list[i] == "double.py":
            cmd = [sys.executable] + args.list
            os.execvp(cmd[0], cmd)
        elif args.list[i] == "square.py":
            cmd = [sys.executable] + args.list
            os.execvp(cmd[0], cmd)

This file, when executed, accepts a list of CLI arguments: the function names followed by a number. The last CLI arg must be a number that has to be processed through the various operations. A usage example for this file (using the double, square, and half programs we just wrote) would be —

python cli.py -l double square half double half 10

which should internally translate to —

half(double(half(square(double(10)))))

and should print all the results and relevant process IDs.

Remember, os.execvp does not return. Hence, once a new Python file is called for execution, the flow of control won’t come back to our CLI file. To tackle this, we must add additional code to our operation files which would call the next file without returning to cli.py.

Modifying the operation files

The code in every operation file should be modified by adding —

if __name__ == "__main__":

    n = operation(float(sys.argv[-1]))
    sys.argv[-1] = str(n)

    if sys.argv[1] not in ["double.py", "half.py", "square.py"]:
        print()
        print("FINAL PID:", os.getpid(), "| FINAL RESULT:", sys.argv[1])
        sys.exit(0)

    cmd = [sys.executable] + sys.argv[1:]
    os.execvp(cmd[0], cmd)

where operation is either double, half, or square.

The code takes in the last CLI argument using sys.argv[-1] and passes it into the relevant operation function. This last argument is then replaced with the obtained result, and all the CLI args except the first one are passed into os.execvp, which calls the next operation file!

In between, we also need to add a condition to exit if we have reached the last argument, which is a number. This last argument would be the final result as we would have processed all the arguments (function names) before that!

Let us modify every operation file —

double.py

import os
import sys


def double(num):
    print("DOUBLE PID:", os.getpid(), "| RESULT:", num * 2)
    return num * 2


if __name__ == "__main__":

    n = double(float(sys.argv[-1]))
    sys.argv[-1] = str(n)

    if sys.argv[1] not in ["double.py", "half.py", "square.py"]:
        print()
        print("FINAL PID:", os.getpid(), "| FINAL RESULT:", sys.argv[1])
        sys.exit(0)

    cmd = [sys.executable] + sys.argv[1:]
    os.execvp(cmd[0], cmd)

square.py

import os
import sys


def square(num):
    print("SQUARE PID:", os.getpid(), "| RESULT:", num ** 2)
    return num ** 2


if __name__ == "__main__":

    n = square(float(sys.argv[-1]))
    sys.argv[-1] = str(n)

    if sys.argv[1] not in ["double.py", "half.py", "square.py"]:
        print()
        print("FINAL PID:", os.getpid(), "| FINAL RESULT:", sys.argv[1])
        sys.exit(0)

    cmd = [sys.executable] + sys.argv[1:]
    os.execvp(cmd[0], cmd)

half.py

import os
import sys


def half(num):
    print("HALF PID:", os.getpid(), "| RESULT:", num / 2)
    return num / 2


if __name__ == "__main__":

    n = half(float(sys.argv[-1]))
    sys.argv[-1] = str(n)

    if sys.argv[1] not in ["double.py", "half.py", "square.py"]:
        print()
        print("FINAL PID:", os.getpid(), "| FINAL RESULT:", sys.argv[1])
        sys.exit(0)

    cmd = [sys.executable] + sys.argv[1:]
    os.execvp(cmd[0], cmd)

These files will now call each other repeatedly, and the whole system works without ever returning to a previous file!

Final results

Running the following —

python cli.py -l double square half double half 10

on Windows results in —

CLI PID: 7376
['double', 'square', 'half', 'double', 'half', '10']
DOUBLE PID: 32496 | RESULT: 20.0
SQUARE PID: 15144 | RESULT: 400.0
HALF PID: 4860 | RESULT: 200.0
DOUBLE PID: 24928 | RESULT: 400.0
HALF PID: 27868 | RESULT: 200.0
FINAL PID: 27868 | FINAL RESULT: 200.0

IT WORKS! The program gives us the desired output! Notice how the PIDs are not the same, something that was discussed in great detail above.

Running the following —

python3 cli.py -l double square half double half 10

on Windows Subsystem for Linux results in —

CLI PID: 33
['double', 'square', 'half', 'double', 'half', '10']
DOUBLE PID: 33 | RESULT: 20.0
SQUARE PID: 33 | RESULT: 400.0
HALF PID: 33 | RESULT: 200.0
DOUBLE PID: 33 | RESULT: 400.0
HALF PID: 33 | RESULT: 200.0
FINAL PID: 33 | FINAL RESULT: 200.0

THIS WORKS TOO! The program gives us the desired output again! This time the PIDs stay the same as no subprocesses were created!

Summary

In the above blog, we learned how to use a system call belonging to the exec family of system calls. We further saw how the behavior of execvp differs between Windows and UNIX systems. In the end, we solved a question that had no Pythonic solution on the internet! :)


(Same post, but on medium (I am migrating my blogs from medium to my website))
Stand back. We are trying system calls in Python.

Ingesting Reddit memes into Elasticsearch using node.js - Locally and on Google Cloud (2022-01-17)
https://saransh-cpp.github.io/blog/2022/elk-node
NodeJS, ELK stack, and Google Cloud!

Covering unit-tests running in sub-processes/threads on GitHub Actions using coverage.py (2021-12-29)
https://saransh-cpp.github.io/blog/2021/cov-test
A neat trick ;)

Implementing logistic regression as a neural network from scratch (2021-10-17)
https://saransh-cpp.github.io/blog/2021/lin-reg-python
A single neuron neural network from scratch!