How to work on an open-source project: "Make small PRs and move fast"

2025. 2. 22. 09:42 · US PhD study abroad

office
somewhere in bellevue

 

I have wanted to work on open-source projects for a long time. Open-sourcing is a clear trend in systems, especially in cloud-native and LLM-related systems. There are many good blog posts about why working on open-source projects is fruitful, so I will skip that part here. Instead, this post focuses on the "how": how to work on open-source projects from an efficiency and productivity perspective, and in particular why you need to make small PRs (pull requests) and move fast.

 

"Make small PRs (pull requests) and move fast"

Background

I am working on top of our team's open-source project. The first work item was running some benchmark experiments. We were trying to make a submission, so I decided to focus on getting numbers ASAP. I modified the benchmark code, the client code (workload generator), and related parts, and I ended up making many modifications to the original client code. I didn't make any PRs since PRs had nothing to do with the submission. I was also not making changes incrementally: I needed to change different parts in parallel to make things work. This made it even harder to make PRs, because each PR should work on its own. In other words, what would the title of such a PR be? It can't be "it is not working but improving" or "making some changes in func A and some in func B". Instead, it should be clear, for example "Handling failed request in client" or "Added more prints for debugging (completion ratio while running the experiment)".

 

I was doing everything in parallel. After finishing the experiment, when I wanted to make PRs for my changes, it had become too much work, and someone else took it over since I needed to move on to other things. Then the work is not credited to you, since it is not your PR. People outside the team will not know that it was actually done by me and just committed by someone else. It is fine since it was not the main feature PR, but it is definitely not right.

 

What's the solution? What should I have done?

First, the work should have been broken down into sub-items so that each could be merged into the main branch incrementally and separately. Each one should have been merged as soon as its sub-task was done.

 

Why small PRs

Why do you need to make small PRs? Some might think that a PR with a more complete code update would be better since it is going into the main branch, right? I thought so too. The main branch is important and affects all users! However, you still need to make small PRs and move fast.

 

Good things about small PRs

- It is less overhead for reviewers. Checking whether the code is good enough for the main branch is important (good: working and nicely written), and a small PR makes that check easier.

 

- It shows your ability to break down a big change into small, reasonable PRs. Other contributors and code reviewers will like it, and you!

 

- It reduces rebasing. Merging is a race! If you move fast, you need to rebase onto main less often, which saves A LOT of your time. In other words, if you work slowly and other PRs are merged while you are still working on yours, you need to rebase onto them. The earlier you merge, the less work.

 

- When your mental model is tuned to making small PRs frequently, it kind of forces you to think about how to make your code compatible with the rest of the codebase at the time you write it. This is because you are already thinking of your code as part of the main branch. It sounds obvious but it is easy to miss. You need to be more conscious about it.

 

All that being said, one simple thing you should do is "Make a PR every day". It will simply make you work in the above style.

 

Examples from other popular open-source systems projects

As an example, this is the Ray repo's commit history on the main branch for a single day (Feb 23rd 2025, Friday). Ray has about 24,000 commits in total so far. On this single day, there are roughly 20 commits, about one commit every hour. The amount of rebasing you need to do grows every hour. Even if there is no conflict, you need to rebase. If there is a conflict, oh no...

 

 

vLLM has about the same number of commits per day as well.
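To put rough numbers on this, here is a small Python sketch. It assumes commits land on main at a uniform rate of about 20 per day (the rough figure from the Ray example above), which is a simplification. The function name and the PR lifetimes are hypothetical, chosen only for illustration.

```python
def commits_landed_while_open(commits_per_day: float, days_open: float) -> float:
    """Expected number of commits merged into main while a PR stays open.

    Assumes a uniform merge rate, which is a simplification.
    """
    return commits_per_day * days_open


COMMITS_PER_DAY = 20  # rough figure from the Ray example above

# Hypothetical PR lifetimes, in days.
for days_open in (0.5, 1, 5, 10):
    behind = commits_landed_while_open(COMMITS_PER_DAY, days_open)
    print(f"PR open for {days_open} day(s): about {behind:.0f} commits to rebase onto")
```

Under these assumptions, a PR that stays open for five days falls about 100 commits behind main, while one merged within half a day falls only about 10 behind.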

 

 

 

There is obviously some overhead in making small PRs

- Your PR should pass all CI checks: DCO commit message checking, lint, build, unit tests, e2e tests.

- Also, you should squash your commits if needed, which requires a force push.

- etc.

However, this overhead goes down once you get used to the process. And the overhead of making a small PR is less than the overhead of making a big PR.
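As an illustration of the DCO commit message check mentioned above, here is a minimal Python sketch. A DCO sign-off is a line of the form "Signed-off-by: Name <email>" in the commit message. The function name and the sample messages are hypothetical, and this sketch only checks that such a line is present. A real DCO check in CI is usually stricter (for example, it may also compare the sign-off against the commit author).

```python
import re

# A DCO sign-off is a line of the form "Signed-off-by: Name <email>".
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the message contains at least one sign-off line."""
    return SIGN_OFF.search(commit_message) is not None


# Hypothetical commit messages for illustration.
signed = (
    "Handle failed requests in client\n"
    "\n"
    "Signed-off-by: Jane Doe <jane@example.com>\n"
)
unsigned = "Handle failed requests in client\n"

print(has_sign_off(signed))    # True
print(has_sign_off(unsigned))  # False
```

In practice, `git commit -s` adds the sign-off line for you, so running a check like this locally is mostly useful for catching a forgotten `-s` before CI does.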