How Python is Developed
National Science Foundation
May 11, 2004
A.M. Kuchling
www.amk.ca
amk @ amk.ca
Python will be examined as a detailed example. Other groups
(Linux, Apache, etc.) have similar processes.
Python overview
- Agile programming language
- Designed/implemented by Guido van Rossum
- First implemented around 1991
- Applications: scripts, numeric programming,
Web tasks, GUI applications, teaching...
- Copyrights held by non-profit Python Software Foundation
- License allows commercial use, including closed-source derivatives.
- Some commercial software has Python inside it
- See www.python.org for more info.
Good software engineering practices
- Use a version control system.
- Write specifications before code.
- Separate systems into independent modules.
- Review code for correctness.
- Provide new developers with mentoring.
- Use tools to track the code.
The code: Structure of core Python
- Language interpreter
- C extension modules
- regular expressions, POSIX interfaces, math functions,
Unicode data, data compression, date/time types, Tk support
- ~132,000 lines of C
- Python modules
- XML parsing, Internet protocols, file reading
- building packages, development environment
- utilities, portability, ...
- ~200,000 lines of Python
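The split between interpreter, C extension modules, and Python modules is visible from inside a running interpreter. A minimal sketch, assuming a modern CPython 3 and its importlib machinery (which did not exist in this form in 2004); classify_module is a hypothetical helper, and the labels it returns are only a rough guide:

```python
import importlib.util


def classify_module(name):
    """Return a rough label for where an importable module's code lives."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    if spec.origin is None:
        return "namespace"    # namespace package, no single file
    if spec.origin == "built-in":
        return "builtin"      # C code compiled into the interpreter
    if spec.origin == "frozen":
        return "frozen"       # Python code embedded in the interpreter binary
    if spec.origin.endswith((".py", ".pyc")):
        return "python"       # ordinary Python module in the library
    return "extension"        # separately compiled C extension module


for name in ("sys", "math", "xmlrpc.client"):
    print(name, "->", classify_module(name))
```

Whether a C module such as math is reported as "builtin" or "extension" depends on how the interpreter was built.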
The code: Version control
Why version control?
- Makes it easy to roll back changes
... revert to a previous version
... determine the history of a file (down to the last time
a line of code was modified)
... determine what you've changed.
- Helps resolve conflicts when multiple people are editing the same code.
- Provides access control
- Anyone can see the current code
- Making changes is restricted to the developers
Version control needs network capability.
Python uses CVS. Perl uses Perforce; Linux uses BitKeeper.
Subversion is up-and-coming.
The code: Change notifications
Changes are sent to the python-checkins mailing list
- ~100 readers
- Usually around 10 e-mails per day.
- Occasionally hundreds of e-mails (major reorganizations)
Index: xmlrpclib.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/xmlrpclib.py,v
retrieving revision 1.22
diff -u -r1.22 xmlrpclib.py
--- xmlrpclib.py 1 Nov 2002 17:14:16 -0000 1.22
+++ xmlrpclib.py 9 Jan 2003 18:36:50 -0000
@@ -550,11 +550,12 @@
- def __init__(self, encoding=None):
+ def __init__(self, encoding=None, allow_none=False):
self.memo = {}
self.data = None
self.encoding = encoding
-
+ self.allow_none = allow_none
+
Purposes:
- Provides second level of review.
- Keeps developers aware of which sections of code are changing.
Python reviews after commit. Mozilla reviews before commit.
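The notification e-mail above is a unified diff. A minimal sketch of producing one with the standard difflib module; this is not how CVS generates its notifications, and the old and new snippets (including their indentation) are reconstructed from the changed lines shown above as an assumption:

```python
import difflib

# Old and new versions of the method, as lists of lines.
old = [
    "def __init__(self, encoding=None):\n",
    "    self.memo = {}\n",
    "    self.data = None\n",
    "    self.encoding = encoding\n",
]
new = [
    "def __init__(self, encoding=None, allow_none=False):\n",
    "    self.memo = {}\n",
    "    self.data = None\n",
    "    self.encoding = encoding\n",
    "    self.allow_none = allow_none\n",
]

# unified_diff yields header lines, a hunk header, then
# context (' '), removed ('-'), and added ('+') lines.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="xmlrpclib.py", tofile="xmlrpclib.py")
)
print("".join(diff_lines), end="")
```

Reviewers on the list read the '-' and '+' lines to see exactly what a commit changed.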
The code: Rules for committing changes
- Stability is important; don't leave the tree in a broken state.
- If in doubt, record your patch in the SF patch
manager and get it reviewed.
- ... especially if you've just been granted CVS write access.
- Run the test suite before committing.
- If you're fixing a bug, add a test that would have caught it.
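The last rule can be sketched with the standard unittest module. Everything here is hypothetical: split_header and the bug it once had are invented for illustration and are not taken from the Python source tree.

```python
import unittest


def split_header(line):
    """Split 'Name: value' into (name, value).

    Hypothetical bug: an earlier version called line.split(":") and
    failed on values containing a colon. The fix limits the split to one.
    """
    name, value = line.split(":", 1)
    return name.strip(), value.strip()


class SplitHeaderTests(unittest.TestCase):
    def test_simple_header(self):
        self.assertEqual(
            split_header("Title: Style Guide"), ("Title", "Style Guide")
        )

    def test_value_containing_colon(self):
        # Regression test: this input raised ValueError before the fix.
        self.assertEqual(
            split_header("Title: Imports: Multi-Line and Absolute/Relative"),
            ("Title", "Imports: Multi-Line and Absolute/Relative"),
        )


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitHeaderTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test stays in the suite after the fix, so the same bug cannot return unnoticed.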
Usually, if you check out a copy of the CVS trunk, it will
compile and run just fine.
Some projects have their CVS tree broken or difficult to use for
long stretches (e.g. GNOME).
Source distribution includes 277 test scripts.
The code: Tracking bugs and patches
Planning: The python-dev list
python-dev is the mailing list where the developers of the Python
core congregate.
Currently has ~600 subscribers, but most of them are
lurkers.
~10 people perform the bulk of the work.
~40 people contribute intermittent assistance.
~60-100 offer opinions.
Mailing lists are the primary communication channel; there are dozens, possibly hundreds, of them.
IM is not used much in Python development, though the PSF directors use it.
Where it is used, it is mostly for assisting users or for chats between a few individuals.
Large meetings are difficult (time zones, keeping the meeting on track).
Sprints are face-to-face meetings where developers work on a focused set
of tasks, often with an agenda. They usually happen at conferences but
can also be standalone.
Planning: Day-to-day and long-term
van Rossum is the Benevolent Dictator For Life.
- In theory, has final say on all design decisions, and on
whether code is included in the core.
- In practice, he'll often defer the decision to someone else
responsible for the given area.
There's an informal voting process inspired by Apache's voting
scheme:
- +1 indicates that the poster is in favor of the suggestion.
- -1 indicates they're against it.
- +0 indicates "I don't care, but go ahead".
- -0 means "I don't care, so why bother?".
- GvR gets +/- infinity.
Also described in PEP 10.
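A minimal sketch of tallying such votes. The process is informal, so the counting rule here (only +1 and -1 carry weight) is an assumption made for illustration; tally and its bdfl_vote parameter are hypothetical names.

```python
from collections import Counter

VALID_VOTES = ("+1", "+0", "-0", "-1")


def tally(votes, bdfl_vote=None):
    """Summarize informal votes; a BDFL vote, if given, decides the outcome."""
    counts = Counter()
    for vote in votes:
        if vote not in VALID_VOTES:
            raise ValueError(f"unrecognized vote: {vote!r}")
        counts[vote] += 1

    if bdfl_vote is not None:
        # GvR's vote counts as +/- infinity, so it overrides everything else.
        if bdfl_vote not in ("+1", "-1"):
            raise ValueError(f"unrecognized BDFL vote: {bdfl_vote!r}")
        outcome = "accepted" if bdfl_vote == "+1" else "rejected"
    else:
        # Assumption for this sketch: +0 and -0 do not change the balance.
        net = counts["+1"] - counts["-1"]
        outcome = "favored" if net > 0 else "opposed" if net < 0 else "undecided"
    return counts, outcome


counts, outcome = tally(["+1", "+1", "-0", "-1"])
print(dict(counts), outcome)
print(tally(["+1", "+1"], bdfl_vote="-1")[1])
```

In practice the votes are a way to gauge opinion on the list, not a binding count.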
Torvalds is another BDFL. Perl has a BDFL role that rotates.
Special Interest Groups (SIGs)
Current and past SIGs:
- I18N SIG
- Produced Unicode string type, localization tools
- Matrix SIG
- Produced numeric array data type, minor core language changes
- Database SIG (ongoing)
- Web SIG (ongoing)
Python Enhancement Proposals (PEPs)
- PEPs are documents describing a proposed change:
- Documentation
- Design rationale
- Alternative designs, and why they weren't used.
- Modeled on IETF RFCs.
- Available from www.python.org/peps/
- Requiring a PEP imposes some rigidity and contemplation.
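Like RFCs, PEPs are plain text, and each one starts with an RFC 822 style header block. A minimal sketch of reading such a header with the standard email package; the sample text is illustrative, built from an entry in the PEP index, and is not a verbatim copy of any published PEP:

```python
from email import message_from_string

# Illustrative header and body, not the real text of PEP 8.
SAMPLE_PEP = """\
PEP: 8
Title: Style Guide for Python Code
Author: GvR, Warsaw

Introduction

    This document gives coding conventions ...
"""

# The header block ends at the first blank line; the rest is the body.
pep = message_from_string(SAMPLE_PEP)
authors = [name.strip() for name in pep["Author"].split(",")]
body = pep.get_payload()

print(pep["PEP"], "-", pep["Title"])
print("Authors:", authors)
```

Because the headers are machine-readable, an index of all PEPs can be generated from the documents themselves.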
Some example PEPs
   num  title                                      owner
   ---  -----                                      -----
I    0  Index of Python Enhancement Proposals      Warsaw
I    1  PEP Guidelines                             Warsaw, Hylton
I    2  Procedure for Adding New Modules           Faassen
I    3  Guidelines for Handling Bug Reports        Hylton
I    6  Bug Fix Releases                           Aahz
I    7  Style Guide for C Code                     GvR
I    8  Style Guide for Python Code                GvR, Warsaw
   ...
SF 201  Lockstep Iteration                         Warsaw
SF 202  List Comprehensions                        Warsaw
SF 203  Augmented Assignments                      Wouters
   ...
S  302  New Import Hooks                           JvR
S  303  Extend divmod() for Multiple Divisors      Bellman
S  320  Python 2.4 Release Schedule                Warsaw, Hettinger
S  327  Decimal Data Type                          Batista
S  328  Imports: Multi-Line and Absolute/Relative  Aahz
For each release, there's a PEP listing the new features and giving
a planned schedule.
- This means a release doesn't drag out endlessly.
- Unplanned patches are still accepted, but the
release won't be delayed for their sake.
Community: Scheduling
- Unscheduled -- "It'll be done when we feel like it"
- By features -- "It'll be done when X, Y, and Z are done"
- By date -- "Release every N months; what's done is done"
Forks: Experimenting and adapting
Anyone can take the Python code and make their own version.
Well-known Python forks:
- CPython: the original
- Jython: Python implemented in Java
- IronPython: Python on CLR, implemented in C#
Jython: reimplementation of Python in Java. Uses standard library from CPython. Dana Moore will talk about it later.
IronPython: presented at the recent PyCon; a CLR implementation that is faster than CPython
for some things.
These forks let the community explore new implementation styles. Some forks die (e.g. Vyper),
some find a user community (Stackless),
some become parallel (Jython).
Evolution in action: if CLR takes over the world, Python will adjust to it.
Good software engineering practices
- Use a version control system.
- Write specifications before code.
- PEPs, python-dev discussion
- Separate systems into independent modules.
- core/extension module/library separation
- Review code for correctness.
- open source, python-checkins
- Provide new developers with mentoring.
- python-dev discussion, PEPs describing procedures
- Use tools to help manage development
- mailing list manager, CVS, bug/patch trackers
Concluding points
- Open source follows good software engineering practices
- Open source development is managed
- in some ways, loosely
- distributed team
- changeable schedules
- weak ownership of code
- in other ways, tightly
- good communication
- automated tracking tools
- code reviews
- PEPs
Loosely: you can't make people do things. Tightly: the distributed nature requires tools
for good communication.
SE practices: this is really in self-defense; projects which don't follow them don't survive very long.