This repository contains questions and answers for Senior and Lead Python developers. The material is divided into several parts:
- Python Technical Questions
- PostgreSQL Questions
- Common Questions for Senior and Lead Developers
- Release Strategy
- Code Standards
- Cross-discipline Questions
- Mutable and Immutable Objects
- Ways to execute Python code: exec, eval, ast, code, codeop, etc.
- Advanced differences between 2.x and 3.x in general
- deepcopy, method copy(), slicing, etc.
- OrderedDict, DefaultDict
- hashable()
- Strong and weak typing
- Frozenset
- Weak references
- Raw strings
- Unicode and ASCII strings
- Python Statements and Syntax
- Functions in Python
- Scopes in Python
- Modules in Python
- OOP in Python
- SOLID
- The four basics of object-oriented programming
- abstract base class
- getattr(), setattr()
- __getattr__, __setattr__, __delattr__, __getattribute__
- Name mangling
- @property(getter, setter, deleter)
- __init__, __repr__, __str__, __cmp__, __new__, __del__, __hash__, __nonzero__, __unicode__, class operators
- Rich comparison methods
- __call__
- Multiple inheritance
- Classic algorithm
- Diamond problem
- MRO, super
- Mixins
- metaclass definition
- type(), isinstance(), issubclass()
- __slots__
- Troubleshooting in Python
- Unit testing in Python
- Memory management in Python
- Threading and multiprocessing in Python
- Distributing and documentation in Python
- Python and C interaction
- Python tools
Mutable objects:
- list, dict, set, bytearray

Immutable objects:
- int, float, complex, string
- tuple (the "value" of an immutable object can't change, but its constituent objects can)
- frozenset (an immutable version of set)
- bytes
- Python handles mutable and immutable objects differently.
- Immutable objects are quicker to access than mutable objects.
- Mutable objects are great to use when you need to change the size of the object, e.g. list, dict. Immutable objects are used when you need to ensure that the object will always stay the same.
- Immutable objects are fundamentally expensive to "change", because doing so involves creating a copy. Changing mutable objects is cheap.
It's important to know the difference between mutable and immutable types and how they are treated when passed to functions, because memory efficiency is highly affected when the proper objects are used.
For example, if a mutable object is passed by reference into a function, the function can change the original variable itself. To avoid this, the original variable needs to be copied to another variable first. Immutable objects can be passed by reference safely, because their value cannot be changed anyway.
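A minimal illustration of that difference (the function and variable names here are arbitrary):

def append_item(target):
    target.append(4)      # mutates the caller's list in place

def increment(value):
    value += 1            # rebinds the local name only
    return value

items = [1, 2, 3]
append_item(items)
print(items)              # [1, 2, 3, 4] - the original list changed

counter = 10
increment(counter)
print(counter)            # 10 - the original int is untouched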
The exec(object, globals, locals) method executes a dynamically created program, which is either a string or a code object. It returns None; only its side effects matter!
Example 1:
program = 'a = 5\nb=10\nprint("Sum =", a+b)'
exec(program)
Sum = 15
Example 2:
globals_parameter = {'__builtins__' : None}
locals_parameter = {'print': print, 'dir': dir}
exec('print(dir())', globals_parameter, locals_parameter)
['dir', 'print']
The eval(expression, globals=None, locals=None)
method parses the expression passed to this method and runs python expression (code) within the program. Returns the value of expression!
>>> a = 5
>>> eval('37 + a') # it is an expression
42
>>> exec('37 + a') # it is an expression statement; value is ignored (None is returned)
>>> exec('a = 47') # modify a global variable as a side effect
>>> a
47
>>> eval('a = 47') # you cannot evaluate a statement
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
a = 47
^
SyntaxError: invalid syntax
If a code object (which contains Python bytecode) is passed to exec or eval, they behave identically, except that exec ignores the return value and always returns None. So it is possible to use eval to execute something that has statements, if you compile it into bytecode first instead of passing it as a string:
>>> eval(compile('if 1: print("Hello")', '<string>', 'exec'))
Hello
>>>
Abstract Syntax Trees
, ASTs, are a powerful feature of Python. You can write programs that inspect and modify Python code, after the syntax has been parsed, but before it gets compiled to byte code. That opens up a world of possibilities for introspection, testing, and mischief.
In addition to compiling source code to bytecode, compile supports compiling abstract syntax trees (parse trees of Python code) into code objects, and source code into abstract syntax trees (ast.parse is written in Python and just calls compile(source, filename, mode, PyCF_ONLY_AST)). These are used, for example, for modifying source code on the fly, and also for dynamic code creation, as it is often easier to handle the code as a tree of nodes instead of lines of text in complex cases.
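As a small sketch (using only documented ast and compile behavior; ast.dump's indent argument needs Python 3.9+):

import ast

source = "x = 1 + 2\nprint(x)"
tree = ast.parse(source)                   # source code -> AST
print(ast.dump(tree, indent=4))            # inspect the parsed nodes

code_obj = compile(tree, "<ast>", "exec")  # AST -> code object
exec(code_obj)                             # prints 3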
The code
module provides facilities to implement read-eval-print loops in Python. Two classes and convenience functions are included which can be used to build applications which provide an interactive interpreter prompt.
The codeop
module provides utilities upon which the Python read-eval-print loop can be emulated, as is done in the code
module. As a result, you probably don't want to use the module directly; if you want to include such a loop in your program you probably want to use the code module instead.
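For instance, a minimal embedded prompt might look like this (a sketch using code.interact; the namespace contents are arbitrary):

import code

namespace = {"answer": 42}
# Opens an interactive prompt that shares `namespace`;
# exit it with Ctrl-D (or Ctrl-Z + Enter on Windows).
code.interact(banner="Embedded console", local=namespace)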
If we are porting code between Python 2.x and Python 3.x, it can be dangerous if integer division changes go unnoticed (since no error is raised). It is preferred to use a floating value (like 7.0/5 or 7/5.0) to get the expected result when porting code.
This is the most well-known change: the print keyword in Python 2.x is replaced by the print() function in Python 3.x. However, parentheses work in Python 2 if a space is added after the print keyword, because the interpreter evaluates it as an expression.
In Python 2, the implicit str type is ASCII; in Python 3.x the implicit str type is Unicode.
xrange() of Python 2.x doesn't exist in Python 3.x. In Python 2.x, range returns a list, i.e. range(3) returns [0, 1, 2], while xrange returns an xrange object, i.e. xrange(3) returns a lazy iterator-like object that generates numbers only when needed (in Python 3.x, range behaves like xrange).
There is a small change in error handling between the versions: in Python 3.x, the 'as' keyword is required when binding the exception (except ValueError as e).
The idea of the future module is to help migrate to Python 3.x. If we are planning to have Python 3.x support in our 2.x code, we can use future imports in our code.
Six is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the Python versions with the goal of writing Python code that is compatible on both Python versions. See the documentation for more information on what is provided.
The copy() method returns a shallow copy of a list, while deepcopy() returns a deep copy of a list.
The Python slice() function returns a slice object. A sequence of any type (string, bytes, tuple, list, or range), or any object that implements the __getitem__() and __len__() methods, can be sliced.
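A short sketch of the difference between a shallow and a deep copy (the data is arbitrary):

import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)     # new outer list, inner lists shared
deep = copy.deepcopy(original)    # everything copied recursively

original[0].append(99)
print(shallow[0])   # [1, 2, 99] - shares the inner list
print(deep[0])      # [1, 2]     - fully independent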
An OrderedDict is a dictionary subclass that remembers the order in which keys were first inserted. Historically, the difference between dict() and OrderedDict() was that a regular dict didn't track insertion order (iterating it gave the values in an arbitrary order), while OrderedDict preserved it. Since Python 3.7 regular dicts also preserve insertion order, but OrderedDict still offers order-sensitive equality and helpers such as move_to_end().
defaultdict is a container, like dictionaries, found in the collections module. defaultdict is a subclass of the dictionary class that returns a dictionary-like object. The functionality of both dictionaries and defaultdict is almost the same, except that defaultdict never raises a KeyError: it provides a default value for a key that does not exist.
from collections import defaultdict

def def_value():
    return "Not Present"

d = defaultdict(def_value)
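Continuing the snippet above, a missing key is filled by the factory instead of raising KeyError; int is another common factory:

print(d["missing"])          # "Not Present" - and the key is now stored

from collections import defaultdict

counts = defaultdict(int)    # int() -> 0 is the default value
for word in ["a", "b", "a"]:
    counts[word] += 1
print(counts)                # defaultdict(<class 'int'>, {'a': 2, 'b': 1})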
An object is hashable if it has a hash value that does not change during its entire lifetime. Python objects have a built-in __hash__() method whose result is used by hash-based collections; for comparison the object also needs an __eq__() (or, in Python 2, __cmp__()) method, and hashable objects that compare equal must have the same hash value. All immutable built-in objects in Python are hashable, like tuples, while mutable containers like lists and dictionaries are not.
- lambda and user-defined functions are hashable.
- Values produced by hash() are irreversible, so hashing loses information.
- hash() returns a hash value only for immutable objects, so it can be used as an indicator to check whether an object is mutable or immutable.
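A quick illustration of those points:

print(hash((1, 2, 3)))     # tuples of immutables are hashable
print(hash("interview"))   # strings are hashable

try:
    hash([1, 2, 3])        # lists are mutable, hence unhashable
except TypeError as exc:
    print(exc)             # unhashable type: 'list'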
Python is strongly, dynamically typed.
- Strong typing means that the type of value doesn't change in unexpected ways. A string containing only digits doesn't magically become a number, as may happen in Perl. Every change of type requires an explicit conversion.
- Dynamic typing means that runtime objects (values) have a type, as opposed to static typing where variables have a type.
bob = 1
bob = "bob"
This works because the variable does not have a type; it can name any object. After bob = 1
, you'll find that type(bob)
returns int
, but after bob = "bob"
, it returns str
.
The frozenset()
function returns an immutable frozenset object initialized with elements from the given iterable.
Frozen set is just an immutable version of a Python set
object. While elements of a set can be modified at any time, elements of the frozen set remain the same after creation.
Due to this, frozen sets can be used as keys in a dictionary or as elements of another set. But like sets, they are not ordered (the elements appear in arbitrary order and cannot be accessed by index).
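For example (the data here is arbitrary):

routes = {
    frozenset({"A", "B"}): 10,   # usable as a dictionary key
    frozenset({"B", "C"}): 25,
}
print(routes[frozenset({"B", "A"})])   # 10 - order inside the set doesn't matter

fs = frozenset([1, 2, 3])
try:
    fs.add(4)                          # no mutating methods exist
except AttributeError as exc:
    print(exc)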
Python contains the weakref
module that creates a weak reference to an object. If there are no strong references to an object, the garbage collector is free to use the memory for other purposes.
Weak references are used to implement caches and mappings that contain massive data.
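A minimal sketch of a weak reference (the collection behavior noted in the comments assumes CPython's reference counting):

import weakref

class Node:
    pass

node = Node()
ref = weakref.ref(node)   # does not increase the reference count
print(ref() is node)      # True - dereference while the object is alive

del node                  # last strong reference is gone
print(ref())              # None - in CPython the object is collected immediately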
A Python raw string is created by prefixing a string literal with 'r' or 'R'. A raw string treats the backslash (\) as a literal character. This is useful when we want a string that contains backslashes and don't want them to be treated as escape characters.
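For example:

path = "C:\new\table.txt"        # \n and \t are interpreted as escape sequences
raw_path = r"C:\new\table.txt"   # backslashes are kept literally

print(len(path), len(raw_path))  # 14 16
print(raw_path)                  # C:\new\table.txt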
Unicode is an international standard that maintains a mapping between individual characters and unique numbers. As of May 2019, the most recent version of Unicode is 12.1, which contains over 137k characters covering scripts such as English, Hindi, Chinese and Japanese, as well as emojis. Each of these 137k characters is represented by a unicode code point, so unicode code points refer to the actual characters that are displayed. These code points are encoded to bytes and decoded from bytes back to code points. Examples: the unicode code point for the letter a is U+0061, for the emoji 🖐 it is U+1F590, and for Ω it is U+03A9.
The main takeaways in Python are:
- Python 2 uses the str type to store bytes and the unicode type to store unicode code points. By default all strings are str type (i.e. bytes), and the default encoding is ASCII. So if an incoming file contains Cyrillic characters, Python 2 may fail, because ASCII cannot handle them; in that case we have to remember to call decode("utf-8") while reading the file. This is inconvenient.
- Python 3 fixed this. Strings are still str type by default, but they now mean unicode code points instead; we carry what we see. If we want to store these str strings in files, we use the bytes type instead. The default encoding is UTF-8 instead of ASCII. Perfect!
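A short sketch of the round trip between code points and bytes:

text = "Привет, Ω"              # str: a sequence of unicode code points
data = text.encode("utf-8")     # bytes: how the text is stored on disk or sent over the wire
print(data)                     # b'\xd0\x9f\xd1\x80...'
print(data.decode("utf-8"))     # back to the original str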
Technically, in Python, an iterator is an object which implements the iterator protocol, which consist of the methods __iter__()
and __next__()
.
Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly.
yield is a keyword that is used like return, except the function will return a generator.
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object; this is a bit tricky.
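A minimal sketch of that laziness (the function name is arbitrary):

def countdown(n):
    print("body starts running only on the first next() call")
    while n > 0:
        yield n        # execution pauses here until the next next()/send()
        n -= 1

gen = countdown(3)      # nothing is printed yet - we only got a generator object
print(list(gen))        # runs the body: prints the message, then [3, 2, 1]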
send() - sends a value into the generator; the generator must first be primed by calling send(None) (or next()) once after it is created.
def double_number(number):
    while True:
        number *= 2
        number = yield number
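Using double_number above, the generator is primed with send(None) (or next()) and then receives values:

gen = double_number(3)
print(gen.send(None))   # prime the generator; runs to the first yield -> 6
print(gen.send(10))     # 10 is sent in and doubled -> 20
print(gen.send(7))      # -> 14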
throw() - raises a custom exception inside the generator at the point where it is suspended. Useful, for example, for databases:
def add_to_database(connection_string):
    db = mydatabaselibrary.connect(connection_string)
    cursor = db.cursor()
    try:
        while True:
            try:
                row = yield
                cursor.execute('INSERT INTO mytable VALUES(?, ?, ?)', row)
            except CommitException:
                cursor.execute('COMMIT')
            except AbortException:
                cursor.execute('ABORT')
    finally:
        cursor.execute('ABORT')
        db.close()
Coroutines declared with the async
/await
syntax is the preferred way of writing asyncio applications. For example, the following snippet of code (requires Python 3.7+) prints "hello", waits 1 second, and then prints "world":
>>> import asyncio
>>> async def main():
... print('hello')
... await asyncio.sleep(1)
... print('world')
>>> asyncio.run(main())
hello
world
Structural pattern matching with match
and case
statements is a powerful feature introduced in Python 3.10. It allows for more elegant and readable code when dealing with complex data structures.
- Pattern matching for sequences, mappings, and objects
- Guards and capture patterns
- Example:
def process_command(command):
    match command.split():
        case ["go", direction]:
            return f"Moving {direction}"
        case ["pick", "up", item]:
            return f"Picking up {item}"
        case ["quit"]:
            return "Quitting"
        case _:
            return "Unknown command"
Exception Groups provide a way to handle multiple exceptions simultaneously, making error handling more robust and flexible.
- Handling multiple exceptions simultaneously
- Exception group hierarchy
- except* syntax for handling exception groups
- Example:
try:
    raise ExceptionGroup("group", [
        ValueError("invalid value"),
        TypeError("invalid type")
    ])
except* ValueError as e:
    print(f"Handled ValueError: {e}")
except* TypeError as e:
    print(f"Handled TypeError: {e}")
The new type parameter syntax provides a more intuitive way to work with generic types and type parameters.
- Generic type parameters with square brackets
- Type aliases with type parameters
- Example:
type List[T] = list[T]

def first[T](items: list[T]) -> T:
    return items[0]
The Per-Interpreter GIL feature allows for better concurrency by providing separate GILs for different interpreters.
- Sub-interpreter support with separate GIL
- Improved concurrency with multiple interpreters
- Example (a hypothetical sketch: as of Python 3.12 the per-interpreter GIL is exposed mainly through the C API, Py_NewInterpreterFromConfig; the Python-level interpreters module shown below is the API proposed in PEP 734 and may not be available in your build):

import interpreters  # hypothetical PEP 734 stdlib module

def run_in_subinterpreter():
    interp = interpreters.create()
    # Each sub-interpreter has its own GIL
    interp.exec("import threading; print('hello from a sub-interpreter')")
Once, when the program is launched.
functools.partial(func, /, *args, **keywords)
Return a new partial object which when called will behave like func called with the positional arguments args and keyword arguments keywords. If more arguments are supplied to the call, they are appended to args. If additional keyword arguments are supplied, they extend and override keywords.
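For example:

from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(5))   # 25
print(cube(2))     # 8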
The basic idea is to use a function, but return a partial object of itself if it is called with parameters before being used as a decorator:
from functools import wraps, partial

def decorator(func=None, parameter1=None, parameter2=None):
    if not func:
        # The only drawback is that for functions there is no thing
        # like "self" - we have to rely on the decorator
        # function name on the module namespace
        return partial(decorator, parameter1=parameter1, parameter2=parameter2)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Decorator code - parameter1, etc. can be used
        # freely here
        return func(*args, **kwargs)
    return wrapper
And that is it - decorators written using this pattern can decorate a function right away without being "called" first:
@decorator
def my_func():
pass
Or customized with parameters:
@decorator(parameter1="example.com", ...)
def my_func():
pass
import functools
def require_authorization(f):
    @functools.wraps(f)
    def decorated(user, *args, **kwargs):
        if not is_authorized(user):
            raise UserIsNotAuthorized
        return f(user, *args, **kwargs)
    return decorated

@require_authorization
def check_email(user, etc):
    ...  # etc.

def require_authorization(action):
    def decorate(f):
        @functools.wraps(f)
        def decorated(user, *args, **kwargs):
            if not is_allowed_to(user, action):
                raise UserIsNotAuthorized(action, user)
            return f(user, *args, **kwargs)
        return decorated
    return decorate
functools.wraps preserves the original name (and other metadata, such as the docstring) of the decorated function.
- Just use inheritance
- Use a decorator that returns the class
def addID(original_class):
    orig_init = original_class.__init__
    # Keep a reference to the original __init__, so we can call it without recursion

    def getId(self):
        return self.__id

    def __init__(self, id, *args, **kws):
        self.__id = id
        orig_init(self, *args, **kws)  # Call the original __init__

    original_class.getId = getId        # Add the new method to the class
    original_class.__init__ = __init__  # Set the class' __init__ to the new one
    return original_class

@addID
class Foo:
    pass
- Use metaclass
Indeed, metaclasses are especially useful to do black magic, and therefore complicated stuff. But by themselves, they are simple:
- intercept a class creation
- modify the class
- return the modified class
>>> class Foo(object):
... bar = True
>>> Foo = type('Foo', (), {'bar':True})
class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith("__") else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type(clsname, bases, uppercase_attrs)
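Used as a metaclass, the class above renames every non-dunder attribute at class-creation time (a small usage sketch):

class Foo(metaclass=UpperAttrMetaclass):
    bar = "bip"

print(hasattr(Foo, "bar"))   # False - the attribute was uppercased
print(Foo.BAR)               # bip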
The main use case for a metaclass is creating an API. A typical example of this is the Django ORM.
- use another variable for this function
- use partial
- pass it as a parameter: def indirect(func, *args)
- use a nested function and return it (functional approach)
- eval("func_name()") -> returns the function's result
- exec("func_name()") -> returns None
- import the module (assuming module foo with function bar):
module = __import__('foo')
func = getattr(module, 'bar')
func()
locals()["myfunction"]()
globals()["myfunction"]()
- use a dict of functions:

functions = {'myfoo': foo.bar}
mystring = 'myfoo'
if mystring in functions:
    functions[mystring]()
Introspection is the ability to determine the type of an object at runtime. Everything in Python is an object, and every object may have attributes and methods. By using introspection, we can dynamically examine Python objects. Code introspection is used for examining classes, methods, objects, modules and keywords and getting information about them so that we can use it. Introspection reveals useful information about your program's objects.
- type(): returns the type of an object.
- dir(): returns the list of methods and attributes associated with an object.
- id(): returns a unique id of an object.
- help(): used to find out what other functions do.
- hasattr(): checks whether an object has an attribute.
- getattr(): returns the contents of an attribute if it exists.
- repr(): returns a string representation of an object.
- callable(): checks whether an object is callable (a function) or not.
- issubclass(): checks whether a class is a subclass of another class.
- isinstance(): checks whether an object is an instance of a given class.
- the sys module: gives access to system-specific variables and functions.
- __doc__: returns the documentation string of an object.
- __name__: returns the name of the object.
Functional programming is a programming paradigm in which the primary method of computation is evaluation of pure functions. Although Python is not primarily a functional language, it's good to be familiar with lambda
, map()
, filter()
, and reduce()
because they can help you write concise, high-level, parallelizable code. You'll also see them in code that others have written.
list(
map(
(lambda a, b, c: a + b + c),
[1, 2, 3],
[10, 20, 30],
[100, 200, 300]
)
)
list(filter(lambda s: s.isupper(), ["cat", "Cat", "CAT", "dog", "Dog", "DOG", "emu", "Emu", "EMU"]))
from functools import reduce  # reduce lives in functools in Python 3
reduce(lambda x, y: x + y, [1, 2, 3, 4, 5], 100)  # (100 + 1 + 2 + 3 + 4 + 5), 100 is the initial value
## Function attributes
def func():
    pass
dir(func)
Out[3]:
['__annotations__',
'__call__',
...
'__str__',
'__subclasshook__']
func.a = 1
dir(func)
Out[5]:
['__annotations__',
'__call__',
...
'__str__',
'__subclasshook__',
'a']
print(func.__dict__)
{'a': 1}
func.__getattribute__("a")
Out[7]: 1
Python resolves names using the so-called LEGB rule, which is named after the Python scope for names. The letters in LEGB stand for Local, Enclosing, Global, and Built-in. Here's a quick overview of what these terms mean:
- Local (or function) scope is the code block or body of any Python function or lambda expression. This Python scope contains the names that you define inside the function. These names will only be visible from the code of the function. It's created at function call, not at function definition, so you'll have as many different local scopes as function calls. This is true even if you call the same function multiple times, or recursively. Each call will result in a new local scope being created.
- Enclosing (or nonlocal) scope is a special scope that only exists for nested functions. If the local scope is an inner or nested function, then the enclosing scope is the scope of the outer or enclosing function. This scope contains the names that you define in the enclosing function. The names in the enclosing scope are visible from the code of the inner and enclosing functions.
- Global (or module) scope is the top-most scope in a Python program, script, or module. This Python scope contains all of the names that you define at the top level of a program or a module. Names in this Python scope are visible from everywhere in your code. You can list them with dir().
- Built-in scope is a special Python scope that's created or loaded whenever you run a script or open an interactive session. This scope contains names such as keywords, functions, exceptions, and other attributes that are built into Python. Names in this Python scope are also available from everywhere in your code. It's automatically loaded by Python when you run a program or script. dir(__builtins__) lists its contents (152 names).
The LEGB rule is a kind of name lookup procedure, which determines the order in which Python looks up names. For example, if you reference a given name, then Python will look that name up sequentially in the local, enclosing, global, and built-in scope. If the name exists, then you'll get the first occurrence of it. Otherwise, you'll get an error.
When you call dir()
with no arguments, you get the list of names available in your main global Python scope. Note that if you assign a new name (like var here) at the top level of the module (which is __main__
here), then that name will be added to the list returned by dir()
.
The global statement consists of the global keyword followed by one or more names separated by commas. You can also use multiple global statements with a name (or a list of names). All the names that you list in a global statement will be mapped to the global or module scope in which you define them.
Similarly to global names, nonlocal names can be accessed from inner functions, but not assigned or updated. If you want to modify them, then you need to use a nonlocal statement. With a nonlocal statement, you can define a list of names that are going to be treated as nonlocal.
The nonlocal statement consists of the nonlocal keyword followed by one or more names separated by commas. These names will refer to the same names in the enclosing Python scope.
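A minimal sketch of both statements (the names are arbitrary):

counter = 0

def bump_global():
    global counter          # rebind the module-level name
    counter += 1

def make_counter():
    count = 0
    def bump():
        nonlocal count      # rebind the name in the enclosing scope
        count += 1
        return count
    return bump

bump_global()
print(counter)              # 1
tick = make_counter()
print(tick(), tick())       # 1 2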
This technique by which some data (hello in this case) gets attached to the code is called closure in Python.
def print_msg(msg):
    # This is the outer enclosing function
    def printer():
        # This is the nested function
        print(msg)
    return printer  # returns the nested function

# Now let's try calling this function.
another = print_msg("Hello")
another()
# Output: Hello
The criteria that must be met to create closure in Python are summarized in the following points.
- We must have a nested function (function inside a function).
- The nested function must refer to a value defined in the enclosing function.
- The enclosing function must return the nested function.
Python Decorators make an extensive use of closures as well.
- globals() always returns the dictionary of the module namespace
- locals() always returns a dictionary of the current namespace
- vars() returns either a dictionary of the current namespace (if called with no argument) or the dictionary of the argument.
The dictionary returned by locals() does not automatically update when variables are assigned, and assigning entries in the dict will not assign the corresponding local variables.
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (which can be different if re-importing causes a different object to be placed in sys.modules).
from importlib import reload # Python 3.4+
import foo
while True:
    # Do some things.
    if is_changed(foo):
        foo = reload(foo)
In software engineering, SOLID is a mnemonic acronym for five design principles intended to make software designs more understandable, flexible, and maintainable. The principles are a subset of many principles promoted by American software engineer and instructor Robert C. Martin, first introduced in his 2000 paper Design Principles and Design Patterns.
The SOLID ideas are
- The single-responsibility principle: "There should never be more than one reason for a class to change." In other words, every class should have only one responsibility.
- The open–closed principle: "Software entities ... should be open for extension, but closed for modification."
- The Liskov substitution principle: "Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it." See also design by contract.
- The interface segregation principle: "Many client-specific interfaces are better than one general-purpose interface."
- The dependency inversion principle: "Depend upon abstractions, not concretions."
The SOLID acronym was introduced later, around 2004, by Michael Feathers.
- Encapsulation - binding the data and functions which operate on that data into a single unit, the class.
- Abstraction - treating a system as a "black box," where it's not important to understand the gory inner workings in order to reap the benefits of using it.
- Inheritance - if a class inherits from another class, it automatically obtains a lot of the same functionality and properties from that class and can be extended to contain separate code and data. A nice feature of inheritance is that it often leads to good code reuse since a parent class' functions don't need to be re-defined in any of its child classes.
- Polymorphism - because derived objects share the same interface as their parents, the calling code can call any function in that class' interface. At run-time, the appropriate function will be called depending on the type of object passed, leading to possibly different behaviors.
They make sure that derived classes implement methods and properties dictated in the abstract base class. Abstract base classes separate the interface from the implementation. They define generic methods and properties that must be used in subclasses. Implementation is handled by the concrete subclasses where we can create objects that can handle tasks. They help to avoid bugs and make the class hierarchies easier to maintain by providing a strict recipe to follow for creating subclasses.
from abc import ABCMeta, abstractmethod

class AbstractClassCSV(metaclass=ABCMeta):  # or just inherit from ABC, a helper class
    def __init__(self, path, file_name):
        self._path = path
        self._file_name = file_name

    @property
    @abstractmethod
    def path(self):
        pass
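A concrete subclass then has to override the abstract property before it can be instantiated (continuing the sketch above):

class CSVReader(AbstractClassCSV):
    @property
    def path(self):            # the abstract property must be overridden
        return self._path

reader = CSVReader("/tmp", "data.csv")
print(reader.path)             # /tmp

# AbstractClassCSV("/tmp", "data.csv") would raise TypeError:
# can't instantiate an abstract class with abstract method `path`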
- hasattr(object, name): checks whether the object has an attribute or method called name; returns True if it does, otherwise False.
- getattr(object, name[, default]): returns the attribute or method of the object if it exists; otherwise returns the optional default value, or raises AttributeError if no default is given.
- setattr(object, name, value): assigns a value to the object's attribute; if the attribute does not exist, it is created first and then assigned.
>>> # this example uses __setattr__ to dynamically change attribute value to uppercase
>>> class Frob:
... def __setattr__(self, name, value):
... self.__dict__[name] = value.upper()
...
>>> f = Frob()
>>> f.bamf = "bamf"
>>> f.bamf
'BAMF'
Note that if the attribute is found through the normal mechanism, __getattr__()
is not called. (This is an intentional asymmetry between __getattr__()
and __setattr__()
.) This is done both for efficiency reasons and because otherwise __getattr__()
would have no way to access other attributes of the instance.
>>> class Frob:
... def __init__(self, bamf):
... self.bamf = bamf
... def __getattr__(self, name):
... return 'Frob does not have `{}` attribute.'.format(str(name))
...
>>> f = Frob("bamf")
>>> f.bar
'Frob does not have `bar` attribute.'
>>> f.bamf
'bamf'
If the class also defines __getattr__()
, the latter will not be called unless __getattribute__()
either calls it explicitly or raises an AttributeError.
>>> class Frob(object):
...     def __getattribute__(self, name):
...         print("getting `{}`".format(str(name)))
...         return object.__getattribute__(self, name)
...
>>> f = Frob()
>>> f.bamf = 10
>>> f.bamf
getting `bamf`
10
In the name-mangling process, any identifier with at least two leading underscores and at most one trailing underscore is textually replaced with _classname__identifier, where classname is the name of the current class with any leading underscore(s) stripped. So any identifier of the form __geek inside a class body is replaced with _classname__geek.
class Student:
    def __init__(self, name):
        self.__name = name

s1 = Student("Santhosh")
print(s1._Student__name)
class Person:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        print('Getting name')
        return self._name

    @name.setter
    def name(self, value):
        print('Setting name to ' + value)
        self._name = value

    @name.deleter
    def name(self):
        print('Deleting name')
        del self._name

p = Person('Adam')
print('The name is:', p.name)
p.name = 'John'
del p.name
- __init__: the constructor; its task is to initialize (assign values to) the data members of the class when an object of the class is created.
- __repr__ / repr(): returns a printable representation of the given object.
- __str__: represents the class object as a string; it should be defined so that it is easy to read and outputs all the members of the class. It is also used as a debugging tool when the members of a class need to be checked.
- __cmp__: no longer used (Python 2 only).
- __new__: whenever a class is instantiated, __new__ and __init__ are called; __new__ is called to create the object, and __init__ is called to initialize it.
class A(object):
    def __new__(cls):
        print("Creating instance")
        return super(A, cls).__new__(cls)

    def __init__(self):
        print("Init is called")
Output:
Creating instance
Init is called

- __del__: the __del__() method is known as a destructor method in Python. It is called when all references to the object have been deleted, i.e. when the object is garbage collected. Note: references to an object are also dropped when the object goes out of scope or when the program ends.
- __hash__():
class A(object):
    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __eq__(self, othr):
        return (isinstance(othr, type(self))
                and (self._a, self._b, self._c) ==
                    (othr._a, othr._b, othr._c))

    def __hash__(self):
        return hash((self._a, self._b, self._c))
- __lt__, __gt__, __le__, __ge__, __eq__, and __ne__
def __lt__(self, other):
    ...

def __le__(self, other):
    ...

def __gt__(self, other):
    ...

def __ge__(self, other):
    ...

def __eq__(self, other):
    ...

def __ne__(self, other):
    ...
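In practice you rarely write all six by hand: functools.total_ordering fills in the rest from __eq__ plus one ordering method (a small sketch):

from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

print(Version(1, 4) <= Version(2, 0))   # True - __le__ was generated automatically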
For an instance x, calling x(arg1, arg2, ...) is shorthand for x.__call__(arg1, arg2, ...).
class Product:
    def __init__(self):
        print("Instance Created")

    # Defining the __call__ method
    def __call__(self, a, b):
        print(a * b)

# Instance created
ans = Product()

# __call__ method will be called
ans(10, 20)
Python has known at least three different MRO algorithms: classic, Python 2.2 new-style, and Python 2.3 new-style (a.k.a. C3). Only the latter survives in Python 3.
Classic classes used a simple MRO scheme: when looking up a method, base classes were searched using a simple depth-first left-to-right scheme. The first matching object found during this search would be returned. For example, consider these classes:
class A:
def save(self): pass
class B(A): pass
class C:
def save(self): pass
class D(B, C): pass
If we created an instance x of class D, the classic method resolution order would order the classes as D, B, A, C. Thus, a search for the method x.save() would produce A.save() (and not C.save()).
One problem concerns method lookup under "diamond inheritance." For example:
class A:
def save(self): pass
class B(A): pass
class C(A):
def save(self): pass
class D(B, C): pass
Here, class D inherits from B and C, both of which inherit from class A. Using the classic MRO, methods would be found by searching the classes in the order D, B, A, C, A. Thus, a reference to x.save() will call A.save() as before. However, this is unlikely what you want in this case! Since both B and C inherit from A, one can argue that the redefined method C.save() is actually the method that you want to call, since it can be viewed as being "more specialized" than the method in A (in fact, it probably calls A.save() anyways). For instance, if the save() method is being used to save the state of an object, not calling C.save() would break the program since the state of C would be ignored.
Although this kind of multiple inheritance was rare in existing code, new-style classes would make it commonplace. This is because all new-style classes were defined by inheriting from a base class object. Thus, any use of multiple inheritance in new-style classes would always create the diamond relationship described above. For example:
class B(object): pass
class C(object):
def __setattr__(self, name, value): pass
class D(B, C): pass
Moreover, since object defined a number of methods that are sometimes extended by subtypes (e.g., __setattr__()), the resolution order becomes critical. For example, in the above code, the method C.__setattr__ should apply to instances of class D.
To fix the method resolution order for new-style classes in Python 2.2, Guido adopted a scheme where the MRO would be pre-computed when a class was defined and stored as an attribute of each class object. The computation of the MRO was officially documented as using a depth-first left-to-right traversal of the classes as before. If any class was duplicated in this search, all but the last occurrence would be deleted from the MRO list. So, for our earlier example, the search order would be D, B, C, A (as opposed to D, B, A, C, A with classic classes).
In reality, the computation of the MRO was more complex than this. Guido discovered a few cases where this new MRO algorithm didn't seem to work. Thus, there was a special case to deal with a situation when two bases classes occurred in a different order in the inheritance list of two different derived classes, and both of those classes are inherited by yet another class. For example:
class A(object): pass
class B(object): pass
class X(A, B): pass
class Y(B, A): pass
class Z(X, Y): pass
Using the tentative new MRO algorithm, the MRO for these classes would be Z, X, Y, B, A, object. (Here 'object' is the universal base class.) However, I didn't like the fact that B and A were in reversed order. Thus, the real MRO would interchange their order to produce Z, X, Y, A, B, object.
Thus, in Python 2.3, we abandoned my home-grown 2.2 MRO algorithm in favor of the academically vetted C3 algorithm. One outcome of this is that Python will now reject any inheritance hierarchy that has an inconsistent ordering of base classes. For instance, in the previous example, there is an ordering conflict between class X and Y. For class X, there is a rule that says class A should be checked before class B. However, for class Y, the rule says that class B should be checked before A. In isolation, this discrepancy is fine, but if X and Y are ever combined together in the same inheritance hierarchy for another class (such as in the definition of class Z), that class will be rejected by the C3 algorithm. This, of course, matches the Zen of Python's "errors should never pass silently" rule.
In Python, the MRO is from bottom to top and left to right. This means that, first, the method is searched in the class of the object. If it's not found, it is searched in the immediate super class. In the case of multiple super classes, it is searched left to right, in the order by which was declared by the developer. For example:
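(An illustrative sketch; the class names are arbitrary.)

class A:
    def who(self):
        return "A"

class B(A):
    pass

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

print([cls.__name__ for cls in D.__mro__])   # ['D', 'B', 'C', 'A', 'object']
print(D().who())                             # 'C' - found in C before reaching A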
A mixin is a special kind of multiple inheritance. There are two main situations where mixins are used:
- You want to provide a lot of optional features for a class.
- You want to use one particular feature in a lot of different classes.
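A small sketch of the first situation, an optional feature packaged as a mixin (the names are illustrative):

import json

class JsonMixin:
    # Optional feature: serialize any class that keeps its state in __dict__
    def to_json(self):
        return json.dumps(self.__dict__)

class Point(JsonMixin):
    def __init__(self, x, y):
        self.x, self.y = x, y

print(Point(1, 2).to_json())   # {"x": 1, "y": 2}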
Metaclasses are the 'stuff' that creates classes.
You define classes in order to create objects, right?
But we learned that Python classes are objects.
Well, metaclasses are what create these objects. They are the classes' classes, you can picture them this way:
MyClass = MetaClass()
my_object = MyClass()
# You've seen that type lets you do something like this:
MyClass = type('MyClass', (), {})
### type() parameters
The type() function either takes a single object parameter, or it takes 3 parameters:
- name - a class name; becomes the __name__ attribute
- bases - a tuple that itemizes the base classes; becomes the __bases__ attribute
- dict - a dictionary which is the namespace containing definitions for the class body; becomes the __dict__ attribute

The type() function returns:
- the type of the object, if only one object parameter is passed
- a new type, if 3 parameters are passed
The isinstance() function returns True if the specified object is of the specified type, otherwise False.
If the type parameter is a tuple, this function will return True if the object is one of the types in the tuple.
The issubclass() function checks if the class argument (first argument) is a subclass of classinfo class (second argument).
The syntax of issubclass() is:
issubclass(class, classinfo)
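For example:

class Animal:
    pass

class Dog(Animal):
    pass

print(isinstance(Dog(), Animal))     # True
print(isinstance(3, (int, float)))   # True - a tuple of types is allowed
print(issubclass(Dog, Animal))       # True
print(issubclass(bool, int))         # True - bool derives from int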
The special attribute __slots__
allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:
- faster attribute access.
- space savings in memory.
The space savings come from storing value references in slots instead of __dict__, and from denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.
Quick Caveats
class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz',
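Continuing the snippet above, undeclared attributes are rejected and no per-instance __dict__ is created:

r = Right()
r.foo, r.baz = 1, 2             # declared slots work as usual

try:
    r.qux = 3                   # not listed in __slots__ anywhere in the hierarchy
except AttributeError as exc:
    print(exc)                  # 'Right' object has no attribute 'qux'

print(hasattr(r, "__dict__"))   # False - no per-instance dict is allocated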
Serious software development calls for performance optimization. When you start optimizing application performance, you can't escape looking at profilers. Whether monitoring production servers or tracking frequency and duration of method calls, profilers run the gamut
You can do several things with trace:
- Produce a code coverage report to see which lines are run or skipped over (python3 -m trace --count trace_example/main.py).
- Report on the relationships between functions that call one another (python3 -m trace --listfuncs trace_example/main.py | grep -v importlib).
- Track which function is the caller (python3 -m trace --listfuncs --trackcalls trace_example/main.py | grep -v importlib).
By contrast, faulthandler has slightly better Python documentation. It states that its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal. It also works well with other system fault handlers like Apport or the Windows fault handler. Both the faulthandler and trace modules provide more tracing abilities and can help you debug your Python code. For more profiling statistics, see the next section.
If you're a beginner to tracing, I recommend you start simple with trace.
I use Datadog in my production environment.
Now let's delve into profiling specifics. The term "profiling" is mainly used for performance testing, and the purpose of performance testing is to find bottlenecks by doing deep analysis. So you can use tracing tools to help you with profiling. Recall that tracing is when software developers log information about a software execution. Therefore, logging performance metrics is also a way to perform profiling analysis.
But we're not restricted to tracing. As profiling gains mindshare in the mainstream, we now have tools that perform profiling directly. Now the question is, what parts of the software do we profile (measure its performance metrics)?
- Method or function (most common)
- Lines (similar to method profiling, but doing it line by line)
- Memory (memory usage)
- Speed (time)
- Calls (frequency)
- Method and line profiling
Both cProfile and profile are modules available in the Python 3 language. The numbers produced by these modules can be formatted into reports via the pstats module.
Here's an example of cProfile showing the numbers for a script:
import cProfile
import re
cProfile.run('re.compile("foo|bar")')
197 function calls (192 primitive calls) in 0.002 seconds
Another common component to profile is the memory usage. The purpose is to find memory leaks and optimize the memory usage in your Python programs. In terms of generic Python options, the most recommended tools for memory profiling for Python 3 are the pympler
and the objgraph
libraries.
>>> from pympler import classtracker
>>> tr = classtracker.ClassTracker()
>>> tr.track_class(Document)
>>> tr.create_snapshot()
>>> create_documents()
>>> tr.create_snapshot()
>>> tr.stats.print_summary()
active 1.42 MB average pct
Document 1000 195.38 KB 200 B 13%
When we do profiling, it means we need to monitor the execution. That in itself may affect the underlying software being monitored. Either we monitor all the function calls and exception events, or we use random sampling and deduce the numbers. The former is known as deterministic profiling, and the latter is statistical profiling. Of course, each method has its pros and cons. Deterministic profiling can be highly precise, but its extra overhead may affect its accuracy. Statistical profiling has less overhead in comparison, with the drawback being lower precision.
cProfile, which I covered earlier, uses deterministic profiling. Let's look at another open source Python profiler that uses statistical profiling: pyinstrument.
Pyinstrument differentiates itself from other typical profilers in two ways. First, it emphasizes that it uses statistical profiling instead of deterministic profiling. It argues that while deterministic profiling can give you more precision than statistical profiling, the extra precision requires more overhead. The extra overhead may affect the accuracy and lead to optimizing the wrong part of the program. Specifically, it states that using deterministic profiling means that "code that makes a lot of Python function calls invokes the profiler a lot, making it slower." This is how results get distorted and the wrong part of the program gets optimized.
This module provides basic mechanisms for measuring and controlling system resources utilized by a program.
Symbolic constants are used to specify particular system resources and to request usage information about either the current process or its children.
resource.getrusage(who)
This function returns an object that describes the resources consumed by either the current process or its children, as specified by the who parameter. The who parameter should be specified using one of the RUSAGE_* constants described below.
A simple example:
from resource import *
import time
# a non CPU-bound task
time.sleep(3)
print(getrusage(RUSAGE_SELF))
# a CPU-bound task
for i in range(10 ** 8):
_ = 1 + 1
print(getrusage(RUSAGE_SELF))
The with statement in Python is a quite useful tool for properly managing external resources in your programs. It allows you to take advantage of existing context managers to automatically handle the setup and teardown phases whenever you're dealing with external resources or with operations that require those phases.
Besides, the context management protocol allows you to create your own context managers so you can customize the way you deal with system resources. So, what's the with statement good for?
# writable.py
class WritableFile:
    def __init__(self, file_path):
        self.file_path = file_path

    def __enter__(self):
        self.file_obj = open(self.file_path, mode="w")
        return self.file_obj

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file_obj:
            self.file_obj.close()
>>> from contextlib import contextmanager
>>> @contextmanager
... def writable_file(file_path):
... file = open(file_path, mode="w")
... try:
... yield file
... finally:
... file.close()
...
>>> with writable_file("hello.txt") as file:
... file.write("Hello, World!")
# site_checker_v1.py
import aiohttp
import asyncio
async def check(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            print(f"{url}: status -> {response.status}")
            html = await response.text()
            print(f"{url}: type -> {html[:17].strip()}")

async def main():
    await asyncio.gather(
        check("https://realpython.com"),
        check("https://pycoders.com"),
    )

asyncio.run(main())
A mock object substitutes and imitates a real object within a testing environment. It is a versatile and powerful tool for improving the quality of your tests.
One reason to use Python mock objects is to control your code's behavior during testing.
For example, if your code makes HTTP requests to external services, then your tests execute predictably only so far as the services are behaving as you expected. Sometimes, a temporary change in the behavior of these external services can cause intermittent failures within your test suite.
>>> from unittest.mock import Mock
>>> mock = Mock()
>>> mock
<Mock id='4561344720'>
A Mock must simulate any object that it replaces. To achieve such flexibility, it creates its attributes when you access them.
>>> from unittest.mock import Mock
>>> # Create a mock object
... json = Mock()
>>> json.loads('{"key": "value"}')
<Mock name='mock.loads()' id='4550144184'>
>>> # You know that you called loads() so you can
>>> # make assertions to test that expectation
... json.loads.assert_called()
>>> json.loads.assert_called_once()
>>> json.loads.assert_called_with('{"key": "value"}')
>>> json.loads.assert_called_once_with('{"key": "value"}')
datetime = Mock()
datetime.datetime.today.return_value = "tuesday"
requests = Mock()
requests.get.side_effect = Timeout
@patch('my_calendar.requests')
def test_get_holidays_timeout(self, mock_requests):
    mock_requests.get.side_effect = Timeout

or

with patch('my_calendar.requests') as mock_requests:
    mock_requests.get.side_effect = Timeout
And there are MagicMock and AsyncMock as well.
Coverage.py is one of the most popular code coverage tools for Python. It uses code analysis tools and tracing hooks provided in Python standard library to measure coverage. It runs on major versions of CPython, PyPy, Jython and IronPython. You can use Coverage.py with both unittest and Pytest.
nose2 is the successor to nose; it is essentially unittest with plugins.
nose2 is a newer project and does not support all of the features of nose. See the differences for a thorough rundown.
nose2's purpose is to extend unittest to make testing nicer and easier to understand.
nose2 vs pytest: nose2 may or may not be a good fit for your project. If you are new to Python testing, we encourage you to also consider pytest, a popular testing framework.
The doctest module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown. There are several common ways to use doctest:
- To check that a module's docstrings are up-to-date by verifying that all interactive examples still work as documented.
- To perform regression testing by verifying that interactive examples from a test file or a test object work as expected.
- To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of "literate testing" or "executable documentation".
python example.py -v
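A minimal example module (a hypothetical example.py) that the command above would check:

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add("py", "thon")
    'python'
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # `python example.py -v` prints each example as it runs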
The main garbage collection algorithm used by CPython is reference counting. The basic idea is that CPython counts how many different places there are that have a reference to an object. Such a place could be another object, or a global (or static) C variable, or a local variable in some C function. When an object's reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference counts are decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on. The reference count field can be examined using the sys.getrefcount function (notice that the value returned by this function is always 1 more as the function also has a reference to the object when called):
x = object()
sys.getrefcount(x)
2
y = x
sys.getrefcount(x)
3
del y
sys.getrefcount(x)
2
The main problem with the reference counting scheme is that it does not handle reference cycles. For instance, consider this code:
container = []
container.append(container)
sys.getrefcount(container)
3
del container
In this example, container holds a reference to itself, so even when we remove our reference to it (the variable "container") the reference count never falls to 0 because it still has its own internal reference. Therefore it would never be cleaned just by simple reference counting. For this reason some additional machinery is needed to clean these reference cycles between objects once they become unreachable. This is the cyclic garbage collector, usually called just Garbage Collector (GC), even though reference counting is also a form of garbage collection.
In order to limit the time each garbage collection takes, the GC uses a popular optimization: generations. The main idea behind this concept is the assumption that most objects have a very short lifespan and can thus be collected shortly after their creation. This has proven to be very close to the reality of many Python programs as many temporary objects are created and destroyed very fast. The older an object is the less likely it is that it will become unreachable.
To take advantage of this fact, all container objects are segregated into three spaces/generations. Every new object starts in the first generation (generation 0). The previous algorithm is executed only over the objects of a particular generation and if an object survives a collection of its generation it will be moved to the next one (generation 1), where it will be surveyed for collection less often. If the same object survives another GC round in this new generation (generation 1) it will be moved to the last generation (generation 2) where it will be surveyed the least often.
Generations are collected when the number of objects that they contain reaches some predefined threshold, which is unique for each generation and is lower the older the generations are. These thresholds can be examined using the gc.get_threshold function:
import gc
gc.get_threshold()
(700, 10, 10)
>>> import gc
>>> gc.get_count()
(596, 2, 1)
You can trigger a manual garbage collection process by using the gc.collect()
method
import gc
class MyObj:
pass
# Move everything to the last generation so it's easier to inspect
# the younger generations.
gc.collect()
0
# Create a reference cycle.
x = MyObj()
x.self = x
# Initially the object is in the youngest generation.
gc.get_objects(generation=0)
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
# After a collection of the youngest generation the object
# moves to the next generation.
gc.collect(generation=0)
0
gc.get_objects(generation=0)
[]
gc.get_objects(generation=1)
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
The garbage collector module provides the Python function is_tracked(obj), which returns the current tracking status of the object.
As a general rule, instances of atomic types aren't tracked and instances of non-atomic types (containers, user-defined objects…) are. However, some type-specific optimizations can be present in order to suppress the garbage collector footprint of simple instances. Some examples of native types that benefit from delayed tracking:
Tuples containing only immutable objects (integers, strings etc, and recursively, tuples of immutable objects) do not need to be tracked
Dictionaries containing only immutable objects also do not need to be tracked
General rule: Don't change garbage collector behavior
The Python program, just like other programming languages, experiences memory leaks. Memory leaks in Python happen if the garbage collector doesn't clean and eliminate the unreferenced or unused data from Python.
Python developers have tried to address memory leaks through the addition of features that free unused memory automatically.
However, some unreferenced objects may pass through the garbage collector unharmed, resulting in memory leaks.
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.
However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
Past efforts to create a "free-threaded" interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.
>>> import sys
>>> # In older Python versions the interval was set to 100 bytecode instructions:
>>> sys.getcheckinterval()
100
>>> # In newer Python versions it is a time interval of 5 milliseconds:
>>> sys.getswitchinterval()
0.005
The problem in this mechanism was that most of the time the CPU-bound thread would reacquire the GIL itself before other threads could acquire it. This was researched by David Beazley and visualizations can be found here.
This problem was fixed by Antoine Pitrou's new GIL implementation (written in 2009 and shipped in Python 3.2), which added a mechanism of looking at the number of GIL acquisition requests by other threads that got dropped, and not allowing the current thread to reacquire the GIL before other threads got a chance to run.
Straight forward:
from time import sleep, perf_counter
from threading import Thread
def task():
    print('Starting a task...')
    sleep(1)
    print('done')
start_time = perf_counter()
# create two new threads
t1 = Thread(target=task)
t2 = Thread(target=task)
# start the threads
t1.start()
t2.start()
# wait for the threads to complete
t1.join()
t2.join()
end_time = perf_counter()
print(f'It took {end_time- start_time: 0.2f} second(s) to complete.')
Better:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def cube(x):
    result = x * x * x
    print(f'Cube of {x}: {result}')
    return result

if __name__ == '__main__':
    values = [3, 4, 5, 6]
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Use map to apply the cube function to all values
        results = list(executor.map(cube, values))
    print("\nResults:")
    for value, result in zip(values, results):
        print(f"Cube of {value}: {result}")
Operations associated with queue.Queue are (a small usage sketch follows the list):
- maxsize – Number of items allowed in the queue.
- empty() – Return True if the queue is empty, False otherwise.
- full() – Return True if there are maxsize items in the queue. If the queue was initialized with maxsize=0 (the default), then full() never returns True.
- get() – Remove and return an item from the queue. If the queue is empty, wait until an item is available.
- get_nowait() – Return an item if one is immediately available, else raise QueueEmpty.
- put(item) – Put an item into the queue. If the queue is full, wait until a free slot is available before adding the item.
- put_nowait(item) – Put an item into the queue without blocking. If no free slot is immediately available, raise QueueFull.
- qsize() – Return the number of items in the queue.
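A small producer/consumer sketch using these operations (the queue size and the None sentinel are arbitrary choices):
from queue import Queue
from threading import Thread

q = Queue(maxsize=10)

def producer():
    for i in range(5):
        q.put(i)        # blocks if the queue is full
    q.put(None)         # sentinel: tell the consumer to stop

def consumer():
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        print(f'consumed {item}')

t1, t2 = Thread(target=producer), Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()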
Simple case:
#!/usr/bin/python

from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def main():
    p = Process(target=fun)
    p.start()
    p.join()

if __name__ == '__main__':
    print('starting main')
    main()
    print('finishing main')
Nice pool:
#!/usr/bin/python

import time
from timeit import default_timer as timer
from multiprocessing import Pool, cpu_count

def square(n):
    time.sleep(2)
    return n * n

def main():
    start = timer()
    print(f'starting computations on {cpu_count()} cores')
    values = (2, 4, 6, 8)
    with Pool() as pool:
        res = pool.map(square, values)
        print(res)
    end = timer()
    print(f'elapsed time: {end - start}')

if __name__ == '__main__':
    main()
pipes — Interface to shell pipelines. The pipes module defines a class to abstract the concept of a pipeline — a sequence of converters from one file to another.
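Note that pipes was deprecated in Python 3.11 and removed in 3.13 (PEP 594); for reference, a minimal sketch of the classic API (the shell command and file name are arbitrary):
import pipes  # deprecated; removed in Python 3.13

t = pipes.Template()
# '--' means this step reads stdin and writes stdout
t.append('tr a-z A-Z', '--')
f = t.open('upper.txt', 'w')  # writes go through the pipeline into upper.txt
f.write('hello world')
f.close()
# upper.txt now contains "HELLO WORLD"
On modern Python the same thing is usually done with the subprocess module.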
Only C threads:
#include "Python.h"
...
PyObject *pyfunc(PyObject *self, PyObject *args)
{
...
Py_BEGIN_ALLOW_THREADS
// Threaded C code.
// Must not use Python API functions
...
Py_END_ALLOW_THREADS
...
return result;
}
Mixing C and Python threads (please don't do it without glasses):
#include <Python.h>
...
if (!PyEval_ThreadsInitialized())
{
    PyEval_InitThreads();
}
...
(Note: since Python 3.7 the GIL is initialized by Py_Initialize(), so this call is no longer needed, and PyEval_InitThreads() is deprecated since Python 3.9.)
...
distutils is deprecated, with removal planned for Python 3.12 (and it was indeed removed in 3.12). See the What's New entry for more information.
Most Python users will not want to use this module directly, but instead use the cross-version tools maintained by the Python Packaging Authority. In particular, setuptools is an enhanced alternative to distutils that provides:
- support for declaring project dependencies
- additional mechanisms for configuring which files to include in source releases (including plugins for integration with version control systems)
- the ability to declare project "entry points", which can be used as the basis for application plugin systems (an example follows the setup.py snippet below)
- the ability to automatically generate Windows command line executables at installation time rather than needing to prebuild them
Setuptools is a fully-featured, actively-maintained, and stable library designed to facilitate packaging Python projects.
For basic use of setuptools, you will need a pyproject.toml with exactly the following content, which declares that you want to use setuptools to package your project:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
Then, you will need a setup.cfg or setup.py to specify your package information, such as metadata, contents, dependencies, etc. Here is a minimal setup.py:
from setuptools import setup

setup(
    name='mypackage',
    version='0.0.1',
    packages=['mypackage'],
    install_requires=[
        'requests',
        'importlib; python_version == "2.6"',
    ],
)
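To illustrate the "entry points" feature mentioned above: a console script can be declared in the same setup() call. The package and function names here (mypackage.cli:main) are hypothetical:
from setuptools import setup

setup(
    name='mypackage',
    version='0.0.1',
    packages=['mypackage'],
    entry_points={
        # installs a `mycmd` executable that calls main() from mypackage/cli.py
        'console_scripts': [
            'mycmd = mypackage.cli:main',
        ],
    },
)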
Minimal project layout:
~/mypackage/
    pyproject.toml
    setup.cfg  # or setup.py
    mypackage/__init__.py
Then build the distributions with:
python -m build
A fuller layout, from the official packaging tutorial (https://packaging.python.org/en/latest/tutorials/packaging-projects/):
packaging_tutorial/
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.cfg
├── src/
│ └── example_package/
│ ├── __init__.py
│ └── example.py
└── tests/
- autosummary, an extension for the Sphinx documentation tool.
- autodoc, a Sphinx-based processor that processes/allows reST doc strings.
- pdoc, a simple Python 3 command line tool and library to auto-generate API documentation for Python modules. Supports Numpydoc / Google-style docstrings, doctests, reST directives, PEP 484 type annotations, custom templates ...
- pdoc3, a fork of pdoc for Python 3 with support for Numpydoc / Google-style docstrings, doctests, LaTeX math, reST directives, PEP 484 type annotations, custom templates ...
- PyDoc, a documentation browser (in HTML) and/or an off-line reference manual. Also in the standard library as pydoc.
- pydoctor, a replacement for the now-inactive Epydoc, born for the needs of the Twisted project.
- Doxygen can create documentation in various formats (HTML, LaTeX, PDF, ...) and you can include formulas in your documentation (great for technical/mathematical software). Together with Graphviz, it can create diagrams of your code (inheritance diagram, call graph, ...). Another benefit is that it handles not only Python, but also several other programming languages such as C, C++, Java, etc.
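Most of these tools render what you put into docstrings. For example, a Google-style docstring (supported by pdoc, and by Sphinx via the napoleon extension) might look like this:
import math

def circle_area(radius: float) -> float:
    """Return the area of a circle.

    Args:
        radius: The circle radius; must be non-negative.

    Returns:
        The area, computed as pi * radius ** 2.

    Raises:
        ValueError: If radius is negative.
    """
    if radius < 0:
        raise ValueError('radius must be non-negative')
    return math.pi * radius ** 2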
Simple C function:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    FILE *fp = fopen("write.txt", "w");
    fputs("Real Python!", fp);
    fclose(fp);
    return 0;
}
And make it Python-compatible:
#include <Python.h>

static PyObject *method_fputs(PyObject *self, PyObject *args) {
    char *str, *filename = NULL;
    int bytes_copied = -1;

    /* Parse arguments */
    if (!PyArg_ParseTuple(args, "ss", &str, &filename)) {
        return NULL;
    }

    FILE *fp = fopen(filename, "w");
    bytes_copied = fputs(str, fp);
    fclose(fp);

    return PyLong_FromLong(bytes_copied);
}
static PyMethodDef FputsMethods[] = {
    {"fputs", method_fputs, METH_VARARGS, "Python interface for fputs C library function"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef fputsmodule = {
    PyModuleDef_HEAD_INIT,
    "fputs",
    "Python interface for the fputs C library function",
    -1,
    FputsMethods
};

/* The module also needs an initialization function so Python can import it */
PyMODINIT_FUNC PyInit_fputs(void) {
    return PyModule_Create(&fputsmodule);
}
Build it with a distutils setup.py:
from distutils.core import setup, Extension

def main():
    setup(name="fputs",
          version="1.0.0",
          description="Python interface for the fputs C library function",
          author="<your name>",
          author_email="your_email@gmail.com",
          ext_modules=[Extension("fputs", ["fputsmodule.c"])])

if __name__ == "__main__":
    main()
python3 setup.py install
>>> import fputs
>>> fputs.__doc__
'Python interface for the fputs C library function'
>>> fputs.__name__
'fputs'
>>> # Write to an empty file named `write.txt`
>>> fputs.fputs("Real Python!", "write.txt")
13
>>> with open("write.txt", "r") as f:
...     print(f.read())
...
Real Python!
https://realpython.com/build-python-c-extension-module/
cffi - C Foreign Function Interface for Python. Interact with almost any C code from Python, based on C-like declarations that you can often copy-paste from header files or documentation. https://cffi.readthedocs.io/en/latest/
from cffi import FFI

ffibuilder = FFI()

# cdef() expects a single string declaring the C types, functions and
# globals needed to use the shared object. It must be in valid C syntax.
ffibuilder.cdef("""
    float pi_approx(int n);
""")

# set_source() gives the name of the python extension module to
# produce, and some C source code as a string. This C code needs
# to make the declared functions, types and globals available,
# so it is often just the "#include".
ffibuilder.set_source("_pi_cffi",
    """
    #include "pi.h"   // the C header of the library
    """,
    libraries=['piapprox'])   # library name, for the linker

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)
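After the build script above has been run once (e.g. python pi_extension_build.py, whatever you named the file), the generated _pi_cffi extension can be imported like a normal module, assuming pi.h and the piapprox library really provide pi_approx as declared:
from _pi_cffi import ffi, lib

print(lib.pi_approx(1000))  # calls the C function through the generated binding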
SWIG is an interface compiler that connects programs written in C and C++ with scripting languages such as Perl, Python, Ruby, and Tcl. http://www.swig.org/exec.html
SIP is a collection of tools that makes it very easy to create Python bindings for C and C++ libraries. https://www.riverbankcomputing.com/static/Docs/sip/examples.html
https://www.boost.org/doc/libs/1_78_0/libs/python/doc/html/tutorial/index.html
Following C/C++ tradition, let's start with "hello, world". A C++ function:
char const* greet()
{
    return "hello, world";
}
can be exposed to Python by writing a Boost.Python wrapper:
#include <boost/python.hpp>

BOOST_PYTHON_MODULE(hello_ext)
{
    using namespace boost::python;
    def("greet", greet);
}
That's it. We're done. We can now build this as a shared library. The resulting DLL is now visible to Python. Here's a sample Python session:
>>> import hello_ext
>>> print(hello_ext.greet())
hello, world
https://docs.python.org/3/library/
Just read their names and short descriptions at least. You would be surprised how many tasks you can do with pure Python.
- math – This module provides access to the mathematical functions defined by the C standard. https://docs.python.org/3/library/math.html
- random – This module implements pseudo-random number generators for various distributions. https://docs.python.org/3/library/random.html
- re – This module provides regular expression matching operations similar to those found in Perl. https://docs.python.org/3/library/re.html
- sys – This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. It is always available. https://docs.python.org/3/library/sys.html
- os – This module provides a portable way of using operating system dependent functionality. If you just want to read or write a file see open(), if you want to manipulate paths, see the os.path module, and if you want to read all the lines in all the files on the command line see the fileinput module. For creating temporary files and directories see the tempfile module, and for high-level file and directory handling see the shutil module. https://docs.python.org/3/library/os.html
- time, datetime
- argparse – Parser for command-line options, arguments and sub-commands (a minimal example follows this list). https://docs.python.org/3/library/argparse.html
- optparse – Deprecated since version 3.2: The optparse module is deprecated and will not be developed further; development will continue with the argparse module.
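A minimal argparse sketch (the argument names are arbitrary):
import argparse

parser = argparse.ArgumentParser(description='Greet someone.')
parser.add_argument('name', help='who to greet')
parser.add_argument('--shout', action='store_true', help='uppercase the greeting')
args = parser.parse_args()

greeting = f'Hello, {args.name}!'
print(greeting.upper() if args.shout else greeting)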