I'm trying to understand how new instances of a Python class should be created when creation can happen either via the constructor or via the __new__ method. In particular, I notice that when using the constructor, the __init__ method is called automatically after __new__, while when invoking __new__ directly the __init__ method is not called automatically. I can force __init__ to be called when __new__ is called explicitly by embedding a call to __init__ within __new__, but then __init__ ends up being called twice when the instance is created via the constructor.
For example, consider the following toy class, which stores one internal property, namely a list object called data; it is useful to think of this as the start of a vector class.
    class MyClass(object):
        def __new__(cls, *args, **kwargs):
            # object.__new__ takes only the class on Python 3;
            # passing extra arguments raises TypeError there.
            obj = object.__new__(cls)
            obj.__init__(*args, **kwargs)
            return obj
        def __init__(self, data):
            self.data = data
        def __getitem__(self, index):
            return self.__new__(type(self), self.data[index])
        def __repr__(self):
            return repr(self.data)
A new instance of the class can be created either using the constructor (not actually sure if that is the right terminology in Python), something like
    x = MyClass(range(10))
or via slicing, which invokes __new__ in the __getitem__ method:
    x2 = x[0:2]
In the first case, __init__ will be called twice (once via the explicit call within __new__ and then again automatically), and once in the second case. Obviously I would like __init__ to be invoked only once in either case. Is there a standard way to do this in Python?
Note that in my example I could get rid of the __new__ method and redefine __getitem__ as
    def __getitem__(self, index):
        return MyClass(self.data[index])
but then this would cause a problem if I later want to inherit from MyClass, because a call like child_instance[0:2] would return an instance of MyClass, not of the child class.
First, some basic facts about __new__ and __init__:
- __new__ is a constructor.
- __new__ typically returns an instance of cls, its first argument.
- When a class is called and its __new__ returns an instance of cls, Python then calls __init__ on that instance.
- __init__ is an initializer. It modifies the instance (self) returned by __new__. It must return None, so it should not return self.
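As a rough illustration of the points above, here is a small sketch that records which hooks run. The class name Traced and the calls list are made up for this example and are not part of the question's code.

```python
calls = []  # records the order in which the hooks run


class Traced(object):  # hypothetical class, for illustration only
    def __new__(cls, value):
        calls.append("__new__")
        # object.__new__ takes only the class; the arguments go to __init__
        return object.__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


a = Traced(1)                    # normal call: __new__ runs, then __init__
order_via_call = list(calls)

del calls[:]
b = Traced.__new__(Traced, 2)    # direct call: only __new__ runs
order_via_new = list(calls)
has_value = hasattr(b, "value")  # False, because __init__ never ran on b
```

Calling the class goes through both hooks, while calling __new__ by hand leaves the instance uninitialized.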
When MyClass defines:
    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)  # only cls is passed; extra arguments fail on Python 3
        obj.__init__(*args, **kwargs)
        return obj
MyClass.__init__ gets called twice: once from calling obj.__init__ explicitly, and a second time because __new__ returned obj, an instance of cls. (Since the first argument to object.__new__ is cls, the instance returned is an instance of MyClass, so obj.__init__ calls MyClass.__init__, not object.__init__.)
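To make the double call visible, here is a minimal sketch that counts how often __init__ runs. Doubled and its init_calls counter are hypothetical stand-ins for the question's class, added only for this demonstration.

```python
class Doubled(object):  # hypothetical stand-in for the question's MyClass
    init_calls = 0      # class-level counter, added for this demonstration

    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)
        obj.__init__(*args, **kwargs)  # explicit call: first run of __init__
        return obj                     # an instance of cls, so Python runs __init__ again

    def __init__(self, data):
        type(self).init_calls += 1
        self.data = data


Doubled([1, 2, 3])                   # created via the constructor
calls_via_constructor = Doubled.init_calls

Doubled.init_calls = 0
Doubled.__new__(Doubled, [1, 2, 3])  # created via __new__ directly
calls_via_new = Doubled.init_calls
```

Here calls_via_constructor ends up as 2 and calls_via_new as 1, which matches the behaviour described in the question.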
The Python 2.2.3 release notes have an interesting comment that sheds light on when to use __new__ and when to use __init__:
"The __new__ method is called with the class as its first argument; its responsibility is to return a new instance of that class. Compare this to __init__: __init__ is called with an instance as its first argument, and it doesn't return anything; its responsibility is to initialize the instance. All this is done so that immutable types can preserve their immutability while allowing subclassing. The immutable types (int, long, float, complex, str, unicode, and tuple) have a dummy __init__, while the mutable types (dict, list, file, and also super, classmethod, staticmethod, and property) have a dummy __new__."
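For the immutable side, a sketch along these lines may help. Point is a hypothetical tuple subclass invented for this example; the idea is that a tuple's contents have to be supplied in __new__, because they can no longer be changed once the object exists.

```python
class Point(tuple):  # hypothetical immutable type, for illustration only
    def __new__(cls, x, y):
        # The contents are fixed here; by the time __init__ would run,
        # the tuple already exists and cannot be modified.
        return tuple.__new__(cls, (x, y))


p = Point(1, 2)
first, second = p  # behaves like an ordinary 2-tuple
```

No __init__ is defined here; the one inherited from tuple does nothing with the arguments.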
So, use __new__ to define immutable types, and use __init__ to define mutable types. While it is possible to define both, you should not need to do so.
Thus, since MyClass is mutable, you should only define __init__:
    class MyClass(object):
        def __init__(self, data):
            self.data = data
        def __getitem__(self, index):
            return type(self)(self.data[index])
        def __repr__(self):
            return repr(self.data)
    x = MyClass(range(10))
    x2 = x[0:2]
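This also takes care of the inheritance concern from the question. As a quick check, the sketch below adds a hypothetical subclass named Child, which is not part of the original code, and slices an instance of it.

```python
class MyClass(object):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        # type(self) is the runtime class, so a subclass gets its own type back
        return type(self)(self.data[index])

    def __repr__(self):
        return repr(self.data)


class Child(MyClass):  # hypothetical subclass, for illustration only
    pass


c = Child(list(range(10)))
part = c[0:2]  # built through Child(...), so __init__ runs exactly once
```

One assumption to keep in mind: type(self)(...) only works if every subclass keeps an __init__ that can be called with the single data argument.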