2019-09-29

2: Minimal Guide for Static Types Checking for Python with mypy

<The previous article in this series | The table of contents of this series | The next article in this series>

Meant to be sufficient (mostly), although minimal, while the mypy official document is not minimal and most introductions are not sufficient.

Topics


About: The Python programming language
About: static types checking
About: mypy

Starting Context


  • The reader has basic knowledge of Python.

Target Context


  • The reader will master a (presumably) minimal but sufficient technique for doing static types checking for Python.

Orientation


There is an article on how static types checking is craved for most cases.

There is an article on whether to use or not to use the mypy strict optional-checking feature.

There is an article on how to mend any mypy stub in the 'typeshed' repository.


Main Body


1: Some Notes


Hypothesizer 7
"By what right are you determining what are "minimal sufficient" for us?" is of course a very legitimate question.

I admit that my above expressions may sound obtrusive. Really accurate expressions tend to become lengthy, typical search engines show only an unreasonably short part of the title and the description, and most earthlings are incurably too impatient to read through as-long-as-required-to-be-accurate expressions; so, I regrettably made the choice to cut short my expressions.

To state facts-based-ly, it is minimal sufficient for ME, but "So what? What does it have to do with me?" will be an even more legitimate question.

So, my intention is that here is minimal sufficient technique for me, which (the technique) I hope will be (more or less) minimal sufficient also for some people, not that all the people should accept it as minimal sufficient.

The reason why such a minimal sufficient guide is demanded is that while on one hand, the official document of 'mypy' seems to offer too much I-am-not-interested-at-all-right-now information for me to want to read through (I am not blaming the document at all: I understand that 'to be comprehensive' is its mission), on the other hand, none of the 'mypy' introductions I have found on the internet offers some these-are-obviously-necessary information (I am not blaming the introductions at all: probably, 'to encourage the readers to begin to use mypy' is their mission, not 'to be a practical usage guide').

For example, most introductions touch just using some built-in types, but actually, the built-in types are not my primal concerns. . . . In fact, if a function argument is known to be a string, I will not worry much about it further, because the details of the 'str' type are well-known. . . . On the other hand, if a function argument is claimed to be a "dodo", I will worry very much about it, wondering "What, the hell, is "dodo"?! How is a "dodo" supposed to walk, quack, or whatever-k?". . . . So, such introductions are not any practical usage guide for me.

As another example, incorporating static types checking almost inevitably necessitates type casting: any statically typed programming language I know can do type casting, because it is indispensable. . . . But most introductions do not touch type casting, being not a practical usage guide for me.

And I include also 'generics' as a necessity: for me, it is an of-course-expected feature of any serious programming language.

Besides, there are some tricks that I have found to be indispensable in most cases.

By the way, I had better warn that the technique may depend on the versions of the concerned products (Python and mypy in this case).

While it is rather universal for software more or less, as static types checking for Python is relatively new, the tendency may be more prominent than usual.

My descriptions are based on Python 3.8 and mypy 0.780.


2: Annotating Any Variable or Any Function Return


Hypothesizer 7
When I annotate any variable, I do like this.

@Python Source Code
from collections import OrderedDict
from typing import Callable
from typing import Collection
from typing import Container
from typing import Dict
from typing import Iterable
from typing import Iterator
from typing import List
from typing import Optional
from typing import Set
from typing import Sized
from typing import Tuple

# The built-in non-container types Start
l_integer: int = 1
l_float: float = 1.1
l_bool: bool = True
l_string: str = "ABC"
l_bytes: bytes = b"ABC"
l_classAClass: type = ClassAA # 'ClassAA' is a class.
# The built-in non-container types End
# The built-in container types Start
l_list: List [int] = [1, 2]
l_set: Set [float] = {1.1, 2.2}
l_dictionary: Dict [bool, str] = {True: "ABC", False: "DEF"}
l_elementsOrderPreservedDictionary: "OrderedDict [bytes, int]" = OrderedDict ({b"DEF": 2, b"ABC": 1})
l_tuple: Tuple [float, ...] = (1.1, 2.2, 3.3, )
# The built-in container types End
# The user-defined class instance types
l_classA: "ClassA" = ClassA ("ABC") # 'ClassA' is a class with the one-string-argument constructor
# Making it optional
l_optionalInteger: Optional [int] = None
# The function types
l_function: Callable [ [int, int], Tuple [int, int]] = divmod
# Some implicit-interface types Start
l_iterable: Iterable [int] = [1, 2]
l_iterator: Iterator [int] = iter ([1, 2])
l_sized: Sized = [1, 2]
l_container: Container [int] = [1, 2]
l_collection: Collection [int] = [1, 2]
# Refer to 'https://mypy.readthedocs.io/en/stable/protocols.html' for the details of each implicit-interface type
# Refer to 'https://mypy.readthedocs.io/en/stable/protocols.html' for the other predefined so-called "protocol" types (I call them 'implicit-interface types').
# Some implicit-interface types End

Note that some types ("ClassA" and "OrderedDict [bytes, int]") are double-quoted. Actually, any type can be double-quoted. In fact, if a user-defined class type is used after it is fully defined (not inside the class definition), it does not have to be double-quoted. As I like the expressions to be consistent, I make it a rule to double-quote any user-defined class type anywhere.

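The reason why "OrderedDict [bytes, int]" has to be double-quoted, while 'Dict [bool, str]' does not have to, is that 'OrderedDict' is an ordinary class, while 'Dict' is an instance of 'typing._GenericAlias', and an ordinary class does not accept such a subscript expression at runtime (at least in the Python version I use). Double-quoting keeps the expression from being evaluated at runtime, while 'mypy' still reads it.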

The above notations are applicable to any variable, whether it is a class instance variable, a class variable, a local variable, a function argument, or whatever variable.

As for the first argument of any class instance method or any class method, the manual of 'mypy' and all the introductions I have read recommend not annotating it, but I am not content with that way: the stub (explained later) generator, 'stubgen', annotates such arguments as 'Any'.

'Any' is 'typing.Any' in full name and means that any operation that involves the variable or the function return is exonerated from static types checking, while such a treatment is something I never want. . . . In fact, that name is misleading: that name is as though the gist of the type is 'to take any datum', but that is not the point, although, in fact, the type takes any datum. . . . If my intention is to let a variable or a function return take any value, I should use 'object' instead.
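
The difference can be sketched as follows. The function names 'shoutAny' and 'shoutObject' are hypothetical, introduced only for this illustration. With 'Any', 'mypy' skips the check, so the mistake surfaces only at runtime; with 'object', 'mypy' would be expected to demand a narrowing such as 'isinstance' before any type-specific operation.

```python
from typing import Any

def shoutAny (a_value: Any) -> str:
    # 'mypy' does not check this call, because 'a_value' is 'Any'.
    return a_value.upper ()

def shoutObject (a_value: object) -> str:
    # A bare 'a_value.upper ()' here would be expected to be reported by 'mypy',
    # because 'object' has no 'upper' attribute; so, narrowing is required.
    if isinstance (a_value, str):
        return a_value.upper ()
    return str (a_value)

print (shoutObject ("abc"))
print (shoutObject (1))
try:
    shoutAny (1)
except AttributeError as l_error:
    print ("Not checked statically, failed at runtime: " + str (l_error))
```

So, 'object' accepts any datum but keeps the checking alive, which is the behavior I usually want.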

The stub generator, 'stubgen', gratuitously annotates any not-annotated argument as 'Any', without lifting any finger to import 'typing.Any', causing an error when using the stub (unless 'from typing import Any' is explicitly added by me). . . . Besides, annotating the argument as 'Any' allows any datum to be passed into the so-called "self" argument by directly calling the so-called "unbound method".

So, I make bold to annotate the first argument of any class instance method or any class method like this.

@Python Source Code
from typing import Type
from typing import TypeVar

l_classBoundByClassB = TypeVar ("l_classBoundByClassB", bound="ClassB")

class ClassB:
	def methodA (a_this: "ClassB", a_string: str) -> str:
		return a_string
	@classmethod
	def methodB (a_class: Type [l_classBoundByClassB], a_string: str) -> str:
		return a_string

Note that although any subclass instance of 'ClassB' may be passed into 'a_this', that is OK because the '"ClassB"' specification can take any subclass instance of 'ClassB'.

When I annotate any function return, well, in fact, it is already shown in the above example.


3: What 'typing.TypeVar' and 'typing.Type' Are and How to Use Them


Hypothesizer 7
In the previous section, I have used 'typing.TypeVar' and 'typing.Type'. What are they, exactly?

Well, 'typing.TypeVar' is a class whose instance is used in a type annotation and represents a type at each 'mypy' types checking. . . . Enigmatic? Let me clarify the meaning step by step.

As for the "whose instance is used in a type annotation" part, after a 'typing.TypeVar' instance is created, it is not meant to be used outside any type annotation. Let me see a wrong code example.

@Python Source Code
from typing import TypeVar

T = TypeVar ("T")

def functionA (a_type: TypeVar) -> None: # A wrong usage
	None

functionA (T) # A wrong usage

The 'typing.TypeVar' class itself (not its instance) cannot be used as any type in any type annotation. Certainly, if the type annotation, "TypeVar", is changed to 'object', the instance, 'T', can be passed into the function, 'functionA', but I have no idea how the instance can be useful outside any type annotation.

Let me see a correct (but not useful at all) example.

@Python Source Code
from typing import TypeVar

T = TypeVar ("T")

def functionB (a_type: T) -> object:
	return a_type

l_object1: object = functionB ("ABC") # The 1st 'functionB' call checking is done on this line.
l_object2: object = functionB (1)     # The 2nd 'functionB' call checking is done on this line.

Note that the instance, 'T', is used in the type annotation.

I see that 'functionB' is called twice and each call is checked by 'mypy'. In fact, at the 1st 'functionB' call checking, 'T' represents 'str'; at the 2nd 'functionB' call checking, 'T' represents 'int', which is what I mean by "instance represents a type at each 'mypy' types checking".

Well, how is that useful? . . . Actually, the above example is not useful at all, because I could just use 'object' instead of 'T' as the type annotation.

Let me see a useful example.

@Python Source Code
from typing import TypeVar

T = TypeVar ("T")

def functionC (a_type: T) -> T:
	return a_type

l_string: str = functionC ("ABC") # The 1st 'functionC' call checking is done on this line.
l_integer: int = functionC (1)    # The 2nd 'functionC' call checking is done on this line.

That example is useful because 'T' is used to interlock the type of the argument and the type of the return: passing a 'str' instance into the function makes the return type 'str', for example.
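
The same interlocking works when 'T' appears inside a container type. This is a minimal sketch, in which the function name 'lastOf' is hypothetical: the element type of the argument decides the return type at each call.

```python
from typing import Sequence
from typing import TypeVar

T = TypeVar ("T")

def lastOf (a_items: Sequence [T]) -> T:
    # The element type of the argument determines the return type.
    return a_items [-1]

l_lastString: str = lastOf (["ABC", "DEF"])
l_lastInteger: int = lastOf ((1, 2, 3))
# l_wrong: int = lastOf (["ABC", "DEF"])  # 'mypy' would be expected to report this line.
print (l_lastString, l_lastInteger)
```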

I can restrict the possible types that any 'TypeVar' instance can take, like these.

@Python Source Code
from typing import TypeVar

T = TypeVar ("T", bound="ClassA")     # can represent only 'ClassA' and its any descendant
U = TypeVar ("U", "ClassA", "ClassB") # can represent only 'ClassA' or 'ClassB'
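
To see the two kinds of restriction at work, here is a small sketch. All the class and function names ('Animal', 'Dog', 'sameAnimal', 'twice') are hypothetical, and the commented-out calls are the ones I would expect 'mypy' to report.

```python
from typing import TypeVar

class Animal:
    def name (a_this: "Animal") -> str:
        return "animal"

class Dog (Animal):
    def name (a_this: "Dog") -> str:
        return "dog"

B = TypeVar ("B", bound = "Animal")  # can represent 'Animal' or any descendant
C = TypeVar ("C", str, bytes)        # can represent only 'str' or 'bytes'

def sameAnimal (a_animal: B) -> B:
    # The bound guarantees that 'name' exists on 'a_animal'.
    a_animal.name ()
    return a_animal

def twice (a_text: C) -> C:
    # Both 'str' and 'bytes' support '+', so the body is valid for each.
    return a_text + a_text

l_dog: Dog = sameAnimal (Dog ())  # The return type stays 'Dog', not 'Animal'.
l_string: str = twice ("AB")
l_bytes: bytes = twice (b"AB")
# sameAnimal ("ABC")  # expected to be reported: 'str' is not an 'Animal'.
# twice (1)           # expected to be reported: 'int' is neither 'str' nor 'bytes'.
print (l_dog.name (), l_string, l_bytes)
```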

So, what is 'typing.Type'? 'typing.Type' is a generic type that can be used in any type annotation: 'Type [C]' is the type of the class 'C' itself (or of any subclass of 'C'), not the type of the instances of 'C'.

Let me see an example.

@Python Source Code
from typing import Type
from typing import TypeVar

class ClassC:
	def __init__ (a_this: "ClassC", a_string: str) -> None:
		a_this.i_string: str
		
		a_this.i_string = a_string
	
	def methodA (a_this: "ClassC") -> None:
		print ("From ClassC " + a_this.i_string)

class ClassCA (ClassC):
	def __init__ (a_this: "ClassCA", a_string: str) -> None:
		ClassC.__init__ (a_this, a_string)
	
	def methodA (a_this: "ClassCA") -> None:
		print ("From ClassCA " + a_this.i_string)

T = TypeVar ("T", bound="ClassC")

def functionD (a_type: Type [T], a_string: str) -> T:
	return a_type (a_string)

l_classC: ClassC = functionD (ClassC, "ABC")
l_classCA: ClassCA = functionD (ClassCA, "ABC")

I see why the type annotation of 'a_type' is 'Type [T]', not 'T': if it was 'T', the argument would take any instance of 'ClassC' or 'ClassCA', but not 'ClassC' or 'ClassCA' itself.
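
The same pairing of 'Type [T]' and 'T' is handy for a class method that works as a factory. This is a sketch under my own assumptions: the names 'Shape', 'Circle', and 'create' are hypothetical.

```python
from typing import Type
from typing import TypeVar

T = TypeVar ("T", bound = "Shape")

class Shape:
    def __init__ (a_this: "Shape", a_label: str) -> None:
        a_this.i_label: str

        a_this.i_label = a_label

    @classmethod
    def create (a_class: Type [T], a_label: str) -> T:
        # 'a_class' is the class itself, so calling it makes an instance of that class.
        return a_class (a_label)

class Circle (Shape):
    pass

l_shape: Shape = Shape.create ("ABC")
l_circle: Circle = Circle.create ("DEF")  # typed as 'Circle', not just 'Shape'
print (type (l_shape).__name__, type (l_circle).__name__)
```

Because the return type is 'T' and not '"Shape"', calling 'create' on the subclass does not require any type casting.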


4: Type Casting


Hypothesizer 7
As static types checking is incorporated, type casting becomes almost inevitable.

Let me look at an example.

@Python Source Code
from typing import Dict

class ClassD:
	def __init__ (a_this: "ClassD", a_name: str) -> None:
		a_this.i_name: str
		
		a_this.i_name = a_name
	
	def methodA (a_this: "ClassD") -> str:
		return "From ClassD " +  a_this.i_name

class ClassDA (ClassD):
	def __init__ (a_this: "ClassDA", a_name: str) -> None:
		ClassD.__init__ (a_this, a_name)
	
	def methodAA (a_this: "ClassDA") -> str:
		return "From ClassDA " +  a_this.i_name

class ClassDB (ClassD):
	def __init__ (a_this: "ClassDB", a_name: str) -> None:
		ClassD.__init__ (a_this, a_name)
	
	def methodAB (a_this: "ClassDB") -> str:
		return "From ClassDB " +  a_this.i_name

l_dictionary1: Dict [str, "ClassD"] = {"Key1": ClassDA ("Name1"), "Key2": ClassDB ("Name2"), "Key3": ClassDB ("Name3"), "Key4": ClassDA ("Name4")}

As the dictionary has to contain both 'ClassDA' instances and 'ClassDB' instances, it has the type, 'Dict [str, "ClassD"]', but if that means that any retrieved value cannot be treated as any 'ClassDA' instance or any 'ClassDB' instance, . . . such a programming language will be unacceptable (at least is unacceptable for me). . . . So, type casting is inevitable.

Relievedly, type casting is possible for 'mypy'.

This code, which is a continuation of the above code, will be enough for understanding how to use type casting.

@Python Source Code
~
from typing import cast

l_elementKey: str
l_elementValue: "ClassD"
for l_elementKey, l_elementValue in l_dictionary1.items ():
	if isinstance (l_elementValue, ClassDA):
		l_classDA: "ClassDA" = cast ("ClassDA", l_elementValue)
		print (l_classDA.methodAA ())
	if isinstance (l_elementValue, ClassDB):
		l_classDB: "ClassDB" = cast ("ClassDB", l_elementValue)
		print (l_classDB.methodAB ())

However, it will be important to know that 'mypy' types casting is quite magnanimous: it allows any casting. . . . For example, this casting is allowed.

@Python Source Code
from typing import cast

l_string: str = "ABC"
l_integer: int = 1

l_integer = cast (int, l_string)

Of course, this is checked, but what is checked is not the casting, but the assignment.

@Python Source Code
~
l_integer = cast (float, l_string)

To be sure, as 'mypy' types casting is checked completely statically, no runtime types inconsistency will be flagged, unlike in Java. For example, this works happily with an output, 'ABC'!

@Python Source Code
from typing import cast

l_object: object = "ABC"
l_integer: int = cast (int, l_object)
print (str (l_integer))
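
If a Java-like runtime check is wanted, a small helper can be written. This is only a sketch: 'checkedCast' is a hypothetical function of my own, not anything 'mypy' or 'typing' offers, and I would expect 'mypy' to accept the 'isinstance' narrowing inside it.

```python
from typing import Type
from typing import TypeVar

T = TypeVar ("T")

def checkedCast (a_type: Type [T], a_object: object) -> T:
    # Unlike 'typing.cast', this verifies the runtime type before returning.
    if not isinstance (a_object, a_type):
        raise TypeError ("Expected " + a_type.__name__ + ", got " + type (a_object).__name__)
    return a_object

l_object: object = "ABC"
l_string: str = checkedCast (str, l_object)
print (l_string)
try:
    checkedCast (int, l_object)
except TypeError as l_error:
    print (str (l_error))
```

The cost is one 'isinstance' call per casting, which 'typing.cast' avoids by doing nothing at runtime.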


5: Generics


Hypothesizer 7
Generics is also a necessity for me.

Well, the manual of 'mypy' addresses generics to some extent, but the treatment is a little disappointing to me.

Why? . . . As for generics function, that manual is on the undoubting assumption that at least one of the arguments depends on each type parameter, which is not always the case.

This is a piece of code that conforms to the assumption.

@Python Source Code
from typing import List
from typing import TypeVar

T = TypeVar ("T")

def functionE (*a_items: T) -> List [T]:
	l_list: List [T] = []
	l_item: T
	for l_item in a_items:
		l_list.append (l_item)
	return l_list

l_list1: List [str] = functionE ("ABC", "DEF")
l_list2: List [object] = functionE ("ABC", "DEF") # This does not cause any error, surprisingly.

Um? I expected that the assignment to 'l_list2: List [object]' would cause an error (remember that 'List [object]' is neither a supertype nor a subtype of 'List [str]' (the article is on Java array, but the same reasoning applies here)), but that was not the case. . . . Well, surprisingly (not particularly pleasantly for me), 'mypy' seems to take into account also the assigned-to variable type when it determines 'T'.

However, that does not entirely dissipate my concern because this still causes an error.

@Python Source Code
functionE ("ABC", "DEF").append (1) # Still, this causes an error.

In fact, I want 'functionE' to return a 'List [object]' this time (not always, of course: if it is always so, there will be no necessity for generics), but that 'functionE' call returns a 'List [str]', understandably. . . . Note that that is a legitimate demand because 'functionE' is supposed to just initialize the list, not finalize the list: all the initial elements happened to be strings, but the list has to accept also non-string elements hereafter.

Namely, the return type has to be parameterized directly, not to be determined by the arguments.

In Java, or whatever programming language that supports generics I know, the type parameter value can be explicitly specified, which enables me to specify the return type independently of the argument types, but as Python does not allow that, the only possible solution seems to be to add an argument (or some arguments if there are multiple type parameters) that takes a type. This is an example.

@Python Source Code
from typing import List
from typing import Type
from typing import TypeVar

T = TypeVar ("T")

def functionF (a_type0: Type [T], *a_items: T) -> List [T]:
	l_list: List [T] = []
	l_item: T
	for l_item in a_items:
		l_list.append (l_item)
	return l_list

l_list1: List [str] = functionF (str, "ABC", "DEF")
l_list2: List [object] = functionF (object, "ABC", "DEF")
l_list3: List [object] = functionF (str, "ABC", "DEF") # I am not happy about this behavior, but that seems unpreventable.
functionF (object, "ABC", "DEF").append (1) # This does not cause any error.
functionF (str, "ABC", "DEF").append (1) # An error, rightfully.

Note that specifying the 'object' type parameter argument value prevents 'T' from shrinking into 'str' even if all the 'a_items' arguments are strings.

When I need to parameterize a whole class, not only some methods, I can do like this.

@Python Source Code
from typing import Generic
from typing import TypeVar

T = TypeVar ("T")

class ClassE (Generic [T]):
	def __init__ (a_this: "ClassE", a_t: T) -> None:
		a_this.i_t: T
		
		a_this.i_t = a_t
	
	def methodA (a_this: "ClassE") -> T:
		return a_this.i_t
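
Using such a class looks like this. This is a sketch with a hypothetical two-parameter class, 'Pair'; note that I annotate the first argument as '"Pair [K, V]"' here, which is my assumption about a more precise form than the bare class name, not something the above example requires.

```python
from typing import Generic
from typing import TypeVar

K = TypeVar ("K")
V = TypeVar ("V")

class Pair (Generic [K, V]):
    def __init__ (a_this: "Pair [K, V]", a_key: K, a_value: V) -> None:
        a_this.i_key: K
        a_this.i_value: V

        a_this.i_key = a_key
        a_this.i_value = a_value

    def getKey (a_this: "Pair [K, V]") -> K:
        return a_this.i_key

    def getValue (a_this: "Pair [K, V]") -> V:
        return a_this.i_value

# The type parameters are fixed by the annotation of the variable.
l_pair: "Pair [str, int]" = Pair ("ABC", 1)
l_key: str = l_pair.getKey ()
l_value: int = l_pair.getValue ()
# l_wrong: int = l_pair.getKey ()  # expected to be reported: 'str' is not 'int'.
print (l_key, l_value)
```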


6: Installing 'mypy'


Hypothesizer 7
As I have learned how to annotate source files, let me move to how to invoke checking.

I can install 'mypy' by this command (after 'pip' has been installed).

@bash Source Code
python3 -m pip install mypy


7: Some Tips for Using 'mypy'


Hypothesizer 7
Namespace packages are indispensable for me. That is because I place almost all of my Python code (in multiple projects) under the 'theBiasPlanet' package. . . . I think that that is a standard practice: a company or a person places its or his or her code under a single package that represents the company or the person, in order to avoid any duplication of module names with code by the other entities. I do not understand why following such a standard practice should require any special treatment, actually . . .

Anyway, when any namespace package is used, I have to specify the '--namespace-packages' flag to the 'mypy' command execution.

'stub' is a skeleton of a module, which can be used by 'mypy' in order for 'mypy' to know what the classes, functions, etc. in the module are like. . . . It resembles 'C++ header file' in a sense.

For example, these are a module and its stub.

@Python Source Code
from datetime import datetime
from typing import Generic
from typing import TypeVar

T = TypeVar ("T")

class ClassF (Generic [T]):
	def __init__ (a_this: "ClassF", a_t: T) -> None:
		a_this.i_t: T
		a_this.i_datetime: datetime
		
		a_this.i_t = a_t
		a_this.i_datetime = datetime.now ()
	
	def methodA (a_this: "ClassF") -> T:
		return a_this.i_t

@Python stub Source Code
from datetime import datetime
from typing import Generic
from typing import TypeVar


T = TypeVar('T')

class ClassF (Generic [T]):
    def __init__ (a_this: "ClassF", a_t: T) -> None:
        a_this.i_t: T
        a_this.i_datetime: datetime
        ...
    def methodA(a_this: ClassF) -> T: ...


In the stub, the bodies of the methods are replaced with "...".

Well, . . . do I have to create a thing like a stub? . . . In fact, not necessarily: I can just make 'mypy' read the source file instead of any stub file. . . . However, if I do not intend to distribute the source files, but the compiled 'pyc' files, I will have to distribute the stubs because 'mypy' cannot read the compiled files.

So, do I have to create the stubs manually? . . . 'mypy' includes a stub generator, 'stubgen', . . . but that is not something that creates the stubs satisfactorily.

Actually, this is the stub file that 'stubgen' generates against the above source code.

@Python stub Source Code
# Stubs for ClassF (Python 3)
#
# NOTE: This dynamically typed stub was automatically generated by stubgen.

from typing import TypeVar

T = TypeVar('T')

class ClassF:
    def __init__(a_this: ClassF, a_t: T) -> None: ...
    def methodA(a_this: ClassF) -> T: ...

. . . That is unsatisfactory because 1) the definitions of the instance member variables are gone and 2) the 'Generic [T]' superclass designation is gone.

As for 1), it is not OK because any subclass of 'ClassF' that (the subclass) accesses those instance member variables will be flagged as an error by 'mypy', even if those instance member variables are not directly accessed from the outside of 'ClassF'. . . . So, the first several lines of the '__init__' method have to be preserved in the stub, which requires also the 'from datetime import datetime' line to be preserved.

As for 2), it is not OK, without any necessity of explanation.

In fact, my Gradle scripts included in my samples cited in this site automatically transform such unsatisfactory stubs to less unsatisfactory stubs based on some source file rules, to which my code conforms. The rules are 1) the 'import' lines are written from the top line consecutively (without any blank line in the block but a blank line after the block; lines with the beginning '#' are allowed; 'if ' and 'else ' lines are allowed) and 2) the class instance member variables are defined in the first consecutive lines of the '__init__' method and a blank line follows the block.

The modules paths 'mypy' searches are set in the 'MYPYPATH' environment variable.

This is a command format for checking any individual file (I need to know how to check any individual file, because I cannot afford to check the whole source code when only one of the modules is modified).

@bash Source Code
mypy %the source file path%

This is a command format for generating the stub of any individual file using the 'stubgen' command (I need to generate the stub per module, because I cannot afford to re-generate the stubs of the whole source code when only one of the modules is modified).

@bash Source Code
stubgen -o %the output directory%  %the source file path%

Note that any stub should have the file name extension of 'pyi'.

What if a third-party library does not offer the source code with any type annotation or any stub file?

Well, I can create some stubs manually if I want to, but probably, I will give up checking my source code against the third-party library.

Here is how to do it: create a 'mypy.ini' file with some specifications in it and specify the file in the 'mypy' execution with the '--config-file' flag.

This is a sample 'mypy.ini' file, where 'aaa.Aaa' and 'aaa.Bbb' are the modules to be ignored.

@'mypy' configuration Source Code
[mypy]

[mypy-aaa.Aaa]
ignore_missing_imports = True

[mypy-aaa.Bbb]
ignore_missing_imports = True


8: The Conclusion and Beyond


Hypothesizer 7
Now, I seem to have practically sufficient technique for doing static types checking in Python.

Of course, I do not claim that it is absolutely sufficient, but I feel it is sufficient for doing static types checking in most cases.

When I find more special techniques, I will report them in this series.

