Python Type Checker Comparison: Empty Container Inference (pyrefly.org)
Boxxed 1 days ago [-]
My favorite part about the type annotations in python is that it steers you into a sane subset of the language. I feel like it's kind of telling that python is this super dynamic language but the type annotations aren't powerful enough to denote all that craziness.
reubenmorais 1 days ago [-]
That's nice if you're starting from scratch, but if you have existing code to deal with, you don't have the privilege of ignoring the insane subset.
yunnpp 22 hours ago [-]
The type hints are not even enforced at runtime. They are mostly documentation.
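A minimal sketch of what "not enforced at runtime" means in practice. The function name `double` is made up for illustration; the point is only that the interpreter stores the annotations but does not check them.

```python
def double(n: int) -> int:
    # The annotations say "int", but nothing checks them when the call happens.
    return n * 2


# Passing a str runs without error: str * int repeats the string.
result = double("ab")
print(result)  # abab

# The hints are still available as plain data for tools that want them.
print(double.__annotations__["n"])  # <class 'int'>
```

A static type checker would flag the `double("ab")` call; the interpreter itself does not.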
LtWorf 9 hours ago [-]
They can be used at runtime though. I wrote typedload, to load external data (json/bson/yaml) into python typed objects. In this way you know that if the data doesn't match the expectations you will have an exception at a specific point in the code, and after that it's safe to use the objects, rather than having to manually check at every access.

Now there are several other libraries that do this thing, but at the time (python3.5 and 3.6) it was the only option.

yunnpp 5 hours ago [-]
That seems to handle deserialization? But would it protect you from assigning a value of the wrong type to the object later on?
maleldil 1 hours ago [-]
That depends on what you're using. If you're using Pydantic, which lets you define a struct-like data type with validation, you can tell it to validate assignments as well [1]. Or you can set the class as frozen and forbid assignment entirely [2].
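As a rough illustration of those two options, here is a standard-library sketch. It is not Pydantic's API: the `Point` and `FrozenPoint` classes and the `__setattr__` check are assumptions made for this example, and the check only handles plain classes such as `int` or `str`.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import get_type_hints


@dataclass
class Point:
    x: int
    y: int

    def __setattr__(self, name, value):
        # Hypothetical validation hook: compare the value against the annotation.
        expected = get_type_hints(type(self)).get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        super().__setattr__(name, value)


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


p = Point(1, 2)
try:
    p.x = "one"  # rejected by the check above
except TypeError as exc:
    assign_error = str(exc)

fp = FrozenPoint(1, 2)
try:
    fp.x = 5  # frozen dataclasses forbid any assignment
    frozen_blocked = False
except FrozenInstanceError:
    frozen_blocked = True
```

After this runs, `p.x` is still `1` and `frozen_blocked` is `True`.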

However, if you mean annotating a local variable with a type, then no, nothing will stop it at runtime. If you use a type checker, though, it will tell you that statically.

The ecosystem also offers other runtime validation options, such as beartype [3]. For example, you can annotate a function such that it always checks the data types of input parameters when called. You can even apply this to a whole module if you want, but I don't think that's commonly done.
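To make the idea concrete, here is a hand-rolled sketch of a decorator that checks argument types on every call. It is not beartype and does not reflect how beartype is implemented; `check_args` and `repeat` are hypothetical names, and only plain classes in annotations are handled.

```python
import functools
import inspect
from typing import get_type_hints


def check_args(func):
    """Check each argument against its annotation at call time (plain classes only)."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_args
def repeat(text: str, times: int) -> str:
    return text * times


ok = repeat("ab", 2)  # "abab"
try:
    repeat("ab", "2")
    call_error = None
except TypeError as exc:
    call_error = str(exc)
```

Because the check is a plain `isinstance`, `repeat("ab", True)` would also pass: `bool` is a subclass of `int`.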

[1] https://docs.pydantic.dev/latest/api/config/#pydantic.config...

[2] https://docs.pydantic.dev/latest/api/config/#pydantic.config...

[3] https://beartype.readthedocs.io/en/latest/eli5/

LtWorf 49 minutes ago [-]
Checking types on all function calls adds a considerable amount of extra work that I personally am not willing to pay, especially since static type checkers exist.
LtWorf 5 hours ago [-]
It must be used in combination with a static checker to be useful.

So you can do like a = typedload.load(json_data, int) and then "a" is considered to be an int and at runtime will be an int.

Of course your static checker should prevent you from doing a + "string" later on because that would fail.
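The "validate once at the boundary, then trust the types" pattern can be sketched with the standard library alone. This is not typedload's implementation; `load_typed` and `User` are hypothetical names, and the sketch only supports dataclasses whose fields are plain classes.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_type_hints


def load_typed(value: Any, type_: type) -> Any:
    """Hypothetical minimal loader: return value as type_ or raise TypeError."""
    if is_dataclass(type_):
        if not isinstance(value, dict):
            raise TypeError(f"expected an object for {type_.__name__}")
        hints = get_type_hints(type_)
        # Recursively validate each field; a missing key raises KeyError.
        kwargs = {f.name: load_typed(value[f.name], hints[f.name]) for f in fields(type_)}
        return type_(**kwargs)
    if not isinstance(value, type_):
        raise TypeError(f"expected {type_.__name__}, got {type(value).__name__}")
    return value


@dataclass
class User:
    name: str
    age: int


user = load_typed(json.loads('{"name": "Ada", "age": 36}'), User)

try:
    load_typed(json.loads('{"name": "Ada", "age": "36"}'), User)
    load_error = None
except TypeError as exc:
    load_error = str(exc)
```

The failure happens at one known point (the load call), and code after it can use `user.age` as an `int` without re-checking.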

loevborg 1 days ago [-]
FWIW, Typescript is using Strategy 2: https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

I'm a bit confused by the fact that the array starts out typed as `any[]` (e.g. if you hover over the declaration) but then, later on, the type gets refined to `(string | number)[]`. IMO it would be nicer if the declaration already showed the inferred type on hover.

sheept 1 days ago [-]
I agree, it's always been unsettling to see any[] on hover, even though it gets typed in the end.

I think one reason might be to allow the type to be refined differently in different code paths. For example:

    function x () {
        let arr = []
        if (Math.random() < 0.5) {
            arr.push(0)
            return arr
        } else {
            arr.push('0')
            return arr
        }
    }
In each branch, arr is typed as number[] and string[], respectively, and x's return type is number[] | string[]. If it decided to retroactively infer the type of arr at declaration, then I'd imagine x's return type would be the less specific (number | string)[].
bastawhiz 1 days ago [-]
It depends on your tsconfig. An empty array could be typed as never[], forcing you to annotate it.
wk_end 1 days ago [-]
I don't believe this is correct. There are no settings that correspond to that AFAIK, and it'd actually be quite bad, because you could access the empty array and then get a `never` object, which you're not supposed to be able to do.

https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABM...

`unknown[]` might be more appropriate as a default, but TypeScript does you one better: with OP's settings, although it's typed as `any[]`, it'll error out if you don't do anything to give it more information because of `noImplicitAny`.

loevborg 1 days ago [-]
Which setting specifically? Can you repro in the typescript playground?
tl2do 1 days ago [-]
Is there a compile-to-Python language with built-in type safety, similar to how TypeScript transpiles to JavaScript? I'm aware of Mojo and mypyc, but those compile to native code/binaries, not Python source.
exyi 1 days ago [-]
Python does not need that, as it has built-in type annotation support. The annotation is any expression, so you can in theory express anything a custom type-only language would allow you (although you could make it less verbose and easier to read).

However, IMHO it just works much worse than TS because:

* many libraries still lack decent annotations
* other libraries are impossible to type because of too much dynamic stuff
* Python semantics are multiple orders of magnitude more complex than JavaScript's. Even just the simplest question: Is `1` allowed in a parameter typed `float`? What about numpy float64?
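The `float` question has a slightly surprising answer: static type checkers accept an `int` where `float` is annotated (a special case in the typing rules), even though `int` is not a subclass of `float` at runtime. A small sketch, with `half` as a made-up function name:

```python
def half(x: float) -> float:
    return x / 2


# Type checkers accept an int argument here, and it also works at runtime.
from_int = half(1)
print(from_int)  # 0.5

# Yet at runtime an int is not an instance of float.
int_is_float = isinstance(1, float)
print(int_is_float)  # False
```

So the static rule and the runtime `isinstance` check disagree, which is the kind of mismatch the comment is pointing at.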

tl2do 1 days ago [-]
Thanks for helping me understand. I wasn't aware of Python's type annotation support. I did some quick research and learned that type annotations don't cause compile errors even when there are type errors. Is that why type checkers like Pyrefly exist?
linsomniac 23 hours ago [-]
Correct, currently in Python the type checking is implemented more in a linting phase than in a compiling or runtime phase. Though you can also get it from editors that do LSP, they'll show you type errors while editing the code.
tl2do 23 hours ago [-]
Thanks linsomniac and exyi. I didn't realize Python's type hints are checked by linters, not the compiler. Learned something today.
LtWorf 6 hours ago [-]
Yes, but there are also runtime type checkers that can be used to check that input data conforms to the expected types (aka a schema but defined using python types and classes).
ajb 22 hours ago [-]
The only language I'm aware of that's a bit like that is RPython, but it's the other way round: designed for Python to compile to it. If you think about it, you get more benefit from the typed language being the base one, as the compiler or JIT can make more assumptions, producing faster code. TypeScript had no alternative but to do it the other way, since it's a lot harder to get things adopted into the browser than to ship them independently.
sakesun 22 hours ago [-]
You can compile F# to Python with Fable https://github.com/fable-compiler/Fable.Python
jez 1 days ago [-]
A more complicated version of this problem exists in TypeScript and Ruby, where there are only arrays. Python’s case is considerably simpler by also having tuples, whose length is fixed at the time of assignment.

In Python, `x = []` should always have a `list[…]` type inferred. In TypeScript and Ruby, the inferred type needs to account for the fact that `x` is valid to pass to a function which takes the empty tuple (empty array literal type) as well as a function that takes an array. So the Python strategy #1 in the article of defaulting to `list[Any]` does not work because it rejects passing `[]` to a function declared as taking `[]`.

dupdrop 21 hours ago [-]
Only Python is a language soooo dynamic that the question "Does this code type-check?" may get the valid response: "With which of the 5 existing type checkers?"
Daishiman 21 hours ago [-]
It's actually a fairly frequent fact of programming language development that type resolution can change across versions. Haskell famously has a ton of extensions that enhance the type system in various, potentially incompatible ways.

In fact, for sufficiently expressive type systems, the question of whether a piece of code type-checks is itself undecidable.

Sinidir 1 days ago [-]
In the example given in the article, I think the correct behavior would have been to infer the type backwards from the return type of the function. Is that not why mypy actually errors here?
ocamoss 24 hours ago [-]
If you're referring to the `first_three_lines` example in strategy 3, Mypy would give the same error even if we changed the return value to something unrelated like `return ["something"]`.
electroglyph 1 days ago [-]
my wishlist for pyrefly: when using decorated functions, show the underlying type hints instead of the decorators
IshKebab 1 days ago [-]
I think it would be worth mentioning that in normal use (strict mode) Pyright simply requires you to add type annotations to the declaration. Occasionally mildly annoying but IMO it's clearly the best option.
veber-alex 1 days ago [-]
It's not "mildly annoying".

I don't enable strict mode on multiple projects because people don't want to type anything outside of function signatures.

Inferring the type from the first use is 100% the correct choice because this is what users want 99% of the time, for the rest you can provide type information.

maleldil 23 hours ago [-]
Annotating empty collections is one of the few places you need to annotate outside function signatures. It's not a big deal. It doesn't happen that often.
nomel 20 hours ago [-]
And, when it does, you can just put them when the empty container is assigned:

    things: set[tuple[str, str, int]] = set()
    users: list[User] = []
Many people don't seem to know this exists.
maleldil 1 hours ago [-]
Yes, that's what I was referring to. I get that Pyrefly wanted to advertise their approach here, but it's weird that they didn't at least acknowledge this. It's what I use because it works on every type checker, and I don't need to rely on their particular implementation for this.

In fact, I recently migrated a project from Pyright to Pyrefly for performance reasons, and there was very little I had to change between the two. The most annoying thing was Pyrefly's lack of exhaustive pattern matching for StrEnum and Literal[...]

ocamoss 12 minutes ago [-]
It's acknowledged at the end of the "infer any" strategy, but perhaps worded poorly.

> To improve type safety in these situations, type checkers that infer Any for empty containers can choose to generate extra type errors that warn the user about the insertion of an Any type. While this can reduce false negatives, it burdens developers by forcing them to explicitly annotate every empty container in order to silence the warnings.

ie: "type checkers that don't infer container types can emit an error and require users to annotate"

ocamoss 24 hours ago [-]
Requiring the annotations on empty containers is the only way to have type safety if the type checker cannot infer the type of the container, like Pyright.

If the type checker can infer a type then the annotation would only be required if the inferred type doesn't match the user's intent, which means one would need to add fewer annotations to an arbitrary working-but-unannotated program to satisfy the type checker.
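A small sketch of the two situations being contrasted. The function names are invented for this example, and the comments describe the inference strategies discussed in this thread rather than verified output from any particular checker.

```python
def collect_nonempty(text: str):
    # Unannotated empty list: the element type is unknown at this point.
    # A checker that infers from first use could settle on list[str] below;
    # one that does not may fall back to an unknown/Any element type.
    lines = []
    for line in text.splitlines():
        if line:
            lines.append(line)  # first use inserts a str
    return lines


def collect_nonempty_annotated(text: str) -> list[str]:
    # Explicit annotation: every checker sees the same declared type.
    lines: list[str] = []
    for line in text.splitlines():
        if line:
            lines.append(line)
    return lines


inferred = collect_nonempty("a\n\nb")
annotated = collect_nonempty_annotated("a\n\nb")
```

Both functions behave identically at runtime; only the amount of information handed to the checker differs.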

IshKebab 14 hours ago [-]
Yes but also having more complicated type inference makes the actual type checking less useful as a check. You see that in languages with global type inference too.

Adding explicit types strategically (e.g. in function signatures) tells the compiler (and readers) explicitly what the type should be, so if you add code that violates that it gives you an error instead of silently inferring a different type.

brainzap 1 days ago [-]
In early TypeScript I was too lazy and just set an initial value and then zeroed the list
curiousgal 1 days ago [-]
I can't help but find type hints in python to be..goofy? I have a colleague who has a substantial C++ background and now working in python, the code is just littered with TypeAlias, Generic, cast, long Unions etc.. this can't be the way..
tialaramex 1 days ago [-]
Typing is a relatively easy way for the human author and the machine to notice if they disagree about what's going on before problems arise. It is unfortunate that Python doesn't do a good job with types, I was reading earlier today about the mess they made of booleans - their bool type is actually just the integers again.
nubg 1 days ago [-]
> I was reading earlier today about the mess they made of booleans

Can you elaborate on that?

tech2 1 days ago [-]
It's not entirely fair.

Prior to 2.3 Python didn't have booleans, just "truthiness". In 2.3 they added the Boolean class as a subclass of int (because of patterns of development it was a pragmatic choice). True and False were introduced, but they were able to be reassigned which could cause all manner of fun. 3.x made them keywords which put a stop to that but the int aspect remained.

tialaramex 1 days ago [-]
Because Python decided that (for the usual New Jersey reason, simplicity of implementation) bool should just be an integer type, the Liskov criterion comes into play. If we can X an integer and we've agreed bool is an integer => we can X a bool. That's not what booleans are but hey, it's sorta close and this was easier to implement.

So, can we add two bools together? Adding booleans together is nonsense, but we've said these are a kind of integer so sure, I guess True + True = 2 ? And this cascades into nonsense like ~True being a valid operation in Python and its result is true...
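These claims are easy to check in an interpreter. The sketch below uses `~int(True)` instead of `~True` so that it computes the same underlying integer result without relying on inversion of a `bool` directly; the variable names are only for illustration.

```python
# bool really is a subclass of int.
bool_is_int = issubclass(bool, int)
print(bool_is_int)  # True

# Integer arithmetic therefore applies to bools.
total = True + True
print(total)  # 2

# Bitwise inversion of the underlying integer 1 gives -2 ...
inverted = ~int(True)
print(inverted)  # -2

# ... and -2 is nonzero, so it is truthy.
print(bool(inverted))  # True
```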

unanimous 1 days ago [-]
Out of curiosity, I tried running `~True` in a Python 3.14.2 repl and got this output (the -2 is part of the output):

    >>> ~True
    <python-input-1>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
    -2

tialaramex 4 hours ago [-]
Yes, the article I was reading was about proposals to er, undeprecate this feature. Reasoning that well, sure it's obviously a footgun - but it works for integers and we've said bools are integers so...
__mharrison__ 1 days ago [-]
This is actually useful in pandas. It enables asking questions like "what percent of cars get greater than 40 mpg?"
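The same trick works without pandas, since summing booleans counts the `True` values. A plain-Python sketch with made-up fuel-economy figures (in pandas the equivalent would typically be the mean of a boolean Series):

```python
# Hypothetical fuel-economy figures for five cars.
mpg = [22, 45, 38, 51, 41]

# Each comparison yields a bool; True counts as 1 and False as 0 when summed.
flags = [m > 40 for m in mpg]
count = sum(flags)
share = count / len(flags)

print(count)  # 3
print(share)  # 0.6
```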
jcgl 1 days ago [-]
> So, can we add two bools together? Adding booleans together is nonsense, but we've said these are a kind of integer so sure, I guess True + True = 2 ? And this cascades into nonsense like ~True being a valid operation in Python and its result is true...

The bitwise negation is indeed janky and inaccurate, but True + True = 2 is absolutely a valid thing to say in boolean algebra. Addition means "or", and multiplication means "and."

tialaramex 4 hours ago [-]
> True + True = 2 is absolutely a valid thing to say in boolean algebra

Nope. The Boolean algebra only has two values, and it lacks the addition operation entirely.

jcgl 9 minutes ago [-]
I always remember learning that 2 was a legit enough way to represent the result of 1 + 1, but the internet seems to agree with you mostly. Though I contend that 1 + 1 = 2 is unambiguous, so is fine.

But multiplication and addition do work just fine for boolean arithmetic: https://en.wikipedia.org/wiki/Two-element_Boolean_algebra

IshKebab 1 days ago [-]
He did - booleans are integers:

  >>> isinstance(False, int)
  True
A related screw-up is implicitly casting everything to bool. A lot of languages made that mistake.

Overall I'd say they didn't do an awful job though. The main problems with Python are the absolutely abysmal tooling (which thankfully uv fixes), the abysmal performance (which sometimes isn't an issue, but it usually becomes an issue eventually), and the community's attitude to type checking.

Actually type checking code you've written yourself with Pyright in strict mode is quite a pleasant experience. But woe betide you if you want to import any third party libraries. There's at least a 50% chance they have no type annotations at all, and often it's deliberate. Typescript used to have a similar problem but the Javascript community realised a lot quicker than the Python community that type hints are a no-brainer.

maleldil 23 hours ago [-]
> TypeAlias, Generic

This is mitigated by modern (3.12+) generic and `type` syntax, which just looks like any other static language.

wiseowise 1 days ago [-]
What is the way in your opinion?
IshKebab 1 days ago [-]
I strongly disagree. Python has actually done a decent job of adding type annotations into the language IMO.

If you ignore the bit where they don't actually specify their semantics anyway.

> this can't be the way..

The alternative is fragile and unmaintainable code. I know which I prefer!

b00ty4breakfast 21 hours ago [-]
the alternative should be using a real statically-typed language instead of glorified comments that don't do anything without outside tools.

I understand that very large code bases have been built in python and this is a compromise to avoid making them rewrite Ks upon Ks of LoC but as it stands, Python type annotations are akin to putting a Phillips-head screwdriver on a ball-peen hammer; the screwdriver is not a real screwdriver and the ergonomics of the hammer have been compromised.

IshKebab 15 hours ago [-]
Well yes I agree using Rust or whatever would be better, but if your options are Python or Python with type hints, then the latter gets you closest to proper static typing. They're really not that bad with Pyright in strict mode. Mypy is rubbish.
nimbus-hn-test 1 days ago [-]
Enforcing explicit annotations in strict mode is a productivity multiplier. It prevents `list[Unknown]` from polluting the rest of the codebase, which is much harder to fix later.