Linus Torvalds on SHA-1 Collisions in Git Repositories: There Is Nothing to Be Afraid Of
Researchers from Google and the Centrum Wiskunde & Informatica (the Dutch national research institute for mathematics and computer science) in Amsterdam have presented the first practical algorithm for generating SHA-1 collisions. In the more than twenty years that SHA-1 has existed, no practical way was known to create a document with the same SHA-1 hash, and therefore the same digital signature, as another document. Now that possibility has appeared.
The SHA-1 hash function is used everywhere, so the news that documents with an identical hash can be generated naturally worried users. That includes users of the Git version control system, which also relies on SHA-1 hashes. Linus Torvalds gave a detailed answer to these concerns. In short, there is nothing to fear.
Linus said that there is nothing critical about this collision attack. According to him, there is a big difference between using a cryptographic hash for digital signatures in encryption systems and using it to generate a "content identifier" in a system such as Git.
In the first case, the hash is a kind of statement of trust. It acts as a source of trust that fundamentally protects you from people you cannot verify in any other way.
In Git, on the other hand, the hash is not used for "trust." Here the trust is placed in people, not in the hash, says Linus. In projects like Git, the SHA-1 hash serves entirely different, technical purposes: it avoids accidental conflicts and is an excellent way to detect errors. It is just a tool that helps you quickly identify corrupted data. This is not about data security but about the technical convenience of deduplication and error detection. Other version control systems often use error-detection methods such as CRC for this purpose.
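To make the "content identifier" idea concrete, here is a minimal Python sketch of how Git derives the ID of a file's contents (a blob): SHA-1 over a short header (`blob`, the size in bytes, a NUL byte) followed by the raw bytes. The function name `git_blob_id` is ours, chosen for illustration only.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a blob with these contents."""
    # Git hashes a header ("blob <size>\0") followed by the raw bytes,
    # not the raw bytes alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always maps to the same ID (deduplication) ...
assert git_blob_id(b"same text\n") == git_blob_id(b"same text\n")
# ... and a single changed byte yields a different ID (error detection).
assert git_blob_id(b"same text\n") != git_blob_id(b"sane text\n")

# The empty blob has the same well-known ID in every Git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what `git hash-object` reports for a file with the same bytes.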
Linus admits that SHA-1 is also used for signing Git branches, so in that sense it is part of the web of trust as well, and the appearance of a collision attack does have adverse consequences for Git. But in practice, this particular attack is very easy to avoid, for several reasons.
Firstly, the attack does not let the attacker simply create a document with a predetermined hash. He has to create two documents at once, because the attack works on an identical prefix. Secondly, the researchers who found the SHA-1 collision have published a scientific paper and released tools for recognizing the signs of an attack. It is fairly easy to identify documents whose prefix is suitable for generating a second document with the same hash.
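The "two documents at once, sharing a prefix" structure can be illustrated with a toy model. The sketch below is not SHA-1 and not the researchers' attack: it assumes a made-up chained hash with a deliberately tiny 16-bit state (all names are hypothetical) and uses plain brute force, so that a collision can be found in a fraction of a second. It shows that the attacker produces a colliding pair rather than matching a given hash, and that once two equal-length messages collide in a chained hash of this kind, any identical content appended to both keeps them colliding.

```python
import hashlib
from itertools import count
from typing import Tuple

BLOCK = 8  # toy block size in bytes (SHA-1 uses 64-byte blocks)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function with a deliberately tiny 16-bit state."""
    return hashlib.sha256(state + block).digest()[:2]


def toy_hash(message: bytes) -> bytes:
    """Chain the compression function over whole blocks (no padding)."""
    if len(message) % BLOCK:
        raise ValueError("toy_hash expects a whole number of blocks")
    state = b"\x00\x00"
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state


def find_collision(prefix: bytes) -> Tuple[bytes, bytes]:
    """Brute-force two different messages that share `prefix` and collide."""
    seen = {}  # digest -> block that produced it
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        digest = toy_hash(prefix + block)
        if digest in seen:
            return prefix + seen[digest], prefix + block
        seen[digest] = block


doc_a, doc_b = find_collision(b"%TOY-1.0")      # shared 8-byte prefix
suffix = b"shared trailing content!"            # 24 bytes, 3 blocks

assert doc_a != doc_b
assert toy_hash(doc_a) == toy_hash(doc_b)
# The collision survives any identical continuation of both documents.
assert toy_hash(doc_a + suffix) == toy_hash(doc_b + suffix)
```

With a 16-bit state the search must succeed within 65,537 candidate blocks. The real attack on SHA-1's 160-bit state relies on cryptanalysis rather than brute force.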
In other words, once appropriate protection against documents with such a prefix is in place, the attack is not feasible in practice. Incidentally, this protection has already been implemented in Gmail and G Suite. A detector for vulnerable documents is publicly available on the shattered.io website, and the sha1collisiondetection collision detection library can be found on GitHub.
When all the data is out in the open, a real attack is almost impossible. The researchers give the example of an attack on PDF documents with an identical prefix. That attack succeeds because the prefix is "hidden" inside the document, as an opaque blob. Open source code in a repository is another matter. It is hardly possible to build such a prefix out of source code (only out of a blob). In other words, to create an identical prefix and then generate code branches with the same SHA-1 hash, the attacker would have to insert random-looking data into the code, which would be noticed immediately. Linus said that there are places where data can be hidden, but git fsck already catches such tricks.
Linus Torvalds admits that the only people with real cause for concern are those who track PDF documents with Git. They can be advised to use the tools for detecting signs of an attack, as described above. Patches of this kind have already been created for the kernel.org and github.com hosting services and will soon be active, so there is nothing to worry about.
Besides, Git will eventually move away from SHA-1, Linus said. There is a plan for doing so in a way that does not even require anyone to convert their repositories. But clearly this is not so critical that it needs to be rushed.
Incidentally, the problem Torvalds mentioned, tracking PDF documents with identical SHA-1 hashes, has already shown up in the Apache SVN version control system, which is used for the WebKit repository and other major projects. On Friday night, the SHA-1 collision attack's website was updated with new information about the attack's effect on SVN. It noted that PDF files with the same SHA-1 hash are already breaking SVN repositories.
It turns out that if you commit two different files with the same hash, the version control system cannot cope with it. Someone committed such files to the WebKit repository, after which the repository became corrupted and stopped accepting new commits.
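Why a hash-keyed store trips over such a pair can be sketched in a few lines of Python. This is a hypothetical illustration, not how SVN or Git actually store data: the `ContentStore` class and its names are assumptions. It deduplicates by hash and adds one defensive check, comparing bytes whenever a key already exists, so that a colliding file is rejected with a clear error instead of silently corrupting the store. The demonstration swaps in a deliberately weak hash function to force a collision.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class CollisionError(Exception):
    """Raised when two different contents map to the same key."""


class ContentStore:
    """Hypothetical content-addressed store that deduplicates by hash."""

    def __init__(self, hash_func=sha1_hex):
        self._hash = hash_func
        self._objects = {}  # key -> stored bytes

    def put(self, content: bytes) -> str:
        key = self._hash(content)
        existing = self._objects.get(key)
        # Same key, different bytes: refuse rather than treat them as equal.
        if existing is not None and existing != content:
            raise CollisionError(f"different content already stored under {key}")
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


# Demonstration with a deliberately weak "hash" (first byte only),
# so that two different inputs are guaranteed to share a key.
store = ContentStore(hash_func=lambda data: data[:1].hex())
store.put(b"a: first file")
try:
    store.put(b"a: second, different file")
except CollisionError as err:
    print("rejected:", err)
```

Without the byte comparison in `put`, the second file would be treated as a duplicate of the first, and anything that later verified the stored content against what was submitted would fail.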
Here are the two PDF files with the same hash: