How to store static list of hashes in MySQL effectively?

October 4

Problem: I have a list of pairs (md5_hash, id). I want to store the data in MySQL and run queries like this:

SELECT id FROM table WHERE md5_hash = <some_hash> 

The number of pairs can be tens or hundreds of millions, and the number is static, i.e., I do not add new records. Saving disk space is what matters to me. Query time is less important in this case (a lookup that takes less than, say, 1 second is okay).

My thoughts:

I started by creating a table where md5_hash is represented in this way:

CREATE TABLE `myTable` (
    `md5` binary(16) NOT NULL, -- the values are not unique, but we can suppose they are
    `id` int(10) unsigned NOT NULL
) ENGINE=MyISAM;

This way the md5 hashes are stored as 16 raw bytes instead of 32 hexadecimal characters, which saves space. For the same reason the columns are defined as NOT NULL. I chose the MyISAM engine because my tests show that MyISAM requires less disk space than InnoDB. In addition, MyISAM tables can be compressed with the myisampack utility.
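
With a binary(16) column, the hash has to be passed as 16 raw bytes rather than as a 32-character hex string. A minimal sketch using MySQL's UNHEX() and HEX() functions against the myTable definition above; the sample digest is the MD5 of 'hello' and the id value 1 is made up for illustration:

```sql
-- Store a pair: convert the 32-character hex digest to 16 raw bytes.
INSERT INTO myTable (md5, id)
VALUES (UNHEX('5d41402abc4b2a76b9719d911017c592'), 1);

-- Look up an id by hex digest; the comparison is done on the raw bytes.
SELECT id
FROM myTable
WHERE md5 = UNHEX('5d41402abc4b2a76b9719d911017c592');

-- Show a stored value as hex again, e.g. for debugging.
SELECT HEX(md5) AS md5_hex, id
FROM myTable
LIMIT 1;
```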

Now comes the hard part. When I create an index on the md5 column, the index requires more disk space than the data itself! I tested it with 17 million records: the table took around 300 MB of disk space and the index took about 330 MB. The size of the index is crazy.
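
One way to read these numbers without inspecting the files on disk is to query information_schema.TABLES, which reports data and index sizes in bytes. A sketch, assuming the table lives in the currently selected database:

```sql
-- Compare data size and index size for one table, in megabytes.
SELECT table_name,
       engine,
       table_rows,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'myTable';
```

An index about as large as the data is not surprising here: each index entry holds the full 16-byte key plus a pointer to the row, while the row itself is only about 20 bytes (16 for the hash and 4 for the id).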

One idea is to provide the md5 hashes to MySQL presorted, which should result in a smaller index. But I do not know how to do that.
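
One possible way to try the presorting idea, as a sketch only: load the rows with no index defined, reorder them with ALTER TABLE ... ORDER BY, and then build the index in a single pass. The file path /tmp/pairs.tsv, its tab-separated layout (hex digest, then id), and the index name md5_idx are assumptions for illustration. Whether the resulting index is actually smaller has to be measured.

```sql
-- Load the pairs while the table has no index, so no B-tree is maintained
-- row by row. Assumed file layout: <hex digest> TAB <id>, one pair per line.
LOAD DATA INFILE '/tmp/pairs.tsv'
INTO TABLE myTable (@hex, id)
SET md5 = UNHEX(@hex);

-- Physically reorder the stored rows by hash value.
ALTER TABLE myTable ORDER BY md5;

-- Build the index once, after all rows are in place.
ALTER TABLE myTable ADD INDEX md5_idx (md5);
```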

Another idea is to divide myTable into several smaller tables in order to decrease the size of the index. I tried MySQL partitioning. However, the purpose of that feature is to improve query time, not to reduce disk space usage.

Do you have any idea how to reduce the disk space required?

Answers

Some options:

  • Use InnoDB. If not for these reasons, then for compression! With InnoDB you can:
    ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

    Replace 8 with 4, 2, or 1 to (hopefully) get better compression. Since the table is static, I think this is a great solution for you (with strong compression, writes become slower, but you don't care about that).

    Not only the data is compressed, but the indexes as well. I would suggest this is the easiest option you have.

  • Index only part of your column. You say that the column is not UNIQUE but can be assumed to be. OK; is your index a UNIQUE index? If not, try:
    ALTER TABLE my_table ADD INDEX `md5_prefix_idx` (md5(8));

    to index only the first 8 bytes of the column.

  • Try TokuDB, an alternative storage engine to InnoDB that has amazing compression (I've seen data get 20 times smaller than with InnoDB when using TokuDB's aggressive compression).
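
Before relying on the prefix index from the second option, it may be worth checking that an 8-byte prefix is still nearly unique in the actual data. A sketch, assuming the my_table name used in the ALTER statements above; LEFT() on a binary column counts bytes, and the sample digest is just the MD5 of 'hello':

```sql
-- If distinct_prefixes is (almost) equal to total_rows, the 8-byte prefix
-- identifies rows about as well as the full 16-byte hash.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT LEFT(md5, 8)) AS distinct_prefixes
FROM my_table;

-- Lookups are written the same way as before: the prefix index narrows the
-- search to candidate rows, and the full md5 value in the row is compared.
EXPLAIN
SELECT id
FROM my_table
WHERE md5 = UNHEX('5d41402abc4b2a76b9719d911017c592');
```

For uniformly distributed hashes, the expected number of pairs that share a 64-bit prefix among n rows is roughly n²/2^65. For n = 100 million that is about 0.0003, so prefix collisions should be very rare.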
