
Downloading files using Python-requests


I wrote a Python script to download files using multiple (source) IP addresses -- kindly suggest any improvements.

import cgi
import os
import posixpath
import Queue
import threading
import urllib
import urlparse
import random
import re
import shutil
import time

import requests
import requests_toolbelt

def get_IPs():
    """Returns all available IP addresses in a list."""
    # TODO: Windows only. Other options?
    out = []
    for i in os.popen('ipconfig'):
        i = i.strip()
        if i.startswith('IP'):
            out.append(i.rsplit(' ', 1)[-1])

    return out

def get_info(url):
    """Returns name and size of file to be downloaded."""
    try:
        resp = requests.head(url, allow_redirects=True)
        name = cgi.parse_header(resp.headers['content-disposition'])[1]['filename']
    except KeyError:
        path = urlparse.urlsplit(url).path
        name = posixpath.basename(path)
    name = urllib.unquote_plus(name)
    size = int(resp.headers['content-length'])
    return name, size

def worker(url, session, ud, part, size):
    """Downloads a part of the file specified by 'part' parameter."""
    # TODO: optimal tries, timeout?
    for _ in xrange(2):
        try:
            open('%s/%04d' % (ud, part), 'wb').write(
                session.get(url, timeout=(2, 7), headers={'range': 'bytes=%s-%s' % (
                    part*chunk, min(size, part*chunk + chunk - 1))}).content)
            break
        except:
            pass
    else:
        worker(url, sessions_queue.get(), ud, part, size)

    sessions_queue.put(session)

def summary(name, size, elapsed):
    """Prints summary of the download after it is completed."""
    print (
        '--\n'
        '%s download completed.\n'
        'Time elapsed: %.2fs\n'
        'Average download speed: %.2f MB/s\n'
        '--' % (name, elapsed, size/elapsed/2**20))

def download(url):
    """Downloads the file pointed to by 'url' parameter."""
    start = time.clock()
    name, size = get_info(url)
    # random id of length 20
    ud = '%0x' % random.getrandbits(80)
    os.mkdir(ud)
    threads = []
    for i in xrange(size/chunk + (size%chunk != 0)):
        t = threading.Thread(target=worker, args=(url, sessions_queue.get(), ud, i, size))
        threads.append(t)
        t.start()

    # characters \/:*?"<>| not allowed in filenames in Windows
    name = re.sub(r'[\\/:*?"<>|]', '_', name)
    # TODO: check if a file is already present with same name
    out = open(name, 'ab')
    for i, t in enumerate(threads):
        t.join()
        out.write(open('%s/%04d' % (ud, i), 'rb').read())

    summary(name, size, time.clock() - start)
    shutil.rmtree(ud)

def main():
    IPs = get_IPs()
    print len(IPs), 'IPs available.'
    for ip in IPs:
        adapter = requests_toolbelt.adapters.SourceAddressAdapter(ip)
        session = requests.Session()
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        sessions_queue.put(session)

    while True:
        threading.Thread(target=download, args=(raw_input(),)).start()


if __name__ == '__main__':
    sessions_queue = Queue.Queue()
    KB = 1024
    MB = 1024*KB
    # TODO: optimal size?
    chunk = 100*KB
    main()

I am using it with about 100 IP addresses on my Ethernet connection, each with about 100 KB/s of bandwidth. What would be the optimal configuration (number of threads, chunk size)?

Answers

You could rewrite your get_IPs function to use a list comprehension instead:

return [i.rsplit(' ', 1)[-1] for i in map(str.strip, os.popen('ipconfig'))
        if i.startswith('IP')]

map calls strip on every line of ipconfig's output, and the comprehension then iterates over those stripped lines, ignoring any that don't start with "IP".
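
In context, get_IPs collapses to the same logic in three lines:

def get_IPs():
    """Returns all available IP addresses in a list."""
    # TODO: Windows only. Other options?
    return [i.rsplit(' ', 1)[-1]
            for i in map(str.strip, os.popen('ipconfig'))
            if i.startswith('IP')]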

In worker you're using a loop to retry after timeouts, but the 2 is an arbitrary magic number. Use a named constant here so it's clear what you're doing, and easy to change later.
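For example (MAX_TRIES is an illustrative name, not something the original script defines):

MAX_TRIES = 2  # retry budget per chunk; tune as needed

def worker(url, session, ud, part, size):
    """Downloads a part of the file specified by 'part' parameter."""
    for _ in xrange(MAX_TRIES):
        ...  # body unchanged

Bumping the retry count then only requires touching one line.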

You also open files in several places without closing them; you should always use with, the context manager protocol. It automatically closes the file even if an error is raised, which makes it the safest way to open a file.

with open(filepath) as f:
    execute_code_with(f)
print("Done with the file")

Once you leave that indented block, the file is automatically closed. No need to even call f.close().
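
Applied to your script, the part-file write in worker would look something like this (a sketch; chunk is the module-level constant the original already defines):

with open('%s/%04d' % (ud, part), 'wb') as part_file:
    part_file.write(session.get(
        url, timeout=(2, 7),
        headers={'range': 'bytes=%s-%s' % (
            part*chunk, min(size, part*chunk + chunk - 1))}).content)

and the merge loop in download:

with open(name, 'ab') as out:
    for i, t in enumerate(threads):
        t.join()
        with open('%s/%04d' % (ud, i), 'rb') as part_file:
            out.write(part_file.read())

This also guarantees the temporary part files are closed before shutil.rmtree(ud) tries to delete them, which matters on Windows, where open files cannot be removed.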
