Python is known for being user-friendly, but it has one drawback: it is comparatively slow. On online judging platforms, if the time limit for C/C++ is X, the limit for Java is often 2X and the limit for Python 5X.
Several input/output techniques can be used to make a program run faster.
Example:
Suppose we need to find the sum of N numbers entered by the user.
The first line contains a number N.
The second line contains N numbers separated by single spaces.
Input:
4
1 2 3 4
Output:
10
Python Solutions for the Above Problem:
Normal Method (Python 2.7):
1. raw_input(): accepts an optional prompt argument and strips the trailing newline character from the string it returns.
2. print: a thin wrapper that formats its inputs (a space between arguments and a newline at the end) and then calls the write method of a file object, sys.stdout by default.
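As a rough illustration, here is a minimal sketch of the normal method for the sum problem, written for Python 3 (so input() stands in for raw_input()) and assuming the two-line input format described above. The function name main is illustrative, not taken from the original code.

```python
def main():
    # Python 3 version of the normal method: input() replaces raw_input()
    # and, like raw_input(), returns the line without its trailing newline.
    n = int(input())              # first line: the count N
    numbers = input().split()     # second line: N space-separated values
    total = 0
    for i in range(n):
        total += int(numbers[i])
    # print() separates arguments with spaces, appends a newline and
    # writes the result to sys.stdout.
    print(total)
```

Calling main() (for example under an `if __name__ == "__main__":` guard) and feeding it the sample input prints 10.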
A Faster Method Using stdin and stdout (Python 2.7):
1. sys.stdin: a file object. Reading from it works like reading from any file opened for input, except that here the file is the standard input buffer.
2. stdout.write('D\n'): faster than print 'D'.
3. Even faster is to write everything at once with stdout.write("".join(list_comprehension)). The drawback is that memory usage then grows with the size of the data being written.
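The stdin/stdout approach can be sketched in Python 3 roughly as follows, again assuming the input format above. The names main and write_lines are hypothetical; write_lines only illustrates the single-write "".join() variant from point 3 for problems whose output spans many lines.

```python
import sys


def main():
    # sys.stdin is a file object wrapping the standard input buffer.
    # readline() keeps the trailing newline, which int() and split() ignore.
    n = int(sys.stdin.readline())
    numbers = sys.stdin.readline().split()
    total = sum(int(x) for x in numbers[:n])
    # write() does no formatting, so the newline is added explicitly.
    sys.stdout.write(str(total) + "\n")


def write_lines(values):
    # Hypothetical helper for problems with many output lines: build the
    # whole output with "".join() and write it in a single call. This is
    # faster than many small writes, but the full string is held in memory.
    sys.stdout.write("".join([str(v) + "\n" for v in values]))
```

Unlike print, sys.stdout.write() accepts a single string and adds nothing to it, which is where the small saving comes from.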
Adding Buffered Pipe I/O (Python 2.7):
1. Add the buffered I/O code before your submission code to make output faster.
2. io.BytesIO objects implement the common file-like object interface. A BytesIO object keeps an internal position pointer, which advances with every call to read(n).
3. The atexit module provides a simple interface for registering functions to be called when a program terminates normally. The sys module also provides a hook, sys.exitfunc (removed in Python 3), but only one function can be registered there. The atexit registry, on the other hand, can be used by multiple modules and libraries at the same time.
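The buffered approach might look like this in Python 3. This is a sketch under stated assumptions, not the original 2.7 code: because print() writes str in Python 3, output is collected in io.StringIO rather than io.BytesIO, while io.BytesIO is kept on the input side to show the advancing read pointer. The names make_fast_readline, install_buffered_stdout and main are hypothetical.

```python
import atexit
import io
import sys


def make_fast_readline():
    # Read all of standard input in one go and wrap the bytes in io.BytesIO.
    # BytesIO keeps an internal position pointer; each readline()/read(n)
    # call returns data from that position and advances the pointer.
    return io.BytesIO(sys.stdin.buffer.read()).readline


def install_buffered_stdout():
    # Redirect print() output into an in-memory buffer. In Python 3, print()
    # writes str, so io.StringIO stands in for the Python 2.7 io.BytesIO.
    real_stdout = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer

    def flush_buffer():
        # Registered with atexit: runs once when the program exits normally
        # and writes the whole buffer to the real standard output at once.
        real_stdout.write(buffer.getvalue())
        real_stdout.flush()

    atexit.register(flush_buffer)
    return buffer


def main():
    readline = make_fast_readline()
    n = int(readline())                    # int() accepts bytes like b"4\n"
    numbers = readline().split()           # list of bytes tokens
    print(sum(int(x) for x in numbers[:n]))
```

In a submission, install_buffered_stdout() would be called once before main(), matching point 1: the buffering code goes before the solution code, and nothing reaches the console until the program exits.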
When working with large amounts of data in Python, the normal method may not finish within the given time limit. Method 2 helps handle large amounts of I/O data, while method 3 is the fastest. Methods 2 and 3 are usually used when the input data files are larger than 2 or 3 MB.
Note: The code for the methods above is written in Python 2.7. To use it with Python 3.x, replace raw_input() with Python 3's input() and the print statement with the print() function.