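[python] Read the first N lines of a file in Python

I have a large raw data file that I would like to trim to a specified size. I have experience with .NET C#, but I would like to do this in Python, both to keep things simple and out of interest.

How do I get the first N lines of a text file in Python? Does the OS being used have any effect on the implementation?

Answer

Python 2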

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Here is another way (Python 2 & 3)

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
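
The two approaches above differ when the file has fewer than N lines: the next()-based list comprehension raises StopIteration, while islice simply returns whatever lines exist. The following is a minimal Python 3 sketch of that difference; the helper names head_islice and head_next and the temporary three-line sample file are hypothetical, introduced only for illustration.

```python
import os
import tempfile
from itertools import islice


def head_islice(path, n):
    """Return up to n lines; stops quietly if the file is shorter."""
    with open(path) as f:
        return list(islice(f, n))


def head_next(path, n):
    """Return exactly n lines; raises StopIteration if the file is shorter."""
    with open(path) as f:
        return [next(f) for _ in range(n)]


# Hypothetical three-line sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    sample_path = tmp.name

# Ask for 5 lines from a 3-line file with each helper.
short = head_islice(sample_path, 5)
try:
    head_next(sample_path, 5)
    next_failed = False
except StopIteration:
    next_failed = True

os.remove(sample_path)  # clean up the sample file
```

If a short file is a possibility, islice is probably the safer default, since no exception handling is needed.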

Answer

N = 10
with open("file.txt", "r") as file:  # "r" opens the file for reading; append mode ("a") cannot be read from
    for i in range(N):
        line = next(file).strip()
        print(line)

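Answer

If you want to read the first lines quickly and don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

For example, for the first 5 lines: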

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

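Note: the whole file is read, so this is not the best from a performance point of view, but it is easy to use, quick to write, and easy to remember, which makes it very convenient for a one-off calculation.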

print firstNlines

One advantage compared with the other answers is that you can easily select a range of lines: for example, skip the first 10 lines with [10:30], drop the last 10 lines with [:-10], or take only every second line with [::2].
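
For files too large to load whole, similar range selection can be sketched with itertools.islice, which accepts start, stop, and step arguments and stops reading once the slice is satisfied. This is a different technique from the readlines() answer above, and islice does not accept negative indices, so [:-10] has no direct equivalent here. The in-memory fake_file below is an assumption standing in for a real file.

```python
import io
from itertools import islice

# In-memory stand-in for a real file: 40 numbered lines (illustrative data only).
fake_file = io.StringIO("".join(f"line {i}\n" for i in range(40)))

# Equivalent of readlines()[10:30], but lines past index 29 are never read.
middle = list(islice(fake_file, 10, 30))

fake_file.seek(0)  # rewind before taking another slice

# Equivalent of readlines()[::2]: every second line, starting with the first.
evens = list(islice(fake_file, 0, None, 2))
```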

Answer

What I do is read the N lines using pandas. I don't think the performance is the best, but for example with N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)

Answer

File objects expose no dedicated method for reading a given number of lines.

The easiest way would be the following:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))

Answer

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
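
The class above subclasses the built-in file type, which exists only in Python 2. A rough Python 3 counterpart can be sketched with two standalone functions. Note that the tail here uses a different technique: collections.deque with maxlen scans the file once from the start and keeps only the last n lines, rather than seeking backwards from the end as the class does. The function names mirror the methods above but are otherwise an assumption.

```python
from collections import deque
from itertools import islice


def head(path, n=1):
    """Return the first n lines of the file at path (fewer if it is shorter)."""
    with open(path) as f:
        return list(islice(f, n))


def tail(path, n=1):
    """Return the last n lines of the file at path.

    Scans the whole file once, but keeps at most n lines in memory.
    """
    with open(path) as f:
        return list(deque(f, maxlen=n))
```

For very large files, the seek-based approach of the original class avoids the full scan, at the cost of more code.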

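Answer

The two most intuitive ways of doing this would be:

1. Iterate over the file line by line, and break after N lines.
2. Iterate over the file line by line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: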

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is that, as long as you don't use readlines() or otherwise enumerate the whole file into memory, you have plenty of options.
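
As one illustration of such a lazy option in Python 3, a small generator can wrap the counter-and-break pattern so that no more than N lines are ever pulled from the file. The name first_n_lines is hypothetical, and the io.StringIO object is an assumption standing in for an open file.

```python
import io


def first_n_lines(fileobj, n):
    """Yield at most n lines from fileobj, reading no further than needed."""
    if n <= 0:
        return
    count = 0
    for line in fileobj:
        yield line
        count += 1
        if count == n:
            break  # stop before the next line is fetched


# In-memory stand-in for an open file (illustrative data only).
sample = io.StringIO("a\nb\nc\n")
first_two = list(first_n_lines(sample, 2))
```

Because the check happens right after each yield, the third line is left unread in the underlying stream.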