[unix] How can I use grep to filter out lines that begin with a set of keywords?

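I have a large file (a chemical database), and I need to display only the header records, which are defined as the lines that do not start with ATOM, CONNECT, HETATM, TER, or END. I have to use grep to do this. A sample of the file is shown below (the full file is here).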

HEADER    TRANSFERASE                             15-OCT-12   4HKD
TITLE     CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: SERINE/THREONINE-PROTEIN KINASE 3;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 FRAGMENT: SARAH DOMAIN, UNP RESIDUES 436-484;
COMPND   5 SYNONYM: MAMMALIAN STE20-LIKE PROTEIN KINASE 2, MST-2, STE20-LIKE
COMPND   6 KINASE MST2, SERINE/THREONINE-PROTEIN KINASE KRS-1, SERINE/THREONINE-
COMPND   7 PROTEIN KINASE 3 36KDA SUBUNIT, MST2/N, SERINE/THREONINE-PROTEIN
COMPND   8 KINASE 3 20KDA SUBUNIT, MST2/C;
COMPND   9 EC: 2.7.11.1;
COMPND  10 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 ORGANISM_COMMON: HUMAN;
SOURCE   4 ORGANISM_TAXID: 9606;
SOURCE   5 GENE: STK3, KRS1, MST2;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 EXPRESSION_SYSTEM_TAXID: 562;
SOURCE   8 EXPRESSION_SYSTEM_STRAIN: BL21 (DE3) CODON PLUS;
SOURCE   9 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE  10 EXPRESSION_SYSTEM_PLASMID: HT-PET28A
KEYWDS    HOMODIMERIZATION, HETERODOMERIZATION, SAV1, NEK2, RASSF, TRANSFERASE
EXPDTA    X-RAY DIFFRACTION
AUTHOR    G.G.LIU,Z.B.SHI,Z.C.ZHOU
REVDAT   1   04-SEP-13 4HKD    0
JRNL        AUTH   G.G.LIU,Z.B.SHI,Z.C.ZHOU
JRNL        TITL   CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
JRNL        REF    TO BE PUBLISHED
JRNL        REFN
REMARK   2
REMARK   2 RESOLUTION.    1.50 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : PHENIX (PHENIX.REFINE: 1.8_1069)
REMARK   3   AUTHORS     : PAUL ADAMS,PAVEL AFONINE,VICENT CHEN,IAN
REMARK   3               : DAVIS,KRESHNA GOPAL,RALF GROSSE-
REMARK   3               : KUNSTLEVE,LI-WEI HUNG,ROBERT IMMORMINO,
REMARK   3               : TOM IOERGER,AIRLIE MCCOY,ERIK MCKEE,NIGEL
REMARK   3               : MORIARTY,REETAL PAI,RANDY READ,JANE
REMARK   3               : RICHARDSON,DAVID RICHARDSON,TOD ROMO,JIM
REMARK   3               : SACCHETTINI,NICHOLAS SAUTER,JACOB SMITH,
REMARK   3               : LAURENT STORONI,TOM TERWILLIGER,PETER
REMARK   3               : ZWART
REMARK   3
REMARK   3    REFINEMENT TARGET : ML
REMARK   3
REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.50
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 34.86
REMARK   3   MIN(FOBS/SIGMA_FOBS)              : 1.380
REMARK   3   COMPLETENESS FOR RANGE        (%) : 91.9
REMARK   3   NUMBER OF REFLECTIONS             : 29481
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.197
REMARK   3   R VALUE            (WORKING SET) : 0.195
REMARK   3   FREE R VALUE                     : 0.231
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 5.080
REMARK   3   FREE R VALUE TEST SET COUNT      : 1497
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT (IN BINS).
REMARK   3   BIN  RESOLUTION RANGE  COMPL.    NWORK NFREE   RWORK  RFREE
REMARK   3     1 34.8685 -  3.3427    0.97     2878   149  0.1998 0.2322
REMARK   3     2  3.3427 -  2.6535    0.98     2711   175  0.2033 0.2452
REMARK   3     3  2.6535 -  2.3182    0.96     2660   155  0.1968 0.2148
REMARK   3     4  2.3182 -  2.1063    0.94     2620   114  0.1875 0.2318
REMARK   3     5  2.1063 -  1.9553    0.91     2533   113  0.1909 0.2295
REMARK   3     6  1.9553 -  1.8400    0.91     2476   143  0.1883 0.2137
REMARK   3     7  1.8400 -  1.7479    0.90     2465   128  0.1840 0.2029
REMARK   3     8  1.7479 -  1.6718    0.90     2446   130  0.1783 0.2144
REMARK   3     9  1.6718 -  1.6074    0.90     2419   129  0.1864 0.2400
REMARK   3    10  1.6074 -  1.5520    0.90     2487   120  0.1938 0.2588
REMARK   3    11  1.5520 -  1.5030    0.85     2289   141  0.1993 0.2471
REMARK   3
REMARK   3  BULK SOLVENT MODELLING.
REMARK   3   METHOD USED        : FLAT BULK SOLVENT MODEL
REMARK   3   SOLVENT RADIUS     : 1.11
REMARK   3   SHRINKAGE RADIUS   : 0.90
REMARK   3   K_SOL              : NULL
REMARK   3   B_SOL              : NULL
REMARK   3
REMARK   3  ERROR ESTIMATES.
REMARK   3   COORDINATE ERROR (MAXIMUM-LIKELIHOOD BASED)     : 0.130
REMARK   3   PHASE ERROR (DEGREES, MAXIMUM-LIKELIHOOD BASED) : 21.520
REMARK   3
REMARK   3  B VALUES.
REMARK   3   FROM WILSON PLOT           (A**2) : NULL
REMARK   3   MEAN B VALUE      (OVERALL, A**2) : NULL
REMARK   3   OVERALL ANISOTROPIC B VALUE.
REMARK   3    B11 (A**2) : NULL
REMARK   3    B22 (A**2) : NULL
REMARK   3    B33 (A**2) : NULL
REMARK   3    B12 (A**2) : NULL
REMARK   3    B13 (A**2) : NULL
REMARK   3    B23 (A**2) : NULL
REMARK   3
REMARK   3  TWINNING INFORMATION.
REMARK   3   FRACTION: NULL
REMARK   3   OPERATOR: NULL
REMARK   3
REMARK   3  DEVIATIONS FROM IDEAL VALUES.
REMARK   3                 RMSD          COUNT
REMARK   3   BOND      :  0.007           1771
REMARK   3   ANGLE     :  1.179           2367
REMARK   3   CHIRALITY :  0.083            255
REMARK   3   PLANARITY :  0.006            317
REMARK   3   DIHEDRAL  : 14.379            737
REMARK   3
REMARK   3  TLS DETAILS
REMARK   3   NUMBER OF TLS GROUPS  : NULL
REMARK   3
REMARK   3  NCS DETAILS
REMARK   3   NUMBER OF NCS GROUPS : NULL
REMARK   3
REMARK   3  OTHER REFINEMENT REMARKS: NULL
REMARK   4
REMARK   4 4HKD COMPLIES WITH FORMAT V. 3.30, 13-JUL-11
REMARK 100
REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 22-OCT-12.
REMARK 100 THE RCSB ID CODE IS RCSB075574.
REMARK 200
REMARK 200 EXPERIMENTAL DETAILS
REMARK 200  EXPERIMENT TYPE                : X-RAY DIFFRACTION
REMARK 200  DATE OF DATA COLLECTION        : 16-APR-12
REMARK 200  TEMPERATURE           (KELVIN) : 100
REMARK 200  PH                             : 4.6
REMARK 200  NUMBER OF CRYSTALS USED        : 1
REMARK 200
REMARK 200  SYNCHROTRON              (Y/N) : Y
REMARK 200  RADIATION SOURCE               : SSRF
REMARK 200  BEAMLINE                       : BL17U
REMARK 200  X-RAY GENERATOR MODEL          : NULL
REMARK 200  MONOCHROMATIC OR LAUE    (M/L) : M
REMARK 200  WAVELENGTH OR RANGE        (A) : 0.97915
REMARK 200  MONOCHROMATOR                  : SI 111 CHANNEL
REMARK 200  OPTICS                         : NULL
REMARK 200
REMARK 200  DETECTOR TYPE                  : CCD
REMARK 200  DETECTOR MANUFACTURER          : ADSC QUANTUM 315
REMARK 200  INTENSITY-INTEGRATION SOFTWARE : HKL-2000
REMARK 200  DATA SCALING SOFTWARE          : HKL-2000
REMARK 200
REMARK 200  NUMBER OF UNIQUE REFLECTIONS   : 29548
REMARK 200  RESOLUTION RANGE HIGH      (A) : 1.500
REMARK 200  RESOLUTION RANGE LOW       (A) : 50.000
REMARK 200  REJECTION CRITERIA  (SIGMA(I)) : 2.000
REMARK 200
REMARK 200 OVERALL.
REMARK 200  COMPLETENESS FOR RANGE     (%) : 92.3
REMARK 200  DATA REDUNDANCY                : 5.300
REMARK 200  R MERGE                    (I) : NULL
REMARK 200  R SYM                      (I) : NULL
REMARK 200  <I/SIGMA(I)> FOR THE DATA SET  : 17.1000



Answer

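Your comment is the right approach. If you have to use grep, you should probably use -v, which inverts the match. Then you only need a pattern that matches every line beginning with one of the keywords you mentioned.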

$ grep -Ev '^(ATOM|CONECT|HETATM|TER|END)' /path/to/your/file

-E enables extended regular expressions. ^ matches the start of a line, and (a|b|c) means "a or b or c". I assume "CONNECT" in your question is a typo, since it does not appear in the file, so I changed it to CONECT here.
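To try the pattern before running it on the full file, here is a small sketch that wraps it in two shell functions and feeds them a few invented PDB-style lines. The names filter_headers and filter_headers_basic are made up for illustration, and the sample lines are not taken from the 4HKD file. The second function shows the same filter written without -E, using one -e pattern per keyword; -v then inverts the combined match.

```shell
#!/bin/sh
# Sketch: keep only header records, i.e. lines that do NOT start with
# ATOM, CONECT, HETATM, TER or END. Function names are hypothetical.

# -E enables extended regex; -v prints only the lines that do not match.
filter_headers() {
  grep -Ev '^(ATOM|CONECT|HETATM|TER|END)' "$@"
}

# Same filter without -E: one anchored -e pattern per keyword.
filter_headers_basic() {
  grep -v -e '^ATOM' -e '^CONECT' -e '^HETATM' -e '^TER' -e '^END' "$@"
}

# Illustrative input (invented lines, not from the real file).
sample='HEADER    TRANSFERASE
REMARK   2 RESOLUTION.    1.50 ANGSTROMS.
ATOM      1  N   ALA A   1
HETATM  100  O   HOH A 201
TER     101      ALA A   1
CONECT  100  101
END'

# Both calls should print only the HEADER and REMARK lines.
printf '%s\n' "$sample" | filter_headers
printf '%s\n' "$sample" | filter_headers_basic
```

On the real file this would be filter_headers /path/to/your/file. Because each pattern only anchors the start of the line, ^END also drops ENDMDL records, which are coordinate-section records as well, so that is probably what you want here.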


Answer