64

محاضرة المدونات اللغوية وأدواتها

  • Upload
    iwanrg

  • View
    673

  • Download
    2

Embed Size (px)

Citation preview

Page 1: محاضرة المدونات اللغوية وأدواتها
Page 2: محاضرة المدونات اللغوية وأدواتها
Page 3: محاضرة المدونات اللغوية وأدواتها
Page 4: محاضرة المدونات اللغوية وأدواتها
Page 5: محاضرة المدونات اللغوية وأدواتها
Page 6: محاضرة المدونات اللغوية وأدواتها
Page 7: محاضرة المدونات اللغوية وأدواتها
Page 8: محاضرة المدونات اللغوية وأدواتها
Page 9: محاضرة المدونات اللغوية وأدواتها
Page 10: محاضرة المدونات اللغوية وأدواتها
Page 11: محاضرة المدونات اللغوية وأدواتها
Page 12: محاضرة المدونات اللغوية وأدواتها
Page 13: محاضرة المدونات اللغوية وأدواتها
Page 14: محاضرة المدونات اللغوية وأدواتها
Page 15: محاضرة المدونات اللغوية وأدواتها
Page 16: محاضرة المدونات اللغوية وأدواتها
Page 17: محاضرة المدونات اللغوية وأدواتها
Page 18: محاضرة المدونات اللغوية وأدواتها
Page 19: محاضرة المدونات اللغوية وأدواتها
Page 20: محاضرة المدونات اللغوية وأدواتها
Page 21: محاضرة المدونات اللغوية وأدواتها
Page 22: محاضرة المدونات اللغوية وأدواتها
Page 24: محاضرة المدونات اللغوية وأدواتها
Page 25: محاضرة المدونات اللغوية وأدواتها
Page 26: محاضرة المدونات اللغوية وأدواتها

Page 27: محاضرة المدونات اللغوية وأدواتها

0

50,000,000

100,000,000

150,000,000

200,000,000

250,000,000

300,000,000

Page 28: محاضرة المدونات اللغوية وأدواتها
Page 29: محاضرة المدونات اللغوية وأدواتها
Page 30: محاضرة المدونات اللغوية وأدواتها

و الواقعبني التصميم على األوعية والفرق توزيع حمتوى املدونة

Page 31: محاضرة المدونات اللغوية وأدواتها

و الواقعبني التصميم على الفرتات الزمنية والفرق توزيع حمتوى املدونة

Page 32: محاضرة المدونات اللغوية وأدواتها
Page 33: محاضرة المدونات اللغوية وأدواتها
Page 34: محاضرة المدونات اللغوية وأدواتها
Page 35: محاضرة المدونات اللغوية وأدواتها

Al-Thubaity, A. (2014).A 700M+ Arabic corpus: KACST Arabic corpus design and construction. Language resources and evaluation. DOI 10.1007/s10579-014-9284-1

Page 37: محاضرة المدونات اللغوية وأدواتها
Page 38: محاضرة المدونات اللغوية وأدواتها
Page 39: محاضرة المدونات اللغوية وأدواتها
Page 40: محاضرة المدونات اللغوية وأدواتها
Page 41: محاضرة المدونات اللغوية وأدواتها
Page 42: محاضرة المدونات اللغوية وأدواتها
Page 43: محاضرة المدونات اللغوية وأدواتها
Page 44: محاضرة المدونات اللغوية وأدواتها
Page 45: محاضرة المدونات اللغوية وأدواتها
Page 46: محاضرة المدونات اللغوية وأدواتها
Page 47: محاضرة المدونات اللغوية وأدواتها
Page 48: محاضرة المدونات اللغوية وأدواتها
Page 49: محاضرة المدونات اللغوية وأدواتها
Page 50: محاضرة المدونات اللغوية وأدواتها
Page 51: محاضرة المدونات اللغوية وأدواتها
Page 52: محاضرة المدونات اللغوية وأدواتها

Page 53: محاضرة المدونات اللغوية وأدواتها

Alfaifi, Abdullah and Atwell, Eric (2014). Comparative Evaluation of Tools for Arabic Corpora Search and Analysis. Paper presented at BAAL / Cambridge University Press Applied Linguistics 2014, 14 Jun 2014. Leeds Metropolitan University, UK.

Al-Thubaity, A., Al-Khalifa, H., Alqifari, R. and Almazrua, M. “Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora,” The Scientific World Journal, vol. 2014, Article ID 602745, 10 pages, 2014. doi:10.1155/2014/602745

Page 54: محاضرة المدونات اللغوية وأدواتها
Page 55: محاضرة المدونات اللغوية وأدواتها
Page 57: محاضرة المدونات اللغوية وأدواتها
Page 58: محاضرة المدونات اللغوية وأدواتها

Corpus Name Reference Source Purpose

Modern Standard

Arabic Corpus

Abdelali et al.

(2005)

Ahmad Abdelali

http://aracorpus.e3rab.com/index.php

?content=english

Specific/Arabic NLP

Akhbar Al Khaleej

2004

Abbas and Smaili

(2005)

Mourad Abbas

http://sourceforge.net/projects/arabic

corpus/files

Specific/Arabic NLP

Corpus of

Contemporary Arabic

Al-Sulaiti and

Atwell

(2006)

University of Leeds

http://www.comp.leeds.ac.uk/eric/lati

fa/research.htm

Specific/Teaching Arabic as a

foreign language and Arabic

language engineering

International Corpus

of Arabic (ICA)

Alansari et al.

(2007)

Bibliotheca Alexandrina

http://www.bibalex.org/ica/en/

General

Open Source Arabic

Corpus (OSAC)

Saad and Ashour

(2010)

Motaz Saad

https://sites.google.com/site/motazsit

e/arabic/osac

Specific/Arabic NLP

Alwatan-2004 Abbas et al. (2011) Mourad Abbas

http://sourceforge.net/projects/arabic

corpus/files

Specific/Arabic NLP

Arabic GigawordCorpus Fifth Edition

Parker, Robert, et

al (2011)

LDC

https://catalog.ldc.upenn.edu/LDC201

1T11

Specific/Arabic NLP

Table 1a: Table 1a: Arabic corpora released between 2005 and 2013. Criteria: Reference, Source, and Purpose

Page 59: محاضرة المدونات اللغوية وأدواتها

Corpus Name Reference Source Purpose

King Saud University

Corpus of Classical

Arabic (KSUCCA)

Alrabiah et al.(2013) King Saud University

http://ksucorpus.ksu.edu.sa/

General

KACST Text

Classification Corpus

Khorsheed and Al-

Thubaity (2013)

KACST

Contact Authors

Specific/Arabic

NLP

2012 Arabic

Newspapers Corpus

Al-Thubaity et al.

(2013)

Abdulmohsen Al-Thubaity

http://sourceforge.net/projects/kacst-

acptool/

Specific/Arabic

NLP

Arabic Learner Corpus

v2

Alfaifi and Atwell

(2013)

University of Leeds

http://www.arabiclearnercorpus.com/

Specific/

language teaching

and learning

research

arTenTen12 Belinkov et al. (2013) Sketch Engine

https://www.sketchengine.co.uk/

General

arabiCorpus Dilworth Parkinson,

Brigham Young

University

Brigham Young University

http://arabicorpus.byu.edu/

General

Leeds Internet Arabic

Corpora

Serge Sharoff,

University of Leeds

University of Leeds

http://smlc09.leeds.ac.uk/query-ar.html

General

Table 1a: Table 1a: Arabic corpora released between 2005 and 2013. Criteria: Reference, Source, and Purpose

Page 60: محاضرة المدونات اللغوية وأدواتها

Corpus Name Mode and Language Size Availability Date Location

Modern Standard

Arabic Corpus

Written MSA 113M Free to download 2002 11 Arab countries

Akhbar Al Khaleej

2004

Written MSA 3M Free to download 2004 1 Arabic country (Bahrain)

Corpus of

Contemporary Arabic

Written and Spoken

MSA

1M Free to download 1990s up to 2005 Different Arab countries

(No distribution given)

International Corpus

of Arabic (ICA)

Written MSA 100M Free to explore Not Available Distribution given for newspapers &

magazine; 7 countries )

Open Source Arabic

Corpus (OSAC)

Written CA and MSA 18M Free to download Not Available No distribution given

Alwatan-2004 Written MSA 10M Free to download 2004 1 Arabic country (Oman)

Arabic Gigaword Corpus Fifth Edition

Written MSA 1077M Fees to download 2002-2010 6 countries (UK, France, China, Egypt, Tunisia, Lebanon)

King Saud University

Corpus of Classical

Arabic (KSUCCA)

Written CA 50M Free to download Fees to explore from the pre-Islamic era until the end of

the fourth Hijri century

Not Available

KACST Text

Classification Corpus

Written MSA 11.55M Free to download Permission

required

2008 Mostly from Saudi Arabia

2012 Arabic

Newspapers Corpus

Written MSA 2.5 M Free to download 2012 All Arab countries

(Evenly distributed)

Arabic Learner Corpus

v2

Written and Spoken

MSA

282K Free to download 2012 and 2013 Saudi Arabia

arTenTen12 Written CA, MSA and

dialects

6.6G Fees to explore 2012 (No distribution is given)

arabiCorpus Written, CA, MSA and

dialects

173M Free to explore No distribution given Distribution given for newspapers, 7

countries

Leeds Internet Arabic

Corpora

Written MSA 317M Free to explore No distribution given No distribution given

Table 1b: Arabic corpora released between 2005 and 2013. Criteria: Mode and Language, Size, Availability, Date, Location

Page 61: محاضرة المدونات اللغوية وأدواتها

Corpus Name Mode and

Language

Size Availability Date Location

King Saud University

Corpus of Classical

Arabic (KSUCCA)

Written CA 50M Free to download Fees

to explore

from the pre-Islamic era

until the end of the fourth

Hijri century

Not Available

KACST Text

Classification Corpus

Written MSA 11.55M Free to download

Permission required

2008 Mostly from Saudi

Arabia

2012 Arabic

Newspapers Corpus

Written MSA 2.5 M Free to download 2012 All Arab countries

(Evenly distributed)

Arabic Learner Corpus

v2

Written and

Spoken MSA

282K Free to download 2012 and 2013 Saudi Arabia

arTenTen12 Written CA,

MSA and

dialects

6.6G Fees to explore 2012 (No distribution is

given)

arabiCorpus Written, CA,

MSA and

dialects

173M Free to explore No distribution given Distribution given for

newspapers, 7 countries

Leeds Internet Arabic

Corpora

Written MSA 317M Free to explore No distribution given No distribution given

Table 1b: Arabic corpora released between 2005 and 2013. Criteria: Mode and Language, Size, Availability, Date, Location

Page 62: محاضرة المدونات اللغوية وأدواتها

Corpus Name Medium Domain

1 Modern Standard Arabic

Corpus

Newspapers Not Available

2 Akhbar Al Khaleej 2004 Newspapers Economy, Local News, International News, and

Sports

3 Corpus of Contemporary

Arabic

Magazines, radio,

websites, newspapers,

and emails

Natural Sciences, Applied Sciences, Social Sciences,

Politics, Commerce, Life, Arts, and Leisure

4 International Corpus of Arabic

(ICA)

Newspapers, books,

Magazines, electronic

press and Internet

articles

Strategic Sciences, Social Sciences, Sports,

Religion, Literature, Humanities, Natural Sciences,

Applied Sciences, Art, Biography, and Miscellaneous

5 Open Source Arabic Corpus

(OSAC)

Websites 10 categories: Economics, History, Entertainments,

Education & Family, Religious & Fatwas, Sports,

Health, Astronomy, Law, Stories, and Cooking

Recipes

6 Alwatan-2004 Newspapers Culture, Religion, Economy, Local News,

International News, and Sports

7 Arabic Gigaword Corpus Fifth Edition

Newswires Not Available

Table 1c: Arabic corpora released between 2005 and 2013. Criteria: Medium, Domain, Tagged

Page 63: محاضرة المدونات اللغوية وأدواتها

Corpus Name Medium Domain

8 King Saud University Corpus of

Classical Arabic (KSUCCA)

Old Manuscripts Religion, Linguistics, Literature, Science, Sociology, and Biography

9 KACST Text Classification

Corpus

Saudi Press Agency (SPA),

Saudi Newspapers (SNP),

Websites, Writers,

Forums, Islamic Topics,

and Arabic Poems

SPA (Cultural, Sports, Social, Economic, Political,

General), SNP (Cultural, Sports, Social, Economic,

Political, General, IT), Websites (IT, Economics, Religion,

Medical, Cultural, Scientific), 10 writers’ opinions,

Forums (IT, Economics, Religion, Medical, Cultural,

Scientific, Sport, General), Islamic Topics (Hadeeth,

Aqeedah, Lughah, Tafseer, Feqh), Arabic Poems (Love,

Wisdom, Description, Praise, Bemoaning, Lampoons)

10 2012 Arabic Newspapers

Corpus

Newspapers Politics, Economics, Cultural, Religion, Sports, and

Science

11 Arabic Learner Corpus v2 215 texts written by

learners of Arabic in Saudi

Arabia

Materials produced by Arabic learners

12 arTenTen12 Websites Not Available

13 arabiCorpus Newspapers, Books, Old

Manuscripts

Newspapers, Modern Literature, Nonfiction, Islamic

Discourse, Egyptian Colloquial, Pre-modern, Arab

Literature, Grammarians, Medieval Philosophy/Science,

Hadith Literature, Quran, 1001 Nights

14 Leeds Internet Arabic Corpora Internet General, Wikipedia, Legal, Scientific

Table 1c: Arabic corpora released between 2005 and 2013. Criteria: Medium, Domain, Tagged

Page 64: محاضرة المدونات اللغوية وأدواتها

Corpus Name Tokenized Lemmatized Tagged

1 Modern Standard Arabic Corpus No No No

2 Akhbar Al Khaleej 2004 No No No

3 Corpus of Contemporary Arabic No No No

4 International Corpus of Arabic (ICA) No No Yes

5 Open Source Arabic Corpus (OSAC) No No No

6 Alwatan-2004 No No Yes, KALIMAT Corpus

7 Arabic Gigaword Corpus Fifth Edition Partially Yes (ATB) Partially Yes (ATB) Partially Yes (ATB)

8 King Saud University Corpus of

Classical Arabic (KSUCCA)

No Yes, Via Sketch

Engine

Yes, Via Sketch Engine

9 KACST Text Classification Corpus No No No

10 2012 Arabic Newspapers Corpus No No No

11 Arabic Learner Corpus v2 No No Partially annotated for

errors

12 arTenTen12 Partially Yes Partially Yes Partially Yes

13 arabiCorpus No No No

14 Leeds Internet Arabic Corpora Yes No Yes, Via Sketch Engine

Table 1d: Arabic corpora released between 2005 and 2013. Criteria: Tokenized, Lemmatized and Tagged