TOWARDS A HYBRID ASSESSMENT MODEL FOR MUSIC CONSERVATORY ENTRANCE EXAMS

Ozan Baysal1, Barış Bozkurt2, Turan Sağer3, Nilgün Doğrusöz1

1 Istanbul Technical University, Turkey;

2 Universitat Pompeu Fabra, Barcelona, Spain;

3 Yıldız Technical University, Turkey.

Abstract

This paper discusses the necessity of employing Music Information Retrieval (MIR) technologies in music conservatory entrance examinations. In Turkey, acceptance to a music conservatory is determined through a musical aptitude examination that is usually conducted by a jury committee. While the content of this exam has become standard (mostly questions on pitch recognition and melody/rhythm repetition), factors such as the amount of time and energy devoted to the exam, differences in assessment criteria between jury members, and the use of a limited set of manually constructed question packages (to avoid leaking the exam content outside) present shortcomings for a standardized evaluation of applicants. Although a good deal of research has addressed this issue, these studies investigate solely the reliability scores of jury committees, without analyzing the applicants’ performance recordings and comparing them with the jury scores. Our talk will present the findings of a research project that compares jury scores with performance recordings. At the end we propose a hybrid assessment model, MAST (Musical Aptitude Standard Test), which we believe would significantly improve the quality of measurement and evaluation while consuming fewer resources in music conservatory entrance exams.

Keywords: Conservatory Entrance Exams, Musical Aptitude Tests, Musical Competence, MAST, Musical Aptitude Standard Test, Music Performance Assessment


Introduction

A musical aptitude examination is a general requirement when applying to a music conservatory. Such examinations aim to test and measure the musical proficiency of an applicant, and there are various approaches to measuring musical competence. This paper presents the potential benefits of employing Music Information Retrieval (MIR) technologies in music conservatory entrance examinations. In the first part, a brief overview of the two main approaches to measuring musical proficiency is presented: (i) the standardized test format, and (ii) jury committee evaluations. Both approaches have their own advantages and shortcomings, but in Turkey it is usually jury committee evaluations that are preferred in music conservatory exams. The content of this kind of examination has become standard (mostly questions on pitch recognition and melody/rhythm repetition), yet since it involves a jury committee, it may present shortcomings for a standardized evaluation of applicants. Although a good deal of research has addressed this issue, these studies investigate solely the reliability scores of jury committees, without analyzing the applicants’ performance recordings through music/sound technologies and comparing them with jury scores. The second part of the paper presents the findings of a research project that compares the jury scores with analyses of performance recordings made with sound engineering tools. This part reveals the existence of different assessment criteria between jury members. In addition, it demonstrates the problem of using a limited set of question packages for different applicants (to avoid leaking the exam content outside). Thus, the scope of this essay is limited to the need for new technological tools as an aid for jury committees. At the end we propose a hybrid assessment model, MAST (Musical Aptitude Standard Test), which we believe would significantly improve the quality of measurement and evaluation while consuming less time and energy in the musical hearing portion of the conservatory entrance exam. Our goal is to present supporting and practical mechanisms in order to make the exams as efficient as possible.

Musical Aptitude Tests & Music Conservatory School Examinations

One can categorize the methods of measuring musical proficiency during a music conservatory examination under two main headings:

Standardized tests that are used to determine various dimensions of aural ability,

Audition processes in which abilities in musical perception and musical expression are evaluated by a jury commission.

Standardized Tests

Standardized tests are designed to measure aural abilities in the perception of various musical elements. These exams are in multiple-choice format: the applicants are exposed to sound coming from speakers (or headphones) and are expected to answer questions regarding abstracted musical elements (such as volume, dynamics, musical interval, timbre, texture, tempo, rhythm, melody and harmony) by making certain comparisons and discriminations. The best-known examples of this method are the Seashore test, the Wing test, the Bentley test and the Gordon tests (including Gordon MAP, Gordon PMMA and Gordon IMMA). In his chapter on musical aptitude tests, Tarman gives a detailed discussion of these test designs (Tarman, 2016: 103-113). Similar designs have also been implemented in Turkey, such as DYT (“Deneme Yetenek Testi”, Aptitude Trial Test) (Göğüş, 1994); MÖZYES (“Merkezi Özel Yetenek Sınavı”, Central Special Talent Exam), which according to Tarman was implemented only twice, during the exams of 1994-1995 and 1995-1996 (Tarman, 2016: 113); OMÜ-MAT (“Ondokuz Mayıs Üniversitesi Müziksel Algılama Testi”, Ondokuz Mayıs University Musical Aptitude Test) (Ibid. 114); and MAÖ (“Müziksel Algılama Ölçeği”, Measure of Musical Perception) (Atak Yayla, 2009: 372-377). There are two main advantages of this type of multiple-choice test. First, since each applicant is asked the same questions in the same way and evaluated equally, the results are much more objective than those of jury committee evaluations. Second, they use much less time and energy: an assigned exam superintendent can carry out the exam procedure in a room or a conference hall with as many applicants as possible at the same time, and the multiple-choice answer sheets can be quickly processed later through an optical reader. Yet, besides these two important advantages, the usefulness of these test designs is also open to debate. The first problem is their multiple-choice nature; some of these exams have questions that provide only two choices, so the applicant has a 50% chance of answering correctly even if (s)he has no idea about the answer. The processed sounds played back during the exam and the acoustics of the exam space are other issues; some of these questions use unnatural sounds (such as an oscillator or MIDI), which distances them from musicality, and the speaker system placed within the room/hall might cause individual differences in the perception of sounds according to the acoustics of the space. However, probably the most important factor is that, although these types of exams may measure the individual aural abilities of a person to some degree, it is still a question whether these abstracted abilities correspond to a potential for musicality. (To give an example: Atak Yayla and Yayla (2009) investigated the predictive power of the results of their own test for the musical talent of those who took it. The results, although positively correlated, showed a medium-low relationship (r = 0.483, r² = 0.234).)

That is why these tests, if they are used at all, are preferred mostly as a qualification exam in Turkey and have a filtering function; once an applicant passes these exams, (s)he is entitled to enter the final entrance exam, which is held by a jury committee.
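The 50% guessing chance mentioned above for two-choice items can be made concrete with a short calculation. The sketch below is illustrative only: the test length (10 items) and the pass mark (6 correct) are hypothetical and are not taken from any of the tests cited, and `prob_at_least` is a helper name introduced here.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k correct answers out of n items
    when every item is answered by pure guessing with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical test: 10 two-choice items, pass mark of 6 correct answers.
chance = prob_at_least(6, 10)
print(f"Chance of passing by guessing alone: {chance:.3f}")
```

Under these assumed numbers, an applicant guessing blindly would reach the pass mark in roughly 38% of attempts, which illustrates why two-choice items on their own are weak evidence of ability.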

Jury Committee Based Exams

Although the design of jury exams, in which the applicant’s musical talents are evaluated by assigned jury committees, varies according to the respective institution’s preferences, applicants are usually evaluated on two main criteria: pitch recognition (including single pitches, intervals and chords) and musical memory (both melodic and rhythmic). In each of these, the candidate is required to sing or play back what has been played for her on the reference piano. There can also be additional questions such as melodic/rhythmic dictation and/or melodic/rhythmic sight singing; however, as these questions require musical knowledge in addition to talent, they are usually not encountered in the qualification (first) exams that fulfill a filtering function (if the entrance exams have two tiers). Jury-based examination systems are much more preferred in Turkey than standardized multiple-choice tests. A nationwide survey among the music department teachers of Fine Arts High Schools, carried out by Yağcı (Yağcı, 2010: 228) during the 2006-2007 school year, showed that 9.2% of the respondents totally agreed with the effectiveness of the jury-based system, while 44.6% agreed to a large extent and 40% partially agreed. The remaining 6.2% thought that its effectiveness was very low. Thus, one can say that most teachers nationwide believed in the effectiveness of this system. Nevertheless, held in a limited time with numerous applicants, these jury-based exams also bear many difficulties, as they require the evaluation of each candidate separately. To give an example, in the 2015 musical entrance exams of the ITU Turkish Music State Conservatoire, 5 different jury commissions, each consisting of 3 people, separately evaluated 507 candidates in 3 full days. As can be seen, the amount of human resources, as well as the time and energy devoted to this process, is significantly high. Some of the shortcomings of this exam type are also related to this aspect, since a person may not be able to maintain the same efficiency throughout such a long and tiring process. There is also the possibility that different jury committees develop different assessment criteria during the exam period; that their reference performances (exam questions) differ (in terms of volume, tempo and accentuation); and that the jury members influence each other. In addition, the use of a limited number of manually created question packages in some cases (to avoid leaking the exam content outside) may raise doubts about whether the difficulty level of the exam is equal for all applicants. Such potential obstacles to an objective and standardized measurement are the main drawbacks of this system. Testing jury reliabilities from the jury score sheets at first seems to offer a control mechanism (as seen in Atılgan (2008), Ece & Kaplan (2008), Tarman (2016: 90), etc.), yet, as Tarman also underlines, a high reliability score does not necessarily mean that the jury members acted independently and/or evaluated objectively or consistently (Tarman, 2016: 118).

(Surely one can avoid such pitfalls through improvements such as increasing the number of jury members in a committee, isolating the jury members from each other so that they do not know each other’s scores, allowing longer intervals for the jury to rest between sessions, etc. A similarly improved system has been used in the music entrance exams of the Yıldız University Department of Music and Performing Arts since the 2016-2017 academic year. Here, except for the head of the jury committee, the jury members are isolated from each other and enter their scores into a computer they use individually. When the scoring of an applicant is finished, the head of the jury committee checks the variance between the jury members and, if there are large differences, asks them to reconsider their scores by playing back the recording of the applicant’s performance. Yet, as is clear from this example, any such improvement results in additional costs.)

In order to check those facts, one also needs to analyze the applicants’ performance recordings through music/sound technologies and compare them with the jury scores. Thus, at this point the use of Music Information Retrieval (MIR) technologies, which offer many approaches for the automatic analysis of recorded sounds, might be a solution to overcome such disadvantages. The second part of the paper presents the findings of a research project that investigated the potential of using sound engineering tools in the musical hearing portion of musical aptitude exams.
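The variance check described above for the Yıldız University exam, in which the head of the jury looks for large differences between isolated jurors, could be sketched as follows. The 10-point threshold, the 0-100 score scale and the function name `needs_review` are assumptions made for illustration; the source does not state how large a difference triggers a review.

```python
from statistics import pstdev


def needs_review(juror_scores, max_spread=10.0):
    """Return True when the gap between the highest and lowest juror score
    for one applicant exceeds max_spread (a hypothetical threshold)."""
    return max(juror_scores) - min(juror_scores) > max_spread


# Hypothetical scores (0-100) entered independently by three jurors.
consistent = [70, 72, 68]
divergent = [45, 72, 70]

for scores in (consistent, divergent):
    # The population standard deviation is printed only as extra context.
    print(scores, "review needed:", needs_review(scores), "sd:", round(pstdev(scores), 2))
```

A range-based rule is used here because it is easy to explain to a committee; a variance or standard-deviation threshold would serve the same purpose.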

Research Findings Concerning the Standardness of the Jury Based Exams

This part presents two important findings of a two-year research project (May 2016–May 2018) that investigated the potential of using sound engineering tools in the musical hearing portion of musical aptitude exams. In general, the project tested the success of such technological tools in evaluating the recorded sounds of the candidates by comparing the jury evaluations with computational analyses of the candidates’ exam performance recordings. The jury evaluation reports (of the qualification exams of 2015, 2016 and 2017) and the exam recordings (of 2015 and 2017) were provided by the Istanbul Technical University Turkish Music State Conservatory Music Theory department with the permission of the conservatory directorate. As the main goal was to make the qualification exams as efficient as possible, the project team also diagnosed some previously unobserved flaws in the question packages and offered some improvements to the exam preparation committee. Besides this, the most noticeable finding was that although the individual reliability scores of the jury committees were high (based on the jury reports), our computational analyses showed that each jury committee was developing different criteria, especially when evaluating the melodic memory sections, which brings to mind Tarman’s doubts about the independence of the jury members in a jury committee (Ibid). Below we share these two main findings, which may compromise the standardization of jury-based exams.

Problems about Different Question Packages

As stated earlier, nearly all jury-based exams in Turkey share two main criteria: pitch recognition (including single pitches, intervals, triads) and musical memory (melodic and rhythmic), although there might also be some extensions (sight singing, dictation or musical performance). Due to a high number of applicants, some institutions prefer a two-tier entrance exam, in which the first exam tests solely the previously mentioned musical abilities and functions as a qualification for the final entrance exam. Thus the first (qualification) exam, although it takes less time per applicant, is a long process conducted by different jury committees working simultaneously over multiple days. Such a setting requires additional precautions regarding the confidentiality of the questions asked in the exam. One of these precautions is designing the exam with various question packages, each package having its own set of distinct questions on pitch recognition, melody and rhythm, thus minimizing the chance of the questions leaking outside (e.g. a more experienced applicant memorizing a melody and singing it back outside to her friends who are waiting for their turn). Yet such a precaution may also create other problems, such as differences between the question packages in terms of their difficulty level. It is important to note that qualification in these exams is not determined by ranking; the applicants must score above a certain percentage. The exam preparation committee therefore takes this success percentage, not the ranking, into consideration and prepares the questions accordingly. Thus the applicants are expected to score above such a predetermined threshold regardless of which question package is used. However, even a mild variation between two question packages may produce amplified and significant differences in applicant performances due to unpredictable factors (applicant background, exam anxiety, individual capabilities, etc.). Bringing the exam closer to an ideally standard level starts with the equal distribution of question difficulties among the various question packages.

The numbers of applicants whose jury evaluations we analyzed are: 365 people from the qualification exam of 2015, 456 people from 2016 and 451 people from 2017. The reliability scores of the jury committees were in general very high, as can be seen from Table 1, which will be discussed in the next section. The contents of these exams are as follows:

Pitch Recognition

Single Pitch Recognition (x5)

Interval Recognition (x5)

Triads (x4)

Musical Memory

Melodic Memory (Tonal & Modal; one question for each)

Rhythmic Memory (Straight & Aksak; one question for each)

Table 1: 2015-2017 Analyzed Reports: Jury Reliability Scores Using Various Measurement Tests

Table 2: 2015 & 2016 Exams ANOVA Tests – Success vs. Question Packages

Table 2 presents the ANOVA results obtained from the 2015 and 2016 exams, considering the possible effect of using 10 different question packages on the success of the applicants. Generally speaking, both the F values and the p values suggest that, for each category, the mean success percentage is significantly different for at least one of the question packages. The melodic and rhythmic memory categories were the most problematic in this sense. Thus, considering the “Total_Success” category, which is the exam score of the applicants, one can conclude that the test score of an applicant also depended on which question package she was evaluated with. However, this does not mean that passing or failing the exam depended on the question package. Table 3 presents the same effect on the exam qualifications (the qualification score was 60% for the 2015 exam and 50% for the 2016 exam). We observe that, in both 2015 and 2016, there was no significant relationship between an applicant’s passing/failing and her assigned question package (p > 0.05 for both).

Table 3: 2015 & 2016 Exams ANOVA Tests – Pass/Fail vs. Question Packages
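For readers unfamiliar with the statistic reported in these tables, the following is a minimal sketch of how a one-way ANOVA F value is computed when applicants are grouped by question package. The scores are invented for illustration and the function name is hypothetical; it is not the procedure or software used in the project. Obtaining the p value additionally requires the F distribution, which a statistics library (for example `scipy.stats.f_oneway`) would provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score groups,
    one group per question package."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the package means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own package mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented success scores for three hypothetical question packages.
packages = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(packages)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F value means that the package means differ by more than the spread of scores within each package would explain, which is the situation described for the melodic and rhythmic memory categories.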

With this information and data, our research team designed and conducted an experiment following the exam of 2016. The experiment was modeled on the questions of the 2016 exam, and its aim was to test the degree of variation between the question packages among music conservatory students, that is, people for whom we assume the choice of question package does not play a role in the candidate’s success. The information obtained from this research, in addition to the previous data, would not only help us understand the difficulty levels and the ease of perception of the questions but might also suggest improvements for our question designs. We conducted the experiment with 26 students from the Musicology and Music Theory departments. The questions were taken from three question packages used in the 2016 qualification exams: those having the average (assigned as package #1), lowest (#2) and highest (#3) success rates in each category; thus the experiment was also checking the results of the 2016 qualification exams. The MAST (Musical Aptitude Standard Test) experiment was conducted individually in the Musicology lab using computers, headphones and microphones. Similar to a TOEFL exam, the participants were asked to follow the instructions appearing on the screen in front of them. The questions were played from MIDI files and the participants were asked to sing/perform what they heard on the headphones into the microphone on their desks. Meanwhile, one of the project researchers recorded the participants’ responses on a different computer. Thus there was no jury committee present in the room; the research team later compiled the recordings and sent them to a jury committee for evaluation. After the experiment, the participants also filled out a survey regarding the relative efficiency of this exam system compared to a live jury committee system. Out of 26 people, 6 preferred the jury system (23%), 6 were indifferent between the two systems (23%), while 14 people (54%) found this system better than the jury-based system and wrote that such an environment had a positive effect on their efficiency. We should also note that the Musicology Lab in which the experiment was conducted had poor sound isolation, and that the 6 participants who preferred the jury system also wrote in their comments that they were confused by noise coming from outside the room, or else that they felt strange in such an isolated exam environment. The whole MAST experiment process, including the introduction, the experiment and the survey, took around 15 to 20 minutes per participant.

Table 4: 2016 MAST Experiment ANOVA Test – Success vs. Question Packages

Table 4 presents the ANOVA test results of the MAST experiment, which was conducted using only three packages from the qualification exams of 2016. What is noticeable is that in the pitch recognition part (single pitch, interval and triad identification), there is no relationship between the assigned question packages and the degree of success. In other words, the difficulty level of the 2016 question packages was suited to qualifiers, based on our assumption that our experiment participants, already being conservatory students, are potential qualifiers of the exam. On the other hand, the results of the melodic and rhythmic memory sections came out parallel to those of the 2016 qualification exam. As can be seen from Figure 1 & Figure 2, the questions of package #2, which were selected from the question packages with the lowest success rates in 2016, got the lowest scores in all four categories (melodies 1 & 2, rhythms 1 & 2) as well.

Figure 1: 2016 MAST Experiment: Means Plots for Melody Questions vs. Question Packages

Figure 2: 2016 MAST Experiment: Means Plots for Rhythm Questions vs. Question Packages

As these results confirmed the results of the 2016 melody and rhythm question packages, our research team analyzed the possible factors that may cause such a difference in difficulty level. We realized that, although all the melody questions were two measures long and all the rhythm questions one measure long, factors such as the number of notes, the range/ambitus, the shape of the melody, the proportion of melodic steps to melodic leaps, and the periodicity and familiarity of the passage may also contribute to these different levels of difficulty. Thus, we designed a rubric for preparing melody and rhythm questions and proposed it to the exam preparation committee before the preparation of the 2017 qualification exams. The guidelines we proposed were as follows:

Melody Questions

Each melody group should use the same number of notes,

The same note should not be used consecutively,

Each melody group should have the same rhythmic values,

Each melody group should have the same range/ambitus,

Each melody group should have the same time signature,

Each melody should be two measures long,

Each melody group should have the same tonality,

The melodies in each group should start and end with the tonic,

(For tonal melodies) The melodies should imply a similar harmonic background (such as I–ii–V7–I),

(For modal melodies) The melodies should have a similar modal progression.

Rhythm Questions

Each rhythm should use the same rhythmic motifs arranged in different orders (like a-b-c-d; b-a-c-d; c-a-b-d; d-a-b-c; a-c-b-d, etc.),

The rhythms should not contain or imply a periodic structure (like a-b-a-c),

Each rhythm group should have the same time signature,

Each rhythm should be two measures long.
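Part of this rubric lends itself to automatic checking. The sketch below covers only a subset of the guidelines (equal note counts, equal range, no consecutive repeated notes, tonic at both ends, and reordering of rhythmic motifs). It assumes that melodies are given as lists of MIDI note numbers and that rhythmic motifs are simple labels; both representations, and all function names, are illustrative choices rather than the project's actual tooling.

```python
from itertools import permutations


def check_melody_group(melodies, tonic_pitch_class):
    """Return a list of rubric violations for a group of melodies,
    each melody being a list of MIDI note numbers."""
    problems = []
    if len({len(m) for m in melodies}) > 1:
        problems.append("different numbers of notes")
    if len({max(m) - min(m) for m in melodies}) > 1:
        problems.append("different ranges")
    for i, m in enumerate(melodies):
        if any(a == b for a, b in zip(m, m[1:])):
            problems.append(f"melody {i}: repeated consecutive note")
        if m[0] % 12 != tonic_pitch_class or m[-1] % 12 != tonic_pitch_class:
            problems.append(f"melody {i}: does not start and end on the tonic")
    return problems


def rhythm_orders(motifs):
    """All orderings of distinct motifs. Because each motif is used once,
    no ordering can contain a periodic pattern such as a-b-a-c."""
    return ["-".join(order) for order in permutations(motifs)]


# Hypothetical melodies in C (pitch class 0).
good_group = [[60, 64, 67, 65, 62, 60], [60, 62, 65, 67, 64, 60]]
bad_group = [[60, 60, 67, 60], [60, 62, 67, 62]]
print(check_melody_group(good_group, 0))
print(check_melody_group(bad_group, 0))
print(len(rhythm_orders(["a", "b", "c", "d"])), "possible rhythm orderings")
```

Guidelines that depend on musical interpretation, such as the implied harmonic background or the modal progression, are deliberately left out, since they would still need a human reviewer.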

Besides these, we also suggested that the interval and triad questions be designed in such a way that not only their content but also their order be arranged similarly. Table 5 presents the results of the 2017 qualification exam regarding the relationship between success and question packages.

Table 5: 2017 Exam ANOVA Test – Success vs. Question Packages

The observable decrease in the F values and increase in the p values in the pitch recognition section (single pitches, intervals and triads), which now suggest no relationship between the question packages and success in these categories, demonstrates the benefit of using the same content in the same order (thus just transposed versions of the same question) in the interval and triad sections. It seems, however, that the guidelines we had proposed for the preparation of melody and rhythm questions did not have any positive effect on these sections, as success in all four categories (melody 1, melody 2, rhythm 1 and rhythm 2) still shows a dependency on at least one of the question packages (p < 0.001 in each). From the Bonferroni post-hoc tests we found that the means of one of the melodies from the melody 1 package, two of the melodies from the melody 2 package and more than two of the rhythms in the rhythm 1 and rhythm 2 packages were significantly different from the means of the corresponding questions of the other packages. However, for the first time, the total exam score showed no significant relationship with the question package, with the values F = 1.538 and p = 0.132. In addition, as can be observed from Table 6, there was no significant relationship between passing/failing the exam and the question package (for the 2017 exam the qualification score was 50%), with the values F = 0.951 and p = 0.480, the lowest F and the highest p values in the research so far. The reason for this is probably that the questions that differed significantly in difficulty were dispersed among different question packages for each category; e.g. a package containing the melody 1 question with the lowest mean also contained a melody 2 question with a high mean. Thus, such results of the 2017 qualification exams may be coincidental.
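The idea of asking "just transposed versions of the same question" in the interval section can be illustrated with a small sketch. Pitches are represented as MIDI note numbers, which is an assumption of this example, as are the function names and the sample intervals.

```python
def transpose(question, semitones):
    """Shift every (lower, upper) pitch pair in an interval question
    by the same number of semitones."""
    return [(low + semitones, high + semitones) for low, high in question]


def interval_sizes(question):
    """Size of each interval in semitones, in the order asked."""
    return [high - low for low, high in question]


# Hypothetical base package: major third, perfect fifth, minor third.
base_package = [(60, 64), (62, 69), (57, 60)]
other_package = transpose(base_package, 3)

print(other_package)
print(interval_sizes(base_package) == interval_sizes(other_package))
```

Because transposition preserves both the interval sizes and their order, two packages built this way differ only in register, which is consistent with the disappearance of the package effect reported for the pitch recognition section.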

Table 6: 2017 Exam ANOVA Tests – Pass/Fail vs. Question Packages

It seems that more research is needed in the area of music perception for the standardization of such melody and rhythm questions. The possible flaws of using different question packages could be significantly reduced by using such question preparation guidelines with generative rubrics. In addition, in order to increase the amount of dispersion, multiple questions (at least three) should be asked for each category rather than only one, e.g. three tonal melody questions, three modal melody questions, etc. However, this would also increase the amount of time used per applicant. The solution we propose is discussed in the final section.

Reconsidering Jury Reliabilities

As the main aim of our research was to investigate the design and applicability of automatic assessment tools to support the qualifying exams, the recorded sounds of the candidates’ exam performances were analyzed in comparison with the evaluations of the jury members. In other words, the algorithms were designed not to decide whether a reference piano sound matched an applicant’s performance, but to imitate the jury responses and make an evaluation of the applicant performances. For the pitch recognition section, we managed to find reliable information on the acceptable pitch ranges and on thresholds for interval intonation (which are investigated and discussed separately in Köker et al. and Güner et al.). The hardest category for the automatic assessment task was the melody sections, since the evaluation of a “successful” melodic recall may involve multiple factors, including the completeness of the melody, pitch sequence, rhythm, intonation, melodic shape, vibrato, etc., and the importance of these factors may change from person to person. Using the dataset derived from the 2015 and 2016 qualification exams, two different assessment systems were designed (as discussed in Bozkurt et al. (2017) and Gültekin et al.), with average accuracies of 0.74 and 0.856. During this process we also had the chance to analyze the samples on which the automatic assessment tool and the jury evaluations disagreed. Except for a few cases, all the disagreements were ones in which the applicants were favored by the jury committee; that is, the jury committees (of the 2015 and 2016 exams) turned out to be more lenient than the algorithm, evaluating applicants as successful in cases that might be considered unsuccessful. Examining these cases also provided us with information about the different evaluation criteria of the different jury committees: intonation, rhythm, the place (and the function) of the missed note, etc. In fact, the existence of different criteria between jury committees was probably one of the reasons why the automatic assessment systems, which “learn” how to evaluate from these different committees, did not reach higher accuracies.
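The comparison between jury decisions and automatic decisions described above reduces to counting agreements and the direction of each disagreement. The following sketch shows that bookkeeping on invented data; the labels, the function name and the boolean encoding (True for "successful") are assumptions for illustration and do not reproduce the systems of Bozkurt et al. or Gültekin et al.

```python
def compare_with_jury(jury, algorithm):
    """Compare two equally long lists of pass decisions (True = successful).

    Returns (accuracy, jury_more_lenient, jury_stricter), where accuracy is
    the share of cases on which the algorithm agrees with the jury."""
    pairs = list(zip(jury, algorithm))
    agreements = sum(j == a for j, a in pairs)
    jury_more_lenient = sum(j and not a for j, a in pairs)
    jury_stricter = sum(a and not j for j, a in pairs)
    return agreements / len(pairs), jury_more_lenient, jury_stricter


# Invented decisions for five melody responses.
jury_decisions = [True, True, False, True, False]
algo_decisions = [True, False, False, False, True]

accuracy, lenient, stricter = compare_with_jury(jury_decisions, algo_decisions)
print(f"accuracy={accuracy:.2f}, jury more lenient={lenient}, jury stricter={stricter}")
```

Separating the two kinds of disagreement is what makes a one-sided pattern visible, such as the jury favoring applicants far more often than the reverse.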


Figure 3 presents the overall distribution of the 2016 & 2017 qualification exam scores. The base score needed to qualify in these years was 50%. The bold-marked bar in both histograms refers to the 48%-51.9% range. From the frequency distributions we observe that most of the applicants who fall in this range actually passed the exam (19/24 passed in 2016 and 20/22 passed in 2017, as can be seen from Table 7). Notice also from the two histograms how drastically this bar differs from the bars on its left (fail) and right (pass). It is as if a “positive” transfer has been made from left to right, which requires only 5 points from each jury member, that is, the difference between a “totally successful melody” and an “averagely successful melody” on the jury evaluation sheets. Thus, it seems that the jury committee is acting as a group and, by taking the initiative together, deciding to give the applicant a second chance in the final entrance exams. This “jury-induced” positive effect might also explain the cases stated before, in which the jury score and the designed algorithm disagreed.

Figure 3: 2016 & 2017 Qualification Exams – Overall Distribution of Scores

Table 7: 2016 & 2017 Qualification Exams: 48%-51.9% Score Area

At first this may not seem a negative thing, especially when considered from the perspective of the applicants. However, it also shows that communication between the jury members is present, which may occur in other cases as well and thus contribute to the high reliability scores presented in the previous section. In addition, bear in mind that, in Turkey, applicants who graduated from Fine Arts High Schools (Güzel Sanatlar Lisesi) gain a significant number of extra points over ordinary high school graduates in the final entrance exams. Such applicants might appear in the top portion of the conservatory acceptance list even if their final entrance exam performances alone would have placed them on the waiting list. Thus it is open to discussion whether lifting a failing applicant from a Fine Arts High School background above the base score is a “positive” act, especially when considered from the perspective of those from ordinary school backgrounds who passed the qualification exams without any outside influence.
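The borderline analysis summarized in Table 7 amounts to counting how many scores fall in the band just around the base score and how many of those are passes. A minimal sketch follows, using invented scores; the band limits mirror the 48%-51.9% range and the 50% base score given in the text, while the function name and the sample data are hypothetical.

```python
def band_pass_count(scores, low=48.0, high=52.0, pass_mark=50.0):
    """Return (passed, total) for scores with low <= score < high."""
    in_band = [s for s in scores if low <= s < high]
    passed = sum(s >= pass_mark for s in in_band)
    return passed, len(in_band)


# Invented percentage scores, not taken from the exam data.
sample_scores = [47.5, 48.0, 49.9, 50.0, 50.0, 51.9, 52.0, 60.0]
passed, total = band_pass_count(sample_scores)
print(f"{passed} of {total} borderline applicants passed")

# Proportions corresponding to the counts reported in the text.
print(round(19 / 24, 2), round(20 / 22, 2))
```

Applied to the reported counts, about 79% (2016) and 91% (2017) of the applicants in the band ended up on the passing side, whereas a band centered on the threshold would be expected to split more evenly if no adjustment were taking place.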

Conclusion: Towards a Hybrid Assessment Model

In conclusion, we propose a qualification exam design similar to the MAST experiment described previously. Thus, similar to the TOEFL or Pearson English exams, the applicants would register and individually take their qualification examination in isolated booths using computer screens, headphones and microphones, through which their questions would be asked and their live responses recorded, assessed using state-of-the-art MIR technologies, and further screened by instructors, especially for cases close to the pass boundary. Recall from the survey results of the MAST experiment that such a new system was preferred by the majority of the participants, and that the 23% who preferred the jury system based their complaints on the poor isolation of the exam environment, which can be prevented by using a better room. Considering the question packages discussed in the previous section, the system could ask different questions for each category, randomly selected from a big pool of different packages, thus generating a different exam each time. It is also possible to create such questions by computer through generative algorithms (currently, a similar algorithm is in its design and testing phase). The test can also incorporate hearing-based multiple-choice questions similar to those in the standardized tests of Seashore, Wing, Bentley and Gordon. In such an exam system the applicants could also get a detailed evaluation report of their exam performance and, if they have scored above a base score, use these reports to apply for the final entrance exams of the music conservatories, in which they can show their musical performance skills to the jury committee in more detail. This will not only save a great deal of time, energy, infrastructure and capital, but will also increase the quality of the final entrance exams (the second-tier exams) and result in a much more efficient examination process.
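The random selection of questions from a large pool, as proposed above, might look like the following sketch. The category names follow the exam contents listed earlier, with three questions per memory category as suggested in the previous section; the pool sizes, question identifiers and the function name `assemble_exam` are placeholders, and the generative algorithm mentioned in the text is not represented here.

```python
import random


def assemble_exam(pool, counts, seed=None):
    """Draw counts[category] distinct questions at random from each category."""
    rng = random.Random(seed)
    return {category: rng.sample(pool[category], n) for category, n in counts.items()}


# Placeholder pool: 40 question identifiers per category.
categories = ["single_pitch", "interval", "triad", "melody_tonal",
              "melody_modal", "rhythm_straight", "rhythm_aksak"]
question_pool = {c: [f"{c}_{i:02d}" for i in range(40)] for c in categories}

# Questions per category: pitch counts as in the exams described above,
# three per memory category as suggested for better dispersion.
question_counts = {"single_pitch": 5, "interval": 5, "triad": 4,
                   "melody_tonal": 3, "melody_modal": 3,
                   "rhythm_straight": 3, "rhythm_aksak": 3}

exam = assemble_exam(question_pool, question_counts, seed=2018)
for category, questions in exam.items():
    print(category, questions)
```

Passing a seed makes a given applicant's exam reproducible for later review, while different seeds yield different exams; random drawing alone does not equalize difficulty, so the pool itself would still need to follow the preparation guidelines.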

Acknowledgements: This work is supported by the Scientific and Technological Research Council of Turkey, TUBITAK, Grant #215K017.

References

Atak Yayla, A., Yayla, F. 2009. “Müziksel Algılama Ölçeği”. 8. Ulusal Müzik Eğitimi Sempozyumu: Türkiye’de Müzik Eğitiminin Sorunları ve Çözüm Önerileri – Bildiriler Kitabı. Ondokuz Mayıs Üniversitesi Yayınları. Samsun. 372-378.

Atılgan, H. 2008. “Using Generalizability theory to assess the score reliability of the Special Ability Selection Examinations for music education programmes in higher education”. International Journal of Research & Method in Education, 31:1, 63-76.

Bozkurt, B., Baysal, O., Yüret, D. 2017. “A Dataset and Baseline System for Singing Voice Assessment”. CMMR 2017, 13th International Symposium on Computer Music Multidisciplinary Research: Music Technology with Swing. 25-28 September 2017.

Ece, A. S., Kaplan, S. 2008. “Müzik Özel Yetenek Seçme Sınavı’nın Puanlayıcılar Arası Güvenilirlik Çalışması”. National Education, 36-49.

Göğüş, G. 1999. “Müzik Yeteneğinin Tanımı, Ölçümü ve Deneme Yetenek Testi”, Uludağ Üniversitesi Eğitim Fakültesi Dergisi, Cilt: 12, sayı: 1. 79-89.

Güner, B. B., Baysal, O., Bozkurt, B. (in preparation). “Müzik Yetenek Sınavları Çift Ses Soru Değerlendirmelerinde Kabul Edilebilir Aralıklar”.

Gültekin, C., Bozkurt, B., Baysal, O. (in preparation). “Singing Assessment Using Chroma Features”.

Köker, O., Baysal, O., Bozkurt, B. (forthcoming). “Müzik Yetenek Sınavlarında Tek Ses Tekrarları İçin Kabul Edilebilir Perde Aralığı (Aralıkları)”. Hacettepe Üniversitesi Ankara Devlet Konservatuarı Ulusal Müzik ve Sahne Sanatları II. Sempozyumu - Bildiri Kitabı – 21 Aralık 2017, Ankara.

Tarman, S. 2016 (2006). Müzik Eğitiminin Temelleri – Geliştirilmiş 2. Basım. Müzik Eğitimi Yayınları. Ankara.

Yağcı, U. 2010. “AGSL Müzik Bölümleri Yetenek Sınavları ve Bu Sınavlara Yönelik Öğretmen Görüşleri”. Pamukkale Üniversitesi Eğitim Fakültesi Dergisi, Sayı 27, 223-231.